The rapidly growing number of new protein sequences arising from genomics and proteomics calls for methods that can rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins, and to infer the molecular functions of proteins from the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high-throughput technologies and protocols for macromolecular structure determination, which have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at the Berkeley Structural Genomics Center and collaborators' laboratories to illustrate how structural information complements sequence information in inferring the functions of proteins with unknown molecular functions.
Two of the major aims of structural genomics are to discover the complete repertoire of protein folds in nature and to find the molecular functions of proteins whose functions cannot be predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative (PSI; www.nigms.nih.gov/funding/psi.html) has been launched in the United States.
Although infrastructure building and technology development are still the main focus of structural genomics programs [1-6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6-10]. The Berkeley Structural Genomics Center (BSGC) has chosen the proteins of Mycoplasma, or their homologues from other organisms, as its structural genomics targets because of the minimal genome size of the Mycoplasmas and their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several example proteins, grouped into five situations, that span the spectrum of functional inferences obtainable from three-dimensional structures; in each case the inferences are new, testable, and not predictable from protein sequence information alone.
Bibliographical note
Funding Information:
We thank all the members of BSGC (H. Yokota, B. Gold, J. Jancarik, W. Wang, M. Bruno, J. Brandsen, M. Henriquez, H. H. Nguyen, and Y. Lou) for their help with cloning, purification, and crystallization of the proteins, and especially Dr. Y. Pavlov of the National Institute of Environmental Health Sciences and Dr. Y. S. Han of the Korea Institute of Science and Technology for Figure 5. We gratefully acknowledge the support of NIH grant GM62412 for most of the structures cited in this article.
- Berkeley Structural Genomics Center
- Molecular function
- Protein function
- Structural genomics