Protein structure

Spatial Structure of Protein Molecules
Protein structure refers to the spatial structure of protein molecules. Proteins are biological macromolecules composed mainly of carbon, hydrogen, oxygen, nitrogen and other chemical elements. All proteins are polymers of 20 different amino acids linked together; once incorporated into a protein, these amino acids are called residues.
The boundary between proteins and polypeptides is not sharp. One convention, based on the number of residues needed to form a functional domain, calls a chain with fewer than 40 residues a polypeptide or peptide. To perform its biological function, a protein must fold correctly into a specific conformation, which is achieved mainly through a large number of non-covalent interactions (such as hydrogen bonds, ionic bonds, van der Waals forces and hydrophobic interactions). In the folding of some proteins, especially secreted proteins, disulfide bonds also play a key role. Understanding how a protein acts at the molecular level often requires determining its three-dimensional structure. Structural biology grew out of the study of protein structure and uses techniques such as X-ray crystallography and nuclear magnetic resonance (NMR) to solve protein structures.
A certain number of residues is needed to carry out a given biochemical function; 40 to 50 residues is usually the lower size limit of a functional domain. Proteins range in size from this lower limit to several thousand residues. Estimates of average protein length differ between species but are generally about 200 to 380 residues, and eukaryotic proteins are on average about 55% longer than prokaryotic ones. Larger protein assemblies can be formed from many protein subunits; for example, thousands of actin molecules polymerize to form protein filaments.
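The size conventions above can be written down as a small sketch. The function names and the example lengths are hypothetical; only the 40-residue cutoff and the 55% figure come from the text, and the cutoff is a loose convention rather than a strict rule.

```python
def classify_chain(n_residues: int, peptide_cutoff: int = 40) -> str:
    """Label a chain by residue count, using the rough 40-residue cutoff."""
    if n_residues < 1:
        raise ValueError("a chain needs at least one residue")
    return "polypeptide" if n_residues < peptide_cutoff else "protein"


def eukaryotic_average(prokaryotic_average: float, excess: float = 0.55) -> float:
    """Average eukaryotic length if it is about 55% longer than the prokaryotic one."""
    return prokaryotic_average * (1.0 + excess)


if __name__ == "__main__":
    # Example lengths are arbitrary illustrations, not measured values.
    for n in (12, 39, 40, 350):
        print(n, classify_chain(n))
    print(eukaryotic_average(200.0))
```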
Name: protein structure
Meaning: the spatial structure of protein molecules
Nature: an important class of biological macromolecules
Constituent elements: carbon, hydrogen, oxygen, nitrogen, sulfur and other chemical elements

Discovery History

Basic structure of protein molecule
In 1959, Perutz and Kendrew determined the three-dimensional structures of hemoglobin and myoglobin, for which they were awarded the 1962 Nobel Prize in Chemistry.
Pauling discovered the basic structure of proteins. Crick and Watson proposed a three-dimensional structural model of DNA on the basis of X-ray diffraction data and won the 1962 Nobel Prize in Physiology or Medicine. Hauptman and Karle established the purely mathematical theory of direct methods for determining crystal structures by X-ray analysis. This work was of epoch-making significance for crystallography and has played an important role in determining the molecular structures of substances studied in molecular biology, such as hormones, antibiotics, proteins and new drugs. For this they won the 1985 Nobel Prize in Chemistry.

Structure type

Protein molecules are covalent polypeptide chains formed by the head-to-tail condensation of amino acids, but natural protein molecules are not loose, random polypeptide chains. Each natural protein has its own unique spatial (three-dimensional) structure, which is usually called the conformation of the protein, that is, its structure.
The molecular structure of proteins can be described at four levels, each capturing a different aspect:
  • Primary structure: the linear amino acid sequence of the polypeptide chain.
  • Secondary structure: stable structures formed by hydrogen bonds between the C=O and N-H groups of different amino acids, mainly the α-helix and the β-sheet.
  • Tertiary structure: the three-dimensional structure of a protein molecule formed by the arrangement of multiple secondary structure elements in space.
  • Quaternary structure: the way different polypeptide chains (subunits) interact to form a functional protein complex.
Beyond these structural levels, a protein can switch between several similar structures to carry out its biological function. In the context of such functional changes, the tertiary or quaternary structure is usually referred to as a conformation, and the corresponding structural transition is called a conformational change.
Primary structure
The primary structure of a protein is the sequence of amino acid residues in its polypeptide chain and is the most basic level of protein structure. It is determined by the sequence of the genetic code in the gene: amino acids are joined by peptide bonds in the order the code specifies to form the polypeptide chain, so the peptide bond is the principal bond in protein structure.
So far, the primary structures of about 1,000 proteins have been determined, including insulin, pancreatic ribonuclease and trypsin.
The primary structure of a protein determines its secondary, tertiary and other higher-order structures. The estimated ten billion natural proteins each have their own specific biological activity, and the structural feature that determines each activity is, first of all, the amino acid sequence of the peptide chain. Because the 20 amino acids that make up proteins carry side chains that differ in physicochemical properties and spatial arrangement, combining them in different orders produces a great variety of protein molecules with different spatial structures and biological activities.
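To give a feel for how many different sequences 20 amino acids can produce, here is a minimal counting sketch. It treats a sequence as an ordered string over a 20-letter alphabet and ignores everything else (folding, stability, modified residues); the function names are hypothetical.

```python
import math

N_AMINO_ACIDS = 20  # the 20 amino acids mentioned in the text


def sequence_count(length: int) -> int:
    """Number of distinct linear sequences with the given number of residues."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return N_AMINO_ACIDS ** length


def peptide_bond_count(length: int) -> int:
    """A linear chain of n residues is held together by n - 1 peptide bonds."""
    return max(length - 1, 0)


if __name__ == "__main__":
    for n in (2, 10, 100):
        exponent = math.log10(sequence_count(n))
        print(f"{n} residues: {peptide_bond_count(n)} peptide bonds, "
              f"about 10^{exponent:.1f} possible sequences")
```

Even a short chain of 100 residues has far more possible sequences than the number of natural proteins quoted above, which is why sequence order alone can account for such variety.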
The polypeptide chain of a protein molecule does not extend as a straight line; it folds and coils into a unique, stable spatial structure. The biological activity and physicochemical properties of a protein depend mainly on the integrity of this spatial structure, so determining the amino acid composition and sequence alone does not fully explain them. For example, globular proteins (albumin, globulins, hemoglobin, enzymes and others, many of which are found in plasma) are soluble in water, whereas fibrous proteins (keratin, collagen, myosin, fibrin and others) are insoluble. This difference clearly cannot be explained by the amino acid sequence of the primary structure alone.
The spatial structure of a protein refers to its secondary, tertiary and quaternary structure.
Secondary structure
The secondary structure of a protein is the local spatial arrangement (conformation) of the main-chain atoms of the polypeptide chain; it does not involve the conformation of the side chains.
1. Peptide bond plane (or amide plane)
Pauling and colleagues carried out X-ray diffraction analyses of simple peptides and amino acid amides. Their findings about the atoms surrounding a peptide bond were as follows:
(1) The C-N bond of the peptide bond is 0.132 nm long, shorter than the adjacent N-C single bond (0.147 nm) and longer than an ordinary C=N double bond (0.128 nm). The peptide C-N bond is therefore intermediate between a single and a double bond; its partial double-bond character prevents free rotation and fixes the bond in a plane.
(2) The three bond angles around the C atom, and likewise around the N atom, of the peptide bond sum to 360°, which shows that the atoms bonded to each lie in one plane. The six atoms of the peptide unit are thus essentially coplanar, forming the peptide bond plane. The only bonds in the peptide chain that can rotate are the single bonds formed by the α-carbon atom; rotation about them determines the relative orientation of two adjacent peptide bond planes, so the peptide bond plane is the basic unit of coiling and folding of the peptide chain.
(3) Because the peptide C-N bond has double-bond character, cis-trans stereoisomerism is possible; the trans configuration has been confirmed.
2. Structural units of the main-chain conformation
1) α-Helix. Pauling and colleagues carried out X-ray diffraction analysis of α-keratin. The diffraction pattern showed a repeating unit of 0.5 to 0.55 nm, from which they inferred that the protein molecule contains a repetitive structure and proposed that this structure is the α-helix.
The structural features of the α-helix are as follows:
① Successive peptide bond planes rotate about the α-carbon atoms and coil tightly to form a stable right-handed helix.
② The main chain rises helically, with 3.6 amino acid residues per turn and a rise of 0.54 nm per turn, consistent with the X-ray diffraction pattern.
③ Between adjacent turns of the helix, the C=O and N-H groups of the peptide bonds form many intrachain hydrogen bonds: the N-H of each residue forms a hydrogen bond with the C=O of the residue that precedes it with three residues in between. These hydrogen bonds are the main force stabilizing the α-helix.
④ The amino acid side chains (R groups) point outward from the helix, and their shape, size and charge affect helix formation. Regions rich in acidic or basic amino acids disfavor the α-helix; regions rich in bulky R groups (such as phenylalanine, tryptophan and isoleucine) also hinder it. The α-carbon of proline is part of a five-membered ring and cannot twist easily; moreover, proline is an imino acid and does not readily form hydrogen bonds, so it does not readily form an α-helix. The R group of glycine is H, which occupies very little space and also reduces the stability of the helix.
2) β-Sheet. Astbury and colleagues' X-ray diffraction analysis of β-keratin showed a repeating unit of 0.7 nm. Under hot, humid conditions the α-keratin of hair can be stretched to twice its original length, and the X-ray diffraction pattern of the α-helix then changes into one resembling that of β-keratin. This indicates that the β-structure is the same as the structure of a stretched α-helix. The side-by-side sheet formed when two or more such folded, zigzag peptide segments are linked by hydrogen bonds is called the β-pleated sheet, or β-sheet.
The features of the β-sheet are:
① The peptide chain is almost fully extended. The peptide planes are folded into a zigzag, with adjacent peptide bond planes at an angle of 110°. The R side chains of the residues project above or below the zigzag.
② The conformation is stabilized by hydrogen bonds between C=O and N-H groups, either of two peptide chains or of two segments of one chain.
③ The two chains can be parallel or antiparallel: in the former both run in the same direction from N-terminus to C-terminus, and in the latter they run in opposite directions. β-Sheets take many forms, and parallel and antiparallel strands can alternate.
④ In a parallel β-sheet, the repeat distance spanning two residues is 0.65 nm; in an antiparallel β-sheet it is 0.7 nm.
3) β-Turn
Peptide chains in proteins often reverse direction by 180°, and the conformation at such a reversal is the β-turn (or β-bend). In a β-turn, a hydrogen bond between the C=O of the first residue and the N-H of the fourth residue stabilizes the structure.
4) Random coil
This is a stretch of peptide chain with no definite regular conformation; its peptide bond planes are arranged irregularly, giving a loose random coil.
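The helix and sheet dimensions quoted above lend themselves to a little arithmetic. The sketch below uses only the figures from the text (3.6 residues and 0.54 nm per turn; 0.65 nm or 0.70 nm per two residues of a strand) and treats both structures as ideal and perfectly regular, which real proteins are not. All function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6   # alpha-helix, from the text
PITCH_NM = 0.54           # rise per helical turn, from the text
SHEET_REPEAT_NM = {"parallel": 0.65, "antiparallel": 0.70}  # per two residues


def helix_rise_per_residue() -> float:
    """Axial rise contributed by one residue of an ideal alpha-helix (nm)."""
    return PITCH_NM / RESIDUES_PER_TURN


def helix_rotation_per_residue() -> float:
    """Rotation about the helix axis per residue (degrees)."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length_nm(n_residues: int) -> float:
    return n_residues * helix_rise_per_residue()


def strand_length_nm(n_residues: int, kind: str = "antiparallel") -> float:
    """Length of an ideal extended beta-strand (nm)."""
    return n_residues * SHEET_REPEAT_NM[kind] / 2.0


def helix_hbond_pairs(n_residues: int):
    """(N-H donor residue i, C=O acceptor residue i - 4), numbered from 1."""
    return [(i, i - 4) for i in range(5, n_residues + 1)]


if __name__ == "__main__":
    n = 18  # arbitrary example length: five full turns of helix
    print("rise per residue (nm):", round(helix_rise_per_residue(), 3))
    print("rotation per residue (deg):", round(helix_rotation_per_residue(), 1))
    print("helix length (nm):", round(helix_length_nm(n), 2))
    print("strand length (nm):", round(strand_length_nm(n), 2))
    print("stretch ratio:", round(strand_length_nm(n) / helix_length_nm(n), 2))
    print("first hydrogen bonds:", helix_hbond_pairs(n)[:3])
```

Under these idealized numbers the extended strand comes out a little over twice as long as the helix built from the same residues, in line with the observation that α-keratin can be stretched to about twice its length.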
Super secondary structure
Supersecondary structure refers to regular assemblies formed when secondary structure elements that are adjacent in the sequence of a polypeptide chain come close together in space on folding and interact with one another. Three basic forms have been found: the α-helix combination (αα), the β-sheet combination (βββ) and the mixed α-helix/β-sheet combination (βαβ), of which βαβ is the most common. These assemblies can serve directly as building blocks of tertiary structure or as components of domains. Because they form a level between secondary and tertiary structure in protein conformation, they are called supersecondary structures.
The domain is likewise a level between secondary and tertiary structure. In larger protein molecules, adjacent supersecondary structures on the polypeptide chain pack closely to form two or more spatially distinct regions, which are structurally different from protein subunits. Each domain generally consists of about 100 to 200 amino acid residues, has its own spatial conformation and carries out a different biological function. For example, immunoglobulin G (IgG) consists of 12 domains, two on each of the two light chains and four on each of the two heavy chains; the complement-binding site and the antigen-binding site lie in different domains. The domains within one protein molecule may be identical or different, and domains in the peptide chains of different proteins may also be the same. For example, lactate dehydrogenase, glyceraldehyde-3-phosphate dehydrogenase and malate dehydrogenase are all dehydrogenases that use NAD+ as coenzyme; each consists of two different domains, but the conformation of the NAD+-binding domain is essentially the same in all of them.
Tertiary structure
On the basis of its various secondary structures, the polypeptide chain coils or folds further into a regular three-dimensional structure, called the tertiary structure of the protein. The stability of the tertiary structure depends mainly on secondary bonds, including hydrogen bonds, hydrophobic interactions, salt bridges and van der Waals forces. These bonds can form between the R groups of residues that are far apart in the primary sequence, so tertiary structure mainly concerns the association between amino acid side chains. Secondary bonds are all non-covalent; they are sensitive to environmental pH, temperature, ionic strength and similar factors and may change with them. The disulfide bond is not a secondary bond, but in some peptide chains it links two distant segments and contributes importantly to the stability of the tertiary structure.
Another view holds that tertiary structure refers to the conformation adopted by the side chains once the main chain of the protein has folded and coiled. Side-chain conformation mainly gives rise to micro-regions (domains). In globular proteins these are hydrophobic and hydrophilic regions. Hydrophilic regions lie mostly on the surface of the molecule and consist of hydrophilic side chains; hydrophobic regions lie mostly in the interior and consist of hydrophobic side chains. Hydrophobic regions often form "holes" or "pockets" in which prosthetic groups are embedded to form active sites.
In overall shape, some proteins with tertiary structure are elongated (long axis more than 10 times the short axis) and are classed as fibrous proteins, such as fibroin. Others have long and short axes of similar length, are roughly spherical and are classed as globular proteins, such as plasma albumin, globulins and myoglobin. The hydrophobic groups of globular proteins are mostly buried inside the molecule, while the hydrophilic groups are mostly distributed on its surface, so globular proteins are hydrophilic. More importantly, this folding of the polypeptide chain creates specific regions that carry out biological functions, such as the active center of an enzyme.
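One common way to look for stretches of a sequence that are likely to sit in the hydrophobic interior is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale, which is not mentioned in the text and is brought in here purely for illustration; the window size, the cutoff and the example sequence are arbitrary assumptions, and a real analysis would need the actual three-dimensional structure.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence: str, window: int = 5):
    """Mean hydropathy of every window of consecutive residues."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def hydrophobic_windows(sequence: str, window: int = 5, cutoff: float = 1.5):
    """Start indices (0-based) of windows whose mean exceeds the cutoff.

    The cutoff is an illustrative choice, not a published threshold.
    """
    profile = window_hydropathy(sequence, window)
    return [i for i, score in enumerate(profile) if score > cutoff]


if __name__ == "__main__":
    example = "IIIIKKKK"  # made-up sequence: hydrophobic then charged
    print(window_hydropathy(example, window=4))
    print(hydrophobic_windows(example, window=4))
```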
Quaternary structure
The spatial structure of a protein made up of two or more polypeptide chains, each with its own independent tertiary structure, is called the quaternary structure. Each such polypeptide chain is called a subunit. Quaternary structure describes the three-dimensional arrangement of the subunits, their interactions and their contact sites. Subunits are not linked by covalent bonds, and the secondary bonds that hold them together are weaker than those within secondary and tertiary structure. Under suitable conditions, therefore, a protein with quaternary structure can be separated into its subunits while the basic conformation of each subunit remains unchanged.
The subunits of a protein may be identical or different. For example, the coat protein of tobacco mosaic virus is a polymer of 2,200 identical subunits; normal human hemoglobin A is a tetramer of two α subunits and two β subunits; aspartate carbamoyltransferase consists of six regulatory subunits and six catalytic subunits. The smallest unit that contains a full set of the different subunits is sometimes called a protomer; in aspartate carbamoyltransferase, for instance, the protomer is one catalytic subunit combined with one regulatory subunit.
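The protomer idea can be sketched as simple arithmetic on subunit counts. The assumption here is that the protomer is obtained by dividing every subunit count by their greatest common divisor, which reproduces the aspartate carbamoyltransferase example above but is only a counting shortcut, not a structural analysis. The function name and dictionary keys are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import Dict, Tuple


def protomer(composition: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Return (number of protomers, subunit counts within one protomer)."""
    if not composition or any(n < 1 for n in composition.values()):
        raise ValueError("need at least one subunit type with a positive count")
    copies = reduce(gcd, composition.values())
    per_protomer = {name: n // copies for name, n in composition.items()}
    return copies, per_protomer


if __name__ == "__main__":
    examples = {
        "hemoglobin A": {"alpha": 2, "beta": 2},
        "aspartate carbamoyltransferase": {"catalytic": 6, "regulatory": 6},
        "tobacco mosaic virus coat": {"coat": 2200},
    }
    for name, composition in examples.items():
        print(name, protomer(composition))
```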
Some protein molecules can polymerize further. The repeating units in such polymers are called monomers, and the assemblies are classed by the number of monomers they contain as dimers, trimers, oligomers and polymers. For example, insulin can form dimers and hexamers in vivo. [1-2]

Function

1. Proteins are basic constituents of organisms and are necessary for growth and the maintenance of life;
2. Some proteins act as biocatalysts (enzymes) or as hormones;
3. Proteins are essential to immunity;
4. Some proteins may cause food allergy.

Application

Application of genetic algorithm in structural genomics
The genomes of model organisms such as Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana have been sequenced. In particular, with the completion of the Human Genome Project, the focus will shift to the structure and function of all the genes in these genomes. Structural genomics has emerged in response, and the United States, Japan and Europe have all established structural genomics research institutions. Structural genomics is a branch of genomics that aims to determine, on a large scale and at high throughput, the structures of the proteins expressed from these genes. Its main research areas are high-throughput gene cloning, protein expression and purification, protein crystallization and protein structure determination.
Determining protein structures is much harder than sequencing genomes. In the conventional experimental route from gene sequence to protein structure, one must carry out gene expression, protein extraction and purification, crystallization and X-ray diffraction analysis. Because protein structures and properties are so diverse, most of these steps follow no fixed rules, and this craft-like approach, which demands great skill and experience, cannot keep pace with the large-scale determination of protein structures. Theoretical methods are therefore needed. With prediction technology at its present level, predicted structures are less accurate than those from experimental methods such as X-ray diffraction and NMR, but protein structure prediction is an effective way to obtain three-dimensional structures on a large scale, cheaply and quickly. For example, when the sequence similarity between the target protein and the template protein exceeds 30%, the three-dimensional model built by structure prediction can be used for general functional analysis. Protein structure prediction has therefore been widely applied in structural genomics.
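The 30% rule of thumb can be illustrated with a minimal check on two already-aligned sequences. This sketch uses percent identity over aligned, non-gap columns as a simple stand-in for the "sequence similarity" mentioned above; similarity scores in practice depend on the alignment method and substitution matrix, and the function names and example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned, non-gap columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def usable_as_template(aligned_target: str, aligned_template: str,
                       threshold: float = 30.0) -> bool:
    """True when identity exceeds the rule-of-thumb threshold from the text."""
    return percent_identity(aligned_target, aligned_template) > threshold


if __name__ == "__main__":
    target = "MKT-AYIAKQR"    # made-up aligned sequences
    template = "MKSLAYLAKNR"
    print(round(percent_identity(target, template), 1))
    print(usable_as_template(target, template))
```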
The application of BPD in drug design
The path from genome data to a new drug has two parts: selecting the target protein and selecting a suitable drug. A drug molecule must bind tightly to the target protein, be easy to synthesize and have no side effects. Traditional drug design identifies lead compounds by screening large numbers of natural compounds, known substrates or ligand analogs and by biochemical studies, with little reliance on the three-dimensional structure of the target protein; the R&D cycle is therefore long, the cost is high and the search is to some degree blind. With the growth of protein structure data and advances in structure prediction, three-dimensional structural information on target proteins plays an increasingly important role in both steps, and structure-based drug design can shorten the R&D cycle and reduce costs.
The Application of GPA in Protein Design
The goal of protein design is to generate, with computer-aided algorithms, amino acid sequences that fit the three-dimensional structure of a target protein. Over the course of evolution nature has selected a great many proteins, but natural proteins function best only under natural conditions, which limits their use. Proteins therefore need to be modified to suit specific conditions and perform specific functions. Protein design can be divided into three categories: minor, moderate and major modification. [3]

Composition


Chemical composition

(1) Simple proteins: composed of amino acids only
(2) Conjugated proteins: composed of amino acids and other non-protein compounds
(3) Derived proteins: compounds obtained from proteins by chemical or enzymatic methods

Molecular composition

Basic unit: amino acids; the different amino acids are linked by peptide bonds
Protein → proteose → peptone → polypeptide → dipeptide → amino acid

Element composition

It is composed of carbon, hydrogen, oxygen, nitrogen, sulfur, phosphorus, iodine, iron, zinc and other elements.

Function classification

(1) Structural proteins: keratin, collagen, elastin
(2) Biologically active proteins: enzymes, hormones, immunoglobulins
(3) Food proteins: any protein that is edible, digestible, non-toxic and usable by the human body

Peptide bond

Two amino acids can be joined by a condensation reaction that forms a peptide bond between them; repeating the reaction produces a long chain of residues (a polypeptide chain). In the cell this reaction is catalyzed by the ribosome during translation. Although the peptide bond is formally a single bond, it has partial double-bond character, which arises from resonance between the π electrons of the C=O double bond and the lone electron pair on the N atom. As a result the C-N bond (the peptide bond) cannot rotate freely, and the groups attached to its two ends lie in one plane, called the peptide plane. The corresponding peptide dihedral angles φ (rotation of the peptide plane about the N-Cα bond) and ψ (rotation of the peptide plane about the Cα-C bond) are restricted to certain ranges. Once the dihedral angles of all residues are fixed, the main-chain conformation of the protein is determined. Plotting φ against ψ gives the Ramachandran plot. Because residues in the same type of secondary structure have dihedral angles within a limited range, the region a residue occupies on the Ramachandran plot roughly indicates which type of secondary structure it belongs to. The following table compares the lengths of peptide bonds, the corresponding types of single bond, and hydrogen bonds.
Peptide bond   Average length   Single bond   Average length   Hydrogen bond   Average length (±30 pm)
Cα-C           153 pm           C-C           154 pm           O-H --- O-H     280 pm
C-N            133 pm           C-N           148 pm           N-H --- O=C     290 pm
N-Cα           146 pm           C-O           143 pm           O-H --- O=C     280 pm
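The dihedral angles φ and ψ described above are each defined by four consecutive main-chain atoms, and the calculation is the same for both. The following is a minimal sketch of a signed dihedral-angle function in plain Python; it assumes the usual atom choices (C of the previous residue, N, Cα, C for φ; N, Cα, C, N of the next residue for ψ), and the coordinates in the demo are made-up points chosen to give easy angles, not real protein data.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    0 degrees is cis (p0 and p3 eclipsed), +/-180 degrees is trans.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Hypothetical points: the last atom is moved to change the angle.
    p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
    for p3 in ((1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (1.0, -1.0, 0.0)):
        print(p3, round(dihedral_deg(p0, p1, p2, p3), 1))
```

Applying this function to every residue of a chain yields the (φ, ψ) pairs that are plotted on a Ramachandran plot.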

Side chain conformation

The atoms of a residue side chain are labeled in Greek-alphabet order (α, β, γ, δ, ε and so on): Cα is the carbon atom nearest the carbonyl group, Cβ is the next nearest, and so on. Cα is usually regarded as part of the main-chain backbone. The dihedral angles about the bonds between these atoms are named χ1, χ2, χ3 and so on; for example, χ1 is the dihedral angle about the covalent bond between the first and second carbon atoms (Cα and Cβ). A side chain can adopt many different conformations, and each type of residue has several relatively stable ones.

Types

Many proteins can be divided into several structural units, and the domain is one such unit. A domain is generally stable on its own and often folds independently, without the participation of the rest of the protein; many domains have their own biological function. Many domains are not unique to the protein products of a single gene or gene family but are structural units shared by many classes of proteins. Domains are often named for their biological function, such as the "calcium-binding domain", or after the proteins in which they were first found, such as the PDZ domain (first found in PSD95, DlgA and ZO-1). Because domains are stable on their own, domains from different sources can be combined by genetic engineering to form hybrid proteins.
A structural motif is also a structural unit, made up of a specific combination of a few secondary structure elements (such as helix-turn-helix); these combinations are also called supersecondary structures. Structural motifs often include loops of varying length.
Fold type refers to the overall structural arrangement, such as the helix bundle and the β-barrel.
Although eukaryotes can express tens of thousands of different proteins, the numbers of corresponding domains, structural motifs and fold types are much smaller. One plausible explanation is that this is a result of evolution, because genes or parts of genes can be duplicated or moved within the genome. Through genetic recombination, a domain can be moved from protein A to protein B, which previously lacked it; the evolutionary driving force may be that protein B benefits from the biological function of that domain.

Protein folding

Before and after protein folding
The process by which the primary structure gives rise to higher-order structure is called protein folding. A polypeptide chain of specific sequence (proteins before folding are generally called polypeptide chains) usually folds into one specific conformation, known as the native conformation. Sometimes, however, a chain can fold into more than one conformation, and these conformations differ in biological activity. In eukaryotic cells, the correct folding of many proteins requires the help of molecular chaperones.

Structure classification

There are many ways to classify protein structures, and several structure databases (including SCOP, CATH and FSSP) classify structures by different methods. The PDB database, which stores protein structures, cites the SCOP classification. For most classified structures SCOP, CATH and FSSP agree, but for some structures their classifications differ.

Structure determination

Nearly 90% of the protein structures in the Protein Data Bank, which is dedicated to storing the molecular structures of proteins and nucleic acids, were determined by X-ray crystallography. By measuring the spatial distribution of electron density of the protein molecules in a crystal, X-ray crystallography resolves the three-dimensional coordinates of all atoms in the protein at a given resolution. About 9% of known protein structures were obtained by nuclear magnetic resonance (NMR), a technique that can also be used to determine secondary structure. Besides NMR, some biochemical techniques, including circular dichroism, are used to determine secondary structure. Cryo-electron microscopy is a newer method for obtaining low-resolution (coarser than 5 Å) protein structures. Its greatest advantage is that it suits large protein complexes (such as virus coats, ribosomes and amyloid fibrils); in some cases it can also yield high-resolution structures, for example for highly symmetric virus capsids and two-dimensional crystals of membrane proteins. [4-5]
Possible problems in protein structures at different resolutions (X-ray crystallography)
Resolution (Å): possible problems in the structure
  • >4.0: Individual atomic coordinates are meaningless.
  • 3.0 - 4.0: The overall fold may be correct, but errors are likely. Many side chains are placed incorrectly.
  • 2.5 - 3.0: The overall fold is basically correct, although some loops on the surface of the structure may be modeled incorrectly. The side chains of long polar residues (Lys, Glu, Gln, etc.) and of small residues (Ser, Val, Thr, etc.) may be placed incorrectly.
  • 2.0 - 2.5: Similar to 2.5 - 3.0, but with fewer errors. Water molecules and small ligands can be observed clearly.
  • 1.5 - 2.0: Side chains are basically placed correctly, and even small errors can be detected. Errors in the overall fold, including surface loops, are very unlikely.
  • 0.5 - 1.5: At this resolution there are generally no structural errors. Side-chain rotamer libraries and stereochemical geometry studies are built from structures in this resolution range.
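The resolution bands above can be captured in a small lookup, for example to annotate a list of structures with the kind of error to watch for. This is a sketch under two assumptions that are not in the table: a value that falls exactly on a boundary is assigned to the better (lower-number) band, and values below 0.5 Å are rejected as out of range. The function name and the shortened notes are hypothetical paraphrases.

```python
# (upper bound in angstroms, short paraphrase of the note for that band)
BANDS = [
    (1.5, "generally no structural errors"),
    (2.0, "side chains basically correct; fold errors very unlikely"),
    (2.5, "fold basically correct, fewer errors; waters and small ligands visible"),
    (3.0, "fold basically correct; surface loops and some side chains may be wrong"),
    (4.0, "fold may be correct but errors likely; many side chains misplaced"),
]


def expected_problems(resolution_angstrom: float) -> str:
    """Return the note for the band containing the given resolution."""
    if resolution_angstrom < 0.5:
        raise ValueError("below the range covered by the table")
    for upper, note in BANDS:
        if resolution_angstrom <= upper:
            return note
    return "individual atomic coordinates are meaningless"


if __name__ == "__main__":
    for resolution in (1.2, 1.9, 2.8, 3.5, 6.0):  # arbitrary example values
        print(resolution, "->", expected_problems(resolution))
```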
In recent years, with the rise of structural genomics, a large number of protein structures have been determined, providing important structural information for studying the mechanisms of protein action.

Structure prediction

Determining a protein's sequence is much easier than determining its structure, yet the structure reveals far more about the mechanism of its function than the sequence does. Many methods have therefore been developed to predict structure from sequence.
  • Secondary structure prediction
  • Tertiary structure prediction
    • Homology modeling: prediction based on the known tertiary structure of a homologous protein.
    • Ab initio prediction: requires only the protein sequence. Because of the heavy computation involved, it needs supercomputers or distributed computing, such as Rosetta@home.
  • Quaternary structure prediction: mainly predicts the mode of protein-protein interaction. [6]

Research progress

On July 28, 2022, the Guardian reported that DeepMind, Google's AI company, had predicted the structures of almost all known proteins. The database built with its AlphaFold algorithm now contains more than 200 million predicted protein structures, paving the way for new drugs and for new technologies to address global challenges such as famine and pollution. [7]