Genetic code

The genetic code is the set of rules by which a DNA or RNA sequence, read in groups of three nucleotides called codons, is translated into the amino acid sequence of a protein during protein synthesis.
It determines the order in which amino acids are added to the peptide chain, as well as the start, elongation, and termination of protein synthesis.
The genetic code is also called the codon, genetic codon, or triplet code. It holds the secrets of life and its evolutionary history.
Chinese name: Genetic code
Foreign name: genetic codon

Brief introduction

The genetic code is the set of rules that living cells use to translate the information encoded in genetic material into proteins [1-2]. Translation of mRNA is carried out by the ribosome, which uses transfer RNA (tRNA) molecules to read the mRNA three nucleotides at a time and joins amino acids in the order specified by the messenger RNA (mRNA) to synthesize the polypeptide chain of a protein. In double-stranded deoxyribonucleic acid (DNA), generally only one strand (the template strand) is transcribed into messenger ribonucleic acid (mRNA), while the other strand (the coding strand) is not transcribed. For this reason, even for organisms whose genetic material is double-stranded DNA, the code is written as a ribonucleic acid (RNA) nucleotide sequence rather than as a DNA deoxynucleotide sequence.
The genetic code specifies which nucleotide sequence determines the amino acid sequence of a protein. Its unit is the codon, three consecutive nucleotides. The genetic code is highly similar in all organisms: almost all of them use the same code, which can be expressed as a codon table with 64 entries [1]. Even viruses, which have no cellular structure, use the standard genetic code. A few organisms, however, use slightly different genetic codes [2].
Although the genetic code determines the amino acid sequence of a protein, other regions of the genome determine, through gene regulation, when and where these proteins are produced.
The genetic code operates through two relatively independent systems, RNA and DNA, which together provide orderly information management and control over the hundreds of thousands of biochemical reactions that occur simultaneously in cells. During the construction and operation of life, mRNA is destroyed soon after it has completed its task, whereas genetic information must be preserved permanently, which is the basis of the continuity of a species. The genetic code coevolved with the biochemical system of primitive life, and its birth is an important marker of the birth of life [3].

Basic characteristics


Directionality

A codon is defined on the base sequence of the mRNA molecule. It is read in the same direction in which the mRNA is synthesized, that is, the mRNA coding direction: from the 5' end to the 3' end.

Continuity

The mRNA is read from the 5' end to the 3' end, and there are no separating nucleotides between codons. Codons do not overlap, and the insertion or deletion of bases on the mRNA strand shifts the reading frame, causing a frameshift mutation.
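The effect of continuous reading can be illustrated with a minimal sketch. The function name `split_codons` and the example sequence are illustrative assumptions and do not come from the source; the sketch only shows that one inserted base changes every codon downstream of the insertion.

```python
def split_codons(mrna):
    """Split an mRNA string (written 5' to 3') into consecutive,
    non-overlapping triplets. Trailing bases that do not fill a
    complete codon are dropped."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


original = "AUGGCUGAAUAA"                    # hypothetical sequence
shifted = original[:3] + "C" + original[3:]  # one base inserted after the first codon

print(split_codons(original))  # ['AUG', 'GCU', 'GAA', 'UAA']
print(split_codons(shifted))   # ['AUG', 'CGC', 'UGA', 'AUA']
```

Only the first codon survives the insertion; all later triplets are read in a different frame.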

Degeneracy

Degeneracy means that one amino acid can have more than one codon. A change in the third base of a codon often does not change the amino acid that the triplet encodes.

Wobble

Codons on the mRNA pair with anticodons on transfer RNA (tRNA) for recognition, following the principle of complementary base pairing. The pairing can be relaxed, however: between the third base of the codon and the first base of the anticodon, non-strict complementarity often occurs, which is called wobble pairing.

Universality

The full set of codons used in protein biosynthesis is universal, from prokaryotes to humans. A few exceptions have been found [2], such as in the mitochondria of animal cells and the chloroplasts of plant cells.

Deciphering process


History

The discovery of the genetic code in the 1950s was a great product of imaginative thinking combined with rigorous argument. mRNA is made of nucleotides carrying four kinds of bases: adenine (A), uracil (U), cytosine (C), and guanine (G). At first, scientists reasoned as follows. If one base determined one amino acid, only four amino acids could be specified, which is clearly not enough for the twenty amino acids. If two bases together determined one amino acid, 16 amino acids could be specified, which is still not enough. If three bases together determine one amino acid, there are sixty-four combinations (4 * 4 * 4 = 64). The Soviet-born scientist George Gamow was the first to point out that only groups of three nucleotides could code for 20 amino acids [4]. Crick's experiment proved for the first time that a codon consists of three DNA bases. In 1961, Heinrich Matthaei and Marshall Warren Nirenberg of the National Institutes of Health used a cell-free system to translate an RNA composed only of uracil (U) into a polypeptide of phenylalanine (Phe), which cracked the first codon (UUU -> Phe). Har Gobind Khorana then cracked other codons, and Robert W. Holley found the tRNA responsible for the translation process. In 1968, Khorana, Holley, and Nirenberg shared the Nobel Prize in Physiology or Medicine.
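The counting argument can be checked with a short sketch. The names `BASES`, `AMINO_ACIDS`, and `code_words` are illustrative assumptions; the sketch simply enumerates every possible word of one, two, and three bases and compares the count with the 20 amino acids.

```python
from itertools import product

BASES = "AUCG"     # the four mRNA bases
AMINO_ACIDS = 20   # amino acids that must be encoded


def code_words(length):
    """Return every possible base string of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]


for length in (1, 2, 3):
    n = len(code_words(length))
    print(length, n, "enough" if n >= AMINO_ACIDS else "not enough")
# 1 4 not enough
# 2 16 not enough
# 3 64 enough
```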

Reading mode

To decipher the genetic code, one must know how it is read. There are two conceivable ways to read it: overlapping and non-overlapping. Suppose, for example, that the bases on an mRNA are arranged as AUGCUACCG. Read without overlap, this gives AUG, CUA, CCG; read with overlap, it gives AUG, UGC, GCU, CUA, UAC, ACC, CCG. The two reading modes would produce different amino acid sequences. Crick et al., using T4 phage as experimental material, found that adding or deleting one base in a coding region does not yield a protein with normal function, and neither does adding or deleting two bases. When three bases are added or deleted, however, a protein with normal function is synthesized. Their experiments showed that three bases of the genetic code encode one amino acid, and that the code is read from a fixed starting point, without overlap and without separators.
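The two reading modes for the sequence AUGCUACCG can be reproduced with a minimal sketch. The function names `read_non_overlapping` and `read_overlapping` are illustrative assumptions, not terms from the source.

```python
def read_non_overlapping(mrna):
    """Read consecutive triplets; each base belongs to exactly one codon."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def read_overlapping(mrna):
    """Read a triplet starting at every position; neighbours share two bases."""
    return [mrna[i:i + 3] for i in range(len(mrna) - 2)]


seq = "AUGCUACCG"
print(read_non_overlapping(seq))  # ['AUG', 'CUA', 'CCG']
print(read_overlapping(seq))      # ['AUG', 'UGC', 'GCU', 'CUA', 'UAC', 'ACC', 'CCG']
```

In the overlapping mode a single base change would alter up to three adjacent codons, whereas in the non-overlapping mode it alters only one.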

Decoding method

Nirenberg et al. found that a micro-mRNA built from a trinucleotide can promote binding of the corresponding aminoacyl-tRNA to ribosomes. Such a micro-mRNA cannot direct the synthesis of a polypeptide, however, so the result is not necessarily reliable. Har Gobind Khorana synthesized mRNAs with known repeating sequences of two, three, or four nucleotides, added them to a cell-free translation system containing radioactively labeled amino acids, and then analyzed the amino acid composition of the synthesized polypeptides.
By comparing experiments, the triplet shared by the mRNAs and the amino acid shared by the resulting polypeptides could be identified, which established that triplet as the codon for that amino acid. Khorana used this method to decipher the genetic code and was awarded the Nobel Prize in 1968.
Later, Nirenberg et al. carried out experiments with a variety of artificial mRNAs, recorded which amino acids appeared in the resulting peptide chains, and then used statistical methods to infer the frequency of each triplet and to analyze its correlation with the frequency of each amino acid in the synthesized protein. This method can also identify the codons of all 20 amino acids. Finally, scientists used trinucleotides to check the corresponding amino acids, further confirming all the codons.

Codon table

This table lists the 64 codons and their standard amino acid pairings.
First base | Second base U | Second base C | Second base A | Second base G | Third base
U | UUU (Phe/F) phenylalanine | UCU (Ser/S) serine | UAU (Tyr/Y) tyrosine | UGU (Cys/C) cysteine | U
U | UUC (Phe/F) phenylalanine | UCC (Ser/S) serine | UAC (Tyr/Y) tyrosine | UGC (Cys/C) cysteine | C
U | UUA (Leu/L) leucine | UCA (Ser/S) serine | UAA (Termination) | UGA (Termination) | A
U | UUG (Leu/L) leucine | UCG (Ser/S) serine | UAG (Termination) | UGG (Trp/W) tryptophan | G
C | CUU (Leu/L) leucine | CCU (Pro/P) proline | CAU (His/H) histidine | CGU (Arg/R) arginine | U
C | CUC (Leu/L) leucine | CCC (Pro/P) proline | CAC (His/H) histidine | CGC (Arg/R) arginine | C
C | CUA (Leu/L) leucine | CCA (Pro/P) proline | CAA (Gln/Q) glutamine | CGA (Arg/R) arginine | A
C | CUG (Leu/L) leucine | CCG (Pro/P) proline | CAG (Gln/Q) glutamine | CGG (Arg/R) arginine | G
A | AUU (Ile/I) isoleucine | ACU (Thr/T) threonine | AAU (Asn/N) asparagine | AGU (Ser/S) serine | U
A | AUC (Ile/I) isoleucine | ACC (Thr/T) threonine | AAC (Asn/N) asparagine | AGC (Ser/S) serine | C
A | AUA (Ile/I) isoleucine | ACA (Thr/T) threonine | AAA (Lys/K) lysine | AGA (Arg/R) arginine | A
A | AUG (Met/M) methionine (Start) | ACG (Thr/T) threonine | AAG (Lys/K) lysine | AGG (Arg/R) arginine | G
G | GUU (Val/V) valine | GCU (Ala/A) alanine | GAU (Asp/D) aspartic acid | GGU (Gly/G) glycine | U
G | GUC (Val/V) valine | GCC (Ala/A) alanine | GAC (Asp/D) aspartic acid | GGC (Gly/G) glycine | C
G | GUA (Val/V) valine | GCA (Ala/A) alanine | GAA (Glu/E) glutamic acid | GGA (Gly/G) glycine | A
G | GUG (Val/V) valine | GCG (Ala/A) alanine | GAG (Glu/E) glutamic acid | GGG (Gly/G) glycine | G
Note: (Start) marks the standard start codon, which is also the methionine codon. The first AUG on the mRNA is the start site of protein translation.
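As a hedged illustration, the standard table can be written compactly and used to translate a short mRNA. The names `BASES`, `AMINO`, `CODON_TABLE`, and `translate`, as well as the example sequence, are assumptions made for this sketch. It starts at the first AUG and stops at the first termination codon, and it ignores the initiation sequences and factors that real translation requires.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acids in table order: first, second, and third base
# each cycle through U, C, A, G. "*" marks a termination codon.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the
    sequence. The input must contain only the letters A, U, C, G."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


print(len(CODON_TABLE))               # 64
print(translate("GGAUGUUUGGCUAAGG"))  # MFG
```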

Reverse codon table

This table lists the 20 amino acids and their standard codons.
Amino acid | Symbol | Codons
Ala | A | GCU, GCC, GCA, GCG
Arg | R | CGU, CGC, CGA, CGG, AGA, AGG
Asn | N | AAU, AAC
Asp | D | GAU, GAC
Cys | C | UGU, UGC
Gln | Q | CAA, CAG
Glu | E | GAA, GAG
Gly | G | GGU, GGC, GGA, GGG
His | H | CAU, CAC
Ile | I | AUU, AUC, AUA
Leu | L | UUA, UUG, CUU, CUC, CUA, CUG
Lys | K | AAA, AAG
Met | M | AUG
Phe | F | UUU, UUC
Pro | P | CCU, CCC, CCA, CCG
Ser | S | UCU, UCC, UCA, UCG, AGU, AGC
Thr | T | ACU, ACC, ACA, ACG
Trp | W | UGG
Tyr | Y | UAU, UAC
Val | V | GUU, GUC, GUA, GUG
Start | - | AUG
Termination | - | UAG, UGA, UAA

Other information


Reading frame

A reading frame is a division of a nucleotide sequence into consecutive triplet codons, counted from the translation start site; a frame that can be translated in this way is called an open reading frame (ORF). For example, the sequence GGGAAACCC, read from the first position, contains the three codons GGG, AAA, and CCC. Read from the second position, it contains GGA and AAC (ignoring incomplete codons); read from the third position, GAA and ACC. Every sequence can therefore be divided into three reading frames, each of which can produce a different amino acid sequence (in the example above, Gly-Lys-Pro, Gly-Asn, and Glu-Thr, respectively). Because DNA is a double helix, each stretch of DNA actually has six reading frames. The frame actually used is set by the start codon, usually the first AUG on the mRNA sequence. Mutations that disrupt the reading frame (for example, inserting or deleting 1 or 2 nucleotides) are called frameshift mutations. They usually impair protein function severely, so they are uncommon, because they usually do not survive in evolution. In eukaryotes, the ORF carried in exons is often interrupted by introns.
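The three forward frames of GGGAAACCC, and the three further frames on the complementary DNA strand, can be listed with a small sketch. The function names `frames`, `reverse_complement`, and `six_frames` are illustrative assumptions, and the sequence is treated as DNA for the reverse-complement step.

```python
def frames(seq):
    """Return the three forward reading frames as lists of complete codons."""
    result = []
    for offset in range(3):
        usable = (len(seq) - offset) // 3 * 3
        result.append([seq[i:i + 3] for i in range(offset, offset + usable, 3)])
    return result


def reverse_complement(dna):
    """Return the complementary strand, written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(dna))


def six_frames(dna):
    """Three frames on the given strand plus three on the opposite strand."""
    return frames(dna) + frames(reverse_complement(dna))


for frame in six_frames("GGGAAACCC"):
    print(frame)
# ['GGG', 'AAA', 'CCC']
# ['GGA', 'AAC']
# ['GAA', 'ACC']
# ['GGG', 'TTT', 'CCC']
# ['GGT', 'TTC']
# ['GTT', 'TCC']
```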

Start and stop codons

Protein translation begins at the initiation codon (start codon). The start codon alone is not enough to start translation: a suitable initiation sequence and initiation factors are also required to bring the mRNA and the ribosome together, such as the Shine-Dalgarno sequence and the initiation factors of Escherichia coli. The most common start codon is AUG, which also encodes an amino acid: formylmethionine in bacteria and methionine in eukaryotes. In some cases other codons can also act as start codons. Alternative start codons include GUG and UUG, which normally code for valine and leucine respectively but, as start codons, are translated as methionine or formylmethionine [5].
The stop codon is also called the termination or nonsense codon. In classical genetics, the stop codons have names: UAG is amber, UGA is opal, and UAA is ochre. These names were given by the researchers who first discovered these stop codons. Because no tRNA has an anticodon complementary to these stop codons, release factors are able to bind to the ribosome and promote the release of the newly synthesized polypeptide from the ribosome, ending translation. In addition, in mammalian mitochondria, AGA and AGG also act as stop codons.

Degeneracy

Degeneracy of the genetic code refers to the redundancy of the code. The term was introduced by Bernfield and Nirenberg. The genetic code is redundant but not ambiguous [6].
Most codons are degenerate, that is, two or more codons encode the same amino acid. Degenerate codons usually differ only in the third base; for example, GAA and GAG both encode glutamic acid. If a codon encodes the same amino acid no matter which nucleotide occupies the third position, the position is called fourfold degenerate; if only two of the four possible nucleotides at the third position encode the same amino acid, it is called twofold degenerate. Generally, the two equivalent nucleotides at the third position are both purines (A/G) or both pyrimidines (C/U). Only two amino acids are encoded by a single codon: methionine, encoded by AUG, which is also the start codon, and tryptophan, encoded by UGG.
The degeneracy of the genetic code makes genes more tolerant of point mutations. For example, a fourfold degenerate codon tolerates any change at its third position, and a twofold degenerate codon tolerates one of the three possible changes at its third position without affecting the protein sequence.
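These statements can be checked against the standard table with a short sketch. The names `CODON_TABLE`, `codons_by_amino_acid`, and `silent_third_base_changes` are illustrative assumptions; the table is the standard one written in one-letter amino acid symbols, with "*" for termination.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons by the amino acid they encode.
codons_by_amino_acid = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_by_amino_acid[amino_acid].append(codon)

single_codon = sorted(aa for aa, cs in codons_by_amino_acid.items() if len(cs) == 1)


def silent_third_base_changes(codon):
    """Count how many of the 3 possible third-base substitutions
    leave the encoded amino acid unchanged."""
    return sum(CODON_TABLE[codon[:2] + base] == CODON_TABLE[codon]
               for base in BASES if base != codon[2])


print(single_codon)                      # ['M', 'W']
print(silent_third_base_changes("GCU"))  # 3 (fourfold degenerate: alanine)
print(silent_third_base_changes("GAA"))  # 1 (twofold degenerate: glutamic acid)
```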

Non-standard genetic code

Although the genetic code is strongly conserved across different forms of life, non-standard genetic codes also exist. In mitochondria, the "energy factories" of the cell, there are several differences from the standard genetic code, and the mitochondria of different organisms can even use different codes. Mycoplasma translates UGA as tryptophan. Ciliates translate UAG (and sometimes UAA) as glutamine (some green algae show the same phenomenon), or translate UGA as cysteine. Some yeasts translate CUG as serine. In some rare cases, some proteins may use a start codon other than AUG. The main differences between the standard genetic code and the codes used in fungi, protozoa, and the mitochondria of humans and other animals are as follows:
Codon | Usual function | Exceptional function | Organisms
UGA | Stop | Tryptophan | Mitochondria of humans, cattle, and yeast; genome of Mycoplasma, such as Mycoplasma capricolum
UGA | Stop | Cysteine | Nuclear genomes of some ciliates, such as Euplotes
AGR | Arginine | Stop | Most animal mitochondria, vertebrate mitochondria
AGA | Arginine | Serine | Drosophila mitochondria
AUA | Isoleucine | Methionine | Some animal and yeast mitochondria
UAA | Stop | Glutamine | Paramecium and some ciliates, such as Tetrahymena thermophila
UAG | Stop | Glutamine | Nuclear genome of Paramecium
GUG | Valine | Serine | Candida nuclear genome
AAA | Lysine | Aspartic acid | Mitochondria of some animals, Drosophila melanogaster mitochondria
CUG | Leucine | Serine | Nuclear genome of Candida cylindracea
CUN | Leucine | Threonine | Yeast mitochondria
Depending on the sequence context of the messenger RNA, a stop codon in some proteins is translated as a non-standard amino acid: for example, UGA as selenocysteine and UAG as pyrrolysine. Selenocysteine and pyrrolysine are regarded as the 21st and 22nd amino acids.
With a deeper understanding of the genome sequence, scientists may also find other non-standard translation methods, as well as the application of other unknown amino acids in biology.
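One way to model a non-standard code is as a small set of overrides applied to the standard table. The sketch below is an illustration only: the dictionary `VERTEBRATE_MITO_OVERRIDES` collects reassignments listed in the table above for human and vertebrate mitochondria (UGA as tryptophan, AGA and AGG as stop, AUA as methionine), and the names and the example sequence are assumptions made for this example.

```python
from itertools import product

BASES = "UCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Reassignments taken from the table of exceptions; "*" marks a stop codon.
VERTEBRATE_MITO_OVERRIDES = {"UGA": "W", "AGA": "*", "AGG": "*", "AUA": "M"}
MITO = {**STANDARD, **VERTEBRATE_MITO_OVERRIDES}


def translate_codons(mrna, table):
    """Translate from position 0, codon by codon, until a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


seq = "AUGUGAAUAAGA"  # hypothetical sequence: AUG UGA AUA AGA
print(translate_codons(seq, STANDARD))  # M
print(translate_codons(seq, MITO))      # MWM
```

The same sequence yields a one-residue product under the standard code and a three-residue product under the modified table, which is why the correct table must be chosen before translating a mitochondrial gene.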

Codon usage bias

The frequency with which each codon is used, known as codon usage bias, varies from species to species and has functional significance in controlling translation.
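Codon frequencies can be tallied with a few lines of code. This is a minimal sketch: the function name `codon_usage` and the example coding sequence are assumptions, and a real analysis would use many genes from one species rather than a single short sequence.

```python
from collections import Counter


def codon_usage(coding_seq):
    """Return the relative frequency of each codon in a coding sequence
    that is read in frame from its first base."""
    usable = len(coding_seq) - len(coding_seq) % 3
    counts = Counter(coding_seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Hypothetical sequence: AUG GCU GCU GCC GAA UAA
usage = codon_usage("AUGGCUGCUGCCGAAUAA")
for codon, freq in sorted(usage.items()):
    print(codon, round(freq, 3))
```

In this example alanine is encoded twice by GCU and once by GCC, so the two synonymous codons are not used equally often.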