Viral genome

Biological terminology


Viruses are the simplest biological entities. A complete virus particle consists of a protein coat and an internal genome of DNA or RNA. Some viruses also have an envelope derived from host cell components, which carries glycoproteins encoded by viral genes. A virus cannot replicate independently; it must enter a host cell and use some of the host's enzymes and organelles to replicate. The coat protein (or envelope) recognizes and invades specific host cells and protects the viral genome from destruction by nucleases.

Chinese name: Viral genome
Features: Small amount of information, etc

Field: biology
Type: DNA or RNA

Contents

1 Structure and function
2 Structural characteristics

Structure and function

Structural characteristics of virus genome

Genome structure and function of bovine papillomavirus

Genome structure and function of RNA bacteriophages

Structure and function of hepatitis B virus genome

Structural characteristics

1. Viral genomes vary greatly in size. Compared with bacteria or eukaryotic cells, viral genomes are very small, but they also differ widely from one virus to another. For example, hepatitis B virus (HBV) DNA is only about 3 kb, carries little information, and encodes only four kinds of protein. The poxvirus genome, by contrast, is far larger and even encodes enzymes of nucleotide metabolism, so poxviruses depend on the host much less than hepatitis B virus does.

2. The viral genome may consist of DNA or of RNA. Each virus particle contains only one type of nucleic acid, either DNA or RNA; the two generally do not coexist in the same particle. The DNA or RNA that makes up the genome may be single-stranded or double-stranded, and may be a closed circular or a linear molecule. For example, papillomavirus is a closed circular double-stranded DNA virus, the adenovirus genome is linear double-stranded DNA, poliovirus is a single-stranded RNA virus, and the reovirus genome is double-stranded RNA. Generally speaking, most DNA viruses have double-stranded DNA genomes, and most RNA viruses have single-stranded RNA genomes.

3. The genomes of most RNA viruses consist of one continuous ribonucleic acid strand, but the genomic RNA of some viruses consists of several separate strands. The influenza virus genome, for example, is segmented, and the reovirus genome consists of 10 double-stranded RNA segments, each of which encodes one protein. Viral genomes made up of segmented DNA molecules are, by comparison, rarely found.

4. Gene overlap. The same stretch of DNA can encode two or even three protein molecules. In cellular organisms this phenomenon is seen mainly in mitochondrial and plasmid DNA, so it can be regarded as a structural characteristic of viral genomes. It allows a small genome to carry more genetic information. Overlapping genes were discovered in 1977 by Sanger while studying ΦX174. ΦX174 is a single-stranded DNA virus whose host is Escherichia coli, so it is also a bacteriophage. After infecting E. coli it synthesizes 11 protein molecules with a total molecular weight of about 250,000, equivalent to the information content of 6078 nucleotides. The viral DNA itself, however, has only 5375 nucleotides, enough to encode proteins with a total molecular weight of at most about 200,000. Sanger was unable to resolve this contradiction for a long time, until he found that some of the 11 genes of ΦX174 overlap. Overlapping genes occur in the following arrangements:

(1) One gene lies entirely within another. For example, genes A and B are two different genes, but B is contained within A. Likewise, gene E lies within gene D.

(2) Partial overlap. For example, gene K overlaps part of gene A and part of gene C.

(3) Two genes overlap by a single base. For example, the last base of the stop codon of gene D is the first base of the start codon of gene J (as in TAATG).

Although overlapping genes share most of their DNA sequence, their mRNAs are often translated in different reading frames, so the proteins produced are usually different. Some overlapping genes use the same reading frame but start at different positions. For example, in the SV40 genome the VP1, VP2 and VP3 genes, which encode three coat proteins, overlap by 122 bases, but their codons are read in different frames. The small t antigen gene, by contrast, lies entirely within the large T antigen gene, and the two share a start codon.
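The one-base overlap described in case (3) can be checked directly on the five-base example TAATG. The Python sketch below is only an illustration: the helper name `codons` and the frame offsets are assumptions introduced here, and the sequence is the short example from the text, not a real genome record.

```python
def codons(seq, frame):
    """Split seq into complete triplets, starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

junction = "TAATG"  # the example sequence quoted in case (3)

stop_of_d = junction[0:3]   # last codon of the upstream gene (a stop codon)
start_of_j = junction[2:5]  # first codon of the downstream gene (a start codon)
shared = junction[2]        # the single base used by both codons

print(stop_of_d, start_of_j, shared)

# The two codons sit in different reading frames of the same strand.
print(codons(junction, 0))  # frame 0 contains the stop codon
print(codons(junction, 2))  # frame 2 contains the start codon
```

Because the start codon begins two bases into the stop codon, the downstream gene is read in a different frame from the upstream one, which is why a shared sequence can yield different proteins.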

5. Most of the viral genome encodes protein, and only a very small portion is untranslated, unlike the redundancy seen in eukaryotic DNA. For example, the untranslated portion of ΦX174 is only 217 of 5375 nucleotides, and that of G4 DNA is 282 of 5577, in both cases roughly 5% or less. The untranslated DNA sequences are usually control sequences for gene expression. For example, the 67 bases (3906-3973) between the H gene and the A gene of ΦX174 include an RNA polymerase binding site, a transcription termination signal and a ribosome binding site, among other gene control regions. Papillomavirus, which infects humans and animals, has a genome of about 8.0 kb with an untranslated portion of about 1.0 kb; this region also regulates the expression of the other genes.
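The untranslated fractions quoted in point 5 are simple ratios and can be recomputed as a check. The sketch below uses only the figures given in the text; the dictionary layout and the function name `untranslated_percent` are illustrative assumptions, and the papillomavirus entry uses the rounded values of about 1.0 kb out of about 8.0 kb.

```python
# (untranslated length, genome length) in nucleotides, as quoted in the text
genomes = {
    "PhiX174": (217, 5375),
    "G4": (282, 5577),
    "papillomavirus (approx.)": (1000, 8000),
}

def untranslated_percent(untranslated, total):
    """Percentage of the genome that is not translated."""
    return 100.0 * untranslated / total

for name, (untranslated, total) in genomes.items():
    print(f"{name}: {untranslated_percent(untranslated, total):.1f}% untranslated")
```

With these inputs the two phage genomes come out at about 4.0% and 5.1%, while the papillomavirus figure is 12.5%, a noticeably larger regulatory share.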

6. In viral genomic DNA, the genes for functionally related proteins, or rRNA genes, are often grouped into a single transcription unit. They can be transcribed together into one molecule containing several mRNAs, called polycistronic mRNA, which is then processed into the template mRNAs for the individual proteins. For example, the adenovirus late genes encode 12 viral coat proteins; they are transcribed from a single promoter into a polycistronic mRNA, which is then processed into the individual mRNAs encoding the functionally related coat proteins. Likewise, the D-E-J-F-G-H genes of the ΦX174 genome are transcribed into the same mRNA and then translated into the individual proteins. J, F, G and H all encode coat proteins, the D protein is involved in virus assembly, and the E protein is responsible for lysis of the bacterium, so these genes are also functionally related.

7. Except for retroviruses, all viral genomes are haploid: each gene appears only once in the virus particle. The retroviral genome is present in two copies.

8. The genes of phages (bacterial viruses) are continuous, whereas the genes of eukaryotic viruses are discontinuous and contain introns. Except in positive-strand RNA viruses, the genes of eukaryotic viruses are first transcribed into mRNA precursors, which are then processed to remove the introns and yield mature mRNA. More interestingly, in some eukaryotic viruses a sequence that is an intron, or part of an intron, for one gene is an exon for another. The early genes of SV40 and polyomavirus are examples. The early genes of SV40, the large T and small t antigen genes, both start at position 5146 and run counterclockwise. The large T antigen gene ends at 2676 and the small t antigen gene ends at 4624. However, the 346 bp segment from 4900 to 4555 is the intron of the large T antigen gene, and the DNA sequence from 4900 to 4624 within this intron codes for the small t antigen. Similarly, in polyomavirus the intron of the large T antigen gene codes for the middle T and small t antigens.
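The SV40 coordinates in point 8 can be checked with a little interval arithmetic. The sketch below is a minimal illustration that assumes coordinates decrease along the counterclockwise direction of transcription and that no segment crosses the numbering origin; the function names `span_length` and `contains` are hypothetical helpers, not part of any published tool.

```python
def span_length(start, end):
    """Inclusive length of a segment read from a higher to a lower coordinate."""
    return start - end + 1

def contains(outer, inner):
    """True if `inner` lies within `outer`; both are (start, end) with start >= end."""
    return outer[0] >= inner[0] and inner[1] >= outer[1]

large_t_intron = (4900, 4555)       # intron of the large T antigen gene
small_t_coding_part = (4900, 4624)  # part of that intron that codes for small t

print(span_length(*large_t_intron))
print(contains(large_t_intron, small_t_coding_part))
print(span_length(*small_t_coding_part))
```

The first value reproduces the 346 bp intron length given in the text, and the containment check confirms that the small t coding stretch falls inside the large T intron.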

Genome structure and function of bovine papillomavirus

Papillomaviruses are DNA viruses that infect the skin and mucosa of humans and animals and cause papillomatous lesions; they belong to the papovavirus family. According to the host infected, they are divided into bovine papillomavirus (BPV), human papillomavirus (HPV) and others. All papillomavirus genomes characterized so far have a similar structure, so BPV is used here as an example. BPV DNA is 7945 bp long and is a closed circular supercoiled molecule that can combine with histones in the host cell to form nucleosomes. The first base, G, of the single Hpa I restriction site in BPV DNA is defined as position 1, and bases are numbered in the 5'→3' direction. DNA sequence analysis indicates that all open reading frames (ORFs) lie on one DNA strand and that the genes overlap one another. The BPV genome is divided into a coding region and a non-coding region (NCR); the coding region is further divided, according to the functions of the encoded proteins, into an early transcription region (E region) and a late transcription region (L region).

1. Non-coding region (NCR). The non-coding region, also called the upstream regulatory region (URR) or long control region (LCR), lies between the stop codon of the late gene L1 and the first start codon of the early gene E6. Its length differs among papillomaviruses and is about 1.0 kb in BPV. The NCR contains a transcription promoter sequence that can initiate transcription and expression of the early genes. It also contains enhancer sequences, which can be activated by the E2 gene product to further promote expression of the early genes. The enhancer sequences in the BPV NCR have been identified as TTGGCGGNNG and the palindromic sequence ATCGGTGCACCGAT. These structural features show that the main function of the NCR is to regulate expression of the BPV genes.

2. Early transcription region (early gene region, E region). The E region of BPV contains eight open reading frames: E6, E7, E8, E1, E2, E3, E4 and E5. The E6, E7 and E1 genes partially overlap, E8 lies entirely within E1, E3 and E4 are both contained in E2, and E5 partially overlaps E2. The protein encoded by the E2 ORF can act on the enhancer and raise or lower the expression level of the early genes. In addition, the E2 and E1 ORFs help keep the viral DNA in a free (episomal) state rather than integrated into the host cell chromosomes. The proteins encoded by the E6 and E7 ORFs are thought to be oncogenic: they can transform host cells into malignant tumour cells. The mechanism of cell transformation by E6 and E7 is not yet clear, but two explanations have been proposed. [1] Repeated Cys-x-x-Cys motifs have been found in the amino acid sequences of the E6 and E7 proteins. This structure is considered characteristic of intracellular nucleic acid binding proteins, so E6 and E7 may be DNA-binding proteins that regulate gene expression and thereby affect the proliferation and differentiation of host cells, causing these processes to escape control and tumours to form. [2] Two proteins with molecular weights of 53 kD and 106 kD, called p53 and p106 respectively, have been found in normal cells. Loss or inactivation of these two proteins often leads to malignant transformation. Studies have found that the E7 and E6 proteins of papillomavirus can bind p53 and p106 and inactivate them, which may be another mechanism by which E6 and E7 cause malignant transformation.

3. Late transcription region (late gene region, L region). The L region contains two ORFs, L1 and L2, which encode the papillomavirus capsid proteins: the L1 protein is the major capsid protein and the L2 protein is the minor capsid protein.

Genome structure and function of RNA bacteriophages

The best-studied RNA phages of E. coli are MS2, R17, f2 and Qβ. Their genomes are small, only 3600-4200 nucleotides, and contain four genes. MS2, R17 and f2 have almost identical genome structures. Two of the four genes encode structural proteins of the phage. One is the A protein gene, 1178 nucleotides long. The A protein (also called the maturation protein) enables the phage to recognize its host and allows the RNA genome to enter the host bacterium; each phage particle generally carries only one molecule of A protein. The other structural gene, 399 nucleotides long, encodes the coat protein that forms the virus particle; each phage carries 180 molecules of it. The rest of the genome encodes an RNA replicase and a lysis protein. The gene for the lysis protein partially overlaps the coat protein and replicase genes but is read in a different frame from the coat protein gene. The MS2, R17 and f2 genomes contain extensive secondary structure; this base pairing within the RNA molecule may help protect it from degradation by RNase. In addition, untranslated sequences at the 5' and 3' ends of the coding genes also help stabilize the RNA molecule.

The genome of the other RNA phage, Qβ, is slightly larger and differs from those above in two respects: [1] it has no separate lysis protein gene; instead the structural protein A2 (the maturation protein) also performs the lysis function; [2] it also encodes an additional coat protein, A1.

Structure and function of hepatitis B virus genome

The DNA of the hepatitis B virus (HBV) genome has an unusual structure. It is a circular, partially double-stranded molecule about 3.2 kb long. About two thirds of it is double helix and one third is single-stranded, because the two DNA strands are of unequal length. The 5' and 3' ends of the long strand are not covalently joined to each other; instead, the long strand is covalently linked to a protein. Some 250-300 base pairs near the 5' end of the long strand are held in complementary pairing with the short strand. The long strand is the negative strand and the short strand is the positive strand. The length of the short strand varies from virus to virus, generally about 1.6-2.8 kb, or roughly two thirds of the long strand. The gap in the short strand can be filled in by DNA polymerase. Hepatitis B virus is the smallest known double-stranded DNA virus that infects humans. To replicate in cells, a virus must pack as much genetic information as possible into its genome. The HBV genome is therefore particularly compact and makes full use of its genetic material.

The genome contains many overlapping gene sequences. It has four confirmed open reading frames, which encode the core (C) protein, the envelope (S) protein, the viral replicase (polymerase) and protein X, which is related to viral gene expression. The two small ORFs in front of the S gene are in the same reading frame as the S gene ORF, so they can be read through together with it to encode two S protein-related antigens. These antigens, called pre-S1 and pre-S2, are also found on the surface of the virus particle. Similarly, a short ORF in front of the C ORF, called pre-C, allows a larger C protein-related antigen to be encoded. All these ORFs lie on the negative (long) DNA strand. The S gene lies entirely within the polymerase gene, the X gene overlaps the polymerase gene and the C gene, and the C gene also overlaps the polymerase gene. Miller et al. reported two further ORFs in the HBV genome, ORF-5 and ORF-6, which overlap other genes; ORF-6 is encoded not by the negative strand but by the positive strand. The function of these two ORFs is not clear.

Placing regulatory sequences inside genes is another way in which HBV economizes on genetic material. The sequences involved in HBV genome replication are the short direct repeats (DR1 and DR2) and a U5-like sequence (named for its similarity to the U5 sequence at the ends of retroviral genomes). DR1 and the U5-like sequence lie in the pre-C ORF and mark the start site for synthesis of the long DNA strand. DR2 lies where the polymerase gene and the X gene overlap and marks the start site for synthesis of the short DNA strand.

Four types of signal sequence are related to HBV gene expression: [1] promoters, [2] an enhancer, [3] a poly(A) addition signal and [4] a glucocorticoid response element (GRE). Because the genes of the HBV genome are transcribed into three kinds of mRNA transcript, the viral genome should contain at least three RNA polymerase II promoters, one near the 5' end of each transcript. Although the sequences of these promoters are not known, they evidently lie within protein-coding sequences. The enhancer (ENH) lies in the polymerase gene, the poly(A) addition signal lies in the C ORF, and the GRE lies in the S ORF and the polymerase gene. The GRE is a DNA segment that binds the hormone receptor and can raise the transcription level of a known gene.

The GRE has many of the properties of an enhancer: [1] it acts in cis; [2] it works in either orientation relative to transcription; [3] it works at varying distances from the gene it regulates.

As shown above, the HBV genome is tightly structured and efficiently organized to a degree that is rare among known viruses. HBV DNA is unusual not only in structure but also in the way it replicates. After HBV DNA enters the host cell, it is first converted into a complete closed circular double-stranded DNA. The negative strand then serves as the template for synthesis of a full-length "+" strand RNA, called the pregenome RNA. This "+" strand RNA is packaged into immature core-like particles together with DNA polymerase and a protein. Inside the particle, reverse transcriptase uses the "+" strand RNA as a template to synthesize the "-" strand DNA. The precise mechanism is unclear, but it may resemble adenovirus DNA replication, because a protein is covalently bound to the 5' end of the "-" strand DNA. The "+" strand DNA is then synthesized on the "-" strand DNA template, with a segment of RNA as the primer. During this polymerization and elongation the core-like particles mature into virus particles. At that point the positive-strand DNA is still incomplete, which is why the two strands of the viral genome differ in length.
