
Directed evolution

Simulating Darwinian evolution in a test tube
Directed evolution simulates the Darwinian evolutionary process in a test tube. Large numbers of mutations are created artificially through random mutagenesis and recombination, selection pressure is applied according to a specific need or purpose, and proteins with the desired characteristics are screened out, achieving simulated evolution at the molecular level. It is regarded as one of the most promising ways to improve protein performance. The development of directed evolution has broadened the scope of protein engineering, because it can be applied to target proteins whose structure and mechanism of action are unknown.
Chinese name
Directed evolution
Foreign name
Directed Evolution

History

In its early stage, directed evolution was used mainly to screen for and control (or influence) a desired phenotype. In the middle of the 20th century, directed evolution of proteins was introduced into the laboratory to reproduce and study the natural evolutionary process. In recent years it has been used increasingly to improve protein performance, for example the stability, half-life and immunogenicity of protein drugs, to develop enzymes that use new substrates, and to improve or extend metabolic pathways. Recent research shows that directed evolution of proteins has been applied successfully to the design of key enzymes in metabolic pathways, the development of catalytic activity toward new substrates, and the creation of new functional proteins, and that it has played an important role in metabolic engineering and synthetic biology [1].

Common strategies

The essence of directed protein evolution is to build a library of molecular diversity and screen it for mutants with improved characteristics. According to the principle used to construct the library, the methods fall into four strategies: random evolution, shuffling, semi-rational evolution and rational evolution. The general workflow is as follows: starting from a target gene or a family of related genes, create a molecular diversity library by mutating or recombining the coding genes; screen the library for genes that encode improved traits; and use those genes as templates for the next round of evolution. In this way, evolution that would take thousands of years in nature is completed in a short time, yielding proteins with improved or new functions [1].
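The iterative cycle described above (diversify, screen, carry the best variant forward) can be illustrated with a toy simulation. This is a minimal sketch rather than a laboratory protocol: the fitness function, the hypothetical target sequence, the mutation rate and the library size are all assumptions chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq, rate, rng):
    """Replace each residue by a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def fitness(seq, target):
    """Toy screen (assumption): count positions matching a hypothetical ideal sequence."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, rounds=20, library_size=200, rate=0.05, seed=0):
    """Run rounds of library construction and screening; return the best variant and its history."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best, target)]
    for _ in range(rounds):
        # Build the diversity library from the current template.
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        library.append(best)  # keep the template so the score never drops
        # Screen the library and use the best hit as the next template.
        best = max(library, key=lambda s: fitness(s, target))
        history.append(fitness(best, target))
    return best, history


if __name__ == "__main__":
    best, history = directed_evolution("AAAAAAAAAA", "MKTAYIAKQR")
    print(best, history)
```

In a real campaign the screen is an experimental assay, not a comparison against a known target; the loop structure is the only part that carries over.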

Random evolution

The rapid development of protein separation and purification techniques and of structure-analysis software has produced a large amount of protein structure information. Even so, predicting which mutation sites in a protein will be effective remains very difficult, so random evolution is still a very effective approach to directed protein evolution [2].
(1) Error-prone PCR
Error-prone PCR is a simple and fast method for creating random mutations in a DNA sequence. A sequence diversity library is created by changing the Mg2+ concentration, the ratio of dNTP concentrations or the pH of a conventional PCR reaction, or by using a low-fidelity Taq polymerase, so that incorrect bases are incorporated at random to a controlled extent. The method is also useful for building libraries of arbitrary nucleotide sequences or for introducing mutations during expression and screening. The key to the method is controlling the mutation frequency. A low mutation rate (2-3 base substitutions, or about one amino acid substitution, per gene in each generation) allows most beneficial mutations to accumulate, whereas a higher misincorporation rate produces mainly neutral or harmful mutations. In addition, the method introduces only point mutations within a single molecule, and harmful mutations outnumber beneficial ones. When harmful mutations dominate in a mutant, generally only inactive protein is produced. Conversely, if too few sites are mutated, the wild-type sequence dominates the library and changes in properties are not obvious, which hampers subsequent screening and identification.
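The trade-off between too few and too many mutations can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that the number of base substitutions per gene copy follows a Poisson distribution with a chosen mean; the source does not state a distribution, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k substitutions when the average per gene is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def mutation_count_distribution(mean, max_k=10):
    """Fraction of library members carrying 0, 1, ..., max_k substitutions."""
    return {k: poisson_pmf(k, mean) for k in range(max_k + 1)}


def fraction_unmutated(mean):
    """Share of the library that is still wild type (zero substitutions)."""
    return math.exp(-mean)


def fraction_with_at_least(mean, threshold):
    """Share of the library carrying `threshold` or more substitutions."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(threshold))


if __name__ == "__main__":
    for mean in (0.5, 2.5, 8.0):
        print(
            f"mean={mean}: wild type {fraction_unmutated(mean):.3f}, "
            f">=6 substitutions {fraction_with_at_least(mean, 6):.3f}"
        )
```

Under this assumption a very low average leaves most of the library as wild type, while a high average pushes most clones into the heavily mutated range where inactive proteins are likely.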
(2) Site-directed mutagenesis and site-saturation mutagenesis
Site-directed mutagenesis is used to mutate specific sites in DNA, so the sequence of the wild-type gene must be known in advance. As early as 1978, Michael Smith used oligonucleotides for site-directed mutagenesis. The basic principle is as follows: a DNA primer containing the mutant base is synthesized and hybridized to single-stranded DNA carrying the target gene; DNA polymerase then extends the rest of the strand; the resulting double-stranded molecule is transferred into a host cell and cloned; finally, the mutants are identified with a specific screening method. Site-directed mutagenesis includes cassette mutagenesis and many PCR-based methods, the most common of which is overlap extension PCR. When the target amino acid is replaced by each of the other 19 amino acids to obtain mutants, the method is called site-saturation mutagenesis; its aim is to find the optimal amino acid for the target site. Site-directed mutagenesis and site-saturation mutagenesis can greatly enrich the diversity of a mutant library.
(3) Combinatorial active-site saturation test
The basic principle of the combinatorial active-site saturation test (CAST) is to choose, within the active center of the enzyme, pairs of amino acids that are close to each other in space as mutation sites. The side chains of each selected pair must be oriented so that they can act synergistically, which makes it possible to obtain useful mutants that single point mutations cannot reach. If the first selected amino acid is at position n, the second is chosen by the following rule: if residue n lies in a loop, choose position n+1; if it lies in a β-sheet, choose n+2; if it lies in a 3₁₀ helix, choose n+3; if it lies in an α-helix, choose n+4. The capacity of a CAST library is calculated as follows. Each pair of amino acids undergoes saturation mutagenesis, so either residue can become any of the 20 amino acids. With NNK as the degenerate codon (N is any nucleotide, K is G or T), there are 32² = 1024 different codon combinations, encoding 20² = 400 different amino acid combinations. To cover 95% of all variants in a library, at least 3000 clones should therefore be picked from each library. In addition, several newer non-recombinative (asexual) mutagenesis techniques have been developed in recent years, for example trinucleotide exchange (TriNEx), random insertion/deletion mutagenesis (RID), and sequence saturation mutagenesis (SeSaM) with its improved version SeSaM-Tv+. These methods aim to maximize the diversity of the mutant library and extend the range of non-recombinative mutagenesis methods.
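The CAST arithmetic above can be checked with a short script. It enumerates the NNK codons against the standard genetic code, applies the partner-position rule, and estimates the number of clones needed for a given coverage. The coverage estimate assumes every codon combination is equally likely and uses the common approximation T = -V ln(1 - F), which is not stated in the source; all function names are hypothetical.

```python
import math
from itertools import product

# Standard genetic code in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Offset of the second CAST residue, by secondary structure of residue n.
PARTNER_OFFSET = {"loop": 1, "beta_sheet": 2, "3_10_helix": 3, "alpha_helix": 4}


def nnk_codons():
    """All codons matching NNK: any base, any base, then G or T."""
    return [a + b + c for a in "ACGT" for b in "ACGT" for c in "GT"]


def partner_position(n, structure):
    """Position of the second residue in a CAST pair."""
    return n + PARTNER_OFFSET[structure]


def library_size(n_sites, per_site=32):
    """Number of combinations when n_sites are randomized simultaneously."""
    return per_site**n_sites


def clones_for_coverage(size, coverage=0.95):
    """Clones to pick so that the expected coverage is reached (equal-probability assumption)."""
    return math.ceil(-size * math.log(1.0 - coverage))


if __name__ == "__main__":
    encoded = [CODON_TABLE[c] for c in nnk_codons()]
    print(len(encoded), "codons,", len(set(encoded) - {"*"}), "amino acids")
    print(library_size(2), "codon pairs,", library_size(2, per_site=20), "amino acid pairs")
    print(clones_for_coverage(library_size(2)), "clones for 95% coverage")
```

The estimate lands slightly above 3000 clones for a two-site NNK library, consistent with the figure quoted in the text.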

DNA shuffling

Compared with random mutagenesis, shuffling can generate more sequence changes. Depending on the recombination approach and the experimental conditions, the following strategies are available [3].
(1) Homologous recombination
Typical examples of homologous shuffling are DNA shuffling and family shuffling. A group of genes carrying beneficial mutations (or a naturally occurring gene family) is cut at random into small fragments by enzymatic or physical methods; the fragments are then extended against one another by primerless PCR; finally, primers flanking the gene are used to amplify the full-length genes. The technique can be combined with random mutagenesis to accumulate beneficial mutations rapidly and obtain the enzyme gene with the best combination of mutations. Its advantages are simple operation, no need for protein structure information, and a good chance of obtaining beneficial mutations. Its disadvantage is that only sequences with high homology (more than 70%) can be recombined. Many improved methods build on this idea: random-priming in vitro recombination (RPR), in which random primers are used during amplification to generate the fragments that are assembled into full-length genes; recombined extension on truncated templates (RETT), which uses single-stranded DNA as the template; the staggered extension process (StEP), which simplifies the procedure so that it is completed in a single tube; and random chimeragenesis on transient templates (RACHITT), which uses a transient template to achieve a high recombination rate.
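The outcome of homologous shuffling, a chimera assembled from segments of several aligned parents, can be mimicked in silico. The sketch below is a simplification under stated assumptions: parents are pre-aligned and equal in length, crossover points are chosen uniformly at random, and the fragmentation and primerless PCR steps are not modeled. The function name is hypothetical.

```python
import random


def shuffle_parents(parents, n_crossovers, rng):
    """Build one chimera by switching between aligned parents at random crossover points."""
    length = len(parents[0])
    if len(parents) < 2 or any(len(p) != length for p in parents):
        raise ValueError("need at least two parents aligned to the same length")
    # Distinct crossover points strictly inside the sequence.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    current = rng.randrange(len(parents))
    segments, start = [], 0
    for end in points + [length]:
        segments.append(parents[current][start:end])
        # Template switch: continue on a different parent after each crossover.
        current = rng.choice([i for i in range(len(parents)) if i != current])
        start = end
    return "".join(segments)


if __name__ == "__main__":
    rng = random.Random(42)
    parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]
    for _ in range(3):
        print(shuffle_parents(parents, 3, rng))
```

In real DNA shuffling, crossovers occur preferentially where the parents are similar enough to anneal, which is why the method requires high homology; a uniform choice of crossover points ignores that bias.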
(2) Nonhomologous recombination
DNA shuffling requires the genes involved to be highly homologous, but most homologous sequences in nature are not similar enough to meet this requirement. Many shuffling techniques that work on nonhomologous sequences have therefore been developed in recent years. For example, in incremental truncation for the creation of hybrid enzymes (ITCHY), the two parent genes are digested separately with an exonuclease to produce two sets of fragments that differ in length by a single base, and the two sets are then ligated to each other to produce a hybrid gene library. The advantage of this technique is that it can recombine two genes with little or no homology. Its disadvantages are that recombination can occur only between two different parents, and that the proportion of functional hybrids among the progeny is very low. To address the latter problem, Sieber et al. proposed sequence homology-independent protein recombination (SHIPREC): random fragments the length of a single gene are recovered by agarose gel electrophoresis, which conserves the length of the resulting chimeras so that the two amino acids at the crossover remain at their positions in the parent protein structures. This raises the proportion of functional hybrids in the library. The former problem was addressed by SCRATCHY, which builds on ITCHY and DNA shuffling: ITCHY is first used to create an extensive hybrid gene library between two genes of low homology, and this library then serves as the parent pool for DNA shuffling. Its distinguishing feature is that it yields multiple crossover sites independent of sequence homology. Other nonhomologous recombination techniques have also been established. SCOPE (structure-based combinatorial protein engineering) is a semi-rational protein engineering technique that can build a multi-crossover hybrid gene library between nonhomologous genes, from which hybrid proteins are then screened. SISDC (sequence-independent site-directed chimeragenesis) allows a group of protein genes with little or no homology to recombine at multiple scattered sites.
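The difference between an ITCHY library and the length-conserved subset that SHIPREC enriches can be shown by enumerating single-crossover hybrids of two toy genes. This is an illustrative sketch only: the sequences are made up, the function names are hypothetical, and the in-frame filter is an added assumption about why many hybrids are nonfunctional.

```python
def itchy_library(gene_a, gene_b):
    """All single-crossover hybrids: the 5' part of gene_a fused to the 3' part of gene_b.

    Each entry is (i, j, sequence), where gene_a contributes its first i bases
    and gene_b contributes everything from index j onward.
    """
    return [
        (i, j, gene_a[:i] + gene_b[j:])
        for i in range(1, len(gene_a))
        for j in range(1, len(gene_b))
    ]


def length_conserved(library, length):
    """Hybrids whose total length equals that of a single parent gene (the SHIPREC idea)."""
    return [h for h in library if len(h[2]) == length]


def in_frame(library):
    """Hybrids in which the gene_b part keeps its original reading frame."""
    return [h for h in library if (h[0] - h[1]) % 3 == 0]


if __name__ == "__main__":
    gene_a = "ATGAAACCC"  # toy sequence
    gene_b = "ATGGGGTTT"  # toy sequence
    library = itchy_library(gene_a, gene_b)
    print(len(library), "hybrids in total")
    print(len(in_frame(library)), "keep the reading frame")
    print(len(length_conserved(library, len(gene_a))), "conserve parental length")
```

For equal-length parents, the length-conserved hybrids are exactly those whose crossover falls at the same position in both genes, which is why they tend to preserve the parental structural context.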
(3) Domain recombination
Domain recombination does not operate on individual nucleotides; instead, it uses relatively complete gene fragments as the units of recombination. A typical example is exon shuffling. In many eukaryotic genes, one exon encodes one folding domain, so recombination within introns can be used to assemble independent exons into genes that encode new proteins. By controlling which exons take part in the recombination, mutant libraries with different characteristics can be generated. Because this allows more rational design to be introduced, the approach can be applied in specialized fields such as medicine. Taking the idea further, two functionally related adjacent genes can be recombined as two large domains to improve their synergy, and even the whole genomes of two cells can be recombined to change metabolic pathways.

Semi rational evolution

Although the random evolution strategy is very effective, it still suffers from large mutant libraries, few positive mutations and difficult screening. The semi-rational evolution strategy uses bioinformatics methods to analyze large amounts of protein sequence alignment information and secondary structure data, and even the three-dimensional conformation of the target protein obtained by homology modeling, in order to modify the protein in a more targeted way. This not only raises the rate of positive mutations but also greatly reduces the size of the mutant library, making it easier to screen. The key to semi-rational evolution is to identify potentially beneficial mutation sites by computer simulation and then to build a suitable mutant library with an appropriate saturation mutagenesis technique. In addition, a protein with a more complex structure can be divided into structural units that are evolved independently; the best evolved units are then combined and screened to obtain the complete protein [1].
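One simple way to turn alignment information into a focused library, in the spirit of the strategy above, is to rank alignment columns by variability and randomize only the top few. The sketch below uses Shannon entropy as the variability score. That choice, the toy alignment and the function names are assumptions for illustration; the source does not prescribe a scoring method.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def rank_variable_sites(alignment, top_k):
    """Indices of the top_k most variable columns (ties broken by position)."""
    columns = list(zip(*alignment))
    scored = [(column_entropy(col), idx) for idx, col in enumerate(columns)]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [idx for _, idx in scored[:top_k]]


def saturation_library_size(n_sites, variants_per_site=20):
    """Amino acid combinations when n_sites are fully randomized."""
    return variants_per_site**n_sites


if __name__ == "__main__":
    # Toy alignment of four hypothetical homologs.
    alignment = ["MKTA", "MRTA", "MKSA", "MQTA"]
    sites = rank_variable_sites(alignment, top_k=2)
    print("candidate sites:", sites)
    print("focused library:", saturation_library_size(len(sites)))
    print("all positions:  ", saturation_library_size(len(alignment[0])))
```

Restricting saturation mutagenesis to two candidate sites gives hundreds of variants instead of the far larger space obtained by randomizing every position, which is the library-size reduction the paragraph describes.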

Rational evolution

The rational evolution strategy is carried out mainly on the computer (in silico). Computer modeling is used to predict the active sites of proteins and to examine how a given mutation affects the stability and folding of the target protein and its binding to substrates, so as to guide the design of protein evolution and raise the success rate of experiments.
In metabolic engineering, the energy barrier of a reaction has a very important effect on a pathway, yet neither random evolution nor semi-rational evolution can address it directly. De novo design can take such factors into account. In de novo design, the target catalytic reaction is first modeled quantum mechanically, with particular attention to steps with high energy barriers, in order to predict the transition state, position the required catalytic side chains and bind the optimized transition state of the reaction. QM/MM analysis yields a theoretical library of designs containing the transition state and the protein functional groups involved in binding and catalysis. Software such as Rosetta, ORBIT and PyMOL is then used to search a large number of stable protein scaffolds for backbones that can support these ideal active sites, and finally the gene sequence is optimized for experimental verification. The method can automatically select a suitable algorithm to search the side chains of hundreds of candidate template enzymes for ones that fit the catalytic transition state and to place them in suitable positions. In some studies, a single protein or the best active site is instead selected manually. Analyzing the evolutionary trajectories of enzymes through de novo design improves our understanding of the natural evolution of proteins, and feedback from experiments in turn helps improve rational design. In addition, computational prediction can be used to construct metabolic pathways and analyze metabolic networks, so that engineering methods can be used to select the important enzymes for evolution and thereby optimize the entire pathway or network [1].

Application and prospect

Molecular modification of proteins by directed evolution has greatly advanced many fields, including enzyme engineering, metabolic engineering and medicine, and has produced major achievements in enhancing protein stability and substrate specificity and in changing or enhancing protein activity [1-2].

Improve the catalytic activity of enzymes

Improving catalytic activity is one of the most basic goals of protein modification, and many reports have shown that directed evolution works well for this purpose. Shim et al. used site-directed mutagenesis to construct a mutant in which Met-317, located in the "protein-S binding" pocket of the enzyme's active center, was replaced by Ala, a change that favors a broader substrate range; the mutant was indeed shown to have higher catalytic activity toward phosphodiester bonds. Amylosucrase is an enzyme that produces amylose from sucrose, but its application is greatly limited by its low activity on sucrose. Potocki-Veronese et al. used error-prone PCR and DNA shuffling to obtain the mutant Asn387Asp, whose activity on sucrose is 60% higher; this broadens its range of application and is of great significance for practical production.

Improve enzyme stability

Another important aspect of improving or changing protein characteristics is raising thermal stability, because poor thermal stability is a problem often encountered in industrial processes that use enzyme-catalyzed reactions. To solve it, many scientists have used directed evolution to modify proteins. Xiao et al. carried out site-directed mutagenesis guided by sequence alignment and obtained a mutant whose Tm was 6 ℃ higher; without loss of the original catalytic activity, its half-life at 45 ℃ was 23 times that of the original enzyme. Miyazaki et al. combined random mutagenesis, site-saturation mutagenesis and DNA shuffling to modify a xylanase for production needs and improve its thermal stability. The half-denaturation temperature of the mutant rose from 58 ℃ to 68 ℃ and its optimum temperature from 55 ℃ to 65 ℃, and its thermal stability at 60 ℃ was also significantly enhanced.
In addition, proteins often lose their catalytic activity in organic solvents and in mixtures of water and organic solvents. Moore et al. therefore screened a random mutant library of protease E (subtilisin E) from Bacillus subtilis and obtained a mutant whose activity in 60% dimethylformamide equals its activity in aqueous solution. Similarly, the activity of p-nitrophenyl esterase in 30% aqueous dimethylformamide was increased more than 100-fold through random mutagenesis and gene recombination.

Change the substrate specificity of enzymes

Manu et al. used error-prone PCR followed by screening to shift the substrate of a mutant from the usual cellobiose to lactose, thereby improving its lactose phosphorylase activity; the lactose phosphorylase activity of the mutant was 10 times that of the original enzyme. Directed evolution can also change enantioselectivity. Schmidt et al. used error-prone PCR to raise the enantioselectivity (E value) of Pseudomonas fluorescens esterase from 63 for the wild type to 96. The activity increased as well: the mutant converts 50% of the substrate in 2 minutes, equivalent to 1.25 U/mg of protein. Arnold et al. used error-prone PCR and saturation mutagenesis to convert a hydantoinase that prefers D-type substrates into one that prefers L-type substrates. Analysis showed that this change required only a single amino acid substitution. Compared with the wild-type enzyme, the mutant increased the production of L-methionine and reduced the accumulation of unwanted products.