Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged from a yet-to-be-defined animal reservoir and initiated a pandemic in 2020 (1–5). It has acquired limited adaptations, most notably the Asp614→Gly (D614G) substitution in the spike (S) glycoprotein (6–8). Humoral immunity to S glycoprotein appears to be the strongest correlate of protection (9), and recently approved vaccines deliver this antigen by immunization. Coronaviruses such as SARS-CoV-2 acquire substitutions slowly as the result of a proofreading RNA-dependent RNA polymerase (RdRp) (10, 11). Other emerging respiratory viruses have produced pandemics followed by endemic human-to-human spread. The latter is often contingent upon the introduction of antigenic novelty that enables reinfection of previously immune individuals. Whether the SARS-CoV-2 S glycoprotein will evolve, or specifically how it may change in response to immune pressure, remains unknown. We and others have reported the acquisition of deletions in the N-terminal domain (NTD) of the S glycoprotein during long-term infections of immunocompromised patients (12–15). We have identified this as an evolutionary pattern defined by recurrent deletions that alter defined antibody epitopes. Unlike substitutions, deletions cannot be corrected by proofreading activity, and this may accelerate adaptive evolution in SARS-CoV-2.
An immunocompromised cancer patient infected with SARS-CoV-2 was unable to clear the virus and succumbed to the infection 74 days after COVID-19 diagnosis (15). Treatment included remdesivir, dexamethasone, and two infusions of convalescent serum. We designate this individual as Pittsburgh long-term infection 1 (PLTI1). We consensus-sequenced and cloned S genes directly from clinical material obtained 72 days after COVID-19 diagnosis and identified two variants with deletions in the NTD.
Deletions in SARS-CoV-2 spike glycoprotein arise during persistent infections of immunosuppressed patients. (A) Top: Sequences of viruses isolated from PLTI1 (PT) and viruses from patients with deletions in the same NTD region. Chromatograms are shown for sequences from PLTI1, which include sequencing of bulk reverse transcription products (CON) and individual cDNA clones. Bottom: Sequences from other long-term infections in individuals AM (18), MA-JL (MA) (19), and a MSK cohort (M) with individuals 3, 4, 6, 8, and 11 (13). Letters (A and B) designate different variants from the same patient. (B) Sequences of viruses from two patients (M2 and M13) with deletions in a different region of the NTD. All sequences are aligned to reference sequence MN985325 (WA-1). See fig. S1 for genetic analysis of patient isolates. Amino acid abbreviations: A, Ala; D, Asp; F, Phe; G, Gly; H, His; K, Lys; L, Leu; N, Asn; P, Pro; R, Arg; S, Ser; V, Val; Y, Tyr.
The findings from PLTI1 and a similar report (12) prompted us to interrogate patient metadata sequences deposited in GISAID (16). In searching for similar viruses, we identified eight patients with deletions in the S glycoproteins of viruses sampled longitudinally over a period of weeks to months (fig. S1A). For each, early time points had intact S sequences and later time points had deletions within the S gene. Six had deletions that were identical to, overlapping with, or adjacent to those in PLTI1. Deletions at a second site were present in viruses isolated from two other patients; reports on these patients have since been published (13, 14). Viruses from all but one patient could be distinguished from one another by nucleotide differences present at both early and late time points (fig. S1B). On a tree of representative contemporaneously circulating isolates, they form monophyletic clades, making either a second community-acquired or nosocomially acquired infection unlikely (fig. S1C). The most parsimonious explanation is that these deletions arose independently as the result of a common selective pressure to produce strikingly convergent outcomes.
We searched the GISAID sequence database (16) for additional instances of deletions within S glycoproteins. From a dataset of 146,795 sequences (deposited from 1 December 2019 to 24 October 2020), we identified 1108 viruses with deletions in the S gene. Mapped onto the S gene, these deletions cluster at four sites within the NTD. We term these important sites recurrent deletion regions (RDRs), numbering them 1 to 4 from the 5' to the 3' end of the S gene. Deletions identified in patient samples correspond to RDR2 and RDR4. Most deletions appear to have arisen and been retained in replication-competent viruses. Without selective pressure, in-frame deletions should occur one-third of the time. However, we observed a preponderance of in-frame deletions with lengths of 3, 6, 9, and 12. Among all deletions, 93% are in frame and do not produce a stop codon. In the NTD, >97% of deletions maintain the open reading frame. Other S glycoprotein domains do not follow this trend; for example, deletions in the receptor binding domain (RBD) and S2 preserve the reading frame 30% and 37% of the time, respectively.
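The one-third expectation is codon arithmetic: a deletion preserves the reading frame only when its length is a multiple of three. The sketch below tallies that fraction. It assumes deletion lengths have already been extracted from an alignment; the function name and the example lengths are illustrative and are not the study's pipeline or data.

```python
def in_frame_fraction(deletion_lengths):
    """Return the fraction of deletions whose length is a multiple of three.

    This checks length only. It does not detect a stop codon created at
    the deletion junction, which the analysis in the text also excludes.
    """
    if not deletion_lengths:
        raise ValueError("no deletions supplied")
    in_frame = sum(1 for n in deletion_lengths if n % 3 == 0)
    return in_frame / len(deletion_lengths)


# Null expectation: if every length from 1 to 12 were equally likely,
# one length in three would preserve the frame.
null_fraction = in_frame_fraction(list(range(1, 13)))

# Hypothetical observed lengths (nucleotides), for illustration only.
example_lengths = [3, 6, 9, 12, 3, 6, 1, 2, 3, 4]
example_fraction = in_frame_fraction(example_lengths)

print(f"null: {null_fraction:.2f}, example: {example_fraction:.2f}")
```

A fraction well above one-third, as reported for the NTD, is what would be expected if frame-preserving deletions were preferentially retained.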
Identification and characterization of recurrent deletion regions in SARS-CoV-2 spike protein. (A) Positional quantification of deleted nucleotides in S among GISAID sequences. We designate the four clusters as recurrent deletion regions (RDRs) 1 to 4. (B) Length distribution of deletions. (C) The percentage of deletion events at the indicated site that either maintain the open reading frame (ORF) or introduce a frameshift or premature stop codon (F.S./Stop). (D) Phylogenetic analysis of deletion variants (red branches) and genetically diverse nondeletion variants (black branches). Specific deletion clades/lineages are identified. Maximum likelihood phylogenetic trees, rooted on NC_045512, were calculated with 1000 bootstrap replicates. Trees with branch labels are in fig. S2. (E) Abundance of nucleotide (nt) deletions in each RDR. Positions are given relative to reference sequence MN985325, by codon (top) and nucleotide (below). Amino acid abbreviations: A, Ala; D, Asp; F, Phe; G, Gly; H, His; I, Ile; L, Leu; N, Asn; P, Pro; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; Y, Tyr.
To trace the origins of RDR variants, we produced phylogenies for each with 101 additional genomes that sample much of the genetic diversity within the pandemic. The RDR variants interleave with nondeletion sequences and occupy distinct branches, indicating their recurrent generation. This is most pronounced for RDRs 1, 2, and 4 but is also true of RDR3, with conservatively four independent instances. RDR variants also form distinct lineages/branches, which suggests human-to-human transmission events. Using sequences with sufficient metadata to explicitly differentiate individuals, we verified the transmission of a variant within each RDR between people (fig. S2).
We defined the RDRs on the basis of peaks in the spectrum of S glycoprotein deletions. Deletion lengths and positions vary within RDRs 1, 2, and 4. Variation is greatest in RDRs 2 and 4, with the loss of S glycoprotein residues 144/145 (adjacent tyrosine codons) in RDR2 and residues 243 and 244 in RDR4 appearing to be favored. In contrast, the loss of residues 69 and 70 accounts for the vast majority of RDR1 deletions. On the basis of our phylogenetic analysis and accompanying lineage classifications, this two-amino acid deletion has arisen independently at least 13 times. RDR3 largely consists of three-nucleotide deletions in a single codon.
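One reading of the "144/145" label is that, because the two tyrosines are adjacent, removing either one yields the same protein sequence, so the deleted position cannot be assigned uniquely. The sketch below illustrates this on a toy peptide. The helper name, the peptide, and its numbering are assumptions made for illustration, not sequences taken from the study.

```python
def delete_residue(peptide, first_position, position):
    """Remove the residue numbered `position` from `peptide`.

    `first_position` is the residue number assigned to peptide[0].
    """
    index = position - first_position
    if not 0 <= index < len(peptide):
        raise IndexError("position outside peptide")
    return peptide[:index] + peptide[index + 1:]


# Toy peptide with two adjacent tyrosines (Y); numbering is illustrative.
peptide = "GVYYHK"
first_position = 142

without_144 = delete_residue(peptide, first_position, 144)
without_145 = delete_residue(peptide, first_position, 145)
without_146 = delete_residue(peptide, first_position, 146)

# Deleting either tyrosine gives the same product; deleting the
# neighboring histidine does not.
print(without_144 == without_145, without_144 == without_146)
```

The same ambiguity applies to any deletion that falls within a run of identical residues, which is why a slash-separated label is a reasonable way to report it.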
We evaluated the genetic, geographic, and temporal sampling of RDR variants. This analysis was limited to sequences deposited in GISAID (16), where sequences from specific nations and regions are overrepresented (e.g., the United Kingdom and other European countries). We show the distribution of all sequences within the database for reference. For RDR2 and RDR4, the genetic and geographic distributions largely mirror those of reported sequences. Variants of RDR1 and RDR3 are strongly polarized to specific clades and geographies. This is likely the result of successful lineages circulating in regions with strong sequencing initiatives. Our temporal analysis indicates that RDR variants have been present throughout the pandemic. Specific variant lineages such as B.1.258 harboring Δ69-70 in RDR1 have rapidly risen to notable abundance. Circulation of B.1.36 with RDR3 Δ210 accounts for most of the RDR3 examples. RDR2 Δ144/145 is explained by independent deletion events followed by transmission.
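A temporal frequency of this kind can be computed by dividing, for each time bin, the number of genomes that carry a deletion by the number of genomes deposited in that bin. The sketch below does this by calendar month. The record layout, the function name, and the example records are assumptions made for illustration; they are not GISAID fields or the study's data.

```python
from collections import Counter
from datetime import date


def monthly_frequency(records):
    """Fraction of genomes per month that carry a deletion.

    `records` is an iterable of (collection_date, has_deletion) pairs.
    Returns a dict mapping (year, month) to a fraction between 0 and 1.
    """
    totals = Counter()
    with_deletion = Counter()
    for collected, has_deletion in records:
        key = (collected.year, collected.month)
        totals[key] += 1
        if has_deletion:
            with_deletion[key] += 1
    return {key: with_deletion[key] / totals[key] for key in sorted(totals)}


# Hypothetical records, for illustration only.
records = [
    (date(2020, 3, 5), False),
    (date(2020, 3, 18), True),
    (date(2020, 4, 2), False),
    (date(2020, 4, 9), False),
    (date(2020, 4, 20), True),
    (date(2020, 4, 28), False),
]
frequencies = monthly_frequency(records)
for (year, month), fraction in frequencies.items():
    print(f"{year}-{month:02d}: {fraction:.2f}")
```

Normalizing within each month corrects for changes in sequencing volume over time, but not for the geographic overrepresentation noted above, which would need a separate stratification by region.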
Geographic, genetic, and temporal abundance of RDR variants. (A and B) Geographic (A) and genetic (B) distributions of RDR variants compared to the GISAID database (sequences from 1 December 2019 to 24 October 2020). GISAID clade classifications are used in (B). (C) Frequency of RDR variants among all complete genomes deposited in GISAID. (D) Frequency of specific RDR deletion variants (numbered according to spike amino acids) among all GISAID variants. RDR3/Δ210 has been adjusted by 0.02 units on the y axis for visualization in (C) because of its overlap with RDR2, and this adjustment has been retained in (D) to enable direct comparisons between panels.
The recurrence and convergence of RDR deletions, particularly during long-term infections, is indicative of adaptation in response to a common selective pressure. RDRs 2 and 4 and RDRs 1 and 3 occupy two distinct surfaces on the S glycoprotein NTD. Both sites contain antibody epitopes (17–19). The epitope for neutralizing antibody 4A8 is formed entirely by the β sheets and extended connecting loops that harbor RDRs 2 and 4 (17). We generated a panel of S glycoprotein mutants representing the four RDRs to assess the impact of deletions on expression and antibody binding. Cells were transfected to express these mutant glycoproteins, and indirect immunofluorescence was used to determine whether RDR deletions modulated 4A8 binding. Deletions at RDRs 1 and 3 had no impact on the binding of the monoclonal antibody, confirming that they alter independent sites. The three RDR2 deletions, the one RDR4 deletion, and the double RDR1/2 deletions completely abolished binding of 4A8 while still allowing recognition by a monoclonal antibody targeting the RBD. Thus, convergent evolution operates in individual RDRs and between RDRs, as exemplified by the same phenotype produced by deletions in RDR2 or RDR4.
Deletions in the spike NTD alter its antigenicity; RDRs map to defined antigenic sites. (A) Left: A structure of antibody 4A8 (17) (PDB ID 7C21) (purple) bound to one protomer (green) of a SARS-CoV-2 spike trimer (gray). RDRs 1 to 4 are colored red, orange, blue, and yellow, respectively, and are shown as spheres. The boxed image is a close-up of the interaction site. Right: The electron microscopy density of COV57 serum Fabs (18) (EMDB emd_2125) fit to SARS-CoV-2 S glycoprotein trimer (PDB ID 7C21). The boxed image is a close-up of the interaction site. (B) S glycoprotein distribution in Vero E6 cells at 24 hours after transfection with S protein deletion mutants, visualized by indirect immunofluorescence in permeabilized cells. A monoclonal antibody to SARS-CoV-2 S protein receptor-binding domain (RBD mAb; red) detects all mutant forms of the protein (Δ69-70, Δ69-70+Δ141-144, Δ141-144, Δ144/145, Δ146, Δ210, and Δ243-244) and the unmodified protein (wild type), whereas 4A8 mAb (green) does not detect mutants containing deletions in RDR2 or RDR4 (Δ69-70+Δ141-144, Δ141-144, Δ144/145, Δ146, and Δ243-244). Overlay images (RBD/4A8/DAPI) depict colocalization of the antibodies; nuclei were counterstained with 4′,6-diamidino-2-phenylindole (DAPI; blue). Scale bars, 100 μm. (C) Virus isolated from PLTI1 resists neutralization by 4A8. A nondeletion variant (Munich) is neutralized by 4A8, both are neutralized by convalescent serum, and neither is neutralized by H2114, an influenza hemagglutinin binding antibody (29).
We used the non-plaque-purified viral population from PLTI1 to determine whether RDR variants escape the activity of a neutralizing antibody. This viral stock was completely resistant to neutralization by 4A8, whereas an isolate with authentic RDRs (20) was neutralized. We used a high-titer neutralizing human convalescent polyclonal antiserum to demonstrate that both viral stocks could be neutralized efficiently. These data demonstrate that naturally arising and circulating variants of SARS-CoV-2 have altered antigenicity. We used a range of high-, medium-, and low-titer neutralizing human convalescent polyclonal antisera to assess whether there was an appreciable difference in neutralization between the S glycoprotein-deleted and undeleted viruses. No major difference was observed, which suggests that many more changes would be required to generate serologically distinct SARS-CoV-2 variants (table S1).
Coronaviruses, including SARS-CoV-2, have lower substitution rates than other RNA viruses because of an RdRp with proofreading activity (10, 11). However, proofreading cannot correct deletions. We find that adaptive evolution of S glycoprotein is augmented by a tolerance for deletions, particularly within RDRs. The RDRs occupy defined antibody epitopes within the NTD (17–19), and deletions at multiple sites confer resistance to a neutralizing antibody. Deletions represent a generalizable mechanism through which the SARS-CoV-2 S glycoprotein rapidly acquires genetic and antigenic novelty.
The fitness of RDR variants is evident from their representation in the consensus genomes from patients, transmission between individuals, and presence in emergent lineages. Initially documented in the context of long-term infections of immunosuppressed patients, specific variants transmit efficiently between immunocompetent individuals. Characterization of these cases led to the very early identification of RDR variants that are escape mutants. Because deletions are a product of replication, they will occur at a certain rate, and variants are likely to emerge in otherwise healthy individuals. Similarly, influenza explores variation that approximates future antigenic drift in immunosuppressed patients (21).
The RDRs occupy defined antibody epitopes within the S glycoprotein NTD. Selected in vivo, these deletion variants resist neutralization by monoclonal antibodies. Viruses cultured in vitro in the presence of immune serum have also acquired substitutions in RDR2 that confer neutralization resistance (22). Many monoclonal antibodies are directed to the RBD (18, 19, 23). NTD-directed antibodies have also been identified (24, 25). That antibody escape in nature is most evident in the NTD highlights a discrepancy, and this requires further study.
Defining recurrent, convergent patterns of adaptation can provide predictive potential. From viral sequences, we have identified a pattern of deletions, contextualized their outcomes in protein structure and antibody epitope(s), and characterized their functional impact on antigenicity. During evaluation of this manuscript, multiple lineages with altered antigenicity and perhaps increased transmissibility have emerged and spread. These variants of global concern are RDR variants and include Mink Cluster 5 Δ69-70 (26), B.1.1.7 Δ69-70 and Δ144/145 (27), as well as B.1.351 Δ242-244 (28). Our analysis preceded the description of these lineages. We had demonstrated that identical or similar recurrent deletions that alter positions 144/145 and 243-244 in the S glycoprotein disrupt binding of antibody 4A8, which defines an immunodominant epitope within the NTD. Our survey for deletion variants captured the first representative of what would become the B.1.1.7 lineage. These real-world outcomes demonstrate the predictive potential of this and like approaches and show the need to monitor viral evolution carefully and continually.
Additional circulating RDR variants have gone virtually unnoticed. Are they intermediates on a pathway of immune evasion? That remains to be determined. However, deletions and substitutions within major NTD and RBD epitopes will likely continue to contribute to that process, as they have already in current variants of concern. The progression of adaptations in both immunocompromised patients and SARS-CoV-2 variants of concern remains to be resolved. Their evolution has thus far converged. The recurrence of adaptations in single patients and on global scales underscores the need to track and monitor deletion variants.