Abstract

RNA viruses are known to replicate by low fidelity polymerases and have high mutation rates whereby the resulting virus population tends to exist as a distribution of mutants. In this review, we aim to explore how genetic events such as spontaneous mutations could alter the genomic organization of RNA viruses in such a way that they impact virus replication and plaque morphology. The phenomenon of quasispecies within a viral population is also discussed to reflect virulence and its implications for RNA viruses. An understanding of how such events occur will provide further evidence about whether there are molecular determinants for plaque morphology of RNA viruses or whether different plaque phenotypes arise due to the presence of quasispecies within a population. Ultimately this review gives an insight into whether the intrinsically high error rates due to the low fidelity of RNA polymerases are responsible for the variation in plaque morphology and diversity in virulence. This can be a useful tool in characterizing mechanisms that facilitate virus adaptation and evolution.

Poliovirus (PV)

Poliovirus is found within the Human Enterovirus C species of the Picornaviridae family and can be classified into three distinct serotypes (1, 2 and 3). Most poliovirus infections cause an asymptomatic incubation period followed by a minor illness characterized by fever, headache and sore throat which mainly affects children. However, PV infections can lead to paralytic poliomyelitis which can result in death. Following the WHO 1988 polio eradication program, the number of poliomyelitis cases has been reduced by 99% worldwide but a small number of countries still have sporadic outbreaks of polio [22].

Dengue Virus (DENV)

Dengue is a re-emerging arboviral infection transmitted by Aedes mosquitoes that infects up to 390 million people worldwide annually, of which 100 million infections are symptomatic [34]. The global incidence of dengue has grown dramatically in recent decades and about half of the world's population is now at risk [35]. In particular, it represents a growing public health problem in many tropical and sub-tropical countries, mostly in urban and semi-urban areas. The viral etiology of dengue is characterized by biphasic fever, headache, pain in various parts of the body, prostration, rash, lymphadenopathy and leukopenia that affects young children and adults [36]. However, dengue infection can progress to severe dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS). Severe dengue affects most Asian and Latin American countries and has become a leading cause of hospitalizations and deaths among children and adults in these regions. It is a life-threatening disease with 500,000 cases being admitted to hospitals and 25,000 deaths yearly [37].
A previous study compared a DENV vaccine candidate strain, DENV-3 PGMK30FRhL3 (PGMK30), which produced acute febrile illnesses, with another clinically safe DENV vaccine candidate, DENV-2 PDK53 (PDK53). Using in vitro and in vivo approaches, the infectivity of the two vaccine strains was investigated to assess the molecular determinants of plaque size. It was revealed that the small plaque displayed by the PGMK30 strain in BHK-21 cells was due to its reduced in vitro growth rate. On the other hand, the PDK53 strain, which also produced small plaques, was observed to grow rapidly but was unable to evade antiviral responses, which restricted its ability to spread. The slow growth rates of the two strains were suggested to be due to two different key mechanisms: the growth of PDK53 appeared to be modulated by antiviral responses, while PGMK30 was slow to spread to surrounding cells but was able to evade immune detection. It was hypothesized that if the plaque size of PDK53 was hindered by the antiviral response, interfering with its activation by silencing pSTAT1 would be expected to alter the plaque characteristics of PDK53 but not of PGMK30 or the wild-type. In line with the hypothesis, a marked increase in pSTAT1 was shown in cells surrounding the foci of PDK53 infection but no increase was detected in PGMK30 and the wild-type. At least two different mechanisms therefore dictate the plaque phenotype, and elucidating the exact mechanism behind the formation of small plaques is an efficient way to choose future live-attenuated vaccine strains for clinical development [38].

West Nile Virus (WNV)

West Nile Virus was first characterized in 1937 in the West Nile district of Uganda and was taxonomically placed in the genus Flavivirus within the Flaviviridae family. The virus later appeared in New York in 1999, where it caused 59 hospitalized infections and 7 deaths before it spread to other parts of the USA between 1999 and 2001 [50]. WNV survives naturally in a mosquito-bird-mosquito transmission cycle involving the Culex sp. mosquitoes [51].

Chikungunya Virus (CHIKV)

Comparative sequence alignment of large and small plaque variants of CHIKV primary isolates from the Comoros Island showed two amino acid differences in the nsp2 protease and the nsp3 protein.

Ebola Virus (EboV)

In addition, the amino acid substitutions A82V and P382T were present only in the Ebola virus glycoprotein from the new Ebola epidemic 2014 isolates. The W291R substitution was also found only in the 2014 isolate but at a very low frequency of occurrence [87]. Having a large number of mutations from previous outbreaks present with more than 90% frequency in the sequence of the new Ebola epidemic isolates as well as the emergence of new mutations (A82V, P382T and W291R) indicated the presence of viral quasispecies in the population. The high mutation rates found in an RNA quasispecies increased the probability of escape mutations and this could explain the escape of the 2014 viral isolates from neutralizing antibodies elicited by the old Ebola epidemic isolates. The structural analysis of the Ebola virus revealed the strong contribution of these residues to the three-dimensional rearrangement of the glycoprotein and they played an important role in the re-emergence of the new epidemic Ebola isolates in 2014.
Several studies of the Ebola virus glycoprotein showed that the two mutations at positions A82V and T544I might have caused an increase in viral infectivity in humans [88] [89] [90] [91] [92] [93] . These two mutations reduced the stability of the pre-fusion conformation of the EBOV glycoprotein. Kurosaki et al. investigated the viral pseudotyping of EBOV glycoprotein derivatives in 10 cell lines from nine mammalian species and the infectivity of each pseudotype. The data showed that isoleucine at position 544 mediated membrane fusion and increased the infectivity of the virus in all host species, whereas valine at position 82 modulated viral infectivity but was dependent on the virus and the host. Analysis via structural modeling revealed that the isoleucine 544 changed the viral fusion. However, the valine 82 residue influenced the interaction with the viral entry receptor, Niemann-Pick C1 [94] . The frequency of these two amino acid substitutions (A82V and T544I) varied between different Ebolavirus species.

Middle East Respiratory Syndrome Coronavirus (MERS-CoV)

Middle East respiratory syndrome (MERS) coronavirus is an enveloped, positive-sense, single-stranded RNA virus that was identified for the first time in 2012 in Saudi Arabia. The viral respiratory disease was caused by a novel coronavirus. The causative coronaviruses (CoV) belong to lineage C of the genus Betacoronavirus within the family Coronaviridae. MERS-CoV can infect a broad range of mammals, including humans, and is transmitted by infected dromedary camels [98, 99]. Typical MERS symptoms are similar to the common flu but in some patients, pneumonia and gastrointestinal symptoms including diarrhea and organ failure were reported [100]. From September 2012 to August 2018, 2253 MERS-CoV cases including 840 deaths were reported in 27 countries worldwide [101]. Approximately 35% of patients with MERS-CoV infection have died.
The nsp1 was reported to suppress protein synthesis by degrading the host mRNA but viral RNA could circumvent the nsp1-mediated translational shutoff. Terada et al. showed that the double mutations (A9G/R13A) in the non-structural protein 1a (nsp1) affected viral propagation and the plaque morphology. The plaques of the mutated MERS-CoV were smaller, and the infectious titers and intracellular viral RNA were decreased in infected Huh7 or Vero cells when compared to the wild-type virus. The formation of the small plaque variant was due to impairment of viral replication via the disruption of the stem-loop (SL) structure of the RNA. In addition, analysis of the biological properties of the nsp1-A9G/R13A mutant showed that the mutant virus possessed low binding activity at the 5′-UTR and promoted translational shutoff against reporter plasmids with or without 5′-UTR [102].
Alterations in the coronavirus spike glycoprotein by means of natural and experimentally induced mutations changed cell and organ tropism and virus pathogenicity. The wild-type MERS-CoV spike glycoprotein precursor contains 1353 amino acids arranged into two subunits: an amino-terminal subunit (S1) carrying the receptor binding domain (RBD) and a carboxy-terminal subunit (S2) containing the putative fusion peptide (FP/IFP), two heptad repeat domains (HR1/HR2) and the transmembrane (TM) and intracellular domains (Figure 6). Lu et al. isolated a diverse population comprising the wild-type and a variant carrying a deletion of 530 nucleotides in the spike glycoprotein gene from the serum of a 75-year-old patient in Taif, Saudi Arabia. The patient subsequently died. Analysis of the MERS-CoV sequence showed an out-of-frame deletion which led to the loss of a large part of the S2 subunit. It contained all the major structures of the membrane fusion in the S2 subunit preceding the early stop codon [103] and this also included the proposed fusion peptide (949-970 aa) [104]. The deletion resulted in the production of a shortened protein bearing only 801 amino acids. In the cell-free serum sample of the patient, mutant genomes with the S530∆ were abundant with an estimated ratio of 4:1 deleted to intact sequence reads. The spike gene deletion would cause the production of a defective virus which was incapable of causing infections or had a lowered rate of infection. Losing the S2 subunit caused a disruption in the membrane holding the spike protein and halted the fusion of the virus to the host. However, in the case of the mutant bearing the S530∆, the mutation helped to sustain the wild-type MERS-CoV infection by producing a free S1 subunit with a "sticky" hydrophobic tail, and the additional disulfide bonds caused the aggregation and mis-folding of proteins.
In addition, the mutated S530∆ could form steady trimer complexes that retained binding affinity for the dipeptidyl peptidase 4 (DPP4) and acted as a decoy such that the spike-specific MERS-CoV neutralizing antibodies were blocked.
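Two of the figures above lend themselves to a quick arithmetic check. The following is a minimal sketch, not part of the cited study: it confirms that a 530-nucleotide deletion cannot preserve the reading frame, and converts the reported 4:1 ratio of deleted to intact reads into a fraction. The function names are illustrative assumptions.

```python
def is_frameshift(deletion_length_nt: int) -> bool:
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length_nt % 3 != 0


def fraction_from_ratio(deleted: int, intact: int) -> float:
    """Convert a deleted:intact read ratio into the fraction of deleted reads."""
    return deleted / (deleted + intact)


# Values taken from the text: a 530-nt deletion and a 4:1 read ratio.
print("530-nt deletion is out of frame:", is_frameshift(530))
print("fraction of reads carrying the deletion:", fraction_from_ratio(4, 1))
```

Because 530 leaves a remainder of 2 when divided by 3, the downstream codons are shifted, which is consistent with the early stop codon and the truncated 801-amino-acid product described above.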

Newcastle Disease Virus (NDV)

The NDV genome codes for seven major viral proteins in the order of 5′-N-P(V)-M-F-HN-L-3′. In NDV, the hemagglutinin neuraminidase (HN) and fusion (F) glycoproteins are presented on the surface of the virion envelope and contribute to viral infection [127]. The fusion protein is expressed as an inactive precursor (F0) prior to activation by proteolytic cleavage. The cleavage of F0 is crucial for infectivity and works as a key virulence indicator for certain viruses such as virulent strains of avian paramyxovirus 1 (NDV). The F0 cleavage site contains several basic residues which cause the cleavage of the F protein by furin, an endopeptidase present in the trans-Golgi network [110].

Respiratory Syncytial Virus (RSV)

Deplanche et al. studied the BRSV and evaluated the genetic stability of BRSV in cell cultures by analyzing the consensus nucleotide sequences of the highly variable glycoprotein G. The BRSV strain W2-00131 was isolated from a calf with respiratory distress syndrome (BAL-T) and was further propagated in bovine turbinate (BT) cells. The genomic region of the BRSV that encodes for the highly variable glycoprotein G showed constant genetic stability for the three variants (3Cp3, 3Cp9 and 3Cp10) after ten continuous passages in BT cells and after in vivo studies [137]. This led to further analysis of the quasispecies population derived from this field isolate. Genomic analysis of more mutants showed that the G-coding region displayed significant variability with mutations ranging from 6.8 × 10⁻⁴ to 10.1 × 10⁻⁴ substitutions per nucleotide in vitro and in vivo.
The majority of the mutations reported previously were present in the W2-00131 RNA populations. A large dominance of non-synonymous over synonymous mutations was observed in all BRSV mutants. The non-synonymous mutations mapped preferentially within the two variable antigenic regions of the ectodomain or close to the highly conserved domain in the G protein [137] . These results suggested that BRSV populations might have evolved as complex and dynamic mutant swarms, despite apparent genetic stability.

Influenza Virus (IV)

A study investigated the impact of antigenic proximity, genomic substitutions, quasispecies, diversity and reassortment in order to understand the molecular evolution of the influenza A (H3N2) isolated directly from clinical samples. Among the whole genomes analyzed (155 of 176), several amino acid substitutions were found to substantially affect the severity of the infection caused by the clade specific viruses. Within the sample, 121 viruses belonged to the genetic clade 3c.2a.1, eight to 3c.2a2, twenty-four to 3c.2a3, one to 3c.2a4 and one to a different clade, 3c.3a. Many distinct substitutions spanning the whole influenza proteome, including HA, NA and non-structural protein 1, were found to be responsible for causing mild and severe disease. Interestingly, two substitutions, V261I and K196E, were found in the NA and the NS1, respectively. These two mutations were found to be particularly significant as they distinguished the strains causing mild infections from those causing severe infections. Analysis of the clinical isolates showed a difference in a single amino acid residue, 160K within the HA, whereby 14 cases of glycosylation loss were observed within the quasispecies population linked to severity of infection. Moreover, the degree of diversity within the quasispecies population was reported to be elevated in severe cases when compared to mild ones [143].
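As a consistency check on the clade breakdown above, the per-clade counts can be tallied and compared with the 155 genomes analyzed. This is a minimal sketch using only the counts quoted in the text; the variable names are illustrative, and the clade labels are copied as they appear above.

```python
# Clade counts as reported in the text for the H3N2 clinical isolates.
clade_counts = {
    "3c.2a.1": 121,
    "3c.2a2": 8,
    "3c.2a3": 24,
    "3c.2a4": 1,
    "3c.3a": 1,
}

total = sum(clade_counts.values())
print("total genomes:", total)

# Share of each clade among the analyzed genomes.
for clade, count in clade_counts.items():
    print(f"{clade}: {count / total:.1%}")
```

The counts sum to 155, matching the number of whole genomes analyzed, with clade 3c.2a.1 accounting for the large majority.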

Introduction

With their diverse differences in size, structure, genome organization and replication strategies, RNA viruses are recognized as being highly mutable [1]. Their high mutation rates make it very difficult for therapeutic interventions to work effectively and very often they develop resistance to antiviral drugs and antibodies elicited by vaccines [2, 3]. This poses a real threat to how emerging infectious agents could be prevented or treated [4]. The success of the evolution of RNA viruses arises from their capacity to utilize varying replication approaches and to adapt to a wide range of biological niches faced during viral spread in the host. One of the factors affecting the emergence or re-emergence of infectious diseases is the genetics of the infectious agents [1].
Mutants with varying levels of infectivity generated from a mutated gene occurred regularly in a virus population due to the high mutation rates [6]. The error-prone replication ability of RNA viruses and the shorter generation times can be used to explain the variations in evolution rates between DNA and RNA viruses. While mutation rates for DNA genomes have been estimated to be between 10⁻⁷ and 10⁻¹¹ per base pair per replication [11], the RNA dependent RNA polymerase (RdRp) typically showed low fidelity, with a mutation rate of roughly 10⁻⁴ mutations per nucleotide copied, which is greater than that of almost all DNA viruses [7, 12, 13]. This characteristic of the RNA polymerase in RNA viruses led to the generation of diverse offspring with different genotypes in shorter generation times.
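To make the scale of these rates concrete, the following minimal sketch multiplies a per-nucleotide mutation rate by a genome length to give the expected number of mutations per genome copy. The rate of 10⁻⁴ comes from the text above, and the genome lengths of roughly 7 kb (picornaviruses) and 11 kb (flaviviruses) are the approximate sizes quoted in this review. The function name is illustrative, and the calculation assumes a uniform rate across the genome, which is a simplification.

```python
def expected_mutations_per_genome(rate_per_nt: float, genome_length_nt: int) -> float:
    """Expected mutations per genome copy, assuming a uniform per-site rate."""
    return rate_per_nt * genome_length_nt


RNA_RATE = 1e-4  # approximate RdRp error rate per nucleotide copied (from the text)

# Approximate genome sizes quoted in this review.
GENOMES = {"picornavirus (~7 kb)": 7_000, "flavivirus (~11 kb)": 11_000}

for name, length in GENOMES.items():
    mutations = expected_mutations_per_genome(RNA_RATE, length)
    print(f"{name}: about {mutations:.1f} mutations per genome copy")
```

Under these assumptions, each copied genome carries on the order of one mutation, which is why a replicating RNA virus population behaves as a distribution of mutants rather than a single sequence.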

Picornaviridae

Viruses of the family Picornaviridae can be classified into genera such as Enterovirus, Parechovirus, Aphthovirus and others. The virion is made up of a non-enveloped capsid of 30 nm surrounding a core positive stranded ssRNA genome (Figure 1A) [19]. The genome is approximately 7 kb in size and possesses a single long ORF flanked on both ends by the 5′-non-translated region (5′-NTR) and the 3′-non-translated region (3′-NTR) [20]. The 5′-NTR has an internal ribosome entry site (IRES) which controls cap-independent translation. The ORF comprising 6579 nucleotides can be classified into three polyprotein regions, namely, P1, P2 and P3. They encode for structural proteins (VP1 to VP4) in the P1 region and non-structural proteins in the P2 (2A-2C) and P3 regions (3A-3D) following proteolytic cleavage (Figure 1B). The viral capsid proteins VP1, VP2 and VP3 are displayed on the external structures of the EV-A71 viral particle whereas VP4 is found within the internal structures of the capsid [21].
Figure 1 caption (partial): … (7.4 kb). The open reading frame (ORF) contains the structural viral protein P1, which is cleaved to yield VP1, VP2, VP3 and VP4, and the non-structural viral proteins P2 (cleaved to yield 2A, 2B and 2C) and P3 (cleaved to yield 3A, 3B, 3C and 3D). The 3′-NTR end of the genome contains the poly(A) tail.

Poliovirus (PV)

In order to verify whether limiting the genomic diversity of a viral population has any effect on its evolution, a study was conducted on a strain of poliovirus with a substitution of Glycine 64 to Serine (G64S) in the RNA polymerase of the virus. The outcome of one-step growth curves and northern blot analysis of genomic RNA synthesis confirmed that the G64S mutant showed greater fidelity without a considerable reduction in the overall efficiency of RNA replication. The study hypothesized that having greater heterogeneity within a viral population allows it to adapt better to changing environments encountered during an infection, and indeed the finding showed that boosting the fidelity of poliovirus replication had a noticeable effect on viral adaptation and pathogenicity. The poliovirus strain with a mutated RNA polymerase carrying an altered amino acid residue (G64S) was observed to replicate similarly to the wild-type counterpart but produced lower genomic diversity and subsequently was incapable of adapting well under detrimental growth situations. This study showed that the diversity of the quasispecies was associated with increased virulence rather than selection of single adaptive mutations. Alongside previous observations, these findings indicated a rise in the error rate over the tolerable error threshold which induced viral extinction, suggesting that the rate of viral mutation was precisely modulated and most likely had been finely tuned during the evolution of the virus [7]. It was further revealed that curbing the diversity of an RNA viral population by raising the fidelity of the RNA polymerase had a direct effect on its pathogenicity and the capacity of the viruses to escape antiviral immunity [23]. These findings support the view that RNA viruses have developed minimal viral polymerase fidelity to facilitate quick evolution and adaptation to novel situations [24].

Enterovirus 71 (EV-A71)

The course of evolution through which EV-A71 evolves to escape the central nervous system (CNS) was investigated by complete sequencing and haplotype analysis of the strains isolated from the digestive system and the CNS. A novel bottleneck selection was revealed in various environments such as the respiratory system and the central nervous system throughout the dissemination of EV-A71 in the host. Consequently, a dominant haplotype resulting from the bottleneck effect caused a change from viruses harboring VP1-31D to VP1-31G, where amino acid 31 was a favorable site of selection among the circulating EV-A71 sub-genotype C2. VP1-31G was present at elevated levels amongst the population of mutants of EV-A71 in the throat swabs of subjects with severe EV-A71 infections. Furthermore, in vitro studies showed that VP1-31D virus isolates had higher infectivity, fitness and virion stability, which sustained the virus infections in the digestive system. Speculations were that such factors benefitted the virus in gaining added viral adaptation and subsequently enabled viral spread to more tissues. These beneficial abilities could also justify the reduced number of VP1-31D viruses located in the brain following positive selection. The VP1-31G viruses presenting the major haplotype in the central nervous system displayed increased viral fitness and growth rates in neuronal cells. This implied that the VP1-31G mutations aided the spread of the mutant virus in the brain, which resulted in serious neurological complications in patients. It was speculated that the fluctuating degree of tissue tropism of EV-A71 at diverse inoculation sites resulted in the bottleneck effect of the viral population having a mutant spectrum. Hence, the adaptive VP1-31G haplotype became dominant in neuronal tissues and once the infection was achieved, VP1-31G viruses expedited bottleneck selection and propagation into the skin and CNS.
Among the three minor haplotypes (C to E) which co-existed in various tissues, the minor haplotype C was isolated in the intestinal mucosa and throat swab specimens. The minor haplotype D was isolated from specimens obtained from the respiratory and digestive systems. However, the minor haplotype E appeared in throat swabs and the basal ganglia but not the intestinal mucosa, hence, suggesting that the intestinal mucosa is the initial replication site of the EV-A71. Collectively, these data showed that the EV-A71 quasispecies utilized the dynamic proportion of varying haplotype populations to co-exist, sustained the ability of the population to adapt and enabled the propagation in different tissues. Lastly, the study concluded that the selection of haplotype(s) might be a driving factor in viral dissemination and severity of infections in humans as well as the virulence in EV-A71 infected patients [30] .

Flaviviridae

Of the Flaviviridae family (genera Flavivirus, Pestivirus, Pegivirus and Hepacivirus), there are 89 animal viruses with a small, positive-sense, single-stranded RNA genome [31]. The virions are 40-60 nm in diameter, spherical in shape and contain a lipid envelope (Figure 2A). The majority of these viruses are arthropod-borne and transmitted via infected mosquitoes and ticks. Several, such as dengue virus (DENV), West Nile Virus (WNV) and Zika Virus (ZIKV), are considered emerging and re-emerging pathogens, and these viruses pose a global threat to public health by causing significant mortality [32]. The flaviviral genome is approximately 11 kb and has a single open reading frame (ORF), which is flanked by untranslated regions (5′ and 3′ NTR). The ORF encodes three structural proteins (C, M and E) and seven non-structural proteins (NS). The non-structural proteins include the large, highly conserved proteins NS1, NS3 and NS5 and four small hydrophobic proteins, NS2A, NS2B, NS4A and NS4B (Figure 2B) [33].

Dengue Virus (DENV)

Differences in the envelope (E) gene sequence were investigated using plasma samples from six DENV-infected patients. The first account of DENV quasispecies in vivo was reported using clonal sequencing analysis, whereby the simultaneous occurrence of diverse variant genomes was observed. The degree of genetic diversity fluctuated among patients, with a mean proportion of 1.67%. Moreover, out of 10 clones derived from dengue-infected plasma, 33 nucleotide substitutions were detected, of which 30 were non-synonymous mutations. Of particular interest, mutations at amino acid residues 290 and 301 introduced two stop codons, which indicated that genome-defective dengue viruses (5.8%) were also present within the quasispecies population. It was hypothesized that this might have a significant impact on the pathogenesis of the dengue virus [39].
Recently, Parameswaran et al. profiled the intra-host viral diversity of samples from 77 patients via whole-genome amplification of the entire coding region of the DENV-3 genome. A significant difference in viral makeup between naïve subjects and patients with DENV-3 immunity revealed that the immune repertoire of the host is responsible for the degree of diversity exhibited by the viral population. Subsequent identification of the hotspots responsible for intra-host diversity revealed that only a few were crucial. The major hotspots for diversity were found in more than 59% of the samples at three codon coordinates: amino acid residues 100 and 101 in the M protein and residue 315 in the AB loop of E domain III. The E315 variant was speculated to have arisen as an immune escape variant in response to the pressure exerted by the immune defense mechanism. These findings highlighted the importance of host-specific selection pressures in the evolution of the DENV-3 population within the host, and this could eventually lead to the intelligent design of a vaccine candidate identified from the prevalent escape variants such as those bearing E315 [40].
It was also reported that, within the quasispecies population, amino acid substitutions occurred on the surface of the E protein, which is involved in interactions with other oligomers, antibodies and host cell receptors. In particular, two amino acid substitutions at positions E452 and E455 were mapped to the E protein transmembrane domain (E450 to E472), which functions as the membrane anchor for the E protein. Intra-host quasispecies analysis using E gene sequences also identified several amino acids on the surface of the E protein that altered the properties of the virus. The conformational rearrangements that lead to fusion of the viral and host cell membranes were altered. The amino acids detected in the quasispecies consensus sequence were less frequent in the E proteins from patients with mild disease than in those from patients with severe onset of dengue infection. Thus, the quasispecies might harbor specific variants that are crucial for the pathogenesis of the disease [41]. Understanding the molecular determinants of pathogenesis through the analysis of quasispecies could lead to the rational design of a DENV vaccine.
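The clonal sequencing result above rests on sorting each nucleotide substitution into synonymous, non-synonymous or stop-codon (nonsense) classes. The following is a minimal sketch of that classification, assuming the standard genetic code and codon-aligned input; the function names and the example codons are hypothetical and are not taken from the cited study [39].

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, nonsense or non-synonymous."""
    # Accept RNA or DNA spelling by mapping U to T.
    ref_aa = CODON_TABLE[ref_codon.upper().replace("U", "T")]
    alt_aa = CODON_TABLE[alt_codon.upper().replace("U", "T")]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # a premature stop codon, i.e. a defective genome
    return "non-synonymous"


def tally_substitutions(codon_pairs):
    """Count each substitution class over (reference, variant) codon pairs."""
    return Counter(classify_substitution(ref, alt) for ref, alt in codon_pairs)


# Hypothetical example codons, for illustration only.
example = [("CAG", "CAA"), ("CAG", "TAG"), ("GAT", "GGT")]
print(tally_substitutions(example))
```

A real analysis would first align each clone to the reference in the correct reading frame; the sketch assumes that step has already been done.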

Zika Virus (ZIKV)

Zika virus was first discovered in 1947 when it was isolated from Aedes africanus mosquitoes [42]. It belongs to the Flavivirus genus within the Flaviviridae family. Zika infections have been reported in Egypt [43], East Africa [44], India [45], Thailand, Vietnam [46], the Philippines and Malaysia [47].
An Asian/American lineage Zika virus (ZIKV) formed two types of plaques, large and small. The large plaque variant had faster growth kinetics than the small plaque variant. Sequencing of the plaque variants showed that the large plaque variant had a guanine nucleotide at position 796 (230Gln), while the small plaque clone had an adenine at the same position. A recombinant clone carrying the G796A mutation was produced using an infectious molecular clone of the ZIKV MR766 strain. The plaques produced by the recombinant clone were smaller than those of the parental strain, and its growth rate was significantly reduced in Vero cells. In vivo studies demonstrated that the virulence of the mutant MR766 strain in IFNAR1 mice was decreased, showing that the mutation at position 230 in the prM protein is a molecular determinant of plaque morphology, growth property and virulence in mice [48].
The quasispecies distribution of a ZIKV strain (ZIKV-SL1602) isolated from a 29-year-old female traveler was investigated. Data obtained from single molecule real time (SMRT) sequencing were aligned to a consensus sequence and 24,815,877 nucleotide sequences were read. Phylogenetic analysis was then performed and each nucleotide position was analyzed to characterize the quasispecies composition of this clinical isolate. For each nucleotide position, the frequency of occurrence of each of the bases was determined and 3375 single-nucleotide variants (SNVs) were detected. Interestingly, four variants of the quasispecies population were present at a level of more than 1% of the total population. Mutations in the E protein accounted for 4.1% of the variants, and mutations in the non-structural region were also detected: 8.2% in NS2, 1.6% in NS1 and 1.4% in NS5. The phylogenetic analysis also showed that ZIKV-SL1602 clustered within the Asian lineage, in close proximity to the ZIKV strains currently circulating in America. Every South American isolate was found to share ancestry with the French Polynesian isolates. Hence, it can be inferred that the currently circulating South American clade stems from French Polynesia [49].
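The per-position base counting described above can be sketched as follows. This is an illustrative outline only, assuming reads that are already aligned to the consensus in a common coordinate system; the function names, the toy reads and the default 1% cut-off are assumptions for the example and do not reproduce the pipeline used in [49].

```python
from collections import Counter


def base_frequencies(aligned_reads):
    """Return one {base: frequency} dict per alignment column.

    Gap ('-') and ambiguous ('N') characters are ignored when counting.
    """
    if not aligned_reads:
        return []
    length = max(len(read) for read in aligned_reads)
    frequencies = []
    for i in range(length):
        counts = Counter(
            read[i] for read in aligned_reads if i < len(read) and read[i] in "ACGT"
        )
        depth = sum(counts.values())
        frequencies.append(
            {base: n / depth for base, n in counts.items()} if depth else {}
        )
    return frequencies


def call_snvs(aligned_reads, consensus, min_freq=0.01):
    """List (position, consensus_base, variant_base, frequency) tuples.

    Positions are 1-based. A non-consensus base is reported when its
    frequency in that column is at least min_freq.
    """
    snvs = []
    columns = zip(consensus, base_frequencies(aligned_reads))
    for position, (ref_base, freqs) in enumerate(columns, start=1):
        for base, freq in sorted(freqs.items()):
            if base != ref_base and freq >= min_freq:
                snvs.append((position, ref_base, base, freq))
    return snvs


# Toy data: 98 reads match the consensus, 2 carry G->A at position 3.
reads = ["ACGT"] * 98 + ["ACAT"] * 2
print(call_snvs(reads, "ACGT"))
```

In practice the threshold has to be set above the sequencing error rate of the platform, otherwise read errors are counted as variants.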

West Nile Virus (WNV)

A small-plaque (SP) variant was picked from a mutant population of WNV isolated from an American crow in New York in 2000. Characterization of this variant in mammalian, avian and mosquito cell lines led to the discovery that the SP variant genome differed from the wild-type genome at four nucleotides. Two of these nucleotide changes were non-synonymous, producing a P54S change in the prM protein and a V61A change in the NS2A protein. Further analysis of the mutations revealed that a deletion at the prM cleavage site did not affect virus replication or release from mammalian BHK cells. However, the progeny of this virus was no longer able to infect BHK cells. A mutation in the prM region of tick-borne encephalitis virus (TBEV) was also reported to cause decreased secretion of virus particles with no effect on protein folding. Lower neurovirulence and neuroinvasiveness were reported when the A30P mutation occurred in the NS2A region of the isolate. Further sequencing of the isolate showed that most of the small plaque clones initially isolated reverted to the wild-type sequence at position 625 in the prM region. The remaining isolates reverted at position 3707 in the NS2A region. These findings suggested that the mutation present in the prM region could be responsible for the small plaque phenotype. It is probable that the mutation in the NS2A region was responsible for the determination of the plaque size as the mutation in the prM region was sufficient to revert the isolate to the wild-type phenotype [52].
The genetic diversity of WNV in the avian host was also investigated using next-generation sequencing (NGS). The aim was to explore whether a genetically homogeneous cloned virus would undergo genetic diversification after passage in young SPF chickens and wild juvenile carrion crows. The data revealed that, in both animal models, the WNV population showed significant heterogeneity and diverged from the quasispecies structure of the initial viral inoculum. In-depth analysis then compared the two infection models (SPF chickens and wild juvenile carrion crows) to assess the variation in genetic diversity. The WNV population diverged significantly from the inoculum in crows, with 18 genetic variants, but diversified far less in chickens, where only 3 single-nucleotide variants (SNVs) were detected. Hence, natural WNV-susceptible avian hosts could provide a selective setting and contribute to genetic diversification. NGS technologies have enabled the analysis of WNV quasispecies dynamics, leading to a better understanding of the virus and shedding some light on its mechanism of pathogenicity [53].

Togaviridae

Viruses from the Togaviridae family can be further classified into the genera Alphavirus and Rubivirus. Alphaviruses are arthropod-borne viruses [54] that form icosahedral particles of about 70 nm with a lipid envelope (Figure 3A) [55]. The spikes of the virion are made up of E1 and E2 glycoproteins organized in a T = 4 icosahedral lattice of 80 trimers. The alphavirus virion carries a positive-sense, single-stranded RNA of approximately 11-12 kb as the genetic material [54]. The RNA has a 5′-methylated nucleotide cap and a polyadenylated 3′ end. The viral genome is translated into three structural proteins (CP, E2 and E1) and four non-structural proteins (NSP1, NSP2, NSP3 and NSP4) (Figure 3B).

Chikungunya Virus (CHIKV)

Chikungunya virus (CHIKV) is an arthropod-borne virus transmitted to humans by mosquitoes and has caused significant human morbidity in many parts of the world [56]. It causes an acute febrile illness with high fever, severe joint pain, polyarthralgia, myalgia, maculopapular rash and edema. While the fever and rash are self-limiting and resolve within a few days, arthralgia can persist for months to years [57, 58]. Some cases of CHIKV disease have been associated with neurological complications [59]. The virus has been associated with frequent outbreaks in tropical countries of Africa and Southeast Asia and also in temperate zones around the world. A major outbreak in 2013 affected several countries of the Americas, involving approximately 2 million people [60].
The original geographical distribution of CHIKV indicated that there are three distinct groups, and phylogenetic analysis confirmed the West African, the Asian and the East/Central/South African (ECSA) genotypes. The ECSA virus with an A226V substitution in the E1 envelope gene caused multiple massive outbreaks in various regions, starting on La Réunion island in 2005. The virus then spread to Asia and caused over a million cases in the following years [61-63]. The Asian genotype started invading the Americas in 2013, causing massive outbreaks in various countries in Central and South America and the Caribbean. The ECSA virus is now the dominant virus all over Africa and Asia, and the Asian genotype is the dominant virus in the Americas [62, 64-67]. Even though a number of CHIKV vaccine candidates are being developed, no effective vaccine is currently available for clinical use [68].
Similar to other RNA viruses with high mutation rates, CHIKV produces populations of genetically diverse genomes within a host. To date, the role of several of these mutations and their influence on disease severity in vertebrates and on transmission by mosquitoes have been studied. Riemersma et al. investigated the intra-host genetic diversity of high- and low-fidelity CHIKV variants using murine models. Both the high- and low-fidelity variants were expected to be less virulent than the wild-type (CHIKV-WT). However, the high-fidelity variant (CHIKV-HiFi) caused more acute infection: footpad swelling appeared earlier than with CHIKV-WT, at 3 and 4 days post-infection (dpi). Moreover, CHIKV-HiFi-infected mice displayed higher peaks of disease severity than CHIKV-WT-infected mice at 7 dpi. The NGS data showed that the CHIKV-HiFi variant produced more genetically diverse populations than CHIKV-WT in mice, and this enhanced diversification was subsequently reproduced after serial in vitro passages. In the high-fidelity variant, the nsp2 G641D and nsp4 C483Y mutations increased CHIKV virulence in adult mice. In contrast, the low-fidelity variant gave rise to reduced rates of replication and disease [69].
Plaque size is a common feature of viral characterization. Primary isolates of CHIKV containing variants with different plaque sizes were previously reported [70, 71]. Viral variants with different plaque morphologies, such as small and large plaques, were reported in the 2005 CHIKV outbreak isolates [72]. It is unclear how small plaque variants with lower fitness are maintained within a natural viral quasispecies. Plausible explanations are that plaque size might not represent the in vivo growth conditions and that cooperation among variants with different plaque sizes might be required for optimal in vivo replication and transmission fitness.
Jaimipak et al. reasoned that if plaque size did not represent the in vivo growth conditions and the small plaque variants had a similar fitness to the large plaque variants, they would be similarly virulent in a murine model. To explore the virulence of the small plaque CHIKV variant in vivo, the pathogenicity of a purified small plaque variant, isolated from the serum of a patient in Phang-nga, Thailand in 2009, was tested in neonatal mice [73]. The small plaque variant (CHK-S) showed stable, homogeneous small plaques after four plaque purifications. It also grew more slowly and produced lower titers than the wild-type virus. Twenty-one days after infection of suckling mice with the wild-type or CHK-S virus (10^3 pfu/mouse), mice that received the CHK-S virus showed a 98% survival rate, whereas only 74% of mice survived infection with the wild-type virus. The small plaque variant obtained by plaque purification thus exhibited decreased virulence, which makes it a suitable candidate for live-attenuated vaccine development. The small plaque variant formed a major subpopulation in the CHIKV primary isolate during multiple passages in C6/36 cells. This is in line with the reduction of virulence in the suckling mice and indicated that the small plaque variant had reduced in vivo fitness. It also suggested that replication cycles in mosquito vectors might play an important role in maintaining the small plaque variant in natural infections. The persistence of the CHK-S clone after multiple passages in C6/36 cells showed that the CHK-S variant might be able to outcompete the large plaque variant when infecting the same cell, by an unknown mechanism. Alternatively, small and large plaque variants might cooperate in a way that provides a selective advantage for maintaining the small plaque variant [73].
However, these particular amino acid residues had been observed previously in other CHIKV isolates [71], so the significance of these specific sequences for the small plaque phenotype remains uncertain. Comparison of the entire genome of CHK-S with those of other small plaque variants could provide a better understanding of small plaque variants. In addition, reverse genetics studies could provide further insights into the role of specific mutations in the virulence of the CHK-S variant [73].

Filoviridae

Viruses within the Filoviridae family can be further classified into five genera: Marburgvirus, Ebolavirus, Cuevavirus, Striavirus and Thamnovirus. The virions are 80 nm in diameter and appear branched, circular or filamentous (Figure 4A). Filoviruses contain a linear, negative-sense, single-stranded RNA of approximately 19 kb. The filovirus genome encodes four structural proteins, namely the nucleoprotein (NP), an RNA-dependent RNA polymerase co-factor (VP35), a transcriptional activator (VP30) and an RNA-dependent RNA polymerase (L). There are also three non-structural membrane-associated proteins, namely a spike glycoprotein (GP1,2), a primary matrix protein (VP40) and a secondary matrix protein (VP24), present within the virion membrane [78] (Figure 4B).

Ebola Virus (EboV)

The Ebolavirus genus belongs to the Filoviridae family within the order Mononegavirales. Five species have been identified within the genus Ebolavirus: Zaire (EBOV), Bundibugyo (BDBV), Sudan (SUDV), Tai Forest (TAFV) and Reston (RESTV) [81]. Among them, only the Reston virus (RESTV) is assumed to be non-pathogenic for humans. The other four are well known to cause Ebola virus disease (EVD). The virus causes a severe fever along with systemic inflammation and damage to the endothelial cell barrier, leading to shock and multiple organ failure with high mortality rates in humans and animals [82]. It is transmitted to people from wild animals and spreads in the human population through human-to-human transmission [83]. However, the natural host reservoirs of Ebola viruses are unknown. The average EVD case fatality rate is around 50%. So far, the largest recorded EVD outbreak, with 28,652 infections, has killed 11,325 people [84]. The Zaire, Bundibugyo and Sudan Ebola viruses have been involved in large outbreaks in Africa.
Dietzel et al. studied the functional significance of three non-synonymous mutations in Ebola virus (EBOV) isolates from the outbreak in West Africa. Among 1000 sequenced Ebola virus genomes, approximately 90% carried the three signature mutations: A82V in the glycoprotein (GP), R111C in the nucleoprotein (NP) and D759G in the polymerase (L). The impact of these mutations on the function of each viral protein and on the growth of recombinant EBOVs was analyzed using recently engineered virus-like particles and reverse genetics. The D759G substitution, located close to the highly conserved GDN motif in the enzymatically active center (amino acids 741 to 743) of the L polymerase, increased viral transcription and replication. In contrast, the R111C substitution, in the multifunctional region of the nucleoprotein that is essential for homo-oligomerization and nucleocapsid formation, reduced viral transcription and replication. Furthermore, the A82V replacement in the glycoprotein enhanced the efficacy of GP-mediated viral entry into target cells. The combination of the three mutations in the recombinant Ebola virus affected the functional activity of the viral proteins and enhanced the growth of the recombinant virus in cell culture when compared to the prototype isolate [93]. A pilot epidemiological NGS study with a substantial sample size suggested that these three mutations did not change the high mortality in the host, since the mortality rate was not considerably altered throughout the outbreak [95].
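The outbreak figures quoted earlier in this section (28,652 infections and 11,325 deaths [84]) can be converted into a case fatality rate for comparison with the stated average of around 50%. The short sketch below only illustrates that arithmetic; the function name is hypothetical.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return the case fatality rate as a percentage of recorded cases."""
    if cases <= 0:
        raise ValueError("cases must be a positive number")
    return 100.0 * deaths / cases


# Figures for the largest recorded EVD outbreak, as cited in the text [84].
cfr = case_fatality_rate(deaths=11_325, cases=28_652)
print(f"{cfr:.1f}%")  # prints 39.5%
```

On these figures the largest outbreak had a case fatality rate of roughly 40%, somewhat below the quoted average.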

Coronaviridae

Viruses within the Coronaviridae family are positive-sense, single-stranded RNA viruses capable of infecting three vertebrate classes: mammals (Coronavirus and Torovirus), birds (Coronavirus) and fish (Bafinivirus). Coronaviruses are the largest RNA viruses identified so far, with enveloped spherical virions of about 120-160 nm and a viral genome of about 31 kb in length (Figure 5A) [97]. The genome consists of many ORFs. The 5′ two-thirds of the genome is occupied by a replicase gene comprising two overlapping ORFs, ORF1a and ORF1b. The four structural proteins are the spike glycoprotein (S), small envelope protein (E), membrane glycoprotein (M) and nucleocapsid (N). Accessory regions that are group-specific ORFs are designated ORF3, ORF4a, ORF4b and ORF5 [97] (Figure 5B).

Middle East Respiratory Syndrome Coronavirus (MERS-CoV)

Park et al. analyzed the non-consensus sequences of MERS-CoV derived from 35 specimens obtained from 24 patients and showed the heterogeneity of MERS-CoV among patients. The highest level of heterogeneity was found in the super-spreader specimens. Moreover, this heterogeneity was transmitted in close association with variations in the consensus sequences. It can be inferred that MERS-CoV infections were caused by multiple variants. In-depth analysis of heterogeneity among patients showed a link between the D510G and I529T mutations in the receptor-binding domain (RBD) of the viral spike glycoprotein. Both mutations reduced the binding affinity of the RBD for human CD26. Moreover, the two mutations were observed to be mutually exclusive, implying that these mutations can significantly hinder viral fitness. However, variants with the D510G or I529T mutation in the S protein showed increased resistance to neutralizing monoclonal antibodies and reduced sensitivity to antibody-mediated neutralization [105]. The frequency of each single mutant varied greatly, but their combined frequency was elevated in the majority of the samples. Meanwhile, the frequency of the wild type was no more than 10% in the majority of the samples. Therefore, it can be deduced that the selection pressure applied by the host immune response played a crucial part in producing genetic variants and in shaping how they interacted with the human immune system in MERS-CoV outbreaks [106].
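The frequencies of wild-type and single-mutant genomes discussed above can be tabulated from the residues observed at spike positions 510 and 529 on each read or clone. The sketch below is a hypothetical illustration: the input format, function names and toy counts are assumptions and are not data from [105] or [106].

```python
from collections import Counter

# Residue pair at spike positions (510, 529) mapped to a haplotype label.
HAPLOTYPES = {
    ("D", "I"): "wild type",
    ("G", "I"): "D510G",
    ("D", "T"): "I529T",
    ("G", "T"): "D510G+I529T",
}


def classify_haplotype(aa510: str, aa529: str) -> str:
    """Label one genome by the residues it carries at positions 510 and 529."""
    return HAPLOTYPES.get((aa510.upper(), aa529.upper()), "other")


def haplotype_frequencies(observations):
    """Return {label: fraction} over (aa510, aa529) pairs from one specimen."""
    counts = Counter(classify_haplotype(a, b) for a, b in observations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Toy specimen: both single mutants present, no double mutant observed.
specimen = [("G", "I")] * 6 + [("D", "T")] * 3 + [("D", "I")] * 1
freqs = haplotype_frequencies(specimen)
print(freqs)
print("double mutant observed:", "D510G+I529T" in freqs)
```

Mutual exclusivity corresponds to the double-mutant label being absent, while a low wild-type fraction corresponds to a high combined frequency of the two single mutants.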

Paramyxoviridae

Paramyxoviridae is a family of viruses in the order Mononegavirales whose members use vertebrates as their natural hosts. Currently, 72 species are placed in this family, divided among 14 genera [109]. Diseases associated with Paramyxoviridae include measles (MeV), mumps and Newcastle disease (NDV). Paramyxoviridae virions are enveloped and pleomorphic, appearing as spherical or filamentous particles with diameters of around 150 to 350 nm (Figure 7A). The genome is linear, negative-sense, single-stranded RNA of about 15-19 kb in length and encodes 9-12 proteins through the production of multiple proteins from the P gene (Figure 7B) [110]. On the external surface of the virion, glycoproteins possessing hemagglutinin, neuraminidase and cell fusion activities are present. The middle component of the envelope is a lipid bilayer acquired from the host cell as the virus buds off the cytoplasmic membrane. The innermost surface of the envelope is a non-glycosylated membrane protein layer that maintains the outer structure of the virus.
The paramyxoviruses can be characterized by the gene order of the viral proteins and by the biochemical characteristics of the proteins associated with viral attachment. The L protein, which is the catalytic subunit of the RNA-dependent RNA polymerase (RdRp), associates with the nucleocapsid protein (N) and phosphoprotein (P) to form part of the RNA polymerase complex. The RNA polymerase complex is covered by the viral envelope, which consists of a matrix protein (M) and two glycosylated envelope spike proteins, a fusion protein (F) and a cell attachment protein. The cell attachment protein differs among genera: it can be hemagglutinin (H in measles virus), hemagglutinin-neuraminidase (HN in mumps virus and NDV) or glycoprotein G (Henipavirus). Some genera within the Paramyxoviridae family also contain various conserved proteins, including the non-structural proteins (C, NS1, NS2), a cysteine-rich protein (V), a small integral membrane protein (SH) and the transcription factors M2-1 and M2-2 [111].
The fusion and cell attachment proteins are large glycoprotein spikes present on the surface of the virion. Both play important roles in the pathogenesis of viruses from the Paramyxoviridae family: the cell attachment protein is responsible for attachment to the cellular receptor(s), whereas the F protein mediates cell entry by inducing fusion between the viral envelope and the host cell membrane. The matrix protein organizes and sustains the virion structure. The nucleocapsid protein associates with genomic RNA and protects the RNA from nucleases. Extracistronic (noncoding) regions include a 3′ leader sequence of 50 nucleotides, which works as a transcriptional promoter, and a 5′ trailer of 50-161 nucleotides [111].

Measles Virus (MeV)

Measles virus belongs to the genus Morbillivirus within the family Paramyxoviridae of the order Mononegavirales. Measles is transmitted by air or by direct contact with body fluids. The initial site of viral infection is the respiratory tract, followed by dissemination to the lymphoid tissue, liver, lungs, conjunctiva and skin. The measles virus (MeV) may persist in the brain, causing fatal neurodegenerative diseases. This virus can only infect humans and causes subacute sclerosing panencephalitis and encephalitis [112-114]. Measles often leads to death in young children (below 5 years) due to complications such as respiratory tract infections like pneumonia, brain swelling or encephalitis, dehydration, diarrhea and ear infections [115].
MeV is a negative-sense, single-stranded RNA virus and its genome is composed of six contiguous, non-overlapping transcription units separated by three untranscribed nucleotides. The genes, which code for eight viral proteins, are in the order 5′-N-P/V/C-M-F-H-L-3′ [116]. The second transcription unit (P) also codes for two non-structural proteins, C and V, which interfere with the host immune response [117-120].
Early investigations of MeV infection in HeLa cells with a vaccine-lineage MeV estimated an intra-population diversity of 6-9 positions per genome [121]. This led to the concept that MeV exists as a quasispecies in a population. Donohue et al. discovered that MeV was able to adapt and grow in either of two cellular environments, viz. lymphocytic (Granta-519) or epithelial (H358) cells. Passaging MeV in these two cell lines resulted in variants exhibiting different replication kinetics. Deep sequencing of the lymphocytic-adapted variants demonstrated an increasing number of variants with mutations within the 11-nucleotide region in the middle of the phosphoprotein (P) gene. This sequence mediates polymerase slippage and the insertion of a pseudo-templated guanosine into the P mRNA, which replaces the polymerase cofactor (P) with a type I interferon antagonist (V). The two variants (lymphocytic- and epithelial-adapted) had different levels of P and V expression. It was suggested that the equilibration of the viral quasispecies in the population was based on differences in V protein expression. Lymphocytic-derived MeV variants with V-competent genomes were found at a low frequency for adaptation in epithelial cells. Moreover, complete elimination of the V-deficient genomes considerably reduced the antiviral innate defense mechanism, suggesting that a good equilibrium of V and P protein expression is necessary within the quasispecies population [18].

Mumps

Urabe AM9 is a mumps virus strain that was widely used in vaccines, but it was associated with meningitis and was withdrawn from the market. Sauder et al. performed serial passaging of the Urabe AM9 strain in cell cultures and compared the complete nucleotide sequences of the parental (Urabe P-AM9) and passaged viruses (Urabe P6-Vero or Urabe P6-CEF) to investigate the attenuation process and to identify attenuation markers [123]. Passaging of the Urabe AM9 mumps virus in Vero or chicken embryo fibroblast (CEF) cell lines changed the genetic heterogeneity at particular regions of the genome, either through a change to a single nucleotide at positions where the starting material showed nucleotide heterogeneity or through the appearance of an additional nucleotide to produce a heterogeneous site. Virulence of the passaged virus was dramatically decreased in the murine model. Moreover, the similar growth kinetics of the virulent Urabe P-AM9 and the passaged attenuated variants in the rat brain suggested that impaired replication of the attenuated variants was not the main cause of the neuroattenuation. However, in the rat brain, the peak titer of the neuroattenuated variant was almost one log lower than that of the neurovirulent parental strain. For instance, the identical but independent induction of heterogeneity at position 370 of the F gene, by substitution of threonine to alanine in virus passaged in Vero and CEF cells, suggested a correlation of this mutation with the neuroattenuation phenotype. Heterogeneity could not be identified for regions with differences of more than 10% between the detected nucleotides in the consensus sequence. The heterogeneity could be the result of new mutations at these positions or of the selection of pre-existing sequences within the minority quasispecies.
In addition, passaging of the parental strain in CEF and Vero cells led to several amino acid alterations in the NP, P, F, HN and L proteins that could affect the virulence of the virus. Thus, modifications of genetic heterogeneity at particular genome sites could have important consequences for the neurovirulence phenotype. Therefore, extra caution should be exercised when evaluating genetic markers of virulence or attenuation based only on a consensus sequence [123].

Newcastle Disease Virus (NDV)

NDV strains are categorized based on their pathogenicity in chickens as highly virulent (velogenic), intermediately virulent (mesogenic) or nonvirulent (lentogenic). These levels of pathogenicity can be differentiated by the amino acid sequence of the cleavage site in the fusion protein (F0). Lentogenic NDV strains have dibasic amino acids at the cleavage site whereas the velogenic strains contain polybasic residues. Meng et al. studied the changes in virulence of NDV strains that led to a switch from the lentogenic variant (JS10) to a velogenic variant (JS10-A10) through 10 serial passages of the virus in chicken air sacs [128]. In contrast, the lentogenic variant (JS10) remained lentogenic after 20 serial passages in chicken embryos (JS10-E20). The nearly identical genome sequences of JS10, JS10-A10 and JS10-E20 showed that both passaged variants were directly generated from the parental strain (JS10). Genome sequence analysis of the F0 cleavage site of the parental strain and the passaged variants revealed that the rise in virulence stemmed from a build-up of the velogenic quasispecies population together with a gradual disappearance of the lentogenic quasispecies. The decline of the lentogenic F0 genotypes 112-E(G)RQE(G)RL-117 from 99.30% to 0.28% and the rise of the velogenic F0 genotypes 112-R(K)RQR(K)RF-117 from 0.34% to 94.87% after 10 serial passages in air sacs were attributed to the emergence of velogenic F0 genotypes, which in turn enhanced the virulence of JS10-A10. The data indicated that lentogenic NDV strains circulating among poultry could evolve into velogenic NDV strains. Such velogenic strains have the potential to cause outbreaks because it is difficult to prevent contact between natural waterfowl reservoirs and sensitive poultry operations.
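The frequency shift reported above can be turned into a rough per-passage figure. The following is a minimal sketch, not the analysis used by Meng et al.: it assumes a simple two-variant model in which the velogenic-to-lentogenic ratio changes by a constant factor at every passage, and the function name is hypothetical.

```python
def per_passage_fold_change(start_minor, start_major, end_minor, end_major, passages):
    """Constant per-passage factor by which the minor:major ratio must change.

    Assumption (not from the source): the ratio of the two genotype
    frequencies is multiplied by the same factor at every passage.
    """
    start_ratio = start_minor / start_major
    end_ratio = end_minor / end_major
    return (end_ratio / start_ratio) ** (1.0 / passages)


# Percentages reported for JS10 (start) and JS10-A10 (after 10 air sac passages).
fold = per_passage_fold_change(
    start_minor=0.34,   # velogenic F0 genotypes at the start (%)
    start_major=99.30,  # lentogenic F0 genotypes at the start (%)
    end_minor=94.87,    # velogenic F0 genotypes after passaging (%)
    end_major=0.28,     # lentogenic F0 genotypes after passaging (%)
    passages=10,
)
print(f"velogenic:lentogenic ratio grows about {fold:.2f}-fold per passage")
```

Under this assumption, a ratio that roughly triples at each passage is enough to account for the near-complete replacement seen after 10 passages.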
NDV quasispecies comprised lentogenic and velogenic genomes in various proportions. The change in virulence of the quasispecies composition of JS10 and its variants was investigated by analysis of viral population dynamics. The F0 cleavage site was reported to be the main region in which the majority of amino acid changes had occurred, resulting in an accumulation of variants exhibiting velogenic properties during serial passages. Furthermore, passaging of the virus caused a transition in the degree of virulence of NDV strains from lentogenic to mesogenic and ultimately an increase of the velogenic type. Therefore, NDV pathogenesis could be controlled by the ratio of avirulent to virulent genomes and their interactions within the chicken air sacs and the embryo. The data demonstrated that the status of the quasispecies population is linked to the pathogenicity of the NDV [128]. Gould et al. reported the presence of the F0 cleavage sequence 112-RRQRRF-117 and HN extensions of 45 amino acids in virulent Australian NDV strains [129]. Furthermore, genome analysis of avirulent field isolates of NDV suggested the existence of viruses with virulent F0 sequences that did not cause obvious clinical signs of the disease [130]. Subsequently, Kattenbelt et al. studied the underlying causes that could affect the balance of virulent (pp-PR32 Vir) and avirulent (pp-PR32 Avir) variants throughout viral infections. The variability of the quasispecies population and the rate of accumulation of mutations in vivo and in vitro were analyzed. The in vivo analysis showed that the virulent and avirulent plaque-purified variants displayed a rise in quasispecies variability (26% and 39%, respectively). The error rate in the viral sequences was observed to increase as well, such that one bird out of three displayed virulent viral characteristics (112-RRQRRF-117) after passaging of the PR-32 Avir variant.
Genome analysis following the in vivo study revealed that a single base mutation occurring in the F0 region led to the switch from RRQGRF to RRQRRF.
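That a single base change suffices for the glycine-to-arginine switch can be checked directly against the standard genetic code. The sketch below is illustrative only: it enumerates which Gly codons lie one nucleotide away from an Arg codon and locates the differing residue in the two motifs, assuming the 112-117 numbering used for the F0 cleavage site elsewhere in this review. The actual codon used by the PR-32 isolates is not given in the text.

```python
# Standard genetic code: codons for glycine and arginine (RNA alphabet).
GLY_CODONS = ["GGU", "GGC", "GGA", "GGG"]
ARG_CODONS = ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Every Gly -> Arg change reachable by a single nucleotide substitution.
single_step = [(g, r) for g in GLY_CODONS for r in ARG_CODONS if hamming(g, r) == 1]

# Locate the changed residue, assuming the motif spans residues 112-117.
avirulent, virulent = "RRQGRF", "RRQRRF"
changed_residues = [
    112 + i for i, (a, b) in enumerate(zip(avirulent, virulent)) if a != b
]

for gly, arg in single_step:
    print(f"{gly} (Gly) -> {arg} (Arg)")
print("changed residue(s):", changed_residues)
```

Every glycine codon has at least one arginine codon a single substitution away, which is consistent with the low genetic barrier between the avirulent and virulent motifs.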
On the other hand, in vitro studies showed that the quasispecies distribution of the avirulent isolate harbored 10% of variants bearing the virulent F0 region (RRQRRF). Gene sequence analysis of Australian NDV isolates showed the existence of a novel clade of NDV viruses with the F0 cleavage site sequence of 112 -RKQGRL-117 and the HN region bearing seven additional amino acids. Four field isolates (NG2, NG4, Q2-88 and Q4-88) belonging to the novel clade were propagated for a longer time period in CEF cells prior to sequencing. Analysis revealed the existence of 1-2% of virulent strains with the F0 cleavage site of 112 -RKGRRF-117 in the population [131] .
Quasispecies analysis of all the NDV field isolates in this study showed variable ratios (1:4-1:4000) of virulent to avirulent viral F0 sequences. However, these sequences remained constant in the quasispecies population during replication. It was concluded that the virulent strains present in the quasispecies population did not emerge from an avirulent viral population unless the quasispecies population was placed under direct selective pressure, either by previous infection of the host by other avian viruses or by transient immunosuppression [131] .
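To give a feel for what ratios of 1:4 to 1:4000 mean in practice, the sketch below converts a virulent:avirulent ratio into a population fraction and estimates how many independent sequences would have to be sampled to see at least one virulent sequence. It assumes independent random sampling (a simple binomial model) and a 95% detection probability. Both the model and the function names are assumptions for illustration, not the method used in [131].

```python
import math


def variant_fraction(virulent, avirulent):
    """Fraction of the population carrying the virulent sequence."""
    return virulent / (virulent + avirulent)


def clones_needed(freq, detect_prob=0.95):
    """Smallest sample size n such that 1 - (1 - freq)**n >= detect_prob.

    Assumes each sampled sequence is an independent draw from the population.
    """
    return math.ceil(math.log(1.0 - detect_prob) / math.log(1.0 - freq))


for virulent, avirulent in [(1, 4), (1, 4000)]:
    freq = variant_fraction(virulent, avirulent)
    print(f"{virulent}:{avirulent} -> fraction {freq:.5f}, "
          f"about {clones_needed(freq)} sequences for 95% detection")
```

At the low end of the reported range, a virulent minority would be missed by conventional clonal sequencing of a few dozen clones, which is why deep sampling matters for this kind of analysis.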

Pneumoviridae

The Pneumoviridae family contains large enveloped negative-sense RNA viruses. Previously, this taxon was known as a subfamily of the Paramyxoviridae but it was reclassified in 2016 as a family of its own with two genera, Orthopneumovirus and Metapneumovirus. Some viruses belonging to Pneumoviridae family are only pathogenic to humans, such as the human respiratory syncytial virus (HRSV) and human metapneumovirus (HMPV). Human pneumoviruses do not have animal reservoirs and their primary site of infection is the superficial epithelial cells of the respiratory tract. There are no known vectors for pneumoviruses and transmission is thought to be primarily by aerosol droplets [132] .
The virions of the pneumoviruses are enveloped with a spherical shape and a diameter of about 150 nm. They have a negative-sense RNA genome of 13 to 15 kb ( Figure 8A ). The RNA-dependent RNA polymerase (L) binds to the genome at the leader region and sequentially transcribes each gene. The cellular translation machinery translates the capped and poly-adenylated messenger RNA of the virus in the cytoplasm. Members of the genus Orthopneumovirus possess 10 genes including NS1 and NS2, which are promoter proximal to the N gene. The gene order is NS1-NS2-N-P-M-SH-G-F-M2-L ( Figure 8B ). Alignment of the L proteins showed moderate conservation of the sequences between the human and bovine viruses. Bovine respiratory syncytial virus (BRSV) differs from HRSV in host range, but the two viruses bear substantially similar sequences as well as antigenic relatedness [132].

Respiratory Syncytial Virus (RSV)

Respiratory syncytial virus (RSV) belongs to the genus Orthopneumovirus under the family Pneumoviridae of the order Mononegavirales. Human respiratory syncytial virus (HRSV) is the primary cause of infection of the upper and lower respiratory tracts with mild, cold-like symptoms in infants and young children. The virus spreads through tiny air droplets. Globally, there are 4-5 million children younger than 4 years with HRSV infections and more than 125,000 are hospitalized every year in the United States, with a higher risk of hospital admission in known risk groups such as prematurely born infants. RSV is also responsible for 14,000 deaths annually among the elderly (> 65 years of age) in the United States [133, 134]. On the other hand, bovine respiratory syncytial virus (BRSV) is a common source of pneumonia in calves. Clinical infections stem from yearly outbreaks of the disease during winter and primarily affect calves less than 6 months of age. The target infection site of the virus is the epithelial layer of the upper and lower respiratory tracts, where it can damage the bronchioles, leading to severe onset of bronchiolitis in cows [135].
Palivizumab (PZ), which recognizes the fusion protein of respiratory syncytial virus (RSV), is the sole humanized monoclonal antibody against an infectious disease. Zhao et al. selected a PZ-resistant virus by passaging the RSV A2 strain in the presence of PZ in HEp-2 cell culture [136]. Utilization of PZ provided opportunities to gain new insights into the transmission dynamics and the quasispecies nature of RSV. Protein sequence analysis of a single plaque (MP4) isolated from the fifth passage revealed the substitution of lysine by methionine at position 272. The mutation caused the cell culture-derived virus to be completely resistant to PZ prophylaxis in cotton rats. A dramatic reduction in replication of the parental A2 virus was observed at PZ concentrations ranging from 4 to 40 µg/mL, whereas the replication of the MP4 mutant was not affected by PZ. The growth kinetics of the parental strain and the variant were similar, with maximum titers above 10^7 PFU/mL during the third and fourth day post infection. Hence, it was proposed that the fusion protein supported the entry of the MP4 mutants into HEp-2 cells in an early phase of the replication cycle through a fusion step. The A2 parental strain exhibited limited growth in HEp-2 cells due to its reactivity with PZ. However, the lack of reactivity of the MP4 mutants with PZ suggested that the F1 protein of the MP4 mutant had lost antigenic reactivity with the humanized monoclonal antibody. Preclinical studies in cotton rats predicted the efficacy of PZ in humans. However, the usage of PZ up to 40 µg/mL, especially in immunosuppressed patients, could provide opportunities for the emergence of resistant viruses, and PZ-resistant viruses in humans could render PZ prophylaxis ineffective.
Larsen et al. analyzed the nucleotides coding for the extracellular part of the G glycoprotein and the full SH protein of bovine respiratory syncytial virus (BRSV) from several outbreaks from the same herd in different years in Denmark. Identical viruses were isolated within a herd during outbreaks but viruses from recurrent infections were found to vary up to 11% in sequences even in closed herds. It is possible that a quasispecies variant of BRSV persisted in some of the calves in each herd and this persistent variant displayed high viral fitness and became dominant. However, based on the high level of diversity, the most likely explanation is that BRSV was reintroduced into the herd prior to each new outbreak. These findings are highly relevant to understand the transmission patterns of BRSV among calves [138] .

Orthomyxoviridae

The family Orthomyxoviridae belongs to the order Articulavirales and contains seven genera: Influenza A-D, Isavirus, Thogotovirus and Quaranjovirus. The virions within the Orthomyxoviridae family are usually spherical but can be filamentous, 80-120 nm in diameter ( Figure 9A ). The influenza virus genome is 12-15 kb and contains 8 segments of negative-sense, single-stranded RNA which encode 11 proteins (HA, NA, NP, M1, M2, NS1, NEP, PA, PB1, PB1-F2 and PB2) ( Figure 9B ). Influenza viruses are pathogenic and can cause influenza in vertebrates, including birds, humans and other mammals [139]. The genome segments contain both 5′ and 3′ terminal repeats which are highly conserved throughout all eight segments.

Influenza Virus (IV)

Annual influenza epidemics cause about 3 to 5 million cases of severe illness and 290,000 to 680,000 deaths worldwide [141]. Current influenza vaccines have sub-optimal efficacy because of a lack of antigenic proximity between the vaccine candidate and the circulating seasonal influenza virus strains. During the 2016-2017 influenza epidemic, the influenza A (H3N2) viruses from the clade 3c.2a were dominant and were associated with severe onset of the disease. The low vaccine efficacy of the 2016-2017 egg-adapted H3N2 (clade 3c.2a) vaccine strain A/Hong Kong/4801/2014 was reported to be due to altered antigenicity [142]. To understand the pathogenesis of A(H3N2) viruses from the 3c.2a clade, it would be of great interest to consider whether each infection was caused by an individual strain or by a swarm of genetically related viruses (quasispecies). This would help to provide an insight into vaccine coverage and efficacy.
A study investigated the impact of antigenic proximity, genomic substitutions, quasispecies diversity and reassortment in order to understand the molecular evolution of influenza A (H3N2) isolated directly from clinical samples. Whole genomes were analyzed for 155 of 176 samples, and several amino acid substitutions were found to substantially affect the severity of the infection caused by the clade-specific viruses. Within the sample, 121 viruses belonged to the genetic clade 3c.2a1, eight belonged to 3c.2a2, twenty-four belonged to 3c.2a3, one belonged to 3c.2a4 and one belonged to a different clade, 3c.3a. Many distinct substitutions spanning the whole influenza proteome, including HA, NA and non-structural protein 1, were associated with mild or severe disease. Interestingly, two substitutions, V261I and K196E, were found in the NA and the NS1, respectively. These two mutations were particularly significant as they distinguished the strains causing mild infections from those causing severe infections. Analysis of the clinical isolates showed a difference in a single amino acid residue, 160K within the HA, whereby 14 cases of glycosylation loss within the quasispecies population were linked to severity of infection. Moreover, the degree of diversity within the quasispecies population was reported to be elevated in severe cases when compared to mild ones [143].
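One common way to put a number on "degree of diversity within the quasispecies population" is the normalized Shannon entropy of the variant frequencies. The sketch below is illustrative only: the text does not state which diversity measure was used in [143], and the read counts are hypothetical values chosen to contrast a population dominated by one variant with a more evenly spread one.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of variant read counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def normalized_entropy(counts):
    """Entropy scaled to [0, 1] by the maximum for the number of observed variants."""
    observed = sum(1 for c in counts if c > 0)
    if observed <= 1:
        return 0.0  # a single variant has no diversity
    return shannon_entropy(counts) / math.log(observed)


# Hypothetical read counts per variant at one genome position (not real data).
dominated = [97, 1, 1, 1]    # one master sequence, few minor variants
spread = [40, 30, 20, 10]    # broader mutant spectrum

print(f"dominated population: {normalized_entropy(dominated):.3f}")
print(f"spread population:    {normalized_entropy(spread):.3f}")
```

A value near 0 indicates a population dominated by a single sequence, while a value near 1 indicates variants present at similar frequencies.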
The epidemiology and molecular characterization of low and highly pathogenic avian influenza virus strains (LPAIV and HPAIV, respectively) isolated from Germany were investigated. The complete genome analysis of the two strains showed that LPAIV and HPAIV had high nucleotide similarity, with only ten mutations outside the hemagglutinin cleavage site (HACS).
Orthomyxoviruses employ many different splicing techniques to synthesize their viral proteins while making full use of the coding capacity of the genome. The virion envelope originates from the cell membrane with the addition of one to three virus glycoproteins and one to two non-glycosylated proteins. The viral RNA polymerase (PB1, PB2 and PA) is involved in the transcription of a single mRNA from every segment of the genome. Transcription is triggered by cap snatching, and the poly(A) tail is added by the viral polymerase stuttering on the poly U sequence. Alternative splicing of the MP and NS mRNAs yields the mRNAs coding for the M2 and NEP proteins. PB1-F2 is translated by leaky scanning from the PB1 mRNA. The structural proteins common to all genera include three polypeptides: the hemagglutinin, which is an integral type I membrane glycoprotein involved in virus attachment, the envelope fusion and the non-glycosylated matrix protein (M1 or M) [140].
A study aimed to identify the key mechanisms contributing towards co-pathogenesis in BALB/c mice infected with the A(H1N1) quasispecies. It was revealed that the co-evolution of the quasispecies brought about a complex response due to different expressions of the biphasic gene. A significant upregulation of Ifng was associated with an increased majority of mutants expressing a differentially expressed gene (DEG) named HA-G222. This correlated with the increased levels of pro-inflammatory response observed in the lungs of the mice infected with the quasispecies A(H1N1) [147]. Serial passages of the H1N1 virus were also carried out prior to the analysis of its sequential replication, virulence and rate of transmission. Sequence analysis of the quasispecies in the viral population revealed that, from the ninth passage onwards, five amino acid mutations (A469T, I129T, N329D, N205K and T48N) were detected in the various gene segments (PB1, PA, NA, NS1 and NEP). Furthermore, mutations located within the HA region indicated that the genetic makeup of the viral quasispecies was distinctly different in the upper and lower respiratory tracts of the infected pigs [148].

Hepadnaviridae

Hepadnaviruses belong to the family Hepadnaviridae. They are further classified into two genera: the mammalian genus Orthohepadnavirus and the avian genus Avihepadnavirus [149]. These viruses are spherical with a diameter of 42-50 nm and replicate their genomes with the help of a reverse transcriptase (RT) ( Figure 10A ). The genome is a relaxed circular DNA (rcDNA) of approximately 3.3 kb held together by complementary base-pairing overlaps [150]. The DNA genome is made up of four partly or completely overlapping ORFs that encode the core protein (Core and preCore), the surface antigen protein (PreS1, PreS2 and S), the reverse transcriptase (Pol protein) and the X transcriptional transactivator protein [151] ( Figure 10B ). Replication occurs by reverse transcription of the pregenomic RNA, which is transcribed by the cellular RNA polymerase II from the covalently closed circular form of the HBV DNA [152].

Hepatitis B Virus (HBV)

Complementary to the viral mRNA is a full-length negative strand, whereas the positive strand is of varied length. The 5′-NTR of the negative-strand DNA is covalently attached to the terminal protein (TP) domain of the viral DNA polymerase, whereas the 5′-NTR end of the positive-sense DNA has a 5′-capped oligonucleotide primer covalently attached. The 3′-NTR of the positive strand ends at a variable position in different molecules, creating a single-stranded gap [150].
Hepatitis B virus (HBV) is the prototype of hepadnaviruses. It infects humans and can be classified into 8 genotypes. More than one billion people have contracted HBV and more than 200 million patients are chronically infected with hepatitis B (CHB) [153]. CHB infections can result in the development of hepatocellular carcinoma and chronic liver failure [151], and every year CHB causes 880,000 deaths worldwide [153]. Analysis of the immunodominant motifs of the HBV core region from amino acids 40 to 95 indicated that the positions exhibiting peak rates of variability were found in the main core epitopes, thereby confirming their role in stimulating the immune system. Moreover, the distribution of the variability was observed to occur in a genotype-dependent manner. For instance, HBV isolates of genotype A had higher variability within the core epitope regions, but no significant differences between the core epitopes and other positions were observed in genotype D. Further sequential analysis of the samples highlighted the dynamic nature of the HBV quasispecies, whereby a strong selection for a single baseline variant was linked to a lower variability within the core region pre- and post-treatment. Leucine (L) at position 76 was determined to be the most highly conserved residue and the role of this amino acid was assessed by substitution with Valine (V) or Proline (P) at position 76. Proline at position 76 was shown to drastically lower the production of Hepatitis B core antigen protein (HBcAg), likely due to the chemical and physical properties of the amino acid. However, substitution with Valine (V) at the same position brought about a four-fold increase in Hepatitis B e antigen protein (HBeAg) production when compared to Leucine at position 76. The decrease in the variability observed was associated with a stable quasispecies population after positive selection of the variant exhibiting a high fitness level [154].
In an attempt to elucidate the link between HBV quasispecies and the role of nucleotide analogues, the heterogeneity and distribution of HBV quasispecies were investigated using the RT and S regions as a baseline to document the mutation sites. The quasispecies for the selected regions were analyzed using 608 sequences. In the RT region, no major differences in the composition and diversity of the quasispecies were identified at the nucleotide or amino acid level between patients who responded well to antiviral therapy and those who did not. Similar findings were observed when the S region was examined. However, sequence analysis within the RT and S regions showed that the rtM204V/I resistance mutation was present in the majority of samples prior to the rescue therapy. Interestingly, the frequency of this mutation was noted to drop six months post treatment. Moreover, 3 out of 5 stop codons consistently observed within the RT and S regions were reported to be associated with nucleotide analogue (NA) resistance and affected the HBsAg reading frame. However, the complexity and diversity of the HBV quasispecies were similar between the CHB patients who responded to the treatment and those who did not. It can be inferred that the characteristics of the quasispecies in CHB patients at the start of the study were not associated with the various viral responses observed in the cohort. Hence, the RT and S regions might not be adequate sites to monitor the response of CHB patients to the rescue treatment [155].

Introduction

Only about 15 viral diseases can be effectively prevented through FDA-approved vaccinations [5]. This attests to the urgency of understanding the mechanisms by which viruses can overcome the different pressures applied to restrict their replication. In the last four decades, breakthroughs in molecular biology have favored in-depth analysis of virus isolates. Findings from other studies have suggested that populations of RNA viruses are divergent and favor an active evolution of RNA genomes. The quick evolution of RNA genomes could lead to variant sequences that differ by one or two nucleotides from the wild-type sequence in the population. It is further suggested that each viral RNA population of 10^9 or more infectious particles was always a mixture of variants despite being isolated from a single clone [6].
Such heterogeneity within the virus population could be explained by the existence of not a single genotype within the species but rather an ensemble of related sequences known as the quasispecies [7] . Developed by Manfred Eigen and Peter Schuster, the concept of quasispecies was defined as a mutant distribution dominated by a primary sequence with the highest rates of replication between the components of the mutant distribution. The phenomenon of quasispecies was further supported by the "hypercycle" theory as a self-organization principle to include different quasispecies in a higher-order organization that eases evolution into more complicated forms such that the coding capacity and catalytic activities of proteins are taken into account [8] [9] [10] .
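For reference, the verbal definition above corresponds to the standard form of the Eigen quasispecies equation, stated here in its textbook notation rather than quoted from the cited works. The symbols are: x_i, the relative frequency of sequence i; f_j, the replication rate of sequence j; and Q_ij, the probability that replication of sequence j yields sequence i.

```latex
\frac{dx_i}{dt} = \sum_{j=1}^{n} f_j \, Q_{ij} \, x_j - \phi(t) \, x_i ,
\qquad
\phi(t) = \sum_{j=1}^{n} f_j \, x_j ,
\qquad
\sum_{i=1}^{n} x_i = 1
```

The term \phi(t) is the mean replication rate of the population and keeps the frequencies normalized. At equilibrium the population is a mutant distribution centered on the sequence with the highest replication rate, which is the sense of "quasispecies" used in this review.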
Some general points of consensus regarding quasispecies have been established. For instance, the presence of diverse mutants in a population of viruses is a reality which can affect the biological behavior of the virus in vivo due to the complexity and amplitude of the mutant spectra [14]. Moreover, interactions amongst variants of a quasispecies population were classified into three types: cooperation, interference and complementation. Cooperative interactions arise from those variants exhibiting advantageous phenotypes compared to the wild-type, while interfering interactions are those exemplified by variants with detrimental effects on the replication of the virus. However, complementation interactions have no positive or negative effects on the virus population [15].
Considerably less is known about the relationship between the evolution of RNA viruses and virulence. The dynamics of quasispecies have explained the failure of monotherapy and synthetic antiviral vaccines but opened up new avenues for exploration [14]. Specifically, unanswered questions pertaining to quasispecies remain: What are the underlying mutations responsible for long-term persistence compared to those leading to extinction? Are there any molecular determinants which are the root cause of higher virulence in a quasispecies population?
Recurrent outbreaks of emerging and re-emerging viral diseases such as Dengue fever, West Nile fever, Zika virus disease, Chikungunya disease, Middle East respiratory syndrome, Ebola virus disease and many others have been targets for therapeutic interventions. Long-lasting protection against viral infections is best achieved via vaccination with live attenuated viruses (LAVs). In order to generate stable vaccine strains, the evolution of these viruses must be properly understood. This review is centered on the examination of the evidence for the heterogeneous nature of RNA genomes (quasispecies), the factors leading to quasispecies formation and its implications for virulence.
The phenomenon of quasispecies has been well reported for many viruses belonging to different families and genera [16] [17] [18] . There is no clear idea whether emerging viruses such as Zika virus, Ebola virus, West Nile virus, Dengue virus and many others owe part of their evolution to higher virulence conferred by the presence of specific quasispecies within the viral population. Exploring the underlying mechanism of virulence stemming from a quasispecies population remains of interest. In this review, we examine reported cases of quasispecies and their implications on virulence.

Poliovirus (PV)

The poliovirus population diversity was evaluated in the brain of the murine model during viral spread. It was observed that only a fraction of the original injected viral pool was able to move from the initial site of inoculation to the brain via the 'bottleneck effect.' To determine the maintenance of the quasispecies during infection in vivo, 6-10-week-old mice were inoculated in the leg with individual viruses. Total RNA recovered from the brain tissues revealed that four viruses were capable of spreading to the brain with their introduced mutations unchanged. Therefore, it was postulated that the innate immune response reduced the viral pathogenicity by limiting the diversity of viruses during spread to vulnerable tissues [16] . Two mechanisms have been proposed to explain the bottleneck effect, namely the "tough-transit" model and the "burned-bridge" model. The "tough-transit" model suggests that virus trafficking within the murine model has a low probability of successfully passing the blood-brain barrier. However, once in the CNS, the virus acts as a founder and re-establishes a population with initially limited diversity. On the other hand, the "burned-bridge" model stipulates that it is not difficult for the virus to physically reach the blood-brain barrier. Instead, when the first few viruses reach the gateway to the brain, the host innate immune response triggers an antiviral state [16] .
In order to verify whether limiting the genomic diversity of a viral population has any effect on its evolution, a study was conducted on a strain of poliovirus with a substitution of Glycine 64 to Serine (G64S) in the RNA polymerase of the virus. The outcome of one-step growth curves and

Enterovirus 71 (EV-A71)

EV-A71 belongs to the genus Enterovirus within the family Picornaviridae. It was first characterized in 1969 in California, USA [25] and is one of the main etiological agents of hand, foot and mouth disease (HFMD) [26] . Some cases of EV-A71 infection have been associated with neurological complications such as aseptic meningitis, brainstem encephalitis and acute flaccid paralysis [27] . In China, enteroviruses such as EV-A71 and CV-A16 caused 7,200,092 cases of HFMD between 2008 and 2012. The mortality rate was highest among children below the age of five. It was further reported that 82,486 patients developed neurological complications and 1617 deaths were confirmed by the laboratory to be caused by EV-A71 [28, 29] .

Chikungunya Virus (CHIKV)

Chikungunya virus (CHIKV) is an arthropod-borne virus transmitted to humans by mosquitoes and has caused significant human morbidity in many parts of the world [56] . Chikungunya virus causes an acute febrile illness with high fever, severe joint pain, polyarthralgia, myalgia, maculopapular rash and edema. While the fever and rash are self-limiting and are able to resolve within a few days, arthralgia can be prolonged from months to years [57, 58] . Some cases of CHIKV
The P5 CHIKV-NoLS clone remained genetically stable after five passages in Vero cells or insect cells when compared to the CHIKV-WT. Sequence analysis of the P5 CHIKV-NoLS plaques showed that two plaque variants had no mutations in the capsid protein. A single non-synonymous nucleotide change in the capsid caused an alanine to serine substitution at position 101 in the third plaque variant. However, the substitution did not cause any change in the small plaque phenotype or replication kinetics of the CHIKV-NoLS clone after ten passages in vitro [74] . The in vivo study showed that CHIKV-NoLS-immunized mice were able to produce long-term immunity against CHIKV infection following immunization with a single dose of the CHIKV-NoLS small plaque variant. Attenuation of CHIKV-NoLS through the NoLS mutation is most likely due to the disruption of viral replication after viral RNA synthesis; however, the precise mechanism of reduced viral titer remains unresolved [76] . The NoLS mutation caused a considerable change in the very basic capsid region involving two nucleotides which could affect the structure of RNA binding, assembly of the nucleocapsid and interaction with the envelope proteins [77] . Since the CHIKV-NoLS small plaque variant was attenuated in immunized mice and produced sera which could effectively neutralize CHIKV infection in vitro, it could serve as a promising vaccine candidate needed to control the explosive large-scale outbreaks of CHIKV [76] .

Ebola Virus (EboV)

The glycoprotein (GP) is responsible for cell attachment, fusion and cell entry. The broad cellular tropism of the GP resulted in multisystem involvement that led to high mortality [85] . The Ebola virus has a high frequency of mutation within a host during the spread of infection and in the reservoir in the human population [86] . Alignment of the Glycoprotein (GP) sequences of 66 Ebola virus isolates from the previous outbreaks (old Ebola outbreak of 1976 to 2005) with the new Ebola outbreak isolates (2014) showed some differences in the positions and frequency of the amino acid replacements. Comparative analysis between the isolates from the old epidemic with the new epidemic isolates showed that 19 out of the 22 amino acid mutations were consistently present in the latter [87] .
Furthermore, Fedewa et al. showed that genomic adaptation was not crucial for efficient infection by the Ebola virus. The genomes were characterized after serial passages of EBOV in Boa constrictor kidney JK cells. Deep sequencing coverage (>10,000×) confirmed the presence of only a single nonsynonymous variant (T544I) of unknown significance within the viral population that demonstrated a shift in frequency of at least 10% over six serial passages. However, passaging EBOV in other cell lines, such as HeLa and DpHt cheek cells, showed different mutations in the genomes of the viral population [96] . This raises the question of whether viral strains of the Ebola virus should be isolated directly from patients in order to determine the quasispecies of the Ebola virus.

Middle East Respiratory Syndrome Coronavirus (MERS-CoV)

Scobey et al. reported that the T1015N mutation in the spike glycoprotein, which arose during 9 passages of the virus, altered the growth kinetics and plaque morphology in vitro. The mutated virus (MERS-CoV T1015N) replicated approximately 0.5 log more effectively and formed larger plaques compared to the wild type (MERS-CoV). The data suggested that T1015N was a tissue culture-adapted mutation that arose during serial in vitro passages [107] . Whole genome sequencing of MERS-CoV revealed the presence of sequence variants within the isolate from dromedary camels (DC), which indicated the existence of quasispecies within the animal. A single amino acid substitution (A520S) was located in the receptor-binding domain of the MERS-CoV variant. Strikingly, when detailed population analysis was performed on samples recovered from human cases, only clonal genomic sequences were reported. Therefore, the study proposed a model of interspecies transmission of MERS-CoV in which only specific genotypes were able to overcome the bottleneck selection. While host susceptibility to infection is not taken into account in this setting, the findings provided insights into the rare human cases of MERS-CoV [108] .
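As a reading aid for the "0.5 log" replication advantage mentioned above, the following Python sketch converts a log10 titer difference into a fold change. It assumes that "log" refers to log10 of the viral titer, which is the usual convention but is not stated explicitly in the text; the function name is illustrative.

```python
def log10_difference_to_fold(log_diff: float) -> float:
    """Convert a difference in log10 titer into a fold change in titer."""
    return 10 ** log_diff


fold = log10_difference_to_fold(0.5)
print(f"A 0.5 log10 difference corresponds to about {fold:.2f}-fold")
```

Under this assumption, a 0.5 log advantage is roughly a threefold higher titer.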

Paramyxoviridae

The genomes of viruses within the family Paramyxoviridae are non-segmented and thus cannot undergo genetic reassortment. Like many other RNA viruses, the RNA-dependent RNA polymerase does not have an error proofreading capability and hence many mutations can occur when the RNA is processed. These mutations can build up in the genome and eventually give rise to new variants. Since each protein has an important function, mutant viruses that exhibit a loss in viral fitness are eliminated, leaving only those exhibiting good viral fitness [111] . Within the Paramyxoviridae family, mutations leading to a spectrum of mutant distributions among Measles virus, Mumps virus and Newcastle disease virus are reviewed. The L protein, which is the catalytic subunit of the RNA-dependent RNA polymerase (RDRP), is associated with the nucleocapsid protein (N) and phosphoprotein (P) to form part of the RNA polymerase complex. The RNA polymerase complex is covered by the viral envelope consisting of a matrix protein (M) and two glycosylated envelope spike proteins, a fusion protein (F) and a cell attachment protein. The cell attachment protein differs between genera: it can be hemagglutinin (H in Measles virus), hemagglutinin-neuraminidase (HN in Mumps virus and NDV) or glycoprotein G (Henipavirus). Some genera within the Paramyxoviridae family also contain various conserved proteins including the non-structural proteins (C, NS1, NS2), a cysteine-rich protein (V), a small integral membrane protein (SH) and transcription factors M2-1 and M2-2 [111] .

Mumps

Mumps virus belongs to the genus Rubulavirus within the Paramyxoviridae family of the order Mononegavirales. Mumps is an extremely contagious, acute, self-limited, systemic viral infection that primarily causes swelling of one or more of the salivary glands, typically the parotid glands. The infection can cause pain in the swollen salivary glands on one or both sides of the patient's face, fever, headache, muscle aches, weakness, fatigue and loss of appetite. Complications of mumps are rare but can be serious, involving inflammation and swelling of the testicles, brain, spinal cord or pancreas. Infections can lead to hearing loss, heart problems and miscarriage. In the United States, mumps was a common disease before vaccination became routine, after which a dramatic decrease in the number of infections was observed. However, mumps outbreaks still occur in the United States and the number of cases has increased recently. Most people who are not vaccinated, or who are in close contact with the virus in schools or on college campuses, are at high risk. There is currently no specific treatment for mumps [122] .

Newcastle Disease Virus (NDV)

Newcastle disease virus (NDV) belongs to the genus Avulavirus in the family Paramyxoviridae of the order Mononegavirales. NDV is an avian pathogen that can be transmitted to humans and cause conjunctivitis and an influenza-like disease [124] . Clinical diseases affecting the neurological, gastrointestinal, reproductive and respiratory systems are detected in naïve, unvaccinated or poorly vaccinated birds [125] . NDV has been a continuous problem for poultry producers since it was identified ninety years ago. It has negatively impacted economic livelihoods and human welfare by reducing food supplies, and many countries have been affected by NDV outbreaks since 1926 [126] .

Influenza Virus (IV)

The epidemiology and molecular characterization of low and highly pathogenic avian influenza virus strains (LPAIV and HPAIV, respectively) isolated from Germany were investigated. The complete genome analysis of the two strains showed that LPAIV and HPAIV had high nucleotide similarity, with only ten mutations outside the hemagglutinin cleavage site (HACS), which were spread along the six genome segments of the HPAIV. Of the ten mutations, five were previously identified as minor variants in the quasispecies population of the progenitor virus, LPAIV, at variable frequencies of 18-42% [144] . However, studies focusing on the diversity of quasispecies of avian influenza in the human host are few. Watanabe et al. successfully demonstrated that infections caused by a single virus in vitro produced an evident spectrum of mutants in the H5N1 progeny viruses. Analysis of the genetic diversity of the hemagglutinin (HA) revealed that variants with mutated HA had lower thermostability leading to higher binding specificity. Both traits were deemed beneficial for viral infection. On the other hand, other variants with higher thermostability also emerged but were unable to thrive against mutants with lower thermostability [145] . The quasispecies population of influenza A virus was also reported to be in a state of continuous genetic drift in a given subtype population. A viral single nucleotide polymorphism (vSNP) was considered important if it was shared by more than 15% of the variants within the quasispecies population of the subtype strain in a given season. However, in the 2010-2011 season, various vSNPs in the PB2, PA, HA, NP, NA, M and NS segments were shared among variants in more than 58-80% of the sample population, and less than 50% of the shared vSNPs were located within the PB1 segment [146] .

Hepatitis B Virus (HBV)

Although a significant number of RNA viruses demonstrate the existence of quasispecies in their populations due to their low-fidelity polymerases, the phenomenon of quasispecies has also been reported in DNA viruses such as Hepatitis B virus (HBV), which replicates via an RNA intermediate.
Another challenge to overcome is to accurately determine the origin and spread of a founder population of the virus. Hence, the discrepancies in the evolution of HBV were investigated. Eight related patients who had acquired chronic HBV through mother-to-infant transmission were selected and the isolated viral genomes were analyzed. Sequence analysis indicated that the samples originated from a single source of HBV genotype B2 (HBV-B2) which diverged from a tiny common ancestral pool regardless of the route of acquisition. Between individuals, viral strains obtained at one time point showed evidence that they originated from a small pool of the previous time point. This conferred on the strain an advantage over other strains with regard to the recovery of the founder state shortly after transmission to the new host and the adaptation to the local environment within the host. Natural selection rather than genetic drift was hypothesized to be the root cause of the evolution of HBV, due to the observed varying patterns of divergence at synonymous and non-synonymous sites. This was in line with the higher rate of substitutions within the host rather than between hosts. Approximately 85 of 88 amino acid residues changed from common to rare residues. Since these changes were shown not to be a random process, it was concluded that HBV was able to evolve and change but was limited to a defined range of phenotypes. It can be argued that the mechanism observed thus far suggests that the adaptive mutations accumulated in one individual would not be maintained in another individual and might revert after transmission. Hence, within the host, substitutions were higher than between hosts [156] .

Conclusions

RNA viruses are responsible for numerous outbreaks of viral infections with substantial levels of fatality. We discussed how genetic variants carrying spontaneous mutations could give rise to diverse plaque morphologies in different RNA viruses. We also reviewed how specific mutations could affect viral replication and have an impact on the virulence of the plaque variants. The existence of quasispecies in viral RNA populations is also explored. Many of the RNA viruses displayed different plaque morphologies and these variants could have arisen from a genetically diverse quasispecies population. Such diverse quasispecies in a population could be a key contributing factor to the elevated levels of virulence exhibited by RNA viruses. Through an extensive analysis of different plaque variants and quasispecies within a population, this study could shed more light on the evolutionary pattern and virulence of RNA viruses. More intricate in vitro and in vivo examination of the phenomenon of quasispecies and the relationship between plaque size determinants and virulence should be undertaken to reveal whether serious infections are caused by a single strain or through the combined action of diverse quasispecies carrying different mutations. This can be a valuable tool to characterize the mechanisms that lead to viral evolution and adaptation in a host. Answering these questions might ultimately help to design effective vaccines against the ever-evolving RNA viruses.
74 section matches

Abstract

The main protease of coronaviruses and the 3C protease of enteroviruses share a similar active-site architecture and a unique requirement for glutamine in the P1 position of the substrate. Because of their unique specificity and essential role in viral polyprotein processing, these proteases are suitable targets for the development of antiviral drugs. In order to obtain near-equipotent, broad-spectrum antivirals against alphacoronaviruses, betacoronaviruses, and enteroviruses, we pursued structure-based design of peptidomimetic α-ketoamides as inhibitors of main and 3C proteases. Six crystal structures of protease:inhibitor complexes were determined as part of this study. Compounds synthesized were tested against the recombinant proteases as well as in viral replicons and virus-infected cell cultures; most of them were not cell-toxic. Optimization of the P2 substituent of the α-ketoamides proved crucial for achieving near-equipotency against the three virus genera. The best near-equipotent inhibitors, 11u (P2 = cyclopentylmethyl) and 11r (P2 = cyclohexylmethyl), display low-micromolar EC50 values against enteroviruses, alphacoronaviruses, and betacoronaviruses in cell cultures. In Huh7 cells, 11r exhibits three-digit picomolar activity against Middle East Respiratory Syndrome coronavirus.
Our efforts to design novel α-ketoamides as broad-spectrum inhibitors of coronavirus M pro s and enterovirus 3C pro s started with a detailed analysis of the following crystal structures of unliganded target enzymes: SARS-CoV M pro (ref. 25-27; PDB entries 1UJ1, 2BX3, 2BX4); bat coronavirus HKU4 M pro as a surrogate for the closely related MERS-CoV protease (our unpublished work, Ma, Xiao et al.; PDB entry 2YNA; see also ref. 27); HCoV-229E M pro (ref. 27,28; PDB entry: 1P9S); Coxsackievirus B3 3C pro (our unpublished work; Tan et al., PDB entry 3ZYD); enterovirus D68 3C pro (ref. 29; PDB entry: 3ZV8); and enterovirus A71 3C pro (ref. 30; PDB entry: 3SJK).
During the course of the present study, we determined crystal structures of a number of lead α-ketoamide compounds in complex with SARS-CoV M pro , HCoV-NL63 M pro , and CVB3 3C pro , in support of the design of improvements in the next round of lead optimization. Notably, unexpected differences between alpha- and betacoronavirus M pro were found in this study. The structural foundation of these was elucidated in detail in a subproject involving the M pro of HCoV NL63; because of its volume, this work will be published separately (Zhang et al., in preparation) and only some selected findings are referred to here. The main protease of the newly discovered coronavirus linked to the Wuhan outbreak of respiratory illness is 96% identical (98% similar) in amino-acid sequence to that of SARS-CoV M pro (derived from the RNA genome of BetaCoV/Wuhan/IVDC-HB-01/2019, GenBank accession code: MN908947.2; http://virological.org/t/initialgenome-release-of-novel-coronavirus/319, last accessed on January 11, 2020), so all results reported here for inhibition of SARS-CoV will most likely also apply to the new virus.
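A figure such as "96% identical" is typically obtained by counting matching positions in a pairwise alignment. The Python sketch below shows one simple way to do this for two pre-aligned sequences. It is an illustration under stated assumptions, not the authors' method: the function name is hypothetical, the two short sequences are invented toy strings rather than real protease sequences, and gapped columns are excluded from the denominator, whereas alignment tools differ in how they treat gaps.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy sequences, for illustration only (one mismatch in ten positions).
toy_a = "SGFRKMAFPS"
toy_b = "SGFRKMAFPT"

identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identical")
```

Percent similarity, the second figure quoted above, additionally counts conservative substitutions as matches and therefore requires a substitution scoring scheme, which this sketch does not cover.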
We chose the chemical class of peptidomimetic α-ketoamides to assess the feasibility of achieving antiviral drugs targeting coronaviruses and enteroviruses with near-equipotency. Here we describe the structure-based design, synthesis, and evaluation of inhibitory activity of a series of compounds with broad-spectrum activities afforded by studying the structure-activity relationships mainly with respect to the P2 position of the peptidomimetics. One of the compounds designed and synthesized exhibits excellent activity against MERS-CoV.

INTRODUCTION

Seventeen years have passed since the outbreak of severe acute respiratory syndrome (SARS) in 2003, but there is as yet no approved treatment for infections with the SARS coronavirus (SARS-CoV). 1 One of the reasons is that despite the devastating consequences of SARS for the affected patients, the development of an antiviral drug against this virus would not be commercially viable in view of the fact that the virus was rapidly contained and has not reappeared since 2004. As a result, we were empty-handed when the Middle-East respiratory syndrome coronavirus (MERS-CoV), a close relative of SARS-CoV, emerged in 2012. 2 MERS is characterized by severe respiratory disease, quite similar to SARS, but in addition frequently causes renal failure. 3 Although the number of registered MERS cases is low (2494 as of November 30, 2019; www.who.int), the threat MERS-CoV poses to global public health may be even more serious than that presented by SARS-CoV. This is related to the high case-fatality rate (about 35%, compared to 10% for SARS), and to the fact that MERS cases are still accumulating seven years after the discovery of the virus, whereas the SARS outbreak was essentially contained within 6 months. The potential for human-to-human transmission of MERS-CoV has been impressively demonstrated by the 2015 outbreak in South Korea, where 186 cases could be traced back to a single infected traveller returning from the Middle East. 4 SARS-like coronaviruses are still circulating in bats in China, [5] [6] [7] [8] from where they may spill over into the human population; this is probably what caused the current outbreak of atypical pneumonia in Wuhan, which is linked to a seafood and animal market. The RNA genome (GenBank accession code: MN908947.2; http://virological.org/t/initialgenome-release-of-novel-coronavirus/319, last accessed on January 11, 2020) of the new betacoronavirus features around 82% identity to that of SARS-CoV.
However, enteroviruses are very different from coronaviruses. While both of them have a single-stranded RNA genome of positive polarity, that of enteroviruses is very small (just 7-9 kb) whereas coronaviruses feature the largest RNA genome known to date (27-34 kb). Enteroviruses are small, naked particles, whereas coronaviruses are much larger and enveloped. Nevertheless, a related feature shared by these two groups of viruses is their type of major protease, 21 which in the enteroviruses is encoded by the 3C region of the genome (hence the protease is designated 3C pro ). In coronaviruses, non-structural protein 5 (Nsp5) is the main protease (M pro ). Similar to the enteroviral 3C pro , it is a cysteine protease in the vast majority of cases and has therefore also been called "3C-like protease" (3CL pro ). The first crystal structure of a CoV M pro or 3CL pro (ref. 22) revealed that two of the three domains of the enzyme together resemble the chymotrypsin-like fold of the enteroviral 3C pro , but there is an additional α-helical domain that is involved in the dimerization of the protease (Fig. 1A). This dimerization is essential for the catalytic activity of the CoV M pro , whereas the enteroviral 3C pro (Fig. 1B) functions as a monomer. Further, the enteroviral 3C pro features a classical Cys...His...Glu/Asp catalytic triad, whereas the CoV M pro only has a Cys...His dyad. 22 Yet, there are a number of common features shared between the two types of proteases, in particular their almost absolute requirement for Gln in the P1 position of the substrate and space for only small amino-acid residues such as Gly, Ala, or Ser in the P1' position, encouraging us to explore the coronaviral M pro and the enteroviral 3C pro as a common target for the design of broad-spectrum antiviral compounds.
The fact that there is no known human protease with a specificity for Gln at the cleavage site of the substrate increases the attractiveness of this viral target, as there is hope that the inhibitors to be developed will not show toxicity versus the host cell. Indeed, neither the enterovirus 3C pro inhibitor rupintrivir, which was developed as a treatment of the common cold caused by HRV, nor the peptide aldehyde inhibitor of the coronavirus M pro that was recently demonstrated to lead to complete recovery of cats from the normally fatal infection with Feline Infectious Peritonitis Virus (FIPV), showed any toxic effects on humans or cats, respectively. 23, 24
Figure 1: Crystal structures of SARS-CoV main protease (M pro , ref. 26; PDB entry 2BX4) and Coxsackievirus B3 3C protease (3C pro ; Tan et al., unpublished; PDB entry 3ZYD). Catalytic residues are indicated by spheres (yellow, Cys; blue, His; red, Glu). A. The coronavirus M pro is a homodimer, with each monomer comprising three domains. B. The structure of the monomeric CVB3 3C pro resembles the N-terminal two domains of the SARS-CoV M pro . Structure is on the same scale as image A. C. Superimposition of residues from the two structures involved in ligand binding. Superimposition was carried out by aligning the catalytic Cys-His pair of each protease. Residues of the SARS-CoV M pro are shown with carbon atoms in cyan, CVB3 3C pro residues have orange carbons and are labeled with an asterisk (*).
As the proteases targeted in our study all specifically cleave the peptide bond following a P1-glutamine residue (HCoV-NL63 M pro uniquely also accepts P1 = His at the Nsp13/Nsp14 cleavage site 31 ), we decided to use a 5-membered ring (γ-lactam) derivative of glutamine (henceforth called GlnLactam) as the P1 residue in all our α-ketoamides (see Scheme 1). This moiety has been found to be a good mimic of glutamine and to enhance the power of the inhibitors by up to 10-fold, most probably because compared to the flexible glutamine side-chain, the more rigid lactam leads to a reduction of the loss of entropy upon binding to the target protease. 29, 32 Our synthetic efforts therefore aimed at optimizing the substituents at the P1', P2, and P3 positions of the α-ketoamides.

Synthesis of α-ketoamides

The inhibitory potencies of candidate α-ketoamides were evaluated against purified recombinant SARS-CoV M pro , HCoV-NL63 M pro , CVB3 3C pro , and EV-A71 3C pro . The most potent compounds were further tested against viral replicons and against SARS-CoV, MERS-CoV, or a whole range of enteroviruses in cell culture-based assays (Tables 1-3 and Supplementary Table 1

Viral replicons

To enable the rapid and biosafe screening of antivirals against corona- and enteroviruses, a non-infectious, but replication-competent SARS-CoV replicon was used 33 along with subgenomic replicons of CVB3 34 and EV-A71 (kind gift of B. Zhang, Wuhan, China). The easily detectable reporter activity (firefly or Renilla luciferase) of these replicons has previously been shown to reflect viral RNA synthesis. [33] [34] [35] In-vitro RNA transcripts of the enteroviral replicons were also used for transfection. For the SARS-CoV replicon containing the CMV promoter, only the plasmid DNA was used for transfection.

Initial inhibitor design steps

Crystal structures of compound 11a in complex with SARS-CoV M pro , HCoV-NL63 M pro , and CVB3 3C pro demonstrated that the α-keto-carbon is covalently linked to the active-site Cys (no. 145, 144, and 147, resp.) of the protease (Figs. 2, 3a-c). The resulting thiohemiketal is in the R configuration in the SARS-CoV and HCoV-NL63 M pro but in the S configuration in the CVB3 3C pro complex. The reason for this difference is that the oxygen atom of the thiohemiketal accepts a hydrogen bond from the catalytic His40 in the CVB3 protease, rather than from the main-chain amides of the oxyanion hole as in the SARS-CoV and HCoV-NL63 enzymes (Fig. 3a-c, insets). It is remarkable that we succeeded in obtaining a crystal structure of compound 11a in complex with the HCoV-NL63 M pro , even though it has no inhibitory effect on the activity of the enzyme (IC50 > 50 µM; Fig. 2c). Apparently, the compound is able to bind to this M pro in the absence of peptide substrate, but cannot compete with substrate for the binding site due to low affinity. A similar observation has been made in one of our previous studies, where we were able to determine the crystal structure of a complex between the inactive Michael-acceptor compound SG74 and the EV-D68 3C pro (ref. 29; PDB entry 3ZV9).
Figure 2: Fit of compound 11a (pink carbon atoms) to the target proteases (wheat surfaces) as revealed by X-ray crystallography of the complexes. A. Fo-Fc difference density (contoured at 3σ) for 11a in the substrate-binding site of the SARS-CoV M pro (transparent surface). Selected side-chains of the protease are shown with green carbon atoms. B. Another view of 11a in the substrate-binding site of the SARS-CoV M pro . Note the "lid" formed by residue Met49 and its neighbors above the S2 pocket. C. 11a in the substrate-binding site of HCoV-NL63 M pro . Because of the restricted size of the S2 pocket, the P2 benzyl group of the compound cannot enter deeply into this site. Note that the S2 pocket is also covered by a "lid" centred around Thr47. D. 11a in the substrate-binding site of the CVB3 3C pro . The S2 site is large and not covered by a "lid".

Properties of the S2 pockets of the target enzymes

The crystal structures of SARS-CoV M pro, HCoV-NL63 M pro, and CVB3 3C pro in complex with 11a revealed a fundamental difference between the S2 pockets of the coronavirus proteases and the enterovirus proteases: the cavities are covered by a "lid" in the former but are open to one side in the latter (Fig. 2b-d). In SARS-CoV M pro, the lid is formed by the 3₁₀ helix 46-51 and in HCoV-NL63 M pro by the loop 43-48. Residues from the lid, in particular Met49 in the case of SARS-CoV M pro, can thus make hydrophobic interactions with the P2 substituent of the inhibitor, whereas such interaction is missing in the enterovirus 3C pro s. In addition to the lid, the S2 pocket is lined by the "back-wall" (main-chain atoms of residues 186 and 188 and Cβ atom of Asp187), the side-walls (Gln189, His41), as well as the "floor" (Met165) in SARS-CoV M pro. In HCoV-NL63 M pro, the corresponding structural elements are main-chain atoms of residues 187 and 188 as well as the Cβ atom of Asp187 (back-wall), Pro189 and His41 (side-walls), and Ile165 (floor). Finally, in CVB3 3C pro, Arg39, Asn69, and Glu71 form the back-wall, residues 127-132 and His40 form the side-walls, and Val162 constitutes the floor.
In agreement with these observations, a good fit is observed between the P2 benzyl group of 11a and the S2 subsite of the SARS-CoV M pro as well as that of the CVB3 3C pro (Fig. 3a,c) . In contrast, the crystal structure of the complex between 11a and HCoV-NL63 M pro , against which the compound is inactive, demonstrates that the P2-benzyl group cannot fully enter the S2 pocket of the enzyme because of the restricted size of this site (Fig. 3b) .
Thus, the properties of our target proteases with respect to the S2 pocket were defined at this point as "small" and "covered by a lid" for HCoV-NL63 M pro, "large" and "covered" for SARS-CoV M pro, and "large" and "open" for CVB3 3C pro. Through comparison with crystal structures of other proteases of the same virus genus (HCoV-229E M pro for alphacoronaviruses 28 (PDB entry 1P9S); HKU4 M pro for betacoronaviruses (Ma, Xiao et al., unpublished; PDB entry 2YNA); and EV-A71 3C pro for enteroviruses 30 (PDB entry 3SGK)), we ensured that our conclusions drawn from the template structures were valid for other family members as well.
To explore the sensitivity of the S2 pocket towards a polar substituent in the para position of the benzyl group, we synthesized compound 11m carrying a 4-fluorobenzyl group in P2. This substitution abolished almost all activity against the SARS-CoV M pro (IC50 > 50 µM), and the compound proved inactive against HCoV-NL63 M pro as well, whereas IC50 values were 2.3 µM against the EV-A71 3C pro and 8.7 µM against CVB3 3C pro. From this, we concluded that the introduction of the polar fluorine atom is not compatible with the geometry of the S2 pocket of SARS-CoV M pro, whereas the fluorine can accept a hydrogen bond from Arg39 in EV-A71 3C pro (ref. 30) and probably also CVB3 3C pro. In SARS-CoV M pro, however, the carbonyl groups of residues 186 and 188 might lead to a repulsion of the fluorinated benzyl group.
The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. bioRxiv preprint doi: https://doi.org/10.1101/2020.02.10.936898
Table 2: α-ketoamide-induced inhibition of subgenomic RNA synthesis using replicons in a cell-based assay (EC50, µM)
Table 3.
Figure 3: A. [...] The thiohemiketal is in the R configuration, with its oxygen accepting two hydrogen bonds from the oxyanion-hole amides of Gly143 and Cys145. The amide oxygen accepts an H-bond from His41. The side-chains of Ser144 and Arg188 have been omitted for clarity. B. The P2-benzyl substituent of 11a cannot fully enter the S2 pocket of the HCoV-NL63 M pro, which is much smaller and has less plasticity than the corresponding pocket of SARS-CoV M pro (cf. A). The benzyl therefore binds above the pocket in the view shown here; this is probably the reason for the total inactivity (IC50 > 50 µM) of compound 11a against HCoV-NL63 M pro. The small size of the pocket is due to the replacement of the flexible Gln189 of the SARS-CoV M pro by the more rigid Pro189 in this enzyme. The stereochemistry of the thiohemiketal is R. The side-chains of Ala143 and Gln188 have been omitted for clarity. C. Binding of 11a to the CVB3 3C pro. The stereochemistry of the thiohemiketal is S, as the group accepts a hydrogen bond from His40, whereas the amide keto group accepts three H-bonds from the oxyanion hole (residues 145-147). The side-chain of Gln146 has been omitted for clarity. D. The crystal structure of 11f in complex with HCoV-NL63 M pro shows that this short (inactive) compound lacking a P3 residue has its P2-Boc group inserted into the S2 pocket of the protease. The stereochemistry of the thiohemiketal is S. The side-chains of Ala143 and Gln188 have been omitted for clarity. E. In contrast to P2 = benzyl in 11a, the isobutyl group of 11n is small and flexible enough to enter into the narrow S2 pocket of the HCoV-NL63 M pro. The thiohemiketal is in the R configuration. The side-chains of Ala143 and Gln188 have been omitted for clarity. F. In spite of its small size, the cyclopropylmethyl side-chain in the P2 position of 11s can tightly bind to the S2 subsite of the SARS-CoV M pro, as this pocket exhibits pronounced plasticity due to the conformational flexibility of Gln189 (see also Fig. 4). The stereochemistry of the thiohemiketal is S. The side-chains of Ser144 and Arg188 have been omitted for clarity.

P2-alkyl substituents of varying sizes

As the P2-benzyl group of 11a was apparently too large to fit into the S2 pocket of the HCoV-NL63 M pro , we replaced it by isobutyl in 11n. This resulted in improved activities against SARS-CoV M pro (IC50 = 0.33 µM) and in a very good activity against HCoV-NL63 M pro (IC50 = 1.08 µM, compare with the inactive 11a). For EV-A71 3C pro , however, the activity decreased to IC50 = 13.8 µM, different from CVB3 3C pro , where IC50 was 3.8 µM. Our interpretation of this result is that the smaller P2-isobutyl substituent of 11n can still interact with the "lid" (in particular, Met49) of the SARS-CoV M pro S2 site, but is unable to reach the "back-wall" of the EV-A71 3C pro pocket and thus, in the absence of a "lid", cannot generate sufficient enthalpy of binding. We will see from examples to follow that this trend persists among all inhibitors with a smaller P2 substituent: Even though the SARS-CoV M pro S2 pocket has a larger volume than that of the enterovirus 3C pro , the enzyme can be efficiently inhibited by compounds carrying a small P2 residue that makes hydrophobic interactions with the lid (Met49) and floor (Met165) residues.
The EC50 of 11n was >10 µM against the EV-A71 and CVB3 replicons, and even in the SARS-CoV replicon, the activity of 11n was relatively weak (EC50 = 7.0 µM; Table 2 ). In agreement with the replicon data, 11n proved inactive against EV-A71 in RD cells and showed limited activity against HRV2 or HRV14 in HeLa Rh cells (Table 3) . Only the comparatively good activity (EC50 = 4.4 µM) against EV-D68 in HeLa Rh cells was unexpected. The activity of 11n against HCoV 229E in Huh7 cells was good (EC50 = 0.6 µM), and against MERS-CoV in Huh7 cells, it was excellent, with EC50 = 0.0048 µM, while in Vero cells, the EC50 against MERS-CoV was as high as 9.2 µM. Similarly, the EC50 against SARS-CoV in Vero cells was 14.2 µM (Table 3) .
We managed to obtain crystals of 11n in complex with the M pro of HCoV NL63 and found the P2 isobutyl group to be well embedded in the S2 pocket (Fig. 3e) . This is not only a consequence of the smaller size of the isobutyl group compared to the benzyl group, but also of its larger conformational flexibility, which allows a better fit to the binding site.
When we replaced the P2-isobutyl residue of 11n by n-butyl in 11o, the activities were as follows: IC50 = 8.5 µM for SARS-CoV M pro , totally inactive (IC50 > 50 µM) against HCoV-NL63 M pro , IC50 = 3.2 µM for EV-A71 3C pro , and 5.2 µM for CVB3 3C pro . The decreased activity in case of SARS-CoV M pro and the total inactivity against HCoV-NL63 M pro indicate that the n-butyl chain is too long for the S2 pocket of these proteases, whereas the slight improvement against EV-A71 3C pro and CVB3 3C pro is probably a consequence of the extra space that is available to long and flexible substituents because of the lack of a lid covering the enterovirus 3C pro pocket.

Modifying ring size and flexibility of P2-cycloalkylmethyl substituents

[...] replacement of the phenyl group by the cyclohexyl group led to a significant improvement of the inhibitory activity against the recombinant SARS-CoV M pro and to a dramatic improvement in case of CVB3 3C pro. Even for the HCoV-NL63 M pro, against which 11a was completely inactive, greatly improved albeit still weak activity was observed (Table 1). In the viral replicons, 11r performed very well, with EC50 = 0.8-0.9 µM for the EV-A71 replicon, 0.45 µM for CVB3, and 1.4 µM for SARS-CoV (Table 2). In the virus-infected cell culture assays (Table 3), 11r exhibited EC50 = 3.7 µM against EV-A71 in RD cells and EC50 = 0.48-0.7 µM against EV-D68, HRV2, and HRV14 in HeLa cells. Against HCoV 229E in Huh7 cells, the EC50 was surprisingly low (1.8 µM). Interestingly, the compound proved extremely potent against MERS-CoV in Huh7 cells, with EC50 = 0.0004 µM (400 picomolar!). Even in Vero cells, EC50 against MERS-CoV was 5 µM, and the EC50 against SARS-CoV in Vero E6 cells was 1.8-2.1 µM, i.e. the best activity we have seen for an M pro inhibitor against SARS-CoV in this type of cells. The therapeutic index (CC50/EC50) of 11r against EV-D68, HRV2, and HRV14 was >15 in HeLa Rh cells as well as against CVB3 in Huh-T7 cells, but only ~5 for EV-A71 in RD cells.
We next analyzed the crystal structure of the complex between SARS-CoV M pro and compound 11s (Fig. 3f). The cyclopropylmethyl substituent was found to be incorporated deeply into the S2 pocket, making hydrophobic interactions with Met49 (the lid), Met165 (the floor), and the Cβ of Asp187 (the back-wall). In spite of the small size of the P2 substituent, this is possible because the S2 pocket of SARS-CoV M pro is flexible enough to contract and enclose the P2 moiety tightly. This plasticity is expressed in a conformational change of residue Gln189, both in the main chain and in the side-chain. The main-chain conformational change is connected with a flip of the peptide between Gln189 and Thr190. The χ1 torsion angle of the Gln189 side-chain changes from roughly antiperiplanar (ap) to (-)synclinal (-sc) (Fig. 4). The conformational variability of Gln189 has been noted before, both in molecular dynamics simulations 26 and in other crystal structures. 37 As a consequence of these changes, the side-chain oxygen of Gln189 can accept a 2.54-Å hydrogen bond from the main-chain NH of the P2 residue in the 11s complex (see Fig. 4). The affinity of 11s for the S2 pocket of HCoV-NL63 M pro is good because of an almost ideal match of size and not requiring conformational changes, which this enzyme would not be able to undergo because of the replacement of the flexible Gln189 by the more rigid Pro. On the other hand, docking of the same compound into the crystal structure of the CVB3 3C pro revealed that the cyclopropylmethyl moiety was probably unable to generate sufficient free energy of binding because of the missing lid and the large size of the S2 pocket in the enterovirus 3C pro, thereby explaining the poor inhibitory activity of 11s against these targets (Table 1). Experiments with the viral replicons confirmed this trend, although the EC50 value for SARS-CoV (6.8 µM) was surprisingly high (Table 2). 
In Huh7 cells infected with MERS-CoV, this compound exhibited EC50 = 0.1 µM (but 9.8 µM in Vero cells), whereas EC50 was 7.0 µM against SARS-CoV in Vero E6 cells. The compound was largely inactive against EV-A71 in RD cells and inhibited the replication of the two HRV subtypes tested (in HeLa Rh cells) with EC50 values of ~4 µM. The CC50 of 11t in HeLa cells was 65 µM, i.e. the therapeutic index was well above 15 (Table 3) .
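The therapeutic index quoted above is simply the ratio CC50/EC50. As a minimal sketch of that arithmetic (the function name is hypothetical, and the EC50 of 4 µM is our reading of the approximate "~4 µM" given in the text, so the result is approximate as well):

```python
def therapeutic_index(cc50_um: float, ec50_um: float) -> float:
    """Return CC50/EC50; both values must be in the same unit (here µM)."""
    if cc50_um <= 0 or ec50_um <= 0:
        raise ValueError("CC50 and EC50 must be positive")
    return cc50_um / ec50_um


# Values quoted in the text: CC50 = 65 µM in HeLa cells,
# EC50 of roughly 4 µM against the two HRV subtypes (assumed exactly 4.0 here).
ti = therapeutic_index(65.0, 4.0)
print(f"therapeutic index = {ti:.2f}")
```

With these inputs the ratio is slightly above 16, consistent with the statement that the index exceeds 15.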
Obviously, this substituent was still a bit too small for the enterovirus proteases, so as the next step, we tested P2 = cyclopentylmethyl (compound 11u). This turned out to be the one compound with acceptable IC50 values against all tested enzymes: 1.3 µM against SARS-CoV M pro , 5.4 µM against HCoV-NL63 M pro , 4.7 µM against EV-A71 3C pro , and 1.9 µM against CVB3 3C pro ( Table 1 ). The activity against the replicons was between 3.6 and 4.9 µM ( Table 2 ). In Huh7 cells infected with HCoV 229E or MERS-CoV, 11u showed EC50 = 2.5 or 0.03 µM (11.1 µM for MERS-CoV in Vero cells), while EC50 was 4.9 µM against SARS-CoV in Vero E6 cells (Table 3) .

DISCUSSION

The most prominent α-ketoamide drugs are probably telaprevir and boceprevir, peptidomimetic inhibitors of the HCV NS3/4A protease, 39, 40 which have helped revolutionize the treatment of chronic HCV infections. For viral cysteine proteases, α-ketoamides have only occasionally been described as inhibitors and few systematic studies have been carried out.
A number of capped dipeptidyl α-ketoamides have been described as inhibitors of the norovirus 3C-like protease. 41 These were optimized with respect to their P1' substituent, whereas P2 was isobutyl in most cases and occasionally benzyl. The former displayed IC50 values one order of magnitude lower than the latter, indicating that the S2 pocket of the norovirus 3CL protease is fairly small. Although we did not include the norovirus 3CL pro in our study, expanding the target range of our inhibitors to norovirus is probably a realistic undertaking. While our study was underway, Zeng et al. 42 published a series of α-ketoamides as inhibitors of the EV-A71 3C pro. These authors mainly studied the structure-activity relationships of the P1' residue and found small alkyl substituents to be superior to larger ones. Interestingly, they also reported that a six-membered δ-lactam in the P1 position led to 2-3 times higher activities, compared to the five-membered γ-lactam. At the same time, Kim et al. 43 described a series of five α-ketoamides with P1' = cyclopropyl that showed submicromolar activity against EV-D68 and two HRV strains.
Among a series of aldehydes, Prior et al. 45 described the capped tripeptidyl α-ketoamide Cbz-1-naphthylalanine-Leu-GlnLactam-CO-CO-NH-iPr, which showed IC50 values in the 3-digit nanomolar range against HRV 3C pro and SARS-CoV M pro, as well as EC50 values of 0.03 µM against HRV18 and 0.5 µM against HCoV 229E in cell culture. No optimization of this compound was performed and no toxicity data have been reported.
In our series of compounds, we used P1 = GlnLactam (γ-lactam) throughout, because this substituent has proven to be an excellent surrogate for glutamine. 29, 32 While we made some efforts to optimize the P1' residue of the compounds as well as the N-cap (P3), we mainly focussed on optimization of the P2 substituent. In nearly all studies aiming at discovering peptidomimetic inhibitors of coronavirus M pro s, P2 is invariably isobutyl (leucine), and this residue has also been used in the efforts to design compounds that would inhibit enterovirus 3C pro s as well (see above). From crystal structures of our early lead compound, 11a (cinnamoyl-Phe-GlnLactam-CO-CO-NH-Bz), in complex with the M pro s of HCoV NL63 (as representative of the alphacoronavirus proteases) and SARS-CoV (beta-CoV) as well as the 3C pro of Coxsackievirus B3 (enterovirus proteases), we found that the S2 pocket has fundamentally different shapes in these enzymes. In the SARS-CoV M pro, the S2 subsite is a deep hydrophobic pocket that is truly three-dimensional in shape: the "walls" of the groove are formed by the polypeptide main chain around residues 186-188 as well as by the side-chains of His41 (of the catalytic dyad) and Gln189, whereas the "floor" is formed by Met165 and the "lid" by residues 45-51, in particular Met49. The two methionines provide important interaction points for the P2 substituents of inhibitors; while these interactions are mostly hydrophobic in character, we have previously described the surprising observation of the carboxylate of an aspartic residue in P2 that made polar interactions with the sulfur atoms of these methionines. 37 Because the pocket offers so many opportunities for interaction and features a pronounced plasticity, P2 substituents such as isobutyl (from Leu), which are too small to fill the pocket entirely, can still generate sufficient binding enthalpy. 
Accordingly, the S2 pocket of SARS-CoV M pro is the most tolerant among the three enzymes investigated here, in terms of versatility of the P2-substituents accepted.
In the S2 pocket of the HCoV-NL63 M pro, Gln189 is replaced by proline and this change is accompanied by a significant loss of flexibility; whereas the side-chain of Gln189 of SARS-CoV M pro is found to adapt its conformation to the steric requirements of the P2 substituent, the proline is less flexible, leading to a much smaller space at the entrance to the pocket. As a consequence, a P2-benzyl substituent is hindered from penetrating deeply into the pocket, whereas the smaller and more flexible isobutyl group of P2-Leu is not.
Finally, in the 3C pro s of EV-A71 and CVB3, the S2 pocket lacks a lid, i.e. it is open to one side. As a consequence, it offers fewer interaction points for P2 substituents of inhibitors, so that such substituents must reach the "back-wall" of the pocket (formed by Arg39, Asn69, Glu71) in order to create sufficient binding energy. Hence, large aromatic substituents such as benzyl are favored by the enterovirus 3C pro s.

EXPERIMENTAL SECTION

Crystallization and X-ray structure determination of complexes between viral proteases and α-ketoamides

Crystals of HCoV-NL63 M pro with 11a were obtained using cocrystallization. The concentrated HCoV-NL63 M pro (45 mg·mL⁻¹) was incubated with 5 mM 11a for 4 h at 20 °C, followed by setting up crystallization using the vapor-diffusion sitting-drop method at 20 °C with equilibration of 1 µL protein (mixed with 1 µL mother liquor) against 500 µL reservoir composed of 0.1 M lithium sulfate monohydrate, 0.1 M sodium citrate tribasic dihydrate, 25% PEG 1,000, pH 6.0. The crystals were protected by a cryo-buffer containing 0.1 M lithium sulfate monohydrate, 0.1 M sodium citrate tribasic dihydrate, 25% PEG 1,000, 15% glycerol, 2 mM 11a, pH 6.0 and flash-cooled in liquid nitrogen.
Crystals of HCoV-NL63 M pro with 11n or 11f were generated by using the soaking method. Several free-enzyme crystals were soaked in cryo-protectant buffer containing 0.1 M lithium sulfate monohydrate, 0.1 M sodium citrate tribasic dihydrate, 25% PEG 1,000, 15% glycerol, 5 mM 11n (or 11f), pH 6.0. Subsequently, the soaked crystals were flash-cooled in liquid nitrogen.
Freshly prepared CVB3 3C pro at a concentration of 21.8 mg·mL⁻¹ was incubated with 5 mM 11a pre-dissolved in 100% DMSO at room temperature for 1 h. Some white precipitate appeared in the mixture. Afterwards, the sample was centrifuged at 13,000 × g for 20 min at 4 °C. The supernatant was subjected to crystallization trials using the following commercially available kits: Sigma™ (Sigma-Aldrich), Index™, and PEG Rx™ (Hampton Research). Single rod-like crystals were detected both from the Index™ screen, under the condition of 0.1 M MgCl2 hexahydrate, 0.1 M Bis-Tris, 25% PEG 3,350, pH 5.5, and from the Sigma™ screen at 0.2 M Li2SO4, 0.1 M Tris-HCl, 30% PEG 4,000, pH 8.5. Crystal optimization was performed by using the vapor-diffusion sitting-drop method, with 1 µL CVB3 3C pro-inhibitor complex mixed with 1 µL precipitant solution, and equilibration against 500 µL reservoir containing 0.1 M Tris-HCl, 0.2 M MgCl2, pH 8.5, and PEG 3,350 varied from 22% to 27%. Another optimization screen was also performed against a different reservoir, 0.1 M Tris-HCl, 0.2 M MgCl2, pH range from 7.5 to 8.5, and PEG 4,000 varied from 24% to 34%. Crystals were fished from different drops and protected by cryo-protectant solution consisting of the mother liquor and 10% glycerol. Subsequently, the crystals were flash-cooled with liquid nitrogen.

Inhibitory activity assay of alpha-ketoamides.

A buffer containing 20 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA, 1 mM DTT, pH 7.3, was used for all the enzymatic assays. Two substrates with the cleavage sites of M pro and 3C pro, respectively (indicated by the arrow, ↓), Dabcyl-KTSAVLQ↓SGFRKM-E(Edans)-NH2 and Dabcyl-KEALFQ↓GPPQF-E(Edans)-NH2 (95% purity; Biosyntan), were employed in the fluorescence resonance energy transfer (FRET)-based cleavage assay, using a 96-well microtiter plate. The dequenching of the Edans fluorescence due to the cleavage of the substrate as catalyzed by the proteases was monitored at 460 nm with excitation at 360 nm, using a Flx800 fluorescence spectrophotometer (BioTek). Curves of relative fluorescence units (RFU) against substrate concentration were linear for all substrates up to beyond 50 µM, indicating a minimal influence of the inner-filter effect. Stock solutions of the compounds were prepared by dissolving them in 100% DMSO. The UV absorption of 11a was found to be negligible at λ = 360 nm, so that no interference with the FRET signal through the inner-filter effect was to be expected. For the determination of the IC50, different proteases at a specified final concentration (0.5 µM SARS-CoV or HCoV-NL63 M pro, 2 µM CVB3 3C pro, 3 µM EV-A71 3C pro) were separately incubated with the inhibitor at various concentrations (0 to 100 µM) in reaction buffer at 37 °C for 10 min. Afterwards, the reaction was initiated by adding FRET peptide substrate at 20 µM final concentration (final volume: 50 µL). The IC50 value was determined by using the GraphPad Prism 6.0 software (GraphPad). Measurements of enzymatic activity were performed in triplicate and are presented as the mean ± standard deviations (SD).
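To illustrate how an IC50 can be extracted from such a dose-response series, the following is a minimal sketch only. It is not the procedure implemented in GraphPad Prism; it assumes a simple one-site inhibition model with a Hill slope fixed at 1, uses a coarse grid search instead of nonlinear regression, and runs on synthetic, noise-free data. All function names are hypothetical.

```python
import math


def fraction_active(inhibitor_um, ic50_um):
    """One-site inhibition model (assumed): v/v0 = 1 / (1 + [I]/IC50)."""
    return 1.0 / (1.0 + inhibitor_um / ic50_um)


def fit_ic50(concs_um, activities, lo=1e-3, hi=1e3, steps=6000):
    """Grid search over log-spaced IC50 candidates, minimizing squared error."""
    best_ic50, best_sse = None, math.inf
    for i in range(steps + 1):
        candidate = lo * (hi / lo) ** (i / steps)
        sse = sum(
            (fraction_active(c, candidate) - a) ** 2
            for c, a in zip(concs_um, activities)
        )
        if sse < best_sse:
            best_ic50, best_sse = candidate, sse
    return best_ic50


# Synthetic relative activities generated from the model with IC50 = 2.0 µM,
# at inhibitor concentrations spanning the 0-100 µM range used in the assay.
concs = [0.0, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0, 100.0]
acts = [fraction_active(c, 2.0) for c in concs]
ic50 = fit_ic50(concs, acts)
print(f"fitted IC50 is within 1% of 2.0 µM: {abs(ic50 - 2.0) / 2.0 < 0.01}")
```

For real triplicate data, a nonlinear least-squares fit with a free Hill slope and confidence intervals would be the more appropriate choice.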
For enterovirus (except CVB3) infection experiments, human rhabdomyosarcoma cells (RD; for EV-A71; BRCR strain) and HeLa Rh cells (for EV-D68 and human rhinoviruses) were grown in MEM Rega 3 medium supplemented with 1% sodium bicarbonate, 1% L-glutamine, and fetal calf serum (10% in growth medium and 2% in maintenance medium). For HCoV-229E (a kind gift from Volker Thiel (Bern, Switzerland)), culture and infection experiments were carried out as described. 60 For MERS-CoV or SARS-CoV infection experiments, Vero, Vero E6, and Huh7 cells were cultured as described previously. 61, 62 Infection of Vero and Huh7 cells with MERS-CoV (strain EMC/2012) and SARS-CoV infection of Vero E6 cells (strain Frankfurt-1) at low multiplicity of infection (MOI) were done as described before. 61, 63 All work with live MERS-CoV and SARS-CoV was performed inside biosafety cabinets in biosafety level-3 facilities at Leiden University Medical Center, The Netherlands.

Viral replicons.

The DNA-launched SARS-CoV replicon harbouring Renilla luciferase as reporter directly downstream of the SARS-CoV replicase polyprotein-coding sequence (pp1a, pp1ab, Urbani strain, acc. AY278741), in the context of a bacterial artificial chromosome (BAC) under the control of the CMV promoter, has been described previously (pBAC-REP-RLuc). 33 Apart from the replicase polyprotein, the replicon encodes the following features: the 5'- and 3'-non-translated regions (NTR), a ribozyme (Rz), the bovine growth hormone sequence, and structural protein N.
Subgenomic replicons of CVB3 (pT7-CVB3-FLuc 34) and EV-A71 (pT7-EV71-RLuc) harbouring T7-controlled complete viral genomes, in which the P1 capsid-coding sequence was replaced by the Firefly (Photinus pyralis) or Renilla (Renilla reniformis) luciferase gene, were generously provided by F. van Kuppeveld and B. Zhang, respectively. To prepare CVB3 and EV-A71 replicon RNA transcripts, plasmid DNAs were linearized by digestion with SalI or HindIII (New England Biolabs), respectively. Copy RNA transcripts were synthesized in vitro using linearized DNA templates, T7 RNA polymerase, and the T7 RiboMax™ Large-Scale RNA Production System (Promega) according to the manufacturer's recommendations.

Antiviral assay with infectious enteroviruses.

The antiviral activity of the compounds was evaluated in a cytopathic effect (CPE) read-out assay based on MTS [3-(4,5-dimethylthiazol-2-yl)-5-(3-carboxymethoxyphenyl)-2-(4-sulfophenyl)-2H-tetrazolium, inner salt]. Briefly, 24 h prior to infection, cells were seeded in 96-well plates at a density of 2.5 × 10⁴ (RD cells) or of 1.7 × 10⁴ (HeLa Rh) per well in medium supplemented with 2% FCS. For HRV2 and HRV14 infection, the medium contained 30 mM MgCl2. The next day, serial dilutions of the compounds and virus inoculum were added. The read-out was performed 3 days post infection as follows: the medium was removed and 100 µL of 5% MTS in Phenol Red-free MEM was added to each well. Plates were incubated for 1 h at 37 °C, then the optical density at 498 nm (OD498) of each well was measured by a microtiter plate reader (Safire², Tecan). The OD values were converted to percentage of controls and the EC50 was calculated by logarithmic interpolation as the concentration of compound that results in a 50% protective effect against virus-induced CPE. For each condition, cell morphology was also evaluated microscopically.
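The logarithmic interpolation mentioned above can be sketched as follows. This is one common reading of the term (linear interpolation of the response against log10 of the concentration between the two dilutions that bracket 50% protection); the exact procedure used in the study is not specified, and the function name and the example data are hypothetical.

```python
import math


def ec50_log_interpolation(concs_um, percent_protection):
    """Interpolate the 50% point linearly in log10(concentration).

    concs_um must be ascending and positive; percent_protection holds the
    matching protection against virus-induced CPE, in percent of controls.
    Returns None when 50% is not bracketed by the tested dilutions.
    """
    pairs = list(zip(concs_um, percent_protection))
    for (c_lo, p_lo), (c_hi, p_hi) in zip(pairs, pairs[1:]):
        if p_lo < 50.0 <= p_hi:
            frac = (50.0 - p_lo) / (p_hi - p_lo)
            log_ec50 = math.log10(c_lo) + frac * (
                math.log10(c_hi) - math.log10(c_lo)
            )
            return 10.0 ** log_ec50
    return None


# Made-up tenfold dilution series: 50% protection lies between 1 and 10 µM.
ec50 = ec50_log_interpolation([0.1, 1.0, 10.0, 100.0], [5.0, 20.0, 80.0, 100.0])
print(f"EC50 = {ec50:.2f} µM")
```

Interpolating on the log scale rather than the linear scale matters for serial dilutions, because the tested concentrations are evenly spaced only in log space.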

Antiviral assays with SARS and MERS coronaviruses.

After stirring for 30 min, the cooling bath was removed. The reaction mixture was allowed to warm up to room temperature and then poured into brine (40 mL). The organic layer was concentrated and purified by flash column chromatography (petroleum ether/ethyl acetate = 4/1) to give product 1 (4.92 g, 72%) as colorless oil. ¹H NMR (CDCl3, 400 MHz) δ 5.23 (1H, d, J = 9.0 Hz), 4.43-4.36 (1H, m), 3.77 (1H, s), 3.76 (1H, s), 2.89-2.69 (3H, m), 2.20-2.14 (2H, m), 1.45 (9H, s). ESI-MS (m/z): 315 (M + H)⁺.
Synthesis of (S)-methyl 2-(tert-butoxycarbonylamino)-3-((S)-2-oxopyrrolidin-3-yl)propanoate (2).
Compound 2 (1.0 g, 3.5 mmol) was dissolved in 10 mL dichloromethane (DCM), then 10 mL trifluoroacetic acid (TFA) was added. The reaction mixture was stirred at 20 °C for 0.5 h, and concentrated in vacuo to give a colorless oil, which could be used for the following step without purification.
(S)-methyl 2-cinnamamido-3-phenylpropanoate (4a). Methyl L-phenylalaninate hydrochloride (1.30 g, 6.0 mmol) was dissolved in 20 mL CH2Cl2, and then cinnamoyl chloride (1.00 g, 6.0 mmol) and triethylamine (1.69 mL, 12.0 mmol) were added, before the reaction was stirred for 2 h at room temperature. The reaction mixture was diluted with 20 mL CH2Cl2, washed with 50 mL of saturated brine (2 × 25 mL), and dried over Na2SO4. The solvent was evaporated and the product 4a was obtained as white solid (1.75 g, 95%), which could be used for the next step without further purification.
Synthesis of N-substituted amino acids 5 (general procedure). 1 M NaOH (5 mL) was added to a solution of compound 4 (3.0 mmol) in methanol (5 mL). The reaction was stirred for 20 min at 20 °C. Then 1 M HCl was added to the reaction solution until pH = 1. Then the reaction mixture was extracted with 100 mL of CH2Cl2 (2 × 50 mL) and the organic layer was washed with 50 mL of brine and dried over Na2SO4. The solvent was evaporated and the crude material purified on silica, eluted with mixtures of CH2Cl2/MeOH (20/1) to afford the product 5 (90-96% yield) as a white solid.
Synthesis of compounds 6 (general procedure). Compound 5 (2.7 mmol) was dissolved in 10 mL of dry CH2Cl2. To this solution, 1.5 equiv (1.54 g) of 1-[bis(dimethylamino)methylene]-1H-1,2,3-triazolo[4,5-b]pyridinium 3-oxide hexafluorophosphate (HATU) was added, and the reaction was stirred for 0.5 h at 20 °C. Then compound 3 (500 mg, 2.7 mmol) and TEA (0.70 mL, 5.42 mmol) were added to the reaction. The reaction was stirred for another 6 h. The reaction mixture was poured into 10 mL water. The aqueous solution was extracted with 50 mL of CH2Cl2 (2 × 25 mL) and washed with 50 mL of saturated brine (2 × 25 mL) and dried over Na2SO4. The solvent was evaporated and the crude material purified on silica, eluted with a mixture of CH2Cl2/MeOH (40/1) to give the product 6 (62-84% yield).
Synthesis of alcohols 7 (general procedure). Compound 6 (1.1 mmol) was dissolved in methanol (40 mL), then NaBH4 (0.34 g, 8.8 mmol) was added under ambient conditions. The reaction mixture was stirred at 20 °C for 2 h. Then the reaction was quenched with water (30 mL). The suspension was extracted with ethyl acetate. The organic layers were combined, dried, and filtered. The filtrate was evaporated to dryness and could be used for the next step without further purification (46-85% yield).
Synthesis of aldehydes 8 (general procedure). Compound 7 (0.75 mmol) was dissolved in CH2Cl2, then Dess-Martin periodinane (337 mg, 0.79 mmol) and NaHCO3 (66 mg, 0.79 mmol) were added. The resulting mixture was stirred at 20 °C for 1 h. The mixture was concentrated and purified by column chromatography on silica gel (CH2Cl2/MeOH = 20/1) to give the product 8 as white solid (88-95% yield).
Synthesis of compounds 9 (general procedure). Compound 8 (0.40 mmol) was dissolved in CH2Cl2, and then acetic acid (0.028 g, 0.47 mmol) and isocyanide (0.43 mmol) were added successively to the solution. The reaction was stirred at 20 °C for 24 h. Then the solvent was evaporated and the crude material purified on silica, eluted with a mixture of CH2Cl2/MeOH (20/1) to give the product 9 (46-84%).
Synthesis of α-ketoamides 11 (general procedure). Compound 10 was dissolved in CH2Cl2, then Dess-Martin periodinane (74 mg, 0.176 mmol) and NaHCO3 (30 mg, 0.176 mmol) were added. The resulting mixture was stirred at 20 °C for 1 h. The mixture was concentrated and purified by column chromatography on silica gel (CH2Cl2/MeOH = 20/1) to give the α-ketoamides 11 as light yellow solid (52-79% in two steps).
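The reagent quantities in these general procedures follow from the substrate amount, the number of equivalents, and the molar mass. As a small worked check of the HATU quantity in the procedure for compounds 6 (a sketch; the function name is hypothetical, and the molar mass of HATU, 380.23 g/mol for C10H15F6N6OP, is supplied here and is not stated in the text):

```python
HATU_MW = 380.23  # g/mol for C10H15F6N6OP (value supplied for this sketch)


def reagent_mass_g(substrate_mmol, equivalents, mw_g_per_mol):
    """Mass of reagent in grams for a given number of equivalents."""
    return substrate_mmol * equivalents * mw_g_per_mol / 1000.0


# 1.5 equiv of HATU relative to 2.7 mmol of compound 5
mass = reagent_mass_g(2.7, 1.5, HATU_MW)
print(f"HATU needed: {mass:.2f} g")
```

The result agrees with the 1.54 g given in the procedure.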

Accession Codes

Authors will release the atomic coordinates and experimental data upon article publication. Atomic coordinates include SARS-CoV M pro in complex with compounds 11a (5N19), 11s (5N5O), HCoV-NL63 M pro in complex with 11a (6FV2), 11n (6FV1), 11f (5NH0), and CVB3 3C pro in complex with (5NFS).

AUTHOR INFORMATION


INTRODUCTION

In spite of the considerable threat posed by SARS-CoV and related viruses, as well as by MERS-CoV, it is obvious that the number of cases so far does not warrant the commercial development of an antiviral drug targeting MERS- and SARS-CoV even if a projected steady growth of the number of MERS cases is taken into account. A possible solution to the problem could be the development of broad-spectrum antiviral drugs that are directed against the major viral protease, a target that is shared by all coronavirus genera as well as, in a related form, by members of the large genus Enterovirus in the picornavirus family. Among the members of the genus Alphacoronavirus are the human coronaviruses (HCoV) NL63 (ref. 9) and 229E (ref. 10), which usually cause only mild respiratory symptoms in otherwise healthy individuals, but are much more widespread than SARS-CoV or MERS-CoV. Therapeutic intervention against alphacoronaviruses is indicated in cases of accompanying disease such as cystic fibrosis 11 or leukemia, 12 or certain other underlying medical conditions. 13 The enteroviruses include pathogens such as EV-D68, the causative agent of the 2014 outbreak of the "summer flu" in the US, 14 EV-A71 and Coxsackievirus A16 (CVA16), the etiological agents of Hand, Foot, and Mouth Disease (HFMD), 15 Coxsackievirus B3 (CVB3), which can cause myocardic inflammation, 16 and human rhinoviruses (HRV), notoriously known to lead to the common cold but also capable of causing exacerbations of asthma and COPD. 17 Infection with some of these viruses can lead to serious outcomes; thus, EV-D68 can cause polio-like disease, 18 and EV-A71 infection can proceed to aseptic meningitis, encephalitis, pulmonary edema, viral myocarditis, and acute flaccid paralysis. 15, 19, 20 Enteroviruses cause clinical disease much more frequently than coronaviruses, so that an antiviral drug targeting both virus families should be commercially viable.

Synthesis of α-ketoamides

Synthesis (Scheme 1) started with the dianionic alkylation of N-Boc glutamic acid dimethyl ester with bromoacetonitrile. As expected, this alkylation occurred in a highly stereoselective manner, giving 1 as the exclusive product. In the following step, the cyano group of 1 was subjected to hydrogenation. The in-situ cyclization of the resulting intermediate afforded the lactam 2. The lactam derivative 3 was generated by removal of the protecting group of 2. Separately, the amidation of acyl chloride and α-amino acid methyl ester afforded the intermediates 4, which gave rise to the acids 5 via alkaline hydrolysis. The key intermediates 6 were obtained via the condensation of the lactam derivative 3 and the N-capped amino acids 5. The ester group of compounds 6 was then reduced to the corresponding alcohol. Oxidation of the alcohol products 7 by Dess-Martin periodinane generated the aldehydes 8, which, upon nucleophilic addition of isocyanides under acidic conditions, gave rise to compounds 9. Then, the α-hydroxyamides 10 were prepared by removing the acetyl group of compounds 9. In the final step, the oxidation of the exposed alcohol group in compounds 10 generated our target α-ketoamides 11.

Initial inhibitor design steps

The initial compound to be designed and synthesized was 11a, which carries a cinnamoyl N-cap in the P3 position, a benzyl group in P2, the glutamine lactam (GlnLactam) in P1, and benzyl in P1' (Table 1). This compound showed good to mediocre activities against recombinant SARS-CoV M pro (IC50 = 1.95 µM; for all compounds, see Tables 1-3 for standard deviations), CVB3 3C pro (IC50 = 6.6 µM), and EV-A71 3C pro (IC50 = 1.2 µM), but was surprisingly completely inactive (IC50 > 50 µM) against HCoV-NL63 M pro. These values were mirrored in the SARS-CoV and in the enterovirus replicons (Table 2). In virus-infected cell cultures, the results obtained were also good to mediocre (Table 3): SARS-CoV (EC50 = 5.8 µM in Vero E6 cells), MERS-CoV (EC50 = 0.0047 µM in Huh7 cells), HCoV 229E (EC50 = 11.8 µM in Huh7 cells), and a host of enteroviruses (EC50 = 9.8 µM against EV-A71 in RD cells; EC50 = 0.48 µM against EV-D68 in HeLa Rh cells; EC50 = 5.6 µM against HRV2 in HeLa Rh cells). In all cell types tested, the compound generally proved to be non-toxic, with selectivity indices (CC50/EC50) usually >10 (Table 3).
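The selectivity index used above is simply the ratio CC50/EC50. A minimal sketch of the calculation follows; the EC50 is the value quoted for 11a against SARS-CoV in Vero E6 cells, whereas the CC50 is a hypothetical placeholder chosen for illustration only (the measured values are in Table 3), and the function name is an assumption.

```python
def selectivity_index(cc50_um: float, ec50_um: float) -> float:
    """Return CC50/EC50; both values must be in the same unit (here µM)."""
    if ec50_um <= 0:
        raise ValueError("EC50 must be positive")
    return cc50_um / ec50_um


# EC50 of 11a against SARS-CoV in Vero E6 cells, as quoted in the text.
ec50 = 5.8
# Hypothetical CC50 for illustration only; the measured value is in Table 3.
cc50_hypothetical = 100.0

si = selectivity_index(cc50_hypothetical, ec50)
print(f"SI = {si:.1f}")
```

With these inputs the index is above the threshold of 10 mentioned in the text; a lower CC50 or a higher EC50 would push it below.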

P1' and P3 substituents

The crystal structures indicated that the fits of the P1' benzyl group of 11a in the S1' pocket and of the P3 cinnamoyl cap in the S3 subsite might be improved (see Fig. 3a-c). Compounds 11b-11e and 11g-11l were synthesized in an attempt to do so; however, none of them showed better inhibitory activity against the majority of the recombinant proteases, compared to the parent compound, 11a (see Supplementary Results). To investigate whether the P3 residue of the inhibitor is dispensable, we synthesized compound 11f, which only comprises P2 = Boc, P1 = GlnLactam, and P1' = benzyl. 11f was inactive against all purified proteases and in all replicons tested, but showed some activity against HRV2 in HeLa Rh cells (EC50 = 9.0 µM). A crystal structure of 11f bound to HCoV-NL63 M pro demonstrated that the P2-Boc group entered the S2 pocket (Fig. 3d). In conclusion, although there is probably room for further improvement, we decided to maintain the original design with P1' = benzyl and P3 = cinnamoyl, and focussed on improving the P2 substituent.

Properties of the S2 pockets of the target enzymes

In addition, the S2 pocket is of different size in the various proteases. The SARS-CoV enzyme features the largest S2 pocket, with a volume of 252 Å³ embraced by the residues (Gln189, His41) defining the side-walls of the pocket in the ligand-free enzyme, as calculated by using Chimera, 36 followed by the CVB3 3C pro S2 pocket with about 180 Å³ (space between Thr130 and His40). The HCoV-NL63 M pro has by far the smallest S2 pocket of the three enzymes, with a free space of only 45 Å³ between Pro189 and His41, according to Chimera.

Modifying ring size and flexibility of P2-cycloalkylmethyl substituents

Having realized that in addition to size, flexibility of the P2 substituent may be an important factor influencing inhibitory activity, we introduced flexibility into the phenyl ring of 11a by reducing it. The cyclohexylmethyl derivative 11r exhibited IC50 = 0.7 µM against SARS-CoV M pro , 12.3 µM against HCoV-NL63 M pro , 1.7 µM against EV-A71 3C pro , and 0.9 µM against CVB3 3C pro . Thus, the replace-
At this point, we decided to systematically vary the size of the ring system in P2. The next substituent to be tried was cyclopropylmethyl (compound 11s), which showed good activities against SARS-CoV M pro (IC50 = 0.24 µM) and HCoV-NL63 M pro (1.4 µM), but poor values against EV-A71 3C pro (IC50 = 18.5 µM) and CVB3 3C pro (IC50 = 4.3 µM) (Table 1). 11s was shown to inhibit the SARS-CoV replicon with an EC50 of about 2 µM, whereas activity against the EV-A71 and CVB3 replicons was poor (EC50 values > 20 µM) (Table 2). The replicon results were mirrored by the antiviral activity of 11s in enterovirus-infected cells (Table 3), which was weak or very weak. By contrast, the compound inhibited HCoV 229E and MERS-CoV in Huh7 cells with EC50 values of 1.3 and 0.08 µM, respectively. The activity against the latter virus in Vero cells was poor (EC50 ~11 µM), and so was the anti-SARS-CoV activity in Vero E6 cells (Table 3).
11u appeared so far the best compromise compound, yet for each of the individual viral enzymes, the following compounds proved superior: P2 = cyclopropylmethyl (compound 11s) for SARS-CoV M pro , P2 = isobutyl (compound 11n) and P2 = cyclopropylmethyl (11s) for HCoV-NL63 M pro , P2 = benzyl (11a) or cyclohexylmethyl (11r) for EV-A71 3C pro , and 11r for CVB3 3C pro . In other words, the nearly equipotent 11u is indeed a compromise. Therefore, in view of the surprisingly good antiviral activity of 11r against HCoV 229E in Huh7 cells, we relaxed the condition that the universal inhibitor should show good activity against the recombinant HCoV-NL63 M pro , and selected 11r (P2 = cyclohexylmethyl) as the lead compound for further development. This compound exhibited submicromolar IC50 values against CVB3 3C pro and SARS-CoV M pro , and IC50 = 1.7 µM against EV-A71 3C pro (Table 1) , as well as similarly low EC50 values in the replicons of these viruses (Table 2 ). In Huh7 cells infected with MERS-CoV, the performance of this compound was excellent, with EC50 = 0.0004 µM, and even against HCoV 229E in Huh7 cells and SARS-CoV in Vero E6 cells, EC50 values of 1.8 and 2.1 µM, respectively, were observed (Table 3) . Also in enterovirus-infected cell culture, the compound performed well, with EC50 values of 0.7 µM or below against HRV2, HRV14, and EV-D68 in HeLa (Rh) cells and selectivity values >15. The only concern is the activity of the compound against EV-A71 in RD cells, for which the EC50 value was 3.7 µM, resulting in too low a therapeutic index. On the other hand, only weak toxicity was detected for 11r in Vero or Huh-T7 cells. Preliminary pharmacokinetics tests with the compound in mice did not indicate a toxicity problem (to be published elsewhere).

DISCUSSION

We describe here the structure-based design, the synthesis, and the assessment of capped dipeptide α-ketoamides that target the main protease of alpha- or betacoronaviruses as well as the 3C protease of enteroviruses. Through crystallographic analyses of a total of six inhibitor complexes of three different proteases in this study, we found the α-ketoamide warhead (-CO-CO-NH-) to be sterically more versatile than other warheads such as Michael acceptors (-CH=CH-CO-) and aldehydes (-CH=O), because it features two acceptors for hydrogen bonds from the protein, namely the α-keto oxygen and the amide oxygen, whereas the other warheads have only one such acceptor. In the various complexes, the hydroxy group (or oxyanion) of the thiohemiketal that is formed by the nucleophilic attack of the active-site cysteine residue onto the α-keto carbon can accept one or two hydrogen bonds from the main-chain amides of the oxyanion hole. In addition, the amide oxygen of the inhibitor accepts a hydrogen bond from the catalytic His side-chain. Alternatively, the thiohemiketal can interact with the catalytic His residue and the amide oxygen with the main-chain amides of the oxyanion hole. Depending on the exact interaction, the stereochemistry at the thiohemiketal C atom would be different. We have previously observed a similar difference in the case of aldehyde inhibitors, where the single interaction point (the oxyanion of the thiohemiacetal) can accept a hydrogen bond either from the oxyanion hole or from the catalytic His side-chain, 37 resulting in different stereochemistry of the thiohemiacetal carbon. Both α-ketoamides and aldehydes react reversibly with the catalytic nucleophile of proteases, whereas Michael acceptors form irreversible adducts.
In addition to better matching the H-bonding donor/acceptor properties of the catalytic center through offering two hydrogen-bond acceptors instead of one, α-ketoamides have another big advantage over aldehydes and α,β-unsaturated esters (Michael acceptors) in that they allow easy extension of the inhibitors to probe the primed specificity subsites beyond S1', although this has so far rarely been explored (e.g., ref. 38 in the case of calpain).
Occasionally, individual α-ketoamides have been reported in the literature as inhibitors of both the enterovirus 3C protease and the coronavirus main protease. A single capped dipeptidyl α-ketoamide, Cbz-Leu-GlnLactam-CO-CO-NH-iPr, was described that inhibited the recombinant transmissible gastroenteritis virus (TGEV) and SARS-CoV M pro s as well as human rhinovirus and poliovirus 3C pro s in the one-digit micromolar range. 44 Coded GC-375, this compound nevertheless showed poor activity in cell culture against EV-A71 (EC50 = 15.2 µM), probably because P2 was isobutyl. As we have shown here, an isobutyl side-chain in the P2 position of the inhibitors is too small to completely fill the S2 pocket of the EV-A71 3C pro and the CVB3 3C pro .
For compounds with warheads other than α-ketoamides, in-vitro activity against both corona- and enteroviruses has also occasionally been reported. Lee et al. 46 described three peptidyl Michael acceptors that displayed inhibitory activity against the M pro s of SARS-CoV and HCoV 229E as well as against the 3C pro of CVB3. These inhibitors had an IC50 10-20 times higher for the CVB3 enzyme, compared to SARS-CoV M pro . P2 was invariably isobutyl (leucine) in these compounds, suggesting that further improvement might be possible.
In addition to Michael acceptors, peptide aldehydes have also been used to explore the inhibition of coronavirus M pro s as well as enterovirus 3C pro s. Kim et al. 44 reported a dipeptidyl aldehyde and its bisulfite adduct, both of which exhibited good inhibitory activities against the isolated 3C proteases of human rhinovirus and poliovirus as well as against the 3C-like proteases of a number of coronaviruses, but antiviral activities in cell culture against EV-A71 were poor (EC50 >10 µM), again most probably due to P2 being isobutyl (leucine).
When we introduced a fluoro substituent in the para position of the P2-benzyl group of our lead compound, 11a, we observed good activity against the enterovirus 3C pro s but complete inactivity against the coronavirus M pro s (see Table 1, compound 11m). This is easily explained on the basis of the crystal structures: In the enterovirus 3C pro s, the fluorine can accept a hydrogen bond from Arg39 (ref. 30), whereas in the coronavirus M pro s, there would be electrostatic repulsion from the main-chain carbonyls of residues 186 and 188. In agreement with this, rupintrivir (which has P2 = p-fluorobenzyl) is a good inhibitor of the enteroviral 3C pro s, 46 but not of the coronaviral main proteases, as we predicted earlier. 28
In this structure-based inhibitor optimization study, we achieved major improvements over our original lead compound, 11a, by systematically varying the size and the flexibility of the P2-substituent. The compound presenting so far the best compromise between the different requirements of the S2 pockets (SARS-CoV M pro : large and covered, HCoV-NL63 M pro : small and covered, and CVB3 3C pro : large and open) is 11u (P2 = cyclopentylmethyl), which has satisfactory broad-spectrum activity against all proteases tested. However, with regard to its antiviral activities in cell cultures, it is inferior to 11r (P2 = cyclohexylmethyl). The latter compound exhibits very good inhibitory activity against the SARS-CoV M pro as well as the enterovirus 3C pro , and its performance in the SARS-CoV and enterovirus replicons is convincing. Being in the low micromolar range (EV-A71, CVB3), the data for the antiviral activity in cell culture for 11r correlate well with the inhibitory power of the compound against the recombinant proteases as well as in the replicon-based assays. This is not true, though, for the surprisingly good in-cellulo activity of 11r against HCoV 229E in Huh7 cells. Also, the correlation does not seem to hold for LLC-MK2 and CaCo2 cells.
We tested the antiviral activity of many of our compounds against HCoV NL63 in these two cell types and found that all of them had low- or submicromolar EC50 values against this virus in LLC-MK2 cells but were largely inactive in CaCo2 cells (not shown). Furthermore, 11r and all other compounds that we synthesized are inactive (EC50 > 87 µM) against CVB3 in Vero cells (not shown), but exhibit good to excellent activities against the same virus in Huh-T7 cells. We have previously observed similar poor antiviral activities in Vero cells not only for α-ketoamides, but also for Michael acceptors (Zhu et al., unpublished work). A similar cell-type dependence is seen for the antiviral activity of 11r against MERS-CoV and SARS-CoV. Whereas the inhibitor exhibits excellent activity against MERS-CoV when Huh7 cells are the host cells (400 pM), the inhibitory activity is weaker by a factor of up to 12,500 when Vero cells are used (EC50 = 5 µM). On the other hand, 11r exhibits excellent anti-MERS-CoV activity in human Calu3 lung cells, i.e. in the primary target cells where the compound will have to act in a therapeutic setting (A. Kupke, personal communication). As we tested antiviral activity against SARS-CoV exclusively in Vero cells, the EC50 values determined for our compounds against this virus are in the one-digit micromolar range or higher; the best is again compound 11r with EC50 = 2.1 µM. Interestingly, the relatively weaker activity (or even inactivity) of our inhibitors against RNA viruses in Vero cells was observed independently in the virology laboratories in Leuven and in Leiden. It is thus unlikely that the lack of activity in Vero cells is related to problems with the experimental set-up. In preliminary experiments, we replaced the P3 cinnamoyl group of 11r by the fluorophore coumaryl and found by fluorescence microscopy that much more inhibitor appeared to accumulate in Huh7 cells compared to Vero cells (D.L., R.H. & Irina Majoul, unpublished).
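The factor of 12,500 quoted for the cell-type dependence of 11r follows directly from the two EC50 values once they are brought to the same unit. The short sketch below reproduces that arithmetic; the helper names and the unit table are illustrative assumptions.

```python
# Conversion factors to mol/L for the concentration units used in the text.
UNIT_TO_MOLAR = {"pM": 1e-12, "nM": 1e-9, "uM": 1e-6}


def to_molar(value: float, unit: str) -> float:
    """Convert a concentration given in pM, nM, or µM ('uM') to mol/L."""
    return value * UNIT_TO_MOLAR[unit]


def fold_difference(weaker: float, stronger: float) -> float:
    """Ratio of two EC50 values given in mol/L (weaker divided by stronger)."""
    return weaker / stronger


ec50_huh7 = to_molar(400, "pM")  # 11r against MERS-CoV in Huh7 cells
ec50_vero = to_molar(5, "uM")    # 11r against MERS-CoV in Vero cells

fold = fold_difference(ec50_vero, ec50_huh7)
print(f"{fold:,.0f}-fold weaker in Vero cells")
```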

EXPERIMENTAL SECTION

Crystallization and X-ray structure determination of complexes between viral proteases and α-ketoamides

Diffraction data collection, structure elucidation and refinement. Diffraction data from the crystal of the SARS-CoV M pro in complex with 11a were collected at 100 K at synchrotron beamline PXI-X06SA (PSI, Villigen, Switzerland) using a Pilatus 6M detector (DECTRIS). A diffraction data set from the SARS-CoV M pro crystal with compound 11s was collected at 100 K at beamline P11 of PETRA III (DESY, Hamburg, Germany), using the same type of detector. All diffraction data sets of HCoV-NL63 M pro complex structures and of the complex of CVB3 3C pro with 11a were collected at synchrotron beamline BL14.2 of BESSY (Berlin, Germany), using an MX225 CCD detector (Rayonix). All data sets were processed by the program XDSAPP and scaled by SCALA from the CCP4 suite. 50-52 The structure of SARS-CoV M pro with 11a was determined by molecular replacement with the structure of the complex between SARS-CoV M pro and SG85 (PDB entry 3TNT; Zhu et al., unpublished) as search model, employing the MOLREP program (also from the CCP4 suite). 52, 53 The complex structures of HCoV-NL63 M pro with 11a, 11f, and 11n were also determined with MOLREP, using as a search model the structure of the free enzyme determined by us (LZ et al., unpublished). The complex structure between CVB3 3C pro and 11a was determined based on the search model of the free-enzyme structure (PDB entry 3ZYD; Tan et al., unpublished). Geometric restraints for the compounds 11a, 11f, 11n, and 11s were generated by using JLIGAND, 52, 54 and the compounds were built into the Fo-Fc difference density using the COOT software. 55 Refinement of the structures was performed with REFMAC version 5.8.0131 (refs. 52, 56, 57).

Inhibitory activity assay of alpha-ketoamides.

Assessment of inhibitory activity of α-ketoamides using viral replicons and virus-infected cells

Cells and viruses. Hepatocellular carcinoma cells (Huh7; ref. 58) and their derivative constitutively expressing T7 RNA polymerase (Huh-T7; ref. 59) were grown in Dulbecco's modified minimal essential medium (DMEM) supplemented with 2 mM glutamine, 100 U·mL⁻¹ penicillin, 100 µg·mL⁻¹ streptomycin sulfate, and fetal calf serum (10% in growth medium and 2% in maintenance medium). Huh-T7 cells were additionally supplemented with geneticin (G-418 sulfate, 400 µg·mL⁻¹). Huh-T7 cells were used for the enteroviral replicons as well as for infection experiments with CVB3 strain Nancy.
Transfection. Huh-T7 cells grown in 12-well plates to a confluency of 80-90% (2-3 × 10⁵ cells/well) were washed with 1 mL OptiMEM (Invitrogen) and transfected with 0.25 µg of the replication-competent replicon and Lipofectamine 2000 or X-tremeGENE9 in 300 µl OptiMEM (final volume) as recommended by the manufacturer (Invitrogen or Roche, respectively). The transfection mixtures were incubated at 37°C for 4 to 5 h (Lipofectamine 2000) or overnight (X-tremeGENE9), prior to being replaced with growth medium containing the compound under investigation. For RNA-launched transfection of enteroviral replicons, DMRIE-C was used as transfection reagent according to the manufacturer's recommendations (Invitrogen). All experiments were done in triplicate or quadruplicate and the results are presented as mean values ± SD.
Testing for inhibitory activity of candidate compounds. Initially, we performed a quick assessment of the inhibitory activity of the candidate compounds towards the enteroviral and coronaviral replicons at a concentration of 40 µM in Huh-T7 cells. Compounds that were relatively powerful and non-toxic at this concentration were assayed in a dose-dependent manner to estimate their half-maximal effective concentration (EC50) as well as their cytotoxicity (CC50), as described. 29 In brief, different concentrations of α-ketoamides (40 µM in screening experiments or increasing concentrations (0, 1.25, 2.5, 5, 10, 20, 40 µM) when determining the EC50) were added to the growth medium of replicon-transfected Huh-T7 cells. Twenty-four hours later, the cells were washed with 1 mL phosphate-buffered saline (PBS or OPTIMEM, Invitrogen) and lysed in 0.15 mL Passive lysis buffer (Promega) at room temperature (RT) for 10 min. After freezing (−80 °C) and thawing (RT), the cell debris was removed by centrifugation (16,000 x g, 1 min) and the supernatant (10 or 20 µl) was assayed for Firefly or Renilla luciferase activity (Promega or Biotrend Chemikalien) using an Anthos Lucy-3 luminescence plate reader (Anthos Microsystem).

Antiviral assays with SARS and MERS coronaviruses.

Assays with MERS-CoV and SARS-CoV were performed as previously described. 61, 63 In brief, Huh7, Vero, or Vero E6 cells were seeded in 96-well plates at a density of 1 × 10⁴ (Huh7 and Vero E6) or 2 × 10⁴ cells (Vero) per well. After overnight growth, cells were treated with the indicated compound concentrations or DMSO (solvent control) and infected with an MOI of 0.005 (final volume 150 µl/well in Eagle's minimal essential medium (EMEM) containing 2% FCS, 2 mM L-glutamine, and antibiotics). Huh7 cells were incubated for two days and Vero/VeroE6 cells for three days, and differences in cell viability caused by virus-induced CPE or by compound-specific side effects were analyzed using the CellTiter 96 AQueous Non-Radioactive Cell Proliferation Assay (Promega), according to the manufacturer's instructions. Absorbance at 490 nm (A490) was measured using a Berthold Mithras LB 940 96-well plate reader (Berthold). Cytotoxic effects caused by compound treatment alone were monitored in parallel plates containing mock-infected cells.
HCoV-229E-Rev: 5′-ggTCGTTTAGTTGAGAAAAGT-3′, and 229E-ZNA probe: 5'-6-Fam-AGA (pdC)TT(pdU)G(pdU)GT(pdC)TA(pdC)T-ZNA-3-BHQ-1 -3' (Metabion). Standard curves were prepared using serial dilutions of RNA isolated from virus stock. Data were analysed using GraphPad Prism 5.0; EC50 values were calculated based on a 4-parameter logistic equation. In parallel to the qPCR assays with inhibitors, cell viability assays were performed using AlamarBlue™ Cell Viability Reagent (ThermoFisher) according to the manufacturer's instructions. CC50 values were calculated using the "inhibitor versus normalized response" equation by including proper controls (no inhibitor and 1% Triton-X-100-treated cells).
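To illustrate what an EC50 from a four-parameter logistic model means, the sketch below defines the curve and recovers the EC50 from synthetic, noise-free data. It is not the authors' procedure: they used GraphPad Prism, whereas this stand-in fixes the two plateaus and runs a crude grid search over EC50 and Hill slope. The function names, grid ranges, and data are all assumptions made for illustration.

```python
def four_pl(conc, bottom, top, ec50, hill):
    """Four-parameter logistic: response falls from `top` to `bottom` as conc rises."""
    if conc == 0:
        return top
    return bottom + (top - bottom) / (1.0 + (conc / ec50) ** hill)


def fit_ec50(concs, responses, bottom=0.0, top=100.0):
    """Crude grid search over EC50 and Hill slope with fixed plateaus.

    A simplification for illustration; dedicated software fits all four
    parameters by non-linear least squares.
    """
    best = None
    for i in range(401):  # EC50 from 0.1 to 100 µM, log-spaced
        ec50 = 10 ** (-1 + 3 * i / 400)
        for hill in (0.5, 0.75, 1.0, 1.5, 2.0, 3.0):
            sse = sum(
                (four_pl(c, bottom, top, ec50, hill) - r) ** 2
                for c, r in zip(concs, responses)
            )
            if best is None or sse < best[0]:
                best = (sse, ec50, hill)
    return best[1], best[2]


# Two-fold concentration series in µM (the series quoted for the replicon assays).
concs = [0, 1.25, 2.5, 5, 10, 20, 40]
# Synthetic, noise-free responses generated from a known curve (EC50 = 5 µM).
responses = [four_pl(c, 0.0, 100.0, 5.0, 1.0) for c in concs]

ec50_fit, hill_fit = fit_ec50(concs, responses)
print(f"EC50 ~ {ec50_fit:.2f} µM, Hill slope = {hill_fit}")
```

The recovered EC50 lands on the grid point closest to the 5 µM used to generate the data; with real, noisy measurements a proper least-squares fit of all four parameters would be the better choice.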
Determination of the cell toxicity of candidate compounds. The CellTiter 96Aqueous One Solution Cell Proliferation Assay (MTS test, Promega), the CellTiter Glo assay kit (Promega), the Non-Destructive Cytotoxicity Bio-Assay (ToxiLight (measuring the release of adenylate kinase from damaged cells), Lonza Rockland), or the AlamarBlue™ Cell Viability Reagent (ThermoFisher) were used to determine the cytotoxic effect of compounds towards host cells according to the manufacturers' recommendations. 29, 65

Chemical synthesis of α-ketoamides

General procedure. Reagents were purchased from commercial sources and used without purification. HSGF 254 (0.15-0.2 mm thickness) was used for analytical thin-layer chromatography (TLC). All products were characterized by their NMR and MS spectra. 1 H NMR spectra were recorded on 300-MHz, 400-MHz, or 500-MHz instruments. Chemical shifts were reported in parts per million (ppm, δ) down-field from tetramethylsilane. Proton coupling patterns were described as singlet (s), doublet (d), triplet (t), quartet (q), multiplet (m), and broad (br). Mass spectra were recorded using a Bruker ESI ion-trap HCT Ultra. HPLC spectra were recorded by LC20A or LC10A (Shimadzu Corporation) with Shim-pack GIST C18 (5 µm, 4.6 × 150 mm) with three solvent systems (methanol/water, methanol/0.1% HCOOH in water, or methanol/0.1% ammonia in water). Purity was determined by reversed-phase HPLC and was ≥95% for all compounds tested biologically.
To a solution of N-Boc-L-glutamic acid dimethyl ester (6.0 g, 21.8 mmol) in THF (60 mL) was added dropwise a solution of lithium bis(trimethylsilyl)amide (LHMDS) in THF (47 mL, 1 M) at −78°C under nitrogen. The resulting dark mixture was stirred at −78°C. Meanwhile, bromoacetonitrile (1.62 mL, 23.3 mmol) was added dropwise to the dianion solution over a period of 1 h while keeping the temperature below −70°C. The reaction mixture was stirred at −78°C for an additional 2 h. After the consumption of the reactant was confirmed by TLC analysis, the reaction was quenched with methanol (3 mL), and acetic acid (3 mL) in precooled THF (20 mL) was added.
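The amounts in this alkylation step can be expressed as molar equivalents relative to the glutamate ester. The sketch below uses only the millimole values and the 1 M LHMDS concentration quoted above; the function name is an illustrative assumption.

```python
def equivalents(reagent_mmol: float, substrate_mmol: float) -> float:
    """Molar equivalents of a reagent relative to the limiting substrate."""
    return reagent_mmol / substrate_mmol


substrate_mmol = 21.8            # N-Boc-L-glutamic acid dimethyl ester
lhmds_mmol = 47 * 1.0            # 47 mL of a 1 M solution corresponds to 47 mmol
bromoacetonitrile_mmol = 23.3

lhmds_eq = equivalents(lhmds_mmol, substrate_mmol)
brch2cn_eq = equivalents(bromoacetonitrile_mmol, substrate_mmol)

print(f"LHMDS: {lhmds_eq:.2f} equiv")
print(f"Bromoacetonitrile: {brch2cn_eq:.2f} equiv")
```

The base is used at slightly more than two equivalents, which is consistent with the text's description of a dianion, while the electrophile is present in only a small excess.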
In a hydrogenation flask were placed compound 1 (4.0 g, 12.7 mmol), 5 mL of chloroform, and 60 mL of methanol before the addition of PtO2. The resulting mixture was stirred under hydrogen at 20 °C for 12 h. Then the mixture was filtered over Celite to remove the catalyst. NaOAc (6.77 g, 25.5 mmol) was added to the filtrate before the resulting mixture was stirred at 60 °C for 12 h. The reaction was quenched with water (30 mL). The suspension was extracted with ethyl acetate. The organic layers were combined, dried (MgSO4), and filtered. The light-brown filtrate was concentrated and purified by silica gel column chromatography (petroleum ether/ethyl acetate = 4/1) to give the product 2 (2.20 g, 61%) as a white solid. 1 H NMR (CDCl3) δ 6.02 (1H, br), 5.
Synthesis of (S)-methyl 2-amino-3-((S)-2-oxopyrrolidin-3-yl)propanoate (3).

Supporting Information

Detailed results of the variation of the P1' and P3 substituents
Synthesis of α-ketoamides 11b-11e and 11g-11l
Supplemental Table 1: inhibitory activities (IC50 (μM)) of α-ketoamides with P1' and P3 modifications against viral proteases
Supplemental Table 2: crystallographic data for complexes between viral proteases and α-ketoamides
Molecular formula strings and biological data (CSV)

DISCUSSION

Regardless of which cell system is the most suitable one for the testing of peptidomimetic antiviral compounds, we next plan to test 11r in small-animal models for MERS and for Coxsackievirus-induced pancreatitis. In parallel, we aim to refine the experiments to quantify the accumulation of peptidomimetic protease inhibitors in different host-cell types, in the hope of finding an explanation for the observed cell-type dependencies.

CONCLUSIONS

This work demonstrates the power of structure-based approaches in the design of broad-spectrum antiviral compounds with roughly equipotent activity against coronaviruses and enteroviruses. We observed a good correlation between the inhibitory activity of the designed compounds against the isolated proteases, in viral replicons, and in virus-infected Huh7 cells. One of the compounds (11r) exhibits excellent anti-MERS-CoV activity in virus-infected Huh7 cells. Because of the high similarity between the main proteases of SARS-CoV and the novel BetaCoV/Wuhan/2019, we expect 11r to exhibit good antiviral activity against the new coronavirus as well.
92 section matches

Abstract

The complex hide-and-seek game between HIV-1 and the host immune system has impaired the development of an efficient vaccine. In addition, the high variability of the virus impedes the long-term control of viral replication by small antiviral drugs. For more than 20 years, phage display technology has been intensively used in the field of HIV-1 to explore the epitope landscape recognized by monoclonal and polyclonal HIV-1-specific antibodies, thereby providing precious data about immunodominant and neutralizing epitopes. In parallel, biopanning experiments with various combinatorial or antibody fragment libraries were conducted on viral targets as well as host receptors to identify HIV-1 inhibitors. Besides these applications, phage display technology has been applied to characterize the enzymatic specificity of the HIV-1 protease. Phage particles also represent valuable alternative carriers displaying various HIV-1 antigens to the immune system and eliciting antiviral responses. This review presents and summarizes the different studies conducted with regard to the nature of phage libraries, target display mode and biopanning procedures.

Introduction

In 1983, the human immunodeficiency virus (HIV-1) was identified as the causative agent of the Acquired ImmunoDeficiency Syndrome (AIDS) [1, 2] . In 30 years of pandemic, HIV-1 has infected more than 60 million individuals and killed 25 million. Thirty-three million individuals are currently living with HIV-1 making this disease a major worldwide public health problem (UNAIDS 2010). Natural sterilizing immune response against HIV-1 has never been described and despite decades of intensive research, a vaccine against HIV-1 is still lacking, mainly due to the high ability of the virus to escape from the immune response.
In the absence of a vaccine, combinations of small antiviral molecules are intensively used to control HIV-1 infection. The majority of these drugs are reverse transcriptase and protease inhibitors [3]. More recently, new molecules targeting the fusion step, CCR5 or integrase were licensed for clinical use [4-6]. Despite the increased life expectancy observed with the advent of these therapies, severe side effects, lack of adherence and emergence of drug-resistant virus strains still limit the long-term control of the infection [7].

Exploration of HIV-1 Epitope Landscape

Monoclonal antibodies (MAbs) or polyclonal antibodies (PAbs) epitopes can be identified through screening either combinatorial or antigen-fragment libraries displayed at the surface of phages. Antibodies may be derived from infected/immunized animals or from HIV seropositive patients with peculiar immunological profiles, such as the Long Term Non Progressors (LTNP) [15], leading to the identification and characterization of HIV mimotopes. The seminal paper characterizing the epitope recognized by a MAb directed against HIV-1 using the phage display technology was published in 1993. Keller et al. screened a 15-mer Random Peptide Library (RPL) against the BNtAb 447-52D which targets the V3 loop of gp120 (KRKRIHIGPGRAFY) [16] (Figure 3) and identified 70 clones presenting a GPxR consensus sequence [17] (Table 1). These mimotopes were further used in rabbit immunization experiments and elicited neutralizing responses. Boots et al. later investigated the linear epitope recognized by the MAb 447-52D by combining gp120 competition and panning of V3-region biased/constrained libraries. Such a set-up favors the selection of mimotopes in which residues surrounding the GPGR crown motif are similar to those present in the gp120 used for competition, suggesting that the use of strain-specific competitors with a MAb of broad specificity can select for strain-specific mimotopes [18]. [36, 37]. In their study, a 20-mer RPL was constructed and panned against MAb 58.2 according to two different protocols, either streptavidin capture of phages mixed with biotin-labeled MAb 58.2 (SA-Bio) or panning against MAb 58.2 immobilized on microwells (micropan) [19]. Phages selected with the SA-Bio protocol shared a consensus sequence (Y/L)(V/L/I)GPGRxF homologous to the V3 loop. The micropan protocol allowed for the identification of sequences sharing the same motif, of which two were also identified in the SA-Bio panning.
Biopanning results were further validated in peptide array hybridization assays. Hybridization of MAb 58.2 to 14-mer peptides containing all possible point substitutions within the V3 loop sequence demonstrated that both phage display and peptide array experiments identified the same critical amino acids, thereby confirming the quality of the 20-mer RPL and the validity of the screenings performed.
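The substitution scan used for such peptide arrays can be sketched in a few lines of Python. The sketch below is illustrative only: it assumes a 14-residue parent peptide and the 20 standard amino acids, and it uses the V3 sequence quoted above for 447-52D (KRKRIHIGPGRAFY) as a stand-in, since the exact array peptide used with MAb 58.2 is not given here. The function name is hypothetical.

```python
# Minimal sketch: enumerate every single-residue substitution of a parent peptide.
# Assumption: the 20 standard amino acids; the parent sequence is a stand-in.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def point_substitutions(peptide):
    """Return all peptides differing from `peptide` at exactly one position."""
    variants = []
    for pos, original in enumerate(peptide):
        for aa in AMINO_ACIDS:
            if aa != original:
                variants.append(peptide[:pos] + aa + peptide[pos + 1:])
    return variants


V3_14MER = "KRKRIHIGPGRAFY"  # V3 sequence quoted in the text for 447-52D
variants = point_substitutions(V3_14MER)
print(len(variants))  # 14 positions x 19 alternative residues = 266
```

Positions at which most substitutions abolish antibody binding would then be scored as critical residues, which is the read-out compared with the phage-derived consensus.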

Antibodies Directed against Viral Proteins

In parallel, Grihalde et al. constructed and panned a 30-mer RPL against MAb 1001, which recognizes a constrained linear epitope on the V3 loop [21] . Several clones were obtained and presented the common motif (R/K/H)xGR mimicking the crown of the V3 loop sequence, thereby confirming the epitope sequence of MAb 1001. To assess the reactivity of peptides deprived of the phage scaffold, the mimotope with the highest affinity for the MAb 1001 was expressed in fusion with the E. coli alkaline phosphatase. Binding of the phage and fusion protein to the MAb 1001 was assessed by ELISA, Western Blot and SPR assays and highlighted that binding was independent from the scaffold, although interactions were weaker when the peptide was displayed in the fusion protein format than in the phage scaffold.
The isolation of the BNtAb 2F5, which interacts with an epitope (ELDKWA) located on the gp41 MPER was reported in 1993 [39] . Conley et al. further characterized this epitope by biopanning a 15-mer RPL on immobilized 2F5 Ab. Different sequences were obtained and classified in four groups, whose consensus motifs, DKW, LDxW, ED(K/R)W and ELDKW, revealed information on the residues involved in 2F5 Ab recognition [24] .
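Consensus motifs written in this notation can be checked against a parent epitope programmatically. The following is a minimal sketch, assuming that "x" stands for any residue and that "(K/R)" lists the alternatives allowed at a single position; the helper name `motif_to_regex` is hypothetical and the literal string match is a simplification of real antibody recognition.

```python
import re


def motif_to_regex(motif):
    """Translate motif notation into a regex: x = any residue, (A/B) = alternatives."""
    # Turn "(K/R)" into the character class "[KR]".
    pattern = re.sub(
        r"\(([A-Z](?:/[A-Z])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    # Lower-case x is the wildcard position.
    return re.compile(pattern.replace("x", "."))


EPITOPE = "ELDKWA"  # 2F5 epitope as quoted in the text
MOTIFS = ["DKW", "LDxW", "ED(K/R)W", "ELDKW"]

matches = {m: bool(motif_to_regex(m).search(EPITOPE)) for m in MOTIFS}
for motif, found in matches.items():
    print(motif, found)
```

Under this literal reading, three of the four motifs occur in ELDKWA, whereas ED(K/R)W does not, because it lacks the Leu between Glu and Asp. A selected motif can therefore resemble the parent epitope without reproducing it exactly.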

Gp120 C1 Domain

RPL screening may also contribute to the elucidation of the antigen structure. To that purpose, Stern et al. used a 20-mer RPL to analyze two different mouse MAbs (GV1A8 and GV4D3) recognizing non-overlapping sequences between residues 1 and 142 of gp120 [27, 40] . Biopanning performed on GV1A8 allowed for identification of mimotopes sharing a (L/I)W motif identical to residues 111-112 of the gp120 C1 domain and highlighted a HxxIxxLW motif compatible with two turns of an α-helix. Computer modeling confirmed that such a structure placed the residues recognized by GV1A8 contiguously on one face of the helix while other secondary structures did not. Similarly, biopanning on MAb GV4D3 yielded sequences with a trend towards an Nx₃WxxD motif. The epitope maps to the FNMWKND sequence satisfying the helical motif FxxWxxD. In this study, the use of phage display not only predicted the α-helix structure of the C1 domain of gp120, but also pinpointed the contact residues defining the surface of the helix.
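The reasoning that a motif with i, i+3, i+6 and i+7 spacing lies on one face of a helix can be illustrated with a simple helical-wheel calculation. The sketch below assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue); it is not the computer modeling used by the authors, and the function names are hypothetical.

```python
DEGREES_PER_RESIDUE = 100.0  # ideal alpha-helix: 360 degrees / 3.6 residues per turn


def helical_angles(motif):
    """Angular position on the helical wheel of each fixed (non-x) residue."""
    return {
        f"{res}{i + 1}": (i * DEGREES_PER_RESIDUE) % 360
        for i, res in enumerate(motif)
        if res != "x"
    }


def arc_span(angles):
    """Smallest arc (in degrees) of the wheel that contains all given angles."""
    a = sorted(angles)
    if len(a) < 2:
        return 0.0
    # The occupied arc is the full circle minus the largest empty gap.
    gaps = [(a[(i + 1) % len(a)] - a[i]) % 360 for i in range(len(a))]
    return 360 - max(gaps)


for motif in ("HxxIxxLW", "FxxWxxD"):
    angles = helical_angles(motif)
    print(motif, angles, arc_span(angles.values()))
```

For both motifs the fixed residues fall within a 120° arc, that is, on one third of the helix circumference. This is consistent with the contact residues lying contiguously on a single face of the helix.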

Gp120 CD4-Binding Site

Dorgham et al. attempted to map the b12 epitope with an RPL of two random 10-mers joined through an ALLRY spacer (x₁₀ALLRYx₁₀) [31] . Selection resulted in the identification of clones sharing a (M/V)ArSD consensus motif (Ar standing for any aromatic residue) as previously observed [29] . Second- and third-generation semi-RPLs, containing the fixed consensus motifs identified in the previous panning surrounded by randomized residues, were then constructed (x₃(M/V)WSDx₃ and xLXVWxDExx). Phagotopes (phage particles displaying a particular peptide sequence selected on a given target) obtained from the first-, second- and third-generation libraries showed progressively increasing binding affinity for b12. Phagotopes were able to compete with gp160 for b12 binding and triggered the production of Abs capable of recognizing at least five distinct, unrelated HIV-1 strains. In contrast, the corresponding peptides were not able to compete for b12 binding and did not elicit anti-gp160 MAbs. Such discrepancies between phagotopes and peptides might be explained by constraints imposed by the phage scaffold.
At the same time, both a linear 9-mer RPL and a constrained 10-mer RPL were used in panning experiments against another gp120 CD4 binding site MAb (5145A) [33] . Screening of the 9-mer RPL resulted in selection of a single sequence (WKPVVIDFE), while screening of the 10-mer-c RPL on 5145A allowed for identification of a GPxEPxGxWxC consensus motif. Peptides were synthesized as peptide-pIII fusion proteins and their affinity for 5145A was assessed in phage/MAb and gp120/MAb binding inhibition assays. The two peptides with the highest affinity (AECGPAEPRGAWVC and AECGPYEPRGDWTCC) were used to immunize rabbits and elicited antibodies binding to recombinant monomeric gp120. Nevertheless, the Abs generated seemed to target a different epitope since they were unable to compete with the CD4-binding site-specific MAb 5145A.

Antibodies Directed against Host Proteins

The murine MAbs 3A9 and 5C7 were raised against cells transfected with the seven transmembrane-spanning domains chemokine receptor CCR5, one of the main coreceptors for HIV-1. They recognize a common epitope located near the CCR5 N-terminus [67, 68] . Both MAbs were used to screen a constrained 9-mer RPL [63] . Phagotopes selected on 3A9 displayed the sequence CHASIYDFGSC while CPHWLRDLRVC was the most prevalent sequence isolated on 5C7. These sequences showed homologies to residues located at the N-terminus but also within the first or third extracellular loop (ECL) of CCR5. Both reacted against the targeted MAb either in phage, cyclic peptide or linear peptide formats. Moreover, they were able to bind to gp120 and the peptide selected on 3A9 inhibited binding of the MAb to a cell line expressing CCR5. To further characterize the conformational epitope recognized by 3A9, additional screening rounds of 12-mer, 7-mer and 7-mer-c RPLs were performed [62] .
Sequences with an HW motif homologous to the CPHWLRDLRVC motif selected on the MAb 5C7 were identified, and Ala-scanning confirmed the importance of the HW motif and SIYD motifs previously identified for 3A9 binding [63] .
The CD18 cell surface molecule, a part of the LFA-1 molecule, is involved in the syncytia formation of HIV-1-infected lymphocytes [69] . As MAb MHM23, a CD18 binder, inhibits HIV-1-mediated cell fusion, Poloni et al. applied the phage display technology to map the MHM23 epitope and thereby identify the CD18 domains which account for syncytia formation [66] . Linear and constrained 9-mer RPL were panned on the MHM23 MAb, to allow for the selection of linear and constrained sequences. A PPFxYRK consensus motif was inferred by sequence comparison, assigning the epitope recognized by MHM23 to residues 200-206 of CD18. Two phagotopes inhibited in vitro HIV-1-induced syncytia formation and one of them retained this ability in the peptide format, confirming its role in syncytia formation and highlighting that mimics of this epitope could prevent cell-mediated viral propagation.

Inhibitors of HIV-1 Proteins

Most of the HIV-1 inhibitors selected with the help of phage display were identified by targeting viral proteins (Table 4 ).

Fab Libraries

The extremely high binding affinity of 3B3 was also applied to develop an immunotoxin which could specifically kill HIV-1-infected lymphocytes [71] . The authors engineered 3B3 ScFv fused to a truncated form of Pseudomonas exotoxin A. The 3B3(Fv)-PE38 fusion immunotoxin bound to the MN strain of gp120 with the same affinity as the parental Fab antibody and specifically killed a gp120-expressing cell line and a chronically HIV-infected lymphocytic cell line. This study provided the proof-of-concept that high affinity anti-HIV-1 antibodies have a dual application since they may be used for their neutralizing potency but also as carriers for antiviral compounds.

ScFvs Hydrolyzing gp120

Antibodies recognizing the amino acids 421-436 of the gp120 CD4BS were isolated from patients suffering from the systemic lupus erythematosus autoimmune disease [130, 131] . However, whether these antibodies neutralized HIV-1 was not known, which prompted Karle et al. to quantify gp120-recognizing Abs in an existing ScFv phage-displayed library from the PBMCs of lupus-suffering patients [132] . Biopanning selected for clones binding both gp120 and the 421-436 region of the gp120-CD4-binding site. One of these clones (JL413) neutralized R5 and X4-tropic HIV-1 primary isolates from clades B, C and D with IC50 values ranging from 0.1 to 25.6 µg/mL.
A subset of gp120-binding antibodies was shown to hydrolyze gp120 by a mechanism analogous to that of serine proteases [133] . As the nucleophilic region responsible for this activity was localized in the light chain [134, 135] , a library of light chains prepared from three lupus patients [132] was screened with an electrophilic analogue of gp120 residues 421-433 to isolate antibodies capable of binding and hydrolyzing gp120 [75] . One of the light chain clones selected (SKL6) cleaved a gp120 421-433-reporter substrate as well as full-length gp120. Engineering of Abs composed of such a light chain coupled to a gp120-binding heavy chain might provide Abs with anti-viral proteolytic activities.

Gp41 Heptad Repeat Inhibitors

To follow up on a study demonstrating that affinity-purified IgGs from rabbits immunized with N35 CCG N13 inhibited HIV-1-mediated fusion [156] , Nelson et al. rescued a ScFv antibody library from these animals [99] . Three N35 CCG N13 binders were selected, and one of them, 8K8, displayed neutralizing activity against HXB2. In parallel, a more complex Fab library was constructed from the FDA-2 HIV-1-positive patient from whom Z13 Ab had previously been isolated [85] . Screening this library against N35 CCG N13 allowed for the isolation of Fab DN9 [99] . Both ScFv 8K8 and Fab DN9 neutralized a panel of viral strains with IC50 values ranging from 50 to 500 nM and targeted the NHR trimeric coiled-coil, presumably close to the hydrophobic pocket. Three additional gp41-specific Abs (M44, M46 and M48) were obtained by screening antibody phage libraries from asymptomatic seropositive patients [159] against gp140 [100] [101] [102] . A recombinant gp140 (gp140R2) isolated from an asymptomatic seropositive patient with BNtAbs was reported to elicit BNtAbs in monkeys, further demonstrating that immunogenic epitopes were exposed on this recombinant antigen [101] . Competitive antigen panning (CAP) (a biopanning approach designed to outcompete phagotopes binding to an immunodominant region of a multi-domain target through concomitant addition of an excess of soluble forms of this immunodominant domain) using gp140R2 as antigen and gp120R2 as competitor resulted in the selection of the gp41-specific M46 Ab [101] . M46 displayed broad neutralization properties, recognized a conformational epitope and bound weakly to the 5-Helix antigen, but not to the trimeric NHR or to 6-HB.
In two other studies, the same libraries were panned against gp140/120 from three different isolates (89.6, cm243 and R2), which led to the identification of the M48 Ab recognizing a conformational epitope of gp140 [102] and the M44 Ab, which binds gp140, 5-Helix and 6-HB but not to the NHR trimeric coiled-coil. M44 recognized a conserved conformational epitope and neutralized isolates from different clades with a significantly higher potency than 4E10 or Z13 [100] . The competitive antigen panning approach against gp140/120 thus allows the selection of Abs recognizing conformational epitopes on gp41, which are not properly folded when gp41 is used as a target.

Viral Protein of Regulation (Vpr)

In 2003, Krichevsky et al. conducted a study to elucidate the exact role of Vpr and its contribution to the nuclear import process of the HIV-1 PIC [104] . To that aim, a semi-synthetic ScFv library [167] was screened against the N-terminal (AA 17-34) part of Vpr (VprN) conjugated to BSA (VprN-BSA). Purified ScFv fragments that bound strongly and specifically to the VprN sequence recognized full-length Vpr and inhibited Vpr-mediated nuclear import, indicating that targeting Vpr may lead to the development of new peptides to fight viral infection.

Reverse Transcriptase (RT)

Two years later, a semisynthetic phage display library of human ScFvs with randomized heavy and light chain CDR3 was screened against recombinant RT [120] . Five different ScFv Abs directed against RT were isolated, of which three (F-6, 6E9, 5B11) inhibited the RDDP activity of RT; of note, F-6 also inhibited the DNA-dependent DNA polymerase (DDDP) activity of RT. Synthesis of the peptides corresponding to the CDR3 regions of the heavy and light chains showed that the heavy chain CDR3 inhibited RDDP activity while the light chain peptide had no effect. These HCDR3 peptides represent the smallest antibody fragments inhibiting RT identified to date and demonstrate that the HCDR3 repertoire is a potential source of bioactive molecules (see Section 3.2.1.2.).

CCR5 Coreceptor

The CC chemokine receptor 5 (CCR5) is one of the two major HIV-1 coreceptors and binds three different endogenous chemokines CCL5 (RANTES), CCL4 (MIP-1β) and CCL3 (MIP-1α) which were reported to prevent R5-tropic HIV-1 entry. Interestingly, inhibition of CCR5 binding to HIV-1 provides an almost complete protection against R5-tropic viruses with only minor effects on the normal physiological functions of the cells [183] .

CXCR4 Coreceptor

In 2010, Jahnichen et al. isolated llama-derived VHHs binding specifically to CXCR4 and inhibiting the entry of X4-tropic virus [187] . To select VHHs binding exclusively to functional and properly folded receptor, llamas were immunized with CXCR4-expressing HEK293T cells. A phage library was subsequently constructed from the PBMCs of immunized camelids and several phage clones inhibiting the binding of labeled CXCL12 chemokine to the receptor were identified. In particular, two VHHs (238D2 and 238D4) showed low nanomolar affinity for the receptor and inhibited entry of X4 and X4/R5-viruses into different CXCR4+ cell types with IC50 values ranging from 10 to 100 nM. Dimerization of 238D2 and 238D4 to form biparatopic proteins increased their antiviral properties to IC50 values in the picomolar range. Epitope mapping revealed that the two VHHs inhibited CXCR4 mainly through binding to the second extracellular loop.
Very recently, we used a peptide corresponding to this particular extracellular loop (ECL2) as target to identify short CXCR4 antagonists [182] . By screening a non-immune phage library displaying the human HCDR3 peptide repertoire [194] , several small peptides binding to the ECL2 peptide that specifically recognized CXCR4-expressing cells were identified. Notably, one of these HCDR3 peptides (TYPGRY) acted as a CXCR4 antagonist with potency in the micromolar range.

Other Host Protein Inhibitors

Besides therapeutic purposes, the phage display technology has also been applied for fundamental studies of host proteins. In particular, a study focusing on the roles and the diversity of the anti-CD34 autoimmune repertoire in the myelosuppression appearing in HIV-1 infected individuals was reported by Rubinstein et al. [195] . Using a subtractive biopanning procedure from the immune repertoire of an HIV-1 seropositive patient, the authors selected Fab fragments binding specifically to the CD34 receptor. Sequencing and binding analyses of these antibody fragments demonstrated the heterogeneous origin of the anti-CD34 autoimmune repertoire and suggested that these autoantibodies might be generated through antigen-specific driven processes.

Phage Substrate

The use of phages for protease cleavage specificity profiling was first described by Matthews and Wells in 1993 [197] . This "phage substrate" approach relies on the use of phage particles to screen for enzyme substrates instead of classical binder selection. Protease cleavage profiling using phage takes advantage of the natural resistance of phage particles to proteolysis. Phage particles displaying random peptides are immobilized on a solid support and submitted to proteolytic elution to specifically release phages presenting peptides corresponding to the protease cleavage site. The phage substrate approach thus allows rapid determination of the cleavage profile of a given protease and provides optimized substrate candidates which can be further used as leads for the development of specific inhibitors. Over the last two decades, phage substrate has been applied to a large variety of proteases including the HIV-1 protease (PR) [198, 199] . The HIV-1 PR is a homodimeric aspartic protease responsible for nine critical cleavage steps within both the structural (Gag) and the non-structural (Gag/pol) polyproteins. The HIV-1 protease recognizes substrate residues encompassing positions P4 to P3' (Schechter and Berger's nomenclature), with the primary determinants located at positions P2 to P2' [200, 201] . Interestingly, alignment of the nine natural substrate cleavage sites of the HIV-1 PR shows a high sequence diversity, suggesting a broad proteolytic specificity. In 2000, Beck et al. reported the use of a hexapeptide phage library to unravel the HIV-1 PR specificity and develop new protease inhibitors (Table 6 ) [199] . This library was constructed by fusing the MAb 3-E7 epitope upstream of the randomized sequences. Phages were first incubated with the HIV-1 PR and uncleaved phages were removed by addition of pansorbin cells. Biopanning selected for highly diverse sequences consistent with the suggested broad substrate specificity of the HIV-1 protease. However, none of the selected peptides corresponded to the HIV-1 polyprotein cleavage sites.
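The Schechter and Berger nomenclature referred to above numbers substrate residues outward from the scissile bond: P1, P2, P3... towards the N-terminus and P1', P2', P3'... towards the C-terminus. A minimal sketch of this labeling is given below; the heptapeptide is an illustrative sequence chosen for the example, not one taken from the studies discussed here, and the function name is hypothetical.

```python
def label_subsites(peptide, n_upstream):
    """Label residues relative to a scissile bond located after `n_upstream` residues.

    Residues N-terminal to the bond are P1, P2, ... (counting away from the bond);
    residues C-terminal to the bond are P1', P2', ...
    """
    labels = {}
    for i, res in enumerate(peptide):
        if i < n_upstream:
            labels[f"P{n_upstream - i}"] = res
        else:
            labels[f"P{i - n_upstream + 1}'"] = res
    return labels


# Illustrative heptapeptide spanning P4 to P3' (assumption, for demonstration only).
site = label_subsites("SQNYPIV", n_upstream=4)
print(site)

# Residues at the positions described as primary determinants (P2 to P2').
core = [site[k] for k in ("P2", "P1", "P1'", "P2'")]
print(core)
```

Aligning selected phage inserts in this frame makes it possible to compare residue preferences position by position, for instance between library-derived substrates and the natural cleavage sites.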

Conclusions and Future Challenges

Some of the most potent inhibitors originated from Fab libraries derived from asymptomatic HIV-1-infected patients whose reactivity against HIV-1 was previously assessed (b12, X5, Z13) or from immunized camelids (VHH anti CXCR4). Other inhibitors were identified from semi-synthetic (ligand analogues of CCL5) or randomized peptide libraries (D-peptides: PIE12 3 ) and their affinity was improved through secondary libraries. These inhibitors were selected against different targets (Env proteins, coreceptors) by biopanning carried out using different types of support (immobilized proteins, cells, peptides), illustrating the power and versatility of the phage display technology [41, 83, 85] (Figure 5 ).
Inhibitors blocking key steps in the entry process were identified using the phage display technology. These inhibitors target: the CD4 binding site (Fab b12 and Z13), the coreceptors CCR5 (CCL5 variants) or CXCR4 (VHH 238D2 and 238D4), the CD4-induced epitope of gp120 (Fab X5) or the heptad repeat region of gp41 (peptide PIE12 3 ).

Introduction

Upon CD4 receptor binding, glycoprotein gp120 undergoes conformational changes exposing the V3 loop, a region that further interacts with the chemokine receptors CCR5 or CXCR4 thereby promoting viral entry [8] (Figure 1 ). Coreceptor binding leads to the insertion of the gp41 fusion peptide into the cell membrane, the creation of a hairpin loop intermediate and finally the fusion of both viral and cell membranes. The viral capsid then enters the cell and the genetic material is released in the cytoplasm. Most viral strains use only one coreceptor to enter host cells and are classified accordingly as CCR5- (R5 strains) or CXCR4-tropic (X4 strains), although viruses with broadened coreceptor usage (dual-tropic) have also been described. R5 viruses infect macrophages and CCR5-expressing T lymphocytes, and are mainly associated with transmission. In contrast, X4 viruses infect CXCR4-expressing T-cells and T-cell lines, and often appear at the later stages of infection. The envelope glycoprotein gp120 is composed of variable and more constant regions. Several studies demonstrated that the elicitation or binding of effective neutralizing antibodies is impaired by the gp120 glycan shield or steric hindrance of its constant regions [9] . Moreover, variable immunodominant domains were shown to be recognized by non-neutralizing antibodies. Nonetheless, it is estimated that 10% to 30% of HIV-1-positive subjects develop neutralizing antibodies (NtAbs) appearing at least 1 year after infection. Only 1% of infected patients develop a broad neutralizing response against heterologous virus strains [10] . Among HIV-1-infected patients, such antibodies arise only rarely and late, and thus control viral replication inefficiently. However, the recent identification of broadly neutralizing antibodies (BNtAbs) and mapping of their epitopes fueled interest in the humoral immune response against HIV-1 (reviewed by Overbaugh [11] ).
Bacteriophages (phages) are bacteria-infecting viruses whose DNA or RNA genome is packed in a capsid composed exclusively of surface proteins. The principle of phage display relies on cloning of exogenous DNA in fusion with the phage genetic material allowing the display of foreign peptides in an immunologically and biologically competent form at the surface of phage capsid proteins [12] . The significance of phage display was first demonstrated for filamentous phages such as M13, fd or related phagemids and later extended to lytic bacteriophages λ, T4 and T7 (reviewed by Beghetto [13] ). The phage biopanning process consists of iterative cycles of binding, washing and elution steps leading to the progressive selection of phages displaying peptides/proteins binding to the target of interest [14] . The target is usually immobilized on a solid support which can be plastic, beads or even cells.
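The progressive enrichment achieved by iterative biopanning cycles can be illustrated with a simple two-population model. In this sketch, the recovery values (the fraction of binding and non-binding phages that survive one cycle of binding, washing and elution) and the starting frequency are arbitrary assumptions chosen for illustration, not measured parameters, and the model ignores amplification bias between rounds.

```python
def enrich(binder_fraction, recovery_binder, recovery_background, rounds):
    """Fraction of target-binding phages in the pool after each panning round.

    Each round keeps `recovery_binder` of the binders and `recovery_background`
    of the non-binders; the eluted pool is then renormalized (amplified).
    """
    history = [binder_fraction]
    f = binder_fraction
    for _ in range(rounds):
        kept_binders = f * recovery_binder
        kept_background = (1 - f) * recovery_background
        f = kept_binders / (kept_binders + kept_background)
        history.append(f)
    return history


# Hypothetical values: 1 binder per million phages, 10% recovery of binders,
# 0.01% carry-over of non-binders per round.
history = enrich(1e-6, recovery_binder=0.1, recovery_background=1e-4, rounds=3)
for round_number, fraction in enumerate(history):
    print(round_number, f"{fraction:.6f}")
```

With these assumed values the odds of picking a binder increase 1000-fold per round, so that a clone present at one in a million dominates the pool after three rounds. This is why several iterative cycles are typically performed before individual clones are sequenced.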

Antibodies Directed against Viral Proteins

Epitope mapping was also performed on a monoclonal antibody (MAb 19b) isolated from an asymptomatic HIV-1-infected patient and recognizing the xxIx₃PGRAFYTT motif within the V3 loop sequence (KRIHIGPGRAFYTT) [38] . Binding of MAb 19b to viral isolates presenting mutations in this sequence revealed that not all residues within this recognition motif were crucial for reactivity [20] . Biopanning with a 15-mer RPL resulted in the selection of sequences compatible with the minimal binding site (-I----G--FY-T) inferred from gp120 sequence alignment from clades A to F which bound MAb 19b. Taken together, data from binding assays as well as phage biopanning experiments demonstrated that the MAb 19b epitope spans both sides of the V3 loop. Substitutions within the residues located at the crown of the loop are however tolerated, provided that the formation of a β-turn induced by the GPGR crown motif is allowed. However, Boots et al. reported one exception: the Phe to Trp substitution may be tolerated in the absence of a β-turn [18] .
In another study, Laisney et al. investigated the minimal epitopes recognized by two MAbs interacting with the V3 loop, 110-A and 19.26.4, whose specificity is strictly restricted to the X4-tropic LAI isolate [22] . The screening of a 6-mer RPL on the MAb 110-A allowed the selection of numerous sequences with a consensus motif. Binding assays with synthetic peptides further showed that both MAbs reacted with residues 316-320 of the LAI gp120. In this narrow region, the minimal epitope deduced for the MAb 110-A was HyxRGP, whereas the MAb 19.26.4 recognized the xQ(R/K)GP motif (Hy: non-aromatic AA, underlined: AA tolerating substitutions). Interestingly, the essential QR residues located at positions 317-318 correspond to a QR insertion located upstream of the V3 loop GPGR crown motif that is characteristic of the LAI isolate and may thus explain the restricted specificity of the two MAbs. The same authors screened a 6-mer RPL on the MAb 268, specific to the V3 loop of the MN isolate, and identified two groups of sequences [23] . A representative sequence from the first group (268.1, HLGPGR) corresponded to the crown of the V3 loop, a linear epitope, while two sequences of the second group (268.2, KAIHRI and 268.3, KSLHRH) showed no homology to linear HIV-1 epitopes. Both peptides 268.1 and 268.2 nevertheless inhibited the interaction of MAb 268 with gp120, and were even able to compete with each other for binding to the antibody, indicating that peptide 268.2 was also a mimotope of peptide 268.1. When conjugated to KLH and injected separately into rabbits, both peptides 268.1 and 268.2 were able to elicit gp120-reacting antibodies that partially competed with the homologous peptide, confirming that the 268.1 and 268.2 peptides are both antigenic and immunogenic mimics of the gp120 MN V3 loop.

Gp120 CD4-Binding Site

The BNtAb IgG1 b12 was the first neutralizing MAb selected from a phage-displayed Fab (antibody fragment composed of one constant and one variable domain of the heavy (CH1 and VH) and the light (CL and VL) chains linked together) library derived from an HIV-1-infected donor (See section 3.1.1.1.1.) [41] . This antibody recognizes a conformational epitope overlapping the CD4-binding site of gp120 [42] . Attempts to precisely map the residues interacting with the IgG1 b12 MAb with 15-mer and 21-mer RPLs provided no consensus sequence [18] . As previous screening of 11 cysteine-enriched peptide libraries resulted in the identification of two sequences bearing an SDL motif flanked by one or two cysteine residues (REKRWIFSDLTHTCI and TCLWSDLRAQCI) [30] , Zwick et al. constructed two sublibraries (x₇SDLx₃CI and xCxxSDLx₃CI) sharing the SDL motif and reflecting the cysteine content of the two clones [29] . The B2.1 peptide (HERSYMFSDLENRCI), which contains a single cysteine, bound b12 in Fab as well as IgG1 formats with a much higher affinity than the other clones. Moreover, the phage-borne B2.1 peptide was used to screen the Fab library from which b12 was identified. This "reverse panning" experiment showed that B2.1 was able to select only the Fab sequence corresponding to b12, confirming the specificity of the B2.1 mimotope towards the b12 Ab. The B2.1 peptide was immunogenic in mice and rabbits but did not elicit significant titers of anti-gp120 cross-reactive Abs.
Detailed characterization of the BNtAb b12 was conducted with the Mapitope algorithm developed by Enshell-Seijffers et al. to facilitate the identification of discontinuous epitopes. This approach is based on the assumption that the collection of mimotopes recognized by a given antibody must in some manner reflect the antibody's paratope [32] . A constrained 12-mer RPL was screened against b12 and selected sequences were compared to those obtained from previous panning experiments performed against b12 [18, 29, 30, 43] . Although no similarity was observed with the mimotopes selected by Boots et al. [18] , a consensus WSDL motif was observed in the newly identified mimotopes and the sequences isolated by Bonnycastle et al. [29, 30] . Mapitope analysis conducted on these sequences as well as on the peptide sets isolated by Boots and Bonnycastle resulted, for each of the three panels, in the prediction of two clusters located at the periphery of the CD4 binding site.

Other Domains

The extreme C-terminus of gp120 forms a pocket which may interact with gp41 and was suggested to undergo conformational changes weakening the interaction between gp120 and gp41 upon CD4 binding. In the absence of available crystallographic information, Ferrer et al. utilized the mouse MAb 803-15.6 to analyze an epitope overlapping with this pocket region [34] . Epitope mapping of MAb 803-15.6 achieved by cross-blocking experiments on gp120 suggested that the Ab recognized residues 502-516 while the screening of a heptapeptide RPL against MAb 803-15.6 preincubated with gp120 allowed for the recovery of phages presenting an AxxKxRH motif homologous to residues 502-508. Affinity studies confirmed that Ala was the N-terminal residue of the MAb 803-15.6 epitope and showed that affinity increased when C-terminal residues were added. The Mapitope algorithm designed by Enshell-Seijffers et al. was initially developed to elucidate the CD4-induced epitope recognized by the MAb 17b [32] . Screening of a 12-mer-c RPL yielded sequences with no homology to gp120. Comparison of the mimotopes to the gp120 structure in complex with MAb 17b and sCD4 predicted candidate epitopes that were in agreement with the actual 17b contact residues. For further validation of the algorithm, RPLs were screened against the p24-specific MAb 13b5 and analysis of the selected sequences predicted four clusters, the largest of which corresponded to the genuine epitope. The algorithm was finally applied to the MAb CG10, an Ab with an unknown epitope competing with the MAb 17b for binding to the CD4/gp120 complex. Mimotope sequences were analyzed and produced seven clusters, one of them being in accordance with previous mutation analysis impeding MAb CG10 binding [44] . Notably, when reconstituted in a phage scaffold, the epitope was capable of binding MAb CG10.
After having successfully identified linear or nearly linear epitopes [17, 20, 24] , Boots et al. extended the use of the phage display technology to the identification of epitopes recognized by MAbs binding to discontinuous sequences [18] . One of these Abs (MAb A32) binds to a CD4-induced discontinuous epitope involving residues within the C1, C2 and C4 regions of isolates from clades B, C, D, E and F [45, 46] . Panning of a 15-mer RPL yielded several phages which only shared a Trp residue. In the same study, panning of a 15-mer RPL against MAb 50-69, which reacts with the ID GKLIC region of gp41, resulted in the identification of sequences sharing a common Trp within motifs WGCx(K/R)xLxC and FGxWFxMP. The selected consensus sequences were however not further characterized.
The BNtAb 2G12 presents the typical feature of recognizing a cluster of high-mannose oligosaccharides of gp120 [47] [48] [49] . In an attempt to identify peptidic immunogens capable of eliciting 2G12-like Abs, Menendez et al. screened a set of previously described RPLs [30] against 2G12 and identified one phagotope specifically binding to 2G12 (2G12.1) [35] . The crystal structure of MAb 2G12 complexed to the synthetic 2G12.1 peptide was compared to structures of 2G12-oligomannose epitopes and revealed that interactions with the Abs were different for the two ligands. These results showed that the peptide selected from RPL panning experiments is not a structural mimic of the 2G12 oligomannose epitope. The phagotope 2G12.1 was used in rabbit immunization experiments and elicited high titers of peptide-specific antibodies, but no cross-reactivity with gp120 was obtained, further supporting that peptide 2G12.1 is not an immunogenic mimic of the MAb 2G12 epitope.

Polyclonal Antibodies Directed Against Viral Epitopes

The first attempt at identifying epitopes recognized by HIV-specific PAbs was performed in 1999 on plasma IgG from two LTNP patients (Table 2 ). Using linear and constrained 9-mer RPLs, Scala et al. identified mimotopes of the linear immunodominant (ID) GKLIC region of gp41 or the V1 and C2 domains of gp120 [50] . These mimotopes were immunogenic when injected into mice and elicited an NtAb response against HIV-1. Moreover, the same mimotopes reduced viraemia to undetectable levels in immunized monkeys as shown in a subsequent study [51] . The same year, a similar study conducted on one LTNP with an RPL of cysteine-constrained 12-mers selected for peptides defining the gp41 ID epitope CSGKLIC. The levels of reactivity of these phagotopes were further assessed against a panel of HIV positive plasma to evaluate the plasticity and polyclonality of the immune response mounted by 30 infected individuals [52] . Later, Palacios-Rodriguez et al. evaluated the impact of factors such as Highly Active AntiRetroviral Treatment (HAART) or Ab titers on a selection of peptides mimicking the ID epitope CSGKLIC [53] . In their study, a mix of linear 12-mer as well as linear and constrained 7-mer RPLs was screened against the individual plasma samples of four HIV-1 infected patients initiating HAART and presenting different titers of anti-GKLIC antibodies. A consensus motif CxxKxxC was obtained from the 12-mer linear RPL, and the percentage of occurrence of the motif in the selected sequences was proportional to the anti-GKLIC Ab titers of each sample, indicating that these Abs are involved in selection of the consensus motif. Mouse immunization experiments with the two mimotopes most closely resembling the gp41 ID parental epitope, as well as with pools of phages eluted from the panning experiments, showed that all phages elicited reactivity, and that immunization with the phage eluates induced the strongest recognition.
These findings indicate that the immunogenic properties of mimotopes are different and additive, opening the possibility of immunizing animals with different mimotope combinations (see Section 4).
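The percentage of occurrence of a consensus motif among selected inserts, as discussed for CxxKxxC, can be computed with a simple pattern match. The following is only a minimal sketch: the function name `motif_fraction` and the four 12-mer sequences are invented for illustration and are not data from the cited study.

```python
import re

# CxxKxxC as written in the text: 'x' stands for any residue.
MOTIF = re.compile(r"C..K..C")


def motif_fraction(sequences):
    """Return the fraction of sequences that contain the CxxKxxC motif."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if MOTIF.search(seq.upper()))
    return hits / len(sequences)


# Hypothetical 12-mer inserts, made up for this example only.
selected = ["ACSGKLICTTAV", "QWERTYIPASDF", "GCAAKPPCLMNH", "MNPQRSTVWYAC"]

fraction = motif_fraction(selected)
print(f"{fraction:.0%} of inserts carry CxxKxxC")
```

Computing this fraction per plasma sample would give the quantity that the study related to anti-GKLIC Ab titers.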
In 2007, Humbert et al. investigated the immune response of eight LTNP patients presenting BNtAbs. By using linear and constrained RPLs, they identified epitopes recognized by plasma IgGs captured on tosylactivated beads [54] . Each panning round consisted of a positive selection performed on LTNP IgGs followed by a negative selection on the IgGs of healthy donors. Homologies of some selected sequences to immunodominant regions such as the gp120 V3 loop or the gp41 GKLIC region were observed, as reported in previous studies [50, 52, 53] . Further homologies to linear motifs located near the V3 loop (NNNT), downstream of the ID GKLIC region (AVPW motif) and overlapping with the 2F5 BNtAb epitope (PPWx3W motif) were also identified. Additionally, the authors applied the 3DEX software to compare the phage insert sequences to HIV-1 protein structure files from the RCSB Protein Data Bank (www.pdb.org) [59, 60] . Phage pools corresponding to the linear V3 loop, GKLIC domain and WxxxW motif, as well as pools representing potential conformational epitopes, were selected for mouse immunization assays and elicited plasma-associated neutralizing activity against primary HIV-1 strains. The highest neutralizing ability was obtained in mice immunized with the V3 mimotopes, although immunization with potential conformational epitopes also provided a modest neutralizing response.
A similar approach was used by the same authors on a rhesus macaque infected with an SHIV chimera encoding the env of a clade C HIV-1 strain (SHIV1157ip) and presenting a broad neutralizing response against homologous SHIV-C as well as heterologous HIV-1 strains of different subtypes [61] . Biopanning yielded clones similar to gp120 (V2 and V3 loops or C-terminal domain) or to regions of gp41 (ID GKLIC region, other ID regions and MPER domain) [55] . Remaining clones showed no significant homology to linear HIV-1 regions and were analyzed with the 3DEX software, which allowed the identification of a discontinuous mimotope located near the V3 loop crown. The antibodies binding to this phagotope were affinity-purified and subsequent assays demonstrated that recognition was conformation-dependent.
An immunofocused immunization was then set up in which mice were primed with a DNA vector coding for the gp160 of SHIV1157ip and boosted with pools of phage particles corresponding to the V3 loop, the gp120 C-terminus, the gp41 ID region, the GKLIC region and the MPER domain. Almost all mice developed anti-Env Abs, and 59% of them showed neutralizing activity.
In 2009, Dieltjens et al. applied phage display technology to identify the epitopes potentially involved in the BNtAb response of an HIV-1 CRF02_AG-infected individual (ITM4) and to monitor the evolution of the humoral response and viral escape over the course of infection [56] . Biopanning of a 12-mer RPL against plasma samples from ITM4 resulted in the identification of different peptide sequences. Half of these sequences were homologous to linear epitopes on gp41, i.e., the 4E10 epitope region in the MPER domain (NWFNLTQTLMPR) or the lentivirus lytic peptide 2 (LLP2) (SLxxLRL), while the other peptides shared homologies with the C1 domain (KxWWxA) and the crown of the V3 loop (Kx3IGPHxxY) of gp120. Further analysis of the reactivity of the phage groups against six-year follow-up samples from ITM4 revealed different temporal patterns of recognition, confirming the dynamic nature of the immune response. Interestingly, the MPER region was the only epitope retaining immunogenic properties during this period.
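Motif shorthand such as SLxxLRL or KxWWxA, where x denotes any residue and x followed by a number denotes that many residues, translates directly into a regular expression. The sketch below is illustrative only: the helper `motif_to_regex` and the test peptide are assumptions introduced here, not tools or sequences used in the cited studies.

```python
import re

# One token is either 'x' with an optional repeat count, or an uppercase residue.
TOKEN = re.compile(r"x(\d*)|([A-Z])")


def motif_to_regex(motif):
    """Translate shorthand such as 'Kx3IGPHxxY' into a compiled regex.

    Spaces are ignored, so 'Kx 3 IGPHxxY' is read the same way.
    """
    compact = motif.replace(" ", "")
    pattern = []
    pos = 0
    while pos < len(compact):
        m = TOKEN.match(compact, pos)
        if m is None:
            raise ValueError(f"unexpected character {compact[pos]!r} in motif")
        if m.group(2):
            # Fixed residue: keep the letter as a literal.
            pattern.append(m.group(2))
        else:
            # Wildcard: one residue, or as many as the count says.
            count = int(m.group(1)) if m.group(1) else 1
            pattern.append("." if count == 1 else f".{{{count}}}")
        pos = m.end()
    return re.compile("".join(pattern))


v3_crown = motif_to_regex("Kx3IGPHxxY")
# Hypothetical peptide, invented for this example only.
print(bool(v3_crown.search("AKRSAIGPHQAYT")))
```

Such a pattern can then be used to scan selected phage inserts or Env sequences for candidate matches; it captures linear homology only and says nothing about conformational epitopes.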
In a more recent study, the same group investigated the antigenic landscape of an HIV-1 subtype A-infected individual with BNtAbs by screening an RPL against a pool of sequential samples drawn from 1994 to 2005 [57] . The biopanning procedure yielded sequences predicted to represent the autologous V2 sequence (Kx3Hx3Y), the V3 loop (KxxHxGPx3F) and the gp41 ID domain (CxGxLxCTxNxP). Again, follow-up sample recognition of the four phage groups showed different patterns. Antibody reactivity towards the gp41 ID region fluctuated slightly in all plasma samples. Reactivity against the V3 loop-like phages decreased over time. In contrast, the V2 loop mimotopes were not recognized before 2001, but once it emerged, reactivity persisted until 2005. Env sequence analysis of the follow-up samples showed that a Tyr to His mutation in the V2 loop sequence coincided with the emerging antibody response against this sequence. Additionally, the authors highlighted that the neutralizing activity observed in the samples was partially due to antibodies recognizing the V3 mimotopes.
Besides the multiple reports on the use of RPLs to characterize the humoral response against HIV-1 Env proteins, Gupta et al. evaluated the reliability of using targeted antigen gene fragment libraries for the identification of epitopes recognized by antibodies elicited in rabbits immunized with p24. To this end, they constructed a phage library composed of DNase-digested fragments of Gag DNA [58] . Phagotopes obtained after the first panning round displayed mainly 30-40-mer peptides, 70% of which mapped to the N-terminus of p24 (150-240 of Gag) and 30% of which corresponded to the C-terminal region of p24 (310-360 of Gag). Only one phagotope mapped to the central region of Gag (269-310). At the end of the second round, selected phages displaying longer inserts of 40 to 50 AA corresponding to the N- and C-terminal regions of Gag were identified, revealing the presence of two distinct antigenic regions in Gag. This study demonstrated that gene-fragment phage display could be used to identify epitopes targeted by polyclonal Abs.
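Assigning gene-fragment inserts to antigenic regions, as in the Gag mapping described above, amounts to an interval-overlap calculation. The sketch below uses the three Gag ranges quoted in the text, treated here as inclusive residue positions (an assumption); the fragment coordinates and the function names are hypothetical and do not come from the cited study.

```python
from collections import Counter

# Gag ranges quoted in the text, assumed to be inclusive residue positions.
REGIONS = {
    "p24 N-terminal": (150, 240),
    "central": (269, 310),
    "p24 C-terminal": (310, 360),
}


def overlap(a, b):
    """Number of positions shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def assign_region(fragment):
    """Return the region with the largest overlap, or None if there is none."""
    best = max(REGIONS, key=lambda name: overlap(fragment, REGIONS[name]))
    return best if overlap(fragment, REGIONS[best]) > 0 else None


# Hypothetical insert coordinates on Gag, invented for this example only.
fragments = [(160, 195), (170, 205), (315, 350), (275, 305)]

tally = Counter(assign_region(f) for f in fragments)
for region, count in tally.items():
    print(f"{region}: {count / len(fragments):.0%}")
```

Choosing the region of largest overlap is one simple convention; fragments spanning two regions could equally be counted in both.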