
Abstract

in the P1 region and non-structural proteins in the P2 (2A-2C) and P3 regions (3A-3D) following proteolytic cleavage ( Figure 1B) . The viral capsid proteins VP1, VP2 and VP3 are displayed on the external structures of the EV-A71 viral particle whereas VP4 is found within the internal structures of the capsid [21] .
RNA viruses replicate using low-fidelity polymerases and have high mutation rates, so the resulting virus population tends to exist as a distribution of mutants. In this review, we explore how genetic events such as spontaneous mutations could alter the genomic organization of RNA viruses in ways that affect virus replication and plaque morphology. The phenomenon of quasispecies within a viral population is also discussed in relation to virulence and its implications for RNA viruses. An understanding of how such events occur will provide further evidence about whether there are molecular determinants for plaque morphology of RNA viruses or whether different plaque phenotypes arise due to the presence of quasispecies within a population. Ultimately, this review gives an insight into whether the intrinsically high error rates due to the low fidelity of RNA polymerases are responsible for the variation in plaque morphology and diversity in virulence. This can be a useful tool in characterizing mechanisms that facilitate virus adaptation and evolution.

Introduction

With their differences in size, structure, genome organization and replication strategies, RNA viruses are recognized as being highly mutable [1]. Their high mutation rates make it very difficult for therapeutic interventions to work effectively, and very often they develop resistance to antiviral drugs and to antibodies elicited by vaccines [2, 3]. This poses a real threat to how emerging infectious agents could be prevented or treated [4]. The evolutionary success of RNA viruses arises from their capacity to utilize varying replication approaches and to adapt to a wide range of biological niches encountered during viral spread in the host. One of the factors affecting the emergence or re-emergence of infectious diseases is the genetics of the infectious agents [1].
Some general consensus regarding quasispecies has been established. For instance, the presence of diverse mutants in a population of viruses is a reality which can affect the biological behavior of the virus in vivo due to the complexity and amplitude of the mutant spectra [14]. Moreover, interactions amongst variants of a quasispecies population were classified into three types, namely cooperation, interference and complementation. Cooperative interactions arise from variants exhibiting advantageous phenotypes compared to the wild-type, while interfering interactions are exemplified by variants with detrimental effects on the replication of the virus. However, complementation interactions have no positive or negative effects on the virus population [15].
The phenomenon of quasispecies has been well reported for many viruses belonging to different families and genera [16] [17] [18] . There is no clear idea whether emerging viruses such as Zika virus, Ebola virus, West Nile virus, Dengue virus and many others owe part of their evolution to higher virulence conferred by the presence of specific quasispecies within the viral population. Exploring the underlying mechanism of virulence stemming from a quasispecies population remains of interest. In this review, we examine reported cases of quasispecies and their implications on virulence.

Poliovirus (PV)

Poliovirus is found within the Human Enterovirus C species of the Picornaviridae family and can be classified into three distinct serotypes (1, 2 and 3). Most poliovirus infections cause an asymptomatic incubation period followed by a minor illness characterized by fever, headache and sore throat which mainly affects children. However, PV infections can lead to paralytic poliomyelitis which can result in death. Following the WHO 1988 polio eradication program, the number of poliomyelitis cases has been reduced by 99% worldwide but a small number of countries still have sporadic outbreaks of polio [22].

Enterovirus 71 (EV-A71)

EV-A71 belongs to the genus Enterovirus within the family Picornaviridae. It was first characterized in 1969 in California, USA [25] and is one of the main etiological agents of hand, foot and mouth disease (HFMD) [26]. Some cases of EV-A71 infections have been associated with neurological complications such as aseptic meningitis, brainstem encephalitis and acute flaccid paralysis [27]. In China, enteroviruses such as EV-A71 and CV-A16 caused 7,200,092 cases of HFMD between 2008 and 2012. The mortality rate was highest among children below the age of five. It was further reported that 82,486 patients developed neurological complications and 1617 deaths were confirmed by the laboratory to be caused by EV-A71 [28, 29].
The course of evolution through which EV-A71 evolves to escape the central nervous system (CNS) was investigated by complete sequencing and haplotype analysis of the strains isolated from the digestive system and the CNS. A novel bottleneck selection was revealed in various environments such as the respiratory system and the central nervous system throughout the dissemination of EV-A71 in the host. Consequently, a dominant haplotype resulting from the bottleneck effect caused a change from viruses harboring VP1-31D to VP1-31G, where amino acid 31 was a favorable site of selection among the circulating EV-A71 sub-genotype C2. VP1-31G was present at elevated levels amongst the population of mutants of EV-A71 in the throat swabs of subjects with severe EV-A71 infections. Furthermore, in vitro studies showed that VP1-31D virus isolates had higher infectivity, fitness and virion stability, which sustained the virus infections in the digestive system. It was speculated that such factors benefitted the virus in gaining added viral adaptation and subsequently enabled viral spread to more tissues. These beneficial abilities could also explain the reduced number of VP1-31D viruses located in the brain following positive selection. The VP1-31G viruses presenting the major haplotype in the central nervous system displayed increased viral fitness and growth rates in neuronal cells. This implied that the VP1-31G mutations aided the spread of the mutant virus in the brain, which resulted in serious neurological complications in patients. It was speculated that the fluctuating degree of tissue tropism of EV-A71 at diverse inoculation sites resulted in the bottleneck effect of the viral population having a mutant spectrum. Hence, the adaptive VP1-31G haplotype became dominant in neuronal tissues and once the infection was achieved, VP1-31G viruses expedited bottleneck selection and propagation into the skin and CNS.
Among the three minor haplotypes (C to E) which co-existed in various tissues, the minor haplotype C was isolated in the intestinal mucosa and throat swab specimens. The minor haplotype D was isolated from specimens obtained from the respiratory and digestive systems. However, the minor haplotype E appeared in throat swabs and the basal ganglia but not the intestinal mucosa, hence, suggesting that the intestinal mucosa is the initial replication site of the EV-A71. Collectively, these data showed that the EV-A71 quasispecies utilized the dynamic proportion of varying haplotype populations to co-exist, sustained the ability of the population to adapt and enabled the propagation in different tissues. Lastly, the study concluded that the selection of haplotype(s) might be a driving factor in viral dissemination and severity of infections in humans as well as the virulence in EV-A71 infected patients [30] .

Dengue Virus (DENV)

Dengue is a re-emerging arboviral infection transmitted by Aedes mosquitoes that infects up to 390 million people worldwide annually, of which 100 million infections are symptomatic [34]. The global incidence of dengue has grown dramatically in recent decades and about half of the world's population is now at risk [35]. In particular, it represents a growing public health problem in many tropical and sub-tropical countries, mostly in urban and semi-urban areas. Dengue is characterized by biphasic fever, headache, pain in various parts of the body, prostration, rash, lymphadenopathy and leukopenia, and affects young children and adults [36]. However, dengue infection can progress to severe dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS). Severe dengue affects most Asian and Latin American countries and has become a leading cause of hospitalizations and deaths among children and adults in these regions. It is a life-threatening disease with 500,000 cases being admitted to hospitals and 25,000 deaths yearly [37].

Zika Virus (ZIKV)

Zika Virus was first discovered in 1947 when it was isolated from Aedes africanus mosquitoes [42]. It belongs to the Flavivirus genus within the Flaviviridae family. Zika infections have been reported in Egypt [43], East Africa [44], India [45], Thailand, Vietnam [46], Philippines and Malaysia [47].
The quasispecies distribution of a ZIKV strain (ZIKV-SL1602) isolated from a 29-year-old female traveler was investigated. Data obtained from single molecule real time (SMRT) sequencing were aligned to a consensus sequence and 24,815,877 nucleotide sequences were read. Phylogenetic analysis was then performed and each nucleotide was analyzed to characterize the quasispecies composition of this clinical isolate. For each nucleotide position, the frequency of occurrence of each of the bases was determined and 3375 single-nucleotide variants (SNV) were detected. Interestingly, four variants of the quasispecies population were found to be present at a level of more than 1% of the total population. Mutations in the E protein accounted for 4.1% of the variants, and other mutations were detected in the non-structural region (8.2% in NS2, 1.6% in NS1 and 1.4% in NS5). The phylogenetic data analysis also disclosed that ZIKV-SL1602 clustered within the Asian lineage in close proximity to the ZIKVs currently circulating in America. Every South American isolate was found to share similar ancestry with the French Polynesian isolates. Hence, it can be inferred that the current circulating South American clade stems from the island of French Polynesia [49].
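The per-position frequency analysis described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used in the cited study: the function names, the toy reads, the gap character and the 1% default threshold are all assumptions made for the example, and real SMRT data would first need alignment and error filtering.

```python
from collections import Counter


def base_frequencies(aligned_reads):
    """Return one dict per alignment column mapping base -> frequency.

    aligned_reads: equal-length strings over A/C/G/T, with '-' marking gaps
    (an assumed convention). Gaps are ignored when computing frequencies.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    freqs = []
    for col in range(length):
        counts = Counter(read[col] for read in aligned_reads if read[col] != "-")
        depth = sum(counts.values())
        freqs.append({b: n / depth for b, n in counts.items()} if depth else {})
    return freqs


def call_variants(aligned_reads, consensus, min_freq=0.01):
    """List (position, consensus_base, variant_base, frequency) tuples.

    Positions are 1-based. min_freq is a hypothetical reporting threshold.
    """
    variants = []
    for pos, col_freqs in enumerate(base_frequencies(aligned_reads), start=1):
        ref = consensus[pos - 1]
        for base, freq in sorted(col_freqs.items()):
            if base != ref and freq >= min_freq:
                variants.append((pos, ref, base, freq))
    return variants


# Toy data (hypothetical): four aligned reads against the consensus "ACGT".
toy_reads = ["ACGT", "ACGT", "ACTT", "A-GT"]
toy_variants = call_variants(toy_reads, "ACGT")
print(toy_variants)
```

In this toy alignment only the third column carries a non-consensus base, so a single variant is reported there at a frequency of one read in four.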

West Nile Virus (WNV)

West Nile Virus was first characterized in 1937 in the West Nile district of Uganda and was taxonomically placed in the genus Flavivirus within the Flaviviridae family. The virus later appeared in New York in 1999, where it caused 59 hospitalized infections and 7 deaths before its spread to other parts of the USA between 1999 and 2001 [50]. WNV survives naturally in a mosquito-bird-mosquito transmission cycle involving Culex sp. mosquitoes [51].

Chikungunya Virus (CHIKV)

However, these particular amino acid residues were observed in other CHIKV isolates previously [71] and the significance of these specific sequences for the small plaque phenotype becomes uncertain. Comparison of the entire genome of the CHK-S with other small plaque variants could provide a better understanding of the small plaque variants. In addition, investigation of reverse genetics can provide further insights into the role of specific mutations in the virulence of the CHK-S variant [73] .

Ebola Virus (EboV)

The Ebolavirus genus belongs to the Filoviridae family within the order Mononegavirales. Five species have been identified within the genus Ebolavirus: Zaire (EBOV), Bundibugyo (BDBV), Sudan (SUDV), Tai Forest (TAFV) and Reston (RESTV) [81]. Among them, only the Reston virus (RESTV) is assumed to be non-pathogenic for humans. The other four are well known to cause Ebola virus disease (EVD). The virus causes a severe fever along with systemic inflammation and damage to the endothelial cell barrier, leading to shock and multiple organ failure with high mortality rates in humans and animals [82]. It is transmitted to people from wild animals and spreads in the human population through human-to-human transmission [83]. However, the natural host reservoirs of Ebola viruses are unknown. The average EVD case fatality rate is around 50%. So far, the largest recorded EVD outbreak, with 28,652 infections, has killed 11,325 people [84]. The Zaire, Bundibugyo and Sudan Ebola viruses are involved in large outbreaks in Africa.
In addition, the amino acid substitutions A82V and P382T were present only in the Ebola virus glycoprotein from the 2014 epidemic isolates. The W291R substitution was also found only in the 2014 isolate but at a very low frequency of occurrence [87]. The presence of a large number of mutations from previous outbreaks at more than 90% frequency in the sequence of the new Ebola epidemic isolates, together with the emergence of new mutations (A82V, P382T and W291R), indicated the presence of viral quasispecies in the population. The high mutation rates found in an RNA quasispecies increased the probability of escape mutations and this could explain the escape of the 2014 viral isolates from neutralizing antibodies elicited by the old Ebola epidemic isolates. The structural analysis of the Ebola virus revealed the strong contribution of these residues to the three-dimensional rearrangement of the glycoprotein and they played an important role in the re-emergence of the new epidemic Ebola isolates in 2014.

Middle East Respiratory Syndrome Coronavirus (MERS-CoV)

Middle East respiratory syndrome (MERS) coronavirus is an enveloped, positive-sense, single-stranded RNA virus that was identified for the first time in 2012 in Saudi Arabia. The viral respiratory disease was caused by a novel coronavirus. The causative coronavirus (CoV) belongs to lineage C of the Betacoronavirus within the family Coronaviridae. MERS-CoV can infect a broad range of mammals, including humans, and is transmitted by infected dromedary camels [98, 99]. Typical MERS symptoms are similar to the common flu but in some patients, pneumonia and gastrointestinal symptoms including diarrhea and organ failure were reported [100]. From September 2012 to August 2018, 2253 MERS-CoV cases including 840 deaths were reported in 27 countries worldwide [101]. Approximately 35% of patients with MERS-CoV infection have died.
Nsp1 was reported to suppress protein synthesis by degrading the host mRNA, but viral RNA could circumvent the nsp1-mediated translational shutoff. Terada et al. showed that the double mutations (A9G/R13A) in the non-structural protein 1a (nsp1) affected viral propagation and plaque morphology. The plaques of the mutated MERS-CoV were smaller, and the infectious titers and intracellular viral RNA were decreased in infected Huh7 or Vero cells when compared to the wild-type virus. The formation of the small plaque variant was due to impairment of viral replication via the disruption of the stem-loop (SL) structure of the RNA. In addition, analysis of the biological properties of the nsp1-A9G/R13A mutant showed that the mutant virus possessed low binding activity at the 5′-UTR and promoted translational shutoff against reporter plasmids with or without 5′-UTR [102].
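As a quick arithmetic check on the case counts quoted above, a crude case fatality proportion can be computed directly from deaths and reported cases. The helper below is a hypothetical illustration (the function name is an assumption); it uses only the counts given in this text and says nothing about under-reporting or differences in surveillance periods.

```python
def crude_case_fatality(deaths, cases):
    """Return deaths as a percentage of reported cases, rounded to 1 decimal."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return round(100 * deaths / cases, 1)


# Counts quoted in the text for MERS-CoV (September 2012 to August 2018).
mers_cfr = crude_case_fatality(840, 2253)
print(f"MERS-CoV crude case fatality: {mers_cfr}%")
```

With these counts the crude figure comes out slightly above the rounded value of approximately 35% cited in the text, which is unsurprising given that the two numbers may derive from different reporting windows.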
Alterations in the coronavirus spike glycoprotein by means of natural and experimentally induced mutations changed cell and organ tropism and virus pathogenicity. The wild-type MERS-CoV spike glycoprotein precursor contains 1353 amino acids arranged into two subunits: an amino-terminal subunit (S1) carrying the receptor binding domain (RBD) and a carboxy-terminal subunit (S2) containing the putative fusion peptide (FP/IFP), two heptad repeat domains (HR1/HR2) and the transmembrane (TM) and intracellular domains (Figure 6).
Lu et al. isolated a diverse population comprising the wild-type and a variant carrying a deletion of 530 nucleotides in the spike glycoprotein gene from the serum of a 75-year-old patient in Taif, Saudi Arabia. The patient subsequently died. Analysis of the MERS-CoV sequence showed an out-of-frame deletion which led to the loss of a large part of the S2 subunit. It contained all the major structures of the membrane fusion in the S2 subunit preceding the early stop codon [103] and this also included the proposed fusion peptide (949-970 aa) [104]. The deletion resulted in the production of a shortened protein bearing only 801 amino acids. In the cell-free serum sample of the patient, mutant genomes with the S530∆ were abundant, with an estimated ratio of 4:1 deleted to intact sequence reads. The spike gene deletion would cause the production of a defective virus which was incapable of causing infections or had a lowered rate of infection. Losing the S2 subunit caused a disruption in the membrane holding the spike protein and halted the fusion of the virus to the host. However, in the case of the mutant bearing the S530∆, the mutation helped to sustain the wild-type MERS-CoV infection by producing a free S1 subunit with a "sticky" hydrophobic tail, and the additional disulfide bonds caused the aggregation and mis-folding of proteins.
In addition, the mutated S530∆ could form steady trimer complexes that retained binding affinity for the dipeptidyl peptidase 4 (DPP4) and acted as a decoy such that the spike-specific MERS-CoV neutralizing antibodies were blocked.
Scobey et al. reported that the T1015N mutation in the spike glycoprotein, which arose during 9 passages of the virus, altered the growth kinetics and plaque morphology in vitro. The mutated virus (MERS-CoV T1015N) replicated approximately 0.5 log more effectively and formed larger plaques compared to the wild type (MERS-CoV). The data suggested that T1015N was a tissue culture-adapted mutation that arose during serial in vitro passages [107]. Whole genome sequencing of MERS-CoV revealed the presence of sequence variants within the isolate from dromedary camels (DC), which indicated the existence of quasispecies within the animal. A single amino acid substitution (A520S) was located in the receptor-binding domain of the MERS-CoV variant. Strikingly, when detailed population analysis was performed on samples recovered from human cases, only clonal genomic sequences were reported. Therefore, the study speculated on a model of interspecies transmission of MERS-CoV whereby specific genotypes were able to overcome the bottleneck selection. While host susceptibility to infection is not taken into account in this setting, the findings provided insights into understanding the unique and rare cases of human MERS-CoV infection [108].
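To put the "0.5 log" replication difference reported for MERS-CoV T1015N into linear terms, the short sketch below converts a log difference into a fold change. It assumes that "log" refers to a base-10 difference in titer, which is the usual convention but is not stated explicitly in the text; the function name is hypothetical.

```python
def log10_difference_to_fold(log_diff):
    """Convert a log10 difference in titer into a linear fold change."""
    return 10 ** log_diff


fold = log10_difference_to_fold(0.5)
print(round(fold, 2))  # about 3.16
```

Under that assumption, a 0.5 log advantage corresponds to roughly a threefold higher titer.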

Paramyxoviridae

Paramyxoviridae is a family of viruses in the order Mononegavirales that use vertebrates as their natural hosts. Currently, 72 species are placed in this family and they are divided amongst 14 genera [109]. Diseases associated with Paramyxoviridae include measles (MeV), mumps and Newcastle disease (NDV). Paramyxoviridae virions are enveloped and pleomorphic, presenting as spherical or filamentous particles with diameters of around 150 to 350 nm (Figure 7A). The genome is linear, negative-sense single-stranded RNA, about 15-19 kb in length, and encodes 9-12 proteins through the production of multiple proteins from the P gene (Figure 7B) [110]. On the external surface of the virion, glycoproteins possessing hemagglutinin, neuraminidase and cell fusion activities are present. The middle component of the envelope is a lipid bilayer acquired from the host cell as the virus buds off the cytoplasmic membrane. The innermost surface of the envelope is a non-glycosylated membrane protein layer that maintains the outer structure of the virus.
The paramyxoviruses can be characterized by the gene order of the viral proteins and by the biochemical characteristics of the proteins associated with viral attachment. The L protein, which is the catalytic subunit of RNA-dependent RNA polymerase (RDRP), is associated with the nucleocapsid protein (N) and phosphoprotein (P) to form part of the RNA polymerase complex. The RNA polymerase complex is covered by the viral envelope consisting of a matrix protein (M) and two glycosylated envelope spike proteins, a fusion protein (F) and a cell attachment protein. The cell attachment protein differs among genera and could be hemagglutinin (H in Measles), hemagglutinin-neuraminidase (HN in Mumps and NDV viruses) or glycoprotein G (Henipavirus). Some genera within the Paramyxoviridae family also contain various conserved proteins including the non-structural proteins (C, NS1, NS2), a cysteine-rich protein (V), a small integral membrane protein (SH) and transcription factors M2-1 and M2-2 [111].

Measles Virus (MeV)

Early investigations of MeV infections in HeLa cells with a vaccine-lineage MeV estimated an intra-population diversity of 6-9 positions per genome [121]. This led to the concept that MeV exists as quasispecies in a population. Donohue et al. discovered that MeV was able to adapt and grow in either of two cellular environments, viz. lymphocytic (Granta-519) or epithelial (H358) cells. Passaging the MeV in these two different cell lines resulted in variants exhibiting different replication kinetics. Deep sequencing of the lymphocytic adapted variants demonstrated an increasing number of variants showing mutations within the 11-nucleotide region in the middle of the phosphoprotein (P) gene. This sequence mediated polymerase slippage and caused the insertion of a pseudo-templated guanosine into the P mRNA, resulting in the replacement of the polymerase cofactor (P) with a type I interferon antagonist (V). The two different variants (lymphocytic and epithelial adapted) had different levels of P and V expression. It was suggested that the equilibration of the viral quasispecies in the population was based on different V protein expression. Lymphocytic derived MeV variants that exhibited V competent genomes were found at a low frequency for adaptation in epithelial cells. Moreover, complete elimination of the V-deficient genomes considerably reduced the antiviral innate defense mechanism, suggesting that a good equilibrium of V and P protein expression is necessary within the quasispecies population [18].

Newcastle Disease Virus (NDV)

Newcastle disease virus (NDV) belongs to the genus Avulavirus in the family Paramyxoviridae of the order Mononegavirales. NDV is an avian pathogen that can be transmitted to humans and cause conjunctivitis and an influenza like disease [124] . Clinical diseases affecting the neurological, gastrointestinal, reproductive and respiratory systems are detected in naïve, unvaccinated or poorly vaccinated birds [125] . NDV is a continuous problem for poultry producers since it was identified ninety years ago. It has negatively impacted the economic livelihoods and human welfare through reducing food supplies and many countries were affected since 1926 with NDV outbreaks [126] .
NDV strains are categorized based on their pathogenicity in chickens as highly virulent (velogenic), intermediately virulent (mesogenic) or nonvirulent (lentogenic). These levels of pathogenicity can be differentiated by the amino acid sequence of the cleavage site in the fusion protein (F0). Lentogenic NDV strains have dibasic amino acids at the cleavage site whereas the velogenic strains contain polybasic residues. Meng et al. studied the changes in virulence of NDV strains, leading to a switch in lentogenic variant (JS10) to velogenic variant (JS10-A10) through 10 serial passages of the virus in chicken air sacs [128] . However, the lentogenic variants (JS10) remained lentogenic after 20 serial passages in chicken embryos (JS10-E20). The nearly identical genome sequences of JS10, JS10-A10 and JS10-E20 showed that after passaging, both variants were directly generated from the parental strain (JS10). Genome sequence analysis of the F0 cleavage site of the parental strain and the passaged variants revealed that the rise in virulence observed in the parental strain (JS10) stemmed from a build-up of velogenic quasispecies population together with a gradual disappearance of the lentogenic quasispecies. The decline of the lentogenic F0 genotypes of 112 -E(G)RQE(G)RL-117 from 99.30% to 0.28% and the rise of the velogenic F0 genotypes of 112 -R(K)RQR(K)RF-117 from 0.34% to 94.87% after 10 serial passages in air sacs was hypothesized to be due to the emergence of velogenic F0 genotypes. Subsequently, this led to the enhancement of virulence in JS10-A10. The data indicated that lentogenic NDV strains circulating among poultry could lead to evolution of the velogenic NDV strain. This velogenic NDV strain has the potential to cause outbreaks due to the difficulty in preventing contact between natural waterfowl reservoirs and sensitive poultry operations.
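To give a sense of the size of this shift, the reported genotype frequencies can be converted into an average per-passage change in the odds of the velogenic genotype. The sketch below is illustrative arithmetic only: it assumes a constant fold change in odds at every passage, which the cited study does not claim, and the function name is hypothetical.

```python
def per_passage_odds_fold_change(freq_start, freq_end, passages):
    """Average fold change in odds, f / (1 - f), per passage.

    Assumes the same multiplicative change at every passage
    (an illustrative simplification, not a model from the study).
    """
    odds_start = freq_start / (1.0 - freq_start)
    odds_end = freq_end / (1.0 - freq_end)
    return (odds_end / odds_start) ** (1.0 / passages)


# Velogenic F0 genotype frequencies reported for JS10 and JS10-A10
# (0.34% before and 94.87% after 10 passages in air sacs).
velogenic_fold = per_passage_odds_fold_change(0.0034, 0.9487, 10)
print(f"velogenic odds change per passage: about {velogenic_fold:.2f}-fold")
```

Working in odds rather than raw frequencies keeps the estimate meaningful near 0% and 100%, where frequencies saturate.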
NDV quasispecies comprised lentogenic and velogenic genomes in various proportions. The change in virulence of the quasispecies composition of JS10 and its variants was investigated by analysis of viral population dynamics. The F0 cleavage site was reported to be the main region in which the majority of amino acid changes had occurred and resulting in an accumulation of variants exhibiting velogenic properties due to serial passages. Furthermore, passaging of the virus caused a transition in the degree of virulence of NDV strains from lentogenic to mesogenic and ultimately an increase of the velogenic type. Therefore, NDV pathogenesis could be controlled by the ratio of avirulent to virulent genomes and their interactions within the chicken air sacs and the embryo. The data clearly demonstrated that the status of the quasispecies population is dependent on the pathogenicity of the NDV [128] . Gould et al. reported the presence of the F0 cleavage sequences of 112 -RRQRRF- 117 and HN extensions of 45 amino acids in virulent Australian NDV strains [129] . Furthermore, the genome analysis of the avirulent field isolates of NDV puts forth the existence of viruses with virulent F0 sequences without causing obvious clinical signs of the disease [130] . Subsequently, Kattenbelt et al. studied the underlying causes that could affect the balance of virulent (pp-PR32 Vir) and avirulent (pp-PR32 Avir) variants throughout viral infections. The variability of the quasispecies population and the rate of accumulation of mutations in vivo and in vitro were analyzed. The in vivo analysis showed that both virulent and avirulent plaque-purified variants displayed a rise in the variability of quasispecies from 26% and 39%, respectively. The error rate in the viral sequences was observed to increase as well, such that one bird out of three displayed virulent viral characteristics ( 112 -RRQRRF-117 ) after passaging of the PR-32 Avir variant. 
Genome analysis following the in vivo study revealed that a single base mutation occurring in the F0 region led to the switch from RRQGRF to RRQRRF.
On the other hand, in vitro studies showed that the quasispecies distribution of the avirulent isolate harbored 10% of variants bearing the virulent F0 region (RRQRRF). Gene sequence analysis of Australian NDV isolates showed the existence of a novel clade of NDV viruses with the F0 cleavage site sequence of 112 -RKQGRL-117 and the HN region bearing seven additional amino acids. Four field isolates (NG2, NG4, Q2-88 and Q4-88) belonging to the novel clade were propagated for a longer time period in CEF cells prior to sequencing. Analysis revealed the existence of 1-2% of virulent strains with the F0 cleavage site of 112 -RKGRRF-117 in the population [131] .
Quasispecies analysis of all the NDV field isolates in this study showed variable ratios (1:4-1:4000) of virulent to avirulent viral F0 sequences. However, these sequences remained constant in the quasispecies population during replication. It was concluded that the virulent strains present in the quasispecies population did not emerge from an avirulent viral population unless the quasispecies population was placed under direct selective pressure, either by previous infection of the host by other avian viruses or by transient immunosuppression [131] .

Pneumoviridae

The Pneumoviridae family contains large enveloped negative-sense RNA viruses. Previously, this taxon was known as a subfamily of the Paramyxoviridae but it was reclassified in 2016 as a family of its own with two genera, Orthopneumovirus and Metapneumovirus. Some viruses belonging to Pneumoviridae family are only pathogenic to humans, such as the human respiratory syncytial virus (HRSV) and human metapneumovirus (HMPV). Human pneumoviruses do not have animal reservoirs and their primary site of infection is the superficial epithelial cells of the respiratory tract. There are no known vectors for pneumoviruses and transmission is thought to be primarily by aerosol droplets [132] .
The virions of the pneumoviruses are enveloped with a spherical shape and a diameter of about 150 nm. They have a negative-sense RNA genome of 13 to 15 kb ( Figure 8A ). The RNA-dependent RNA polymerase (L) binds to the genome at the leader region and sequentially transcribes each gene. The cellular translation machinery translates the capped and poly-adenylated messenger RNA of the virus in the cytoplasm. Members of the genus Orthopneumovirus possess 10 genes including NS1 and NS2 which are promoter proximal to the N gene. The gene order is NS1-NS2-N-P-M-SH-G-F-M2-L ( Figure 8B ). Alignment of the L proteins showed moderate conservation of the sequences between the human and bovine viruses. Bovine respiratory syncytial virus (BRSV) differs from HRSV in host range and the two viruses bear substantially similar sequences as well as antigenic relatedness [132] .

Respiratory Syncytial Virus (RSV)

Deplanche et al. evaluated the genetic stability of BRSV in cell cultures by analyzing the consensus nucleotide sequences of the highly variable glycoprotein G. The BRSV strain W2-00131 was isolated from a calf with respiratory distress syndrome (BAL-T) and was further propagated in bovine turbinate (BT) cells. The genomic region of BRSV that encodes the highly variable glycoprotein G remained genetically stable in the three variants (3Cp3, 3Cp9 and 3Cp10) after ten continuous passages in BT cells and after in vivo studies [137]. This led to further analysis of the quasispecies population derived from this field isolate. Genomic analysis of more mutants showed that the G-coding region displayed significant variability, with mutation frequencies ranging from 6.8 × 10^−4 to 10.1 × 10^−4 substitutions per nucleotide in vitro and in vivo.
The majority of the mutations reported previously were present in the W2-00131 RNA populations. A large dominance of non-synonymous over synonymous mutations was observed in all BRSV mutants. The non-synonymous mutations mapped preferentially within the two variable antigenic regions of the ectodomain or close to the highly conserved domain in the G protein [137] . These results suggested that BRSV populations might have evolved as complex and dynamic mutant swarms, despite apparent genetic stability.
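Mutation frequencies of the kind quoted for the G-coding region are commonly computed by dividing the number of substitutions observed by the total number of nucleotides sequenced across all clones. A minimal sketch of that arithmetic follows; the mutation count, clone count and region length are made-up placeholders, not values from the cited study, and the function name is hypothetical.

```python
def mutation_frequency(n_mutations, n_clones, region_length_nt):
    """Substitutions per nucleotide sequenced across all clones."""
    total_nt = n_clones * region_length_nt
    if total_nt <= 0:
        raise ValueError("need at least one sequenced nucleotide")
    return n_mutations / total_nt


# Hypothetical example: 20 substitutions found in 30 clones
# of an 800-nt region (placeholder numbers).
freq = mutation_frequency(20, 30, 800)
print(f"{freq:.2e} substitutions per nucleotide")
```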

Orthomyxoviridae

The family Orthomyxoviridae belongs to the order Articulavirales and contains seven genera: Influenza A-D, Isavirus, Thogotovirus and Quaranjavirus. The virions within the Orthomyxoviridae family are usually spherical but can be filamentous, 80-120 nm in diameter ( Figure 9A ). The influenza virus genome is 12-15 kb and contains 8 segments of negative-sense, single-stranded RNA which encode 11 proteins (HA, NA, NP, M1, M2, NS1, NEP, PA, PB1, PB1-F2 and PB2) ( Figure 9B ). Influenza viruses are pathogenic and they can cause influenza in vertebrates, including birds, humans and other mammals [139]. The genome segments contain both the 5′ and 3′ terminal repeats, which are highly conserved throughout all eight segments.

Influenza Virus (IV)

Annual influenza epidemics cause about 3 to 5 million cases of severe illness and 290,000 to 680,000 deaths worldwide [141]. Current influenza vaccines have had sub-optimal efficacy because of a lack of antigenic proximity between the vaccine candidate and the circulating seasonal influenza virus strains. During the 2016-2017 influenza epidemic, influenza A (H3N2) viruses from clade 3c.2a were dominant and were associated with severe onset of the disease. The low vaccine efficacy of the 2016-2017 egg-adapted H3N2 (clade 3c.2a) vaccine strain A/Hong Kong/4801/2014 was reported to be due to altered antigenicity [142]. To understand the pathogenesis of A(H3N2) viruses from the 3c.2a clade, it would be of great interest to consider whether each infection was caused by an individual strain or by a swarm of genetically related viruses (quasispecies). This would help to provide an insight into vaccine coverage and efficacy.
A study aimed to identify the key mechanisms contributing towards co-pathogenesis in BALB/c mice infected with the A(H1N1) quasispecies. It was revealed that the co-evolution of the quasispecies brought about a complex response due to different expressions of the biphasic gene. A significant upregulation of Ifng was associated with an increased majority of mutants expressing a differentially expressed gene (DEG) named HA-G222 gene. This correlated with the increased levels of pro-inflammatory response observed in the lungs of the mice infected with the quasispecies A(H1N1) [147]. Serial passages of the H1N1 virus were also carried out prior to the analysis of its sequential replication, virulence and rate of transmission. Sequence analysis of the quasispecies in the viral population revealed that from the ninth passage onwards, five amino acid mutations (A469T, I129T, N329D, N205K and T48N) were detected in the various gene segments (PB1, PA, NA, NS1 and NEP). Furthermore, mutations located within the HA region indicated that the genetic makeup of the viral quasispecies was distinctly different in the upper and lower respiratory tracts of the infected pigs [148].

Hepadnaviridae

Hepadnaviruses belong to the family Hepadnaviridae. They are further classified into two genera: the mammalian genus Orthohepadnavirus and the avian genus Avihepadnavirus [149]. These viruses are spherical with a diameter of 42-50 nm and replicate their genomes with the help of a reverse transcriptase (RT) ( Figure 10A ). The genome is a relaxed circular DNA (rcDNA) of approximately 3.3 kb, held in circular form by base pairing of complementary overlaps [150]. The DNA genome is made up of four partly or completely overlapping ORFs that encode the core protein (Core and preCore), the surface antigen protein (PreS1, PreS2 and S), the reverse transcriptase (Pol protein) and the X transcriptional transactivator protein [151] ( Figure 10B ). Replication occurs by reverse transcription of the pregenomic RNA, which is transcribed by RNA polymerase II from the covalently closed circular form of the HBV DNA [152].

Hepatitis B Virus (HBV)

Sera collected from two remote rural communities of Nigeria showed that 11% of the population were actively infected with HBV despite limited contacts with other populations. The high prevalence of HBV infection suggested that the transmission of the parental strain introduced was very efficient within the two selected communities. Further analysis showed that the HBV variants belonged to either genotype A or E, with the predominant genotype HBV/E, having a higher prevalence of 96.4%. Subsequent analysis of HBV quasispecies from 24 residents showed that each individual was infected with many different HBV variants from genotypes D and G. This added complexity to the circulating population of HBV in each community. A large network of common sequences was observed among individuals of the community and this is considered as proof of transmission. Furthermore, this pattern of HBV transmission was hypothesized to be linked to recurrent infections with multiple HBV variants or widespread superinfection with varying HBV variants. The close link observed thus far between the HBV/E variants and the recurrent sharing of HBV sequences among individuals made it difficult to clearly distinguish between the HBV/E variants. This suggested that the population of variants could be considered as one swarm of HBV evolving in many different hosts. The coalescent analysis also suggested that the related HBV/E variants in the community originated from one individual HBV variant that was prevalent many years prior to the parental strain being introduced in that community [157] .

Introduction

Only about 15 viral diseases can be effectively prevented through FDA approved vaccinations [5]. This attests to the urgency to understand the mechanisms by which viruses can overcome the different pressures applied to restrict their replication. In the last four decades, breakthroughs in molecular biology have favored in-depth analysis of virus isolates. Findings from other studies have suggested that populations of RNA viruses are divergent and favor an active evolution of RNA genomes. The quick evolution of RNA genomes could lead to variant sequences that differ by one or two nucleotides from the wild-type sequence in the population. It is further suggested that each viral RNA population of 10^9 or more infectious particles was always a mixture of various variants despite being isolated from a single clone [6].
Such heterogeneity within the virus population could be explained by the existence of not a single genotype within the species but rather an ensemble of related sequences known as the quasispecies [7] . Developed by Manfred Eigen and Peter Schuster, the concept of quasispecies was defined as a mutant distribution dominated by a primary sequence with the highest rates of replication between the components of the mutant distribution. The phenomenon of quasispecies was further supported by the "hypercycle" theory as a self-organization principle to include different quasispecies in a higher-order organization that eases evolution into more complicated forms such that the coding capacity and catalytic activities of proteins are taken into account [8] [9] [10] .
Mutants with varying levels of infectivity generated from a mutated gene occurred regularly in a virus population due to the high mutation rates [6]. The error-prone replication of RNA viruses and their shorter generation times can be used to explain the variations in evolution rates between DNA and RNA viruses. While mutation rates for DNA genomes have been estimated to be between 10^−7 and 10^−11 per base pair per replication [11], the RNA-dependent RNA polymerase (RdRp) typically shows low fidelity, with a mutation rate of roughly 10^−4 mutations per nucleotide copied, which is greater than that of almost all DNA viruses [7, 12, 13]. This characteristic of the RNA polymerase in RNA viruses led to the generation of diverse offspring with different genotypes in shorter generation times.
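The consequence of a per-nucleotide error rate is easier to see per genome copy. The following sketch multiplies the rate by the genome length and, assuming errors arise independently (a Poisson approximation introduced here for illustration), estimates the fraction of copies carrying at least one mutation. The 7 kb genome length is a picornavirus-sized example; the function names are hypothetical.

```python
import math


def expected_mutations_per_genome(rate_per_nt, genome_length_nt):
    """Mean number of new mutations per genome copy."""
    return rate_per_nt * genome_length_nt


def fraction_mutated(rate_per_nt, genome_length_nt):
    """Probability that a copy carries at least one mutation.

    Assumes independent errors (Poisson approximation).
    """
    mean = expected_mutations_per_genome(rate_per_nt, genome_length_nt)
    return 1.0 - math.exp(-mean)


# RdRp-like rate (about 1e-4 per nucleotide) on a 7 kb genome.
rna_mean = expected_mutations_per_genome(1e-4, 7_000)
rna_fraction = fraction_mutated(1e-4, 7_000)
# Upper end of the DNA range (1e-7) on the same length, for comparison.
dna_fraction = fraction_mutated(1e-7, 7_000)
print(rna_mean, rna_fraction, dna_fraction)
```

Under these assumptions roughly half of all RNA genome copies differ from their template, which is consistent with a clonal stock quickly becoming a mixture of variants.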
Considerably less is known about the relationship between the evolution of RNA viruses and virulence. The dynamics of quasispecies has explained the failure of monotherapy and synthetic antiviral vaccines but has also opened up new avenues for exploration [14]. Specifically, unanswered questions pertaining to quasispecies remain: What are the underlying mutations responsible for long-term persistence compared to those leading to extinction? Are there any molecular determinants which are the root cause of higher virulence in a quasispecies population?

Picornaviridae

Viruses of the family Picornaviridae can be classified into genera such as Enterovirus, Parechovirus, Aphthovirus and others. The virion is made up of a non-enveloped capsid of 30 nm surrounding a core positive-stranded ssRNA genome ( Figure 1A ) [19]. The genome is approximately 7 kb in size and possesses a single long ORF flanked on both ends by the 5′-non-translated region (5′-NTR) and the 3′-non-translated region (3′-NTR) [20]. The 5′-NTR has an internal ribosome entry site (IRES) which controls cap-independent translation. The ORF comprising 6579 nucleotides can be divided into three polyprotein regions, namely P1, P2 and P3. They encode structural proteins (VP1 to VP4) in the P1 region and non-structural proteins in the P2 (2A-2C) and P3 regions (3A-3D) following proteolytic cleavage ( Figure 1B ). The viral capsid proteins VP1, VP2 and VP3 are displayed on the external structures of the EV-A71 viral particle whereas VP4 is found within the internal structures of the capsid [21].
[Figure 1 caption, beginning missing] ... (7.4 kb). The open reading frame (ORF) contains the structural viral protein P1, which is cleaved to yield VP1, VP2, VP3 and VP4, and the non-structural viral proteins P2 (cleaved to yield 2A, 2B and 2C) and P3 (cleaved to yield 3A, 3B, 3C and 3D). The 3′-NTR end of the genome contains the poly(A) tail.

Poliovirus (PV)

The poliovirus population diversity was evaluated in the brain of the murine model during viral spread. Only a fraction of the original injected viral pool was able to move from the initial site of inoculation to the brain, a restriction referred to as the 'bottleneck effect.' To determine whether the quasispecies was maintained during infection in vivo, 6-10-week-old mice were inoculated in the leg with individual viruses. Total RNA recovered from the brain tissues revealed that four viruses were capable of spreading to the brain with their introduced mutations unchanged. Therefore, it was postulated that the innate immune response reduced the viral pathogenicity by limiting the diversity of viruses during spread to vulnerable tissues [16]. Two models have been proposed to explain the bottleneck effect: the "tough-transit" model and the "burned-bridge" model. The "tough-transit" model suggests that a virus trafficking within the murine host has a low probability of successfully crossing the blood-brain barrier. However, once in the CNS, it acts as a founder virus and re-establishes a population with initially limited diversity. On the other hand, the "burned-bridge" model stipulates that it is not difficult for the virus to physically reach the blood-brain barrier. Instead, when the first few viruses reach the gateway to the brain, the host innate immune response triggers an antiviral state that prevents later-arriving viruses from entering [16].
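The loss of diversity at a bottleneck can be illustrated with a toy sampling model: if only a handful of founders pass from the inoculation site to the target tissue, most variants in the pool are lost by chance alone. The sketch below is not the "tough-transit" or "burned-bridge" model itself; the pool composition, the number of founders and the function name are hypothetical choices made for illustration.

```python
import random


def surviving_variants(pool_frequencies, n_founders, seed=0):
    """Draw founders from a variant pool and return the variants that pass.

    pool_frequencies maps a variant label to its relative frequency.
    A fixed seed keeps the toy example reproducible.
    """
    rng = random.Random(seed)
    variants = list(pool_frequencies)
    weights = [pool_frequencies[v] for v in variants]
    founders = rng.choices(variants, weights=weights, k=n_founders)
    return set(founders)


# Hypothetical inoculum: ten marked variants at equal frequency,
# with only four founder events reaching the target tissue.
pool = {f"variant_{i}": 0.1 for i in range(10)}
survivors = surviving_variants(pool, n_founders=4)
print(len(survivors), "of", len(pool), "variants pass the bottleneck")
```

Because founders are drawn with replacement, the number of distinct surviving variants can never exceed the number of founder events, however diverse the inoculum is.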

Flaviviridae

Of the Flaviviridae family (genera Flavivirus, Pestivirus, Pegivirus and Hepacivirus), there are 89 animal viruses with a small, positive-sense, single-stranded RNA genome [31]. The virions are 40-60 nm in diameter, spherical in shape and contain a lipid envelope ( Figure 2A ). The majority of these viruses are arthropod-borne and transmitted via infected mosquitoes and ticks. They include emerging and re-emerging pathogens such as dengue virus (DENV), West Nile virus (WNV) and Zika virus (ZIKV), which pose a global threat to public health by causing significant mortality [32]. The flaviviral genome is approximately 11 kb and has a single open reading frame (ORF), which is flanked by untranslated regions (5′ and 3′ NTR). The ORF encodes three structural proteins (C, M and E) and seven non-structural proteins (NS). The non-structural proteins include the large, highly conserved proteins NS1, NS3 and NS5 and the four small hydrophobic proteins NS2A, NS2B, NS4A and NS4B ( Figure 2B ) [33].
[...] quasispecies utilized the dynamic proportion of varying haplotype populations to co-exist, sustained the ability of the population to adapt and enabled the propagation in different tissues. Lastly, the study concluded that the selection of haplotype(s) might be a driving factor in viral dissemination and severity of infections in humans as well as the virulence in EV-A71 infected patients [30].

Dengue Virus (DENV)

A previous study compared a DENV vaccine candidate strain, DENV-3 PGMK30FRhL3 (PGMK30), which produced acute febrile illnesses with another clinically safe DENV vaccine candidate DENV-2PDK53 (PDK53). Using in vitro and in vivo approaches, the infectivity of the two vaccine strains was investigated to assess the molecular determinants of plaque size. It was revealed that the small plaque displayed by the PGMK30 strain in BHK-21 cells was due to its reduced in vitro growth rate. On the other hand, the PDK53 strain which produced the small plaques was observed to grow rapidly but was unable to evade antiviral responses which restricted its ability to spread. The slow growth rates of the two strains were suggested to be due to two different key mechanisms-the growth of PDK53 appeared to be modulated by antiviral responses while PGMK30 was slow to spread to surrounding cells but was able to evade immune detection. It was hypothesized that if the plaque size of PDK53 was hindered by the antiviral response, interfering with its activation by silencing pSTAT1 would be expected to alter the plaque characteristics of the PDK53 but not PGMK30 or the wild-type. In line with the hypothesis, a marked increase in pSTAT1 was shown in cells surrounding the foci of PDK53 infection but no increase was detected in PGMK30 and the wild-type. At least two different mechanisms dictate the plaque phenotype and elucidating the exact mechanism of how it caused the formation of small plaque size is an efficient way to choose future live-attenuated vaccine strains for clinical development [38] .
From another perspective, differences in the envelope (E) gene sequence was investigated using the plasma samples of six DENV infected patients. The first account of viral quasispecies of DENV in vivo was reported using clonal sequencing analysis whereby the simultaneous occurrence of diverse variant genomes was observed. The degree of genetic diversity was revealed to fluctuate among patients with the mean proportion being 1.67%. Moreover, out of 10 clones derived from dengue infected plasma, 33 nucleotide substitutions were detected, of which 30 were non-synonymous mutations. Of particular interest, mutations at amino acid residues 290 and 301 resulted in the presence of two stop codons which indicated that genome-defective dengue viruses (5.8%) were also present within the quasispecies population. It was hypothesized that this might have significant impact on the pathogenesis of the dengue virus [39] . Recently, Parameswaran et al. profiled the intra-host viral diversity of samples from 77 patients via whole-genome amplifications of the entire coding region of the DENV-3 genome. A significant difference in the viral makeup between naïve subjects and patients with DENV-3 immunity revealed that the immune repertoire of the host is responsible for the degree of diversity exhibited by the viral population. Subsequently, identification of the hotspots responsible for the intra-host diversity revealed that few spots were crucial for intra-host diversity. The major hotspots for diversity were revealed in more than 59% of the samples at three codon coordinates-amino acid residues 100 and 101 in the M protein and residue 315 in the AB loop of the E Domain III. The residue E 315 was speculated to have arisen as an immune escape variant in response to the pressure exerted by the immune defense mechanism. 
These findings highlighted the importance of host-specific selection pressures in the evolution of DENV-3 viral population within the host and this could eventually lead to the intelligent design of a vaccine candidate identified from the prevalent escape variants such as those bearing the E 315 [40] . It was reported that within the quasispecies population, amino acid substitutions occurred on the surface of the E protein which was involved in interactions with other oligomers, antibodies and host cell receptors. In particular, two amino acid substitutions at positions E452 and E455 were mapped to the E protein transmembrane domain, E450 to E472, which functioned as the membrane anchor for E protein. Intra-host quasispecies analysis using the E gene sequences also identified several amino acids on the surface of the E protein which altered the properties of the virus. The conformational rearrangements that led to the fusion of the virus and the host cell membrane was altered. The amino acids detected in the quasispecies consensus sequence were observed to be less frequent in the E proteins from patients suffering from mild disease than from patients with severe onset of dengue infection. Thus, the quasispecies might harbor specific variants that are crucial for the pathogenesis of the disease [41] . Understanding the significant molecular determinant of pathogenesis through the analysis of quasispecies could lead to the rational design of a DENV vaccine.
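The distinction drawn in this section between synonymous substitutions, non-synonymous substitutions and mutations that introduce stop codons can be made explicit with the standard genetic code. The sketch below classifies a single codon change; the example codons are hypothetical and are not taken from the cited DENV sequences, and the function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change; RNA codons (with U) are accepted."""
    ref_aa = CODON_TABLE[ref_codon.upper().replace("U", "T")]
    alt_aa = CODON_TABLE[alt_codon.upper().replace("U", "T")]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense (premature stop)"
    return "non-synonymous"


# Hypothetical codon changes, for illustration only.
print(classify_substitution("CAG", "CAA"))  # Gln to Gln
print(classify_substitution("CAG", "UAG"))  # Gln to stop
print(classify_substitution("GGG", "AGG"))  # Gly to Arg
```

A genome carrying a premature stop codon in the E gene would be expected to be defective on its own, which is why such variants are reported separately from other non-synonymous changes.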

Zika Virus (ZIKV)

An Asian/American lineage Zika virus (ZIKV) formed two types of plaques, large and small. The large plaque variant was observed to have faster growth kinetics compared to the small plaque variant. Sequencing of the plaque variants showed that the large plaque variant had a guanine nucleotide at position 796 (230Gln) while the small plaque clone had an adenine at the same position. A recombinant clone carrying the G796A mutation was produced using an infectious molecular clone of the ZIKV MR766 strain. The plaque size produced by the recombinant clone was smaller when compared to the parental strain and its growth rate was significantly reduced in Vero cells. In vivo studies demonstrated that the virulence of the MR766 strain in IFNAR1 mice had decreased, showing that the mutation at position 230 in the -M protein is a molecular determinant of plaque morphology, growth property and virulence in mice [48].

West Nile Virus (WNV)

A small-plaque (SP) variant was picked from a mutant population of WNV isolated from an American crow in New York in 2000. Characterization of this variant in mammalian, avian and mosquito cell lines led to the discovery that the SP variant contained four nucleotides in its genome that differed from the wild-type genome. Two nucleotide changes led to non-synonymous mutations where there was a P54S change in the prM protein and a V61A change in the NS2A protein. Further analysis of the mutations revealed that deletion at the cleavage site of the prM site did not affect virus replication and its release from mammalian BHK cells. However, the progeny of this virus was no longer able to infect BHK cells. A mutation in the prM region of the TBEV was also reported to cause decreased secretion of virus particles with no effect on protein folding. Lower neurovirulence and neuroinvasiveness were reported when mutation A30P occurred in the NS2A region of the isolate. Further sequencing of the isolate showed that most of the small plaque clones initially isolated reverted back to their wild-type sequence at position 625 in the prM region. Remaining isolates reverted at position 3707 in the NS2A region. These findings suggested that the mutation present in the prM region could be responsible for the phenotype of the small plaque. It is probable that the mutation in the NS2A region was responsible for the determination of the plaque size as the mutation in the prM region was sufficient to revert the isolate to the wild-type phenotype [52] .
The genetic diversity of WNV in the avian host was also investigated using next-generation sequencing. The aim was to explore whether the genetically homogeneous cloned virus would go through genetic diversification after passages in young SPF chickens and wild juvenile carrion crows. Data collected revealed that the WNV population showed significant heterogeneity diverging from the quasispecies structure of the initial viral inoculum in both animal models. However, in-depth analysis enacted a comparison between the infection model (SPF chicken and wild juvenile carrion crows) to assess the variations in genetic diversity. It was demonstrated that the WNV genetic diversifications varied significantly from the inoculum in crows with 18 genetic variants but exhibited suboptimal levels of diversifications among the chickens with only 3 single nucleotide variants (SNV) being detected. Hence, natural WNV-susceptible avian hosts could provide a selective setting and contributed to genetic diversifications. NGS technologies have enabled the analysis of WNV quasispecies dynamics, leading to a better understanding of the virus and shed some light on its mechanism of pathogenicity [53] .
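One common way to summarize the complexity of a mutant spectrum from sequencing data is the Shannon entropy of the variant frequencies. The sketch below is offered as an illustration of such a summary and is not necessarily the metric used in the cited study; the read counts are hypothetical and the function name is illustrative.

```python
import math


def shannon_entropy(variant_counts):
    """Shannon entropy (in nats) of a set of variant read counts.

    Returns 0.0 for a homogeneous population and grows as reads
    are spread over more variants.
    """
    total = sum(variant_counts)
    if total <= 0:
        raise ValueError("need at least one read")
    entropy = 0.0
    for count in variant_counts:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical read counts: a clonal inoculum versus a population
# in which several minority variants have appeared.
clonal = shannon_entropy([1000])
diversified = shannon_entropy([940, 20, 20, 20])
print(clonal, diversified)
```

Comparing the entropy of the inoculum with that of the population recovered from each host gives a single number per sample, which makes differences between host species easier to tabulate.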

Togaviridae

Viruses from the Togaviridae family can be further classified into the genera Alphavirus and Rubivirus. Alphaviruses are arthropod-borne viruses [54] that form icosahedral particles of about 70 nm with a lipid envelope (Figure 3A) [55]. The spikes of the virion are made up of E1 and E2 glycoproteins organized in a T = 4 icosahedral lattice of 80 trimers. The alphavirus virion carries a positive-sense, single-stranded RNA of approximately 11-12 kb as the genetic material [54]. The RNA has a 5′-methylated nucleotide cap and a polyadenylated 3′ end. The viral genome is translated into three structural proteins (CP, E2 and E1) and four non-structural proteins (NSP1, NSP2, NSP3 and NSP4) (Figure 3B).

Chikungunya Virus (CHIKV)

Chikungunya virus (CHIKV) is an arthropod-borne virus transmitted to humans by mosquitoes and has caused significant human morbidity in many parts of the world [56] . Chikungunya virus causes an acute febrile illness with high fever, severe joint pain, polyarthralgia, myalgia, maculopapular rash and edema. While the fever and rash are self-limiting and are able to resolve within a few days, arthralgia can be prolonged from months to years [57, 58] . Some cases of CHIKV disease were associated with neurological complications [59] . The virus has been associated with frequent outbreaks in tropical countries of Africa and Southeast Asia and also in temperate zones around the world. A major outbreak in 2013 affected several countries of the Americas, involving approximately 2 million people [60] .
The original geographical distribution of CHIKV indicated that there are 3 distinct groups, and phylogenetic analysis confirmed the West African, the Asian and the East/Central/South African (ECSA) genotypes. The ECSA virus with an A226V substitution in the E1 envelope glycoprotein gene caused multiple massive outbreaks in various regions, starting in the La Reunion Islands in 2005. The virus then spread to Asia and caused over a million cases in the following years [61-63]. The Asian genotype started invading the Americas in 2013, causing massive outbreaks in various countries in Central America, South America and the Caribbean. The ECSA virus is now the dominant virus all over Africa and Asia, and the Asian genotype is the dominant virus in the Americas [62, 64-67]. Even though a number of CHIKV vaccine candidates are being developed, no effective vaccine is currently available for clinical use [68].
Similar to other RNA viruses with high mutation rates, CHIKV produces populations of genetically diverse genomes within a host. To date, the role of several of these mutations and their influence on disease severity in vertebrates and on transmission by mosquitoes have been studied. Riemersma et al. investigated the intra-host genetic diversity of high- and low-fidelity CHIKV variants using murine models. Both the high- and low-fidelity variants were expected to lower the virulence of CHIKV as compared to the wild-type (CHIKV-WT). However, the high-fidelity variant caused more acute infection, such that the onset of swelling in the footpad occurred earlier than with CHIKV-WT, at 3 and 4 days post-infection (dpi). Moreover, mice infected with the high-fidelity CHIKV (CHIKV-HiFi) also displayed higher peaks of disease severity when compared to CHIKV-WT at 7 dpi. In high-fidelity variants, nsp2 G641D and nsp4 C483Y mutations increased CHIKV virulence in the adult mice. The NGS data showed that the CHIKV-HiFi variant produced more genetically diverse populations than CHIKV-WT in mice. This enhanced diversification was subsequently reproduced after serial in vitro passages. In contrast, the low-fidelity variant gave rise to reduced rates of replication and disease [69].
Plaque size is a common feature of viral characterization. Primary isolates of CHIKV containing variants with different plaque sizes were previously reported [70, 71]. Viral variants with different plaque morphology, such as small and large plaques, had been reported in the 2005 CHIKV outbreak isolates [72]. It is curious how small plaque variants with lower fitness were maintained as a natural viral quasispecies. Plausible explanations were that the plaque size might not represent the in vivo growth conditions and that cooperation among variants with different plaque sizes might be required for optimal in vivo replication and transmission fitness. Jaimipak et al. reasoned that if the plaque size did not represent the in vivo growth conditions and the small plaque variants had a similar fitness to the large plaque variants, they would be similarly virulent in a murine model. In order to explore the virulence of the small plaque CHIKV variant in vivo, the pathogenicity of the purified small plaque variant of CHIKV isolated from the serum of a patient in Phang-nga, Thailand in 2009 was tested in neonatal mice [73]. The small plaque variant (CHK-S) showed stable, homogeneous small plaques after 4 plaque purifications. It also grew more slowly and produced lower titers when compared with the wild-type virus. At 21 days after infection of suckling mice with the wild-type and CHK-S variants (10^3 pfu/mouse injected), mice which received the CHK-S virus showed a 98% survival rate, while only 74% of mice survived after infection with the wild-type virus. The small plaque variant of CHIKV obtained by plaque purification exhibited decreased virulence, which makes it an appropriate candidate for live-attenuated vaccine development. The CHIKV variant with the small plaque size formed a major subpopulation in the CHIKV primary isolate during multiple passages in C6/36 cells.
This is in line with the reduction of virulence in the suckling mice and indicated that the small plaque variant had reduced in vivo fitness. This suggested that replication cycles in mosquito vectors might play an important role in maintaining the small plaque variant in natural infections. The persistence of the small plaque variant CHK-S clone after multiple passages in C6/36 cells showed that the CHK-S variant might be able to outcompete the large plaque variant when infecting the same cell by an unknown mechanism. Alternatively, small and large plaque variants might cooperate in a way that provided a selective advantage for maintaining the small plaque variant [73] .
Abeyratne et al. investigated the role of the capsid in CHIKV virulence by studying the nucleolar localization sequence (NoLS) [74] . NoLS is a region in the N-terminal part of the CHIKV capsid protein, between residues 58 and 110 and is rich in basic amino acids [75, 76] . Mutations in the NoLS capsid of the CHIKV led to the development of a unique live-attenuated CHIKV vaccine candidate designated as CHIKV-NoLS.
The P5 CHIKV-NoLS clone remained genetically stable after five passages in Vero cells or insect cells when compared to the CHIKV-WT. Sequence analysis of the P5 CHIKV-NoLS plaques showed that the two plaque variants had no mutations in the capsid protein. A single non-synonymous change in the nucleotide of the capsid caused an alanine to serine substitution at position 101 in the third plaque variant. However, the substitution did not cause any change in the small plaque phenotype or replication kinetics of the CHIKV-NoLS clone after ten passages in vitro [74] . The in vivo study showed that the CHIKV-NoLS-immunized mice were able to produce long-term immunity against CHIKV infection following immunization with a single dose of the CHIKV-NoLS small plaque variant. Attenuation of CHIKV-NoLS through the NoLS mutation is most likely due to the disruption of the replication of viruses after viral RNA synthesis, however, the precise mechanism of reduced viral titer remained unsolved [76] . The NoLS mutation caused a considerable change in the very basic capsid region involving two nucleotides which could affect the structure of RNA binding, assembly of nucleocapsid and interaction with the envelope proteins [77] . Since the CHIKV-NoLS small plaque variant was attenuated in immunized mice and produced sera which could effectively neutralize CHIKV infection in vitro, it could serve as a promising vaccine candidate needed to control the explosive large-scale outbreaks of CHIKV [76] .

Filoviridae

Viruses found within the Filoviridae family can be further classified into five genera: Marburgvirus, Ebolavirus, Cuevavirus, Striavirus and Thamnovirus. The virions are 80 nm in diameter and appear branched, circular or filamentous (Figure 4A). Filoviruses contain a linear, negative-sense, single-stranded RNA of approximately 19 kb. The genome of the filoviruses encodes four structural proteins, namely nucleoprotein (NP), RNA-dependent RNA polymerase co-factor (VP35), transcriptional activator (VP30) and an RNA-dependent RNA polymerase (L). There are also three non-structural membrane-associated proteins, namely a spike glycoprotein (GP1,2), a primary matrix protein (VP40) and a secondary matrix protein (VP24), present within the virion membrane [78] (Figure 4B).

Ebola Virus (EboV)

The glycoprotein (GP) is responsible for cell attachment, fusion and cell entry. The broad cellular tropism of the GP resulted in multisystem involvement that led to high mortality [85] . The Ebola virus has a high frequency of mutation within a host during the spread of infection and in the reservoir in the human population [86] . Alignment of the Glycoprotein (GP) sequences of 66 Ebola virus isolates from the previous outbreaks (old Ebola outbreak of 1976 to 2005) with the new Ebola outbreak isolates (2014) showed some differences in the positions and frequency of the amino acid replacements. Comparative analysis between the isolates from the old epidemic with the new epidemic isolates showed that 19 out of the 22 amino acid mutations were consistently present in the latter [87] .
Several studies of the Ebola virus glycoprotein showed that two mutations, A82V and T544I, might have caused an increase in viral infectivity in humans [88-93]. These two mutations reduced the stability of the pre-fusion conformation of the EBOV glycoprotein. Kurosaki et al. investigated the viral pseudotyping of EBOV glycoprotein derivatives in 10 cell lines from nine mammalian species and the infectivity of each pseudotype. The data showed that isoleucine at position 544 mediated membrane fusion and increased the infectivity of the virus in all host species, whereas valine at position 82 modulated viral infectivity in a manner that depended on the virus and the host. Structural modeling revealed that isoleucine 544 changed viral fusion, whereas the valine 82 residue influenced the interaction with the viral entry receptor, Niemann-Pick C1 [94]. The frequency of these two amino acid substitutions (A82V and T544I) varied between different Ebolavirus species.
Dietzel et al. studied the functional significance of three non-synonymous mutations in the Ebola virus (EBOV) isolates from the outbreak in West Africa. Among 1000 sequenced Ebola virus genomes, approximately 90% carried the three signature mutations at positions 82, 111 and 759 of the Ebola virus genome. The impact of specific mutations on the role of each viral protein and on the growth of recombinant EBOVs was analyzed using recently engineered virus-like particles and reverse genetics. A D759G substitution in proximity to the highly conserved GDN motif in the enzymatically active center (amino acids 741 to 743) of the L polymerase was able to increase viral transcription and replication. On the other hand, an R111C substitution in the multifunctional region of the nucleoprotein, which is essential for homo-oligomerization and nucleocapsid formation, was found to reduce viral transcription and replication. Furthermore, the A82V replacement in the glycoprotein region was able to enhance the efficacy of GP-mediated viral entry into target cells. The combination of the three mutations in the recombinant Ebola virus affected the functional activity of viral proteins and enhanced the growth of the recombinant virus in cell culture when compared to the prototype isolate [93]. A pilot epidemiological NGS study with a substantial sample size suggested that high mortality in the host was not changed by these three mutations, since the mortality rate in the overall study was not considerably altered throughout the outbreak [95].
Furthermore, Fedewa et al. showed that genomic adaptation was not crucial for efficient infection by the Ebola virus. The genomes were characterized after serial passages of EBOV in Boa constrictor kidney JK cells. Deep sequencing coverage (>10,000×) confirmed the presence of only a single nonsynonymous variant (T544I) of unknown significance within the viral population that demonstrated a shift in frequency of at least 10% over six serial passages. However, passaging EBOV in other cell lines, such as HeLa and DpHt cheek cells, showed different mutations in the genomes of the viral population [96]. This raises the question of whether Ebola virus strains should be isolated directly from patients in order to determine the quasispecies of the Ebola virus.
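The frequency-shift criterion mentioned above (a variant whose frequency changes by at least 10% across serial passages) can be illustrated with a minimal Python sketch. The function name `variants_with_shift`, the input layout and the example frequencies are hypothetical and are not taken from Fedewa et al.; in practice the frequencies would come from deep-sequencing variant calls.

```python
from typing import Dict, List


def variants_with_shift(
    freqs_by_variant: Dict[str, List[float]], min_shift: float = 0.10
) -> Dict[str, float]:
    """Return variants whose frequency range across passages is >= min_shift.

    freqs_by_variant maps a variant label to its frequency (0-1) at each
    serial passage, in passage order. This layout is an illustrative assumption.
    """
    shifted = {}
    for variant, freqs in freqs_by_variant.items():
        if not freqs:
            continue  # no observations for this variant
        shift = max(freqs) - min(freqs)
        if shift >= min_shift:
            shifted[variant] = shift
    return shifted


# Made-up frequencies over six passages (not data from the cited study).
example = {
    "GP_T544I": [0.02, 0.05, 0.09, 0.14, 0.20, 0.26],
    "hypothetical_variant_B": [0.03, 0.04, 0.03, 0.05, 0.04, 0.03],
}
print(variants_with_shift(example))
```

Using the range (maximum minus minimum) is one possible reading of "shift in frequency"; comparing only the first and last passage would be an equally reasonable choice.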

Coronaviridae

Viruses within the Coronaviridae family are positive-sense, single-stranded RNA viruses capable of infecting three vertebrate classes comprising mammals (Coronavirus and Torovirus), birds (Coronavirus) and fish (Bafinivirus). Coronaviruses are the largest RNA viruses identified so far, with enveloped spherical virions of about 120-160 nm and a viral genome of about 31 kb in length (Figure 5A) [97]. The genome consists of many ORFs. Two thirds of the 5′ end is occupied by a replicase gene comprising two overlapping ORFs, namely ORF1a and ORF1b. The four structural proteins are spike glycoprotein (S), small envelope protein (E), membrane glycoprotein (M) and nucleocapsid (N). Accessory regions that are group-specific ORFs are designated as ORF3, ORF4a, ORF4b and ORF5 [97] (Figure 5B).

Middle East Respiratory Syndrome Coronavirus (MERS-CoV)

Park et al. analyzed the non-consensus sequences of MERS-CoV derived from 35 specimens obtained from 24 patients and showed the heterogeneity of MERS-CoV among patients. The maximal level of heterogeneity was recovered from the super-spreader specimens. Moreover, this heterogeneity disseminated in close association with variations in the consensus sequences. It can be inferred that MERS-CoV infections were caused by multiple variants. In-depth analysis of heterogeneity among patients showed a link between D510G and I529T mutations in the receptor-binding domain (RBD) of the viral spike glycoprotein. The two mutations resulted in reduced RBD binding affinity to the human CD26. Moreover, the two mutations were observed to be mutually exclusive, implying that the mutants have the ability to significantly hinder viral fitness. However, variants with D510G and I529T mutations in the S protein demonstrated an increase in resistance against neutralizing monoclonal antibodies and reduced sensitivity to antibody-mediated neutralization [105] . The frequency of each of the single mutant varied greatly but their combined frequency of mutations was elevated in the majority of the samples. Meanwhile, the frequency of the wild type was no more than 10% in the majority of the samples. Therefore, it can be deduced that the selection pressure applied by the host immune response played a crucial part in producing genetic variants and how they interacted with the immune system in humans in MERS-CoV outbreaks [106] .

Paramyxoviridae

Fusion and cell attachment proteins are large glycoprotein spikes that are present on the surface of the virion. Both of these proteins play important roles in the pathogenesis of viruses from the Paramyxoviridae family: the cell attachment protein is responsible for attachment to the cellular receptor(s), whereas the F protein mediates cell entry by inducing fusion between the viral envelope and the host cell membrane. The matrix protein organizes and sustains the virion structure. The nucleocapsid associates with genomic RNA and protects the RNA from nucleases. Extracistronic (noncoding) regions include a 3′ leader sequence of 50 nucleotides in length, which works as a transcriptional promoter, and a 5′ trailer of 50-161 nucleotides [111].
The genomes of viruses within the family Paramyxoviridae are non-segmented and thus cannot undergo genetic reassortment. Like many other RNA viruses, the RNA-dependent RNA polymerase does not have an error proofreading capability and hence many mutations can occur when the RNA is processed. These mutations can build up in the genome and eventually give rise to new variants. Since each protein has an important function, the mutant viruses will exhibit a loss in viral fitness and are eliminated, leaving only those exhibiting good viral fitness [111] . Within the Paramyxoviridae family, mutations leading to a spectrum of mutant distributions among Measles virus, Mumps virus and Newcastle disease virus are reviewed. The L protein which is the catalytic subunit of RNA-dependent RNA polymerase (RDRP) is associated with the nucleocapsid protein (N) and phosphoprotein (P) to form part of the RNA polymerase complex. The RNA polymerase complex is covered by the viral envelope consisting of a matrix protein (M) and two glycosylated envelope spike proteins, a fusion protein (F) and cell attachment protein. Cell attachment protein is different based on the genera and it could be hemagglutinin (H in Measles), hemagglutinin-neuraminidase (HN in Mumps and NDV viruses) or glycoprotein G (Henipavirus). Some genera within the Paramyxoviridae family also contain various conserved proteins including the non-structural proteins (C, NS1, NS2), a cysteine-rich protein (V), a small integral membrane protein (SH) and transcription factors M2-1 and M2-2 [111] .

Measles Virus (MeV)

Measles virus belongs to the genus Morbillivirus within the family Paramyxoviridae of the order Mononegavirales. Measles is transmitted by air or by direct contact with body fluids. The initial site of viral infection is the respiratory tract, followed by dissemination to the lymphoid tissue, liver, lungs, conjunctiva and skin. The measles virus (MeV) may persist in the brain, causing fatal neurodegenerative diseases. This virus can only infect humans and causes subacute sclerosing panencephalitis and encephalitis [112] [113] [114]. Measles often leads to fatality in young children (below 5 years) due to complications such as respiratory tract infections like pneumonia, brain swelling or encephalitis, dehydration, diarrhea and ear infections [115].
MeV is a negative-sense, single-stranded RNA virus and the genome is composed of six contiguous, non-overlapping transcription units separated by three untranscribed nucleotides. The genes which code for eight viral proteins are in the order of 5′-N-P/V/C-M-F-H-L-3′ [116]. The second transcription unit (P) codes for two non-structural proteins, C and V, which interfere with the host immune response [117] [118] [119] [120].

Mumps

Mumps virus belongs to the genus Rubulavirus within the Paramyxoviridae family of the order Mononegavirales. Mumps is an extremely contagious, acute, self-limited, systemic viral infection that primarily causes swelling of one or more of the salivary glands, typically the parotid glands. The infection can cause pain in the swollen salivary glands on one or both sides of the patient's face, fever, headache, muscle aches, weakness, fatigue and loss of appetite. Complications of mumps are rare but can be serious, involving inflammation and swelling of the testicles, brain, spinal cord or pancreas. Infections can lead to hearing loss, heart problems and miscarriage. In the United States, mumps was a common disease before vaccination became routine, after which a dramatic decrease in the number of infections was observed. However, mumps outbreaks still occur in the United States and the number of cases has increased recently. Those who are not vaccinated or who are in close contact with the virus in schools or on college campuses are at high risk. There is currently no specific treatment for mumps [122].
The strain Urabe AM9 is one of the mumps virus strains that was widely used in vaccines but this strain was associated with meningitis and was withdrawn from the market. Sauder et al. performed serial passaging of the strain Urabe AM9 in cell cultures and compared the whole nucleotide sequences of the parental (Urabe P-AM9) and passaged viruses (Urabe P6-Vero or Urabe P6-CEF) to investigate the attenuation process and to identify the attenuation markers [123] . Passaging of the Urabe AM9 mumps virus in Vero or chicken embryo fibroblast (CEF) cell lines caused changes in the genetic heterogeneity at particular regions of the genome through either changing of one nucleotide at locations where the starting material showed nucleotide heterogeneity or the presentation of an additional nucleotide to produce a heterogenic site. Virulence of the passaged virus was dramatically decreased in the murine model. Moreover, similar growth kinetics of the virulent Urabe P-AM9 and passaged attenuated variants in the rat brain suggested that the impaired replication ability of the attenuated variants was not the main cause of the neuroattenuation. However, in the rat brain, the peak titer of the neuroattenuated variant was almost one log lower than that of the neurovirulent parental strain. For instance, identical but independent induction of heterogeneity at position 370 of the F-gene by substitution of threonine to alanine in passaged virus in Vero and CEF cells suggested a correlation of this mutation to the neuroattenuation phenotype. There was lack of ability to identify heterogeneity for those regions with differences of more than 10% between the detected nucleotides in the consensus sequence. The heterogeneity could be the result of new mutations at these positions or the selection of pre-existing sequences within the minority quasispecies. 
In addition, passaging of the parental strain in CEF and Vero cells led to the observation of several amino acid alterations in the NP, P, F, HN and L proteins that could affect the virulence of the virus. Thus, the modifications of genetic heterogeneity at particular genome sites could have important consequences on the neurovirulence phenotype. Therefore, extra caution should be exercised in order to evaluate genetic markers of virulence or attenuation of variants based on only a consensus sequence [123] .
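The caution about relying on a consensus sequence alone can be made concrete with a small Python sketch. It calls the consensus nucleotide at one aligned position and separately reports minority nucleotides above a cut-off. The function names, the toy reads and the 10% cut-off are illustrative assumptions, not the method of Sauder et al.

```python
from collections import Counter
from typing import Dict, List, Tuple


def site_frequencies(reads: List[str], position: int) -> Dict[str, float]:
    """Fraction of each nucleotide at a 0-based position across aligned reads."""
    counts = Counter(read[position] for read in reads if position < len(read))
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


def consensus_and_minor(
    reads: List[str], position: int, min_minor: float = 0.10
) -> Tuple[str, Dict[str, float]]:
    """Return the consensus base and any minority bases with frequency >= min_minor."""
    freqs = site_frequencies(reads, position)
    if not freqs:
        return "N", {}  # no coverage at this position
    consensus = max(freqs, key=freqs.get)
    minor = {b: f for b, f in freqs.items() if b != consensus and f >= min_minor}
    return consensus, minor


# Toy alignment: 7 reads carry A and 3 carry G at position 0 (made-up data).
toy_reads = ["ACG"] * 7 + ["GCG"] * 3
print(consensus_and_minor(toy_reads, 0))
print(consensus_and_minor(toy_reads, 1))
```

In this toy alignment the consensus at position 0 is A, yet 30% of reads carry G. A consensus-only comparison of two such populations would report no difference even if the minority fraction changed substantially between them.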

Newcastle Disease Virus (NDV)

The NDV genome codes for seven major viral proteins in the order of 5′-N-P(V)-M-F-HN-L-3′. In NDV, the hemagglutinin-neuraminidase (HN) and fusion (F) glycoproteins are presented on the surface of the virion envelope and contribute to viral infection [127]. The fusion protein is expressed as an inactive precursor (F0) prior to activation by proteolytic cleavage. The cleavage of F0 is crucial for infectivity and works as a key virulence indicator for certain viruses such as virulent strains of avian paramyxovirus 1 (NDV). The F0 cleavage site contains several basic residues which allow the cleavage of the F protein by furin, an endopeptidase present in the trans-Golgi network [110].

Respiratory Syncytial Virus (RSV)

Respiratory syncytial virus (RSV) belongs to the genus Orthopneumovirus under the family Pneumoviridae of the order Mononegavirales. Human respiratory syncytial virus (HRSV) is the primary cause of infection of the upper and lower respiratory tracts, with mild, cold-like symptoms in infants and young children. The virus spreads through tiny air droplets. Globally, there are 4-5 million children younger than 4 years with HRSV infections, and more than 125,000 are hospitalized every year in the United States. The risk of hospital admission is higher in known risk groups such as prematurely born infants. RSV is also responsible for 14,000 deaths annually among the elderly (> 65 years of age) in the United States [133, 134]. On the other hand, bovine respiratory syncytial virus (BRSV) is a common source of pneumonia in calves. Clinical infections stem from yearly outbreaks of the disease during winter and primarily affect calves less than 6 months of age. The target infection site of the virus is the epithelial layer of the upper and lower respiratory tracts; infection can damage the bronchioles, leading to severe bronchiolitis in calves [135].
Palivizumab (PZ), the sole humanized monoclonal antibody against an infectious disease, recognizes the fusion protein of respiratory syncytial virus (RSV). Zhao et al. selected a PZ-resistant virus by passaging the RSV A2 strain in the presence of PZ in HEp-2 cell culture [136]. The use of PZ provided opportunities to gain new insights into the transmission dynamics and the quasispecies nature of RSV. Protein sequence analysis of a single plaque (MP4) isolated from the fifth passage revealed the substitution of lysine by methionine at position 272. The mutation caused the cell culture-derived virus to be completely resistant to PZ prophylaxis in cotton rats. A dramatic reduction in replication of the parental strain A2 virus was observed at PZ concentrations ranging from 4 to 40 µg/mL, whereas the replication of the MP4 mutant was not affected by PZ. The growth kinetics of the parental strain and the variant were similar, with maximum titers above 10^7 PFU/mL during the third and fourth day post infection. Hence, it was proposed that the fusion protein supported the entry of the MP4 mutants into HEp-2 cells in an early phase of the replication cycle through a fusion step. The A2 parental strain exhibited limited growth in HEp-2 cells due to its reactivity with PZ. However, the lack of reactivity of the MP4 mutants with PZ suggested that the F1 protein of the MP4 mutant caused a loss of antigenic reactivity with the humanized monoclonal antibody. Preclinical studies in cotton rats predicted the efficacy of PZ in humans. However, the usage of PZ up to 40 µg/mL, especially in immunosuppressed patients, could provide opportunities for the emergence of resistant viruses. Therefore, PZ-resistant viruses in humans could cause PZ prophylaxis to be ineffective.
Larsen et al. analyzed the nucleotides coding for the extracellular part of the G glycoprotein and the full SH protein of bovine respiratory syncytial virus (BRSV) from several outbreaks from the same herd in different years in Denmark. Identical viruses were isolated within a herd during outbreaks but viruses from recurrent infections were found to vary up to 11% in sequences even in closed herds. It is possible that a quasispecies variant of BRSV persisted in some of the calves in each herd and this persistent variant displayed high viral fitness and became dominant. However, based on the high level of diversity, the most likely explanation is that BRSV was reintroduced into the herd prior to each new outbreak. These findings are highly relevant to understand the transmission patterns of BRSV among calves [138] .

Influenza Virus (IV)

Orthomyxoviruses employ many different splicing techniques to synthesize their viral proteins while making full use of the coding capacity of the genome. The virion envelope originates from the cell membrane with the addition of one to three virus glycoproteins and one to two non-glycosylated proteins. The viral RNA polymerase (PB1, PB2 and PA) transcribes a single mRNA from every segment of the genome. Transcription is primed by cap snatching and the poly(A) tail is added by the viral polymerase stuttering on a poly(U) sequence. Alternative splicing of the MP and NS mRNAs yields the mRNAs coding for the M2 and NEP proteins. PB1-F2 is translated by leaky scanning from the PB1 mRNA. The structural proteins common to all genera include three polypeptides, the hemagglutinin, which is an integral type I membrane glycoprotein involved in virus attachment and envelope fusion, and the non-glycosylated matrix protein (M1 or M) [140].
A study investigated the impact of antigenic proximity, genomic substitutions, quasispecies diversity and reassortment in order to understand the molecular evolution of influenza A (H3N2) viruses isolated directly from clinical samples. Of the 155/176 whole genomes analyzed, several amino acid substitutions were found to substantially affect the severity of the infection caused by the clade-specific viruses. Within the sample, 121 viruses belonged to the genetic clade 3c.2a.1, eight belonged to 3c.2a2, twenty-four belonged to 3c.2a3, one belonged to 3c.2a4 and one belonged to a different clade, 3c.3a. Many distinct substitutions spanning the whole influenza proteome, including HA, NA and non-structural protein 1 (NS1), were associated with mild or severe disease. Interestingly, two substitutions, V261I and K196E, were found in the NA and the NS1, respectively. These two mutations were particularly significant as they distinguished the strains causing mild infections from those causing severe infections. Analysis of the clinical isolates showed a difference in a single amino acid residue, 160K within the HA, whereby 14 cases of glycosylation loss were observed within the quasispecies population and were linked to severity of infection. Moreover, the degree of diversity within the quasispecies population was reported to be elevated in severe cases when compared to mild ones [143].
The epidemiology and molecular characterization of low and highly pathogenic avian influenza virus strains (LPAIV and HPAIV, respectively) isolated from Germany were investigated. The complete genome analysis of the two strains showed that LPAIV and HPAIV had high nucleotide similarity, with only ten mutations outside the hemagglutinin cleavage site (HACS), which were spread along the six genome segments of the HPAIV. Of the ten mutations, five were previously identified as minor variants in the quasispecies population of the progenitor virus, LPAIV, at significantly variable frequencies of 18-42% [144]. However, studies focusing on the diversity of quasispecies of avian influenza in the human host are few. Watanabe et al. successfully demonstrated that infections caused by a single virus in vitro produced an evident spectrum of mutants in the H5N1 progeny viruses. Analysis of the genetic diversity of the hemagglutinin (HA) revealed that variants with mutated HA had lower thermostability leading to higher binding specificity. Both traits were deemed beneficial for viral infection. On the other hand, other variants with higher thermostability also emerged but were unable to thrive against mutants with lower thermostability [145]. The quasispecies population of influenza A virus was also reported to be in a state of continuous genetic drift in a given subtype population. A viral single nucleotide polymorphism (vSNP) was reported to be important when it was shared by more than 15% of the variants within the quasispecies population of the subtype strain in a given season. However, in the 2010-2011 season, various vSNPs in the PB2, PA, HA, NP, NA, M and NS segments were shared among variants in more than 58-80% of the sample population, and less than 50% of the shared vSNPs were located within the PB1 segment [146].
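The notions of quasispecies diversity and of a variant shared by more than 15% of the population can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: the cited studies do not specify their diversity metric here, so per-site Shannon entropy is used as one commonly chosen measure, and the toy alignment, the function names and the default threshold argument are all hypothetical.

```python
import math
from collections import Counter

def site_frequencies(aligned_seqs, position):
    """Return {residue: frequency} for one column of an alignment."""
    column = [seq[position] for seq in aligned_seqs]
    total = len(column)
    return {res: n / total for res, n in Counter(column).items()}

def shannon_entropy(freqs):
    """Shannon entropy (in bits) of a residue frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)

def minor_variants(aligned_seqs, position, threshold=0.15):
    """Non-consensus residues whose frequency reaches the threshold."""
    freqs = site_frequencies(aligned_seqs, position)
    consensus = max(freqs, key=freqs.get)
    return {res: f for res, f in freqs.items()
            if res != consensus and f >= threshold}

# Hypothetical toy alignment of ten sequences (not data from the cited studies).
reads = ["ACGT"] * 7 + ["ATGT"] * 2 + ["ACGA"]

for pos in range(4):
    freqs = site_frequencies(reads, pos)
    print(pos, round(shannon_entropy(freqs), 3), minor_variants(reads, pos))
```

In this toy population only the second column carries a variant above the 15% cut-off; the fourth column is polymorphic but its minor residue stays below the threshold, which is the kind of distinction the frequency criterion is meant to capture.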

Hepatitis B Virus (HBV)

Complementary to the viral mRNA is a full-length negative strand whereas the positive strand is of varied length. The 5′ end of the negative-strand DNA is covalently attached to the terminal protein (TP) domain of the viral DNA polymerase whereas the 5′ end of the positive-sense DNA has a 5′-capped oligonucleotide primer covalently attached. The 3′ end of the positive strand terminates at a variable position in different molecules, creating a single-stranded gap [150].
Hepatitis B virus (HBV) is the prototype of hepadnaviruses. It infects humans and can be classified into 8 genotypes. More than one billion people have contracted hepatitis B virus (HBV) and more than 200 million patients are chronically infected with hepatitis B (CHB) [153]. CHB infections result in the development of hepatocellular carcinoma and chronic liver failure [151] and every year CHB causes 880,000 deaths worldwide [153]. Analysis of the immunodominant motifs of the HBV core region from amino acids 40 to 95 indicated that the positions exhibiting peak rates of variability were found in the main core epitopes, thereby confirming their role in stimulating the immune system. Moreover, the distribution of the variability was observed to occur in a genotype-dependent manner. For instance, HBV of genotype A had higher variability within the core epitope regions, but in genotype D no significant differences were observed between the core epitopes and other positions. Further sequential analysis of the samples revealed the dynamic nature of the HBV quasispecies, whereby a strong selection for a single baseline variant was linked to a lower variability within the core region pre- and post-treatment. Leucine (L) at position 76 was determined to be the most highly conserved residue and the role of this amino acid was assessed by substitution with Valine (V) or Proline (P) at position 76. Proline at position 76 was shown to drastically lower the production of Hepatitis B core antigen protein (HBcAg), likely due to the chemical and physical properties of the amino acid. However, substitution with Valine (V) at the same position brought about a four-fold increase in Hepatitis B e antigen protein (HBeAg) production when compared to Leucine at position 76. The decrease in the variability observed was associated with a stable quasispecies population after positive selection of the variant exhibiting a high fitness level [154].
In an attempt to elucidate the link between HBV quasispecies and the role of nucleotide analogues present in the quasispecies population, the heterogeneity and distribution of HBV quasispecies were investigated using the RT and S regions as a baseline to document the mutation sites. The quasispecies for the selected regions were analyzed using 608 sequences. In the RT region, no major differences in the composition and diversity of the quasispecies were identified at the nucleotide or amino acid level between patients who responded well to antiviral therapy and those who did not. Similar findings were observed when the S region was examined. However, sequence analysis within the RT and S regions showed that the rtM204V/I resistance mutation was present in the majority of samples prior to the rescue therapy. Interestingly, the frequency of this mutation was noted to drop six months post treatment. Moreover, 3 out of 5 stop codons consistently observed within the RT and S regions were reported to be associated with nucleotide analogue (NA) resistance and affected the HBsAg reading frame. However, the complexity and diversity of the HBV quasispecies were similar between the CHB patients who responded to the treatment and those who did not. It can be inferred that the characteristics of the quasispecies in CHB patients at the start of the study were not associated with the various viral responses observed in the cohort. Hence, the RT and S regions might not be adequate sites to monitor the response of CHB patients to the rescue treatment [155].
Another challenge to overcome is to accurately determine the origin and spread of a founder population of the virus. Hence, the discrepancies in the evolution of HBV were investigated. Eight related patients who had acquired chronic HBV through mother-to-infant transmission were selected and the viral genomes isolated were analyzed. Sequence analysis indicated that the samples originated from a single source of HBV genotype B2 (HBV-B2) which diverged from a tiny common ancestral pool regardless of the route of acquisition. Between individuals, viral strains obtained at one time point showed evidence that they originated from a small pool of the previous time point. This conferred on the strain an advantage over other strains with regard to the recovery of the founder state shortly after transmission to the new host and the adaptation to the local environment within the host. Natural selection rather than genetic drift was hypothesized to be the root cause of the evolution of HBV, due to the observed varying patterns of divergence at synonymous and non-synonymous sites. This was in line with the higher rate of substitutions within the host than between hosts. Approximately 85/88 amino acid residues changed from common to rare residues. Since these changes were shown not to be a random process, it was concluded that HBV was able to evolve and change but was limited to a defined range of phenotypes. It can be argued that the mechanism observed thus far suggests that the adaptive mutations accumulated in one individual would not be maintained in another individual and might revert after transmission. Hence, substitutions were more frequent within the host than between hosts [156].

Introduction

Recurrent outbreaks of emerging and re-emerging viral diseases such as Dengue fever, West Nile fever, Zika virus disease, Chikungunya disease, Middle East respiratory syndrome, Ebola virus disease and many others have been targets for therapeutic interventions. Long-lasting protection against viral infections is best achieved via vaccination with live attenuated viruses (LAVs). In order to generate stable vaccine strains, the evolution of these viruses must be properly understood. This review is centered on the examination of the evidence for the heterogeneous nature of RNA genomes (quasispecies), the factors leading to quasispecies formation and its implications for virulence.

Poliovirus (PV)

In order to verify whether limiting the genomic diversity of a viral population has any effect on its evolution, a study was conducted on a strain of poliovirus with a substitution of Glycine 64 to Serine (G64S) in the RNA polymerase of the virus. The outcome of one-step growth curves and northern blot analysis of genomic RNA synthesis confirmed that the G64S mutant showed greater fidelity without a considerable reduction in the overall efficiency of RNA replication. The study hypothesized that having greater heterogeneity within a viral population allows it to adapt better to changing environments encountered during an infection, and indeed the findings showed that boosting the fidelity of poliovirus replication had a noticeable effect on viral adaptation and pathogenicity. The poliovirus strain with a mutated RNA polymerase carrying an altered amino acid residue (G64S) was observed to replicate similarly to the wild-type counterpart but produced lower genomic diversity and subsequently was incapable of adapting well under detrimental growth conditions. This study showed that the diversity of the quasispecies was associated with increased virulence rather than selection of single adaptive mutations. Alongside previous observations, in which a rise in the error rate over the tolerable error threshold induced viral extinction, these findings suggest that the rate of viral mutation is precisely modulated and most likely has been finely tuned during the evolution of the virus [7]. It was further revealed that curbing the diversity of an RNA viral population by raising the fidelity of the RNA polymerase had a direct effect on its pathogenicity and on the capacity of the viruses to escape antiviral immunity [23]. These findings support the view that RNA viruses have evolved minimal viral polymerase fidelity to facilitate quick evolution and adaptation to novel situations [24].
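The trade-off between fidelity, diversity and the error threshold can be illustrated with simple arithmetic. The sketch below is not taken from the cited studies: it assumes independent errors at a uniform per-nucleotide rate, uses an assumed round genome length of 7,500 nucleotides for a poliovirus-sized genome, and the three mutation rates are hypothetical values chosen only to show the trend.

```python
import math

def expected_mutations(mu, genome_length):
    """Mean number of mutations per genome copy at per-site error rate mu."""
    return mu * genome_length

def fraction_error_free(mu, genome_length):
    """Probability that one genome copy carries no mutation at all,
    assuming independent errors at every site."""
    return (1.0 - mu) ** genome_length

GENOME_LENGTH = 7500  # assumed round figure, not a value from the source

# Hypothetical per-nucleotide error rates, for illustration only.
scenarios = [
    ("higher-fidelity polymerase", 2e-5),
    ("baseline polymerase", 1e-4),
    ("mutagen-elevated error rate", 1e-3),
]

for label, mu in scenarios:
    print(f"{label}: "
          f"{expected_mutations(mu, GENOME_LENGTH):.2f} mutations/genome, "
          f"{fraction_error_free(mu, GENOME_LENGTH):.4f} error-free copies")
```

Under these assumptions a higher-fidelity polymerase leaves most progeny genomes identical to the template (low diversity), while a tenfold rise in the error rate leaves almost no error-free copies, which is the intuition behind extinction above an error threshold.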

Chikungunya Virus (CHIKV)

Chikungunya virus (CHIKV) is an arthropod-borne virus transmitted to humans by mosquitoes and has caused significant human morbidity in many parts of the world [56] . Chikungunya virus causes an acute febrile illness with high fever, severe joint pain, polyarthralgia, myalgia, maculopapular rash and edema. While the fever and rash are self-limiting and are able to resolve within a few days, arthralgia can be prolonged from months to years [57, 58] . Some cases of CHIKV

Hepatitis B Virus (HBV)

Although a significant number of RNA viruses demonstrate the existence of quasispecies in their populations due to their low-fidelity polymerases, the phenomenon of quasispecies has also been reported in DNA viruses such as Hepatitis B virus (HBV), which replicates via an RNA intermediate.

Conclusions

RNA viruses are responsible for numerous outbreaks of viral infections with substantial levels of fatality. We discussed how genetic variants carrying spontaneous mutations could give rise to diverse plaque morphologies in different RNA viruses. How the specific mutations could affect viral replications and have an impact on the virulence of the plaque variants are reviewed. The existence of quasispecies in the viral RNA populations is also explored. Many of the RNA viruses displayed different plaque morphologies and these variants could have arisen from a genetically diverse quasispecies population. Such diverse quasispecies in a population could be a key contributing factor to elevated levels of virulence exhibited by the RNA viruses. Through an extensive analysis of different plaque variants and quasispecies within a population, this study could shed more light on the evolutionary pattern and virulence of RNA viruses. More intricate in vitro and in vivo examination of the phenomenon of quasispecies and the relationship between plaque size determinants and virulence should be undertaken to reveal if serious infections are caused by a single strain or through the combined action of diverse quasispecies carrying different mutations. This can be a valuable tool to characterize the mechanisms that led to viral evolution and adaptation in a host. Eventually, discovering an answer to these concerns might ultimately help to design effective vaccines against the ever-evolving RNA viruses.
126 section matches

Abstract

The complex hide-and-seek game between HIV-1 and the host immune system has impaired the development of an efficient vaccine. In addition, the high variability of the virus impedes the long-term control of viral replication by small antiviral drugs. For more than 20 years, phage display technology has been intensively used in the field of HIV-1 to explore the epitope landscape recognized by monoclonal and polyclonal HIV-1-specific antibodies, thereby providing precious data about immunodominant and neutralizing epitopes. In parallel, biopanning experiments with various combinatorial or antibody fragment libraries were conducted on viral targets as well as host receptors to identify HIV-1 inhibitors. Besides these applications, phage display technology has been applied to characterize the enzymatic specificity of the HIV-1 protease. Phage particles also represent valuable alternative carriers displaying various HIV-1 antigens to the immune system and eliciting antiviral responses. This review presents and summarizes the different studies conducted with regard to the nature of phage libraries, target display mode and biopanning procedures.

Introduction

In 1983, the human immunodeficiency virus (HIV-1) was identified as the causative agent of the Acquired ImmunoDeficiency Syndrome (AIDS) [1, 2] . In 30 years of pandemic, HIV-1 has infected more than 60 million individuals and killed 25 million. Thirty-three million individuals are currently living with HIV-1 making this disease a major worldwide public health problem (UNAIDS 2010). Natural sterilizing immune response against HIV-1 has never been described and despite decades of intensive research, a vaccine against HIV-1 is still lacking, mainly due to the high ability of the virus to escape from the immune response.
In the absence of a vaccine, combinations of small antiviral molecules are intensively used to control HIV-1 infection. The majority of these drugs are reverse transcriptase and protease inhibitors [3] . More recently, new molecules targeting the fusion step, CCR5 or integrase were licensed for clinical use [4] [5] [6] . Despite the increased life expectancy observed with the advent of these therapies, severe side effects, lack of adherence and emergence of drug-resistant virus strains still limit the long-term control of the infection [7] .
HIV-1 is an enveloped virus whose genetic material consists of two identical RNA strands coding for the structural genes gag, pol and env as well as the accessory genes tat, rev, nef, vif, vpr and vpu.
The gag gene codes for structural proteins p17 and p24, while pol codes for viral enzymes (reverse transcriptase, integrase and protease) and env for the gp160 envelope protein precursor that is subsequently cleaved into gp120 and gp41. Gp120 and gp41 proteins assemble at the surface of HIV-1 into trimeric spikes composed of three monomers of membrane-embedded gp41 complexed to free gp120. These two proteins are involved in virus entry and represent the principal targets for the humoral response.
Upon CD4 receptor binding, glycoprotein gp120 undergoes conformational changes exposing the V3 loop, a region that further interacts with the chemokine receptors CCR5 or CXCR4 thereby promoting viral entry [8] (Figure 1 ). Coreceptor binding leads to the insertion of the gp41 fusion peptide into the cell membrane, the creation of a hairpin loop intermediate and finally the fusion of both viral and cell membranes. The viral capsid then enters the cell and the genetic material is released in the cytoplasm. Most viral strains use only one coreceptor to enter host cells and are classified accordingly as CCR5-(R5 strains) or CXCR4-tropic (X4 strains), although viruses with broadened coreceptor usage (dual-tropic) have also been described. R5 viruses infect macrophages and CCR5-expressing T lymphocytes, and are mainly associated with transmission. In contrast, X4 viruses infect CXCR4-expressing T-cells and T-cell lines, and often appear at the later stages of infection. The envelope glycoprotein gp120 is composed of variable and more constant regions. Several studies demonstrated that the elicitation or binding of effective neutralizing antibodies are impaired by the gp120 glycan shield or steric hindrance of its constant regions [9] . Moreover, variable immunodominant domains were shown to be recognized by non-neutralizing antibodies. Nonetheless, it is estimated that 10% to 30% of HIV-1-positive subjects develop neutralizing antibodies (NtAbs) appearing at least 1 year after infection. Only 1% of infected patients develop a broad neutralizing response against heterologous virus strains [10] . Among HIV-1-infected patients, such antibodies arise only rarely and tardily, thus inefficiently controlling viral replication. However, the recent identification of broadly neutralizing antibodies (BNtAbs) and mapping of their epitopes fueled interest in the humoral immune response against HIV-1 (reviewed by Overbaugh [11] ).
To serve this purpose, the phage display technology has been extensively exploited in the field of HIV-1 as it represents one of the most powerful technologies for epitope mapping as well as for the identification of ligand binding to many types of targets.
Numerous studies describing the use of phage display technology in the field of HIV-1 have been reported. They can be classified into four main applications (Figure 2): I. Epitope mapping, which relies on the screening of random peptide libraries on immobilized monoclonal or polyclonal antibodies to determine the linear and/or conformational epitopes recognized by these antibodies (linear epitope: sequence of continuous amino acids recognized by the paratope of a given antibody; conformational/discontinuous epitope: group of amino acids scattered along a protein sequence which come together in the folded protein and are recognized by the paratope of a given antibody). Such screening usually results in the identification of sequences mimicking the natural epitope (mimotopes) and provides precise information about the location of residues forming the natural epitope. These mimotopes may in turn be used as valuable immunogens to elicit antibodies targeting the original epitope, an approach referred to as "reverse vaccinology". II. Inhibitor discovery, based on screening of phage libraries displaying random peptides or antibody fragments against viral or host proteins critical for viral replication.
This review is intended as an overview of the different studies conducted using phages in the field of HIV-1, laying special emphasis on the nature of the phage libraries used, the target display mode, the biopanning procedure as well as the results obtained. These studies are classified according to the 4 applications described above and the main results are presented in tables.

Exploration of HIV-1 Epitope Landscape

Monoclonal antibodies (MAbs) or polyclonal antibodies (PAbs) epitopes can be identified through screening either combinatorial or antigen-fragment libraries displayed at the surface of phages. Antibodies may be derived from infected/immunized animals or from HIV seropositive patients with peculiar immunological profiles, such as the Long Term Non Progressors (LTNP) [15] , leading to the identification and characterization of HIV mimotopes. The seminal paper characterizing the epitope recognized by a MAb directed against HIV-1 using the phage display technology was published in 1993. Keller et al. screened a 15-mer Random Peptide Library (RPL) against the BNtAb 447-52D which targets the V3 loop of gp120 (KRKRIHIGPGRAFY) [16] (Figure 3 ) and identified 70 clones presenting a GPxR consensus sequence [17] (Table 1 ). These mimotopes were further used in rabbit immunization experiments and elicited neutralizing responses. Boots et al. later investigated the linear epitope recognized by the MAb 447-52D by combining gp120 competition and panning of V3-region biased/constrained libraries. Such a set-up favors the selection of mimotopes in which residues surrounding the GPGR crown motif are similar to those present in the gp120 used for competition, suggesting that the use of strain-specific competitors with a MAb of broad specificity can select for strain-specific mimotopes [18] . [36, 37] . In their study, a 20-mer RPL was constructed and panned against MAb 58.2 according to two different protocols, either streptavidin capture of phages mixed with biotin-labeled MAb 58.2 (SA-Bio) or panning against MAb 58.2 immobilized on microwells (micropan) [19] . Phages selected with the SA-Bio protocol shared a consensus sequence (Y/L)(V/L/I)GPGRxF homologous to the V3 loop. The micropan protocol allowed for the identification of sequences sharing the same motif, of which two were also identified in the SA-Bio panning. 
Biopanning results were further validated in peptide array hybridization assays. Hybridization of MAb 58.2 to 14-mer peptides containing all possible point substitutions within the V3 loop sequence demonstrated that both phage display and peptide array experiments identified the same critical amino acids, thereby confirming the quality of the 20-mer RPL and the validity of the screenings performed.
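The consensus motifs quoted above use a compact shorthand in which x stands for any residue and (A/B) for a choice between residues. As a minimal sketch of how such a motif can be checked against a peptide sequence, the helper below translates the shorthand into a regular expression. The function names and the example clone sequence are hypothetical; only the motifs and the V3 loop sequence come from the text, and notations such as "Hy" are not handled.

```python
import re

def motif_to_regex(motif):
    """Translate motif shorthand into a regular expression.

    'x' or '-' matches any residue, '(A/B/C)' matches one of the listed
    residues, and any other character is matched literally.
    """
    parts = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "(":
            close = motif.index(")", i)
            options = motif[i + 1:close].split("/")
            parts.append("[" + "".join(options) + "]")
            i = close + 1
        elif ch in "x-":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return "".join(parts)

def find_motif(motif, sequence):
    """Return (start index, matched text) of the first hit, or None."""
    match = re.search(motif_to_regex(motif), sequence)
    return None if match is None else (match.start(), match.group())

V3_LOOP = "KRKRIHIGPGRAFY"        # V3 loop sequence quoted in the text
HYPOTHETICAL_CLONE = "SYVGPGRAFT"  # made-up phage insert, for illustration

print(find_motif("GPxR", V3_LOOP))
print(find_motif("(Y/L)(V/L/I)GPGRxF", V3_LOOP))
print(find_motif("(Y/L)(V/L/I)GPGRxF", HYPOTHETICAL_CLONE))
```

The GPxR consensus is found literally in the V3 loop sequence, whereas the longer (Y/L)(V/L/I)GPGRxF consensus is homologous to it without matching residue for residue, so a strict pattern match is only a first-pass check and not a substitute for the binding assays described.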

Antibodies Directed against Viral Proteins

Epitope mapping was also performed on a monoclonal antibody (MAb 19b) isolated from an asymptomatic HIV-1-infected patient and recognizing the xxIx₃PGRAFYTT motif within the V3 loop sequence (KRIHIGPGRAFYTT) [38]. Binding of MAb 19b to viral isolates presenting mutations in this sequence revealed that not all residues within this recognition motif were crucial for reactivity [20]. Biopanning with a 15-mer RPL resulted in the selection of sequences compatible with the minimal binding site (-I----G--FY-T) inferred from the alignment of gp120 sequences from clades A to F which bound MAb 19b. Taken together, data from binding assays as well as phage biopanning experiments demonstrated that the MAb 19b epitope spans both sides of the V3 loop. Substitutions within the residues located at the crown of the loop are however tolerated, provided that the formation of a β-turn induced by the GPGR crown motif is allowed. One exception was reported by Boots et al., who found that the Phe to Trp substitution may be tolerated in the absence of a β-turn [18].
In parallel, Grihalde et al. constructed and panned a 30-mer RPL against MAb 1001, which recognizes a constrained linear epitope on the V3 loop [21] . Several clones were obtained and presented the common motif (R/K/H)xGR mimicking the crown of the V3 loop sequence, thereby confirming the epitope sequence of MAb 1001. To assess the reactivity of peptides deprived of the phage scaffold, the mimotope with the highest affinity for the MAb 1001 was expressed in fusion with the E. coli alkaline phosphatase. Binding of the phage and fusion protein to the MAb 1001 was assessed by ELISA, Western Blot and SPR assays and highlighted that binding was independent from the scaffold, although interactions were weaker when the peptide was displayed in the fusion protein format than in the phage scaffold.
In another study, Laisney et al. investigated the minimal epitopes recognized by two MAbs interacting with the V3 loop, 110-A and 19.26.4, whose specificity is strictly restricted to the X4-tropic LAI isolate [22] . The screening of a 6-mer RPL on the MAb 110-A allowed the selection of numerous sequences with a consensus motif. Binding assays with synthetic peptides further showed that both MAbs reacted with residues 316-320 of the LAI gp120. In this narrow region, the minimal epitope deduced for the MAb 110-A was HyxRGP, whereas the MAb 19.26.4 recognized the xQ(R/K)GP motif (Hy: non-aromatic AA, underlined: AA tolerating substitutions). Interestingly, the essential QR residues located at positions 317-318 correspond to a QR insertion located upstream of the V3 loop GPGR crown motif that is characteristic of the LAI isolate and may thus explain the restricted specificity of the two MAbs. The same authors screened a 6-mer RPL on the MAb 268, specific to the V3 loop of the MN isolate, and identified two groups of sequences [23] . A representative sequence from the first group (268.1, HLGPGR) corresponded to the crown of the V3 loop, a linear epitope, while two sequences of the second group (268.2, KAIHRI and 268.3, KSLHRH) showed no homology to linear HIV-1 epitopes. Both peptides 268.1 and 268.2 nevertheless inhibited the interaction of MAb 268 with gp120, and were even able to compete with each other for binding to the antibody, indicating that peptide 268.2 was also a mimotope of peptide 268.1. When conjugated to KLH and injected separately into rabbits, both peptides 268.1 and 268.2 were able to elicit gp120-reacting antibodies that partially competed with the homologous peptide, confirming that the 268.1 and 268.2 peptides are both antigenic and immunogenic mimics of the gp120 MN V3 loop.
The isolation of the BNtAb 2F5, which interacts with an epitope (ELDKWA) located on the gp41 MPER was reported in 1993 [39] . Conley et al. further characterized this epitope by biopanning a 15-mer RPL on immobilized 2F5 Ab. Different sequences were obtained and classified in four groups, whose consensus motifs, DKW, LDxW, ED(K/R)W and ELDKW, revealed information on the residues involved in 2F5 Ab recognition [24] .
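The consensus notation used throughout this section (x for any residue, slash-separated alternatives in parentheses) maps directly onto regular expressions. The following is a minimal sketch, not taken from the cited studies: the helper names `motif_to_regex` and `motif_in` are hypothetical, and the notation rules are assumed as stated in the docstring. It checks which of the four 2F5 consensus motifs occur as a contiguous stretch in the ELDKWA epitope.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Translate consensus notation into a regular expression.

    Assumed conventions (hypothetical helper, not from the cited studies):
      x      -> any single residue
      (K/R)  -> one residue out of the listed alternatives
      A..Z   -> that exact residue
    """
    out = []
    i = 0
    while i < len(motif):
        ch = motif[i]
        if ch == "x":
            out.append(".")
            i += 1
        elif ch == "(":
            close = motif.index(")", i)
            options = motif[i + 1:close].split("/")
            out.append("[" + "".join(options) + "]")
            i = close + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def motif_in(motif: str, sequence: str) -> bool:
    """Return True if the motif occurs anywhere in the sequence."""
    return re.search(motif_to_regex(motif), sequence) is not None

epitope = "ELDKWA"
hits = {m: motif_in(m, epitope) for m in ["DKW", "LDxW", "ED(K/R)W", "ELDKW"]}
print(hits)
```

Under these assumptions, DKW, LDxW and ELDKW are found in ELDKWA, whereas ED(K/R)W is not, because the native sequence carries a Leu between Glu and Asp.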

Gp120 C1 Domain

Phage display was also applied to epitope mapping of Antibody-Dependent Cellular Cytotoxicity (ADCC)-inducing MAbs since HIV-1 infected cells may be targets for Fc receptor-bearing effector cells interacting with HIV-1-specific Abs. Screening of 7-mer, 7-mer-c and 12-mer RPLs against the ADCC-inducing MAb ID6 resulted in the identification of phages with TxxFxxWxxD (12-mer RPL) and FxDWxF (7-mer and 7-mer-c RPLs) motifs homologous to the C1 domain of gp120 [28] .
Competition assays showed that binding of MAb ID6 to gp120 or gp160 was abrogated in the presence of 12-mer mimotopes. In contrast, heptapeptide mimics only slightly impaired this binding, supporting the hypothesis that the MAb-ID6 epitope probably encompasses residues additional to the FxDWxF motif. As this epitope is highly conserved among circulating HIV-1 subtypes, it might be useful to induce MAb ID6-like antibodies.

Gp120 CD4-Binding Site

The BNtAb IgG1 b12 was the first neutralizing MAb selected from a phage-displayed Fab (antibody fragment composed of one constant and one variable domain of the heavy (CH1 and VH) and the light (CL and VL) chains linked together) library derived from an HIV-1-infected donor (See section 3.1.1.1.1.) [41] . This antibody recognizes a conformational epitope overlapping the CD4-binding site of gp120 [42] . Attempts to precisely map the residues interacting with the IgG1 b12 MAb with 15-mer and 21-mer RPLs provided no consensus sequence [18] . As previous screening of 11 cysteine-enriched peptide libraries resulted in the identification of two sequences bearing an SDL motif flanked by one or two cysteine residues (REKRWIFSDLTHTCI and TCLWSDLRAQCI) [30] , Zwick et al. constructed two sublibraries (x₇SDLx₃CI and xCxxSDLx₃CI) sharing the SDL motif and reflecting the cysteine content of the two clones [29] . A B2.1 peptide (HERSYMFSDLENRCI) containing a unique cysteine bound b12 in Fab as well as IgG1 formats with a much higher affinity than the other clones. Moreover, the phage-borne B2.1 peptide was used to screen the Fab library from which b12 was identified. This "reverse panning" experiment showed that B2.1 was able to select only the Fab sequence corresponding to b12, confirming the specificity of the B2.1 mimotope towards the b12 Ab. The B2.1 peptide was immunogenic in mice and rabbits but did not elicit significant titers of anti-gp120 cross-reactive Abs.
Dorgham et al. attempted to map the b12 epitope with an RPL of two random 10-mers joined through an ALLRY spacer (x₁₀ALLRYx₁₀) [31] . Selection resulted in the identification of clones sharing a M/VArSD consensus motif (Ar standing for any aromatic residue) as previously observed [29] . Second- and third-generation semi-RPLs containing fixed consensus motifs identified in the previous panning surrounded by randomized residues were constructed (x₃(M/V)WSDx₃ and xLXVWxDExx). Phagotopes (phage particles displaying a particular peptide sequence selected on a given target) obtained from the first-, second- and third-generation libraries showed progressively increasing binding affinity for b12. Phagotopes were able to compete with gp160 for b12 binding and triggered the production of Abs capable of recognizing at least five distinct, unrelated HIV-1 strains. In contrast, the corresponding peptides were not able to compete for b12 binding and did not elicit anti-gp160 MAbs. Such discrepancies between phagotopes and peptides might be explained by constraints imposed by the phage scaffold.
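To illustrate why fixing a consensus motif focuses the search, the theoretical peptide diversity of the first- and second-generation designs can be compared. This is a rough sketch under simplifying assumptions that are not stated in the source: every randomized position is taken to allow all 20 amino acids with equal probability, and codon degeneracy, stop codons and the actual number of clones in the physical library are ignored. The function name `peptide_diversity` is hypothetical.

```python
def peptide_diversity(positions):
    """Number of distinct peptides encoded by a semi-random design.

    `positions` lists, for each residue position, how many amino acids
    are allowed there (20 for a fully random x, 1 for a fixed residue).
    """
    total = 1
    for n in positions:
        total *= n
    return total

# First-generation design x10-ALLRY-x10: 20 random positions, 5 fixed
first_generation = [20] * 10 + [1] * 5 + [20] * 10
# Second-generation design x3-(M/V)-W-S-D-x3: 6 random positions, one M/V choice
second_generation = [20, 20, 20, 2, 1, 1, 1, 20, 20, 20]

print(f"{peptide_diversity(first_generation):.2e}")   # 1.05e+26
print(peptide_diversity(second_generation))           # 128000000
```

Under these assumptions the second-generation sequence space is many orders of magnitude smaller than that of the first-generation design, so a library of realistic size can sample it far more densely.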
Detailed characterization of the BNtAb b12 was conducted with the Mapitope algorithm developed by Enshell-Seijffers et al. to facilitate the identification of discontinuous epitopes. This approach is based on the assumption that the collection of mimotopes recognized by a given antibody must in some manner reflect the antibody's paratope [32] . A constrained 12-mer RPL was screened against b12 and selected sequences were compared to those obtained from previous panning experiments performed against b12 [18, 29, 30, 43] . Although no similarity was observed with the mimotopes selected by Boots et al. [18] , a consensus WSDL motif was observed in the newly identified mimotopes and the sequences isolated by Bonnycastle et al. [29, 30] . Mapitope analysis conducted on these sequences as well as on the peptide sets isolated by Boots and Bonnycastle resulted, for each of the three panels, in the prediction of two clusters located at the periphery of the CD4 binding site.
At the same time, both a linear 9-mer RPL and a constrained 10-mer RPL were used in panning experiments against another gp120 CD4 binding site MAb (5145A) [33] . Screening of the 9-mer RPL resulted in selection of a single sequence (WKPVVIDFE), while screening of the 10-mer-c RPL on 5145A allowed for identification of a GPxEPxGxWxC consensus motif. Peptides were synthesized as peptide-pIII fusion proteins and their affinity for 5145A was assessed in phage/MAb and gp120/MAb binding inhibition assays. The two peptides with the highest affinity (AECGPAEPRGAWVC and AECGPYEPRGDWTCC) were used to immunize rabbits and elicited antibodies binding to recombinant monomeric gp120. Nevertheless, the Abs generated seemed to target a different epitope since they were unable to compete with the CD4-binding site-specific MAb 5145A.

Other Domains

The extreme C-terminus of gp120 forms a pocket which may interact with gp41 and was suggested to undergo conformational changes weakening the interaction between gp120 and gp41 upon CD4 binding. In the absence of available crystallographic information, Ferrer et al. utilized the mouse MAb 803-15.6 to analyze an epitope overlapping with this pocket region [34] . Epitope mapping of MAb 803-15.6 achieved by cross-blocking experiments on gp120 suggested that the Ab recognized residues 502-516 while the screening of a heptapeptide RPL against MAb 803-15.6 preincubated with gp120 allowed for the recovery of phages presenting an AxxKxRH motif homologous to residues 502-508. Affinity studies confirmed that Ala was the N-terminal residue of the MAb 803-15.6 epitope and showed that affinity increased when C-terminal residues were added. The Mapitope algorithm designed by Enshell-Seijffers et al. was initially developed to elucidate the CD4-induced epitope recognized by the MAb 17b [32] . Screening of a 12-mer-c RPL yielded sequences with no homology to gp120. Comparison of the mimotopes to the gp120 structure in complex with MAb 17b and sCD4 predicted candidate epitopes that were in agreement with the actual 17b contact residues. For further validation of the algorithm, RPLs were screened against the p24-specific MAb 13b5 and analysis of the selected sequences predicted four clusters, the largest of which corresponded to the genuine epitope. The algorithm was finally applied to the MAb CG10, an Ab with an unknown epitope competing with the MAb 17b for binding to the CD4/gp120 complex. Mimotope sequences were analyzed and produced seven clusters, one of them being in accordance with previous mutation analysis impeding MAb CG10 binding [44] . Notably, when reconstituted in a phage scaffold, the epitope was capable of binding MAb CG10.
After having successfully identified linear or nearly linear epitopes [17, 20, 24] , Boots et al. extended the use of the phage display technology to the identification of epitopes recognized by MAbs binding to discontinuous sequences [18] . One of these Abs (MAb A32) binds to a CD4-induced discontinuous epitope involving residues within the C1, C2 and C4 regions of isolates from clades B, C, D, E and F [45, 46] . Panning of a 15-mer RPL yielded several phages which only shared a Trp residue. In the same study, panning of a 15-mer RPL against MAb 50-69, which reacts with the ID GKLIC region of gp41, resulted in the identification of sequences sharing a common Trp within motifs WGCx(K/R)xLxC and FGxWFxMP. The selected consensus sequences were however not further characterized.
The BNtAb 2G12 presents the typical feature of recognizing a cluster of high-mannose oligosaccharides of gp120 [47] [48] [49] . In an attempt to identify peptidic immunogens capable of eliciting 2G12-like Abs, Menendez et al. screened a set of previously described RPLs [30] against 2G12 and identified one phagotope specifically binding to 2G12 (2G12.1) [35] . The crystal structure of MAb 2G12 complexed to the synthetic 2G12.1 peptide was compared to structures of 2G12-oligomannose epitopes and revealed that interactions with the Abs were different for the two ligands. These results showed that the peptide selected from RPL panning experiments is not a structural mimic of the 2G12 oligomannose epitope. The phagotope 2G12.1 was used in rabbit immunization experiments and elicited high titers of peptide-specific antibodies, but no cross-reactivity with gp120 was obtained, further supporting that peptide 2G12.1 is not an immunogenic mimic of the MAb 2G12 epitope.

Polyclonal Antibodies Directed Against Viral Epitopes

The first attempt at identifying epitopes recognized by HIV-specific PAbs was performed in 1999 on plasma IgG from two LTNP patients (Table 2 ). Using linear and constrained 9-mer RPLs, Scala et al. identified mimotopes of the linear immunodominant (ID) GKLIC region of gp41 or the V1 and C2 domains of gp120 [50] . These mimotopes were immunogenic when injected into mice and elicited an NtAb response against HIV-1. Moreover, the same mimotopes reduced viraemia to undetectable levels in immunized monkeys as shown in a subsequent study [51] . The same year, a similar study conducted on one LTNP with an RPL of cysteine-constrained 12-mers selected for peptides defining the gp41 ID epitope CSGKLIC. The levels of reactivity of these phagotopes were further assessed against a panel of HIV-positive plasma to evaluate the plasticity and polyclonality of the immune response mounted by 30 infected individuals [52] . Later, Palacios-Rodriguez et al. evaluated the impact of factors such as Highly Active AntiRetroviral Treatment (HAART) or Ab titers on a selection of peptides mimicking the ID epitope CSGKLIC [53] . In their study, a mix of linear 12-mer as well as linear and constrained 7-mer RPLs was screened against the individual plasma samples of four HIV-1-infected patients initiating HAART and presenting different titers of anti-GKLIC antibodies. A consensus motif CxxKxxC was obtained from the 12-mer linear RPL, and the percentage of occurrence of the motif in the selected sequences was proportional to the anti-GKLIC Ab titers of each sample, indicating that these Abs are involved in selection of the consensus motif. Mouse immunization experiments with the two mimotopes most closely resembling the parental gp41 ID epitope as well as with pools of phages eluted from the panning experiments showed that all phages elicited reactivity, and that immunization with the phage eluates induced the strongest recognition.
These findings indicate that the immunogenic properties of mimotopes are different and additive, opening the possibility of immunizing animals with different mimotope combinations (See Section 4).
A similar approach was used by the same authors on a rhesus macaque infected with an SHIV chimera encoding the env of a clade C HIV-1 strain (SHIV1157ip) and presenting a broad neutralizing response against homologous SHIV-C as well as heterologous HIV-1 strains of different subtypes [61] . Biopanning yielded clones similar to gp120 (V2 and V3 loops or C-terminal domain) or to regions of gp41 (ID GKLIC region, other ID regions and MPER domain) [55] . Remaining clones showed no significant homology to linear HIV-1 regions and were analyzed with the 3DEX software, which allowed the identification of a discontinuous mimotope located near the V3 loop crown. The antibodies binding to this phagotope were affinity-purified and subsequent assays demonstrated that recognition was conformation-dependent.
An immunofocused immunization of mice primed with a DNA vector coding for the gp160 SHIV1157ip and boosted with pools of phage particles corresponding to the V3 loop, the gp120 C-terminus, the gp41 ID region, the GKLIC region and the MPER domain was set up. Almost all mice developed anti-env Abs and 59% of them presented a neutralizing activity.
In 2009, Dieltjens et al. applied the phage display technology to identify the epitopes potentially involved in the BNtAbs response of an HIV-1 CRF02AG-infected individual (ITM4) and to monitor the evolution of humoral response and viral escape through the course of infection [56] . Biopanning of a 12-mer RPL against plasma samples from ITM4 resulted in the identification of different peptide sequences. Half of these sequences were homologous to linear epitopes on gp41, i.e., the 4E10 epitope region in the MPER domain (NWFNLTQTLMPR) or the lentivirus lytic peptide 2 (LLP2) (SLxxLRL) while the other peptides shared homologies with the C1 domain (KxWWxA) and the crown of the V3 loop (Kx₃IGPHxxY) of gp120. Further analysis of the levels of reactivity of the phage groups against ITM4 six-year follow-up samples revealed different temporal patterns of recognition, confirming the dynamic nature of the immune response. Interestingly, the MPER region was the only epitope retaining immunogenic properties during this period.
In a more recent study, the same group investigated the antigenic landscape of an HIV-1 subtype A-infected individual with BNtAbs by screening an RPL against a pool of sequential samples drawn from 1994 to 2005 [57] . The biopanning procedure yielded sequences predicted to represent autologous V2 sequence (Kx₃Hx₃Y), V3 loop (KxxHxGPx₃F) and gp41 ID domain (CxGxLxCTxNxP). Again, follow-up sample recognition of the four phage groups showed different patterns. Antibody reactivity towards gp41 ID region fluctuated slightly in all plasma samples. Reactivity against the V3 loop-like phages decreased over time. In contrast, the V2 loop mimotopes were not recognized before 2001, but once emerged, reactivity persisted until 2005. Env sequence analysis of the follow-up samples showed that a Tyr to His mutation in the V2 loop sequence coincided with the emerging antibody response against this sequence. Additionally, the authors highlighted that the neutralizing activity observed in the samples was partially due to antibodies recognizing the V3 mimotopes.
Besides the multiple reports on the use of RPLs to characterize the humoral response against HIV-1 Env proteins, Gupta et al. evaluated the reliability of using targeted antigen gene fragment libraries for the identification of epitopes recognized by antibodies elicited in rabbits immunized with p24. To this end, they constructed a phage library composed of DNase-digested fragments of Gag DNA [58] . Phagotopes obtained after the first panning round displayed mainly 30-40-mer peptides, 70% of which mapped to the N-terminus of p24 (150-240 of Gag) and 30% of which corresponded to the C-terminal region of p24 (310-360 of Gag). Only one phagotope mapped to the central region of Gag (269-310). At the end of the second round, selected phages displaying longer inserts of 40 to 50 AA corresponding to the N- and C-terminal regions of Gag were identified, revealing the presence of two distinct antigenic regions in Gag. This study demonstrated that gene-fragment phage display could be used to identify epitopes targeted by polyclonal Abs.

Antibodies Directed against Host Proteins

Although they occur at a very low frequency in humans, antibodies targeting host proteins involved in HIV-1 infection have been reported in immunized animals. Given their potential value for viral entry inhibition and the general understanding of this mechanism, RPLs were screened on these MAbs to gain better knowledge of their epitopes (Table 3).
The murine MAbs 3A9 and 5C7 were raised against cells transfected with the seven transmembrane-spanning domains chemokine receptor CCR5, one of the main coreceptors for HIV-1. They recognize a common epitope located near the CCR5 N-terminus [67, 68] . Both MAbs were used to screen a constrained 9-mer RPL [63] . Phagotopes selected on 3A9 displayed the sequence CHASIYDFGSC while CPHWLRDLRVC was the most prevalent sequence isolated on 5C7. These sequences showed homologies to residues located at the N-terminus but also within the first or third extracellular loop (ECL) of CCR5. Both reacted against the targeted MAb either in phage, cyclic peptide or linear peptide formats. Moreover, they were able to bind to gp120 and the peptide selected on 3A9 inhibited binding of the MAb to a cell line expressing CCR5. To further characterize the conformational epitope recognized by 3A9, additional screening rounds of 12-mer, 7-mer and 7-mer-c RPLs were performed [62] .
Another murine antibody (MAb 2D7) recognizing a conformational epitope on the second ECL of CCR5 [67] was explored by screening a linear 15-mer RPL [64] . Three phagotopes (M14, M23 and M71) were isolated and one of them (M23) was able to inhibit cell infection by the HIV-1 SF162 isolate. The corresponding peptide (FCALDGDFGWLAPAC) fused to the pIII phage coat protein neutralized infection mediated by the JR-FL but not the IIIb strain. The fusion protein specifically bound 2D7 and was recognized in a dose-dependent manner by three CCR5 chemokine ligands, i.e., CCL5 (RANTES), CCL3 (MIP1α) and CCL4 (MIP1β), confirming its CCR5 mimicry. Six years later, another screening campaign was conducted with a linear 12-mer RPL on 2D7 and the EWQKEGLVTLWL sequence of a high-affinity binding peptide was obtained [65] , revealing that this peptide presented homologies to the N-terminal (170-QKEGL-174) and C-terminal regions of the CCR5 ECL2. Ala substitutions of the TL residues confirmed their crucial role in 2D7 binding. The selected peptide was used in rabbit immunization studies and elicited Abs with 2D7-like biological functions, i.e., Abs which inhibited HIV-1-mediated cell fusion and PBMC infection.
The CD18 cell surface molecule, a part of the LFA-1 molecule, is involved in the syncytia formation of HIV-1-infected lymphocytes [69] . As MAb MHM23, a CD18 binder, inhibits HIV-1-mediated cell fusion, Poloni et al. applied the phage display technology to map the MHM23 epitope and thereby identify the CD18 domains which account for syncytia formation [66] . Linear and constrained 9-mer RPLs were panned on the MHM23 MAb, to allow for the selection of linear and constrained sequences. A PPFxYRK consensus motif was inferred by sequence comparison, assigning the epitope recognized by MHM23 to residues 200-206 of CD18. Two phagotopes inhibited in vitro HIV-1-induced syncytia formation and one of them retained this ability in the peptide format, confirming its role in syncytia formation and highlighting that mimics of this epitope could prevent cell-mediated viral propagation.

Identification of HIV-1 Inhibitors by Phage Display

Screening of phage-displayed RPLs, antibody-fragment or ligand libraries on viral or host targets contributed to the discovery of molecules interacting with the key players of HIV-1 infection. Antibody libraries were particularly investigated, and repertoires of Fab, ScFv (antibody fragment corresponding to variable regions of the heavy (VH) and light (VL) chains of antibody connected by a short peptide linker), VHH/nanobodies (single domain antibody fragment (SdAb) corresponding to the variable heavy-chain domain of a camelid heavy-chain only antibody (HcAbs)) or CDR3 fragments from naïve or HIV-1 infected subjects as well as from immunized animals were displayed at the surface of phages ( Figure 4 ).

Inhibitors of HIV-1 Proteins

Most of the HIV-1 inhibitors selected with the help of phage display were identified by targeting viral proteins (Table 4).
The CD4 binding site represents one of the main Achilles' heels of the virus since it is involved in the earliest step of HIV-1 entry and is conserved in almost all HIV-1 strains [126, 127] . Numerous phage display biopannings were performed on gp120 and are classified here according to the type of antibody libraries used.

Fab Libraries

Burton et al. were the first to report the construction of a phage-displayed Fab library from the bone marrow of an asymptomatic HIV-1-infected patient with high titers of gp120-specific Abs [41] . This library was screened against recombinant gp120 from the IIIb isolate and clones displaying high affinities (<10 nM) for gp120 were selected [42] . One Fab (b12) (See Section 2.1.1.4.) was able to neutralize the MN and IIIb strains in different set-ups. This Ab is the most potent neutralizing Ab isolated to date, featuring neutralizing activity against 75% of 36 primary isolates of HIV-1 tested at concentrations that could be achieved by passive immunization [73] .
To improve its affinity, the b12 Fab was submitted to CDR walking, a procedure involving randomization of its CDRs and expression of the derived libraries on phages, followed by screening against gp120 [73] . Sequential CDR walking of the HCDR1 and HCDR3 domains was performed and four clones were chosen for detailed analysis of their binding affinity and neutralization potency against the IIIb and MN isolates. A Pro96Glu mutation of the HCDR3 was identified in the clone with highest affinity, 3B3, which bound IIIb gp120 with an 8-fold improved affinity (0.77 nM) compared to the parental b12 Fab. Similarly, Fab 3B3 was able to neutralize four isolates that were insensitive to the parental b12 Fab. The CDR walking mutagenesis strategy was pursued in a subsequent study and a further 420-fold improvement of the binding affinity of 3B3 for gp120 was achieved, reaching 15 pM [72] .
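The affinity figures quoted above can be cross-checked with simple ratio arithmetic. The sketch below assumes that the stated affinities are equilibrium dissociation constants, so that a fold improvement is the ratio of the earlier to the later value; the variable and function names are illustrative only, and the parental b12 value is not given in the text but back-calculated from the stated 8-fold gain.

```python
def fold_change(kd_before_nm: float, kd_after_nm: float) -> float:
    """Fold improvement in affinity as a ratio of dissociation constants
    (a lower value means tighter binding)."""
    return kd_before_nm / kd_after_nm

kd_3b3_nm = 0.77             # 3B3, from the text
kd_b12_nm = 8 * kd_3b3_nm    # parental b12, implied by the 8-fold improvement
kd_final_nm = 0.015          # 15 pM, from the text

vs_3b3 = fold_change(kd_3b3_nm, kd_final_nm)
vs_b12 = fold_change(kd_b12_nm, kd_final_nm)

print(f"implied b12 affinity: {kd_b12_nm:.2f} nM")
print(f"15 pM relative to 3B3: {vs_3b3:.0f}-fold")
print(f"15 pM relative to b12: {vs_b12:.0f}-fold")
```

On these assumptions, 15 pM is roughly 50-fold tighter than 0.77 nM and roughly 410-fold tighter than the back-calculated parental value, so the quoted 420-fold figure appears closer to an improvement measured from parental b12. This is an inference from the numbers in the text only and would need to be checked against the cited study [72].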
The extremely high binding affinity of 3B3 was also applied to develop an immunotoxin which could specifically kill HIV-1-infected lymphocytes [71] . The authors engineered 3B3 ScFv fused to a truncated form of Pseudomonas exotoxin A. The 3B3(Fv)-PE38 fusion immunotoxin bound to the MN strain of gp120 with the same affinity as the parental Fab antibody and specifically killed a gp120-expressing cell line and a chronically HIV-infected lymphocytic cell line. This study provided the proof-of-concept that high affinity anti-HIV-1 antibodies have a dual application since they may be used for their neutralizing potency but also as carriers for antiviral compounds.
Most antibodies obtained through screening Fab libraries against monomeric gp120 targeted epitopes related to the CD4-binding domain of gp120, pointing to it as an immunodominant epitope [70] . To expedite the identification of NtAbs directed against weakly immunogenic epitopes, a strategy called epitope-masking was applied in several studies. This biopanning approach is designed to mask a particular epitope with antibodies or ligands directed against the region of interest prior to addition of the phage library. Ditzel et al. panned a Fab library from an asymptomatic HIV-1-infected patient on immobilized recombinant gp120 and identified two dominant clones targeting the CD4-binding site [74] . These two Fabs were then incubated with gp120 to mask their respective epitopes and the panning of the library was repeated, highlighting (based on sequence similarities) four groups of Fabs recognizing gp120 with affinities in the range of 50 to 100 nM. Epitope mapping of one representative Fab for each group showed that gp120 binding of three clones was influenced by the V2 loop and the CD4-binding site and was not affected by the glycosylation status of gp120. Furthermore, one of these Fabs (L78) featured a broad neutralization spectrum against various HIV-1 strains. The authors performed further epitope masking by using different selection strategies with the same Fab phage library [81] . The first strategy involved masking the CD4-binding site (CD4BS) epitopes either with soluble CD4 or with a CD4BS Ab. All Fabs selected on sCD4-bound gp120 recognized the C1 region, while the Fabs isolated from the CD4BS Ab-captured gp120 were classified in four different groups: (i) Fabs targeting the C1 region, similarly to Fabs isolated on sCD4-bound gp120; (ii) Fabs directed against the C1-C5 region; (iii) Fabs recognizing the V2 loop; and (iv) Fabs directed against a CD4BS/V2 loop region, similar to the neutralizing Fab isolated by Ditzel et al. [74] .
Multiple epitope masking was then conducted by masking the CD4BS-MAb-captured gp120 with one of the C1-specific Fabs selected on the sCD4-bound gp120 prior to phage addition, leading to the identification of two C1/C2-dependent Fabs. All isolated Fabs bound their targets with affinities ranging from 4 to 300 nM. However, these Fabs targeting weakly immunogenic regions were non-neutralizing or only poorly neutralizing.
More recently, Koefoed et al. investigated the anti-gp120 Ab repertoire of the circulating gp120-binding IgG-bearing B cells of 22 HIV-1-infected patients by constructing phage-displayed Fab libraries from unselected cells or from cells preselected with immobilized gp120 [128] . Panning against gp120 selected for a higher number of phagotopes from the preselected library. Clones from the unselected library recognized the V3 loop, while clones from the preselected library targeted the CD4BS or a CD4-induced epitope encompassing the C1 region. These Fabs displayed no significant differences with respect to epitope specificity, affinity and neutralization ability compared to Fabs obtained from bone marrow libraries, and most of them were unable to neutralize HIV-1. These results were in accordance with previous findings by Parren et al. (1997) concluding that the majority of the circulating HIV-1-specific antibodies were elicited by viral debris and were therefore devoid of neutralizing activity [129] .

ScFvs Hydrolyzing gp120

Antibodies recognizing the amino acids 421-436 of the gp120 CD4BS were isolated from patients suffering from the autoimmune disease systemic lupus erythematosus [130, 131] . However, whether these antibodies neutralized HIV-1 was not known, which prompted Karle et al. to quantify gp120-recognizing Abs in an existing ScFv phage-displayed library from the PBMCs of lupus patients [132] . Biopanning selected for clones binding both gp120 and the 421-436 region of the gp120 CD4-binding site. One of these clones (JL413) neutralized R5 and X4-tropic HIV-1 primary isolates from clades B, C and D with IC50 values ranging from 0.1 to 25.6 µg/mL.
A subset of gp120-binding antibodies was shown to hydrolyze gp120 by a mechanism analogous to that of serine proteases [133] . As the nucleophilic region responsible for this activity was localized in the light chain [134, 135] , a library of light chains prepared from three lupus patients [132] was screened with an electrophilic analogue of gp120 residues 421-433 to isolate antibodies capable of binding and hydrolyzing gp120 [75] . One of the light chain clones selected (SKL6) cleaved a gp120 421-433-reporter substrate as well as full-length gp120. Engineering of Abs composed of such a light chain coupled to a gp120-binding heavy chain might provide Abs with anti-viral proteolytic activities.

CD4 Mimics

The CD4 receptor recognizes gp120 through residues located within its V1 region, and engineering of this cell receptor was applied to identify CD4 variants with a better affinity for gp120, thereby displaying HIV-1 inhibitory properties. In 1997, Krykbaev et al. constructed a phage-displayed library of CD4 V1 and V1-V2 variants generated by error-prone PCR and screened it against gp120 [79] . Five clones with increased affinity for gp120 and presenting mutations within the CD4 V1 domain were identified. All of these clones inhibited HIV-1 entry with IC50 values ranging from 0.2 to 1 µg/mL.

Gp120 V3 Loop

The phage display technology was used to improve the affinity of the MAb 447-52D for its V3 loop epitope, which had also been identified through phage display [16, 17] . In 1996, Thompson et al. expressed MAb 447-52D as ScFv on phages and combined its VH with λ and κ chains from a non-immunized PBL repertoire prior to panning on a peptide containing the V3 loop sequence [80] . Additional shuffling of the HCDR1 and HCDR2 regions was combined with HCDR3 "spiking", i.e., the introduction of random mutations, resulting in the identification of four key residues that could be mutated to improve Ab affinity. A sublibrary in which all four codons were simultaneously mutated was constructed and biopanning selected one ScFv (402P5H7) with improved KD against the MAb 447-52D epitope. Two neutralizing Fabs (Fab loop 2 and Fab DO142-10) were obtained by screening a Fab library against recombinant gp120, reaching IC50 values ranging from 0.2 to 8 µg/mL [70, 81] .

Gp120 CD4-Induced Epitope

In 1999, Ferrer and Harrison screened 7-mer, 12-mer and constrained 9-mer RPLs against gp120 and identified two peptides from the 12-mer RPL [82] . The first sequence, RINNIPWSEAMM (12p1), inhibited CD4 as well as NtMAb 17b binding while the second peptide, TSPYEDWQTYLM (12p2), did not affect the CD4 interaction and rather enhanced 17b binding. The 12p1 peptide was further investigated and shown to inhibit binding of monomeric YU2 gp120 to both sCD4 and 17b with IC50 values of 1.1 and 1.6 µM, respectively. The 12p1 peptide also inhibited binding of these ligands to trimeric envelope glycoproteins, blocked binding of gp120 to the native coreceptor CCR5, and specifically inhibited HIV-1 infection of target cells in vitro [138] .
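To give a feel for what the two IC50 values imply, the following sketch estimates the fraction of gp120 binding blocked by 12p1 at a few concentrations. It assumes a simple Hill-type dose-response curve with a slope of 1; the source reports only the IC50 values, so the curve shape, the chosen concentrations and the function name `fraction_inhibited` are assumptions made for illustration.

```python
def fraction_inhibited(conc_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Fraction of binding inhibited at a given inhibitor concentration
    under a Hill-type dose-response model (slope assumed, not measured)."""
    return conc_um ** hill / (conc_um ** hill + ic50_um ** hill)

ic50_um = {"sCD4": 1.1, "17b": 1.6}   # IC50 values in µM, from the text

for ligand, ic50 in ic50_um.items():
    for conc in (0.1, 1.0, 10.0):      # illustrative concentrations in µM
        blocked = fraction_inhibited(conc, ic50)
        print(f"{ligand}: {conc} µM 12p1 -> {blocked:.0%} inhibition")
```

By construction, inhibition is 50% when the peptide concentration equals the IC50, and under this model the lower IC50 for sCD4 translates into slightly stronger inhibition of sCD4 binding than of 17b binding at any given concentration.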
HIV-1 entry is a multi-step process requiring the successive binding of gp120 to CD4 and to a coreceptor, CCR5 or CXCR4, and triggers successive conformational changes that expose transient epitopes. Targeting of these epitopes with NtAbs could therefore prevent HIV-1 infection, as has been proven with the clinically approved fusion inhibitor Enfuvirtide. To identify such receptor-induced epitopes, Moulard et al. screened a Fab library constructed from an HIV-1 infected patient (FDA-2) with high NtAb titers against gp120-CD4-CCR5 complexes [83] . One Fab clone, X5, bound gp120 from several strains with a low nanomolar affinity. Furthermore, binding affinity was significantly increased in the presence of CD4 and slightly enhanced by CCR5. Competition assays with a panel of antibodies targeting different HIV-1 epitopes revealed that X5 recognized an epitope located in close vicinity to the CD4 and coreceptor binding sites. Neutralization assays with isolates from clades A, B, C, D, F and G demonstrated that X5 neutralized all isolates with potency comparable to that of b12. X5 is the first BNtAb recognizing a receptor-induced epitope identified to date.
To select for ligands for CD4-induced epitopes, murine leukemia virus particles carrying the Env protein of the dual-tropic 89.6 strain pre-incubated with sCD4 were recently used to screen 7-mer, 7-mer-c and 12-mer RPLs [84] . One of the selected phagotopes (XD3: HKQPWYDYWLLR) displaying sequence similarities (in bold) with the N-terminal extracellular part of the CCR5 and CXCR4 coreceptors was identified, suggesting novel potential leads for tyrosine sulfation. Both the XD3 phagotope and XD3 peptides strongly and specifically bound to 89.6 gp120 regardless of CD4, and XD3 competed with MAb 17b for binding to a CD4-induced epitope. The sulfated form of the XD3 peptide recognized R5, X4 and dual-tropic strains and inhibited HIV-1 entry in the high micromolar range.

Gp120 C1 Domain

The Salp15 salivary protein of Ixodes scapularis inhibits CD4+ T cell activation by binding to the CD4 molecule in a region that may overlap with the gp120 binding site. This inhibition is mediated by the C-terminal 95-114 GPNGQTCAEKNKCVGHIPGC sequence [139] . Juncadella et al. thus analyzed Salp15 as a potential HIV-1 inhibitor and demonstrated that Salp15 inhibited the gp120-CD4 interaction and subsequent cell fusion and that the GPNGQTCAEKNKCVGHIPGC peptide also interacted with gp120 [140] . To identify which gp120 amino acids interacted with peptide 95-114 of Salp15, the authors screened a 7-mer RPL against Salp15 and isolated a HVITPLW sequence homologous to an (I/L)TPL motif of the gp120 C1/V1 domain, which is highly conserved across HIV-1 isolates. Finally, they showed that the full-length Salp15 protein as well as its 95-114 C-terminal domain were able to bind to the PCVKLTPLCVTLNCT peptide within the gp120 C1/V1 region.
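The motif comparison described above can be illustrated with a simple pattern search. The following is a minimal sketch, not the analysis performed in the cited study: it only checks where the (I/L)TPL motif occurs in the two sequences quoted in the text. The function name `find_motif` is hypothetical.

```python
import re

# (I/L)TPL motif quoted in the text, written as a regular expression.
ITPL_MOTIF = re.compile(r"[IL]TPL")


def find_motif(sequence, motif=ITPL_MOTIF):
    """Return (start_index, matched_text) for the first hit, or None."""
    match = motif.search(sequence)
    if match is None:
        return None
    return match.start(), match.group()


# Sequences quoted in the text.
phage_peptide = "HVITPLW"          # selected from the 7-mer RPL
gp120_peptide = "PCVKLTPLCVTLNCT"  # peptide from the gp120 C1/V1 region

print(find_motif(phage_peptide))
print(find_motif(gp120_peptide))
```

Indices are zero-based. Both sequences carry the motif, with I in the phage-selected peptide and L in the gp120 peptide.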

Phage Display as a Tool to Unravel the HIV-1-Specific Humoral Response

In addition to epitope mapping and inhibitor identification, phage display was also widely applied to elucidating the determinants of the initial response to HIV-1 antigens and more particularly the importance and different roles of IgM and IgG during the establishment of infection. Indeed, all known BNtAbs are IgGs that are somatically hypermutated and are thus more difficult to elicit. In contrast, IgMs are closer to germline antibodies and the identification of HIV-1 specific IgM could be relevant for the development of vaccine immunogens. The studies listed in this section explored and emphasized the importance of the initial IgM response against HIV-1 and viral strategies to skew it towards non-neutralizing or infection-enhancing antibodies.
Screening against gp120 with IgM and IgG Fab repertoires constructed from a healthy donor demonstrated that only Fabs isolated from IgM were able to recognize gp120, although they were polyreactive, displayed low affinities and no neutralizing properties [141] . Sequence analysis evidenced that selected gp120-binding Fabs originated from different V H germline genes. Several studies reported that gp120 displays superantigenic properties giving it the ability to bind and stimulate non-immune B cells to secrete V H 3 Ig in vitro [142] . Interestingly, the V H 3 antibody family is the most represented immunoglobulin gene family in healthy adults (54% of peripheral repertoire) and HIV-1 infection leads to altered V H 3 production through selective depletion of the anti-HIV-1 V H 3 antibodies [143] . Toran et al. further applied the phage display technology to examine and compare the human V H 3 genes involved in IgM and IgG responses to gp120 to identify the correlates of long-term non progression. Two IgM and IgG phage-displayed Fab libraries from an HIV-1-infected LTNP with high gp120-specific IgM and IgG1 titers were constructed and screened [144] . Several clones were selected from the IgM library (M02, M025, 4M26 and 4M40) and three clones from the IgG library (S20, S19 and S8). All IgM Fabs were polyreactive and had a binding affinity for gp120 in the micromolar range while the IgG Fabs were specific and bound gp120 with affinities in the nanomolar range, as expected. Sequence analysis showed that IgG Fabs originated from the same germline Ab. The IgM Fab M025 displayed the same V H region nucleotide substitutions as those of IgG Fab S8 and used similar D H and J H segments, suggesting that S8 arose from M025 by isotype switching. 
In addition, a four-amino-acid difference between the HCDR3 sequences of M025 (TGQWE) and S8 (RGGSI) was proposed to be associated with the 100-fold affinity increase for gp120 and with the higher neutralizing activity of IgG Fab S8 (ID 50 = 23 ng/mL) compared with IgM M025 (ID 50 = 3 µg/mL). In a follow-up study conducted two years later, the three IgG Fabs were subjected to reverse mutations to reconstitute the germline amino acid residues [145] . The higher affinity and neutralizing ability of S20 were due to the Ala30Arg and Ala31Asp somatic mutations in the HCDR1 region of the germline gene sequence, providing clues for rational modifications of CDR in human antibodies to improve affinity and HIV-1 neutralization capacity.
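The two HCDR3 sequences and ID 50 values quoted above can be compared with a few lines of arithmetic. This is only an illustrative sketch using the numbers given in the text. The helper name `hamming_distance` is hypothetical, and the fold ratio is a plain concentration ratio that ignores the difference in molecular format between the two Fabs.

```python
def hamming_distance(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# HCDR3 sequences quoted in the text.
m025_hcdr3 = "TGQWE"
s8_hcdr3 = "RGGSI"
n_differences = hamming_distance(m025_hcdr3, s8_hcdr3)

# ID50 values from the text, converted to a common unit (ng/mL).
id50_m025_ng_per_ml = 3 * 1000   # 3 µg/mL
id50_s8_ng_per_ml = 23           # 23 ng/mL
fold_gain = id50_m025_ng_per_ml / id50_s8_ng_per_ml

print(n_differences)
print(round(fold_gain, 1))
```

Only the second position (G) is shared, which gives the four differences mentioned in the text. The ID 50 ratio works out to roughly 130-fold.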
The IgM to IgG isotypic switch generating high affinity neutralizing or non-neutralizing antibodies is triggered by the activation of IgM-producing B cells. To characterize the epitopes recognized by HIV-1-specific IgM and to assess the effects of these Abs on HIV-1 infection, Chen et al. constructed a Fab library from blood, lymph nodes and spleen from 59 healthy donors [146] . The library was panned against gp140 (Env ectodomain containing both gp120 and a truncated gp41 lacking transmembrane domain and cytoplasmic tail) of a clade B isolate and allowed for the selection of one Fab clone (R3H1m) with a relatively high binding affinity for gp140 from different strains. A sublibrary derived from this clone was panned against gp140 from different isolates, resulting in the selection of clones (m19, m19a, m19b, m19c and m19d) binding with high affinity to clade B and F gp140 (EC 50 ranging from 2 nM to 80 nM). While these antibodies had weak neutralizing properties against X4-tropic isolates, they did not inhibit and in some cases even enhanced infection with R5-tropic isolates. The m19 Ab, whose sequence is relatively similar to the germline Ab, targeted highly conserved epitopes located near the CD4 binding site or the coreceptor binding site. The high immunogenic capacity of the conserved non-neutralizing epitopes of such antibodies could divert the immune system from actually neutralizing epitopes. The authors suggested that these newly identified MAbs could be used as probes to further characterize conserved non-neutralizing or enhancing epitopes and to modify or remove them from candidate vaccine immunogens. The epitope masking strategy (section 3.1.1.1.) might be applied to these epitopes to redirect the immune system to elicitation of antibodies targeting neutralizing epitopes.
Very recently, phage- and yeast-displayed Ab libraries constructed from an HIV-1-infected patient with 2F5-like BNtAbs were panned against peptides containing the 2F5 epitope and against the HIV-1 JR-FL gp140 [148] . Two MAbs (M66 and M66.6) were identified and the most mutated variant (M66.6) neutralized HIV-1 with a higher potency than M66. Ala substitutions indicated that both Abs recognized the DKW core of the 2F5 epitope and two additional leucine residues located upstream (L(660,663)).

Gp41 Heptad Repeat Inhibitors

Fusion of viral and host membranes, the last step of HIV-1 entry, is initiated by the insertion of the gp41-encoded fusion peptide into the host cell membrane and the formation of an extended prehairpin intermediate (PHI). The gp41 N- and C-terminal heptad repeats (NHR, CHR) then collapse to form a six-helix bundle (6-HB) in which the NHR form a trimeric coiled-coil, creating grooves where the CHR bind. The viral and cellular membranes are thereby brought into close proximity, enabling fusion ( Figure 1 ). During PHI formation, the NHR and CHR do not interact and may thus be transiently targeted by compounds which prevent the formation of the six-helix bundle in a dominant-negative manner. One such compound is the synthetic CHR mimic Enfuvirtide (T20) [4, 149] . A hydrophobic pocket located on the N-terminal peptide trimer groove of the 6-HB is highly conserved among HIV-1 sequences and plays a critical role in membrane fusion, and therefore represents a select target for inhibitors.
Eckert et al. used an approach called "mirror-image phage display" to identify D-peptides targeting the gp41 hydrophobic pocket [89] . In this approach, a phage library of natural L-peptides is screened against the mirror image of a target synthesized as a D-peptide [150] . By symmetry, the selected phagotope sequences, once synthesized as D-peptides, will bind the natural L-form of the target. The main advantage of D-peptides over L-peptide inhibitors is their resistance to natural proteases, which enhances their oral bioavailability and serum half-life. In a first study, the authors screened a constrained 10-mer RPL against the D-peptide sequence of the gp41 hydrophobic pocket fused to a soluble trimeric coiled-coil (IZN17). Pocket-specific binders with a consensus motif Cx 5 EWxWLC were identified and inhibited cell fusion or HIV-1 entry into cells with an IC 50 in the micromolar range when synthesized using D-amino acids. In a second study, the same authors constructed a sublibrary based on the consensus sequence identified in their first study [89] which allowed the selection of sequences with a fourfold potency increase [87] . Surprisingly, the most potent peptide was an 8-mer with a Cx 3 EWxWLC motif, which was probably selected from the 10-mer sublibrary because its smaller size favored a more compact hydrophobic core upon binding to the gp41 hydrophobic pocket. Screening of second generation D-peptides from an 8-mer CX 4 WXWLC library led to the selection of pocket-specific inhibitor of entry (PIE) 7 with an IC 50 of 620 nM. Dimeric and trimeric forms of PIE7 had respective IC 50 values of 1.9 nM and 250 pM. In a third study, the same authors constructed a phage library based on the PIE7 core sequence flanked on each side by two randomized amino acids (xxCDYPEWQWLCxx) and obtained phages with the H(A/P)-[PIE7 core]-(R/K/E)L consensus sequence [88] . 
A peptide (PIE12, HPCDYPEWQWLCEL) exhibited a broad neutralizing spectrum and was even more efficient than T20, reaching an IC 50 of 0.5 nM, when trimerized. Moreover, this third generation PIE12-trimer displays broadened inhibitory potency and resistance to viral variants, as escape mutants required over 65 weeks of selection in vitro to emerge. The PIE12 trimer is thus a promising entry inhibitor and may be used as a topical microbicide in its D conformation.
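The relationship between the successive PIE motifs and the reported potencies can be checked with a short script. This is a minimal sketch based only on the sequences and IC 50 values quoted above. It treats "x" as any single residue, works on one-letter sequences without regard to D or L chirality, and all variable names are hypothetical.

```python
import re

# Sequences quoted in the text.
PIE7_CORE = "CDYPEWQWLC"
PIE12 = "HPCDYPEWQWLCEL"

# Cx3EWxWLC motif of the 8-mer peptides ("x" = any residue).
EIGHT_MER_MOTIF = re.compile(r"C.{3}EW.WLC")
# H(A/P)-[PIE7 core]-(R/K/E)L consensus of the third-generation library.
THIRD_GEN_CONSENSUS = re.compile(r"H[AP]" + PIE7_CORE + r"[RKE]L")

core_matches_motif = EIGHT_MER_MOTIF.fullmatch(PIE7_CORE) is not None
pie12_matches_consensus = THIRD_GEN_CONSENSUS.fullmatch(PIE12) is not None

# PIE7 IC50 values from the text, expressed in nM (250 pM = 0.25 nM).
pie7_ic50_nm = {"monomer": 620.0, "dimer": 1.9, "trimer": 0.25}
fold_vs_monomer = {
    form: pie7_ic50_nm["monomer"] / value
    for form, value in pie7_ic50_nm.items()
}

print(core_matches_motif, pie12_matches_consensus)
print({form: round(fold) for form, fold in fold_vs_monomer.items()})
```

With these figures, dimerization of PIE7 corresponds to a gain of roughly 330-fold and trimerization to roughly 2500-fold over the monomer, far more than the two- or threefold expected from valency alone.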
In 2005, there was no evidence of Abs capable of binding the highly conserved NHR region targeted by the T20 inhibitor. To determine whether antibody fragments could target this determinant on the gp41 protein, Miller et al. constructed a phage-displayed naïve ScFv library and screened it against a synthetic protein mimicking the 6-HB [90] . This construct, named 5-Helix, lacks one of the three CHRs and the NHR trimer is partially exposed, presenting a single binding site for a CHR mimic [151] . The ScFv library was also panned against the IZN36 compound, a homotrimerized form of 36 NHR amino acids fused to a coiled-coil peptide, therefore representing a 6-HB mimic devoid of the CHR trimer [91] . The authors identified a ScFv (D5), which blocks HIV-1 entry and inhibits infection in a single-cycle infectivity assay. This ScFv retained its properties when produced as a whole IgG1. The antibody was found to bind the hydrophobic pocket of the NHR trimer and Ala scan experiments revealed the crucial role of residues L568, W571, and K574 located in the hydrophobic pocket for this interaction. IgG1 D5 was able to neutralize at least five HIV-1 isolates with IC 50 ranging from 93 to 1750 nM, thereby demonstrating that the hydrophobic pocket of the NHR trimer is accessible for binding of HIV-1 inhibitors as large as IgGs.
Fusion inhibitors were also identified by screening non-immune human Fab libraries. Louis et al. screened such a library [155] against antigens comprising the trimeric coiled-coil NHR fused or not to the gp41 six-helix bundle (N35CCG-N13 and NCCG-gp41, respectively) [156, 157] . They identified Fabs targeting (i) the 6-HB; (ii) the NHR trimeric coiled-coil or (iii) both 6-HB and trimeric coiled-coil [95] . These antibodies were tested in a cell fusion inhibition assay and the two most potent MAbs, belonging to the third group, featured an IC 50 of 6-7 µg/mL. Two years later, the same library was screened against NCCG-gp41 and 6-HB antigens [96] . Two clones, Fabs 3663 and 3670, inhibited cell fusion while one clone, Fab 3674, selected against NCCG-gp41 was also effective in infection neutralization assays. Fab 3674 bound the 6-HB as well as stable NHR trimers, and recognized an epitope that partially overlapped the hydrophobic pocket targeted by the D5 Ab. The same authors subsequently demonstrated that the N36 Mut(e,g) peptide presenting mutations within the 5th and 7th AA residues of the heptad repeat [158] increased the temporal window of viral sensitivity to Fab 3674 and thereby synergistically enhanced the neutralizing activity of Fab 3674 as well as of the BNtAbs 2F5 and 4E10 [98] . A Fab sublibrary was created by affinity maturation of the Fab 3674 HCDR2 loop and screened against NCCG-gp41, selecting for three Fabs (Fabs 8060, 8066 and 8068) with enhanced potency (average 5-fold decrease in IC 50 ) and neutralization breadth [97] .
To follow up on a study demonstrating that affinity-purified IgGs from rabbits immunized with N35 CCG N13 inhibited HIV-1-mediated fusion [156] , Nelson et al. rescued a ScFv antibody library from these animals [99] . Three N35 CCG N13 binders were selected, and one of them, 8K8, displayed neutralizing activity against HXB2. In parallel, a more complex Fab library was constructed from the FDA-2 HIV-1-positive patient from whom Z13 Ab had previously been isolated [85] . Screening this library against N35 CCG N13 allowed for the isolation of Fab DN9 [99] . Both ScFv 8K8 and Fab DN9 neutralized HIV-1 infection with a panel of viral strains with IC 50 ranging from 50 to 500 nM and targeted the NHR trimeric coiled-coil, presumably close to the hydrophobic pocket. Three additional gp41-specific Abs (M44, M46 and M48) were obtained by screening antibody phage libraries from asymptomatic seropositive patients [159] against gp140 [100] [101] [102] . A recombinant gp140 (gp140 R2) isolated from an asymptomatic seropositive patient with BNtAbs was reported to elicit BNtAbs in monkeys, further demonstrating that immunogenic epitopes were exposed on this recombinant antigen [101] . Competitive antigen panning (CAP) (a biopanning approach designed to outcompete phagotopes binding to an immunodominant region of a multi-domain target through concomitant addition of an excess of soluble forms of this immunodominant domain) using a mixture of gp140R2 as antigen and gp120R2 as competitor resulted in the selection of a gp41-specific M46 Ab [101] . M46 displayed broad neutralization properties, recognized a conformational epitope and bound weakly to the 5-Helix antigen but not to the trimeric NHR or to the 6-HB. 
In two other studies, the same libraries were panned against gp140/120 from three different isolates (89.6, cm243 and R2), which led to the identification of the M48 Ab recognizing a conformational epitope of gp140 [102] and the M44 Ab, which binds gp140, 5-Helix and 6-HB but not to the NHR trimeric coiled-coil. M44 recognized a conserved conformational epitope and neutralized isolates from different clades with a significantly higher potency than 4E10 or Z13 [100] . The competitive antigen panning approach against gp140/120 thus allows the selection of Abs recognizing conformational epitopes on gp41, which are not properly folded when gp41 is used as a target.

Other HIV-1 Proteins

Most of the studies found in the literature that apply the phage display technology to the discovery of HIV-1 inhibitors target the Env protein. However, reports about the identification of peptides directed against other HIV-1 proteins involved in viral replication as well as interfering with RNA sequences have been published and are summarized in this section.

Viral Protein of Regulation (Vpr)

Vpr is involved in the nuclear import of the viral preintegration complex (PIC) as well as in the induction of apoptosis after cell cycle arrest and can be packaged into virions in quantities similar to the structural proteins [160] [161] [162] [163] . Vpr was also reported to be associated with numerous cellular proteins such as glucocorticoid receptors, transcription factors or the uracil DNA glycosylase (UDG) [160, 164-166] .
In 2003, Krichevsky et al. conducted a study to elucidate the exact role of Vpr and its contribution to the nuclear import process of the HIV-1 PIC [104] . To that aim, a semi-synthetic ScFv library [167] was screened against the N-terminal (AA 17-34) part of Vpr (VprN) conjugated to BSA (VprN-BSA). Purified ScFv fragments showing strong and specific binding to the VprN sequence recognized full-length Vpr and inhibited Vpr-mediated nuclear import, indicating that targeting Vpr may lead to the development of new peptides to fight viral infection.

Integrase (IN)

The phage display technology has also been applied to the identification of the HIV-1 integrase inhibitors. In 2004, Desjobert et al. screened a 7-mer RPL against recombinant HIV-1 integrase and identified a high affinity phagotope displaying the FHNHGKQ sequence [105] . In peptide format, this sequence inhibited the strand transfer activity of IN by competing with the target DNA, providing the proof-of-concept that IN is also a valuable target for phage display.

Transactivator of Transcription (Tat)/Transactivation Response element (TAR)

Interaction of the viral transcription activator Tat with the human cyclinT1 subunit of the positive transcription elongation factor (P-TEFb) complex and the cooperative binding of this complex to the transactivation response element (TAR) RNA are prerequisites of HIV-1 transcription [168] . Screening RPLs or Fab libraries against Tat, cyclinT1 or TAR elements using the phage display technology identified peptides impairing Tat-mediated HIV-1 replication.
The first study was conducted in 1996, when Pilkington et al. screened a Fab library constructed from the Ab repertoire of an HIV-1-infected asymptomatic patient and selected Fabs recognizing a region comprised between amino acids 22 to 33 of the Tat protein in a conformation-dependent manner [106] .
Many years later, a non-immune human ScFv phage-displayed library was explored to identify peptides binding to cyclinT1 [107] . Clones recognizing the cyclin box domain of cyclinT1 or interacting with the Tat/TAR recognition motif (TRM) were isolated after panning against the 272 N-terminal amino acids of cyclinT1. When expressed as intrabodies (antibody or antibody fragment expressed intracellularly), one of these ScFvs inhibited Tat-mediated transactivation without impairing cellular basal transcription or inducing apoptosis and partially inhibited HIV-1 replication in cultured cells.

Nucleocapsid (NC)/Packaging Signal (psi) Sequence

The HIV-1 nucleocapsid protein p7 (NCp7) is processed from the Gag precursor and is involved in the protection and encapsidation of viral RNA leading to viral assembly through interaction with a specific secondary structure of the 125-base long psi RNA [169] [170] [171] [172] . Lener et al. screened a constrained 9-mer RPL against NCp7 and selected phagotopes sharing a PPx(D/E)R consensus motif [109] . Further binding experiments suggested that the NCp7-phage interactions involved amino acids 30 to 52 of NCp7, encompassing a zinc finger domain.
A similar screening campaign was conducted, where the 5'-end of the psi RNA was covalently immobilized, leaving the secondary structure intact and fully accessible [112] . Screening of a 12-mer RPL selected for four clones with either WHxT or HSSxY motifs which were assessed for specific and dose-dependent binding to psi RNA. The most prevalent sequence (SYQWWWHSPQTL) was expressed in fusion with the maltose-binding protein and was able to compete with NCp7 for binding to psi RNA, confirming the value of the peptide as a potential HIV-1 inhibiting compound.

Negative Factor (Nef)/Virion Infectivity Factor (Vif)

HIV-1 accessory proteins Nef and Vif have an important role in HIV-1 viral replication and infectivity and, as such, represent interesting targets for inhibitors. In 2001, Yang et al. demonstrated that Vif was able to multimerize and that its 151-AALIKPKQIKPPLP-164 domain was critical for multimerization, pointing to it as an interesting target to impair Vif-mediated viral replication [116] . The authors therefore screened a 12-mer RPL against Vif and selected phages sharing a common PxP motif [117] . Four of these sequences synthesized as peptides bound the C-terminus of Vif with high affinity and were able to inhibit Vif-Vif as well as Vif-Hck tyrosine kinase interactions. Moreover, these peptides inhibited HIV-1 replication in cultured cells.
A nanobody (sdAb19) recognizing a conformational epitope and reacting with a high affinity (K D : 2 nM) with Nef proteins from a panel of HIV-1 M, N, O and P groups was isolated through phage displaying the V HH repertoire of a llama immunized with a purified recombinant Nef protein (fragment 57-205) [114] . When expressed as an intrabody, this anti-Nef sdAb inhibited important biologic functions of Nef both in vitro and in vivo in CD4C/HIV-1Nef transgenic mice.

Reverse Transcriptase (RT)

Reverse transcriptase is a valuable target for anti-HIV-1 compounds, as illustrated by the success of the multiple small compounds used in HAART. In 1996, Gargano et al. panned a phage-displayed library of synthetic combinatorial human Fab fragments against recombinant HIV-1 RT [174] . Two Ab fragments that specifically inhibited the RNA-dependent DNA polymerase (RDDP) activity of RT were identified. Both fragments also inhibited the activities of avian and murine retroviral RTs as well as the human DNA polymerase α and prokaryotic DNA polymerases. Because of their lack of specificity, these Ab fragments were not exploited further as anti-HIV-1 molecules.
Two years later, a semisynthetic phage display library of human ScFvs with randomized heavy and light chain CDR3 was screened against recombinant RT [120] . Five different ScFv Abs directed against RT were isolated, of which three (F-6, 6E9, 5B11) inhibited the RDDP activity of RT; of note, (F-6) also inhibited RT DNA-dependent DNA polymerase (DDDP) activity. Synthesis of the peptides corresponding to the CDR3 regions of the heavy and light chains showed that the heavy chain CDR3 inhibited RDDP activity while the light chain peptide had no effect. These HCDR3 peptides represent the smallest antibody fragments inhibiting the RT identified to date and demonstrated that HCDR3 repertoire is a potential source of bioactive molecules (see Section 3.2.1.2.).

Regulator of Virion Expression (Rev)

Rev is a key regulatory protein. Oligomerized Rev binds to unspliced or singly spliced viral mRNA and ensures its transport to the cytoplasm, thereby allowing the translation of viral gene products. Despite considerable efforts, the structure of Rev is poorly characterized since Rev is refractory to crystallization, mainly because of its tendency to form insoluble aggregates [175] . In the absence of structural information, the phage display technology was used by different authors to map the domains involved in the interaction of Rev with its network of partners. Pilkington et al. identified two Rev-specific Fabs from a Fab library derived from the Ab repertoire of an HIV-1-infected asymptomatic patient [106] . These Fabs were directed against sites adjacent to the Rev basic nuclear localization signal (NLS) (residues 52-64) and to the activation domain (residues 75-88). Two years later, Jensen et al. screened a 15-mer RPL to identify potential Rev peptidic antagonists [121] . Three groups of sequences sharing a SRLxG(x) 2-3 R motif (group I), sharing a RVV(x) 2-4 RG/A motif (group II) or featuring no sequence similarity (group III), were obtained. Three clones were selected based on their high frequency of occurrence (p1 and p3, group I) or on their strong binding affinity for Rev (p19, group III). They were synthesized as peptides and were shown to retain Rev binding specificity. More recently, llama nanobody libraries from animals immunized with recombinant Rev allowed the identification of 12 Rev-binding nanobodies [123] . One of them (Nb190) prevented or disrupted Rev multimerization by interacting with Lys20 and Tyr23 of the Rev N-terminal α-helix [122] . Besides inhibitor discovery, Fabs were recently proposed as "crystal chaperones" to support crystallization of their partners by locking them in specific conformations and blocking aggregation [124] . Stahl et al. described the preparation, characterization, and crystallization of an equimolar complex formed between Rev and a chimeric rabbit/human Fab (SJS-R1) selected through phage display [124] . The Rev/SJS-R1 Fab complex was successfully crystallized and the Fab SJS-R1 was shown to recognize a conformational epitope in the N-terminal half of Rev. Structural characterization of the crystallized Fab/Rev complex is ongoing and a corresponding ScFv has been engineered and may have anti-HIV-1 properties.

Group-Specific Antigen (Gag)

To identify peptides interfering with HIV-1 capsid assembly, Sticht et al. screened a 12-mer RPL against the capsid (CA) protein generated by the proteolysis of the Gag precursor and identified phagotopes whose sequences could be classified in four groups [125] . One of these sequences (CAI, capsid assembly inhibitor) competed with phagotopes for binding to CA and inhibited capsid assembly in vitro. Interaction with CAI was mapped to CA amino acids , with additional contacts in helix 4. CAI did not inhibit capsid assembly in vivo, but may nevertheless serve as a tool for drug screening and as a starting point for drug design based on its CA-binding properties.

Diagnostic Applications

Peptides and antibody fragments selected by means of phage display may also be used for diagnostic purposes or to assess the diversity of the immune response against HIV-1-specific antigens.
De Haard et al. constructed a ScFv library from PBLs of an HIV-1 positive patient presenting antibodies against gp120, gp41 and p24 and screened the library against gp160 and p24 [176] . One phagotope recognizing an epitope within the 2F5 and 4E10 BNtAbs epitopes on gp41 (AB#31) with affinities in the nanomolar range was isolated. Importantly, it was shown to compete with 41 out of 42 gp160-reactive plasma samples from North-American and African HIV-1 positive patients, indicating that this antibody recognizes an epitope conserved in a large panel of isolates and might be suitable for diagnostic applications.

Inhibitors of Host Proteins

In parallel to the targeting of viral proteins, many efforts were undertaken to identify peptides, antibody fragments or modified ligands binding to the HIV-1 host proteins and impairing their interactions with the viral proteins (Table 5 ).

Host Receptor Inhibitors

The HIV-1 host receptors CD4, CCR5 and CXCR4 are involved in the early steps of HIV-1 infection and thus represent valuable targets for the identification of antiviral peptides or neutralizing Abs. Moreover, these receptors display very low variability compared to the viral Env proteins, which facilitates the identification of neutralizing antibodies. Although the CD4 receptor plays a crucial role in the entry process, the only phage display biopanning assays reported to date targeted the chemokine receptors CCR5 and CXCR4. However, to circumvent the difficulties of purifying and immobilizing such complex receptors on a solid support without losing their native structure, biopanning procedures had to be adapted. In this regard, screening strategies using biopanning on living cells [178, 179] , proteoliposomes [180] or peptides derived from the receptors' extracellular parts [181, 182] were particularly successful.

CCR5 Coreceptor

The CC chemokine receptor 5 (CCR5) is one of the two major HIV-1 coreceptors and binds three different endogenous chemokines CCL5 (RANTES), CCL4 (MIP-1β) and CCL3 (MIP-1α) which were reported to prevent R5-tropic HIV-1 entry. Interestingly, inhibition of CCR5 binding to HIV-1 provides an almost complete protection against R5-tropic viruses with only minor effects on the normal physiological functions of the cells [183] .
The first biopanning experiment targeting CCR5 was performed using receptor embedded in paramagnetic proteoliposomes [180] . To create such proteoliposomes, magnetic beads were added to a mixture of synthetic lipids, a detergent-solubilized C9-tagged CCR5 receptor and a capture antibody, reconstituting membrane bilayers containing pure, native and properly oriented CCR5 receptor. These proteoliposomes were used in biopanning experiments with a human ScFv antibody library and several antibody fragments specifically binding to CCR5-expressing cells were identified. The same year, Steinberger et al. used the phage display technology to select and to humanize rabbit anti-CCR5 antibodies preventing the export of CCR5 to the cell surface [184] . Following rabbit immunization with a GST-Nterm CCR5 fusion protein, the authors constructed a phage-displayed Fab library that was screened against the antigen initially used for the immunization. A phagotope (ST6) binding strongly and specifically to the immobilized antigen as well as to CCR5-positive cells was identified, expressed as a ScFv and humanized by successive replacements of the rabbit light and heavy chains by their human counterparts. One humanized antibody fragment, ST6/34, that retained the strong CCR5-binding capacity of the parental ST6 antibody was isolated from the screening of the intermediate libraries. When expressed as an intrabody, the ST6/34 ScFv efficiently blocked CCR5 expression at the cell surface. [185] (reviewed in Chevigné et al. [191]). In this strategy, a phage library displaying randomly mutated and N-terminally extended CCL5 chemokine variants (xS#xSSx###-CCL5, where # represents either A, P, S or T) was constructed and screened on CCR5-expressing cells. Only intracellular phagotopes that had induced CCR5 receptor internalization were recovered. Two CCL5 variants (P1 = LSPVSSQSSA-CCL5 and P2 = FSPLSSQSSA-CCL5) were identified. 
These variants displayed a higher selectivity for CCR5 and had more potent HIV-1 inhibitory abilities than the wild-type CCL5 chemokine. Further characterization demonstrated that P2 acted as a CCR5 superagonist and potently induced intracellular CCR5 sequestration. P1 was less potent but significantly reduced CCR5-dependent intracellular calcium signaling. In a subsequent study, P1 and P2 variants were optimized by phage display biopanning to select for variants that retained the high anti HIV-1 potency of P1 and P2 but reduced CCR5-agonist activity [186] . Three successive generations of libraries (xxPx 3 Q#TP-CCL5, QGPPLMx 4 -CCL5 and QGPΨ$x 5 -CCL5 where Ψ represents G, L or P and $ represents G, L or M) were constructed and screened as previously described [185] . The three most interesting candidates (5P12-CCL5, 5P14-CCL5 and 6P4-CCL5) produced as soluble proteins displayed highly potent antiviral activities. Analogue 6P4-CCL5 acted as an agonist and sequestered CCR5, 5P12-CCL5 induced no signaling or receptor sequestration while 5P14-CCL5 induced CCR5 internalization without triggering G-protein signaling. Altogether these data demonstrated that antiviral activities of similar molecules identified through phage display screening can rely on various mechanisms of action.
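The design of the first N-terminally extended CCL5 library (xS#xSSx###-CCL5) can be made concrete with a small calculation. This is a sketch under stated assumptions, not a description of how the library was built: "x" is assumed to stand for any of the 20 standard amino acids (the text does not define it), "#" is A, P, S or T as stated, and diversity is counted at the amino acid level only. The helper names are hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumption: "x" = any standard residue
HASH_RESIDUES = "APST"                # "#" as defined in the text


def pattern_to_regex(pattern):
    """Translate the library notation into a compiled regular expression."""
    parts = []
    for symbol in pattern:
        if symbol == "x":
            parts.append("[" + AMINO_ACIDS + "]")
        elif symbol == "#":
            parts.append("[" + HASH_RESIDUES + "]")
        else:
            parts.append(re.escape(symbol))  # fixed residue
    return re.compile("".join(parts))


def theoretical_diversity(pattern):
    """Number of distinct peptide sequences the pattern can encode."""
    count = 1
    for symbol in pattern:
        if symbol == "x":
            count *= len(AMINO_ACIDS)
        elif symbol == "#":
            count *= len(HASH_RESIDUES)
    return count


LIBRARY_PATTERN = "xS#xSSx###"
library_regex = pattern_to_regex(LIBRARY_PATTERN)

# N-terminal extensions of the two selected variants quoted in the text.
p1_extension = "LSPVSSQSSA"
p2_extension = "FSPLSSQSSA"

print(theoretical_diversity(LIBRARY_PATTERN))
print(library_regex.fullmatch(p1_extension) is not None)
print(library_regex.fullmatch(p2_extension) is not None)
```

Under these assumptions the pattern encodes 20^3 × 4^4 = 2,048,000 distinct extensions, and both P1 and P2 conform to it.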
Screening of RPL was also applied to the discovery of small CCR5-blocking peptides through the targeting of receptor-expressing cells [178, 179]. Vyroubalova et al. screened a partially randomized 10-mer phage library (CDx 3 KPCALLRYx 10 -PIII) using competitive elution with a CCL5 analogue (NNY-CCL5, 100 nM) and selected a unique peptide (ALLRYNPFYYLSFSP). This peptide was further optimized through N-terminal extension, exon shuffling and biopanning. By applying successive treatments consisting of a preselection using a low amount of NNY-CCL5 (100 pM) to discard low-affinity binding phages, followed by classical alkaline elution using TEA and a competitive elution containing NNY-CCL5 (1 μM), they identified an extended peptide (LLDSTFFTADALLRYNPFYYLSFSP) inhibiting R5-tropic HIV-1 cell fusion with an IC 50 of 5 μM. In parallel, Wang et al. screened a fully randomized 12-mer RPL using acidic elution and identified phagotopes binding specifically to CCR5-expressing cells and sharing the AFDWTFVPSLIL sequence [178]. In peptide and phage formats this sequence blocked the binding of the anti-CCR5 neutralizing 2D7 MAb and completely inhibited binding of the chemokine CCL5 to the receptor [179].

CXCR4 Coreceptor

The CXC chemokine receptor 4 (CXCR4) is the second major HIV-1 coreceptor. CXCR4 binds only one endogenous chemokine ligand (CXCL12) and is also expressed at the surface of numerous cancer cell types, underlining its high value as a therapeutic target [192, 193]. Despite this importance and the relative success of phage biopanning on CCR5, only two recent studies reported the use of phage display to search for CXCR4 inhibitors [182, 187].
In 2010, Jahnichen et al. isolated llama-derived V HH s binding specifically to CXCR4 and inhibiting the entry of X4-tropic virus [187]. To select V HH s binding exclusively to functional and properly folded receptor, llamas were immunized with CXCR4-expressing HEK293T cells. A phage library was subsequently constructed from the PBMCs of immunized camelids and several phage clones inhibiting the binding of labeled CXCL12 chemokine to the receptor were identified. In particular, two V HH s (238D2 and 238D4) showed low nanomolar affinity for the receptor and inhibited entry of X4 and X4/R5 viruses into different CXCR4 + cell types with IC 50 values ranging from 10 to 100 nM. Dimerization of 238D2 and 238D4 to form biparatopic proteins increased their antiviral properties to IC 50 values in the picomolar range. Epitope mapping revealed that the two V HH s inhibited CXCR4 mainly through binding to the second extracellular loop.
Very recently, we used a peptide corresponding to this particular extracellular loop (ECL2) as target to identify short CXCR4 antagonists [182] . By screening a non-immune phage library displaying the human HCDR3 peptide repertoire [194] , several small peptides binding to the ECL2 peptide that specifically recognized CXCR4-expressing cells were identified. Notably, one of these HCDR3 peptides (TYPGRY) acted as a CXCR4 antagonist with potency in the micromolar range.

Other Host Protein Inhibitors

In addition to targeting host receptors, a few studies reported the identification of peptides or antibody fragments directed against other host proteins such as cell surface determinants (CD) or intracellular enzymes [188, 190, 195, 196] . The CD40 receptor, a member of the tumor necrosis factor receptor superfamily, and its CD40L ligand are involved in several biological processes including cell proliferation, activation and production of cytokines and chemokines. During HIV-1 infection, viruses were proposed to selectively downregulate or even deplete the pool of CD40L-expressing CD4+ T cells. In this context, antibodies binding to CD40 and restoring the HIV-1-induced CD40L downregulation might be of interest. In 2002, Ellmark et al. identified a set of anti-CD40 antibody fragments through biopanning of a human ScFv phage library against a biotinylated CD40 antigen [189] . When expressed as full-length IgG1 one antibody (clone B44) suppressed HIV-1 infection by a R5-tropic virus, most probably through induction of CC chemokine production [188] . More recently, the DDX3 protein, a cellular RNA helicase involved in RNA unwinding was shown to play important roles in HIV-1 replication. This protein presents a unique region (ALRAMKENG) responsible for high affinity binding to the HIV-1 RNA. Garbelli et al. carried out a biopanning experiment with a mix of linear 12-mer and linear and constrained 7-mer RPL against a peptide mimicking the DDX3 region interacting with HIV-1. They identified a 7-mer peptide (SDVPTQV) blocking the replication of an X4-tropic virus with an IC 50 of 20 μM [190] .

Phage Substrate

The use of phages for protease cleavage specificity profiling was first described by Matthews and Wells in 1993 [197]. This "phage substrate" approach relies on the use of phage particles to screen for enzyme substrates instead of classical binder selection. Protease cleavage profiling using phage takes advantage of the natural resistance of phage particles to proteolysis. Phage particles displaying random peptides are immobilized on a solid support and submitted to proteolytic elution to specifically liberate phages presenting peptides corresponding to the protease cleavage site. The phage substrate approach thus allows rapid determination of the cleavage profile of a given protease and provides optimized substrate candidates which can be further used as leads for the development of specific inhibitors. Over the last two decades, phage substrate has been applied to a large variety of proteases including the HIV-1 protease (PR) [198, 199]. The HIV-1 PR is a homodimeric aspartic protease responsible for nine critical cleavage steps within both the structural (Gag) and the non-structural (Gag/pol) polyproteins. The HIV-1 protease recognizes substrate residues encompassing the P4 to P3' positions (Schechter and Berger's nomenclature), with the primary determinants located at positions P2 to P2' [200, 201]. Interestingly, alignment of the nine natural substrate cleavage sites of the HIV-1 PR shows a high sequence diversity suggesting a broad proteolytic specificity. In 2000, Beck et al. reported the use of a hexapeptide phage library to unravel the HIV-1 PR specificity and develop new protease inhibitors (Table 6) [199]. This library was constructed by fusing the MAb 3-E7 epitope upstream of the randomized sequences. Phages were first incubated with the HIV-1 PR and uncleaved phages were removed by addition of pansorbin cells. Biopanning selected for highly diverse sequences consistent with the suggested broad substrate specificity of the HIV-1 protease. However, none of the selected peptides corresponded to the HIV-1 polyprotein cleavage sites. Nonetheless, several phage-displayed peptides were very efficiently cleaved by the HIV-1 protease. In peptide format, most of them displayed a K m value lower than the one determined for a peptide mimicking the natural substrate at the matrix/capsid junction (IRKIL↓FLDG) (Table 6, italic). The most potent selected substrate (GSGIF↓LETSL) was cleaved 60 times more efficiently and had a K m value of 5 μM, i.e., 260 times lower than the natural substrate (K m = 1300 μM). Interestingly, the GSGIF↓LETSL substrate displayed only two residues (in bold) of the optimal cleavage site model designed based on the most frequently selected residues for each position (SGVY↓FVTS) (Table 7).

Table 6. Phage substrate analysis of the HIV-1 protease cleavage specificity. Arrows denote cleavage sites in the natural or the selected substrate sequences. Underlined substrates correspond to sequences used to derive inhibitors. Italic sequences correspond to the natural substrate (matrix/capsid junction) used as reference for the assessment of the cleavage efficiency. Ψ: reduced amide bond.

Table 7. HIV-1 protease specificity model.
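The Schechter and Berger nomenclature used above numbers substrate residues outward from the scissile bond: P1, P2, ... toward the N-terminus and P1', P2', ... toward the C-terminus. The short sketch below illustrates this labeling for the two substrates quoted in the text and recomputes the K m ratio. The function name is hypothetical and the code is only an illustration of the convention, not part of the cited studies.

```python
def label_positions(substrate, arrow="↓"):
    """Map Schechter-Berger position labels to residues around the cleavage site."""
    n_side, c_side = substrate.split(arrow)
    labels = {}
    # Non-primed positions are counted from the scissile bond toward the N-terminus.
    for index, residue in enumerate(reversed(n_side), start=1):
        labels["P%d" % index] = residue
    # Primed positions are counted from the scissile bond toward the C-terminus.
    for index, residue in enumerate(c_side, start=1):
        labels["P%d'" % index] = residue
    return labels


natural = label_positions("IRKIL↓FLDG")    # matrix/capsid junction peptide
selected = label_positions("GSGIF↓LETSL")  # most potent phage-selected substrate

for position in ("P2", "P1", "P1'", "P2'"):
    print(position, natural[position], selected[position])

# K m values quoted in the text, in micromolar.
km_natural_um = 1300.0
km_selected_um = 5.0
print(km_natural_um / km_selected_um)  # fold decrease in K m
```

A lower K m alone does not determine cleavage efficiency, which also depends on the turnover rate; the 60-fold gain in efficiency reported in the text is a separate measurement and is not derived here.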

Sequences corresponding to the four most potent substrates were synthesized as peptidic transition state analogues and presented inhibitory activity in the nanomolar range (Table 6, underlined sequences). In 2001, the same authors applied the phage substrate approach to compare the relative specificities of the human (HIV-1) and feline (FIV) immunodeficiency virus PRs [198]. Hexapeptides specifically processed by each of the two proteases as well as peptides cleaved by both enzymes were identified. Further mutational analysis of synthetic peptides derived from a sequence processed with the same efficiency by the two proteases (KSGVF↓VVNG) was performed to assess the influence of amino acid substitutions on the catalytic process of each PR (Table 6, underlined sequence). They showed that substitutions for a Val at position P2 or P2' increased the cleavage by the HIV-1 protease whereas the introduction of a Val at position P1 was more favorable to FIV protease activity. In particular, the GSGVFΨ(CH 2 NH)VVNGL inhibitor identified in the first study was active against both PRs with different potencies, and replacement of its amide group by a hydroxyethylene group resulted in a peptide with equivalent inhibitory activity towards both the HIV-1 and the FIV proteases.

Phage particles as HIV-1 Antigen Carriers

In parallel to epitope mapping, inhibitor discovery and enzyme profiling applications, bacteriophage particles were exploited as carriers for the development of anti-HIV-1 vaccines. For this particular purpose, phages are not used as affinity selection tools but rather to display antigens to the immune system, aiming to elicit specific neutralizing antibodies and/or efficient cytotoxic T-cell responses. Indeed, phages can also be considered as biologically inert particles characterized by a dense and repetitive organization capable of displaying a wide range of exogenous proteins at a precise valency and in a controlled manner. Moreover, phages were shown to be naturally immunostimulatory [202, 203] and are particularly affordable, easy and rapid to produce and to administer in many animal models [204]. Altogether these characteristics make bacteriophages a suitable and valuable carrier for HIV-1 vaccine development.
Anti-HIV-1 vaccination trials to elicit humoral and cellular responses against various HIV-1 proteins such as the envelope proteins gp120 and gp41 as well as the RT and the p17 proteins [205-209] were conducted with a large diversity of bacteriophages including M13, T4, MS2 and lambda phages [205, 210-213] (Table 8).
Numerous phage-vaccination trials using mimotopes identified on anti-HIV-1 antibodies were reported (see Tables 1-3). All these attempts aimed mainly at eliciting antibodies directed against the genuine epitope of an existing antibody. Besides these descriptions, a few studies were exclusively dedicated to the use of phage particles as vehicles to display fragments or complete HIV-1 proteins to the immune system to prime new immune responses. The pioneering attempt was reported by Minenkova et al. in 1993 [205]. Immunization of rabbits with filamentous M13 phage particles displaying a small peptide (GEDRW) derived from the p17 Gag protein in fusion with the pIII minor coat protein elicited the production of specific IgGs reacting with the natural p17 antigen as well as with the Gag precursor protein. Shortly afterwards, two studies reported that immunization of mice with fd phages displaying the gp120 V3 loop fused to the pIII protein induced high titers of antibodies cross-reacting with V3 loops of different strains and featuring neutralizing activity [207, 210]. The major coat protein pVIII of the fd phage (2700 copies per particle) was also investigated as a scaffold protein for vaccination. Immunization with phages displaying a pVIII-fused peptide corresponding to residues 309-317 of the HIV-1 RT resulted in the induction of a specific cytotoxic T-cell (CTL) response against the displayed peptide. Priming nevertheless required a co-immunization with a T-helper epitope from the same protein (KDSWTVNDIQKLVGK) provided by either the same or a separate phage particle [206]. More recently, attempts to prime a CTL response were conducted with M13 phages displaying a mixture of thousands of variants of CTL epitopes (RGPGxAx 4 or xGxGxAxVxI) derived from the gp120 V3 loop (residues 311-320; RGPGRAFVTI) presented in an immunoglobulin V H domain scaffold [215]. Immunization of mice provided a potent, broad and long-lasting (12 months) epitope-specific response, and effector memory T cells were induced. Moreover, recent studies demonstrated that mice immunized with these variable epitope libraries are capable of neutralizing half of the subtype B viral isolates used for challenge, including HIV-1 isolates which are known to be resistant to neutralization by several potent monoclonal antibodies [218].
Moreover, fusion of different antigens to SOC and HOC offers the possibility to develop multicomponent HIV-1 vaccine particles [216]. T4 phage displaying the V3 loop of the gp120 protein fused to the SOC protein was reported to elicit antibodies capable of recognizing the native antigen [214]. A further study demonstrated that the HOC protein can also be used to display various HIV-1 antigens including p24, Nef or a trimeric peptide derived from the C-terminus of gp41 [216]. More recently, lambda phage and its decorating capsid protein were used for dense display of glycosylated mammalian cell-derived trimers of gp140 protein [213]. Lambda phage provided a display level of 30 copies of gp140 trimer per particle, roughly 2-fold higher than observed on native HIV-1 virions (14 ± 7 spikes per virion) [220]. Rabbit immunization trials were rather disappointing, as higher antibody titers were elicited in animals receiving soluble oligomeric gp140. These results were proposed to be due to the sequential immunization process and to the strong and immunodominant response against phage capsid proteins, which is most probably boosted upon sequential immunization and could lead to a decrease of the humoral immune response to the displayed antigen.
Virus-like particles (VLPs) derived from the MS2 bacteriophage coat protein, displaying the V3 loop of gp120 and devoid of phage genome, were also reported [212]. Although characterized by a low display level (90 copies per VLP), these particles were nevertheless able to elicit high titers of specific antibodies when injected in mice and provided neutralizing activity at 1/10 sera dilution. More recently, PP7 phage-derived VLPs displaying the V3 loop of gp120 were used to immunize mice, providing a high-titer antibody response [217].

Conclusions and Future Challenges

AIDS was first described in 1981 [1] , and two years later HIV-1, its causative agent, was isolated [2] . The phage display technology was first published almost concomitantly in 1985 [12] . Both protagonists lived parallel lives until 1991, when Burton et al. identified a human Fab recognizing the CD4-binding site of HIV-1 gp120 through phage display [41] . This first antibody, b12, turned out to be one of the rare BNtAbs characterized to date, and many studies are still ongoing to set up scaffolds presenting its epitope in a conformation capable of eliciting Abs with b12-like HIV-1 inhibiting properties. This article was the starting point of a dense literature exploiting the different applications of the phage display technology to gain as much knowledge as possible on the HIV-1 infection process.
Although highly informative, immunization trials performed with mimotopes/phagotopes selected through phage display remained altogether rather disappointing, as no long-lasting neutralizing response was reported, and provided yet further proof that antigenicity does not necessarily imply immunogenicity [221]. Among the bottlenecks in the field of HIV-1 vaccine research and development are the weak immunogenic properties of the identified mimotopes or native antigens. Whether the difficulty of eliciting and/or isolating strong or broad neutralizing antibodies from HIV-1-infected patients, be they normal progressors or LTNPs, is a limitation of phage display, a true reflection of the restricted humoral response against HIV, or both, remains to be established. Nevertheless, these discrepancies are not exclusive to HIV-1 mimotopes and were observed with other antigens.
From a fundamental perspective, the use of phage-displayed IgM libraries to model Abs close to the germline Abs allowed an elegant exploration of the poorly characterized determinants of the initial antiviral immune response. Phage display also contributed to a better knowledge of the structure of the diverse HIV-1 proteins and provided insight into non-structural proteins involved in replication mechanisms. A notable example is the phage substrate approach, which allowed the precise characterization of the HIV-1 protease cleavage specificity and thereby provided valuable inhibitor candidates. Unraveling viral replication steps or protein interactions led to the use of phage display to engineer various types of inhibitors targeting either viral or host proteins.
Some of the most potent inhibitors originated from Fab libraries derived from asymptomatic HIV-1-infected patients whose reactivity against HIV-1 was previously assessed (b12, X5, Z13) or from immunized camelids (V HH anti CXCR4). Other inhibitors were identified from semi-synthetic (ligand analogues of CCL5) or randomized peptide libraries (D-peptides: PIE12 3 ) and their affinity was improved through secondary libraries. These inhibitors were selected against different targets (Env proteins, coreceptors) by biopanning carried out using different types of support (immobilized proteins, cells, peptides), illustrating the power and versatility of the phage display technology [41, 83, 85] (Figure 5).

Figure 5. Inhibitors blocking key steps in the entry process were identified using the phage display technology. These inhibitors target: the CD4 binding site (Fab b12 and Z13), the coreceptors CCR5 (CCL5 variants) or CXCR4 (V HH 238D2 and 238D4), the CD4-induced epitope of gp120 (Fab X5) or the heptad repeat region of gp41 (peptide PIE12 3 ).

Introduction

To better understand the reasons underlying the persistence of viral infection despite the strong and sustained immune response on the one hand, and to identify new protective immunogens on the other, numerous studies were conducted to map the epitope landscape of both HIV-1-neutralizing and non-neutralizing antibodies isolated from infected patients. In parallel, the development of new molecules or antibody fragments capable of blocking either viral proteins or host receptors has been widely investigated.
Bacteriophages (phages) are bacteria-infecting viruses whose DNA or RNA genome is packed in a capsid composed exclusively of surface proteins. The principle of phage display relies on cloning of exogenous DNA in fusion with the phage genetic material allowing the display of foreign peptides in an immunologically and biologically competent form at the surface of phage capsid proteins [12] . The significance of phage display was first demonstrated for filamentous phages such as M13, fd or related phagemids and later extended to lytic bacteriophages λ, T4 and T7 (reviewed by Beghetto [13] ). The phage biopanning process consists of iterative cycles of binding, washing and elution steps leading to the progressive selection of phages displaying peptides/proteins binding to the target of interest [14] . The target is usually immobilized on a solid support which can be plastic, beads or even cells.
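The iterative nature of biopanning can be illustrated with a toy enrichment model. This is a minimal sketch under stated assumptions, not a description of any cited protocol: each round is assumed to recover a fixed fraction of binding phages and a much smaller fraction of non-binding background phages, and amplification between rounds is assumed to preserve the proportions. All function names and numerical values are hypothetical.

```python
def binder_fraction_after_round(fraction, binder_recovery, background_recovery):
    """Fraction of binders in the eluate after one bind/wash/elute cycle."""
    recovered_binders = fraction * binder_recovery
    recovered_background = (1.0 - fraction) * background_recovery
    return recovered_binders / (recovered_binders + recovered_background)


def simulate_panning(initial_fraction, binder_recovery, background_recovery, rounds):
    """Return the binder fraction before panning and after each round."""
    history = [initial_fraction]
    for _ in range(rounds):
        history.append(
            binder_fraction_after_round(
                history[-1], binder_recovery, background_recovery
            )
        )
    return history


# Hypothetical values: one binder per million phages at the start,
# 10% of binders and 0.01% of background phages recovered per round.
history = simulate_panning(
    initial_fraction=1e-6,
    binder_recovery=0.1,
    background_recovery=1e-4,
    rounds=4,
)

for round_number, fraction in enumerate(history):
    print(round_number, "%.3g" % fraction)
```

In this model the odds of picking a binder are multiplied at every round by the ratio of the two recovery rates, which is why a few rounds are enough to go from a rare clone to a dominant one. Real selections deviate from this picture, for instance through growth advantages of some clones during amplification.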

Antibodies Directed against Viral Proteins

Immunization attempts with ELDKWA peptides failed to elicit 2F5-like NtAbs, suggesting that the epitope necessitates additional residues in order to be immunogenic. Therefore, Menendez et al. screened a panel of 17 libraries of linear and constrained peptides against the MAb 2F5 and deduced that residues flanking the DKW core on its C-terminal side were important for high-affinity binding to the MAb [25]. They subsequently constructed and screened two phage sublibraries displaying 12 random residues either upstream or downstream of the DKW core (x 12 -AADKW and AADKW-x 12 ) and isolated three peptides displaying high affinity for 2F5 from the AADKW-x 12 library. Ala substitution and deletion studies revealed that each clone bound 2F5 according to a different mechanism. These data led the authors to postulate that the 2F5 paratope was composed of two binding domains, one recognizing the DKW core with strong specificity and the other binding multispecifically to the residues located at its C-terminus.
Based on this study, additional investigations were recently conducted on the BNtAb 2F5 epitope to assess the importance of structural constraints for MAb 2F5 recognition [26] . A linear 12-mer RPL and a constrained 7-mer RPL were screened against this antibody and all the sequences selected from the 12-mer RPL contained the D(K/R)W core motif, with flanking residues L, A and S present at different frequencies. Analysis of the sequence representation compared to their estimated probability of occurrence indicated a trend towards enrichment for sequences such as DKWA or LDKWA throughout panning of the 12-mer library, while all sequences selected from the constrained library contained DKWA or LDKWA. These results demonstrated that the strong epitope specificity postulated by Menendez [25] is only displayed when the epitope sequence is presented in a certain structural context provided in the constrained 7-mer peptides. Immunization studies performed with both linear and constrained forms of the peptide in mice and rabbits resulted in the inhibition of cell fusion only with sera of rabbits immunized with the linear peptide.
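Comparing observed motif frequencies with their probability of occurrence by chance requires an estimate of that probability. The sketch below gives a simple back-of-the-envelope version, and it is not the calculation used in the cited study: it assumes independent positions with all 20 amino acids equally likely, which real libraries encoded by degenerate codons do not strictly satisfy. The function name is hypothetical.

```python
def expected_motif_count(motif_length, peptide_length, alphabet_size=20):
    """Expected number of occurrences of one fixed motif in a random peptide.

    Assumes independent, uniformly distributed residues (a simplification).
    """
    windows = max(peptide_length - motif_length + 1, 0)  # possible start positions
    return windows * (1.0 / alphabet_size) ** motif_length


# Chance expectation for the motifs discussed in the text in a linear 12-mer.
for motif in ("DKW", "DKWA", "LDKWA"):
    print(motif, expected_motif_count(len(motif), 12))

# Same estimate for DKWA in a 7-residue randomized insert.
print("DKWA in 7-mer", expected_motif_count(4, 7))
```

Because these expectations are far below one per clone, repeated recovery of DKWA or LDKWA among the selected sequences is unlikely to be explained by chance alone under this model.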

Gp120 C1 Domain

RPL screening may also contribute to the elucidation of the antigen structure. To that purpose, Stern et al. used a 20-mer RPL to analyze two different mouse MAbs (GV1A8 and GV4D3) recognizing non-overlapping sequences between residues 1 and 142 of gp120 [27, 40] . Biopanning performed on GV1A8 allowed for identification of mimotopes sharing a (L/I)W motif identical to residues 111-112 of the gp120 C1 domain and highlighted a HxxIxxLW motif compatible with two turns of an α-helix. Computer modeling confirmed that such a structure placed the residues recognized by GV1A8 contiguously on one face of the helix while other secondary structures did not. Similarly, biopanning on MAb GV4D3 yielded sequences with a trend towards an Nx 3 WxxD motif. The epitope maps to the FNMWKND sequence satisfying the helical motif FxxWxxD. In this study, the use of phage display not only predicted the α-helix structure of the C1 domain of gp120, but also pinpointed the contact residues defining the surface of the helix.
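The argument that a motif such as HxxIxxLW is compatible with an α-helix can be illustrated with a helical wheel calculation. The following is a minimal sketch and not the computer modeling performed in the cited study: it assumes an ideal α-helix with 3.6 residues per turn (100° of rotation per residue) and, for comparison, an idealized extended strand in which successive side chains alternate by 180°. The function names are hypothetical.

```python
import re

ALPHA_HELIX_DEGREES = 100.0  # ideal alpha-helix: 360 degrees / 3.6 residues
EXTENDED_DEGREES = 180.0     # idealized strand: side chains alternate faces


def motif_to_regex(motif):
    """Convert a motif such as 'FxxWxxD' (x = any residue) into a regex."""
    return re.compile(motif.replace("x", "."))


def wheel_angles(motif, degrees_per_residue):
    """Angular position on the wheel of each fixed (non-x) residue of the motif."""
    return {
        index: (index * degrees_per_residue) % 360.0
        for index, symbol in enumerate(motif)
        if symbol != "x"
    }


def angular_spread(motif, degrees_per_residue=ALPHA_HELIX_DEGREES):
    """Smallest arc, in degrees, that contains all fixed residues on the wheel."""
    angles = sorted(wheel_angles(motif, degrees_per_residue).values())
    gaps = [later - earlier for earlier, later in zip(angles, angles[1:])]
    gaps.append(360.0 - angles[-1] + angles[0])  # gap across the 0/360 boundary
    return 360.0 - max(gaps)


for motif in ("HxxIxxLW", "FxxWxxD"):
    print(motif, angular_spread(motif), angular_spread(motif, EXTENDED_DEGREES))

# The gp120 sequence quoted in the text satisfies the second helical motif.
print(bool(motif_to_regex("FxxWxxD").search("FNMWKND")))
```

Under the helical assumption the fixed residues of both motifs fall within a 120° arc, i.e., on one face of the helix, whereas the extended geometry places them on opposite sides. This is consistent with the interpretation given in the text but does not replace structural modeling.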

Polyclonal Antibodies Directed Against Viral Epitopes

In 2007, Humbert et al. investigated the immune response of eight LTNP patients presenting BNtAbs. By using linear and constrained RPLs they identified epitopes recognized by plasma IgGs captured on tosylactivated beads [54] . Each panning round consisted of a positive selection performed on LTNP IgGs followed by a negative selection on the IgGs of healthy donors. Homologies of some selected sequences to immunodominant regions such as the gp120 V3 loop or the gp41 GKLIC region were observed, as reported in previous studies [50, 52, 53] . Further homologies to linear motifs located near the V3 loop (NNNT), downstream of the ID GKLIC region (AVPW motif) and overlapping with the 2F5 BNtAb epitope (PPWx 3 W motif) were also identified. Additionally, the authors applied the 3DEX software to compare the phage insert sequences to HIV-1 protein structure files from the RCSB Protein Data Bank (www.pdb.org) [59, 60] . Phage pools corresponding to the linear V3 loop, GKLIC domain and WxxxW motif, as well as pools representing potential conformational epitopes, were selected for mice immunization assays, and elicited plasma-associated neutralizing activity against primary HIV-1 strains. The highest neutralizing ability was obtained with mice immunized with the V3 mimotopes, although immunization with potential conformational epitopes also provided a modest neutralizing response.

Identification of HIV-1 Inhibitors by Phage Display

As summarized in the first section of this review, phage-displayed RPLs are powerful tools to determine or to characterize MAbs as well as PAbs epitopes. Besides epitope mapping the phage display technology was also widely applied to the identification of HIV-1 inhibitors.

Fab Libraries

These studies were the first to demonstrate that recombinant Fabs (devoid of the typical IgG contamination residual of papain cleavage) featured neutralizing activities similar to those of whole IgG. As the Fab fragments were easier to produce and their smaller size allowed them to target binding sites that were not accessible to full-length Igs, this led to the construction of many Fab libraries to elucidate the immune response to HIV-1 and to identify therapeutic antibodies.