Abstract

Nipah (Nee-pa) viral disease is a zoonotic infection caused by Nipah virus (NiV), a paramyxovirus belonging to the genus Henipavirus of the family Paramyxoviridae. It is a biosafety level-4 pathogen transmitted by specific types of fruit bats, mainly Pteropus spp., which are the natural reservoir hosts. The disease was first reported from the village of Kampung Sungai Nipah, Malaysia, in 1998. Human-to-human transmission also occurs. Outbreaks have also been reported from other countries in South and Southeast Asia. Phylogenetic analysis based on currently available complete N and G gene sequences affirmed the circulation of two major clades of NiV. NiV isolates from Malaysia and Cambodia clustered together in the NiV-MY clade, whereas isolates from Bangladesh and India clustered within the NiV-BD clade. NiV isolates from Thailand harboured a mixed population of sequences. In humans, the virus causes rapidly progressing severe illness, which may be characterized by severe respiratory illness and/or deadly encephalitis. In pigs below six months of age, respiratory illness along with nervous symptoms may develop. Different types of enzyme-linked immunosorbent assays, along with molecular methods based on polymerase chain reaction, have been developed for diagnostic purposes. Due to the expensive nature of antibody drugs, identification of broad-spectrum antivirals is essential, along with a focus on small interfering RNAs (siRNAs). The high pathogenicity of NiV in humans and the lack of vaccines or therapeutics to counter this disease have attracted the attention of researchers worldwide to developing effective NiV vaccines and treatment regimens.

Introduction

Viral diseases such as avian/bird flu, swine flu, Middle East respiratory syndrome coronavirus (MERS-CoV), severe acute respiratory syndrome (SARS), Crimean-Congo haemorrhagic fever (CCHF), Lassa fever, Rift Valley fever (RVF), Marburg virus disease, Ebola, Zika, Nipah and other Henipaviral diseases pose a considerable risk of an international public health emergency when they spread rapidly (Rizzardini et al. 2018). After the emergency situations created by Ebola and Zika viruses during the past five years (Singh et al. 2016), Nipah virus disease outbreaks have now created panic in the public. Ebola virus disease (EVD) outbreaks and epidemics (2014-2016) led to a massive mobilization of researchers seeking new technologies for efficient and rapid diagnostics, vaccines, therapies and drug targets to combat EVD and save the lives of large human populations across the globe. As with Zika, scientists are now working to counter Nipah virus.
Nipah (Nee-pa) viral disease is a zoonotic infection and an emerging disease caused by Nipah virus (NiV), an RNA virus of the genus Henipavirus, family Paramyxoviridae, which is transmitted by specific types of fruit bats, mainly Pteropus spp. (Halpin et al. 2000; Vandali and Biradar, 2018). NiV is a highly fatal virus posing a potential threat to global health security. The Pteropus bats P. vampyrus, P. hypomelanus, P. lylei and P. giganteus were associated with outbreaks of Nipah viral disease in various countries of South and Southeast Asia, including Bangladesh, Cambodia, East Timor, Indonesia, India, Malaysia, Papua New Guinea, Vietnam and Thailand (Hayman et al. 2008; Sendow et al. 2010; Wacharapluesadee et al. 2010; Halpin et al. 2011; Hasebe et al. 2012; Yadav et al. 2012; Field et al. 2013; de Wit and Munster, 2015a; Majid and Majid Warsi 2018). Fruit bats are the major reservoirs of the virus, and contact with infected bats or with intermediate hosts such as pigs is responsible for infection in man. Nipah virus retains various biologic and genetic features of other paramyxoviruses (Bellini et al. 2005). Dependence on animal rearing as a source of additional income in many Asian countries is a predisposing factor for the emergence of novel zoonoses like Nipah (Bhatia and Narain 2010). Various studies reported that the major factor responsible for the emergence of NiV was close interaction between the wildlife reservoir, particularly fruit bats of the Pteropus spp., and animal populations reared and managed under intensive conditions (Daszak et al. 2013). The high fatality rate associated with Nipah disease and the lack of efficacious treatment and vaccines against it classify it as a global threat (Epstein et al. 2006; Rahman and Chakraborty 2012). The disease was recognized for the first time in 1998 in Kampung Sungai Nipah village, state of Perak, Malaysia. The causative agent was characterized and has since been named "Nipah virus (NiV)". The zoonotic potential of NiV was unknown until Malaysia experienced the Nipah viral outbreak in 1999. The outbreak alarmed the global public health community because of the potential for severe pathogenicity and widespread viral distribution (Chua 2012). Considerable uncertainty exists about the patterns of Nipah virus circulation in bats and the epidemiological factors associated with its spill-over into pigs and horses (McCormack 2005).

Transmission of the Nipah virus

Bats serve as reservoir hosts for several high-risk pathogens, including Nipah, rabies and Marburg viruses. Such viruses are not associated with any significant pathological changes in the bat population (O'Shea et al. 2014; Schountz 2014). Detailed studies are needed to understand the mechanisms of NiV transmission from bats to pigs, from pigs to man, and from date palm sap to humans, as well as viral circulation between fruit bats, pigs and human beings. Fruit bats act as the natural reservoir of Nipah viruses, and in the outbreaks documented from different parts of the globe these bats have been associated in one way or another with transmission of the virus and the resulting infection (Clayton et al. 2016; Yadav et al. 2018). From bats, the virus has frequently crossed the species barrier to several other species, including man, through spill-over transmission, but with limited person-to-person transmission thereafter (Gurley et al. 2017). Transmission of NiV to man occurs mainly in places where man, pigs and bats come into close proximity. People rear pigs for economic benefit, and fruit-bearing trees are also cultivated in and around the farm for shade. Bats of Pteropus spp., which are NiV reservoirs, are attracted by the fruits, and NiV therefore spills over to pigs and other animals and also to man. Infected pig meat travelling across continents has led to transmission of the virus from animals in one part of the world to people in another. This combination of fruiting trees, fruits such as date palm, fruit bats, pigs and man in close surroundings forms the basis for the emergence and spread of new deadly zoonotic virus infections like Nipah (Pulliam et al. 2012).

Introduction

Acute encephalitis with high mortality is the main manifestation of NiV infection. In addition, pulmonary illness may develop, and the infection is sometimes asymptomatic (Kitsutani and Ohta 2005). Segmental myoclonus along with tachycardia may become evident. Involvement of the brain stem, which houses the major vital centres, is probably responsible for death, and mortality may vary between 32% and 92%. From a diagnostic point of view serology is quite helpful, but discrete, high-signal lesions are best visualized by fluid-attenuated inversion recovery (FLAIR), in which the effect of cerebrospinal fluid (CSF) is reduced so that an enhanced MRI image can be obtained (Arif et al. 2012). Nipah virus was first isolated in 1999 (Farrar 1999). Gene sequencing of the isolates showed that the outbreak involved two different NiV strains, probably with different origins (AbuBakar et al. 2004). The clinical signs and symptoms of NiV disease include fever along with laboured breathing, cough and headache. Encephalitis and seizures are the complications involved (Broder et al. 2013). Survivors of NiV infection develop symptoms of neurological malfunction such as encephalopathy, cerebral atrophy, change in behavior, ocular motor palsies, cervical dystonia, weakness and facial paralysis, which persist for several years (Sejvar et al. 2007). Despite an increasing risk, rigorous studies that collate data from Nipah infections of pigs, bats and humans have been scarce (Hsu et al. 2004; Chadha et al. 2006; Pulliam et al. 2012). Serosurveillance studies in multiple host species may yield important insights into NiV epidemiology (Weingartl et al. 2009; Li et al. 2010; Rockx et al. 2010; Pallister et al. 2011; Fischer et al. 2018).
NiV belongs to the genus Henipavirus of the family Paramyxoviridae. This genus also contains Cedar virus (CedPV) and Hendra virus (HeV). Molecular studies have significantly improved our understanding of the genetic diversity of Henipaviruses (Wang et al. 2001; Rockx et al. 2012). The almost annual occurrence of Henipaviruses in South-Eastern Asia and Australia since the mid-1990s is noteworthy. In Australia alone 48 cases of Hendra virus, and in south-eastern parts of Asia 12 outbreaks of Nipah virus, have been reported, which hit not only the health sector but also the economic stability of these nations (Aljofan, 2013). A total of 639 human cases of NiV infection have been reported from Bangladesh (261 cases), India (85 cases), Singapore (11 cases), the Philippines (17 cases) and Malaysia (265 cases), with a mortality rate of about 59% (Ang et al. 2018). This points to the survival efficiency of NiV in nature, and the history of its species jumping and host adaptation adds to the public health concerns posed by this virus. Detailed studies and therapeutic trials for Henipaviruses are under way in various animal models such as guinea pigs, hamsters, ferrets, cats, pigs and African green monkeys. The Nipah disease outbreak of 2001 in Siliguri and the latest one in Kerala have emphasized the need for an efficacious vaccine. Moreover, the virus poses a threat to public health (Sharma et al. 2018). Enhanced monitoring and surveillance for Nipah infection and the development of an efficacious vaccine are the needs of the hour.

Nipah virus

Nipah virus (NiV) is a paramyxovirus (Henipavirus genus, Paramyxovirinae subfamily, Paramyxoviridae family, order Mononegavirales), an emerging virus that can cause severe respiratory illness and deadly encephalitis in humans. It is a negative-sense, single-stranded, nonsegmented, enveloped RNA virus possessing helical symmetry. The RNA genome, from the 3′ to the 5′ end, contains a consecutive arrangement of six genes, viz., nucleocapsid (N), phosphoprotein (P), matrix (M), fusion glycoprotein (F), attachment glycoprotein (G) and long polymerase (L). The N, P and L proteins attach to the viral RNA, forming the virus ribonucleoprotein (vRNP). The F and G proteins are responsible for cellular attachment of the virion and subsequent host cell entry (Ternhag and Penttinen 2005; Ciancanelli and Basler 2006; Bossart et al. 2007). The newly produced precursor F protein (F0) is cleaved into two subunits, F1 and F2, by a host protease. The fusion peptide contained in the F1 subunit drives fusion of the viral and host cell membranes for virus entry (Eaton et al. 2006). The virus M protein mediates morphogenesis and budding. Antibody to the G protein is essential for neutralization of NiV infectivity (Bossart et al. 2005; White et al. 2005). Notably, enveloped Henipaviruses including NiV bind to and enter the target (host) cell through the coordinated action of the class I fusion (F) and attachment (G) glycoproteins. Interactions between class B ephrins (the viral receptors) on host cells and the NiV glycoprotein (G) trigger conformational changes in the latter, leading to activation of the F glycoprotein and membrane fusion (Steffen et al. 2012). It is believed that the replication strategies and ephrin receptor-mediated fusion are responsible for the greater pathogenicity of these viruses. Multiple accessory proteins encoded by Henipaviruses aid in host immune evasion.
Nipah virus can survive for up to 3 days in some fruit juices or mango fruit, and for at least 7 days in artificial date palm sap (13% sucrose and 0.21% BSA in water, pH 7.0) kept at 22 °C. The virus has a half-life of 18 h in the urine of fruit bats. NiV is relatively stable in the environment and remains viable at 70 °C for 1 h (only the viral concentration is reduced). It can be completely inactivated by heating at 100 °C for more than 15 min (de Wit et al. 2014). However, the viability of the virus in its natural environment may vary depending on the conditions. NiV can be readily inactivated by soaps, detergents and commercially available disinfectants such as sodium hypochlorite (Hassan et al. 2018).
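The 18 h half-life quoted above for bat urine can be turned into a rough estimate of how much infectious virus remains over time. The sketch below assumes simple first-order (exponential) decay, which the source does not state; the function name and the chosen time points are illustrative only.

```python
def fraction_remaining(hours: float, half_life_hours: float = 18.0) -> float:
    """Fraction of the initial virus left after `hours`.

    Assumption (not from the source): decay is first-order, so the
    remaining fraction halves every `half_life_hours`.
    """
    if hours < 0 or half_life_hours <= 0:
        raise ValueError("hours must be >= 0 and half_life_hours must be > 0")
    return 0.5 ** (hours / half_life_hours)


if __name__ == "__main__":
    # 18 h is the half-life reported for fruit bat urine.
    for t in (0, 18, 36, 72):
        print(f"{t:>3} h: {fraction_remaining(t):.4f} of initial titre")
```

Under this assumption, four half-lives (72 h) leave about 6% of the starting titre. Real decay depends on temperature, pH and medium, as the paragraph above notes.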

Transmission of the Nipah virus

NiV transmission occurs via consumption of virus-contaminated foods and contact with infected animals or human body fluids. Risk factors include close proximity, viz., touching, feeding or attending a virus-infected person, which facilitates exposure to NiV-laden droplets. Recently, experimental studies with aerosolized NiV in Syrian hamsters revealed that NiV droplets (aerosol exposure) might play a role in transmitting NiV during close contact (Escaffre et al. 2018). Three transmission pathways of Nipah virus have been identified by investigations carried out in Bangladesh. Consumption of fresh date palm sap is the most frequent route, with consumption of tari (fermented date palm juice) being a potential pathway of viral transmission. NiV infection associated with tari can be prevented by blocking the access of bats to date palm sap (Islam et al. 2016). Studies using infrared cameras revealed that date palm trees are often visited by bats such as Pteropus giganteus, which lick the sap while it is being collected. The virus can survive for days in sugar-rich solutions, viz., fruit pulp (Fogarty et al. 2008; Khan et al. 2008). The Nipah viral outbreak reported from Tangail district, Bangladesh, was found to be associated with drinking raw date palm sap. Notably, symptoms have been recognized in patients in Bangladesh during the date palm sap collection season, i.e. from December to March. Data also revealed a high seroprevalence of anti-Nipah viral antibodies among Pteropus spp. This suggests that the virus has adapted well enough to be transmitted among Pteropus bats. The modes of transmission of Nipah virus are depicted in Figure 2.
Investigations during NiV outbreaks in Malaysia revealed that pigs are the intermediate as well as amplifying hosts for the virus (Nor et al. 2000; de Wit and Munster 2015b). In Bangladesh, domestic animals represented another route of NiV transmission: domestic animals were observed foraging for fruits contaminated with infectious saliva. Spread of the disease from sick cows was reported in 2001 in Meherpur, Bangladesh (Hsu et al. 2004). Illness acquired from pigs, from the saliva of goats, and from secretions of bats infected with Nipah virus has also been recorded in Naogaon (International Centre for Diarrhoeal Disease Research, Bangladesh; ICDDRB 2003; Montgomery et al. 2008; Hughes et al. 2009). In ferrets, systemic disease was induced when the animals were exposed to certain doses of NiV particles (Clayton et al. 2016). NiV is more likely to be transmitted from patients suffering from infection of the respiratory tract (Escaffre et al. 2013). A case-control study of risk factors for human infection with NiV during the outbreak in Malaysia showed that direct close contact with pigs was the primary source of human NiV infections; only 8% of patients had no contact with pigs. The outbreak was stopped after pigs in the affected areas were slaughtered and proper disinfection measures were taken (Parashar et al. 2000; Chua 2010).
Domesticated animals play key roles in major spill-over events of bat-borne viruses, but their exact roles as bridging or amplifying species remain unclear (Glennon et al. 2018). Their susceptibility to zoonotic viruses and their potential for disease transmission to humans need to be studied in depth in order to diminish the spill-over risks of viruses like NiV, especially in view of the global intensification of agriculture.

Epidemiology and disease outbreaks

In Malaysia in 1999, human cases of Nipah viral encephalitis were initially confused with Japanese encephalitis or Hendra-like viral encephalitis. However, the Ministry of Health confirmed that NiV was the causative agent of the infection in pigs and man, and morbidity was higher in the Negri Sembilan region of Malaysia (231 of the 283 cases reported). The genome of NiV was sequenced at the CDC, Atlanta, Georgia, USA. The Ministry of Health declared a total of 101 human deaths, and approximately 900,000 pigs were culled (Uppal 2000). Researchers confirmed that the Nipah infections in pigs and man that occurred in peninsular Malaysia in 1998-1999 spilled over from Chiropteran bats (Yob et al. 2001). In peninsular Malaysia, a three-year epidemiological study assessed the seroprevalence of anti-NiV antibodies and the presence of virus among Pteropus vampyrus and P. hypomelanus bats of different age groups and physiological status [involving adults, especially pregnant, lactating and juvenile bats (6-24 months)]. Various risk factors for NiV infection in pteropid bats were also explored. Of the two bat species, the risk of NiV infection and the seroprevalence were higher for P. vampyrus (33%) than for P. hypomelanus (11%). NiV seroprevalence and distribution varied (1-20%) among P. hypomelanus bats and also between the years 2004-2006, irrespective of season (Rahman et al. 2013). A surveillance study was performed to assess the distribution of Henipaviruses in Southeast Asia, Australasia, Papua New Guinea, East Timor, Indonesia and neighboring countries. NiV RNA was detected in P. vampyrus bats of the Pteropodidae family and in non-pteropid Rousettus amplexicaudatus bats from East Timor (Breed et al. 2013).
In Bangladesh, outbreaks of Nipah virus were initially confirmed only by the presence of anti-NiV antibodies in serum samples. After 2004, however, researchers started genetic characterization of Nipah virus by detecting viral nucleic acid. Up to the year 2010, a total of 9 outbreaks had been recorded in Bangladesh. Raw date palm sap was the source of infection in the outbreak recorded during 2011. This finding is further strengthened by the fact that raw date palm sap consumption was common in patients with fatal infection (approximately 65% mortality rate) (Olson et al. 2002; Luby et al. 2006; ICDDRB 2010). Another outbreak during 2011, in a remote town named Hatibandha in the Lalmonirhat district of northern Bangladesh, caused 15 deaths due to NiV infection (Wahed et al. 2011). Studies performed in pigs in Ghana suggested that serum antibodies against Henipaviruses, including Hendra and Nipah viruses, and viral nucleic acid were also present in another species of fruit bat, Eidolon helvum, reflecting the exposure of pigs to these bats (Hayman et al. 2011).
Investigation of the NiV disease outbreak in Kerala, India, during May-June 2018 elucidated virus transmission dynamics and epidemiology by employing real-time RT-PCR testing to detect the virus in throat swabs, blood, urine and CSF. A total of 23 cases were identified, including the index case, of which 18 were laboratory confirmed. The incubation period was recorded as 9.5 days (range 6-14 days). Twenty cases (87%) showed respiratory symptoms, and the case fatality rate was 91%, with only two survivors. Sequencing and phylogenetic analysis revealed the NiV isolate to be closer to the Bangladesh lineage (Arunkumar et al. 2018). Nevertheless, there is a growing demand for increased public awareness of the transmission pattern and risk of NiV infection, which will ultimately help reduce the occurrence and spread of outbreaks of the disease (Yu et al. 2018).
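The percentages reported for the Kerala outbreak follow directly from the case counts. The short sketch below reproduces them, assuming (as the text implies but does not state explicitly) that every case other than the two survivors was fatal; the helper name is illustrative.

```python
def percentage(part: int, total: int) -> float:
    """Return `part` as a percentage of `total`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


total_cases = 23          # identified cases, including the index case
survivors = 2             # "only two survivors"
respiratory_cases = 20    # cases with respiratory symptoms

# Assumption: all non-survivors died of the infection.
deaths = total_cases - survivors

case_fatality_rate = percentage(deaths, total_cases)
respiratory_share = percentage(respiratory_cases, total_cases)

print(f"Case fatality rate: {case_fatality_rate:.1f}%")   # about 91%
print(f"Respiratory symptoms: {respiratory_share:.1f}%")  # about 87%
```

Both values round to the figures given above (91% and 87%).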
A recombinant NiV that expresses firefly luciferase has been generated to facilitate spatiotemporal studies on the pathogenesis of Henipaviruses. Bioluminescence imaging has been used to monitor replication and spread of this virus in knockout mice. This reverse genetics system may be a useful tool for investigating Henipa-like viruses (Yun et al. 2015).

Phylogenetic analysis of NiV

A pairwise similarity analysis of the nucleotide sequences of the Nipah virus (NiV) N and G genes was carried out after aligning the sequences with the Clustal V program in the MegAlign software of the DNASTAR package. For the genetic relatedness study, representative sequences of 1599 bp for the N gene (27 strains) and 1809 bp for the G gene (15 strains) were investigated. NiV strains from different countries, including Malaysia, Cambodia, Bangladesh, India and Thailand, submitted during 2001-2018 were retrieved from the NCBI database. Phylogenetic analysis was performed using the maximum likelihood method (1000 bootstrap replicates) in MEGA 6 software (v 6.06) (Tamura et al. 2013). The most suitable substitution model was identified using the "find best DNA/protein model" tool available in MEGA 6 (v 6.06) and confirmed with the FindModel online tool (Posada and Crandall 1998). For the N and G genes, the respective models were HKY + G and T92.
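The idea behind a pairwise similarity analysis of aligned sequences can be illustrated with a small, self-contained sketch. It computes percent identity over gap-free columns for every pair of sequences. This is not the MegAlign algorithm, whose exact similarity formula is not given in the text; the three short sequences and their names are invented for illustration and are not NiV data.

```python
from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (not real NiV N or G gene sequences).
aligned = {
    "strain_A": "ATGAGTGATATC",
    "strain_B": "ATGAGTGACATC",
    "strain_C": "ATGAGCGA-ATC",
}

for (name_1, seq_1), (name_2, seq_2) in combinations(aligned.items(), 2):
    print(f"{name_1} vs {name_2}: {percent_identity(seq_1, seq_2):.2f}%")
```

For real data the sequences would first be aligned (for example with a Clustal-family tool), and a model-based distance would normally replace raw identity before tree building.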

Immunobiology (immune response/immunity)

Immune response studies on Nipah virus have been conducted by various researchers, especially after each reported outbreak. Since the virus exhibits two distinct types of association with its hosts (maintaining its persistence in nature through reservoir hosts such as bats, and inflicting fatal clinical disease in humans and in domestic animals such as pigs), the immune responses might be host-specific (Negrete et al. 2005; Kulkarni et al. 2013). Several proteins of Henipaviruses block host innate immune responses, namely the phosphoprotein (P) and the V, C and W proteins. IFN-α/β production in response to several stimuli can be inhibited by the V and W proteins, whereas IFN signaling is blocked by the P, V and W proteins, preventing induction of a cellular antiviral state (Basler 2012). The innate immune system of pteropid bats is remarkable for its constitutively active type I interferon system, which can restrict early viral replication within the body (Zhou et al. 2016). This mode of action has also been associated with several interferon-stimulated genes (ISGs), particularly those involved in noninflammatory pathways, so that the elevated interferon response in bats is not associated with chronic inflammation, unlike in rodents or humans (Halpin et al. 2011). Due to these differences, bat cells are primed to react to viral attack immediately, but only up to the level of restricting replication (Zhou et al. 2011a,b). Bats possess a comparatively larger repertoire of naive immunoglobulins with more specificities, thereby favouring direct clonal selection of B lymphocytes for antibody production. Under such conditions there may be little or no somatic hypermutation and affinity maturation in B cells, leading to poorer responses and more restricted production of high-titered antibodies than in other species. These features contribute to the delay in viral clearance and to persistence of the virus for a long period (Wellehan et al. 2009; Schountz et al. 2017). Comparative Nipah virus studies in pteropid bats and hamsters reinforce these points, as the virus showed less multiplication in and shedding from bat endothelial cells, together with poorer antibody responses in challenge studies (Wong et al. 2003; Lo et al. 2010; de Wit et al. 2011). Recently, tetherin (an IFN-induced protein from bats) has been reported to inhibit NiV replication in fruit bat cells and to act as an innate antiviral protein that can help the host combat virus-induced pathological changes (Hoffmann et al. 2018).
Another mechanism that prevents complete elimination of Nipah virus in bats is modulation of the bat antiviral response in favour of virus survival. In the reservoir host, the virus employs immune evasion strategies, especially against the innate immune system, so as to escape immune attack and perpetuate itself within the host by keeping replication at a minimum level (Rodriguez and Horvath 2004; Rupprecht et al. 2011). Such evasion strategies are mediated by accessory proteins encoded by the virus, which may also affect other hosts through spill-over adaptation (Schountz 2017). The NiV P gene (coding for the polymerase-associated phosphoprotein) plays a key role in evading the host interferon-mediated immune response (Shaw, 2009). This gene encodes the P, V, W and C proteins, all of which were reported to inhibit host antiviral responses by blocking interferon-mediated signaling pathways, especially the STAT1-stimulated JAK-STAT signaling pathway (Shaw, 2009; Prescott et al. 2012).
In hosts that develop clinical disease from Nipah virus, various virus-encoded immune antagonist proteins subvert host immune responses, leading to pathogenesis and clinical disease. Wild-type virus uses a unique RNA editing mechanism for the controlled transcription and translation of multiple antagonistic proteins; in some hosts this process may be delayed, so that production of these proteins is slightly delayed. In such situations antiviral responses would be strong, with associated inflammatory responses, thus partially restricting viral replication and pathogenesis in some hosts (Seto et al. 2010). Although Nipah virus effectively suppresses antiviral cytokine production in the early phase of infection, release of some inflammatory cytokines has been suggested; this may increase vascular permeability and ultimately favor viral spread (Schountz 2014).

Pathogenesis

From the respiratory epithelium, the virus is disseminated to the endothelial cells of the lungs in the later stage of the disease. Subsequently, the virus can gain entry into the blood stream and disseminate, either freely or in host leukocyte-bound form. Apart from the lungs, the spleen and kidneys along with the brain may act as target organs, leading to multiple organ failure (Rockx et al. 2011; Escaffre et al. 2013). Lethal infection develops in hamsters when leukocytes loaded with NiV are passively transferred (Mathieu et al. 2011). In pigs, there is productive infection of monocytes, natural killer (NK) cells and CD6+ CD8+ T lymphocytes (Stachowiak and Weingartl 2012).

In humans

The virus causes severe and rapidly progressing illness in humans, mainly affecting the respiratory system and the central nervous system (CNS). The signs and symptoms of the disease appear 3-14 days after NiV exposure. Initially, there is a high rise in temperature along with drowsiness and headache. This is followed by mental confusion and disorientation, ultimately progressing to coma within 1-2 days. Encephalitis is a critical complication of NiV infection. During the initial phase, respiratory problems may become evident, with development of atypical pneumonia. Coughing along with acute respiratory distress may be evident in certain patients (Williamson and Torres-Velez 2010). There may be sore throat, vomiting and muscle aches (www.medicinenet.com). Septicemia may develop, along with impairment of the renal system and bleeding from the gastrointestinal tract. In severe cases, encephalitis along with seizures may develop within 24-48 h, ultimately leading to coma (Giangaspero 2013). It is crucial to note that transmission of the virus is more common from patients with labored breathing than from those without respiratory problems (www.cdc.gov; Luby et al. 2009).

Figure caption (partially recovered): NiV antigen can be detected in bronchi and alveoli. 3. Inflammatory mediators are activated as a result of infection of the airway epithelium. 4. The virus is disseminated to the endothelial cells of the lungs in the later stage of the disease. 5, 6. The virus enters the blood stream and disseminates, either freely or in host leukocyte-bound form, reaching the brain, spleen and kidneys. 7. Two pathways are involved in viral entry into the central nervous system (CNS): the hematogenous route and anterograde transport via the olfactory nerve. 8. The blood brain barrier (BBB) is disrupted, and IL-1β along with tumor necrosis factor (TNF)-α are expressed due to infection of the CNS by the virus, which ultimately leads to development of neurological signs. Red font shows the symptoms in humans.

Public health significance and zoonotic aspects

NiV is a recently emerged, highly deadly zoonotic virus with pandemic potential. As a recognized zoonotic pathogen discovered in modern times, NiV causes severe febrile illness and high fatality rates in affected persons and poses an ongoing high risk to human health worldwide (Clayton 2017; Mukherjee 2017; Thibault et al. 2017). NiV infection is uncommon but deadly, with fatality rates of 40-75%. Fruit bats (Pteropus) serve as natural hosts (wildlife reservoir) and pigs are the intermediate hosts in the NiV zoonotic cycle (Paul 2018). During a large outbreak of acute encephalitis in Malaysia in 1998, the virus was discovered in affected patients who had had contact with sick pigs. The pigs acquired the infection from bats, NiV then spread efficiently from pig to pig, and thereafter from pig to man. Moreover, it has been revealed that Pteropus vampyrus and Pteropus hypomelanus (flying foxes in the Malaysian islands) carry the virus in saliva as well as urine, indicating their potential to act as a natural reservoir of the virus (Looi and Chua 2007). A risk of spill-over is always associated with NiV infection. The interaction of the molecular and ecological factors that together govern the susceptibility of domestic animal and human populations is not yet well understood (Thibault et al. 2017).

Laboratory diagnosis

Human and animal NiV infections can be confirmed by isolation of the virus, by serological tests and by tests that amplify viral nucleic acids. Biosafety level-4 (BSL-4) laboratory facilities are required for NiV isolation and propagation. However, BSL-3 may be sufficient for primary isolation of the virus from suspected clinical materials. Once the virus has been confirmed in infected cells (fixed with acetone) by immunofluorescence, the culture fluid should be transferred immediately to a BSL-4 laboratory (Ksiazek et al. 2011). In Bangladesh, the International Centre for Diarrhoeal Disease Research, Bangladesh (ICDDRB) and the Institute of Epidemiology, Disease Control and Research (IEDCR) are the institutes involved in handling NiV. In India, a BSL-4 laboratory has been established in Pune at the National Institute of Virology (NIV) (Kulkarni et al. 2013). In Japan, the National Institute of Animal Health has developed an immunohistochemical diagnostic technique based on monoclonal antibodies (Tanimura et al. 2004).
To screen pig serum samples, a recombinant N protein-based ELISA has been developed at the High Security Animal Disease Laboratory (HSADL), Bhopal. With pseudotyped particles, a serum neutralization test for NiV can be performed under BSL-2 conditions. This test uses a recombinant vesicular stomatitis virus that expresses secreted alkaline phosphatase (SEAP), and the neutralization titer is obtained by measuring SEAP activity. A Luminex-based microsphere assay has been used to detect antibodies against a glycoprotein of NiV, namely NiV sG, in the sera of pigs and of ruminants such as goats and cattle (Chowdhury et al. 2014). Recently, ELISAs have also been developed using recombinant full-length N protein and truncated G protein for detecting virus-specific antibodies in porcine serum samples (Fischer et al. 2018). The NiV N ELISA was employed for initial screening of serum samples for henipavirus infection, while the NiV G ELISA specifically detected NiV infections. Such ELISAs are valuable diagnostic methods for seromonitoring of swine populations and probably of livestock and wildlife animals.

Vaccines

A recombinant measles virus (rMV) vaccine that expresses the envelope glycoprotein of NiV has been found to be promising for use in humans (Yoneda 2014). A replication-competent, recombinant VSV-vectored vaccine encoding the NiV glycoprotein was reported to show high efficacy in a hamster model. A single intramuscular dose of the vaccine conferred protective immunity in African green monkeys one month after vaccination (Prescott et al. 2015). Healthcare workers and family contacts attending Nipah cases should be considered for Nipah vaccination, in order to limit human-to-human transmission and curb outbreaks (DeBuysscher et al. 2016). Vaccination generates a very strong virus-specific immune response that inhibits virus replication and shedding. Such a vaccine could provide protection from NiV during disease outbreaks. Attenuated live vaccines as well as subunit G (recombinant platforms) have also been tested (Satterfield et al. 2016a).
Nipah virus-like particles (NiV-VLPs) composed of three NiV proteins (G, F and M) and derived from mammalian cells have been produced and validated as a vaccine in BALB/c mice. The immunogenicity of the NiV-VLP vaccine was high because the VLPs possess the native characteristics of the virus, including its size, morphology and surface composition (Jegerlehner et al. 2002; Jennings & Bachmann 2007; Walpita et al. 2011; Liu et al. 2013). Another study reported a novel strategy of adding a cholesterol group to the C-terminal heptad repeat (HRC) peptide of the F protein, which facilitated membrane targeting and fusion of the peptide. Enhanced penetration of the central nervous system and a significant increase in antiviral effects were observed with these peptides (Porotto et al. 2010). NiV-VLPs derived from mammalian cells transfected with plasmids containing the NiV G, F and M genes have also been produced, yielding VLPs that contain all three proteins. Golden Syrian hamsters immunized with these VLPs developed high titres of neutralizing antibody in serum and showed complete protection upon viral challenge (Walpita et al. 2017).
An overview of the different vaccine strategies available for Nipah virus (NiV) is presented in Table 1, and a few important vaccine platforms are depicted in Figure 5.

Prevention and control measures

Bat-borne viruses pose high risks to human and animal health, and the present scenario demands a 'One Health' approach to comprehend their frequently complex spillover routes (Glennon et al. 2018). There is an urgent need to create multidisciplinary teams under the 'One Health' approach. Such teams should include medical doctors, veterinarians and agriculturists; public health officers; and vector biologists, ecologists and phylogeneticists, who can combine their efforts to prevent any major outbreak (Zumla et al. 2016).
The role of bats in the transmission and spread of pathogens needs to be understood in depth so as to avoid cross-species spillover, especially of deadly viruses, at the interface of wild animals, domestic animals and humans. As an illustration, screening of fecal samples from bats in caves in Zimbabwe that local residents often visit to gather manure or to hunt revealed the significance of virus monitoring and surveillance in bats at sites with high potential for zoonotic disease transmission, and the need to strengthen appropriate prevention and control measures to check the dissemination of virus to other places (Bourgarel et al. 2018).

Therapeutics and treatment modalities

Properties such as virulence, cell tropism and viral entry into the host cell (which includes virus attachment, receptor identification and fusion of the viral and host cell membranes) have been studied extensively to develop therapeutics effective against infection caused by Henipaviruses, including NiV (Bossart and Broder 2006). The G as well as F proteins of NiV (as well as HeV) can be targeted to inhibit viral entry into the host cell (Steffen et al. 2012). No efficacious therapeutic or prophylactic measures were available for treating infections caused by Henipaviruses, as is evident from the report of Vigant and Lee (2011). Although the empirical use of ribavirin has been reported to be beneficial in NiV outbreaks, its use is disputed because of its inefficacy against infection caused by other Henipaviruses in various animal models (Vigant and Lee 2011).

Conclusion and future directions

The 'One Health' approach is also of the utmost importance. Coordination is required between institutes, and at the international level among virologists from both medical and veterinary fields as well as ecologists, to understand fully the period and mechanism of virus excretion by bats. At the same time, the public should be educated about food hygiene as well as personal hygiene. Inspection of all imported livestock on arrival, and also before travel at the point of origin, is essential. Proper isolation, quarantine and disinfection protocols, including infrastructure facilities and trained personnel with protective clothing, should be in place to respond quickly upon identification of any new case. The highest level of hygiene should be maintained when slaughtering such livestock. To prevent future NiV outbreaks, continuous surveillance of human health, animal health and reservoir hosts should be carried out to determine the prevalence and to predict the risk of virus transmission in human and swine populations. Accelerated development of preventive vaccines and therapeutic antibodies or antivirals is the need of the hour to control the spread and treat infected patients during an outbreak. Collaborative efforts involving CEPI and biotech companies will accelerate vaccine and therapeutic development for NiV.

Nipah virus

NiV infects its host cells via two glycoproteins, the G and F proteins. The G glycoprotein mediates attachment to host cell surface receptors, and the fusion (F) protein fuses the viral and cell membranes for cellular entry. The G protein of NiV binds to host ephrin B2/3 receptors, which induces conformational changes in the G protein that trigger F protein refolding (Liu et al. 2015). Wong et al. (2017) demonstrated that monomeric ephrinB2 binding leads to allosteric changes in the NiV G protein that pave the way to its full activation and receptor-activated virus entry into host cells. Recently, viral regulation of host cell machinery has been shown to target the nucleolar DNA-damage response (DDR) pathway by inhibiting the nucleolar Treacle protein, which increases Henipavirus (Hendra and Nipah virus) production (Rawlinson et al. 2018). A diagrammatic structure of Nipah virus is depicted in Figure 1.
NiV infection produces more severe respiratory symptoms in pigs than in humans. NiV spreads rapidly in human airway epithelia, which express high levels of the NiV entry receptor ephrin-B2, and the expression levels vary between cells of different donors (Sauerhering et al. 2016). NiV infection upregulates IFN-λ in human respiratory epithelial cells. IFN-λ pretreatment shows antiviral activity by hindering NiV replication, and variations in its receptor expression may thus play a role in NiV replication kinetics in different donors (Sauerhering et al. 2017). The NiV V protein, one of the three accessory proteins encoded by the viral P gene, plays a crucial role in the pathogenesis of the virus in experimental infection of hamsters. The NiV V protein has been shown to increase the level of the host protein UBXN1 (UBX domain-containing protein 1, a negative regulator of RIG-I-like receptor signaling) by restraining its proteolysis, thereby suppressing the induction of innate interferons (Uchida et al. 2018). Analyzing viral proteins, their structure and biological functions would help in devising strategies for designing appropriate drugs and vaccines (Sun et al. 2018). The matrix protein of NiV recruits a variety of cellular machinery to scaffold the viral structure, facilitate assembly and coordinate virion budding. The matrix protein also hijacks ubiquitination pathways to facilitate transient nuclear localization. Notably, these molecular details are conserved among the matrix proteins (Watkinson and Lee 2016). Overexpression of the nucleocapsid protein of NiV governs the production of viral RNA as well as the regulation of viral polymerase activity. Overexpression of this protein inhibits transcription (of virus-specific proteins) but increases synthesis of the viral genome. Ultimately, production of progeny virus is inhibited because polymerase activity is biased towards genome production (Ranadheera et al. 2018). Super-resolution microscopy revealed a random distribution of F as well as G proteins on the plasma membrane irrespective of the presence of matrix (M) protein. Virus-like particles (VLPs) are formed by the assembly of M molecules at the plasma membrane. G protein recruitment into VLPs is augmented by the formation of viral particles driven by F, M as well as M/F. Such studies on viral proteins improve knowledge of the process of virus assembly, which can ultimately help researchers to design effective and specific therapeutics (Johnston et al. 2017). Furthermore, developing prophylactic as well as therapeutic agents requires knowledge of the interaction between the host and NiV. The microRNA processing machinery and the PRP19 complex are host targets of the virus. The W protein of the virus alters p53 control as well as gene expression. Affinity purification coupled with mass spectrometry has helped to identify interactions between human and NiV proteins (Martinez-Gil et al. 2017). VLPs consisting of M, G and F proteins have been produced in human-derived cells and have been characterized by liquid chromatography and mass spectrometry (Vera-Velasco et al. 2018).

Transmission of the Nipah virus

Changing resource landscapes, rapid change in fruit bat habitat and the related shifts in bat ecology and behavior (altered diet, roosting environment and movement) together constitute the ecological drivers that increase the spillover risk of bat-borne viruses like Henipavirus to domestic animals and humans (Kessler et al. 2018). Understanding virus-bat interactions is an exciting new area of research that could throw new light on the different modes regulating NiV infection and help in designing effective and novel therapeutics (Enchéry and Horvat 2017).
Spatial and temporal distribution studies of NiV spillover events in Bangladesh (2007-2013) revealed bat-to-human spillovers every winter, with 36% annual variation, and the distance to surveillance hospitals showed 45% of spatial heterogeneity (Cortes et al. 2018). Therefore, strategies to prevent NiV infections in humans need to be strengthened during colder winters. The dynamics of bat infections and spillover risk need to be understood in depth, and evolutionary studies based on codon usage patterns can throw some light on them. A recent study of the systematic evolution and codon usage pattern of both Hendra and Nipah viruses revealed that Henipaviruses are highly adapted to bats of the genus Pteropus and that this adaptation is strongly influenced by natural selection.
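As a rough illustration of what a codon usage analysis measures, the sketch below computes relative synonymous codon usage (RSCU) for a coding sequence: the observed count of each codon divided by the count expected if all synonymous codons for that amino acid were used equally. RSCU is one common metric and is used here as an assumption; the function name and the toy sequence are hypothetical and are not taken from the study mentioned above.

```python
from collections import Counter
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for every codon whose amino acid occurs in cds."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    values = {}
    for codon, aa in CODON_TABLE.items():
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total:                              # skip amino acids that never occur
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy sequence (Met-Ala-Ala-Ala-Ala), for illustration only.
toy_cds = "ATGGCTGCTGCCGCA"
usage = rscu(toy_cds)
for codon in ("GCT", "GCC", "GCA", "GCG"):
    print(codon, usage[codon])
```

An RSCU above 1 means a codon is used more often than expected under uniform synonymous usage; in the toy sequence GCT accounts for two of the four alanine codons, so its RSCU is 2.0. Comparing such profiles between a virus and a candidate host is one way codon-level adaptation is usually assessed.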

Epidemiology and disease outbreaks

In 1998, NiV disease was recognized for the first time in Malaysia in persons who had been in contact with swine. In March 1999, an outbreak of acute Nipah virus infection was recorded in 11 male abattoir workers (average age 44 years) in Singapore, where pig meat was imported from Malaysia; one patient died. Patients showed elevated serum IgM and some unusual symptoms of atypical pneumonia and encephalitis, with characteristic focal areas of increased signal intensity in the cortical white matter on MRI. Hallucinations were present, along with abnormal laboratory results including low lymphocyte and platelet counts and high levels of CSF proteins and of aspartate aminotransferase. The patients were treated with intravenous acyclovir and eight were cured (Paton et al. 1999; Abdullah and Tan 2014). From September 1998 to June 1999, 94 patients (both males and females) with an average age of 37 years, who reported close contact with swine and were diagnosed with severe viral encephalitis, were investigated. Results showed direct transmission of Nipah virus from pigs to human beings. The illness had a very short incubation period, and the symptoms included headache, dizziness, fever, vomiting, doll's-eye reflex, hypotonia, tachycardia, reduced consciousness, areflexia (loss of all spinal reflexes), hypertension and high mortality (Goh et al. 2000). Surveillance studies on Malaysian wildlife species such as island flying foxes (Pteropus hypomelanus) initially revealed seropositivity for Nipah virus antibodies and later confirmed the presence of the virus by isolation studies.
In Singapore and Malaysia, febrile encephalitis due to NiV was reported in 246 patients between 1998 and 1999, and in farmed pigs during the same period as an epidemic with neurological as well as respiratory signs (CDC 1999a,b; Nor et al. 2000; Pulliam et al. 2012). Pig farmers and abattoir workers were found to be in the high-risk group (Pulliam et al. 2012), and human mortality was about 40% (Lo and Rota 2008). NiV infection has not been reported directly in humans or pigs in Indonesia, but exposure of Pteropus vampyrus bats to NiV has been reported. Thus in Indonesia, the disease could spread from carrier bats to pigs or humans (Woeryadi and Soeroso 1989; Mounts et al. 2001; Kari et al. 2006). The presence of anti-NiV antibodies in serum indicated earlier exposure of bats to the virus. In India, a sero-surveillance study of 41 pteropid fruit bats in the North Indian region showed seropositivity in twenty bats (Epstein et al. 2008).
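With only 41 bats sampled, the seroprevalence estimate of twenty positives carries considerable uncertainty. The sketch below turns the two reported counts into a proportion and attaches a 95% Wilson score interval. The choice of the Wilson method and the function name are assumptions made for illustration; the cited survey is not stated to have used them.

```python
import math


def wilson_interval(positives, tested, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = positives / tested
    denom = 1 + z ** 2 / tested
    centre = (p + z ** 2 / (2 * tested)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / tested + z ** 2 / (4 * tested ** 2)
    ) / denom
    return centre - half_width, centre + half_width


# Counts as reported in the text: 20 seropositive bats out of 41 tested.
prevalence = 20 / 41
low, high = wilson_interval(20, 41)
print(f"seroprevalence {prevalence:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

The interval spans roughly a third to two thirds of the population, which is why small serosurveys of this kind are best read as evidence of exposure and not as precise prevalence estimates.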
NiV was detected for the first time in Siliguri, West Bengal, India in 2001, during an outbreak characterized by febrile illness with altered sensorium (impaired thinking or concentration). The isolates from the Siliguri outbreak closely resembled those obtained during the outbreak in Bangladesh. Such resemblance is expected, as Siliguri is located in the vicinity of Bangladesh (Harit et al. 2006; ICDDRB 2011). Another outbreak was reported from Nadia district, West Bengal, in 2007 (Chadha et al. 2006). Most recently, in 2018, a Nipah virus disease outbreak was reported in Kozhikode district, northern Kerala, India, and fruit bats were identified as the source of the outbreak (Chatterjee 2018; Paul 2018). During this outbreak, deaths occurred among infected subjects as well as among healthcare personnel involved in treating patients. On 19 May 2018, 4 infected people died, and on 23 May 2018, 13 more died (3 from Malappuram and 10 from Kozhikode district). NiV was confirmed by laboratory testing using RT-PCR. Early genetic analysis confirmed NiV etiology and showed that the epidemic strain closely resembled the BD strain of NiV (http://gvn.org/update-on-the-nipahvirus-outbreak-in-kerala-india/). In both outbreaks, circumstantial evidence suggested human-to-human transmission, as most people who acquired the infection were either caregivers or family members of infected persons.

Molecular epidemiology

Phylogenetic analysis has been widely used to compare the open reading frame sequences of NiV with those of other members of the Paramyxovirinae subfamily, and this approach has established the close relationship between NiV and Henipavirus. Nucleotide sequencing has revealed very little difference between NiV isolated from throat secretion and from cerebrospinal fluid (a difference of just 4 out of 18,246 nucleotides) (Arankalle et al. 2011). Nucleotide sequence homology has also been observed between viruses isolated from Bangladesh and Malaysia, although inter-strain nucleotide heterogeneity was found to be more pronounced. Differences in genetic variability are related to the mode of transmission. Molecular epidemiological studies indicate that NiV was introduced into pigs in Malaysia during 1998-1999, causing great loss to pig farming (Looi and Chua 2007). However, the human and pig isolates from the later phase of the Malaysian outbreak showed nearly identical sequences. This suggests that only one variant spread rapidly in pigs and that this variant was responsible for most human cases. In contrast, multiple introductions of NiV from fruit bats into humans in Bangladesh might be responsible for the sequence heterogeneity of the NiV isolates (Chan et al. 2001; Chakraborty 2012). Detailed phylogenetic analyses have been performed on the complete gene sequences of NiV strains from the 2008 and 2010 outbreaks in Bangladesh. A genotyping scheme has been introduced on the basis of a nucleotide sequence window of 729 nucleotides. This scheme provides an accurate and simple way to classify current as well as future NiV sequences. A phylogenetic tree with very high bootstrap values has been constructed by this genotyping method. Phylogenetic analysis showed close similarity between sequences obtained from pigs and humans during the Malaysian outbreak. Analysis also revealed that the virus isolated from Bangladesh possesses 6 more nucleotides than the prototype Malaysian strain. This methodology and phylogenetic tree are very helpful for classifying NiV sequences. Phylogenetic study of NiV infection, undertaken to investigate viral genetic diversity, has helped in estimating the spread of infection and its date of origin.
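The sequence comparisons described above rest on simple operations: counting mismatches between aligned sequences and cutting out a fixed-length window for genotyping. The sketch below shows both, assuming the sequences are already aligned and of equal length. The function names are hypothetical, and the window start position is a placeholder because the text gives only the window length (729 nucleotides), not its genome coordinates.

```python
def count_differences(seq_a, seq_b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


def percent_identity(differences, length):
    """Percentage of identical positions given a mismatch count."""
    return 100.0 * (1 - differences / length)


def genotyping_window(genome, start, length=729):
    """Extract a fixed-length window; `start` is a hypothetical 0-based offset."""
    window = genome[start:start + length]
    if len(window) != length:
        raise ValueError("window runs past the end of the sequence")
    return window


# Figures reported in the text: 4 differences across 18,246 nucleotides.
identity = percent_identity(4, 18246)
print(f"{identity:.3f}% identical")

# Toy aligned fragments, for illustration only.
print(count_differences("ACGTACGT", "ACGTTCGA"))
```

Four differences across 18,246 nucleotides correspond to about 99.98% identity, which is why the throat and cerebrospinal fluid isolates are regarded as essentially the same virus.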

Immunobiology (immune response/immunity)

The presence of antigen-positive inclusions in the brain tissues of patients with Nipah viral encephalitis points to the inadequacy of both innate and adaptive responses in preventing viral spread. These findings suggest that dendritic cells residing at the primary entry points of the virus, especially the respiratory tract and lungs, are unable to function effectively, resulting in inefficient antigen capture and tissue restriction (Chua et al. 1999; Chong and Tan 2003).
Evidence also suggests that viral proteins suppress MHC-I expression in immune cells, repressing both antigen presentation by antigen-presenting cells and the stimulation needed to mount adaptive responses, which ultimately results in viral spread and persistence in other target organs (Dasgupta et al. 2007; Seto et al. 2010). In addition, long-term virus-induced immune evasion accounts for the persistence of virus in brain tissues and the ensuing relapsed and late-onset fatal encephalitis in humans (Tan et al. 2002). Apart from these findings, computational prediction has reported typical interaction patterns of the virus with other critical host genes, such as TLR genes of host defence, Notch genes of neurogenesis, and genes like TJP1, FHL1 and GRIA3 concerned with the blood-brain barrier and encephalitis. A crucial role has been reported for miRNAs present in the NiV genome in inhibiting these host genes, thereby aiding viral spread and pathogenesis (Saini et al. 2018). The pathogenicity of Nipah virus in pigs and humans can be correlated with its ability to evade immune responses in the reservoir host and with the magnitude of that evasion. Although the virus has undergone frequent species jumps involving various hosts, higher fatality rates have so far been associated with human outbreaks, which warrants a comprehensive study of viral evolution and adaptation in different hosts.

Pathogenesis

Two distinct pathways are involved in viral entry into the central nervous system (CNS): the hematogenous route (through the choroid plexus or blood vessels of the cerebrum) and/or anterograde spread via the olfactory nerves (Weingartl et al. 2005). Infection of the CNS by the virus disrupts the blood-brain barrier (BBB) and induces expression of IL-1β and tumor necrosis factor (TNF)-α, which ultimately leads to the development of neurological signs (Rockx et al. 2011). Inclusion bodies may be present in the infected CNS in humans. Plaques may be evident in both the gray and white matter, along with necrosis (Escaffre et al. 2013). Notably, the virus can directly enter the CNS via the olfactory nerve in several experimental animal models. In such models, NiV infects the olfactory epithelium of the nasal turbinate. The infection subsequently extends through the cribriform plate into the olfactory bulb. Ultimately, the virus disseminates throughout the ventral cortex and the olfactory tubercle (Weingartl et al. 2005; Munster et al. 2012; Escaffre et al. 2013). A diagrammatic representation of the pathogenesis of NiV is depicted in Figure 4.

Public health significance and zoonotic aspects

Besides Malaysia, fruit bats of the genus Pteropus serve as the main reservoir of NiV in Thailand and Cambodia. Apart from the drinking of raw date palm sap contaminated by bats as a cause of initial outbreaks, human-to-human and animal-to-human transmission are also major modes of spread during an ongoing outbreak. Further, direct contact of the susceptible population with the respiratory and body secretions of infected patients increases the risk of acquiring the infection. During the NiV outbreak in Thakurgaon district, northwest Bangladesh, anti-NiV antibodies were detected in half of the Pteropus bats tested (Chadha et al. 2006; Gurley et al. 2007; Homaira et al. 2010a,b; Clayton 2017; Thibault et al. 2017). Another major public health threat is the acquisition of NiV infection from susceptible food and domestic animals. Many domesticated mammals seem to be susceptible to Nipah virus. The virus can be maintained in pig populations, but other domesticated animals such as sheep, goats, dogs, cats and horses appear to be incidental hosts that acquire the infection during outbreaks. Fruits punctured by bats and contaminated with their saliva form a common source of transmission of NiV infection from bats to domestic animals. Consumption of fruits partially eaten by fruit bats may cause infection in pigs, which may then transmit it to humans. Contact with a sick cow was reported to have caused a case of human infection in Bangladesh (Chua 2003; Luby et al. 2012; Siddique et al. 2016; http://www.cfsph.iastate.edu/Factsheets/pdfs/nipah.pdf).
The potential for a global pandemic due to NiV appears to stem from several features: the availability of a susceptible human population, several viral strains with potential for person-to-person transmission, and the error-prone nature of RNA virus replication. Outbreaks of NiV disease in densely populated regions like South Asia can lead to pandemics, owing to extensive global travel and trade connectivity (Luby 2013). Many ecological and molecular factors underlie NiV spillover into humans and the susceptibility of humans and animals to it, though the intricate interaction between these factors is unclear (Thibault et al. 2017). Research is needed to elucidate the molecular mechanisms of respiratory transmission of NiV in order to reduce the risk of human-to-human transmission. Improved surveillance and vaccination strategies must also be adopted (Luby 2013).

Laboratory diagnosis

Molecular tests such as reverse transcription polymerase chain reaction (RT-PCR), real-time RT-PCR (qRT-PCR) and duplex nested RT-PCR (nRT-PCR) have been found useful for detection of NiV infection, with subsequent confirmation by nucleotide sequencing of amplicons. A unique primer set targeting the N gene has been reported. Internal controls may also be included in nRT-PCR tests for detection of NiV RNA. Further, such nRT-PCR has helped to detect two different viral strains from Pteropus lylei in Thailand (Guillaume et al. 2004a; Wacharapluesadee and Hemachudha 2007). qRT-PCR protocols have also been developed for detection of henipaviruses and found to be useful for the diagnosis of NiV infection as well (Wang and Daniels 2012; Kulkarni et al. 2013; Jensen et al. 2018). SYBR Green I dye-based qRT-PCR employing primers specific to the N gene has also been reported (Chang et al. 2006). Recently, a novel one-step qRT-PCR assay targeting the intergenic region separating the F and G genes has been reported for quantitative detection of NiV replicative viral RNA; it avoids amplification of viral mRNA and may represent a more precise assay than conventional qRT-PCR (Jensen et al. 2018). Advances in the diagnosis of emerging zoonotic pathogens following an integrated One Health approach need to be explored optimally (Bird and Mazet 2018).

Vaccines

Vaccination of humans is an integral part of preventing infection due to NiV. Prevention also includes vaccination of livestock (especially pigs and probably horses) in endemic areas (Broder et al. 2016). Of note, vaccination of livestock cannot prevent outbreaks in areas where contamination of date palm sap is the major contributor to the spread of NiV infection. However, if vaccination of livestock is made cheap, it may prove successful in certain regions. Extensive preclinical research in a number of animals and nonhuman primates has identified multiple vaccine candidates, including vectored and subunit vaccines, that offer protective immunity (Satterfield et al. 2016a). Among vectored vaccines, one employing vesicular stomatitis virus has shown protection in ferrets, African green monkeys and hamsters (Mire et al. 2013). Despite these developments, funding for human clinical trials of candidate vaccines remains a problem for the academic community. Pharmaceutical companies are hesitant to invest in research on vaccines for diseases like Nipah, which occur rarely despite their high fatality.
A collaborative effort known as the Coalition for Epidemic Preparedness Innovations (CEPI) has been undertaken by both governments and pharmaceutical companies. It was formed in January 2017 to develop safe, efficacious and affordable vaccines against diseases with pandemic potential, like Nipah (Satterfield 2017). NiV, Lassa virus and Middle East Respiratory Syndrome Coronavirus (MERS-CoV) have been afforded high priority by CEPI. CEPI aims to develop two new experimental vaccines within five years, in the first phase of the clinical trial. It is anticipated that field efficacy studies of such vaccines could be done during massive outbreaks (Satterfield 2017). CEPI has recently signed a $25 million contract with two US biotech companies, Profectus BioSciences and Emergent BioSolutions, to accelerate work on developing a vaccine against NiV. DNA vaccines, virus-like particles, virus vectors (live and recombinant) and other advanced vaccines have been developed as immunization strategies against both HeV and NiV (Walpita et al. 2011; Kong et al. 2012; Kurup et al. 2015). Experimental vaccines based on several viral vectors, including canarypox virus, vesicular stomatitis virus glycoprotein (VSVΔG) and rhabdovirus, have been evaluated (Weingartl et al. 2006; Chattopadhyay and Rose 2011; Lo et al. 2014; Kurup et al. 2015; Satterfield et al. 2016a).
Immunoinformatic advances have been used to develop peptide-based NiV vaccines through prediction and modeling of T-cell epitopes of NiV antigenic proteins. Specific epitopes, viz., VPATNSPEL, NPTAVPFTL and LLFVFGPNL of the N, V and F proteins, respectively, showed substantial binding energy as well as score with the HLA-B7, HLA-B*2705 and HLA-A2 MHC class-I alleles, respectively (Kamthania and Sharma 2015). Such predicted peptides can potentially stimulate T-cell-mediated immunity and could be useful in developing epitope-based vaccines to counter NiV. In silico epitope prediction tools that evaluated the G and F proteins of NiV indicated that either of the peptides GPKVSLIDTSSTITI or EWISIVPNFILVRNT could form an effective universal vaccine component, inducing both humoral and cell-mediated immunity (Sakib et al. 2014). A more recent in silico analysis using bioinformatics tools indicated that epitopes from the G (VDPLRVQWRNNSVIS) and M (GKLEFRRNNAIAFKG) proteins can be helpful for designing common B- and T-cell epitope-based peptide vaccines against HeV and NiV, and this approach needs to be evaluated (Saha et al. 2017). In another epitope-based immunoinformatics and prediction study on the NiV-associated RNA-dependent RNA polymerase protein complex, the best-predicted T-cell epitopes were 'ELRSELIGY' (a phosphoprotein peptide) and 'YPLLWSFAM' (nucleocapsid protein). This approach identified B-cell epitope sequences in the phosphoprotein (421 to 471), polymerase enzyme gene (606 to 640) and nucleocapsid protein (496 to 517). These studies aim to validate potential vaccine candidate protein portions from Nipah virus, which could then lead to the development of effective subunit vaccines (Ravichandran et al. 2018).
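The peptides listed above fall into two length classes: 9-residue peptides, the typical length for MHC class-I binding, and 15-residue peptides. Epitope prediction pipelines commonly begin by enumerating every overlapping peptide of a chosen length from a protein sequence and then scoring each candidate against HLA alleles. The sketch below shows only that enumeration step, under the assumption of a simple sliding window; it does not reproduce the scoring used in the cited studies, and the protein fragment and function name are hypothetical.

```python
def sliding_peptides(protein, k):
    """Return all overlapping k-residue peptides with their 1-based start positions."""
    protein = protein.upper()
    return [(i + 1, protein[i:i + k]) for i in range(len(protein) - k + 1)]


# Peptides quoted in the text, grouped only to check their lengths.
reported = [
    "VPATNSPEL", "NPTAVPFTL", "LLFVFGPNL", "ELRSELIGY", "YPLLWSFAM",
    "GPKVSLIDTSSTITI", "EWISIVPNFILVRNT", "VDPLRVQWRNNSVIS", "GKLEFRRNNAIAFKG",
]
for peptide in reported:
    print(len(peptide), peptide)

# Hypothetical fragment built around one reported epitope, for illustration only.
fragment = "MKA" + "LLFVFGPNL" + "GT"
candidates = sliding_peptides(fragment, 9)
print(len(candidates), "candidate 9-mers")
print([pos for pos, pep in candidates if pep == "LLFVFGPNL"])
```

A protein of length n yields n - k + 1 candidates of length k, so even a single glycoprotein produces hundreds of peptides to score, which is why these screens are run in silico before any laboratory validation.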
The development of animal models of NiV disease is another priority, in order to evaluate preventive and therapeutic approaches. This will help in employing successful immunization strategies (both active and passive) that target the envelope glycoprotein of the virus.

Prevention and control measures

Outbreaks of NiV invoke costly emergency responses around the world. NiV poses a greater threat in regions associated with risk factors and with poor indicators of development (Tekola et al. 2017). With climate change and human encroachment into flying fox habitats, it is likely that outbreaks will occur in new locations (Satterfield 2017). Strategies other than vaccination also play crucial roles in prevention and control of human NiV infection and would prove to be more economical. Prevention of infection in livestock could be an efficacious strategy in regions where they serve as intermediate hosts. It involves keeping fruit trees as well as bat roosting trees away from livestock farms and grazing lands susceptible to virus contamination. In certain countries, like Malaysia, such efforts have already proven highly effective (Satterfield 2017; Walpita et al. 2017). Prevention is tougher in regions where contaminated date palm sap is the primary source of NiV. In such regions, human behavioral changes, viz., avoiding the drinking of contaminated date palm sap, would be necessary. In Bangladesh, date palm sap is usually harvested overnight. Infrared cameras have recorded the nocturnal activities of bats, such as drinking from, defecating in or urinating in the date palm sap collection jars. Measures that prevent bats from accessing the sap stream and the shaved surface of the date palm tree can minimize the risk of human exposure to NiV in such settings.
Creating public awareness about avoidance of consumption of raw sap and preventing contamination of the sap collection potswere the essential approaches to prevent the disease in Bangladesh during 2012-2014 (Nahar et al. 2017) . Proper washing of the vegetables as well as fruits is essential to remove traces of bat excreta (https://www. ndtv.com/health/nipah-virus-some-preventive-measuresfor-nipah-virus-1855891).
Comparative analysis of the genomes of squirrel monkeys, cynomolgus macaques and African green monkeys may shed light on the immune factors that protect against NiV infection, with relevance to humans. Cynomolgus macaques do not develop signs of NiV infection, while squirrel monkeys and African green monkeys develop symptomatic infection. NiV is endemic in Southeast Asia, where co-evolution of the virus, likely along with other henipaviruses, has taken place. Squirrel monkeys are found in Central as well as South America, while African green monkeys inhabit Africa. NiV has not been reported in any of these regions; hence squirrel monkeys and African green monkeys are naïve to NiV. Full genome sequencing along with annotation of such primate species is possible due to the availability of improved DNA sequencing technology. This can yield insights into the host genetics conferring susceptibility of certain primate species to NiV infection and might inform therapeutic and preventive targets in humans (Satterfield 2017). The National Centre for Disease Control (NCDC) reported that implementation of infection control and precautions at both household and hospital levels helps to limit NiV disease outbreaks. Active surveillance and contact tracing are important, along with quarantine of health professionals and people at high risk (http://www.ncdc.gov.in/showfile.php?lid=241). The sporadic nature of NiV outbreaks, lack of information on exact correlates of protective immunity, lack of interest among private pharmaceutical companies, and inadequate availability of BSL4 laboratory facilities to test vaccines or therapeutics against NiV in preclinical models pose challenges to NiV vaccine development. Development of diagnostics suitable for field conditions, immunotherapeutic approaches, vaccines and antiviral drugs are urgent priorities for long-term measures aimed at prevention and control of NiV disease.
Utmost care should be taken to avoid direct contact with persons infected by NiV. Personal protection devices, viz., masks, glasses and gloves, should be used properly. Hydration of the patient is also important (https://www.ndtv.com/health/nipah-virus-some-preventive-measures-for-nipah-virus-1855891). In nations having no past history of Nipah viral outbreaks, anticipatory preparedness for rural as well as urban outbreaks of the disease will ultimately help in prevention and control of potential outbreaks (Donaldson and Lucey 2018). Control measures adopted during the NiV outbreaks in Malaysia involved multidisciplinary, multiministerial teams working in close collaboration and cooperation with various agencies at the international level (Chua 2010). Such an approach needs to be adopted in case of unpredictable disease outbreaks and deadly epidemics to control the spread of the virus (Kumar and Anoop Kumar 2018).
A detailed understanding of the biogeography of the disease is required to comprehend the potential distribution of NiV disease. Deka and Morshed (2018) carried out a study implementing certain means of modelling the risk of regional disease transmission, viz., ENMeval and BIOMOD2. Such approaches help in measuring niche similarity between the ecological features and the Pteropus bats (as reservoirs of NiV). A recent bibliometric study identified a sudden increase in the number of publications referring to the eight pathogens of global concern identified by WHO, viz., Lassa, Rift Valley, Marburg, Ebola, Middle Eastern Respiratory Syndrome, Severe Acute Respiratory Syndrome, and Crimean-Congo Hemorrhagic Fever viruses (Sweileh 2017). Almost two decades after the first report of NiV, fruitful development in the therapeutic and preventive aspects of this deadly disease is still lacking, which adds to the public health threat it poses. According to reports from the Centers for Disease Control and Prevention (CDC), several developing as well as economically deprived countries are at high risk of a Nipah outbreak (Ramphul et al. 2018).

Therapeutics and treatment modalities

The importance of treatment modalities and effective therapeutics becomes evident once there is an outbreak of an infectious disease. There is a need for administering therapeutics to manage patients during NiV outbreaks and to prevent mortality. No specific drug has yet been approved for the treatment of this important disease. Limited work has been done to develop therapeutics against NiV infection. In preclinical studies, monoclonal antibodies have been used for treatment purposes. Due to the expensive nature of antibody-based drugs, identification of broad-spectrum antivirals is essential, along with a focus on small interfering RNAs (siRNAs) (Satterfield 2017). In animal models, NiV pathogenesis has been clarified by studies highlighting the crucial nature of phospho-matrix as well as accessory proteins. For the development of novel anti-NiV drugs, such viral proteins, as well as the fusion protein and glycoprotein of the virion surface, are attractive targets (Mathieu et al. 2012; Satterfield et al. 2015, 2016b; Watkinson and Lee 2016; Satterfield 2017). A monoclonal antibody targeting the viral G glycoprotein has been shown to be beneficial in a ferret model of NiV disease (Bossart et al. 2009). A successful outcome of an in vivo study using an investigational therapeutic, i.e. the fully humanized monoclonal antibody m102.4 against NiV, in a nonhuman primate model highlights the availability of a potential drug for NiV treatment in future (Geisbert et al. 2014). All 12 African green monkeys that received m102.4 survived the NiV infection, whereas the untreated control subjects succumbed to disease between days 8 and 10 after infection. During the recent outbreak in Kerala in South India, it was noted that the antiviral drug ribavirin could be explored as an anti-NiV agent (https://indianexpress.com/article/india/nipah-virus-outbreak-in-kerala-everything-you-need-to-know-5194341/).
Supportive therapies such as hydration and ventilator support constitute important aspects of clinical management of NiV cases.
In animal models, recent therapeutic approaches against NiV that target the early steps of viral infection have been validated. These include the use of virus-neutralizing antibodies and blocking membrane fusion with peptides that bind the fusion protein of the virus (Mathieu and Horvat 2015). Full protection was provided by the drug favipiravir (T-705) when used for 2 weeks (either orally twice a day or subcutaneously once a day) in Syrian hamsters challenged with a lethal dose of Nipah virus (Dawes et al. 2018). Monoclonal antibodies, immunomodulators and convalescent plasma, along with intensive supportive care, are in use for treating severe complications associated with the respiratory and nervous systems (Chattu et al. 2018). Experiments have been conducted to develop the concept of prophylactic use of antifusion lipopeptides against lethal NiV. The results of such experiments are crucial for developing effective lipopeptide inhibitors (with convincing biodistribution as well as pharmacokinetic features) and an efficacious delivery method (Mathieu et al. 2018). Nevertheless, the pathogenesis of Henipavirus infection, including NiV, should be better understood in order to further advance the field of therapy against this kind of viral infection (Mathieu and Horvat 2015).
It is to be noted that when neurological as well as respiratory complications prevail, use of the antiviral drug ribavirin along with intensive supportive care as well as immunomodulators is effective. However, the animal health surveillance system must be strengthened through a 'One Health' approach so that public health authorities can be warned at an early stage (Dhama et al. 2013a; Chattu et al. 2018).
Therapeutic applications of cytokines (Dhama et al. 2013b), recombinant proteins, RNA interference technology, Toll-like receptors (TLRs) (Malik et al. 2013), avian egg yolk (IgY) antibodies, plant-based pharmaceuticals (Arntzen 2015; Streatfield et al. 2015), nanomedicines (Prasad et al. 2018), immunomodulatory agents, probiotics, herbs/plant extracts (Dhama et al. 2013c, 2018a), and others may be explored appropriately to combat NiV, as these have been found promising against other viral pathogens (Dhama et al. 2013d, 2014, 2018b, 2018c; Munjal et al. 2017; Singh et al. 2017). Global adequacy of current and advanced approaches in designing efficient diagnostics, vaccines and drugs, as well as their timely availability, will strengthen the ability to counter emerging and re-emerging pathogens and alleviate their zoonotic impact and pandemic threats.
30 section matches

Abstract

Background: Co-lethality, or synthetic lethality, is the documented genetic situation where two separately non-lethal mutations become lethal when combined in one genome. Each mutation is called a "synthetic lethal" (SL) or a colethal. Like invariant positions, SL sets (SL linked couples) are choice targets for drug design against fast-escaping RNA viruses: mutational viral escape by loss of affinity to the drug may induce (synthetic) lethality.
From an amino acid sequence alignment of the HIV protease, we detected the potential SL couples, potential SL sets, and invariant positions. From the 3D structure of the same protein we focused on the ones that were close to each other and accessible on the protein surface, to possibly bind putative drugs. We aligned 24,155 HIV protease amino acid sequences and identified 290 potential SL couples and 25 invariant positions. After applying the distance and accessibility filter, three candidate drug design targets of respectively 7 (under the flap), 4 (in the cantilever) and 5 (in the fulcrum) amino acid positions were found.
Conclusions: These three replication-critical targets, located outside of the active site, are key to our anti-escape strategy. Indeed, biological evidence shows that 2/3 of those target positions perform essential biological functions. Their mutational variations to escape antiviral medication could be lethal, thus limiting the emergence of drug-resistant strains.

Reviewer's comment

tion and helped to draft the manuscript; TV helped with study design, provided comments and feedback on draft manuscript and translated it into English; EO created the database and was involved in drafting the manuscript; LM provided help in statistical and SL couples analysis; AV carried out conception, design and coordination, analysis and interpretation of the SL couples and locked sets, wrote the draft manuscript and gave the final approval of the version to be published. All authors read and approved the final manuscript.

Background

RNA viruses alone include 350 different human pathogens. Most are the agents of newly emerging diseases. Recent concerns about actual or feared pandemics (SARS, avian flu, or swine flu viruses) all raised the challenge of quickly coming up with solutions. Worldwide, over 100 million influenza cases occur each year, 170 million people carry HCV, and 33 million HIV. RNA viruses generally have very high mutation rates because they use polymerases that cannot find and fix mistakes, and are therefore unable to repair damaged genetic material. Under selective pressure, this error-prone replication can confer drug resistance. Since AIDS appeared, many new drugs have been created and used against RNA viruses, which in turn readily evolved drug-resistant strains, a now predictable process and an unprecedented public health issue. HIV mutant strains that escape antiviral compounds have been extensively documented [1], and one of influenza's main treatments, Tamiflu, was quickly escaped by an influenza strain which then spread surprisingly fast across the planet [2]. It is becoming clearer that future antiviral strategies should address this issue from the outset, aggressively striving to prevent viral escape. To deal with this, several directions have been explored over the past decade: the structures of resistant proteins [3], second-generation drugs that can bind resistant proteins [4], drug target polymorphism analysis in order to define "super drugs" [5], and the definition of new targets, such as protein backbones [6] or dimeric proteins' monomers, in order to block them before dimerisation [7]. Overall, although individual case solutions were found, no general solution has emerged yet.

Conclusions

The constant rise of drug-resistant RNA viruses is a reason to start looking for therapeutic strategies that minimize or eliminate viral escape. We described here a method to identify targets that may be involved in essential viral functions. These are what we call locked targets: spatially close, accessible viral invariant positions and/or potential synthetic lethals (SLs are groups of survivable mutations which are lethal whenever any two co-occur). Due to these two features, vital function and invariance, these targets are unique in that they might minimize or prevent viral escape. Our application to the HIV protease yielded 3 locked targets that are accessible, compact enough to possibly dock a drug, and all outside of the enzyme's active site, whereas to date all 9 existing antiprotease drugs are competitive inhibitors that bind the active site. The first locked target we closed in on is made of 7 amino acids positioned between the protease flaps and cantilever. The second locked target we detected is made of 5 amino acids in the protease fulcrum. The third locked target is composed of 4 amino acids located in the protease cantilever region. These three locked targets altogether contain 16 protease amino acid positions. Biological evidence regarding 10 positions out of these 16, provided in the Results section, supports the view that those 3 locked targets are strategic drug targets. This is because it seems a virus cannot easily mutate these targets to escape, as our statistics significantly exclude free comutations within those targets. While our method made no a priori assumptions on the positions or sequence sets, it revealed locked targets that bear essential biological functions, which validates the starting hypothesis linking SLs to essential biological functions. We believe this method can be used against any variable protein, for identification of the best locked targets.
Obviously, the approach described here can be used for the other HIV-1 [41] and HIV-2 genes, but also for other viruses such as HCV [42], SARS-CoV or influenza [43], or indeed for any quickly variable protein sequence. The large sequence banks needed for the statistics exist already, or can expand quickly due to the speed improvements in mass sequencing technology.

Reviewer's comment

A related issue is the application of the method to other molecules. The authors indicate that if there are not enough sequences (from another virus) in the databases, the new-generation sequencing methods may come to the rescue. But what should be sequenced, how many reads should be enough, and are there any properties of the sample that might predict the success of the approach?

It is now documented that drug resistance is due to at least one mutated amino acid, so researchers have recommended that invariant viral amino acids be targeted by future drugs [8]. The rationale is that mutations in invariant positions always damage a critical biological function, resulting in non-replicative viruses.
Unfortunately, sometimes, invariant positions are too few to constitute a proper docking site for a drug. For example, 63% of the HIV protease amino acids are variable [13] . In an effort to gather enough replicativity-critical amino acids for a possible docking site, we propose to also target synthetic lethals (SL, or co-lethals). Synthetic lethality is the lethality resulting when two, individually survivable, mutations, are co-present [14] . Each mutation is then called a synthetic lethal. Analysis of SLs is a powerful tool to understand genetic interactions and essential metabolic pathways. Synthetic lethality has been extensively used to study gene products' interactions in the secretion pathway in yeast [15] , in bacteria [16] and even to identify anticancer agents [17] . It was noticed that, in many cases, the double mutants identified had their colethal mutations in 2 different genes. A few yeast studies analyzed protein structure-function aspects, revealing intragenic synthetic lethals [18] , i.e. cases where the 2 colethal mutations occur in one same gene. Little SL research has been undertaken on viruses. Researchers were rather on a quest for the opposite situation, where one crippling mutation is rescued by a second, intragenic, suppressor mutation, the second restoring the function lost due to the first mutation [19, 20] . They also found few, under-represented intragenic suppressors, which actually turned out to be SLs.
To test this anti-escape approach in practice, we studied the HIV-1 protease from subtype B. This protein is a 99-amino-acid homodimeric aspartic protease, and its substrate-binding pocket includes the D25-T26-G27 catalytic triad and flap regions, which presumably open and close to allow entry and binding of substrates or inhibitors [22, 23]. More than 60% of its positions are variable. We first aligned 24,155 amino acid sequences of the HIV protease. From this alignment, we determined all the couples of positions that were statistically never found mutated together, which we called SLs, and focused on the SL sets that are accessible to the solvent and not too far apart spatially. We report here that our method yielded 3 targets of respectively 4, 5 and 7 amino acids. Unlike currently documented drug targets, which mutate and escape drug treatments, these targets should be conserved, otherwise viral replicativity would suffer, impairing viral pathogenicity. More generally, this process can of course be used against other HIV proteins, other RNA viruses, or any highly variable agents.
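The pairwise screen described above (couples of positions "statistically never found mutated together") can be illustrated with a 2x2 contingency count per couple. The following is a minimal sketch, not the authors' implementation: it assumes each sequence has already been reduced to the set of positions that differ from a reference, and it scores depletion of co-mutations with a one-sided hypergeometric lower-tail probability, which is an assumed choice of statistic since the excerpt does not name the test used. The function names and the toy data are hypothetical.

```python
from math import comb


def lower_tail_p(n_total, n_a, n_b, n_both):
    """P(X <= n_both) under a hypergeometric model.

    X is the number of sequences mutated at both positions if the n_a
    sequences mutated at position a and the n_b sequences mutated at
    position b were distributed independently among n_total sequences.
    """
    denom = comb(n_total, n_b)
    lowest = max(0, n_a + n_b - n_total)
    numer = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    return numer / denom


def comutation_p(mutated_sets, pos_a, pos_b):
    """Return (observed co-mutation count, depletion p-value) for one couple."""
    n_total = len(mutated_sets)
    n_a = sum(pos_a in s for s in mutated_sets)
    n_b = sum(pos_b in s for s in mutated_sets)
    n_both = sum(pos_a in s and pos_b in s for s in mutated_sets)
    return n_both, lower_tail_p(n_total, n_a, n_b, n_both)


# Toy data (hypothetical): positions 5 and 9 are each mutated in half of
# the sequences but never in the same sequence.
toy = [{5}] * 10 + [{9}] * 10
n_both, p_value = comutation_p(toy, 5, 9)
print(n_both, p_value)
```

A small p-value here means the two positions are co-mutated less often than independence would predict, which is the signal the text associates with a potential SL couple. With thousands of couples tested, some multiple-testing correction would be needed in practice.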

Graph of the accessible and spatially close synthetic lethal clusters

The first step is to define which couples of protease positions are synthetic lethal. To do so, we analyzed 24,155 protease amino acid sequences of HIV-1 subtype B from patients. Some patients had been previously treated with antiviral drugs, and some never, so the pool of sequences more appropriately represents the actual viral diversity. We kept all the sequences in one set for four reasons. Firstly, an untreated patient can get a viral sequence (possibly highly mutated) from a treated patient. Secondly, treatment is a selection pressure but other pressures can select other mutations appearing in both treated and untreated sets. Thirdly, our dataset can be enriched with plenty of mutations from unknown patients with unknown treatment histories. Finally, we wanted to create a tool that is robust enough to be used on other RNA virus databases that are less documented than HIV. When sequences occurred in multiple copies in the sequence collection, we kept all these redundant copies, assuming redundancy may reflect biological fitness (mostly identical copies from different patients, rather than multiple samples from the same patient). We did not use clonal data because it is less informative: not enough sequences, and not mutated enough. These 24,155 sequences were aligned. From this alignment, we found 25 positions that display changes in less than 0.3% of the sequences. We called those positions "invariants". Ceccherini-Silberstein et al. [10], who defined invariant positions as displaying changes in less than 1% of the sequences, found 46 of those positions. All our 25 invariant positions were included in the Ceccherini-Silberstein invariant set of protease inhibitor (PI) treated patients. The Stanford University HIV drug resistance database [13] described 36 positions with changes in less than 0.5% of the HIV subtype B sequences from PI treated patients. Our 25 invariant positions are also all included in this group.
The invariant positions were then set aside momentarily, and we compared the 74 variant positions in pairs (2,701 couples).
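The invariant/variant split and the resulting number of couples can be sketched as follows. This is an illustrative sketch under stated assumptions: the alignment is taken to be a list of equal-length strings compared column by column against a reference sequence, gaps are not handled, and the function names and toy alignment are hypothetical. Only the 0.3% threshold and the counts 99, 25, 74 and 2,701 come from the text.

```python
from itertools import combinations
from math import comb


def variable_fraction(aligned, reference):
    """Fraction of sequences that differ from the reference in each column."""
    n = len(aligned)
    return [
        sum(seq[i] != ref_aa for seq in aligned) / n
        for i, ref_aa in enumerate(reference)
    ]


def split_positions(aligned, reference, threshold=0.003):
    """Return (invariant, variant) lists of 1-based positions.

    A position is called invariant when fewer than `threshold` of the
    sequences differ from the reference (0.3% in the text).
    """
    fracs = variable_fraction(aligned, reference)
    invariant = [i + 1 for i, f in enumerate(fracs) if f < threshold]
    variant = [i + 1 for i, f in enumerate(fracs) if f >= threshold]
    return invariant, variant


# Toy alignment (hypothetical): the middle column never changes.
reference = "PQI"
aligned = ["PQI", "PQV", "LQI", "PQI"]
invariant, variant = split_positions(aligned, reference)
couples = list(combinations(variant, 2))
print(invariant, variant, couples)

# Arithmetic from the text: 99 positions minus 25 invariants leaves 74
# variant positions, which form C(74, 2) = 2,701 couples.
n_couples = comb(99 - 25, 2)
print(n_couples)
```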

Discussion

Finally, the fact that a synthetic lethal co-mutation is never observed statistically does not mean such a co-mutation is completely impossible, hence our use of the expression "potential SLs" in cases with no experimental replication data. Our results indeed include cases where synthetic lethals were actually mutated (although this occurred in less than 0.02%-0.5% of the sequences, data not shown). So if these sequences truly represent replicative viruses, as
Figure 1, boxed with the same colour codes and linked by black edges. Each black edge means "is co-lethal with, and within 10 Angström from". The corresponding p-value is shown above each edge. Invariant amino acids are boxed in green, linked by dotted green edges meaning "is within 10 Angström from". All the nodes are shaded in blue because they are all accessible. Each invariant is also linked to all the other nodes by implicit edges, not shown, for the sake of clarity. The locked target represented by subgraph "a" is called SGI flap (orange+green). The locked target represented by subgraph "b" is called SGI canti (red+green). The locked target represented by subgraph "c" is called SGI fulc (yellow+green). The graph was built using the Graphviz software.

In line with our patented model [21] we chose to focus on intragenic synthetic lethals in the HIV protease. As mentioned above, the number of replicativity-critical amino acids available for drug docking should be maximized, invariant amino acids being not numerous enough, and one single SL couple being possibly too little. So we propose to add larger "SL sets" rather than one single SL couple. By "SL set", we mean a set of SL couples that are connected to each other. We choose to add also the invariant position being in the vicinity of the SL set. Indeed, a set of positions containing invariant amino acids plus an SL set may be large enough a target for putative antiviral drugs.

Graph of the accessible and spatially close synthetic lethal clusters

The distances between the two positions of each couple were calculated using the 3D protease structure PDB ID:1HSG [25]. The resulting distance graph was then superimposed on the "total SLs" graph. To be accessible by a putative drug, amino acids should fulfill two preconditions: to be spatially close enough to possibly form a pocket structure, and to be solvent-accessible (on the outside of the protein). Thus we chose to retain only the clusters of close positions, by only keeping the edges that indicate less than 10 Angström of distance between two nodes. This resulted in a drastically reduced graph of "spatially close SLs", shown in Figure 1, going from 290 nodes to 48.
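The distance filter described above amounts to dropping every SL edge whose two residues are 10 Angström or more apart. A minimal sketch, assuming one representative 3D point per residue (for example the alpha carbon; the excerpt does not say which atoms were used) and using made-up coordinates rather than values from 1HSG; `close_edges` is a hypothetical helper name.

```python
from math import dist


def close_edges(sl_couples, coords, max_dist=10.0):
    """Keep SL couples whose residues lie less than max_dist angstroms apart."""
    return [
        (a, b)
        for a, b in sl_couples
        if dist(coords[a], coords[b]) < max_dist
    ]


# Hypothetical coordinates in angstroms, for illustration only.
coords = {
    1: (0.0, 0.0, 0.0),
    2: (3.0, 4.0, 0.0),
    3: (0.0, 0.0, 12.0),
}
sl_couples = [(1, 2), (1, 3), (2, 3)]
kept = close_edges(sl_couples, coords)
print(kept)
```

In the toy data only the couple (1, 2) survives, since its residues are 5 Angström apart while the other two couples are 12 and 13 Angström apart.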

Extending the SL sets: the flaps and cantilever

For an anti-protease drug to be active, its docking must block a vital protease function. To compromise viral escape, it should dock on an invariant position or on an SL set. As described in Materials and Methods, we kept the invariant positions aside, as obviously lethal, when searching for synthetic lethals. But invariant positions are of course prime target candidates for the docking of an antiviral agent with limited escape possibilities. So we added all the invariant positions (shown as green rectangles shaded in blue in Figure 2) that are accessible and close, to the 4 locked sets (shown as ovals in Figure 2).

Extending the SL sets: the cantilever and fulcrum

Are the locked targets we described actually druggable pockets?

Discussion

opposed to unfiltered artifacts, then there is a risk that blocking some synthetic lethal couple with a new drug could create a selection bottleneck [39, 40] , forcing viral evolution towards more resistant strains. If such new escape mutants were to appear, they should carefully be monitored by sequencing at least their synthetic lethal positions, as a first step in further refining the antiviral strategy.
In any case, our method has the advantage of indicating the comprehensive list of all possible protease locked targets, the best targets to minimize or eliminate viral escape. Even if locked-set-designed antiviral drugs were to somehow elicit escape mutants, the other (non-locked) amino acids would remain wrong targets, since viral escape seems nearly guaranteed with them: the corresponding multi-mutant HIVs are already in nature and their sequences in worldwide databanks.

Alignment of the sequences

We pooled all of the amino acid sequences, whole or partial, of subtype B HIV-1 Pol from three databases: SwissProt, Los Alamos National Laboratory [44] and the Stanford University HIV Drug Resistance Database [13]. We thus collected over 70,000 Pol sequences from patients who were treated, untreated, or of unknown treatment history.

Identification of the accessible positions

In order to define the accessibility of the amino acids to a putative external ligand (with a future drug in mind), we used the 3D protease structure PDB ID:1HSG [25] to compute the solvent-accessible surface with the ASA software [51] available at RPBS [52], and considered as "accessible" all amino acids with an accessibility greater than the 25% threshold.
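Once a per-residue accessibility percentage is available, the selection step is a simple threshold. The sketch below assumes the percentages have already been computed by an external tool and are supplied as a dictionary; the values shown are invented for illustration and `accessible_positions` is a hypothetical name.

```python
def accessible_positions(relative_asa, threshold=25.0):
    """Positions whose relative solvent accessibility (%) exceeds the threshold."""
    return sorted(pos for pos, pct in relative_asa.items() if pct > threshold)


# Hypothetical accessibility percentages, not computed from 1HSG.
relative_asa = {1: 3.0, 2: 62.5, 3: 25.0, 4: 48.1}
accessible = accessible_positions(relative_asa)
print(accessible)
```

A residue at exactly 25% is excluded here because the text specifies "greater than" the threshold.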

Reviewer's comment

The manuscript by Brouillet and co-authors presents a simple, direct and, in my opinion, promising strategy towards computational selection of druggable targets, based on the inference of intramolecular synthetic lethal pairs of amino acid substitutions. It is clearly written, and I do not have any major concerns with the technical side of the work. However, additional explanation of the following would be helpful.
This is an interesting manuscript on molecular evolution and structural analysis of the HIV protease. The goal of the study is to use co-evolution of amino acid pairs for proposing drug binding sites immune to mutational escape. Although the work is of interest, I have a few comments listed below. The approach based on the contingency table analysis ignores phylogenetic structure of the dataset. This may lead to false-positive predictions of co-evolved residues. It is clear that the size of the dataset prevents any accurate phylogenetic analysis. However, it should be easier to verify that selected clusters cannot be explained by phylogeny and represent a real signal of co-evolution. At least, the manuscript would benefit from a detailed discussion of this point.
The authors use the concept of co-lethality of mutations to identify possible drug targets in HIV. I have two major comments.

Extending the SL sets: the flaps and cantilever

The full Figure 2 graph has 16 nodes and 72 edges (7 in black plus 65 implicit edges) linking each invariant position to all the others. For the sake of clarity only the 15 implicit links within a 10 Angström distance are represented (as dotted green lines). SG-flap is now included in a larger connected subgraph named SGI-flap (7 nodes/18 edges including 3 in black, 10 in dotted lines, and 5 implicit edges; Figure 2a). The connectivity coefficient is increased from C SG-flap = 0.5 to C' SGI-flap = 0.71. SGI-flap has 3 invariant amino acids, out of which 2 (G40, W42) are within 10 Angström of each other; G40 is within 10 Angström of all the SG-flap residues and W42 is within 10 Angström of (P39, K41, D60) from the SG-flap residues. The whole SGI-flap set is an interesting result since its amino acids (N37, P39, G40, K41, W42, P44, D60) are all accessible, can't all freely mutate without damaging replicativity, and are close enough in space to imagine designing a single drug to target several of them. Figure 3a shows these amino acids on the dimerized 3D structure, using the same color codes as Figures 1 and 2: four orange SLs and three green invariants. It is important to note here that each of these 7 positions does bear indispensable viral functions, namely:

Extending the SL sets: the cantilever and fulcrum

Overall, we identify a cluster of 5 amino acids with a connectivity coefficient of C' SGI-fulc = 0.8, which are all close and amongst which at most two may freely mutate. This set, (L10, T12, K14, L19, E21), called SGI-fulc, is therefore a good target for a putative drug (Figure 3c). It is interesting to note that position 10 can have a secondary mutation conferring resistance [33] and that mutations in position 12 occur more often in treated patients than in drug-naive patients [34].

Discussion

All the amino acids in the first locked target (subgraph SGI-flap in Figure 3a) are part of the protease flaps' external loop. The flaps being mobile structures [36] that open out to let the substrate pass, one can imagine that a molecule drug-designed to bind the external loop of the protease flaps could block their mobility, therefore keeping the protease from processing its substrate. This idea was also developed by Perryman et al. using molecular dynamics [37]: they also proposed to affect the flaps' mobility. Our approach is in agreement with theirs, and we believe this group to be a very good candidate for future drug targeting. The method described in this manuscript enables us to define which amino acids would be the most adequate drug targets to limit or avoid escape. But this method does not tell us whether these amino acids define an actually druggable pocket. To check that, we used the Q-siteFinder software [35], which takes the 3D structure of a protein as input, and outputs its ligand-binding sites. Out of the 3 SL locked targets we describe, 2 (SGI-flap and SGI-fulc) closely overlap two sites revealed by Q-siteFinder for the HIV protease, on respectively 5/7 and 3/5 positions. This result suggests our SL locked targets SGI-flap and SGI-fulc actually are druggable pockets.
Figure 1 Spatially close SLs. The whole graph represents the "spatially close SLs". The blue-shaded ovals represent only the "accessible, spatially close SLs". Each oval contains the amino acid (one-letter codename) found in the HIV-1 protease ancestral sequence, followed by its position in this sequence. The accessible residues are all shaded in blue. The black edges linking the nodes mean "is co-lethal with and is within 10 Angström from".
Our method treats variation in all positions as bi-allelic (mutated versus non-mutated genotype). Arguably, it would seem more informative to take into account the actual amino acid changes. This would entail choosing a score matrix; most score matrices do take into account the physico-chemical specificity of each amino acid so as to decide whether the mutated one is similar or not to the initial one. But the mutations involved in the resistance to anti-protease drugs do not follow the same rules. Therefore we do not think using an existing score matrix of the BLOSUM or PAM type would be more informative.

Identification of locked sets

Whenever a node is an invariant amino acid, it is by definition linked to all the other nodes. We indeed add invariants to SL sets in order to extend our targets into "locked sets". By "a locked set" we mean "an SL set plus all the invariant positions". Indeed, a set of positions containing invariant amino acids plus an SL set may be large enough a target for putative antiviral drugs. We call all these most conserved positions "locked" because they "can't escape" by freely mutating. Invariants are totally locked (no mutation is survivable), while SL sets are partially locked (some, but not all, mutation combinations are survivable).
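The construction of locked sets can be sketched as a small graph computation: SL sets are the connected components of the graph whose edges are SL couples, and each SL set is then extended with nearby invariant positions. This is a minimal sketch under stated assumptions: the mapping from each SL position to the invariants that are close and accessible is assumed to be precomputed, all position numbers are toy values, and `sl_sets` and `locked_set` are hypothetical names rather than the authors' code.

```python
from collections import defaultdict


def sl_sets(edges):
    """Connected components of the SL graph; each component is one SL set."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal collecting every node reachable from start.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def locked_set(sl_set, invariants_near):
    """Extend an SL set with the invariants mapped to any of its members."""
    extra = set()
    for pos in sl_set:
        extra |= invariants_near.get(pos, set())
    return sl_set | extra


# Toy data (hypothetical positions): two SL sets, two nearby invariants.
edges = [(1, 2), (2, 3), (7, 8)]
invariants_near = {2: {5}, 3: {5, 6}}
components = sl_sets(edges)
locked = locked_set(components[0], invariants_near)
print(components, locked)
```

Keeping the invariants outside the component search mirrors the text, where invariants are set aside while searching for synthetic lethals and added back afterwards.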

Authors' response

Yes, we totally agree with you and we also thought about this issue. To address it, we would need to choose a score matrix. Most score matrices take into account the physico-chemical specificity of each amino acid to decide whether the mutated residue is similar to the initial one. But the mutations involved in resistance to anti-protease drugs do not follow the same rules. Indeed, resistance can be due to the mutation of one amino acid into another one with very similar physico-chemical characteristics. Therefore we do not think that using a score matrix of the BLOSUM or PAM type would be more informative.
30 section matches

Abstract

Background: The decrease in cost for sequencing and improvement in technologies has made it easier and more common for the re-sequencing of large genomes as well as parallel sequencing of small genomes. It is possible to completely sequence a small genome within days and this increases the number of publicly available genomes. Among the types of genomes being rapidly sequenced are those of microbial and viral genomes responsible for infectious diseases. However, accurate gene prediction is a challenge that persists for decoding a newly sequenced genome. Therefore, accurate and efficient gene prediction programs are highly desired for rapid and cost effective surveillance of RNA viruses through full genome sequencing. Results: We have developed VIGOR (Viral Genome ORF Reader), a web application tool for gene prediction in influenza virus, rotavirus, rhinovirus and coronavirus subtypes. VIGOR detects protein coding regions based on sequence similarity searches and can accurately detect genome specific features such as frame shifts, overlapping genes, embedded genes, and can predict mature peptides within the context of a single polypeptide open reading frame. Genotyping capability for influenza and rotavirus is built into the program. We compared VIGOR to previously described gene prediction programs, ZCURVE_V, GeneMarkS and FLAN. The specificity and sensitivity of VIGOR are greater than 99% for the RNA viral genomes tested. Conclusions: VIGOR is a user friendly web-based genome annotation program for five different viral agents, influenza, rotavirus, rhinovirus, coronavirus and SARS coronavirus. This is the first gene prediction program for rotavirus and rhinovirus for public access. VIGOR is able to accurately predict protein coding genes for the above five viral types and has the capability to assign function to the predicted open reading frames and genotype influenza virus. 
The prediction software was designed for performing high throughput annotation and closure validation in a post-sequencing production pipeline.

Background

Rapid and cost-effective genomic surveillance of RNA viruses is a critical component of vaccine and drug development pipelines for the control of emerging viral diseases. Improvements in sequencing technology and the concomitant decrease in costs have made the re-sequencing of large genomes, as well as the parallel sequencing of small genomes, easier and more common. This has led to an exponential increase in the genomic data available in public databases. However, accurate gene prediction is a challenge that has created a bottleneck in the gene prediction pipeline.

Custom protein databases

Complete protein sequences of all ORFs for influenza virus, rotavirus, rhinovirus and coronavirus subtypes were downloaded from GenBank, and redundant sequences were removed by custom scripts. For coronavirus and SARS coronavirus, both the orf1a polypeptide sequences and orf1b polypeptide sequences were included in the coronavirus and SARS coronavirus reference database.

Identification of start codon and stop codon

The first transcript of coronavirus genomes and SARS coronavirus genomes encodes two polyproteins because of ribosomal slippage during translation [6] . The first polyprotein (orf1a) is translated from the sequence with start and stop codons, and normal translation, while the synthesis of the second polyprotein (orf1ab) is dependent on a -1 nucleotide ribosomal frameshift induced by a "slippery" sequence of the type "UUUAAAC" upstream of the orf1a stop codon [7] . VIGOR examines the region upstream of the orf1a stop codon to map out precisely the "UUUAAAC" string. It then shifts back the reading frame by -1 nucleotide (from AAC to AAA) within the slippery sequence, and the -1 frame is extended to generate the coding sequence for the translation of orf1ab.
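
The frameshift step described above can be sketched as follows. This is a simplified illustration, not VIGOR's implementation: the function names are hypothetical, the genome is given as DNA (so the slippery sequence appears as "TTTAAAC"), and the sketch assumes that the last heptamer upstream of the orf1a stop codon is the functional one and that its "AAC" is an in-frame orf1a codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SLIPPERY = "TTTAAAC"  # DNA form of the "UUUAAAC" slippery sequence


def find_stop(seq: str, start: int) -> int:
    """Index of the first in-frame stop codon at or after start, or -1."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return -1


def orf1ab_cds(genome: str, orf1a_start: int) -> str:
    """Join the orf1a frame to the -1 frame at the slippery site."""
    orf1a_stop = find_stop(genome, orf1a_start)
    if orf1a_stop < 0:
        raise ValueError("no in-frame stop codon found for orf1a")
    # Last slippery heptamer lying upstream of the orf1a stop codon.
    hept = genome.rfind(SLIPPERY, orf1a_start, orf1a_stop)
    if hept < 0:
        raise ValueError("slippery sequence not found")
    slip = hept + len(SLIPPERY)  # index just past the final C
    if (slip - orf1a_start) % 3 != 0:
        raise ValueError("AAC is not an in-frame orf1a codon")
    # Shift back by one nucleotide: the final C is read a second time.
    joined = genome[orf1a_start:slip] + genome[slip - 1:]
    stop = find_stop(joined, 0)
    if stop < 0:
        raise ValueError("no stop codon found in the -1 frame")
    return joined[:stop + 3]


# Toy genome (hypothetical): orf1a is ATG GCT TTA AAC GGG TAA.
toy_genome = "ATGGCTTTAAACGGGTAACTTGA"
toy_cds = orf1ab_cds(toy_genome, 0)
```

Repeating the final C of the heptamer in the joined sequence corresponds to writing the coding region as two segments that share one nucleotide.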

Genotyping of influenza virus

There are 16 subtypes for hemagglutinin protein (HA) and 9 subtypes for neuraminidase (NA) in group A influenza viruses, but only one subtype for HA and one subtype for NA for influenza B viruses [8, 9] . The genotypes of influenza viruses can be categorized by the hemagglutinin protein (HA) and neuraminidase (NA). In the custom VIGOR database, HA and NA subtype sequences are stored and used to categorize the genotypes of these two influenza proteins based on the best similarity.
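
Assigning a subtype by best similarity can be sketched as below. The sketch is only a stand-in: VIGOR relies on similarity searches against its custom database, whereas this example uses Python's `difflib.SequenceMatcher` ratio as a crude similarity score, and the reference fragments are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity score in [0, 1]; a stand-in for an alignment score."""
    return SequenceMatcher(None, a, b).ratio()


def assign_subtype(query: str, references: dict[str, str]) -> tuple[str, float]:
    """Return the reference subtype with the best similarity to the query."""
    best = max(references, key=lambda name: similarity(query, references[name]))
    return best, similarity(query, references[best])


# Hypothetical reference fragments, one per subtype.
reference_db = {"H1": "MKAILVVLLYT", "H3": "MKTIIALSYIL"}
subtype, score = assign_subtype("MKAILVVLLYA", reference_db)
```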

Identification of the mature peptide cleavage sites for the rhinovirus polyprotein and SARS coronavirus orf1a and orf1ab

The rhinovirus polyprotein is cleaved into 11 mature, functional peptides by proteases. There are conserved cleavage signature sequences for 9 cleavage sites [10] . In order to predict mature peptides, the polyprotein sequence is aligned with the sequences in VIGOR's custom rhinovirus mature peptide database to identify the mature peptide cleavage sites. In the absence of a conserved signature sequence, the putative cleavage site whose products result in best alignments for both sequence length and similarity is selected.
The conserved mature peptide cleavage signature sequences for the orf1a and orf1ab of SARS coronavirus were derived from comparative sequence alignment and the literature [11] [12] [13]. At position P1 (the position just before the cleavage site) is the conserved glutamine (Q), the signature residue recognized by papain-like proteinase. Mature peptide cleavage sites are determined by mature peptide length and conserved cleavage signature sequences.

Further criteria for gene prediction

(ii) There should not be an internal stop codon or frameshift in the coding sequences except for the orf1ab in coronavirus genomes and SARS coronavirus genomes.
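
The internal stop codon part of this criterion can be checked with a few lines of code. This is a minimal sketch under assumptions: the function name is hypothetical, the coding sequence is given as DNA starting at the start codon, and the last codon is allowed to be the terminating stop.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(cds: str) -> bool:
    """True if any codon other than the last one is a stop codon."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


ok_cds = "ATGGGGTGA"         # ATG GGG TGA: stop only at the end
broken_cds = "ATGTAAGGGTGA"  # ATG TAA GGG TGA: premature stop
```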

Implementation

Arcturus is responsible for invoking the appropriate gene prediction program in the VIGOR package for the specified virus type. Currently, all jobs are executed on a single, dedicated server. The backend service is implemented to support scalability. The entire backend service was implemented in Perl.
A user needs to select the virus type through a pulldown virus name menu prior to submitting the sequence data. The user will be informed of the link to download the prediction result by email following the VIGOR run. The output includes three files. The main output file is the gene prediction file which includes the predicted peptide sequence length, coordinates of the coding regions, splice sites if applicable, protein function and genotype if available, and the predicted amino acid sequences. The other two output files are the cDNA sequence file and a file of the alignment between the predicted protein and the best match in the custom database so that the user can evaluate the prediction. If mutations which generate internal stop codons or frameshifts are detected, the mutated sequences plus the flanking sequences will be presented in the output. The alignment data from the BLASTX search is also included in the gene prediction file.

Results and Discussion

To assess the performance of VIGOR, five sets of annotated sample sequences were downloaded from GenBank. These included influenza virus, rotavirus, coronavirus, SARS coronavirus and rhinovirus. VIGOR was compared to three separate gene finding programs: GeneMarkS, ZCURVE_V and FLAN. GeneMarkS [2] and ZCURVE_V [14] are both ab initio universal gene finding programs. FLAN is a web-based gene prediction tool specific for influenza viruses that was developed at NCBI for the Influenza Genome Sequencing Project and has been widely used to annotate influenza sequences [15]. FLAN uses the similarity-based approach, comparing the influenza genomic sequences with annotated influenza peptide sequences to identify open reading frames. GeneMarkS and ZCURVE_V were run for all sample genomes, while FLAN was run only on influenza genomes.

Influenza

The influenza virus genome consists of eight RNA segments that encode one or two proteins each. Splicing is involved in the expression of the MP and M2 proteins from segment 7 (MP segment) in group A influenza viruses and of the NS1 and NS2 proteins from segment 8 (NS segment) in both group A and group B viruses [16]. Segment 2 (PB1 segment) encodes two proteins, PB1 and PB1-F2, in some influenza genomes. The coding sequence of PB1-F2 is completely embedded in the PB1 coding region in a different reading frame [17]. To test the accuracy of VIGOR in gene finding, 2376 full and partial influenza segment sequences from group A, group B, and group C viruses, encoding 3177 annotated proteins, were run through VIGOR. VIGOR predicted 3178 ORFs encoded by these segments. Among these predicted ORFs, 99% (3169/3178) completely agreed with the annotations in GenBank. Three predicted ORFs were partially correct and one ORF was incorrectly predicted (Table 1).
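
The counts above can be turned into percentages with a short calculation. The definitions used here are an assumption, since this excerpt does not state them: specificity is taken as exact matches divided by predicted ORFs, and sensitivity as exact matches divided by annotated proteins.

```python
annotated = 3177   # proteins annotated in GenBank
predicted = 3178   # ORFs predicted by VIGOR
exact = 3169       # predictions that completely agree with GenBank

# Assumed definitions (not stated in this excerpt).
specificity = exact / predicted
sensitivity = exact / annotated

print(f"specificity: {specificity:.2%}, sensitivity: {sensitivity:.2%}")
```

Under these assumed definitions both values are above 99%, in line with the figure quoted in the text.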

Rotavirus

Rotavirus genomes are made up of 11 segments of double-stranded RNA encoding 6 viral structural proteins (VP1-4, VP6-7) and 6 non-structural proteins (NSP1-6). Non-structural proteins 5 and 6 are encoded by the same genomic segment; the coding regions overlap but are in different reading frames [18]. Nineteen G types and 27 P types of rotavirus, based on structural proteins VP7 and VP4, have been recorded. We downloaded 1166 rotavirus sequence segments with 1158 annotated genes from GenBank, and ran VIGOR, ZCURVE_V and GeneMarkS analyses in parallel. VIGOR predicted 1202 protein coding genes, including 44 newly detected ORFs which were not annotated in GenBank. Three predictions picked different start codons compared to the annotations in GenBank. The new predictions were examined closely and all of them are homologous to NSP6 with very good similarities (E-value < 1e-10, data not shown).
ZCURVE_V performed well for rotavirus gene prediction (Table 1). Of its 1171 predictions, 1112 were the same as the annotations in GenBank. Both the specificity and sensitivity are approximately 95%; only one protein coding gene was not picked by ZCURVE_V, and 45 were predicted with different start codons. The performance of GeneMarkS was limited for rotavirus gene prediction. Approximately 64% of the predictions (776 predictions) selected the wrong start codons (Table 1).

Rhinovirus

The rhinovirus genome encodes one polyprotein precursor which is cleaved into eleven functional mature peptides [20]. Thirty-six annotated rhinovirus genomes were downloaded from GenBank and tested with VIGOR, GeneMarkS and ZCURVE_V. VIGOR correctly predicted the polypeptide start codon and stop codon, as well as the mature peptides for each genome (see Table 2). GeneMarkS identified all 36 polyproteins, but predicted the wrong start codons for four genomes. An additional nine small ORFs were incorrectly predicted in the 5'-UTR. ZCURVE_V predicted 77 genes in total, including the 36 true ORFs and 41 mis-predicted small peptides. The start codons of 6 real open reading frames were not correctly predicted. VIGOR was also used to predict the polyproteins and mature peptides for 66 ATCC rhinovirus samples and field samples sequenced at JCVI with 100% specificity and sensitivity [21]. Neither GeneMarkS nor ZCURVE_V was designed to predict mature peptide sequences for viral genomes.

Coronavirus

Coronavirus genomes are 27 to 31 Kb in size and encode 9-15 proteins. The genomic structure of each species in the coronavirus genus is highly variable [6] with considerable species diversity among the non-structural proteins. The first open reading frame occupies about 2/3 of the genome, and ribosomal slippage occurs in the expression of this transcript, producing two polypeptides (orf1a and orf1ab) which are cleaved into functional mature peptides. Coronavirus genomes also encode overlapping genes and genes which are completely embedded within other genes.
To evaluate the performance of VIGOR for coronavirus gene prediction, 38 annotated complete coronavirus genomes containing annotation for 341 genes were downloaded from GenBank and run through VIGOR, GeneMarkS and ZCURVE_V. VIGOR identified 354 ORFs, while GeneMarkS and ZCURVE_V predicted 314 and 339 protein coding genes respectively (Table 2). Of the 341 GenBank annotated genes, VIGOR correctly predicted 339 genes, missed one gene and identified one gene with the wrong start codon (Table 2). VIGOR also predicted 14 new ORFs which were not annotated in GenBank. Manual curation of these 14 newly predicted proteins showed that they are highly similar (E value < 1e-10) to annotated coronavirus or other viral proteins (data not shown); thus we classified these 14 newly identified genes in coronaviruses as correct predictions. Of the 341 annotated genes, GeneMarkS and ZCURVE_V did not detect 48 and 34 genes respectively. Most of the missing genes were short overlapping genes. The small structural envelope protein coding gene in 10 coronavirus genomes was not identified by either of these two programs because the coding region of this envelope protein overlaps with the coding region of an upstream gene.
VIGOR was also evaluated and used successfully for the gene prediction of more than 50 coronavirus genomes sequenced at JCVI; the specificity and sensitivity were greater than 99% [22] [23] [24] [25] .
VIGOR has also been adjusted to optimally predict the protein coding genes in SARS coronavirus genomes. We downloaded 102 annotated SARS coronavirus genomes, containing a total of 1322 annotated genes, from GenBank. VIGOR, GeneMarkS, and ZCURVE_V were run on these SARS coronavirus genomes to identify protein coding genes. VIGOR detected 1447 ORFs, 1321 of which completely agreed with the annotations in GenBank (Table 2). Only one GenBank annotated gene was missing from the VIGOR prediction list. VIGOR also found 126 ORFs in these SARS coronavirus genomes which were not annotated in GenBank. A similarity search against the NCBI NR database showed that these 126 newly detected genes encode proteins highly similar (E value < 1e-10) to proteins in SARS coronavirus or other viruses. ZCURVE_V predicted 1204 genes, 958 of which were identical to the annotations in GenBank. One hundred and seven ZCURVE_V predictions have different start codons compared to the annotations in GenBank (Table 2). This program also detected 76 new ORFs which did not exist in GenBank; as with VIGOR, the encoded proteins are highly similar to other viral proteins in GenBank (data not shown). Sixty-three predictions may be incorrect since they could not be corroborated by similarity searches. These were either small peptides (shorter than 50 aa) or were located within the first long open reading frame.
The gene predictions from two SARS coronavirus genomes (NC_009695 and AY485277) are detailed in Table 3. NC_009695 is the genomic sequence of a bat SARS-like coronavirus published in 2005 [26]. The annotations have been updated several times by different annotators. This genome encodes 14 ORFs in GenBank. VIGOR predicted 13 ORFs and detected one mutation which resulted in a truncated non-functional peptide (Table 3). The 13 predicted ORFs were exactly the same as the annotations in GenBank. The mutation detected by VIGOR is located in orf3b and generates an internal stop codon, creating a truncated peptide of 114 amino acids. The orf3b gene in other coronaviruses is ~154 aa long. We believe this truncated protein is non-functional. ZCURVE_V identified 11 ORFs but missed the two short ORFs (28094-28387, 28544-28756) which are completely embedded in the coding region of the nucleocapsid protein. GeneMarkS detected 8 ORFs but ignored 3 additional ORFs. One was the envelope protein gene (26056-26286), and the other two were non-structural protein genes (27573-27707, 27713-28082). Neither GeneMarkS nor ZCURVE_V predicted orf1ab correctly.
AY485277 is another SARS coronavirus genome that has 8 ORFs annotated in GenBank [27]. VIGOR predicted an additional 7 ORFs (Table 3). These 7 ORFs were corroborated by comparing them with other viral proteins in GenBank. ZCURVE_V detected 12 ORFs, including 5 ORFs which were not annotated in GenBank; these 5 ORFs are highly homologous to other viral proteins. One ORF annotated in GenBank and two ORFs predicted by VIGOR were ignored by ZCURVE_V. GeneMarkS identified 9 ORFs. Three GenBank annotated genes were missing, and 3 VIGOR predictions were also missing from the GeneMarkS prediction list. Neither GeneMarkS nor ZCURVE_V predicted the orf1ab protein correctly.
Two ab initio gene prediction programs, ZCURVE_CoV [28] and GeneDecipher [29], were specifically trained, with program parameters adjusted, for SARS coronavirus genomes. Both programs can correctly predict the major large protein coding genes and structural protein coding genes such as polyprotein orf1a, orf1ab, the spike gene, nucleocapsid gene, envelope gene and membrane protein gene. However, short peptide genes and embedded genes were often missing from the predicted gene list (false negatives) [29], although the exact function of these small peptide genes is unknown. Mature peptide prediction is not a designed function of these two programs. Seventeen of the 18 SARS-CoV genomes tested by GeneDecipher were used to evaluate VIGOR predictions. Most of these genomes have no annotations in GenBank. Because the SARS-CoV TOR2 genome was annotated in GenBank and its predictions were listed [29], a detailed comparison was done for this genome (data not shown). The predictions of VIGOR for the SARS-CoV TOR2 genome were exactly the same as the GenBank annotations. GeneDecipher did not pick 6 small non-structural protein genes and predicted one gene incorrectly. The genome structure and genes of the other 16 tested SARS-CoV genomes are the same as those of SARS-CoV TOR2; VIGOR detected 14 genes and one non-functional non-structural protein gene in these genomes.

Conclusion

We have demonstrated that VIGOR, an RNA virus gene prediction tool, can predict protein coding genes with high accuracy for 5 different RNA virus types: influenza virus, rotavirus, rhinovirus, coronavirus and SARS coronavirus. VIGOR is available for public use at http://www.jcvi.org/vigor. VIGOR has been thoroughly field tested in several high throughput genome sequencing projects at the JCVI. VIGOR has been employed successfully to predict the protein coding genes for 51 newly sequenced group A rotavirus complete genomes sequenced at JCVI [30] and to annotate and predict mature peptides for 66 rhinovirus full genome sequences [21]. The similarity-based program has also been used to annotate the published sequences of bovine, feline, human, murine, rat, SARS and several novel wild animal coronavirus genomes [22] [23] [24] [25]. Partial genomes and potential sequence errors which generate premature stop codons or frameshifts were also identified by VIGOR during the genome finishing process for these viral sequencing projects.

Background

Although most viral genomes are relatively small compared to eukaryotic and prokaryotic genomes, the gene structure of viral genomes can be complex. For example, introns, alternative splicing, overlapping genes, and ribosomal slippage exist in many viral genomes. Thus an all-purpose gene finder cannot be easily adapted for gene prediction across all virus families. However, if the genome scaffold and the gene features of a viral genome are well understood, a similarity-based gene prediction approach can be adapted that relies on the curated gene repertoire for a specific virus genus and pays attention to particular recognition features, such as splice sites and mature peptide cleavage sites; such an approach can perform better than an ab initio gene finder.
The National Institute of Allergy and Infectious Diseases (NIAID) funds a Genomic Sequencing Center for Infectious Diseases (GSCID) at the J. Craig Venter Institute (JCVI). One of the goals of the GSCID is high throughput sequencing of various viral pathogens. The viral genome sequencing projects at JCVI have resulted in publication of more than 4000 influenza virus genomes from clinical and animal reservoir specimens, and hundreds of coronavirus and rotavirus sequences. Prediction of protein coding genes encoded in these viral genomes is a critical step to understanding these pathogenic viruses. In order to have a flexible, accurate gene prediction tool for utilization in high throughput viral genome sequencing projects, we developed a viral annotation program, VIGOR (Viral Genome ORF Reader). VIGOR uses a similarity-based approach to detect open reading frames (ORF) in various viral genomes by similarity searches against custom reference protein sequence databases. VIGOR takes into account differences between the genomic structures of viral taxonomic groups. VIGOR is tailored for the designated viruses with complex gene features such as splicing and frame-shifting, and it is able to predict genes accurately in influenza (group A, B, and C), coronavirus (including SARS coronavirus), rhinovirus, and rotavirus genomes. It was also designed to assign function to the predicted ORFs and genotype influenza viruses. In addition to gene prediction, VIGOR can also be used as a tool to validate sequence accuracy and completeness during the genome finishing process.

Influenza

The same set of influenza genomic sequences was also run using GeneMarkS and ZCURVE_V, two ab initio gene prediction tools for viral genomes (Table 1). The specificity and sensitivity for GeneMarkS were 40.63% and 35.22% respectively, while the specificity and sensitivity for ZCURVE_V were 81.74% and 72.27%. Similar numbers of GenBank annotated genes were missed by GeneMarkS (770 genes) and ZCURVE_V (841 genes). Manual inspection showed that the majority of the overlooked genes were PB1-F2, NS2 and M2 genes. Several studies have shown that embedded genes and splicing often pose problems for viral gene prediction algorithms [2, 14]. For example, ZCURVE_V could not identify the Tat gene correctly and missed the Rev gene completely when it was used to predict genes for the HIV-1 virus [14]. Additionally, almost half of the GeneMarkS predictions for influenza genomes picked start codons upstream of the correct start codons.

Coronavirus

VIGOR usage in high-throughput viral genome closure and annotation

VIGOR has been used extensively to validate the genomes in the finishing process for the high-throughput virus sequencing projects at JCVI [21] [22] [23] [24] [25] [30]. In this role, VIGOR is used to detect sequence changes which generate a premature stop codon or a frameshift. The potential sequence error and the flanking sequences, as well as the BLASTX alignment results, are presented in the prediction output. This data allows researchers working on finishing tasks to investigate whether the sequence changes are due to laboratory error or are biologically relevant SNPs or indels. We noticed that if a mutation or a sequence error generates a premature stop codon or causes a frameshift, and the translated product still meets all the criteria stated above, VIGOR will predict this gene as a functional gene. In such cases, the VIGOR prediction may not identify the potential sequence errors. However, VIGOR provides the alignment data between the predicted peptide and the reference sequence in the output alignment file, and users can use this alignment data to evaluate the prediction and identify potential sequence errors. If a genomic sequence covers only a fraction of a gene coding region, VIGOR will identify this genomic sequence as a partial sequence. The genome finishing team can then pursue finishing the genome based on the missing regions identified by VIGOR.
VIGOR has also been used in the gene annotation and submission process. One of the advantages of VIGOR is that it can be used on large numbers of viral genomes simultaneously. The efficiency of VIGOR varies depending on the viral sequence type used as input. For example, VIGOR took 85 minutes to complete the gene predictions for 458 influenza genomes (6 Mb of nucleotide sequence in total). In comparison, it took VIGOR 23 minutes to execute the gene prediction for 102 SARS coronavirus genomes (3 Mb of nucleotide sequence in total).
Table 3 Comparison of the annotations in GenBank and predictions by VIGOR, GeneMarkS and ZCURVE_V of two SARS coronavirus genomes, NC_009695 (NC) and AY485277 (AY)

Conclusion

VIGOR detects protein coding sequences based on similarity searches in conjunction with the known genome-specific features of the particular viral genomes. Genes with introns, overlapping genes, and even genes with a frameshift due to ribosomal slippage can be identified accurately because VIGOR includes these complex mechanisms in the processing for the designated genomes. Both the specificity and sensitivity of VIGOR for the tested genomes were greater than 99%. When the same sets of viral genomes were tested with two existing universal viral gene prediction methods, the specificity was between 31% and 95%, and the sensitivity was from 35% to 96%. VIGOR was designed to predict the mature peptides accurately for rhinovirus genomes and SARS coronavirus genomes, which the existing universal gene prediction tools cannot do. VIGOR can also conduct genotyping and assign function to the predicted protein, neither of which is possible with most available viral gene prediction tools. This user-friendly program is convenient for high throughput sequencing projects and for use by individual laboratories. If reference protein sequences can be collected and genome-specific features are added to VIGOR, the program can be extended to predict the protein coding genes in many other small viral genomes.
14 section matches

Abstract

Since 2007, repeated outbreaks of Zika virus (ZIKV) have affected millions of people worldwide and created global health concern, with major complications such as microcephaly and Guillain-Barré syndrome. Generally, ZIKV is transmitted through mosquitoes (Aedes aegypti) like other flaviviruses, but reports of transmission through blood transfusion and sexual contact make the situation more alarming. To date, there is not a single Zika-specific licensed drug or vaccine on the market. However, in recent months, several antiviral molecules have been screened against viral and host proteins.
Among those, (−)-Epigallocatechin-3-gallate (EGCG), a green tea polyphenol, has shown great virucidal potential against flaviviruses including ZIKV. However, the mechanism by which EGCG targets viral proteins is not yet entirely deciphered; little is known beyond its interaction with the viral envelope protein and viral protease. Since the literature has shown significant inhibitory interactions of EGCG with various kinases and bacterial DNA gyrases, we designed our study to find inhibitory actions of EGCG against ZIKV NS3 helicase. NS3 helicase plays a significant role in viral replication by unwinding RNA after hydrolyzing NTP. We employed a molecular docking and simulation approach and found significant interactions at the ATPase site and also at the RNA binding site. Further, the enzymatic assay showed significant inhibition of NTPase activity with an IC50 value of 295.7 nM and a Ki of 0.387 ± 0.034 µM. Our study suggests the possibility that EGCG could be considered as a prime backbone molecule for further broad-spectrum and multitargeted inhibitor development against ZIKV and other flaviviruses.

Zika virus (ZIKV), a close relative of dengue virus (DENV), is primarily a mosquito-transmitted pathogen that has already affected millions of people in more than 40 countries throughout the Americas, South Pacific and South Asia (1, 2). The real danger posed by ZIKV is neurological defects such as microcephaly in newborns and Guillain-Barré syndrome in adults (3, 4). Epidemiological studies have also reported a sexual mode of ZIKV transmission, which further raises the alarm worldwide (5). On 1 February 2016, the World Health Organization declared a global health emergency that demands the development of safe and effective therapeutics. In 2017, WHO confirmed three cases of ZIKV in Ahmedabad District, Gujarat State, India (http://www.who.int/). A more recent ZIKV outbreak was observed in India in 2018, where more than 200 Zika cases were confirmed, including in pregnant women. There is an urgent need to develop antivirals against ZIKV. In past months, several bioactive molecules have been assayed either against ZIKV proteins or against cellular proteins by employing different approaches such as screening new compound libraries or drug repurposing (6, 7). Another essential aspect of drug discovery that cannot be ignored is the use of natural products, which are known to possess enormous structural and chemical variety compared with any synthetic compound library (8). Moreover, natural products offer the crucial advantage of being evolutionarily pre-selected, with chemical structures optimized against biological targets (9).
Like other flavivirus helicases, ZIKV helicase belongs to superfamily 2 (SF2) and is phylogenetically close to those of Murray Valley encephalitis virus (MVEV), DENV4, and DENV2 (18). The full-length NS3 protein has N-terminal protease activity, and the C-terminal region is associated with helicase activity. ZIKV NS3 helicase (residues 172-617) is a large protein containing three domains, where domain 1 (residues 175-332) and domain 2 (residues 333-481) form the NTPase pocket, and domain 3 (residues 481-617) in association with domains 1 and 2 forms the RNA binding tunnel (21). Though ZIKV helicase is well structured, the active sites at the NTPase and RNA binding pockets contain the highly flexible or disordered P-loop (residues 193-203) and RNA binding loop (residues 244-255) respectively, which are critical for their specific function (21, 22). In general, the past decade has shown the significant contribution of intrinsically disordered proteins/regions (IDPs/IDRs) to almost all biological processes, and these regions are considered novel therapeutic targets (23) (24) (25) (26) (27) (28) (29). Despite the conserved active site amino acid residues among flavivirus helicases, ZIKV helicase shows different motor domain movements and RNA binding modes when compared to DENV helicase (21). These vital functions of ZIKV helicase encourage screening for antiviral molecules against its active sites. In a recent study, we determined the inhibitory potential of a small molecule (HCQ) against ZIKV protease with computational and enzyme kinetics studies (30). In this article, we have used a molecular docking and simulation approach to find a significant binding cavity for EGCG. Further, we have verified our computational findings by an in vitro enzyme assay to probe the binding of EGCG at the NTPase site of ZIKV helicase.

In silico docking studies

A flavivirus helicase was co-crystallized, for the first time, with ATP bound at the substrate binding site; this structure therefore appears particularly well suited to inhibitor screening (as shown in figure 1A). Similarly, another crystal structure has ssRNA bound at the helicase active site, which appears suitable for a virtual screening protocol (as shown in figure 2A). We used the extra precision (XP) mode of Glide in the Schrödinger suite to dock EGCG first at the ATPase site and then at the helicase site (as shown in figure 1 and figure 2, respectively). After docking, the extent of EGCG binding at the ATPase site was expressed as a docking score, as shown in table 1. A favorable docking score (-7.8 kcal/mol) was observed, to which several hydrogen bonding interactions with key residues of the ATPase site contribute: ARG 202, THR 201, GLY 197, ASN 463, and ASN 417 (as shown in figure 1B, 1C and table 1). Another important interaction was observed with ARG 462, which forms a salt bridge and a pi-cation interaction with EGCG (figure 1B and 1C). More importantly, these interactions involve the critical P-loop (residues 193-203) and motif VI (residues Q455, R459, and R462) of the NTPase binding pocket. Mechanistic studies have already shown that P-loop residues make the most significant contribution to NTP binding and subsequent hydrolysis (21, 31).
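The mapping of the reported docking contacts onto the P-loop and motif VI can be checked with a few lines of Python. This is a minimal sketch, not part of the original workflow: the residue numbers and interaction types are taken from the paragraph above, while the names `P_LOOP`, `MOTIF_VI`, `CONTACTS`, `classify` and `group_contacts` are hypothetical and introduced only for illustration.

```python
# Illustrative sketch: residue numbers come from the text above;
# all identifiers here are hypothetical helpers, not Schrodinger/Glide API.

P_LOOP = range(193, 204)      # P-loop, residues 193-203 inclusive
MOTIF_VI = {455, 459, 462}    # motif VI: Q455, R459, R462

# (residue name, residue number, interaction types reported for the docked pose)
CONTACTS = [
    ("ARG", 202, ["hydrogen bond"]),
    ("THR", 201, ["hydrogen bond"]),
    ("GLY", 197, ["hydrogen bond"]),
    ("ASN", 463, ["hydrogen bond"]),
    ("ASN", 417, ["hydrogen bond"]),
    ("ARG", 462, ["salt bridge", "pi-cation"]),
]


def classify(resnum):
    """Return the structural element of the NTPase pocket a residue belongs to."""
    if resnum in P_LOOP:
        return "P-loop"
    if resnum in MOTIF_VI:
        return "motif VI"
    return "other NTPase-site residue"


def group_contacts(contacts):
    """Group contact residues by structural element, preserving input order."""
    groups = {}
    for name, resnum, _interactions in contacts:
        groups.setdefault(classify(resnum), []).append(f"{name} {resnum}")
    return groups


if __name__ == "__main__":
    for element, residues in group_contacts(CONTACTS).items():
        print(f"{element}: {', '.join(residues)}")
```

With the contacts listed in the text, three residues (ARG 202, THR 201, GLY 197) fall in the P-loop and one (ARG 462) in motif VI, while ASN 463 and ASN 417 lie outside both elements as defined here.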

Discussion

It is well known that flavivirus helicases are motor proteins and require the energy released from NTP hydrolysis to perform their helicase function (35). We therefore first analysed the affinity of EGCG for the NTPase site through docking, binding energy calculation and MD simulations. These studies revealed that EGCG can dock with key residues (ARG 202, THR 201, GLY 197, ASN 463 and ASN 417) at the NTPase site, and MD simulations supported stable interactions of EGCG with Mn2+, ARG 462, THR 201, GLU 286 and ARG 459 throughout the simulation period (100 ns). In the crystal structure of ZIKV NS3 helicase with bound ATP, these residues have significant functions: the coordination of Mn2+ with GLU 286 stabilizes the ATP molecule, and the P-loop residues (GLY 197, ARG 202, LYS 200) and motif VI residues (ARG 459, ARG 462) play a key role in NTP hydrolysis by interacting with transition state nucleotides (21).

ABSTRACT

One such natural product is a polyphenol called EGCG, which constitutes the major fraction (59% of all polyphenols) of green tea polyphenols and has shown multiple health benefits, including antitumor, antimicrobial, antioxidative and antiviral activity (10). The antiviral role of EGCG has been well demonstrated against several viruses, such as hepatitis C virus (HCV), human immunodeficiency virus (HIV), influenza virus (FLU), DENV and chikungunya virus (11) (12) (13) (14) (15). In a recent study, EGCG showed a strong virucidal effect against ZIKV, with a probable mechanism related to inhibition of entry into the host cell, as also indicated by computational findings (16, 17). However, reports suggest that, apart from inhibiting viral entry, EGCG can also block essential steps in the replication cycle of some viruses (10). Because the mechanism by which EGCG inhibits ZIKV is not completely understood, we designed our study to find a specific viral protein that could be targeted by EGCG. We chose the NS3 helicase protein of ZIKV, a crucial enzyme in viral replication that unwinds genomic RNA using energy derived from its intrinsic nucleoside triphosphatase (NTPase) activity (18, 19). In addition to RNA unwinding activity, flavivirus helicases have also been reported to participate in other vital functions such as ribosome biogenesis, pre-mRNA splicing, RNA export and degradation, RNA maturation and translation (20). These essential functions make the helicases attractive drug targets.

Discussion

In recent years, repeated outbreaks of ZIKV have created an urgent need for specific drugs. Because the major complications of ZIKV infection affect pregnant women, it is important to find molecules that are safe and have minimal or no side effects. From the safety point of view, natural products have always been a rich source of drugs and drug-like molecules, and these molecules have been pre-optimized by evolution against biological targets (8). In silico structure-based drug discovery approaches for finding specific biological targets have revolutionized and accelerated current drug development strategies. Indeed, molecules that specifically target viral proteins could act as safe therapeutics against ZIKV (34). EGCG, a green tea polyphenol, has shown significant antiviral activity against several viruses, including HIV, HSV, CHIKV and members of the Flaviviridae such as HCV and DENV (10). Recently, the inhibitory potential of EGCG against ZIKV was determined in a cell line based study, where the probable mechanism was related to interaction of the compound with the envelope protein (16). We have previously supported the interaction of EGCG with the envelope protein in a computational study (17). However, reports suggest that EGCG may target other viral proteins that are important in genome replication and maturation (10). Given the lack of adequate experimental support regarding the interaction of EGCG with the envelope protein, and the possibility of finding a more specific target for EGCG, we chose the NS3 helicase protein of ZIKV to determine the potential inhibitory effects of EGCG. The NS3 helicase of ZIKV is an attractive drug target because of its essential role in opening RNA secondary structures during replication (21). In addition, reports suggest that EGCG shows anti-ATPase activity against bacterial DNA gyrases (33).
In summary, our extensive docking and simulation analysis shows that EGCG can bind strongly to the NTPase site and can inhibit the activity of ZIKV NS3 helicase, a conclusion supported by in vitro enzyme kinetics assays. Computational tools also reveal that EGCG can form significant binding interactions at the RNA site. Interestingly, comparison with previous studies illustrates that EGCG can target multiple viral proteins, such as the envelope protein (16), the protease (37) and now, more precisely, the helicase. Since EGCG has shown a virucidal effect against several viruses, the EGCG backbone could be used to develop a broad-spectrum antiviral molecule in the near future.

Binding energy calculation and ADME properties

The QikProp module of the Schrödinger software (QikProp, version 4.3, Schrödinger) was used to assess drug-like behavior by evaluating the pharmacokinetic properties relevant to absorption, distribution, metabolism, and excretion (ADME) (17). These properties were already calculated in our previous study (17).
Binding energies for the ligand at the protein active sites were estimated with a molecular mechanics-based approach (MM-GBSA), which uses force-field methods to evaluate the difference in free energy between the complex and the separate protein and ligand. The Glide XP docking poses of EGCG and the helicase protein at both binding sites (NTPase and RNA binding site) were used to estimate binding energies with the Prime suite. As shown in table 1, the binding energy for EGCG at the RNA binding site (-51.312 kcal/mol) is slightly more favorable (more negative) than at the ATPase site (-47.324 kcal/mol). These results show that the EGCG molecule can bind at both active sites of NS3 helicase. Further, ADME properties for EGCG were calculated as reported previously by Sharma et al. (2017) (17). Except for a low oral absorption value, the ADME properties of EGCG were within the acceptable range for a safe drug candidate. In support of this, a study on cell lines has reported that EGCG is cytotoxic only at concentrations greater than 200 µM (16).
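The arithmetic behind an MM-GBSA estimate can be illustrated with a short Python sketch. It assumes the commonly used relation in which the binding energy is the free energy of the complex minus the sum of the free energies of the unbound protein and ligand; it does not reproduce the Prime implementation, and the function names and the component energies in the example call are hypothetical. Only the two per-site totals are taken from the text.

```python
# Illustrative sketch of MM-GBSA bookkeeping, not the Prime implementation.
# Assumed relation: dG_bind = G_complex - (G_protein + G_ligand).

def mmgbsa_binding_energy(g_complex, g_protein, g_ligand):
    """Binding free energy (kcal/mol) from the three component free energies."""
    return g_complex - (g_protein + g_ligand)


# Binding energies reported in the text for EGCG (kcal/mol).
REPORTED = {
    "RNA binding site": -51.312,
    "ATPase site": -47.324,
}


def most_favourable_site(energies):
    """Return the site with the most negative (most favourable) binding energy."""
    return min(energies, key=energies.get)


if __name__ == "__main__":
    # Hypothetical component energies, chosen only to show the arithmetic.
    example = mmgbsa_binding_energy(g_complex=-150.0, g_protein=-100.0, g_ligand=-10.0)
    print(f"example dG_bind = {example:.1f} kcal/mol")

    best = most_favourable_site(REPORTED)
    gap = REPORTED["RNA binding site"] - REPORTED["ATPase site"]
    print(f"most favourable site: {best} (difference {gap:.3f} kcal/mol)")
```

Because a more negative value indicates more favourable binding, the comparison uses `min`; with the reported values, the RNA binding site is favoured by about 4 kcal/mol.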

Discussion

In ZIKV NS3 helicase, both the RNA binding site and the NTPase site have flexible pockets containing critical loop regions needed for function (21, 22). We therefore also studied the possibility of EGCG interacting at the RNA binding site. This was motivated by the fact that polyphenols have shown potential interactions with intrinsically disordered regions in proteins and could offer novel strategies for drug development against IDPs (38). The literature has also shown that viral proteins contain several short stretches of disordered regions and have a higher propensity for intrinsically disordered active sites (22, 24, 39). Our docking and MD simulation studies show that EGCG can bind at the entry site of the RNA binding pocket with significant interactions. More specifically, EGCG shows different types of interactions (H-bond, ionic, salt bridge) with residues GLU 413, MET 414, LYS 431, PHE 289 and ASP 410. In the crystal structure, these residues play a key role in binding to RNA (21). Our MD simulations also indicate that EGCG binding at the NTPase site increases fluctuations in the RNA site, which could be interpreted as an allosteric relationship between the two sites. This observation is consistent with a recent study on DENV helicase in which the NTPase and RNA sites show allosteric effects.