Epitope-based vaccine design against the membrane and nucleocapsid proteins of SARS-CoV-2

Introduction
Coronavirinae is a subfamily of Coronaviridae that includes 4 genera: alpha-, beta-, gamma-, and deltacoronaviruses. Human coronaviruses were first identified in 1965; they are responsible for respiratory tract infections in large populations in various countries around the world.[1,2][5][6] The whole genome of SARS-CoV-2 consists of approx. 30 kb of single-stranded ribonucleic acid (RNA) that encodes 29 different proteins. The most prominent structural proteins are the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. These proteins play a vital role in the pathogenesis of the virus, and each has distinct functions.[7]
The S protein is the major surface protein that binds to the host cell receptors and facilitates entry into human cells; it is approx. 86% similar to the SARS-CoV S protein. Moreover, it seems that mutations in the gene encoding the S protein cause differences in the pathogenic potential of SARS-CoV-2 as compared to SARS-CoV.[8] The E protein is one of the small structural proteins involved in the viral life cycle and pathogenesis; it participates in virus assembly, budding and envelope formation.[9] The E protein of SARS-CoV-2 is very similar to its SARS-CoV counterpart, with similarity estimated at 94.74%.[10]
The M protein, as the most abundant glycoprotein, plays a critical role in maintaining virus size and shape, and assists in the assembly and budding stages of the virus. In addition, the M protein cooperates with the S protein to facilitate virus attachment and entry into host cells.[10] According to studies, the M protein of SARS-CoV-2 is 90% similar to the M protein of SARS-CoV, which indicates a remarkable similarity between the 2 viruses.[11]
The N protein is a multipurpose protein with several functions. It is involved in the viral life cycle, including virus core formation, assembly, budding, envelope formation, genomic messenger RNA (mRNA) replication, and genomic RNA synthesis. It is also vital for the cellular response, playing a role in chaperone activity, cell cycle regulation, cell stress responses, pathogenesis, and signal transduction.[12] In addition, the SARS-CoV-2 N protein has 90% similarity with the SARS-CoV N protein, which highlights its functional importance.[13]
Several vaccines were developed during the coronavirus pandemic and produced promising results. However, the high mutation rate of SARS-CoV-2 prompted the use of computational approaches to expand knowledge of prevention and treatment strategies.[14,15] Although several SARS-CoV-2 epitopes have been reported in numerous vaccine development studies, the search for specific and unique epitopes continues to open the door to future discoveries. Therefore, the present study aimed to identify and screen distinct epitopes of the M and N proteins of variants of concern (VOCs) by using bioinformatics databases.

Sequences of membrane and nucleocapsid proteins, and the phylogenetic tree
Several sequences representing all the circulating VOCs of SARS-CoV-2 listed by the World Health Organization (WHO) (https://www.who.int/activities/tracking-SARS-CoV-2-variants), with GenBank accession numbers OX008586.1, OL790194.1, OW998408.1, OW996240.1, ON286809.1, ON286831.1, and OX014251.1, were acquired from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov). The sequences were saved in the FASTA format, and a phylogenetic tree was constructed using Molecular Evolutionary Genetics Analysis Version 11 (MEGA 11). The FASTA sequence of the SARS-CoV-2 wild-type (WT) (Wuhan-Hu-1) strain (accession number NC_045512.2) was included for comparison. The conserved regions of the proteins were identified and evaluated for further work.
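The identification of conserved regions can be illustrated with a minimal sketch. It assumes that the protein sequences have already been aligned to equal length (for example, with the alignment tools in MEGA 11) and simply reports runs of columns that are identical in every sequence. The function name `conserved_regions`, the `min_len` parameter and the toy sequences are hypothetical and serve illustration only; they are not the procedure or the data used in this study.

```python
def conserved_regions(aligned, min_len=9):
    """Return (start, end, segment) for runs of identical, gap-free columns.

    `aligned` is a list of equal-length aligned sequences.
    Positions are 1-based and inclusive.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    regions = []
    run_start = None
    # Iterate one step past the end so that a trailing run is closed.
    for i in range(length + 1):
        same = (
            i < length
            and aligned[0][i] != "-"
            and len({seq[i] for seq in aligned}) == 1
        )
        if same and run_start is None:
            run_start = i
        elif not same and run_start is not None:
            if i - run_start >= min_len:
                regions.append((run_start + 1, i, aligned[0][run_start:i]))
            run_start = None
    return regions


# Hypothetical toy alignment (NOT SARS-CoV-2 data).
toy_alignment = [
    "MKTAYIAKQRQISFVK",
    "MKTAYIAKQRQLSFVK",
    "MKTAYIAKQRQISFVR",
]
print(conserved_regions(toy_alignment, min_len=5))
```

Setting `min_len` to the shortest epitope length of interest (9 residues for the MHC-I epitopes considered here) keeps only conserved stretches long enough to contain a complete epitope.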

Validation of B-cell epitopes
The B-cell epitopes of the M and N proteins containing at least 10 amino acids were predicted with the Immune Epitope Database (IEDB) and the artificial neural network-based B-cell epitope prediction server (ABCpred) (https://webs.iiitd.edu.in/raghava/abcpred), with threshold values of 0.5. Next, the selected epitopes were assessed based on their hydrophilicity, flexibility, polarity, surface area, and three-dimensional (3D) structures.

Structure and docking analysis
The physicochemical and biochemical properties of the epitopes, including molecular weight, stability and hydrophobicity, were characterized using the ProtParam tool (https://web.expasy.org/protparam), and the 3D structures of the peptides were drawn using the Molegro Virtual Docker software, v. 6.0.1 (Molegro ApS, Aarhus, Denmark). To analyze the molecular docking interactions of the most frequently predicted epitopes, the tertiary structures of human leukocyte antigen (HLA) molecules were obtained from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (https://www.rcsb.org). Then, the ligand pose with the minimum energy was evaluated with Molegro Virtual Docker.

Results
The phylogenetic analysis and multiple sequence alignment of the complete SARS-CoV-2 genome revealed a close relationship between different variants of the virus (Supplementary material available on request from the corresponding author). Protein Basic Local Alignment Search Tool (BLAST) searches at NCBI also showed that the M and N proteins were highly conserved among the SARS-CoV-2 variants.

Stimulation of the cellular and humoral immune system
Of all the M epitopes identified in this study, LVIGFLFLT, WLLWPVTLA, LFLTWICLL, and FLYIIKLIF were capable of evoking cellular immunity, and appeared to induce a stronger host response. These epitopes, located at positions 22-30, 55-63, 27-35, and 45-53 of the M protein, respectively, were predicted to bind to the major histocompatibility complex class I (MHC-I) (Table 1). After immunogenicity and allergenicity evaluation, 2 epitopes, LVIGFLFLT and LFLTWICLL, were retained as the final candidates for vaccine development.
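The epitope positions can be checked with a simple substring search. The sketch below is illustrative only: `M_NTERM` is assumed to be the first 63 residues of the Wuhan-Hu-1 M protein and should be replaced with the full sequence retrieved from NCBI (NC_045512.2) before any real use; the helper name `locate` is hypothetical.

```python
# ASSUMPTION: first 63 residues of the SARS-CoV-2 (Wuhan-Hu-1) M protein,
# included for illustration; verify against the NCBI record before use.
M_NTERM = "MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNRFLYIIKLIFLWLLWPVTLA"


def locate(protein, peptide):
    """Return the 1-based (start, end) of the first occurrence of `peptide`,
    or None if the peptide is not found in `protein`."""
    idx = protein.find(peptide)
    if idx == -1:
        return None
    return (idx + 1, idx + len(peptide))


epitopes = ["LVIGFLFLT", "WLLWPVTLA", "LFLTWICLL", "FLYIIKLIF"]
positions = {peptide: locate(M_NTERM, peptide) for peptide in epitopes}

for peptide, span in positions.items():
    print(peptide, span)
```

Running the same search against the M protein of each variant sequence is one straightforward way to confirm that an epitope is retained in that variant.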
The sequences KLDDKDPNFKDQ, RGPEQTQGNFGD and HIGTRNPANNAA were identified as the most frequently predicted epitopes of the N protein that could stimulate cellular and humoral immune responses (Table 2). Based on the evaluation with the use of different servers, the best epitope of the N protein that could stimulate B cells was KLDDKDPNFKDQ, located at positions 338-349 of the sequence. Conversely, none of the M protein epitopes could elicit a robust humoral immune response (Table 3).

Table 1. High-scoring major histocompatibility complex class I (MHC-I) predicted epitopes of the membrane (M) protein
Columns: Epitopes; Alleles; Servers; AllerTOP results

Determination of structural conformations and molecular docking
According to the ProtParam server, the 3 epitopes were classified as stable peptides. The molecular weight, isoelectric point and hydrophobicity (grand average of hydropathicity, GRAVY) were 1,462.58 g/mol, 4.58 and -2.308 for KLDDKDPNFKDQ, 1,022.30 g/mol, 5.52 and 2.733 for LVIGFLFLT, and 1,121.45 g/mol, 5.52 and 2.600 for LFLTWICLL, respectively. The 3D structures of the best peptides are shown in Fig. 1. Docking against the HLA structures obtained from RCSB PDB predicted that the epitopes could interact with HLA molecules, which suggests that they may induce strong immune responses. Specifically, interactions were predicted between MHC-I/II molecules and the suggested epitopes for the pairs HLA-A0201:KLDDKDPNFKDQ, HLA-B51:LFLTWICLL and HLA-DRB1:LVIGFLFLT (Fig. 2).
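The molecular weight and hydrophobicity values can be approximated with a short calculation. The sketch below is not the ProtParam implementation; it assumes standard average residue masses and the Kyte-Doolittle hydropathy scale, and the function names `molecular_weight` and `gravy` are hypothetical. Under these assumptions, the hydrophilic peptide KLDDKDPNFKDQ has a negative hydropathicity value, while the 2 M protein epitopes are strongly hydrophobic.

```python
# Average residue masses (g/mol); one water is added per peptide chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01524

# Kyte-Doolittle hydropathy values.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def molecular_weight(peptide):
    """Average molecular weight: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def gravy(peptide):
    """Grand average of hydropathicity: mean Kyte-Doolittle value."""
    return sum(HYDROPATHY[aa] for aa in peptide) / len(peptide)


for peptide in ("KLDDKDPNFKDQ", "LVIGFLFLT", "LFLTWICLL"):
    print(f"{peptide}: MW = {molecular_weight(peptide):.2f} g/mol, "
          f"GRAVY = {gravy(peptide):.3f}")
```

The isoelectric point and the instability index reported by ProtParam require additional parameter tables and are not reproduced in this sketch.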

Discussion
Bioinformatics databases have proven to be useful sources of information on disease prevention, especially in the case of diseases associated with life-threatening infections. Furthermore, bioinformatics methods have created a new avenue for designing effective vaccines at a low cost and with high efficiency. Indeed, the knowledge available in the bioinformatics field makes it possible to design effective vaccines against viruses and other infectious agents by using extensive information about the structural and immune features of microorganisms and humans.[23,24]
An ideal vaccine should be able to activate both the cellular and humoral arms of the immune system. Different platforms are used for vaccine development, and each of them presents several advantages and disadvantages. Typically, subunit vaccines using recombinant peptides and proteins are among the most effective, inexpensive and safe vaccines that can be designed. Moreover, this type of vaccine provides effective immunogenicity by evoking host immune responses.[25,26]
According to studies on the prevention of COVID-19, there are more than 60 vaccine candidates for SARS-CoV-2, most of which are aimed at inducing the release of neutralizing antibodies against the S protein.[27] Unfortunately, several reports have shown that the S protein is an antigen that can mutate rapidly.[29][30][31] Furthermore, a number of reports have confirmed that the M and N proteins are good targets for stimulating antibody-producing B-cell and T-cell responses.[32,33] Indeed, studies published by Enayatkhani et al.,[33] Rahman et al.[34] and Quayum et al.[35] emphasized the importance of the M and N proteins in the viral structure, confirming their potential as suitable candidates for designing multi-epitope vaccines. In line with these reports, the current work focused on the M and N protein epitopes to predict a novel subunit vaccine.
In the present study, all epitopes were evaluated based on different immune responses, and the findings indicate that the M protein may be a useful target for eliciting a cellular immune response, while the N peptides could elicit a strong humoral immune response. Also, the results demonstrate that, among many epitopes, 2 highly antigenic M protein epitopes, LVIGFLFLT and LFLTWICLL, and the N protein epitope KLDDKDPNFKDQ could be used to construct an epitope-based vaccine. The LVIGFLFLT and LFLTWICLL M epitopes were the best options among the T-cell epitopes, while KLDDKDPNFKDQ was identified as a powerful B-cell epitope.
A number of publications have reported on the LVIGFLFLT and LFLTWICLL M epitopes, including works by Behmard et al.[36] and Naveed et al.[37] In addition, Heffron et al. introduced KLDDKDPNFKDQ as part of the AIKLDDKDPNFKDQVI and KLDDKDPNFKDQVILLNKH peptides in a study on antibodies against the SARS-CoV-2 N protein.[38] In comparison with those studies, the findings of this study were more specific. Indeed, this work assessed specific epitopes to determine their distinct sites in the protein sequences, and introduced unique epitopes for vaccine and antibody research.
The present study included the SARS-CoV-2 WT (Wuhan-Hu-1) strain sequence and compared the results with the VOCs, especially lineage B.1.1.529, given the worldwide spread of the Omicron variant. All the suggested epitopes matched the sequences of the new variants of SARS-CoV-2.

Conclusions
COVID-19 continues to be an alarming global disease, as observed in the reports of new cases and deaths, which increases the importance of developing a more effective vaccine. The immunoinformatics results obtained for the M and N proteins identified 3 top epitopes, LVIGFLFLT, LFLTWICLL and KLDDKDPNFKDQ, that could effectively stimulate T cells and B cells with the lowest binding energy. Additional in vitro and in vivo studies are recommended to confirm these theoretical findings. Moreover, bioinformatics tools could be used to design new vaccines against other infectious diseases in future epidemics, and this approach deserves more attention from researchers.

Table 2. High-scoring major histocompatibility complex class I (MHC-I) predicted epitopes of the nucleocapsid (N) protein

Table 3. Most frequent B-cell epitopes of the nucleocapsid (N) protein