Abstract
Background. The high prevalence and mortality of coronavirus disease 2019 (COVID-19) are a major global concern. Bioinformatics approaches have helped to develop new strategies against infectious agents, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). In particular, the structural proteins of microorganisms provide suitable epitopes for developing vaccines against infectious diseases.
Objectives. The present study aimed to use bioinformatics tools to find peptides from the membrane (M) and nucleocapsid (N) proteins with effective cellular and humoral immunogenicity.
Material and methods. Sequences of the M and N proteins were sourced from the National Center for Biotechnology Information (NCBI). The conserved regions of the proteins with the highest immunogenicity were identified and assessed using different servers, and the physicochemical and biochemical properties of the epitopes were evaluated. Finally, allergenicity, antigenicity and docking to human leukocyte antigen (HLA) were investigated.
Results. The data indicated that the best epitopes were LVIGFLFLT and LFLTWICLL (membrane epitopes) and KLDDKDPNFKDQ (a nucleocapsid epitope), with high predicted immunogenicity and no evidence of allergenicity. The 3 epitopes are predicted to be stable peptides that can interact with HLA and induce strong immune responses.
Conclusions. The findings indicate that 3 conserved epitopes could effectively elicit an immune response against the disease. Therefore, in vitro and in vivo studies are recommended to confirm these theoretical predictions.
Keywords: HLA, COVID-19, SARS-CoV-2, bioinformatics, multi-epitope vaccine
Introduction
Coronavirinae is a subfamily of Coronaviridae, and it includes 4 genera: alpha, beta, gamma, and delta coronaviruses. Human coronaviruses were first identified in 1965; they cause respiratory tract infections in large populations worldwide.1, 2 A novel coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2 or 2019-nCoV), the causative agent of coronavirus disease 2019 (COVID-19), was first reported in Wuhan, Hubei Province, China, in December 2019.2 During the COVID-19 pandemic, several viral lineages with varying clinical and public health impact were identified and classified as variants of interest (VOIs) or variants of concern (VOCs).3, 4, 5, 6
The SARS-CoV-2 genome is an approx. 30 kb single-stranded ribonucleic acid (RNA) that encodes 29 proteins. The main structural proteins are the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins. These proteins play a vital role in the pathogenesis of the virus, and each has distinct functions.7
The S protein is the major surface protein; it binds to host cell receptors and mediates entry into human cells, and it shares approx. 86% similarity with the SARS-CoV S protein. Moreover, mutations in the gene encoding the S protein appear to account for differences in pathogenic potential between SARS-CoV-2 and SARS-CoV.8
The E protein is one of the small structural proteins involved in the viral life cycle and pathogenesis; it participates in virus assembly, budding and envelope formation.9 The SARS-CoV-2 E protein is highly similar to its SARS-CoV counterpart, with an estimated similarity of 94.74%.10
The M protein, the most abundant glycoprotein, plays a critical role in maintaining virion size and shape, and assists in the assembly and budding stages of the virus. In addition, the M protein cooperates with the S protein to facilitate virus attachment and entry into host cells.10 According to studies, the M protein of SARS-CoV-2 shares about 90% similarity with the M protein of SARS-CoV, indicating a remarkable similarity between the 2 viruses.11
The N protein is a multifunctional protein. It is involved in the viral life cycle, including virus core formation, assembly, budding, envelope formation, genomic messenger RNA (mRNA) replication, and genomic RNA synthesis. It also participates in host cell responses through chaperone activity, cell cycle regulation, cell stress responses, pathogenesis, and signal transduction.12 In addition, the SARS-CoV-2 N protein shares 90% similarity with the SARS-CoV N protein, which suggests that its functions are conserved.13
Several vaccines were developed during the coronavirus pandemic and produced promising results. However, the high mutation rate of SARS-CoV-2 prompted the use of computational approaches to improve prevention and treatment strategies.14, 15 Although several SARS-CoV-2 epitopes have been reported in numerous vaccine development studies, the search for specific and unique epitopes continues to open the door to future discoveries. Therefore, the present study aimed to identify and screen distinct epitopes of the M and N proteins of VOCs using bioinformatics databases.
Material and methods
Sequences of membrane and nucleocapsid proteins, and the phylogenetic tree
Several sequences representing all the circulating VOCs of SARS-CoV-2, as defined by the World Health Organization (WHO) (https://www.who.int/activities/tracking-SARS-CoV-2-variants), including GenBank accession numbers OX008586.1, OL790194.1, OW998408.1, OW996240.1, ON286809.1, ON286831.1, and OX014251.1, were retrieved from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov). The sequences were saved in FASTA format, and a phylogenetic tree was constructed using Molecular Evolutionary Genetics Analysis Version 11 (MEGA 11). The FASTA sequence of the SARS-CoV-2 wild-type (WT) (Wuhan-Hu-1) strain (accession number NC_045512.2) was included for comparison. The conserved regions of the proteins were identified and selected for further analysis.
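The text does not state exactly how conserved regions were defined. One simple reading, shown here only as a sketch, treats an alignment column as conserved when every sequence carries the same residue, merges consecutive conserved columns into regions, and computes an uncorrected p-distance, one of the simple distance options that distance-based tree methods in MEGA can start from. The alignment below is a hypothetical toy example, not real variant data, and all function names are illustrative.

```python
from itertools import combinations

# HYPOTHETICAL toy alignment (not real SARS-CoV-2 data); "-" would mark a gap.
TOY_ALIGNMENT = {
    "seq_1": "MADSNGTITV",
    "seq_2": "MADSNGTITV",
    "seq_3": "MADSDGTITV",
}


def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 1-based columns where every sequence has the same non-gap residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    columns = []
    for col in range(length):
        residues = {seq[col] for seq in alignment}
        if len(residues) == 1 and "-" not in residues:
            columns.append(col + 1)
    return columns


def conserved_blocks(columns: list[int], min_len: int) -> list[tuple[int, int]]:
    """Merge runs of consecutive conserved columns into (start, end) regions of at least min_len."""
    blocks = []
    start = prev = None
    for col in columns:
        if start is not None and col == prev + 1:
            prev = col  # still inside the current run
            continue
        if start is not None and prev - start + 1 >= min_len:
            blocks.append((start, prev))  # close the finished run
        start = prev = col  # begin a new run
    if start is not None and prev - start + 1 >= min_len:
        blocks.append((start, prev))
    return blocks


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    seqs = list(TOY_ALIGNMENT.values())
    cols = conserved_columns(seqs)
    print("conserved columns:", cols)
    print("conserved regions:", conserved_blocks(cols, min_len=3))
    for (name_a, a), (name_b, b) in combinations(TOY_ALIGNMENT.items(), 2):
        print(name_a, name_b, round(p_distance(a, b), 3))
```

A pairwise p-distance matrix like this is the kind of input a neighbor-joining tree is built from; MEGA offers more realistic substitution models, and the study's own settings are not reported here.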
Prediction of T-cell epitopes
To predict the cellular immunogenicity of the epitopes, several online servers were used, including NetMHC (https://services.healthtech.dtu.dk/service.php?NetMHC-4.0), the Immune Epitope Database (IEDB) (https://www.iedb.org), NetCTL (https://services.healthtech.dtu.dk/service.php?NetCTL-1.2), MHC2Pred (http://crdd.osdd.net/raghava/mhc2pred), and SYFPEITHI (http://www.syfpeithi.de). The epitopes with the highest predicted scores for stimulating the cellular immune system were selected for the study.
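MHC-I predictors such as NetMHC typically score every overlapping peptide of a chosen length; the M protein epitopes reported in this study are 9-mers. The sketch below only enumerates candidate 9-mers with 1-based coordinates and locates the reported epitopes; it does not reproduce any server's scoring. The hard-coded string is an assumption (residues 1-63 of the Wuhan-Hu-1 M protein, typed in for illustration) and should be checked against the NC_045512.2 record before reuse.

```python
# ASSUMPTION: residues 1-63 of the SARS-CoV-2 M protein (Wuhan-Hu-1, NC_045512.2),
# typed in for illustration only; verify against the NCBI record before reuse.
M_FRAGMENT = "MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNRFLYIIKLIFLWLLWPVTLA"


def kmers(seq: str, k: int = 9):
    """Yield (start, end, peptide) for each overlapping k-mer; 1-based, inclusive coordinates."""
    for i in range(len(seq) - k + 1):
        yield i + 1, i + k, seq[i:i + k]


def locate(seq: str, peptide: str) -> list[tuple[int, int]]:
    """Return every 1-based (start, end) occurrence of peptide in seq, overlaps included."""
    hits = []
    pos = seq.find(peptide)
    while pos != -1:
        hits.append((pos + 1, pos + len(peptide)))
        pos = seq.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    candidates = list(kmers(M_FRAGMENT))
    # Each candidate would be submitted to the prediction servers for scoring.
    print(len(candidates), "candidate 9-mers")
    for epitope in ["LVIGFLFLT", "WLLWPVTLA", "LFLTWICLL", "FLYIIKLIF"]:
        print(epitope, locate(M_FRAGMENT, epitope))
```

Keeping 1-based inclusive coordinates matches the way epitope positions are reported in the Results.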
Prediction of B-cell epitopes
B-cell epitopes of the M and N proteins at least 10 amino acids long were predicted using IEDB and the artificial neural network-based B-cell epitope prediction server (ABCpred) (https://webs.iiitd.edu.in/raghava/abcpred), with a threshold value of 0.5. Next, the selected epitopes were assessed based on their hydrophilicity, flexibility, polarity, surface area, and three-dimensional (3D) structures.
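Hydrophilicity profiles of candidate B-cell epitopes are usually computed by averaging a residue scale over a sliding window. As a stand-in, the sketch below uses the Kyte-Doolittle hydropathy scale, where negative values indicate hydrophilic stretches; the servers used in the study may apply other scales (IEDB, for example, provides the Parker hydrophilicity scale), so the numbers are illustrative only. The function name and window size are assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(peptide: str, window: int) -> list[tuple[int, int, float]]:
    """Return (start, end, mean hydropathy) for each window; 1-based, inclusive coordinates."""
    if not 1 <= window <= len(peptide):
        raise ValueError("window must be between 1 and the peptide length")
    profile = []
    for i in range(len(peptide) - window + 1):
        segment = peptide[i:i + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((i + 1, i + window, mean))
    return profile


if __name__ == "__main__":
    # Hypothetical window of 5 residues applied to the reported N protein epitope.
    for start, end, score in hydropathy_profile("KLDDKDPNFKDQ", window=5):
        print(f"{start:>2}-{end:<2} {score:+.2f}")
```

Every 5-residue window of KLDDKDPNFKDQ scores below zero on this scale, which is consistent with a surface-exposed, hydrophilic B-cell epitope.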
Evaluation of allergenicity and antigenicity
The allergenicity and antigenicity of the sequences were evaluated using AllerTOP (https://www.ddg-pharmfac.net/allertop) and VaxiJen (https://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html), respectively.
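VaxiJen's published description is alignment-independent: each peptide is encoded by z-scale descriptors of its residues and converted into a fixed-length vector by an auto- and cross-covariance (ACC) transform, which the server's model then scores against an antigenicity threshold. The formulas below sketch that transform, assuming a peptide of n residues, z_{j,i} as the j-th z-scale value of residue i, and a lag l smaller than n; the lag range and the prediction model itself are not reproduced here.

```latex
AC_{j}(l) = \frac{1}{n-l} \sum_{i=1}^{n-l} z_{j,i}\, z_{j,i+l},
\qquad
CC_{j,k}(l) = \frac{1}{n-l} \sum_{i=1}^{n-l} z_{j,i}\, z_{k,i+l} \quad (j \neq k)
```

Because the number of terms depends only on the number of z-scales and lags, peptides of different lengths (9-mers and 12-mers here) map to vectors of the same size.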
Structure and docking analysis
The physicochemical and biochemical properties of the epitopes, including molecular weight, stability and hydrophobicity, were characterized using the ProtParam tool (https://web.expasy.org/protparam), and the 3D structures of the peptides were built using Molegro Virtual Docker, v. 6.0.1 (Molegro ApS, Aarhus, Denmark). To analyze the molecular docking of the most frequently predicted epitopes, the tertiary structures of human leukocyte antigen (HLA) molecules were obtained from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (https://www.rcsb.org). Docking was then performed in Molegro Virtual Docker, and the ligand with the lowest energy was selected as the best.
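Two of the ProtParam outputs are simple sums over residues. The sketch below computes the average molecular weight (residue masses plus one water for the free termini) and the grand average of hydropathicity (GRAVY, the mean Kyte-Doolittle value), assuming the Expasy average residue masses listed in the code; the instability index is not reproduced. With these tables it gives about 1,022.30, 1,121.45 and 1,462.58 g/mol and GRAVY values of about 2.733, 2.600 and -2.308 for LVIGFLFLT, LFLTWICLL and KLDDKDPNFKDQ.

```python
# ASSUMED average residue masses in Da (free amino acid minus one water), Expasy table.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # added once for the free N- and C-termini

# Kyte-Doolittle hydropathy values used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def molecular_weight(peptide: str) -> float:
    """Average molecular weight in Da (numerically equal to g/mol)."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in peptide) + WATER_MASS


def gravy(peptide: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


if __name__ == "__main__":
    for pep in ["KLDDKDPNFKDQ", "LVIGFLFLT", "LFLTWICLL"]:
        print(f"{pep}: MW={molecular_weight(pep):.2f} g/mol, GRAVY={gravy(pep):.3f}")
```

A positive GRAVY marks a hydrophobic peptide and a negative one a hydrophilic peptide, so the two M epitopes and the N epitope fall on opposite sides of zero.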
Results
The phylogenetic analysis and multiple sequence alignment of complete SARS-CoV-2 genomes revealed a close relationship between the different variants of the virus (Supplementary material available on request from the corresponding author). Protein Basic Local Alignment Search Tool (BLAST) searches in NCBI also showed that the M and N proteins were highly conserved among SARS-CoV-2 variants.
Stimulation of the cellular and humoral immune system
Of all the M epitopes identified in this study, LVIGFLFLT, WLLWPVTLA, LFLTWICLL, and FLYIIKLIF were predicted to evoke cellular immunity and appeared to induce the strongest host response. These epitopes, located at positions 22–30, 55–63, 27–35, and 45–53 of the M protein, respectively, were also predicted to bind correctly to major histocompatibility complex class I (MHC-I) molecules (Table 1). After immunogenicity and allergenicity evaluation, LVIGFLFLT and LFLTWICLL were selected as the final candidates for vaccine development.
The sequences KLDDKDPNFKDQ, RGPEQTQGNFGD and HIGTRNPANNAA were identified as the most frequently predicted epitopes of the N protein that could engage both cellular and humoral immune responses (Table 2). Based on the evaluation with different servers, the best N protein epitope for stimulating B-cells was KLDDKDPNFKDQ, located at positions 338–349 of the sequence. In contrast, none of the M protein epitopes was predicted to elicit a robust humoral immune response (Table 3).
Determination of structural conformations and molecular docking
According to the ProtParam server, all 3 epitopes were classified as stable peptides (ProtParam predicts a peptide as stable when its instability index is below 40). The molecular weight, isoelectric point (pI) and grand average of hydropathicity (GRAVY) were 1,462.58 g/mol, 4.58 and -2.308 for KLDDKDPNFKDQ; 1,022.30 g/mol, 5.52 and 2.733 for LVIGFLFLT; and 1,121.45 g/mol, 5.52 and 2.600 for LFLTWICLL.
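The pI values can be approximated by finding the pH at which a peptide's net charge is zero. The sketch below bisects on pH using pKa values assumed from the Bjellqvist set as reproduced in Biopython's Bio.SeqUtils.IsoelectricPoint module, where the terminal pKa depends on the terminal residue; ProtParam's own implementation and rounding may differ slightly. For LVIGFLFLT, LFLTWICLL and KLDDKDPNFKDQ it returns values within about 0.01 of 5.52, 5.52 and 4.58.

```python
# ASSUMED Bjellqvist-style pKa values (as reproduced in Biopython's IsoelectricPoint module).
POSITIVE_PK = {"K": 10.0, "R": 12.0, "H": 5.98}
NEGATIVE_PK = {"D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}
N_TERM_PK = {"A": 7.59, "M": 7.0, "S": 6.93, "P": 8.36, "T": 6.82, "V": 7.44, "E": 7.7}
C_TERM_PK = {"D": 4.55, "E": 4.75}
DEFAULT_N_TERM_PK = 7.5
DEFAULT_C_TERM_PK = 3.55


def net_charge(peptide: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear peptide at the given pH."""
    n_pk = N_TERM_PK.get(peptide[0], DEFAULT_N_TERM_PK)
    c_pk = C_TERM_PK.get(peptide[-1], DEFAULT_C_TERM_PK)
    positive = 1 / (1 + 10 ** (ph - n_pk))  # protonated N-terminus
    negative = 1 / (1 + 10 ** (c_pk - ph))  # deprotonated C-terminus
    for aa in peptide:
        if aa in POSITIVE_PK:
            positive += 1 / (1 + 10 ** (ph - POSITIVE_PK[aa]))
        elif aa in NEGATIVE_PK:
            negative += 1 / (1 + 10 ** (NEGATIVE_PK[aa] - ph))
    return positive - negative


def isoelectric_point(peptide: str, low: float = 0.0, high: float = 14.0,
                      tol: float = 1e-4) -> float:
    """Bisection on pH; valid because net charge decreases monotonically as pH rises."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(peptide, mid) > 0:
            low = mid  # still positive: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for pep in ["KLDDKDPNFKDQ", "LVIGFLFLT", "LFLTWICLL"]:
        print(f"{pep}: pI ~ {isoelectric_point(pep):.2f}")
```

The two M epitopes contain no ionizable side chains apart from a cysteine in LFLTWICLL, so their pI is set almost entirely by the termini, which is why both land near 5.5.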
The 3D structures of the best peptides are shown in Figure 1. Docking against HLA structures obtained from RCSB PDB predicted that the epitopes could interact with HLA and induce strong immune responses. In particular, interactions were predicted between MHC-I/II molecules and the suggested epitopes (HLA-A0201:KLDDKDPNFKDQ, HLA-B51:LFLTWICLL and HLA-DRB1:LVIGFLFLT) (Figure 2).
Discussion
COVID-19 has been reported worldwide with a wide range of clinical symptoms, from fever and dry cough to respiratory involvement, heart failure and kidney damage.16, 17, 18 The increasing mortality, especially in patients with underlying conditions such as diabetes, cancer, cardiovascular complications and bacterial co-infections, together with the spread of viral mutations, emphasizes the importance of disease prevention and treatment.19, 20, 21, 22
Bioinformatics databases have proven to be useful sources of information on disease prevention, especially for life-threatening infectious diseases. Furthermore, bioinformatics methods have opened a new avenue for designing effective vaccines at low cost and with high efficiency. By drawing on extensive information about the structural and immune features of microorganisms and humans, bioinformatics makes it possible to design effective vaccines against viruses and other infectious agents.23, 24
An ideal vaccine should activate both the cellular and humoral arms of the immune system. Different platforms are used for vaccine development, and each has advantages and disadvantages. Subunit vaccines based on recombinant peptides and proteins are generally among the most effective, inexpensive and safe vaccines that can be designed, and they can be highly immunogenic.25, 26
According to studies on COVID-19 prevention, there are more than 60 vaccine candidates for SARS-CoV-2, most of which aim to induce neutralizing antibodies against the S protein.27 However, several reports have shown that the S protein mutates rapidly; for example, the omicron variant carries more than 30 mutations in the S protein.28, 29, 30, 31 Furthermore, a number of reports have confirmed that the M and N proteins are good targets for stimulating antibody-producing B-cell and T-cell responses.32, 33 Studies by Enayatkhani et al.,33 Rahman et al.34 and Quayum et al.35 emphasized the importance of the M and N proteins in the viral structure and confirmed their potential as suitable candidates for designing multi-epitope vaccines. Building on this evidence, the current work focused on M and N protein epitopes to predict a novel subunit vaccine.
In the present study, all epitopes were evaluated for different immune responses, and the findings indicate that the M protein may be a useful target for eliciting a cellular immune response, while the N peptides could elicit a strong humoral immune response. Among the many epitopes evaluated, 2 highly antigenic M protein epitopes, LVIGFLFLT and LFLTWICLL, and the N protein epitope KLDDKDPNFKDQ could be used to construct an epitope-based vaccine. LVIGFLFLT and LFLTWICLL were the best T-cell epitopes, while KLDDKDPNFKDQ was identified as a powerful B-cell epitope.
A number of publications have reported the LVIGFLFLT and LFLTWICLL M epitopes, including works by Behmard et al.36 and Naveed et al.37 In addition, Heffron et al. described KLDDKDPNFKDQ as part of the AIKLDDKDPNFKDQVI and KLDDKDPNFKDQVILLNKH peptides in a study on antibodies against the SARS-CoV-2 N protein.38 Compared with those studies, the present findings are more specific: this work mapped each epitope to its distinct position in the protein sequence and introduced unique epitopes for vaccine and antibody research.
The present study included the SARS-CoV-2 WT (Wuhan-Hu-1) strain sequence and compared the results with other VOCs, especially lineage B.1.1.529, given the worldwide spread of omicron. All the suggested epitopes were conserved in the new SARS-CoV-2 variants.
Conclusions
COVID-19 remains an alarming global disease, as reflected in continuing reports of new cases and deaths, which underscores the need for more effective vaccines. The immunoinformatics analysis of the M and N proteins identified 3 top epitopes, LVIGFLFLT, LFLTWICLL and KLDDKDPNFKDQ, that could effectively stimulate T-cells and B-cells with the lowest binding energy. Therefore, additional in vitro and in vivo studies are recommended to confirm this theoretical information. Moreover, bioinformatics tools should be applied to design new vaccines against future epidemics and other infectious diseases, and researchers should pay more attention to this approach.
Ethics approval and consent to participate
Not applicable.
Data availability
The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
Consent for publication
Not applicable.