Indian Journal of Pharmacology Home 

[Download PDF]
Year : 2019  |  Volume : 51  |  Issue : 6  |  Page : 389--399

Investigation of hub genes and their nonsynonymous single nucleotide polymorphism analysis in Plasmodium falciparum for designing therapeutic methodologies using next-generation sequencing approach

Sanjay Kumar Singh, Sudhakara M Reddy 
 Department of Biotechnology, Thapar Institute of Engineering and Technology, Patiala, Punjab, India

Correspondence Address:
Dr. Sudhakara M Reddy
Department of Biotechnology, Thapar Institute of Engineering and Technology, Patiala - 147 004, Punjab


BACKGROUND: Incidences of resistance to current drugs by Plasmodium is increasing, hence, it is necessary to investigate and explore new drug targets to combat malarial disease. OBJECTIVE: Analysis of the transcriptome sequence information to characterize hub genes and their nonsynonymous single nucleotide polymorphisms (nsSNPs) to derive therapeutic objectives for Plasmodium falciparum. MATERIALS AND METHODS: Differentially expressed genes between Ring and other stages of P. falciparum were identified using Cufflinks tool. Using DAVID and KAAS programs, the gene ontology and pathway analysis were performed. The networks of protein-protein interaction (PPI) were developed by Search Tool for the Retrieval of Interacting Genes/Proteins and Cytoscape, and the node degree in the network was calculated by using Network Analyzer, and MCODE plugins of Cytoscape. SIFT, PROVEAN, and PredictSNP programs were used to study the genetic variations, which affect protein functions. RESULTS: A list of 4196 nonredundant genes was used for functional annotation cluster analysis, and 8 significant hub genes have been picked from the PPI network using MCODE plugins of Cytoscape. Various nsSNPs were identified in these 8 hub genes and were investigated both for its native and mutant stage for solvent accessibility and alteration in secondary structure protein residues. CONCLUSION: Hub genes identified in this study serve as potential targets to develop therapy to suppress the pathogenic action of P. falciprum through experimental techniques.

How to cite this article:
Singh SK, Reddy SM. Investigation of hub genes and their nonsynonymous single nucleotide polymorphism analysis in Plasmodium falciparum for designing therapeutic methodologies using next-generation sequencing approach.Indian J Pharmacol 2019;51:389-399

How to cite this URL:
Singh SK, Reddy SM. Investigation of hub genes and their nonsynonymous single nucleotide polymorphism analysis in Plasmodium falciparum for designing therapeutic methodologies using next-generation sequencing approach. Indian J Pharmacol [serial online] 2019 [cited 2020 May 26 ];51:389-399
Available from:

Full Text


Malaria, a deadly infectious disease, is caused by intracellular single-celled parasites belongs to the genus Plasmodium. According to World Malaria Report 2018 by World Health Organization (WHO), about 219 million malaria cases and 435,000 deaths were reported in 87 nations in 2017. In the human genome, it is recognized as one of the most powerful evolutionary selection. Plasmodium falciparum, Plasmodium vivax, Plasmodium knowlesi, Plasmodium malariae, and Plasmodium ovale are different species that cause disease in humans. The most severe malignant malaria particularly in children below 5 years is P. falciparum.[1] In India, almost 95% population lives in endemic areas of malaria, and 80% of recorded malaria in India is limited to 20% of the population areas who lives in rural, hilly, remote, or difficult-to-reach areas.[2] Traditional first-line therapies such as chloroquine and pyrimethamine/sulfadoxine have lost their efficacy in most countries, which resulted in the development of new and more effective anti-malarial medicines, like artemisinin-based combination therapy.[3] Although efforts were made to develop vaccines and medicines to fight malaria, there is still a problem with vaccine escape and drug resistance.[4] What remains in dearth is how plasmodium genetic variation can result in drug resistance or can provide new drug targets. It has been shown that genetic variation and recombination accelerate antigen heterogeneity, immune escape, the production of anti-malarial drug resistance, and comparison of entire genomes may aid in these efforts.[5] RNA-seq is a method that can use next-generation sequencing to analyze the quantity and sequences of RNA in a sample. Knowing the transcriptome is essential to linking the genome data to its functional protein expression. RNA-seq technology enables accurate gene isoform identification, translocation events, nucleotide mutations, and posttranscriptional modifications.

In P. falciparum, the genetic diversity exists in the form of single nucleotide polymorphism (SNPs), microsatellite repeats, insertions, deletions, and a variety of gene duplication. Several studies have been reported to investigate the SNPs of P. falciparum.[6],[7] Nevertheless, to the best of our acquaintance, no research to date has used the impact of deleterious SNPs and solvent accessibility and secondary structure change of protein residues at the native and mutant level in P. falciparum.

López-Barragán et al. 2011 have sequenced seven bidirectional and four strand specific libraries for the analysis of gene expression and antisense transcripts. RNA-Seq data sets of seven bidirectional libraries were used in the study. Therefore, network analysis was performed to recognize the main hubs from the Ring (R), Schizont (Sc), gametocyte stage V (GV), gametocyte stage II (GII), early trophozoite (ET), Late trophozoite (LT), and Ookinete (Oo) stages of P. falciprum. In these prospective hubs, we have examined SNPs which can be proposed as key areas for vulnerability to affect the function of protein. The areas projected in this research can be further targeted to develop therapy to suppress the pathogenic action of P. falciprum through experimental therapeutic techniques such as gene knockout method and gene targeting.

 Materials and Methods


The RNA-Seq dataset was obtained from NCBI website, all stages with accession number SRP009370[8] for analysis. Seven bidirectional reads from the 3D7 parasite have been taken for further investigation. The non-synonymous SNPs (nsSNPs) information of the P. falciparum genes were collected from PlasmoDB database ( The protein sequence was obtained from the UniProtKB database ( in the FASTA format.

Sequence quality control and preprocessing

The quality of sequence data obtained from high throughput sequencing pipelines were checked using FASTQC tool ( The FASTX-Toolkit ( was followed to clean the reads with low base-call quality using a quality filter tool.

Sequence reads alignment and transcript assembly

The reference genome of P. falciparum 3D7 (PlasmoDB version 7.1) was used to map the sequence reads with TopHat tool version: 2.0.14 ([9] The minimum anchor length was seven base pairs for reads present at each side and a maximum size of intron 800 bp. The output of TopHat was filtered to keep only reads mapped from Ring to gametocyte stages with 0 mismatches and up to 1 mismatch in Oo stage to maximize the accuracy. The Cufflinks ([9] tools were used to further evaluate the matched reads using a multifasta files (plasmodium_falciparum.fa), which improves the reliability of abundance of transcript by detecting bias and an algorithm for correction. The relative affluence of transcripts has been depicted as fragments/kilobase of exon/million fragments mapped (FPKM) and the fold change of the FPKM value between Ring (R) and other stage was reported. Poorly expressed genes were removed from the dataset by eliminating genes with FPKM value <2 in all the stages. A list of nonredundant genes which differentially expressed was created from the RNA-Seq data by combining all identified genes of Ring and other stages and the duplicate gene were removed. Fold change ratio was calculated for R versus ET, R versus LT, R versus Sc, R versus GII, R versus GV and R vs Oo groups. Further, genes which are expressed differentially were manually checked in the PlasmoDB [10] and UniProt database.[11]

Functional analysis and pathway mapping

DAVID web server [12] was used to identify and select significantly enriched gene ontology (GO) terms and pathways. The functional annotation was determined with DAVID program ( Those terms with count number of ≥5 genes, and P≤ 0.05 was chosen for analysis. In DAVID, the terms GO cellular component (CC), biological process (BP), and molecular function (MF) were used to classify improved biological topics in lists of genes which differentially expressed. The KEGG Automatic Annotation Server (KAAS) has been used to map the pathway.[13] Using the best single-directional hit method for orthology assignment, the amino acid sequences of genes which were up regulated and under regulated were submitted as input to the KAAS server. KAAS offers functional gene annotation in the database KEGG GENES through a similarity search tool of BLAST for a manually curated orthology group sets. For datasets mapped to one of reference pathways of KEGG, KAAS assigned a KEGG Orthology (KO) number to the genes.

Investigation of protein–protein interactions

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) program was used to study the functional relationships between differentially expressed genes (DEGs).[14] The GO terms – MF, CC, BP, and pathways of KEGG were used to create the interaction network using reference organism P. falciparum. The physical and functional interactions, coexpression, colocation, pathways, predicted interactions, and protein domain similarity are included in the network relationship between genes. The network has been filtered by eliminating all interactions with weights below 0.1.

Identification of hub genes

Cytoscape 3.4.0,[15] a data integration and network visualization bioinformatics package, has been used to identify hub genes by measuring distribution of node degrees via Network Analyzer plugin. Clusters were created using Cytoscape molecular complex detection (MCODE) plug-in. It detects regions, which are highly interconnected in the network.[16] The statistical criteria for MCODE are as follows: K score = 2, Cutoff degree = 2, Cutoff node score = 0.2 as the default. In the current network, top 5 genes with the largest distribution have been regarded as hubs.

Prediction of deleterious nonsynonymous single nucleotide polymorphisms of hub genes

PlasmoDB database was used to extract the information of SNPs for hub genes identified for P. falciparum. Deleterious coding nsSNPs were predicted by using PROVEAN, SIFT, and PredictSNP. SIFT and PROVEAN are tools to determine how amino acid substitution affects protein function in case the score is lower than threshold value. The PROVEAN tool was used to perform BLAST hits clustering. For each supporting sequence, a delta alignment score was calculated. To calculate the final PROVEAN score, average scores are generated across clusters.[17] Deleterious is considered a score of −2.5 or higher, when anything below this cut-off rating, which has an effect neutral. SIFT tool based on the physical properties of amino acids and homology of sequences determines whether substitution of an amino acid can influence the function of protein.[18] The sequence tool SIFT makes SIFT predictions for a particular sequence of proteins in FASTA format. The sequence of protein queries and interest substitutions of nsSNPs and hub genes with default parameters have been submitted to SIFT program predicts substitutions with values <0.05 to be detrimental. PredictSNP [19] has been explicitly designed to combine the projected outcomes of several tools to form a prediction of consensus. For the predictions, prediction tools use a list of variants in the protein sequence as input.

Protein solvent accessibility and prediction of secondary structure

To predict protein accessible surface area (ASA), surface accessibility and secondary structure, NetSurfP program, was used. The NetSurfP simultaneously predicts accuracy for each prediction by calculating the Z-score. It uses two types of neural networks, the first type networks are based on secondary structure predictions and sequence profiles, with two outputs with respect to buried or exposed and in combination with sequence profiles, the other networks use these outputs as inputs and are trained to assess the relative surface exposure of each amino acid residues.[20] The normal and predicted SNP sequences in FASTA format have been submitted for prediction to the NetSurfP. For prediction of secondary structure and ASA, protein encoding gene in normal and its predicted SNP substitutions have been uploaded individually to NetSurfP web server. Microsoft Excel ® 2016 has converted the most predicted secondary structure probabilities from NetSurfP to a single letter code representing helical (H), β-strand (E), and coil (C).


The RNA-Seq dataset for all the stages of P. falciprum such as Ring SRR364849, ET SRR364848, LT SRR364847, Schizont SRR364843, GII SRR364840, GV SRR364838, and Oo SRR364834 has downloaded from NCBI SRA for analysis. SRA Toolkit modules, vdb-validate and fastq-dump were used to validate the integrity of downloaded SRA data and to convert SRA data into fastq format, respectively. FASTX Toolkit, a quality filter tool, was used to clean the reads with low base-call quality. Bowtie index was built from a set of DNA sequences of P. falciparum chromosomes. RNA-seq reads for every stage were mapped for P. falciparum genome using TopHat program. The output of TopHat was filtered to keep only reads, which were mapped with 0 mismatches from ring to gametocyte stages, and up to 1 mismatch in Oo stage to maximize the accuracy.

Prediction of differentially expressed genes

Transcripts were assembled and analyzed in FPKM for their relative expression levels by Cufflinks tool after sequencing reads mapped to the TopHat reference genome. Between Ring and other stages, fold change analysis was done to compare the gene expression. The data with FPKM values equivalent to zero were removed, and remaining values were subjected to further analysis. As a result, 7517 genes from ring (R), 6799 genes from ET, 7482 genes from LT, 5102 genes from schizont (Sc), 8731 genes from gametocyte stages (GII), 8831 genes from gametocyte stages (GV), and 5155 genes from Oo stages were identified. Then, six groups – RvET, RvLT, RvSc, RvGII, RvGV, and RvOo were created, and the common gene in each group has been identified [Table 1].{Table 1}

The common genes between RvET-4402; RvLT-4354; RvSc-4022; RvGII-3917; RvGV-2988; and RvOo– 4009 have been identified. The fold change in each group was estimated using these common genes, which was defined as the FPKM value ratio of RvET, RvLT, RvSc, RvGII, RvGV, and RvOo groups. To classify the DEGs, less expressed genes with FPKM value <2 have been removed from the dataset in all the stages. On the basis of above criteria, there are 2442 DEGs between RvET, 2796 between RvLT, 2935 between RvSc, 2807 between RvGII, 2180 between RvGV, and 2895 between RvOo groups were sorted out for the analysis [Table 1]. The complete workflow for the analysis is depicted in [Figure 1].{Figure 1}

Functional annotations and pathway analyses

The functional annotation cluster analysis was conducted by DAVID tool on the DEGs. The GO terms – BP, MF, and CC were used for interpretation and only those terms with count number of ≥5 genes and P≤ 0.05 were selected. Five top GO terms with significant P values for each group from functional analysis are presented in [Table 2]. Only those gene IDs which were mapped via DAVID tool is used for further study. From this data, it is clear that the GO terms are enriched in RvET2209, RvLT2546, and RvGV1990 genes, which represent the functions necessary for host cell plasma membrane, antigenic variation, and pathogenesis [Table 2]. The enhanced GO terms in RvSc2735 including functions related to pathogenesis, single organismal cell–cell adhesion, receptor activity, and cell adhesion molecule binding [Table 2], and RvGII2594 include functions related to single organismal cell-cell adhesion, pathogenesis, receptor activity, and cell adhesion molecule binding. Similarly, functions-related single organismal cell–cell adhesion, pathogenesis, and cell adhesion molecule binding are enriched in RvOo2726 samples [Table 2].{Table 2}

The KAAS has been used to map the pathways of DEGs in all six stages. DEGs amino acid sequences in FASTA format were submitted to the KAAS server. As a result, 277 pathways were predicted for RvET2209, 283 pathways for RvLT2546, 280 pathways for RvSc2735, 285 pathways for RvGII2594, 229 pathways for RvGV1990, and 272 pathways for RvOo2726 were recognized. The top 10 KEGG pathways for each six categories are shown in [Table 3], and [Supplementary Table 1] provides a complete list of pathways. It was observed from [Table 3] that most DEGs have been linked with important biological processes, many of which are classified as metabolic pathways, secondary metabolite production pathways, ribosome, or being involved in biosynthesis of antibiotics.{Table 3}[INLINE:1]

Identification of hub genes

STRING program has been used to investigate DEGs interaction. Only those genes showing significant interactions with weights higher than 0.4 were selected for network analysis. A network between DEGs was constructed for all the six groups, namely RvET2209, RvLT2546, RvSc2735, RvGII2594, RvGV1990, and RvOo2726. The six interaction networks resulted from STRING were then subjected to Cytoscape. The network consists of 1647 nodes and 23970 edges for RvET2209, 1969 nodes and 36152 edges for RvLT2209, 2236 nodes and 48514 edges for RvSc2735, 2010 nodes and 30113 edges for RvGII2594, 1473 nodes and 17986 edges for RvGV1990, and 2095 nodes and 37531 edges for RvOo2726. All genes in the network are represented in circles and their interactions represent edges. The interaction networks of all six groups are given in [Supplementary Figure 1].[INLINE:2]

Further, all six networks have been analyzed using Network Analyzer and MCODE modules available in Cytoscape. Network Analyzer is calculating the node degrees in the network, whereas the MCODE module is creating the clusters in the network. The higher node degrees were regarded to be more significant genes and were referred as hub genes. The top 8 MCODE seed are PF3D7_1126700, PF3D7_0508100, PF3D7_1306000, PF3D7_1439500, PF3D7_0324900, PF3D7_1234300, PF3D7_1207100, and PF3D7_0705600. The interaction between these hubs and their first neighbors was presented as [Figure 2].{Figure 2}

Single nucleotide polymorphism analysis

A SNP is a significant source of variance in a genome. SNPs may result in affecting protein function by decreasing protein solubility or by destabilizing structure of protein.[21] The PlasmoDB database has been used to retrieve the nsSNPs and SNPs for the identified hub genes. The PF3D7_0324900 have highest SNPs and nsSNPs information, i.e., 2416 SNPs and 1606 nsSNPs from the database among all hub genes; whereas PF3D7_1439500, PF3D7_0508100, PF3D7_1306000, PF3D7_0705600, PF3D7_1207100, PF3D7_1126700, and PF3D7_1234300 have 171, 128, 101, 62, 38, 28, and 11 nsSNPs, respectively. The total number of nsSNPs and SNPs recognized for the hub genes are illustrated in [Figure 3].{Figure 3}

Analysis of nsSNPs for protein function

Deleterious coding nsSNPs were predicted using SIFT, PROVEAN, and PredictSNP tools. SIFT is a sequence homology-based tool used to classify substitutions for amino acids. The SIFT tool predicts whether substitution of an amino acid can affect protein function for a given FASTA protein sequence or not. The SIFT predicts substitutions with values <0.05 to be deleterious. The SIFT sequence tool predicted 60, 52, 38, 33, 28, 19, 7, and 2 positions to be affect protein function for PF3D7_1439500, PF3D7_1306000, PF3D7_0508100, PF3D7_0705600, PF3D7_0324900, PF3D7_1207100, PF3D7_1126700, and PF3D7_1234300, respectively. To predict the final PROVEAN score, a delta alignment score is calculated for every supporting sequence and mean value across the clusters. By default, a score of −2.5 or above of it is taken as deleterious, while anything short of this cutoff rating has been considered as neutral. In the hub genes, the PROVEAN protein tool predicted 40, 4, and 2 positions to be deleterious for PF3D7_0324900, PF3D7_1306000, PF3D7_1207100 and 1 position for PF3D7_0705600 and PF3D7_1234300, respectively. However, PROVEAN tool not found any deleterious mutation in PF3D7_1439500, PF3D7_0508100, and PF3D7_112670. PredictSNP unambiguously designed to combine outcomes of several methods, mostly to annotate disease-variant relationships. According to PredictSNP tool, there are 18, 16, 11, 9, 6, 4, 2, and 1 mutations to be deleterious for PF3D7_1306000, PF3D7_0324900, PF3D7_1439500, PF3D7_0705600, PF3D7_1207100, PF3D7_0508100, PF3D7_1126700, and PF3D7_1234300, respectively.

It was confirmed and verified at least by two tools used above in the study that 25 positions for PF3D7_0324900, 20 for PF3D7_1306000, 11 for PF3D7_1439500, 10 for PF3D7_0705600, 6 for PF3D7_1207100, 4 for PF3D7_0508100, 2 for PF3D7_1126700, and 1 for PF3D7_1234300 were identified to influence the function of protein [Supplementary Table 2].[INLINE:3]

Solvent accessibility and secondary structure prediction

Further, the secondary structures and solvent accessibility of the hub genes was investigated using NetSurfP-1.1. The SNPs which were predicted by at least two tools to have a negative effect on the function of protein were used for the analysis of secondary structure and solvent accessibility. Moreover, the data were screened by selecting the residues, which showed change in ASA ≥10 Š2 from buried state to exposed state and also exposed state to buried state and its secondary structure change. The information related to ASA and secondary structure are shown individually in [Supplementary Table 3].[INLINE:4]

In PF3D7_0324900, N615K mutation showed an ASA change to exposed state from buried but T1745P, S1747R, S2024R, P2026S and E2065K mutations showed the opposite change to buried state from exposed state. There was a change in most of the conformations from Coil to Alpha-Helix, due to these mutations [Table 4]. Due to mutations Y779D and Y862N, an ASA change to exposed state from buried was shown in PF3D7_1306000 whereas D1113Y showed an inverse change. Conformations in Y779D and D1113Y were changed from Alpha-Helix to Coil, while in Y862N, Beta-strand was changed to Coil. Mutations in the residue position N186H, S312N, N334S, Y338N, S348C, Y403S, S445L, and R570G in PF3D7_1439500 show a change to exposed state from the buried state and mutations S313N, T411I, and N746 K show a change from buried state in the majority of the cases. In most cases, it shows changes in secondary structure from coil to helix and coil to beta-strand. In PF3D7_0705600, mutations show almost the same change to exposed state from buried state and vice versa. It shows changes in secondary structure from helix to coil and helix to beta-strand in most cases and also from coil to helix in some cases. Due to mutation D377H, an ASA change primarily from buried state to exposed state was shown in PF3D7_1207100 whereas N235K, R277W, P528S, and N661Y show an inverse change. In R277W mutation, both F274 and E275 show change in C to H conformation. D377H shows change in conformation mainly from H to C. In P528S mutation, both L337 and Y340 show change from C to H conformation. In N661Y mutation, L337 show change from C to H and I410 show an opposite change in conformation. Mutations in the residues I800T, I1103K, S1411P and Y1474H, in PF3D7_0508100 show mostly from exposed to buried state. In most of the cases, it shows changes in secondary structure from helix to coil and vice versa. Some also change from helix to beta-strand conformations. In PF3D7_1234300, S434Y mutation, I137 show change in conformation from coil to helix. In PF3D7_1126700, no significant change was detected [Table 4].{Table 4}

Further, a database has been developed to show the analysis done by PROVEAN, SIFT, PredictSNP, and NetSurfP software, which is available online at URL

Database was developed using PHP and JavaScript with user-friendly search environment using a range of options, such as simple searches and advanced searches. Users can search easily with any of the three fields such as gene name, UniProt ID, or SNP ID. Gene name with corresponding residues is available in advance search. Users can search for particular residues in any gene or genes available in this database (currently the hub genes). For displaying the SNP data, two-step approach is exploited by these two search methods. The first step is to display Gene name, UniProt ID, SNP ID, AA change, PredictSNP cuttoff, Confidence, PROVEAN Score, SIFT Score, and cutoff. The second step will reveal the details of the secondary structure, SNP ID, Residue N, Location, ASA N, Class N, SS N, Residue SNP, SNP, ASA SNP, Class SNP, and SS SNP. Users can also customize their search results by choosing any field like Class change, SS change, ASA change or Residue N or any combination for a particular gene. The relevant information is displayed dynamically. In the help section, all headers are defined and hyperlinked to this web portal from the search pages. In addition, every SNP ID is directly linked to its relevant entries in the PlasmoDB database.


The most virulent pathogen of malaria and malaria mortality worldwide is P. falciparum. To control the disease in parasites of malaria, the study of genetic variation is of practical importance. A nonredundant list of 4196 genes was used for the functional annotation cluster analysis by the DAVID tool. The most important enriched GO terms identified in these genes by DAVID functional cluster analysis consist of functions required for the host cell plasma membrane, antigenic variation, and pathogenesis [Table 2]. The node degree was calculated for each gene in the network to explore the functional roles of genes involved in different processes and interaction networks were created using Network Analyzer and MCODE plugins of Cytoscape. PF3D7_0324900, PF3D7_1306000, PF3D7_1439500, PF3D7_0705600, PF3D7_1207100, PF3D7_0508100, PF3D7_1126700, and PF3D7_1234300 genes were considered as hubs genes in the network constructed from 4196 P. falciparum genes by Network Analyzer and MCODE plugins of Cytoscape. Hub genes are considered functionally important because these are highly interconnected with nodes in a system. Therefore, these can act as putative targets for drug designing.

In this study, a number of nsSNPs have been identified for nearly all hubs, but few have had an impact on protein functions. PF3D7_0324900, PF3D7_1306000, PF3D7_1439500, PF3D7_0705600, PF3D7_1207100, PF3D7_0508100, PF3D7_1126700, and PF3D7_1234300 genes were annotated by at least two tools to affect protein function. The NetSurfP web server has been used to predict the protein secondary structure and surface accessibility in normal for each gene and its predicted SNP substitutions. The erythrocyte membrane protein 1 of P. falciparum, PfEMP1 (PF3D7_0324900) is considered as potential hub in this study, which mediates the attachment of infected erythrocytes to a range of host cells in the vascular lining during the blood stage of the infection with malaria. PfEMP1 is inserted into the RBC membrane and then laterally transferred to the preformed knob structures in the central region. Knob assembly may result in new ways of inhibiting PfEMP1 presentation on infected RBCs.[22] Strategies for overcoming PfEMP1 antigenic diversity would provide an exciting new opportunity for the development of malaria vaccines.[23] PF3D7_1306000 is a conserved protein of Plasmodium with unknown function and considered as hub gene. This gene was found as essential along with druggability index 0.5 in Tropical Disease Research (TDR) Targets database ( Another hub CCAAT-binding transcription factor or oocyst rupture protein 2 (ORP2) (PF3D7_1439500) was involved in sporozoite egress. Sporozoite egress of the oocyst can be blocked by removing the N-terminal histone fold domain of ORP2.[24]

Hub gene RNA helicase, (PF3D7_0705600) play various roles, including the cell growth and development. Helicases are significant unwinding enzymes that are needed in the malaria parasite for nearly all the nucleic acid metabolism. RNA helicases could be used as reasonable targets to develop new antiparasite therapies and solve the problem of drug resistance.[25] Small subunit rRNA processing factor, (PF3D7_1207100) involved in the maturation of SSU-rRNA from tricistronic rRNA transcript was also found as a hub gene. It is a protease group of enzymes, which play key roles in the development and invasion of parasites. The ability to design particular protease inhibitors makes them promising objectives for drugs.[26] SET domain protein, (PF3D7_0508100), which enables histone-lysine N-methyltransferase activity was also observed as hub gene. Two types of protein methyltransferase enzymes (PMTs) are present in eukaryotic cells: lysine specific and arginine specific. They were both linked with a variety of diseases including neurodegenerative and inflammatory diseases, and cancer. PMT enzymes have emerged as a target class against human disease for drug discovery.[27]

The gene PF3D7_1126700 that was also observed as a hub, codes for autophagy-related protein 23. The ATG18 autophagy-related protein controls the biogenesis of apicoplast in apicomplex parasites and decline of ATG18 in P. falciparum showed in delayed death.[28] DNA polymerase epsilon subunit B (PF3D7_1234300) is involved in the DNA-dependent DNA replication and enables DNA-directed DNA polymerase activity. The main function of Pol epsilon is to extend the leading strand synthesis during replication.[29]

A total of 6 mutations (N615K, T1745P, S1747R, S2024R, P2026S, and E2065K) were identified for PF3D7_0324900, 3 mutations (Y779D, Y862N and D1113Y) for PF3D7_1306000, 11 mutations (N186H, S312N, S313N, N334S, Y338N, S348C, Y403S, T411I, S445 L, R570G and N746K) for PF3D7_1439500, 6 mutations (N124Y, E623V, A626D, D645Y, T688R and T852S) for PF3D7_0705600, 5 mutations (N235K, R277W, D377H, P528S and N661Y) for PF3D7_1207100, 4 mutations (I800T, I1103K, S1411P and Y1474H) for PF3D7_0508100, and only one mutation for PF3D7_1234300. These mutations brought a change in accessible surface area from buried to exposed state and vice versa and also change in their secondary structure as observed through NetSurfP tool.

A computational approach to systematically analyze nsSNPs was undertaken in this study to predict deleterious mutations. SNP within the proteins significantly affects stability of structure of a protein and its function. The role of nsSNPs in understanding the functional effects of mutations that may cause changes in amino acids in hub proteins was also examined. Further, determination of nsSNPs that interfere with the function of protein and cause a disease needs to be determined.[30]


In this study, a comprehensive approach has been used to derive potential therapeutic targets for P. falciparum using RNA-seq dataset. The differential expression of genes, functional, and pathway enrichment analysis of P. falciparum was apprized in detail. The present study results suggested that PF3D7_0705600, PF3D7_1207100, PF3D7_0508100, PF3D7_1126700, and PF3D7_1234300 hub genes serves as putative targets for drug designing. These hub genes are showing less mutation and no similarity with human proteins. These genes act as lysine methyltransferases, transcription, translation, RNA splicing, and other important cellular pathways in P. falciparum. Moreover, the study also provides clusters of hub genes and their network pathways analysis which can be used further studies to devise a therapeutic target that stabilizes their gene expression. In addition, nsSNPs and their functional impact of these hub genes were also calculated.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


1Le Roch KG, Chung DW, Ponts N. Genomics and integrated systems biology in Plasmodium falciparum: A path to malaria control and eradication. Parasite Immunol 2012;34:50-60.
2Sharma VP. Continuing challenge of malaria in India. Curr Sci 2012;102:678-82.
3Goswami D, Baruah I, Dhiman S, Rabha B, Veer V, Singh L, et al. Chemotherapy and drug resistance status of malaria parasite in Northeast India. Asian Pac J Trop Med 2013;6:583-8.
4Hay SI, Okiro EA, Gething PW, Patil AP, Tatem AJ, Guerra CA, et al. Estimating the global clinical burden of Plasmodium falciparum malaria in 2007. PLoS Med 2010;7:e1000290.
5Imwong M, Dondorp AM, Nosten F, Yi P, Mungthin M, Hanchana S, et al. Exploring the contribution of candidate genes to artemisinin resistance in Plasmodium falciparum. Antimicrob Agents Chemother 2010;54:2886-92.
6Breglio KF, Amato R, Eastman R, Lim P, Sa JM, Guha R, et al. A single nucleotide polymorphism in the Plasmodium falciparum atg18 gene associates with artemisinin resistance and confers enhanced parasite survival under nutrient deprivation. Malar J 2018;17:391.
7Subudhi AK, Boopathi PA, Pandey I, Kaur R, Middha S, Acharya J, et al. Disease specific modules and hub genes for intervention strategies: A co-expression network based approach for Plasmodium falciparum clinical isolates. Infect Genet Evol 2015;35:96-108.
8López-Barragán MJ, Lemieux J, Quiñones M, Williamson KC, Molina-Cruz A, Cui K, et al. Directional gene expression and antisense transcripts in sexual and asexual stages of Plasmodium falciparum. BMC Genomics 2011;12:587.
9Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with tophat and cufflinks. Nat Protoc 2012;7:562-78.
10Aurrecoechea C, Brestelli J, Brunk BP, Dommer J, Fischer S, Gajria B, et al. PlasmoDB: A functional genomic database for malaria parasites. Nucleic Acids Res 2009;37:D539-43.
11Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S, et al. UniProt: The universal protein knowledgebase. Nucleic Acids Res 2004;32:D115-9.
12Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009;37:1-3.
13Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M. KAAS: An automatic genome annotation and pathway reconstruction server. Nucleic Acids Res 2007;35:W182-5.
14Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING database in 2017: Quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res 2017;45:D362-8.
15Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: New features for data integration and network visualization. Bioinformatics 2011;27:431-2.
16Bader GD, Hogue CW. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 2003;4:2.
17Choi Y, Chan AP. PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 2015;31:2745-7.
18Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 2009;4:1073-81.
19Bendl J, Stourac J, Salanda O, Pavelka A, Wieben ED, Zendulka J, et al. PredictSNP: Robust and accurate consensus classifier for prediction of disease-related mutations. PLoS Comput Biol 2014;10:e1003440.
20Petersen B, Petersen TN, Andersen P, Nielsen M, Lundegaard C. A generic method for assignment of reliability scores applied to solvent accessibility predictions. BMC Struct Biol 2009;9:51.
21Kucukkal TG, Petukh M, Li L, Alexov E. Structural and physico-chemical effects of disease and non-disease nsSNPs on proteins. Curr Opin Struct Biol 2015;32:18-24.
22Looker O, Blanch AJ, Liu B, Nunez-Iglesias J, McMillan PJ, Tilley L, et al. The knob protein KAHRP assembles into a ring-shaped structure that underpins virulence complex assembly. PLoS Pathog 2019;15:e1007761.
23Chan JA, Fowkes FJ, Beeson JG. Surface antigens of Plasmodium falciparum-infected erythrocytes as immune targets and malaria vaccine candidates. Cell Mol Life Sci 2014;71:3633-57.
24Siden-Kiamos I, Pace T, Klonizakis A, Nardini M, Garcia CR, Currà C. Identification of Plasmodium berghei Oocyst Rupture Protein 2 (ORP2) domains involved in sporozoite egress from the oocyst. Int J Parasitol 2018;48:1127-36.
25Marchat LA, Arzola-Rodríguez SI, Hernandez-de la Cruz O, Lopez-Rosas I, Lopez-Camarillo C. DEAD/DExH-Box RNA helicases in selected human parasites. Korean J Parasitol 2015;53:583-95.
26Lilburn TG, Cai H, Zhou Z, Wang Y. Protease-associated cellular networks in malaria parasite Plasmodium falciparum. BMC Genomics 2011;12 Suppl 5:S9.
27Copeland RA, Solomon ME, Richon VM. Protein methyltransferases as a target class for drug discovery. Nat Rev Drug Discov 2009;8:724-32.
28Bansal P, Tripathi A, Thakur V, Mohmmed A, Sharma P. Autophagy-related protein ATG18 regulates apicoplast biogenesis in Apicomplexan Parasites. MBio 2017;8. pii: E01468-17.
29Pursell ZF, Isoz I, Lundström EB, Johansson E, Kunkel TA. Yeast DNA polymerase epsilon participates in leading-strand DNA replication. Science 2007;317:127-30.
30Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, et al. LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics 2005;21:2814-20.