Year: 2018 | Volume: 50 | Issue: 4 | Pages: 169-176
A big data approach with artificial neural network and molecular similarity for chemical data mining and endocrine disruption prediction
Renjith Paulose¹, Kalirajan Jegatheesan², Gopal Samy Balakrishnan³
¹ Research and Development Centre, Bharathiar University, Coimbatore, Tamil Nadu, India
² Center for Research and PG Studies in Botany and Biotechnology, Thiagarajar College (Autonomous), Madurai, Tamil Nadu, India
³ Department of Biotechnology, Liatris Biosciences LLP, Kottayam, Kerala, India
Date of Submission: 27-Jun-2018
Date of Acceptance: 31-Jul-2018
Date of Web Publication: 1-Nov-2018
Mr. Renjith Paulose
Research and Development Centre, Bharathiar University, Coimbatore - 641 046, Tamil Nadu
Source of Support: None, Conflict of Interest: None
CONTEXT: Chemical toxicity prediction in the early stages of drug discovery has been researched for years, and new methods are continually being investigated. Research data comprising the physicochemical properties, toxicity, assay results, and activity details of chemicals form massive datasets that are becoming difficult to manage. Identifying a chemical with the desired features and biological activity among millions of candidates is a challenging task.
AIMS: In this study, we explore big data technologies and machine learning approaches for efficient chemical data mining aimed at endocrine receptor disruption prediction and virtual compound screening. We investigate the power of an artificial neural network (ANN) to predict chemicals' activity toward the androgen receptor (AR) and estrogen receptor (ER) and thereby classify them as human endocrine disruptors or nondisruptors.
SUBJECTS AND METHODS: Molecules are collected along with their half-maximal inhibitory concentration (IC50) values toward AR and ER. Training and test datasets are created with active and inactive classes of molecules. Electrotopological state (E-state) molecular fingerprints are generated to describe each compound. An ANN machine learning model is built with Apache Spark and run in a Hadoop big data environment. Each test chemical's structural similarity to the active class of training compounds is estimated and combined with the ANN model to improve prediction accuracy.
RESULTS: The AR and ER predictive models, applied to the corresponding test datasets, achieved 86.31% and 89.57% accuracy, respectively, in classifying molecules as disruptors or nondisruptors. Molecular fragments and functional groups are ranked by their importance in the ANN model and their influence on AR and ER disruption behavior. Training molecules relevant to each test molecule's endocrine disruption prediction are retrieved based on structural similarity values.
CONCLUSIONS: The current study demonstrates a new approach to chemical endocrine receptor disruption prediction that combines the ANN machine learning method with molecular similarity in a big data environment. This predictive modeling method can be further tested with more receptors and hormones, and its predictive power examined.
Keywords: Artificial neural network, big data, chemical absorption, distribution, metabolism, and excretion-toxicity screening, endocrine receptor disruption, Hadoop, machine learning
How to cite this article:
Paulose R, Jegatheesan K, Balakrishnan GS. A big data approach with artificial neural network and molecular similarity for chemical data mining and endocrine disruption prediction. Indian J Pharmacol 2018;50:169-76
How to cite this URL:
Paulose R, Jegatheesan K, Balakrishnan GS. A big data approach with artificial neural network and molecular similarity for chemical data mining and endocrine disruption prediction. Indian J Pharmacol [serial online] 2018 [cited 2019 Feb 18];50:169-76. Available from: http://www.ijp-online.com/text.asp?2018/50/4/169/244720
Introduction
The increasing rate of data generation across all scientific disciplines offers unprecedented opportunities for data-driven research, with the potential to change existing practices. Big data makes it possible to undertake research projects that were previously impossible. Data-driven medicinal chemistry methods have come to play a guiding role in drug discovery projects by exposing significant relationships and patterns in existing data.
Data management and exploration have become critically important, because data are now produced faster than they can be analyzed and interpreted. Data-driven drug design relies on computational medicinal chemists who handle growing data volumes and identify methods for making better decisions from these resources. Data-driven research can either extract the most benefit from internally generated data or incorporate external data resources into decision-making.
Approaches developed in one field may find uses in seemingly unrelated disciplines. Data-driven drug discovery and medicinal chemistry benefit from taking advantage of progress and procedures in other scientific disciplines. Rapid data growth has created many technical challenges, so the role of the computational medicinal chemist is constantly evolving rather than static.
Quantifying the similarity of a pair of molecules is a central concept and a routine task in chemoinformatics. It has a wide range of applications, predominantly in medicinal chemistry tasks such as virtual screening. Many widely applied best practices for calculating molecular similarity derive from practical experience. A nearly unlimited space of approaches is available, given the abundance of molecular representations and similarity definitions that can be compared, and much of it remains to be explored. Knowledge of the available method options and their effects on molecular similarity values and rankings is still comparatively limited.
Similarity metrics are used in broad areas, which has motivated assessments of their performance in tasks such as event identification in social media, texture image retrieval, and webpage clustering. Sixteen similarity measures were compared based on their performance on high-content screening data, and nonlinear correlation-based similarity metrics such as Spearman's ρ and Kendall's τ were found to outperform the previously used Euclidean distance.
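To make the contrast concrete, the sketch below computes the Euclidean distance, Spearman's ρ, and Kendall's τ between two hypothetical screening profiles using only the Python standard library. The profile values are invented for illustration, ties are assumed absent when ranking (a simplifying assumption), and Kendall's τ is the tau-a variant. The second profile is a monotonic but nonlinear transform of the first, so the rank correlations are perfect while the Euclidean distance is large.

```python
import math
from itertools import combinations

def euclidean(x, y):
    """Euclidean distance between two equal-length feature profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def ranks(values):
    """Rank positions (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, index in enumerate(order, start=1):
        r[index] = position
    return r

def spearman_rho(x, y):
    """Spearman's rho via the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical profiles: profile_b is a monotonic, nonlinear transform of profile_a.
profile_a = [0.1, 0.4, 0.9, 1.6, 2.5]
profile_b = [v ** 3 for v in profile_a]

print(round(euclidean(profile_a, profile_b), 3))  # large distance
print(spearman_rho(profile_a, profile_b))         # 1.0: identical rank order
print(kendall_tau(profile_a, profile_b))          # 1.0
```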
A comparison of 22 similarity metrics reinforced the popularity of the Tanimoto coefficient, which was recommended in place of the Euclidean distance for two-dimensional (2D) fragment-based similarity searching. Combinations of two to four similarity metrics can outperform the Tanimoto index, although no combination consistently showed high performance across different circumstances. When precise data on molecule sizes are unavailable, the well-established Tanimoto coefficient remains the method of choice for computing molecular similarities.
A comparison of 51 similarity coefficients supported the effectiveness of the Tanimoto index and identified two additional metrics that merit future application in chemoinformatics. Despite these positive findings, the Tanimoto coefficient has weaknesses; one is its tendency to favor small compounds in dissimilarity selection. The Tanimoto index can also yield similarity values of around one-third even for structurally distant molecules.
Similarity measures are typically compared based on their performance in a single specific setting, such as retrieving active molecules for a particular protein target. Databases of biologically relevant molecules, such as the National Cancer Institute anti-AIDS database or the MDL Drug Data Report database, were used in these studies.
Around 30% of costly, late-stage failures in development are due to toxicity. Consequently, identifying and prioritizing chemistries with lower toxicity risk as early as possible in the drug discovery process may help reduce the high attrition rate in pharmaceutical research. Knowledge-based toxicity prediction alerts chemists when their proposed compounds carry a greater toxicity risk. However, such an alert must be weighed during compound selection, balancing potential opportunities against downstream toxicity risk. It may be appropriate to proceed with selected compounds and generate experimental data to validate a predicted toxicity.
Toxicity of clinical candidates and drugs remains a major concern in the pharmaceutical industry, contributing to higher attrition and cost, late-stage failures, and market withdrawals. Nearly 22% of drug candidates entering clinical development have failed because of clinical safety issues or nonclinical toxicology. In preclinical development, safety and toxicity problems accounted for 54% of failures, with preclinical candidates accounting for 18%. These costly late-stage failures make up a large fraction of pharmaceutical research costs. Toxicity also remains a problem after approval, causing adverse drug reactions and leading to black box warnings, restricted use, and the withdrawal of many drugs from the market.
Endocrine disruptors are natural or synthetic substances that mimic the role of hormones in the body. They induce, block, or alter hormonal signals, affecting the normal function of organs and tissues, and have been associated with developmental, immune, neural, and reproductive problems in laboratory animals and wildlife. These substances can harm human health in the same way, contributing to infertility and an increased risk of several diseases, such as certain cancers, diabetes, obesity, and endometriosis.
In addition to environmental chemicals with estrogenic activity, chemicals with androgenic, antiestrogenic, antiandrogenic, progestogenic, or thyroid-like activity are also well known. A diverse range of chemicals is known to act as endocrine disruptors, including dioxin, diethylstilbestrol, the pesticide dichlorodiphenyltrichloroethane (DDT), polychlorinated biphenyls, industrial chemicals such as di(2-ethylhexyl) phthalate and bisphenol A (BPA), and phytoestrogens such as genistein and daidzein.
A variety of computational approaches are currently used to predict toxicity. Some, such as read-across, quantitative structure-activity relationships (QSARs), and expert systems, typically make predictions based solely on the molecular structure of the toxicant. Others, such as proteochemometrics, require knowledge of the receptor through which the toxicant acts, as well as docking of putative toxicants into the binding sites of 3D receptor models. Still others, such as physiologically based pharmacokinetic and pharmacodynamic modeling, typically require various properties of the toxicant to be determined experimentally. Some predictive toxicology programs combine more than one approach; for example, so-called hybrid expert systems may also include QSAR components.
Manufacturers seeking to market new products may use animal testing to establish product safety. After considering the available alternatives, companies may conclude that animal testing is essential in some cases to guarantee product safety. The Food and Drug Administration (FDA) supports the development and use of alternatives to whole-animal testing, as well as adherence to the most humane methods available within the limits of scientific capability when animals are used to test the safety of cosmetic products.
Many rules have been proposed in pharmaceutical research since the rule of five described orally active compounds in terms of a small number of simple molecular properties. The correlation between in vivo tolerance in animals and physicochemical properties has been investigated, identifying property profiles associated with fewer toxicity findings. Computational filters have been developed to eliminate reactive molecules from screening datasets. Thiol-reactive molecules were detected with ALARM NMR (a La assay to detect reactive molecules by nuclear magnetic resonance), and the resulting data were used to build a Bayesian classifier model to predict reactivity. Molecules failing these reactivity filters may also be associated with various rule-of-five violations.
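A rule-of-five screen can be sketched as a simple violation counter. The sketch assumes that the four properties (molecular weight, calculated logP, hydrogen-bond donors, and hydrogen-bond acceptors) have already been computed by a cheminformatics toolkit; the `Properties` record and the example values are hypothetical. It uses Lipinski's cutoffs (molecular weight above 500, logP above 5, more than 5 donors, more than 10 acceptors) and flags poor oral absorption when two or more are violated.

```python
from dataclasses import dataclass

@dataclass
class Properties:
    """Precomputed descriptors for one molecule (hypothetical input record)."""
    mol_weight: float  # Daltons
    logp: float        # calculated octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors

def rule_of_five_violations(p: Properties) -> int:
    """Count how many of Lipinski's four criteria the molecule breaks."""
    checks = [
        p.mol_weight > 500,
        p.logp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ]
    return sum(checks)

def likely_poor_oral_absorption(p: Properties) -> bool:
    """Lipinski's heuristic flags compounds with two or more violations."""
    return rule_of_five_violations(p) >= 2

compact = Properties(mol_weight=180.2, logp=1.2, h_donors=1, h_acceptors=4)
greasy = Properties(mol_weight=650.0, logp=6.3, h_donors=2, h_acceptors=8)
print(rule_of_five_violations(compact), likely_poor_oral_absorption(compact))  # 0 False
print(rule_of_five_violations(greasy), likely_poor_oral_absorption(greasy))    # 2 True
```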
The volume of biological data produced by high-throughput screening requires databases for storage, which in turn provide a wealth of information for building computational models. Pfizer used data from different assays performed in its laboratories and from the literature, covering around 100,000 compounds, to build a Bayesian model for predicting cytotoxicity.
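A Bayesian classifier over fingerprint bits can be sketched as a Bernoulli naive Bayes model with Laplace smoothing. This is a generic textbook formulation swapped in for illustration, not the specific model used in the cited work; the four-bit fingerprints and the "toxic"/"nontoxic" labels are invented.

```python
import math

def train_bernoulli_nb(fingerprints, labels, alpha=1.0):
    """Estimate class priors and per-bit probabilities P(bit = 1 | class)."""
    classes = sorted(set(labels))
    n_bits = len(fingerprints[0])
    model = {}
    for c in classes:
        rows = [fp for fp, y in zip(fingerprints, labels) if y == c]
        prior = len(rows) / len(fingerprints)
        # Laplace smoothing keeps every probability strictly between 0 and 1.
        p_on = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                for j in range(n_bits)]
        model[c] = (prior, p_on)
    return model

def log_posterior_scores(model, fp):
    """Unnormalized log posterior for each class."""
    scores = {}
    for c, (prior, p_on) in model.items():
        s = math.log(prior)
        for bit, p in zip(fp, p_on):
            s += math.log(p if bit else 1 - p)
        scores[c] = s
    return scores

def predict(model, fp):
    """Return the class with the highest posterior score."""
    scores = log_posterior_scores(model, fp)
    return max(scores, key=scores.get)

# Toy 4-bit fingerprints: bit 0 is enriched in the "toxic" class.
X = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 0]]
y = ["toxic", "toxic", "toxic", "nontoxic", "nontoxic", "nontoxic"]
model = train_bernoulli_nb(X, y)
print(predict(model, [1, 0, 0, 0]))  # toxic
```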
Computational models of human toxicity can fill current data gaps, for example by scoring and prioritizing compounds in the ToxCast library. Pregnane X receptor docking models, for instance, have been used to screen a subset of these chemicals. ToxCast includes hundreds of endpoints that could be predicted to give a more comprehensive picture of toxicity. Computational toxicity models can also help build better side-effect profiles of FDA-approved drugs. Because toxicity prediction is needed to some extent across many industries, it is vital to provide databases, computational tools, and skilled scientists. This makes it important for researchers to work collaboratively and share their toxicity data. Such human toxicity computational models may eventually reduce, or even replace, classical animal studies.
The goal of toxicity prediction is to describe the connection between chemical properties and biological and toxicological processes. Computational predictions derived from theoretical values should be equivalent to conventional predictions extrapolated from laboratory experiments, and should support the analysis of a chemical compound before it is produced, in the absence of laboratory experiments.
Predicting and understanding the consequences of these chemicals for human and wildlife health is important, and is currently done through extremely expensive, years-long experiments involving animal studies. Of the roughly 19 million different compounds known, some toxicity data are available for only 10% of industrially produced chemicals, which makes computational prediction harder than ever. The number of compounds to be considered has grown greatly because of the wider use of combinatorial chemistry by chemical companies.
It is important to predict the effects of chemicals from the available data and knowledge, such as chemical structure, acute toxicity, and chronic toxicity. Artificial intelligence techniques are well suited to reasoning over data, extracting knowledge, building hypotheses, and estimating nonlinear functions. In machine learning, the dataset used to build a model is kept separate from the test dataset. Each entity in the dataset is described by a number of characteristics (features).
Classification, regression, and similarity matching have been highlighted as basic data science methods for solving business problems such as product recommendation and customer response prediction. These three fundamental methods are the basis for extracting useful information from data and also serve as the foundation for many well-known data science algorithms. Classification and class probability estimation predict which class an individual in a population belongs to. Generally, the classes are mutually exclusive. The goal of the classification task is to determine which class a new individual belongs to. Class probability estimation, a closely related task, assigns each individual a score representing the probability that it belongs to a class.
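The difference between classification and class probability estimation can be shown with a logistic scoring function: the model first produces a probability-like score, and a threshold turns that score into a class. The weights, bias, and 0.5 threshold below are arbitrary illustrative values, not fitted parameters, and the class names simply mirror the disruptor/nondisruptor task.

```python
import math

def class_probability(features, weights, bias):
    """Logistic score in (0, 1): estimated probability of the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, weights, bias, threshold=0.5):
    """Hard classification: compare the estimated probability with a cutoff."""
    p = class_probability(features, weights, bias)
    return ("disruptor" if p >= threshold else "nondisruptor"), p

weights = [1.5, -2.0, 0.7]  # hypothetical weights
bias = -0.2
label, p = classify([1, 0, 1], weights, bias)
print(label, round(p, 3))  # disruptor 0.881
```

Keeping the probability alongside the label lets a user rank compounds by score or move the threshold, which a bare class label does not allow.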
Chemical companies' investments in digital technologies are expected to rise sharply, with big data analytics and cloud computing anticipated to produce the highest return on investment. Almost 94% of the chemical industry executives surveyed plan to increase their digital use, with 54% projecting a significant increase. Around 77% expect the best returns from cloud computing (44%) and big data analytics (33%). Most chemical company executives believe that the largest benefits of digital technologies will include better productivity, an impact on customer relationships, and redefined product development. Almost 90% of organizations have increased their investment in employees dedicated to digital.
Apache Spark is a free, easy-to-use big data processing framework built for speed and sophisticated analytics. It has advantages over Storm and Hadoop. Spark provides a unified framework for processing big data from a variety of datasets and data sources.
Spark can run applications on Hadoop clusters up to 10 times faster on disk and 100 times faster in memory. It comes with a built-in set of around 80 high-level operators and allows applications to be written quickly in Python, Java, or Scala. Beyond MapReduce operations, it supports SQL queries, graph data processing, streaming data, and machine learning.
In this study, we introduce a combined approach of chemical similarity and machine learning for chemical endocrine receptor disruption prediction on a big data platform.
Subjects and Methods
Data source and preprocessing
Biologically active chemical structures, along with their binding affinity values toward the androgen receptor (AR) and estrogen receptor (ER), were obtained from the ChEMBL public chemical database. Chemical structures were represented as SMILES strings, and their bioactivities were expressed as half-maximal inhibitory concentration (IC50) values.
Molecules with unclear or missing activity values, as well as duplicate structures, were filtered from the datasets. After filtering, the AR and ER datasets contained 1793 and 2307 molecules, respectively. Molecules with an IC50 value of ≤500 were labeled “active,” and those above 500 were labeled “inactive.”
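The filtering and labeling steps can be sketched as follows. The record layout (a SMILES string plus an IC50 value), the use of exact SMILES string matching for duplicate detection, and the example values are assumptions; the IC50 units are not stated, so the sketch applies the ≤500 cutoff to whatever units the values carry. Exact string matching only catches identical SMILES; a real pipeline would canonicalize structures first.

```python
import math

IC50_CUTOFF = 500  # same numeric threshold as in the text; units follow the source data

def clean_and_label(records):
    """Drop missing/unclear IC50 values and duplicate SMILES, then label activity."""
    seen = set()
    labeled = []
    for smiles, ic50 in records:
        if not smiles or ic50 is None:
            continue  # missing structure or activity value
        try:
            value = float(ic50)
        except (TypeError, ValueError):
            continue  # unclear value such as ">10000" or "n/a"
        if math.isnan(value) or smiles in seen:
            continue  # NaN or duplicate structure
        seen.add(smiles)
        labeled.append((smiles, "active" if value <= IC50_CUTOFF else "inactive"))
    return labeled

raw = [
    ("c1ccc(O)cc1", 320.0),             # hypothetical IC50 value
    ("c1ccc(O)cc1", 320.0),             # duplicate structure
    ("CCO", None),                      # missing value
    ("c1ccccc1", ">10000"),             # censored value, treated as unclear
    ("CC(=O)Oc1ccccc1C(=O)O", 2500.0),  # hypothetical IC50 value
]
print(clean_and_label(raw))
```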
Separate spreadsheets were created for the AR and ER datasets with SMILES notations and activity labels. These spreadsheets were loaded into the Hadoop Distributed File System (HDFS) for further processing.
Software and computing environment
A single-node cluster running Ubuntu 12.04 with Hadoop 2.6.4 (Apache Software Foundation, USA, 2016) was used for parallel computing and analytics. Spark 1.5.2 (Apache Software Foundation, USA, 2015) and its machine learning library MLlib were used for statistical computing and for building predictive models. R Statistical Software 3.2.4 (R Core Team, 2018) was used for fingerprint generation and similarity estimation. Hive 1.2.1 (Apache Software Foundation, USA, 2015) was used to store datasets as tables in HDFS.
Kier-Hall electrotopological state (E-state) molecular descriptors were generated for both the AR and ER datasets. The dataset spreadsheets, augmented with the 2D molecular fingerprints, were converted to matrices and stored on HDFS for parallel processing. The rcdk package in R Statistical Software (R Core Team, 2018) was used to generate the molecular fingerprints.
Molecular similarity estimation
Pairwise molecular similarities were calculated with the Tanimoto coefficient on 2D fingerprints, represented as binary vectors in which 1 indicates the presence of a fragment and 0 its absence. The Tanimoto coefficient can be used to compare structural keys, hashed fingerprints, or continuous data such as topological indices that account for size, degree of branching, and overall shape.
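For binary fingerprints, the Tanimoto coefficient is the number of bits set in both molecules divided by the number of bits set in either, T = c / (a + b - c), where a and b are the bits set in each molecule and c the bits they share. The continuous form replaces these counts with dot products, which is how it extends to descriptor values such as topological indices. The sketch below implements both in plain Python; the short fingerprints are invented and far shorter than a real E-state fingerprint, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto_binary(fp_a, fp_b):
    """Tanimoto on equal-length 0/1 vectors: c / (a + b - c)."""
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    union = a + b - c
    return c / union if union else 0.0  # convention: two empty fingerprints -> 0.0

def tanimoto_continuous(x, y):
    """Continuous Tanimoto: x.y / (|x|^2 + |y|^2 - x.y)."""
    xy = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - xy
    return xy / denom if denom else 0.0

fp1 = [1, 1, 0, 1, 0, 0, 1, 0]
fp2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(tanimoto_binary(fp1, fp2))                              # 3 shared / 5 in union = 0.6
print(tanimoto_continuous([1.0, 2.0, 0.5], [1.0, 2.0, 0.5]))  # identical vectors -> 1.0
```

On 0/1 vectors the continuous form reduces to the binary one, so both functions give 0.6 for `fp1` and `fp2`.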
Dataset matrices of Kier-Hall E-state molecular descriptors (2D fingerprints) were created to measure similarity. Structural similarity between training and test molecules was estimated, as was each molecule's similarity to every other molecule in the dataset. The similarity values were sorted and stored in separate spreadsheets.
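Estimating each test molecule's similarity to every training molecule and sorting the values can be sketched as below. The molecule identifiers and fingerprints are hypothetical, and the function `rank_training_neighbours` is an illustrative helper that keeps the k most similar training molecules for each test molecule (ties broken by identifier).

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient on 0/1 fingerprints."""
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    union = sum(fp_a) + sum(fp_b) - c
    return c / union if union else 0.0

def rank_training_neighbours(test_set, training_set, k=3):
    """For each test molecule, sort training molecules by decreasing similarity."""
    ranking = {}
    for test_id, test_fp in test_set.items():
        scored = [(tanimoto(test_fp, train_fp), train_id)
                  for train_id, train_fp in training_set.items()]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by identifier
        ranking[test_id] = scored[:k]
    return ranking

training = {
    "train_1": [1, 1, 0, 1, 0, 0],
    "train_2": [1, 1, 0, 1, 1, 0],
    "train_3": [0, 0, 1, 0, 1, 1],
}
test = {"test_1": [1, 1, 0, 1, 0, 0]}
for test_id, neighbours in rank_training_neighbours(test, training, k=2).items():
    print(test_id, neighbours)  # test_1 [(1.0, 'train_1'), (0.75, 'train_2')]
```

The same ranked lists can be used to retrieve the training molecules that lie behind a given test prediction.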
Training and test datasets
Each dataset was divided into training and test sets in an 80:20 ratio. Random sampling was used to select training and test molecules so that structural diversity would be distributed similarly across both sets. The AR dataset contained 1435 training and 358 test molecules, whereas the ER dataset contained 1847 training and 460 test molecules.
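A seeded random 80:20 split can be sketched with the standard library. The seed, the rounding rule (the test set receives the floor of 20% of the molecules), and the placeholder identifiers are assumptions, since the exact procedure is not stated. A plain random split does not by itself guarantee equal structural diversity in both sets; a stratified or diversity-aware split would be needed for that.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of the items with a fixed seed and cut off a test portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)  # floor of 20%
    return shuffled[n_test:], shuffled[:n_test]

molecule_ids = [f"mol_{i}" for i in range(1793)]  # placeholder IDs, AR-sized
train, test = train_test_split(molecule_ids)
print(len(train), len(test))  # 1435 358
```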
HDFS and Hive Data warehouse
After fingerprint calculation, the datasets were loaded into HDFS. Hive was configured, and tables were created to store and retrieve molecular structures from HDFS. Separate Hive tables were created for the AR and ER data according to their structure and schema.
Apache Spark and machine learning
Apache Spark and its machine learning package SparkML were used to implement and test machine learning algorithms in a Hadoop big data environment. Datasets were retrieved from the respective Hive tables to test the statistical algorithms.
Artificial neural network
Multiple machine learning algorithms were tested to find the one with the best accuracy in predicting whether a molecule is an endocrine disruptor or nondisruptor. The artificial neural network showed impressive predictive power and was selected for further investigation. Its architecture consisted of two hidden layers, with five neurons in the first and four neurons in the second [Figure 1].
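Spark's `MultilayerPerceptronClassifier` (in `pyspark.ml.classification`) takes the network shape as a `layers` list, so the architecture above would correspond to something like `[n_inputs, 5, 4, 2]` for a two-class output. To show what that shape computes without requiring a Spark installation, the sketch below runs a forward pass of such a network in plain Python, with sigmoid hidden units and a softmax output (the activations Spark's multilayer perceptron uses). The input width of 8 and the random weights are hypothetical; a trained model would learn the weights by backpropagation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights) and zero bias vector."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, layer):
    """Affine transform: one weighted sum plus bias per output neuron."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def forward(fingerprint, layers):
    """Hidden layers use sigmoid; the output layer uses softmax over the classes."""
    activation = fingerprint
    for layer in layers[:-1]:
        activation = [sigmoid(z) for z in dense(activation, layer)]
    return softmax(dense(activation, layers[-1]))

rng = random.Random(0)
shape = [8, 5, 4, 2]  # hypothetical input width, then 5 and 4 hidden neurons, 2 classes
layers = [init_layer(a, b, rng) for a, b in zip(shape, shape[1:])]
probs = forward([1, 0, 1, 1, 0, 0, 1, 0], layers)
print(probs)  # two class probabilities that sum to 1
```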
Results and Discussion
Atom fractions in datasets
Atom fractions did not vary significantly between the training and test groups of the AR and ER datasets. However, the ER test group showed elevated carbon counts, which could lead to higher molecular weights and influence disruptive behavior.
The AR active group showed an almost equal carbon fraction but a slightly elevated fluorine fraction. All other atoms were distributed almost equally across both groups of the datasets [Figure 2].
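Atom fractions like these can be approximated directly from SMILES strings. The regular expression below is a simplified tokenizer assumed for illustration: it counts heavy atoms only (implicit hydrogens are ignored), matches the two-letter symbols Cl and Br before single letters, maps aromatic lowercase symbols to their elements, and reads the element from bracket atoms such as [nH] or [Na+]. It does not handle every SMILES feature, so it is a rough sketch rather than a substitute for a cheminformatics toolkit.

```python
import re
from collections import Counter

# Bracket atom (element in group 1) or a plain atom: Cl/Br, organic subset, aromatic.
ATOM_PATTERN = re.compile(r"\[([A-Z][a-z]?|[cnops])[^\]]*\]|(Cl|Br|[BCNOPSFI]|[cnops])")

def atom_fractions(smiles_list):
    """Fraction of each heavy-atom element across a set of SMILES strings."""
    counts = Counter()
    for smiles in smiles_list:
        for bracket, plain in ATOM_PATTERN.findall(smiles):
            symbol = bracket or plain
            counts[symbol.capitalize()] += 1  # aromatic 'c' -> 'C', etc.
    total = sum(counts.values())
    return {element: n / total for element, n in counts.items()}

group = ["CC(=O)Oc1ccccc1C(=O)O", "Clc1ccccc1"]  # aspirin, chlorobenzene
print(atom_fractions(group))  # {'C': 0.75, 'O': 0.2, 'Cl': 0.05}
```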
Artificial neural network-based predictive model
The artificial neural network model achieved prediction accuracies of 82.12% and 81.3% on the AR and ER test datasets, respectively. This is comparable to results reported for neural networks on nuclear receptor and stress response panels. For the AR and ER test datasets, respectively, there were 129 and 174 true positives, 165 and 200 true negatives, 50 and 30 false positives, and 14 and 56 false negatives.
Molecular similarity and influence on endocrine receptor disruption
Predictions from the artificial neural network model were further screened using Tanimoto molecular similarity against a minimum threshold. The similarity threshold was set to 0.9, and the predictions were revised accordingly.
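The exact revision rule is not spelled out, so the sketch below encodes one plausible reading, labeled as an assumption: if a test molecule's Tanimoto similarity to its most similar active training molecule reaches the 0.9 threshold, its label is set to disruptor; otherwise the ANN prediction is kept. The function `revise_prediction` and the example fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient on 0/1 fingerprints."""
    c = sum(1 for x, y in zip(fp_a, fp_b) if x and y)
    union = sum(fp_a) + sum(fp_b) - c
    return c / union if union else 0.0

def revise_prediction(ann_label, test_fp, active_training_fps, threshold=0.9):
    """Assumed rule: a near neighbour of a known active is predicted active."""
    best = max((tanimoto(test_fp, fp) for fp in active_training_fps), default=0.0)
    if best >= threshold:
        return "disruptor", best
    return ann_label, best  # below the threshold, keep the ANN prediction

actives = [[1, 1, 1, 1, 1, 1, 1, 1, 1, 0], [0, 0, 1, 1, 0, 0, 1, 0, 0, 1]]
query = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(revise_prediction("nondisruptor", query, actives))  # ('disruptor', 0.9)
```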
Combined endocrine receptor disruption prediction of artificial neural network and molecular similarity models
Combining the artificial neural network and similarity models gave final prediction accuracies of 86.31% and 89.57% for the AR and ER test datasets, respectively. For the AR and ER test datasets, respectively, there were 173 and 206 true positives, 136 and 206 true negatives, 43 and 24 false positives, and 6 and 24 false negatives.
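The reported accuracies follow directly from the confusion-matrix counts, since accuracy = (TP + TN) / (TP + TN + FP + FN). The short check below recomputes all four figures from the counts given in the text; the dictionary layout is just a convenient container.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of test molecules classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

# (TP, TN, FP, FN) as reported for each model and test dataset
counts = {
    ("ANN", "AR"): (129, 165, 50, 14),
    ("ANN", "ER"): (174, 200, 30, 56),
    ("ANN + similarity", "AR"): (173, 136, 43, 6),
    ("ANN + similarity", "ER"): (206, 206, 24, 24),
}
for (model, receptor), (tp, tn, fp, fn) in counts.items():
    n = tp + tn + fp + fn
    print(f"{model:17s} {receptor}: n={n}, accuracy={100 * accuracy(tp, tn, fp, fn):.2f}%")
```

Each set of counts sums to the corresponding test-set size (358 AR and 460 ER molecules), and the printed accuracies are 82.12%, 81.30%, 86.31%, and 89.57%.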
Quantitative assessment of prediction accuracy for novel compounds is an important feature in the development of absorption, distribution, metabolism, and excretion-toxicity (ADMET) models and of methods for predicting physicochemical properties. Incorporating prediction accuracy significantly improves the quality of compound selection [Figure 3].
Figure 3: Prediction accuracies for androgen receptor and estrogen receptor datasets
Reference molecule behind endocrine receptor disruption prediction
Training molecules behind each endocrine receptor disruption prediction were retrieved based on similarity coefficients. Test molecules predicted as disruptors were compared with structurally similar molecules in the training datasets. Model predictions were further evaluated by comparing the IC50 values of the predicted test molecules with those of the training set molecules. The IC50 comparisons were consistent with the model's accuracy.
The compared molecules showed similar activity ranges. Bisphenol A (BPA), a synthetic industrial chemical widely used in plastics, and genistein, an isoflavone found in a number of plants such as soy, were predicted to be endocrine disruptors, consistent with their known endocrine-disrupting nature. In contrast, daidzein, another soy isoflavone, was predicted to be an endocrine nondisruptor, suggesting that it is inactive in endocrine disruption under the model's activity threshold. Because its classes are defined by an IC50 activity threshold, the model can help decide whether a new compound is likely to be an endocrine disruptor [Table 1].
Molecular fragments and functional groups importance by artificial neural network model
Molecular fragments and functional groups were ranked by their importance in forming the artificial neural network model. Cl, N, O, Br, and S were the five top-ranked features by ANN importance, indicating their contribution to activity toward AR and ER and to the respective disruption abilities [Figure 4].
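The importance measure used for the ANN is not specified. One common, model-agnostic option, swapped in here purely for illustration, is permutation importance: shuffle one feature column at a time and measure how much accuracy drops. The toy `model` below is a hypothetical stand-in for a trained network, and the feature names are placeholders.

```python
import random

def accuracy(model, X, y):
    """Fraction of rows the model labels correctly."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, feature_names, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importance = {}
    for j, name in enumerate(feature_names):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importance[name] = sum(drops) / n_repeats
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical model that only looks at the first feature.
def model(row):
    return 1 if row[0] >= 1 else 0

features = ["Cl_count", "N_count", "O_count"]
rng = random.Random(1)
X = [[rng.randint(0, 2), rng.randint(0, 3), rng.randint(0, 4)] for _ in range(200)]
y = [model(row) for row in X]
for name, score in permutation_importance(model, X, y, features):
    print(name, round(score, 3))
```

Features the model ignores score exactly zero, while the feature it relies on shows a clear accuracy drop and ranks first.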
Conclusion
Combining the artificial neural network with the chemical similarity approach showed significant power in predicting chemical endocrine receptor disruption. The big data approach has a substantial role in handling huge volumes of chemical data and in ADMET screening. The combined approach achieved accuracies of 86.31% and 89.57% for the AR and ER datasets, respectively, indicating strong predictive power. The phytoestrogenic isoflavone genistein from soy was predicted as active (disruptive), whereas daidzein, another phytoestrogenic isoflavone from soy, was predicted as inactive (nondisruptive) against endocrine receptors. Machine learning has a significant role in accurate chemical endocrine receptor disruption prediction and classification. This method can be further tested with diverse datasets, and its predictive power investigated.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Lombardino JG, Lowe JA 3rd. The role of the medicinal chemist in drug discovery – Then and now. Nat Rev Drug Discov 2004;3:853-62.
Maggiora G, Vogt M, Stumpfe D, Bajorath J. Molecular similarity in medicinal chemistry. J Med Chem 2014;57:3186-204.
Eckert H, Bajorath J. Molecular similarity analysis in virtual screening: Foundations, limitations and novel approaches. Drug Discov Today 2007;12:225-33.
Becker H, Naaman M, Gravano L. Learning similarity metrics for event identification in social media. In: Proceedings of the Third ACM International Conference on Web Search and Data Mining. New York, USA: ACM; 2010. p. 291-300.
Reisen F, Zhang X, Gabriel D, Selzer P. Benchmarking of multivariate similarity measures for high-content screening fingerprints in phenotypic drug discovery. J Biomol Screen 2013;18:1284-97.
Willett P. Similarity-based virtual screening using 2D fingerprints. Drug Discov Today 2006;11:1046-53.
Todeschini R, Consonni V, Xiang H, Holliday J, Buscema M, Willett P, et al. Similarity coefficients for binary chemoinformatics data: Overview and extended comparison using simulated and real data sets. J Chem Inf Model 2012;52:2884-901.
Leeson P. In: SCINOVO, unlocking the value of drug candidates. J Am Med Assoc (Stevenage, UK) 2013;309:220.
Paul SM, Mytelka DS, Dunwiddie CT, Persinger CC, Munos BH, Lindborg SR, et al. How to improve R and D productivity: The pharmaceutical industry's grand challenge. Nat Rev Drug Discov 2010;9:203-14.
National Institute of Health. Endocrine Disruptors; 2010. Available from: http://www.niehs.nih.gov. [Last accessed on 2016 May 23].
Nigsch F, Lounkine E, McCarren P, Cornett B, Glick M, Azzaoui K, et al. Computational methods for early predictive safety assessment from biological and chemical data. Expert Opin Drug Metab Toxicol 2011;7:1497-511.
Enoch SJ, Cronin MT, Schultz TW, Madden JC. Quantitative and mechanistic read across for predicting the skin sensitization potential of alkenes acting via Michael addition. Chem Res Toxicol 2008;21:513-20.
Gleeson MP, Modi S, Bender A, Robinson RL, Kirchmair J, Promkatkaew M, et al. The challenges involved in modeling toxicity data in silico: A review. Curr Pharm Des 2012;18:1266-91.
Gombar VK, Mattioni BE, Zwicki C, Deahl JT. Wiley series on technologies for the pharmaceutical industry. In: Ekins S, editor. Computational Toxicology: Risk Assessment for Pharmaceutical and Environmental Chemicals. Hoboken, New Jersey: John Wiley and Sons; 2007. p. 183-95.
Kontijevskis A, Komorowski J, Wikberg JE. Generalized proteochemometric model of multiple cytochrome p450 enzymes and their inhibitors. J Chem Inf Model 2008;48:1840-50.
Du-Cuny L, Chen L, Zhang S. A critical assessment of combined ligand- and structure-based approaches to HERG channel blocker modeling. J Chem Inf Model 2011;51:2948-60.
Obiol-Pardo C, Gomis-Tena J, Sanz F, Saiz J, Pastor M. A multiscale simulation system for the prediction of drug-induced cardiotoxicity. J Chem Inf Model 2011;51:483-92.
Valerio L Jr. Tools for evidence-based toxicology: Computational-based strategies as a viable modality for decision support in chemical safety evaluation and risk assessment. Hum Exp Toxicol 2008;27:757-60.
Judson PN. Wiley series on technologies for the pharmaceutical industry. In: Ekins S, editor. Computational Toxicology: Risk Assessment for Pharmaceutical and Environmental Chemicals. Hoboken, New Jersey: John Wiley and Sons; 2007. p. 521-43.
Hughes JD, Blagg J, Price DA, Bailey S, Decrescenzo GA, Devraj RV, et al. Physiochemical drug properties associated with in vivo toxicological outcomes. Bioorg Med Chem Lett 2008;18:4872-5.
Pearce BC, Sofia MJ, Good AC, Drexler DM, Stock DA. An empirical process for the design of high-throughput screening deck filters. J Chem Inf Model 2006;46:1060-8.
Huth JR, Song D, Mendoza RR, Black-Schaefer CL, Mack JC, Dorwin SA, et al. Toxicological evaluation of thiol-reactive compounds identified using a la assay to detect reactive molecules by nuclear magnetic resonance. Chem Res Toxicol 2007;20:1752-9.
Ekins S, Freundlich JS. Validating new tuberculosis computational models with public whole cell screening aerobic activity datasets. Pharm Res 2011;28:1859-69.
Kortagere S, Krasowski MD, Reschly EJ, Venkatesh M, Mani S, Ekins S, et al. Evaluation of computational docking to identify pregnane X receptor agonists in the ToxCast database. Environ Health Perspect 2010;118:1412-7.
Ekins S, Williams AJ, Krasowski MD, Freundlich JS. In silico repositioning of approved drugs for rare and neglected diseases. Drug Discov Today 2011;16:298-310.
Jeevan M. Fundamental Methods of Data Science: Classification, Regression and Similarity Matching; 2015. Available from: http://www.kdnuggetts.com. [Last accessed on 2016 May 23].
Mayr A, Klambauer G, Unterthiner T, Hochreiter S. DeepTox: Toxicity Prediction using Deep Learning. Front Environ Sci 2016;3:80.
Tetko IV, Bruneau P, Mewes HW, Rohrer DC, Poda GI. Can we estimate the accuracy of ADMET predictions? Drug Discov Today 2006;11:700-7.
Menden MP, Iorio F, Garnett M, McDermott U, Benes CH, Ballester PJ, et al. Machine learning prediction of cancer cell sensitivity to drugs based on genomic and chemical properties. PLoS One 2013;8:e61318.