Congratulations
We heartily congratulate our colleague Aljoša on obtaining his Ph.D. in the field of Pharmacy, which we celebrated all together with dinner at a traditional Heurigen in Vienna. We are very happy that Aljoša will stay in the Pharmacoinformatics Research Group to start his postdoctoral career. His research will focus on applying data science methodologies to develop in silico predictive and translational safety models, leveraging advanced machine learning and deep learning algorithms.
Aljoša was a Ph.D. student in the MolTag program and contributed to the eTransafe and Risk-Hunt3r projects.
Thesis Abstract
In silico toxicology is becoming increasingly significant in the field of drug discovery. Recent advances in this field and the use of predictive models based on traditional machine learning (ML) and artificial neural networks (ANNs) have proven to be effective approaches in screening strategies and drug design. As this area of research progresses, one challenge remains: the availability of data in the public domain that can be used for such approaches. In this thesis, we explore different data sources, examine the impact of these sources on predictive ML models, and provide a method for sharing and re-training ML models that can be used by the community. The findings aim to enhance the understanding of model building, which can be applied to the prevention of potential adverse events or the search for a drug candidate. The research in this work is organized into three studies and a review describing different machine learning approaches for addressing proprietary issues. The first study provides a method for sharing and re-training ML models for six transporters from the ABC and SLC families. Moreover, models can be created quickly and easily for all six transporters, because extensive statistical analyses identified molecular descriptors and hyperparameters that are applicable to all six. The second study contributed to the discovery of a positive data bias within the ChEMBL database and of differences in the performance of ML models for off-targets trained on ChEMBL and Roche data sets. The study indicated a significant difference in model performance: models trained on ChEMBL data overpredicted positives, whereas models trained on Roche data overpredicted negatives. Additionally, both models were used for consensus predictions, which pointed to two drugs that could be linked to potential AChE inhibition.
The third study aimed to explore data sets from the proprietary and publicly available data domains for target predictions. Data sets for 40 different targets from Bayer AG and ChEMBL were collected for comparative analyses, in order to identify differences between the two data sources and to assess their impact on ML models when applied to each other’s data source. In addition, different strategies for merging data sets for ML modeling, based on assay format information and Tanimoto similarity, were applied. The study revealed significant differences between the Bayer AG and ChEMBL data sources and showed that the models are applicable only to their own training data source. Matthews correlation coefficient (MCC) values of 0.3 indicated that the ML models were ineffective for the majority of targets when applied outside their training data source. We demonstrate that considering assay format information may improve predictive quality in some cases, while taking chemical similarity into account appears to have a negligible effect.
Keywords
Toxicity / Data Science / Machine Learning