Congratulations
Congratulations to our colleague Tarik for obtaining his master's degree in Drug Discovery and Development! We are very happy that Tarik will stay in the Pharmacoinformatics Research Group to do his PhD. His research focuses on data imputation approaches for SLC transporters using artificial intelligence. He employs methods such as proteochemometrics and multi-task deep neural networks, in which the individual targets serve as output endpoints, to facilitate a shared representation of knowledge between them. Additionally, Tarik has a strong interest in in silico toxicity predictions within the field of small-molecule drug discovery.
Thesis Abstract
Solute carrier (SLC) transporters are one of the major groups of proteins that facilitate the transport of a diverse set of molecules across cell membranes. As regulators of cellular physiology, they are targeted for the treatment of psychological and neurodegenerative disorders, metabolic diseases, and cancer. Furthermore, SLC proteins are ubiquitously expressed and are therefore suitable candidates for targeted drug delivery. Artificial intelligence (AI) has attracted a lot of attention over the past decade, with varying forms of utility in drug discovery and development. Data imputation is a method that attempts to address the challenges of sparse data matrices, which are characterised by incomplete information, by filling in the missing values. In the context of machine learning, this approach can leverage a shared representation of knowledge by establishing the relationships between different tasks as endpoints. Reliable affinity predictions can assist scientists during important decision-making steps and reduce the risk of attrition at early drug development stages. We developed a multi-task deep neural network (MTDNN) to impute the missing affinity values of 87 SLC transporters. The model was trained on a curated dataset of 9,182 compounds across 87 tasks as endpoints. The results indicate that the MTDNN can achieve moderate performance when tested across all tasks simultaneously. However, when evaluated against individual proteins, the model's confidence intervals differed between targets: some targets consistently scored high, while others varied. Deeper analysis of the chemical space revealed that the ligands of the target with the lowest performance scores occupy an isolated domain, implying a lack of shared representation with other SLC proteins. Further investigation of the predictions with the highest error values indicated activity cliffs within the dataset that could compromise the performance of the model.
Although the MTDNN was trained on a relatively small and imbalanced dataset, it showed promise as a data imputation tool for protein members of a single superfamily.
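To illustrate the idea of imputing a sparse compound-by-target matrix, the following is a minimal sketch and not the implementation used in the thesis. It assumes that missing affinities are encoded as NaN, that a multi-task model has already produced a full matrix of predictions, and that training uses a squared-error loss restricted to the measured entries. The function names `masked_mse` and `impute` and the toy values are hypothetical.

```python
import numpy as np

def masked_mse(y_true, y_pred):
    """Mean squared error over measured entries only (NaN marks a missing label)."""
    observed = ~np.isnan(y_true)
    if not observed.any():
        return 0.0
    # Missing entries contribute zero error, so they do not influence training.
    diff = np.where(observed, y_pred - y_true, 0.0)
    return float((diff ** 2).sum() / observed.sum())

def impute(y_true, y_pred):
    """Keep measured values and fill the gaps with model predictions."""
    return np.where(np.isnan(y_true), y_pred, y_true)

# Toy matrix: 2 compounds (rows) x 3 targets (columns), made-up affinity values.
y_true = np.array([[6.0, np.nan, 7.5],
                   [np.nan, 5.0, np.nan]])
# Hypothetical predictions from a multi-task model, one output per target.
y_pred = np.array([[6.5, 4.0, 7.5],
                   [5.5, 5.0, 6.0]])

loss = masked_mse(y_true, y_pred)
filled = impute(y_true, y_pred)
```

Because the loss ignores unmeasured entries, every compound can contribute to training even if it was tested on only one of the targets, which is what lets the tasks share information.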
Keywords
Artificial intelligence in drug discovery / Chemical space / Data imputation / Multi-task learning / SLC transporters