Doha Naga

Title of the Doctoral Thesis: Machine learning tools for multivariate early assessment of small molecules in drug discovery and development

Publishing year: 2022

Tags: Drug discovery / drug development / preclinical safety / physiologically based pharmacokinetic modeling / PBPK / machine learning / deep learning


Bringing a drug with optimal efficacy, pharmacokinetics and safety to market requires in-vitro, in-vivo and human testing, which makes the process costly, lengthy and complex. However, the emergence of new computational tools and the accumulation of multivariate drug data within pharmaceutical companies have made it possible to predict testing readouts. The overall goals of this work are to minimize animal use, shorten cycle times and eliminate late-stage failures of molecules. These goals are pursued by leveraging Roche in-house preclinical data and applying machine learning tools to predict important preclinical readouts such as off-target activities, pharmacokinetic (PK) parameters and in-vivo toxicity findings. Off-target interactions have been linked to many adverse events, so it is important to predict them early in the drug discovery pipeline. Study 1 presents an open-source machine learning tool that predicts the off-target activities of compounds from their chemical structure. The tool incorporates and compares various machine learning approaches. The models were built on data from an in-house panel of 50 off-targets, whose datasets varied in size and hit percentage. High to moderate model performance was observed for the majority of the targets (~40), and poor performance for the remaining few. The challenges behind poor model performance, such as data imbalance and scarcity, are discussed together with techniques for adapting the models to them. The presented off-target modelling tool can guide chemists before they design chemical structures and can thus enable early detection of adverse events that might be caused by drug promiscuity. Moreover, the tool is easy to use and requires minimal programming expertise. Drug pharmacokinetics is another essential determinant of a drug’s success in preclinical testing.
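As an illustration of why data imbalance complicates the evaluation of off-target models in Study 1, the following minimal sketch compares plain accuracy with two imbalance-aware metrics. The choice of metrics, the 5% hit rate and all function names are assumptions made for illustration; the abstract does not state which metrics or thresholds the thesis used.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks an off-target hit."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; 0.5 means no better than chance."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical target with a 5% hit rate: 5 hits among 100 compounds.
y_true = [1] * 5 + [0] * 95
# A trivial "model" that labels every compound as inactive.
always_inactive = [0] * 100

print("accuracy:", accuracy(y_true, always_inactive))
print("balanced accuracy:", balanced_accuracy(y_true, always_inactive))
print("MCC:", mcc(y_true, always_inactive))
```

In this hypothetical case the trivial model scores high on plain accuracy while both imbalance-aware metrics show that it has no predictive value, which is why targets with low hit percentages need careful evaluation.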
In Study 2, the prediction success of rat in-vivo oral (PO) and intravenous (IV) PK parameters was evaluated for a large, diverse dataset of compounds using a bottom-up PK approach. The approach combined in-vitro and in-silico inputs with Physiologically Based Pharmacokinetic (PBPK) models and applied several clearance scaling approaches, which differed in their hepatocyte intrinsic clearance and protein binding inputs. Successful predictions were shown for IV and PO parameters using conventional scaling methods (e.g. direct scaling and dilution), while less success was seen for the machine learning method, which was attributed to clearance mispredictions. Simulation times were reduced significantly by using High Throughput PBPK (HT-PBPK) models, which produced results comparable to those of full PBPK models. In pursuit of a goal similar to that of Study 1, this work can help guide chemists in the early design of molecules and in prioritizing molecules with favorable pharmacokinetic properties, in addition to minimizing animal pharmacokinetic studies. More robust and tailored clearance models are required to further improve the prediction success of PK parameters by PBPK approaches. A third critical step in preclinical testing is in-vivo toxicity assessment. Toxicity studies are usually divided by duration into short-, medium- and long-term studies. In accordance with the Replacement, Reduction and Refinement (3Rs) principle, the goal of Study 3 was to explore opportunities to minimize long-term studies. A statistical comparison was therefore performed between the adverse events observed in short-term (~116) and long-term (~78) studies to assess whether short-term studies can serve as a predictor of long-term adverse events.
Good concordance was seen between short-term and long-term adverse findings for all large molecules and the majority of small molecules in terms of No Observed Adverse Effect Level (NOAEL) changes and overall adversity. The work is presented as an open-source analytical framework so that scientists can reproduce it and build upon it. To conclude, deploying the tools described above in the preclinical testing pipeline can potentially contribute to shortening testing time, minimizing costs and, most importantly (especially for Studies 2 and 3), decreasing animal use. However, shifting the current drug discovery and development paradigm towards a more automated and computational process still requires considerable effort, and similar work should be considered on a larger scale, for example within future cross-institutional efforts.
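To make the clearance scaling step of Study 2 more concrete, the sketch below scales a hepatocyte intrinsic clearance to the whole liver and applies the well-stirred liver model, a standard textbook model for hepatic clearance. It is not taken from the thesis: the function names, the rat physiological values (hepatocellularity, liver weight, hepatic blood flow) and the single binding term `fu` are illustrative assumptions. Setting `fu` to 1 ignores binding altogether, which is how direct scaling is commonly described; the exact definitions of the direct scaling and dilution methods used in the thesis are not given in this abstract.

```python
def scale_clint(clint_ul_min_per_million_cells,
                hepatocellularity_million_cells_per_g=120.0,  # assumed rat value
                liver_weight_g_per_kg=40.0):                  # assumed rat value
    """Scale in-vitro hepatocyte CLint (uL/min/10^6 cells) to in-vivo CLint (mL/min/kg)."""
    ul_per_min_per_kg = (clint_ul_min_per_million_cells
                         * hepatocellularity_million_cells_per_g
                         * liver_weight_g_per_kg)
    return ul_per_min_per_kg / 1000.0  # uL -> mL


def well_stirred_clearance(clint_ml_min_kg,
                           hepatic_blood_flow_ml_min_kg=55.0,  # assumed rat value
                           fu=1.0):
    """Hepatic clearance from the well-stirred model: Q * fu * CLint / (Q + fu * CLint).

    fu is the unbound fraction used as the binding correction; fu = 1.0 ignores binding.
    """
    q = hepatic_blood_flow_ml_min_kg
    return q * fu * clint_ml_min_kg / (q + fu * clint_ml_min_kg)


# Hypothetical compound with an in-vitro CLint of 10 uL/min/10^6 cells.
clint_in_vivo = scale_clint(10.0)
cl_no_binding = well_stirred_clearance(clint_in_vivo)             # binding ignored
cl_with_binding = well_stirred_clearance(clint_in_vivo, fu=0.1)   # assumed 10% unbound

print(f"scaled CLint: {clint_in_vivo:.1f} mL/min/kg")
print(f"hepatic CL, binding ignored: {cl_no_binding:.2f} mL/min/kg")
print(f"hepatic CL, fu = 0.1: {cl_with_binding:.2f} mL/min/kg")
```

The predicted hepatic clearance can never exceed hepatic blood flow in this model, and the choice of binding input changes the prediction substantially, which illustrates why scaling approaches that differ in their intrinsic clearance and protein binding inputs can give different PK predictions.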