Using Jupyter Notebooks for re-training machine learning models


Machine learning (ML) approaches have become markedly more important in drug discovery and in silico toxicity prediction in recent years. As the amount of available toxicity data has grown, ML has become an essential part of the drug discovery pipeline.

Smajić, A., Grandits, M. & Ecker, G.F. Using Jupyter Notebooks for re-training machine learning models. J Cheminform 14, 54 (2022).



Machine learning (ML) models require an extensive, user-driven selection of molecular descriptors in order to learn from chemical structures and predict actives and inactives with high reliability. In addition, privacy concerns often restrict access to sufficient data, leading to models that cover only a narrow chemical space. Therefore, we propose a framework of re-trainable models that can be transferred from one local instance to another and that require a less extensive descriptor selection. The models are shared via a Jupyter Notebook, which allows a broader chemical space to be evaluated and incorporated while most of the tunable parameters remain pre-defined. This enables the models to be updated in a decentralized, facile, and fast manner. Herein, the method was evaluated with six transporter datasets (BCRP, BSEP, OATP1B1, OATP1B3, MRP3, P-gp), which revealed the general applicability of this approach.
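The idea of passing a model, rather than the underlying data, from one site to another and re-training it there can be illustrated with a minimal Python sketch. This is not the authors' notebook or algorithm: the `CentroidModel` class, the nearest-centroid rule, the two-value descriptor vectors, and the site names are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CentroidModel:
    """Toy re-trainable classifier (hypothetical, not the published models).

    It stores only per-class descriptor sums and counts, so it can be
    shared and updated without exposing the training compounds.
    """
    n_features: int
    sums: dict = field(default_factory=dict)    # label -> per-feature sums
    counts: dict = field(default_factory=dict)  # label -> compounds seen

    def partial_fit(self, X, y):
        """Fold new labelled descriptor vectors into the stored statistics."""
        for row, label in zip(X, y):
            if len(row) != self.n_features:
                raise ValueError("descriptor length does not match the model")
            acc = self.sums.setdefault(label, [0.0] * self.n_features)
            for i, value in enumerate(row):
                acc[i] += value
            self.counts[label] = self.counts.get(label, 0) + 1
        return self

    def predict(self, X):
        """Assign each vector to the class with the nearest centroid."""
        centroids = {
            label: [s / self.counts[label] for s in acc]
            for label, acc in self.sums.items()
        }

        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(centroids, key=lambda c: sq_dist(row, centroids[c]))
                for row in X]


# Site A: train on local (made-up) data, then serialize the model only.
site_a_X = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
site_a_y = ["inactive", "inactive", "active", "active"]
model = CentroidModel(n_features=2).partial_fit(site_a_X, site_a_y)
payload = json.dumps(asdict(model))

# Site B: load the shared model and re-train it on its own compounds.
shared = CentroidModel(**json.loads(payload))
site_b_X = [[3.0, 3.0], [3.2, 2.8]]
site_b_y = ["active", "active"]
shared.partial_fit(site_b_X, site_b_y)

print(shared.counts)
print(shared.predict([[0.1, 0.1], [2.5, 2.5]]))
```

The sketch serializes to JSON rather than `pickle` because loading a pickle file from another party can execute arbitrary code; a real pipeline would need a format that fits its actual model type.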


This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No 777365 (eTRANSAFE). This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. The Pharmacoinformatics Research Group (Ecker lab) acknowledges funding provided by the Austrian Science Fund FWF AW012321 MolTag.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Schematic overview of the descriptor analysis carried out for both ABC and SLC transporters
