COVER: conformational oversampling as data augmentation for molecules

24.03.2020

Publication in Journal of Cheminformatics: we are very happy that this open access article has now been published. There will be a lot to celebrate once the #pharminfo group can meet again.

Keywords: Deep learning, Toxicity, Imbalanced learning, Upsampling


Congratulations go to Jennifer Hemmerich, who is receiving funding from Moltag, and to our former colleague Ece Asilar. Well done!

Abstract

Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for prediction of toxicity. Conformational oversampling enhances a dataset by generation of multiple conformations of a molecule. These conformations can be used to balance, as well as oversample a dataset, thereby increasing the dataset size without the need of artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.
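Here is a minimal Python sketch of how conformational oversampling could balance a toxicity dataset. It is not the authors' pipeline. The function `conformational_oversampling`, its `samples_per_class` parameter and the pluggable `generate_conformers` callback are hypothetical names. The placeholder generator in the example returns strings instead of 3D structures. In practice the callback could wrap a real conformer generator such as RDKit's `AllChem.EmbedMultipleConfs`.

```python
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple


def conformational_oversampling(
    molecules: Sequence[str],
    labels: Sequence[Hashable],
    samples_per_class: int,
    generate_conformers: Callable[[str, int], List[object]],
) -> List[Tuple[str, object, Hashable]]:
    """Hypothetical sketch: balance and oversample a dataset with conformations.

    Each class is expanded to `samples_per_class` entries (assuming the
    generator returns as many conformations as requested). Each entry is
    (molecule, conformation, label). No synthetic molecules are created,
    only extra conformations of molecules already in the dataset.
    """
    if len(molecules) != len(labels):
        raise ValueError("molecules and labels must have the same length")
    counts = Counter(labels)
    if samples_per_class < max(counts.values()):
        raise ValueError("samples_per_class must be >= size of the largest class")

    # Group molecules by class label, preserving input order.
    by_class = defaultdict(list)
    for mol, label in zip(molecules, labels):
        by_class[label].append(mol)

    dataset = []
    for label, mols in by_class.items():
        # Spread the target evenly: every molecule gets `base` conformations,
        # and the first `extra` molecules get one more. Molecules of a small
        # (minority) class therefore receive more conformations each.
        base, extra = divmod(samples_per_class, len(mols))
        for i, mol in enumerate(mols):
            n_conf = base + (1 if i < extra else 0)
            for conf in generate_conformers(mol, n_conf):
                dataset.append((mol, conf, label))
    return dataset


if __name__ == "__main__":
    def fake_conformers(smiles: str, n: int) -> List[str]:
        # Placeholder: a real generator would return 3D conformations.
        return [f"{smiles}#conf{k}" for k in range(n)]

    mols = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
    labels = [0, 0, 0, 1]  # three inactive molecules, one toxic molecule
    data = conformational_oversampling(mols, labels, 6, fake_conformers)
    print(Counter(label for _, _, label in data))
```

Raising `samples_per_class` above the size of the largest class oversamples every class. Minority-class molecules still receive proportionally more conformations, so the classes stay balanced. Real conformer embedding can fail or return fewer conformations than requested, so a production version would need to check the returned counts.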

Please find the article at https://doi.org/10.1186/s13321-020-00420-z
