COVER: conformational oversampling as data augmentation for molecules

24.03.2020

Publication in the Journal of Cheminformatics: we are very happy that this open access article has now been published. A lot to celebrate once the #pharminfo group can meet again.

Keywords: Deep learning, Toxicity, Imbalanced learning, Upsampling

Congratulations go to Jennifer Hemmerich, who is receiving funding from Moltag, and our former colleague Ece Asilar. Well done!

Abstract

Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for prediction of toxicity. Conformational oversampling enhances a dataset by generation of multiple conformations of a molecule. These conformations can be used to balance, as well as oversample a dataset, thereby increasing the dataset size without the need of artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.
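To make the balancing idea in the abstract more concrete, here is a minimal sketch of the bookkeeping involved. It is not the authors' implementation: the function names `conformers_per_class` and `expand_dataset`, the `oversample_factor` parameter, and the toy molecule identifiers are all assumptions made for illustration, and the actual 3D conformer generation is replaced by a placeholder index.

```python
import math
from collections import Counter


def conformers_per_class(labels, oversample_factor=1):
    """Return how many conformers to generate per molecule in each class.

    Hypothetical helper: each class is brought up to (at least) the size of
    the majority class, multiplied by `oversample_factor`.
    """
    counts = Counter(labels)
    target = max(counts.values()) * oversample_factor
    # Rounding up means a class can slightly overshoot the target.
    return {cls: math.ceil(target / n) for cls, n in counts.items()}


def expand_dataset(molecules, labels, oversample_factor=1):
    """Replicate each molecule once per planned conformer."""
    plan = conformers_per_class(labels, oversample_factor)
    expanded = []
    for mol, label in zip(molecules, labels):
        for conf_idx in range(plan[label]):
            # Placeholder: a real pipeline would generate a distinct 3D
            # conformation here and compute descriptors from it.
            expanded.append((mol, conf_idx, label))
    return expanded


# Toy data (assumed): 2 toxic and 6 non-toxic molecules.
molecules = ["mol_a", "mol_b", "mol_c", "mol_d",
             "mol_e", "mol_f", "mol_g", "mol_h"]
labels = ["toxic", "toxic"] + ["nontoxic"] * 6

balanced = expand_dataset(molecules, labels)
print(Counter(label for _, _, label in balanced))
```

With `oversample_factor=1` the minority class is only balanced against the majority class. A larger factor also multiplies the majority class, which corresponds to enlarging the whole dataset rather than just balancing it.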

Please find the article at https://doi.org/10.1186/s13321-020-00420-z
