COVER: conformational oversampling as data augmentation for molecules

Author(s)
Jennifer Hemmerich, Ece Asilar, Gerhard F. Ecker
Abstract

Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for the prediction of toxicity. Conformational oversampling enhances a dataset by generating multiple conformations of each molecule. These conformations can be used to balance as well as oversample a dataset, thereby increasing its size without the need for artificial samples. We show that conformational oversampling facilitates the training of neural networks and provides state-of-the-art results on the Tox21 dataset.
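The balancing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `base` parameter, and the 9:1 class ratio are hypothetical, and the actual generation of 3D conformations (which a real pipeline would delegate to a cheminformatics toolkit) is represented only by a conformer index. The sketch assumes that balancing means giving each molecule of a smaller class more conformers, so that all classes contribute a similar number of samples.

```python
from collections import Counter


def conformers_per_class(labels, base=1):
    """Map each class label to a conformer count per molecule.

    Assumption: the majority class gets `base` conformers per molecule and
    every other class gets enough conformers to reach at least the same total.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    target = base * max(counts.values())
    # Ceiling division so that no class ends up below the target total.
    return {label: (target + n - 1) // n for label, n in counts.items()}


def expand_dataset(records, plan):
    """Expand (mol_id, label) pairs into (mol_id, conformer_index, label) rows.

    Each row stands for one conformer; the index is a placeholder for the
    3D structure that a conformer generator would produce.
    """
    return [
        (mol_id, conf_idx, label)
        for mol_id, label in records
        for conf_idx in range(plan[label])
    ]


if __name__ == "__main__":
    labels = [0] * 900 + [1] * 100  # hypothetical 9:1 class imbalance
    plan = conformers_per_class(labels, base=2)
    print(plan)  # {0: 2, 1: 18}
```

With `base=1` the plan only balances the classes; a larger `base` additionally oversamples the whole dataset, which corresponds to the two uses named in the abstract. Every added sample is a conformation of a real molecule, so no artificial molecules are introduced.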

Organisation(s)
Journal
Journal of Cheminformatics
Volume
12
No. of pages
12
ISSN
1758-2946
DOI
https://doi.org/10.1186/s13321-020-00420-z
Publication date
12-2020
Peer reviewed
Yes
Austrian Fields of Science 2012
102019 Machine learning
Portal url
https://ucris.univie.ac.at/portal/en/publications/cover-conformational-oversampling-as-data-augmentation-for-molecules(3a45ddca-cd3d-4031-8152-3422fd4c0e11).html