COVER: conformational oversampling as data augmentation for molecules
- Author(s)
- Jennifer Hemmerich, Ece Asilar, Gerhard F. Ecker
- Abstract
Training neural networks on small, imbalanced datasets often leads to overfitting and to neglect of the minority class. Predictive toxicology, however, requires models with a good balance between sensitivity and specificity. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for toxicity prediction. Conformational oversampling enhances a dataset by generating multiple conformations of a molecule. These conformations can be used both to balance and to oversample a dataset, thereby increasing the dataset size without the need for artificial samples. We show that conformational oversampling facilitates the training of neural networks and provides state-of-the-art results on the Tox21 dataset.
- Journal
- Journal of Cheminformatics
- Volume
- 12
- No. of pages
- 12
- ISSN
- 1758-2946
- DOI
- https://doi.org/10.1186/s13321-020-00420-z
- Publication date
- 03-2020
- Peer reviewed
- Yes
- Austrian Fields of Science 2012
- 102019 Machine learning
- ASJC Scopus subject areas
- Library and Information Sciences, Computer Science Applications, Physical and Theoretical Chemistry, Computer Graphics and Computer-Aided Design
- Portal url
- https://ucrisportal.univie.ac.at/en/publications/cover-conformational-oversampling-as-data-augmentation-for-molecules(3a45ddca-cd3d-4031-8152-3422fd4c0e11).html
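The balancing idea described in the abstract can be illustrated with a minimal sketch. It only computes how many conformations per molecule each class would need so that all classes end up with roughly the same number of samples. The function name `conformers_per_class`, the `majority_confs` parameter, and the round-up rule are assumptions made for illustration and are not taken from the paper. Generating the conformations themselves (for example with a cheminformatics toolkit) is deliberately left out.

```python
from collections import Counter


def conformers_per_class(labels, majority_confs=1):
    """Return {class_label: conformations to generate per molecule}.

    Hypothetical helper: the largest class gets `majority_confs`
    conformations per molecule, and every other class gets enough
    conformations (rounded up) to reach at least the same total.
    """
    counts = Counter(labels)
    if not counts:
        return {}
    # Target number of samples per class after oversampling.
    target = max(counts.values()) * majority_confs
    # Integer ceiling division avoids floating-point rounding issues.
    return {cls: -(-target // n) for cls, n in counts.items()}


# Toy example: 90 inactive (0) and 10 active (1) molecules.
labels = [0] * 90 + [1] * 10
plan = conformers_per_class(labels, majority_confs=2)
print(plan)  # {0: 2, 1: 18}
```

With `majority_confs=2`, both classes end up with 180 samples: the dataset is balanced and doubled in size, and every sample is a conformation of a real molecule rather than a synthetic data point.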