Rita Schwaha

Title of the Doctoral Thesis: Similarity based classification studies for prediction of ABCB1 (P-glycoprotein) substrates and non-substrates.

Publishing year: 2013

Tags: ABCB1 / classification / random forest / similarity based / SIBAR / support vector machine / Binary QSAR


Abstract

ABC (ATP-binding cassette) transporters are membrane-bound, ATP-dependent efflux pumps, the most prominent member being ABCB1, or P-glycoprotein. This protein is located at important interfaces such as the blood-brain barrier, the gut wall mucosa, the placenta and the hepatobiliary pathways, among others, and consequently plays a major role in drug-drug interactions and multi-drug resistance. For this reason, recognising substrate properties and reliably labelling non-substrates are becoming increasingly important. In-house similarity-based descriptors (SIBAR) were previously developed with a focus on poly-specific proteins like ABCB1. These descriptors depend on a set of reference compounds and consist of the Euclidean distances calculated between the descriptor values of the reference set and those of the training and test sets. The number of final descriptors therefore depends on the number of compounds in the reference set. Four different reference sets were derived and their usability is discussed. The first three are based on Tudor Oprea's chemography idea of satellite structures on the fringes of chemical space. The fourth is based on in-house results that favour a reference set tailored to the training set. The focus of this work was the exploration of the 3D usability of the SIBAR descriptors and the impact of shape similarity, based on a consistent data set. The establishment of a suitable reference set for an ABCB1 classification model is also discussed on the basis of 240 highly diverse natural compounds. To achieve this goal, a variety of descriptor types and machine learning approaches were applied. The descriptor types include 2D descriptors, Labute's VSA descriptors, 3D autocorrelation descriptors and VolSurf descriptors. To further explore the concept of shape similarity, the parameters that the program ROCS (OpenEye) derives from the shape overlay of two molecules were also used as descriptors.
The machine learning approaches primarily comprise binary QSAR, support vector machines and random forests. The results show that 2D descriptors compare well with 3D or shape-based methods; in particular, the VSA descriptors yielded the best model so far, with an overall accuracy of 83%. Reference set B is the preferred reference set when a quick overview is needed. For a more detailed analysis of one's data, a more time-consuming tailored reference set is the reference set of choice. The overall results were satisfactory, and some of the ROCS parameters in particular showed good performance.
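
The construction of the SIBAR descriptors described above (one Euclidean distance per reference compound) can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `sibar_descriptors`, the toy descriptor values and the absence of any descriptor scaling are assumptions made purely for illustration.

```python
import math


def sibar_descriptors(compounds, reference_set):
    """Return one row per compound and one column per reference compound.

    Each entry is the Euclidean distance between the descriptor vector of a
    compound and that of a reference compound. Illustrative sketch only:
    any scaling or normalisation of the raw descriptors is assumed to have
    been done beforehand.
    """
    rows = []
    for compound in compounds:
        for reference in reference_set:
            if len(compound) != len(reference):
                raise ValueError("descriptor vectors must have equal length")
        rows.append([math.dist(compound, reference) for reference in reference_set])
    return rows


# Hypothetical toy data: two descriptor values per compound.
reference_set = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
compounds = [[0.0, 0.0], [3.0, 0.0]]

sibar = sibar_descriptors(compounds, reference_set)
for row in sibar:
    print([round(value, 3) for value in row])
```

Each compound ends up with as many SIBAR descriptors as there are reference compounds (three in this toy case), which matches the statement that the number of final descriptors depends on the size of the reference set. The resulting matrix could then serve as input to a classifier such as a random forest or a support vector machine.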