Analoging in large databases with structural fingerprint features

  • Djork-Arné Clevert (Speaker)

Activity: Talk or presentation, Contributed talk, unknown

Description

Analogs share a similar bioactivity with a given lead compound and are vital for drug design, helping to improve the final product in terms of efficacy, toxicity, side effects, bacterial resistance, and other limitations or optimization criteria. The Structure–Activity Relationship (SAR) principle states that structurally similar molecules have similar activities. Here we propose a method that exploits gene expression data to derive a subset of structural fingerprint features indicative of a gene of interest. Using these fingerprint features, a Support Vector Machine is trained and afterwards used to identify analogs in a large database. To obtain reasonable results, two points are crucial: the fingerprint features relevant to the bioactivity must be selected, and the method has to be fast enough to scale to large databases such as ChEMBL. Both requirements are fulfilled by the Potential Support Vector Machine (P-SVM). To avoid selecting features stemming from possible compound outliers, we have defined a robust feature selection protocol based on leave-one-out cross-validation and feature ranking. We briefly introduce the P-SVM, focusing on its feature selection capabilities and characteristics, and present the robust feature selection protocol. Using an example from a gene expression study with 62 compounds and 3200 structural fingerprint features, we show the results of a ChEMBL analog search.
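The abstract does not spell out the robust feature selection protocol, so the following is only a minimal sketch of the general idea under stated assumptions: in each leave-one-out fold the features are ranked, and a feature is kept only if it ranks among the top k in (nearly) all folds, so that a feature driven by a single outlier compound drops out. The ranking score used here (absolute Pearson correlation with the expression value) is a simple stand-in and is not the P-SVM; the function names and the parameters `k` and `min_fraction` are hypothetical.

```python
from collections import Counter
from math import sqrt


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def top_k_features(fingerprints, response, k):
    """Rank features by score (stand-in for the P-SVM ranking) and
    return the indices of the k best as a set."""
    n_features = len(fingerprints[0])
    scores = [
        abs_correlation([row[j] for row in fingerprints], response)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def robust_feature_selection(fingerprints, response, k, min_fraction=1.0):
    """Keep features that rank in the top k in at least `min_fraction`
    of the leave-one-out folds (assumed protocol, not the authors')."""
    n = len(fingerprints)
    counts = Counter()
    for left_out in range(n):
        # Drop one compound, then rank features on the remaining ones.
        X = [row for i, row in enumerate(fingerprints) if i != left_out]
        y = [v for i, v in enumerate(response) if i != left_out]
        counts.update(top_k_features(X, y, k))
    return sorted(j for j, c in counts.items() if c >= min_fraction * n)
```

Here `fingerprints` is a list of per-compound bit lists and `response` the matching expression values for the gene of interest. A fingerprint bit set in only one compound becomes constant in the fold that leaves that compound out, so it cannot reach the top k there and is excluded when `min_fraction=1.0`.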
Period: 25 Sept 2012
Event title: Non-Clinical Statistics Conference 2012
Event type: Conference
Location: Germany

Fields of science

  • 106005 Bioinformatics
  • 305 Other Human Medicine, Health Sciences
  • 102018 Artificial neural networks
  • 102 Computer Sciences
  • 106041 Structural biology
  • 101029 Mathematical statistics
  • 106023 Molecular biology
  • 106013 Genetics
  • 106002 Biochemistry
  • 102001 Artificial intelligence
  • 101004 Biomathematics
  • 102015 Information systems

JKU Focus areas

  • Computation in Informatics and Mathematics
  • Nano-, Bio- and Polymer-Systems: From Structure to Function