Research
Drug discovery follows a time-consuming and expensive pipeline that today explores the chemical space of potential drugs mainly through wet-lab experiments and database searches. Deep learning methods are widely expected to simplify this process.
We believe that neuro-explicit approaches are key to substantial advances in drug discovery. Our research focuses on hybrid approaches in which domain knowledge is integrated into neural learning models in various forms. This can significantly improve the generalization capabilities of neural models, allow extrapolation beyond the training data, and thus reduce the amount of data needed for learning.
Publications
2023
Backenköhler, Michael; Kramer, Paula Linh; Groß, Joschka; Großmann, Gerrit; Joeres, Roman; Tagirdzhanov, Azat; Sydow, Dominique; Ibrahim, Hamza; Odje, Floriane; Wolf, Verena; et al.
TeachOpenCADD goes Deep Learning: Open-source Teaching Platform Exploring Molecular DL Applications Working paper
2023.
@workingpaper{teachopencadd-dl,
title = {TeachOpenCADD goes Deep Learning: Open-source Teaching Platform Exploring Molecular DL Applications},
author = {Michael Backenköhler and Paula Linh Kramer and Joschka Groß and Gerrit Großmann and Roman Joeres and Azat Tagirdzhanov and Dominique Sydow and Hamza Ibrahim and Floriane Odje and Verena Wolf and others},
url = {https://chemrxiv.org/engage/chemrxiv/article-details/646b465ff2112b41e9f49997},
doi = {10.26434/chemrxiv-2023-kz1pb},
year = {2023},
date = {2023-05-29},
urldate = {2023-05-29},
abstract = {TeachOpenCADD is a free online platform that offers solutions to common computer-aided drug design (CADD) tasks using Python programming and open-source data and packages. The material is presented through interactive Jupyter notebooks, accommodating users from various backgrounds and programming levels. Due to the tremendous impact of deep learning (DL) methods in drug design, the TeachOpenCADD platform has been expanded to include an introduction to molecular DL tasks. This edition provides an overview of DL and its application in drug design, highlighting the usage of diverse molecular representations in this field. The platform introduces various neural network architectures, including graph neural networks (GNNs), equivariant graph neural networks (EGNNs), and recurrent neural networks (RNNs). It demonstrates how to use these architectures for developing predictive models for molecular property and activity prediction, exemplified by the Quantum Machine 9 (QM9), ChEMBL, and Kinase Inhibitor BioActivity (KiBA) data sets. The DL edition covers methods for evaluating the performance of neural networks using uncertainty estimation. Furthermore, it introduces an application of GNNs for protein-ligand interaction predictions, incorporating protein structure and ligand information. The TeachOpenCADD platform is continuously updated with new content and is open to contributions, bug reports, and questions from the community through its GitHub repository (\url{github.com/volkamerlab/teachopencadd}). It can be used for self-study, classroom instruction, and research applications, accommodating users from beginners to advanced levels.},
keywords = {},
pubstate = {published},
tppubtype = {workingpaper}
}
Volkamer, Andrea; Riniker, Sereina; Nittinger, Eva; Lanini, Jessica; Grisoni, Francesca; Evertsson, Emma; Rodríguez-Pérez, Raquel; Schneider, Nadine
Machine learning for small molecule drug discovery in academia and industry Journal Article
In: Artificial Intelligence in the Life Sciences, vol. 3, pp. 100056, 2023, ISSN: 2667-3185.
@article{VOLKAMER2023100056,
title = {Machine learning for small molecule drug discovery in academia and industry},
author = {Andrea Volkamer and Sereina Riniker and Eva Nittinger and Jessica Lanini and Francesca Grisoni and Emma Evertsson and Raquel Rodríguez-Pérez and Nadine Schneider},
url = {https://www.sciencedirect.com/science/article/pii/S2667318522000265},
doi = {10.1016/j.ailsci.2022.100056},
issn = {2667-3185},
year = {2023},
date = {2023-01-01},
journal = {Artificial Intelligence in the Life Sciences},
volume = {3},
pages = {100056},
abstract = {Academic and pharmaceutical industry research are both key for progresses in the field of molecular machine learning. Despite common open research questions and long-term goals, the nature and scope of investigations typically differ between academia and industry. Herein, we highlight the opportunities that machine learning models offer to accelerate and improve compound selection. All parts of the model life cycle are discussed, including data preparation, model building, validation, and deployment. Main challenges in molecular machine learning as well as differences between academia and industry are highlighted. Furthermore, application aspects in the design-make-test-analyze cycle are discussed. We close with strategies that could improve collaboration between academic and industrial institutions and will advance the field even further.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Born, Jannis; Markert, Greta; Janakarajan, Nikita; Kimber, Talia B.; Volkamer, Andrea; Martínez, María Rodríguez; Manica, Matteo
Chemical representation learning for toxicity prediction Journal Article
In: Digital Discovery, pp. -, 2023.
@article{D2DD00099G,
title = {Chemical representation learning for toxicity prediction},
author = {Jannis Born and Greta Markert and Nikita Janakarajan and Talia B. Kimber and Andrea Volkamer and María Rodríguez Martínez and Matteo Manica},
url = {http://dx.doi.org/10.1039/D2DD00099G},
doi = {10.1039/D2DD00099G},
year = {2023},
date = {2023-01-01},
journal = {Digital Discovery},
pages = {-},
publisher = {RSC},
abstract = {Undesired toxicity is a major hindrance to drug discovery and largely responsible for high attrition rates in early stages. This calls for new, reliable, and interpretable molecular property prediction models that help prioritize compounds and thus reduce the high costs for development and the risk to humans, animals, and the environment. Here, we propose an interpretable chemical language model that combines attention with multiscale convolutions and relies on data augmentation. We first benchmark various molecular representations (e.g., fingerprints, different flavors of SMILES and SELFIES, as well as graph and graph kernel methods) revealing that SMILES coupled with augmentation overall yields the best performance. Despite its simplicity, our model is then shown to outperform existing approaches across a wide range of molecular property prediction tasks, including but not limited to toxicity. Moreover, the attention weights of the model allow for easy interpretation and show enrichment of known toxicophores even without explicit supervision. To introduce a notion of model reliability, we propose and combine two simple methods for uncertainty estimation (Monte-Carlo dropout and test-time-augmentation). These methods not only identify samples with high prediction uncertainty, but also allow formation of implicit model ensembles that improve accuracy. Last, we validate our model on a large-scale proprietary toxicity dataset and find that it outperforms previous work while giving similar insights into revealing cytotoxic substructures.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Bachelor’s/Master’s Thesis Topics
We offer a variety of different topics for Bachelor’s/Master’s theses in the area of deep learning for computer-aided drug discovery and design. If you are interested in doing your thesis with our group, please contact Gerrit Großmann.