.. index:: Introduction

Introduction
============

.. index:: About

About
-----

*CDPKit* (short for *Chemical Data Processing Toolkit*) is an open-source cheminformatics toolkit implemented in C++.
CDPKit comprises a suite of software tools and a programming library called the *Chemical Data Processing Library* (CDPL),
which provides a high-quality and well-tested modular implementation of basic functionality typically required by any
higher-level software application in the field of cheminformatics.
In addition to the CDPL C++ API, an equivalent Python-interfacing layer is provided that makes all of CDPL's functionality
easily accessible from Python code.

.. index:: Key Features

.. rubric:: Key Features

- Data structures for the representation and processing of molecules, chemical reactions and pharmacophores
- Routines for all typical cheminformatics pre-processing tasks (e.g. ring and aromaticity perception, stereochemistry processing, ...)
- Powerful methods for molecule and reaction substructure searching
- Readers/writers for various file formats (MDL Mol, SDF, Rxn, RDF, Mol2, PDB, MMTF, SMILES, SMARTS, etc.) allowing the I/O of small molecule, macromolecular, reaction and pharmacophore data
- Molecule fragmentation algorithms (RECAP :cite:`doi:10.1021/ci970429i`, BRICS :cite:`https://doi.org/10.1002/cmdc.200800178`)
- Generation of molecule and pharmacophore fingerprints (e.g. ECFP :cite:`doi:10.1021/ci100050t`)
- Large collection of implemented chemical structure descriptors
- 2D structure layout and rendering of molecules and reactions
- Gaussian shape-based molecule alignment and descriptor calculation :cite:`https://doi.org/10.1002/(SICI)1096-987X(19961115)17:14<1653::AID-JCC7>3.0.CO;2-K`
- Pharmacophore generation, alignment and screening
- 3D structure and conformer generation :cite:`doi:10.1021/acs.jcim.3c00563`
- Prediction of a wide panel of physicochemical properties
- Full-blown test-suite compliant implementation of the MMFF94 :cite:`https://doi.org/10.1002/(SICI)1096-987X(199604)17:5/6<490::AID-JCC1>3.0.CO;2-P` force field
- Runs without flaws on Linux, macOS and Windows
- C++ implementation follows best practices for a maximum of robustness and speed
- ... and many more ...

.. rubric:: Machine Learning Integration

CDPKit seamlessly integrates with machine learning libraries like `scikit-learn <https://scikit-learn.org>`_,
`PyTorch <https://pytorch.org>`_, and `TensorFlow <https://www.tensorflow.org>`_.
Using CDPKit for tasks such as molecular data I/O, feature extraction and descriptor calculation greatly aids
scientists who intend to build ML models for the prediction of physicochemical properties, biological activity,
site of metabolism, toxicity, and other attributes of potential drug candidates.
An example of such an integration with ML methods is showcased in the source code of the software described in :cite:t:`molecules26206185`.

.. index:: License

License
-------

The CDPKit source code is released under the terms of the `GNU Lesser General Public License (LGPL) V2.1-or-later <https://www.gnu.org/licenses/old-licenses/lgpl-2.1.en.html>`_.
CDPKit documentation is licensed under the terms of the `GNU Free Documentation License (GFDL) V1.2-or-later <https://www.gnu.org/licenses/old-licenses/fdl-1.2.en.html>`_.
Code snippets in tutorials and the source code of CDPL programming examples are distributed under the terms of the `Zero-Clause BSD License (0BSD) <https://opensource.org/license/0bsd>`_.

.. index:: Related Software

Related software
----------------

Examples of software projects using CDPKit functionality:

- `FAME.AL: Site-of-metabolism prediction with active learning <https://github.com/molinfo-vienna/FAME.AL>`_ :cite:`doi:10.1021/acs.jcim.3c01588`
- `Python scripts for the generation of GRAIL datasets <https://github.com/molinfo-vienna/GRAIL-Scripts>`_ :cite:`doi:10.1021/acs.jctc.8b00495`
- `Scripts implementing the Common Hits Approach (CHA) <https://github.com/molinfo-vienna/commonHitsApproach>`_ :cite:`doi:10.1021/acs.jcim.6b00674`
- `Workflow scripts for the generation of receptor-based pharmacophore models (apo2ph4) <https://github.com/molinfo-vienna/apo2ph4>`_ :cite:`ph15091122`
- `Analysis of MD trajectories of ligand-receptor complexes regarding the frequency of observable non-bonding interactions <https://github.com/molinfo-vienna/Ligand-Interaction-Maps>`_
- `Implementation of the QPhAR algorithm <https://github.com/StefanKohlbacher/QuantPharmacophore>`_ :cite:`Kohlbacher2021`

.. index:: Publications

Scientific publications
-----------------------

Published scientific work that relies on CDPKit functionality:

.. bibliography::
   :list: bullet
   :filter: False

   doi:10.1021/acs.jcim.3c01588
   doi:10.1021/acs.jcim.3c00563
   molecules26206185
   doi:10.1021/acs.jcim.2c00814
   ph15091122
   Kohlbacher2021
   doi:10.1021/acs.jctc.8b00495
   doi:10.1021/acs.jcim.6b00674

.. index:: Citing

How to cite
-----------

- *Source code:* Thomas Seidel, *Chemical Data Processing Toolkit source code repository*, https://github.com/molinfo-vienna/CDPKit
- *Documentation:* Thomas Seidel, Oliver Wieder, *Chemical Data Processing Toolkit documentation pages*, https://cdpkit.org

.. index:: People, Authors

People
------

- `Thomas Seidel <https://cheminfo.univie.ac.at/people/senior-scientists/thomas-seidel>`__ (project founder, main developer)
- `Oliver Wieder <https://cheminfo.univie.ac.at/people/post-doctoral-researchers/oliver-wieder>`__ (documentation)