This project uses information from ligand fragments bound to protein subpockets to
(semi-)automatically build new molecules tailored to a particular target pocket,
based on the estimated similarity between the subpockets and the target pocket.
This workflow was tested to design new sub-micromolar hit candidates for CDK8 inhibition:
Eguida M., Schmitt-Valencia C., Hibert M., Villa, P. and Rognan, D. Target-Focused Library Design by Pocket-Applied Computer Vision and Fragment Deep Generative Linking. J. Med. Chem. 2022, 65, 13771–13783. https://doi.org/10.1021/acs.jmedchem.2c00931
Please note that the publication refers to release v1.0.0.
envs/ ---> conda environments
cdk8_structures/ ---> target structures
scripts/ ---> scripts for library generation
aligned_fragments.tgz (downloadable at 10.5281/zenodo.7023191) ---> output data after steps 1-3, input for step 4+
output_files.tgz (downloadable at 10.5281/zenodo.7023191) ---> data obtained at each step, and dependencies
- Conda: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html, with environments using Python 3.6+.
- IChem: http://bioinfo-pharma.u-strasbg.fr/labwebsite/download.html
- ProCare: https://github.com/kimeguida/ProCare
- DeLinker: https://github.com/oxpig/DeLinker
- RDKit: http://www.rdkit.org, https://github.com/rdkit/rdkit
sc-PDB subpockets and fragments
- input structures from the sc-PDB database
- fragmentation, cavity clouds and interactions computed with IChem
For more information on fragmentation, see sc-PDBFrag.
CDK8 pocket
<this_repo>/cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2
ProCare: https://github.com/kimeguida/ProCare (state of our conda env: envs/procare.yml)
source <path_to_your_conda>
conda activate procare
cd aligned_fragments/
python ../scripts/procare_launcher.py -s <subpocket> -t ../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2 --transform --ligandtransform <fragment>
Outputs:
- aligned subpockets, fragments and procare_scores.tsv, available on Zenodo
- subpocket: cfh_xx_fragN_cavity4.mol2
- fragment: cfh_xx_fragN.mol2
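To run the alignment over many subpocket/fragment pairs, the command above can be assembled programmatically. The following is a minimal sketch and not part of the repository scripts. It assumes that each input subpocket file shares the stem of its fragment file plus a `_cavity4` suffix, as the output names suggest; `procare_command` is a hypothetical helper.

```python
from pathlib import Path

# Paths taken from the command above, relative to aligned_fragments/
LAUNCHER = "../scripts/procare_launcher.py"
TARGET = "../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2"


def procare_command(fragment_mol2):
    """Build the argument list aligning one subpocket/fragment pair to the CDK8 cavity."""
    fragment = Path(fragment_mol2)
    # ASSUMPTION: subpocket file name = fragment stem + "_cavity4.mol2"
    subpocket = fragment.with_name(fragment.stem + "_cavity4.mol2")
    return [
        "python", LAUNCHER,
        "-s", str(subpocket),
        "-t", TARGET,
        "--transform",
        "--ligandtransform", str(fragment),
    ]
```

Each returned list can be passed to `subprocess.run(..., check=True)` from inside `aligned_fragments/` once the `procare` environment is active.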
We used the OpenEye Python toolkits (state of our conda env: envs/oepython.yml):
conda deactivate
conda activate oepython
../scripts/convert.py <fragment>.mol2 <fragment>.sdf
Outputs:
- sdf available on zenodo
- fragment: cfh_xx_fragN.sdf
<path_to_your_ichem>/IChem ../cdk8_structures/5hbh_protein.mol2 <fragment>.mol2 > <fragment>.ifp
Outputs:
- interactions available on zenodo
- interactions file: cfh_xx_fragN.ifp
To reproduce this step, the required data from steps 1 to 3 are available at https://zenodo.org/record/7023191 as aligned_fragments.tgz,
and the outputs obtained as output_files.tgz.
tar -xzf aligned_fragments.tgz
cd aligned_fragments/
Current directory: aligned_fragments, containing data from steps 1-3.
assignment of CDK8 areas
conda deactivate
conda activate delinker
(state of our conda env: envs/delinker.yml)
python ../scripts/select_fragments_round1.py -f ../output_files/procare_scores.tsv -d . -p ../cdk8_structures/5hbh_protein.mol2 -c ../cdk8_structures/5hbh_cavityALL_p0-p1-p6.mol2
Outputs:
subpocket_p0_gate.list ---> GA1 (area labels as used in the paper)
subpocket_p0_hinge.list ---> H
subpocket_p0_solv_1.list ---> SE2
subpocket_p0_solv_2.list ---> SE1
subpocket_p6_alphaC.list ---> AC
subpocket_p6_lys52.list ---> GA2
available in <this_repo>/output_files/
To reproduce these steps, the required data are available at https://zenodo.org/record/7023191 as aligned_fragments.tgz,
and the outputs obtained as output_files.tgz.
python ../scripts/linkable_fragments_round1_job.py --hinge subpocket_p0_hinge.list --gate subpocket_p0_gate.list --solv1 subpocket_p0_solv_1.list --solv2 subpocket_p0_solv_2.list --alphac subpocket_p6_alphaC.list --lys52 subpocket_p6_lys52.list
Outputs:
linkable_fragments_round1_<N>.list
with N in {0, 1, 2, 3, 4, 5, 6}, available in <this_repo>/output_files/
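The correspondence between script flags, fragment lists and the area labels used in the paper can be kept in one place. The sketch below is only an illustration: the `AREAS` table and `linkable_fragments_command` are hypothetical names, while the flags, file names and labels are the ones listed in this README.

```python
# flag name -> (fragment list from select_fragments_round1.py, area label in the paper)
AREAS = {
    "hinge": ("subpocket_p0_hinge.list", "H"),
    "gate": ("subpocket_p0_gate.list", "GA1"),
    "solv1": ("subpocket_p0_solv_1.list", "SE2"),
    "solv2": ("subpocket_p0_solv_2.list", "SE1"),
    "alphac": ("subpocket_p6_alphaC.list", "AC"),
    "lys52": ("subpocket_p6_lys52.list", "GA2"),
}


def linkable_fragments_command(areas=AREAS):
    """Build the linkable_fragments_round1_job.py call from the area table."""
    cmd = ["python", "../scripts/linkable_fragments_round1_job.py"]
    for flag, (list_file, _paper_label) in areas.items():
        cmd += [f"--{flag}", list_file]
    return cmd
```

Joining the returned list with spaces reproduces the command shown above.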
DeLinker: https://github.com/oxpig/DeLinker
Feed DeLinker with connectable candidates:
python ../scripts/delinker.py -f linkable_fragments_round1_<N>.list -p <your_path_to_DeLinker>/DeLinker/ > /dev/null
This step was distributed on computer clusters.
Output: generation.smi renamed and zipped as generation_<N>.smi.gz
with N in {0, 1, 2, 3, 4, 5, 6} available in <this_repo>/output_files/
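Since each of the seven lists is processed independently, one command per batch can be prepared and submitted as a separate job. This is a minimal sketch under assumptions: the scheduler used on the clusters is not described here, so the code only builds and optionally runs the commands locally, and `delinker_command` and `run_batch` are hypothetical helpers.

```python
import subprocess

BATCHES = range(7)  # N in {0, 1, 2, 3, 4, 5, 6}


def delinker_command(n, delinker_dir):
    """Build the delinker.py call for batch n."""
    return [
        "python", "../scripts/delinker.py",
        "-f", f"linkable_fragments_round1_{n}.list",
        "-p", delinker_dir,
    ]


def run_batch(n, delinker_dir):
    """Run one batch locally, discarding stdout as in the command above."""
    return subprocess.run(
        delinker_command(n, delinker_dir),
        stdout=subprocess.DEVNULL,
        check=True,
    )
```

On a cluster, each `delinker_command(n, ...)` would typically become the body of one job submission.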
Sometimes, no linker is generated and DeLinker might return truncated attempts.
python ../scripts/get_linker.py --file generation_<N>.smi.gz --fragsdir . --pathdelinker <your_path_to_DeLinker>/DeLinker/
Outputs:
generation_complete.smi
generation_uncomplete.smi
SMILES were assigned IDs to keep track of each molecule's information. Filter will protonate the molecules and generate canonical SMILES that differ from RDKit's.
python ../scripts/index_generated_molecules.py -i generation_complete.smi -o generation_complete_indexed.smi
Output:
generation_complete_indexed.smi
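The idea of indexing can be illustrated with a few lines of Python. This is a sketch only and does not reproduce index_generated_molecules.py: the ID format (`mol_` plus a zero-padded counter) and the layout (ID appended as the last whitespace-separated column) are assumptions.

```python
def index_lines(lines, prefix="mol_", width=7):
    """Append a unique ID to every non-empty record, preserving input order."""
    indexed = []
    counter = 0
    for line in lines:
        record = line.strip()
        if not record:
            continue  # ignore blank lines
        counter += 1
        # ASSUMPTION: the ID is stored as an extra trailing column
        indexed.append(f"{record} {prefix}{counter:0{width}d}")
    return indexed
```

Because the ID travels with the record, a molecule can still be traced after a downstream tool rewrites its SMILES.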
<path_to_openeye>/filter -in generation_complete_indexed.smi -out druglike_molecules.smi -fail druglike_failed_molecules.smi -filter ../output_files/filter_labo_cdk8.txt
Check that other annotations in the file did not affect how Filter processed the SMILES. In our case, we extracted the SMILES and indexes to a separate file molecules.smi.
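Extracting only the SMILES and index columns before running Filter could look like the sketch below. The column positions are assumptions (SMILES first, index last) and should be adapted to the actual file layout; `extract_smiles_and_id` is a hypothetical helper.

```python
def extract_smiles_and_id(lines, smiles_col=0, id_col=-1):
    """Keep only the SMILES and index columns of whitespace-separated records."""
    extracted = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # ASSUMPTION: column positions are given by smiles_col and id_col
        extracted.append(f"{fields[smiles_col]} {fields[id_col]}")
    return extracted
```

Writing the returned records to molecules.smi gives Filter a file with no extra annotations.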
Output:
druglike_molecules.smi
SAscore from https://github.com/rdkit/rdkit/blob/master/Contrib/SA_Score/sascorer.py
python ../scripts/get_sascore.py -i druglike_molecules.smi -o druglike_molecules_sascore.tsv
Output:
druglike_molecules_sascore.tsv
RDKit descriptors
python ../scripts/get_druglike_descriptors.py -i druglike_molecules.smi -o druglike_molecules_descriptors.tsv
python ../scripts/get_linker_descriptors.py -i generation_complete_indexed.smi -o generation_linker_descriptors.tsv
Outputs:
druglike_molecules_descriptors.tsv
generation_linker_descriptors.tsv
Clean the generated linkers: remove those that are too flexible or too hydrophobic.
python ../scripts/filter_linker.py -i generation_linker_descriptors.tsv -o linker_discarded.tsv
Output:
linker_discarded.tsv
python ../scripts/library_round1.py --descriptor druglike_molecules_descriptors.tsv --sascore druglike_molecules_sascore.tsv --discarded linker_discarded.tsv -o libr1.txt
Output:
libr1.txt
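Conceptually, this step combines the three inputs: molecules with computed descriptors, their SA scores, and the list of discarded linkers. The sketch below shows one way such a combination could work and is not the logic of library_round1.py. The in-memory data layout, the `max_sascore` cutoff and the function name are all assumptions, and any descriptor-based criteria applied by the real script are omitted.

```python
def assemble_library(descriptors, sascores, discarded_ids, max_sascore):
    """Return IDs of molecules that are not discarded and pass the SA score cutoff.

    descriptors   : dict mapping molecule ID -> descriptor record
    sascores      : dict mapping molecule ID -> SA score (float)
    discarded_ids : set of molecule IDs whose linker was discarded
    max_sascore   : HYPOTHETICAL cutoff; the value used by the authors is not stated here
    """
    kept = []
    for mol_id in descriptors:
        if mol_id in discarded_ids:
            continue  # linker flagged as too flexible or too hydrophobic
        score = sascores.get(mol_id)
        if score is None or score > max_sascore:
            continue  # no score available, or judged too hard to synthesize
        kept.append(mol_id)
    return kept
```

The three dictionaries and sets would be loaded from druglike_molecules_descriptors.tsv, druglike_molecules_sascore.tsv and linker_discarded.tsv, for example with the standard `csv` module.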
Example of hit compound 12
python ../scripts/round2_fuse_mols.py --dl druglike_molecules.smi --gen generation_complete_indexed.smi --origin ../output_files/frag_origin.tsv --discarded linker_discarded.tsv --procare ../output_files/procare_scores.tsv
Outputs:
hit12_round2_mols.tsv
hit12_round2_sascore_pass.tsv
RDKit descriptors
python ../scripts/get_round2_descriptors.py -i hit12_round2_mols.tsv -o hit12_round2_mols_descriptors.tsv
Output:
hit12_round2_mols_descriptors.tsv
candidates for synthesis
python ../scripts/library_round2.py -i hit12_round2_mols_descriptors.tsv --sascore hit12_round2_sascore_pass.tsv -o libr2.txt
Output:
libr2.txt
https://github.com/kimeguida/POEM/issues
Merveille Eguida: keguida'[at]'unistra'[dot]'fr
Didier Rognan, PhD: rognan'[at]'unistra'[dot]'fr
- Eguida, M.; Schmitt-Valencia, C.; Hibert, M.; Villa, P.; Rognan, D. Target-Focused Library Design by Pocket-Applied Computer Vision and Fragment Deep Generative Linking. J. Med. Chem. 2022, 65, 13771–13783. https://doi.org/10.1021/acs.jmedchem.2c00931
- CACHE hits prediction Challenge #1 https://cache-challenge.org
If you use POEM, please cite:
@article{doi:10.1021/acs.jmedchem.2c00931,
abstract = {We here describe a computational approach (POEM: Pocket Oriented Elaboration of Molecules) to drive the generation of target-focused libraries while taking advantage of all publicly available structural information on protein-ligand complexes. A collection of 31 384 PDB-derived images with key shapes and pharmacophoric properties, describing fragment-bound microenvironments, is first aligned to the query target cavity by a computer vision method. The fragments of the most similar PDB subpockets are then directly positioned in the query cavity using the corresponding image transformation matrices. Lastly, suitable connectable atoms of oriented fragment pairs are linked by a deep generative model to yield fully connected molecules. POEM was applied to generate a library of 1.5 million potential cyclin-dependent kinase 8 inhibitors. By synthesizing and testing as few as 43 compounds, a few nanomolar inhibitors were quickly obtained with limited resources in just two iterative cycles.},
author = {Eguida, Merveille and Schmitt-Valencia, Christel and Hibert, Marcel and Villa, Pascal and Rognan, Didier},
title = {{Target-Focused Library Design by Pocket-Applied Computer Vision and Fragment Deep Generative Linking}},
journal = {Journal of Medicinal Chemistry},
volume = {65},
number = {20},
pages = {13771--13783},
year = {2022},
doi = {10.1021/acs.jmedchem.2c00931},
URL = {https://doi.org/10.1021/acs.jmedchem.2c00931},
}
- RDKit: Open-source cheminformatics; http://www.rdkit.org, https://github.com/rdkit/rdkit
- Da Silva, F.; Desaphy, J.; Rognan, D. IChem: A Versatile Toolkit for Detecting, Comparing, and Predicting Protein–Ligand Interactions. ChemMedChem 2018, 13, 507–510.
- Desaphy, J.; Bret, G.; Rognan, D.; Kellenberger, E. Sc-PDB: A 3D-Database of Ligandable Binding Sites—10 Years On. Nucleic Acids Res. 2014, 43, D399–D404.
- Eguida, M.; Rognan, D. A Computer Vision Approach to Align and Compare Protein Cavities: Application to Fragment-Based Drug Design. J. Med. Chem. 2020, 63, 7127–7142.
- Imrie, F.; Bradley, A. R.; van der Schaar, M.; Deane, C. M. Deep Generative Models for 3D Linker Design. J. Chem. Inf. Model. 2020, 60, 1983–1995.