DrugGEN: Target Centric De Novo Design of Drug Candidate Molecules with Graph Generative Deep Adversarial Networks
DrugGEN-AKT1
This model is designed to generate molecules targeting the human AKT1 protein (UniProt ID: P31749). It was trained on 2,607 bioactive compounds; molecules with more than 45 heavy atoms were excluded.
DrugGEN-CDK2
This model is designed to generate molecules targeting the human CDK2 protein (UniProt ID: P24941). It was trained on 1,817 bioactive compounds; molecules with more than 38 heavy atoms were excluded.
DrugGEN-NoTarget
This is a general-purpose model that generates diverse drug-like molecules without targeting a specific protein. It was trained on a general ChEMBL dataset; molecules with more than 45 heavy atoms were excluded.
- Useful for exploring chemical space, generating diverse scaffolds, and creating molecules with drug-like properties.
For more details, see our paper on arXiv.
Basic Metrics
- Validity: Percentage of generated molecules that are chemically valid
- Uniqueness: Percentage of unique molecules among valid ones
- Runtime: Time taken to generate or evaluate the molecules
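
As a rough illustration of how the validity and uniqueness percentages relate, the sketch below assumes each generated molecule has already been parsed and canonicalized by a cheminformatics toolkit, with `None` marking a molecule that failed to parse. The function name and input convention are hypothetical and are not taken from the DrugGEN code.

```python
from typing import Optional, Sequence, Tuple


def validity_and_uniqueness(
    canonical_smiles: Sequence[Optional[str]],
) -> Tuple[float, float]:
    """Return (validity %, uniqueness %) for a batch of generated molecules.

    Assumption: each entry is a canonical SMILES string, or None if the
    molecule could not be parsed into a chemically valid structure.
    """
    total = len(canonical_smiles)
    valid = [s for s in canonical_smiles if s is not None]
    validity = 100.0 * len(valid) / total if total else 0.0
    # Uniqueness is measured among the valid molecules only.
    uniqueness = 100.0 * len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness


# Illustrative batch: one duplicate and one unparseable molecule.
generated = ["CCO", "CCO", None, "c1ccccc1"]
validity, uniqueness = validity_and_uniqueness(generated)
print(f"Validity: {validity:.1f}%  Uniqueness: {uniqueness:.1f}%")
```

Because uniqueness is computed only over valid molecules, a batch can score high on uniqueness while still having low validity, so the two numbers are best read together.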
Novelty Metrics
- Novelty (Train): Percentage of generated molecules not found in the training set. The training-set molecules are used as inputs to the generator during training.
- Novelty (Inference): Percentage of generated molecules not found in the inference set. The inference-set molecules are used as inputs to the generator during inference.
- Novelty (Real Inhibitors): Percentage of generated molecules not found among the known inhibitors of the target protein (see About DrugGEN Models for details). The known inhibitors are used as inputs to the discriminator during training.
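
The three novelty metrics differ only in the reference set they compare against. A minimal sketch, assuming all molecules are canonical SMILES strings produced by the same canonicalization routine and that novelty is taken over unique generated molecules (both are assumptions, and the `novelty` helper is hypothetical):

```python
from typing import Iterable


def novelty(generated: Iterable[str], reference: Iterable[str]) -> float:
    """Percentage of unique generated molecules absent from the reference set.

    Assumption: string equality of canonical SMILES implies the same molecule.
    """
    generated_set = set(generated)
    if not generated_set:
        return 0.0
    novel = generated_set - set(reference)
    return 100.0 * len(novel) / len(generated_set)


# Toy data for illustration only.
generated = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]
training_set = ["CCO", "CCC"]
real_inhibitors = ["c1ccccc1", "CCN", "CCO"]

print("Novelty (Train):", novelty(generated, training_set))
print("Novelty (Real Inhibitors):", novelty(generated, real_inhibitors))
```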
Structural Metrics
- Average Length: Average number of atoms in the generated molecules, normalized by the maximum number of atoms allowed by the model (45 for AKT1/NoTarget, 38 for CDK2)
- Mean Atom Type: Average number of distinct atom types in the generated molecules
- Internal Diversity: Diversity within the generated set (higher is more diverse)
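
The structural metrics above can be sketched in plain Python. This is an illustration under stated assumptions, not the DrugGEN implementation: fingerprints are represented as sets of on-bit indices, and internal diversity is taken to be one minus the mean pairwise Tanimoto similarity, which is one common definition. All function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Sequence


def normalized_average_length(atom_counts: Sequence[int], max_atoms: int) -> float:
    """Mean atom count divided by the model's maximum atom count."""
    return sum(atom_counts) / (len(atom_counts) * max_atoms)


def mean_atom_types(atom_symbols: Sequence[Sequence[str]]) -> float:
    """Average number of distinct element symbols per molecule."""
    return sum(len(set(symbols)) for symbols in atom_symbols) / len(atom_symbols)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def internal_diversity(fingerprints: Sequence[FrozenSet[int]]) -> float:
    """1 - mean pairwise Tanimoto similarity (assumed definition)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy inputs for illustration only.
print(normalized_average_length([9, 27, 45], max_atoms=45))
print(mean_atom_types([["C", "C", "O"], ["C", "N", "O", "C"]]))
fps = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
print(internal_diversity(fps))
```

With this normalization, an Average Length close to 1 means the generated molecules are near the size limit of the selected model.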
Drug-likeness Metrics
- QED (Quantitative Estimate of Drug-likeness): Score from 0-1 measuring how drug-like a molecule is (higher is better)
- SA Score (Synthetic Accessibility): Score from 1-10 indicating ease of synthesis (lower is better)
Similarity Metrics
- SNN ChEMBL: Similarity to ChEMBL molecules (higher means more similar to known drug-like compounds)
- SNN Real Inhibitors: Similarity to the real inhibitors of the selected target (higher means more similar to the real inhibitors)
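
SNN is commonly read as "similarity to nearest neighbor": for each generated molecule, take its highest Tanimoto similarity to any molecule in the reference set, then average those maxima. The sketch below follows that reading as an assumption, with fingerprints represented as sets of on-bit indices; the helper names are hypothetical.

```python
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def snn(
    generated_fps: Sequence[FrozenSet[int]],
    reference_fps: Sequence[FrozenSet[int]],
) -> float:
    """Mean, over generated molecules, of the max similarity to the reference set."""
    if not generated_fps or not reference_fps:
        return 0.0
    nearest = [
        max(tanimoto(g, r) for r in reference_fps)  # nearest neighbor of g
        for g in generated_fps
    ]
    return sum(nearest) / len(nearest)


# Toy fingerprints for illustration only.
generated_fps = [frozenset({1, 2, 3}), frozenset({7, 8})]
reference_fps = [frozenset({1, 2, 3, 4}), frozenset({8, 9})]
print(snn(generated_fps, reference_fps))
```

The same function serves both metrics: pass ChEMBL fingerprints as the reference for SNN ChEMBL, or the target's known inhibitors for SNN Real Inhibitors.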