Medicinal chemists face many challenges in developing a new drug. First, they typically try to increase the potency of the drug by maximizing the affinity of drug candidates for the target. This process is often guided by structure-activity studies on a variety of ligands to identify the pharmacophore. If the three-dimensional structure of the target is available, the optimization can also be based on that structure. In cases where the potency is controlled by the drug's bioavailability, quantitative structure-activity studies are valuable in guiding the design toward more bioavailable compounds. Second, medicinal chemists try to minimize the drug's toxicity. However, a multitude of factors can lead to toxicity. For example, specific binding to another target molecule, non-specific binding to nucleic acids, general reactivity toward cellular nucleophiles (protein sulfhydryls, DNA bases), production of free radicals, or effects on membranes and on ion potentials across membranes could all contribute to a drug's toxicity. Because of this multitude of effects, toxicity studies are typically done in vivo, using model organisms.
One simple model organism for toxicity studies is the ciliated protozoan Tetrahymena. Its advantages are that it can be easily cultured and that it is well studied biochemically and genetically. For example, one could use Tetrahymena to identify the mechanism of toxicity by controlling the expression of genes that may mediate the toxicity of a compound. The Tetrahymena model is currently widely used in environmental toxicology; because of the differences between humans and ciliated protozoa, it is a less-than-ideal model for predicting drug toxicity in humans. Toxicity is determined as the concentration of the compound that gives 50% growth impairment of Tetrahymena in aqueous medium (IGC50).
For mathematical analysis, the logarithm of the inverse of the toxic concentration, log[IGC50^-1] (that is, log(1/IGC50)), is commonly used. Toxic compounds have a small IGC50 and a large log[IGC50^-1].
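The transformation described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and a base-10 logarithm is assumed (the usual QSAR convention, but the text does not state the base).

```python
import math


def log_inverse_igc50(igc50):
    """Return log10(1/IGC50) for a positive IGC50 (base 10 is an assumption)."""
    if igc50 <= 0:
        raise ValueError("IGC50 must be a positive concentration")
    return -math.log10(igc50)


def igc50_from_log_inverse(log_inverse):
    """Invert the transformation: recover IGC50 from log10(1/IGC50)."""
    return 10.0 ** (-log_inverse)


if __name__ == "__main__":
    # A lower toxic concentration gives a larger log(1/IGC50).
    for concentration in (100.0, 1.0, 0.01):
        print(concentration, log_inverse_igc50(concentration))
```

Because the logarithm is taken of the inverse, a compound that impairs growth at a tenfold lower concentration scores exactly one unit higher.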
To predict the toxicity of drugs before they are synthesized, the factors that correlate with toxicity within the class of compounds must be determined. In the absence of an a priori known mechanism of toxicity, such factors can be elucidated from a quantitative structure-activity relationship (QSAR), in which the correlation of various factors with toxicity is assessed via statistical data analysis. The data sets used to identify the molecular descriptors that contribute to toxicity tend to be large (hundreds of compounds) for two reasons. First, many factors may contribute to the toxicity simultaneously, so the QSAR model is expected to contain many independent variables, possibly related to the toxicity in a non-linear manner. Second, valid statistical analysis requires a sufficient number of data points per descriptor. For example, the TETRATOX database currently contains Tetrahymena growth impairment data for over 2,400 organic compounds. In this homework, you will simulate a QSAR analysis of Tetrahymena toxicity using a small training data set that has been obtained from a larger data set by (i) eliminating compounds that were too similar to retained compounds, (ii) eliminating gross outliers, and (iii) eliminating molecules for which different hydrophobicity measures diverged too much. The large data set is published in "Identification of reactive toxicants: Structure–activity relationships for amides" by T.W. Schultz, J.W. Yarbrough and S.K. Koss. The molecules here are smaller than real drugs, allowing for rapid computational analysis. Students in the course may access this paper from here.
In their work, Schultz and co-workers linked the toxicity to two molecular descriptors: hydrophobicity and electrophilicity. The hydrophobicity is numerically characterized by the logarithm of the 1-octanol/water partition coefficient (logP, or logKow). This property can be determined experimentally or predicted with some reliability using various fragment-based prediction schemes. Free online services (e.g. the Virtual Computational Chemistry Laboratory, VCCL) can provide hydrophobicity estimates for many molecules. In this assignment you will use the same hydrophobicity estimates as were used in the original paper. You can either look these up in Schultz's paper or use the VCCL service (pick the experimental values if available, otherwise use the KOWWIN estimates). The electrophilicity is numerically characterized by the energy of the lowest unoccupied molecular orbital (LUMO). The LUMO energy can be predicted using quantum mechanics (QM). However, it is well known that the results of QM calculations depend on the approximations used to make the calculations tractable. The authors used the AM1 semiempirical model. One of your tasks is to assess how the LUMO energies at the AM1 level compare with LUMO energies calculated at the ab initio SCF level using moderately large basis sets.
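As a rough illustration of how the two descriptors above could enter a QSAR, the sketch below fits log[IGC50^-1] = a*logP + b*E_LUMO + c by ordinary least squares. The linear form is an assumption made here for illustration, and the function names are hypothetical. The demonstration numbers are synthetic values generated from arbitrary coefficients; they are not data from the paper.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptors are collinear; the fit is not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_two_descriptor_model(log_p, e_lumo, toxicity):
    """Least-squares coefficients [a, b, c] of toxicity = a*logP + b*E_LUMO + c."""
    n = len(toxicity)
    if not (len(log_p) == len(e_lumo) == n) or n < 3:
        raise ValueError("need at least three compounds and equal-length inputs")
    rows = [(x1, x2, 1.0) for x1, x2 in zip(log_p, e_lumo)]
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, toxicity)) for i in range(3)]
    return solve_linear_system(ata, aty)


def r_squared(log_p, e_lumo, toxicity, coeffs):
    """Coefficient of determination of the fitted model (toxicity must vary)."""
    a, b, c = coeffs
    predicted = [a * x1 + b * x2 + c for x1, x2 in zip(log_p, e_lumo)]
    mean_y = sum(toxicity) / len(toxicity)
    ss_res = sum((y - p) ** 2 for y, p in zip(toxicity, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in toxicity)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic, made-up descriptor values (NOT from the paper).
    demo_log_p = [-1.0, -0.5, 0.0, 0.5, 1.0]
    demo_e_lumo = [1.5, 0.2, 1.1, 0.4, 0.9]
    demo_tox = [0.5 * p - 0.3 * e - 1.0 for p, e in zip(demo_log_p, demo_e_lumo)]
    coeffs = fit_two_descriptor_model(demo_log_p, demo_e_lumo, demo_tox)
    print(coeffs, r_squared(demo_log_p, demo_e_lumo, demo_tox, coeffs))
```

With real data you would pass the logP values, the LUMO energies from one QM level, and the measured log[IGC50^-1] values. Repeating the fit with LUMO energies from a different QM level is one way to see how sensitive the model is to that choice.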
The following table lists 13 amides for which toxicity has been experimentally determined by Schultz and co-workers. The compounds are ranked from the least toxic to the most toxic. IGC50 is given in mM.
| Name of the compound | CAS number | SMILES string | log[IGC50^-1] |
|---|---|---|---|
| Acetamide | 60-35-5 | CC(=O)N | -2.32 |
| Propionamide | 79-05-0 | CCC(=O)N | -2.09 |
| n-Butyramide | 541-35-5 | CCCC(=O)N | -1.79 |
| 3-Chloropropionamide | 5875-24-1 | C(Cl)CC(=O)N | -1.59 |
| N-Methylpropionamide | 1187-58-2 | CCC(=O)NC | -1.53 |
| Trimethylacetamide | 754-10-9 | CC(C)(C)C(=O)N | -1.48 |
| 2-Chloropropionamide | 27816-36-0 | CC(Cl)C(=O)N | -1.44 |
| N-Isopropylacrylamide | 2210-25-5 | C=CC(=O)NC(C)C | -1.31 |
| N,N-Dimethylacrylamide | 2680-03-7 | C=CC(=O)N(C)C | -1.24 |
| 2,2-Dichloroacetamide | 683-72-7 | C(Cl)(Cl)C(=O)N | -0.98 |
| n-Hexanoamide | 628-02-4 | CCCCCC(=O)N | -0.91 |
| Acrylamide | 79-06-1 | C=CC(=O)N | -0.81 |
| 2,2,2-Trichloroacetamide | 594-65-0 | C(Cl)(Cl)(Cl)C(=O)N | -0.29 |
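For scripting the later tasks, it may help to have the table in machine-readable form. The sketch below transcribes the rows above into Python and checks that they are ordered from least to most toxic. The `Amide` record and the helper names are hypothetical conveniences, not part of the assignment.

```python
from collections import namedtuple

# One record per table row; log_inv_igc50 is log[IGC50^-1] with IGC50 in mM.
Amide = namedtuple("Amide", ["name", "cas", "smiles", "log_inv_igc50"])

AMIDES = [
    Amide("Acetamide", "60-35-5", "CC(=O)N", -2.32),
    Amide("Propionamide", "79-05-0", "CCC(=O)N", -2.09),
    Amide("n-Butyramide", "541-35-5", "CCCC(=O)N", -1.79),
    Amide("3-Chloropropionamide", "5875-24-1", "C(Cl)CC(=O)N", -1.59),
    Amide("N-Methylpropionamide", "1187-58-2", "CCC(=O)NC", -1.53),
    Amide("Trimethylacetamide", "754-10-9", "CC(C)(C)C(=O)N", -1.48),
    Amide("2-Chloropropionamide", "27816-36-0", "CC(Cl)C(=O)N", -1.44),
    Amide("N-Isopropylacrylamide", "2210-25-5", "C=CC(=O)NC(C)C", -1.31),
    Amide("N,N-Dimethylacrylamide", "2680-03-7", "C=CC(=O)N(C)C", -1.24),
    Amide("2,2-Dichloroacetamide", "683-72-7", "C(Cl)(Cl)C(=O)N", -0.98),
    Amide("n-Hexanoamide", "628-02-4", "CCCCCC(=O)N", -0.91),
    Amide("Acrylamide", "79-06-1", "C=CC(=O)N", -0.81),
    Amide("2,2,2-Trichloroacetamide", "594-65-0", "C(Cl)(Cl)(Cl)C(=O)N", -0.29),
]


def is_ranked_by_toxicity(compounds):
    """True if log[IGC50^-1] never decreases down the list (least to most toxic)."""
    values = [c.log_inv_igc50 for c in compounds]
    return all(a <= b for a, b in zip(values, values[1:]))


def find_by_cas(compounds, cas):
    """Return the record with the given CAS number, or raise KeyError."""
    for compound in compounds:
        if compound.cas == cas:
            return compound
    raise KeyError(cas)


if __name__ == "__main__":
    print(len(AMIDES), "compounds; ranked:", is_ranked_by_toxicity(AMIDES))
    print("Most toxic:", AMIDES[-1].name)
```

Looking compounds up by CAS number rather than by name avoids problems with spelling variants when you merge in logP or LUMO values from other sources.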
When performing the following tasks, you can either work independently or you can pair up with another student in the class. Each student or pair will perform the following tasks. Workstations drug1, drug2, and drug3 are available for calculations.