Utility of the HardSoft Acid-Base Principle via the Fukui

Utility of the HardSoft Acid-Base Principle via the Fukui

(Parte 1 de 4)

Utility of the Hard/Soft Acid-Base Principle via the Fukui Function in Biological Systems

John Faver and Kenneth M. Merz, Jr.*

Quantum Theory Project, Department of Chemistry, UniVersity of Florida, 2328 New Physics Building, P.O. Box 118435, GainesVille, Florida 32611-8435

Received September 25, 2009

Abstract: The hard/soft acid-base (HSAB) principle has long been known to be an excellent predictorof chemicalreactivity.The Fukui function,a reactivitydescriptorfrom conceptualdensity functionaltheory, has been shown to be related to the local softnessof a system. The usefulness of the Fukui functionis exploredand demonstratedherein for three common biologicalproblems: ligand docking, active site detection, and protein folding. In each type of study, a scoring function is developed on the basis of the local HSAB principle using atomic Fukui indices. Even with necessary approximations for its use in large systems, the Fukui function remains a useful descriptor for predicting chemical reactivity and understanding chemical systems.

1. Introduction

Computational biochemistry is an expanding field that relies heavily on the increasing efficiency of computers and clever algorithms to approach very large and complex problems. While methods for high-level quantum mechanical (QM) calculations have been developed and proven to be very successful in calculating energies, equilibrium structures, vibrational frequencies, and more properties of small- to medium-sized molecules, the computational resources required for very large systems (e.g., a protein of many hundredsof atoms)are usuallyunattainable.1–3 For these very large systems, more approximate modeling tools are often used such as molecular mechanics.4,5 These more approximate methods greatly accelerate the speed at which energy calculations are performed, but they do not explicitly account for electronic structure. This can be a disadvantage, because there is a significant amount of information encoded in the electronic structure of a system.

Conceptualdensityfunctionaltheory(CDFT)definesmany reactivity descriptors for a system based on its electron density and provides a large set of tools for use in the prediction and understanding of chemical reactivity. An extensive review of CDFT and the myriad of possible descriptors has been compiled by Geerlings, De Proft, and Langenaeker.6 These descriptors have been used in the past for a diverse set of chemical systems.7–9 More recently, they have been used with some success in biochemically relevant systems including the detection of metabolic sites in known drug molecules, the understanding of metal binding to porphyrin, and enzymatic catalysis.10–12 A beneficial characteristic of these descriptors is that the majority of them depend on quantities such as electron density that can be obtainedfrom any QM method,includingsemiempiricalQM Hamiltonians.13,14

In the past two decades, advances in algorithms have allowed computational chemists to perform QM calculations on large systems such as proteins.15,16 One such method is the divide and conquer method.17–21 By dividing a molecule into smaller subsystemsand performingseparatecalculations followed by the formation of a global density matrix, the method greatly accelerates calculations for large systems. An important result of this development is that electron density and descriptors based on electron density can now be calculated for large molecules as well as small molecules. Khandogin and York recently described a few such useful descriptors for divide and conquer semiempirical calculations.2

Pearson’shard/softacid-base (HSAB)principlestatesthat chemical species can be described as being either hard or soft acids or bases.23 Soft speciestend to be easilypolarizable and large in volume, have low charge, and have small HOMO-LUMOgaps.Hardspeciestendto havetheopposite characteristics: they are not easily polarized, are small in

* Corresponding author phone: (352) 392-6973; fax: (352) 392- 8722; e-mail: merz@qtp.ufl.edu.

J. Chem. Theory Comput. X, x, 0 A

10.1021/ct9005085 X American Chemical Society

volume, are highly charged,and have large HOMO-LUMO gaps. The HSAB concept can be summarized as one simple rule: hard species favor interacting with hard species, and soft species tend to favor interacting with soft species. The HSAB concept has been successful in predicting reactivity preferences in many systems since its inception.24–32

Researchers have devised various methods of quantifying hardness and softness. Although empirical approximations have been used in the past, in this paper we will describe the use of one related reactivitydescriptorfrom CDFT called the Fukui function which has been shown to carry information about chemical softness.3–35 This work then explores its applicability to biological problems, specifically ligand docking, active site detection, and protein folding.

2. Background

According to density functional theory, changes in the electronic energy dE[F(r)] are related to changes in the number of electrons N and changes in the external potential υ(r) felt by the electron distribution (which usually refers to the nuclear positions in chemical systems):

For simplicity, consider a molecule at a given geometry in its ground state so that dυ(r) is zero. Thus, the partial derivative of energy with respect to the number of electrons N at constant geometry is the electronic chemical potential µ:

This quantity has been related conceptually to the electronegativity of a system.36 This definition agrees with chemical intuition, as more energeticallyfavorable changes in electron number yield higher values of electronegativity. Consider now the second partial derivative of the energy with respect to the electron number

which has been defined as η, or chemical hardness as described by Pearson.3 This definition can be understood by the analogy of a spring constant in classical physics. The spring constant is the second derivative of energy with respect to displacement and measures the difficulty of displacing a spring from its equilibrium position. Equation 3 can be thought of as measuring the difficulty of changing a system’snumberof electrons,which is conceptuallysimilar to nonpolarizability, or hardness. Since softness is the opposite of hardness, it has been defined as the inverse of hardness:

Parr and Yang have also defined a distance-dependent version of softness, called the local softness, as

The local softness function identifies the softest regions of a molecule. A system has a total softness S that is distributed throughout the molecule by a function f(r) called the Fukui function:

The Fukui function is normalized to unity so that the local softness integrates over all space to yield the total softness. Furthermore,the Fukui function can be viewed as containing the same information as the local softness, since the two are proportional to each other by a constant, S. Although there exist several descriptors for local hardness, the problem of definingit has not been resolved.37,38 In this work, low values of local softness are assumed to be locally hard. From the equations of DFT, we now have the Fukui function, a descriptor that identifies the softest (and hardest) regions of a molecule. With this knowledge in hand, one can begin to make predictions about chemical reactivity.

One issue that arises when calculating the Fukui function is that it is a derivative of the electron number, which is by nature an integer. Although recent studies have examined ways to circumventthis apparentdiscontinuity,thesemethods are impractical at this time for the large systems considered here.39,40 Limiting the calculations to changes with integer electrons, it is necessary to use finite difference derivatives. With the finite difference formulas, there is the option of taking the derivative from the left, right, or center:

The Fukui function taken from the left is the difference in electron density between the reference system and the system with an electron removed, e.g., a ground state and its cation (eq 7). Because maxima in this function represent areas where electron density is most favorably decreased, they are interpreted as areas in a molecule most favorable for electrophilic attack. The Fukui function taken from the right has maxima that are interpreted as areas most favorable for nucleophilic attack, since it detects areas where electron density increases most favorably under addition of electrons (eq 8). The centered derivative is simply the average of the two other derivatives and has often been interpreted as showing areas most favorable for attack by a radical (eq 9). A recent study has explored the validity of this interpretation and found that it may not be quite as easy to interpret as the other derivatives.41 While the left and right derivatives are clearly understood in terms of two classical reaction mechanisms, the middle derivative can for now be viewed as the best approximation of the derivative at the reference state.

B J. Chem. Theory Comput., Vol. x, No. x, X Faver and Merz

In addition to finite difference derivatives, a second common approximation of the Fukui function is the condensed Fukui function, which is composed of atomic Fukui indices.42 Within this approximation, atomic partial charges are used to replace the electron density in the expression for the Fukui function. Though this may be a crude approximation of the full electron density and Fukui function, several studies have been successful with its use.7,8,10,12,43–45 In general,one must choosea densitypartitioningschemewhich unfortunatelycan depend heavilyon the QM method or basis set and thus introduce error. Because of this, Fukui indices are sometimesnegative,which seems unphysical.A negative Fukui index implies that addition of electrons to a system decreases density in locations in the system or vice versa. Though some example molecules have been shown to have this interesting property, it should not be as common as the use of Fukui indices suggests.46,47 Keeping this in mind, Fukui indices were used in this work rather than full Fukui functions, simply because full Fukui functions are considerably more expensive to calculate.

A third common approximation is the use of the frozen orbital approximation, in which a single calculation is done to obtain the eigenstates of the system which are assumed to be “frozen” in place as electrons are added or removed. Clearly, changing the electron number in a system will alter the forces felt by the remainder of the electrons, and the eigenstates will be altered in a phenomenon called orbital relaxation. Examples have been shown in which Fukui indices based on the frozen orbital approximation fail to predictcorrectreactivityin small organicmolecules.47 Orbital relaxation effects were taken into account in this work by performingseparate calculationsfor the ground-statesystem, the system with added electrons, and the system with electrons removed, rather than using the frozen orbital approximation.

Khandogin et al. have described the calculation and interpretation of several QM-based reactivity descriptors for biomolecules, including the Fukui function and local softness.2 The present work concentrates on one of them, the Fukui function, and attempts to determine in what kinds of applications it can be used and for what kinds of interactions it can account and then determine the extent of its reliability. With these approximations (divide and conquer, AM1, finite difference, Mulliken atomic charges), five specific types of problems were addressed: finding correct ligand poses in an active site, detecting active binders from a set including decoy ligands, ranking binding affinities of ligands, finding reactive sites in a protein, and detecting native from decoy protein structures. In each of these systems, it was hypothesized that molecular interactions are favorable when hard areas are near hard areas and soft areas are near soft areas. In each of the five problems, a scoring function based on this hypothesis is developed and used to predict the preferred molecular interactions.

3. General Computational Details

All calculationswere performedwith the semiempiricalAM1 Hamiltonian in the DivCon program utilizing the divide and conquer strategy for proteins.17–19,48 Standard unrestricted

AM1 calculations were done in DivCon for all ligand molecules.Unrestricteddivide and conquercalculationswere done for the proteinsin the 1F40 and 2FOM docking studies, and restricted divide and conquer calculations were done for the 1EFY docking study and the 1ORC and 1I6C protein folding study. Atomic Fukui indices were calculated from centered finite difference derivatives of Mulliken charges. The derivatives were calculated by varying the electron number by 1 for ligand molecules. The electron number was varied by 5 in the case of 1F40 and 2FOM, by 8 for 1EFY, and by 4 for 1ORC and 1I6C. These values were chosen to roughly correspond to the size of the proteins. Fukui indices and molecular surfaces were visualized with the program PYMOL.49 3.1. Docking. Docking studies were performed on three protein/ligand systems: FKBP12 (PDB ID 1F40), dengue virus type 2 NS3 protease (PDB ID 2FOM), and poly(ADP- ribose) polymerase (PDB ID 1EFY). 1F40 is an NMR structure bound to a synthetic ligand, GPI-1046.50 2FOM is an X-raycrystalstructure(1.50Å resolution)withouta bound ligand.51 Several known active binders with experimental

IC50 values for the dengue protease were taken from a previous docking study.52,53 1EFY is an X-ray crystal structure (2.20 Å resolution) with bound inhibitor. The structure was taken from the DUD (directory of useful decoys) data set along with 32 active inhibitors with experimental Ki values taken from Tikhe et al.54,5 Schrodinger’s Glide program was used for all docking studies with the XP scoring function except for where it is mentioned otherwisein the 1F40 study (whereAutoDockwas used).56,57 Hydrogens were added to the crystal structures, and the structures were relaxed with the OPLS 2001 force field within the Maestro program prior to grid generation and docking. These final structures from Maestro (receptors and ligand poses) were used in the AM1 single-pointcalculations to obtain atomic Fukui indices. 3.1.1. Ranking Ligand Poses in a Receptor. The first docking test was to determine the correct pose of a ligand in the active site of a receptor. The ligand from the FKBP12 system was docked to FKBP12 with the Autodock program.58 Ten of the best poses from the docking results were taken to evaluatethe hardnessand softnessmatchingbetween atoms in the docked conformations. A score was developed to measure the complementarity of a given ligand and its receptor, hereafter called the FRMSD, or the root-meansquare difference in the Fukui index. For each atom in the docked ligand (Li), the nearest atom of the protein (Ri) was matched to it to form a closest match atomic pair. The difference in Fukui indices for each atomic pair is squared and then averaged over all ligand atoms (eq 10). A lower value of FRMSD represents a better ligand pose with respect to the match between the hardness and softness of the atoms in the two molecules.

A score was calculated for the 10 best poses generated by Autodock, and the results are shown in Figure 1. The two

(fLi - fRi

Utility of the HSAB Principle in Biological Systems J. Chem. Theory Comput., Vol. x, No. x, X C

best scoring poses from the Autodock run (poses 1 and 2) score well with the FRMSD score, and worse poses from the Autodock score generally score worse with the FRMSD score. Perhaps the most interesting finding is that the observed pose from the NMR structure (pose 0) has the best FRMSD score, meaning the observed pose is among the docked poses with the best soft/hard matching between closest atom pairs. To show that the NMR pose is actually an acceptable reference pose, an energy minimization was carried out in AMBER for the ligand in the restrained active site. The relaxed ligand structure had an rmsd of 0.255 Å with respect to the NMR ligand structure, which, in our opinion, is a negligible difference.

Figure 1b (bottom) plots the FRMSD of each pose vs the geometric rmsd with respect to the pose from the NMR structure. It was observed that the ligands are divided almost evenly into those with good FRMSD scores and those with poorerFRMSD scores.Upon visualizationof the good poses, it was seen that poses 0 and 1 are actually very similar, with the major differences being a rotation of the pyridine ring and a rotation of the tert-butyl group. Pose 7 had the same placementof the centralpyrrolidinering but had the positions of the tert-butyland pyridylgroupsswapped(i.e.,a molecular rotation by 180°). This pose was also observed in a docking study by Wang et al. in which it was shown to match NMR chemical shift data fairly well.59

A benefit of this closest atom pair scoring is that the resulting data can be qualitatively analyzed by simply searching for the best and worst matched pairs. A simple scriptcan analyzethe data and produceinput for visualization programssuch as PyMOL,as demonstratedin Figure2. Such visual and qualitative measurement of hard/soft matching could be useful in the drug design process, as the human eye can easily detect the best and worst hard/soft matches. In addition, it provides a method of verifying the FRMSD results. In Figure 2, good contacts are marked by shades of blue and poor contacts are marked by shades of red. Of course one could show as many contacts as desired, but here only the two best matches and the two worst matches are shown.

Figure2. Two dockedposesfor the FKBP/GPIcomplex.The active site is shown as a white surface, and the ligand is shown as white sticks. Good hard/soft matching atom pairs are shown in blue and poor hard/soft matches are shown in red on both the ligand and the protein surface. (a) Pose number 5 from the docking procedure. (b) Pose 0, the NMR pose.

Figure 3. Active binders to dengue type 2 protease taken from a previous docking study by Othman et al.:53 (a, left) pinostrobin (R ) H; R′ ) Me), pinocembrin (R ) H; R′ ) H), alpinetin (R ) Me; R′ ) H), (b, right) pinostrobin chalcone (R ) H; R′ ) Me), pinocembrin chalcone (R ) H; R′ ) H), cardamonin (R ) Me; R′ ) H). Reprinted from ref 53. Copyright 2008 American Chemical Society.

Figure 1. (a) GPI molecule used in the docking procedure for FKB12. (b, top) Docking of a known binder (GPI-1046) to FKBP12 (PDB ID 1F40). Each point on the horizontal axis represents a different pose taken from the top 10 poses generated by the program Autodock, arranged by decreasing Autodock score. Zero on the horizontal axis represents the observedNMRstructure.LowFRMSDvaluesindicatea better hard/soft match between the ligand and its receptor. (b, bottom) FRMSD vs geometric rmsd from the NMR docked structure.

D J. Chem. Theory Comput., Vol. x, No. x, X Faver and Merz

Pose number 5 from the docking run had the worst

FRMSD score out of all 10 poses, and its best and worst pairs are shown in Figure 2a. This figure highlights an important interaction not contained in the Fukui functions hydrogen bonding. One of the worst hard/soft mismatches is between the pyridyl nitrogen and the hydroxyl hydrogen from a tyrosine residue inside the binding pocket, which are at a distance of 2.26 Å from each other. This should be a favorable interaction, but is considered a poor interaction from a hard/soft perspective. This example suggests that the Fukui function alone would not be able to account for all types of molecular interactionsand would need contributions from additional terms in a scoring function (such as an electrostatic or hydrogen-bonding term) to be universally applicable or to correctly predict binding affinity. In the meantime it is assumed that ligands of similar construction with similartypes of interactionscan be analyzedby hardness and softness alone.

The NMR pose is shown in Figure 2b. The native pose has its best contacts just outside the binding pocket, and the worst contacts are with the terminal pyridyl group, which faces outward from the binding site. A tyrosine residue near the pocket is shaded purple because it makes both good contactswith the carbonchain of the ligandand poor contacts with the pyridyl group. The color-coding helps in qualitatively understanding why the native pose is a good docking pose. The pyridyl group may have poor hard/soft matching with the receptor, but it is directed outward from the binding site, making the interaction longer ranged and possibly less unfavorable. This would suggest that if distance were taken into account in the scoring function, the native pose would be even more preferred by FRMSD. From visualizing this pose it seems that another way to improve the FRMSD score would be to include a distance dependence, which is introduced in section 3.1.3. 3.1.2. Selection of ActiVe Ligands from Decoys. The second docking test involved the same receptor (FKBP12) with its active binder, GPI-1046, along with a set of decoy ligands from the data set for the dengue virus proteaseshown in Figure 3. Though these are known binders for the dengue virus type 2 protease, they are assumed to be nonbinding decoys for the FKBP12 system. The top 10 docked poses of each ligand from Glide XP were retained and scored with the closest atom pair FRMSD score. Figure 4a shows only the FRMSD scores for the best scoring pose of each ligand.

Figure 4b shows the Glide XP scores of the best scoring poses for comparison.

Both FRMSD and Glide XP were able to score the correct ligand, GPI-1046, as the best binder. In fact, almost all 10 of the poses generated by Glide scored better than all of the decoyposes.Uponvisualizationof the worstFRMSDscoring pose of GPI-1046 (Figure 5), the poorest matching pairs are a carbonyl oxygen in the ligand with a γ-methyl hydrogen from an isoleucine residue (at 2.74 Å) and the nitrogen from the ligand’s pyrrolidine ring with the R-hydrogen of a valine residue (at 3.6 Å). Both of these pairs should represent somewhat favorable electrostatic interactions, which are not captured by the FRMSD score. The best pairs are between the ligand and a tyrosine group, but both of these pairs are at a distance greater than 3.0 Å, leading to a match that is probably overaccounted for in the distance-independent FRMSD score. This pose provides more evidence that distance should be accounted for in a score based on Fukui indices. 3.1.3. Ranking of Different Ligands by Binding Affinity.

The third type of docking experiment for hardness/softnessbased scoring was to rank ligand molecules by binding affinity using the Fukui indices for the ligands and receptor. From visualization of the previous docking results it is apparent that a distant-dependent score is necessary. Here a second score is introduced, hereafter referred to as the Fukui grid score, in which distancedependenceto the Fukui indices

Figure 4. (a) FRMSD scores of the best scoring poses of the GPI molecule and decoy ligands to FKBP12. (b) Glide XP scores. The ligands are numbered as (1) GPI-1046, (2) alpinetin, (3) pinocembrin, (4) pinocembrine chalcone, (5) pinostrobin, and (6) pinostrobin chalcone. Both scores correctly discriminate the known binder from the decoy ligands.

(Parte 1 de 4)