A Transferable H-Bonding Correction for Semiempirical

A Transferable H-Bonding Correction for Semiempirical

(Parte 1 de 4)

A Transferable H-Bonding Correction for Semiempirical Quantum-Chemical Methods

Martin Korth,† Michal Pitonak,†,‡ Jan Rezac,,† and Pavel Hobza*,†,§

Instituteof OrganicChemistryandBiochemistry,Academyof Sciencesof theCzechRepublic and Center for Biomolecules and Complex Systems, 16610 Prague 6, Czech Republic,

Department of Physical and Theoretical Chemistry, Faculty of Natural Sciences, Comenius UniVersity, 84215 BratislaVa 4, SloVak Republic, and Department of Physical Chemistry, Palacky UniVersity, 771 46 Olomouc, Czech Republic

Received October 13, 2009

Abstract: Semiempirical methods could offer a feasible compromise between ab initio and empirical approaches for the calculation of large molecules with biological relevance. A key problem for attempts in this direction is the rather bad performance of current semiempirical methods for noncovalent interactions, especially hydrogen-bonding. On the basis of the recently introduced PM6-DH method, which includes empirical corrections for dispersion (D) and hydrogen-bond (H) interactions, we have developed an improved and transferable H-bonding correction for semiempirical quantum chemical methods. The performance of the improved correction is evaluated for PM6, AM1, OM3, and SCC-DFTB (enhanced by standard empirical dispersion corrections) with several test sets for noncovalent interactions and is shown to reach the quality of current DFT-D approaches for these types of problems.

1. Introduction

The abilityto performfast and accuratecomputersimulations of biomolecularsystemshas the potentialto bringnew insight and application opportunities in several scientific fields, for example, the development of selective receptors, catalysts, and enzymeinhibitorsin computationaldrug design.Complementary computationalmethods for de novo drug design and virtual screening have already made striking successes possible, for example, through computer-aided drug lead generation and optimization.1,2 Although these approaches can support and complement drug design, they can not be seen as fully mature, because both the modeling tools used and our understanding of protein-ligand recognition principles are still limited, especially regarding the effects of protein flexibility and solvation.3

Even though many advanced and accurate computational methods exist, their application to large-scale simulations of biomolecules is not possible, because these methods are computationally too demanding. As a result, the method of choice for these applications is molecular mechanics (M). Although M performs well in many cases, it has several drawbacks: By design, it cannot describe quantum effects like, for example, changes in electronic structure, such as chemical reactions or charge transfer, and most M models also neglect polarization effects, which were shown to be important, for example, for the solvation of biomolecules.4 Promising tools to overcome these limitations while maintaining efficiency (allowing extensive sampling of biologically relevant molecular systems) are semiempirical (SE) quantum mechanical methods.

The application of current SE methods to biochemical problems is unfortunately not straightforward, because the structure and function of biomacromolecules are dominantly influenced by noncovalent interactions like dispersion and hydrogen-bonding,5 that generally need very high-level quantum chemical methods to be modeled with sufficient accuracy.6 Despite this, the past few years have seen great success with the incorporation of dispersion effects via

* To whom correspondence should be addressed. E-mail: pavel.hobza@marge.uochb.cas.cz.

† Academy of Sciences of the Czech Republic and Center for

Biomolecules and Complex Systems.

‡ Comenius University. § Palacky University.

J. Chem. Theory Comput. X, x, 0 A

10.1021/ct900541n X American Chemical Society empirical corrections for a wide range of DFT7,8 and also SE (e.g., PM3-D, AM1-D9) methods. But because substrate recognition and binding is most often dominated by electrostatics, the accurate description of these effects and especially the hydrogen-bond interactions are also of fundamental importance for any biomolecular modeling approach. Examples for the importance of hydrogen-bonding for molecular recognition are, for example, DNA base pairing, protein folding, enzyme activity, crystal structures, properties of liquids, and pharmaceutical drug solubility and activity. While electrostatics in general are not a problem for SE methods, current SE methods are known to be deficient in the description of hydrogen-bonding (with hydrogen core-core terms, missing polarization functions on hydrogen, missing orthogonalization corrections, and in general parametrizationas discussed reasons, see refs 10 and 1 and references therein). We see this to be the major obstacle limiting the accuracy of SE methods when applied to biomolecules.

As classical modeling approaches are further pushed to their limit, and more and more pitfalls are coming to light,12 the interest in improving SE for biomolecular modeling purposes grew substantially over the past few years and has led to a number of related publications: As a result of the first biomolecularapplicationattemptswith OMn13 and SCCDFTB14 and explorative approaches to describe protein ligand docking with PM315,16 and AM1,17 it became clear that the (earlier known) deficiencies of SE methods for the description of hydrogen bonding18,19 are of crucial importance in these applications.16,20 On the other hand, first largescale SE modeling of protein structures gave promising results21-23 and showed that the capability of SE methods to detect native structures from collections of decoys is quite remarkable.12 In order to surpass the accuracy of the description of noncovalent interactions by M force fields, improving the description of hydrogen-bonding interactions in SE methods is clearly necessary.

A number of approaches offering improvement in this direction have been suggested in the literature so far, for example, on the basis of additional or modified core-core terms (like PM3-PIF24,25 and PDDG/PM326), third-order terms, and modified parameters for SCC-DFTB27 and also reformulated QM/M interaction terms (to improve hydrogen bonding at the QM/M interface28). An overview of the problem and the proposed solutions can be found in refs 10 and 1. While a significantly better performance is observed when applying these techniques, the results still leave large space for furtherimprovements.(It is nevertheless hard to understand why a recent comparison of the performance of semiempirical QM/M approaches with force fields29 ignores all developments except the PDDG approaches.) Concerning force field and ab initio results, the following has to be kept in mind: A recent study that evaluatedthe performanceof a set of widely used force fields by calculating the geometries and stabilization energies for a large collection of intermolecular complexes showed that the magnitude of hydrogen-bondinginteractionsare severely underestimated by all of the force fields tested.30 And albeit much better, also the performance of DFT methods for the calculation of (especially the relative) strength of hydrogenbond interactionsis not always of satisfactorilyhigh accuracy (see ref 31 and references therein).

Recently, our group managed to successfully open up a new path to improve SE methods for hydrogen-bonding interactions: We augmented the new PM6 method32 with empirical corrections for dispersion and hydrogen-bonding interactions(referred to as PM6-DH1 in the following)3 and were able to achieve large improvements in accuracy for interaction energies of biologically relevant, noncovalently bound systems. PM6 was chosen, because this model is parametrized for 80 elements and was shown to be one of the most accurate SE approaches for a wide range of problems.32 Furthermore, PM6 is implemented also as a linear-scaling, localized molecular orbital algorithm (termed MOZYME34) in Mopac200935 and VAMP 10.0, which allows the modeling of most of the proteins in the PDB (with less than about5000 atoms)on standarddesktopcomputers.34 While our first-generation H-bonding correction was already a major step forward in accuracy, we have found further improvement possible, to be presented in the following.

2. Empirical H-Bonding Corrections for Semiempirical Quantum Chemical Methods

The First-Generation Correction. To incorporate the major characteristicsof hydrogen-bondinteractions,the firstgeneration correction made use of the charges q on the acceptor (A) and hydrogen (H) atoms, the H-bond distance r between these atoms, and a cosine term that promotes a 180° bonding situation for the A··· H-D (with the donor atom D) angle:

The parameters a, b, and c were optimized for eight different bond types, leading to overall 24 parameters for the description of common H bonds involving nitrogen and oxygen acceptor and donor atoms. As the discussion of the results in section 4 will illustrate, this approach leads to a significantly improved performance of PM6 for the description of H-bond interactions.

An in-depth analysis of our correction revealed the following improvement opportunities: While the H-bond distance and the 180° condition for the A··· H-D angle are the most important geometrical features of hydrogenbonding, two additional internal coordinates are needed to complete the sterical description by taking care of the “orientation of the lone pair” at the acceptor atom. We will show later that the full description of all important geometrical features of hydrogen-bondingin the second-generation correction is the major reason for its improved accuracy and reliability. It turns out that the change to a physically more sound description of hydrogen bonding allows us to fix two other problems of the first-generation correction: First, the second term in eq 1 is only dependent on the H-bond distance coordinate. This leads to discontinuous potentials around values of 90° for the A··· H-D angle. Second, for some H-bond types, the secondsmeant to be

B J. Chem. Theory Comput., Vol. x, No. x, X Korth et al.

repulsivesterm is of an attractive nature (the optimization of the b parameter with constraints on the sign was tested but discarded because of a diminished accuracy of the results). By fixing these two problems in the new version, we are able to significantly increase the robustness of our empirical correction approach.

To prevent problems with the optimization of very strong

H bonds (like in the case of the formic acid dimer), we had to keep the distance cutoff of the first-generation correction (see ref 3 for a more detailed discussion of this approximation). The cutoff is applied in such a way that H-bond distances below 1.8 Å are set to 1.8 Å for the calculation of the hydrogen-bond energy. It acts like a damping function that forces the correction to a constant value in the repulsive region, where the description seems to be quite deficient otherwise. For molecular dynamics simulations, this cutoff will be implemented with an interpolating polynomial to prevent kinks in the potential surface. We also tried multiplication by a damping function for the second generation correction but found no advantages over the cutoff distance solution.

The application of the above listed changes in the secondgeneration correction furthermore allows us to change from bond-type to atom-type parameters and to apply our ansatz also to other semiempirical methods.

The Second-Generation Correction. The sterical ar-

rangement of the two system parts involved in a H bond can be defined with six internal coordinates (see Figure 1 for an illustration of the following explanations): the H-bond distance r, the two angles A··· H-D (termed Θ) and

R2-A··· H (termed Φ, with R2 being a donor “base atom”), and the corresponding three torsional angles of which only one directly influences the H-bond interaction energy,

R1R2A··· H (termedΨ). The first two mentionedcoordinates, the H-bond distance and the angle Θ between acceptor, hydrogen, and donor atoms were incorporated into the firstgenerationcorrection.The secondtwo mentionedcoordinates define the relative position of the acceptor atom system part

(or so to say the spatial arrangement of the acceptor lone pair),which is importantto preventnonphysicalcontributions to the H-bond interaction energies (e.g., through other atoms or in the case of purely dispersion-boundcomplexes). Figure 1 shows r, Θ, Φ, and Ψ for two different cases, an sp2 oxygen-type acceptor atom (a) and sp2 nitrogen or general sp3-type acceptor atoms (b), which require a different choice of atoms for the definition of the torsion angle coordinate. Note that, for our choice of coordinates, the out-of-plane “movement” in case a (described by Ψ′) is actually realized by a combined change of the two internal coordinates Φ and Ψ. As a result of these considerations, the new version of our empirical H-bonding correction takes the following form:

with φ as the deviation of the R2-A··· H angle from the idealized optimal H-bond angle (taken as 109.48° for sp3 and 120° for sp2 structures) and ψ as the deviation of the

R1R2A··· H torsion angle from the idealized optimal H-bond torsion angle (taken to be 109.48° forNs p3, 109.48/2° for other sp3 structures, and 0° for sp2 structures). Two special cases arise: For sp2 oxygen acceptor atoms, not only 120° but also 180° has to be considered as the idealized optimal H-bondangle;the one with the smallerdeviationto the actual binding situation is chosen for the calculation. For NR3 nitrogen acceptor atoms, the possible planarization of this group has to be taken into account; we chose to calculate the idealized optimal φ and ψ values by linear extrapolation between the tetrahedral and planar values (109.48 and 90 for φ, 109.48/2 and 90 for ψ) subject to the actual value of the torsion angle between R1R2A and the remaining third of the NR3 group. An alternative (but considering gradient derivationsunnecessarilycomplicated)implementationof our general approach would be to use a cosine term for the angle between the H-bond and an “idealized lone-pair orientation” vector.

The resulting equation was analyzed for the importance of the different terms and parametrized for acceptor atom types in a stepwise optimization process that led to some remarkable observations illustrated by data for the H-bonded complexes of the S26 set (the S22 benchmark set6 for noncovalentinteractions,extendedwith four singlyhydrogenbonded complexes,36 see also Figure 2) in Table 1:

single parameter (termed P1 in Table 1: b ) 3.2) to correct PM6-D for all different types of nitrogen and oxygen hydrogen-bondinteractions,our new correction leads to very good (PM6-DH1-quality) values for almost all S26 H-bond interaction energies (please note that the intermediate parametrization results are given to show the robustness of our approach and that the Pn parametrizations are therefore not meant to be some kind of “intermediate” methods).

The additional distinction between nitrogen and oxygen acceptor atom types (resulting in overall two parameters

Figure 1. Illustration of the geometric features of hydrogenbonding; see text for explanation.

H-Bonding Correction for Quantum-Chemical Methods J. Chem. Theory Comput., Vol. x, No. x, X C

(termed P2)), then leads to comparably small but still significant improvements.

In the case of PM6-D, the accurate description of

H-bonding interactions involving water and peptide systems requires the inclusion of at least one additional atom-type parameter (termed P3).

Also for AM1-D* (not AM1-D, see next paragraph for explanation), SCC-DFTB-D and OM3-D, a significant improvement is found with only a single parameter (b ) 2.2 for AM1, 3.2 for DFTB, and 3.1 for OM3). For these methods, no additional parameters for water or peptide systems are necessary, but one additional parameter is required for the proper description of acid acceptor atoms.

We decided not to use the AM1-D method published by

McNamara and Hillier9 with our correction scheme, because their method is not simply AM1 with a standard empirical dispersion correction but is additionally based on a refit of 18 AM1 parameters that also account to some extent for hydrogen bonding. The variant we use here, termed AM1- D* in the following, refers to standard AM1 with a standard empirical dispersion correction of the Jurecka type.8 The optimization of the D parameters for the dispersion-bound complexes of the S22 benchmark set led to the values sr )

0.91, R) 56.0, and s6 ) 1.18. This way, the mean unsigned error for the dispersion-boundS22 complexes was decreased from 6.7 kcal/mol for AM1 to 0.4 kcal/mol for AM1-D*, slightly lower than the 0.6 kcal/mol found for AM1-D.9

We also adjusted the dispersion correction for PM6, because we have found that PM6-DH1 overestimatesdisper- sion effects in saturated systems. The new parameters are sr ) 1.04, R) 20.0, and s6 ) 0.89 (instead of sr ) 1.07, R) 1.0, and s6 ) 1.0 for the old version), in combination with c6 ) 0.95 for sp3 carbon and c6 ) 1.65 for other carbon atoms (instead of c6 ) 1.65 for all carbon atoms) and a hydrogen van der Waals radius of 156 pm (instead of 120 pm). The effects of these changes are very small for the

Figure 2. The S22 benchmark set for noncovalent interactions.6 Table 1. Intermediate Results from the Optimization Processa

S26 entry reference PM6 PM6-D PM6-

(Parte 1 de 4)