SM6 A Density Functional Theory Continuum Solvation

SM6 A Density Functional Theory Continuum Solvation

(Parte 1 de 6)

SM6: A Density Functional Theory Continuum Solvation Model for Calculating Aqueous Solvation Free Energies of Neutrals, Ions, and Solute-Water Clusters

Casey P. Kelly, Christopher J. Cramer,* and Donald G. Truhlar*

Department of Chemistry and Supercomputing Institute, 207 Pleasant St. SE, UniVersity of Minnesota, Minneapolis, Minnesota 55455-0431

Received June 28, 2005

Abstract: A new charge model, called Charge Model 4 (CM4), and a new continuum solvent model, called Solvation Model 6 (SM6), are presented. Using a database of aqueous solvation free energies for 273 neutrals, 112 ions, and 31 ion-water clusters, parameter sets for the mPW0 hybrid density functional of Adamo and Barone (Adamo, C.; Barone, V. J. Chem. Phys. 1998, 108, 664-675) were optimized for use with the following four basis sets: MIDI!6D, 6-31G- (d), 6-31+G(d), and 6-31+G(d,p). SM6 separates the observableaqueous solvation free energy intotwo differentcomponents: one arisingfromlong-rangebulkelectrostaticeffectsand a second from short-range interactions between the solute and solvent molecules in the first solvation shell. This partition of the observable solvation free energy allows SM6 to effectively model a wide range of solutes. For the 273 neutral solutes in the test set, SM6 achieves an average error of 0.50 kcal/mol in the aqueous solvation free energies. For solutes, especially ions, that have highly concentrated regions of charge density, adding an explicit water molecule to the calculation significantly improves the performance of SM6 for predicting solvation free energies.The performanceof SM6 was testedagainstseveralothercontinuummodels,including SM5.43R and several different implementations of the Polarizable Continuum Model (PCM). For both neutral and ionic solutes, SM6 outperforms all of the models against which it was tested. Also, SM6 is the only model (except for one with an average error 3.4 times larger) that improves when an explicit solvent molecule is added to solutes with concentrated charge densities. Thus, in SM6, unlike the other continuum models tested here, adding one or more explicit solvent molecules to the calculation is an effective strategy for improving the prediction of the aqueous solvation free energies of solutes with strong local solute-solvent interactions. This is important, because local solute-solvent interactions are not specifically accounted for by bulk electrostatics, but modeling these interactions correctly is important for predicting the aqueous solvation free energies of certain solutes. Finally, SM6 retains its accuracy when used in conjunction with the B3LYP and B3PW91 functionals, and in fact the solvation parameters obtained with a given basis set may be used with any good density functional or fraction of Hartree-Fock exchange.

1. Introduction

Continuum solvent models are an attractive alternative to explicit solvent approaches, because they require less computational effort, making them applicable to larger solutes, extensive conformational analysis, and large libraries of compounds. Continuum solvent models have advanced to a point where aqueous solvationfree energiesof typical neutral organic solutes can usually be predicted accurately to better than 1 kcal/mol. However, the development of methods for accuratelypredictingaqueous solvationfree energies of ionic solutes has been much less successful. In part, this is due to

* Corresponding author e-mail: (C.J.C.) and (D.G.T).

10.1021/ct050164b C: $30.25 © 2005 American Chemical Society Published on Web 10/2/2005 the limitedavailabilityof experimentalsolvationfree energies for ionic solutes. Unlike neutral solutes, for which aqueous solvationfree energiescan be obtaineddirectlyfrom partition coefficients of solutes between the gas phase and dilute aqueous solution, aqueous solvation free energies of ionic solutes must be determined from other experimental observables. Because of this, there is a certain degree of uncertainty associated with making direct comparisons between calculated and experimental aqueous solvation free energies for ions, making the development of a model that is able to treat neutral and ionic solutes at the same level of accuracy a very challenging task. In addition, because of strong electrostatic effects arising from localizedsolute-solvent interactions,the magnitudes of solvation free energies are much greater for ions than for neutrals, requiring smaller percentage errors for the same absolute accuracy. Because the differential solvation free energy between a given acid/base pair can be used in various thermodynamic cycles to determine pKa, developing a model that can be used to predict these free energies accurately is a high priority.

A number of differentstrategieshave been used to account for short-range interactions within the framework of continuum solvation theory.1,2 For example, the SMx series of models developed by our co-workers3-17 and us augments the electrostatic portion of the calculated solvation free energy with an empiricalterm that accounts for, among other things, deviations of short-range interactions,primarily those in the first solvation shell, from the bulk electrostatic limit. Althoughthis approachhas been very successfulin predicting solvation free energies of neutral solutes, it remains unclear whether this type of correction to the solvation free energy can be applied to ionic solutes with the same success.

A key issue in predicting the large electrostatic effects involved in solvation of ions is determining the shape of the cavity that is used to define the boundary between the electronic distribution of the solute and the continuum solvent. In all of our recent SMx models7-16 (including the one presented in this article), a single set of radii that are dependent only on the atomic number of the given atom are used to build up the molecular cavity. More elaborate methods for assigning atomic radii depend on the local chemical environment of the atom.18-23 One such prescription, called the united atom for Hartree-Fock (UAHF) method,23 assigns radii to atoms based on their hybridization states, what other atoms are bonded to them, and their formal charge. These radii are often used in conjunction with the popularpolarizablecontinuummodels(PCMs)24-34 to predict aqueous solvation free energies. Other methods have been proposed in which the atomic radii depend on partial atomic charge, and several groups have had some success using charge-dependent atomic radii in continuum solvation calculations,35-41 although the work has been limited to a small number of solutes. Other methods define the solute cavity as the contour on which the solute electronic density is equal to some constant value.42 For example, Chipman has recently shown43 that using a value of 0.001 e/a03 for the isodensitycontouralongwith a continuummodel44 results in accurate aqueous solvation free energies for a series of protonated amines. However, Chipman also showed in his work43 that for a series of oxygen-containing anions, no single common contour value could be used to accurately predict their absolute aqueous solvation free energies. Chipman attributed this finding to the inability of a continuum model alone to account for strong anion-water interactions in the first solvation shell and suggested that better results might be obtained by augmenting continuum solvent calculations with other complementary methods that are especially designed to account for specific short-range interactions.

Adding explicit solvent molecules has been a popular strategy for trying to incorporate the effects of specific solute-solvent interactions into continuum solvent calculations. Often, this involves treating enough solvent molecules classically or quantum mechanically to account for at least the entire first solvation shell around the given solute.45 Depending on the solute, this may require a large number of explicit solvent molecules, which can lead to a significant increase in the amount of computational effort expended. In addition to this problem, there are several other potential problems associated with treating solvent molecules explicitly. First, for many solutes,there is no easy way to determine the number or orientation of explicit water molecules in the first solvation shell. For example, X-ray diffraction experiments46 and various theoretical calculations47-51 lead to average coordination numbers ranging from 6 to 9.3 for the Ca2+ ion, suggestingthat severaldifferentsolvationstructures exist. Second, even for solutes for which the first solvation shell is well defined, to properly treat even a few solvent molecules explicitly will most likely involve the need to sample over a large number of conformations that are local minima. Finally, introducing explicit solvent molecules will not yield more accurate solvation free energies if the level of theoryused to treat the systemis not high enough.Because properly treating nonbonded interactions usually requires treatment of electron correlation and the use of fairly large basis sets,52 any realistic attempt at using solute-water clusters to calculate aqueous solvation free energies requires an accurate treatment of the entire solute-solvent system, which is practical only for small numbers of solvent molecules. Despite these problems, hybrid approaches combining quantal and classical treatments of solvent molecules have had some success in predicting aqueous solvation free energies of ions. For example, Pliego and Riveros showed53 that for a test set of 17 ions, including several explicit water molecules in the continuum calculation significantly improved the performance of the model. To determine the number of explicit solvent molecules required in the calculation, these workers developed an approach in which the aqueous solvationfree energy of the bare solute is minimized with respect to the number of coordinating waters.54 Besides predicting solvation free energies of ions, this approach has also been used to predict solvatochromic shifts,5,56 where explicit solute-solvent effects between the electronically excited solute and surrounding solvent molecules can have large effects on both the magnitude and direction of the shift.

In the present paper, we will present a new continuum solvent model called Solvation Model 6 (SM6). This model is similar to our most recentlydevelopedpreviouscontinuum model, called SM5.43R,15,16 but improves on it in a number of significant ways. In both of these models, SM6 and

1134 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al.

SM5.43R, the aqueous solvation free energy is calculated as a sum of free energies arising from long-range bulk electrostaticeffects, which are calculated by a self-consistent reaction field (SCRF) calculation,12,57,58 and those from nonbulk electrostatic effects, which are calculated using the solvent accessible surface area (SASA)59,60 of the solute and a set of atomic surface tensions that depend on a set of empirical parameters and the geometry of the solute. SM6 differs from SM5.43R in two important ways. First, SM6 uses an improved charge model, called Charge Model 4 (CM4), for assigning partial atomic charges. CM4 is a new charge model developed as part of the present effort, and it is presented later in the text and in the Supporting Information. Second, SM6 is parametrized with a training set of aqueous solvation free energies that has been improved in three ways: (1) the neutral portion of the training set has been extended to include molecules containing certain functionalities that were not present in the SM5.43R training set, (2) we use a larger and improved set of data for ionic solutes,and (3) aqueoussolvationdata for variousion-water clusters and the water dimer have been added, and the entire philosophy of the parametrization of Coulomb radii is changed to reflect the use of cluster data in place of bareion data for cases where continuum solvent models are expected to be inadequate for bare ions.

SM6 calculations may use any reasonable gas-phase or liquid-phase geometry of the solute to calculate its aqueous solvation free energy. In addition, geometry optimizations in the liquid phase using analytical free-energy gradients can be efficiently carried out.13 We previously denoted the case for which aqueous solvation free energies were calculated using gas-phase geometries with the suffix “R” and those in which they were calculated using liquid-phase geometries by dropping the “R” suffix (which stands for “rigid”); here we will drop the “R” suffix in all cases and use the standard Pople notation. For example, a single-point SM6 calculation at the MPW25/6-31+G(d,p) level using a gas-phase geometry optimized at the MPW25/MIDI! level of theory would be written as SM6/MPW25/6-31+G(d,p)//MPW25/MIDI!, whereas if the consistent liquid-phase geometry were used, this calculation would be written as SM6/MPW25/6-31+G- (d,p). A solvation free energy calculated by SM6/MPW25/ 6-31+G(d,p) at a gas-phase geometry computed by the same electronic structure level (i.e., MPW25/6-31+G(d,p)) can be denoted SM6/MPW25/6-31+G(d,p)//MPW25/6-31+G(d,p) or for short SM6/MPW25/6-31+G(d,p)//g, where //g denotes a gas-phase geometry at the same level.

Four new parametrizations of SM6 for the MPWX hybrid density functionalwill be presented,where each parametrization uses a particular basis set. These four basis sets are MIDI!6D61,62 and Pople’s63 popular 6-31G(d), 6-31+G(d), and 6-31+G(d,p) basis sets. The MPWX functional uses Barone and Adamo’s64 modified version of Perdew and Wang’s exchange functional,65 Perdew and Wang’s PW91 correlation functional, and a percentage X of Hartree-Fock exchange.16 The parameters presented here can be used with any value of X, which is a stability feature pointed out in a previous paper.16 This is particularlyuseful, because depending on the problem, it may be advantageous to optimize X in the gas phase or in solution. For example, X ) 42.8 has been optimized for kinetics (resulting in the MPW1K functional66), X ) 40.6 for Y- + RY nucleophilicsubstitution reactions (Y ) F, Cl; the MPW1N functional67), X ) 6 for conformations of sugars,68 and X ) 25 has been suggested for predicting heats of formation64 (this is the mPW1PW91 functional of Barone and Adamo,64 which they also call mPW0 and which we also refer to as MPW25, or MPWX, with X ) 25). We chose to base the present parametrizations on the MPWX hybrid density functional for two reasons. First, as mentioned above, methods for predicting various gas-phase properties that employ different values of X with this functional have already been developed, and it is useful to have a set of solvation parameters that can also be used with any X. Second, the MPWX functional has been shown to be more accurate than the popular B3LYP69 and HF70 methods for predicting energies of reaction and barrier heights.6,71 Furthermore, two of the parametrizations presented here are for basis sets containing diffuse functions. This is importantbecause diffuse functionsare often required for accurate calculations of conformational energies and barrier heights.72 Thus, the parametrizations based on the 6-31+G(d) and 6-31G+(d,p) basis sets are of specialinterest, because they can be applied in cases where one wants to use the same level of theory for calculating relative energies in the gas phase and in the aqueous phase.

In addition to parametrizing a new charge model and a new aqueous solvent model, the present article has a third goal, namely, to ascertain what effect adding explicit solvent molecules has on the accuracy of continuum solvent models for predicting aqueous solvation free energies. For this, we added a single explicit water molecule to some of the solutes in our training set, and we used the resulting solute-solvent cluster to calculate the aqueous solvation free energy. Since the effort in this approach is modest because we are limiting the number of explicit water molecules to one and because the solute-solvent system will be modeled as a single, rigid conformation, the approach is very practical and does not have most of the problems associated with adding several explicit solvent molecules that were outlined above. Furthermore, the comparison between aqueous solvation free energies calculated using bare solutes to those calculated using solute-solvent clusters provides insight into whether continuum solvent models are appropriate for calculating aqueous solvation free energies of solute-water clusters as well as whether the performance of these models can be improved in cases where specific localized solute-solvent interactions are expected to play a large role in determining an aqueous solvation free energy.

Section 2 presents the experimental data used to train and test the new model, which is itself presented in section 3. Section 4 is concerned with parametrization, and section 5 gives the results. Sections 6 and 7 present discussion and conclusions, respectively.

2. Experimental Data

2.1. Standard States. All data and calculations are for 298 K. All experimental and calculated gas-phase free energies are tabulated using an ideal gas at 1 atm as the reference state. Free energies that employ this standard state definition will be denoted by the superscript “o”. All experimental and

Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1135

calculated solvation free energies are tabulated for an ideal gas at a gas-phase concentration of 1 mol/L dissolving as an ideal solution at a liquid-phase concentration of 1 mol/L. Free energies that employ this standard state definition will be denoted by the superscript “*”. The relationship between these two standard states is and where73

2.2. Neutral Solutes. For neutral solutes, we start with all of the experimental aqueous solvation free energy data from the previouslydescribed SM5.43R training set,15 which includes 257 aqueous solvation free energies for solutes containing at most H, C, N, O, F, P, S, Cl, and/or Br. To these data, we added additional aqueous solvation data for various reasons. Aqueous solvation free energies were added for methylhydrazine and 1,1-dimethylhydrazine in order to test the performance of SM6 without the surface tension previously7-17,74,75 applied to solutes containing hydrogen atoms in the vicinity of two nitrogen atoms (e.g., hydrazines and hydrazones with a hydrogen attached to one of the individual nitrogen atoms). Data for hydrogen peroxide, methyl peroxide, and ethyl peroxide were added because the SM5.43Rtrainingset does not containany data for peroxides. The SM5.43R training set contains an aqueous solvation free energyfor anilinebut no otheranilineanalogues,so we added data for ortho-, meta-, and para-methylaniline as well as N-methyl-,N-ethyl-,and N,N-dimethylaniline.We also added 1,2-ethanediamineand 3-aminoanilinebecause the SM5.43R training set does not contain any data for solutes with more than one amino group in the same solute. Finally, we added urea and benzamide because the SM5.43R training set contains only one solute with urea functionality(1-dimethyl- 3-phenylurea) and three solutes with amide functionality (acetamide, Z-N-methylacetamide, and E-N-methylacetamide).

We also examined the accuracy of several experimental solvation free energies that were used in earlier versions of our training sets. In our previous training sets that include hydrazine,7-17,74,75 we used a value of -9.30 kcal/mol for the aqueous solvation free energy. Here, this value has been replaced by a value of -6.26 kcal/mol, which was obtained using experimentalvaluesfor the vapor pressureand aqueous solubility.76,7 For methyl benzoate, the value of -2.2 kcal/ mol that was used in our previous training sets15-17 has been replaced by a value of -3.91 kcal/mol that was also obtained using experimentalvaluesfor the vapor pressureand aqueous solubility.76,7

We also added the aqueous solvation free energy for the water dimer. This free energy can be determined using the free energy cycle shown in Scheme 1, according to where ¢GS/(H2O) is the aqueous solvation free energy of water, and ¢G°gas(B.E.) is the gas-phase binding free en- ergy, which equals G°gas(H2OâH2O) - 2G°gas(H2O). Substituting experimental values of -6.31 kcal/mol for

¢GS/(H2O) and 3.34 kcal/mol78 for ¢G°gas(B.E.) into eq 4 gives -14.06 kcal/mol for the aqueous solvation free energy of the water dimer (at 298 K).

Adding the new data described to the SM5.43R training set and correcting the two values mentioned in the previous paragraph results in a new data set with a total of 273 aqueous solvation free energies for neutral solutes containing at most H, C, N, O, F, P, S, Cl, and/or Br. This will be called the SM6 neutral-solute aqueous free-energy-of-solvation data set. These solvation free energies are listed in Table S1 of the Supporting Information. 2.3. Ionic Solutes. For our previous models that included data for ionic soutes,3-17,74,75 aqueous solvation free energies were taken from Florian and Warshel79 and Pearson80 and then updated based on changes in the accepted absolute aqueous solvation energy of the proton. Based on a careful analysis of the ionic data in the SM5.43R training set (which containsaqueoussolvationfree energiesfor 47 ionic solutes), we decided to develop two entirely new data sets of experimental solvation free energies for ionic solutes. The first new data set, which is listed in Tables 1 and 2, contains aqueous solvation free energies for 112 ionic solutes (60 anions and 52 cations). This will be called the SM6 unclustered-ion data set, and it is described in the next two paragraphs and further discussed in the two paragraphs after that. For the aqueous solvation free energy of the proton,

¢GS/(H+), we used Zhan and Dixon’s value of -264 kcal/ mol.81 For the remainingcations,we used the thermodynamic cycle shown in Scheme 2 along with eq 5 where pKa is the negative common logarithm of the aqueous acid dissociation constant of AH and ¢Ggas/ is the same as

¢Gaq/ except for the gas phase. Using Scheme 2 then yields the standard-state aqueous solvation free energy of AH + as where ¢GS/(A) is the aqueous solvation free energy of the neutral species A, ¢GS/(H+) is the aqueous solvation free energy of the proton, and ¢G°gas(A) is the gas-phase basicity of A, equal to G°gas(A) + G°gas(H+) - G°gas(AH+). The

Scheme 1

1136 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al.

Table 1. Aqueous Solvation Free Energies (kcal/mol) of Bare Anionsa

a Aqueous solvation free energies are for a temperature of 298 K. b Gas-phase basicities taken from ref 83. c From ref 87, unless otherwise noted. d From the current data set unless otherwise noted. e Reference 89. f Reference 76. g Reference 91. h Reference 86. i Reference 92. j Reference 85. k Reference 82.

Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1137

experimental aqueous solvation free energies of F-,C l-, and Br- were taken from Tissandier et al.82 and adjusted for a change in the standard state and the value used here for

¢GS/(H+). For the remaining anions, we used the free energy cycle shown in Scheme 2 and eq 5 to determine the aqueous solvation free energy according to

Table 2. Aqueous Solvation Free Energies of Bare Cationsa a Aqueous solvation free energies are for a temperature of 298 K. b Gas-phase basicities from ref 84. c From ref 87, unless otherwise noted. d From the current data set, unless otherwise noted. e Reference 76. f Reference 91. g Reference 8. h Reference 92.

1138 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al.

G°gas(A-) + G°gas(H+) - G°gas(AH). Experimental gas-phase basicities of anions and acidities of neutral species were taken from the National Institute of Standards and Technology (NIST) database;83 experimental gas-phase basicities of neutral species were taken from the most recent compilation of Lias et al.84 For neutral species, experimentalaqueoussolvationfree energieswere takenfrom the data set described in section 2.2 and from several additional sources.76,85,86 A large part of the experimental

aqueous pKa data used here was taken from the compilation of Stewart;87 pKa data not available in this compilation were taken from several additional sources.8-92

We note that gas-phasefree energiesand aqueoussolvation energies for neutral solutes can be calculated fairly accurately, and several other compilationsof aqueous solvation free energies of ions79,80,93 use theoretical data for these quantities. Although we did not use theoretical values in deriving any of the free energies shown in Tables 1 and 2, future extensions of the present database could include aqueous solvation energies determined from calculated data.

The aqueous solvation free energies reported here can be compared to the recent compilation of Pliego et al.,93 who used the same thermodynamic cycle, along with Tissandier et al.’s82 value of -266 kcal/mol for ¢GS/(H+), in determining the aqueous solvationfree energies for 56 ions. (Note that the above value of -266 kcal/mol for ¢GS/(H+) reflects the standard-state correction required in order to adjust

Tissandier et al.’s reported value of -264 kcal/mol,82 which is for a gas-phase standard state of 1 atm combined with an aqueous phase standard state of 1 mol/L, to a standard state that uses a concentration of 1 mol/L in both the gas and aqueous phases. Also note that Tissandier’s reported value of -264 kcal/mol has sometimes been misinterpreted as corresponding to a standard state of 1 mol/L in both the gas and aqueous phases.15,16,94-96) Making an adjustment to account for the difference between the value used for

¢GS/(H+) here and the value used in Pliego and Riveros’ work brings the two sets of data into very good agreement with one another. 2.4. Water-Solute Clusters. We also compiled another data set, to be called the selectively clustered-ion data set, in which 31 bare ions in the unclustered-ion data set were replaced by the ion-water clusters that are listed (along with the water dimer) in Table 3. Using the free energy cycle shown in Scheme 3, the aqueous solvation free energies of these 31 solute-water clusters were determined according to

In the above equation, aqueous solvation free energies of the unclustered ions ¢GS/(M() were taken from Tables 1 or 2. When available, experimental values for the gas-phase binding energies were used in the above equation. When experimental values were not available, we calculated them at the B97-197/MG3S98 level of theory, which has recently been shown52 to perform well for nonbonded interactions in the gas phase. 2.5. Uncertainty of Experimental Data. We have previously estimated an average uncertainty of 0.2 kcal/mol for aqueous solvation free energies of neutral solutes in our data sets.15 The uncertaintiesin the aqueoussolvationfree energies for ionic solutes are significantly greater due in part to their

Scheme 2 Table 3. Aqueous Solvation Free Energies for Solute-Water Clustersa

AâH2Ob ¢G°gas(B.E.)c ¢GS/(A)d ¢GS/(AâH2O) a Aqueous solvation free energies are for a temperature of 298 K. b B97-1/MG3S optimized geometries. c Water-solute binding free energies. d Aqueous solvation free energy of the bare solute. e Experimental value, taken from ref 78. f Experimental value, taken from ref 100. g Theoretical value, calculated at the B97-1/MG3S level of theory.

Scheme 3

Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1139 large magnitudesbut also due to uncertaintiesassociatedwith each of the experimental quantities (except pKa, as discussed below) used to determine them. The uncertaintiesin aqueous solvation free energies for ionic solutes are reported here as the root-sum-of-squares combination of each of the uncertainties associated with the individual experimentalmeasurements used to determine them.

(Parte 1 de 6)