Baixe SM6 A Density Functional Theory Continuum Solvation e outras Notas de estudo em PDF para Engenharia Elétrica, somente na Docsity! SM6: A Density Functional Theory Continuum Solvation Model for Calculating Aqueous Solvation Free Energies of Neutrals, Ions, and Solute-Water Clusters Casey P. Kelly, Christopher J. Cramer,* and Donald G. Truhlar* Department of Chemistry and Supercomputing Institute, 207 Pleasant St. SE, UniVersity of Minnesota, Minneapolis, Minnesota 55455-0431 Received June 28, 2005 Abstract: A new charge model, called Charge Model 4 (CM4), and a new continuum solvent model, called Solvation Model 6 (SM6), are presented. Using a database of aqueous solvation free energies for 273 neutrals, 112 ions, and 31 ion-water clusters, parameter sets for the mPW0 hybrid density functional of Adamo and Barone (Adamo, C.; Barone, V. J. Chem. Phys. 1998, 108, 664-675) were optimized for use with the following four basis sets: MIDI!6D, 6-31G- (d), 6-31+G(d), and 6-31+G(d,p). SM6 separates the observable aqueous solvation free energy into two different components: one arising from long-range bulk electrostatic effects and a second from short-range interactions between the solute and solvent molecules in the first solvation shell. This partition of the observable solvation free energy allows SM6 to effectively model a wide range of solutes. For the 273 neutral solutes in the test set, SM6 achieves an average error of ∼0.50 kcal/mol in the aqueous solvation free energies. For solutes, especially ions, that have highly concentrated regions of charge density, adding an explicit water molecule to the calculation significantly improves the performance of SM6 for predicting solvation free energies. The performance of SM6 was tested against several other continuum models, including SM5.43R and several different implementations of the Polarizable Continuum Model (PCM). For both neutral and ionic solutes, SM6 outperforms all of the models against which it was tested. Also, SM6 is the only model (except for one with an average error 3.4 times larger) that improves when an explicit solvent molecule is added to solutes with concentrated charge densities. Thus, in SM6, unlike the other continuum models tested here, adding one or more explicit solvent molecules to the calculation is an effective strategy for improving the prediction of the aqueous solvation free energies of solutes with strong local solute-solvent interactions. This is important, because local solute-solvent interactions are not specifically accounted for by bulk electrostatics, but modeling these interactions correctly is important for predicting the aqueous solvation free energies of certain solutes. Finally, SM6 retains its accuracy when used in conjunction with the B3LYP and B3PW91 functionals, and in fact the solvation parameters obtained with a given basis set may be used with any good density functional or fraction of Hartree-Fock exchange. 1. Introduction Continuum solvent models are an attractive alternative to explicit solvent approaches, because they require less com- putational effort, making them applicable to larger solutes, extensive conformational analysis, and large libraries of compounds. Continuum solvent models have advanced to a point where aqueous solvation free energies of typical neutral organic solutes can usually be predicted accurately to better than 1 kcal/mol. However, the development of methods for accurately predicting aqueous solvation free energies of ionic solutes has been much less successful. In part, this is due to * Corresponding author e-mail: cramer@chem.umn.edu (C.J.C.) and truhlar@chem.umn.edu (D.G.T). 1133J. Chem. Theory Comput. 2005, 1, 1133-1152 10.1021/ct050164b CCC: $30.25 © 2005 American Chemical Society Published on Web 10/22/2005 the limited availability of experimental solvation free energies for ionic solutes. Unlike neutral solutes, for which aqueous solvation free energies can be obtained directly from partition coefficients of solutes between the gas phase and dilute aqueous solution, aqueous solvation free energies of ionic solutes must be determined from other experimental observ- ables. Because of this, there is a certain degree of uncertainty associated with making direct comparisons between calcu- lated and experimental aqueous solvation free energies for ions, making the development of a model that is able to treat neutral and ionic solutes at the same level of accuracy a very challenging task. In addition, because of strong electrostatic effects arising from localized solute-solvent interactions, the magnitudes of solvation free energies are much greater for ions than for neutrals, requiring smaller percentage errors for the same absolute accuracy. Because the differential solvation free energy between a given acid/base pair can be used in various thermodynamic cycles to determine pKa, developing a model that can be used to predict these free energies accurately is a high priority. A number of different strategies have been used to account for short-range interactions within the framework of con- tinuum solvation theory.1,2 For example, the SMx series of models developed by our co-workers3-17 and us augments the electrostatic portion of the calculated solvation free energy with an empirical term that accounts for, among other things, deviations of short-range interactions, primarily those in the first solvation shell, from the bulk electrostatic limit. Although this approach has been very successful in predicting solvation free energies of neutral solutes, it remains unclear whether this type of correction to the solvation free energy can be applied to ionic solutes with the same success. A key issue in predicting the large electrostatic effects involved in solvation of ions is determining the shape of the cavity that is used to define the boundary between the electronic distribution of the solute and the continuum solvent. In all of our recent SMx models7-16 (including the one presented in this article), a single set of radii that are dependent only on the atomic number of the given atom are used to build up the molecular cavity. More elaborate methods for assigning atomic radii depend on the local chemical environment of the atom.18-23 One such prescrip- tion, called the united atom for Hartree-Fock (UAHF) method,23 assigns radii to atoms based on their hybridization states, what other atoms are bonded to them, and their formal charge. These radii are often used in conjunction with the popular polarizable continuum models (PCMs)24-34 to predict aqueous solvation free energies. Other methods have been proposed in which the atomic radii depend on partial atomic charge, and several groups have had some success using charge-dependent atomic radii in continuum solvation calculations,35-41 although the work has been limited to a small number of solutes. Other methods define the solute cavity as the contour on which the solute electronic density is equal to some constant value.42 For example, Chipman has recently shown43 that using a value of 0.001 e/a03 for the isodensity contour along with a continuum model44 results in accurate aqueous solvation free energies for a series of protonated amines. However, Chipman also showed in his work43 that for a series of oxygen-containing anions, no single common contour value could be used to accurately predict their absolute aqueous solvation free energies. Chipman attributed this finding to the inability of a con- tinuum model alone to account for strong anion-water interactions in the first solvation shell and suggested that better results might be obtained by augmenting continuum solvent calculations with other complementary methods that are especially designed to account for specific short-range interactions. Adding explicit solvent molecules has been a popular strategy for trying to incorporate the effects of specific solute-solvent interactions into continuum solvent calcula- tions. Often, this involves treating enough solvent molecules classically or quantum mechanically to account for at least the entire first solvation shell around the given solute.45 Depending on the solute, this may require a large number of explicit solvent molecules, which can lead to a significant increase in the amount of computational effort expended. In addition to this problem, there are several other potential problems associated with treating solvent molecules explic- itly. First, for many solutes, there is no easy way to determine the number or orientation of explicit water molecules in the first solvation shell. For example, X-ray diffraction experi- ments46 and various theoretical calculations47-51 lead to average coordination numbers ranging from 6 to 9.3 for the Ca2+ ion, suggesting that several different solvation structures exist. Second, even for solutes for which the first solvation shell is well defined, to properly treat even a few solvent molecules explicitly will most likely involve the need to sample over a large number of conformations that are local minima. Finally, introducing explicit solvent molecules will not yield more accurate solvation free energies if the level of theory used to treat the system is not high enough. Because properly treating nonbonded interactions usually requires treatment of electron correlation and the use of fairly large basis sets,52 any realistic attempt at using solute-water clusters to calculate aqueous solvation free energies requires an accurate treatment of the entire solute-solvent system, which is practical only for small numbers of solvent molecules. Despite these problems, hybrid approaches com- bining quantal and classical treatments of solvent molecules have had some success in predicting aqueous solvation free energies of ions. For example, Pliego and Riveros showed53 that for a test set of 17 ions, including several explicit water molecules in the continuum calculation significantly im- proved the performance of the model. To determine the number of explicit solvent molecules required in the calcula- tion, these workers developed an approach in which the aqueous solvation free energy of the bare solute is minimized with respect to the number of coordinating waters.54 Besides predicting solvation free energies of ions, this approach has also been used to predict solvatochromic shifts,55,56 where explicit solute-solvent effects between the electronically excited solute and surrounding solvent molecules can have large effects on both the magnitude and direction of the shift. In the present paper, we will present a new continuum solvent model called Solvation Model 6 (SM6). This model is similar to our most recently developed previous continuum model, called SM5.43R,15,16 but improves on it in a number of significant ways. In both of these models, SM6 and 1134 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al. Table 1. Aqueous Solvation Free Energies (kcal/mol) of Bare Anionsa A- AH ∆G°gas (AH)b pKa (AH)c ∆GS / (AH)d ∆GS / (A-) HC2- acetylene 370.0 ( 1.8 21.7e 0.0 -74 ( 3 HCO2- formic acid 338.3 ( 1.5 3.8 -7.0f -78 ( 2 CH3CO2- acetic acid 341.4 ( 2.0 4.8 -6.7 -80 ( 3 CH3CH2CO2- propanoic acid 340.4 ( 2.0 4.9 -6.5 -78 ( 3 CH3(CH2)4CO2- hexanoic acid 339.0 ( 2.1 4.9 -6.2 -76 ( 3 H2CdCHCO2- acrylic acid 337.2 ( 2.8 4.3 -6.6f -76 ( 3 CH3COCO2- pyruvic acid 326.5 ( 2.8 2.5 -9.4f -70 ( 3 C6H5CO2- benzoic acid 333.0 ( 2.0 4.2 -7.9f -73 ( 3 CH3O- methanol 375.0 ( 0.6 15.5 -5.1 -97 ( 2 C2H5O- ethanol 371.3 ( 1.1 15.9 -5.0 -93 ( 2 CH3CH2CH2O- 1-propanol 369.4 ( 1.4 16.1 -4.8 -90 ( 2 (CH3)2CHO- 2-propanol 368.8 ( 1.1 17.1 -4.8 -88 ( 2 CH3CH2CHOCH3- 2-butanol 367.5 ( 2.0 17.6 -4.7f -86 ( 3 C(CH3)3O- t-butanol 367.9 ( 1.1 19.2 -4.5 -84 ( 2 H2CdCHCH2O- allyl alcohol 366.6 ( 2.8 15.5 -5.1 -88 ( 3 C6H5CH2O- benzyl alcohol 363.4 ( 2.0 15.4 -6.6f -87 ( 3 CH3OCH2CH2O- 2-methoxyethanol 366.8 ( 2.0 14.8 -6.8 -91 ( 3 C6H5O- phenol 342.9 ( 1.3 10.0 -6.6 -74 ( 2 o-CH3C6H4O- 2-methylphenol 342.4 ( 2.0 10.3 -5.9 -72 ( 3 m-CH3C6H4O- 3-methylphenol 343.3 ( 2.0 10.1 -5.5 -73 ( 3 p-CH3C6H4O- 4-methylphenol 343.8 ( 2.0 10.3 -6.1 -74 ( 3 CH2OHCH2O- 1,2-ethanediol 360.9 ( 2.0 15.4 -9.3 -87 ( 3 m-HOC6H4O- 3-hydroxyphenol 339.1 ( 2.0 9.3g -11.4f -76 ( 3 p-HOC6H4O- 4-hydroxyphenol 343.1 ( 2.0 9.9g -11.9f -80 ( 3 CH3OO- methyl hydroperoxide 367.6 ( 0.7 11.5 -5.3h -95 ( 2 CH3CH2OO- ethyl hydroperoxide 363.9 ( 2.0 11.8 -5.3h -91 ( 3 CH2(O)CH - acetaldehyde 359.4 ( 2.0 16.5 -3.5f -78 ( 3 CH3C(O)CH2- acetone 362.2 ( 2.0 19.0 -3.9 -78 ( 3 CH3CH2C(O)CHCH3- 3-pentanone 361.4 ( 2.0 19.9 -3.3f -76 ( 3 CH2CN - acetonitrile 366.0 ( 2.0 25.0 -3.9 -74 ( 3 NCNH - cyanamide 344.0 ( 2.0 10.3g -6.2f -74 ( 3 C6H5NH - aniline 359.1 ( 2.0 27.7 -5.5 -65 ( 3 (C6H5)2N - diphenylamine 343.8 ( 2.0 22.4 -5.3f -56 ( 3 CN - hydrogen cyanide 343.7 ( 0.3 9.2 i -3.1f -72 ( 2 o-NO2C6H4O- 2-nitrophenol 329.5 ( 2.0 7.2 -4.5f -62 ( 3 m-NO2C6H4O- 3-nitrophenol 327.6 ( 2.0 8.4 -9.6f -64 ( 3 p-NO2C6H4O- 4-nitrophenol 320.9 ( 2.0 7.1 -10.6f -60 ( 3 CH2NO2- nitromethane 350.4 ( 2.0 10.2 -4.0f -78 ( 3 p-NO2C6H5NH - 4-nitroaniline 336.2 ( 2.0 18.2 -9.9f -59 ( 3 CH3CONH - acetamide 355.0 ( 2.0 15.1 -9.7 -82 ( 3 CH3S- methanethiol 350.6 ( 2.0 10.3 -1.2 -76 ( 3 CH3CH2S- ethanethiol 348.9 ( 2.0 10.6 -1.3 -74 ( 3 C3H7S- 1-propanethiol 347.9 ( 2.0 10.7 -1.1 -72 ( 3 C6H5S- thiophenol 333.8 ( 2.0 6.6 -2.6 -65 ( 3 CH3S(O)CH2- dimethyl sulfoxide 366.8 ( 2.0 33.0 -9.8f -70 ( 3 CCl3- chloroform 349.7 ( 2.0 24.0 -1.1 -56 ( 3 CF3CO2- trifluoroacetic acid 316.7 ( 2.0 0.5 -7.3f -61 ( 3 CH2ClCO2- chloroacetic acid 328.9 ( 2.0 2.9 -8.7f -72 ( 3 CHCl2CO2- dichloroacetic acid 321.5 ( 2.0 1.4 -6.6f -64 ( 3 CF3CH2O- 2,2,2-trifluoroethanol 354.1 ( 2.0 12.4 -4.3 -80 ( 3 CH(CF3)2O- 1,1,1,3,3,3-hexafluoropropan-2-ol 338.4 ( 2.0 9.3 -3.8 -67 ( 3 ClC6H4O- 2-chlorophenol 337.1 ( 2.0 8.5 -4.5f -68 ( 3 ClC6H4O- 4-chlorophenol 336.5 ( 2.0 9.4 -6.2f -68 ( 3 HO - water 383.7 ( 0.2 15.7 -6.3 -107 ( 2 HO2- hydrogen peroxide 368.6 ( 0.6 11.7 -8.6h -99 ( 2 O2- hydroperoxyl radical 346.7 ( 0.8 4.7 -7.0 j -85 ( 2 HS- hydrogen sulfide 344.9 ( 1.2 7.0 -0.7 -74 ( 2 F- hydrofluoric acid -102 ( 2k Cl- hydrochloric acid -73 ( 2k Br- hydrobromic acid -66 ( 2k a Aqueous solvation free energies are for a temperature of 298 K. b Gas-phase basicities taken from ref 83. c From ref 87, unless otherwise noted. d From the current data set unless otherwise noted. e Reference 89. f Reference 76. g Reference 91. h Reference 86. i Reference 92. j Reference 85. k Reference 82. Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1137 experimental aqueous solvation free energies of F-, Cl-, and Br- were taken from Tissandier et al.82 and adjusted for a change in the standard state and the value used here for ∆GS /(H+). For the remaining anions, we used the free energy cycle shown in Scheme 2 and eq 5 to determine the aqueous solvation free energy according to Table 2. Aqueous Solvation Free Energies of Bare Cationsa AH+ AH ∆G°gas (AH+)b pKa (AH+) c ∆GS / (AH)d ∆GS / (AH+) CH3OH2+ methanol 173.2 ( 2.0 -2.1 -5.1 -91 ( 3 CH3CH2OH2+ ethanol 178.0 ( 2.0 -1.9 -5.0 -86 ( 3 (CH3)2OH+ dimethyl ether 182.7 ( 2.0 -2.5 -1.8 -78 ( 3 (C2H5)2OH+ diethyl ether 191.0 ( 2.0 -2.4 -1.8 -70 ( 3 CH3C(OH)CH3+ acetone 186.9 ( 2.0 -2.9 -3.9 -75 ( 3 CH3COHC6H5+ acetophenone 198.2 ( 2.0 -4.3 -4.6 -63 ( 3 CH3NH3+ methylamine 206.6 ( 2.0 10.6 -4.6 -74 ( 3 CH3(CH2)2NH3+ n-propylamine 211.3 ( 2.0 10.6 -4.4 -70 ( 3 (CH3)2CHNH3+ isopropylamine 212.5 ( 2.0 10.6 -3.7e -68 ( 3 C(CH3)3NH3+ t-butylamine 215.1 ( 2.0 10.7 -3.9e -65 ( 3 c-C6H11NH3+ cyclohexanamine 215.0 ( 2.0 10.7 -5.1e -67 ( 3 H2CdCHCH2NH3+ allylamine 209.2 ( 2.0 9.5 -4.3e -70 ( 3 (CH3)2NH2+ dimethylamine 214.3 ( 2.0 10.7 -4.3 -67 ( 3 (C2H5)2NH2+ diethylamine 219.7 ( 2.0 11.0 -4.1 -62 ( 3 (n-C3H7)2NH2+ di-n-propylamine 222.1 ( 2.0 11.0 -3.7 -59 ( 3 (H2CdCHCH2)2NH2+ diallylamine 219.0 ( 2.0 9.3 -4.0e -60 ( 3 (CH3)3NH+ trimethylamine 219.4 ( 2.0 9.8 -3.2 -59 ( 3 (C2H5)3NH+ triethylamine 227.0 ( 2.0 10.8 -3.0e -53 ( 3 (n-C3H7)3NH+ tri-n-propylamine 229.5 ( 2.0 10.3 -2.5e -49 ( 3 C6H5NH3+ aniline 203.3 ( 2.0 4.6 -5.5 -70 ( 3 o-CH3C6H4NH3+ 2-methylaniline 205.3 ( 2.0 4.5 -5.6e -68 ( 3 m-CH3C6H4NH3+ 3-methylaniline 206.5 ( 2.0 4.7 -5.7e -68 ( 3 p-CH3C6H4NH3+ 4-methylaniline 206.7 ( 2.0 5.1 -5.6e -68 ( 3 m-NH2C6H4NH3+ 3-aminoaniline 214.9 ( 2.0 5.0 -9.9e -64 ( 3 C6H5NH2CH3+ N-methylaniline 212.7 ( 2.0 4.9 f -4.7e -61 ( 3 C6H5NH2CH2CH3+ N-ethylaniline 213.4 ( 2.0 5.1 f -4.6e -60 ( 3 C6H5NH(CH3)2+ N,N-dimethylaniline 217.3 ( 2.0 5.1 -3.6e -55 ( 3 p-CH3C6H4NH(CH3)2+ 4-methyl-N,N-dimethylaniline 219.4 ( 2.0 5.6 -3.7e -54 ( 3 C6H5NH(CH2CH3)2+ N,N-diethylaniline 221.8 ( 2.0 6.6 -2.9e -52 ( 3 C10H7NH3+ 1-aminonaphthalene 209.2 ( 2.0 3.9 -7.3e -66 ( 3 C2H4NH2+ aziridine 208.5 ( 2.0 8.0 -4.5e -69 ( 3 C3H6NH2+ azetidine 217.2 ( 2.0 11.3 -5.6 -66 ( 3 C4H8NH2+ pyrrolidine 218.8 ( 2.0 11.3 -5.5 -64 ( 3 C5H10NH2+ piperidine 220.0 ( 2.0 11.1 -5.1 -62 ( 3 C6H12NH2+ azacycloheptane 220.7 ( 2.0 11.1 -4.9e -61 ( 3 C4H5NH+ pyrrole 201.7 ( 2.0 -3.8 -4.3e -60 ( 3 C5H5NH+ pyridine 214.7 ( 2.0 5.2 -4.7 -59 ( 3 C9H7NH+ quinoline 220.2 ( 2.0 4.8 -5.7e -54 ( 3 C4H8NHNH2+ piperazine 218.6 ( 2.0 9.7 -7.4 -64 ( 3 CH3CNH+ acetonitrile 179.0 ( 2.0 -10.0g -3.9 -73 ( 3 p-CH3OC6H4NH3+ 4-methoxyaniline 207.6 ( 2.0 5.3 -7.6e -69 ( 3 p-NO2C6H4NH3+ 4-nitroaniline 199.4 ( 2.0 1.0 -9.9e -74 ( 3 C4H8ONH2+ morpholine 213.0 ( 2.0 8.4 -7.2 -68 ( 3 CH3COHNH2+ acetamide 199.0 ( 2.0 -0.6 -9.7 -72 ( 3 C6H5COHNH2+ benzamide 205.8 ( 2.0 -1.4 -10.9e -65 ( 3 (CH3)2SH+ dimethyl sulfide 191.5 ( 2.0 -7.0 -1.5 -63 ( 3 (CH3)2SOH+ dimethyl sulfoxide 204.0 ( 2.0 -1.5 -9.8e -66 ( 3 m-ClC6H4NH3+ 3-chloroaniline 199.9 ( 2.0 3.5 -5.8e -73 ( 3 p-ClC6H4NH3+ 4-chloroaniline 201.2 ( 2.0 4.0 -5.9e -72 ( 3 NH4+ ammonia 195.7 ( 2.0 9.3 h -4.3 -83 ( 3 HNNH2+ hydrazine 196.6 ( 2.0 8.1 -6.3e -83 ( 3 H3O+ water 157.7 ( 0.7 -1.7 -6.3 -108 ( 2 a Aqueous solvation free energies are for a temperature of 298 K. b Gas-phase basicities from ref 84. c From ref 87, unless otherwise noted. d From the current data set, unless otherwise noted. e Reference 76. f Reference 91. g Reference 88. h Reference 92. ∆GS /(A-) ) -∆G°gas(A -) - ∆G°f* + ∆GS /(AH) - ∆GS /(H+) + 2.303RTpKa(AH) (7) 1138 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al. where ∆G°gas(A-) is the gas-phase basicity of A-, equal to G°gas(A-) + G°gas(H+) - G°gas(AH). Experimental gas-phase basicities of anions and acidities of neutral species were taken from the National Institute of Standards and Technology (NIST) database;83 experimental gas-phase basicities of neutral species were taken from the most recent compilation of Lias et al.84 For neutral species, experimental aqueous solvation free energies were taken from the data set described in section 2.2 and from several additional sources.76,85,86 A large part of the experimental aqueous pKa data used here was taken from the compilation of Stewart;87 pKa data not available in this compilation were taken from several additional sources.88-92 We note that gas-phase free energies and aqueous solvation energies for neutral solutes can be calculated fairly ac- curately, and several other compilations of aqueous solvation free energies of ions79,80,93 use theoretical data for these quantities. Although we did not use theoretical values in deriving any of the free energies shown in Tables 1 and 2, future extensions of the present database could include aqueous solvation energies determined from calculated data. The aqueous solvation free energies reported here can be compared to the recent compilation of Pliego et al.,93 who used the same thermodynamic cycle, along with Tissandier et al.’s82 value of -266 kcal/mol for ∆GS /(H+), in deter- mining the aqueous solvation free energies for 56 ions. (Note that the above value of -266 kcal/mol for ∆GS /(H+) reflects the standard-state correction required in order to adjust Tissandier et al.’s reported value of -264 kcal/mol,82 which is for a gas-phase standard state of 1 atm combined with an aqueous phase standard state of 1 mol/L, to a standard state that uses a concentration of 1 mol/L in both the gas and aqueous phases. Also note that Tissandier’s reported value of -264 kcal/mol has sometimes been misinterpreted as corresponding to a standard state of 1 mol/L in both the gas and aqueous phases.15,16,94-96) Making an adjustment to account for the difference between the value used for ∆GS /(H+) here and the value used in Pliego and Riveros’ work brings the two sets of data into very good agreement with one another. 2.4. Water-Solute Clusters. We also compiled another data set, to be called the selectively clustered-ion data set, in which 31 bare ions in the unclustered-ion data set were replaced by the ion-water clusters that are listed (along with the water dimer) in Table 3. Using the free energy cycle shown in Scheme 3, the aqueous solvation free energies of these 31 solute-water clusters were determined according to In the above equation, aqueous solvation free energies of the unclustered ions ∆GS /(M() were taken from Tables 1 or 2. When available, experimental values for the gas-phase binding energies were used in the above equation. When experimental values were not available, we calculated them at the B97-197/MG3S98 level of theory, which has recently been shown52 to perform well for nonbonded interactions in the gas phase. 2.5. Uncertainty of Experimental Data. We have previ- ously estimated an average uncertainty of 0.2 kcal/mol for aqueous solvation free energies of neutral solutes in our data sets.15 The uncertainties in the aqueous solvation free energies for ionic solutes are significantly greater due in part to their Scheme 2 Table 3. Aqueous Solvation Free Energies for Solute-Water Clustersa A‚H2Ob ∆G°gas(B.E.)c ∆GS / (A)d ∆GS / (A‚H2O) H2O(H2O) 3.34 ( 0.50e -6.31 ( 0.20 -14.06 ( 0.57 CH3OH2(H2O)+ -18.5 f -91.1 ( 2.8 -77 ( 3 CH3CH2OH2(H2O)+ -16.8f -86.5 ( 2.8 -74 ( 3 (CH3)2OH(H2O)+ -15.4f -77.8 ( 2.8 -67 ( 3 (C2H5)2OH(H2O)+ -11.4 ( 2.0g -69.6 ( 2.8 -62 ( 3 CH3C(OH)CH3(H2O)+ -12.8f -75.1 ( 2.8 -67 ( 3 CH3COHC6H5(H2O)+ -10.8f -62.6 ( 2.8 -56 ( 3 NH4(H2O)+ -12.6f -83.3 ( 2.8 -75 ( 3 H3O(H2O)+ -27.0 ( 2.0g -108.4 ( 2.0 -86 ( 3 C2H(H2O)- -10.6 ( 1.0f -78.4 ( 2.6 -72 ( 3 CN(H2O)- -8.3 ( 0.7f -72.2 ( 1.9 -68 ( 2 CH3O(H2O)- -17.00 ( 0.30f -96.9 ( 2.0 -84 ( 2 C2H5O(H2O)- -14.2 ( 2.0g -92.6 ( 2.2 -83 ( 3 CH3CH2CH2O(H2O)- -14.6 ( 2.0g -90.2 ( 2.4 -80 ( 3 (CH3)2CHO(H2O)- -12.3 ( 2.0g -88.2 ( 2.2 -80 ( 3 CH3CH2CHOCH3(H2O)- -9.9 ( 2.0g -86.1 ( 2.8 -81 ( 3 C(CH3)3O(H2O)- -12.2 ( 2.0g -84.2 ( 2.2 -76 ( 3 H2CdCHCH2O(H2O)- -13.5 ( 2.0g -88.5 ( 3.4 -79 ( 4 C6H5CH2O(H2O)- -11.6 ( 2.0g -87.0 ( 2.8 -80 ( 3 CH3OCH2CH2O(H2O)- -13.5 ( 2.0g -91.4 ( 2.8 -82 ( 3 CH2OHCH2O(H2O)- -14.0 ( 2.0g -87.2 ( 2.8 -78 ( 3 CF3CH2O(H2O)- -11.6 ( 2.0g -79.5 ( 2.8 -72 ( 3 CH(CF3)2O(H2O)- -6.0 ( 2.0g -67.4 ( 2.8 -66 ( 3 CH3OO(H2O)- -14.6 ( 2.0g -95.2 ( 2.0 -85 ( 3 CH3CH2OO(H2O)- -14.1 ( 2.0g -91.1 ( 2.8 -91 ( 3 HO(H2O)- -19.8 ( 1.4f -106.6 ( 1.9 -91 ( 2 HO2(H2O)- -17.0 ( 2.0g -99.2 ( 2.0 -87 ( 3 O2(H2O)- -12.1 ( 2.0f -85.2 ( 2.1 -78 ( 3 HS(H2O)- -8.6 ( 2.0f -74.0 ( 2.3 -70 ( 3 F(H2O)- -12.5 ( 1.6f -102.5 ( 1.9 -94 ( 3 Cl(H2O)- -9.0 ( 4.0f -72.7 ( 1.9 -68 ( 4 Br(H2O)- -7.1 ( 2.0f -66.3 ( 1.9 -64 ( 3 a Aqueous solvation free energies are for a temperature of 298 K. b B97-1/MG3S optimized geometries. c Water-solute binding free energies. d Aqueous solvation free energy of the bare solute. e Ex- perimental value, taken from ref 78. f Experimental value, taken from ref 100. g Theoretical value, calculated at the B97-1/MG3S level of theory. Scheme 3 ∆GS /(H2O‚M () ) ∆GS /(H2O) + ∆GS /(M() - ∆G°gas(B.E.) + ∆G° f* (8) Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1139 atomic surface tensions are given below, where we write Z for Zk to simplify the notation: The functional forms shown above are a convenient way to treat different types of chemical environments that a particular atom in a solute might encounter. Unlike models that require the user to assign types to atoms (e.g., molecular mechanics force fields), these functional forms do not require the user to make assignments. This feature means that the user is never in doubt about which parameter to use. Furthermore, because the above equations are smooth functions of the solute geometry, the present model can be applied to systems containing nonbonded or partially bonded pairs of atoms, such as transition states and solute-solvent clusters. It is important to point out that although separating the free energy of solvation into the above components allows us to effectively model a wide variety of solutes, only the total free energy is a state function, so there is a certain degree of ambiguity associated with separating the aqueous solvation free energy, which is an experimental observable, into several contributions that cannot be measured indepen- dently. Thus, there is some flexibility in how to interpret these various contributions. We interpret the contribution to the free energy arising from GP as accounting for electrostatic interactions between the charge distribution of the solute (which is modeled as a collection of point charges distributed over atomic spheres) and the bulk electric field of the solvent when it is assumed to begin at a boundary defined by the effective Born radii. For ionic solutes, GP makes the largest contribution to the overall solvation free energy. Because GP is calculated under the assumption that the solvent responds linearly to the electronic distribution of the solute (hence58 the factor 1/2 appearing in eq 10), nonlinear solvent effects, such as changes in the dielectric constant of the solvent in the vicinity of the solute, and strong solute-solvent hydrogen bonds, which prevent the solvent from fully responding to the solute charge distribution, are not explicitly accounted for by this term. One strategy that we have used to partly account for these nonlinear effects has been to empirically adjust the values that we use for the intrinsic Coulomb radii. For neutrals, previous experience has shown that the overall performance of our models is relatively insensitive to the values used for these atomic radii, whereas for ions the performance of our models is quite sensitive to the choice of these radii. By making adjustments to the atomic radii, our previous models have achieved an accuracy of ∼5 kcal/ mol in aqueous solvation free energies for the majority of the ions used to train these models. However, the present data set is much larger and more diverse than our previous data sets for ions. Thus, one of the goals of this work will be to see if the deficiencies described above can adequately be accounted for by making empirical adjustments to the intrinsic Coulomb radii. A second strategy to account for deficiencies of a bulk electrostatic model is to include GCDS. In the past, we have interpreted the GCDS term as formally accounting for cavita- tion (i.e. the free energy cost associated with creating a cavity in the solvent to accommodate the solute), dispersion interactions between the solute and solvent, and specific solute effects on the solvent structure (e.g. the loss of orientational freedom of water molecules around a nonpolar solute). However, because the GCDS term is empirical in nature, it can be more accurately interpreted as accounting for any contribution to the total solvation free energy of a given solute that is not explicitly accounted for by the bulk electrostatic (GP) term. Such effects include, but are not limited to, the nonlinear solvent effects described above, deviations of the true solute-solvent interface defined by the atomic radii, short-range exchange and repulsion forces between the solute and solvent, neglect of charge transfer between the solute and solvent, and any systematic errors that may arise from the GB approximation, the ability of partial atomic charges to represent the true solute charge distribution, or the level of theory used to calculate the electronic wave function of the solute. In addition, several other effects are implicitly accounted for by GCDS that could, in principle, be explicitly calculated, such as the change in the solute’s translational, vibrational, or rotational free energy in moving from the gas phase to solution. By using atomic surface tensions, our previous models have been quite σZ|Z)1 ) σ̃Z + ∑ k ′ Zk ′)6,7,8,16 σ̃ZZ ′[T(Rkk ′|RhZZ ′,W)] (16) σZ|Z)6 ) σ̃Z + ∑ k ′*k Zk ′)6 σ̃ZZ ′[T(Rkk ′|RhZZ ′,W)] + ∑ k ′*k Zk ′)6 σ̃ZZ ′ (2) [T(Rkk ′|RhZZ ′(2) ,WCC(2))] + ∑ k ′*k Zk ′)7 σ̃ZZ ′[T(Rkk ′|RhZZ ′,W)]2 (17) σZ|Z)7 ) σ̃Z + ∑ k ′*k Zk ′)6 σ̃ZZ ′{[T(Rkk ′|RhZZ ′,W)] ∑ k ′′*k k ′′*k ′ [T(Rkk ′|RhZZ ′,W)]2}1.3 + ∑ k ′* k Zk ′)6 σ̃ZZ ′ (2) [T(Rkk ′|RhZZ ′,W)] ∑ k ′′ Zk ′′)8 [T(Rk ′k ′′|RhZ ′Z ′′,W)] + ∑ k ′* k Zk ′)6 σ̃ZZ ′ (3) [T(Rkk ′|RhZZ ′,WNC)] (18) σZ|Z)8 ) σ̃Z + ∑ k ′* k Zk ′)6 σ̃ZZ ′[T(Rkk ′|RhZZ ′,WOC)] + ∑ k ′* k Zk ′)7,8,15 σ̃ZZ ′[T(Rkk ′|RhZZ ′,W)] (19) σZ|Z)9 ) σ̃Z (20) σZ|Z)15 ) σ̃Z (21) σZ|Z)16 ) σ̃Z + ∑ k ′* k Zk ′)15,16 σ̃ZZ ′[T(Rkk ′|RhZZ ′,W)] (22) σZ|Z)17 ) σ̃Z (23) σZ|Z)35 ) σ̃Z (24) 1142 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al. successful at predicting aqueous solvation free energies for a wide variety of neutral solutes (an average error of ∼0.5 kcal/mol), including many hydrogen bonding solutes. It is important to point out that because (in the present work and in most, but not all, of our previous work) we optimize the parameters contained in the GCDS term using only neutral solutes, applying this scheme also to ions involves the assumption that first solvation shell interactions are similar for a given pair of solutes containing the same functionality but different formal charge; this may be a major contributor to the residual error. We could of course eliminate this problem by using different surface tension parameters for ions, but, as stated above, unlike some of the other methods in the literature, we avoid using molecular mechanics types. 4. Parameters To Be Optimized Three different types of parameters will be optimized as part of this work: (1) the atomic radii in eq 12, (2) the value for d in eq 11, and (3) the atomic surface tension parameters σ̃Z and σ̃ZZ ′ in eqs 14-24. The above parameters will be optimized using gas-phase geometries. In all cases, the solutes in our database were represented by a single, lowest- energy conformation. For some of the solutes, in particular the solute-water clusters, this involved performing a con- formational analysis to identify the global minimum on the potential energy surface. For the acetamide cation, we used the geometry corresponding to the oxygen-protonated spe- cies, which is 9.7 kcal/mol lower in free energy in the gas phase than the nitrogen-protonated species (MPW25/MIDI! level of theory). For the acetamide anion, we used the deprotonated imidate form, which is lower in free energy in the gas phase than the enolate by 15.7 kcal/mol (MPW25/ MIDI! level of theory). For all of the unclustered solutes used in this work, we used geometries optimized at the MPW25/MIDI! level of theory. The MIDI! basis set61,62,128 is an especially economical basis set for calculations on large organic systems, but it was designed to give particularly accurate geometries. All of the solute-water clusters were optimized at the B97-1/ MG3S level of theory. 5. Results 5.1. Partial Atomic Charges. Although there is formally no “correct” method for assigning partial atomic charges because partial atomic charge it is not a quantum mechanical observable,129 several qualities make CM4 partial atomic charges more suitable for use in the present model than charges obtained from other methods. First, dipole moments derived from CM4 point-charges are generally more accurate than point-charge-derived dipole moments obtained using other charge partitioning schemes. This is demonstrated by the data in Table 5, which lists the mean unsigned errors between experimental dipole moments for 397 neutral molecules and those dipole moments calculated using CM4 point charges and those obtained from a redistributed Löwdin population analysis (RLPA).117 Also listed are the average errors for 107 unclustered ionic molecules (experimental dipole moments are not available for ionic solutes, so we used density dipole moments calculated at the MPW25/ MG3S//MPW25/MIDI! level of theory for comparison). The data in Table 5 show that for most of the molecules tested here, CM4-point-charge-derived dipole moments are more accurate than those calculated using RLPA partial atomic charges (for nondiffuse basis sets, RLPA partial atomic charges are equivalent to those obtained from a Löwdin population analysis113-116). One notable exception is for the anions tested here, where at the MPW25/MIDI!6D level of theory, RLPA-point-charge-derived dipole moments are more accurate than the CM4-point-charge-derived dipole moments (for other levels of theory, the CM4-point-charge derived dipole moments are more accurate). A second reason we prefer using CM4 partial atomic charges is because they are less sensitive to changes in the basis set than partial atomic charges obtained from other models. This becomes especially true when diffuse functions are added. Table 5 shows that as polarization and diffuse functions are added to the basis set, the quality of RLPA charges progressively worsens, whereas a much smaller dependency on basis set is observed for the CM4 charges. Finally, it has recently been shown112 that CM3 (a model quite similar to CM4) delivers more accurate charges for interior or buried atoms than partial atomic charges calculated from a fit to the electrostatic potential calculated around the molecule of interest (e.g. the ChElPG charge model130). In this same paper, it was also shown that CM3 charges are much less sensitive to small conformational changes and to the level of treatment of electron correlation than are ChElPG charges. 5.2. Aqueous Solvation Free Energies Calculated Using Previously Defined Atomic Radii. First, we tested several previously defined sets of atomic radii for predicting aqueous solvation free energies. For this, we developed three inter- mediate models using three different sets of atomic radii in eq 12, in particular the following: Bondi’s atomic radii (which we also use to calculate the solvent-accessible-surface area in eq 13) and the radii used by our SM5.42R10-14 and SM5.43R15,16 models. These sets of atomic radii are listed in Table 6. For each intermediate method, we calculated ∆GEP values at the MPW25/6-31+G(d,p) level of theory for all of the solutes in our data set, with d fixed at 4. With calculated values for ∆GEP in hand, we then optimized a set of surface tension coefficients σ̃Z and σ̃ZZ′ for each intermedi- ate model by minimizing the errors between the experimental Table 5. Mean Unsigned Errors (Debyes) in Dipole Moments Calculated with Partial Atomic Charges Obtained from CM4 and RLPA at the MPW25 Level of Theorya CM4 RLPAb basis set neutrals cations anions neutrals cations anions MIDI!6D 0.19 0.20 0.61 0.37 0.24 0.35 6-31G(d) 0.23 0.24 0.36 0.62 0.21 0.46 6-31+G(d) 0.27 0.32 0.44 0.76 0.43 0.55 6-31+G(d,p) 0.26 0.33 0.40 0.81 0.67 0.67 a For the neutral solutes, point-charge-derived dipole moments were compared to dipole moments taken from the CM3 training set, which is described in ref 112 (397 total dipole moments). For the dipolar ions in Tables 1 and 2, point-charged derived dipole moments were compared to density dipole moments calculated at the MPW25/ MG3S level of theory (107 total dipole moments). b For nondiffuse basis sets, RLPA partial atomic charges are equivalent to Löwdin partial atomic charges. Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1143 and calculated solvation free energies for all 273 of the neutral solutes. This step was accomplished using a NAG Fortran 90 routine, in particular the linear least squares solver routine.131 The performance of the three intermediate models is summarized in Table 7. The data in Table 7 show that all three models lead to similar errors in the solvation free energies of neutral solutes. Although not shown explicitly in Table 7, all three of the intermediate models predict an aqueous solvation free energy for water that is ∼4 kcal/mol too negative. We encountered a similar error during the development of earlier models and removed it by including a surface tension that identified oxygen atoms in the vicinity of two hydrogen atoms. We did not include this surface tension here, because doing so would have a negative impact on the performance of the model for hydronium, protonated alcohols, and solute-water clusters. For the water dimer, all three of the intermediate models predict its solvation free energy correctly to within 1.2 kcal/mol without using the special surface tension, which is a significant improvement compared to the performance of these models for the bare water solute. For ions, the errors shown in Table 7 are broken down into two different subsets: the unclustered-ion data set and the selectively clustered-ion data set. The unclustered-ion data set contains all of the experimental aqueous solvation free energies listed in Tables 1 and 2 but none of the solute- water clusters in Table 3 (112 ionic solutes). The selectively clustered-ion data set contains all of the experimental aqueous solvation free energies in Table 3 plus those aqueous solvation free energies that are in Tables 1 or 2 but not Table 3 (i.e. solutes that are included in the selectively clustered- ion data set as solute-water clusters are not included as their analogous bare ions). The criteria we used to decide to which of the solutes in our training set to add an explicit water molecule (i.e. which bare solutes would be deleted from the selectively clustered-ion data set and replaced by their analogous water-solute cluster) is as follows. First, we added an explicit water molecule to any ionic solute containing three or fewer atoms. Second, we added an explicit water molecule to any ionic solute with one or more oxygen atoms bearing a more negative partial atomic charge than bare water solute (as judged by comparison of CM4 gas-phase charges computed at the MPW25/6-31+G(d,p) level of theory). Finally, we added an explicit water molecule to ammonium and to all of the oxonium ions. We singled out solutes that satisfied one or more of these three criteria because we felt these solutes were likely to form strong solute-solvent hydrogen bonds with water and therefore would serve as a useful indicator as to whether including a single explicit water molecule in the calculation is an effective strategy for accounting for strong solute-solvent hydrogen bonding effects. The data in Table 7 are consistent with the fact that the solvation free energies of ions are more sensitive to the choice of atomic radii than the neutrals. This is not surprising, since the ∆GEP term is the major contributor to the total solvation free energy for ions and because the surface tensions are optimized for a given set of values of ∆GEP. The data in Table 7 also show that all three intermediate models give significantly lower errors for the selectively clustered-ion data set than for the unclustered-ion data set. This indicates that for the above intermediate models including a single explicit water molecule in the calculation is at least partly effective in accounting for strong specific solute-solvent hydrogen bonding interactions. 5.3. Aqueous Solvation Free Energies Calculated Using Optimized Atomic Radii and d Parameter. Of the three intermediate models presented above, the one based on SM5.42R radii performs the best for ionic solutes, giving a mean unsigned error of 5.64 kcal/mol for the ions in the unclustered-ion data set and 4.73 kcal/mol for the ions in the selectively clustered-ion data set. Next, we examined whether optimizing a new set of radii would lead to more accurate solvation free energies. For this, we again used MPW25/6-31+G(d,p). Throughout the parameter optimiza- tion process, for each set of intrinsic Coulomb radii, we optimized a different set of surface tension coefficients for each set of radii by minimizing the overall error between the calculated and experimental aqueous solvation free energies for all of the neutral solutes. This two-step procedure (where first a set of intrinsic Coulomb radii are chosen, and then surface tension coefficients are optimized for that set of atomic radii) was implemented into a genetic algorithm,132 and the average error between the calculated and experi- mental aqueous solvation free energies for all of the ionic solutes in the selectively clustered-ion data set (the reason we chose to use only the data from the selectively clustered- ion data set is discussed in section 6.2) was minimized. To ensure that a physical parametrization was achieved, we constrained the optimization in the following ways. First, any set of intrinsic Coulomb radii that yielded positive ∆GEP Table 6. Atomic Radii Used by Various Models atom Bondi SM5.42R SM5.43R SM6 H 1.20 0.91 0.79 1.02 C 1.70 1.78 1.81 1.57 N 1.55 1.92 1.66 1.61 O 1.52 1.60 1.63 1.52c F 1.47 1.50 1.58 1.47c P 1.80 2.27 2.01 1.80c S 1.80 1.98 2.22 2.12 Cl 1.75 2.13 2.28 2.02 Br 1.85 2.31 2.38 2.60 a Not optimized, held fixed at Bondi’s value. Table 7. Mean Unsigned Errors in Aqueous Solvation Free Energies (kcal/mol) Obtained Using Various Sets of Atomic Radiia ions atomic radiib neutrals unclusteredc selectively clusteredd Bondi 0.56 6.87 5.55 SM5.42R 0.52 5.64 4.73 SM5.43R 0.52 6.06 5.32 a For each set of radii, a different set of atomic surface tensions was optimized. All d values were fixed at 4 for the calculations in this table, and (in the whole article) we always use Bondi’s radii in eq 13. The calculations in this table were carried out using MPW25/6- 31+G(d,p). b Intrinsic Coulomb radii. c This data set contains all 112 ions listed in Tables 1 and 2. d This data set contains 81 of the ions listed in Tables 1 and 2 (those that do not appear in clustered form in Table 3) plus the 31 clustered ions listed in Table 3. 1144 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al. 8. The resulting errors are shown in the final two columns of Table 9. In almost all cases, the aqueous solvation free energies calculated at these levels of theory are very close to those calculated at the SM6/MPW25/6-31+G(d,p)// MPW25/MIDI! level of theory. This invariance is due to the ability of these density functionals to deliver accurate electronic wave functions, and the above results are encour- aging because they show that the above parameters can be applied to a wide variety of different density functionals, assuming that the given density functional is able to provide a reasonably accurate electronic wave function for the solute of interest. 6. Discussion 6.1. Optimizing Solvation Parameters Based on Gas- Phase Geometries. The first point worth some discussion is the fact that the method is parametrized using gas-phase geometries. We optimize the parameters based on gas- phase geometries here as well as in recent previous work7,10-12,14-17,74,75 for two reasons. First, for many solutes, less expensive (e.g., semiempirical or molecular mechanics methods) can yield accurate gas-phase geometries. Second, our experience is that optimizing solvation parameters based on gas-phase geometries does not cause a problem because for all or almost all of the molecules in the parametrization set (with the possible exception of those containing an explicit solvent molecule), the difference in solvation free energy between using a gas-phase geometry and using an aqueous geometry is smaller than the mean error of the model. Having obtained the parameters with such a training set, they can be used more broadly. Thus, when the geometry does change significantly in solution, the molecule should be optimized in the aqueous phase, and such optimization would be expected to give more accurate results in such a case. The ASA algorithm122 that we use for the solvation calculations has excellent analytic gradients that allow for efficient and stable geometry optimization in solution. 6.2. Optimizing Atomic Radii Against the Unclustered- Ion Data Set. The results presented above suggest that for some of the ions considered here as well as water adding an explicit water molecule to the calculation is one way to increase the accuracy of the model for these solutes. However, since the solute-water clusters in Table 3 were included in the training set used to obtain the parameters contained in the present model, an interesting question is whether better results might be obtained if only the unclus- tered ions were considered. One strategy that could be used to improve the performance of the model for the unclustered- ion data set would simply be to repeat the above optimization using the unclustered instead of the selectively clustered- ion data set. Indeed, it was shown in section 5.3 that the solvation free energies of the ions are quite sensitive to the choice of radii. Based on a careful examination of the solvation free energies for all of the ions in our data set though, we determined that some of the bare ions required a much different set of parameters than the majority of the other solutes in our data set. Therefore, fitting the unclus- tered-ion data set with a single set of atomic radii gives uneven results, and that strategy was abandoned. Another strategy that could be used to try to fit all of the data in the unclustered-ion data set would be to make the atomic radius of a given atom not only a function of its atomic number but also a function of its partial atomic charge. In fact, our earliest models (SM1-SM33-6) used charge-dependent intrinsic Coulomb radii. We experimented with several different functional forms for charge-dependent radii, including (as just one example) making intrinsic Coulomb radii a quadratic function of partial atomic charge, and we found that while making the intrinsic Coulomb radii a function of partial atomic charge did improve the perfor- mance of the model for many of the ions it did so at the cost of having a deleterious effect on some of the other ions. We eventually abandoned the idea of using charge-dependent radii because of this finding and for two additional reasons. First, a model that uses charge-dependent radii would be more likely to be highly basis-set dependent than a model that uses atomic-number-dependent radii.35,36 Second, charge- dependent atomic radii might lead to highly questionable results in cases where this dependence has not been carefully examined (e.g. our data set does not contain any zwitterions or large biomolecules) or in cases where the atomic radius might not be a smooth function of its partial atomic charge (e.g. transition states that involve the displacement of a charged or partially charged leaving group). Several additional strategies that were not considered here, but that have been used by others, include using atom-typed radii18-23 or using different sets of atomic radii for neutrals and ions.134 An example of this former type of approach is the united atom for Hartree-Fock (UAHF)23 method of Barone and co-workers, in which the atom’s radius depends on its hybridization state, connectivity, and formal charge (so, in a sense, this method falls under both of the above categories). Here, we did not consider using atom-typed radii because it is not completely clear whether they can be used for modeling chemical reactions or predicting activation free energies (which require modeling a transition state) in the liquid phase. We feel that adding one or more explicit water molecules to the calculation is the most reasonable approach to modeling some ionic solutes with a continuum solvent model. Ideally we would give a definite prescription for when the user should add a specific water molecule in using this model. However, it is not possible to do this in a way that covers the great diversity of possible cases that occur in applications, especially if one includes reaction paths and enzymes. One prescription would be to add an explicit water whenever one wants to improve the accuracy, since adding an explicit water should almost always improve the accuracy when the effect is large, but it is relatively safe because it cannot make the accuracy much worse when the effect is small. In considering this question, we should note that although we obtain better results when we add an explicit water in cases where the given solute satisfies one or more of the three criteria explained in section 5.2, we also obtain reasonably good results in most cases even without the explicit water (see Table 10). Although not tested as part of this work, the above strategy of adding one or more explicit solvent molecules should also improve the accuracy for predicting solvation free energies of some solutes in nonaqueous solvents, in particular those where specific solute-solvent hydrogen bonds are expected Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1147 to occur. Of course, testing this strategy for solvents other than water would require a careful comparison between experimental and calculated solvation free energies of solutes in nonaqueous solvents, which is beyond the scope of the present work. 6.3. Performance of Other Continuum Models. Using our database of aqueous solvation free energies, along with gas-phase geometries optimized at the MPW25/MIDI! level of theory, we tested the performance of the SM5.43R and PCM continuum models for predicting aqueous solvation free energies. For all PCM calculations, we used the UAHF method for assigning atomic radii,23 which is the recom- mended method for predicting aqueous solvation free ener- gies with PCM according to the Gaussian 03 manual.135 Because the parameters contained in the UAHF method were originally optimized for use with the HF/6-31G(d) level of theory,23 we used this level of theory to calculate aqueous solvation free energies for all of the PCM methods tested here. (Thus the PCM methods are tested in a way that should allow them to perform at their best.) However, we also wanted to see what effect changing the level of theory had on the accuracy of PCM, so we tested one of the PCM methods described below at the PCM/MPW25/6-31+G(d,p)// MPW25/MIDI! level of theory also. There are several different varieties of PCM, and most of these are implemented differently in Gaussian 98 136 and Gaussian 03.135 Here, we tested the dielectric version26-28 of PCM (DPCM) as implemented in both Gaussian 98 and Gaussian 03.29 These two models will be referred to as DPCM/98 and DPCM/03, respectively. We also tested CPCM/9825,137,138 and CPCM/0324,137,138 as well as the default PCM method in Gaussian 03, IEF-PCM/03.29-32 The IEF- PCM/03 model is particularly interesting because Chipman139 found that it includes charge penetration effects “extremely well for all solutes”. The results of these calculations are summarized in Table 10. The data in Table 10 show that SM6 outperforms all of the other models tested above for both neutral and ionic solutes. For PCM, the most accurate solvation free energies are obtained using the older DPCM/98 implementation. The data in Table 10 also show that changing the level of theory has a negative effect on the performance of IEF-PCM/03, which is not surprising, since the UAHF method for assigning atomic radii was optimized23 using DPCM/98 at the HF/6- 31G(d) level of theory. Of course, the performance of SM6 for anions is dependent on the basis set used (see the data in Table 9), although its performance improves as the basis set size in increased (for neutrals and cations, there is very little basis-set dependence in aqueous solvation free energies calculated with SM6). Comparing the overall errors for the unclustered-ion data set to those in the selectively clustered-ion data set shows that only the performance of SM6 improves significantly when a single explicit solvent molecule is added to the calculation (the overall error decreases from 4.19 kcal/mol for the ions in the unclustered-ion data set to 3.21 kcal/mol for the ions in the selectively clustered-ion data set). Again, this suggests that including a small number of explicit water molecules in SM6 calculations may be an effective strategy for predicting the aqueous solvation free energies of some ions in cases where strong solute-solvent hydrogen bonds are expected to play an important role in the aqueous phase. Furthermore, we feel that this is a much more reasonable strategy than trying to use drastically scaled values for the atomic radii or atom-typed or charge-dependent radii. The excellent performance of the SM6 model as compared to all the models in the popular Gaussian packages is especially remarkable when one remembers that the atomic radii in SM6 are functions of only atomic number (and the radii for O and F are not even optimized), whereas the recommended radii used in Gaussian depend on connectivity, hybridization state, and formal charge. 7. Concluding Remarks We have presented a new database of experimental aqueous solvation free energies that contains 273 neutral and 143 ionic solutes, including 31 ion-water clusters. Using these data, we developed a new continuum solvent model called SM6. This model can be used to calculate aqueous solvation free energies and, although not demonstrated here, liquid-phase geometries in aqueous solution. SM6 uses partial atomic charges obtained from a new charge model, Charge Model 4 (CM4), which has been shown to give accurate partial charges for both neutral and ionic solutes. In addition, we have shown that the partial atomic charges obtained from CM4 are much less dependent upon changes in the basis set than partial atomic charges obtained from a Löwdin or Redistributed Löwdin population analysis of the wave function. For some of the ions in our data set, we showed that the addition of a single explicit water molecule to the calculation (i.e., modeling the solute as a solute-water cluster) improved the performance of SM6 for predicting aqueous solvation free energies, indicating that large numbers of solvent molecules are not necessarily required for improving the treatment of some strong solute-solvent hydrogen bonds in Table 10. Mean Unsigned Errors (kcal/mol) in Aqueous Solvation Free Energies Calculated Using Different Continuum Solvent Models MPW25/6-31+G(d,p) HF/6-31G(d) SM6 SM5.43R DPCM/98 DPCM/03 CPCM/98 CPCM/03 IEF-PCM/03 MPW25/6-31+G(d,p) IEF-PCM/03 neutralsa 0.54 0.62 1.02 1.40 1.06 1.11 1.10 1.21 unclustered ionsb 4.19 6.12 4.40 14.95 5.02 6.32 6.37 7.78 clustered ionsb 3.21 6.16 5.86 14.32 6.33 7.58 7.63 8.98 all ionsc 4.38 6.92 5.83 15.60 6.40 7.47 7.53 8.88 all datad 1.86 2.79 2.69 6.28 2.91 3.30 3.31 3.85 a 273 molecules. b 112 ions. c 143 ions. d 416 data. 1148 J. Chem. Theory Comput., Vol. 1, No. 6, 2005 Kelly et al. the first solvation shell. This is encouraging, because treating large numbers of solvent molecules explicitly often presents many problems. Furthermore, we feel that this strategy is more reasonable than using unphysical values for the atomic radii or using atom-typed or charge-dependent radii. We also used our new database of aqueous solvation free energies to test the performance of several other continuum solvent models, namely SM5.43R and several different implementations of PCM. For both neutral and ionic solutes, SM6 outperforms all of the methods against which it was tested for predicting aqueous solvation free energies. Fur- thermore, we found that SM6 is the only model of those tested here (except for one model with a mean error 3.4 times larger) that improves upon the addition of a single explicit water molecule to the calculation. Thus, unlike the other models tested here, adding a single explicit water molecule to SM6 calculations in cases where strong solute-solvent hydrogen bonds are expected to occur in the aqueous phase appears to be both practical and effective for improving the accuracy of the present model for these types of solutes. Finally, it was shown that SM6 retains its accuracy when used in conjunction with the B3LYP and B3PW91 func- tionals. Based on this analysis, we proposed that the charge and solvation parameters obtained with a given basis set (charge and solvation parameters have been optimized for the MIDI!6D, 6-31G(d), 6-31+G(d), and 6-31+G(d,p) basis sets) may be used with any good density functional or fraction of Hartree-Fock exchange. Availability of SM6. All of the SM6 parametrizations presented in this article are available in the SMXGAUSS140 program. This program can read a Gaussian output file corresponding to a gas-phase calculation of a given solute and carry out a single-point calculation with SM6. In addition, the above program allows liquid-phase geometry optimizations and Hessian calculations to be carried out with SM6. Although SMXGAUSS requires only a Gaussian output file to perform SM6 calculations, users that have a Gaussian 03 executable can use SMXGAUSS in conjunction with the powerful geometry optimizers available in Gaussian. For non-Gaussian users, the CM4 and SM6 parametrizations are also available in the GAMESSPLUS141 and HONDOPLUS142,143 software programs. All three of these programs are available free of charge and can be downloaded from our Web site, http://comp.chem.umn.edu/software. Acknowledgment. We thank Adam Moser for his help during preparation of the ionic portion of our data set. This work was supported by the NIH training grant for Neuro- physical-computational Sciences, by the U.S. Army Research Office under Multidisciplinary Research Program of the University Research Initiative (MURI) through grant number DAAD19-02-1-0176, by the Minnesota Partnership for Biotechnology and Medical Genomics, and by the National Science Foundation (CHR-230446). Supporting Information Available: Further details regarding CM4, all 273 experimental aqueous solvation free energies for the neutral solutes, and a breakdown of errors by data set for all of the SM6 methods tested in this paper. This material is available free of charge via the Internet at http://pubs.acs.org. References (1) Cramer, C. J.; Truhlar, D. G. Chem. ReV. 1999, 99, 2161- 2200. (2) Tomasi, J.; Persico, M. Chem. ReV. 1994, 94, 2027-2094. (3) Cramer, C. J.; Truhlar, D. G. J. Am. Chem. Soc. 1991, 113, 8305-8311. (4) Cramer, C. J.; Truhlar, D. G. J. Comput.-Aided Mol. Des. 1995, 6, 629-666. (5) Cramer, C. J.; Truhlar, D. G. Science 1992, 256, 213-217. (6) Cramer, C. J.; Truhlar, D. G. J. Comput. Chem. 1992, 13, 1089-1097. (7) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. B 1998, 102, 3257-3271. (8) Chambers, C. C.; Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. 1996, 100, 16385-16398. (9) Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. 1996, 100, 19824-19839. (10) Li, J.; Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. Chem. Phys. Lett. 1998, 288, 293-298. (11) Li, J.; Zhu, T.; Hawkins, G. D.; Winget, P.; Liotard, D. A.; Cramer, C. J.; Truhlar, D. G. Theor. Chem. Acc. 1999, 103, 9-63. (12) Zhu, T.; Li, J.; Hawkins, G. D.; Cramer, C. J.; Truhlar, D. G. J. Chem. Phys. 1998, 109, 9117-9133. (13) Zhu, T.; Li, J.; Liotard, D. A.; Cramer, C. J.; Truhlar, D. G. J. Chem. Phys. 1999, 110, 5503-5513. (14) Li, J.; Zhu, T.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. A 2000, 104, 2178-2182. (15) Thompson, J. D.; Cramer, C. J.; Truhlar, D. G. J. Phys. Chem. A 2004, 108, 6532-6542. (16) Thompson, D. J.; Cramer, C. J.; Truhlar, D. G. Theor. Chem. Acc. 2005, 113, 107-131. (17) Dolney, D. M.; Hawkins, G. D.; Winget, P.; Liotard, D. A.; Cramer, C. J.; Truhlar, D. G. J. Comput. Chem. 2000, 21, 340-366. (18) Banavali, N. K.; Roux, B. J. Phys. Chem. B 2002, 106, 11026-11035. (19) Curutchet, C.; Bidon-Chanal, A.; Soteras, I.; Orozco, M.; Luque, F. J. J. Phys. Chem. B 2005, 109, 3565-3574. (20) Marten, B.; Kim, K.; Cortis, C.; Friesner, R.; Murphy, R. B.; Ringnalda, M. N.; Sitkoff, D.; Honig, B. J. Phys. Chem. 1996, 100, 11775-11778. (21) Nina, M.; Beglov, D.; Roux, B. J. Phys. Chem. B 1997, 101, 5239-5248. (22) Swanson, J. M. J.; Adcock, S. A.; McCammon, J. A. J. Chem. Theory. Comput. 2005, 1, 484-493. (23) Barone, V.; Cossi, M.; Tomasi, J. J. Chem. Phys. 1997, 107, 3210-3221. (24) Cossi, M.; Rega, N.; Scalamani, G.; Barone, V. J. Comput. Chem. 2003, 24, 669-681. (25) Barone, V.; Cossi, M. J. Phys. Chem. A 1998, 102, 1995- 2001. Continuum Solvation Model J. Chem. Theory Comput., Vol. 1, No. 6, 2005 1149