Density Functional Theory and Isodesmic Reaction Based Prediction of Four Stepwise

Density Functional Theory and Isodesmic Reaction Based Prediction of Four Stepwise

(Parte 1 de 4)

Density Functional Theory and Isodesmic Reaction Based Prediction of Four Stepwise

Protonation Constants, as log KH (n), for Nitrilotriacetic Acid. The Importance of a Kind and

Protonated Form of a Reference Molecule Used

Krishna K. Govender and Ignacy Cukrowski*

Department of Chemistry, Faculty of Natural Sciences, UniVersity of Pretoria, Lynnwood Road, Hillcrest, Pretoria 0002, South Africa

ReceiVed: July 29, 2009; ReVised Manuscript ReceiVed: NoVember 25, 2009

An explicit application of isodesmic reaction (a proton exchange between the studied and structurally similar reference molecule), where the free energy change of the protonation reaction in water was obtained using the free energies in solution from a single continuum model, was used to predict stepwise protonationconstants of nitrilotriaceticacid.Calculationswere performedat the RB3LYP/6-311+G(d,p)levelof theoryin conjunction with the PCM-UA0 solvation model. Five reference molecules were investigated. It has been established that one must pay special attention to structural similarities between the studied and reference molecules and selection of a protonated form of the reference molecule. The protonation reactions in which the studied and reference molecule are involved in must be (if possible) of the same order; e.g., the first (or generally nth) protonation reaction of the reference molecule must be used to compute the first (or nth) protonation constant of the studied molecule. The lowest energy conformer must always be used. The first, second, third, and fourth computed protonation constants differed, on average, from experimental values by 3.3, 0.8, 0.2, and 0.2 log units, respectively. It appears that the charge on the reference molecule has more decisive influence on the accuracy of computed protonation constants than its structural differences when compared with the studied molecule. Results reported can be used as a guide in constructing isodesmic reactions useful for the theoretical prediction of protonation constants by use of methodology described in this work.

1. Introduction

Knowledgeof protonation,KH, and dissociation,Ka, constants is of special interest to many chemists and life scientists1 as they constitute important thermodynamic property of a compound that might be of biological, medicinal, or industrial (just to mentionfew) importance.Althougha numberof experimental techniques has been developed to measure protonation/dissociation constants under various experimental conditions, many of the chemical species are not easily amenable to a full experimental characterization.2 A number of papers has been reported1-54 on theoretical prediction of dissociation constants. Most of them employedthermodynamiccycles(TC) to compute the free energies of dissociation reaction. Often, high-level theories were used in the gas-phase calculations where they are known to be accurate.The solution-phasecalculationswere used to provide the solvation energies (Gsol); usually low-level continuum models were employed for the purpose. When the above protocol is used, the absolute pKa value is obtained. To avoid uncertainties related to the solvation energies of either

H+ or H3O+ ions, isodesmic reaction (IRn) was incorporated within TC;1,3-7 this protocol of calculation results in relative3 pKa values. Results reported to date predominantly describe the calculations of singly charged molecules, either anions1-29 (a study of doubly charged anions is very rare) or cations.30-35 This is most likely due to the fact that (i) it is very difficult for DFT methods to properly describe anions (with multiple negative charges) in the gas phase because in the absence of an external stabilization of the charge (e.g., solvent) DFT methods have a bias toward “over-delocalization” of the charge (one might observe bonds that are longer than expected and significant reduction on the HOMO-LUMO gap)5 and (i) computational evaluations of ionic solvation free energies for highly charged anions are inaccurate (these energies are highly dependent on the solvation model used due to different models chosen to generate the “best” electrostatic cavity).56 Accuracies achieved thus far for computed dissociation constants (for a singly dissociable organic acids) are often within (1.0 log unit, on average, when compared with experimentally available values,but differencesof severallog unitsare not uncommon.3-5,56

Recently we reported the DFT-predicted four stepwise protonation constants, expressed as log KH (n), for a highly charged molecule nitrilotripropanoicacid (NTPA).54 An explicit application of an isodesmic reaction involving two structurally similar ligands, where the free energy change of the protonation reaction in solution was obtained using the free energies in solution from a single continuum model, resulted in the average difference between predicted and experimental stepwise protonation constants being (0.5 log unit. This suggested that, in principle, even though serious concerns were expressed,5,56 accurate determination of stepwise protonation constants for highly negatively charged molecules is possible.

In this paper our focus is on parameters that influence the accuracy in predicting four consecutive protonation constants when the IRn-based procedure, as reported by us recently,54 is employed. It is important to investigate a wide range of polycharged compounds (with negative and positive charges) to establish (i) whether implementation of the protocol can indeedproduceconsistentlygood predictions,(i) how significant the selection of a reference molecule is from the point of view of its structural similarity to the studied compound, (i) to what* Corresponding author. E-mail:

J. Phys. Chem. A X, x, 0 A

10.1021/jp9092964 X American Chemical Society

degree the selection of a conformer impacts the accuracy in computed protonation constants, (iv) if the selection of a different protonated form of the reference molecule has an influence on the accuracy of computed protonation constants, (v) which one, structuralsimilarityor the charge on a protonated form of a reference molecule, plays a more important role when the IRn-based methodology described in this work is employed, (vi) if it is possible to predict protonation constants in correct order, as determined from an experiment, (vii) to what extent the selection of level of theory and a basis set has an influence on the accuracy in computed protonation constants, and (viii) how significant is the kind of solvation model employed. These are just a few of the important questions, and this work will address only some of them.

We have chosenthe ligandnitrilotriaceticacid (NTA) because it is an importantderivativeof glycinethat is widely studieddue to itsexcellentchelatingabilities.57 Thisis a ligandthathasenjoyed numerous applications in medicine,58,59 biochemistry,58,59 and industry.60-62 In medicinalandbiologicalstudiesit wasshownthat aliphatic amine salts of NTA inhibit the growth of bacteria and fungiand have herbicidalactivity.63 NTA has also been used as a transientphytoextractionagentthatcombineshighbiodegradability and low phytotoxicitywith chelatingstrength.64

The application of IRn requires a reference molecule and its selection appears to be crucial. Molecules shown in Figure 1 (iminodiacetic acid (IDA), methyliminodiacetic acid (MIDA), ethyliminodiaceticacid(EIDA),propyliminodiaceticacid(PIDA), and hydroxyethyliminodiacetic acid (HIDA)) were used here as they have many structural similarities with the compound of interest (NTA) and their experimental stepwise protonation constants are well-known.57

2. Computational Details

All calculations were performed using GAUSSIAN 03, revision D.01,65 on a 64-bit Linux workstation in parallel environment (Opensuse 10.3). Molecular visualizations were accomplished with the aid of GaussView 4.6 Since it is of paramount importance to include diffuse functions for anions,19 both gas-phase and solvent (water, ε ) 78.39) optimizations were performed at the RB3LYP level of theory67 in conjunction with a 6-311+G(d,p) basis set. Full solvent-optimization was performed with the default solvation model provided by Gaussian,i.e., Tomasi’spolarizedcontinuummodel (PCM),68-70 and UA0 radii (unitedatom topologicalmodel).We have chosen this model because it generated acceptable results in the

Figure 1. Top view of the ligands (in fully protonated forms) discussed in this work.

B J. Phys. Chem. A, Vol. x, No. x, X Govender and Cukrowski

prediction of log KH (n) values for NTPA.54 Single point calcula- tions (SPCs) were carried out in solvent at the RB3LYP and HF levels of theory in conjunction with the 6-311+G(d,p) basis set on the gas-phase structures as well as structures fully optimizedinsolvent,usingthePCM-UAHFandCPCM-UAHF71,72 (polarizable conductor model in combination with the united atom for Hartree-Fock radii) solvation models. With CPCM, the solutecavitiesare modeledon the optimizedmolecularshape and include both electrostatic and nonelectrostatic contributions to energies.10 The HF level of theory also was used for single point calculations since the UAHF radii were optimized for HF.19

Full geometry optimization of all protonated forms of the reference molecules, seen in Figure 1, was carried out in solvent using the same procedure as for NTA; there was no need to perform single point calculationson these molecules.Frequency calculations were also performed, along with the geometry optimization, to ensure that each of the optimized structures did not lie at a saddle point (no imaginary frequencies were present in all structures reported here).

3. Results and Discussion

Structural Considerations. In the Cambridge Structural Database (CSD)73 there are only two crystallographic structures of NTA74,75 and they both are of the H3L form of the ligand; all crystallographic structures are marked with asterisks. Even though H3L* has no overall charge, it has two charged centers with opposite polarities, the positive one on the protonated

N-atom and negative one on the deprotonated -COO- group; see Figure 2. The remainingavailablereportedcrystalstructures, that of the H2L* form of IDA and MIDA, as well as the H3L* form of HIDA, are shown in Figure S1, SupportingInformation.

Unfortunately, no crystallographic data were found for EIDA and PIDA.From visualanalysisof availablestructuresone might conclude that there are two distinctive types of arrangements of arms containing the -COOH and -COO- groups, namely (i) all arms are bent toward the central and protonated N-atom, resulting in the formation of H-bonds between carboxylic oxygen and H-atom on nitrogen (the H2L* form of MIDA and

H3L* form of HIDA), and (i) all arms are placed as described above except one that is bent outward from the central and protonated N-atom (the H3L* form of NTA and H2L* form of IDA) (because of that this arm is not involved in the formation of the intramolecular H-bond). The analysis of packing in the available crystal structures provided appropriate explanation for that, somewhat unexpected, placement of one of the arms; see Figure 3 as an example. It is clear that one of the NTA arms is involved in very strong intermolecular H-bond characterized by the shortest bond distance (1.603 Å) among all inter- and intramolecular H-bonds observed. In the case of HIDA (see Figure S2, Supporting Information) all the intramolecular H-bonds are preserved even though O-atoms are involved in extremely strong interactions with other molecules; this is most likely due to specific, in this case, crystal packing. We came to the conclusion that in solvent the attractive interactions between the H-atom on the central N-atom and carboxylic O-atoms must prevailand have constructedan additionalH3L form of the NTA molecule with all three arms bent toward the central N-atom.

To maintain structural (conformational) consistency within a full set of necessary protonated forms of the ligand NTA, the deprotonated ligand), and H4L+ (fully protonated form of the ligand) were constructedas follows.We started with the energy- optimized solvent structure of H3L. The H2L- form was generated by removing a dissociable proton from a -COOH group and H4L+ was generated by adding a proton to the remaining -COO- group. A similar procedure was followed to generate HL2- and L3-, where the proton was removed from energy-optimized solvent structure to generate the product of stepwise dissociation reaction. The same procedure was applied to generate all the protonated forms of the reference molecules seen in Figure 1.

The computedstructuralmatrixof the solvent-optimizedH3L* and H3L of NTA, together with the data available from the CSD,73 is given in Table S1, Supporting Information; the relevant data obtained for IDA, MIDA, and HIDA are provided in Tables S2-S4 (Supporting Information), where the numbering of atoms is the same as seen in Figure S1, Supporting Information. All solvent-optimized protonated forms of NTA, including the crystallographicstructureH3L*, are shown in Figure 4 (those of the reference molecules are provided in Figures S3-S7,

Supporting Information). All protonated forms of the ligand NTA have considerablystrong intramolecularH-bonds between oxygen on the -COO- or -COOH groups and a proton on the N-atom and they vary in length between about 2.0 and 2.35 Å. Interestingly, the shortest H-bonds, of 1.985 and 1.960 Å, were found in H3L and H3L*, respectively. As discussed above, the self-constructed and energy-minimized H3L molecule differs significantly from H3L* and it was of importance to find out which one will generate better estimates in the computed protonation constants.

Figure 2. Fully labeled reported crystal structure74 of the H3L* form of NTA.

Figure 3. Crystallographic structure74 of NTA (molecules within a unit cell) with selectedintra-and intermolecularnonbondinginteractions marked by dashed lines and distances in Å.

Protonation Constants for Nitrilotriacetic Acid J. Phys. Chem. A, Vol. x, No. x, X C

Preliminary Investigations. Initially, we have used TCs to compute absolute protonation constants. A detailed description of TCs used and results obtained are provided in Supporting Information (see Tables S5-S7). Unfortunately, the methodology based on TC principles could not be fully applied in our studies due to significant structural differences of molecules in the gas phase and solvent.Migrationof a proton from the central N-atom to the -COO- group took place when optimizationwas performed in the gas phase; a similar observation was also reported elsewhere.13,54 We have also tested whether a higher level of theory could result in preserving the solvent structure. The H3L form of NTA was subjected to the full gas-phase optimization at the RMP2/

6-311+G(d,p) level of theory. Unfortunately, the proton migrated again from the N-atom to the carboxylic group. Clearly, even with a higher level of theory that generates accurate gasphase free energies, we were unable to apply commonly used TCs in the study of molecules considered in this work.

Isodesmic Reaction. To date, isodesmic reaction principles (where two structurally similar compounds were used, investigated and reference molecule) have been extensively utilized in the prediction of enthalpies of formation.76-84 In some cases a reference molecule has been incorporated within TCs1,3-7 to eliminate uncertainties related to either H+ or H3O+ ions. Dissociation constants of a number of compounds with a single dissociable proton have also been computed directly in solvent

The implementation of IRn has an advantage, when the total free energies in solution from a single continuum model calculation are used, because it should minimize (or systematically eliminate) errors related to the solvation models used, provided that the same level of theory, basis set, and solvation model are used for each component involved in the reaction of interest.56 The main challenge associated with the use of IRn, however, is the selection of appropriate reference molecule.56,80 The ligand NTA can be seen as a set of five molecules that differ in (i) a number of protons (from 0 to 4) and (i) charges on the molecule (from +1t o -3). Clearly, a careful selection must be made to find the most appropriate protonated form of the reference molecule that has to be included in each of IRn needed to compute four protonation constants of NTA. By consideringthe structuralpropertiesof NTA (called further L(1)), we opted for IDA, MIDA, EIDA, PIDA, and HIDA as reference compounds (called further L(2)) because each of them has two acetate groups (there are three in NTA) and the same kind of electrondonor atoms (-COO- and R3N:) that can be protonated in a solution. In addition, protonation constants for all of the chosen reference molecules are well-known as these ligands are widely studied.57

Figure 4. Self-constructed protonated forms of NTA and a crystal structure H3L* fully optimized at the RB3LYP/6-311+G(d,p) level of theory in solvent (PCM/UA0).

D J. Phys. Chem. A, Vol. x, No. x, X Govender and Cukrowski

IRn employed here can be seen as a competition reaction between two ligands for a proton (proton transfer reaction), and for the first protonation constant of NTA it can be written as

To investigate the impact the kind and protonated form of the referencemoleculehas on the theoreticallygeneratedprotonation constants of NTA, a large number of isodesmic reactions was tested. Here, each reference molecule has three protonation constants (NTA has four), hence for each pair of ligands (L(1) and L(2)) 12 isodesmic reactions (such as eq 1, but involving different protonated forms of the ligands) had to be considered.

For simplicity,only the first protonationreaction(PRn) in which each of the two ligands (NTA and a reference molecule) is involved is shown as eqs 2 and 3

A protonation reaction is the reverse of a weak acid dissociation reaction (DRn) and in the case of stepwise reactions the following holds

where k ) 1 + m - n, m and n representthe highestdissociation constant (here m ) 4) and an nth consecutive dissociation constant (1 e n e m), respectively, and k applies to a kth consecutive protonation constant, 1 e k e m. Note that the ligand NTA has three acidic groups and only three dissociation constants would be reported when, for example, the TC-based methodology was employed. However, due to the protonation/ deprotonation of the N-atom in NTA, it is of paramount importance to consider also the first protonation constant, log

KH(1). From this it follows that the fourth dissociation constant of NTA is linked through eq 4 with the first protonationconstant of this ligand. In the case of the reference molecule, the third dissociation constant is linked through eq 4 with the first protonation constant of this reference molecule. (The above is provided here for convenience and to ensure clarity in nomenclature used in this work.) The relationshipsbetween the change in Gibbs energies for protonation and dissociation reactions applicable to eqs 2 and 3 can be written as

where ∆GPRn(aq)L(1)(k) and ∆GDRn(aq)L(1)(n) refer to the kth (here k ) 1) stepwise protonation reaction and relevant nth (here n )

4 for NTA) stepwisedissociationreaction,respectively(the same applies to the reference molecule L(2), but n ) 3). The change in Gibbs energies for each PRn, eqs 2 and 3, can be written, respectively, as

The isodesmic reaction of interest (eq 1) can be obtained by subtracting eq 3 from eq 2, and hence by subtracting eq 8 from eq 7 one obtains expressions for the change in Gibbs energy applicable to this isodesmic reaction where the uncertainty related to Gaq(H+) is no longer applicable as this term cancels out (this eliminates any error that might have been introduced by the use of an experimental value for this quantity).

Equation 10 was used to calculate ∆GIRn(aq) of IRn (eq 1) from appropriate Gibbs energies obtained for relevant and fully solvent-optimized structures of the ligand NTA and reference molecule L(2). Table S8 in Supporting Information provides the

(Parte 1 de 4)