**UFRJ**

# Quantum espresso

(Parte **5** de 7)

QUANTUM ESPRESSO 2 available to many groups [81]. In particular, the QUANTUM ESPRESSO developers’ team is now working to better exploit new hardware trends, particularly in the field of multicore architectures. The current version implements a partial but fully functional OpenMP parallelization [171], that is especially suitable for modern multicore CPU’s. Mixing OpenMP with MPI also allows to extend scalability towards a higher number of processors, by adding a parallelization level on top of what can already be achieved using MPI. Preliminary tests on realistic physical systems demonstrate scalability up to 65536 cores, so far.

Looking ahead, future developments will likely focus on hybrid systems with hardware accelerators (GPUs and cell co-processors).

6. Perspectives and Outlook

Further developments and extensions of QUANTUM ESPRESSO will be driven by the needs of the community using it and working on it. Many of the soon-to-come additions will deal with excited-state calculations within time-dependent DFT (TDDFT [172, 173]) and/or manybody perturbation theory [174]. A new approach to the calculation of optical spectra within TDDFT has been recently developed [175], based on a finite-frequency generalization of density-functional perturbation theory [54, 5], and implemented in QUANTUM ESPRESSO. Another important development presently under way is an efficient implementation of GW calculations for large systems (whose size is of the order of a few hundreds inequivalent atoms) [176]. The implementation of efficient algorithms for calculating correlation energies at the RPA level is also presently under way [177, 178, 179]. It is foreseen that by the time this paper will appear, many of these developments will be publicly released.

It is hoped that many new functionalities will be made available to QUANTUM

ESPRESSO users by external groups who will make their own software compatible/interfaceable with QUANTUM ESPRESSO. At the time of the writing of the present paper, third-party scientific software compatible with QUANTUM ESPRESSO and available to its users’ community include: yambo, a general-purpose code for excited-state calculations within many-body perturbation theory [180]; casino, a code for electronic-structure quantum Monte Carlo simulations [161]; want, a code for the simulation of ballistic transport in nanostructures, based on Wannier functions [181]; xcrysden, a molecular graphics application, especially suited for periodic structures [159]. The qe-forge portal is expected to burst the production and availability of third-party software compatible with QUANTUM ESPRESSO. Among the projects already available, or soon-to-be available, on qe-forge, we mention: SaX [163], an open-source project implementing state-of-the-art many-body perturbation theory methods for excited states; dmft [162], a code to perform Dynamical Mean-Field Theory calculations on top of a tight-binding representation of the DFT band structure; qha, a set of codes for calculating thermal properties of materials within the quasiharmonic approximation [182]; pwtk, a fully functional Tcl scripting interface to PWscf [183].

Efforts towards better interoperability with third-party software will be geared towards releasing accurate specifications for data structures and data file formats and providing

QUANTUM ESPRESSO 23 interfaces to and from other codes and packages used by the scientific community. Further work will be also devoted to the extension to the US-PPs and PAW schemes of the parts of QUANTUM ESPRESSO that are now limited to NC-PPs.

The increasing availability of massively parallel machines will likely lead to an increased interest towards large-scale calculations. The ongoing effort in this field will continue. A special attention will be paid to the requirements imposed by the architecture of the new machines, in particular multicore CPUs, for which a mixed OpenMP-MPI approach seems to be the only viable solution yielding maximum performances. Grid computing and the commoditization of computer cluster will also lead to great improvements in high-throughput calculations for materials design and discovery.

The new trend towards distributed computing is exemplified by the recent development of the Vlab cyber-infrastructure (CI) [184, 185], a service-oriented architecture (SOA) that uses QUANTUM ESPRESSO as the back-end computational package plus a web portal[186]. This SOA consists of scientific workflows for calculations of high-pressure (P) and temperature (T) properties of materials[187], programmed as a collection of web services running in distributed environments, plus analysis tools to monitor workflow execution and visualization tools. High PT properties of a inexhaustible series of minerals is essential for the interpretation of seismic data and as input for geodynamic simulations. The VLab-CI was developed to: 1) handle the job deluge created by the large number of points (102-104) in the parameter (pressures, strains, phonon q-points, composition) space sampled by these calculations, each point consisting of a first-principle task (PWscf or PHonon execution); 2) handle the information flow between multi-leveled groups of tasks, with outputs from one level used to generate inputs for the next level; 3) harness the scalable aggregated throughput power of scattered computational resources.

Acknowledgments

The QUANTUM ESPRESSO project is an initiative of the CNR-INFM DEMOCRITOS National Simulation Center in Trieste (Italy) and its partners, in collaboration with MIT, Princeton University, the University of Minnesota, the Ecole Polytechnique Fédérale de Lausanne, the Université Pierre et Marie Curie in Paris, the Jožef Stefan Institute in Ljubljana, and the S3 research center in Modena. Many of the ideas embodied in the QUANTUM ESPRESSO codes have flourished in the very stimulating environment of the International School of Advanced Studies, SISSA, where Democritos is hosted, and have benefited from the ingenuity of generations of graduate students and young postdocs. SISSA and the CINECA National Supercomputing Center are currently providing valuable support to the QUANTUM ESPRESSO project.

Appendix

This appendix contains the description of some algorithms used in QUANTUM ESPRESSO that have not been documented elsewhere.

QUANTUM ESPRESSO 24

Appendix A.1. Self-consistency

The problem of finding a self-consistent solution to the KS equations can be recast into the solution of a nonlinear problem where vector x contains the N Fourier components or real-space values of the charge density ρ or the KS potential V (the sum of Hartree and exchange-correlation potentials); F[x(in)] is a functional of the input charge density or potential x(in), yielding the output vector x(out) via the solution of KS equations. A solution can be found via an iterative procedure. PWscf uses an algorithm based on the modified Broyden method [8] in which x contains the components of the charge density in reciprocal space. Mixing algorithms typically find the optimal linear combination of a few x(in) from previous iterations, that minimizes some suitably defined norm ||x(out)−x(in)||, vanishing at convergence, that we will call in the following “scf norm”.

Ideally, the scf norm is a measure of the self-consistency error on the total energy. Let us write an estimate of the latter for the simplest case: an insulator with NC PPs and simple

LDA or GGA. At a given iteration we have(

where i and ψi are KS energies and orbitals respectively, i labels the occupied states, Vext is the sum of the PPs of atomic cores (written for simplicity as a local potential), the input

Hartree and exchange-correlation potential V (in)(r) = VHxc[ρ(in)(r)] is a functional of the input charge density ρ(in). The output charge density is given by

Let us compare the DFT energy calculated in the standard way:

where EHxc is the Hartree and exchange-correlation energy, with the Harris-Weinert-Foulkes functional form, which doesn’t use ρ(out):

Both forms are variational, i.e. the first-order variation of the energy with respect to the charge density vanish, and both converge to the same result when self-consistency is achieved. Their difference can be approximated by the following expression, in which only the dominant Hartree term is considered:

QUANTUM ESPRESSO 25 where ∆ρ = ρ(out) − ρ(in) and ∆VH is the Hartree potential generated by ∆ρ. Moreover it can be shown that, when exchange and correlation contributions to the electronic screening do not dominate over the electrostatic ones, this quantity is an upper bound to the self-consistent error incurred when using the standard form for the DFT energy. We therefore take this term, which can be trivially calculated in reciprocal space, as our squared scf norm:

where G are the vectors in reciprocal space and Ω is the volume of the unit cell.

Once the optimal linear combination of ρ(in) from previous iterations (typically 4 to 8) is determined, one adds a step in the new search direction that is, in the simplest case, a fraction of the optimal ∆ρ or, taking advantage of some approximate electronic screening[188], a preconditioned ∆ρ. In particular, the simple, Thomas-Fermi, and local Thomas-Fermi mixing described in Ref. [188] are implemented and used.

The above algorithm has been extended to more sophisticated calculations, in which the x vector introduced above may contain additional quantities: for DFT+U, occupancies of atomic correlated states; for meta-GGA, kinetic energy density; for PAW, the quantities∑i〈ψi|βn〉〈βm|ψi〉, where the β functions are the atomic-based projectors appearing in the PAW formalism. The scf norm is modified accordingly in such a way to include the additional variables in the estimated self-consistency error.

Appendix A.2. Iterative diagonalization

During self-consistency one has to solve the generalized eigenvalue problem for all N occupied states in which both H (the Hamiltonian) and S (the overlap matrix) are available as operators (i.e. Hψ and Sψ products can be calculated for a generic state ψ ). Eigenvectors are normalized according to the generalized orthonormality constraints 〈ψi|S|ψj〉 = δij. This problem is solved using iterative methods. Currently PWscf implements a block Davidson algorithm and an alternative algorithm based on band-by-band minimization using conjugate gradient.

Appendix A.2.1. Davidson One starts from an initial set of orthonormalized trial orbitals the previous scf iteration, if available, and if not, from the previous time step, or optimization step, or from a superposition of atomic orbitals. We introduce the residual vectors a measure of the error on the trial solution, and the correction vectors δψ(0) i = Dg(0)i , where

D is a suitable approximation to (H − (0) i S)−1. The eigenvalue problem is then solved in the

QUANTUM ESPRESSO 26

2N-dimensional subspace spanned by the reduced basis set φ(0), formed by φ(0) i = ψ(0)i and

where

Conventional algorithms for matrix diagonalization are used in this step. A new set of trial eigenvectors and eigenvalues is obtained:

and the procedure is iterated until a satisfactory convergence is achieved. Alternatively, one may enlarge the reduced basis set with the new correction vectors δψ(1) i = Dg(1)i , solve a

3N-dimensional problem, and so on, until a prefixed size of the reduced basis set is reached.

The latter approach is typically slightly faster at the expenses of a larger memory usage.

The operator D must be easy to estimate. A natural choice in the PW basis set is a diagonal matrix, obtained keeping only the diagonal term of the Hamiltonian:

where k is the Bloch vector of the electronic states under consideration, |k + G′〉 denotes PWs, an estimate of the highest occupied eigenvalue. Since the Hamiltonian is a diagonally dominant operator and the kinetic energy of PWs is the dominant part at high G, this simple form is very effective.

Appendix A.2.2. Conjugate-Gradient The eigenvalue problem of Eq.(A.8) can be recast into

where the λj are Lagrange multipliers. This can be solved using a preconditioned conjugate gradient algorithm with minor modifications to ensure constraint enforcement. The algorithm here described was inspired by the conjugate-gradient algorithm of Ref. [189], and is similar to one of the variants described in Ref. [190].

Let us assume that eigenvectors ψj up to j = i − 1 have already been calculated. We start from an initial guess ψ(0) for the i-th eigenvector, such that 〈ψ(0)|S|ψ(0)〉 = 1 and

QUANTUM ESPRESSO 27

By imposing that the gradient is orthonormal to the starting vector: 〈g(0)| S|y(0)〉 = 0, one determines the value of the Lagrange multiplier:

The remaining orthonormality constraints are imposed on Pg(0) by explicit orthonormaliza- tion (e.g. Gram-Schmid) to the ψj. We introduce the conjugate gradient h(0), which for the first step is set equal to g(0) (after orthonormalization), and the normalized direction

This form ensures that the constraint on the norm is correctly enforced. The calculation of the minimum can be analytically performed and yields

2 atan

h(n) is subsequently re-orthogonalized to y(n). We remarks that in the practical implementation only Pg and Ph need to be calculated and that only P2 – the analogous of the D matrix of Davidson algorithm – is actually used. A kinetic-only form of P2 has proved satisfactory:

Appendix A.3. Wavefunction extrapolation

In molecular dynamics runs and in structural relaxations, extrapolations are employed to generate good initial guesses for the wavefunctions at time t + dt from wavefunctions at previous time steps. The extrapolation algorithms used are similar to those described in Ref. [189]. The alignment procedure, needed when wavefunctions are the results of a self-consistent calculation, is as follows. The overlap matrix Oij between wavefunctions at consecutive time steps:

can be used to generate the unitary transformation U [191] that aligns ψ(t + dt) to ψ(t):

i(t+dt) = ∑ j Uijψj(t+dt). Since O is not unitary, it needs to be made unitary via e.g. the unitarization procedure

QUANTUM ESPRESSO 28

The operation above is performed using a singular value decomposition: let the overlap matrix be O = vDw, where v and w are unitary matrix and D is a diagonal non-negative definite matrix, whose eigenvalues are close to 1 if the two sets of wavefunctions are very similar. The needed unitary transformation is then simply given by U ' w†v†. This procedure is simpler than the original proposal and prevents the alignment algorithm to break in the occasional situation where, due to level crossing in the band structure between subsequent time steps, one or more of the eigenvalues of the D matrix vanish.

Appendix A.4. Symmetry

Symmetry is exploited almost everywhere, with the notable exception of CP. The latter is devised to study aperiodic systems or large supercells where symmetry is either absent or of little use even if present.

In addition to lattice translations, the space group of a crystal contains symmetry operations S combining rotations and translations that leave the crystal unchanged: S ≡ {R|f}, where R is a 3 × 3 orthogonal matrix, f is a vector (called fractional translation) and symmetry requires that any atomic position, τs is transformed into an equivalent one,

(Parte **5** de 7)