Interior Point Algoritms for Linear Programming Decoding

Interior Point Algoritms for Linear Programming Decoding

(Parte 1 de 2)

arXiv:0802.1369v1 [cs.IT] 1 Feb 2008

Interior-Point Algorithms for Linear-Programming Decoding0

Pascal O. Vontobel

Hewlett–Packard Laboratories Palo Alto, CA 94304, USA Email:

Abstract— Interior-point algorithms constitute a very interesting class of algorithms for solving linear-programming problems. In this paper we study efficient implementations of such algorithms for solving the linear program that appears in the linearprogramming decoder formulation.

Consider a binary linear code C of length n that is used for data transmission over a binary-input discrete memoryless channel. As was observed by Feldman et al. [1], [2], the ML decoder for this setup can be written as xML = argmin x∈C where γ is a length-n vector that contains the log-likelihood ratios and where 〈γ,x〉 is the inner product (in R) of the vector γ with the vector x. Because the cost function in this minimization problem is linear, this is essentially equivalent to the solution of x′ ML , arg min where conv(C) denotes the convex hull of C when C is embedded in Rn. (We say “essentially equivalent” because in the case where there is a unique optimal codeword then the two minimization problems yield the same solution. However, when there are multiple optimal codewords then xML and x′ ML are non-singlet sets and it holds that conv(xML) = x′ ML.)

Because the above two optimization problems are usually

practically intractable, Feldman et al. [1], [2] proposed to solve a relaxation of the above problem. Namely, for a code C that can be written as the intersection of m binary linear codes of length n, i.e., C , ∩m j=1Cj, they introduced the so-called linear programming (LP) decoder xLP , argmin x∈P with the relaxed polytope

0This is essentially the paper that appeared in the Proceedings of the 2008 Information Theory and Applications Workshop, UC San Diego, CA, USA, January 27 – February 1, 2008. The only (major) change concerns the vector γ: it is now defined such that xML, x ′

ML , and xLP are solutions to minimization (and not maximization) problems.

for which it can easily be shown that all codewords in C are vertices of P.

The same polytope P appeared also in papers by Koetter and Vontobel [3], [4], [5], where message-passing iterative (MPI) decoders were analyzed and where this polytope P was called the fundamental polytope. The appearance of the same object in these two different contexts suggests that there is a tight connection between LP decoding and MPI decoding.

The above codes Cj can be any codes of length n, however, in the following we will focus on the case where these codes are codes of dimension n−1. For example, let H be an m×n parity-check matrix for the code C and let hTj be the j-th row of H.1 Then, defining

Of course, the reason why the decoder in (1) is called LP decoder is because the optimization problem in that equation is a linear program (LP).2 There are two standard forms for LPs, namely and maximize 〈b,λ〉

Any LP can be reformulated (by introducing suitable auxiliary variables, by reformulating equalities as two inequalities, etc.) so that it looks like the first standard form. Any LP can also be reformulated so that it looks like the second standard form. Moreover, the first and second standard form are tightly related in the sense that they are dual convex programs. Usually, the LP in (3) is called the primal LP and the LP in (4) is called the dual LP. (As it is to be expected from the expression “duality,” the primal LP is the dual of the dual LP.)

Not unexpectedly, there are many ways to express the LP that appears in (1) in either the first or the second standard

1Note that in this paper all vectors are column vectors. 2We use LP to denote both “linear programming” and “linear program.” form, and each of these reformulations has its advantages (and disadvantages). Once it is expressed in one of the standard forms, any general-purpose LP solver can basically be used to obtain the LP decoder output. However, the LP at hand has a lot of structure and one should take advantage of it in order to obtain very fast algorithms that can compete complexity- and time-wise with MPI decoders.

Several ideas have been presented in the past in this direction, e.g., by Feldman et al. [6] who briefly mentioned the use of sub-gradient methods for solving the LP of an early version of the LP decoder (namely for turbo-like codes), by Yang et al. [7], [8] on efficiently solvable variations of the LP decoder, by Taghavi and Siegel [9] on cutting-hyperplanetype approaches, by Vontobel and Koetter [10] on coordinateascent-type approaches, by Dimakis and Wainwright [1] and by Draper et al. [12] on improvements upon the LP decoder solution, and by Taghavi and Siegel [13] and by Wadayama [14] on using variations of LP decoding (together with efficient implementations) for intersymbol-interference channels.

In this paper our focus will be on so-called interior-point algorithms, a type of LP solvers that has become popular with the seminal work of Karmarkar [15]. (After the publication of [15] in 1984, earlier work on interior-point-type algorithms by Dikin [16] and others became more widely known). We present some initial thoughts on how to use this class of algorithms in the context of LP decoding. So far, with the notable exception of [14], interior-point-type algorithms that are especially targeted to the LP in (1) do not seem to have been considered. One of our goals by pursuing these type of methods is that we can potentially obtain algorithms that are better analyzable than MPI decoders, especially when it comes to finite-length codes. (Wadayama [14] discusses some efficient interior-point-type methods, however, he is trying to minimize a quadratic cost function, and the final solution is obtained through the use of the sum-product algorithm that is initialized by the result of the interior-point search. Although [14] presents some very interesting approaches that are worthwhile pursuing, it is not quite clear if these algorithms are better analyzable than MPI decoders.)

There are some interesting facts about interior-point-type algorithms that make them worthwhile study objects. First of all, there are variants for which one can prove polynomial-time convergence (even in the worst case, which is in contrast to the simplex algorithm). Secondly, we can round an intermediate result to the next vector with only 0 / 12 / 1 entries and check if it is a codeword.3 (This is very similar to the stopping criterion that is used for MPI algorithms.) Note that a similar approach will probably not work well for simplextype algorithms that typically wander from vertex to vertex of the fundamental polytope. The reason is that rounding the coordinates of a vertex yields only a codeword if the vertex

3To be precise, by rounding we mean that coordinates below 12 are mapped to 0, that coordinates above 12 are mapped to 1, and that coordinates equal to 12 are mapped to 1 2 .

was a codeword.4 Thirdly, interior-point-type algorithms are also interesting because they are less sensitive than simplextype algorithms to degenerate vertices of the feasible region; this is important because the fundamental polytope has many degenerate vertices.

The present paper is structured as follows. In Secs. I and II we discuss two classes of interior-point algorithms, namely affine-scaling algorithms and primal-dual interior-point algorithms, respectively. As we will see, the bottleneck step of the algorithms in these two sections is to repeatedly find the solution to a certain type of system of linear equations. Therefore, we will address this issue, and efficient solutions to it, in Sec. IV. Finally, we briefly mention some approaches for potential algorithm simplifications in Sec. V and we conclude the paper in Sec. VI.

An interesting class of interior-point-type algorithms are so-called affine scaling algorithms which were introduced by Dikin [16] and re-invented many times afterwards. Good introductions to this class of algorithms can be found in [17], [18].

Fig. 1 gives an intuitive picture of the workings of one instance of an affine-scaling algorithm. Consider the LP in (3) and assume that the set of all feasible points, i.e., the set of all x such that Ax = b and x ≥ 0, is a triangle. For the vector c shown in Fig. 1, the optimal solution will be the vertex in the lower left part. The algorithm works as follows:

1) Select an initial point that is in the interior of the set of all feasible points, cf. Fig. 1(b), and let the current point be equal to this initial point. 2) Minimizing 〈c, x〉 over the triangle is difficult (in fact, it is the problem we are trying to solve); therefore, we replace the triangle constraint by an ellipsoidal constraint that is centered around the current point. Such an ellipsoid is shown in Fig. 1(c). Its skewness depends on the closeness to the different boundaries. 3) We then minimize the function 〈c,x〉 over this ellipsoid.

The difference vector between this minimizing point and the center of the ellipsoid (see the little vector in Fig. 1(d)) points in the direction in which the next step will be taken. 4) Depending on what strategy is pursued, a shorter or a longer step in the above-found direction is taken. This results in a new current point. (Whatever step size is taken, we always impose the constraint that the step size is such that the new current point lies in the interior of the set of feasible points.) 5) If the current point is “close enough” to some vertex then stop, otherwise go to Step 2. (“Closeness” is determined according to some criterion.)

4Proof: in an LDPC code where all checks have degree at least two, the largest coordinate of any nonzero-vector vertex is at least 1 2 . Therefore, there is no nonzero-vector vertex that is rounded to the all-zero codeword. The proof is finished by using the symmetry of the fundamental polytope, i.e., the fact that the fundamental polytope “looks” the same from any codeword.

Fig. 1. Some iterations of the affine-scaling algorithm. (See text for details.)

Not surprisingly, when short (long) steps are taken in Step 4, the resulting algorithm is called the short-step (long-step) affine-scaling algorithm. Convergence proofs for different cases can be found in [19], [20], [21].

Of course, an affine-scaling algorithm can also be formulated for the LP in (4). Moreover, instead of the abovedescribed discrete-time version, one can easily come up with a continuous-time version, see e.g. [2]. The latter type of algorithms might actually be interesting for decoders that are implemented in analog VLSI.

The bottleneck step in the affine-scaling algorithm is to find the new direction, which amounts to solving a system of linear equations of the form Pu = v, where P is a given (iterationdependent) positive definite matrix, v is a given vector, and u is the direction vector that needs to be found. We will comment on efficient approaches for solving such systems of equations in Sec. IV.

In contrast to affine-scaling algorithms, which either work only with the primal LP or only with the dual LP, primaldual interior point algorithms – as the name suggests – work simultaneously on obtaining a primal and a dual optimal solution. A very readable and detailed introduction to this topic can be found in [23]. As in the case of the affine-scaling algorithm there are many different variations: short-step, longstep, predictor-corrector, path-following, etc.

Again, the bottleneck step is to find the solution to a system of linear equations Pu = v, where P is a given (iterationdependent) positive definite matrix, v is a given (iterationdependent) vector, and u is the sought quantity. We will comment in Sec. IV on how such systems of linear equations can be solved efficiently.

A variant that is worthwhile to be mentioned is the class of so-called infeasible-interior-point algorithms. The reason is that very often it is easy to find an initial primal feasible point or it is easy to find an initial dual feasible point but not both at the same time. Therefore, one starts the algorithm with a primal/dual point pair where the primal and/or the dual point are infeasible points; the algorithm then tries to decrease the amount of “infeasibility” (a quantity that we will not define here) at every iteration, besides of course optimizing the cost function.

One of the most intriguing aspects of primal-dual interiorpoint algorithms is the polynomial-time worst-case bounds that can be stated. Of course, these bounds say mostly something about the behavior when the algorithm is already close to the solution vertex. It remains to be seen if these results are useful for implementations of the LP decoder where it is desirable that the initial iterations are as aggressive as possible and where the behavior close to a vertex is not that crucial. (We remind the reader of the rounding-procedure that was discussed at the end of Sec. I, a procedure that took advantage of some special properties of the fundamental polytope.)


In Secs. I and II we saw that the crucial part in the discussed algorithms was to repeatedly and efficiently solve a system of linear equations that looks like where P is an iteration-dependent positive definite matrix and where v is an iteration-dependent vector. The fact that P is positive definite helps because u can also be seen to be the solution of the quadratic unconstrained optimization problem minimize 1 subj. to u ∈ R h where we assumed that P is an h×h-matrix. It is important to remark that for the algorithms in Secs. I and II the vector u usually does not have to be found perfectly. It is good enough to find an approximation of u that is close enough to the correct u. (For more details, see e.g. [18, Ch. 9].)

Using a standard gradient-type algorithm to find u might work. However, the matrix P is often ill-conditioned, i.e., the ratio of the largest to the smallest eigenvalue can be quite big (especially towards the final iterations), and so the convergence speed of a gradient-type algorithm might suffer considerably.

Therefore, more sophisticated approaches are desirable.

Such an approach is the conjugate-gradient algorithm which was introduced by Hestenes and Stiefel [24]. (See Shewchuk’s paper [25] for a very readable introduction to this topic and for some historical comments.) This method is especially attractive when P is sparse or when P can be written as a product of sparse matrices, the latter being the case for LP decoding of LDPC codes.

In the context of the affine-scaling algorithm, e.g. Resende and Veiga [26] used the conjugate-gradient algorithm to efficiently solve the relevant equation systems. They also studied the behavior of the conjugate-gradient algorithm with different pre-conditioners.

A quite different, yet interesting variant to solve the minimization problem in (5) is by using graphical models. Namely, one can represent the cost function in (5) by an additive factor graph [27], [28], [29]. Of course, there are a variety of factor graph representations for the this cost function, however, probably the most reasonable choice in the context of LP decoding is to choose the factor graph that looks topologically like the factor graph that is usually used for sum-product or min-sum algorithm decoding of LDPC codes. One can then try to find the solution with the help of the min-sum algorithm. [Equivalently, one can look at the maximization problem maximize exp( −

(Parte 1 de 2)