Matrix Analysis and Applied Linear Algebra

Carl D. Meyer

Contents

Preface

1. Linear Equations
   1.1 Introduction
   1.2 Gaussian Elimination and Matrices
   1.3 Gauss–Jordan Method
   1.4 Two-Point Boundary Value Problems
   1.5 Making Gaussian Elimination Work
   1.6 Ill-Conditioned Systems

2. Rectangular Systems and Echelon Forms
   2.1 Row Echelon Form and Rank
   2.2 Reduced Row Echelon Form
   2.3 Consistency of Linear Systems
   2.4 Homogeneous Systems
   2.5 Nonhomogeneous Systems
   2.6 Electrical Circuits

3. Matrix Algebra
   3.1 From Ancient China to Arthur Cayley
   3.2 Addition and Transposition
   3.3 Linearity
   3.4 Why Do It This Way
   3.5 Matrix Multiplication
   3.6 Properties of Matrix Multiplication
   3.7 Matrix Inversion
   3.8 Inverses of Sums and Sensitivity
   3.9 Elementary Matrices and Equivalence
   3.10 The LU Factorization

4. Vector Spaces
   4.1 Spaces and Subspaces
   4.2 Four Fundamental Subspaces
   4.3 Linear Independence
   4.4 Basis and Dimension

Preface

Scaffolding

Reacting to criticism concerning the lack of motivation in his writings, Gauss remarked that architects of great cathedrals do not obscure the beauty of their work by leaving the scaffolding in place after the construction has been completed. His philosophy epitomized the formal presentation and teaching of mathematics throughout the nineteenth and twentieth centuries, and it is still commonly found in mid-to-upper-level mathematics textbooks. The inherent efficiency and natural beauty of mathematics are compromised by straying too far from Gauss's viewpoint. But, as with most things in life, appreciation is generally preceded by some understanding seasoned with a bit of maturity, and in mathematics this comes from seeing some of the scaffolding.

Purpose, Gap, and Challenge

The purpose of this text is to present the contemporary theory and applications of linear algebra to university students studying mathematics, engineering, or applied science at the postcalculus level. Because linear algebra is usually encountered between basic problem solving courses such as calculus or differential equations and more advanced courses that require students to cope with mathematical rigors, the challenge in teaching applied linear algebra is to expose some of the scaffolding while conditioning students to appreciate the utility and beauty of the subject. Effectively meeting this challenge and bridging the inherent gaps between basic and more advanced mathematics are primary goals of this book.
Rigor and Formalism

To reveal portions of the scaffolding, narratives, examples, and summaries are used in place of the formal definition–theorem–proof development. But while well-chosen examples can be more effective in promoting understanding than rigorous proofs, and while precious classroom minutes cannot be squandered on theoretical details, I believe that all scientifically oriented students should be exposed to some degree of mathematical thought, logic, and rigor. And if logic and rigor are to reside anywhere, they have to be in the textbook. So even when logic and rigor are not the primary thrust, they are always available. Formal definition–theorem–proof designations are not used, but definitions, theorems, and proofs nevertheless exist, and they become evident as a student's maturity increases. A significant effort is made to present a linear development that avoids forward references, circular arguments, and dependence on prior knowledge of the subject. This results in some inefficiencies—e.g., the matrix 2-norm is presented before eigenvalues or singular values are thoroughly discussed. To compensate, I try to provide enough "wiggle room" so that an instructor can temper the inefficiencies by tailoring the approach to the students' prior background.

Comprehensiveness and Flexibility

A rather comprehensive treatment of linear algebra and its applications is presented and, consequently, the book is not meant to be devoured cover-to-cover in a typical one-semester course. However, the presentation is structured to provide flexibility in topic selection so that the text can be easily adapted to meet the demands of different course outlines without suffering breaks in continuity. Each section contains basic material paired with straightforward explanations, examples, and exercises. But every section also contains a degree of depth coupled with thought-provoking examples and exercises that can take interested students to a higher level. The exercises are formulated not only to make a student think about material from a current section, but they are designed also to pave the way for ideas in future sections in a smooth and often transparent manner. The text accommodates a variety of presentation levels by allowing instructors to select sections, discussions, examples, and exercises of appropriate sophistication. For example, traditional one-semester undergraduate courses can be taught from the basic material in Chapter 1 (Linear Equations); Chapter 2 (Rectangular Systems and Echelon Forms); Chapter 3 (Matrix Algebra); Chapter 4 (Vector Spaces); Chapter 5 (Norms, Inner Products, and Orthogonality); Chapter 6 (Determinants); and Chapter 7 (Eigenvalues and Eigenvectors). The level of the course and the degree of rigor are controlled by the selection and depth of coverage in the latter sections of Chapters 4, 5, and 7. An upper-level course might consist of a quick review of Chapters 1, 2, and 3 followed by a more in-depth treatment of Chapters 4, 5, and 7. For courses containing advanced undergraduate or graduate students, the focus can be on material in the latter sections of Chapters 4, 5, 7, and Chapter 8 (Perron–Frobenius Theory of Nonnegative Matrices). A rich two-semester course can be taught by using the text in its entirety.

What Does "Applied" Mean?
Most people agree that linear algebra is at the heart of applied science, but there are divergent views concerning what "applied linear algebra" really means; the academician's perspective is not always the same as that of the practitioner. In a poll conducted by SIAM in preparation for one of the triennial SIAM conferences on applied linear algebra, a diverse group of internationally recognized scientific corporations and government laboratories was asked how linear algebra finds application in their missions. The overwhelming response was that the primary use of linear algebra in applied industrial and laboratory work involves the development, analysis, and implementation of numerical algorithms along with some discrete and statistical modeling. The applications in this book tend to reflect this realization. While most of the popular "academic" applications are included, and "applications" to other areas of mathematics are honestly treated, there is an emphasis on numerical issues designed to prepare students to use linear algebra in scientific environments outside the classroom.

Computing Projects

Computing projects help solidify concepts, and I include many exercises that can be incorporated into a laboratory setting. But my goal is to write a mathematics text that can last, so I don't muddy the development by marrying the material to a particular computer package or language. I am old enough to remember what happened to the FORTRAN- and APL-based calculus and linear algebra texts that came to market in the 1970s. I provide instructors with a flexible environment that allows for an ancillary computing laboratory in which any number of popular packages and lab manuals can be used in conjunction with the material in the text.

History

Finally, I believe that revealing only the scaffolding without teaching something about the scientific architects who erected it deprives students of an important part of their mathematical heritage. It also tends to dehumanize mathematics, which is the epitome of human endeavor. Consequently, I make an effort to say things (sometimes very human things that are not always complimentary) about the lives of the people who contributed to the development and applications of linear algebra. But, as I came to realize, this is a perilous task because writing history is frequently an interpretation of facts rather than a statement of facts. I considered documenting the sources of the historical remarks to help mitigate the inevitable challenges, but it soon became apparent that the sheer volume required to do so would skew the direction and flavor of the text. I can only assure the reader that I made an effort to be as honest as possible, and I tried to corroborate "facts." Nevertheless, there were times when interpretations had to be made, and these were no doubt influenced by my own views and experiences.

Supplements

Included with this text is a solutions manual and a CD-ROM. The solutions manual contains the solutions for each exercise given in the book. The solutions are constructed to be an integral part of the learning process. Rather than just providing answers, the solutions often contain details and discussions that are intended to stimulate thought and motivate material in the following sections. The CD, produced by Vickie Kearn and the people at SIAM, contains the entire book along with the solutions manual in PDF format. This electronic version of the text is completely searchable and linked.
With a click of the mouse a student can jump to a referenced page, equation, theorem, definition, or proof, and then jump back to the sentence containing the reference, thereby making learning quite efficient. In addition, the CD contains material that extends the historical remarks in the book and brings them to life with a large selection of …

1. Linear Equations

1.1 INTRODUCTION

[…] a square array on a "counting board" and then manipulated the lines of the array according to prescribed rules of thumb. Their counting board techniques and rules of thumb found their way to Japan and eventually appeared in Europe with the colored rods having been replaced by numerals and the counting board replaced by pen and paper. In Europe, the technique became known as Gaussian elimination in honor of the German mathematician Carl Gauss,[1] whose extensive use of it popularized the method. Because this elimination technique is fundamental, we begin the study of our subject by learning how to apply this method in order to compute solutions for linear equations. After the computational aspects have been mastered, we will turn to the more theoretical facets surrounding linear systems.

[1] Carl Friedrich Gauss (1777–1855) is considered by many to have been the greatest mathematician who has ever lived, and his astounding career requires several volumes to document. He was referred to by his peers as the "prince of mathematicians." Upon Gauss's death one of them wrote that "His mind penetrated into the deepest secrets of numbers, space, and nature; He measured the course of the stars, the form and forces of the Earth; He carried within himself the evolution of mathematical sciences of a coming century." History has proven this remark to be true.

1.2 GAUSSIAN ELIMINATION AND MATRICES

The problem is to calculate, if possible, a common solution for a system of m linear algebraic equations in n unknowns

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
&\ \ \vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}$$

where the $x_i$'s are the unknowns and the $a_{ij}$'s and the $b_i$'s are known constants. The $a_{ij}$'s are called the coefficients of the system, and the set of $b_i$'s is referred to as the right-hand side of the system. For any such system, there are exactly three possibilities for the set of solutions.

Three Possibilities

- UNIQUE SOLUTION: There is one and only one set of values for the $x_i$'s that satisfies all equations simultaneously.
- NO SOLUTION: There is no set of values for the $x_i$'s that satisfies all equations simultaneously—the solution set is empty.
- INFINITELY MANY SOLUTIONS: There are infinitely many different sets of values for the $x_i$'s that satisfy all equations simultaneously.

It is not difficult to prove that if a system has more than one solution, then it has infinitely many solutions. For example, it is impossible for a system to have exactly two different solutions. Part of the job in dealing with a linear system is to decide which one of these three possibilities is true. The other part of the task is to compute the solution if it is unique or to describe the set of all solutions if there are many solutions. Gaussian elimination is a tool that can be used to accomplish all of these goals.
Gaussian elimination is a methodical process of systematically transforming one system into another simpler, but equivalent, system (two systems are called equivalent if they possess equal solution sets) by successively eliminating unknowns and eventually arriving at a system that is easily solvable. The elimination process relies on three simple operations by which to transform one system to another equivalent system. To describe these operations, let $E_k$ denote the $k$th equation

$$E_k:\quad a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kn}x_n = b_k$$

and write the system as

$$\mathcal{S} = \begin{bmatrix} E_1\\ E_2\\ \vdots\\ E_m \end{bmatrix}.$$

For a linear system $\mathcal{S}$, each of the following three elementary operations results in an equivalent system $\mathcal{S}'$.

(1) Interchange the $i$th and $j$th equations. That is, if

$$\mathcal{S} = \begin{bmatrix} E_1\\ \vdots\\ E_i\\ \vdots\\ E_j\\ \vdots\\ E_m \end{bmatrix}, \quad\text{then}\quad \mathcal{S}' = \begin{bmatrix} E_1\\ \vdots\\ E_j\\ \vdots\\ E_i\\ \vdots\\ E_m \end{bmatrix}. \tag{1.2.1}$$

(2) Replace the $i$th equation by a nonzero multiple of itself. That is,

$$\mathcal{S}' = \begin{bmatrix} E_1\\ \vdots\\ \alpha E_i\\ \vdots\\ E_m \end{bmatrix}, \quad\text{where } \alpha \neq 0. \tag{1.2.2}$$

(3) Replace the $j$th equation by a combination of itself plus a multiple of the $i$th equation. That is,

$$\mathcal{S}' = \begin{bmatrix} E_1\\ \vdots\\ E_i\\ \vdots\\ E_j + \alpha E_i\\ \vdots\\ E_m \end{bmatrix}. \tag{1.2.3}$$

[…] Finally, substitute $z = 1$ and $y = 2$ back into the first equation in (1.2.5) to get

$$x = \tfrac12(1 - y - z) = \tfrac12(1 - 2 - 1) = -1,$$

which completes the solution.

It should be clear that there is no reason to write down the symbols such as "x," "y," "z," and "=" at each step since we are only manipulating the coefficients. If such symbols are discarded, then a system of linear equations reduces to a rectangular array of numbers in which each horizontal line represents one equation. For example, the system in (1.2.4) reduces to the following array:

$$\left[\begin{array}{ccc|c} 2 & 1 & 1 & 1\\ 6 & 2 & 1 & -1\\ -2 & 2 & 1 & 7 \end{array}\right].$$

(The vertical line emphasizes where "=" appeared.) The array of coefficients—the numbers on the left-hand side of the vertical line—is called the coefficient matrix for the system. The entire array—the coefficient matrix augmented by the numbers from the right-hand side of the system—is called the augmented matrix associated with the system. If the coefficient matrix is denoted by $\mathbf{A}$ and the right-hand side is denoted by $\mathbf{b}$, then the augmented matrix associated with the system is denoted by $[\mathbf{A}|\mathbf{b}]$.

Formally, a scalar is either a real number or a complex number, and a matrix is a rectangular array of scalars. It is common practice to use uppercase boldface letters to denote matrices and to use the corresponding lowercase letters with two subscripts to denote individual entries in a matrix. For example,

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

The first subscript on an individual entry in a matrix designates the row (the horizontal line), and the second subscript denotes the column (the vertical line) that the entry occupies. For example, if

$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 3 & 4\\ 8 & 6 & 5 & -9\\ -3 & 8 & 3 & 7 \end{bmatrix}, \quad\text{then}\quad a_{11} = 2,\ a_{12} = 1,\ \ldots,\ a_{34} = 7. \tag{1.2.6}$$

A submatrix of a given matrix $\mathbf{A}$ is an array obtained by deleting any combination of rows and columns from $\mathbf{A}$. For example,

$$\mathbf{B} = \begin{pmatrix} 2 & 4\\ -3 & 7 \end{pmatrix}$$

is a submatrix of the matrix $\mathbf{A}$ in (1.2.6) because $\mathbf{B}$ is the result of deleting the second row and the second and third columns of $\mathbf{A}$.
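The subscript conventions map directly onto array indexing in software. Below is a small Python/NumPy sketch (not from the text) using the matrix of (1.2.6); note that NumPy indices start at 0, so the entry $a_{34}$ lives at position [2, 3].

```python
import numpy as np

A = np.array([[ 2, 1, 3,  4],
              [ 8, 6, 5, -9],
              [-3, 8, 3,  7]])

print(A[2, 3])      # the entry a_34 = 7 (0-based indexing)
print(A[1, :])      # the second row: (8, 6, 5, -9)
print(A[:, 1])      # the second column: (1, 6, 8)

# The submatrix B of (1.2.6): delete the second row and the
# second and third columns.
B = np.delete(np.delete(A, 1, axis=0), [1, 2], axis=1)
print(B)            # [[ 2  4]
                    #  [-3  7]]
```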
Matrix $\mathbf{A}$ is said to have shape or size $m \times n$—pronounced "m by n"—whenever $\mathbf{A}$ has exactly m rows and n columns. For example, the matrix in (1.2.6) is a 3 × 4 matrix. By agreement, 1 × 1 matrices are identified with scalars and vice versa. To emphasize that matrix $\mathbf{A}$ has shape $m \times n$, subscripts are sometimes placed on $\mathbf{A}$ as $\mathbf{A}_{m\times n}$. Whenever m = n (i.e., when $\mathbf{A}$ has the same number of rows as columns), $\mathbf{A}$ is called a square matrix. Otherwise, $\mathbf{A}$ is said to be rectangular. Matrices consisting of a single row or a single column are often called row vectors or column vectors, respectively.

The symbol $\mathbf{A}_{i*}$ is used to denote the $i$th row, while $\mathbf{A}_{*j}$ denotes the $j$th column of matrix $\mathbf{A}$. For example, if $\mathbf{A}$ is the matrix in (1.2.6), then

$$\mathbf{A}_{2*} = \begin{pmatrix} 8 & 6 & 5 & -9 \end{pmatrix} \quad\text{and}\quad \mathbf{A}_{*2} = \begin{pmatrix} 1\\ 6\\ 8 \end{pmatrix}.$$

For a linear system of equations

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\
&\ \ \vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}$$

Gaussian elimination can be executed on the associated augmented matrix $[\mathbf{A}|\mathbf{b}]$ by performing elementary operations to the rows of $[\mathbf{A}|\mathbf{b}]$. These row operations correspond to the three elementary operations (1.2.1), (1.2.2), and (1.2.3) used to manipulate linear systems. For an $m \times n$ matrix

$$\mathbf{M} = \begin{bmatrix} \mathbf{M}_{1*}\\ \vdots\\ \mathbf{M}_{i*}\\ \vdots\\ \mathbf{M}_{j*}\\ \vdots\\ \mathbf{M}_{m*} \end{bmatrix},$$

the three types of elementary row operations on $\mathbf{M}$ are as follows.

- Type I: Interchange rows i and j to produce
$$\begin{bmatrix} \mathbf{M}_{1*}\\ \vdots\\ \mathbf{M}_{j*}\\ \vdots\\ \mathbf{M}_{i*}\\ \vdots\\ \mathbf{M}_{m*} \end{bmatrix}. \tag{1.2.7}$$

- Type II: Replace row i by a nonzero multiple of itself to produce
$$\begin{bmatrix} \mathbf{M}_{1*}\\ \vdots\\ \alpha\mathbf{M}_{i*}\\ \vdots\\ \mathbf{M}_{m*} \end{bmatrix}, \quad\text{where } \alpha \neq 0. \tag{1.2.8}$$

- Type III: Replace row j by a combination of itself plus a multiple of row i to produce
$$\begin{bmatrix} \mathbf{M}_{1*}\\ \vdots\\ \mathbf{M}_{i*}\\ \vdots\\ \mathbf{M}_{j*} + \alpha\mathbf{M}_{i*}\\ \vdots\\ \mathbf{M}_{m*} \end{bmatrix}. \tag{1.2.9}$$

To solve the system (1.2.4) by using elementary row operations, start with the associated augmented matrix $[\mathbf{A}|\mathbf{b}]$ and triangularize the coefficient matrix $\mathbf{A}$ by performing exactly the same sequence of row operations that corresponds to the elementary operations executed on the equations themselves (pivotal positions are boxed):

$$\left[\begin{array}{ccc|c} \boxed{2} & 1 & 1 & 1\\ 6 & 2 & 1 & -1\\ -2 & 2 & 1 & 7 \end{array}\right] \xrightarrow{\substack{R_2 - 3R_1\\ R_3 + R_1}} \left[\begin{array}{ccc|c} 2 & 1 & 1 & 1\\ 0 & \boxed{-1} & -2 & -4\\ 0 & 3 & 2 & 8 \end{array}\right] \xrightarrow{R_3 + 3R_2} \left[\begin{array}{ccc|c} 2 & 1 & 1 & 1\\ 0 & -1 & -2 & -4\\ 0 & 0 & \boxed{-4} & -4 \end{array}\right].$$

The final array represents the triangular system

$$\begin{aligned} 2x + y + z &= 1,\\ -y - 2z &= -4,\\ -4z &= -4 \end{aligned}$$

that is solved by back substitution as described earlier. In general, if an $n \times n$ system has been triangularized to the form

$$\left[\begin{array}{cccc|c} t_{11} & t_{12} & \cdots & t_{1n} & c_1\\ 0 & t_{22} & \cdots & t_{2n} & c_2\\ \vdots & \vdots & \ddots & \vdots & \vdots\\ 0 & 0 & \cdots & t_{nn} & c_n \end{array}\right] \tag{1.2.10}$$

in which each $t_{ii} \neq 0$ (i.e., there are no zero pivots), then the general algorithm for back substitution is as follows.
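A minimal Python sketch of standard back substitution for the triangular form (1.2.10), assuming every $t_{ii} \neq 0$ (the function name and conventions are illustrative, not the text's own notation):

```python
import numpy as np

def back_substitute(T, c):
    # Solve Tx = c where T is upper triangular with nonzero diagonal,
    # working upward from the last equation as in (1.2.10).
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # x_i = (c_i - sum_{j>i} t_ij x_j) / t_ii
        x[i] = (c[i] - T[i, i + 1:] @ x[i + 1:]) / T[i, i]
    return x

# The triangularized system obtained above from (1.2.4):
T = np.array([[2., 1., 1.], [0., -1., -2.], [0., 0., -4.]])
c = np.array([1., -4., -4.])
print(back_substitute(T, c))    # [-1.  2.  1.], i.e., x = -1, y = 2, z = 1
```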
Exercises for section 1.2

[…]

1.2.2. Apply Gaussian elimination with back substitution to the following system:
$$\begin{aligned} 2x_1 - x_2 &= 0,\\ -x_1 + 2x_2 - x_3 &= 0,\\ -x_2 + x_3 &= 1. \end{aligned}$$

1.2.3. Use Gaussian elimination with back substitution to solve the following system:
$$\begin{aligned} 4x_2 - 3x_3 &= 3,\\ -x_1 + 7x_2 - 5x_3 &= 4,\\ -x_1 + 8x_2 - 6x_3 &= 5. \end{aligned}$$

1.2.4. Solve the following system:
$$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= 1,\\ x_1 + x_2 + 3x_3 + 3x_4 &= 3,\\ x_1 + x_2 + 2x_3 + 3x_4 &= 3,\\ x_1 + 3x_2 + 3x_3 + 3x_4 &= 4. \end{aligned}$$

1.2.5. Consider the following three systems where the coefficients are the same for each system, but the right-hand sides are different (this situation occurs frequently):
$$\begin{array}{rccc} 4x - 8y + 5z = & 1 & 0 & 0,\\ 4x - 7y + 4z = & 0 & 1 & 0,\\ 3x - 4y + 2z = & 0 & 0 & 1. \end{array}$$
Solve all three systems at one time by performing Gaussian elimination on an augmented matrix of the form $[\mathbf{A} \mid \mathbf{b}_1 \mid \mathbf{b}_2 \mid \mathbf{b}_3]$.

1.2.6. Suppose that matrix $\mathbf{B}$ is obtained by performing a sequence of row operations on matrix $\mathbf{A}$. Explain why $\mathbf{A}$ can be obtained by performing row operations on $\mathbf{B}$.

1.2.7. Find angles α, β, and γ such that
$$\begin{aligned} 2\sin\alpha - \cos\beta + 3\tan\gamma &= 3,\\ 4\sin\alpha + 2\cos\beta - 2\tan\gamma &= 2,\\ 6\sin\alpha - 3\cos\beta + \tan\gamma &= 9, \end{aligned}$$
where $0 \le \alpha \le 2\pi$, $0 \le \beta \le 2\pi$, and $0 \le \gamma < \pi$.

1.2.8. The following system has no solution:
$$\begin{aligned} -x_1 + 3x_2 - 2x_3 &= 1,\\ -x_1 + 4x_2 - 3x_3 &= 0,\\ -x_1 + 5x_2 - 4x_3 &= 0. \end{aligned}$$
Attempt to solve this system using Gaussian elimination and explain what occurs to indicate that the system is impossible to solve.

1.2.9. Attempt to solve the system
$$\begin{aligned} -x_1 + 3x_2 - 2x_3 &= 4,\\ -x_1 + 4x_2 - 3x_3 &= 5,\\ -x_1 + 5x_2 - 4x_3 &= 6, \end{aligned}$$
using Gaussian elimination and explain why this system must have infinitely many solutions.

1.2.10. By solving a 3 × 3 system, find the coefficients in the equation of the parabola $y = \alpha + \beta x + \gamma x^2$ that passes through the points (1, 1), (2, 2), and (3, 0).

1.2.11. Suppose that 100 insects are distributed in an enclosure consisting of four chambers with passageways between them as shown below.

[Figure: an enclosure of four chambers #1, #2, #3, #4 connected by passageways.]

At the end of one minute, the insects have redistributed themselves. Assume that a minute is not enough time for an insect to visit more than one chamber and that at the end of a minute 40% of the insects in each chamber have not left the chamber they occupied at the beginning of the minute. The insects that leave a chamber disperse uniformly among the chambers that are directly accessible from the one they initially occupied—e.g., from #3, half move to #2 and half move to #4.
(a) If at the end of one minute there are 12, 25, 26, and 37 insects in chambers #1, #2, #3, and #4, respectively, determine what the initial distribution had to be.
(b) If the initial distribution is 20, 20, 20, 40, what is the distribution at the end of one minute?

1.2.12. Show that the three types of elementary row operations discussed on p. 8 are not independent by showing that the interchange operation (1.2.7) can be accomplished by a sequence of the other two types of row operations given in (1.2.8) and (1.2.9).

1.2.13. Suppose that $[\mathbf{A}|\mathbf{b}]$ is the augmented matrix associated with a linear system. You know that performing row operations on $[\mathbf{A}|\mathbf{b}]$ does not change the solution of the system. However, no mention of column operations was ever made because column operations can alter the solution.
(a) Describe the effect on the solution of a linear system when columns $\mathbf{A}_{*j}$ and $\mathbf{A}_{*k}$ are interchanged.
(b) Describe the effect when column $\mathbf{A}_{*j}$ is replaced by $\alpha\mathbf{A}_{*j}$ for $\alpha \neq 0$.
(c) Describe the effect when $\mathbf{A}_{*j}$ is replaced by $\mathbf{A}_{*j} + \alpha\mathbf{A}_{*k}$.
Hint: Experiment with a 2 × 2 or 3 × 3 system.

1.2.14. Consider the $n \times n$ Hilbert matrix defined by
$$\mathbf{H} = \begin{bmatrix} 1 & \frac12 & \frac13 & \cdots & \frac1n\\ \frac12 & \frac13 & \frac14 & \cdots & \frac1{n+1}\\ \frac13 & \frac14 & \frac15 & \cdots & \frac1{n+2}\\ \vdots & \vdots & \vdots & & \vdots\\ \frac1n & \frac1{n+1} & \frac1{n+2} & \cdots & \frac1{2n-1} \end{bmatrix}.$$
Express the individual entries $h_{ij}$ in terms of i and j.

1.2.15. Verify that the operation counts given in the text for Gaussian elimination with back substitution are correct for a general 3 × 3 system. If you are up to the challenge, try to verify these counts for a general $n \times n$ system.

1.2.16. Explain why a linear system can never have exactly two different solutions. Extend your argument to explain the fact that if a system has more than one solution, then it must have infinitely many different solutions.

1.3 GAUSS–JORDAN METHOD

[…] Gaussian elimination with back substitution requires about $n^3/3$ multiplications/divisions and about the same number of additions/subtractions. Compare this with the $n^3/2$ factor required by the Gauss–Jordan method, and you can see that Gauss–Jordan requires about 50% more effort than Gaussian elimination with back substitution.
For small systems of the textbook variety (e.g., n = 3), these comparisons do not show a great deal of difference. However, in practical work, the systems that are encountered can be quite large, and the difference between Gauss–Jordan and Gaussian elimination with back substitution can be significant. For example, if n = 100, then $n^3/3$ is about 333,333, while $n^3/2$ is 500,000, which is a difference of 166,667 multiplications/divisions as well as that many additions/subtractions.

Although the Gauss–Jordan method is not recommended for solving linear systems that arise in practical applications, it does have some theoretical advantages. Furthermore, it can be a useful technique for tasks other than computing solutions to linear systems. We will make use of the Gauss–Jordan procedure when matrix inversion is discussed—this is the primary reason for introducing Gauss–Jordan.

Exercises for section 1.3

1.3.1. Use the Gauss–Jordan method to solve the following system:
$$\begin{aligned} 4x_2 - 3x_3 &= 3,\\ -x_1 + 7x_2 - 5x_3 &= 4,\\ -x_1 + 8x_2 - 6x_3 &= 5. \end{aligned}$$

1.3.2. Apply the Gauss–Jordan method to the following system:
$$\begin{aligned} x_1 + x_2 + x_3 + x_4 &= 1,\\ x_1 + 2x_2 + 2x_3 + 2x_4 &= 0,\\ x_1 + 2x_2 + 3x_3 + 3x_4 &= 0,\\ x_1 + 2x_2 + 3x_3 + 4x_4 &= 0. \end{aligned}$$

1.3.3. Use the Gauss–Jordan method to solve the following three systems at the same time.
$$\begin{array}{rccc} 2x_1 - x_2 = & 1 & 0 & 0,\\ -x_1 + 2x_2 - x_3 = & 0 & 1 & 0,\\ -x_2 + x_3 = & 0 & 0 & 1. \end{array}$$

1.3.4. Verify that the operation counts given in the text for the Gauss–Jordan method are correct for a general 3 × 3 system. If you are up to the challenge, try to verify these counts for a general $n \times n$ system.

1.4 TWO-POINT BOUNDARY VALUE PROBLEMS

It was stated previously that linear systems that arise in practice can become quite large in size. The purpose of this section is to understand why this often occurs and why there is frequently a special structure to the linear systems that come from practical applications.

Given an interval [a, b] and two numbers α and β, consider the general problem of trying to find a function y(t) that satisfies the differential equation

$$u(t)y''(t) + v(t)y'(t) + w(t)y(t) = f(t), \quad\text{where } y(a) = \alpha \text{ and } y(b) = \beta. \tag{1.4.1}$$

The functions u, v, w, and f are assumed to be known functions on [a, b]. Because the unknown function y(t) is specified at the boundary points a and b, problem (1.4.1) is known as a two-point boundary value problem. Such problems abound in nature and are frequently very hard to handle because it is often not possible to express y(t) in terms of elementary functions. Numerical methods are usually employed to approximate y(t) at discrete points inside [a, b]. Approximations are produced by subdividing the interval [a, b] into n + 1 equal subintervals, each of length $h = (b - a)/(n + 1)$, with grid points

$$t_0 = a,\quad t_1 = a + h,\quad t_2 = a + 2h,\quad \ldots,\quad t_n = a + nh,\quad t_{n+1} = b.$$

Derivative approximations at the interior nodes (grid points) $t_i = a + ih$ are made by using Taylor series expansions $y(t) = \sum_{k=0}^{\infty} y^{(k)}(t_i)(t - t_i)^k/k!$ to write

$$\begin{aligned} y(t_i + h) &= y(t_i) + y'(t_i)h + \frac{y''(t_i)h^2}{2!} + \frac{y'''(t_i)h^3}{3!} + \cdots,\\ y(t_i - h) &= y(t_i) - y'(t_i)h + \frac{y''(t_i)h^2}{2!} - \frac{y'''(t_i)h^3}{3!} + \cdots, \end{aligned} \tag{1.4.2}$$

and then subtracting and adding these expressions to produce

$$y'(t_i) = \frac{y(t_i + h) - y(t_i - h)}{2h} + O(h^2) \quad\text{and}\quad y''(t_i) = \frac{y(t_i - h) - 2y(t_i) + y(t_i + h)}{h^2} + O(h^2),$$

where $O(h^p)$ denotes[5] terms containing $p$th and higher powers of h.

[5] Formally, a function f(h) is $O(h^p)$ if $f(h)/h^p$ remains bounded as $h \to 0$, but $f(h)/h^q$ becomes unbounded if q > p. This means that f goes to zero as fast as $h^p$ goes to zero.
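The quadratic order of accuracy is easy to observe numerically. The short Python sketch below (the test function, the point, and the step sizes are all chosen only for illustration) halves h repeatedly; the error in the centered difference for y′ drops by roughly a factor of 4 each time, as an $O(h^2)$ error should.

```python
import math

def centered_first(y, t, h):
    # Centered difference approximation to y'(t).
    return (y(t + h) - y(t - h)) / (2 * h)

y, t = math.sin, 1.0            # test function y(t) = sin t, so y'(t) = cos t
h = 0.1
for _ in range(4):
    err = abs(centered_first(y, t, h) - math.cos(t))
    print(f"h = {h:.5f}   error = {err:.3e}")
    h /= 2                      # halving h should cut the error by about 4
```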
The resulting approximations

$$y'(t_i) \approx \frac{y(t_i + h) - y(t_i - h)}{2h} \quad\text{and}\quad y''(t_i) \approx \frac{y(t_i - h) - 2y(t_i) + y(t_i + h)}{h^2} \tag{1.4.3}$$

are called centered difference approximations, and they are preferred over less accurate one-sided approximations such as

$$y'(t_i) \approx \frac{y(t_i + h) - y(t_i)}{h} \quad\text{or}\quad y'(t_i) \approx \frac{y(t_i) - y(t_i - h)}{h}.$$

The value $h = (b - a)/(n + 1)$ is called the step size. Smaller step sizes produce better derivative approximations, so obtaining an accurate solution usually requires a small step size and a large number of grid points. By evaluating the centered difference approximations at each grid point and substituting the result into the original differential equation (1.4.1), a system of n linear equations in n unknowns is produced in which the unknowns are the values $y(t_i)$. A simple example can serve to illustrate this point.

Example 1.4.1

Suppose that f(t) is a known function and consider the two-point boundary value problem

$$y''(t) = f(t) \quad\text{on } [0, 1] \quad\text{with } y(0) = y(1) = 0.$$

The goal is to approximate the values of y at n equally spaced grid points $t_i$ interior to [0, 1]. The step size is therefore $h = 1/(n + 1)$. For the sake of convenience, let $y_i = y(t_i)$ and $f_i = f(t_i)$. Use the approximation

$$\frac{y_{i-1} - 2y_i + y_{i+1}}{h^2} \approx y''(t_i) = f_i$$

along with $y_0 = 0$ and $y_{n+1} = 0$ to produce the system of equations

$$-y_{i-1} + 2y_i - y_{i+1} \approx -h^2 f_i \quad\text{for } i = 1, 2, \ldots, n.$$

(The signs are chosen to make the 2's positive to be consistent with later developments.) The augmented matrix associated with this system is shown below:

$$\left[\begin{array}{ccccccc|c}
2 & -1 & 0 & \cdots & 0 & 0 & 0 & -h^2 f_1\\
-1 & 2 & -1 & \cdots & 0 & 0 & 0 & -h^2 f_2\\
0 & -1 & 2 & \cdots & 0 & 0 & 0 & -h^2 f_3\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 2 & -1 & 0 & -h^2 f_{n-2}\\
0 & 0 & 0 & \cdots & -1 & 2 & -1 & -h^2 f_{n-1}\\
0 & 0 & 0 & \cdots & 0 & -1 & 2 & -h^2 f_n
\end{array}\right].$$

By solving this system, approximate values of the unknown function y at the grid points $t_i$ are obtained. Larger values of n produce smaller values of h and hence better approximations to the exact values of the $y_i$'s.
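A Python sketch of Example 1.4.1 (the choice $f(t) = \sin \pi t$ is an assumption made only for testing; for it the exact solution of $y'' = f$ with zero boundary values is $y(t) = -\sin(\pi t)/\pi^2$, so the error is easy to check):

```python
import numpy as np

def bvp_solve(f, n):
    # Approximate y'' = f on [0,1] with y(0) = y(1) = 0 at n interior
    # grid points, using the centered difference system of Example 1.4.1.
    h = 1.0 / (n + 1)
    t = np.linspace(h, 1 - h, n)                           # interior grid points
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # tridiagonal
    b = -h**2 * f(t)                                       # right-hand side -h^2 f_i
    return t, np.linalg.solve(A, b)

f = lambda t: np.sin(np.pi * t)                  # illustrative choice of f
t, y = bvp_solve(f, n=50)
exact = -np.sin(np.pi * t) / np.pi**2
print("max error:", np.max(np.abs(y - exact)))   # shrinks as n grows
```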
1.5 MAKING GAUSSIAN ELIMINATION WORK

[…] to look at digit $d_{t+1}$ in $x = .d_1d_2\cdots d_t d_{t+1}\cdots \times 10^{\epsilon}$ (making sure $d_1 \neq 0$) and then set

$$\mathrm{fl}(x) = \begin{cases} .d_1d_2\cdots d_t \times 10^{\epsilon} & \text{if } d_{t+1} < 5,\\[2pt] \big([.d_1d_2\cdots d_t] + 10^{-t}\big) \times 10^{\epsilon} & \text{if } d_{t+1} \ge 5. \end{cases}$$

For example, in 2-digit, base-10 floating-point arithmetic,

$$\mathrm{fl}(3/80) = \mathrm{fl}(.0375) = \mathrm{fl}(.375 \times 10^{-1}) = .38 \times 10^{-1} = .038.$$

By considering $\eta = 1/3$ and $\xi = 3$ with t-digit base-10 arithmetic, it's easy to see that $\mathrm{fl}(\eta + \xi) \neq \mathrm{fl}(\eta) + \mathrm{fl}(\xi)$ and $\mathrm{fl}(\eta\xi) \neq \mathrm{fl}(\eta)\,\mathrm{fl}(\xi)$. Furthermore, several familiar rules of real arithmetic do not hold for floating-point arithmetic—associativity is one outstanding example. This, among other reasons, makes the analysis of floating-point computation difficult. It also means that you must be careful when working the examples and exercises in this text because although most calculators and computers can be instructed to display varying numbers of digits, most have a fixed internal precision with which all calculations are made before numbers are displayed, and this internal precision cannot be altered. Almost certainly, the internal precision of your calculator or computer is greater than the precision called for by the examples and exercises in this text. This means that each time you perform a t-digit calculation, you should manually round the result to t significant digits and reenter the rounded number before proceeding to the next calculation. In other words, don't "chain" operations in your calculator or computer.

To understand how to execute Gaussian elimination using floating-point arithmetic, let's compare the use of exact arithmetic with the use of 3-digit base-10 arithmetic to solve the following system:

$$\begin{aligned} 47x + 28y &= 19,\\ 89x + 53y &= 36. \end{aligned}$$

Using Gaussian elimination with exact arithmetic, we multiply the first equation by the multiplier $m = 89/47$ and subtract the result from the second equation to produce

$$\left[\begin{array}{cc|c} 47 & 28 & 19\\ 0 & -1/47 & 1/47 \end{array}\right].$$

Back substitution yields the exact solution $x = 1$ and $y = -1$. Using 3-digit arithmetic, the multiplier is

$$\mathrm{fl}(m) = \mathrm{fl}\Big(\frac{89}{47}\Big) = .189 \times 10^{1} = 1.89.$$

Since

$$\begin{aligned} \mathrm{fl}\big(\mathrm{fl}(m)\,\mathrm{fl}(47)\big) &= \mathrm{fl}(1.89 \times 47) = .888 \times 10^{2} = 88.8,\\ \mathrm{fl}\big(\mathrm{fl}(m)\,\mathrm{fl}(28)\big) &= \mathrm{fl}(1.89 \times 28) = .529 \times 10^{2} = 52.9,\\ \mathrm{fl}\big(\mathrm{fl}(m)\,\mathrm{fl}(19)\big) &= \mathrm{fl}(1.89 \times 19) = .359 \times 10^{2} = 35.9, \end{aligned}$$

the first step of 3-digit Gaussian elimination is as shown below:

$$\left[\begin{array}{cc|c} 47 & 28 & 19\\ \mathrm{fl}(89 - 88.8) & \mathrm{fl}(53 - 52.9) & \mathrm{fl}(36 - 35.9) \end{array}\right] = \left[\begin{array}{cc|c} 47 & 28 & 19\\ \boxed{.2} & .1 & .1 \end{array}\right].$$

The goal is to triangularize the system—to produce a zero in the boxed (2,1)-position—but this cannot be accomplished with 3-digit arithmetic. Unless the boxed value .2 is replaced by 0, back substitution cannot be executed. Henceforth, we will agree simply to enter 0 in the position that we are trying to annihilate, regardless of the value of the floating-point number that might actually appear. The value of the position being annihilated is generally not even computed. For example, don't even bother computing

$$\mathrm{fl}\big[89 - \mathrm{fl}\big(\mathrm{fl}(m)\,\mathrm{fl}(47)\big)\big] = \mathrm{fl}(89 - 88.8) = .2$$

in the above example. Hence the result of 3-digit Gaussian elimination for this example is

$$\left[\begin{array}{cc|c} 47 & 28 & 19\\ 0 & .1 & .1 \end{array}\right].$$

Apply 3-digit back substitution to obtain the 3-digit floating-point solution

$$y = \mathrm{fl}\Big(\frac{.1}{.1}\Big) = 1, \qquad x = \mathrm{fl}\Big(\frac{19 - 28}{47}\Big) = \mathrm{fl}\Big(\frac{-9}{47}\Big) = -.191.$$

The vast discrepancy between the exact solution (1, −1) and the 3-digit solution (−.191, 1) illustrates some of the problems we can expect to encounter while trying to solve linear systems with floating-point arithmetic. Sometimes using a higher precision may help, but this is not always possible because on all machines there are natural limits that make extended precision arithmetic impractical past a certain point. Even if it is possible to increase the precision, it may not buy you very much because there are many cases for which an increase in precision does not produce a comparable decrease in the accumulated roundoff error. Given any particular precision (say, t), it is not difficult to provide examples of linear systems for which the computed t-digit solution is just as bad as the one in our 3-digit example above.

Although the effects of rounding can almost never be eliminated, there are some simple techniques that can help to minimize these machine induced errors.

Partial Pivoting

At each step, search the positions on and below the pivotal position for the coefficient of maximum magnitude. If necessary perform the appropriate row interchange to bring this maximal coefficient into the pivotal position. Illustrated below is the third step in a typical case:

$$\begin{bmatrix} * & * & * & * & * & *\\ 0 & * & * & * & * & *\\ 0 & 0 & \boxed{S} & * & * & *\\ 0 & 0 & S & * & * & *\\ 0 & 0 & S & * & * & * \end{bmatrix}.$$

Search the positions in the third column marked "S" for the coefficient of maximal magnitude and, if necessary, interchange rows to bring this coefficient into the boxed pivotal position. Simply stated, the strategy is to maximize the magnitude of the pivot at each step by using only row interchanges.
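The 3-digit computation above can be replayed in code. The sketch below is illustrative only: the helper fl rounds to t significant digits with Python's round, which breaks ties half-to-even rather than half-up as in the text, a difference that does not matter for these particular numbers.

```python
from math import floor, log10

def fl(x, t=3):
    # Round x to t significant base-10 digits.
    if x == 0:
        return 0.0
    e = floor(log10(abs(x))) + 1          # exponent with mantissa in [.1, 1)
    return round(x / 10**e, t) * 10**e

# The system 47x + 28y = 19, 89x + 53y = 36 in 3-digit arithmetic.
m = fl(89 / 47)                            # multiplier fl(m) = 1.89
pivot2 = fl(53 - fl(m * 28))               # .1  (the (2,1) entry is entered as 0)
rhs2   = fl(36 - fl(m * 19))               # .1
y = fl(rhs2 / pivot2)                      # y = 1
x = fl(fl(19 - fl(28 * y)) / 47)           # x = -.191
print(x, y)                                # compare with the exact (1, -1)
```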
On the surface, it is probably not apparent why partial pivoting should make a difference. The following example not only shows that partial pivoting can indeed make a great deal of difference, but it also indicates what makes this strategy effective.

Example 1.5.1

It is easy to verify that the exact solution to the system

$$\begin{aligned} -10^{-4}x + y &= 1,\\ x + y &= 2, \end{aligned}$$

is given by

$$x = \frac{1}{1.0001} \quad\text{and}\quad y = \frac{1.0002}{1.0001}.$$

If 3-digit arithmetic without partial pivoting is used, then the result is …

Example 1.5.2

[…] is given by

$$x = \frac{1}{1.0001} \quad\text{and}\quad y = \frac{1.0002}{1.0001}.$$

Suppose that 3-digit arithmetic with partial pivoting is used. Since $|-10| > 1$, no interchange is called for and we obtain

$$\left[\begin{array}{cc|c} -10 & 10^5 & 10^5\\ 1 & 1 & 2 \end{array}\right] \xrightarrow{R_2 + 10^{-1}R_1} \left[\begin{array}{cc|c} -10 & 10^5 & 10^5\\ 0 & 10^4 & 10^4 \end{array}\right]$$

because

$$\mathrm{fl}(1 + 10^4) = \mathrm{fl}(.10001 \times 10^5) = .100 \times 10^5 = 10^4 \quad\text{and}\quad \mathrm{fl}(2 + 10^4) = \mathrm{fl}(.10002 \times 10^5) = .100 \times 10^5 = 10^4.$$

Back substitution yields $x = 0$ and $y = 1$, which must be considered to be very bad—the computed 3-digit solution for y is not too bad, but the computed 3-digit solution for x is terrible!

What is the source of difficulty in Example 1.5.2? This time, the multiplier cannot be blamed. The trouble stems from the fact that the first equation contains coefficients that are much larger than the coefficients in the second equation. That is, there is a problem of scale due to the fact that the coefficients are of different orders of magnitude. Therefore, we should somehow rescale the system before attempting to solve it. If the first equation in the above example is rescaled to insure that the coefficient of maximum magnitude is a 1, which is accomplished by multiplying the first equation by $10^{-5}$, then the system given in Example 1.5.1 is obtained, and we know from that example that partial pivoting produces a very good approximation to the exact solution.

This points to the fact that the success of partial pivoting can hinge on maintaining the proper scale among the coefficients. Therefore, the second refinement needed to make Gaussian elimination practical is a reasonable scaling strategy. Unfortunately, there is no known scaling procedure that will produce optimum results for every possible system, so we must settle for a strategy that will work most of the time. The strategy is to combine row scaling—multiplying selected rows by nonzero multipliers—with column scaling—multiplying selected columns of the coefficient matrix $\mathbf{A}$ by nonzero multipliers. Row scaling doesn't alter the exact solution, but column scaling does—see Exercise 1.2.13(b). Column scaling is equivalent to changing the units of the kth unknown. For example, if the units of the kth unknown $x_k$ in $[\mathbf{A}|\mathbf{b}]$ are millimeters, and if the kth column of $\mathbf{A}$ is multiplied by .001, then the kth unknown in the scaled system $[\hat{\mathbf{A}} \mid \mathbf{b}]$ is $\hat{x}_k = 1000x_k$, and thus the units of the scaled unknown $\hat{x}_k$ become meters.

Experience has shown that the following strategy for combining row scaling with column scaling usually works reasonably well.

Practical Scaling Strategy

1. Choose units that are natural to the problem and do not distort the relationships between the sizes of things. These natural units are usually self-evident, and further column scaling past this point is not ordinarily attempted.

2. Row scale the system $[\mathbf{A}|\mathbf{b}]$ so that the coefficient of maximum magnitude in each row of $\mathbf{A}$ is equal to 1. That is, divide each equation by the coefficient of maximum magnitude.
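For later reference, here is a compact Python sketch of the whole procedure, Gaussian elimination with partial pivoting followed by back substitution, in ordinary double precision (the routine and its interface are illustrative, not from the text):

```python
import numpy as np

def solve_partial_pivoting(A, b):
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    for k in range(n - 1):
        # Bring the entry of maximum magnitude on or below the
        # pivotal position into the pivotal position.
        p = k + np.argmax(np.abs(A[k:, k]))
        A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        # Annihilate the entries below the pivot.
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back substitution on the triangularized system.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# The system of Example 1.5.2: the result is close to the exact solution.
print(solve_partial_pivoting([[-10, 1e5], [1, 1]], [1e5, 2]))
```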
Partial pivoting together with the scaling strategy described above makes Gaussian elimination with back substitution an extremely effective tool. Over the course of time, this technique has proven to be reliable for solving a majority of linear systems encountered in practical work.

Although it is not extensively used, there is an extension of partial pivoting known as complete pivoting which, in some special cases, can be more effective than partial pivoting in helping to control the effects of roundoff error.

Complete Pivoting

If $[\mathbf{A}|\mathbf{b}]$ is the augmented matrix at the kth step of Gaussian elimination, then search the pivotal position together with every position in $\mathbf{A}$ that is below or to the right of the pivotal position for the coefficient of maximum magnitude. If necessary, perform the appropriate row and column interchanges to bring the coefficient of maximum magnitude into the pivotal position. Shown below is the third step in a typical situation:

$$\begin{bmatrix} * & * & * & * & * & *\\ 0 & * & * & * & * & *\\ 0 & 0 & \boxed{S} & S & S & *\\ 0 & 0 & S & S & S & *\\ 0 & 0 & S & S & S & * \end{bmatrix}.$$

Search the positions marked "S" for the coefficient of maximal magnitude. If necessary, interchange rows and columns to bring this maximal coefficient into the boxed pivotal position. Recall from Exercise 1.2.13 that the effect of a column interchange in $\mathbf{A}$ is equivalent to permuting (or renaming) the associated unknowns.

You should be able to see that complete pivoting should be at least as effective as partial pivoting. Moreover, it is possible to construct specialized examples where complete pivoting is superior to partial pivoting—a famous example is presented in Exercise 1.5.7. However, one rarely encounters systems of this nature in practice. A deeper comparison between no pivoting, partial pivoting, and complete pivoting is given on p. 348.

Example 1.5.3

Problem: Use 3-digit arithmetic together with complete pivoting to solve the following system:

$$\begin{aligned} x - y &= -2,\\ -9x + 10y &= 12. \end{aligned}$$

Solution: Since 10 is the coefficient of maximal magnitude that lies in the search pattern, interchange the first and second rows and then interchange the first and second columns:

$$\left[\begin{array}{cc|c} 1 & -1 & -2\\ -9 & 10 & 12 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} -9 & 10 & 12\\ 1 & -1 & -2 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} 10 & -9 & 12\\ -1 & 1 & -2 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} 10 & -9 & 12\\ 0 & .1 & -.8 \end{array}\right].$$

The effect of the column interchange is to rename the unknowns to $\hat{x}$ and $\hat{y}$, where $\hat{x} = y$ and $\hat{y} = x$. Back substitution yields $\hat{y} = -8$ and $\hat{x} = -6$ so that $x = \hat{y} = -8$ and $y = \hat{x} = -6$. In this case, the 3-digit solution and the exact solution agree. If only partial pivoting is used, the 3-digit solution will not be as accurate. However, if scaled partial pivoting is used, the result is the same as when complete pivoting is used.

If the cost of using complete pivoting was nearly the same as the cost of using partial pivoting, we would always use complete pivoting. However, it is not difficult to show that complete pivoting approximately doubles the cost over straight Gaussian elimination, whereas partial pivoting adds only a negligible amount. Couple this with the fact that it is extremely rare to encounter a practical system where scaled partial pivoting is not adequate while complete pivoting is, and it is easy to understand why complete pivoting is seldom used in practice. Gaussian elimination with scaled partial pivoting is the preferred method for dense systems (i.e., not a lot of zeros) of moderate size.
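A Python sketch of one way to implement complete pivoting, tracking the column interchanges so the unknowns can be renamed back at the end (the routine is illustrative and unoptimized):

```python
import numpy as np

def solve_complete_pivoting(A, b):
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    n = len(b)
    perm = np.arange(n)                        # records the renamed unknowns
    for k in range(n - 1):
        # Search the entire lower-right block for the entry of
        # maximum magnitude.
        i, j = divmod(np.argmax(np.abs(A[k:, k:])), n - k)
        A[[k, k + i]], b[[k, k + i]] = A[[k + i, k]], b[[k + i, k]]
        A[:, [k, k + j]] = A[:, [k + j, k]]    # column interchange ...
        perm[[k, k + j]] = perm[[k + j, k]]    # ... renames the unknowns
        for r in range(k + 1, n):
            m = A[r, k] / A[k, k]
            A[r, k:] -= m * A[k, k:]
            b[r] -= m * b[k]
    xhat = np.zeros(n)
    for i in range(n - 1, -1, -1):
        xhat[i] = (b[i] - A[i, i + 1:] @ xhat[i + 1:]) / A[i, i]
    x = np.zeros(n)
    x[perm] = xhat                             # undo the renaming
    return x

# Example 1.5.3: recovers x = -8, y = -6.
print(solve_complete_pivoting([[1, -1], [-9, 10]], [-2, 12]))
```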
Exercises for section 1.5

[…]

(c) Using 3-digit arithmetic, column scale the coefficients by changing units: convert pounds of silica to tons of silica, pounds of iron to half-tons of iron, and pounds of gold to troy ounces of gold (1 lb. = 12 troy oz.).
(d) Use 3-digit arithmetic with partial pivoting to solve the column scaled system of part (c). Then approximate the exact solution by using your machine's (or calculator's) full precision with partial pivoting to solve the system in part (c), and compare this with your 3-digit solution by computing the relative error $e_r$ as defined in part (b).

1.5.6. Consider the system given in Example 1.5.3.
(a) Use 3-digit arithmetic with partial pivoting but with no scaling to solve the system.
(b) Now use partial pivoting with scaling. Does complete pivoting provide an advantage over scaled partial pivoting in this case?

1.5.7. Consider the following well-scaled matrix:
$$\mathbf{W}_n = \begin{bmatrix}
1 & 0 & 0 & \cdots & 0 & 0 & 1\\
-1 & 1 & 0 & \cdots & 0 & 0 & 1\\
-1 & -1 & 1 & \ddots & 0 & 0 & 1\\
\vdots & \vdots & \ddots & \ddots & \ddots & \vdots & \vdots\\
-1 & -1 & -1 & \ddots & 1 & 0 & 1\\
-1 & -1 & -1 & \cdots & -1 & 1 & 1\\
-1 & -1 & -1 & \cdots & -1 & -1 & 1
\end{bmatrix}.$$
(a) Reduce $\mathbf{W}_n$ to an upper-triangular form using Gaussian elimination with partial pivoting, and determine the element of maximal magnitude that emerges during the elimination procedure.
(b) Now use complete pivoting and repeat part (a).
(c) Formulate a statement comparing the results of partial pivoting with those of complete pivoting for $\mathbf{W}_n$, and describe the effect this would have in determining the t-digit solution for a system whose augmented matrix is $[\mathbf{W}_n \mid \mathbf{b}]$.

1.5.8. Suppose that $\mathbf{A}$ is an $n \times n$ matrix of real numbers that has been scaled so that each entry satisfies $|a_{ij}| \le 1$, and consider reducing $\mathbf{A}$ to triangular form using Gaussian elimination with partial pivoting. Demonstrate that after k steps of the process, no entry can have a magnitude that exceeds $2^k$. Note: The previous exercise shows that there are cases where it is possible for some elements to actually attain the maximum magnitude of $2^k$ after k steps.

1.6 ILL-CONDITIONED SYSTEMS

Gaussian elimination with partial pivoting on a properly scaled system is perhaps the most fundamental algorithm in the practical use of linear algebra. However, it is not a universal algorithm nor can it be used blindly. The purpose of this section is to make the point that when solving a linear system some discretion must always be exercised because there are some systems that are so inordinately sensitive to small perturbations that no numerical technique can be used with confidence.

Example 1.6.1

Consider the system

$$\begin{aligned} .835x + .667y &= .168,\\ .333x + .266y &= .067, \end{aligned}$$

for which the exact solution is $x = 1$ and $y = -1$. If $b_2 = .067$ is only slightly perturbed to become $\hat{b}_2 = .066$, then the exact solution changes dramatically to become $\hat{x} = -666$ and $\hat{y} = 834$.

This is an example of a system whose solution is extremely sensitive to a small perturbation. This sensitivity is intrinsic to the system itself and is not a result of any numerical procedure. Therefore, you cannot expect some "numerical trick" to remove the sensitivity. If the exact solution is sensitive to small perturbations, then any computed solution cannot be less so, regardless of the algorithm used.

Ill-Conditioned Linear Systems

A system of linear equations is said to be ill-conditioned when some small perturbation in the system can produce relatively large changes in the exact solution. Otherwise, the system is said to be well-conditioned.
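The sensitivity in Example 1.6.1 is easy to reproduce; in the Python sketch below (np.linalg.solve is used simply as a convenient solver), perturbing $b_2$ by .001 moves the solution from (1, −1) to (−666, 834), up to roundoff:

```python
import numpy as np

A = np.array([[.835, .667],
              [.333, .266]])

for b in ([.168, .067], [.168, .066]):   # original and perturbed b_2
    print(b, "->", np.linalg.solve(A, b))
```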
It is easy to visualize what causes a 2 × 2 system to be ill-conditioned. Geometrically, two equations in two unknowns represent two straight lines, and the point of intersection is the solution for the system. An ill-conditioned system represents two straight lines that are almost parallel. If two straight lines are almost parallel and if one of the lines is tilted only slightly, then the point of intersection (i.e., the solution of the associated 2 × 2 linear system) is drastically altered.

[Figure 1.6.1: a line L crossing another line, together with a slight perturbation L′ of L; the original and perturbed points of intersection are far apart.]

This is illustrated in Figure 1.6.1 in which line L is slightly perturbed to become line L′. Notice how this small perturbation results in a large change in the point of intersection. This was exactly the situation for the system given in Example 1.6.1. In general, ill-conditioned systems are those that represent almost parallel lines, almost parallel planes, and generalizations of these notions.

Because roundoff errors can be viewed as perturbations to the original coefficients of the system, employing even a generally good numerical technique—short of exact arithmetic—on an ill-conditioned system carries the risk of producing nonsensical results.

In dealing with an ill-conditioned system, the engineer or scientist is often confronted with a much more basic (and sometimes more disturbing) problem than that of simply trying to solve the system. Even if a minor miracle could be performed so that the exact solution could be extracted, the scientist or engineer might still have a nonsensical solution that could lead to totally incorrect conclusions. The problem stems from the fact that the coefficients are often empirically obtained and are therefore known only within certain tolerances. For an ill-conditioned system, a small uncertainty in any of the coefficients can mean an extremely large uncertainty may exist in the solution. This large uncertainty can render even the exact solution totally useless.

Example 1.6.2

Suppose that for the system

$$\begin{aligned} .835x + .667y &= b_1,\\ .333x + .266y &= b_2, \end{aligned}$$

the numbers $b_1$ and $b_2$ are the results of an experiment and must be read from the dial of a test instrument. Suppose that the dial can be read to within a …

[…] changes. If a radical change in the solution is observed for a small perturbation to some set of coefficients, then you have uncovered an ill-conditioned situation. If a given perturbation does not produce a large change in the solution, then nothing can be concluded—perhaps you perturbed the wrong set of coefficients. By performing several such experiments using different sets of coefficients, a feel (but not a guarantee) for the extent of ill-conditioning can be obtained. This is expensive and not very satisfying. But before more can be said, more sophisticated tools need to be developed—the topics of sensitivity and conditioning are revisited on p. 127 and in Example 5.12.1 on p. 414.
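The perturbation experiments just described are easy to automate. The sketch below is illustrative (the perturbation size, the number of trials, and all names are assumptions): it randomly perturbs the coefficients of Example 1.6.1 by a relative amount of about $10^{-4}$ and reports the largest resulting change in the solution, which is enormous for this ill-conditioned system.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[.835, .667], [.333, .266]])
b = np.array([.168, .067])

x0 = np.linalg.solve(A, b)
worst = 0.0
for _ in range(100):
    dA = A * (1 + 1e-4 * rng.standard_normal(A.shape))
    worst = max(worst, np.linalg.norm(np.linalg.solve(dA, b) - x0))
print("largest observed change in the solution:", worst)
```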
Exercises for section 1.6

1.6.1. Consider the ill-conditioned system of Example 1.6.1:
$$\begin{aligned} .835x + .667y &= .168,\\ .333x + .266y &= .067. \end{aligned}$$
(a) Describe the outcome when you attempt to solve the system using 5-digit arithmetic with no scaling.
(b) Again using 5-digit arithmetic, first row scale the system before attempting to solve it. Describe to what extent this helps.
(c) Now use 6-digit arithmetic with no scaling. Compare the results with the exact solution.
(d) Using 6-digit arithmetic, compute the residuals for your solution of part (c), and interpret the results.
(e) For the same solution obtained in part (c), again compute the residuals, but use 7-digit arithmetic this time, and interpret the results.
(f) Formulate a concluding statement that summarizes the points made in parts (a)–(e).

1.6.2. Perturb the ill-conditioned system given in Exercise 1.6.1 above so as to form the following system:
$$\begin{aligned} .835x + .667y &= .1669995,\\ .333x + .266y &= .066601. \end{aligned}$$
(a) Determine the exact solution, and compare it with the exact solution of the system in Exercise 1.6.1.
(b) On the basis of the results of part (a), formulate a statement concerning the necessity for the solution of an ill-conditioned system to undergo a radical change for every perturbation of the original system.

1.6.3. Consider the two straight lines determined by the graphs of the following two equations:
$$\begin{aligned} .835x + .667y &= .168,\\ .333x + .266y &= .067. \end{aligned}$$
(a) Use 5-digit arithmetic to compute the slopes of each of the lines, and then use 6-digit arithmetic to do the same. In each case, sketch the graphs on a coordinate system.
(b) Show by diagram why a small perturbation in either of these lines can result in a large change in the solution.
(c) Describe in geometrical terms the situation that must exist in order for a system to be optimally well-conditioned.

1.6.4. Using geometric considerations, rank the following three systems according to their condition.
(a) $1.001x - y = .235,\quad x + .0001y = .765.$
(b) $1.001x - y = .235,\quad x + .9999y = .765.$
(c) $1.001x + y = .235,\quad x + .9999y = .765.$

1.6.5. Determine the exact solution of the following system:
$$\begin{aligned} 8x + 5y + 2z &= 15,\\ 21x + 19y + 16z &= 56,\\ 39x + 48y + 53z &= 140. \end{aligned}$$
Now change 15 to 14 in the first equation and again solve the system with exact arithmetic. Is the system ill-conditioned?

1.6.6. Show that the system
$$\begin{aligned} v - w - x - y - z &= 0,\\ w - x - y - z &= 0,\\ x - y - z &= 0,\\ y - z &= 0,\\ z &= 1, \end{aligned}$$
is ill-conditioned by considering the following perturbed system:
$$\begin{aligned} v - w - x - y - z &= 0,\\ -\tfrac{1}{15}v + w - x - y - z &= 0,\\ -\tfrac{1}{15}v + x - y - z &= 0,\\ -\tfrac{1}{15}v + y - z &= 0,\\ -\tfrac{1}{15}v + z &= 1. \end{aligned}$$

1.6.7. Let $f(x) = \sin \pi x$ on [0, 1]. The object of this problem is to determine the coefficients $\alpha_i$ of the cubic polynomial $p(x) = \sum_{i=0}^{3} \alpha_i x^i$ that is as close to f(x) as possible in the sense that
$$r = \int_0^1 [f(x) - p(x)]^2\,dx = \int_0^1 [f(x)]^2\,dx - 2\sum_{i=0}^{3} \alpha_i \int_0^1 x^i f(x)\,dx + \int_0^1 \Big(\sum_{i=0}^{3} \alpha_i x^i\Big)^{\!2} dx$$
is as small as possible.
(a) In order to minimize r, impose the condition that $\partial r/\partial \alpha_i = 0$ for each i = 0, 1, 2, 3, and show this results in a system of linear equations whose augmented matrix is $[\mathbf{H}_4 \mid \mathbf{b}]$, where $\mathbf{H}_4$ and $\mathbf{b}$ are given by
$$\mathbf{H}_4 = \begin{bmatrix} 1 & \frac12 & \frac13 & \frac14\\ \frac12 & \frac13 & \frac14 & \frac15\\ \frac13 & \frac14 & \frac15 & \frac16\\ \frac14 & \frac15 & \frac16 & \frac17 \end{bmatrix} \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} \frac{2}{\pi}\\[2pt] \frac{1}{\pi}\\[2pt] \frac{1}{\pi} - \frac{4}{\pi^3}\\[2pt] \frac{1}{\pi} - \frac{6}{\pi^3} \end{bmatrix}.$$
Any matrix $\mathbf{H}_n$ that has the same form as $\mathbf{H}_4$ is called a Hilbert matrix of order n.
(b) Systems involving Hilbert matrices are badly ill-conditioned, and the ill-conditioning becomes worse as the size increases. Use exact arithmetic with Gaussian elimination to reduce $\mathbf{H}_4$ to triangular form. Assuming that the case in which n = 4 is typical, explain why a general system $[\mathbf{H}_n \mid \mathbf{b}]$ will be ill-conditioned. Notice that even complete pivoting is of no help.
2.1 ROW ECHELON FORM AND RANK

[…]

Modified Gaussian Elimination

Suppose that $\mathbf{U}$ is the augmented matrix associated with the system after i − 1 elimination steps have been completed. To execute the ith step, proceed as follows:

- Moving from left to right in $\mathbf{U}$, locate the first column that contains a nonzero entry on or below the ith position—say it is $\mathbf{U}_{*j}$.
- The pivotal position for the ith step is the (i, j)-position.
- If necessary, interchange the ith row with a lower row to bring a nonzero number into the (i, j)-position, and then annihilate all entries below this pivot.
- If row $\mathbf{U}_{i*}$ as well as all rows in $\mathbf{U}$ below $\mathbf{U}_{i*}$ consist entirely of zeros, then the elimination process is completed.

Illustrated below is the result of applying this modified version of Gaussian elimination to the matrix given in (2.1.1).

Example 2.1.1

Problem: Apply modified Gaussian elimination to the following matrix and mark the pivotal positions (boxed here):

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 1 & 3 & 3\\ 2 & 4 & 0 & 4 & 4\\ 1 & 2 & 3 & 5 & 5\\ 2 & 4 & 0 & 4 & 7 \end{bmatrix}.$$

Solution:

$$\begin{bmatrix} \boxed{1} & 2 & 1 & 3 & 3\\ 2 & 4 & 0 & 4 & 4\\ 1 & 2 & 3 & 5 & 5\\ 2 & 4 & 0 & 4 & 7 \end{bmatrix} \longrightarrow \begin{bmatrix} \boxed{1} & 2 & 1 & 3 & 3\\ 0 & 0 & \boxed{-2} & -2 & -2\\ 0 & 0 & 2 & 2 & 2\\ 0 & 0 & -2 & -2 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} \boxed{1} & 2 & 1 & 3 & 3\\ 0 & 0 & \boxed{-2} & -2 & -2\\ 0 & 0 & 0 & 0 & \boxed{0}\\ 0 & 0 & 0 & 0 & 3 \end{bmatrix} \longrightarrow \begin{bmatrix} \boxed{1} & 2 & 1 & 3 & 3\\ 0 & 0 & \boxed{-2} & -2 & -2\\ 0 & 0 & 0 & 0 & \boxed{3}\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Notice that the final result of applying Gaussian elimination in the above example is not a purely triangular form but rather a jagged or "stair-step" type of triangular form. Hereafter, a matrix that exhibits this stair-step structure will be said to be in row echelon form.

Row Echelon Form

An $m \times n$ matrix $\mathbf{E}$ with rows $\mathbf{E}_{i*}$ and columns $\mathbf{E}_{*j}$ is said to be in row echelon form provided the following two conditions hold.

- If $\mathbf{E}_{i*}$ consists entirely of zeros, then all rows below $\mathbf{E}_{i*}$ are also entirely zero; i.e., all zero rows are at the bottom.
- If the first nonzero entry in $\mathbf{E}_{i*}$ lies in the jth position, then all entries below the ith position in columns $\mathbf{E}_{*1}, \mathbf{E}_{*2}, \ldots, \mathbf{E}_{*j}$ are zero.

These two conditions say that the nonzero entries in an echelon form must lie on or above a stair-step line that emanates from the upper-left-hand corner and slopes down and to the right. The pivots are the first nonzero entries in each row. A typical structure for a matrix in row echelon form is illustrated below with the pivots boxed:

$$\begin{bmatrix} \boxed{*} & * & * & * & * & * & * & *\\ 0 & 0 & \boxed{*} & * & * & * & * & *\\ 0 & 0 & 0 & \boxed{*} & * & * & * & *\\ 0 & 0 & 0 & 0 & 0 & 0 & \boxed{*} & *\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Because of the flexibility in choosing row operations to reduce a matrix $\mathbf{A}$ to a row echelon form $\mathbf{E}$, the entries in $\mathbf{E}$ are not uniquely determined by $\mathbf{A}$. Nevertheless, it can be proven that the "form" of $\mathbf{E}$ is unique in the sense that the positions of the pivots in $\mathbf{E}$ (and $\mathbf{A}$) are uniquely determined by the entries in $\mathbf{A}$.[9] Because the pivotal positions are unique, it follows that the number of pivots, which is the same as the number of nonzero rows in $\mathbf{E}$, is also uniquely determined by the entries in $\mathbf{A}$. This number is called the rank[10] of $\mathbf{A}$, and it is extremely important in the development of our subject.

[9] The fact that the pivotal positions are unique should be intuitively evident. If it isn't, take the matrix given in (2.1.1) and try to force some different pivotal positions by a different sequence of row operations.

[10] The word "rank" was introduced in 1879 by the German mathematician Ferdinand Georg Frobenius (p. 662), who thought of it as the size of the largest nonzero minor determinant in $\mathbf{A}$. But the concept had been used as early as 1851 by the English mathematician James J. Sylvester (1814–1897).
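A Python sketch of the modified elimination just described, returning an echelon form together with the pivotal positions (names, tolerance, and interface are illustrative):

```python
import numpy as np

def row_echelon(A, tol=1e-12):
    U = np.array(A, dtype=float)
    m, n = U.shape
    pivots, i = [], 0
    for j in range(n):                        # move from left to right
        rows = np.nonzero(np.abs(U[i:, j]) > tol)[0]
        if rows.size == 0:                    # no nonzero on or below row i
            continue
        p = i + rows[0]
        U[[i, p]] = U[[p, i]]                 # interchange for a nonzero pivot
        for r in range(i + 1, m):             # annihilate below the pivot
            U[r, j:] -= (U[r, j] / U[i, j]) * U[i, j:]
        pivots.append((i, j))
        i += 1
        if i == m:
            break
    return U, pivots

U, pivots = row_echelon([[1, 2, 1, 3, 3],
                         [2, 4, 0, 4, 4],
                         [1, 2, 3, 5, 5],
                         [2, 4, 0, 4, 7]])
print(U)                                      # the echelon form of Example 2.1.1
print("pivotal positions:", pivots)           # [(0, 0), (1, 2), (2, 4)]
```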
Rank of a Matrix

Suppose $\mathbf{A}_{m\times n}$ is reduced by row operations to an echelon form $\mathbf{E}$. The rank of $\mathbf{A}$ is defined to be the number

rank(A) = number of pivots = number of nonzero rows in E = number of basic columns in A,

where the basic columns of $\mathbf{A}$ are defined to be those columns in $\mathbf{A}$ that contain the pivotal positions.

Example 2.1.2

Problem: Determine the rank, and identify the basic columns in

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 1 & 1\\ 2 & 4 & 2 & 2\\ 3 & 6 & 3 & 4 \end{bmatrix}.$$

Solution: Reduce $\mathbf{A}$ to row echelon form as shown below (pivots boxed):

$$\begin{bmatrix} \boxed{1} & 2 & 1 & 1\\ 2 & 4 & 2 & 2\\ 3 & 6 & 3 & 4 \end{bmatrix} \longrightarrow \begin{bmatrix} \boxed{1} & 2 & 1 & 1\\ 0 & 0 & 0 & \boxed{0}\\ 0 & 0 & 0 & 1 \end{bmatrix} \longrightarrow \begin{bmatrix} \boxed{1} & 2 & 1 & 1\\ 0 & 0 & 0 & \boxed{1}\\ 0 & 0 & 0 & 0 \end{bmatrix} = \mathbf{E}.$$

Consequently, rank(A) = 2. The pivotal positions lie in the first and fourth columns so that the basic columns of $\mathbf{A}$ are $\mathbf{A}_{*1}$ and $\mathbf{A}_{*4}$. That is,

$$\text{Basic Columns} = \left\{ \begin{bmatrix} 1\\ 2\\ 3 \end{bmatrix}, \begin{bmatrix} 1\\ 2\\ 4 \end{bmatrix} \right\}.$$

Pay particular attention to the fact that the basic columns are extracted from $\mathbf{A}$ and not from the row echelon form $\mathbf{E}$.

2.2 REDUCED ROW ECHELON FORM

[…] Compare the results of this example with the results of Example 2.1.1, and notice that the "form" of the final matrix is the same in both examples, which indeed must be the case because of the uniqueness of "form" mentioned in the previous section. The only difference is in the numerical value of some of the entries. By the nature of Gauss–Jordan elimination, each pivot is 1 and all entries above and below each pivot are 0. Consequently, the row echelon form produced by the Gauss–Jordan method contains a reduced number of nonzero entries, so it seems only natural to refer to this as a reduced row echelon form.[11]

Reduced Row Echelon Form

A matrix $\mathbf{E}_{m\times n}$ is said to be in reduced row echelon form provided that the following three conditions hold.

- $\mathbf{E}$ is in row echelon form.
- The first nonzero entry in each row (i.e., each pivot) is 1.
- All entries above each pivot are 0.

A typical structure for a matrix in reduced row echelon form is illustrated below, where entries marked * can be either zero or nonzero numbers:

$$\begin{bmatrix} \boxed{1} & * & 0 & 0 & * & * & 0 & *\\ 0 & 0 & \boxed{1} & 0 & * & * & 0 & *\\ 0 & 0 & 0 & \boxed{1} & * & * & 0 & *\\ 0 & 0 & 0 & 0 & 0 & 0 & \boxed{1} & *\\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

As previously stated, if matrix $\mathbf{A}$ is transformed to a row echelon form by row operations, then the "form" is uniquely determined by $\mathbf{A}$, but the individual entries in the form are not unique. However, if $\mathbf{A}$ is transformed by row operations to a reduced row echelon form $\mathbf{E}_A$, then it can be shown[12] that both the "form" as well as the individual entries in $\mathbf{E}_A$ are uniquely determined by $\mathbf{A}$. In other words, the reduced row echelon form $\mathbf{E}_A$ produced from $\mathbf{A}$ is independent of whatever elimination scheme is used. Producing an unreduced form is computationally more efficient, but the uniqueness of $\mathbf{E}_A$ makes it more useful for theoretical purposes.

[11] In some of the older books this is called the Hermite normal form in honor of the French mathematician Charles Hermite (1822–1901), who, around 1851, investigated reducing matrices by row operations.

[12] A formal uniqueness proof must wait until Example 3.9.2, but you can make this intuitively clear right now with some experiments. Try to produce two different reduced row echelon forms from the same matrix.

EA Notation

For a matrix $\mathbf{A}$, the symbol $\mathbf{E}_A$ will hereafter denote the unique reduced row echelon form derived from $\mathbf{A}$ by means of row operations.

Example 2.2.2

Problem: Determine $\mathbf{E}_A$, deduce rank(A), and identify the basic columns of

$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 2 & 3 & 1\\ 2 & 4 & 4 & 6 & 2\\ 3 & 6 & 6 & 9 & 6\\ 1 & 2 & 4 & 5 & 3 \end{bmatrix}.$$
Example 2.2.2
Problem: Determine EA, deduce rank (A), and identify the basic columns of

        [ 1  2  2  3  1 ]
    A = [ 2  4  4  6  2 ]
        [ 3  6  6  9  6 ]
        [ 1  2  4  5  3 ]

Solution:

    [ ©1  2  2  3  1 ]        [ ©1  2   2  3  1 ]        [ ©1  2   2  3  1 ]
    [  2  4  4  6  2 ]   →    [  0  0  ©0  0  0 ]   →    [  0  0  ©2  2  2 ]
    [  3  6  6  9  6 ]        [  0  0   0  0  3 ]        [  0  0   0  0  3 ]
    [  1  2  4  5  3 ]        [  0  0   2  2  2 ]        [  0  0   0  0  0 ]

         [ ©1  2   2  3   1 ]        [ ©1  2   0  1  −1 ]
    →    [  0  0  ©1  1   1 ]   →    [  0  0  ©1  1   1 ]
         [  0  0   0  0   3 ]        [  0  0   0  0  ©3 ]
         [  0  0   0  0   0 ]        [  0  0   0  0   0 ]

         [ ©1  2   0  1  −1 ]        [ ©1  2   0  1   0 ]
    →    [  0  0  ©1  1   1 ]   →    [  0  0  ©1  1   0 ]
         [  0  0   0  0  ©1 ]        [  0  0   0  0  ©1 ]
         [  0  0   0  0   0 ]        [  0  0   0  0   0 ]

Therefore, rank (A) = 3, and {A∗1, A∗3, A∗5} are the three basic columns.

The above example illustrates another important feature of EA and explains why the basic columns are indeed "basic." Each nonbasic column is expressible as a combination of basic columns. In Example 2.2.2,

    A∗2 = 2A∗1   and   A∗4 = A∗1 + A∗3.    (2.2.1)

Notice that exactly the same set of relationships hold in EA. That is,

    E∗2 = 2E∗1   and   E∗4 = E∗1 + E∗3.    (2.2.2)

This is no coincidence—it's characteristic of what happens in general. There's more to observe. The relationships between the nonbasic and basic columns in a general matrix A are usually obscure, but the relationships among the columns in EA are absolutely transparent. For example, notice that the multipliers used in the relationships (2.2.1) and (2.2.2) appear explicitly in the two nonbasic columns in EA—they are just the nonzero entries in these nonbasic columns. This is important because it means that EA can be used as a "map" or "key" to discover or unlock the hidden relationships among the columns of A.

Finally, observe from Example 2.2.2 that only the basic columns to the left of a given nonbasic column are needed in order to express the nonbasic column as a combination of basic columns—e.g., representing A∗2 requires only A∗1 and not A∗3 or A∗5, while representing A∗4 requires only A∗1 and A∗3. This too is typical. For the time being, we accept the following statements to be true. A rigorous proof is given later on p. 136.

Column Relationships in A and EA
• Each nonbasic column E∗k in EA is a combination (a sum of multiples) of the basic columns in EA to the left of E∗k. That is,

    E∗k = µ1E∗b1 + µ2E∗b2 + · · · + µjE∗bj

             [1]      [0]              [0]     [µ1]
             [0]      [1]              [0]     [µ2]
        = µ1 [⋮] + µ2 [⋮] + · · · + µj [⋮]  =  [ ⋮]
             [0]      [0]              [1]     [µj]
             [⋮]      [⋮]              [⋮]     [ ⋮]
             [0]      [0]              [0]     [ 0]

  where the E∗bi's are the basic columns to the left of E∗k and where the multipliers µi are the first j entries in E∗k.
• The relationships that exist among the columns of A are exactly the same as the relationships that exist among the columns of EA. In particular, if A∗k is a nonbasic column in A, then

    A∗k = µ1A∗b1 + µ2A∗b2 + · · · + µjA∗bj,    (2.2.3)

  where the A∗bi's are the basic columns to the left of A∗k, and where the multipliers µi are as described above—the first j entries in E∗k.
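These relationships are easy to watch in action. In the sketch below (mine, not the book's), the entries of each nonbasic column of EA are used as the multipliers µi on the basic columns of A itself, reproducing (2.2.3) for the matrix of Example 2.2.2.

    # Sketch (not from the text): use E_A as a "key" to express each
    # nonbasic column of A in terms of the basic columns of A, per (2.2.3).
    from sympy import Matrix, zeros

    A = Matrix([[1, 2, 2, 3, 1],
                [2, 4, 4, 6, 2],
                [3, 6, 6, 9, 6],
                [1, 2, 4, 5, 3]])
    EA, basic = A.rref()              # basic == (0, 2, 4), zero-based

    for k in range(A.cols):
        if k in basic:
            continue
        # multipliers are the entries of E_A in column k at the pivot rows
        combo = sum((EA[i, k] * A[:, b] for i, b in enumerate(basic)),
                    zeros(A.rows, 1))
        print(k, combo == A[:, k])    # prints: 1 True  and  3 True

Columns 1 and 3 (i.e., A∗2 and A∗4 in the book's one-based notation) check out exactly as in (2.2.1).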
2.3 CONSISTENCY OF LINEAR SYSTEMS

A system of m linear equations in n unknowns is said to be a consistent system if it possesses at least one solution. If there are no solutions, then the system is called inconsistent. The purpose of this section is to determine conditions under which a given system will be consistent.

Stating conditions for consistency of systems involving only two or three unknowns is easy. A linear equation in two unknowns represents a line in 2-space, and a linear equation in three unknowns is a plane in 3-space. Consequently, a linear system of m equations in two unknowns is consistent if and only if the m lines defined by the m equations have at least one common point of intersection. Similarly, a system of m equations in three unknowns is consistent if and only if the associated m planes have at least one common point of intersection. However, when m is large, these geometric conditions may not be easy to verify visually, and when n > 3, the generalizations of intersecting lines or planes are impossible to visualize with the eye.

Rather than depending on geometry to establish consistency, we use Gaussian elimination. If the associated augmented matrix [A|b] is reduced by row operations to a matrix [E|c] that is in row echelon form, then consistency—or lack of it—becomes evident. Suppose that somewhere in the process of reducing [A|b] to [E|c] a situation arises in which the only nonzero entry in a row appears on the right-hand side (last column), as illustrated below:

                 [ ∗  ∗  ∗  ∗  ∗  ∗  ∗ ]
                 [ 0  0  0  ∗  ∗  ∗  ∗ ]
                 [ 0  0  0  0  ∗  ∗  ∗ ]
    Row i  →     [ 0  0  0  0  0  0  α ]   ←  α ≠ 0
                 [ •  •  •  •  •  •  • ]
                 [ •  •  •  •  •  •  • ]

If this occurs in the ith row, then the ith equation of the associated system is

    0x1 + 0x2 + · · · + 0xn = α.

For α ≠ 0, this equation has no solution, and hence the original system must also be inconsistent (because row operations don't alter the solution set). The converse also holds. That is, if a system is inconsistent, then somewhere in the elimination process a row of the form

    ( 0  0  · · ·  0 | α ),   α ≠ 0    (2.3.1)

must appear. Otherwise, the back substitution process can be completed and a solution is produced. There is no inconsistency indicated when a row of the form (0 0 · · · 0 | 0) is encountered. This simply says that 0 = 0, and although this is no help in determining the value of any unknown, it is nevertheless a true statement, so it doesn't indicate inconsistency in the system.

There are some other ways to characterize the consistency (or inconsistency) of a system. One of these is to observe that if the last column b in the augmented matrix [A|b] is a nonbasic column, then no pivot can exist in the last column, and hence the system is consistent because the situation (2.3.1) cannot occur. Conversely, if the system is consistent, then the situation (2.3.1) never occurs during Gaussian elimination and consequently the last column cannot be basic. In other words, [A|b] is consistent if and only if b is a nonbasic column.

Saying that b is a nonbasic column in [A|b] is equivalent to saying that all basic columns in [A|b] lie in the coefficient matrix A. Since the number of basic columns in a matrix is the rank, consistency may also be characterized by stating that a system is consistent if and only if rank[A|b] = rank (A).

Recall from the previous section the fact that each nonbasic column in [A|b] must be expressible in terms of the basic columns. Because a consistent system is characterized by the fact that the right-hand side b is a nonbasic column, it follows that a system is consistent if and only if the right-hand side b is a combination of columns from the coefficient matrix A.

Each of the equivalent¹³ ways of saying that a system is consistent is summarized below.

Consistency
Each of the following is equivalent to saying that [A|b] is consistent.
• In row reducing [A|b], a row of the following form never appears:
      ( 0  0  · · ·  0 | α ),  where α ≠ 0.    (2.3.2)
• b is a nonbasic column in [A|b].    (2.3.3)
• rank[A|b] = rank (A).    (2.3.4)
• b is a combination of the basic columns in A.    (2.3.5)

¹³ Statements P and Q are said to be equivalent when (P implies Q) as well as its converse (Q implies P) are true statements. This is also the meaning of the phrase "P if and only if Q."
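Characterization (2.3.4) is the easiest one to compute with. The following sketch (mine, not the book's) tests consistency by comparing rank[A|b] against rank (A); the matrix used is the one from Example 2.3.1 just below.

    # Sketch (not from the text): test consistency via (2.3.4),
    # rank[A|b] = rank(A), for the system of the next example.
    from sympy import Matrix

    A = Matrix([[1, 1, 2, 2, 1],
                [2, 2, 4, 4, 3],
                [2, 2, 4, 4, 2],
                [3, 5, 8, 6, 5]])
    b = Matrix([1, 1, 2, 3])

    consistent = A.rank() == A.row_join(b).rank()
    print(consistent)        # True, in agreement with Example 2.3.1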
Example 2.3.1
Problem: Determine if the following system is consistent:

     x1 +  x2 + 2x3 + 2x4 +  x5 = 1,
    2x1 + 2x2 + 4x3 + 4x4 + 3x5 = 1,
    2x1 + 2x2 + 4x3 + 4x4 + 2x5 = 2,
    3x1 + 5x2 + 8x3 + 6x4 + 5x5 = 3.

Solution: Apply Gaussian elimination to the augmented matrix [A|b] as shown:

    [ ©1  1  2  2  1 | 1 ]        [ ©1   1  2  2  1 |  1 ]        [ ©1   1  2  2   1 |  1 ]
    [  2  2  4  4  3 | 1 ]   →    [  0  ©0  0  0  1 | −1 ]   →    [  0  ©2  2  0   2 |  0 ]
    [  2  2  4  4  2 | 2 ]        [  0   0  0  0  0 |  0 ]        [  0   0  0  0  ©1 | −1 ]
    [  3  5  8  6  5 | 3 ]        [  0   2  2  0  2 |  0 ]        [  0   0  0  0   0 |  0 ]

Because a row of the form ( 0 0 · · · 0 | α ) with α ≠ 0 never emerges, the system is consistent. We might also observe that b is a nonbasic column in [A|b] so that rank[A|b] = rank (A). Finally, by completely reducing A to EA, it is possible to verify that b is indeed a combination of the basic columns {A∗1, A∗2, A∗5}.

Exercises for section 2.3

2.3.1. Determine which of the following systems are consistent.
    (a) x + 2y + z = 2,           (b) 2x + 2y + 4z = 0,
        2x + 4y = 2,                  3x + 2y + 5z = 0,
        3x + 6y + z = 4.              4x + 2y + 6z = 0.
    (c) x − y + z = 1,            (d) x − y + z = 1,
        x − y − z = 2,                x − y − z = 2,
        x + y − z = 3,                x + y − z = 3,
        x + y + z = 4.                x + y + z = 2.
    (e) 2w + x + 3y + 5z = 1,     (f) 2w + x + 3y + 5z = 7,
        4w + 4y + 8z = 0,             4w + 4y + 8z = 8,
        w + x + 2y + 3z = 0,          w + x + 2y + 3z = 5,
        x + y + z = 0.                x + y + z = 3.

2.3.2. Construct a 3×4 matrix A and 3×1 columns b and c such that [A|b] is the augmented matrix for an inconsistent system, but [A|c] is the augmented matrix for a consistent system.

2.3.3. If A is an m×n matrix with rank (A) = m, explain why the system [A|b] must be consistent for every right-hand side b.

Since there are four unknowns but only two equations in this reduced system, it is impossible to extract a unique solution for each unknown. The best we can do is to pick two "basic" unknowns—which will be called the basic variables—and solve for these in terms of the other two unknowns, whose values must remain arbitrary or "free," and consequently they will be referred to as the free variables. Although there are several possibilities for selecting a set of basic variables, the convention is to always solve for the unknowns corresponding to the pivotal positions—or, equivalently, the unknowns corresponding to the basic columns. In this example, the pivots (as well as the basic columns) lie in the first and third positions, so the strategy is to apply back substitution to solve the reduced system (2.4.2) for the basic variables x1 and x3 in terms of the free variables x2 and x4.

The second equation in (2.4.2) yields

    x3 = −x4,

and substitution back into the first equation produces

    x1 = −2x2 − 2x3 − 3x4
       = −2x2 − 2(−x4) − 3x4
       = −2x2 − x4.

Therefore, all solutions of the original homogeneous system can be described by saying

    x1 = −2x2 − x4,
    x2 is "free,"
    x3 = −x4,
    x4 is "free."    (2.4.3)

As the free variables x2 and x4 range over all possible values, the above expressions describe all possible solutions. For example, when x2 and x4 assume the values x2 = 1 and x4 = −2, then the particular solution

    x1 = 0,  x2 = 1,  x3 = 2,  x4 = −2

is produced. When x2 = π and x4 = √2, then another particular solution

    x1 = −2π − √2,  x2 = π,  x3 = −√2,  x4 = √2

is generated.
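Since (2.4.3) is just a recipe, it mechanizes directly. The sketch below (mine, not the book's) plugs values of the free variables x2 and x4 into (2.4.3) and confirms that each resulting particular solution satisfies the homogeneous system; the coefficient matrix is the one identified with (2.4.1) later in this section.

    # Sketch (not from the text): sample particular solutions from the
    # description (2.4.3) and check them against the homogeneous system.
    import numpy as np

    A = np.array([[1, 2, 2, 3],
                  [2, 4, 1, 3],
                  [3, 6, 1, 4]], dtype=float)

    def solution(x2, x4):
        """Build a solution vector from the free variables via (2.4.3)."""
        return np.array([-2*x2 - x4, x2, -x4, x4])

    for x2, x4 in [(1, -2), (np.pi, np.sqrt(2)), (0, 0)]:
        x = solution(x2, x4)
        print(np.allclose(A @ x, 0))   # True each time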
Rather than describing the solution set as illustrated in (2.4.3), future developments will make it more convenient to express the solution set by writing

    [ x1 ]   [ −2x2 − x4 ]        [ −2 ]        [ −1 ]
    [ x2 ] = [     x2    ]  = x2  [  1 ]  + x4  [  0 ]    (2.4.4)
    [ x3 ]   [    −x4    ]        [  0 ]        [ −1 ]
    [ x4 ]   [     x4    ]        [  0 ]        [  1 ]

with the understanding that x2 and x4 are free variables that can range over all possible numbers. This representation will be called the general solution of the homogeneous system. This expression for the general solution emphasizes that every solution is some combination of the two particular solutions

         [ −2 ]              [ −1 ]
    h1 = [  1 ]   and   h2 = [  0 ]
         [  0 ]              [ −1 ]
         [  0 ]              [  1 ]

The fact that h1 and h2 are each solutions is clear because h1 is produced when the free variables assume the values x2 = 1 and x4 = 0, whereas the solution h2 is generated when x2 = 0 and x4 = 1.

Now consider a general homogeneous system [A|0] of m linear equations in n unknowns. If the coefficient matrix is such that rank (A) = r, then it should be apparent from the preceding discussion that there will be exactly r basic variables—corresponding to the positions of the basic columns in A—and exactly n − r free variables—corresponding to the positions of the nonbasic columns in A. Reducing A to a row echelon form using Gaussian elimination and then using back substitution to solve for the basic variables in terms of the free variables produces the general solution, which has the form

    x = xf1 h1 + xf2 h2 + · · · + xfn−r hn−r,    (2.4.5)

where xf1 , xf2 , . . . , xfn−r are the free variables and where h1, h2, . . . , hn−r are n×1 columns that represent particular solutions of the system. As the free variables xfi range over all possible values, the general solution generates all possible solutions.

The general solution does not depend on which row echelon form is used in the sense that using back substitution to solve for the basic variables in terms of the nonbasic variables generates a unique set of particular solutions {h1, h2, . . . , hn−r}, regardless of which row echelon form is used. Without going into great detail, one can argue that this is true because using back substitution in any row echelon form to solve for the basic variables must produce exactly the same result as that obtained by completely reducing A to EA and then solving the reduced homogeneous system for the basic variables. Uniqueness of EA guarantees the uniqueness of the hi's. For example, if the coefficient matrix A associated with the system (2.4.1) is completely reduced by the Gauss–Jordan procedure to EA

        [ 1  2  2  3 ]        [ 1  2  0  1 ]
    A = [ 2  4  1  3 ]   →    [ 0  0  1  1 ]  =  EA,
        [ 3  6  1  4 ]        [ 0  0  0  0 ]

then we obtain the following reduced system:

    x1 + 2x2 + x4 = 0,
          x3 + x4 = 0.

Solving for the basic variables x1 and x3 in terms of x2 and x4 produces exactly the same result as given in (2.4.3) and hence generates exactly the same general solution as shown in (2.4.4).

Because it avoids the back substitution process, you may find it more convenient to use the Gauss–Jordan procedure to reduce A completely to EA and then construct the general solution directly from the entries in EA. This approach usually will be adopted in the examples and exercises.
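The recipe "reduce to EA, then read off one hi per free variable" can be coded once and reused. The following sketch (mine, not the book's) does exactly that; SymPy's built-in Matrix.nullspace() performs the same construction and serves as a cross-check.

    # Sketch (not from the text): build the columns h_i of the general
    # solution (2.4.5) directly from the entries of E_A.
    from sympy import Matrix, zeros

    def general_solution_columns(A):
        EA, basic = A.rref()
        free = [j for j in range(A.cols) if j not in basic]
        hs = []
        for f in free:
            h = zeros(A.cols, 1)
            h[f] = 1                        # the free variable itself
            for i, b in enumerate(basic):   # basic variables read from E_A
                h[b] = -EA[i, f]
            hs.append(h)
        return hs

    A = Matrix([[1, 2, 2, 3],
                [2, 4, 1, 3],
                [3, 6, 1, 4]])
    print(general_solution_columns(A))  # columns (-2,1,0,0) and (-1,0,-1,1)
    print(A.nullspace())                # same columns, as a cross-check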
As was previously observed, all homogeneous systems are consistent because the trivial solution consisting of all zeros is always one solution. The natural question is, "When is the trivial solution the only solution?" In other words, we wish to know when a homogeneous system possesses a unique solution. The form of the general solution (2.4.5) makes the answer transparent. As long as there is at least one free variable, then it is clear from (2.4.5) that there will be an infinite number of solutions. Consequently, the trivial solution is the only solution if and only if there are no free variables. Because the number of free variables is given by n − r, where r = rank (A), the previous statement can be reformulated to say that a homogeneous system possesses a unique solution—the trivial solution—if and only if rank (A) = n.

Example 2.4.1
The homogeneous system

     x1 + 2x2 + 2x3 = 0,
    2x1 + 5x2 + 7x3 = 0,
    3x1 + 6x2 + 8x3 = 0,

has only the trivial solution because

        [ 1  2  2 ]        [ 1  2  2 ]
    A = [ 2  5  7 ]   →    [ 0  1  3 ]  =  E
        [ 3  6  8 ]        [ 0  0  2 ]

shows that rank (A) = n = 3. Indeed, it is also obvious from E that applying back substitution in the system [E|0] yields only the trivial solution.

Example 2.4.2
Problem: Explain why the following homogeneous system has infinitely many solutions, and exhibit the general solution:

     x1 + 2x2 + 2x3 = 0,
    2x1 + 5x2 + 7x3 = 0,
    3x1 + 6x2 + 6x3 = 0.

Exercises for section 2.4

2.4.5. Suppose that A is the coefficient matrix for a homogeneous system of four equations in six unknowns and suppose that A has at least one nonzero row.
    (a) Determine the fewest number of free variables that are possible.
    (b) Determine the maximum number of free variables that are possible.

2.4.6. Explain why a homogeneous system of m equations in n unknowns where m < n must always possess an infinite number of solutions.

2.4.7. Construct a homogeneous system of three equations in four unknowns that has

         [ −2 ]        [ −3 ]
    x2   [  1 ]  + x4  [  0 ]
         [  0 ]        [  2 ]
         [  0 ]        [  1 ]

as its general solution.

2.4.8. If c1 and c2 are columns that represent two particular solutions of the same homogeneous system, explain why the sum c1 + c2 must also represent a solution of this system.

2.5 NONHOMOGENEOUS SYSTEMS

Recall that a system of m linear equations in n unknowns

    a11x1 + a12x2 + · · · + a1nxn = b1,
    a21x1 + a22x2 + · · · + a2nxn = b2,
    ...
    am1x1 + am2x2 + · · · + amnxn = bm,

is said to be nonhomogeneous whenever bi ≠ 0 for at least one i. Unlike homogeneous systems, a nonhomogeneous system may be inconsistent and the techniques of §2.3 must be applied in order to determine if solutions do indeed exist. Unless otherwise stated, it is assumed that all systems in this section are consistent.

To describe the set of all possible solutions of a consistent nonhomogeneous system, construct a general solution by exactly the same method used for homogeneous systems as follows.

• Use Gaussian elimination to reduce the associated augmented matrix [A|b] to a row echelon form [E|c].
• Identify the basic variables and the free variables in the same manner described in §2.4.
• Apply back substitution to [E|c] and solve for the basic variables in terms of the free variables.
• Write the result in the form

    x = p + xf1 h1 + xf2 h2 + · · · + xfn−r hn−r,    (2.5.1)

  where xf1 , xf2 , . . . , xfn−r are the free variables and p, h1, h2, . . . , hn−r are n×1 columns. This is the general solution of the nonhomogeneous system.

As the free variables xfi range over all possible values, the general solution (2.5.1) generates all possible solutions of the system [A|b]. Just as in the homogeneous case, the columns hi and p are independent of which row echelon form [E|c] is used. Therefore, [A|b] may be completely reduced to E[A|b] by using the Gauss–Jordan method thereby avoiding the need to perform back substitution. We will use this approach whenever it is convenient.
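In code, the particular solution p falls straight out of E[A|b] by setting every free variable to zero. A sketch (mine, not the book's), using the system (2.5.2) that is analyzed next:

    # Sketch (not from the text): extract the particular solution p of a
    # consistent nonhomogeneous system from E_[A|b], as in (2.5.1).
    from sympy import Matrix, zeros

    A = Matrix([[1, 2, 2, 3],
                [2, 4, 1, 3],
                [3, 6, 1, 4]])
    b = Matrix([4, 5, 7])

    Eab, basic = A.row_join(b).rref()
    assert A.cols not in basic           # last column nonbasic: consistent

    p = zeros(A.cols, 1)
    for i, bc in enumerate(basic):       # basic variables from the c column
        p[bc] = Eab[i, A.cols]
    print(p.T)                           # Matrix([[2, 0, 1, 0]]); cf. (2.5.3)
    print(A * p == b)                    # True: p really is a solution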
The difference between the general solution of a nonhomogeneous system and the general solution of a homogeneous system is the column p that appears in (2.5.1). To understand why p appears and where it comes from, consider the nonhomogeneous system

     x1 + 2x2 + 2x3 + 3x4 = 4,
    2x1 + 4x2 +  x3 + 3x4 = 5,
    3x1 + 6x2 +  x3 + 4x4 = 7,    (2.5.2)

in which the coefficient matrix is the same as the coefficient matrix for the homogeneous system (2.4.1) used in the previous section. If [A|b] is completely reduced by the Gauss–Jordan procedure to E[A|b]

            [ 1  2  2  3 | 4 ]        [ 1  2  0  1 | 2 ]
    [A|b] = [ 2  4  1  3 | 5 ]   →    [ 0  0  1  1 | 1 ]  =  E[A|b],
            [ 3  6  1  4 | 7 ]        [ 0  0  0  0 | 0 ]

then the following reduced system is obtained:

    x1 + 2x2 + x4 = 2,
          x3 + x4 = 1.

Solving for the basic variables, x1 and x3, in terms of the free variables, x2 and x4, produces

    x1 = 2 − 2x2 − x4,
    x2 is "free,"
    x3 = 1 − x4,
    x4 is "free."

The general solution is obtained by writing these statements in the form

    [ x1 ]   [ 2 − 2x2 − x4 ]   [ 2 ]        [ −2 ]        [ −1 ]
    [ x2 ] = [      x2      ] = [ 0 ]  + x2  [  1 ]  + x4  [  0 ]    (2.5.3)
    [ x3 ]   [    1 − x4    ]   [ 1 ]        [  0 ]        [ −1 ]
    [ x4 ]   [      x4      ]   [ 0 ]        [  0 ]        [  1 ]

As the free variables x2 and x4 range over all possible numbers, this generates all possible solutions of the nonhomogeneous system (2.5.2). Notice that the column

        [ 2 ]
    p = [ 0 ]
        [ 1 ]
        [ 0 ]

in (2.5.3) is a particular solution of the nonhomogeneous system (2.5.2)—it is the solution produced when the free variables assume the values x2 = 0 and x4 = 0.

Observe that the system is indeed consistent because the last column is nonbasic. Solve the reduced system for the basic variables x1, x2, and x5 in terms of the free variables x3 and x4 to obtain

    x1 = 1 − x3 − 2x4,
    x2 = 1 − x3,
    x3 is "free,"
    x4 is "free,"
    x5 = −1.

The general solution to the nonhomogeneous system is

        [ x1 ]   [ 1 − x3 − 2x4 ]   [  1 ]        [ −1 ]        [ −2 ]
        [ x2 ]   [    1 − x3    ]   [  1 ]        [ −1 ]        [  0 ]
    x = [ x3 ] = [      x3      ] = [  0 ]  + x3  [  1 ]  + x4  [  0 ]
        [ x4 ]   [      x4      ]   [  0 ]        [  0 ]        [  1 ]
        [ x5 ]   [      −1      ]   [ −1 ]        [  0 ]        [  0 ]

The general solution of the associated homogeneous system is

        [ x1 ]   [ −x3 − 2x4 ]        [ −1 ]        [ −2 ]
        [ x2 ]   [    −x3    ]        [ −1 ]        [  0 ]
    x = [ x3 ] = [     x3    ] = x3   [  1 ]  + x4  [  0 ]
        [ x4 ]   [     x4    ]        [  0 ]        [  1 ]
        [ x5 ]   [     0     ]        [  0 ]        [  0 ]

You should verify for yourself that

        [  1 ]
        [  1 ]
    p = [  0 ]
        [  0 ]
        [ −1 ]

is indeed a particular solution to the nonhomogeneous system and that

         [ −1 ]               [ −2 ]
         [ −1 ]               [  0 ]
    h3 = [  1 ]    and   h4 = [  0 ]
         [  0 ]               [  1 ]
         [  0 ]               [  0 ]

are particular solutions to the associated homogeneous system.
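That verification is a one-line computation per vector. In the sketch below (mine, not the book's), the system used is the one from Example 2.3.1, which the surviving text of this example appears to continue; the checks themselves confirm the fit.

    # Sketch (not from the text): verify p and the h_i numerically
    # against the system of Example 2.3.1.
    import numpy as np

    A = np.array([[1, 1, 2, 2, 1],
                  [2, 2, 4, 4, 3],
                  [2, 2, 4, 4, 2],
                  [3, 5, 8, 6, 5]], dtype=float)
    b = np.array([1, 1, 2, 3], dtype=float)

    p  = np.array([1, 1, 0, 0, -1], dtype=float)
    h3 = np.array([-1, -1, 1, 0, 0], dtype=float)
    h4 = np.array([-2, 0, 0, 1, 0], dtype=float)

    print(np.allclose(A @ p, b))                 # True: p solves [A|b]
    print(np.allclose(A @ h3, 0),
          np.allclose(A @ h4, 0))                # True True: h's solve [A|0]
    print(np.allclose(A @ (p + 2*h3 - h4), b))   # p plus any combination works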
Now turn to the question, "When does a consistent system have a unique solution?" It is known from (2.5.7) that the general solution of a consistent m×n nonhomogeneous system [A|b] with rank (A) = r is given by

    x = p + xf1 h1 + xf2 h2 + · · · + xfn−r hn−r,

where xf1 h1 + xf2 h2 + · · · + xfn−r hn−r is the general solution of the associated homogeneous system. Consequently, it is evident that the nonhomogeneous system [A|b] will have a unique solution (namely, p) if and only if there are no free variables—i.e., if and only if r = n (= number of unknowns)—this is equivalent to saying that the associated homogeneous system [A|0] has only the trivial solution.

Example 2.5.2
Consider the following nonhomogeneous system:

    2x1 + 4x2 + 6x3 =  2,
     x1 + 2x2 + 3x3 =  1,
     x1       +  x3 = −3,
    2x1 + 4x2       =  8.

Reducing [A|b] to E[A|b] yields

            [ 2  4  6 |  2 ]        [ 1  0  0 | −2 ]
    [A|b] = [ 1  2  3 |  1 ]   →    [ 0  1  0 |  3 ]  =  E[A|b].
            [ 1  0  1 | −3 ]        [ 0  0  1 | −1 ]
            [ 2  4  0 |  8 ]        [ 0  0  0 |  0 ]

The system is consistent because the last column is nonbasic. There are several ways to see that the system has a unique solution. Notice that

    rank (A) = 3 = number of unknowns,

which is the same as observing that there are no free variables. Furthermore, the associated homogeneous system clearly has only the trivial solution. Finally, because we completely reduced [A|b] to E[A|b], it is obvious that there is only one solution possible and that it is given by

        [ −2 ]
    p = [  3 ]
        [ −1 ]

Summary
Let [A|b] be the augmented matrix for a consistent m×n nonhomogeneous system in which rank (A) = r.
• Reducing [A|b] to a row echelon form using Gaussian elimination and then solving for the basic variables in terms of the free variables leads to the general solution

      x = p + xf1 h1 + xf2 h2 + · · · + xfn−r hn−r.

  As the free variables xfi range over all possible values, this general solution generates all possible solutions of the system.
• Column p is a particular solution of the nonhomogeneous system.
• The expression xf1 h1 + xf2 h2 + · · · + xfn−r hn−r is the general solution of the associated homogeneous system.
• Column p as well as the columns hi are independent of the row echelon form to which [A|b] is reduced.
• The system possesses a unique solution if and only if any of the following is true.
     rank (A) = n = number of unknowns.
     There are no free variables.
     The associated homogeneous system possesses only the trivial solution.

Exercises for section 2.5

2.5.1. Determine the general solution for each of the following nonhomogeneous systems.
    (a) x1 + 2x2 + x3 + 2x4 = 3,     (b) 2x + y + z = 4,
        2x1 + 4x2 + x3 + 3x4 = 4,        4x + 2y + z = 6,
        3x1 + 6x2 + x3 + 4x4 = 5.        6x + 3y + z = 8,
                                         8x + 4y + z = 10.
    (c) x1 + x2 + 2x3 = 1,           (d) 2x + y + z = 2,
        3x1 + 3x3 + 3x4 = 6,             4x + 2y + z = 5,
        2x1 + x2 + 3x3 + x4 = 3,         6x + 3y + z = 8,
        x1 + 2x2 + 3x3 − x4 = 0.         8x + 5y + z = 8.

2.6 ELECTRICAL CIRCUITS

The theory of electrical circuits is an important application that naturally gives rise to rectangular systems of linear equations. Because the underlying mathematics depends on several of the concepts discussed in the preceding sections, you may find it interesting and worthwhile to make a small excursion into the elementary mathematical analysis of electrical circuits. However, the continuity of the text is not compromised by omitting this section.

In a direct current circuit containing resistances and sources of electromotive force (abbreviated EMF) such as batteries, a point at which three or more conductors are joined is called a node or branch point of the circuit, and a closed conduction path is called a loop. Any part of a circuit between two adjoining nodes is called a branch of the circuit. The circuit shown in Figure 2.6.1 is a typical example that contains four nodes, seven loops, and six branches.

[Figure 2.6.1: a circuit with four nodes (1, 2, 3, 4), six branches carrying currents I1 through I6 across resistances R1 through R6 and EMF sources E1 through E4, and three simple loops labeled A, B, and C.]

The problem is to relate the currents Ik in each branch to the resistances Rk and the EMFs Ek.¹⁵ This is accomplished by using Ohm's law in conjunction with Kirchhoff's rules to produce a system of linear equations.

Ohm's Law
Ohm's law states that for a current of I amps, the voltage drop (in volts) across a resistance of R ohms is given by V = IR.

Kirchhoff's rules—formally stated below—are the two fundamental laws that govern the study of electrical circuits.
¹⁵ For an EMF source of magnitude E and a current I, there is always a small internal resistance in the source, and the voltage drop across it is V = E − I × (internal resistance). But internal source resistance is usually negligible, so the voltage drop across the source can be taken as V = E. When internal resistance cannot be ignored, its effects may be incorporated into existing external resistances, or it can be treated as a separate external resistance.

Kirchhoff's Rules
NODE RULE: The algebraic sum of currents toward each node is zero. That is, the total incoming current must equal the total outgoing current. This is simply a statement of conservation of charge.
LOOP RULE: The algebraic sum of the EMFs around each loop must equal the algebraic sum of the IR products in the same loop. That is, assuming internal source resistances have been accounted for, the algebraic sum of the voltage drops over the sources equals the algebraic sum of the voltage drops over the resistances in each loop. This is a statement of conservation of energy.

Kirchhoff's rules may be used without knowing the directions of the currents and EMFs in advance. You may arbitrarily assign directions. If negative values emerge in the final solution, then the actual direction is opposite to that assumed. To apply the node rule, consider a current to be positive if its direction is toward the node—otherwise, consider the current to be negative. It should be clear that the node rule will always generate a homogeneous system. For example, applying the node rule to the circuit in Figure 2.6.1 yields four homogeneous equations in six unknowns—the unknowns are the Ik's:

    Node 1:   I1 − I2 − I5 = 0,
    Node 2:  −I1 − I3 + I4 = 0,
    Node 3:   I3 + I5 + I6 = 0,
    Node 4:   I2 − I4 − I6 = 0.

To apply the loop rule, some direction (clockwise or counterclockwise) must be chosen as the positive direction, and all EMFs and currents in that direction are considered positive and those in the opposite direction are negative. It is possible for a current to be considered positive for the node rule but considered negative when it is used in the loop rule. If the positive direction is considered to be clockwise in each case, then applying the loop rule to the three indicated loops A, B, and C in the circuit shown in Figure 2.6.1 produces the three nonhomogeneous equations in six unknowns—the Ik's are treated as the unknowns, while the Rk's and Ek's are assumed to be known:

    Loop A:   I1R1 − I3R3 + I5R5 = E1 − E3,
    Loop B:   I2R2 − I5R5 + I6R6 = E2,
    Loop C:   I3R3 + I4R4 − I6R6 = E3 + E4.

There are 4 additional loops that also produce loop equations thereby making a total of 11 equations (4 nodal equations and 7 loop equations) in 6 unknowns. Although this appears to be a rather general 11×6 system of equations, it really is not. If the circuit is in a state of equilibrium, then the physics of the situation dictates that for each set of EMFs Ek, the corresponding currents Ik must be uniquely determined. In other words, physics guarantees that the 11×6 system produced by applying the two Kirchhoff rules must be consistent and possess a unique solution. Suppose that [A|b] represents the augmented matrix for the 11×6 system generated by Kirchhoff's rules. From the results in §2.5, we know that the system has a unique solution if and only if rank (A) = number of unknowns = 6.
Furthermore, it was demonstrated in §2.3 that the system is consistent if and only if rank[A|b] = rank (A). Combining these two facts allows us to conclude that rank[A|b] = 6 so that when [A|b] is reduced to E[A|b], there will be exactly 6 nonzero rows and 5 zero rows. Therefore, 5 of the original 11 equations are redundant in the sense that they can be "zeroed out" by forming combinations of some particular set of 6 "independent" equations. It is desirable to know beforehand which of the 11 equations will be redundant and which can act as the "independent" set.

Notice that in using the node rule, the equation corresponding to node 4 is simply the negative sum of the equations for nodes 1, 2, and 3, and that the first three equations are independent in the sense that no one of the three can be written as a combination of any other two. This situation is typical. For a general circuit with n nodes, it can be demonstrated that the equations for the first n − 1 nodes are independent, and the equation for the last node is redundant.

The loop rule also can generate redundant equations. Only simple loops—loops not containing smaller loops—give rise to independent equations. For example, consider the loop consisting of the three exterior branches in the circuit shown in Figure 2.6.1. Applying the loop rule to this large loop will produce no new information because the large loop can be constructed by "adding" the three simple loops A, B, and C contained within. The equation associated with the large outside loop is

    I1R1 + I2R2 + I4R4 = E1 + E2 + E4,

which is precisely the sum of the equations that correspond to the three component loops A, B, and C. This phenomenon will hold in general so that only the simple loops need to be considered when using the loop rule.
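The entire analysis above can be replayed numerically. In the sketch below (mine, not the book's), the four node equations and the three simple-loop equations are stacked for sample component values; the resistances and EMFs are invented for illustration only. The rank comes out to 6, node 4's row is redundant, and dropping it leaves 6 independent equations that determine the six currents.

    # Sketch (not from the text): Kirchhoff equations for the circuit of
    # Figure 2.6.1.  R1..R6 and E1..E4 are illustrative sample values.
    import numpy as np

    R1, R2, R3, R4, R5, R6 = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0   # ohms, assumed
    E1, E2, E3, E4 = 12.0, 6.0, 9.0, 3.0                    # volts, assumed

    # rows: nodes 1-4, then simple loops A, B, C; columns: I1..I6
    A = np.array([
        [ 1, -1,   0,  0,  -1,   0],   # node 1
        [-1,  0,  -1,  1,   0,   0],   # node 2
        [ 0,  0,   1,  0,   1,   1],   # node 3
        [ 0,  1,   0, -1,   0,  -1],   # node 4: -(sum of rows 1-3)
        [R1,  0, -R3,  0,  R5,   0],   # loop A
        [ 0, R2,   0,  0, -R5,  R6],   # loop B
        [ 0,  0,  R3, R4,   0, -R6],   # loop C
    ])
    b = np.array([0, 0, 0, 0, E1 - E3, E2, E3 + E4])

    print(np.linalg.matrix_rank(A))               # 6 = number of unknowns
    keep = [0, 1, 2, 4, 5, 6]                     # drop redundant node 4
    I = np.linalg.solve(A[keep], b[keep])
    print(np.allclose(A @ I, b))                  # full system satisfied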
CHAPTER 3
Matrix Algebra

3.1 FROM ANCIENT CHINA TO ARTHUR CAYLEY

The ancient Chinese appreciated the advantages of array manipulation in dealing with systems of linear equations, and they possessed the seed that might have germinated into a genuine theory of matrices. Unfortunately, in the year 213 B.C., emperor Shih Hoang-ti ordered that "all books be burned and all scholars be buried." It is presumed that the emperor wanted all knowledge and written records to begin with him and his regime. The edict was carried out, and it will never be known how much knowledge was lost. The book Chiu-chang Suan-shu (Nine Chapters on Arithmetic), mentioned in the introduction to Chapter 1, was compiled on the basis of remnants that survived.

More than a millennium passed before further progress was documented. The Chinese counting board with its colored rods and its applications involving array manipulation to solve linear systems eventually found its way to Japan. Seki Kowa (1642–1708), whom many Japanese consider to be one of the greatest mathematicians that their country has produced, carried forward the Chinese principles involving "rule of thumb" elimination methods on arrays of numbers. His understanding of the elementary operations used in the Chinese elimination process led him to formulate the concept of what we now call the determinant. While formulating his ideas concerning the solution of linear systems, Seki Kowa anticipated the fundamental concepts of array operations that today form the basis for matrix algebra. However, there is no evidence that he developed his array operations to actually construct an algebra for matrices.

From the middle 1600s to the middle 1800s, while Europe was flowering in mathematical development, the study of array manipulation was exclusively dedicated to the theory of determinants. Curiously, matrix algebra did not evolve along with the study of determinants.

It was not until the work of the British mathematician Arthur Cayley (1821–1895) that the matrix was singled out as a separate entity, distinct from the notion of a determinant, and algebraic operations between matrices were defined. In an 1855 paper, Cayley first introduced his basic ideas that were presented mainly to simplify notation. Finally, in 1857, Cayley expanded on his original ideas and wrote A Memoir on the Theory of Matrices. This laid the foundations for the modern theory and is generally credited for being the birth of the subjects of matrix analysis and linear algebra.

Arthur Cayley began his career by studying literature at Trinity College, Cambridge (1838–1842), but developed a side interest in mathematics, which he studied in his spare time. This "hobby" resulted in his first mathematical paper in 1841 when he was only 20 years old. To make a living, he entered the legal profession and practiced law for 14 years. However, his main interest was still mathematics. During the legal years alone, Cayley published almost 300 papers in mathematics.

In 1850 Cayley crossed paths with James J. Sylvester, and between the two of them matrix theory was born and nurtured. The two have been referred to as the "invariant twins." Although Cayley and Sylvester shared many mathematical interests, they were quite different people, especially in their approach to mathematics. Cayley had an insatiable hunger for the subject, and he read everything that he could lay his hands on. Sylvester, on the other hand, could not stand the sight of papers written by others. Cayley never forgot anything he had read or seen—he became a living encyclopedia. Sylvester, so it is said, would frequently fail to remember even his own theorems.

In 1863, Cayley was given a chair in mathematics at Cambridge University, and thereafter his mathematical output was enormous. Only Cauchy and Euler were as prolific. Cayley often said, "I really love my subject," and all indications substantiate that this was indeed the way he felt. He remained a working mathematician until his death at age 74.

Because the idea of the determinant preceded concepts of matrix algebra by at least two centuries, Morris Kline says in his book Mathematical Thought from Ancient to Modern Times that "the subject of matrix theory was well developed before it was created." This must have indeed been the case because immediately after the publication of Cayley's memoir, the subjects of matrix theory and linear algebra virtually exploded and quickly evolved into a discipline that now occupies a central position in applied mathematics.

3.2 ADDITION AND TRANSPOSITION

In the previous chapters, matrix language and notation were used simply to formulate some of the elementary concepts surrounding linear systems. The purpose now is to turn this language into a mathematical theory.¹⁶

Unless otherwise stated, a scalar is a complex number. Real numbers are a subset of the complex numbers, and hence real numbers are also scalar quantities. In the early stages, there is little harm in thinking only in terms of real scalars. Later on, however, the necessity for dealing with complex numbers will be unavoidable.
Throughout the text, R will denote the set of real numbers, and C will denote the complex numbers. The set of all n-tuples of real numbers will be denoted by Rn, and the set of all complex n-tuples will be denoted by Cn. For example, R2 is the set of all ordered pairs of real numbers (i.e., the standard cartesian plane), and R3 is ordinary 3-space. Analogously, Rm×n and Cm×n denote the m×n matrices containing real numbers and complex numbers, respectively.

Matrices A = [aij] and B = [bij] are defined to be equal matrices when A and B have the same shape and corresponding entries are equal. That is, aij = bij for each i = 1, 2, . . . , m and j = 1, 2, . . . , n. In particular, this definition applies to arrays such as

        [ 1 ]
    u = [ 2 ]    and    v = ( 1  2  3 ).
        [ 3 ]

Even though u and v describe exactly the same point in 3-space, we cannot consider them to be equal matrices because they have different shapes. An array (or matrix) consisting of a single column, such as u, is called a column vector, while an array consisting of a single row, such as v, is called a row vector.

Addition of Matrices
If A and B are m×n matrices, the sum of A and B is defined to be the m×n matrix A + B obtained by adding corresponding entries. That is,

    [A + B]ij = [A]ij + [B]ij   for each i and j.

For example,

    ( −2    x    3 )     (  2   1−x   −2  )     ( 0   1    1 )
    ( z+3   4   −y )  +  ( −3   4+x   4+y )  =  ( z  8+x   4 )

¹⁶ The great French mathematician Pierre-Simon Laplace (1749–1827) said that, "Such is the advantage of a well-constructed language that its simplified notation often becomes the source of profound theories." The theory of matrices is a testament to the validity of Laplace's statement.

Conjugate Transpose
For A = [aij], the conjugate matrix is defined to be Ā = [āij], and the conjugate transpose of A is defined to be (Ā)T, which is the same as the conjugate of AT. From now on, (Ā)T will be denoted by A∗, so [A∗]ij = āji. For example,

    ( 1−4i    i    2 )∗     [ 1+4i    3  ]
    (   3    2+i   0 )   =  [  −i    2−i ]
                            [   2     0  ]

(A∗)∗ = A for all matrices, and A∗ = AT whenever A contains only real entries. Sometimes the matrix A∗ is called the adjoint of A.

The transpose (and conjugate transpose) operation is easily combined with matrix addition and scalar multiplication. The basic rules are given below.

Properties of the Transpose
If A and B are two matrices of the same shape, and if α is a scalar, then each of the following statements is true.

    (A + B)T = AT + BT   and   (A + B)∗ = A∗ + B∗.    (3.2.1)
    (αA)T = αAT          and   (αA)∗ = ᾱA∗.           (3.2.2)
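A numerical spot check makes rules like (3.2.1) and (3.2.2) concrete before the formal proof. In NumPy, A.conj().T plays the role of A∗ (the snippet is mine, not the book's):

    # Sketch (not from the text): check (A+B)* = A* + B* and
    # (aA)* = conj(a) A* numerically for random complex matrices.
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
    B = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
    a = 2 - 5j

    star = lambda M: M.conj().T                      # M* in the book's notation
    print(np.allclose(star(A + B), star(A) + star(B)))      # True
    print(np.allclose(star(a * A), np.conj(a) * star(A)))   # True

Note the conjugated scalar in the second rule: for real scalars ᾱ = α, which is why the transpose version of (3.2.2) carries no conjugate.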
Proof.¹⁷ We will prove that (3.2.1) and (3.2.2) hold for the transpose operation. The proofs of the statements involving conjugate transposes are similar and are left as exercises. For each i and j, it is true that

    [(A + B)T]ij = [A + B]ji = [A]ji + [B]ji = [AT]ij + [BT]ij = [AT + BT]ij.

This proves that corresponding entries in (A + B)T and AT + BT are equal, so it must be the case that (A + B)T = AT + BT. Similarly, for each i and j,

    [(αA)T]ij = [αA]ji = α[A]ji = α[AT]ij   =⇒   (αA)T = αAT.

¹⁷ Computers can outperform people in many respects in that they do arithmetic much faster and more accurately than we can, and they are now rather adept at symbolic computation and mechanical manipulation of formulas. But computers can't do mathematics—people still hold the monopoly. Mathematics emanates from the uniquely human capacity to reason abstractly in a creative and logical manner, and learning mathematics goes hand-in-hand with learning how to reason abstractly and create logical arguments. This is true regardless of whether your orientation is applied or theoretical. For this reason, formal proofs will appear more frequently as the text evolves, and it is expected that your level of comprehension as well as your ability to create proofs will grow as you proceed.

Sometimes transposition doesn't change anything. For example, if

        [ 1  2  3 ]
    A = [ 2  4  5 ]    then    AT = A.
        [ 3  5  6 ]

This is because the entries in A are symmetrically located about the main diagonal—the line from the upper-left-hand corner to the lower-right-hand corner. Matrices of the form

        [ λ1   0  · · ·   0 ]
    D = [  0  λ2  · · ·   0 ]
        [  ⋮   ⋮   ⋱      ⋮ ]
        [  0   0  · · ·  λn ]

are called diagonal matrices, and they are clearly symmetric in the sense that D = DT. This is one of several kinds of symmetries described below.

Symmetries
Let A = [aij] be a square matrix.
• A is said to be a symmetric matrix whenever A = AT, i.e., whenever aij = aji.
• A is said to be a skew-symmetric matrix whenever A = −AT, i.e., whenever aij = −aji.
• A is said to be a hermitian matrix whenever A = A∗, i.e., whenever aij = āji. This is the complex analog of symmetry.
• A is said to be a skew-hermitian matrix when A = −A∗, i.e., whenever aij = −āji. This is the complex analog of skew symmetry.

For example, consider

        [   1     2+4i   1−3i ]                [   1     2+4i   1−3i ]
    A = [ 2−4i     3     8+6i ]    and     B = [ 2+4i     3     8+6i ]
        [ 1+3i   8−6i     5   ]                [ 1−3i   8+6i     5   ]

Can you see that A is hermitian but not symmetric, while B is symmetric but not hermitian?

Nature abounds with symmetry, and very often physical symmetry manifests itself as a symmetric matrix in a mathematical model. The following example is an illustration of this principle.

Example 3.2.1
Consider two springs that are connected as shown in Figure 3.2.1.

[Figure 3.2.1: two springs with stiffness constants k1 and k2 joined at three nodes; the top row shows the "no tension" position, and the bottom row shows the nodes displaced by x1, x2, x3 with forces F1, −F1 and F3, −F3 acting at the nodes.]

The springs at the top represent the "no tension" position in which no force is being exerted on any of the nodes. Suppose that the springs are stretched or compressed so that the nodes are displaced as indicated in the lower portion of Figure 3.2.1. Stretching or compressing the springs creates a force on each node according to Hooke's law,¹⁸ which says that the force exerted by a spring is F = kx, where x is the distance the spring is stretched or compressed and where k is a stiffness constant inherent to the spring.

Suppose our springs have stiffness constants k1 and k2, and let Fi be the force on node i when the springs are stretched or compressed. Let's agree that a displacement to the left is positive, while a displacement to the right is negative, and consider a force directed to the right to be positive while one directed to the left is negative. If node 1 is displaced x1 units, and if node 2 is displaced x2 units, then the left-hand spring is stretched (or compressed) by a total amount of x1 − x2 units, so the force on node 1 is

    F1 = k1(x1 − x2).

Similarly, if node 2 is displaced x2 units, and if node 3 is displaced x3 units, then the right-hand spring is stretched by a total amount of x2 − x3 units, so the force on node 3 is

    F3 = −k2(x2 − x3).

The minus sign indicates the force is directed to the left. The force on the left-hand side of node 2 is the opposite of the force on node 1, while the force on the right-hand side of node 2 must be the opposite of the force on node 3. That is,

    F2 = −F1 − F3.

¹⁸ Hooke's law is named for Robert Hooke (1635–1703), an English physicist, but it was generally known to several people (including Newton) before Hooke's 1678 claim to it was made. Hooke was a creative person who is credited with several inventions, including the wheel barometer, but he was reputed to be a man of "terrible character." This characteristic virtually destroyed his scientific career as well as his personal life. It is said that he lacked mathematical sophistication and that he left much of his work in incomplete form, but he bitterly resented people who built on his ideas by expressing them in terms of elegant mathematical formulations.
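The payoff of the example is easiest to see in matrix form. Collecting the three force equations gives F = Kx for a stiffness matrix K, and the physical symmetry appears as K = KT. The continuation of this example falls on a page not included in this excerpt, so the assembly below is my own reading of the three equations rather than the book's display (the snippet is mine, not the book's):

    # Sketch (not from the text): assemble the three force equations
    #   F1 = k1(x1 - x2),  F2 = -F1 - F3,  F3 = -k2(x2 - x3)
    # into F = K x and observe that K is symmetric.
    import numpy as np

    k1, k2 = 3.0, 5.0                 # illustrative stiffness constants
    K = np.array([[  k1,     -k1,     0.0],
                  [ -k1,   k1 + k2,  -k2 ],
                  [ 0.0,     -k2,     k2 ]])

    print(np.array_equal(K, K.T))             # True: K = K^T

    x = np.array([1.0, 0.25, -0.5])           # sample displacements
    F = K @ x
    print(np.isclose(F[1], -F[0] - F[2]))     # True: F2 = -F1 - F3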
3.3 LINEARITY

The concept of linearity is the underlying theme of our subject. In elementary mathematics the term "linear function" refers to straight lines, but in higher mathematics linearity means something much more general. Recall that a function f is simply a rule for associating points in one set D—called the domain of f—to points in another set R—the range of f. A linear function is a particular type of function that is characterized by the following two properties.

Linear Functions
Suppose that D and R are sets that possess an addition operation as well as a scalar multiplication operation—i.e., a multiplication between scalars and set members. A function f that maps points in D to points in R is said to be a linear function whenever f satisfies the conditions that

    f(x + y) = f(x) + f(y)    (3.3.1)

and

    f(αx) = αf(x)    (3.3.2)

for every x and y in D and for all scalars α. These two conditions may be combined by saying that f is a linear function whenever

    f(αx + y) = αf(x) + f(y)    (3.3.3)

for all scalars α and for all x, y ∈ D.

One of the simplest linear functions is f(x) = αx, whose graph in R2 is a straight line through the origin. You should convince yourself that f is indeed a linear function according to the above definition. However, f(x) = αx + β does not qualify for the title "linear function"—it is a linear function that has been translated by a constant β. Translations of linear functions are referred to as affine functions. Virtually all information concerning affine functions can be derived from an understanding of linear functions, and consequently we will focus only on issues of linearity.

In R3, the surface described by a function of the form f(x1, x2) = α1x1 + α2x2 is a plane through the origin, and it is easy to verify that f is a linear function. For β ≠ 0, the graph of f(x1, x2) = α1x1 + α2x2 + β is a plane not passing through the origin, and f is no longer a linear function—it is an affine function.
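A quick random test of condition (3.3.3) separates the two cases; a passing test is evidence rather than proof, but a single failure settles the matter. (The snippet is mine, not the book's.)

    # Sketch (not from the text): probe f(ax + y) = a f(x) + f(y) at
    # random points to distinguish a linear function from an affine one.
    import numpy as np

    rng = np.random.default_rng(1)

    def looks_linear(f, trials=100):
        for _ in range(trials):
            a = rng.standard_normal()
            x, y = rng.standard_normal(2), rng.standard_normal(2)
            if not np.allclose(f(a * x + y), a * f(x) + f(y)):
                return False
        return True

    linear = lambda x: 3.0 * x[0] - 2.0 * x[1]         # a1 x1 + a2 x2
    affine = lambda x: 3.0 * x[0] - 2.0 * x[1] + 7.0   # ... + beta, beta != 0

    print(looks_linear(linear))   # True
    print(looks_linear(affine))   # False: the translation breaks (3.3.3)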
For example, the transposition function f(Xm×n) = XT is linear because (A + B)T = AT + BT and (αA)T = αAT (recall (3.2.1) and (3.2.2)). Another matrix function that is linear is the trace function presented below. Example 3.3.1 The trace of an n× n matrix A = [aij ] is defined to be the sum of the entries lying on the main diagonal of A. That is, trace (A) = a11 + a22 + · · ·+ ann = n∑ i=1 aii. Problem: Show that f(Xn×n) = trace (X) is a linear function. Solution: Let’s be efficient by showing that (3.3.3) holds. Let A = [aij ] and B = [bij ], and write f(αA + B) = trace (αA + B) = n∑ i=1 [αA + B]ii = n∑ i=1 (αaii + bii) = n∑ i=1 αaii + n∑ i=1 bii = α n∑ i=1 aii + n∑ i=1 bii = α trace (A) + trace (B) = αf(A) + f(B). 3.3 Linearity 91 Example 3.3.2 Consider a linear system a11x1 + a12x2 + · · · + a1nxn = u1, a21x1 + a22x2 + · · · + a2nxn = u2, ... am1x1 + am2x2 + · · · + amnxn = um, to be a function u = f(x) that maps x =   x1 x2 ... xn   ∈ n to u =   u1 u2 ... um   ∈ m. Problem: Show that u = f(x) is linear. Solution: Let A = [aij ] be the matrix of coefficients, and write f(αx + y) = f   αx1 + y1 αx2 + y2 ... αxn + yn   = n∑ j=1 (αxj + yj)A∗j = n∑ j=1 (αxjA∗j + yjA∗j) = n∑ j=1 αxjA∗j + n∑ j=1 yjA∗j = α n∑ j=1 xjA∗j + n∑ j=1 yjA∗j = αf(x) + f(y). According to (3.3.3), the function f is linear. The following terminology will be used from now on. Linear Combinations For scalars αj and matrices Xj , the expression α1X1 + α2X2 + · · ·+ αnXn = n∑ j=1 αjXj is called a linear combination of the Xj ’s.