Solutions Manual for Statistical Inference, Second Edition

George Casella, University of Florida
Roger L. Berger, North Carolina State University
Damaris Santana, University of Florida

0-2 Solutions Manual for Statistical Inference

“When I hear you give your reasons,” I remarked, “the thing always appears to me to be so ridiculously simple that I could easily do it myself, though at each successive instance of your reasoning I am baffled until you explain your process.”
Dr. Watson to Sherlock Holmes, A Scandal in Bohemia

0.1 Description

This solutions manual contains solutions for all odd-numbered problems plus a large number of solutions for even-numbered problems. Of the 624 exercises in Statistical Inference, Second Edition, this manual gives solutions for 484 (78%) of them. There is an obtuse pattern as to which solutions were included in this manual. We assembled all of the solutions that we had from the first edition, and filled in so that all odd-numbered problems were done. In the passage from the first to the second edition, problems were shuffled with no attention paid to numbering (hence no attention paid to minimizing the new effort); rather, we tried to put the problems in logical order.

A major change from the first edition is the use of the computer, both symbolically through Mathematica™ and numerically using R. Some solutions are given as code in either of these languages. Mathematica™ can be purchased from Wolfram Research, and R is a free download from http://www.r-project.org/.

Here is a detailed listing of the solutions included.
Chapter | Number of Exercises | Number of Solutions | Missing
1 | 55 | 51 | 26, 30, 36, 42
2 | 40 | 37 | 34, 38, 40
3 | 50 | 42 | 4, 6, 10, 20, 30, 32, 34, 36
4 | 65 | 52 | 8, 14, 22, 28, 36, 40, 48, 50, 52, 56, 58, 60, 62
5 | 69 | 46 | 2, 4, 12, 14, 26, 28, all even problems from 36–68
6 | 43 | 35 | 8, 16, 26, 28, 34, 36, 38, 42
7 | 66 | 52 | 4, 14, 16, 28, 30, 32, 34, 36, 42, 54, 58, 60, 62, 64
8 | 58 | 51 | 36, 40, 46, 48, 52, 56, 58
9 | 58 | 41 | 2, 8, 10, 20, 22, 24, 26, 28, 30, 32, 38, 40, 42, 44, 50, 54, 56
10 | 48 | 26 | all even problems except 4 and 32
11 | 41 | 35 | 4, 20, 22, 24, 26, 40
12 | 31 | 16 | all even problems

0.2 Acknowledgement

Many people contributed to the assembly of this solutions manual. We again thank all of those who contributed solutions to the first edition; many problems have carried over into the second edition. Moreover, throughout the years a number of people have been in constant touch with us, contributing to both the presentations and solutions. We apologize in advance for those we forget to mention, and we especially thank Jay Beder, Yong Sung Joo, Michael Perlman, Rob Strawderman, and Tom Wehrly. Thank you all for your help.

And, as we said the first time around, although we have benefited greatly from the assistance and

b. “A or B but not both” is $(A\cap B^c)\cup(B\cap A^c)$. Thus we have
$P((A\cap B^c)\cup(B\cap A^c)) = P(A\cap B^c)+P(B\cap A^c)$ (disjoint union)
$= [P(A)-P(A\cap B)]+[P(B)-P(A\cap B)]$ (Theorem 1.2.9a)
$= P(A)+P(B)-2P(A\cap B)$.
c. “At least one of A or B” is $A\cup B$. So we get the same answer as in a).
d. “At most one of A or B” is $(A\cap B)^c$, and $P((A\cap B)^c) = 1-P(A\cap B)$.

1.5 a. $A\cap B\cap C = \{$a U.S. birth results in identical twins that are female$\}$.
b. $P(A\cap B\cap C) = \frac{1}{90}\times\frac13\times\frac12$.

1.6 $p_0 = (1-u)(1-w)$, $p_1 = u(1-w)+w(1-u)$, $p_2 = uw$.
$p_0 = p_2 \Rightarrow u+w = 1$, and $p_1 = p_2 \Rightarrow uw = 1/3$.
These two equations imply $u(1-u) = 1/3$, which has no solution in the real numbers, because $u(1-u)\le 1/4$ for every real u. Thus, the probability assignment is not legitimate.

1.7 a.
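$P(\text{scoring } i \text{ points}) = \begin{cases} 1-\frac{\pi r^2}{A} & \text{if } i = 0\\ \frac{\pi r^2}{A}\left[\frac{(6-i)^2-(5-i)^2}{5^2}\right] & \text{if } i = 1,\dots,5.\end{cases}$
b. $P(\text{scoring } i \text{ points}\mid\text{board is hit}) = \frac{P(\text{scoring } i \text{ points}\cap\text{board is hit})}{P(\text{board is hit})}$, where
$P(\text{board is hit}) = \frac{\pi r^2}{A}$ and $P(\text{scoring } i \text{ points}\cap\text{board is hit}) = \frac{\pi r^2}{A}\left[\frac{(6-i)^2-(5-i)^2}{5^2}\right]$, $i = 1,\dots,5$.
Therefore,
$P(\text{scoring } i \text{ points}\mid\text{board is hit}) = \frac{(6-i)^2-(5-i)^2}{5^2}$, $i = 1,\dots,5$,
which is exactly the probability distribution of Example 1.2.7.

1.8 a. P(scoring exactly i points) = P(inside circle i) − P(inside circle i + 1). Circle i has radius $(6-i)r/5$, so
$P(\text{scoring exactly } i \text{ points}) = \frac{\pi(6-i)^2r^2}{5^2\pi r^2}-\frac{\pi(6-(i+1))^2r^2}{5^2\pi r^2} = \frac{(6-i)^2-(5-i)^2}{5^2}$.
b. Expanding the squares in part a) we find $P(\text{scoring exactly } i \text{ points}) = \frac{11-2i}{25}$, which is decreasing in i.
c. Let $P(i) = \frac{11-2i}{25}$. Since $i\le 5$, $P(i)\ge 0$ for all i. $P(S) = P(\text{hitting the dartboard}) = 1$ by definition. Lastly, $P(i\cup j)$ = area of ring i + area of ring j = $P(i)+P(j)$.

1.9 a. Suppose $x\in(\cup_\alpha A_\alpha)^c$. By the definition of complement, $x\notin\cup_\alpha A_\alpha$, that is, $x\notin A_\alpha$ for all $\alpha\in\Gamma$. Therefore $x\in A_\alpha^c$ for all $\alpha\in\Gamma$. Thus $x\in\cap_\alpha A_\alpha^c$. Conversely, suppose $x\in\cap_\alpha A_\alpha^c$. By the definition of intersection, $x\in A_\alpha^c$ for all $\alpha\in\Gamma$. By the definition of complement, $x\notin A_\alpha$ for all $\alpha\in\Gamma$. Therefore $x\notin\cup_\alpha A_\alpha$. Thus $x\in(\cup_\alpha A_\alpha)^c$.

Second Edition 1-3

b. Suppose $x\in(\cap_\alpha A_\alpha)^c$. By the definition of complement, $x\notin\cap_\alpha A_\alpha$. Therefore $x\notin A_\alpha$ for some $\alpha\in\Gamma$, and so $x\in A_\alpha^c$ for some $\alpha\in\Gamma$. Thus $x\in\cup_\alpha A_\alpha^c$. Conversely, suppose $x\in\cup_\alpha A_\alpha^c$. By the definition of union, $x\in A_\alpha^c$ for some $\alpha\in\Gamma$. Therefore $x\notin A_\alpha$ for some $\alpha\in\Gamma$, and so $x\notin\cap_\alpha A_\alpha$. Thus $x\in(\cap_\alpha A_\alpha)^c$.

1.10 For $A_1,\dots,A_n$,
(i) $\left(\bigcup_{i=1}^n A_i\right)^c = \bigcap_{i=1}^n A_i^c$
(ii) $\left(\bigcap_{i=1}^n A_i\right)^c = \bigcup_{i=1}^n A_i^c$
Proof of (i): If $x\in(\cup A_i)^c$, then $x\notin\cup A_i$. That implies $x\notin A_i$ for any i, so $x\in A_i^c$ for every i and $x\in\cap A_i^c$.
Proof of (ii): If $x\in(\cap A_i)^c$, then $x\notin\cap A_i$. That implies $x\in A_i^c$ for some i, so $x\in\cup A_i^c$.
In each case the reverse inclusion follows by reversing the steps, as in Exercise 1.9.

1.11 We must verify each of the three properties in Definition 1.2.1.
a. (1) The empty set $\emptyset\in\{\emptyset, S\}$. Thus $\emptyset\in\mathcal B$.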
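(2) $\emptyset^c = S\in\mathcal B$ and $S^c = \emptyset\in\mathcal B$.
(3) $\emptyset\cup S = S\in\mathcal B$.
b. (1) The empty set ∅ is a subset of any set; in particular, $\emptyset\subset S$. Thus $\emptyset\in\mathcal B$.
(2) If $A\in\mathcal B$, then $A\subset S$. By the definition of complementation, $A^c$ is also a subset of S, and, hence, $A^c\in\mathcal B$.
(3) If $A_1, A_2,\dots\in\mathcal B$, then, for each i, $A_i\subset S$. By the definition of union, $\cup A_i\subset S$. Hence, $\cup A_i\in\mathcal B$.
c. Let $\mathcal B_1$ and $\mathcal B_2$ be the two sigma algebras.
(1) $\emptyset\in\mathcal B_1$ and $\emptyset\in\mathcal B_2$ since $\mathcal B_1$ and $\mathcal B_2$ are sigma algebras. Thus $\emptyset\in\mathcal B_1\cap\mathcal B_2$.
(2) If $A\in\mathcal B_1\cap\mathcal B_2$, then $A\in\mathcal B_1$ and $A\in\mathcal B_2$. Since $\mathcal B_1$ and $\mathcal B_2$ are both sigma algebras, $A^c\in\mathcal B_1$ and $A^c\in\mathcal B_2$. Therefore $A^c\in\mathcal B_1\cap\mathcal B_2$.
(3) If $A_1, A_2,\dots\in\mathcal B_1\cap\mathcal B_2$, then $A_1, A_2,\dots\in\mathcal B_1$ and $A_1, A_2,\dots\in\mathcal B_2$. Therefore, since $\mathcal B_1$ and $\mathcal B_2$ are both sigma algebras, $\cup_{i=1}^\infty A_i\in\mathcal B_1$ and $\cup_{i=1}^\infty A_i\in\mathcal B_2$. Thus $\cup_{i=1}^\infty A_i\in\mathcal B_1\cap\mathcal B_2$.

1.12 First write
$P\left(\bigcup_{i=1}^\infty A_i\right) = P\left(\bigcup_{i=1}^n A_i\cup\bigcup_{i=n+1}^\infty A_i\right)$
$= P\left(\bigcup_{i=1}^n A_i\right)+P\left(\bigcup_{i=n+1}^\infty A_i\right)$ (the $A_i$ are disjoint)
$= \sum_{i=1}^n P(A_i)+P\left(\bigcup_{i=n+1}^\infty A_i\right)$ (finite additivity).
Now define $B_k = \bigcup_{i=k}^\infty A_i$. Note that $B_{k+1}\subset B_k$ and $B_k\to\emptyset$ as $k\to\infty$, because the $A_i$ are disjoint, so each point lies in at most one $A_i$ and hence in only finitely many $B_k$. By the assumed continuity of P, $P(B_{n+1})\to 0$. Thus
$P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{n\to\infty}P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{n\to\infty}\left[\sum_{i=1}^n P(A_i)+P(B_{n+1})\right] = \sum_{i=1}^\infty P(A_i)$.

1.13 If A and B are disjoint, $P(A\cup B) = P(A)+P(B) = \frac13+\frac34 = \frac{13}{12}$, which is impossible. More generally, if A and B are disjoint, then $A\subset B^c$ and $P(A)\le P(B^c)$. But here $P(A)>P(B^c)$, so A and B cannot be disjoint.

1.14 If $S = \{s_1,\dots,s_n\}$, then any subset of S can be constructed by either including or excluding $s_i$, for each i. Thus there are $2^n$ possible choices.

1.15 Proof by induction. The proof for k = 2 is given after Theorem 1.2.14. Assume the result is true for k, that is, the entire job can be done in $n_1\times n_2\times\cdots\times n_k$ ways. For k + 1, the (k+1)th task can be done in $n_{k+1}$ ways, and for each one of these ways we can complete the job by performing the remaining k tasks.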
Thus for each of the $n_{k+1}$ ways we have $n_1\times n_2\times\cdots\times n_k$ ways of completing the job, by the induction hypothesis. Thus, the number of ways we can do the job is
$\underbrace{(1\times(n_1\times n_2\times\cdots\times n_k))+\cdots+(1\times(n_1\times n_2\times\cdots\times n_k))}_{n_{k+1}\text{ terms}} = n_1\times n_2\times\cdots\times n_k\times n_{k+1}$.

1.16 a) $26^3$. b) $26^3+26^2$. c) $26^4+26^3+26^2$.

1.17 There are $\binom n2 = n(n-1)/2$ pieces on which the two numbers do not match. (Choose 2 out of n numbers without replacement.) There are n pieces on which the two numbers match. So the total number of different pieces is $n+n(n-1)/2 = n(n+1)/2$.

1.18 The probability is $\frac{\binom n2 n!}{n^n} = \frac{(n-1)(n-1)!}{2n^{n-2}}$. There are many ways to obtain this. Here is one. The denominator is $n^n$ because this is the number of ways to place n balls in n cells. The numerator is the number of ways of placing the balls such that exactly one cell is empty. There are n ways to specify the empty cell. There are n − 1 ways of choosing the cell with two balls. There are $\binom n2$ ways of picking the 2 balls to go into this cell. And there are (n − 2)! ways of placing the remaining n − 2 balls into the n − 2 cells, one ball in each cell. The product of these is the numerator $n(n-1)\binom n2(n-2)! = \binom n2 n!$.

1.19 a. $\binom64 = 15$.
b. Think of the n variables as n bins. Differentiating with respect to one of the variables is equivalent to putting a ball in that bin. Thus there are r unlabeled balls to be placed in n labeled bins, and there are $\binom{n+r-1}{r}$ ways to do this.

1.20 A sample point specifies on which day (1 through 7) each of the 12 calls happens. Thus there are $7^{12}$ equally likely sample points. There are several different ways that the calls might be assigned so that there is at least one call each day. There might be 6 calls one day and 1 call each of the other days. Denote this by 6111111. The number of sample points with this pattern is $7\binom{12}{6}6!$. There are 7 ways to specify the day with 6 calls. There are $\binom{12}{6}$ ways to specify which of the 12 calls are on this day.
And there are 6! ways of assigning the remaining 6 calls to the remaining 6 days. We will now count another pattern. There might be 4 calls on one day, 2 calls on each of two days, and 1 call on each of the remaining four days. Denote this by 4221111. The number of sample points with this pattern is $7\binom{12}{4}\binom62\binom82\binom62 4!$. (There are 7 ways to pick the day with 4 calls, $\binom{12}{4}$ to pick the calls for that day, $\binom62$ to pick the two days with two calls, $\binom82$ ways to pick the two calls for the lower-numbered day, $\binom62$ ways to pick the two calls for the higher-numbered day, and 4! ways to order the remaining 4 calls.) Here is a list of all the possibilities and the counts of the sample points for each one.

pattern | number of sample points
6111111 | $7\binom{12}{6}6!$ = 4,656,960
5211111 | $7\binom{12}{5}6\binom72 5!$ = 83,825,280
4221111 | $7\binom{12}{4}\binom62\binom82\binom62 4!$ = 523,908,000
4311111 | $7\binom{12}{4}6\binom83 5!$ = 139,708,800
3321111 | $\binom72\binom{12}{3}\binom93 5\binom62 4!$ = 698,544,000
3222111 | $7\binom{12}{3}\binom63\binom92\binom72\binom52 3!$ = 1,397,088,000
2222211 | $\binom75\binom{12}{2}\binom{10}{2}\binom82\binom62\binom42 2!$ = 314,344,800
total | 3,162,075,840

The probability is the total number of sample points divided by $7^{12}$, which is $\frac{3{,}162{,}075{,}840}{7^{12}}\approx .2285$.

1.21 The probability is $\frac{\binom{n}{2r}2^{2r}}{\binom{2n}{2r}}$. There are $\binom{2n}{2r}$ ways of choosing 2r shoes from a total of 2n shoes. Thus there are $\binom{2n}{2r}$ equally likely sample points. The numerator is the number of sample points for which there will be no matching pair. There are $\binom{n}{2r}$ ways of choosing 2r different shoes

has probability $\frac{n!}{n^n}$. Any other unordered outcome from $\{x_1,\dots,x_n\}$, distinct from the unordered sample $\{x_1,\dots,x_n\}$, will contain m different numbers repeated $k_1,\dots,k_m$ times, where $k_1+k_2+\cdots+k_m = n$ with at least one of the $k_i$'s satisfying $2\le k_i\le n$. The probability of obtaining the corresponding average of such an outcome is
$\frac{n!}{k_1!k_2!\cdots k_m!\,n^n} < \frac{n!}{n^n}$, since $k_1!k_2!\cdots k_m! > 1$.
Therefore the outcome with average $\frac{x_1+x_2+\cdots+x_n}{n}$ is the most likely.
b. Stirling's approximation is that, as $n\to\infty$, $n!\approx\sqrt{2\pi}\,n^{n+(1/2)}e^{-n}$, and thus
$\left(\frac{n!}{n^n}\right)\Big/\left(\frac{\sqrt{2n\pi}}{e^n}\right) = \frac{n!\,e^n}{n^n\sqrt{2n\pi}} \approx \frac{\sqrt{2\pi}\,n^{n+(1/2)}e^{-n}e^n}{n^n\sqrt{2n\pi}} = 1$.
c. Since we are drawing with replacement from the set $\{x_1,\dots,x_n\}$, the probability of choosing any $x_i$ is $\frac1n$. Therefore the probability of obtaining an ordered sample of size n without $x_i$ is $(1-\frac1n)^n$. To prove that $\lim_{n\to\infty}(1-\frac1n)^n = e^{-1}$, calculate the limit of the log. That is,
$\lim_{n\to\infty}n\log\left(1-\frac1n\right) = \lim_{n\to\infty}\frac{\log\left(1-\frac1n\right)}{1/n}$.
L'Hôpital's rule shows that the limit is −1, establishing the result. See also Lemma 2.3.14.

1.32 This is most easily seen by doing each possibility. Let P(i) = probability that the candidate hired on the ith trial is best. Then
$P(1) = \frac1N,\quad P(2) = \frac{1}{N-1},\quad\dots,\quad P(i) = \frac{1}{N-i+1},\quad\dots,\quad P(N) = 1$.

1.33 Using Bayes' rule,
$P(M\mid CB) = \frac{P(CB\mid M)P(M)}{P(CB\mid M)P(M)+P(CB\mid F)P(F)} = \frac{.05\times\frac12}{.05\times\frac12+.0025\times\frac12} = .9524$.

1.34 a.
P(Brown Hair) = P(Brown Hair | Litter 1)P(Litter 1) + P(Brown Hair | Litter 2)P(Litter 2) $= \left(\frac23\right)\left(\frac12\right)+\left(\frac35\right)\left(\frac12\right) = \frac{19}{30}$.
b. Use Bayes' Theorem:
$P(\text{Litter 1}\mid\text{Brown Hair}) = \frac{P(BH\mid L1)P(L1)}{P(BH\mid L1)P(L1)+P(BH\mid L2)P(L2)} = \frac{\left(\frac23\right)\left(\frac12\right)}{\frac{19}{30}} = \frac{10}{19}$.

1.35 Clearly $P(\cdot\mid B)\ge 0$, and $P(S\mid B) = 1$. If $A_1, A_2,\dots$ are disjoint, then
$P\left(\bigcup_{i=1}^\infty A_i\,\middle|\,B\right) = \frac{P\left(\left(\bigcup_{i=1}^\infty A_i\right)\cap B\right)}{P(B)} = \frac{P\left(\bigcup_{i=1}^\infty(A_i\cap B)\right)}{P(B)} = \frac{\sum_{i=1}^\infty P(A_i\cap B)}{P(B)} = \sum_{i=1}^\infty P(A_i\mid B)$.

1.37 a. Using the same events A, B, C and W as in Example 1.3.4, we have
$P(W) = P(W\mid A)P(A)+P(W\mid B)P(B)+P(W\mid C)P(C) = \gamma\left(\frac13\right)+0\left(\frac13\right)+1\left(\frac13\right) = \frac{\gamma+1}{3}$.
Thus, $P(A\mid W) = \frac{P(A\cap W)}{P(W)} = \frac{\gamma/3}{(\gamma+1)/3} = \frac{\gamma}{\gamma+1}$, where
$\frac{\gamma}{\gamma+1} = \frac13$ if $\gamma = \frac12$, $\quad\frac{\gamma}{\gamma+1} < \frac13$ if $\gamma < \frac12$, $\quad\frac{\gamma}{\gamma+1} > \frac13$ if $\gamma > \frac12$.
b. By Exercise 1.35, $P(\cdot\mid W)$ is a probability function. A, B and C are a partition. So $P(A\mid W)+P(B\mid W)+P(C\mid W) = 1$. But $P(B\mid W) = 0$. Thus, $P(A\mid W)+P(C\mid W) = 1$.
Since $P(A\mid W) = 1/3$, $P(C\mid W) = 2/3$. (This could be calculated directly, as in Example 1.3.4.) So if A can swap fates with C, his chance of survival becomes 2/3.

1.38 a. $P(A) = P(A\cap B)+P(A\cap B^c)$ from Theorem 1.2.11a. But $(A\cap B^c)\subset B^c$ and $P(B^c) = 1-P(B) = 0$. So $P(A\cap B^c) = 0$, and $P(A) = P(A\cap B)$. Thus,
$P(A\mid B) = \frac{P(A\cap B)}{P(B)} = \frac{P(A)}{1} = P(A)$.
b. $A\subset B$ implies $A\cap B = A$. Thus,
$P(B\mid A) = \frac{P(A\cap B)}{P(A)} = \frac{P(A)}{P(A)} = 1$.
And also,
$P(A\mid B) = \frac{P(A\cap B)}{P(B)} = \frac{P(A)}{P(B)}$.
c. If A and B are mutually exclusive, then $P(A\cup B) = P(A)+P(B)$ and $A\cap(A\cup B) = A$. Thus,
$P(A\mid A\cup B) = \frac{P(A\cap(A\cup B))}{P(A\cup B)} = \frac{P(A)}{P(A)+P(B)}$.
d. $P(A\cap B\cap C) = P(A\cap(B\cap C)) = P(A\mid B\cap C)P(B\cap C) = P(A\mid B\cap C)P(B\mid C)P(C)$.

1.39 a. Suppose A and B are mutually exclusive. Then $A\cap B = \emptyset$ and $P(A\cap B) = 0$. If A and B are independent, then $0 = P(A\cap B) = P(A)P(B)$. But this cannot be, since $P(A)>0$ and $P(B)>0$. Thus A and B cannot be independent.
b. If A and B are independent and both have positive probability, then $0 < P(A)P(B) = P(A\cap B)$. This implies $A\cap B\ne\emptyset$, that is, A and B are not mutually exclusive.

1.40 a. $P(A^c\cap B) = P(A^c\mid B)P(B) = [1-P(A\mid B)]P(B) = [1-P(A)]P(B) = P(A^c)P(B)$, where the third equality follows from the independence of A and B.
b. $P(A^c\cap B^c) = P(A^c)-P(A^c\cap B) = P(A^c)-P(A^c)P(B) = P(A^c)P(B^c)$.

1.41 a.
$P(\text{dash sent}\mid\text{dash rec}) = \frac{P(\text{dash rec}\mid\text{dash sent})P(\text{dash sent})}{P(\text{dash rec}\mid\text{dash sent})P(\text{dash sent})+P(\text{dash rec}\mid\text{dot sent})P(\text{dot sent})} = \frac{(2/3)(4/7)}{(2/3)(4/7)+(1/4)(3/7)} = \frac{32}{41}$.
b. By a calculation similar to the one in (a), $P(\text{dot sent}\mid\text{dot rec}) = 27/43$. Then we have $P(\text{dash sent}\mid\text{dot rec}) = \frac{16}{43}$. Given that dot-dot was received, the distribution of the four possibilities of what was sent is

Event | Probability
dash-dash | $(16/43)^2$
dash-dot | $(16/43)(27/43)$
dot-dash | $(27/43)(16/43)$
dot-dot | $(27/43)^2$

1.43 a.
For Boole's Inequality,
$P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n P(A_i)-P_2+P_3-\cdots\pm P_n \le \sum_{i=1}^n P(A_i)$
since $P_i\ge P_j$ if $i\le j$, and therefore the terms $-P_{2k}+P_{2k+1}\le 0$ for $k = 1,\dots,\frac{n-1}{2}$ when n is odd. When n is even the last term to consider is $-P_n\le 0$. For Bonferroni's Inequality apply the inclusion-exclusion identity to the $A_i^c$, and use the argument leading to (1.2.10).
b. We illustrate the proof that the $P_i$ are decreasing by showing that $P_2\ge P_3$. The other arguments are similar. Write
$P_2 = \sum_{1\le i<j\le n}P(A_i\cap A_j) = \sum_{i=1}^{n-1}\sum_{j=i+1}^n P(A_i\cap A_j) = \sum_{i=1}^{n-1}\sum_{j=i+1}^n\left[\sum_{k=1}^n P(A_i\cap A_j\cap A_k)+P(A_i\cap A_j\cap(\cup_kA_k)^c)\right]$.
Now to get to $P_3$ we drop terms from this last expression. That is,
$\sum_{i=1}^{n-1}\sum_{j=i+1}^n\left[\sum_{k=1}^n P(A_i\cap A_j\cap A_k)+P(A_i\cap A_j\cap(\cup_kA_k)^c)\right] \ge \sum_{i=1}^{n-1}\sum_{j=i+1}^n\left[\sum_{k=1}^n P(A_i\cap A_j\cap A_k)\right] \ge \sum_{i=1}^{n-2}\sum_{j=i+1}^{n-1}\sum_{k=j+1}^n P(A_i\cap A_j\cap A_k) = \sum_{1\le i<j<k\le n}P(A_i\cap A_j\cap A_k) = P_3$.
The sequence of bounds is improving because the upper bounds $P_1,\ P_1-P_2+P_3,\ P_1-P_2+P_3-P_4+P_5,\dots$ are getting smaller, since $P_i\ge P_j$ if $i\le j$ and therefore the terms $-P_{2k}+P_{2k+1}\le 0$. The lower bounds $P_1-P_2,\ P_1-P_2+P_3-P_4,\ P_1-P_2+P_3-P_4+P_5-P_6,\dots$ are getting bigger, since $P_i\ge P_j$ if $i\le j$ and therefore the terms $P_{2k+1}-P_{2k+2}\ge 0$.

1.54 a. $\int_0^{\pi/2}\sin x\,dx = 1$. Thus, c = 1/1 = 1.
b. $\int_{-\infty}^\infty e^{-|x|}dx = \int_{-\infty}^0 e^x dx+\int_0^\infty e^{-x}dx = 1+1 = 2$. Thus, c = 1/2.

1.55 Since V = 5 when T < 3 and V = 2T ≥ 6 otherwise,
$P(V\le 5) = P(T<3) = \int_0^3\frac{1}{1.5}e^{-t/1.5}dt = 1-e^{-2}$.
For $v\ge 6$,
$P(V\le v) = P(2T\le v) = P\left(T\le\frac v2\right) = \int_0^{v/2}\frac{1}{1.5}e^{-t/1.5}dt = 1-e^{-v/3}$.
Therefore,
$P(V\le v) = \begin{cases}0 & -\infty<v<5,\\ 1-e^{-2} & 5\le v<6,\\ 1-e^{-v/3} & 6\le v.\end{cases}$

Chapter 2 Transformations and Expectations

2.1 a. $f_X(x) = 42x^5(1-x)$, $0<x<1$; $y = x^3 = g(x)$, monotone, and $\mathcal Y = (0,1)$. Use Theorem 2.1.5.
$f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right| = f_X(y^{1/3})\frac{d}{dy}(y^{1/3}) = 42y^{5/3}(1-y^{1/3})\left(\frac13y^{-2/3}\right) = 14y(1-y^{1/3}) = 14y-14y^{4/3}$, $0<y<1$.
To check the integral,
$\int_0^1(14y-14y^{4/3})dy = \left.7y^2-\frac{14y^{7/3}}{7/3}\right|_0^1 = \left.7y^2-6y^{7/3}\right|_0^1 = 1-0 = 1$.
b.
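$f_X(x) = 7e^{-7x}$, $0<x<\infty$; $y = 4x+3$, monotone, and $\mathcal Y = (3,\infty)$. Use Theorem 2.1.5.
$f_Y(y) = f_X\left(\frac{y-3}{4}\right)\left|\frac{d}{dy}\left(\frac{y-3}{4}\right)\right| = 7e^{-(7/4)(y-3)}\left|\frac14\right| = \frac74e^{-(7/4)(y-3)}$, $3<y<\infty$.
To check the integral,
$\int_3^\infty\frac74e^{-(7/4)(y-3)}dy = \left.-e^{-(7/4)(y-3)}\right|_3^\infty = 0-(-1) = 1$.
c. $F_Y(y) = P(0\le X\le\sqrt y) = F_X(\sqrt y)$. Then $f_Y(y) = \frac{1}{2\sqrt y}f_X(\sqrt y)$. Therefore
$f_Y(y) = \frac{1}{2\sqrt y}30(\sqrt y)^2(1-\sqrt y)^2 = 15y^{1/2}(1-\sqrt y)^2$, $0<y<1$.
To check the integral,
$\int_0^1 15y^{1/2}(1-\sqrt y)^2dy = \int_0^1(15y^{1/2}-30y+15y^{3/2})dy = 15\left(\frac23\right)-30\left(\frac12\right)+15\left(\frac25\right) = 1$.

2.2 In all three cases, Theorem 2.1.5 is applicable and yields the following answers.
a. $f_Y(y) = \frac12y^{-1/2}$, $0<y<1$.
b. $f_Y(y) = \frac{(n+m+1)!}{n!\,m!}e^{-y(n+1)}(1-e^{-y})^m$, $0<y<\infty$.
c. $f_Y(y) = \frac{1}{\sigma^2}\frac{\log y}{y}e^{-(1/2)((\log y)/\sigma)^2}$, $1<y<\infty$.

2.3 $P(Y = y) = P\left(\frac{X}{X+1} = y\right) = P\left(X = \frac{y}{1-y}\right) = \frac13\left(\frac23\right)^{y/(1-y)}$, where $y = 0,\frac12,\frac23,\frac34,\dots,\frac{x}{x+1},\dots$.

2.4 a. f(x) is a pdf since it is positive and
$\int_{-\infty}^\infty f(x)dx = \int_{-\infty}^0\frac12\lambda e^{\lambda x}dx+\int_0^\infty\frac12\lambda e^{-\lambda x}dx = \frac12+\frac12 = 1$.
b. Let X be a random variable with density f(x).
$P(X<t) = \begin{cases}\int_{-\infty}^t\frac12\lambda e^{\lambda x}dx & \text{if } t<0\\ \int_{-\infty}^0\frac12\lambda e^{\lambda x}dx+\int_0^t\frac12\lambda e^{-\lambda x}dx & \text{if } t\ge 0\end{cases}$
where $\int_{-\infty}^t\frac12\lambda e^{\lambda x}dx = \left.\frac12e^{\lambda x}\right|_{-\infty}^t = \frac12e^{\lambda t}$ and $\int_0^t\frac12\lambda e^{-\lambda x}dx = \left.-\frac12e^{-\lambda x}\right|_0^t = -\frac12e^{-\lambda t}+\frac12$. Therefore,
$P(X<t) = \begin{cases}\frac12e^{\lambda t} & \text{if } t<0\\ 1-\frac12e^{-\lambda t} & \text{if } t\ge 0.\end{cases}$
c. $P(|X|<t) = 0$ for $t<0$, and for $t\ge 0$,
$P(|X|<t) = P(-t<X<t) = \int_{-t}^0\frac12\lambda e^{\lambda x}dx+\int_0^t\frac12\lambda e^{-\lambda x}dx = \frac12\left[1-e^{-\lambda t}\right]+\frac12\left[-e^{-\lambda t}+1\right] = 1-e^{-\lambda t}$.

2.5 To apply Theorem 2.1.8, let $A_0 = \{\frac\pi2,\pi,\frac{3\pi}{2}\}$, $A_1 = (0,\frac\pi2)$, $A_2 = (\frac\pi2,\pi)$, $A_3 = (\pi,\frac{3\pi}{2})$ and $A_4 = (\frac{3\pi}{2},2\pi)$. Then $g_i(x) = \sin^2(x)$ on $A_i$ for i = 1, 2, 3, 4. Therefore $g_1^{-1}(y) = \sin^{-1}(\sqrt y)$, $g_2^{-1}(y) = \pi-\sin^{-1}(\sqrt y)$, $g_3^{-1}(y) = \sin^{-1}(\sqrt y)+\pi$ and $g_4^{-1}(y) = 2\pi-\sin^{-1}(\sqrt y)$.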
Thus
$f_Y(y) = \frac{1}{2\pi}\left|\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt y}\right|+\frac{1}{2\pi}\left|-\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt y}\right|+\frac{1}{2\pi}\left|\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt y}\right|+\frac{1}{2\pi}\left|-\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt y}\right| = \frac{1}{\pi\sqrt{y(1-y)}}$, $0<y<1$.
To use the cdf given in (2.1.6) we have that $x_1 = \sin^{-1}(\sqrt y)$ and $x_2 = \pi-\sin^{-1}(\sqrt y)$. Then by differentiating (2.1.6) we obtain
$f_Y(y) = 2f_X(\sin^{-1}(\sqrt y))\frac{d}{dy}\sin^{-1}(\sqrt y)-2f_X(\pi-\sin^{-1}(\sqrt y))\frac{d}{dy}\left(\pi-\sin^{-1}(\sqrt y)\right) = 2\left(\frac{1}{2\pi}\frac{1}{\sqrt{1-y}}\frac{1}{2\sqrt y}\right)-2\left(\frac{1}{2\pi}\frac{-1}{\sqrt{1-y}}\frac{1}{2\sqrt y}\right) = \frac{1}{\pi\sqrt{y(1-y)}}$.

2.6 Theorem 2.1.8 can be used for all three parts.
a. Let $A_0 = \{0\}$, $A_1 = (-\infty,0)$ and $A_2 = (0,\infty)$. Then $g_1(x) = |x|^3 = -x^3$ on $A_1$ and $g_2(x) = |x|^3 = x^3$ on $A_2$. Use Theorem 2.1.8 to obtain
$f_Y(y) = \frac13e^{-y^{1/3}}y^{-2/3}$, $0<y<\infty$.
b. Let $A_0 = \{0\}$, $A_1 = (-1,0)$ and $A_2 = (0,1)$. Then $g_1(x) = 1-x^2$ on $A_1$ and $g_2(x) = 1-x^2$ on $A_2$. Use Theorem 2.1.8 to obtain
$f_Y(y) = \frac38(1-y)^{-1/2}+\frac38(1-y)^{1/2}$, $0<y<1$.

Therefore,
$f_Y(y) = \frac{d}{dy}F_Y(y) = f_X(y)+f_X(-y) = \frac{1}{\sqrt{2\pi}}e^{-y^2/2}+\frac{1}{\sqrt{2\pi}}e^{-y^2/2} = \sqrt{\frac2\pi}e^{-y^2/2}$.
Thus,
$EY = \int_0^\infty y\sqrt{\frac2\pi}e^{-y^2/2}dy = \sqrt{\frac2\pi}\int_0^\infty e^{-u}du = \sqrt{\frac2\pi}\left[\left.-e^{-u}\right|_0^\infty\right] = \sqrt{\frac2\pi}$, where $u = \frac{y^2}{2}$.
$EY^2 = \int_0^\infty y^2\sqrt{\frac2\pi}e^{-y^2/2}dy = \sqrt{\frac2\pi}\left[\left.-ye^{-y^2/2}\right|_0^\infty+\int_0^\infty e^{-y^2/2}dy\right] = \sqrt{\frac2\pi}\sqrt{\frac\pi2} = 1$.
This was done using integration by parts with $u = y$ and $dv = ye^{-y^2/2}dy$. Then $\mathrm{Var}(Y) = 1-\frac2\pi$.

2.12 We have $\tan x = y/d$; therefore $\tan^{-1}(y/d) = x$ and $\frac{d}{dy}\tan^{-1}(y/d)\,dy = \frac{1}{1+(y/d)^2}\frac1d\,dy = dx$. Thus,
$f_Y(y) = \frac{2}{\pi d}\frac{1}{1+(y/d)^2}$, $0<y<\infty$.
This is the Cauchy distribution restricted to $(0,\infty)$, and the mean is infinite.

2.13 $P(X = k) = (1-p)^kp+p^k(1-p)$, $k = 1,2,\dots$. Therefore,
$EX = \sum_{k=1}^\infty k\left[(1-p)^kp+p^k(1-p)\right] = (1-p)p\left[\sum_{k=1}^\infty k(1-p)^{k-1}+\sum_{k=1}^\infty kp^{k-1}\right] = (1-p)p\left[\frac{1}{p^2}+\frac{1}{(1-p)^2}\right] = \frac{1-2p+2p^2}{p(1-p)}$.

2.14
$\int_0^\infty(1-F_X(x))dx = \int_0^\infty P(X>x)dx = \int_0^\infty\int_x^\infty f_X(y)\,dy\,dx = \int_0^\infty\int_0^y dx\,f_X(y)\,dy = \int_0^\infty yf_X(y)dy = EX$,
where the third equality follows from changing the order of integration.

2.15 Assume without loss of generality that $X\le Y$. Then $X\vee Y = Y$ and $X\wedge Y = X$. Thus $X+Y = (X\wedge Y)+(X\vee Y)$.
Taking expectations,
$E[X+Y] = E[(X\wedge Y)+(X\vee Y)] = E(X\wedge Y)+E(X\vee Y)$.
Therefore $E(X\vee Y) = EX+EY-E(X\wedge Y)$.

2.16 From Exercise 2.14,
$ET = \int_0^\infty\left[ae^{-\lambda t}+(1-a)e^{-\mu t}\right]dt = \left.-\frac{ae^{-\lambda t}}{\lambda}-\frac{(1-a)e^{-\mu t}}{\mu}\right|_0^\infty = \frac a\lambda+\frac{1-a}{\mu}$.

2.17 a. $\int_0^m3x^2dx = m^3 \stackrel{\text{set}}{=} \frac12 \Rightarrow m = \left(\frac12\right)^{1/3} = .794$.
b. The function is symmetric about zero; therefore m = 0 as long as the integral is finite.
$\frac1\pi\int_{-\infty}^\infty\frac{1}{1+x^2}dx = \left.\frac1\pi\tan^{-1}(x)\right|_{-\infty}^\infty = \frac1\pi\left(\frac\pi2+\frac\pi2\right) = 1$.
This is the Cauchy pdf.

2.18 $E|X-a| = \int_{-\infty}^\infty|x-a|f(x)dx = \int_{-\infty}^a-(x-a)f(x)dx+\int_a^\infty(x-a)f(x)dx$. Then,
$\frac{d}{da}E|X-a| = \int_{-\infty}^af(x)dx-\int_a^\infty f(x)dx \stackrel{\text{set}}{=} 0$.
The solution to this equation is a = median. This is a minimum since $\frac{d^2}{da^2}E|X-a| = 2f(a)>0$.

2.19
$\frac{d}{da}E(X-a)^2 = \frac{d}{da}\int_{-\infty}^\infty(x-a)^2f_X(x)dx = \int_{-\infty}^\infty\frac{d}{da}(x-a)^2f_X(x)dx = \int_{-\infty}^\infty-2(x-a)f_X(x)dx = -2\left[\int_{-\infty}^\infty xf_X(x)dx-a\int_{-\infty}^\infty f_X(x)dx\right] = -2[EX-a]$.
Therefore if $\frac{d}{da}E(X-a)^2 = 0$ then $-2[EX-a] = 0$, which implies that EX = a. If EX = a then $\frac{d}{da}E(X-a)^2 = -2[EX-a] = -2[a-a] = 0$. EX = a is a minimum since $\frac{d^2}{da^2}E(X-a)^2 = 2>0$. The assumptions that are needed are the ones listed in Theorem 2.4.3.

2.20 From Example 1.5.4, if X = number of children until the first daughter, then $P(X = k) = (1-p)^{k-1}p$, where p = probability of a daughter. Thus X is a geometric random variable, and
$EX = \sum_{k=1}^\infty k(1-p)^{k-1}p = -p\sum_{k=1}^\infty\frac{d}{dp}(1-p)^k = -p\frac{d}{dp}\left[\sum_{k=0}^\infty(1-p)^k-1\right] = -p\frac{d}{dp}\left[\frac1p-1\right] = \frac1p$.
Therefore, if $p = \frac12$, the expected number of children is two.

2.21 Since g(x) is monotone (we take g increasing; for decreasing g the same argument holds with $\left|\frac{d}{dy}g^{-1}(y)\right|$ in place of $\frac{d}{dy}g^{-1}(y)$),
$Eg(X) = \int_{-\infty}^\infty g(x)f_X(x)dx = \int_{-\infty}^\infty yf_X(g^{-1}(y))\frac{d}{dy}g^{-1}(y)dy = \int_{-\infty}^\infty yf_Y(y)dy = EY$,
where the second equality follows from the change of variable $y = g(x)$, $x = g^{-1}(y)$ and $dx = \frac{d}{dy}g^{-1}(y)dy$.

2.22 a. Using integration by parts with $u = x$ and $dv = xe^{-x^2/\beta^2}dx$ we obtain that
$\int_0^\infty x^2e^{-x^2/\beta^2}dx = \frac{\beta^2}{2}\int_0^\infty e^{-x^2/\beta^2}dx$.
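The integral can be evaluated using the argument on pages 104–105 (see 3.3.14) or by transforming to a gamma kernel (use $y = x^2/\beta^2$). Therefore, $\int_0^\infty e^{-x^2/\beta^2}dx = \sqrt\pi\beta/2$, and hence the function integrates to 1.
b. $EX = 2\beta/\sqrt\pi$, $\quad EX^2 = 3\beta^2/2$, $\quad\mathrm{Var}X = \beta^2\left[\frac32-\frac4\pi\right]$.

2.23 a. Use Theorem 2.1.8 with $A_0 = \{0\}$, $A_1 = (-1,0)$ and $A_2 = (0,1)$. Then $g_1(x) = x^2$ on $A_1$ and $g_2(x) = x^2$ on $A_2$. Then
$f_Y(y) = \frac12y^{-1/2}$, $0<y<1$.
b. $EY = \int_0^1yf_Y(y)dy = \frac13$, $\quad EY^2 = \int_0^1y^2f_Y(y)dy = \frac15$, $\quad\mathrm{Var}Y = \frac15-\left(\frac13\right)^2 = \frac{4}{45}$.

2.24 a. $EX = \int_0^1x\,ax^{a-1}dx = \int_0^1ax^adx = \left.\frac{ax^{a+1}}{a+1}\right|_0^1 = \frac{a}{a+1}$.
$EX^2 = \int_0^1x^2ax^{a-1}dx = \int_0^1ax^{a+1}dx = \left.\frac{ax^{a+2}}{a+2}\right|_0^1 = \frac{a}{a+2}$.
$\mathrm{Var}X = \frac{a}{a+2}-\left(\frac{a}{a+1}\right)^2 = \frac{a}{(a+2)(a+1)^2}$.
b. $EX = \sum_{x=1}^n\frac xn = \frac1n\sum_{x=1}^nx = \frac1n\frac{n(n+1)}{2} = \frac{n+1}{2}$.
$EX^2 = \sum_{x=1}^n\frac{x^2}{n} = \frac1n\sum_{x=1}^nx^2 = \frac1n\frac{n(n+1)(2n+1)}{6} = \frac{(n+1)(2n+1)}{6}$.
$\mathrm{Var}X = \frac{(n+1)(2n+1)}{6}-\left(\frac{n+1}{2}\right)^2 = \frac{2n^2+3n+1}{6}-\frac{n^2+2n+1}{4} = \frac{n^2-1}{12}$.
c. $EX = \int_0^2x\frac32(x-1)^2dx = \frac32\int_0^2(x^3-2x^2+x)dx = 1$.
$EX^2 = \int_0^2x^2\frac32(x-1)^2dx = \frac32\int_0^2(x^4-2x^3+x^2)dx = \frac85$.
$\mathrm{Var}X = \frac85-1^2 = \frac35$.

2.25 a. $Y = -X$ and $g^{-1}(y) = -y$. Thus $f_Y(y) = f_X(g^{-1}(y))\left|\frac{d}{dy}g^{-1}(y)\right| = f_X(-y)|-1| = f_X(y)$ for every y.
b. To show that $M_X(t)$ is symmetric about 0 we must show that $M_X(0+\epsilon) = M_X(0-\epsilon)$ for all $\epsilon>0$. Using $f_X(-x) = f_X(x)$,
$M_X(0+\epsilon) = \int_{-\infty}^\infty e^{(0+\epsilon)x}f_X(x)dx = \int_{-\infty}^0e^{\epsilon x}f_X(x)dx+\int_0^\infty e^{\epsilon x}f_X(x)dx = \int_0^\infty e^{\epsilon(-x)}f_X(-x)dx+\int_{-\infty}^0e^{\epsilon(-x)}f_X(-x)dx = \int_{-\infty}^\infty e^{-\epsilon x}f_X(x)dx = \int_{-\infty}^\infty e^{(0-\epsilon)x}f_X(x)dx = M_X(0-\epsilon)$.

2.26 a. There are many examples; here are three. The standard normal pdf (Example 2.1.9) is symmetric about a = 0 because $(0-\epsilon)^2 = (0+\epsilon)^2$. The Cauchy pdf (Example 2.2.4) is symmetric about a = 0 because $(0-\epsilon)^2 = (0+\epsilon)^2$. The uniform(0, 1) pdf (Example 2.1.4) is symmetric about a = 1/2 because
$f((1/2)+\epsilon) = f((1/2)-\epsilon) = \begin{cases}1 & \text{if } 0<\epsilon<\frac12\\ 0 & \text{if } \frac12\le\epsilon<\infty.\end{cases}$
b.
$\int_a^\infty f(x)dx = \int_0^\infty f(a+\epsilon)d\epsilon$ (change variable, $\epsilon = x-a$)
$= \int_0^\infty f(a-\epsilon)d\epsilon$ ($f(a+\epsilon) = f(a-\epsilon)$ for all $\epsilon>0$)
$= \int_{-\infty}^af(x)dx$.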
(change variable, $x = a-\epsilon$)

$= \sum_{y=1}^n\frac{na}{(y-1)+(a+1)}\binom{n-1}{y-1}\frac{\binom{a+b-1}{a}}{\binom{(n-1)+(a+1)+b-1}{(y-1)+(a+1)}}$
$= \frac{na}{a+1}\frac{\binom{a+b-1}{a}}{\binom{a+1+b-1}{a+1}}\sum_{y=1}^n\frac{a+1}{(y-1)+(a+1)}\binom{n-1}{y-1}\frac{\binom{a+1+b-1}{a+1}}{\binom{(n-1)+(a+1)+b-1}{(y-1)+(a+1)}}$
$= \frac{na}{a+b}\sum_{j=0}^{n-1}\frac{a+1}{j+(a+1)}\binom{n-1}{j}\frac{\binom{a+1+b-1}{a+1}}{\binom{(n-1)+(a+1)+b-1}{j+(a+1)}} = \frac{na}{a+b}$,
since the last summation is 1, being the sum over all possible values of a beta-binomial(n − 1, a + 1, b). $E[Y(Y-1)] = \frac{n(n-1)a(a+1)}{(a+b)(a+b+1)}$ is calculated similarly to EY, but using the identity $y(y-1)\binom ny = n(n-1)\binom{n-2}{y-2}$ and adding 2 instead of 1 to the parameter a. The sum over all possible values of a beta-binomial(n − 2, a + 2, b) will appear in the calculation. Therefore
$\mathrm{Var}(Y) = E[Y(Y-1)]+EY-(EY)^2 = \frac{nab(n+a+b)}{(a+b)^2(a+b+1)}$.

2.30 a. $E(e^{tX}) = \int_0^ce^{tx}\frac1c\,dx = \left.\frac{1}{ct}e^{tx}\right|_0^c = \frac{1}{ct}e^{tc}-\frac{1}{ct}1 = \frac{1}{ct}(e^{tc}-1)$.
b. $E(e^{tX}) = \int_0^c\frac{2x}{c^2}e^{tx}dx = \frac{2}{c^2t^2}(cte^{tc}-e^{tc}+1)$ (integration by parts).
c.
$E(e^{tX}) = \int_{-\infty}^\alpha\frac{1}{2\beta}e^{(x-\alpha)/\beta}e^{tx}dx+\int_\alpha^\infty\frac{1}{2\beta}e^{-(x-\alpha)/\beta}e^{tx}dx$
$= \frac{e^{-\alpha/\beta}}{2\beta}\frac{1}{\frac1\beta+t}\left.e^{x(\frac1\beta+t)}\right|_{-\infty}^\alpha+\frac{-e^{\alpha/\beta}}{2\beta}\frac{1}{\frac1\beta-t}\left.e^{-x(\frac1\beta-t)}\right|_\alpha^\infty$
$= \frac{e^{\alpha t}}{2(1+\beta t)}+\frac{e^{\alpha t}}{2(1-\beta t)} = \frac{e^{\alpha t}}{1-\beta^2t^2}$, $-1/\beta<t<1/\beta$.
d. $E(e^{tX}) = \sum_{x=0}^\infty e^{tx}\binom{r+x-1}{x}p^r(1-p)^x = p^r\sum_{x=0}^\infty\binom{r+x-1}{x}\left((1-p)e^t\right)^x$. Now use the fact that
$\sum_{x=0}^\infty\binom{r+x-1}{x}\left((1-p)e^t\right)^x\left(1-(1-p)e^t\right)^r = 1$ for $(1-p)e^t<1$,
since this is just the sum of this pmf, to get
$E(e^{tX}) = \left(\frac{p}{1-(1-p)e^t}\right)^r$, $t<-\log(1-p)$.

2.31 Since the mgf is defined as $M_X(t) = Ee^{tX}$, we necessarily have $M_X(0) = Ee^0 = 1$. But $t/(1-t)$ is 0 at t = 0; therefore it cannot be an mgf.

2.32
$\left.\frac{d}{dt}S(t)\right|_{t=0} = \left.\frac{d}{dt}\log(M_X(t))\right|_{t=0} = \left.\frac{\frac{d}{dt}M_X(t)}{M_X(t)}\right|_{t=0} = \frac{EX}{1} = EX$ (since $M_X(0) = Ee^0 = 1$),
$\left.\frac{d^2}{dt^2}S(t)\right|_{t=0} = \left.\frac{d}{dt}\left(\frac{M_X'(t)}{M_X(t)}\right)\right|_{t=0} = \left.\frac{M_X(t)M_X''(t)-[M_X'(t)]^2}{[M_X(t)]^2}\right|_{t=0} = \frac{1\cdot EX^2-(EX)^2}{1} = \mathrm{Var}X$.

2.33 a. $M_X(t) = \sum_{x=0}^\infty e^{tx}\frac{e^{-\lambda}\lambda^x}{x!} = e^{-\lambda}\sum_{x=0}^\infty\frac{(e^t\lambda)^x}{x!} = e^{-\lambda}e^{\lambda e^t} = e^{\lambda(e^t-1)}$.
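$EX = \left.\frac{d}{dt}M_X(t)\right|_{t=0} = \left.e^{\lambda(e^t-1)}\lambda e^t\right|_{t=0} = \lambda$.
$EX^2 = \left.\frac{d^2}{dt^2}M_X(t)\right|_{t=0} = \left.\lambda e^te^{\lambda(e^t-1)}\lambda e^t+\lambda e^te^{\lambda(e^t-1)}\right|_{t=0} = \lambda^2+\lambda$.
$\mathrm{Var}X = EX^2-(EX)^2 = \lambda^2+\lambda-\lambda^2 = \lambda$.
b.
$M_X(t) = \sum_{x=0}^\infty e^{tx}p(1-p)^x = p\sum_{x=0}^\infty\left((1-p)e^t\right)^x = p\frac{1}{1-(1-p)e^t} = \frac{p}{1-(1-p)e^t}$, $t<-\log(1-p)$.
$EX = \left.\frac{d}{dt}M_X(t)\right|_{t=0} = \left.\frac{-p}{(1-(1-p)e^t)^2}\left(-(1-p)e^t\right)\right|_{t=0} = \frac{p(1-p)}{p^2} = \frac{1-p}{p}$.
$EX^2 = \left.\frac{d^2}{dt^2}M_X(t)\right|_{t=0} = \left.\frac{\left(1-(1-p)e^t\right)^2\left(p(1-p)e^t\right)+p(1-p)e^t\cdot 2\left(1-(1-p)e^t\right)(1-p)e^t}{\left(1-(1-p)e^t\right)^4}\right|_{t=0} = \frac{p^3(1-p)+2p^2(1-p)^2}{p^4} = \frac{p(1-p)+2(1-p)^2}{p^2}$.
$\mathrm{Var}X = \frac{p(1-p)+2(1-p)^2}{p^2}-\frac{(1-p)^2}{p^2} = \frac{1-p}{p^2}$.
c. $M_X(t) = \int_{-\infty}^\infty e^{tx}\frac{1}{\sqrt{2\pi}\sigma}e^{-(x-\mu)^2/2\sigma^2}dx = \frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^\infty e^{-(x^2-2\mu x-2\sigma^2tx+\mu^2)/2\sigma^2}dx$. Now complete the square in the numerator by writing
$x^2-2\mu x-2\sigma^2tx+\mu^2 = x^2-2(\mu+\sigma^2t)x\pm(\mu+\sigma^2t)^2+\mu^2 = (x-(\mu+\sigma^2t))^2-(\mu+\sigma^2t)^2+\mu^2 = (x-(\mu+\sigma^2t))^2-[2\mu\sigma^2t+(\sigma^2t)^2]$.
Then we have
$M_X(t) = e^{[2\mu\sigma^2t+(\sigma^2t)^2]/2\sigma^2}\frac{1}{\sqrt{2\pi}\sigma}\int_{-\infty}^\infty e^{-\frac{1}{2\sigma^2}(x-(\mu+\sigma^2t))^2}dx = e^{\mu t+\frac{\sigma^2t^2}{2}}$.
$EX = \left.\frac{d}{dt}M_X(t)\right|_{t=0} = \left.(\mu+\sigma^2t)e^{\mu t+\sigma^2t^2/2}\right|_{t=0} = \mu$.
$EX^2 = \left.\frac{d^2}{dt^2}M_X(t)\right|_{t=0} = \left.(\mu+\sigma^2t)^2e^{\mu t+\sigma^2t^2/2}+\sigma^2e^{\mu t+\sigma^2t^2/2}\right|_{t=0} = \mu^2+\sigma^2$.
$\mathrm{Var}X = \mu^2+\sigma^2-\mu^2 = \sigma^2$.

2.35 a.
$EX_1^r = \int_0^\infty x^r\frac{1}{\sqrt{2\pi}x}e^{-(\log x)^2/2}dx$ ($f_1$ is lognormal with $\mu = 0$, $\sigma^2 = 1$)
$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{y(r-1)}e^{-y^2/2}e^ydy$ (substitute $y = \log x$, $dy = (1/x)dx$)
$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-y^2/2+ry}dy = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty e^{-(y^2-2ry+r^2)/2}e^{r^2/2}dy = e^{r^2/2}$.
b.
$\int_0^\infty x^rf_1(x)\sin(2\pi\log x)dx = \int_0^\infty x^r\frac{1}{\sqrt{2\pi}x}e^{-(\log x)^2/2}\sin(2\pi\log x)dx$
$= \int_{-\infty}^\infty e^{(y+r)r}\frac{1}{\sqrt{2\pi}}e^{-(y+r)^2/2}\sin(2\pi y+2\pi r)dy$ (substitute $y = \log x-r$, $dy = (1/x)dx$)
$= \int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{(r^2-y^2)/2}\sin(2\pi y)dy$ ($\sin(a+2\pi r) = \sin(a)$ if $r = 0,1,2,\dots$)
$= 0$,
because $e^{(r^2-y^2)/2}\sin(2\pi y) = -e^{(r^2-(-y)^2)/2}\sin(2\pi(-y))$; the integrand is an odd function, so the negative integral cancels the positive one.

2.36 First, for t > 0 it can be shown that $\lim_{x\to\infty}e^{tx-(\log x)^2} = \infty$ by using l'Hôpital's rule to show
$\lim_{x\to\infty}\frac{tx-(\log x)^2}{tx} = 1$,
and, hence,
$\lim_{x\to\infty}\left[tx-(\log x)^2\right] = \lim_{x\to\infty}tx = \infty$.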
Then for any k > 0, there is a constant c > 0 such that
$\int_k^\infty\frac1xe^{tx}e^{-(\log x)^2/2}dx \ge c\int_k^\infty\frac1xdx = \left.c\log x\right|_k^\infty = \infty$.
Hence $M_X(t)$ does not exist.

2.37 a. The graph looks very similar to Figure 2.3.2 except that $f_1$ is symmetric around 0 (since it is standard normal).
b. The functions look like $t^2/2$; it is impossible to see any difference.
c. The mgf of $f_1$ is $e^{K_1(t)}$. The mgf of $f_2$ is $e^{K_2(t)}$.
d. Make the transformation $y = e^x$ to get the densities in Example 2.3.10.

2.39 a. $\frac{d}{dx}\int_0^xe^{-\lambda t}dt = e^{-\lambda x}$. Verify:
$\frac{d}{dx}\left[\int_0^xe^{-\lambda t}dt\right] = \frac{d}{dx}\left[\left.-\frac1\lambda e^{-\lambda t}\right|_0^x\right] = \frac{d}{dx}\left(-\frac1\lambda e^{-\lambda x}+\frac1\lambda\right) = e^{-\lambda x}$.
b. $\frac{d}{d\lambda}\int_0^\infty e^{-\lambda t}dt = \int_0^\infty\frac{d}{d\lambda}e^{-\lambda t}dt = \int_0^\infty-te^{-\lambda t}dt = -\frac{\Gamma(2)}{\lambda^2} = -\frac{1}{\lambda^2}$. Verify:
$\frac{d}{d\lambda}\int_0^\infty e^{-\lambda t}dt = \frac{d}{d\lambda}\frac1\lambda = -\frac{1}{\lambda^2}$.
c. $\frac{d}{dt}\int_t^1\frac{1}{x^2}dx = -\frac{1}{t^2}$. Verify:
$\frac{d}{dt}\left[\int_t^1\frac{1}{x^2}dx\right] = \frac{d}{dt}\left(\left.-\frac1x\right|_t^1\right) = \frac{d}{dt}\left(-1+\frac1t\right) = -\frac{1}{t^2}$.
d. $\frac{d}{dt}\int_1^\infty\frac{1}{(x-t)^2}dx = \int_1^\infty\frac{d}{dt}\left(\frac{1}{(x-t)^2}\right)dx = \int_1^\infty2(x-t)^{-3}dx = \left.-(x-t)^{-2}\right|_1^\infty = \frac{1}{(1-t)^2}$. Verify:
$\frac{d}{dt}\int_1^\infty(x-t)^{-2}dx = \frac{d}{dt}\left[\left.-(x-t)^{-1}\right|_1^\infty\right] = \frac{d}{dt}\frac{1}{1-t} = \frac{1}{(1-t)^2}$.

3.9 a. We can think of each one of the 60 children entering kindergarten as 60 independent Bernoulli trials with probability of success (a twin birth) of approximately $\frac{1}{90}$. The probability of having 5 or more successes approximates the probability of having 5 or more sets of twins entering kindergarten. Then $X\sim$ binomial(60, $\frac{1}{90}$) and
$P(X\ge 5) = 1-\sum_{x=0}^4\binom{60}{x}\left(\frac{1}{90}\right)^x\left(1-\frac{1}{90}\right)^{60-x} = .0006$,
which is small and may be rare enough to be newsworthy.
b. Let X be the number of elementary schools in New York state that have 5 or more sets of twins entering kindergarten. Then the probability of interest is $P(X\ge 1)$ where $X\sim$ binomial(310, .0006). Therefore $P(X\ge 1) = 1-P(X = 0) = .1698$.
c. Let X be the number of States that have 5 or more sets of twins entering kindergarten during any of the last ten years. Then the probability of interest is $P(X\ge 1)$ where $X\sim$ binomial(500, .1698). Therefore $P(X\ge 1) = 1-P(X = 0) = 1-3.90\times10^{-41}\approx 1$.
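The three binomial probabilities in 3.9 can be reproduced with a few lines of Python using only the standard library. This is a sketch: it plugs in the rounded values .0006 and .1698 for parts (b) and (c), as the solution does, and the helper name `binom_pmf` is our own, not something from the manual.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# (a) 60 children, twin-birth probability about 1/90 each: P(X >= 5)
p_a = 1 - sum(binom_pmf(x, 60, 1 / 90) for x in range(5))

# (b) 310 schools, each with probability .0006 (rounded value from part a)
p_b = 1 - binom_pmf(0, 310, 0.0006)

# (c) 500 state-years, each with probability .1698 (rounded value from part b)
q_c = binom_pmf(0, 500, 0.1698)  # P(X = 0), on the order of 1e-41
p_c = 1 - q_c                    # rounds to exactly 1.0 in floating point

print(f"(a) {p_a:.4f}   (b) {p_b:.4f}   (c) P(X = 0) = {q_c:.2e}")
```

Rounded to four decimals, parts (a) and (b) agree with the .0006 and .1698 quoted above. The exact value in (a) is a little below .0006, which is why the solution uses the rounded figure downstream.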
3.11 a. All limits in this part are taken as $M/N\to p$, $M\to\infty$, $N\to\infty$.
$\lim\frac{\binom Mx\binom{N-M}{K-x}}{\binom NK} = \frac{K!}{x!(K-x)!}\lim\frac{M!\,(N-M)!\,(N-K)!}{N!\,(M-x)!\,(N-M-(K-x))!}$.
In the limit, each of the factorial terms can be replaced by the approximation from Stirling's formula because, for example,
$M! = \left(M!/(\sqrt{2\pi}M^{M+1/2}e^{-M})\right)\sqrt{2\pi}M^{M+1/2}e^{-M}$ and $M!/(\sqrt{2\pi}M^{M+1/2}e^{-M})\to 1$.
When this replacement is made, all the $\sqrt{2\pi}$ and exponential terms cancel. Thus,
$\lim\frac{\binom Mx\binom{N-M}{K-x}}{\binom NK} = \binom Kx\lim\frac{M^{M+1/2}(N-M)^{N-M+1/2}(N-K)^{N-K+1/2}}{N^{N+1/2}(M-x)^{M-x+1/2}(N-M-K+x)^{N-M-(K-x)+1/2}}$.
We can evaluate the limit by breaking the ratio into seven terms, each of which has a finite limit we can evaluate. In some limits we use the fact that $M\to\infty$, $N\to\infty$ and $M/N\to p$ imply $N-M\to\infty$. The first term (of the seven terms) is
$\lim_{M\to\infty}\left(\frac{M}{M-x}\right)^M = \lim_{M\to\infty}\frac{1}{\left(\frac{M-x}{M}\right)^M} = \lim_{M\to\infty}\frac{1}{\left(1+\frac{-x}{M}\right)^M} = \frac{1}{e^{-x}} = e^x$.
Lemma 2.3.14 is used to get the penultimate equality. Similarly we get two more terms,
$\lim_{N-M\to\infty}\left(\frac{N-M}{N-M-(K-x)}\right)^{N-M} = e^{K-x}$ and $\lim_{N\to\infty}\left(\frac{N-K}{N}\right)^N = e^{-K}$.
Note, the product of these three limits is one. Three other terms are
$\lim_{M\to\infty}\left(\frac{M}{M-x}\right)^{1/2} = 1$, $\quad\lim_{N-M\to\infty}\left(\frac{N-M}{N-M-(K-x)}\right)^{1/2} = 1$ $\quad$ and $\quad\lim_{N\to\infty}\left(\frac{N-K}{N}\right)^{1/2} = 1$.
The only term left is
$\lim\frac{(M-x)^x(N-M-(K-x))^{K-x}}{(N-K)^K} = \lim\left(\frac{M-x}{N-K}\right)^x\left(\frac{N-M-(K-x)}{N-K}\right)^{K-x} = p^x(1-p)^{K-x}$.
b. If in (a) we in addition have $K\to\infty$, $p\to 0$, $MK/N\to pK\to\lambda$, by the Poisson approximation to the binomial, we heuristically get
$\frac{\binom Mx\binom{N-M}{K-x}}{\binom NK}\to\binom Kxp^x(1-p)^{K-x}\to\frac{e^{-\lambda}\lambda^x}{x!}$.
c. Using Stirling's formula as in (a), with all limits taken as $N, M, K\to\infty$, $\frac MN\to 0$, $\frac{KM}{N}\to\lambda$, we get
$\lim\frac{\binom Mx\binom{N-M}{K-x}}{\binom NK} = \lim\frac{e^{-x}}{x!}\frac{K^xe^xM^xe^x(N-M)^{K-x}e^{K-x}}{N^Ke^K} = \frac{1}{x!}\lim\left(\frac{KM}{N}\right)^x\left(\frac{N-M}{N}\right)^{K-x} = \frac{1}{x!}\lambda^x\lim\left(1-\frac{MK/N}{K}\right)^K = \frac{e^{-\lambda}\lambda^x}{x!}$.

3.12 Consider a sequence of Bernoulli trials with success probability p.
Define $X$ = number of successes in the first $n$ trials and $Y$ = number of failures before the $r$th success. Then $X$ and $Y$ have the specified binomial and negative binomial distributions, respectively. And we have
$$\begin{aligned}F_X(r-1) &= P(X\le r-1)\\ &= P(r\text{th success on }(n+1)\text{st or later trial})\\ &= P(\text{at least } n+1-r \text{ failures before the } r\text{th success})\\ &= P(Y\ge n-r+1) = 1-P(Y\le n-r) = 1-F_Y(n-r).\end{aligned}$$

3.13 For any $X$ with support $0, 1, \ldots$, the mean and variance of the 0-truncated $X_T$ are given by
$$EX_T = \sum_{x=1}^\infty xP(X_T=x) = \sum_{x=1}^\infty x\frac{P(X=x)}{P(X>0)} = \frac{1}{P(X>0)}\sum_{x=1}^\infty xP(X=x) = \frac{1}{P(X>0)}\sum_{x=0}^\infty xP(X=x) = \frac{EX}{P(X>0)}.$$
In a similar way we get $EX_T^2 = \frac{EX^2}{P(X>0)}$. Thus,
$$\text{Var}X_T = \frac{EX^2}{P(X>0)}-\left(\frac{EX}{P(X>0)}\right)^2.$$
a. For Poisson$(\lambda)$, $P(X>0) = 1-P(X=0) = 1-\frac{e^{-\lambda}\lambda^0}{0!} = 1-e^{-\lambda}$. Therefore
$$P(X_T=x) = \frac{e^{-\lambda}\lambda^x}{x!(1-e^{-\lambda})},\quad x = 1,2,\ldots,$$
$$EX_T = \lambda/(1-e^{-\lambda}),\qquad\text{Var}X_T = (\lambda^2+\lambda)/(1-e^{-\lambda})-(\lambda/(1-e^{-\lambda}))^2.$$
b. For negative binomial$(r, p)$, $P(X>0) = 1-P(X=0) = 1-\binom{r-1}{0}p^r(1-p)^0 = 1-p^r$. Then
$$P(X_T=x) = \frac{\binom{r+x-1}{x}p^r(1-p)^x}{1-p^r},\quad x = 1,2,\ldots,$$
$$EX_T = \frac{r(1-p)}{p(1-p^r)},\qquad\text{Var}X_T = \frac{r(1-p)+r^2(1-p)^2}{p^2(1-p^r)}-\left[\frac{r(1-p)}{p(1-p^r)}\right]^2.$$

3.14 a. $\sum_{x=1}^\infty\frac{-(1-p)^x}{x\log p} = \frac{1}{\log p}\sum_{x=1}^\infty\frac{-(1-p)^x}{x} = 1$, since the sum is the Taylor series for $\log p = \log(1-(1-p))$.
b.
$$EX = \frac{-1}{\log p}\left[\sum_{x=1}^\infty(1-p)^x\right] = \frac{-1}{\log p}\left[\sum_{x=0}^\infty(1-p)^x-1\right] = \frac{-1}{\log p}\left[\frac1p-1\right] = \frac{-1}{\log p}\left(\frac{1-p}{p}\right).$$
Since the geometric series converges uniformly,
$$EX^2 = \frac{-1}{\log p}\sum_{x=1}^\infty x(1-p)^x = \frac{1-p}{\log p}\sum_{x=1}^\infty\frac{d}{dp}(1-p)^x = \frac{1-p}{\log p}\frac{d}{dp}\sum_{x=1}^\infty(1-p)^x = \frac{1-p}{\log p}\frac{d}{dp}\left[\frac{1-p}{p}\right] = \frac{-(1-p)}{p^2\log p}.$$
Thus
$$\text{Var}X = \frac{-(1-p)}{p^2\log p}\left[1+\frac{1-p}{\log p}\right].$$
Alternatively, the mgf can be calculated,
$$M_X(t) = \frac{-1}{\log p}\sum_{x=1}^\infty\frac{[(1-p)e^t]^x}{x} = \frac{\log(1+pe^t-e^t)}{\log p},$$
and can be differentiated to obtain the moments.

e. The double exponential$(\mu,\sigma)$ pdf is symmetric about $\mu$. Thus, by Exercise 2.26, $EX = \mu$.
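The closed forms in 3.13(a) and 3.14(b) can be checked against brute-force sums over the pmf. The Python sketch below does this. It uses arbitrary parameter values ($\lambda = 2$, $p = .4$) and truncates the infinite sums where the remaining terms are negligible. The helper names are hypothetical.

```python
from math import exp, factorial, log

def truncated_poisson_mean(lam: float, terms: int = 60) -> float:
    """E X_T for the 0-truncated Poisson(lam), summed directly over x = 1, 2, ..."""
    total = sum(x * exp(-lam) * lam**x / factorial(x) for x in range(1, terms))
    return total / (1 - exp(-lam))

def log_series_moments(p: float, terms: int = 400):
    """Total mass, E X and Var X for P(X = x) = -(1 - p)^x / (x log p), x >= 1."""
    xs = range(1, terms)
    pmf = [-(1 - p) ** x / (x * log(p)) for x in xs]
    mean = sum(x * f for x, f in zip(xs, pmf))
    second = sum(x * x * f for x, f in zip(xs, pmf))
    return sum(pmf), mean, second - mean**2

lam, p = 2.0, 0.4
mass, mean, var = log_series_moments(p)
print(truncated_poisson_mean(lam), lam / (1 - exp(-lam)))
print(mass, mean, -(1 - p) / (p * log(p)))
print(var, -(1 - p) / (p**2 * log(p)) * (1 + (1 - p) / log(p)))
```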
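$$\text{Var}X = \int_{-\infty}^\infty(x-\mu)^2\frac{1}{2\sigma}e^{-|x-\mu|/\sigma}dx = \int_{-\infty}^\infty\sigma^2z^2\frac12e^{-|z|}dz = \sigma^2\int_0^\infty z^2e^{-z}dz = \sigma^2\Gamma(3) = 2\sigma^2.$$

3.23 a. $\int_\alpha^\infty x^{-\beta-1}dx = \frac{-1}{\beta}x^{-\beta}\Big|_\alpha^\infty = \frac{1}{\beta\alpha^\beta}$, thus $f(x)$ integrates to 1.
b. $EX^n = \frac{\beta\alpha^n}{\beta-n}$ for $n<\beta$, therefore
$$EX = \frac{\alpha\beta}{\beta-1},\qquad EX^2 = \frac{\alpha^2\beta}{\beta-2},\qquad\text{Var}X = \frac{\alpha^2\beta}{\beta-2}-\frac{(\alpha\beta)^2}{(\beta-1)^2}.$$
c. If $\beta\le2$ the integral of the second moment is infinite.

3.24 a. $f_X(x) = \frac1\beta e^{-x/\beta}$, $x>0$. For $Y = X^{1/\gamma}$, $f_Y(y) = \frac\gamma\beta e^{-y^\gamma/\beta}y^{\gamma-1}$, $y>0$. Using the transformation $z = y^\gamma/\beta$, we calculate
$$EY^n = \frac\gamma\beta\int_0^\infty y^{\gamma+n-1}e^{-y^\gamma/\beta}dy = \beta^{n/\gamma}\int_0^\infty z^{n/\gamma}e^{-z}dz = \beta^{n/\gamma}\Gamma\left(\frac n\gamma+1\right).$$
Thus $EY = \beta^{1/\gamma}\Gamma(\frac1\gamma+1)$ and $\text{Var}Y = \beta^{2/\gamma}\left[\Gamma\left(\frac2\gamma+1\right)-\Gamma^2\left(\frac1\gamma+1\right)\right]$.
b. $f_X(x) = \frac1\beta e^{-x/\beta}$, $x>0$. For $Y = (2X/\beta)^{1/2}$, $f_Y(y) = ye^{-y^2/2}$, $y>0$. We now notice that
$$EY = \int_0^\infty y^2e^{-y^2/2}dy = \frac{\sqrt{2\pi}}{2},$$
since $\frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty y^2e^{-y^2/2}dy = 1$ (the variance of a standard normal) and the integrand is symmetric. Use integration by parts to calculate the second moment
$$EY^2 = \int_0^\infty y^3e^{-y^2/2}dy = 2\int_0^\infty ye^{-y^2/2}dy = 2,$$
where we take $u = y^2$, $dv = ye^{-y^2/2}dy$. Thus $\text{Var}Y = 2(1-\pi/4)$.
c. The gamma$(a,b)$ density is
$$f_X(x) = \frac{1}{\Gamma(a)b^a}x^{a-1}e^{-x/b}.$$
Make the transformation $y = 1/x$ with $dx = -dy/y^2$ to get
$$f_Y(y) = f_X(1/y)|1/y^2| = \frac{1}{\Gamma(a)b^a}\left(\frac1y\right)^{a+1}e^{-1/(by)}.$$
The first two moments are
$$EY = \frac{1}{\Gamma(a)b^a}\int_0^\infty\left(\frac1y\right)^ae^{-1/(by)}dy = \frac{\Gamma(a-1)b^{a-1}}{\Gamma(a)b^a} = \frac{1}{(a-1)b},\qquad EY^2 = \frac{\Gamma(a-2)b^{a-2}}{\Gamma(a)b^a} = \frac{1}{(a-1)(a-2)b^2},$$
and so $\text{Var}Y = \frac{1}{(a-1)^2(a-2)b^2}$.
d. $f_X(x) = \frac{1}{\Gamma(3/2)\beta^{3/2}}x^{3/2-1}e^{-x/\beta}$, $x>0$. For $Y = (X/\beta)^{1/2}$, $f_Y(y) = \frac{2}{\Gamma(3/2)}y^2e^{-y^2}$, $y>0$. To calculate the moments we use integration by parts with $u = y^2$, $dv = ye^{-y^2}dy$ to obtain
$$EY = \frac{2}{\Gamma(3/2)}\int_0^\infty y^3e^{-y^2}dy = \frac{2}{\Gamma(3/2)}\int_0^\infty ye^{-y^2}dy = \frac{1}{\Gamma(3/2)},$$
and with $u = y^3$, $dv = ye^{-y^2}dy$ to obtain
$$EY^2 = \frac{2}{\Gamma(3/2)}\int_0^\infty y^4e^{-y^2}dy = \frac{3}{\Gamma(3/2)}\int_0^\infty y^2e^{-y^2}dy = \frac{3}{\Gamma(3/2)}\,\frac{\sqrt\pi}{4} = \frac32.$$
Here we used the fact that $\frac{1}{\sqrt\pi}\int_{-\infty}^\infty y^2e^{-y^2}dy = \frac12$, since it is the variance of a n$(0, 1/2)$. Symmetry then yields $\int_0^\infty y^2e^{-y^2}dy = \frac{\sqrt\pi}{4}$. Thus, using $\Gamma(3/2) = \frac12\sqrt\pi$, $\text{Var}Y = \frac32-\frac4\pi$.
e. $f_X(x) = e^{-x}$, $x>0$. For $Y = \alpha-\gamma\log X$,
$$f_Y(y) = e^{-e^{\frac{\alpha-y}{\gamma}}}e^{\frac{\alpha-y}{\gamma}}\frac1\gamma,\quad-\infty<y<\infty.$$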
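The following Python sketch checks the moments in 3.24(b) and (d) numerically, using only the standard library. The midpoint-rule integrator and the cutoff at $y = 12$ are illustrative choices; the densities beyond that point are negligible. In particular, the check is consistent with $\text{Var}Y = \frac32-\frac4\pi$ in part (d), as expected since $Y^2 = X/\beta$ is gamma(3/2, 1).

```python
from math import exp, gamma, pi

def moment(n: int, density, upper: float = 12.0, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of y**n * density(y) over (0, upper)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** n * density((i + 0.5) * h) for i in range(steps))

def f_d(y: float) -> float:
    """3.24(d): f_Y(y) = 2 / Gamma(3/2) * y^2 * exp(-y^2), y > 0."""
    return 2 / gamma(1.5) * y**2 * exp(-y**2)

def f_b(y: float) -> float:
    """3.24(b): f_Y(y) = y * exp(-y^2 / 2), y > 0."""
    return y * exp(-y**2 / 2)

mean_d = moment(1, f_d)
var_d = moment(2, f_d) - mean_d**2
var_b = moment(2, f_b) - moment(1, f_b) ** 2
print(mean_d, 1 / gamma(1.5))    # E Y = 1/Gamma(3/2) = 2/sqrt(pi)
print(var_d, 1.5 - 4 / pi)       # Var Y = 3/2 - 4/pi
print(var_b, 2 * (1 - pi / 4))   # Var Y = 2(1 - pi/4)
```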
Calculation of $EY$ and $EY^2$ cannot be done in closed form. If we define
$$I_1 = \int_0^\infty\log x\,e^{-x}dx,\qquad I_2 = \int_0^\infty(\log x)^2e^{-x}dx,$$
then $EY = E(\alpha-\gamma\log X) = \alpha-\gamma I_1$ and $EY^2 = E(\alpha-\gamma\log X)^2 = \alpha^2-2\alpha\gamma I_1+\gamma^2I_2$. The constant $-I_1 = .5772157$ is called Euler's constant.

3.25 Note that if $T$ is continuous then
$$P(t\le T\le t+\delta\,|\,t\le T) = \frac{P(t\le T\le t+\delta,\ t\le T)}{P(t\le T)} = \frac{P(t\le T\le t+\delta)}{P(t\le T)} = \frac{F_T(t+\delta)-F_T(t)}{1-F_T(t)}.$$
Therefore, from the definition of the derivative,
$$h_T(t) = \frac{1}{1-F_T(t)}\lim_{\delta\to0}\frac{F_T(t+\delta)-F_T(t)}{\delta} = \frac{F_T'(t)}{1-F_T(t)} = \frac{f_T(t)}{1-F_T(t)}.$$
Also,
$$-\frac{d}{dt}\left(\log[1-F_T(t)]\right) = -\frac{1}{1-F_T(t)}(-f_T(t)) = h_T(t).$$

3.26 a. $f_T(t) = \frac1\beta e^{-t/\beta}$ and $F_T(t) = \int_0^t\frac1\beta e^{-x/\beta}dx = -e^{-x/\beta}\Big|_0^t = 1-e^{-t/\beta}$. Thus,
$$h_T(t) = \frac{f_T(t)}{1-F_T(t)} = \frac{(1/\beta)e^{-t/\beta}}{1-(1-e^{-t/\beta})} = \frac1\beta.$$
b. $f_T(t) = \frac\gamma\beta t^{\gamma-1}e^{-t^\gamma/\beta}$, $t\ge0$, and
$$F_T(t) = \int_0^t\frac\gamma\beta x^{\gamma-1}e^{-x^\gamma/\beta}dx = \int_0^{t^\gamma/\beta}e^{-u}du = -e^{-u}\Big|_0^{t^\gamma/\beta} = 1-e^{-t^\gamma/\beta},$$
where $u = x^\gamma/\beta$. Thus,
$$h_T(t) = \frac{(\gamma/\beta)t^{\gamma-1}e^{-t^\gamma/\beta}}{e^{-t^\gamma/\beta}} = \frac\gamma\beta t^{\gamma-1}.$$
c. $F_T(t) = \frac{1}{1+e^{-(t-\mu)/\beta}}$ and $f_T(t) = \frac1\beta\frac{e^{-(t-\mu)/\beta}}{(1+e^{-(t-\mu)/\beta})^2}$. Thus,
$$h_T(t) = \frac{\frac1\beta e^{-(t-\mu)/\beta}\big/(1+e^{-(t-\mu)/\beta})^2}{e^{-(t-\mu)/\beta}\big/(1+e^{-(t-\mu)/\beta})} = \frac1\beta F_T(t).$$

3.27 a. The uniform pdf satisfies the inequalities of Exercise 2.27, hence is unimodal.
b. For the gamma$(\alpha,\beta)$ pdf $f(x)$, ignoring constants, $\frac{d}{dx}f(x) = \frac{x^{\alpha-2}e^{-x/\beta}}{\beta}[\beta(\alpha-1)-x]$, which only has one sign change. Hence the pdf is unimodal with mode $\beta(\alpha-1)$.
c. For the n$(\mu,\sigma^2)$ pdf $f(x)$, ignoring constants, $\frac{d}{dx}f(x) = -\frac{x-\mu}{\sigma^2}e^{-(x-\mu)^2/2\sigma^2}$, which only has one sign change. Hence the pdf is unimodal with mode $\mu$.
d. For the beta$(\alpha,\beta)$ pdf $f(x)$, ignoring constants,
$$\frac{d}{dx}f(x) = x^{\alpha-2}(1-x)^{\beta-2}[(\alpha-1)-x(\alpha+\beta-2)],$$
which only has one sign change. Hence the pdf is unimodal with mode $\frac{\alpha-1}{\alpha+\beta-2}$.

3.28 a. (i) $\mu$ known,
$$f(x|\sigma^2) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(\frac{-1}{2\sigma^2}(x-\mu)^2\right),$$
$h(x) = 1$, $c(\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}I_{(0,\infty)}(\sigma^2)$, $w_1(\sigma^2) = -\frac{1}{2\sigma^2}$, $t_1(x) = (x-\mu)^2$.
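The hazard functions in 3.26 can be spot-checked numerically against $h_T(t) = f_T(t)/(1-F_T(t))$ from 3.25. The constant $I_1$ in 3.24(e) can be evaluated after the substitution $x = e^s$, which removes the logarithmic singularity at 0. Below is a Python sketch with arbitrary parameter values; the quadrature settings are illustrative choices.

```python
from math import exp

beta, gam, mu = 2.0, 1.5, 0.5   # arbitrary parameter values for the check

def hazard(pdf, cdf, t: float) -> float:
    """h_T(t) = f_T(t) / (1 - F_T(t)), as derived in 3.25."""
    return pdf(t) / (1 - cdf(t))

def weibull_pdf(t: float) -> float:
    return (gam / beta) * t ** (gam - 1) * exp(-t**gam / beta)

def weibull_cdf(t: float) -> float:
    return 1 - exp(-t**gam / beta)

def logistic_cdf(t: float) -> float:
    return 1 / (1 + exp(-(t - mu) / beta))

def logistic_pdf(t: float) -> float:
    z = exp(-(t - mu) / beta)
    return z / (beta * (1 + z) ** 2)

def euler_integrand(s: float) -> float:
    """log(x) e^{-x} dx rewritten with x = e^s, dx = e^s ds."""
    return s * exp(s - exp(s))

h, lo, steps = 1e-3, -40.0, 45_000   # integrate s over (-40, 5)
I1 = h * sum(euler_integrand(lo + (i + 0.5) * h) for i in range(steps))

for t in (0.3, 1.0, 2.5):
    print(t, hazard(weibull_pdf, weibull_cdf, t), (gam / beta) * t ** (gam - 1),
          hazard(logistic_pdf, logistic_cdf, t), logistic_cdf(t) / beta)
print(I1)   # about -0.5772157, so -I_1 is Euler's constant
```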
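(ii) $\sigma^2$ known,
$$f(x|\mu) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{x^2}{2\sigma^2}\right)\exp\left(-\frac{\mu^2}{2\sigma^2}\right)\exp\left(\mu\frac{x}{\sigma^2}\right),$$
$h(x) = \exp\left(\frac{-x^2}{2\sigma^2}\right)$, $c(\mu) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(\frac{-\mu^2}{2\sigma^2}\right)$, $w_1(\mu) = \mu$, $t_1(x) = \frac{x}{\sigma^2}$.
b. (i) $\alpha$ known,
$$f(x|\beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}e^{-x/\beta},$$
$h(x) = \frac{x^{\alpha-1}}{\Gamma(\alpha)}$, $x>0$, $c(\beta) = \frac{1}{\beta^\alpha}$, $w_1(\beta) = \frac1\beta$, $t_1(x) = -x$.
(ii) $\beta$ known,
$$f(x|\alpha) = e^{-x/\beta}\frac{1}{\Gamma(\alpha)\beta^\alpha}\exp((\alpha-1)\log x),$$
$h(x) = e^{-x/\beta}$, $x>0$, $c(\alpha) = \frac{1}{\Gamma(\alpha)\beta^\alpha}$, $w_1(\alpha) = \alpha-1$, $t_1(x) = \log x$.
(iii) $\alpha, \beta$ unknown,
$$f(x|\alpha,\beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\exp\left((\alpha-1)\log x-\frac x\beta\right),$$
$h(x) = I_{\{x>0\}}(x)$, $c(\alpha,\beta) = \frac{1}{\Gamma(\alpha)\beta^\alpha}$, $w_1(\alpha) = \alpha-1$, $t_1(x) = \log x$, $w_2(\alpha,\beta) = -1/\beta$, $t_2(x) = x$.
c. (i) $\alpha$ known, $h(x) = x^{\alpha-1}I_{[0,1]}(x)$, $c(\beta) = \frac{1}{B(\alpha,\beta)}$, $w_1(\beta) = \beta-1$, $t_1(x) = \log(1-x)$.
(ii) $\beta$ known, $h(x) = (1-x)^{\beta-1}I_{[0,1]}(x)$, $c(\alpha) = \frac{1}{B(\alpha,\beta)}$, $w_1(\alpha) = \alpha-1$, $t_1(x) = \log x$.

c. (i) $h(x) = \frac1xI_{\{0<x<\infty\}}(x)$, $c(\alpha) = \frac{\alpha^\alpha}{\Gamma(\alpha)}$, $\alpha>0$, $w_1(\alpha) = \alpha$, $w_2(\alpha) = \alpha$, $t_1(x) = \log(x)$, $t_2(x) = -x$.
(ii) A line.
d. (i) $h(x) = C\exp(-x^4)I_{\{-\infty<x<\infty\}}(x)$, $c(\theta) = \exp(-\theta^4)$, $-\infty<\theta<\infty$, $w_1(\theta) = \theta$, $w_2(\theta) = \theta^2$, $w_3(\theta) = \theta^3$, $t_1(x) = 4x^3$, $t_2(x) = -6x^2$, $t_3(x) = 4x$.
(ii) The curve is a spiral in 3-space.
(iii) A good picture can be generated with the Mathematica statement
ParametricPlot3D[{t, t^2, t^3}, {t, 0, 1}, ViewPoint -> {1, -2, 2.5}]

3.35 a. In Exercise 3.34(a), $w_1(\lambda) = \frac{1}{2\lambda}$, and for a n$(e^\theta, e^\theta)$, $w_1(\theta) = \frac{1}{2e^\theta}$.
b. $EX = \mu = \alpha\beta$, so $\beta = \frac\mu\alpha$. Therefore $h(x) = \frac1xI_{\{0<x<\infty\}}(x)$, $c(\alpha) = \frac{1}{\Gamma(\alpha)(\mu/\alpha)^\alpha}$, $\alpha>0$, $w_1(\alpha) = \alpha$, $w_2(\alpha) = \frac\alpha\mu$, $t_1(x) = \log(x)$, $t_2(x) = -x$.
c. From (b), $(\alpha_1,\ldots,\alpha_n,\beta_1,\ldots,\beta_n) = (\alpha_1,\ldots,\alpha_n,\frac{\alpha_1}{\mu},\ldots,\frac{\alpha_n}{\mu})$.

3.37 The pdf $\frac1\sigma f\left(\frac{x-\mu}{\sigma}\right)$ is symmetric about $\mu$ because, for any $\epsilon>0$,
$$\frac1\sigma f\left(\frac{(\mu+\epsilon)-\mu}{\sigma}\right) = \frac1\sigma f\left(\frac\epsilon\sigma\right) = \frac1\sigma f\left(-\frac\epsilon\sigma\right) = \frac1\sigma f\left(\frac{(\mu-\epsilon)-\mu}{\sigma}\right).$$
Thus, by Exercise 2.26b, $\mu$ is the median.

3.38 $P(X>x_\alpha) = P(\sigma Z+\mu>\sigma z_\alpha+\mu) = P(Z>z_\alpha)$ by Theorem 3.5.6.

3.39 First take $\mu = 0$ and $\sigma = 1$.
a. The pdf is symmetric about 0, so 0 must be the median. Verifying this, write
$$P(Z\ge0) = \int_0^\infty\frac1\pi\frac{1}{1+z^2}dz = \frac1\pi\tan^{-1}(z)\Big|_0^\infty = \frac1\pi\left(\frac\pi2-0\right) = \frac12.$$
b.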
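The sign bookkeeping in 3.34(d) is easy to get wrong. Assume the density there is $f(x|\theta) = C\exp(-(x-\theta)^4)$; the exercise statement is not reproduced here. The Python sketch below checks numerically that $h(x)c(\theta)\exp\{\sum_i w_i(\theta)t_i(x)\}$ reproduces the kernel $\exp(-(x-\theta)^4)$. It uses $h(x) = C\exp(-x^4)$, $c(\theta) = \exp(-\theta^4)$, $w_i(\theta) = \theta^i$, $t_1(x) = 4x^3$, $t_2(x) = -6x^2$, $t_3(x) = 4x$, and the constant $C$ cancels.

```python
from itertools import product
from math import exp

def kernel(x: float, theta: float) -> float:
    """exp(-(x - theta)^4); the normalizing constant C is omitted on both sides."""
    return exp(-(x - theta) ** 4)

def factored(x: float, theta: float) -> float:
    h = exp(-x**4)                     # h(x) without C
    c = exp(-theta**4)                 # c(theta)
    w = (theta, theta**2, theta**3)    # w_1, w_2, w_3
    t = (4 * x**3, -6 * x**2, 4 * x)   # t_1, t_2, t_3
    return h * c * exp(sum(wi * ti for wi, ti in zip(w, t)))

grid = [-1.5, -0.7, 0.0, 0.4, 1.2]
worst = max(abs(factored(x, th) / kernel(x, th) - 1) for x, th in product(grid, grid))
print(worst)   # the ratio should be 1 up to rounding error
```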
$P(Z\ge1) = \frac1\pi\tan^{-1}(z)\Big|_1^\infty = \frac1\pi\left(\frac\pi2-\frac\pi4\right) = \frac14$. By symmetry this is also equal to $P(Z\le-1)$. Writing $z = (x-\mu)/\sigma$ establishes $P(X\ge\mu) = \frac12$ and $P(X\ge\mu+\sigma) = \frac14$.

3.40 Let $X\sim f(x)$ have mean $\mu$ and variance $\sigma^2$. Let $Z = \frac{X-\mu}{\sigma}$. Then
$$EZ = \left(\frac1\sigma\right)E(X-\mu) = 0\quad\text{and}\quad\text{Var}Z = \text{Var}\left(\frac{X-\mu}{\sigma}\right) = \left(\frac{1}{\sigma^2}\right)\text{Var}(X-\mu) = \left(\frac{1}{\sigma^2}\right)\text{Var}X = \frac{\sigma^2}{\sigma^2} = 1.$$
Then compute the pdf of $Z$, $f_Z(z) = f_X(\sigma z+\mu)\cdot\sigma = \sigma f_X(\sigma z+\mu)$, and use $f_Z(z)$ as the standard pdf.

3.41 a. This is a special case of Exercise 3.42a.
b. This is a special case of Exercise 3.42b.

3.42 a. Let $\theta_1>\theta_2$. Let $X_1\sim f(x-\theta_1)$ and $X_2\sim f(x-\theta_2)$. Let $F(z)$ be the cdf corresponding to $f(z)$ and let $Z\sim f(z)$. Then
$$F(x\,|\,\theta_1) = P(X_1\le x) = P(Z+\theta_1\le x) = P(Z\le x-\theta_1) = F(x-\theta_1)\le F(x-\theta_2) = P(Z\le x-\theta_2) = P(Z+\theta_2\le x) = P(X_2\le x) = F(x\,|\,\theta_2).$$
The inequality holds because $x-\theta_2>x-\theta_1$ and $F$ is nondecreasing. To get strict inequality for some $x$, let $(a,b]$ be an interval of length $\theta_1-\theta_2$ with $P(a<Z\le b) = F(b)-F(a)>0$. Let $x = a+\theta_1$. Then
$$F(x\,|\,\theta_1) = F(x-\theta_1) = F(a+\theta_1-\theta_1) = F(a)<F(b) = F(a+\theta_1-\theta_2) = F(x-\theta_2) = F(x\,|\,\theta_2).$$
b. Let $\sigma_1>\sigma_2$. Let $X_1\sim f(x/\sigma_1)$ and $X_2\sim f(x/\sigma_2)$. Let $F(z)$ be the cdf corresponding to $f(z)$ and let $Z\sim f(z)$. Then, for $x>0$,
$$F(x\,|\,\sigma_1) = P(X_1\le x) = P(\sigma_1Z\le x) = P(Z\le x/\sigma_1) = F(x/\sigma_1)\le F(x/\sigma_2) = P(Z\le x/\sigma_2) = P(\sigma_2Z\le x) = P(X_2\le x) = F(x\,|\,\sigma_2).$$
The inequality holds because $x/\sigma_2>x/\sigma_1$ (since $x>0$ and $\sigma_1>\sigma_2>0$) and $F$ is nondecreasing. For $x\le0$, $F(x\,|\,\sigma_1) = P(X_1\le x) = 0 = P(X_2\le x) = F(x\,|\,\sigma_2)$. To get strict inequality for some $x$, let $(a,b]$ be an interval such that $a>0$, $b/a = \sigma_1/\sigma_2$ and $P(a<Z\le b) = F(b)-F(a)>0$. Let $x = a\sigma_1$. Then
$$F(x\,|\,\sigma_1) = F(x/\sigma_1) = F(a\sigma_1/\sigma_1) = F(a)<F(b) = F(a\sigma_1/\sigma_2) = F(x/\sigma_2) = F(x\,|\,\sigma_2).$$

3.43 a. $F_Y(y|\theta) = 1-F_X(\frac1y|\theta)$, $y>0$, by Theorem 2.1.3.
For $\theta_1>\theta_2$,
$$F_Y(y|\theta_1) = 1-F_X\left(\frac1y\,\Big|\,\theta_1\right)\ge1-F_X\left(\frac1y\,\Big|\,\theta_2\right) = F_Y(y|\theta_2)$$
for all $y$. This holds because $F_X(x|\theta)$ is stochastically increasing, so if $\theta_1>\theta_2$ then $F_X(x|\theta_1)\le F_X(x|\theta_2)$ for all $x$. Similarly, $F_Y(y|\theta_1) = 1-F_X(\frac1y|\theta_1)>1-F_X(\frac1y|\theta_2) = F_Y(y|\theta_2)$ for some $y$, since if $\theta_1>\theta_2$, $F_X(x|\theta_1)<F_X(x|\theta_2)$ for some $x$. Thus $F_Y(y|\theta)$ is stochastically decreasing in $\theta$.
b. $F_X(x|\theta)$ is stochastically increasing in $\theta$. If $\theta_1>\theta_2$ and $\theta_1,\theta_2>0$ then $\frac{1}{\theta_2}>\frac{1}{\theta_1}$. Therefore $F_X(x|\frac{1}{\theta_1})\ge F_X(x|\frac{1}{\theta_2})$ for all $x$ and $F_X(x|\frac{1}{\theta_1})>F_X(x|\frac{1}{\theta_2})$ for some $x$. Thus $F_X(x|\frac1\theta)$ is stochastically decreasing in $\theta$.

3.44 The function $g(x) = |x|$ is a nonnegative function. So by Chebychev's Inequality, $P(|X|\ge b)\le E|X|/b$. Also, $P(|X|\ge b) = P(X^2\ge b^2)$. Since $g(x) = x^2$ is also nonnegative, again by Chebychev's Inequality we have $P(|X|\ge b) = P(X^2\ge b^2)\le EX^2/b^2$. For $X\sim$ exponential(1), $E|X| = EX = 1$ and $EX^2 = \text{Var}X+(EX)^2 = 2$. For $b = 3$, $E|X|/b = 1/3>2/9 = EX^2/b^2$, so $EX^2/b^2$ is a better bound. But for $b = \sqrt2$, $E|X|/b = 1/\sqrt2<1 = EX^2/b^2$, so $E|X|/b$ is a better bound.

3.45 a.
$$M_X(t) = \int_{-\infty}^\infty e^{tx}f_X(x)dx\ge\int_a^\infty e^{tx}f_X(x)dx\ge e^{ta}\int_a^\infty f_X(x)dx = e^{ta}P(X\ge a),$$
where we use the fact that $e^{tx}$ is increasing in $x$ for $t>0$.
b.
$$M_X(t) = \int_{-\infty}^\infty e^{tx}f_X(x)dx\ge\int_{-\infty}^ae^{tx}f_X(x)dx\ge e^{ta}\int_{-\infty}^af_X(x)dx = e^{ta}P(X\le a),$$
where we use the fact that $e^{tx}$ is decreasing in $x$ for $t<0$.
c. $h(t,x)$ must be nonnegative.

3.46 For $X\sim$ uniform(0,1), $\mu = \frac12$ and $\sigma^2 = \frac{1}{12}$, thus
$$P(|X-\mu|>k\sigma) = 1-P\left(\frac12-\frac{k}{\sqrt{12}}\le X\le\frac12+\frac{k}{\sqrt{12}}\right) = \begin{cases}1-\frac{2k}{\sqrt{12}} & k<\sqrt3,\\ 0 & k\ge\sqrt3.\end{cases}$$
For $X\sim$ exponential$(\lambda)$, $\mu = \lambda$ and $\sigma^2 = \lambda^2$, thus
$$P(|X-\mu|>k\sigma) = 1-P(\lambda-k\lambda\le X\le\lambda+k\lambda) = \begin{cases}1+e^{-(k+1)}-e^{k-1} & k\le1,\\ e^{-(k+1)} & k>1.\end{cases}$$
From Example 3.6.2, Chebychev's Inequality gives the bound $P(|X-\mu|>k\sigma)\le1/k^2$.
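For 3.44 with $X\sim$ exponential(1), a few lines of Python compare the two bounds with the exact tail $P(X\ge b) = e^{-b}$. The exact value is included here only for comparison, and `bounds` is a hypothetical helper name.

```python
from math import exp, sqrt

E_abs, E_sq = 1.0, 2.0   # E|X| and E X^2 for X ~ exponential(1)

def bounds(b: float):
    """Return (E|X|/b, E X^2/b^2, exact P(|X| >= b) = e^{-b}) for X ~ exponential(1)."""
    return E_abs / b, E_sq / b**2, exp(-b)

for b in (sqrt(2), 3.0):
    first, second, exact = bounds(b)
    print(f"b = {b:.3f}: E|X|/b = {first:.3f}, EX^2/b^2 = {second:.3f}, exact = {exact:.4f}")
```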
Comparison of probabilities

  k      u(0,1) exact   exp(λ) exact   Chebychev
  .1     .942           .926           100
  .5     .711           .617           4
  1      .423           .135           1
  1.5    .134           .0821          .44
  √3     0              .0651          .33
  2      0              .0498          .25
  4      0              .00674         .0625
  10     0              .0000167       .01

So we see that Chebychev's Inequality is quite conservative.

3.47
$$P(|Z|>t) = 2P(Z>t) = 2\frac{1}{\sqrt{2\pi}}\int_t^\infty e^{-x^2/2}dx = \sqrt{\frac2\pi}\int_t^\infty\frac{1+x^2}{1+x^2}e^{-x^2/2}dx = \sqrt{\frac2\pi}\left[\int_t^\infty\frac{1}{1+x^2}e^{-x^2/2}dx+\int_t^\infty\frac{x^2}{1+x^2}e^{-x^2/2}dx\right].$$

Chapter 4 Multiple Random Variables

4.1 Since the distribution is uniform, the easiest way to calculate these probabilities is as the ratio of areas, the total area being 4.
a. The circle $x^2+y^2\le1$ has area $\pi$, so $P(X^2+Y^2\le1) = \frac\pi4$.
b. The area below the line $y = 2x$ is half of the area of the square, so $P(2X-Y>0) = \frac24$.
c. Clearly $P(|X+Y|<2) = 1$.

4.2 These are all fundamental properties of integrals. The proof is the same as for Theorem 2.2.5 with bivariate integrals replacing univariate integrals.

4.3 For the experiment of tossing two fair dice, each of the points in the 36-point sample space is equally likely. So the probability of an event is (number of points in the event)/36. The given probabilities are obtained by noting the following equivalences of events.
$P(\{X=0,Y=0\}) = P(\{(1,1),(2,1),(1,3),(2,3),(1,5),(2,5)\}) = \frac{6}{36} = \frac16$
$P(\{X=0,Y=1\}) = P(\{(1,2),(2,2),(1,4),(2,4),(1,6),(2,6)\}) = \frac{6}{36} = \frac16$
$P(\{X=1,Y=0\}) = P(\{(3,1),(4,1),(5,1),(6,1),(3,3),(4,3),(5,3),(6,3),(3,5),(4,5),(5,5),(6,5)\}) = \frac{12}{36} = \frac13$
$P(\{X=1,Y=1\}) = P(\{(3,2),(4,2),(5,2),(6,2),(3,4),(4,4),(5,4),(6,4),(3,6),(4,6),(5,6),(6,6)\}) = \frac{12}{36} = \frac13$

4.4 a. $\int_0^1\int_0^2C(x+2y)\,dx\,dy = 4C = 1$, thus $C = \frac14$.
b. $f_X(x) = \int_0^1\frac14(x+2y)dy = \frac14(x+1)$ for $0<x<2$, and $f_X(x) = 0$ otherwise.
c. $F_{XY}(x,y) = P(X\le x,Y\le y) = \int_{-\infty}^x\int_{-\infty}^yf(u,v)\,dv\,du$. The way this integral is calculated depends on the values of $x$ and $y$.
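The table for 3.46 can be regenerated directly from the two piecewise formulas. Here is a Python sketch; the exponential tail does not depend on $\lambda$, since both $\mu$ and $\sigma$ scale with it.

```python
from math import exp, sqrt

def uniform_tail(k: float) -> float:
    """P(|X - mu| > k sigma) for X ~ uniform(0, 1)."""
    return max(0.0, 1 - 2 * k / sqrt(12))

def exponential_tail(k: float) -> float:
    """P(|X - mu| > k sigma) for X ~ exponential(lambda), for any lambda."""
    return 1 + exp(-(k + 1)) - exp(k - 1) if k <= 1 else exp(-(k + 1))

ks = [0.1, 0.5, 1, 1.5, sqrt(3), 2, 4, 10]
for k in ks:
    print(f"{k:6.3f} {uniform_tail(k):8.3f} {exponential_tail(k):10.3g} {1 / k**2:8.3g}")
```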
For example, for $0<x<2$ and $0<y<1$,
$$F_{XY}(x,y) = \int_{-\infty}^x\int_{-\infty}^yf(u,v)\,dv\,du = \int_0^x\int_0^y\frac14(u+2v)\,dv\,du = \frac{x^2y}{8}+\frac{y^2x}{4}.$$
But for $0<x<2$ and $1\le y$,
$$F_{XY}(x,y) = \int_{-\infty}^x\int_{-\infty}^yf(u,v)\,dv\,du = \int_0^x\int_0^1\frac14(u+2v)\,dv\,du = \frac{x^2}{8}+\frac x4.$$
The complete definition of $F_{XY}$ is
$$F_{XY}(x,y) = \begin{cases}0 & x\le0\text{ or }y\le0\\ x^2y/8+y^2x/4 & 0<x<2\text{ and }0<y<1\\ y/2+y^2/2 & 2\le x\text{ and }0<y<1\\ x^2/8+x/4 & 0<x<2\text{ and }1\le y\\ 1 & 2\le x\text{ and }1\le y.\end{cases}$$
d. The function $z = g(x) = 9/(x+1)^2$ is monotone on $0<x<2$, so use Theorem 2.1.5 to obtain $f_Z(z) = 9/(8z^2)$, $1<z<9$.

4.5 a. $P(X>\sqrt Y) = \int_0^1\int_{\sqrt y}^1(x+y)\,dx\,dy = \frac{7}{20}$.
b. $P(X^2<Y<X) = \int_0^1\int_y^{\sqrt y}2x\,dx\,dy = \frac16$.

4.6 Let $A$ = time that A arrives and $B$ = time that B arrives. The random variables $A$ and $B$ are independent uniform(1,2) variables, so their joint pdf is uniform on the square $(1,2)\times(1,2)$. Let $X$ = amount of time A waits for B. Then $F_X(x) = P(X\le x) = 0$ for $x<0$, and $F_X(x) = P(X\le x) = 1$ for $1\le x$. For $x = 0$, we have
$$F_X(0) = P(X\le0) = P(X=0) = P(B\le A) = \int_1^2\int_1^a1\,db\,da = \frac12.$$
And for $0<x<1$,
$$F_X(x) = P(X\le x) = 1-P(X>x) = 1-P(B-A>x) = 1-\int_1^{2-x}\int_{a+x}^21\,db\,da = \frac12+x-\frac{x^2}{2}.$$

4.7 We will measure time in minutes past 8 A.M. So $X\sim$ uniform(0,30), $Y\sim$ uniform(40,50), and the joint pdf is 1/300 on the rectangle $(0,30)\times(40,50)$.
$$P(\text{arrive before 9 A.M.}) = P(X+Y<60) = \int_{40}^{50}\int_0^{60-y}\frac{1}{300}\,dx\,dy = \frac12.$$

4.9
$$\begin{aligned}P(a\le X\le b,c\le Y\le d) &= P(X\le b,c\le Y\le d)-P(X\le a,c\le Y\le d)\\ &= P(X\le b,Y\le d)-P(X\le b,Y\le c)-P(X\le a,Y\le d)+P(X\le a,Y\le c)\\ &= F(b,d)-F(b,c)-F(a,d)+F(a,c)\\ &= F_X(b)F_Y(d)-F_X(b)F_Y(c)-F_X(a)F_Y(d)+F_X(a)F_Y(c)\\ &= P(X\le b)[P(Y\le d)-P(Y\le c)]-P(X\le a)[P(Y\le d)-P(Y\le c)]\\ &= P(X\le b)P(c\le Y\le d)-P(X\le a)P(c\le Y\le d)\\ &= P(a\le X\le b)P(c\le Y\le d).\end{aligned}$$

4.10 a. The marginal distribution of $X$ is $P(X=1) = P(X=3) = \frac14$ and $P(X=2) = \frac12$.
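The integrals in 4.5 can be cross-checked by brute force on a midpoint grid over the unit square. The integrands are the ones shown in the solution: $x+y$ in (a) and $2x$ in (b). In the Python sketch below, the grid size is an arbitrary accuracy/speed trade-off, so expect agreement to about two or three decimals.

```python
from math import sqrt

def grid_integral(f, region, n: int = 800) -> float:
    """Midpoint-grid approximation of the integral of f over {(x, y) in (0,1)^2 : region(x, y)}."""
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]
    return h * h * sum(f(x, y) for x in pts for y in pts if region(x, y))

# 4.5(a): integrand x + y on the unit square, event X > sqrt(Y)
p_a = grid_integral(lambda x, y: x + y, lambda x, y: x > sqrt(y))
# 4.5(b): integrand 2x on the unit square, event X^2 < Y < X
p_b = grid_integral(lambda x, y: 2 * x, lambda x, y: x * x < y < x)
print(p_a, 7 / 20)
print(p_b, 1 / 6)
```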
The marginal distribution of $Y$ is $P(Y=2) = P(Y=3) = P(Y=4) = \frac13$. But
$$P(X=2,Y=3) = 0\ne\left(\frac12\right)\left(\frac13\right) = P(X=2)P(Y=3).$$
Therefore the random variables are not independent.
b. The distribution that satisfies $P(U=x,V=y) = P(U=x)P(V=y)$, where $U\sim X$ and $V\sim Y$, is

          U = 1    U = 2    U = 3
  V = 2   1/12     1/6      1/12
  V = 3   1/12     1/6      1/12
  V = 4   1/12     1/6      1/12

4.11 The support of the distribution of $(U,V)$ is $\{(u,v): u = 1,2,\ldots;\ v = u+1,u+2,\ldots\}$. This is not a cross-product set. Therefore, $U$ and $V$ are not independent. More simply, if we know $U = u$, then we know $V>u$.

4.12 One interpretation of "a stick is broken at random into three pieces" is this. Suppose the length of the stick is 1. Let $X$ and $Y$ denote the two points where the stick is broken. Let $X$ and $Y$ both have uniform(0,1) distributions, and assume $X$ and $Y$ are independent. Then the joint distribution of $X$ and $Y$ is uniform on the unit square. In order for the three pieces to form a triangle, the sum of the lengths of any two pieces must be greater than the length of the third. This will be true if and only if the length of each piece is less than 1/2. To calculate the probability of this, we need to identify the sample points $(x,y)$ such that the length of each piece is less than 1/2. If $y>x$, this will be true if $x<1/2$, $y-x<1/2$ and $1-y<1/2$. These three inequalities define the triangle with vertices $(0,1/2)$, $(1/2,1/2)$ and $(1/2,1)$. (Draw a graph of this set.) Because of the uniform distribution, the probability that $(X,Y)$ falls in the triangle is the area of the triangle, which is 1/8. Similarly, if $x>y$, each piece will have length less than 1/2 if $y<1/2$, $x-y<1/2$ and $1-x<1/2$. These three inequalities define the triangle with vertices $(1/2,0)$, $(1/2,1/2)$ and $(1,1/2)$. The probability that $(X,Y)$ is in this triangle is also 1/8. So the probability that the pieces form a triangle is $1/8+1/8 = 1/4$.

4.13 a.
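The triangle probability in 4.12 can be confirmed by checking the three piece lengths on a fine grid of break points $(x,y)$ in the unit square. The Python sketch below does this; `forms_triangle` is a hypothetical helper name, and the grid size is arbitrary.

```python
def forms_triangle(x: float, y: float) -> bool:
    """Break points x, y in (0, 1): do the three pieces all have length < 1/2?"""
    a, b = min(x, y), max(x, y)
    return all(piece < 0.5 for piece in (a, b - a, 1 - b))

n = 600
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]
prob = h * h * sum(forms_triangle(x, y) for x in pts for y in pts)
print(prob)   # the exact answer from the text is 1/4
```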
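$$\begin{aligned}E(Y-g(X))^2 &= E\big((Y-E(Y|X))+(E(Y|X)-g(X))\big)^2\\ &= E(Y-E(Y|X))^2+E(E(Y|X)-g(X))^2+2E\big[(Y-E(Y|X))(E(Y|X)-g(X))\big].\end{aligned}$$
The cross term can be shown to be zero by iterating the expectation. Thus
$$E(Y-g(X))^2 = E(Y-E(Y|X))^2+E(E(Y|X)-g(X))^2\ge E(Y-E(Y|X))^2\quad\text{for all } g(\cdot).$$
The choice $g(X) = E(Y|X)$ will give equality.
b. Equation (2.2.3) is the special case of (a) where we take the random variable $X$ to be a constant. Then $g(X)$ is a constant, say $b$, and $E(Y|X) = EY$.

4.15 We will find the conditional distribution of $Y|X+Y$. The derivation of the conditional distribution of $X|X+Y$ is similar. Let $U = X+Y$ and $V = Y$. In Example 4.3.1, we found the joint pmf of $(U,V)$. Note that for fixed $u$, $f(u,v)$ is positive for $v = 0,\ldots,u$. Therefore the conditional pmf is
$$f(v|u) = \frac{f(u,v)}{f(u)} = \frac{\frac{\theta^{u-v}e^{-\theta}}{(u-v)!}\,\frac{\lambda^ve^{-\lambda}}{v!}}{\frac{(\theta+\lambda)^ue^{-(\theta+\lambda)}}{u!}} = \binom uv\left(\frac{\lambda}{\theta+\lambda}\right)^v\left(\frac{\theta}{\theta+\lambda}\right)^{u-v},\quad v = 0,\ldots,u.$$
That is, $V|U\sim$ binomial$(U,\lambda/(\theta+\lambda))$.

4.16 a. The support of the distribution of $(U,V)$ is $\{(u,v): u = 1,2,\ldots;\ v = 0,\pm1,\pm2,\ldots\}$. If $V>0$, then $X>Y$. So for $v = 1,2,\ldots$, the joint pmf is
$$f_{U,V}(u,v) = P(U=u,V=v) = P(Y=u,X=u+v) = p(1-p)^{u+v-1}p(1-p)^{u-1} = p^2(1-p)^{2u+v-2}.$$

Then,
$$\begin{aligned}f_U(u) &= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}u^{\alpha-1}\int_u^1v^{\beta-1}(1-v)^{\gamma-1}\left(\frac{v-u}{v}\right)^{\beta-1}dv\\ &= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}u^{\alpha-1}(1-u)^{\beta+\gamma-1}\int_0^1y^{\beta-1}(1-y)^{\gamma-1}dy\qquad\left(y = \frac{v-u}{1-u},\ dy = \frac{dv}{1-u}\right)\\ &= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}u^{\alpha-1}(1-u)^{\beta+\gamma-1}\frac{\Gamma(\beta)\Gamma(\gamma)}{\Gamma(\beta+\gamma)}\\ &= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta+\gamma)}u^{\alpha-1}(1-u)^{\beta+\gamma-1},\quad0<u<1.\end{aligned}$$
Thus, $U\sim$ beta$(\alpha,\beta+\gamma)$.
b. Let $x = \sqrt{uv}$, $y = \sqrt{u/v}$. Then
$$J = \begin{vmatrix}\frac{\partial x}{\partial u} & \frac{\partial x}{\partial v}\\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v}\end{vmatrix} = \begin{vmatrix}\frac12v^{1/2}u^{-1/2} & \frac12u^{1/2}v^{-1/2}\\ \frac12v^{-1/2}u^{-1/2} & -\frac12u^{1/2}v^{-3/2}\end{vmatrix} = -\frac{1}{2v},\qquad|J| = \frac{1}{2v}.$$
$$f_{U,V}(u,v) = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}(\sqrt{uv})^{\alpha-1}(1-\sqrt{uv})^{\beta-1}\left(\sqrt{\frac uv}\right)^{\alpha+\beta-1}\left(1-\sqrt{\frac uv}\right)^{\gamma-1}\frac{1}{2v}.$$
The set $\{0<x<1,0<y<1\}$ is mapped onto the set $\{0<u<v<\frac1u,\ 0<u<1\}$.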
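The conditional pmf in 4.15 can be checked numerically. Dividing the Poisson joint pmf by the Poisson$(\theta+\lambda)$ marginal of $U = X+Y$ should reproduce the binomial$(u, \lambda/(\theta+\lambda))$ pmf. Below is a Python sketch with arbitrary values of $\theta$, $\lambda$ and $u$.

```python
from math import comb, exp, factorial

def poisson_pmf(k: int, rate: float) -> float:
    return exp(-rate) * rate**k / factorial(k)

theta, lam, u = 2.0, 3.0, 6   # X ~ Poisson(theta), Y ~ Poisson(lam), condition on X + Y = u
q = lam / (theta + lam)
cond = [poisson_pmf(u - v, theta) * poisson_pmf(v, lam) / poisson_pmf(u, theta + lam)
        for v in range(u + 1)]
binom = [comb(u, v) * q**v * (1 - q) ** (u - v) for v in range(u + 1)]
for v, (c, b) in enumerate(zip(cond, binom)):
    print(v, round(c, 6), round(b, 6))
```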
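Then,
$$f_U(u) = \int_u^{1/u}f_{U,V}(u,v)\,dv = \underbrace{\frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}u^{\alpha-1}(1-u)^{\beta+\gamma-1}}_{\text{call it }A}\int_u^{1/u}\left(\frac{1-\sqrt{uv}}{1-u}\right)^{\beta-1}\left(\frac{1-\sqrt{u/v}}{1-u}\right)^{\gamma-1}\frac{(\sqrt{u/v})^\beta}{2v(1-u)}\,dv.$$
To simplify, let $z = \frac{\sqrt{u/v}-u}{1-u}$. Then $v = u\Rightarrow z = 1$, $v = 1/u\Rightarrow z = 0$, and $dz = -\frac{\sqrt{u/v}}{2(1-u)v}dv$. Thus,
$$\begin{aligned}f_U(u) &= A\int_0^1z^{\beta-1}(1-z)^{\gamma-1}dz\qquad(\text{kernel of beta}(\beta,\gamma))\\ &= \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma)}u^{\alpha-1}(1-u)^{\beta+\gamma-1}\frac{\Gamma(\beta)\Gamma(\gamma)}{\Gamma(\beta+\gamma)} = \frac{\Gamma(\alpha+\beta+\gamma)}{\Gamma(\alpha)\Gamma(\beta+\gamma)}u^{\alpha-1}(1-u)^{\beta+\gamma-1},\quad0<u<1.\end{aligned}$$
That is, $U\sim$ beta$(\alpha,\beta+\gamma)$, as in (a).

4.24 Let $z_1 = x+y$, $z_2 = \frac{x}{x+y}$. Then $x = z_1z_2$, $y = z_1(1-z_2)$ and
$$|J| = \left|\det\begin{pmatrix}\frac{\partial x}{\partial z_1} & \frac{\partial x}{\partial z_2}\\ \frac{\partial y}{\partial z_1} & \frac{\partial y}{\partial z_2}\end{pmatrix}\right| = \left|\det\begin{pmatrix}z_2 & z_1\\ 1-z_2 & -z_1\end{pmatrix}\right| = z_1.$$
The set $\{x>0,y>0\}$ is mapped onto the set $\{z_1>0,\ 0<z_2<1\}$.
$$\begin{aligned}f_{Z_1,Z_2}(z_1,z_2) &= \frac{1}{\Gamma(r)}(z_1z_2)^{r-1}e^{-z_1z_2}\cdot\frac{1}{\Gamma(s)}(z_1-z_1z_2)^{s-1}e^{-z_1+z_1z_2}z_1\\ &= \frac{1}{\Gamma(r+s)}z_1^{r+s-1}e^{-z_1}\cdot\frac{\Gamma(r+s)}{\Gamma(r)\Gamma(s)}z_2^{r-1}(1-z_2)^{s-1},\quad0<z_1,\ 0<z_2<1.\end{aligned}$$
$f_{Z_1,Z_2}(z_1,z_2)$ can be factored into two densities. Therefore $Z_1$ and $Z_2$ are independent and $Z_1\sim$ gamma$(r+s,1)$, $Z_2\sim$ beta$(r,s)$.

4.25 For $X$ and $Z$ independent, and $Y = X+Z$, $f_{XY}(x,y) = f_X(x)f_Z(y-x)$. In Example 4.5.8,
$$f_{XY}(x,y) = I_{(0,1)}(x)\,10\,I_{(0,1/10)}(y-x).$$
In Example 4.5.9, $Y = X^2+Z$ and
$$f_{XY}(x,y) = f_X(x)f_Z(y-x^2) = \frac12I_{(-1,1)}(x)\,10\,I_{(0,1/10)}(y-x^2).$$

4.26 a.
$$P(Z\le z,W=0) = P(\min(X,Y)\le z,Y\le X) = P(Y\le z,Y\le X) = \int_0^z\int_y^\infty\frac1\lambda e^{-x/\lambda}\frac1\mu e^{-y/\mu}\,dx\,dy = \frac{\lambda}{\mu+\lambda}\left(1-\exp\left\{-\left(\frac1\mu+\frac1\lambda\right)z\right\}\right).$$
Similarly,
$$P(Z\le z,W=1) = P(\min(X,Y)\le z,X\le Y) = P(X\le z,X\le Y) = \int_0^z\int_x^\infty\frac1\lambda e^{-x/\lambda}\frac1\mu e^{-y/\mu}\,dy\,dx = \frac{\mu}{\mu+\lambda}\left(1-\exp\left\{-\left(\frac1\mu+\frac1\lambda\right)z\right\}\right).$$
b.
$$P(W=0) = P(Y\le X) = \int_0^\infty\int_y^\infty\frac1\lambda e^{-x/\lambda}\frac1\mu e^{-y/\mu}\,dx\,dy = \frac{\lambda}{\mu+\lambda},\qquad P(W=1) = 1-P(W=0) = \frac{\mu}{\mu+\lambda},$$
$$P(Z\le z) = P(Z\le z,W=0)+P(Z\le z,W=1) = 1-\exp\left\{-\left(\frac1\mu+\frac1\lambda\right)z\right\}.$$
Therefore, $P(Z\le z,W=i) = P(Z\le z)P(W=i)$ for $i = 0,1$ and $z>0$. So $Z$ and $W$ are independent.

4.27 From Theorem 4.2.14 we know $U\sim$ n$(\mu+\gamma,2\sigma^2)$ and $V\sim$ n$(\mu-\gamma,2\sigma^2)$. It remains to show that they are independent. Proceed as in Exercise 4.24.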
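$$f_{XY}(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{1}{2\sigma^2}[(x-\mu)^2+(y-\gamma)^2]}\quad(\text{by independence, so } f_{XY} = f_Xf_Y).$$
Let $u = x+y$, $v = x-y$. Then $x = \frac12(u+v)$, $y = \frac12(u-v)$ and
$$|J| = \left|\det\begin{pmatrix}1/2 & 1/2\\ 1/2 & -1/2\end{pmatrix}\right| = \frac12.$$
The set $\{-\infty<x<\infty,-\infty<y<\infty\}$ is mapped onto the set $\{-\infty<u<\infty,-\infty<v<\infty\}$. Therefore
$$\begin{aligned}f_{UV}(u,v) &= \frac{1}{2\pi\sigma^2}e^{-\frac{1}{2\sigma^2}\left[\left(\frac{u+v}{2}-\mu\right)^2+\left(\frac{u-v}{2}-\gamma\right)^2\right]}\cdot\frac12\\ &= \frac{1}{4\pi\sigma^2}e^{-\frac{1}{2\sigma^2}\left[2\left(\frac u2\right)^2-u(\mu+\gamma)+\frac{(\mu+\gamma)^2}{2}+2\left(\frac v2\right)^2-v(\mu-\gamma)+\frac{(\mu-\gamma)^2}{2}\right]}\\ &= g(u)\,h(v),\end{aligned}$$
where $g(u) = \frac{1}{4\pi\sigma^2}e^{-\frac{1}{2(2\sigma^2)}(u-(\mu+\gamma))^2}$ and $h(v) = e^{-\frac{1}{2(2\sigma^2)}(v-(\mu-\gamma))^2}$. By the factorization theorem, $U$ and $V$ are independent.

4.29 a. $\frac XY = \frac{R\cos\theta}{R\sin\theta} = \cot\theta$. Let $Z = \cot\theta$. Let $A_1 = (0,\pi)$, $g_1(\theta) = \cot\theta$, $g_1^{-1}(z) = \cot^{-1}z$, $A_2 = (\pi,2\pi)$, $g_2(\theta) = \cot\theta$, $g_2^{-1}(z) = \pi+\cot^{-1}z$. By Theorem 2.1.8,
$$f_Z(z) = \frac{1}{2\pi}\left|\frac{-1}{1+z^2}\right|+\frac{1}{2\pi}\left|\frac{-1}{1+z^2}\right| = \frac1\pi\frac{1}{1+z^2},\quad-\infty<z<\infty.$$
b. $XY = R^2\cos\theta\sin\theta$, so $2XY = R^2\,2\cos\theta\sin\theta = R^2\sin2\theta$. Therefore $\frac{2XY}{R} = R\sin2\theta$. Since $R = \sqrt{X^2+Y^2}$, $\frac{2XY}{\sqrt{X^2+Y^2}} = R\sin2\theta$. Thus $\frac{2XY}{\sqrt{X^2+Y^2}}$ is distributed as $R\sin2\theta$, and $\sin2\theta$ is distributed as $\sin\theta$. To see this, let $\sin\theta\sim f_{\sin\theta}$. For the function $\sin2\theta$, the values of the function $\sin\theta$ are repeated over each of the 2 intervals $(0,\pi)$ and $(\pi,2\pi)$. Therefore the distribution in each of these intervals is the distribution of $\sin\theta$. The probability of choosing each one of these intervals is $\frac12$. Thus $f_{\sin2\theta} = \frac12f_{\sin\theta}+\frac12f_{\sin\theta} = f_{\sin\theta}$. Therefore $\frac{2XY}{\sqrt{X^2+Y^2}}$ has the same distribution as $Y = R\sin\theta$. In addition, $\frac{2XY}{\sqrt{X^2+Y^2}}$ has the same distribution as $X = R\cos\theta$, since $\sin\theta$ has the same distribution as $\cos\theta$. To see this, consider the distribution of $W = \cos\theta$ and $V = \sin\theta$ where $\theta\sim$ uniform$(0,2\pi)$. To derive the distribution of $W = \cos\theta$, let $A_1 = (0,\pi)$, $g_1(\theta) = \cos\theta$, $g_1^{-1}(w) = \cos^{-1}w$, $A_2 = (\pi,2\pi)$, $g_2(\theta) = \cos\theta$, $g_2^{-1}(w) = 2\pi-\cos^{-1}w$. By Theorem 2.1.8,
$$f_W(w) = \frac{1}{2\pi}\left|\frac{-1}{\sqrt{1-w^2}}\right|+\frac{1}{2\pi}\left|\frac{1}{\sqrt{1-w^2}}\right| = \frac1\pi\frac{1}{\sqrt{1-w^2}},\quad-1\le w\le1.$$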
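To derive the distribution of $V = \sin\theta$, first consider the interval $(\frac\pi2,\frac{3\pi}{2})$. Let $g_1(\theta) = \sin\theta$ and $g_1^{-1}(v) = \pi-\sin^{-1}v$. Then
$$f_V(v) = \frac1\pi\frac{1}{\sqrt{1-v^2}},\quad-1\le v\le1.$$
Second, consider the set $(0,\frac\pi2)\cup(\frac{3\pi}{2},2\pi)$, for which the function $\sin\theta$ has the same values as it does in the interval $(-\frac\pi2,\frac\pi2)$. Therefore the distribution of $V$ on $(0,\frac\pi2)\cup(\frac{3\pi}{2},2\pi)$ is the same as the distribution of $V$ on $(-\frac\pi2,\frac\pi2)$, which is $\frac1\pi\frac{1}{\sqrt{1-v^2}}$, $-1\le v\le1$. On $(0,2\pi)$, each of the sets $(\frac\pi2,\frac{3\pi}{2})$ and $(0,\frac\pi2)\cup(\frac{3\pi}{2},2\pi)$ has probability $\frac12$ of being chosen. Therefore
$$f_V(v) = \frac12\frac1\pi\frac{1}{\sqrt{1-v^2}}+\frac12\frac1\pi\frac{1}{\sqrt{1-v^2}} = \frac1\pi\frac{1}{\sqrt{1-v^2}},\quad-1\le v\le1.$$
Thus $W$ and $V$ have the same distribution.
Let $X$ and $Y$ be iid n(0,1). Then $X^2+Y^2\sim\chi^2_2$ is a positive random variable. Therefore, with $X = R\cos\theta$ and $Y = R\sin\theta$, $R = \sqrt{X^2+Y^2}$ is a positive random variable and $\theta = \tan^{-1}(\frac YX)\sim$ uniform$(0,2\pi)$. Thus $\frac{2XY}{\sqrt{X^2+Y^2}}\sim X\sim$ n(0,1).

4.30 a.
$$EY = E\{E(Y|X)\} = EX = \frac12,$$
$$\text{Var}Y = \text{Var}(E(Y|X))+E(\text{Var}(Y|X)) = \text{Var}X+EX^2 = \frac{1}{12}+\frac13 = \frac{5}{12},$$
$$EXY = E[E(XY|X)] = E[XE(Y|X)] = EX^2 = \frac13,\qquad\text{Cov}(X,Y) = EXY-EXEY = \frac13-\left(\frac12\right)^2 = \frac{1}{12}.$$
b. The quick proof is to note that the distribution of $Y/X$ given $X = x$ is n(1,1), which does not depend on $x$. Hence $Y/X$ is independent of $X$. The bivariate transformation $t = y/x$, $u = x$ will also show that the joint density factors.

$$\begin{aligned}&= \binom{r+x-1}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1p^{(r+\alpha)-1}(1-p)^{(x+\beta)-1}dp\\ &= \binom{r+x-1}{x}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\frac{\Gamma(r+\alpha)\Gamma(x+\beta)}{\Gamma(r+x+\alpha+\beta)},\quad x = 0,1,\ldots\end{aligned}$$
Therefore,
$$EX = E[E(X|P)] = E\left[\frac{r(1-P)}{P}\right] = \frac{r\beta}{\alpha-1},$$
since
$$E\left[\frac{1-P}{P}\right] = \int_0^1\left(\frac{1-p}{p}\right)\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}p^{\alpha-1}(1-p)^{\beta-1}dp = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\int_0^1p^{(\alpha-1)-1}(1-p)^{(\beta+1)-1}dp = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\frac{\Gamma(\alpha-1)\Gamma(\beta+1)}{\Gamma(\alpha+\beta)} = \frac{\beta}{\alpha-1}.$$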
Var(X) = E(Var(X|P )) + Var(E(X|P )) = E [ r(1− P ) P 2 ] + Var ( r(1− P ) P ) = r (β + 1)(α + β) α(α− 1) + r2 β(α + β − 1) (α− 1)2(α− 2) , since E [ 1− P P 2 ] = ∫ 1 0 Γ(α + β) Γ(α)Γ(β) p(α−2)−1(1− p)(β+1)−1dp = Γ(α + β) Γ(α)Γ(β) Γ(α− 2)Γ(β + 1) Γ(α + β − 1) = (β + 1)(α + β) α(α− 1) and Var ( 1− P P ) = E [( 1− P P )2] − ( E [ 1− P P ])2 = β(β + 1) (α− 2)(α− 1) − ( β α− 1 )2 = β(α + β − 1) (α− 1)2(α− 2) , where E [( 1− P P )2] = ∫ 1 0 Γ(α + β) Γ(α)Γ(β) p(α−2)−1(1− p)(β+2)−1dp = Γ(α + β) Γ(α)Γ(β) Γ(α− 2)Γ(β + 2) Γ(α− 2 + β + 2) = β(β + 1) (α− 2)(α− 1) . 4.35 a. Var(X) = E(Var(X|P )) + Var(E(X|P )). Therefore, Var(X) = E[nP (1− P )] + Var(nP ) = n αβ (α + β)(α + β + 1) + n2VarP = n αβ(α + β + 1− 1) (α + β2)(α + β + 1) + n2VarP 4-12 Solutions Manual for Statistical Inference = nαβ(α + β + 1) (α + β2)(α + β + 1) − nαβ (α + β2)(α + β + 1) + n2VarP = n α α + β β α + β − nVarP + n2VarP = nEP (1− EP ) + n(n− 1)VarP. b. Var(Y ) = E(Var(Y |Λ)) + Var(E(Y |Λ)) = EΛ + Var(Λ) = µ + 1αµ 2 since EΛ = µ = αβ and Var(Λ) = αβ2 = (αβ) 2 α = µ2 α . The “extra-Poisson” variation is 1 αµ 2. 4.37 a. Let Y = ∑ Xi. P (Y = k) = P (Y = k, 1 2 < c = 1 2 (1 + p) < 1) = ∫ 1 0 (Y = k|c = 1 2 (1 + p))P (P = p)dp = ∫ 1 0 ( n k ) [ 1 2 (1 + p)]k[1− 1 2 (1 + p)]n−k Γ(a + b) Γ(a)Γ(b) pa−1(1− p)b−1dp = ∫ 1 0 ( n k ) (1 + p)k 2k (1− p)n−k 2n−k Γ(a + b) Γ(a)Γ(b) pa−1(1− p)b−1dp = ( n k ) Γ(a + b) 2nΓ(a)Γ(b) k∑ j=0 ∫ 1 0 pk+a−1(1− p)n−k+b−1dp = ( n k ) Γ(a + b) 2nΓ(a)Γ(b) k∑ j=0 ( k j ) Γ(k + a)Γ(n− k + b) Γ(n + a + b) = k∑ j=0 [(( k j ) 2n )(( n k ) Γ(a + b) Γ(a)Γ(b) Γ(k + a)Γ(n− k + b) Γ(n + a + b) )] . A mixture of beta-binomial. b. EY = E(E(Y |c)) = E[nc] = E [ n ( 1 2 (1 + p) )] = n 2 ( 1 + a a + b ) . Using the results in Exercise 4.35(a), Var(Y ) = nEC(1− EC) + n(n− 1)VarC. Therefore, Var(Y ) = nE [ 1 2 (1 + P ) ]( 1− E [ 1 2 (1 + P ) ]) + n(n− 1)Var ( 1 2 (1 + P ) ) = n 4 (1 + EP )(1− EP ) + n(n− 1) 4 VarP = n 4 ( 1− ( a a + b )2) + n(n− 1) 4 ab (a + b)2(a + b + 1) . 4.38 a. 
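Make the transformation $u = \frac x\nu-\frac x\lambda$, $du = \frac{-x}{\nu^2}d\nu$, $\frac{\nu}{\lambda-\nu} = \frac{x}{\lambda u}$. Then
$$\begin{aligned}\int_0^\lambda\frac1\nu e^{-x/\nu}\frac{1}{\Gamma(r)\Gamma(1-r)}\frac{\nu^{r-1}}{(\lambda-\nu)^r}d\nu &= \frac{1}{\Gamma(r)\Gamma(1-r)}\int_0^\infty\frac1x\left(\frac{x}{\lambda u}\right)^re^{-(u+x/\lambda)}du\\ &= \frac{x^{r-1}e^{-x/\lambda}}{\lambda^r\Gamma(r)\Gamma(1-r)}\int_0^\infty\left(\frac1u\right)^re^{-u}du\\ &= \frac{x^{r-1}e^{-x/\lambda}}{\Gamma(r)\lambda^r},\end{aligned}$$
since the integral is equal to $\Gamma(1-r)$ if $r<1$.
b. Use the transformation $t = \nu/\lambda$ to get
$$\int_0^\lambda p_\lambda(\nu)d\nu = \frac{1}{\Gamma(r)\Gamma(1-r)}\int_0^\lambda\nu^{r-1}(\lambda-\nu)^{-r}d\nu = \frac{1}{\Gamma(r)\Gamma(1-r)}\int_0^1t^{r-1}(1-t)^{-r}dt = 1,$$
since this is a beta$(r,1-r)$.
c.
$$\frac{d}{dx}\log f(x) = \frac{d}{dx}\left[\log\frac{1}{\Gamma(r)\lambda^r}+(r-1)\log x-x/\lambda\right] = \frac{r-1}{x}-\frac1\lambda>0$$
for some $x$, if $r>1$. But
$$\frac{d}{dx}\left[\log\int_0^\infty\frac{e^{-x/\nu}}{\nu}q_\lambda(\nu)d\nu\right] = \frac{-\int_0^\infty\frac{1}{\nu^2}e^{-x/\nu}q_\lambda(\nu)d\nu}{\int_0^\infty\frac1\nu e^{-x/\nu}q_\lambda(\nu)d\nu}<0\quad\forall x.$$

4.39 a. Without loss of generality, assume that $i<j$. From the discussion in the text we have that
$$f(x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_n|x_j) = \frac{(m-x_j)!}{x_1!\cdots x_{j-1}!\,x_{j+1}!\cdots x_n!}\times\left(\frac{p_1}{1-p_j}\right)^{x_1}\cdots\left(\frac{p_{j-1}}{1-p_j}\right)^{x_{j-1}}\left(\frac{p_{j+1}}{1-p_j}\right)^{x_{j+1}}\cdots\left(\frac{p_n}{1-p_j}\right)^{x_n}.$$
Then,
$$\begin{aligned}f(x_i|x_j) &= \sum_{(x_1,\ldots,x_{i-1},x_{i+1},\ldots,x_{j-1},x_{j+1},\ldots,x_n)}f(x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_n|x_j)\\ &= \sum_{(x_k\ne x_i,x_j)}\frac{(m-x_j)!}{x_1!\cdots x_{j-1}!\,x_{j+1}!\cdots x_n!}\times\left(\frac{p_1}{1-p_j}\right)^{x_1}\cdots\left(\frac{p_{j-1}}{1-p_j}\right)^{x_{j-1}}\left(\frac{p_{j+1}}{1-p_j}\right)^{x_{j+1}}\cdots\left(\frac{p_n}{1-p_j}\right)^{x_n}\\ &\qquad\times\frac{(m-x_i-x_j)!\left(1-\frac{p_i}{1-p_j}\right)^{m-x_i-x_j}}{(m-x_i-x_j)!\left(1-\frac{p_i}{1-p_j}\right)^{m-x_i-x_j}}\\ &= \frac{(m-x_j)!}{x_i!(m-x_i-x_j)!}\left(\frac{p_i}{1-p_j}\right)^{x_i}\left(1-\frac{p_i}{1-p_j}\right)^{m-x_i-x_j}\\ &\qquad\times\sum_{(x_k\ne x_i,x_j)}\frac{(m-x_i-x_j)!}{x_1!\cdots x_{i-1}!\,x_{i+1}!\cdots x_{j-1}!\,x_{j+1}!\cdots x_n!}\times\left(\frac{p_1}{1-p_j-p_i}\right)^{x_1}\cdots\left(\frac{p_{i-1}}{1-p_j-p_i}\right)^{x_{i-1}}\left(\frac{p_{i+1}}{1-p_j-p_i}\right)^{x_{i+1}}\cdots\end{aligned}$$

as is the variance,
$$\text{Var}(aX+bY) = a^2\text{Var}X+b^2\text{Var}Y+2ab\,\text{Cov}(X,Y) = a^2\sigma_X^2+b^2\sigma_Y^2+2ab\rho\sigma_X\sigma_Y.$$
To show that $aX+bY$ is normal we have to do a bivariate transform. One possibility is $U = aX+bY$, $V = Y$; then get $f_{U,V}(u,v)$ and show that $f_U(u)$ is normal. We will do this in the standard case. Make the indicated transformation and write $x = \frac1a(u-bv)$, $y = v$ to obtain
$$|J| = \left|\det\begin{pmatrix}1/a & -b/a\\ 0 & 1\end{pmatrix}\right| = \frac1a.$$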
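Then
$$f_{UV}(u,v) = \frac{1}{2\pi a\sqrt{1-\rho^2}}e^{-\frac{1}{2(1-\rho^2)}\left[\left(\frac1a(u-bv)\right)^2-2\frac\rho a(u-bv)v+v^2\right]}.$$
Now factor the exponent to get a square in $u$. The result is
$$-\frac{1}{2(1-\rho^2)}\left[\frac{b^2+2\rho ab+a^2}{a^2}\right]\left[\frac{u^2}{b^2+2\rho ab+a^2}-2\left(\frac{b+a\rho}{b^2+2\rho ab+a^2}\right)uv+v^2\right].$$
Note that this is joint bivariate normal form, since $\mu_U = \mu_V = 0$, $\sigma_V^2 = 1$, $\sigma_U^2 = a^2+b^2+2ab\rho$ and
$$\rho^* = \frac{\text{Cov}(U,V)}{\sigma_U\sigma_V} = \frac{E(aXY+bY^2)}{\sigma_U\sigma_V} = \frac{a\rho+b}{\sqrt{a^2+b^2+2ab\rho}}.$$
Thus
$$1-\rho^{*2} = 1-\frac{a^2\rho^2+2ab\rho+b^2}{a^2+b^2+2ab\rho} = \frac{(1-\rho^2)a^2}{a^2+b^2+2ab\rho} = \frac{(1-\rho^2)a^2}{\sigma_U^2},$$
so that $a\sqrt{1-\rho^2} = \sigma_U\sqrt{1-\rho^{*2}}$. We can then write
$$f_{UV}(u,v) = \frac{1}{2\pi\sigma_U\sigma_V\sqrt{1-\rho^{*2}}}\exp\left[-\frac{1}{2(1-\rho^{*2})}\left(\frac{u^2}{\sigma_U^2}-2\rho^*\frac{uv}{\sigma_U\sigma_V}+\frac{v^2}{\sigma_V^2}\right)\right],$$
which is in the exact form of a bivariate normal distribution. Thus, by part (a), $U$ is normal.

4.46 a.
$$EX = a_XEZ_1+b_XEZ_2+Ec_X = a_X0+b_X0+c_X = c_X,\qquad\text{Var}X = a_X^2\text{Var}Z_1+b_X^2\text{Var}Z_2+\text{Var}c_X = a_X^2+b_X^2,$$
$$EY = a_Y0+b_Y0+c_Y = c_Y,\qquad\text{Var}Y = a_Y^2\text{Var}Z_1+b_Y^2\text{Var}Z_2+\text{Var}c_Y = a_Y^2+b_Y^2,$$
$$\begin{aligned}\text{Cov}(X,Y) &= EXY-EX\cdot EY\\ &= E[(a_Xa_YZ_1^2+b_Xb_YZ_2^2+c_Xc_Y+a_Xb_YZ_1Z_2+a_Xc_YZ_1+b_Xa_YZ_2Z_1+b_Xc_YZ_2+c_Xa_YZ_1+c_Xb_YZ_2)-c_Xc_Y]\\ &= a_Xa_Y+b_Xb_Y,\end{aligned}$$
since $EZ_1^2 = EZ_2^2 = 1$ and the expectations of the other terms are all zero.
b. Simply plug the expressions for $a_X$, $b_X$, etc. into the equalities in (a) and simplify.
c. Let $D = a_Xb_Y-a_Yb_X = -\sqrt{1-\rho^2}\sigma_X\sigma_Y$ and solve for $Z_1$ and $Z_2$:
$$Z_1 = \frac{b_Y(X-c_X)-b_X(Y-c_Y)}{D} = \frac{\sigma_Y(X-\mu_X)+\sigma_X(Y-\mu_Y)}{\sqrt{2(1+\rho)}\sigma_X\sigma_Y},\qquad Z_2 = \frac{\sigma_Y(X-\mu_X)-\sigma_X(Y-\mu_Y)}{\sqrt{2(1-\rho)}\sigma_X\sigma_Y}.$$
Then the Jacobian is
$$J = \det\begin{pmatrix}\frac{\partial z_1}{\partial x} & \frac{\partial z_1}{\partial y}\\ \frac{\partial z_2}{\partial x} & \frac{\partial z_2}{\partial y}\end{pmatrix} = \det\begin{pmatrix}\frac{b_Y}{D} & \frac{-b_X}{D}\\ \frac{-a_Y}{D} & \frac{a_X}{D}\end{pmatrix} = \frac{a_Xb_Y}{D^2}-\frac{a_Yb_X}{D^2} = \frac1D = \frac{1}{-\sqrt{1-\rho^2}\sigma_X\sigma_Y},$$
and we have that
$$\begin{aligned}f_{X,Y}(x,y) &= \frac{1}{\sqrt{2\pi}}e^{-\frac12\frac{(\sigma_Y(x-\mu_X)+\sigma_X(y-\mu_Y))^2}{2(1+\rho)\sigma_X^2\sigma_Y^2}}\frac{1}{\sqrt{2\pi}}e^{-\frac12\frac{(\sigma_Y(x-\mu_X)-\sigma_X(y-\mu_Y))^2}{2(1-\rho)\sigma_X^2\sigma_Y^2}}\frac{1}{\sqrt{1-\rho^2}\sigma_X\sigma_Y}\\ &= \left(2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}\right)^{-1}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\frac{x-\mu_X}{\sigma_X}\left(\frac{y-\mu_Y}{\sigma_Y}\right)+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right),\end{aligned}$$
$-\infty<x<\infty$, $-\infty<y<\infty$, a bivariate normal pdf.
d. Another solution is
$$a_X = \rho\sigma_X,\quad b_X = \sqrt{1-\rho^2}\,\sigma_X,\quad a_Y = \sigma_Y,\quad b_Y = 0,\quad c_X = \mu_X,\quad c_Y = \mu_Y.$$
There are an infinite number of solutions.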
Write $b_X = \pm\sqrt{\sigma_X^2-a_X^2}$ and $b_Y = \pm\sqrt{\sigma_Y^2-a_Y^2}$, and substitute $b_X$, $b_Y$ into $a_Xa_Y+b_Xb_Y = \rho\sigma_X\sigma_Y$. We get
$$a_Xa_Y+\left(\pm\sqrt{\sigma_X^2-a_X^2}\right)\left(\pm\sqrt{\sigma_Y^2-a_Y^2}\right) = \rho\sigma_X\sigma_Y.$$
Square both sides and simplify to get
$$(1-\rho^2)\sigma_X^2\sigma_Y^2 = \sigma_X^2a_Y^2-2\rho\sigma_X\sigma_Ya_Xa_Y+\sigma_Y^2a_X^2.$$
This is an ellipse for $\rho\ne\pm1$ and a line for $\rho = \pm1$. In either case there are an infinite number of points satisfying the equations.

4.47 a. By definition of $Z$, for $z<0$,
$$\begin{aligned}P(Z\le z) &= P(X\le z\text{ and }XY>0)+P(-X\le z\text{ and }XY<0)\\ &= P(X\le z\text{ and }Y<0)+P(X\ge-z\text{ and }Y<0)\quad(\text{since } z<0)\\ &= P(X\le z)P(Y<0)+P(X\ge-z)P(Y<0)\quad(\text{independence})\\ &= P(X\le z)P(Y<0)+P(X\le z)P(Y>0)\quad(\text{symmetry of } X\text{ and }Y)\\ &= P(X\le z)(P(Y<0)+P(Y>0)) = P(X\le z).\end{aligned}$$
By a similar argument, for $z>0$, we get $P(Z>z) = P(X>z)$, and hence $P(Z\le z) = P(X\le z)$. Thus, $Z\sim X\sim$ n(0,1).
b. By definition of $Z$, $Z>0\Leftrightarrow$ either (i) $X<0$ and $Y>0$ or (ii) $X>0$ and $Y>0$. So $Z$ and $Y$ always have the same sign, hence they cannot be bivariate normal.

4.49 a.
$$f_X(x) = \int(af_1(x)g_1(y)+(1-a)f_2(x)g_2(y))dy = af_1(x)\int g_1(y)dy+(1-a)f_2(x)\int g_2(y)dy = af_1(x)+(1-a)f_2(x),$$
$$f_Y(y) = \int(af_1(x)g_1(y)+(1-a)f_2(x)g_2(y))dx = ag_1(y)\int f_1(x)dx+(1-a)g_2(y)\int f_2(x)dx = ag_1(y)+(1-a)g_2(y).$$
b. ($\Rightarrow$) If $X$ and $Y$ are independent then $f(x,y) = f_X(x)f_Y(y)$. Then,
$$\begin{aligned}f(x,y)-f_X(x)f_Y(y) &= af_1(x)g_1(y)+(1-a)f_2(x)g_2(y)-[af_1(x)+(1-a)f_2(x)][ag_1(y)+(1-a)g_2(y)]\\ &= a(1-a)[f_1(x)g_1(y)-f_1(x)g_2(y)-f_2(x)g_1(y)+f_2(x)g_2(y)]\\ &= a(1-a)[f_1(x)-f_2(x)][g_1(y)-g_2(y)] = 0.\end{aligned}$$
Thus $[f_1(x)-f_2(x)][g_1(y)-g_2(y)] = 0$, since $0<a<1$.
($\Leftarrow$) If $[f_1(x)-f_2(x)][g_1(y)-g_2(y)] = 0$ then $f_1(x)g_1(y)+f_2(x)g_2(y) = f_1(x)g_2(y)+f_2(x)g_1(y)$. Therefore
$$\begin{aligned}f_X(x)f_Y(y) &= a^2f_1(x)g_1(y)+a(1-a)f_1(x)g_2(y)+a(1-a)f_2(x)g_1(y)+(1-a)^2f_2(x)g_2(y)\\ &= a^2f_1(x)g_1(y)+a(1-a)[f_1(x)g_2(y)+f_2(x)g_1(y)]+(1-a)^2f_2(x)g_2(y)\\ &= a^2f_1(x)g_1(y)+a(1-a)[f_1(x)g_1(y)+f_2(x)g_2(y)]+(1-a)^2f_2(x)g_2(y)\\ &= af_1(x)g_1(y)+(1-a)f_2(x)g_2(y) = f(x,y).\end{aligned}$$
Thus $X$ and $Y$ are independent.
c.
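The variance and covariance identities from 4.46(a) make it easy to check any proposed $(a_X,b_X,a_Y,b_Y)$. The Python sketch below checks the part (d) solution. It also checks, as an assumption (those values are not shown in this excerpt), the choice $a_X = \sqrt{(1+\rho)/2}\,\sigma_X$, $b_X = \sqrt{(1-\rho)/2}\,\sigma_X$, $a_Y = \sqrt{(1+\rho)/2}\,\sigma_Y$, $b_Y = -\sqrt{(1-\rho)/2}\,\sigma_Y$. This choice reproduces $D = -\sqrt{1-\rho^2}\sigma_X\sigma_Y$ and the $Z_1$, $Z_2$ of part (c). Both points also satisfy the ellipse equation. The target values of $\sigma_X$, $\sigma_Y$, $\rho$ are arbitrary.

```python
from math import sqrt

sx, sy, rho = 2.0, 0.5, -0.3    # arbitrary target sigma_X, sigma_Y, rho
target = (sx**2, sy**2, rho * sx * sy)

def implied_moments(aX, bX, aY, bY):
    """Var X, Var Y, Cov(X, Y) for X = aX Z1 + bX Z2 + cX, Y = aY Z1 + bY Z2 + cY."""
    return aX**2 + bX**2, aY**2 + bY**2, aX * aY + bX * bY

def on_ellipse(aX, aY) -> bool:
    """Check (1 - rho^2) sx^2 sy^2 = sx^2 aY^2 - 2 rho sx sy aX aY + sy^2 aX^2."""
    lhs = (1 - rho**2) * sx**2 * sy**2
    rhs = sx**2 * aY**2 - 2 * rho * sx * sy * aX * aY + sy**2 * aX**2
    return abs(lhs - rhs) < 1e-12

# Part (d): a_X = rho sx, b_X = sqrt(1 - rho^2) sx, a_Y = sy, b_Y = 0
sol_d = (rho * sx, sqrt(1 - rho**2) * sx, sy, 0.0)
# Assumed part (b)-style solution, consistent with D, Z_1 and Z_2 in part (c)
sol_b = (sqrt((1 + rho) / 2) * sx, sqrt((1 - rho) / 2) * sx,
         sqrt((1 + rho) / 2) * sy, -sqrt((1 - rho) / 2) * sy)

for sol in (sol_d, sol_b):
    print(implied_moments(*sol), target, on_ellipse(sol[0], sol[2]))
```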
$$\mathrm{Cov}(X,Y) = a\mu_1\xi_1 + (1-a)\mu_2\xi_2 - [a\mu_1 + (1-a)\mu_2][a\xi_1 + (1-a)\xi_2] = a(1-a)[\mu_1\xi_1 - \mu_1\xi_2 - \mu_2\xi_1 + \mu_2\xi_2] = a(1-a)[\mu_1-\mu_2][\xi_1-\xi_2].$$
To construct dependent uncorrelated random variables, let $(X,Y) \sim af_1(x)g_1(y) + (1-a)f_2(x)g_2(y)$, where $f_1, f_2, g_1, g_2$ are such that $f_1 - f_2 \ne 0$ and $g_1 - g_2 \ne 0$, with $\mu_1 = \mu_2$ or $\xi_1 = \xi_2$.
d. (i) $f_1 \sim$ binomial$(n,p)$, $f_2 \sim$ binomial$(n,p)$, $g_1 \sim$ binomial$(n,p)$, $g_2 \sim$ binomial$(n,1-p)$.
(ii) $f_1 \sim$ binomial$(n,p_1)$, $f_2 \sim$ binomial$(n,p_2)$, $g_1 \sim$ binomial$(n,p_1)$, $g_2 \sim$ binomial$(n,p_2)$.
(iii) $f_1 \sim$ binomial$(n_1, p/n_1)$, $f_2 \sim$ binomial$(n_2, p/n_2)$, $g_1 \sim$ binomial$(n_1,p)$, $g_2 \sim$ binomial$(n_2,p)$.

We need to prove that $\log(n) \ge \sum_{i=1}^n a_i\log(1/a_i)$. Using Jensen's inequality we have that
$$E\log\left(\frac1a\right) = \sum_{i=1}^n a_i\log\left(\frac{1}{a_i}\right) \le \log\left(E\frac1a\right) = \log\left(\sum_{i=1}^n a_i\frac{1}{a_i}\right) = \log(n),$$
which establishes the result.

4.59 Assume that $EX = 0$, $EY = 0$, and $EZ = 0$. This can be done without loss of generality because we could work with the quantities $X - EX$, etc. By iterating the expectation we have
$$\mathrm{Cov}(X,Y) = EXY = E[E(XY|Z)].$$
Adding and subtracting $E(X|Z)E(Y|Z)$ gives
$$\mathrm{Cov}(X,Y) = E[E(XY|Z) - E(X|Z)E(Y|Z)] + E[E(X|Z)E(Y|Z)].$$
Since $E[E(X|Z)] = EX = 0$, the second term above is $\mathrm{Cov}[E(X|Z), E(Y|Z)]$. For the first term write
$$E[E(XY|Z) - E(X|Z)E(Y|Z)] = E\left[E\left\{XY - E(X|Z)E(Y|Z) \mid Z\right\}\right],$$
where we have brought $E(X|Z)$ and $E(Y|Z)$ inside the conditional expectation. This can now be recognized as $E\,\mathrm{Cov}(X,Y|Z)$, establishing the identity.

4.61 a. To find the distribution of $f(X_1|Z)$, let $U = \frac{X_2-1}{X_1}$ and $V = X_1$. Then $x_2 = h_1(u,v) = uv + 1$, $x_1 = h_2(u,v) = v$. Therefore
$$f_{U,V}(u,v) = f_{X,Y}(h_1(u,v), h_2(u,v))|J| = e^{-(uv+1)}e^{-v}v,$$
and
$$f_U(u) = \int_0^\infty ve^{-(uv+1)}e^{-v}\,dv = \frac{e^{-1}}{(u+1)^2}.$$
Thus $V \mid U = 0$ has density $ve^{-v}$. The distribution of $X_1 \mid X_2$ is $e^{-x_1}$, since $X_1$ and $X_2$ are independent.
b. The following Mathematica code will draw the picture; the solid lines bound $B_1$ and the dashed lines bound $B_2$. Note that the solid lines spread apart as $x_1$ increases, while the dashed lines are constant.
Thus $B_1$ is informative, as the range of $X_2$ changes.

    e = 1/10;
    Plot[{-e*x1 + 1, e*x1 + 1, 1 - e, 1 + e}, {x1, 0, 5},
      PlotStyle -> {Dashing[{}], Dashing[{}], Dashing[{0.15, 0.05}], Dashing[{0.15, 0.05}]}]

c.
$$P(X_1 \le x \mid B_1) = P(V \le v^* \mid -\epsilon < U < \epsilon) = \frac{\int_0^{v^*}\int_{-\epsilon}^{\epsilon}ve^{-(uv+1)}e^{-v}\,du\,dv}{\int_0^\infty\int_{-\epsilon}^{\epsilon}ve^{-(uv+1)}e^{-v}\,du\,dv} = \frac{e^{-1}\left[\frac{e^{-v^*(1+\epsilon)}}{1+\epsilon} - \frac{1}{1+\epsilon} - \frac{e^{-v^*(1-\epsilon)}}{1-\epsilon} + \frac{1}{1-\epsilon}\right]}{e^{-1}\left[-\frac{1}{1+\epsilon} + \frac{1}{1-\epsilon}\right]}.$$
Thus
$$\lim_{\epsilon\to0}P(X_1 \le x \mid B_1) = 1 - e^{-v^*} - v^*e^{-v^*} = \int_0^{v^*}ve^{-v}\,dv = P(V \le v^* \mid U = 0).$$
$$P(X_1 \le x \mid B_2) = \frac{\int_0^x\int_0^{1+\epsilon}e^{-(x_1+x_2)}\,dx_2\,dx_1}{\int_0^{1+\epsilon}e^{-x_2}\,dx_2} = \frac{e^{-(x+1+\epsilon)} - e^{-(1+\epsilon)} - e^{-x} + 1}{1 - e^{-(1+\epsilon)}}.$$
Thus
$$\lim_{\epsilon\to0}P(X_1 \le x \mid B_2) = 1 - e^{-x} = \int_0^x e^{-x_1}\,dx_1 = P(X_1 \le x \mid X_2 = 1).$$

4.63 Since $X = e^Z$ and $g(z) = e^z$ is convex, by Jensen's Inequality $EX = Eg(Z) \ge g(EZ) = e^0 = 1$. In fact, there is equality in Jensen's Inequality if and only if there is an interval $I$ with $P(Z \in I) = 1$ and $g(z)$ is linear on $I$. But $e^z$ is linear on an interval only if the interval is a single point. So $EX > 1$, unless $P(Z = EZ = 0) = 1$.

4.64 a. Let $a$ and $b$ be real numbers. Then,
$$|a+b|^2 = (a+b)(a+b) = a^2 + 2ab + b^2 \le |a|^2 + 2|ab| + |b|^2 = (|a| + |b|)^2.$$
Take the square root of both sides to get $|a+b| \le |a| + |b|$.
b. $|X+Y| \le |X| + |Y| \Rightarrow E|X+Y| \le E(|X| + |Y|) = E|X| + E|Y|$.

4.65 Without loss of generality let us assume that $Eg(X) = Eh(X) = 0$. For part (a),
$$\begin{aligned} E(g(X)h(X)) &= \int_{-\infty}^\infty g(x)h(x)f_X(x)\,dx = \int_{\{x:h(x)\le0\}}g(x)h(x)f_X(x)\,dx + \int_{\{x:h(x)\ge0\}}g(x)h(x)f_X(x)\,dx\\ &\le g(x_0)\int_{\{x:h(x)\le0\}}h(x)f_X(x)\,dx + g(x_0)\int_{\{x:h(x)\ge0\}}h(x)f_X(x)\,dx\\ &= g(x_0)\int_{-\infty}^\infty h(x)f_X(x)\,dx = g(x_0)Eh(X) = 0,\end{aligned}$$
where $x_0$ is the number such that $h(x_0) = 0$. Note that, because $g(x)$ is nondecreasing and $h(x)$ is nonincreasing, $g(x_0)$ is a minimum of $g$ on $\{x : h(x) \le 0\}$ and a maximum of $g$ on $\{x : h(x) \ge 0\}$.
For part (b), where $g(x)$ and $h(x)$ are both nondecreasing,
$$\begin{aligned} E(g(X)h(X)) &= \int_{-\infty}^\infty g(x)h(x)f_X(x)\,dx = \int_{\{x:h(x)\le0\}}g(x)h(x)f_X(x)\,dx + \int_{\{x:h(x)\ge0\}}g(x)h(x)f_X(x)\,dx\\ &\ge g(x_0)\int_{\{x:h(x)\le0\}}h(x)f_X(x)\,dx + g(x_0)\int_{\{x:h(x)\ge0\}}h(x)f_X(x)\,dx\\ &= g(x_0)\int_{-\infty}^\infty h(x)f_X(x)\,dx = g(x_0)Eh(X) = 0,\end{aligned}$$
since now $g(x_0)$ is a maximum of $g$ on $\{x : h(x) \le 0\}$ and a minimum of $g$ on $\{x : h(x) \ge 0\}$.
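The case when $g(x)$ and $h(x)$ are both nonincreasing can be proved similarly.

Chapter 5 Properties of a Random Sample

5.1 Let $X$ = number of color blind people in a sample of size $n$. Then $X \sim$ binomial$(n,p)$, where $p = .01$. The probability that a sample contains a color blind person is $P(X > 0) = 1 - P(X = 0)$, where $P(X = 0) = \binom n0(.01)^0(.99)^n = .99^n$. Thus,
$$P(X > 0) = 1 - .99^n > .95 \iff n > \log(.05)/\log(.99) \approx 298.1,$$
so a sample of size $n = 299$ suffices.

5.3 Note that $Y_i \sim$ Bernoulli with $p_i = P(X_i \ge \mu) = 1 - F(\mu)$ for each $i$. Since the $Y_i$'s are iid Bernoulli, $\sum_{i=1}^n Y_i \sim$ binomial$(n, p = 1 - F(\mu))$.

5.5 Let $Y = X_1 + \cdots + X_n$. Then $\bar X = (1/n)Y$, a scale transformation. Therefore the pdf of $\bar X$ is $f_{\bar X}(x) = \frac{1}{1/n}f_Y\left(\frac{x}{1/n}\right) = nf_Y(nx)$.

5.6 a. For $Z = X - Y$, set $W = X$. Then $Y = W - Z$, $X = W$, and $|J| = \left|\det\begin{pmatrix}0 & 1\\ -1 & 1\end{pmatrix}\right| = 1$. Then $f_{Z,W}(z,w) = f_X(w)f_Y(w-z)\cdot1$, thus $f_Z(z) = \int_{-\infty}^\infty f_X(w)f_Y(w-z)\,dw$.
b. For $Z = XY$, set $W = X$. Then $Y = Z/W$ and $J = \det\begin{pmatrix}0 & 1\\ 1/w & -z/w^2\end{pmatrix} = -1/w$. Then $f_{Z,W}(z,w) = f_X(w)f_Y(z/w)\cdot|-1/w|$, thus $f_Z(z) = \int_{-\infty}^\infty|-1/w|\,f_X(w)f_Y(z/w)\,dw$.
c. For $Z = X/Y$, set $W = X$. Then $Y = W/Z$ and $J = \det\begin{pmatrix}0 & 1\\ -w/z^2 & 1/z\end{pmatrix} = w/z^2$. Then $f_{Z,W}(z,w) = f_X(w)f_Y(w/z)\cdot|w/z^2|$, thus $f_Z(z) = \int_{-\infty}^\infty|w/z^2|\,f_X(w)f_Y(w/z)\,dw$.

5.7 It is, perhaps, easiest to recover the constants by doing the integrations. We have
$$\int_{-\infty}^\infty\frac{B}{1+\left(\frac\omega\sigma\right)^2}\,d\omega = \sigma\pi B, \qquad \int_{-\infty}^\infty\frac{D}{1+\left(\frac{\omega-z}{\tau}\right)^2}\,d\omega = \tau\pi D$$
and
$$\begin{aligned}\int_{-\infty}^\infty\left[\frac{A\omega}{1+\left(\frac\omega\sigma\right)^2} - \frac{C\omega}{1+\left(\frac{\omega-z}{\tau}\right)^2}\right]d\omega &= \int_{-\infty}^\infty\left[\frac{A\omega}{1+\left(\frac\omega\sigma\right)^2} - \frac{C(\omega-z)}{1+\left(\frac{\omega-z}{\tau}\right)^2}\right]d\omega - Cz\int_{-\infty}^\infty\frac{1}{1+\left(\frac{\omega-z}{\tau}\right)^2}\,d\omega\\ &= \left.A\frac{\sigma^2}{2}\log\left[1+\left(\frac\omega\sigma\right)^2\right] - C\frac{\tau^2}{2}\log\left[1+\left(\frac{\omega-z}{\tau}\right)^2\right]\right|_{-\infty}^{\infty} - \tau\pi Cz.\end{aligned}$$
The integral is finite and equal to zero if $A = M\frac{2}{\sigma^2}$, $C = M\frac{2}{\tau^2}$ for some constant $M$. Hence
$$f_Z(z) = \frac{1}{\pi^2\sigma\tau}\left[\sigma\pi B - \tau\pi D - \frac{2\pi Mz}{\tau}\right] = \frac{1}{\pi(\sigma+\tau)}\frac{1}{1+(z/(\sigma+\tau))^2},$$
if $B = \frac{\tau}{\sigma+\tau}$, $D = \frac{\sigma}{\sigma+\tau}$, $M = \frac{-\sigma\tau^2}{2z(\sigma+\tau)}\frac{1}{1+\left(\frac{z}{\sigma+\tau}\right)^2}$.

$\theta_2 = E(X_i-\mu)^2 = \sigma^2$,
$\theta_3 = E(X_i-\mu)^3 = E(X_i-\mu)^2(X_i-\mu) = 2\sigma^2E(X_i-\mu) = 0$ (Stein's lemma: $Eg(X)(X-\theta) = \sigma^2Eg'(X)$),
$\theta_4 = E(X_i-\mu)^4 = E(X_i-\mu)^3(X_i-\mu) = 3\sigma^2E(X_i-\mu)^2 = 3\sigma^4$.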
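The central moments in part (a) can also be checked numerically. The sketch below is in Python with only the standard library (the manual's own code is R and Mathematica); it approximates $E(X-\mu)^k$ for a $\mathrm{n}(\mu,\sigma^2)$ population by composite Simpson's rule over $\mu \pm 12\sigma$. The function name `normal_central_moment`, the truncation width and the grid size are illustrative choices, not part of the original solution.

```python
import math


def normal_central_moment(k, mu=0.0, sigma=1.0, half_width=12.0, steps=4000):
    """Approximate E(X - mu)^k for X ~ n(mu, sigma^2) by composite Simpson's rule.

    The integral is truncated to mu +/- half_width * sigma; for small k the
    normal tails beyond 12 standard deviations contribute a negligible amount.
    `steps` must be even for Simpson's rule.
    """
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    h = (b - a) / steps

    def integrand(x):
        density = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        return (x - mu) ** k * density

    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        # Simpson weights: 4 at odd interior nodes, 2 at even interior nodes
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3


mu, sigma = 2.0, 1.5
for k in (2, 3, 4):
    print(k, normal_central_moment(k, mu=mu, sigma=sigma))
```

With $\mu = 2$ and $\sigma = 1.5$ the three printed values should be close to $\sigma^2 = 2.25$, $0$ and $3\sigma^4 = 15.1875$, matching $\theta_2$, $\theta_3$ and $\theta_4$ above.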
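b. $\mathrm{Var}S^2 = \frac1n\left(\theta_4 - \frac{n-3}{n-1}\theta_2^2\right) = \frac1n\left(3\sigma^4 - \frac{n-3}{n-1}\sigma^4\right) = \frac{2\sigma^4}{n-1}$.
c. Use the fact that $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ and $\mathrm{Var}\,\chi^2_{n-1} = 2(n-1)$ to get
$$\mathrm{Var}\left(\frac{(n-1)S^2}{\sigma^2}\right) = 2(n-1),$$
which implies $\left(\frac{(n-1)^2}{\sigma^4}\right)\mathrm{Var}S^2 = 2(n-1)$ and hence
$$\mathrm{Var}S^2 = \frac{2(n-1)}{(n-1)^2/\sigma^4} = \frac{2\sigma^4}{n-1}.$$
Remark: Another approach to b), not using the $\chi^2$ distribution, is to use linear model theory. For any symmetric matrix $A$ (with $X$ multivariate normal), $\mathrm{Var}(X'AX) = 2\mu_2^2\,\mathrm{tr}A^2 + 4\mu_2\theta'A\theta$, where $\mu_2$ is $\sigma^2$ and $\theta = EX = \mu_1$. Write $S^2 = \frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2 = \frac{1}{n-1}X'(I-\bar J_n)X$, where
$$I - \bar J_n = \begin{pmatrix}1-\frac1n & -\frac1n & \cdots & -\frac1n\\ -\frac1n & 1-\frac1n & \cdots & -\frac1n\\ \vdots & \vdots & \ddots & \vdots\\ -\frac1n & -\frac1n & \cdots & 1-\frac1n\end{pmatrix}.$$
Notice that, with $A = I - \bar J_n$, $\mathrm{tr}A^2 = \mathrm{tr}A = n-1$ and $A\theta = 0$. So
$$\mathrm{Var}S^2 = \frac{1}{(n-1)^2}\mathrm{Var}(X'AX) = \frac{1}{(n-1)^2}\left(2\sigma^4(n-1) + 0\right) = \frac{2\sigma^4}{n-1}.$$

5.11 Let $g(s) = s^2$. Since $g(\cdot)$ is a convex function, we know from Jensen's inequality that $Eg(S) \ge g(ES)$, which implies $\sigma^2 = ES^2 \ge (ES)^2$. Taking square roots, $\sigma \ge ES$. From the proof of Jensen's Inequality, it is clear that, in fact, the inequality will be strict unless there is an interval $I$ such that $g$ is linear on $I$ and $P(S \in I) = 1$. Since $s^2$ is "linear" only on single points, we have $ET^2 > (ET)^2$ for any random variable $T$, unless $P(T = ET) = 1$.

5.13
$$E\left(c\sqrt{S^2}\right) = c\sqrt{\frac{\sigma^2}{n-1}}\,E\left(\sqrt{\frac{S^2(n-1)}{\sigma^2}}\right) = c\sqrt{\frac{\sigma^2}{n-1}}\int_0^\infty\sqrt q\,\frac{1}{\Gamma\left(\frac{n-1}{2}\right)2^{(n-1)/2}}q^{\left(\frac{n-1}{2}\right)-1}e^{-q/2}\,dq,$$
since $\sqrt{S^2(n-1)/\sigma^2}$ is the square root of a $\chi^2_{n-1}$ random variable. Now adjust the integrand to be another $\chi^2$ pdf and get
$$E\left(c\sqrt{S^2}\right) = c\sqrt{\frac{\sigma^2}{n-1}}\cdot\frac{\Gamma(n/2)2^{n/2}}{\Gamma((n-1)/2)2^{(n-1)/2}}\underbrace{\int_0^\infty\frac{1}{\Gamma(n/2)2^{n/2}}q^{\frac{n-1}{2}-\frac12}e^{-q/2}\,dq}_{=1 \text{ since } \chi^2_n \text{ pdf}}.$$
So $c = \frac{\Gamma\left(\frac{n-1}{2}\right)\sqrt{n-1}}{\sqrt2\,\Gamma\left(\frac n2\right)}$ gives $E(cS) = \sigma$.

5.15 a.
$$\bar X_{n+1} = \frac{\sum_{i=1}^{n+1}X_i}{n+1} = \frac{X_{n+1} + \sum_{i=1}^nX_i}{n+1} = \frac{X_{n+1} + n\bar X_n}{n+1}.$$
b.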
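$$\begin{aligned} nS_{n+1}^2 &= \frac{n}{(n+1)-1}\sum_{i=1}^{n+1}\left(X_i - \bar X_{n+1}\right)^2 = \sum_{i=1}^{n+1}\left(X_i - \frac{X_{n+1} + n\bar X_n}{n+1}\right)^2 && (\text{use (a)})\\ &= \sum_{i=1}^{n+1}\left(X_i - \frac{X_{n+1}}{n+1} - \frac{n\bar X_n}{n+1}\right)^2\\ &= \sum_{i=1}^{n+1}\left[\left(X_i - \bar X_n\right) - \left(\frac{X_{n+1}}{n+1} - \frac{\bar X_n}{n+1}\right)\right]^2 && (\pm\bar X_n)\\ &= \sum_{i=1}^{n+1}\left[\left(X_i-\bar X_n\right)^2 - 2\left(X_i-\bar X_n\right)\left(\frac{X_{n+1}-\bar X_n}{n+1}\right) + \frac{1}{(n+1)^2}\left(X_{n+1}-\bar X_n\right)^2\right]\\ &= \sum_{i=1}^n\left(X_i-\bar X_n\right)^2 + \left(X_{n+1}-\bar X_n\right)^2 - \frac{2\left(X_{n+1}-\bar X_n\right)^2}{n+1} + \frac{n+1}{(n+1)^2}\left(X_{n+1}-\bar X_n\right)^2 && \left(\text{since } \textstyle\sum_{i=1}^n(X_i-\bar X_n) = 0\right)\\ &= (n-1)S_n^2 + \frac{n}{n+1}\left(X_{n+1}-\bar X_n\right)^2.\end{aligned}$$

5.16 a. $\sum_{i=1}^3\left(\frac{X_i-i}{i}\right)^2 \sim \chi^2_3$.
b. $\left(\frac{X_1-1}{1}\right)\Big/\sqrt{\sum_{i=2}^3\left(\frac{X_i-i}{i}\right)^2\Big/2} \sim t_2$.
c. Square the random variable in part b).

5.17 a. Let $U \sim \chi^2_p$ and $V \sim \chi^2_q$, independent. Their joint pdf is
$$\frac{1}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)2^{(p+q)/2}}u^{\frac p2-1}v^{\frac q2-1}e^{-\frac{u+v}{2}}.$$
From Definition 5.3.6, the random variable $X = (U/p)/(V/q)$ has an $F$ distribution, so we make the transformation $x = (u/p)/(v/q)$ and $y = u + v$. (Of course, many choices of $y$ will do, but this one makes calculations easy. The choice is prompted by the exponential term in the pdf.) Solving for $u$ and $v$ yields
$$u = \frac{\frac pq xy}{1+\frac pq x}, \qquad v = \frac{y}{1+\frac pq x}, \qquad |J| = \frac{\frac pq y}{\left(1+\frac pq x\right)^2}.$$
We then substitute into $f_{U,V}(u,v)$ to obtain
$$f_{X,Y}(x,y) = \frac{1}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)2^{(p+q)/2}}\left(\frac{\frac pq xy}{1+\frac pq x}\right)^{\frac p2-1}\left(\frac{y}{1+\frac pq x}\right)^{\frac q2-1}e^{-\frac y2}\frac{\frac pq y}{\left(1+\frac pq x\right)^2}.$$
Note that the pdf factors, showing that $X$ and $Y$ are independent, and we can read off the pdfs of each: $X$ has the $F$ distribution and $Y$ is $\chi^2_{p+q}$. If we integrate out $y$ to recover the proper constant, we get the $F$ pdf
$$f_X(x) = \frac{\Gamma\left(\frac{p+q}{2}\right)}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)}\left(\frac pq\right)^{p/2}\frac{x^{p/2-1}}{\left(1+\frac pq x\right)^{\frac{p+q}{2}}}.$$
b. Since $F_{p,q} = \frac{\chi^2_p/p}{\chi^2_q/q}$, let $U \sim \chi^2_p$, $V \sim \chi^2_q$, with $U$ and $V$ independent. Then we have
$$EF_{p,q} = E\left(\frac{U/p}{V/q}\right) = E\left(\frac Up\right)E\left(\frac qV\right) \ (\text{by independence}) = \frac pp\,qE\left(\frac1V\right) \ (EU = p).$$
Then
$$E\left(\frac1V\right) = \int_0^\infty\frac1v\frac{1}{\Gamma\left(\frac q2\right)2^{q/2}}v^{\frac q2-1}e^{-\frac v2}\,dv = \frac{1}{\Gamma\left(\frac q2\right)2^{q/2}}\int_0^\infty v^{\frac{q-2}{2}-1}e^{-\frac v2}\,dv = \frac{\Gamma\left(\frac{q-2}{2}\right)2^{(q-2)/2}}{\Gamma\left(\frac q2\right)2^{q/2}} = \frac{\Gamma\left(\frac{q-2}{2}\right)2^{(q-2)/2}}{\Gamma\left(\frac{q-2}{2}\right)\left(\frac{q-2}{2}\right)2^{q/2}} = \frac{1}{q-2}.$$
Hence, $EF_{p,q} = \frac pp\frac{q}{q-2} = \frac{q}{q-2}$, if $q > 2$.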
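To calculate the variance, first calculate
$$E(F_{p,q}^2) = E\left(\frac{U^2}{p^2}\frac{q^2}{V^2}\right) = \frac{q^2}{p^2}E(U^2)E\left(\frac{1}{V^2}\right).$$
Now $E(U^2) = \mathrm{Var}(U) + (EU)^2 = 2p + p^2$ and
$$E\left(\frac{1}{V^2}\right) = \int_0^\infty\frac{1}{v^2}\frac{1}{\Gamma(q/2)2^{q/2}}v^{(q/2)-1}e^{-v/2}\,dv = \frac{1}{(q-2)(q-4)}.$$
Therefore,
$$EF_{p,q}^2 = \frac{q^2}{p^2}p(2+p)\frac{1}{(q-2)(q-4)} = \frac{q^2}{p}\frac{p+2}{(q-2)(q-4)},$$
and, hence,
$$\mathrm{Var}(F_{p,q}) = \frac{q^2(p+2)}{p(q-2)(q-4)} - \frac{q^2}{(q-2)^2} = 2\left(\frac{q}{q-2}\right)^2\left(\frac{q+p-2}{p(q-4)}\right), \quad q > 4.$$
c. Write $X = \frac{U/p}{V/q}$; then $\frac1X = \frac{V/q}{U/p} \sim F_{q,p}$, since $U \sim \chi^2_p$, $V \sim \chi^2_q$ and $U$ and $V$ are independent.
d. Let $Y = \frac{(p/q)X}{1+(p/q)X} = \frac{pX}{q+pX}$, so $X = \frac{qY}{p(1-Y)}$ and $\left|\frac{dx}{dy}\right| = \frac qp(1-y)^{-2}$. Thus, $Y$ has pdf
$$f_Y(y) = \frac{\Gamma\left(\frac{q+p}{2}\right)}{\Gamma\left(\frac p2\right)\Gamma\left(\frac q2\right)}\left(\frac pq\right)^{\frac p2}\frac{\left(\frac{qy}{p(1-y)}\right)^{\frac{p-2}{2}}}{\left(1+\frac pq\frac{qy}{p(1-y)}\right)^{\frac{p+q}{2}}}\frac{q}{p(1-y)^2} = \left[B\left(\frac p2,\frac q2\right)\right]^{-1}y^{\frac p2-1}(1-y)^{\frac q2-1} \sim \text{beta}\left(\frac p2,\frac q2\right).$$

where we use the independence of $X$ and $Y$. Since $X$ and $Y$ are identically distributed, $P(X > a) = P(Y > a) = 1 - F_X(a)$, so
$$F_{Z^2}(z) = \left(1 - F_X(-\sqrt z)\right)^2 - \left(1 - F_X(\sqrt z)\right)^2 = 1 - 2F_X(-\sqrt z),$$
since $1 - F_X(\sqrt z) = F_X(-\sqrt z)$. Differentiating and substituting gives
$$f_{Z^2}(z) = \frac{d}{dz}F_{Z^2}(z) = f_X(-\sqrt z)\frac{1}{\sqrt z} = \frac{1}{\sqrt{2\pi}}e^{-z/2}z^{-1/2},$$
the pdf of a $\chi^2_1$ random variable. Alternatively,
$$\begin{aligned} P(Z^2 \le z) &= P\left([\min(X,Y)]^2 \le z\right) = P(-\sqrt z \le \min(X,Y) \le \sqrt z)\\ &= P(-\sqrt z \le X \le \sqrt z, X \le Y) + P(-\sqrt z \le Y \le \sqrt z, Y \le X)\\ &= P(-\sqrt z \le X \le \sqrt z \mid X \le Y)P(X \le Y) + P(-\sqrt z \le Y \le \sqrt z \mid Y \le X)P(Y \le X)\\ &= \frac12P(-\sqrt z \le X \le \sqrt z) + \frac12P(-\sqrt z \le Y \le \sqrt z),\end{aligned}$$
using the facts that $X$ and $Y$ are independent and symmetric about 0, and $P(Y \le X) = P(X \le Y) = \frac12$. Moreover, since $X$ and $Y$ are identically distributed, $P(Z^2 \le z) = P(-\sqrt z \le X \le \sqrt z)$ and
$$f_{Z^2}(z) = \frac{d}{dz}P(-\sqrt z \le X \le \sqrt z) = \frac{1}{\sqrt{2\pi}}\left(e^{-z/2}\frac12z^{-1/2} + e^{-z/2}\frac12z^{-1/2}\right) = \frac{1}{\sqrt{2\pi}}z^{-1/2}e^{-z/2},$$
the pdf of a $\chi^2_1$.

5.23
$$\begin{aligned} P(Z > z) &= \sum_{x=1}^\infty P(Z > z \mid x)P(X = x) = \sum_{x=1}^\infty P(U_1 > z, \ldots, U_x > z \mid x)P(X = x)\\ &= \sum_{x=1}^\infty\prod_{i=1}^xP(U_i > z)P(X = x) && (\text{by independence of the } U_i\text{'s})\\ &= \sum_{x=1}^\infty P(U_i > z)^xP(X = x) = \sum_{x=1}^\infty(1-z)^x\frac{1}{(e-1)x!} = \frac{1}{e-1}\sum_{x=1}^\infty\frac{(1-z)^x}{x!} = \frac{e^{1-z}-1}{e-1}, \quad 0 < z < 1.\end{aligned}$$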
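Returning to Exercise 5.17(b), the two forms of $\mathrm{Var}(F_{p,q})$ and the mean $q/(q-2)$ can be cross-checked with a short Python sketch (standard library only). The function names are hypothetical, and the Monte Carlo part relies on the fact that a $\chi^2_k$ variable is gamma with shape $k/2$ and scale 2, which is what `random.Random.gammavariate(alpha, beta)` draws.

```python
import random


def f_mean(q):
    """E(F_{p,q}) = q / (q - 2); it exists only for q > 2 and does not involve p."""
    if q <= 2:
        raise ValueError("the mean of F_{p,q} requires q > 2")
    return q / (q - 2)


def f_var_raw(p, q):
    """Var(F_{p,q}) = E(F^2) - (E F)^2 with E(F^2) = q^2 (p + 2) / (p (q - 2)(q - 4))."""
    if q <= 4:
        raise ValueError("the variance of F_{p,q} requires q > 4")
    return q**2 * (p + 2) / (p * (q - 2) * (q - 4)) - f_mean(q) ** 2


def f_var_simplified(p, q):
    """Simplified form 2 (q / (q - 2))^2 (q + p - 2) / (p (q - 4)), valid for q > 4."""
    if q <= 4:
        raise ValueError("the variance of F_{p,q} requires q > 4")
    return 2 * f_mean(q) ** 2 * (q + p - 2) / (p * (q - 4))


def simulate_f_mean(p, q, n_draws=50_000, seed=1):
    """Monte Carlo estimate of E[(U/p) / (V/q)] with U ~ chi^2_p, V ~ chi^2_q independent."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        u = rng.gammavariate(p / 2, 2)  # chi^2_p = gamma(shape p/2, scale 2)
        v = rng.gammavariate(q / 2, 2)  # chi^2_q = gamma(shape q/2, scale 2)
        total += (u / p) / (v / q)
    return total / n_draws


p, q = 5, 10
print(f_mean(q), f_var_raw(p, q), f_var_simplified(p, q), simulate_f_mean(p, q))
```

For $p = 5$, $q = 10$ both variance expressions equal $40.625/30 \approx 1.354$, and the simulated mean should sit near $q/(q-2) = 1.25$, up to Monte Carlo error.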
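The survival function in Exercise 5.23 can be checked by simulation. The Python sketch below (standard library only, function names hypothetical) draws $X$ from $P(X = x) = 1/((e-1)x!)$, $x = 1, 2, \ldots$, by inverting its cdf, takes $Z = \min(U_1,\ldots,U_X)$ for iid uniform$(0,1)$ variables, and compares the empirical $P(Z > z)$ with $(e^{1-z}-1)/(e-1)$. The cap `max_x` is only a guard against floating-point round-off, an assumption of this sketch rather than part of the solution.

```python
import math
import random


def draw_truncated_poisson1(rng, max_x=50):
    """Draw X with P(X = x) = 1 / ((e - 1) x!), x = 1, 2, ..., by inverting the cdf."""
    u = rng.random()
    x, cdf = 0, 0.0
    while cdf <= u and x < max_x:
        x += 1
        cdf += 1 / ((math.e - 1) * math.factorial(x))
    return x


def simulated_survival(z, n_draws=50_000, seed=2):
    """Estimate P(Z > z) with Z = min(U_1, ..., U_X), U_i iid uniform(0, 1)."""
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_draws):
        x = draw_truncated_poisson1(rng)
        if min(rng.random() for _ in range(x)) > z:
            exceed += 1
    return exceed / n_draws


def survival_formula(z):
    """P(Z > z) = (e^{1-z} - 1) / (e - 1), 0 < z < 1."""
    return (math.exp(1 - z) - 1) / (math.e - 1)


for z in (0.1, 0.5, 0.9):
    print(z, survival_formula(z), simulated_survival(z))
```

The simulated and exact columns should agree to about two decimal places with 50,000 draws.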
5.24 Use $f_X(x) = 1/\theta$, $F_X(x) = x/\theta$, $0 < x < \theta$. Let $Y = X_{(n)}$, $Z = X_{(1)}$. Then, from Theorem 5.4.6,
$$f_{Z,Y}(z,y) = \frac{n!}{0!(n-2)!0!}\frac1\theta\frac1\theta\left(\frac z\theta\right)^0\left(\frac{y-z}{\theta}\right)^{n-2}\left(1-\frac y\theta\right)^0 = \frac{n(n-1)}{\theta^n}(y-z)^{n-2}, \quad 0 < z < y < \theta.$$
Now let $W = Z/Y$, $Q = Y$. Then $Y = Q$, $Z = WQ$, and $|J| = q$. Therefore
$$f_{W,Q}(w,q) = \frac{n(n-1)}{\theta^n}(q-wq)^{n-2}q = \frac{n(n-1)}{\theta^n}(1-w)^{n-2}q^{n-1}, \quad 0 < w < 1,\ 0 < q < \theta.$$
The joint pdf factors into functions of $w$ and $q$, and, hence, $W$ and $Q$ are independent.

5.25 The joint pdf of $X_{(1)},\ldots,X_{(n)}$ is
$$f(u_1,\ldots,u_n) = \frac{n!a^n}{\theta^{an}}u_1^{a-1}\cdots u_n^{a-1}, \quad 0 < u_1 < \cdots < u_n < \theta.$$
Make the one-to-one transformation to $Y_1 = X_{(1)}/X_{(2)},\ldots,Y_{n-1} = X_{(n-1)}/X_{(n)}$, $Y_n = X_{(n)}$. The Jacobian is $J = y_2y_3^2\cdots y_n^{n-1}$. So the joint pdf of $Y_1,\ldots,Y_n$ is
$$f(y_1,\ldots,y_n) = \frac{n!a^n}{\theta^{an}}(y_1\cdots y_n)^{a-1}(y_2\cdots y_n)^{a-1}\cdots(y_n)^{a-1}\left(y_2y_3^2\cdots y_n^{n-1}\right) = \frac{n!a^n}{\theta^{an}}y_1^{a-1}y_2^{2a-1}\cdots y_n^{na-1},$$
$0 < y_i < 1$, $i = 1,\ldots,n-1$, $0 < y_n < \theta$. We see that $f(y_1,\ldots,y_n)$ factors, so $Y_1,\ldots,Y_n$ are mutually independent. To get the pdf of $Y_1$, integrate out the other variables and obtain that $f_{Y_1}(y_1) = c_1y_1^{a-1}$, $0 < y_1 < 1$, for some constant $c_1$. To have this pdf integrate to 1, it must be that $c_1 = a$. Thus $f_{Y_1}(y_1) = ay_1^{a-1}$, $0 < y_1 < 1$. Similarly, for $i = 2,\ldots,n-1$, we obtain $f_{Y_i}(y_i) = ia\,y_i^{ia-1}$, $0 < y_i < 1$. From Theorem 5.4.4, the pdf of $Y_n$ is $f_{Y_n}(y_n) = \frac{na}{\theta^{na}}y_n^{na-1}$, $0 < y_n < \theta$. It can be checked that the product of these marginal pdfs is the joint pdf given above.

5.27 a. $f_{X_{(i)}|X_{(j)}}(u|v) = f_{X_{(i)},X_{(j)}}(u,v)/f_{X_{(j)}}(v)$. Consider two cases, depending on which of $i$ or $j$ is greater. Using the formulas from Theorems 5.4.4 and 5.4.6, and after cancellation, we obtain the following.
(i) If $i < j$,
$$\begin{aligned} f_{X_{(i)}|X_{(j)}}(u|v) &= \frac{(j-1)!}{(i-1)!(j-1-i)!}f_X(u)F_X^{i-1}(u)[F_X(v)-F_X(u)]^{j-i-1}F_X^{1-j}(v)\\ &= \frac{(j-1)!}{(i-1)!(j-1-i)!}\frac{f_X(u)}{F_X(v)}\left[\frac{F_X(u)}{F_X(v)}\right]^{i-1}\left[1-\frac{F_X(u)}{F_X(v)}\right]^{j-i-1}, \quad u < v.\end{aligned}$$
Note the interpretation:
This is the pdf of the $i$th order statistic from a sample of size $j-1$, from a population with pdf given by the truncated distribution $f(u) = f_X(u)/F_X(v)$, $u < v$.
(ii) If $j < i$ and $u > v$,
$$\begin{aligned} f_{X_{(i)}|X_{(j)}}(u|v) &= \frac{(n-j)!}{(n-i)!(i-1-j)!}f_X(u)[1-F_X(u)]^{n-i}[F_X(u)-F_X(v)]^{i-1-j}[1-F_X(v)]^{j-n}\\ &= \frac{(n-j)!}{(i-j-1)!(n-i)!}\frac{f_X(u)}{1-F_X(v)}\left[\frac{F_X(u)-F_X(v)}{1-F_X(v)}\right]^{i-j-1}\left[1-\frac{F_X(u)-F_X(v)}{1-F_X(v)}\right]^{n-i}.\end{aligned}$$
This is the pdf of the $(i-j)$th order statistic from a sample of size $n-j$, from a population with pdf given by the truncated distribution $f(u) = f_X(u)/(1-F_X(v))$, $u > v$.
b. From Example 5.4.7,
$$f_{V|R}(v|r) = \frac{n(n-1)r^{n-2}/a^n}{n(n-1)r^{n-2}(a-r)/a^n} = \frac{1}{a-r}, \quad r/2 < v < a - r/2.$$

5.29 Let $X_i$ = weight of $i$th booklet in package. The $X_i$'s are iid with $EX_i = 1$ and $\mathrm{Var}X_i = .05^2$. We want to approximate
$$P\left(\sum_{i=1}^{100}X_i > 100.4\right) = P\left(\sum_{i=1}^{100}X_i/100 > 1.004\right) = P(\bar X > 1.004).$$
By the CLT, $P(\bar X > 1.004) \approx P(Z > (1.004-1)/(.05/10)) = P(Z > .8) = .2119$.

5.30 From the CLT we have, approximately, $\bar X_1 \sim \mathrm n(\mu, \sigma^2/n)$, $\bar X_2 \sim \mathrm n(\mu,\sigma^2/n)$. Since $\bar X_1$ and $\bar X_2$ are independent, $\bar X_1 - \bar X_2 \sim \mathrm n(0, 2\sigma^2/n)$. Thus, we want
$$.99 \approx P\left(\left|\bar X_1-\bar X_2\right| < \sigma/5\right) = P\left(\frac{-\sigma/5}{\sigma/\sqrt{n/2}} < \frac{\bar X_1-\bar X_2}{\sigma/\sqrt{n/2}} < \frac{\sigma/5}{\sigma/\sqrt{n/2}}\right) \approx P\left(-\frac15\sqrt{\frac n2} < Z < \frac15\sqrt{\frac n2}\right),$$
where $Z \sim \mathrm n(0,1)$. Thus we need $P\left(Z \ge \sqrt n/(5\sqrt2)\right) \approx .005$. From Table 1, $\sqrt n/(5\sqrt2) = 2.576$, which implies $n = 50(2.576)^2 \approx 332$.

5.31 We know that $\sigma^2_{\bar X} = 9/100$. Use Chebyshev's Inequality to get
$$P\left(-3k/10 < \bar X - \mu < 3k/10\right) \ge 1 - 1/k^2.$$
We need $1 - 1/k^2 \ge .9$, which implies $k \ge \sqrt{10} = 3.16$ and $3k/10 = .9487$. Thus
$$P(-.9487 < \bar X - \mu < .9487) \ge .9$$
by Chebyshev's Inequality. Using the CLT, $\bar X$ is approximately $\mathrm n\left(\mu, \sigma^2_{\bar X}\right)$ with $\sigma_{\bar X} = \sqrt{.09} = .3$ and $(\bar X - \mu)/.3 \sim \mathrm n(0,1)$. Thus
$$.9 = P\left(-1.645 < \frac{\bar X-\mu}{.3} < 1.645\right) = P(-.4935 < \bar X - \mu < .4935).$$
Thus, we again see the conservativeness of Chebyshev's Inequality, yielding bounds on $\bar X - \mu$ that are almost twice as big as the normal approximation.
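Moreover, with a sample of size 100, $\bar X$ is probably very close to normally distributed, even if the underlying $X$ distribution is not close to normal.

5.32 a. For any $\epsilon > 0$,
$$P\left(\left|\sqrt{X_n} - \sqrt a\right| > \epsilon\right) = P\left(\left|\sqrt{X_n} - \sqrt a\right|\left|\sqrt{X_n} + \sqrt a\right| > \epsilon\left|\sqrt{X_n} + \sqrt a\right|\right) = P\left(|X_n - a| > \epsilon\left|\sqrt{X_n} + \sqrt a\right|\right) \le P\left(|X_n - a| > \epsilon\sqrt a\right) \to 0,$$
as $n \to \infty$, since $X_n \to a$ in probability. Thus $\sqrt{X_n} \to \sqrt a$ in probability.
b. For any $\epsilon > 0$,
$$\begin{aligned} P\left(\left|\frac{a}{X_n} - 1\right| \le \epsilon\right) &= P\left(\frac{a}{1+\epsilon} \le X_n \le \frac{a}{1-\epsilon}\right) = P\left(a - \frac{a\epsilon}{1+\epsilon} \le X_n \le a + \frac{a\epsilon}{1-\epsilon}\right)\\ &\ge P\left(a - \frac{a\epsilon}{1+\epsilon} \le X_n \le a + \frac{a\epsilon}{1+\epsilon}\right) && \left(a + \frac{a\epsilon}{1+\epsilon} < a + \frac{a\epsilon}{1-\epsilon}\right)\\ &= P\left(|X_n - a| \le \frac{a\epsilon}{1+\epsilon}\right) \to 1,\end{aligned}$$
as $n \to \infty$, since $X_n \to a$ in probability. Thus $a/X_n \to 1$ in probability.

5.43 a. $P(|Y_n - \theta| < \epsilon) = P\left(\left|\sqrt n(Y_n-\theta)\right| < \sqrt n\,\epsilon\right)$. Therefore,
$$\lim_{n\to\infty}P(|Y_n-\theta| < \epsilon) = \lim_{n\to\infty}P\left(\left|\sqrt n(Y_n-\theta)\right| < \sqrt n\,\epsilon\right) = P(|Z| < \infty) = 1,$$
where $Z \sim \mathrm n(0,\sigma^2)$. Thus $Y_n \to \theta$ in probability.
b. By Slutsky's Theorem (a), $g'(\theta)\sqrt n(Y_n - \theta) \to g'(\theta)X$, where $X \sim \mathrm n(0,\sigma^2)$. By a first-order Taylor expansion, $\sqrt n[g(Y_n) - g(\theta)]$ equals $g'(\theta)\sqrt n(Y_n-\theta)$ plus a remainder that goes to 0 in probability, and therefore
$$\sqrt n[g(Y_n) - g(\theta)] \to \mathrm n\left(0, \sigma^2[g'(\theta)]^2\right).$$

5.45 We do part (a); the other parts are similar. Using Mathematica, the exact calculation is

    f1[x_] = PDF[GammaDistribution[4, 25], x]
    p1 = Integrate[f1[x], {x, 100, Infinity}] // N
    1 - CDF[BinomialDistribution[300, p1], 149]

with outputs $e^{-x/25}x^3/2343750$, $0.43347$ and $0.0119389$, respectively. The answer can also be simulated in Mathematica or in R. Here is the R code for simulating the same probability:

    p1 <- mean(rgamma(10000, 4, scale = 25) > 100)
    mean(rbinom(10000, 300, p1) > 149)

In each case 10,000 random variables were simulated. We obtained $p_1 = 0.438$ and a binomial probability of $0.0108$.

5.47 a. $-2\log(U_j) \sim$ exponential$(2) \sim \chi^2_2$. Thus $Y$ is the sum of $\nu$ independent $\chi^2_2$ random variables. By Lemma 5.3.2(b), $Y \sim \chi^2_{2\nu}$.
b. $-\beta\log(U_j) \sim$ exponential$(\beta) \sim$ gamma$(1,\beta)$. Thus $Y$ is the sum of independent gamma random variables. By Example 4.6.8, $Y \sim$ gamma$(a,\beta)$.
c. Let $V = -\sum_{j=1}^a\log(U_j) \sim$ gamma$(a,1)$. Similarly, $W = -\sum_{j=1}^b\log(U_j) \sim$ gamma$(b,1)$, computed from uniforms independent of those used for $V$. By Exercise 4.24, $\frac{V}{V+W} \sim$ beta$(a,b)$.

5.49 a.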
See Example 2.1.4.
b. $X = g(U) = -\log\frac{1-U}{U}$. Then $g^{-1}(x) = \frac{1}{1+e^{-x}}$. Thus
$$f_X(x) = 1\times\left|\frac{e^{-x}}{(1+e^{-x})^2}\right| = \frac{e^{-x}}{(1+e^{-x})^2}, \quad -\infty < x < \infty,$$
which is the density of a logistic$(0,1)$ random variable.
c. Let $Y \sim$ logistic$(\mu,\beta)$; then $f_Y(y) = \frac1\beta f_Z\left(\frac{y-\mu}{\beta}\right)$, where $f_Z$ is the density of a logistic$(0,1)$. Then $Y = \beta Z + \mu$. To generate a logistic$(\mu,\beta)$ random variable, (i) generate $U \sim$ uniform$(0,1)$, and (ii) set $Y = \beta\log\frac{U}{1-U} + \mu$.

5.51 a. For $U_i \sim$ uniform$(0,1)$, $EU_i = 1/2$, $\mathrm{Var}U_i = 1/12$. Then
$$X = \sum_{i=1}^{12}U_i - 6 = 12\bar U - 6 = \sqrt{12}\left(\frac{\bar U - 1/2}{1/\sqrt{12}}\right)$$
is in the form $\sqrt n\left((\bar U - EU)/\sigma\right)$ with $n = 12$, so $X$ is approximately $\mathrm n(0,1)$ by the Central Limit Theorem.
b. The approximation does not have the same range as $Z \sim \mathrm n(0,1)$, where $-\infty < Z < +\infty$, since $-6 < X < 6$.
c.
$$EX = E\left(\sum_{i=1}^{12}U_i - 6\right) = \sum_{i=1}^{12}EU_i - 6 = \left(\sum_{i=1}^{12}\frac12\right) - 6 = 6 - 6 = 0,$$
$$\mathrm{Var}X = \mathrm{Var}\left(\sum_{i=1}^{12}U_i - 6\right) = \mathrm{Var}\sum_{i=1}^{12}U_i = 12\,\mathrm{Var}U_1 = 1,$$
and $EX^3 = 0$ since $X$ is symmetric about 0. (In fact, all odd moments of $X$ are 0.) Thus, the first three moments of $X$ all agree with the first three moments of a $\mathrm n(0,1)$. The fourth moment is not easy to get; one way to do it is to get the mgf of $X$. Since $Ee^{tU} = (e^t-1)/t$,
$$E\left[e^{t\left(\sum_{i=1}^{12}U_i - 6\right)}\right] = e^{-6t}\left(\frac{e^t-1}{t}\right)^{12} = \left(\frac{e^{t/2} - e^{-t/2}}{t}\right)^{12}.$$
Computing the fourth derivative and evaluating it at $t = 0$ gives us $EX^4$. This is a lengthy calculation. The answer is $EX^4 = 29/10$, slightly smaller than $EZ^4 = 3$, where $Z \sim \mathrm n(0,1)$.

5.53 The R code is the following:
a.

    obs <- rbinom(1000, 8, 2/3)
    meanobs <- mean(obs)
    variance <- var(obs)
    hist(obs)

Output: meanobs = 5.231, variance = 1.707346.
b.

    obs <- rhyper(1000, 8, 2, 4)
    meanobs <- mean(obs)
    variance <- var(obs)
    hist(obs)

Output: meanobs = 3.169, variance = 0.4488879.
c.

    obs <- rnbinom(1000, 5, 1/3)
    meanobs <- mean(obs)
    variance <- var(obs)
    hist(obs)

Output: meanobs = 10.308, variance = 29.51665.

5.55 Let $X$ denote the number of comparisons.
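Then
$$EX = \sum_{k=0}^\infty P(X > k) = 1 + \sum_{k=1}^\infty P(U > F_Y(y_{k-1})) = 1 + \sum_{k=1}^\infty\left(1 - F_Y(y_{k-1})\right) = 1 + \sum_{i=0}^\infty\left(1 - F_Y(y_i)\right) = 1 + EY.$$

5.57 a. $\mathrm{Cov}(Y_1,Y_2) = \mathrm{Cov}(X_1+X_3, X_2+X_3) = \mathrm{Cov}(X_3,X_3) = \lambda_3$, since $X_1$, $X_2$ and $X_3$ are independent.
b. $Z_i = 1$ if $X_i = X_3 = 0$, and $Z_i = 0$ otherwise, so
$$p_i = P(Z_i = 1) = P(Y_i = 0) = P(X_i = 0, X_3 = 0) = e^{-(\lambda_i+\lambda_3)}.$$
Therefore the $Z_i$ are Bernoulli$(p_i)$ with $E[Z_i] = p_i$, $\mathrm{Var}(Z_i) = p_i(1-p_i)$, and
$$E[Z_1Z_2] = P(Z_1 = 1, Z_2 = 1) = P(Y_1 = 0, Y_2 = 0) = P(X_1 + X_3 = 0, X_2 + X_3 = 0) = P(X_1 = 0)P(X_2 = 0)P(X_3 = 0) = e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3}.$$
Therefore,
$$\mathrm{Cov}(Z_1,Z_2) = E[Z_1Z_2] - E[Z_1]E[Z_2] = e^{-\lambda_1}e^{-\lambda_2}e^{-\lambda_3} - e^{-(\lambda_1+\lambda_3)}e^{-(\lambda_2+\lambda_3)} = e^{-(\lambda_1+\lambda_3)}e^{-(\lambda_2+\lambda_3)}\left(e^{\lambda_3}-1\right) = p_1p_2\left(e^{\lambda_3}-1\right).$$
Thus $\mathrm{Corr}(Z_1,Z_2) = \frac{p_1p_2\left(e^{\lambda_3}-1\right)}{\sqrt{p_1(1-p_1)}\sqrt{p_2(1-p_2)}}$.
c. $E[Z_1Z_2] \le p_i$, therefore
$$\mathrm{Cov}(Z_1,Z_2) = E[Z_1Z_2] - E[Z_1]E[Z_2] \le p_1 - p_1p_2 = p_1(1-p_2), \quad\text{and}\quad \mathrm{Cov}(Z_1,Z_2) \le p_2(1-p_1).$$
Therefore,
$$\mathrm{Corr}(Z_1,Z_2) \le \frac{p_1(1-p_2)}{\sqrt{p_1(1-p_1)}\sqrt{p_2(1-p_2)}} = \frac{\sqrt{p_1(1-p_2)}}{\sqrt{p_2(1-p_1)}} \quad\text{and}\quad \mathrm{Corr}(Z_1,Z_2) \le \frac{p_2(1-p_1)}{\sqrt{p_1(1-p_1)}\sqrt{p_2(1-p_2)}} = \frac{\sqrt{p_2(1-p_1)}}{\sqrt{p_1(1-p_2)}},$$
which implies the result.

5.59
$$P(Y \le y) = P\left(V \le y \,\Big|\, U < \tfrac1cf_Y(V)\right) = \frac{P\left(V \le y, U < \frac1cf_Y(V)\right)}{P\left(U < \frac1cf_Y(V)\right)} = \frac{\int_0^y\int_0^{\frac1cf_Y(v)}du\,dv}{\frac1c} = \frac{\frac1c\int_0^yf_Y(v)\,dv}{\frac1c} = \int_0^yf_Y(v)\,dv.$$

5.61 a.
$$M = \sup_y\frac{\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}y^{a-1}(1-y)^{b-1}}{\frac{\Gamma([a]+[b])}{\Gamma([a])\Gamma([b])}y^{[a]-1}(1-y)^{[b]-1}} < \infty,$$
since $a - [a] > 0$ and $b - [b] > 0$ and $y \in (0,1)$.

Write
$$P(Z_{i+1} \le a) = P(V_{i+1} \le a \text{ and } U_{i+1} \le \rho_{i+1}) + P(Z_i \le a \text{ and } U_{i+1} > \rho_{i+1}).$$
Since $Z_i \sim f_Y$, suppressing the unnecessary subscripts we can write
$$P(Z_{i+1} \le a) = P(V \le a \text{ and } U \le \rho(V,Y)) + P(Y \le a \text{ and } U > \rho(V,Y)).$$
Add and subtract $P(Y \le a \text{ and } U \le \rho(V,Y))$ to get
$$P(Z_{i+1} \le a) = P(Y \le a) + P(V \le a \text{ and } U \le \rho(V,Y)) - P(Y \le a \text{ and } U \le \rho(V,Y)).$$
Thus we need to show that
$$P(V \le a \text{ and } U \le \rho(V,Y)) = P(Y \le a \text{ and } U \le \rho(V,Y)).$$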
Write out the probability as
$$\begin{aligned} P(V \le a \text{ and } U \le \rho(V,Y)) &= \int_{-\infty}^a\int_{-\infty}^\infty\rho(v,y)f_Y(y)f_V(v)\,dy\,dv\\ &= \int_{-\infty}^a\int_{-\infty}^\infty I(w(v,y) \le 1)\left(\frac{f_Y(v)f_V(y)}{f_V(v)f_Y(y)}\right)f_Y(y)f_V(v)\,dy\,dv + \int_{-\infty}^a\int_{-\infty}^\infty I(w(v,y) \ge 1)f_Y(y)f_V(v)\,dy\,dv\\ &= \int_{-\infty}^a\int_{-\infty}^\infty I(w(v,y) \le 1)f_Y(v)f_V(y)\,dy\,dv + \int_{-\infty}^a\int_{-\infty}^\infty I(w(v,y) \ge 1)f_Y(y)f_V(v)\,dy\,dv.\end{aligned}$$
Now, notice that $w(v,y) = 1/w(y,v)$, and thus the first term above can be written
$$\int_{-\infty}^a\int_{-\infty}^\infty I(w(v,y) \le 1)f_Y(v)f_V(y)\,dy\,dv = \int_{-\infty}^a\int_{-\infty}^\infty I(w(y,v) > 1)f_Y(v)f_V(y)\,dy\,dv = P(Y \le a, \rho(V,Y) = 1, U \le \rho(V,Y)).$$
The second term is
$$\begin{aligned}\int_{-\infty}^a\int_{-\infty}^\infty I(w(v,y) \ge 1)f_Y(y)f_V(v)\,dy\,dv &= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y,v) \le 1)f_Y(y)f_V(v)\,dy\,dv\\ &= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y,v) \le 1)\left(\frac{f_V(y)f_Y(v)}{f_V(y)f_Y(v)}\right)f_Y(y)f_V(v)\,dy\,dv\\ &= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y,v) \le 1)\left(\frac{f_Y(y)f_V(v)}{f_V(y)f_Y(v)}\right)f_V(y)f_Y(v)\,dy\,dv\\ &= \int_{-\infty}^a\int_{-\infty}^\infty I(w(y,v) \le 1)w(y,v)f_V(y)f_Y(v)\,dy\,dv = P(Y \le a, U \le \rho(V,Y), \rho(V,Y) \le 1).\end{aligned}$$
Putting it all together we have
$$P(V \le a \text{ and } U \le \rho(V,Y)) = P(Y \le a, \rho(V,Y) = 1, U \le \rho(V,Y)) + P(Y \le a, U \le \rho(V,Y), \rho(V,Y) \le 1) = P(Y \le a \text{ and } U \le \rho(V,Y)),$$
and hence $P(Z_{i+1} \le a) = P(Y \le a)$, so $f_Y$ is the stationary density.

Chapter 6 Principles of Data Reduction

6.1 By the Factorization Theorem, $|X|$ is sufficient because the pdf of $X$ is
$$f(x|\sigma^2) = \frac{1}{\sqrt{2\pi}\sigma}e^{-x^2/2\sigma^2} = \frac{1}{\sqrt{2\pi}\sigma}e^{-|x|^2/2\sigma^2} = g(|x| \mid \sigma^2)\cdot\underbrace{1}_{h(x)}.$$

6.2 By the Factorization Theorem, $T(X) = \min_i(X_i/i)$ is sufficient because the joint pdf is
$$f(x_1,\ldots,x_n|\theta) = \prod_{i=1}^ne^{i\theta - x_i}I_{(i\theta,+\infty)}(x_i) = \underbrace{e^{\theta\sum_{i=1}^n i}\,I_{(\theta,+\infty)}(T(x))}_{g(T(x)|\theta)}\cdot\underbrace{e^{-\sum_ix_i}}_{h(x)}.$$
Notice, we use the fact that $i > 0$, and the fact that all $x_i > i\theta$ if and only if $\min_i(x_i/i) > \theta$.

6.3 Let $x_{(1)} = \min_ix_i$. Then the joint pdf is
$$f(x_1,\ldots,x_n|\mu,\sigma) = \prod_{i=1}^n\frac1\sigma e^{-(x_i-\mu)/\sigma}I_{(\mu,\infty)}(x_i) = \underbrace{\left(\frac{e^{\mu/\sigma}}{\sigma}\right)^ne^{-\sum_ix_i/\sigma}I_{(\mu,\infty)}(x_{(1)})}_{g(x_{(1)},\sum_ix_i|\mu,\sigma)}\cdot\underbrace{1}_{h(x)}.$$
Thus, by the Factorization Theorem, $\left(X_{(1)}, \sum_iX_i\right)$ is a sufficient statistic for $(\mu,\sigma)$.
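The factorization in Exercise 6.3 says that two samples sharing $x_{(1)}$ and $\sum_ix_i$ have identical likelihoods for every $(\mu,\sigma)$. The Python sketch below checks this numerically; the samples and the helper `joint_pdf` are hypothetical illustrations, not taken from the exercise.

```python
import math


def joint_pdf(xs, mu, sigma):
    """Joint pdf of an iid sample from f(x | mu, sigma) = (1/sigma) exp(-(x - mu)/sigma), x > mu."""
    if min(xs) <= mu:
        return 0.0
    return math.prod(math.exp(-(x - mu) / sigma) / sigma for x in xs)


# Two hypothetical samples with the same minimum (1.0) and the same total (9.0).
x_sample = [1.0, 2.0, 6.0]
y_sample = [1.0, 4.0, 4.0]
# A third sample with the same minimum but a different total (8.0).
w_sample = [1.0, 2.0, 5.0]

for mu in (-1.0, 0.0, 0.9):
    for sigma in (0.5, 1.0, 3.0):
        same_stat = joint_pdf(x_sample, mu, sigma) / joint_pdf(y_sample, mu, sigma)
        diff_stat = joint_pdf(x_sample, mu, sigma) / joint_pdf(w_sample, mu, sigma)
        print(mu, sigma, same_stat, diff_stat)
```

The first ratio stays at 1 for every parameter pair, while the second equals $e^{-1/\sigma}$ and so changes with $\sigma$, which is what the factorization predicts when only $\sum_ix_i$ differs.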
6.4 The joint pdf is
$$\prod_{j=1}^n\left\{h(x_j)c(\theta)\exp\left(\sum_{i=1}^kw_i(\theta)t_i(x_j)\right)\right\} = \underbrace{c(\theta)^n\exp\left(\sum_{i=1}^kw_i(\theta)\sum_{j=1}^nt_i(x_j)\right)}_{g(T(x)|\theta)}\cdot\underbrace{\prod_{j=1}^nh(x_j)}_{h(x)}.$$
By the Factorization Theorem, $\left(\sum_{j=1}^nt_1(X_j),\ldots,\sum_{j=1}^nt_k(X_j)\right)$ is a sufficient statistic for $\theta$.

6.5 The sample density is given by
$$\prod_{i=1}^nf(x_i|\theta) = \prod_{i=1}^n\frac{1}{2i\theta}I\left(-i(\theta-1) \le x_i \le i(\theta+1)\right) = \left(\frac{1}{2\theta}\right)^n\left(\prod_{i=1}^n\frac1i\right)I\left(\min\frac{x_i}{i} \ge -(\theta-1)\right)I\left(\max\frac{x_i}{i} \le \theta+1\right).$$
Thus $(\min X_i/i, \max X_i/i)$ is sufficient for $\theta$.

The last ratio does not depend on $\theta$. The other terms are constant as a function of $\theta$ if and only if $n = n'$ and $x = y$. So $(X, N)$ is minimal sufficient for $\theta$. Because $P(N = n) = p_n$ does not depend on $\theta$, $N$ is ancillary for $\theta$. The point is that although the distribution of $N$ does not depend on $\theta$, the minimal sufficient statistic contains $N$ in this case. A minimal sufficient statistic may contain an ancillary statistic.
b.
$$E\left(\frac XN\right) = E\left(E\left(\frac XN\,\Big|\,N\right)\right) = E\left(\frac1NE(X \mid N)\right) = E\left(\frac1NN\theta\right) = E(\theta) = \theta.$$
$$\mathrm{Var}\left(\frac XN\right) = \mathrm{Var}\left(E\left(\frac XN\,\Big|\,N\right)\right) + E\left(\mathrm{Var}\left(\frac XN\,\Big|\,N\right)\right) = \mathrm{Var}(\theta) + E\left(\frac{1}{N^2}\mathrm{Var}(X \mid N)\right) = 0 + E\left(\frac{N\theta(1-\theta)}{N^2}\right) = \theta(1-\theta)E\left(\frac1N\right).$$
We used the fact that $X \mid N \sim$ binomial$(N,\theta)$.

6.13 Let $Y_1 = \log X_1$ and $Y_2 = \log X_2$. Then $Y_1$ and $Y_2$ are iid and, by Theorem 2.1.5, the pdf of each is
$$f(y|\alpha) = \alpha\exp\left\{\alpha y - e^{\alpha y}\right\} = \frac{1}{1/\alpha}\exp\left\{\frac{y}{1/\alpha} - e^{y/(1/\alpha)}\right\}, \quad -\infty < y < \infty.$$
We see that the family of distributions of $Y_i$ is a scale family with scale parameter $1/\alpha$. Thus, by Theorem 3.5.6, we can write $Y_i = \frac1\alpha Z_i$, where $Z_1$ and $Z_2$ are a random sample from $f(z|1)$. Then
$$\frac{\log X_1}{\log X_2} = \frac{Y_1}{Y_2} = \frac{(1/\alpha)Z_1}{(1/\alpha)Z_2} = \frac{Z_1}{Z_2}.$$
Because the distribution of $Z_1/Z_2$ does not depend on $\alpha$, $(\log X_1)/(\log X_2)$ is an ancillary statistic.

6.14 Because $X_1,\ldots,X_n$ is from a location family, by Theorem 3.5.6, we can write $X_i = Z_i + \mu$, where $Z_1,\ldots,Z_n$ is a random sample from the standard pdf, $f(z)$, and $\mu$ is the location parameter. Let $M(X)$ denote the median calculated from $X_1,\ldots,X_n$. Then $M(X) = M(Z) + \mu$ and $\bar X = \bar Z + \mu$.
Thus, $M(X) - \bar X = (M(Z) + \mu) - (\bar Z + \mu) = M(Z) - \bar Z$. Because $M(X) - \bar X$ is a function of only $Z_1,\ldots,Z_n$, the distribution of $M(X) - \bar X$ does not depend on $\mu$; that is, $M(X) - \bar X$ is an ancillary statistic.

6.15 a. The parameter space consists only of the points $(\theta,\nu)$ on the graph of the function $\nu = a\theta^2$. This parabola is a one-dimensional curve and does not contain a two-dimensional open set.
b. Use the same factorization as in Example 6.2.9 to show $(\bar X, S^2)$ is sufficient. $E(S^2) = a\theta^2$ and
$$E(\bar X^2) = \mathrm{Var}\bar X + (E\bar X)^2 = a\theta^2/n + \theta^2 = (a+n)\theta^2/n.$$
Therefore,
$$E\left(\frac{n}{a+n}\bar X^2 - \frac{S^2}{a}\right) = \left(\frac{n}{a+n}\right)\left(\frac{a+n}{n}\theta^2\right) - \frac1aa\theta^2 = 0, \quad\text{for all } \theta.$$
Thus $g(\bar X, S^2) = \frac{n}{a+n}\bar X^2 - \frac{S^2}{a}$ has zero expectation, so $(\bar X, S^2)$ is not complete.

6.17 The population pmf is $f(x|\theta) = \theta(1-\theta)^{x-1} = \frac{\theta}{1-\theta}e^{\log(1-\theta)x}$, an exponential family with $t(x) = x$. Thus, $\sum_iX_i$ is a complete, sufficient statistic by Theorems 6.2.10 and 6.2.25. $\sum_iX_i - n \sim$ negative binomial$(n,\theta)$.

6.18 The distribution of $Y = \sum_iX_i$ is Poisson$(n\lambda)$. Now
$$Eg(Y) = \sum_{y=0}^\infty g(y)\frac{(n\lambda)^ye^{-n\lambda}}{y!}.$$
If the expectation exists, $e^{n\lambda}Eg(Y) = \sum_{y=0}^\infty\frac{g(y)n^y}{y!}\lambda^y$ is a power series in $\lambda$, which is identically zero only if every coefficient $g(y)n^y/y!$ is zero, that is, only if $g \equiv 0$. Hence $Y$ is complete.

6.19 To check if the family of distributions of $X$ is complete, we check if $E_pg(X) = 0$ for all $p$ implies that $g(X) \equiv 0$. For Distribution 1,
$$E_pg(X) = \sum_{x=0}^2g(x)P(X = x) = pg(0) + 3pg(1) + (1-4p)g(2).$$
Note that if $g(0) = -3g(1)$ and $g(2) = 0$, then the expectation is zero for all $p$, but $g(x)$ need not be identically zero. Hence the family is not complete. For Distribution 2 calculate
$$E_pg(X) = g(0)p + g(1)p^2 + g(2)(1-p-p^2) = [g(1)-g(2)]p^2 + [g(0)-g(2)]p + g(2).$$
This is a polynomial of degree 2 in $p$. To make it zero for all $p$ each coefficient must be zero. Thus, $g(0) = g(1) = g(2) = 0$, so the family of distributions is complete.

6.20 The pdfs in b), c), and e) are exponential families, so they have complete sufficient statistics from Theorem 6.2.25. For a), $Y = \max\{X_i\}$ is sufficient and
$$f(y) = \frac{2n}{\theta^{2n}}y^{2n-1}, \quad 0 < y < \theta.$$
For a function $g(y)$,
$$Eg(Y) = \int_0^\theta g(y)\frac{2n}{\theta^{2n}}y^{2n-1}\,dy = 0 \text{ for all } \theta \quad\text{implies}\quad g(\theta)\frac{2n\theta^{2n-1}}{\theta^{2n}} = 0 \text{ for all } \theta,$$
by taking derivatives. This can only be zero if $g(\theta) = 0$ for all $\theta$, so $Y = \max\{X_i\}$ is complete. For d), the order statistics are minimal sufficient. This is a location family. Thus, by Example 6.2.18 the range $R = X_{(n)} - X_{(1)}$ is ancillary, and its expectation does not depend on $\theta$. So this sufficient statistic is not complete.

6.21 a. $X$ is sufficient because it is the data. To check completeness, calculate
$$Eg(X) = \frac\theta2g(-1) + (1-\theta)g(0) + \frac\theta2g(1).$$
If $g(-1) = -g(1)$ and $g(0) = 0$, then $Eg(X) = 0$ for all $\theta$, but $g(x)$ need not be identically 0. So the family is not complete.
b. $|X|$ is sufficient by Theorem 6.2.6, because $f(x|\theta)$ depends on $x$ only through the value of $|x|$. The distribution of $|X|$ is Bernoulli, because $P(|X| = 0) = 1-\theta$ and $P(|X| = 1) = \theta$. By Example 6.2.22, a binomial family (Bernoulli is a special case) is complete.
c. Yes, $f(x|\theta) = (1-\theta)\left(\theta/(2(1-\theta))\right)^{|x|} = (1-\theta)e^{|x|\log[\theta/(2(1-\theta))]}$, the form of an exponential family.

6.22 a. The sample density is $\prod_i\theta x_i^{\theta-1} = \theta^n\left(\prod_ix_i\right)^{\theta-1}$, so $\prod_iX_i$ is sufficient for $\theta$, not $\sum_iX_i$.
b. Because $\prod_if(x_i|\theta) = \theta^ne^{(\theta-1)\log\left(\prod_ix_i\right)}$, $\log\left(\prod_iX_i\right)$ is complete and sufficient by Theorem 6.2.25. Because $\prod_iX_i$ is a one-to-one function of $\log\left(\prod_iX_i\right)$, $\prod_iX_i$ is also a complete sufficient statistic.

6.23 Use Theorem 6.2.13. The ratio
$$\frac{f(x|\theta)}{f(y|\theta)} = \frac{\theta^{-n}I_{(x_{(n)}/2,\,x_{(1)})}(\theta)}{\theta^{-n}I_{(y_{(n)}/2,\,y_{(1)})}(\theta)}$$
is constant (in fact, one) if and only if $x_{(1)} = y_{(1)}$ and $x_{(n)} = y_{(n)}$. So $(X_{(1)}, X_{(n)})$ is a minimal sufficient statistic for $\theta$. From Exercise 6.10, we know that if a function of the sufficient statistics is ancillary, then the sufficient statistic is not complete. The uniform$(\theta, 2\theta)$ family is a scale family, with standard pdf $f(z) \sim$ uniform$(1,2)$. So if $Z_1,\ldots,Z_n$ is a random sample from a uniform$(1,2)$ population, then $X_1 = \theta Z_1$, . . .
, $X_n = \theta Z_n$ is a random sample from a uniform$(\theta,2\theta)$ population, and $X_{(1)} = \theta Z_{(1)}$ and $X_{(n)} = \theta Z_{(n)}$. So $X_{(1)}/X_{(n)} = Z_{(1)}/Z_{(n)}$, a statistic whose distribution does not depend on $\theta$. Thus, as in Exercise 6.10, $(X_{(1)}, X_{(n)})$ is not complete.

6.24 If $\lambda = 0$, $Eh(X) = h(0)$. If $\lambda = 1$,
$$Eh(X) = e^{-1}h(0) + e^{-1}\sum_{x=1}^\infty\frac{h(x)}{x!}.$$
Let $h(0) = 0$ and $\sum_{x=1}^\infty\frac{h(x)}{x!} = 0$, so $Eh(X) = 0$ but $h(x) \not\equiv 0$. (For example, take $h(0) = 0$, $h(1) = 1$, $h(2) = -2$, $h(x) = 0$ for $x \ge 3$.)

6.25 Using the fact that $(n-1)s_x^2 = \sum_ix_i^2 - n\bar x^2$, for any $(\mu,\sigma^2)$ the ratio in Example 6.2.14 can be written as
$$\frac{f(x|\mu,\sigma^2)}{f(y|\mu,\sigma^2)} = \exp\left[\frac{\mu}{\sigma^2}\left(\sum_ix_i - \sum_iy_i\right) - \frac{1}{2\sigma^2}\left(\sum_ix_i^2 - \sum_iy_i^2\right)\right].$$
a. Do part b) first, showing that $\sum_iX_i^2$ is a minimal sufficient statistic. Because $\left(\sum_iX_i, \sum_iX_i^2\right)$ is not a function of $\sum_iX_i^2$, by Definition 6.2.11 $\left(\sum_iX_i, \sum_iX_i^2\right)$ is not minimal.
b. Substituting $\sigma^2 = \mu$ in the above expression yields
$$\frac{f(x|\mu,\mu)}{f(y|\mu,\mu)} = \exp\left[\sum_ix_i - \sum_iy_i\right]\exp\left[-\frac{1}{2\mu}\left(\sum_ix_i^2 - \sum_iy_i^2\right)\right].$$
This is constant as a function of $\mu$ if and only if $\sum_ix_i^2 = \sum_iy_i^2$. Thus, $\sum_iX_i^2$ is a minimal sufficient statistic.
c. Substituting $\sigma^2 = \mu^2$ in the first expression yields
$$\frac{f(x|\mu,\mu^2)}{f(y|\mu,\mu^2)} = \exp\left[\frac1\mu\left(\sum_ix_i - \sum_iy_i\right) - \frac{1}{2\mu^2}\left(\sum_ix_i^2 - \sum_iy_i^2\right)\right].$$
This is constant as a function of $\mu$ if and only if $\sum_ix_i = \sum_iy_i$ and $\sum_ix_i^2 = \sum_iy_i^2$. Thus, $\left(\sum_iX_i, \sum_iX_i^2\right)$ is a minimal sufficient statistic.
d. The first expression for the ratio is constant as a function of $\mu$ and $\sigma^2$ if and only if $\sum_ix_i = \sum_iy_i$ and $\sum_ix_i^2 = \sum_iy_i^2$. Thus, $\left(\sum_iX_i, \sum_iX_i^2\right)$ is a minimal sufficient statistic.

6.27 a. This pdf can be written as
$$f(x|\mu,\lambda) = \left(\frac{\lambda}{2\pi}\right)^{1/2}\left(\frac{1}{x^3}\right)^{1/2}\exp\left(\frac\lambda\mu\right)\exp\left(-\frac{\lambda}{2\mu^2}x - \frac\lambda2\frac1x\right).$$
This is an exponential family with $t_1(x) = x$ and $t_2(x) = 1/x$. By Theorem 6.2.25, the statistic $\left(\sum_iX_i, \sum_i(1/X_i)\right)$ is a complete sufficient statistic. $(\bar X, T)$ given in the problem is a one-to-one function of $\left(\sum_iX_i, \sum_i(1/X_i)\right)$.
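Thus, $(\bar X, T)$ is also a complete sufficient statistic.
b. This can be accomplished using the methods from Section 4.3 by a straightforward but messy two-variable transformation $U = (X_1+X_2)/2$ and $V = 2\lambda/T = \lambda[(1/X_1) + (1/X_2) - (2/[X_1+X_2])]$. This is a two-to-one transformation.

where $F_{2n,2n}$ is an $F$ random variable with $2n$ degrees of freedom in the numerator and denominator. This follows since $2Y_i/\theta$ and $2X_i\theta$ are all independent exponential$(2)$, that is, $\chi^2_2$, random variables. Differentiating (in $t$) and simplifying gives the density of $T$ as
$$f_T(t) = \frac{\Gamma(2n)}{\Gamma(n)^2}\frac2t\left(\frac{t^2}{t^2+\theta^2}\right)^n\left(\frac{\theta^2}{t^2+\theta^2}\right)^n,$$
and the negative of the second derivative (in $\theta$) of the log density is
$$\frac{2n}{\theta^2} + \frac{4n(t^2-\theta^2)}{(t^2+\theta^2)^2} = \frac{2n}{\theta^2}\left(1 + \frac{2}{t^2/\theta^2+1} - \frac{4}{(t^2/\theta^2+1)^2}\right).$$
Since $T^2/\theta^2 \sim F_{2n,2n}$, the information in $T$ is
$$\frac{2n}{\theta^2}\left[1 + 2E\left(\frac{1}{F_{2n,2n}+1}\right) - 4E\left(\frac{1}{F_{2n,2n}+1}\right)^2\right].$$
Here $1/(F_{2n,2n}+1) \sim$ beta$(n,n)$, so $E\left(\frac{1}{F_{2n,2n}+1}\right) = \frac12$, and
$$E\left(\frac{1}{F_{2n,2n}+1}\right)^2 = \frac{\Gamma(2n)}{\Gamma(n)^2}\int_0^\infty\frac{1}{(1+w)^2}\frac{w^{n-1}}{(1+w)^{2n}}\,dw = \frac{\Gamma(2n)}{\Gamma(n)^2}\frac{\Gamma(n)\Gamma(n+2)}{\Gamma(2n+2)} = \frac{n+1}{2(2n+1)}.$$
Substituting these above gives the information in $T$ as
$$\frac{2n}{\theta^2}\left[2 - \frac{4(n+1)}{2(2n+1)}\right] = \frac{2n}{\theta^2}\cdot\frac{2n}{2n+1} = I(\theta)\frac{2n}{2n+1},$$
where $I(\theta) = 2n/\theta^2$ is the information in the full sample (see part (ii)).
(ii) Let $W = \sum_iX_i$ and $V = \sum_iY_i$. In each pair, $X_i$ and $Y_i$ are independent, so $W$ and $V$ are independent. $X_i \sim$ exponential$(1/\theta)$; hence, $W \sim$ gamma$(n, 1/\theta)$. $Y_i \sim$ exponential$(\theta)$; hence, $V \sim$ gamma$(n,\theta)$. Use this joint distribution of $(W,V)$ to derive the joint pdf of $(T,U)$ as
$$f(t,u|\theta) = \frac{2}{[\Gamma(n)]^2t}u^{2n-1}\exp\left(-\frac{u\theta}{t} - \frac{ut}{\theta}\right), \quad u > 0,\ t > 0.$$
Now, the information in $(T,U)$ is
$$-E\left(\frac{\partial^2}{\partial\theta^2}\log f(T,U|\theta)\right) = -E\left(-\frac{2UT}{\theta^3}\right) = E\left(\frac{2V}{\theta^3}\right) = \frac{2n\theta}{\theta^3} = \frac{2n}{\theta^2}.$$
(iii) The pdf of the sample is $f(x,y) = \exp\left[-\theta\left(\sum_ix_i\right) - \left(\sum_iy_i\right)/\theta\right]$. Hence, $(W,V)$ defined as in part (ii) is sufficient. $(T,U)$ is a one-to-one function of $(W,V)$; hence $(T,U)$ is also sufficient. But $EU^2 = EWV = (n/\theta)(n\theta) = n^2$ does not depend on $\theta$. So $E(U^2 - n^2) = 0$ for all $\theta$, and $(T,U)$ is not complete.

6.39 a. The transformation from Celsius to Fahrenheit is $y = 9x/5 + 32$.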
Hence,
$$\frac{5}{9}\left(T^*(y) - 32\right) = \frac{5}{9}\left((.5)(y) + (.5)(212) - 32\right) = \frac{5}{9}\left((.5)(9x/5 + 32) + (.5)(212) - 32\right) = (.5)x + 50 = T(x).$$
b. $T(x) = (.5)x + 50 \ne (.5)x + 106 = T^*(x)$. Thus, we do not have equivariance.

6.40 a. Because $X_1, \ldots, X_n$ is from a location-scale family, by Theorem 3.5.6, we can write $X_i = \sigma Z_i + \mu$, where $Z_1, \ldots, Z_n$ is a random sample from the standard pdf $f(z)$. Then, using the condition $T_i(ax_1+b, \ldots, ax_n+b) = aT_i(x_1, \ldots, x_n)$ with $a = \sigma$ and $b = \mu$,
$$\frac{T_1(X_1,\ldots,X_n)}{T_2(X_1,\ldots,X_n)} = \frac{T_1(\sigma Z_1+\mu,\ldots,\sigma Z_n+\mu)}{T_2(\sigma Z_1+\mu,\ldots,\sigma Z_n+\mu)} = \frac{\sigma T_1(Z_1,\ldots,Z_n)}{\sigma T_2(Z_1,\ldots,Z_n)} = \frac{T_1(Z_1,\ldots,Z_n)}{T_2(Z_1,\ldots,Z_n)}.$$
Because $T_1/T_2$ is a function of only $Z_1, \ldots, Z_n$, the distribution of $T_1/T_2$ does not depend on $\mu$ or $\sigma$; that is, $T_1/T_2$ is an ancillary statistic.
b. $R(x_1, \ldots, x_n) = x_{(n)} - x_{(1)}$. Because $a > 0$, $\max\{ax_1+b, \ldots, ax_n+b\} = ax_{(n)}+b$ and $\min\{ax_1+b, \ldots, ax_n+b\} = ax_{(1)}+b$. Thus, $R(ax_1+b, \ldots, ax_n+b) = (ax_{(n)}+b) - (ax_{(1)}+b) = a(x_{(n)} - x_{(1)}) = aR(x_1, \ldots, x_n)$. For the sample variance we have
$$S^2(ax_1+b, \ldots, ax_n+b) = \frac{1}{n-1}\sum_i\left((ax_i+b) - (a\bar{x}+b)\right)^2 = a^2\,\frac{1}{n-1}\sum_i (x_i-\bar{x})^2 = a^2S^2(x_1, \ldots, x_n).$$
Thus, $S(ax_1+b, \ldots, ax_n+b) = aS(x_1, \ldots, x_n)$. Therefore, $R$ and $S$ both satisfy the condition used in part a), and $R/S$ is ancillary by a).

6.41 a. Measurement equivariance requires that the estimate of $\mu$ based on $\mathbf{y}$, converted back to the $x$ scale by subtracting $a$, be the same as the estimate of $\mu$ based on $\mathbf{x}$; that is, $T^*(x_1+a, \ldots, x_n+a) - a = T^*(\mathbf{y}) - a = T(\mathbf{x})$.
b. The formal structures for the problem involving $X$ and the problem involving $Y$ are the same. They both concern a random sample of size $n$ from a normal population and estimation of the mean of the population. Thus, formal invariance requires that $T(\mathbf{x}) = T^*(\mathbf{x})$ for all $\mathbf{x}$. Combining this with part (a), the Equivariance Principle requires that $T(x_1+a, \ldots, x_n+a) - a = T^*(x_1+a, \ldots, x_n+a) - a = T(x_1, \ldots, x_n)$, i.e., $T(x_1+a, \ldots, x_n+a) = T(x_1, \ldots, x_n) + a$.
c. $W(x_1+a, \ldots, x_n+a) = \sum_i (x_i+a)/n = \left(\sum_i x_i\right)/n + a = W(x_1, \ldots, x_n) + a$, so $W(\mathbf{x})$ is equivariant. The distribution of $(X_1, \ldots, X_n)$ is the same as the distribution of $(Z_1+\theta, \ldots, Z_n+\theta)$, where $Z_1, \ldots, Z_n$ are a random sample from $f(x-0)$ and $\mathrm{E}Z_i = 0$. Thus, $\mathrm{E}_\theta W = \mathrm{E}\sum_i (Z_i+\theta)/n = \theta$, for all $\theta$.

6.43 a. For a location-scale family, if $X \sim f(x|\theta, \sigma^2)$, then $Y = g_{a,c}(X) \sim f(y|c\theta+a, c^2\sigma^2)$. So for estimating $\sigma^2$, $\bar{g}_{a,c}(\sigma^2) = c^2\sigma^2$. An estimator of $\sigma^2$ is invariant with respect to $\mathcal{G}_1$ if $W(cx_1+a, \ldots, cx_n+a) = c^2W(x_1, \ldots, x_n)$. An estimator of the form $kS^2$ is invariant because
$$kS^2(cx_1+a, \ldots, cx_n+a) = \frac{k}{n-1}\sum_{i=1}^n\left((cx_i+a) - \sum_{j=1}^n (cx_j+a)/n\right)^2 = \frac{k}{n-1}\sum_{i=1}^n\left((cx_i+a) - (c\bar{x}+a)\right)^2 = c^2\,\frac{k}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2 = c^2kS^2(x_1, \ldots, x_n).$$
To show invariance with respect to $\mathcal{G}_2$, use the above argument with $c = 1$. To show invariance with respect to $\mathcal{G}_3$, use the above argument with $a = 0$. ($\mathcal{G}_2$ and $\mathcal{G}_3$ are both subgroups of $\mathcal{G}_1$, so invariance with respect to $\mathcal{G}_1$ implies invariance with respect to $\mathcal{G}_2$ and $\mathcal{G}_3$.)
b. The transformations in $\mathcal{G}_2$ leave the scale parameter unchanged. Thus, $\bar{g}_a(\sigma^2) = \sigma^2$. An estimator of $\sigma^2$ is invariant with respect to this group if
$$W(x_1+a, \ldots, x_n+a) = W(g_a(\mathbf{x})) = \bar{g}_a(W(\mathbf{x})) = W(x_1, \ldots, x_n).$$
An estimator of the given form is invariant if, for all $a$ and $(x_1, \ldots, x_n)$,
$$W(x_1+a, \ldots, x_n+a) = \phi\left(\frac{\bar{x}+a}{s}\right)s^2 = \phi\left(\frac{\bar{x}}{s}\right)s^2 = W(x_1, \ldots, x_n).$$
In particular, for a sample point with $s = 1$ and $\bar{x} = 0$, this implies we must have $\phi(a) = \phi(0)$ for all $a$; that is, $\phi$ must be constant. On the other hand, if $\phi$ is constant, then the estimators are invariant by part a). So we have invariance if and only if $\phi$ is constant. Invariance with respect to $\mathcal{G}_1$ also requires $\phi$ to be constant because $\mathcal{G}_2$ is a subgroup of $\mathcal{G}_1$. Finally, an estimator of $\sigma^2$ is invariant with respect to $\mathcal{G}_3$ if $W(cx_1, \ldots, cx_n) = c^2W(x_1, \ldots, x_n)$. Estimators of the given form are invariant because
$$W(cx_1, \ldots, cx_n) = \phi\left(\frac{c\bar{x}}{cs}\right)c^2s^2 = c^2\phi\left(\frac{\bar{x}}{s}\right)s^2 = c^2W(x_1, \ldots, x_n).$$
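The invariance identities in Exercise 6.43 can be spot-checked numerically. The Python sketch below is only an illustration and not part of the original solution: the data vector, the shift a, the scale c > 0, the constant k, and the non-constant choice phi(t) = 1 + t^2 are all arbitrary assumptions made for the example. It checks that kS^2 satisfies W(cx + a) = c^2 W(x), that a non-constant phi breaks invariance under pure shifts (G2), and that any phi is invariant under pure rescaling (G3).

```python
import math
import statistics


def k_s2(x, k):
    """Estimator k * S^2 of sigma^2 (statistics.variance uses divisor n - 1)."""
    return k * statistics.variance(x)


def phi_estimator(x, phi):
    """Estimator phi(xbar / s) * s^2, the form considered in part b)."""
    xbar = statistics.mean(x)
    s = statistics.stdev(x)
    return phi(xbar / s) * s ** 2


def phi_nonconstant(t):
    """Hypothetical non-constant phi, chosen only for illustration."""
    return 1.0 + t ** 2


x = [1.2, 3.5, 2.2, 4.8, 0.7]  # arbitrary sample
a, c, k = 3.0, 2.5, 0.8        # arbitrary shift, scale (c > 0), and constant

# G1: k S^2 should satisfy W(c x + a) = c^2 W(x).
affine = [c * xi + a for xi in x]
print(math.isclose(k_s2(affine, k), c ** 2 * k_s2(x, k)))  # True

# G2 (pure shifts): a non-constant phi changes the estimate.
shifted = [xi + a for xi in x]
print(math.isclose(phi_estimator(shifted, phi_nonconstant),
                   phi_estimator(x, phi_nonconstant)))  # False

# G3 (pure rescaling, c > 0): any phi gives W(c x) = c^2 W(x).
scaled = [c * xi for xi in x]
print(math.isclose(phi_estimator(scaled, phi_nonconstant),
                   c ** 2 * phi_estimator(x, phi_nonconstant)))  # True
```

Comparisons use `math.isclose` rather than `==` because the rescaled sample variance agrees with $c^2 S^2$ only up to floating-point rounding.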