# When are two random samples dependent?

2. When are two random samples dependent?

Even when each sample is drawn at random, two samples can still be dependent. Two random samples are dependent if each data value in one sample can be paired naturally with a corresponding data value in the other sample. This typically happens when both samples consist of the same subjects measured twice, or of deliberately matched pairs. A common example is blood pressure measured on the same patients before and after taking a medicine.

4. How profitable are different sectors of the stock market? One way to answer such a question is to examine profit as a percentage of stockholder equity. A random sample of 32 retail stocks such as Toys “R” Us, Best Buy, and Gap was studied for $x_1$, profit as a percentage of stockholder equity. The result was $\bar{x}_1 = 13.7$. A random sample of 34 utility (gas and electric) stocks such as Boston Edison, Wisconsin Energy, and Texas Utilities was studied for $x_2$, profit as a percentage of stockholder equity. The result was $\bar{x}_2 = 10.1$. Assume $\sigma_1 = 4.1$ and $\sigma_2 = 2.7$.

(a) Let $\mu_1$ represent the population mean profit as a percentage of stockholder equity for retail stocks, and let $\mu_2$ represent the population mean profit as a percentage of stockholder equity for utility stocks. Find a 95% confidence interval for $\mu_1 - \mu_2$.

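Both population standard deviations are known ($\sigma_1 = 4.1$, $\sigma_2 = 2.7$) and both samples are large, so the interval is based on the standard normal distribution, with each population variance entering the standard error separately (no pooling). The 95% confidence interval is,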

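$$(\bar{x}_1 - \bar{x}_2) \pm z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Since $z_{0.025} = 1.96$, the 95% confidence interval is,

$$(13.7 - 10.1) \pm 1.96\sqrt{\frac{4.1^2}{32} + \frac{2.7^2}{34}} = 3.6 \pm 1.686 = (1.914, 5.286)$$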
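The interval above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `z_interval_diff_means` is a hypothetical helper introduced for illustration, not something from the problem statement.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(x1, x2, sigma1, sigma2, n1, n2, conf=0.95):
    """Confidence interval for mu1 - mu2 when both population
    standard deviations are known (hypothetical helper)."""
    # Two-sided critical value, e.g. about 1.96 for conf = 0.95
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Standard error of x1 - x2 with separate (unpooled) variances
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x1 - x2
    return diff - z_crit * se, diff + z_crit * se


# Retail (sample 1) versus utility (sample 2) stocks from the problem
low, high = z_interval_diff_means(13.7, 10.1, 4.1, 2.7, 32, 34)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

The sketch uses the exact normal quantile rather than the rounded table value 1.96, so the endpoints may differ from a hand calculation in the fourth decimal place.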

(b) Examine the confidence interval and explain what it means in the context of the problem? Does the interval consist of numbers that are all positive? All negative? Of different signs? At the 95% level of confidence, does it appear that the profit as a percentage of stockholder equity for retail stocks is higher than that for utility stocks?

Here the confidence interval is (1.914, 5.286). The whole interval lies to the right of 0, so it contains only positive numbers. At the 95% confidence level, it therefore appears that the mean profit as a percentage of stockholder equity is higher for retail stocks than for utility stocks.

So based on the interval, we can be 95% confident that the difference $\mu_1 - \mu_2$ in population mean profit as a percentage of stockholder equity lies between 1.914 and 5.286 percentage points.

(c) Test the claim that the profit as a percentage of stockholder equity for retail stocks is higher than that for utility stocks. Use $\alpha = 0.01$.

(i) What is the level of significance? State the null and alternate hypothesis.

Since $\alpha = 0.01$, the level of significance is 0.01, or 1%.

The null and alternative hypotheses in this case are,

$H_0: \mu_1 \le \mu_2$ against $H_1: \mu_1 > \mu_2$.

(ii) What sampling distribution will you use? What assumptions are you making? What is the value of the sample test statistic?

Because the population standard deviations are known, I will use a z test rather than a t test, so the sampling distribution is the standard normal. The assumptions are that the two samples are independent random samples and that $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed, which is reasonable here because both sample sizes exceed 30.

The value of the sample test statistic is,

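$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{13.7 - 10.1}{\sqrt{\dfrac{4.1^2}{32} + \dfrac{2.7^2}{34}}} = 4.186$$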

(iii) Find (or estimate) the P-value. Sketch the sampling distribution and show the area corresponding to the P-value.

The alternative hypothesis is one-sided, so this is a right-tailed test, and the P-value is,

$P = P(Z > 4.186) \approx 0.0000142$.

In the sketch of the sampling distribution, the P-value is the area under the standard normal curve to the right of $z = 4.186$.
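The test statistic and the right-tail area can be reproduced with a short sketch, again using only the standard library. The helper name `z_test_diff_means` is hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_diff_means(x1, x2, sigma1, sigma2, n1, n2):
    """Right-tailed z test of H0: mu1 <= mu2 against H1: mu1 > mu2
    with known population standard deviations (hypothetical helper)."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (x1 - x2) / se
    # By symmetry, P(Z > z) equals the lower-tail area at -z
    p_value = NormalDist().cdf(-z)
    return z, p_value


z_stat, p_value = z_test_diff_means(13.7, 10.1, 4.1, 2.7, 32, 34)
alpha = 0.01
print(f"z = {z_stat:.3f}, P-value = {p_value:.7f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The P-value is far below $\alpha = 0.01$, so the sketch agrees with rejecting the null hypothesis at the 1% level.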

6. A random sample of 17 wolf litters in Ontario, Canada, gave an average of $\bar{x}_1 = 4.9$ wolf pups per litter with estimated sample standard deviation $s_1 = 1.0$. Another random sample of 6 wolf litters in Finland gave an average of $\bar{x}_2 = 2.8$ wolf pups per litter with sample standard deviation $s_2 = 1.2$.

(a) Find an 85% confidence interval for $\mu_1 - \mu_2$, the difference in population mean litter size between Ontario and Finland.

Note that here,

$1 - \alpha = 0.85$, so $\alpha = 0.15$ and $\alpha/2 = 0.075$.

The two sample standard deviations are close ($s_1 = 1.0$, $s_2 = 1.2$), so we assume the population variances are equal and use the pooled estimate of variance,

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{16(1.0)^2 + 5(1.2)^2}{21} = 1.1048$$

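Because the population standard deviations are unknown and the samples are small (fewer than 30 litters each), we use a t critical value,

$$(\bar{x}_1 - \bar{x}_2) \pm t_{0.075,\,df}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

The degrees of freedom are $df = 17 + 6 - 2 = 21$, so $t_{0.075,\,21} = 1.494$.

Hence,

$$(4.9 - 2.8) \pm 1.494\sqrt{1.1048\left(\frac{1}{17} + \frac{1}{6}\right)} = 2.1 \pm 0.7457$$

$$= (1.3543, 2.8457)$$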
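The pooled variance and the interval can be verified with a small sketch. The standard library has no t quantile function, so the critical value 1.494 is taken from the text as a constant; the helper names `pooled_variance` and `pooled_t_interval` are hypothetical.

```python
from math import sqrt


def pooled_variance(s1, s2, n1, n2):
    """Pooled estimate of a common variance from two samples."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t_interval(x1, x2, s1, s2, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 assuming equal population
    variances; t_crit must be supplied by the caller."""
    sp2 = pooled_variance(s1, s2, n1, n2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = x1 - x2
    return diff - t_crit * se, diff + t_crit * se


# Table value for 85% confidence with 21 degrees of freedom (from the text)
T_CRIT_85_DF21 = 1.494

sp2 = pooled_variance(1.0, 1.2, 17, 6)
low, high = pooled_t_interval(4.9, 2.8, 1.0, 1.2, 17, 6, T_CRIT_85_DF21)
print(f"pooled variance = {sp2:.4f}")
print(f"85% CI for mu1 - mu2: ({low:.4f}, {high:.4f})")
```

Passing the critical value in as an argument keeps the sketch dependency-free; with a statistics package available, the quantile could be computed instead of read from a table.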

(b) Examine the confidence interval and explain what it means in the context of this problem. Does the interval consist of numbers that are all positive? All negative? Of different signs? At the 85% level of confidence, does it appear that the average litter size of wolf pups in Ontario is greater than the average litter size in Finland?

The confidence interval (1.3543, 2.8457) does not contain 0 and lies entirely to the right of it, so it contains only positive values. At the 85% confidence level, it therefore appears that the average litter size of wolf pups in Ontario is greater than the average litter size in Finland.

(c) Test the claim that the average litter size of wolf pups in Ontario is greater than the average litter size of wolf pups in Finland. Use $\alpha = 0.01$.

(i) What is the level of significance? State the null and alternate hypothesis.

Since $\alpha = 0.01$, the level of significance is 0.01, and the hypotheses of interest are,

$H_0: \mu_1 \le \mu_2$ against $H_a: \mu_1 > \mu_2$.

(ii) What sampling distribution will you use? What assumptions are you making? What is the value of the sample test statistic?

The population standard deviations are unknown and the sample sizes are small (both below 30), so the central limit theorem cannot be relied on and I will use a t test with a pooled variance. The test statistic follows a Student's t distribution with $17 + 6 - 2 = 21$ degrees of freedom. I am assuming that each sample is randomly drawn, that the two samples are independent of each other, that litter sizes in both populations are approximately normally distributed, and that the two populations have equal variances.

The value of the sample test statistic is,

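$$T = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{4.9 - 2.8}{\sqrt{1.1048\left(\dfrac{1}{17} + \dfrac{1}{6}\right)}} = 4.2074$$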

(iii) Find (or estimate) the P-value. Sketch the sampling distribution and show the area corresponding to the P-value.

The P-value is $P(t_{21} > 4.2074) = 0.0001979$. In the sketch of the sampling distribution, it is the area under the t curve with 21 degrees of freedom to the right of $t = 4.2074$.
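The test statistic and P-value can be approximated without a statistics package. The sketch below integrates the Student's t density numerically with Simpson's rule, which is a substitute for a table or library lookup and not the method used in the text; the helper names `t_pdf` and `t_right_tail` are hypothetical.

```python
from math import sqrt, gamma, pi


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_right_tail(t_obs, df, steps=2000):
    """P(T > t_obs) for t_obs >= 0, using Simpson's rule on [0, t_obs].
    steps must be even."""
    h = t_obs / steps
    total = t_pdf(0.0, df) + t_pdf(t_obs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The density is symmetric, so the area to the right of 0 is 0.5
    return 0.5 - total * h / 3


n1, n2 = 17, 6
sp2 = ((n1 - 1) * 1.0**2 + (n2 - 1) * 1.2**2) / (n1 + n2 - 2)
t_stat = (4.9 - 2.8) / sqrt(sp2 * (1 / n1 + 1 / n2))
p_value = t_right_tail(t_stat, n1 + n2 - 2)
print(f"t = {t_stat:.4f}, P-value = {p_value:.7f}")
```

Since the P-value is well below $\alpha = 0.01$, the sketch is consistent with rejecting $H_0$ at the 1% level.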

8. Locander and others also studied the accuracy of responses on questions involving more sensitive material than voter registration. From public records, individuals were identified as having been charged with drunken driving not less than 6 months or more than 12 months from the starting date of the study. Two random samples from this group were studied. In the first sample of 30 individuals the respondents were asked in a face-to-face interview if they had been charged with drunken driving in the last 12 months. Of these 30 people interviewed face-to-face, 16 answered the question accurately. The second random sample consisted of 46 people who had been charged with drunken driving. During a phone interview, 25 of these responded accurately to the question asking if they had been charged with drunken driving during the past 12 months. Assume that the samples are representative of all people recently charged with drunken driving.

(a) Let $p_1$ represent the population proportion of all people with recent charges of drunken driving who respond accurately to a face-to-face interview asking if they have been charged with drunken driving during the past 12 months. Let $p_2$ represent the population proportion of people who respond accurately to the same question when asked in a telephone interview. Find a 90% confidence interval for $p_1 - p_2$.

(b) Does the interval found in part (a) contain all numbers that are positive? All negative? Mixed? Comment on the meaning of the confidence interval in the context of this problem. At the 90% confidence level, do you detect any differences in the proportion of accurate responses to the question from face-to-face interviews as compared with the proportion of accurate responses from the telephone interviews?

(c) Test the claim that there is a difference in the proportion of accurate responses from face-to-face interviews compared with the proportion of accurate responses from telephone interviews. Use $\alpha = 0.05$.

(i) What is the level of significance? State the null and alternate hypothesis.

(ii) What sampling distribution will you use? What assumptions are you making? What is the value of the sample test statistic?

(iii) Find (or estimate) the P-value. Sketch the sampling distribution and show the area corresponding to the P-value.