Confidence Intervals and Hypothesis testing

Hypothesis Testing

t-Distribution

When the data can be modeled as a Gaussian distribution with parameters $\mu$ and $\sigma^2$, the sample mean of $n$ independent observations also follows a Gaussian distribution with the same mean $\mu$, but with a smaller standard deviation, $\sigma/\sqrt{n}$.

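If we don’t know the population standard deviation $\sigma$, we estimate it with the sample standard deviation $s$ and use the T-statistic $T={\bar X-\mu\over s/\sqrt{n}}$ instead of the Z-statistic.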

However, the T-statistic doesn’t follow the standard normal distribution; it follows the t-distribution.

The t-distribution has a single parameter, the degrees of freedom ($\nu$).

As the degrees of freedom increase, the t-distribution becomes more like the standard Gaussian distribution. By around $\nu=30$ the two are nearly indistinguishable, which is why we would like to have at least 30 samples to perform a test.
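To see this convergence numerically, here is a minimal sketch using only the Python standard library. The helper names `t_pdf` and `max_density_gap`, and the grid over $[-5, 5]$, are illustrative choices and not part of the course material.

```python
import math
from statistics import NormalDist


def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_coef = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_coef - (nu + 1) / 2 * math.log1p(x * x / nu))


def max_density_gap(nu, lo=-5.0, hi=5.0, steps=1000):
    """Largest absolute gap between the t density and the standard normal
    density on an evenly spaced grid (the grid is an arbitrary choice)."""
    std_normal = NormalDist()
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(t_pdf(x, nu) - std_normal.pdf(x)) for x in xs)


# The gap shrinks as the degrees of freedom grow.
for nu in (1, 5, 30, 100):
    print(f"nu = {nu:>3}: max density gap = {max_density_gap(nu):.4f}")
```

The printed gaps should decrease with $\nu$, which is the behaviour behind the "at least 30 samples" rule of thumb.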

t-Tests

The result differs from the previous test due to the uncertainty of not knowing the population variance.

Test for proportions

In the videos, you learned how to perform hypothesis testing for the mean of a Gaussian population. Another very useful example is testing for a population proportion $p$.

An example

Imagine you have a coin, but you don't know whether it's fair. The proportion you are interested in is $p=\bold{P}(H)$. A possible set of hypotheses for this problem is

$$ H_0: p=0.5 \text{ vs. }H_1:p\ne0.5 $$

Imagine you toss the coin 20 times and 7 of the tosses come up heads. Your random sample consists of one random variable $X$ = "number of heads in 20 coin flips", which has a $\text{Binomial}(20, p)$ distribution. A good estimator for the proportion is the relative frequency of heads:

$$ \hat p={X\over20} $$

Remember that under certain conditions, the Central Limit Theorem states that approximately $\hat p \sim N\left(p, {p(1-p)\over20}\right)$ (mean $p$, variance ${p(1-p)\over20}$), or equivalently

$$ Z={{X\over20}-p\over\sqrt{p(1-p)}}\sqrt{20}\sim N(0, 1) $$

$Z$ will be your test statistic. If $H_0$ is true ($p=0.5$), then your test statistic becomes

$$ Z={{X\over20}-0.5\over\sqrt{0.5(1-0.5)}}\sqrt{20}={{X\over20}-0.5\over0.5}\sqrt{20}\sim N(0, 1) $$

Consider a significance level $\alpha=0.05$. Then to make a decision you need to get the p-value for your observed statistic. With the observed sample $x=7$, the observed statistic is

$$ z={{7\over20}-0.5\over0.5}\sqrt{20}=-1.3416 $$

The p-value is then the probability that $Z>|z|$ or $Z<-|z|$:

$$ p\text{-value}=\bold{P}(|Z|>|z|)=\bold{P}(|Z|>1.3416)=0.1797 $$

Conclusion: Since the p-value is bigger than the significance level of 0.05, you do not have enough evidence to reject the null hypothesis that $p=0.5$.
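The coin example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `one_proportion_z` is an illustrative choice, not something defined in the course.

```python
import math
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Observed z statistic for H0: p = p0, given x successes in n trials."""
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0)) * math.sqrt(n)


# Coin example from the text: 7 heads in 20 tosses, H0: p = 0.5.
z = one_proportion_z(x=7, n=20, p0=0.5)

# Two-tailed p-value: P(|Z| > |z|) under the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
print(f"z = {z:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Up to rounding, this should reproduce the observed statistic and p-value computed above, and it leads to the same decision of not rejecting $H_0$.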

General case:

  • $p$ is the population proportion of individuals in a particular category (e.g. the probability of the coin landing heads)
  • $p_0$ is the population proportion under the null hypothesis (e.g. $p_0=0.5$)
  • $x$ is the observed number of individuals in the sample from the specified category (e.g. the number of heads)
  • $n$ is the sample size (e.g. the number of coin tosses)
  • $\hat p={x\over n}$ is the sample proportion for the observed sample $x$.

Then, $Z={{X\over n}-p_0\over\sqrt{p_0(1-p_0)}}\sqrt{n}\sim N(0, 1)$ is the test statistic for comparing proportions, and $z={{x\over n}-p_0\over\sqrt{p_0(1-p_0)}}\sqrt{n}$ is the observed statistic.

Depending on the type of hypothesis, you have different expressions for the p-value:

  • Right-tailed test: $H_0:p=p_0 \text{ vs. }H_1:p>p_0:\\p\text{-value}=\bold P(Z>z)$

  • Left-tailed test: $H_0:p=p_0 \text{ vs. }H_1:p<p_0:\\p\text{-value}=\bold P(Z<z)$

  • Two-tailed test: $H_0:p=p_0 \text{ vs. }H_1:p\ne p_0:\\p\text{-value}=\bold P(|Z|>|z|)$
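The three p-value expressions differ only in which tail of the standard normal distribution is used. A minimal sketch, assuming an illustrative helper `z_test_p_value` and the labels `"greater"`, `"less"` and `"two-sided"` (these names are not from the course):

```python
from statistics import NormalDist


def z_test_p_value(z, alternative="two-sided"):
    """p-value of an observed z statistic for the given alternative."""
    std_normal = NormalDist()
    if alternative == "greater":    # right-tailed: P(Z > z)
        return 1 - std_normal.cdf(z)
    if alternative == "less":       # left-tailed: P(Z < z)
        return std_normal.cdf(z)
    if alternative == "two-sided":  # two-tailed: P(|Z| > |z|)
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


# Observed statistic from the coin example (7 heads in 20 tosses).
z_observed = -1.3416
for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: p-value = {z_test_p_value(z_observed, alt):.4f}")
```

Note that the right-tailed and left-tailed p-values always add up to 1, so choosing the alternative before looking at the data matters.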

For these results to be valid, the following conditions need to be satisfied:

  • The population size must be at least 20 times bigger than the sample size. This is necessary to ensure that all samples are independent. This condition is unnecessary in a situation like the coin toss, where independence is inherent to the experiment.
  • The individuals in the population can be divided into two categories: those that belong to the specified category and those that don't.
  • The values $np_0>10$ and $n(1-p_0)>10$. This condition must be verified so that the Gaussian approximation holds under the assumption that $H_0$ is true.

Two Sample t-Test

A two-sample test for proportions

You learned how to test if the means from two populations were different from one another. Here, you will see a similar test for comparing proportions from two different populations. A common application of a two-sample test for proportions is in A/B testing.

An example

Imagine you want to compare the proportion of households that own a car in Chicago ($p_1$) with the proportion of households that do in New York ($p_2$). Note that $p_1$ and $p_2$ are population proportions.

A possible set of hypotheses for this problem is

$$ H_0:p_1=p_2 \text{ vs. }H_1:p_1\ne p_2 $$

Consider for this test a significance level of 0.05.

Suppose you randomly sample $n_1=100$ households from Chicago, 62 of which own a car, and $n_2=120$ households from New York, 58 of which own a car.

Defining $X$ = “number of households that own a car in Chicago” and $Y$ = “number of households that own a car in New York”, good approximations for $p_1$ and $p_2$ are

$$ \hat p_1={X\over100}\quad \hat p_2={Y\over 120} $$

Naturally, a good approximation for $\Delta=p_1-p_2$ is

$$ \hat\Delta=\hat p_1-\hat p_2 $$

To get a good test statistic, you must find the distribution of $\hat\Delta$. First, note that $n_1$ and $n_2$ are big enough for the Central Limit Theorem to apply, so that approximately:

$$ \hat p_1={X\over100}\sim N\left(p_1, {p_1(1-p_1)\over100}\right) \quad \hat p_2={Y\over120}\sim N\left(p_2, {p_2(1-p_2)\over120}\right) $$

so that $\hat\Delta=\hat p_1-\hat p_2 \sim N\left(p_1 - p_2, {p_1(1-p_1)\over100}+{p_2(1-p_2)\over120}\right)$

If $H_0$ is true, then $p_1=p_2=p$. This simplifies the expression quite a bit! In this case,

$$ \hat\Delta=\hat p_1-\hat p_2 \sim N\left(0, {p(1-p)\over100}+{p(1-p)\over120}\right) = N\left(0, p(1-p)\left({1\over100}+{1\over120}\right)\right) $$

Standardizing, you get that

$$ {{X\over100}-{Y\over120}-0\over\sqrt{p(1-p)({1\over100}+{1\over120})}} \sim N(0, 1) $$

Unfortunately, this statistic is not good enough, because even if $H_0$ is true, you do not know the value of $p$.

However, you can replace it with the aggregated sample proportion. If $H_0$ is true, and $p_1=p_2=p$, then $X\sim Binomial(100,p)$ and $Y\sim Binomial(120,p)$, so you can use both samples to get a better estimate of $p$:

$$ \hat p = {X+Y\over100+120}={X+Y\over220} $$

Replacing $p$ with $\hat p$ you finally get the test statistic

$$ Z={{X\over100}-{Y\over120}-0\over\sqrt{{(X+Y)\over220}(1-{X+Y\over220})({1\over100}+{1\over120})}} \sim N(0, 1) $$

To simplify the expression even further, observe that $({1\over100}+{1\over120})=({100+120\over100\cdot120})={220\over100\cdot120}$. The factor 220 cancels the denominator of the pooled proportion ${X+Y\over220}$, so that you can rewrite the test statistic as

$$ Z={{X\over100}-{Y\over120}-0\over\sqrt{(X+Y)(1-{X+Y\over220})}}\sqrt{100\cdot120} \sim N(0, 1) $$

With the observations you have ($x=62, y=58$), the observed statistic is $z=2.0271$. Since you are proposing a two-sided test, the p-value is the probability that $Z>2.0271$ or $Z<-2.0271$:

$$ p\text{-value}=\bold{P}(|Z|>2.0271)=0.04265 $$

Conclusion: Since the p-value is smaller than the significance level (0.05), you have enough evidence to reject the null hypothesis and conclude that the two population proportions are different.
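The household example can be reproduced with a short standard-library sketch. The function name `two_proportion_z` is an illustrative choice, not something defined in the course.

```python
import math
from statistics import NormalDist


def two_proportion_z(x, n1, y, n2):
    """Observed z statistic for H0: p1 = p2, using the pooled proportion."""
    p_pooled = (x + y) / (n1 + n2)
    std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (x / n1 - y / n2) / std_err


# Chicago: 62 of 100 households own a car. New York: 58 of 120.
z = two_proportion_z(x=62, n1=100, y=58, n2=120)

# Two-tailed p-value: P(|Z| > |z|) under the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
print(f"z = {z:.4f}, p-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Up to rounding, this should match the observed statistic and p-value above, leading to the same decision to reject $H_0$ at the 0.05 level.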

General Case

You have two populations or groups you want to compare.

  • $p_1-p_2$ is the difference in the population proportion between two groups (e.g. the difference in the proportion of households that own a car in Chicago and New York)
  • $x$ is the observed number of individuals in the sample from the specified category in group 1 (e.g. the number of households that own a car in Chicago)
  • $y$ is the observed number of individuals in the sample from the specified category in group 2 (e.g. the number of households that own a car in New York)
  • $n_1$ is the sample size for group 1 (e.g. the sample size from Chicago)
  • $n_2$ is the sample size for group 2 (e.g. the sample size from New York)

Then the test statistic is

$$ Z={{X\over n_1}-{Y\over n_2}-0\over\sqrt{{(X+Y)}(1-{X+Y\over n_1+n_2})}}\sqrt{n_1\cdot n_2} \sim N(0, 1) $$

and the observed statistic is

$$ z={{x\over n_1}-{y\over n_2}-0\over\sqrt{{(x+y)}(1-{x+y\over n_1+n_2})}}\sqrt{n_1\cdot n_2} $$

Depending on the type of hypothesis, you have different expressions for the p-value:

  • Right-tailed test: $H_0:p_1-p_2=0 \text{ vs. }H_1:p_1-p_2 > 0:\\ p\text{-value}=\bold P(Z>z)$

  • Left-tailed test: $H_0:p_1-p_2=0 \text{ vs. }H_1:p_1-p_2 < 0:\\ p\text{-value}=\bold P(Z<z)$

  • Two-tailed test: $H_0:p_1-p_2=0 \text{ vs. }H_1:p_1-p_2 \ne 0:\\ p\text{-value}=\bold P(|Z|>|z|)$
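As a sanity check on the general formula, the sketch below compares the simplified statistic with the pooled-standard-error form it is derived from, and then evaluates the three p-value expressions. The function names are illustrative and not from the course; only the Python standard library is used.

```python
import math
from statistics import NormalDist


def z_pooled(x, n1, y, n2):
    """Statistic written with the pooled proportion and (1/n1 + 1/n2)."""
    p_hat = (x + y) / (n1 + n2)
    return (x / n1 - y / n2) / math.sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))


def z_simplified(x, n1, y, n2):
    """Statistic in the simplified form given in the general case."""
    denom = math.sqrt((x + y) * (1 - (x + y) / (n1 + n2)))
    return (x / n1 - y / n2) / denom * math.sqrt(n1 * n2)


x, n1, y, n2 = 62, 100, 58, 120  # household example from the text
z = z_simplified(x, n1, y, n2)
print("forms agree:", math.isclose(z, z_pooled(x, n1, y, n2)))

std_normal = NormalDist()
p_right = 1 - std_normal.cdf(z)            # H1: p1 - p2 > 0
p_left = std_normal.cdf(z)                 # H1: p1 - p2 < 0
p_two = 2 * (1 - std_normal.cdf(abs(z)))   # H1: p1 - p2 != 0
print(f"right: {p_right:.4f}, left: {p_left:.4f}, two-tailed: {p_two:.4f}")
```

The two forms are algebraically identical, so any difference between them would only be floating-point noise.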

For these results to be valid, the following conditions need to be satisfied:

  • Two simple random samples are independent from one another. This means that you have one sample from population 1 and another from population 2 and that the samples are independent between both groups.
  • Each population size must be at least 20 times bigger than the sample size. This is necessary to ensure that all samples are independent.
  • The individuals in each sample can be divided into two categories: whether they belong to the specified category or they don't
  • Both sample sizes need to be at least 10. This condition needs to be verified so that the Gaussian approximation holds under the assumption that $H_0$ is true.

Paired t-Test

The top is a two-sample t-test and the bottom is a paired t-test

The paired t-test is a test of two dependent groups (such as the weights of the same people before vs. after a training program).
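A paired t-test reduces to a one-sample t-test on the per-subject differences. The sketch below computes the statistic with the standard library only; the weights are made-up numbers for illustration, and `paired_t_statistic` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t statistic and degrees of freedom for H0: mean difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical weights (kg) of the same six people, before and after.
before = [82.0, 75.5, 90.2, 68.4, 77.1, 85.3]
after = [80.1, 74.9, 87.5, 68.0, 75.2, 83.0]

t_stat, dof = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

The p-value comes from the t-distribution with $n-1$ degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_rel`) can compute it.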

A researcher wants to investigate the effectiveness of a new teaching method for improving students' math skills. The researcher selects 20 students from a school and divides them into two groups: Group A and Group B. Each student undergoes an initial assessment of their math skills. Then, Group A receives traditional teaching methods for four weeks, while Group B receives the new teaching method for the same duration. After the four-week intervention, both Group A and Group B undergo a post-assessment of their math skills using the same standardized test. Based on the scenario described, is this an example of a two-sample t-test or a paired t-test?

Paired t-test

Great job! The researcher is interested in comparing the individual changes in math skills within each group. Therefore, the appropriate statistical test to use is a paired t-test.

ML Application: A/B Testing

A/B testing is a way to compare two different strategies or variations to see which one is better.

A common rule in A/B testing is to have fewer people try option B since we don’t know if it will be effective.

For our example, we update the location of designs A and B on the website to see if it improves the purchase amounts.

Then we perform the necessary test appropriate for the hypothesis.

In this example, we perform a two-sample t-test.

We used a t-test because we were comparing the means of Gaussian populations; otherwise, we’d use another statistical test to make the decision.

Because the samples are large, the Central Limit Theorem applies, so the sample means approximately follow a Gaussian distribution.

All the information provided is based on the Probability & Statistics for Machine Learning & Data Science | Coursera from DeepLearning.AI
