Hypothesis Testing

Statistical inference is the process of learning about characteristics of a population based on what is observed in a relatively small sample from that population. A sample will never give us the entire picture though, and we are bound to make incorrect decisions from time to time.

We will learn how to derive and interpret appropriate tests to manage this error and how to evaluate when one test is better than another. We will also learn how to construct and perform principled hypothesis tests for a wide range of problems and applications.

What is Hypothesis Testing

Hypothesis testing is a formal statistical procedure in which an analyst tests an assumption about a population parameter. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

Note

Because the sample is random, our hypothesis test can reach the wrong conclusion in two different ways. These errors are called Type I and Type II errors.

Type of hypothesis testing

Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$.

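import torch
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style="darkgrid")

# Draw 1000 observations from the standard normal distribution N(0, 1)
sample = torch.normal(mean=0.0, std=1.0, size=(1000,))

# Histogram with a kernel density estimate, scaled to a density
sns.displot(sample.numpy(), kde=True, stat="density")

# Mark the sample mean with a vertical red line
plt.axvline(sample.mean().item(), color="red", label="mean")
plt.legend()
plt.show()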

[Figure: histogram and density estimate of the 1000 simulated N(0, 1) observations, with the sample mean marked by a red vertical line.]

Example of random sample after it is observed:

\[ 2.73, 1.14, 3.98, 2.15, 5.85, 1.97, 2.54, 2.03 \]
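As a quick check, the sample mean of these eight observations can be computed directly. This is a minimal sketch that reads the fifth observation as 5.85, which is the reading that reproduces a sample mean of 2.799; the variable names are chosen here for illustration.

```python
from statistics import mean

# Observed sample of size 8 (fifth value read as 5.85)
sample = [2.73, 1.14, 3.98, 2.15, 5.85, 1.97, 2.54, 2.03]

# Sample mean: sum of the observations divided by the sample size
x_bar = mean(sample)
print(f"n = {len(sample)}, sample mean = {x_bar:.3f}")
```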

Based on what you are seeing, do you believe that the true population mean $\mu$ satisfies

\[ \mu \leq 3 \text{ or } \mu>3 \text{ ?} \]

The sample mean is $\overline{\mathrm{x}}=2.799$. This is below 3, but can we say that $\mu<3$?

This seems awfully dependent on the random sample we happened to get! Let’s try to work with the most generic random sample of size 8:

\[ X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8 \]

Let $\mathrm{X}_1, \mathrm{X}_2, \ldots, \mathrm{X}_{\mathrm{n}}$ be a random sample of size $\mathrm{n}$ from the $\mathrm{N}\left(\mu, \sigma^2\right)$ distribution.

\[ \mathrm{X}_1, \mathrm{X}_2, \ldots, \mathrm{X}_{\mathrm{n}} \stackrel{\text { iid }}{\sim} \mathrm{N}\left(\mu, \sigma^2\right) \]

The sample mean is

\[ \bar{X}=\frac{1}{n} \sum_{i=1}^n X_i \]

We’re going to tend to think that $\mu<3$ when $\bar{X}$ is “significantly” smaller than 3.
We’re going to tend to think that $\mu>3$ when $\bar{X}$ is “significantly” larger than 3.
We’re never going to observe $\bar{X}=3$, but we may be able to be convinced that $\mu=3$ if $\bar{X}$ is not too far away.

How do we formalize this? We use hypothesis testing.

Notation

$\mathrm{H}_0: \mu \leq 3 \quad$ (null hypothesis)
$\mathrm{H}_1: \mu>3 \quad$ (alternate hypothesis)

Null hypothesis

The null hypothesis is a hypothesis that is assumed to be true. We denote it with an $H_0$.

Alternate hypothesis

The alternate (or alternative) hypothesis is what we are out to show: it is the hypothesis that we are looking for evidence for. We denote it with an $H_1$.

Note

Some people use the notation $H_a$ here.

Conclusion is either:
Reject $\mathrm{H}_0 \quad$ OR $\quad$ Fail to Reject $\mathrm{H}_0$

simple hypothesis

A simple hypothesis is one that completely specifies the distribution: if it is true, you know the exact distribution.

composite hypothesis

A composite hypothesis does not completely specify the distribution: even if it is true, you don’t know the exact distribution.
For example, you may know that the distribution is normal but not know its mean and variance.

Critical values

Critical values for distributions are numbers that cut off specified areas under pdfs. For the N(0, 1) distribution, we will use the notation $z_\alpha$ to denote the value that cuts off area $\alpha$ to the right, that is, $P(Z>z_\alpha)=\alpha$ for $Z \sim N(0,1)$.
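As a minimal sketch, $z_\alpha$ can be computed numerically as the $(1-\alpha)$ quantile of the standard normal distribution. The example uses `statistics.NormalDist` from the Python standard library; the helper name `z_crit` is chosen here for illustration and is not part of these notes.

```python
from statistics import NormalDist

def z_crit(alpha: float) -> float:
    """Return z_alpha, the value with area alpha to its right under N(0, 1)."""
    # Area alpha to the right means area 1 - alpha to the left
    return NormalDist(mu=0.0, sigma=1.0).inv_cdf(1.0 - alpha)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  ->  z_alpha = {z_crit(alpha):.4f}")
```

Because the standard normal pdf is symmetric about zero, the value that cuts off area $\alpha$ to the left is $-z_\alpha$, which is the same as $z_{1-\alpha}$ in this notation.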

Errors in Hypothesis Testing

Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2=2$.

\[ H _0: \mu \leq 3 \quad H _1: \mu>3 \]

Idea: Look at $\bar{X}$ and reject $H_0$ in favor of $H _1$ if $\overline{ X }$ is “large”.
i.e. Look at $\bar{X}$ and reject $H_0$ in favor of $H _1$ if $\overline{ X }> c$ for some value $c$.

You are a potato chip manufacturer and you want to ensure that the mean amount in 15 ounce bags is at least 15 ounces. $\mathrm{H}_0: \mu \leq 15 \quad \mathrm{H}_1: \mu>15$

Type I Error

The true mean is $\leq 15$ but you concluded it was $>15$. You are going to save some money because you won’t be adding chips, but you are risking a lawsuit!

Type II Error

The true mean is $> 15$ but you concluded it was $\leq 15$ . You are going to be spending money increasing the amount of chips when you didn’t have to.

Developing a Test

Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and known variance $\sigma^2$.

Consider testing the simple versus simple hypotheses

\[\begin{split} \begin{aligned} & H _0: \mu=5 \\ & H _1: \mu=3 \end{aligned} \end{split}\]

level of significance

Let

\[\begin{split} \begin{aligned} \alpha &= P (\text { Type I Error }) \\ &= P \left(\text { Reject } H _0 \text { when it’s true }\right) \\ &= P \left(\text { Reject } H _0 \text { when } \mu=5\right) \end{aligned} \end{split}\]

$\alpha$ is called the level of significance of the test. It is also sometimes referred to as the size of the test.

\[\begin{split} \begin{aligned} \alpha &=\max P (\text { Type I Error }) \\ &=\max _{\mu \in H _0} P \left(\text { Reject } H _0 ; \mu\right) \\ \beta &=\max P (\text { Type II Error }) \\ &=\max _{\mu \in H _1} P \left(\text { Fail to Reject } H _0 ; \mu\right) \end{aligned} \end{split}\]

Power of the test

$1-\beta$ is known as the power of the test

\[\begin{split} \begin{gathered} 1-\beta=1-\max _{\mu \in H _1} P \left(\text { Fail to Reject } H _0 ; \mu\right) \\ =\min _{\mu \in H _1}\left(1- P \left(\text { Fail to Reject } H _0 ; \mu\right)\right) \\ =\min _{\mu \in H _1} P \left(\text { Reject } H _0 ; \mu\right) \quad \begin{array}{c} \text { High power } \\ \text { is good! } \end{array} \end{gathered} \end{split}\]
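A minimal sketch of these definitions, using the earlier example $H_0: \mu \leq 3$ versus $H_1: \mu>3$ with $\sigma^2=2$ and a test that rejects when $\bar{X}>c$. The sample size `N = 10` and the cutoff `C = 3.8` are hypothetical values picked only for illustration. Since $\bar{X} \sim N(\mu, \sigma^2/n)$, the rejection probability at a given $\mu$ is $1-\Phi\left((c-\mu)/(\sigma/\sqrt{n})\right)$.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = sqrt(2.0)   # sigma^2 = 2, as in the example
N = 10              # hypothetical sample size
C = 3.8             # hypothetical cutoff: reject H0 when x_bar > C

def reject_prob(mu: float) -> float:
    """P(X_bar > C) when the true mean is mu."""
    se = SIGMA / sqrt(N)
    return 1.0 - NormalDist().cdf((C - mu) / se)

# The rejection probability increases with mu, so over H0 (mu <= 3)
# it is largest at the boundary mu = 3. The grid includes that point.
null_grid = [3.0 - 0.5 * k for k in range(5)]   # 3.0, 2.5, ..., 1.0
alpha = max(reject_prob(mu) for mu in null_grid)
print(f"alpha = {alpha:.4f}")

# Probability of rejecting H0 at a few values of mu in H1
for mu in (3.5, 4.0, 4.5):
    print(f"P(Reject H0; mu = {mu}) = {reject_prob(mu):.4f}")
```

The further the true mean lies above 3, the more likely the test is to reject $H_0$; for values of $\mu$ just above 3, the rejection probability is close to $\alpha$.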

Step One

Choose an estimator for μ.

\[ \widehat{\mu}=\bar{X} \]

Step Two

Choose a test statistic, or give the “form” of the test.

We are looking for evidence that $H _1$ is true.
The $N \left(5, \sigma^2\right)$ distribution takes on values from $-\infty$ to $\infty$.
$\overline{ X } \sim N \left(\mu, \sigma^2 / n \right) \Rightarrow \overline{ X }$ also takes on values from $-\infty$ to $\infty$.
It is entirely possible that $\bar{X}$ is very small even if the mean of its distribution is 5.
However, if $\bar{X}$ is very small, it will start to seem more likely that $\mu$ is smaller than 5.
Eventually, a population mean of 3 will seem more likely than a population mean of 5.

Reject $H _0$, in favor of $H _1$, if $\overline{ X }< c$ for some c to be determined.

Step Three

Find c.

If $c$ is too small, we are making it difficult to reject $H _0$. We are more likely to fail to reject when it should be rejected.
If $c$ is too large, we are making it too easy to reject $H _0$. We are more likely to reject when it should not be rejected.

This is where $\alpha$ comes in.

\[\begin{split} \alpha&= P(\text{Type I Error}) \\ &=P( \text{Reject } H_0 \text{ when true}) \\ &=P (\overline{ X }< c \text{ when } \mu=5) \end{split}\]

Step Four

Give a conclusion!

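Here the level is $\alpha=0.05$, and the standardization uses $\sigma=2$ and $n=10$.

\[\begin{split} \begin{aligned} 0.05 &= P (\text { Type I Error }) \\ &= P \left(\text { Reject } H _0 \text { when true }\right) \\ &= P \left(\overline{ X }< c \text { when } \mu=5\right) \\ &= P \left(\frac{\overline{ X }-5}{2 / \sqrt{10}}<\frac{ c -5}{2 / \sqrt{10}} \text { when } \mu=5\right) \\ &= P \left( Z <\frac{ c -5}{2 / \sqrt{10}}\right) \end{aligned} \end{split}\]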
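This calculation can be completed numerically. When $\mu=5$ the standardized sample mean is $N(0,1)$, so $(c-5)/(2/\sqrt{10})$ must be the value with area 0.05 to its left, which is $-z_{0.05}$. The sketch below assumes $\alpha=0.05$, $\sigma=2$ and $n=10$, as read off from the expression above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha, mu0, sigma, n = 0.05, 5.0, 2.0, 10

# Lower-tail cutoff on the standard normal scale: P(Z < q) = alpha, so q = -z_alpha
q = NormalDist().inv_cdf(alpha)

# Undo the standardization to get the cutoff for the sample mean
c = mu0 + q * sigma / sqrt(n)

# Sanity check: under mu = 5, X_bar ~ N(5, sigma^2 / n)
type_i = NormalDist(mu=mu0, sigma=sigma / sqrt(n)).cdf(c)

print(f"c = {c:.4f}")
print(f"P(X_bar < c; mu = 5) = {type_i:.4f}")
```

Under these assumptions the test rejects $H_0: \mu=5$ in favor of $H_1: \mu=3$ when the observed sample mean falls below roughly 3.96.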

Formula

Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and known variance $\sigma^2$.

Consider testing the simple versus simple hypotheses

\[ H _0: \mu=\mu_0 \quad H _1: \mu=\mu_1 \]

where $\mu_0$ and $\mu_1$ are fixed and known.

If $\mu_0<\mu_1$, reject $H_0$, in favor of $H_1$, if

\[ \overline{ X }>\mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }} \]

If $\mu_0>\mu_1$, reject $H_0$, in favor of $H_1$, if

\[ \overline{ X }<\mu_0+ z_{1-\alpha} \frac{\sigma}{\sqrt{ n }} = \mu_0- z_{\alpha} \frac{\sigma}{\sqrt{ n }} \]
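The two rejection rules can be sketched as a pair of small helper functions. The names `critical_value` and `reject_h0` are hypothetical and chosen only for this illustration; the code relies on the symmetry $z_{1-\alpha}=-z_\alpha$ of the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def critical_value(mu0: float, mu1: float, sigma: float, n: int, alpha: float) -> float:
    """Cutoff c for the level-alpha test of H0: mu = mu0 versus H1: mu = mu1."""
    if mu0 == mu1:
        raise ValueError("mu0 and mu1 must differ")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    se = sigma / sqrt(n)
    if mu0 < mu1:
        return mu0 + z_alpha * se   # reject when x_bar > c
    return mu0 - z_alpha * se       # reject when x_bar < c

def reject_h0(x_bar: float, mu0: float, mu1: float, sigma: float, n: int, alpha: float) -> bool:
    """Return True if the observed sample mean leads to rejecting H0."""
    c = critical_value(mu0, mu1, sigma, n, alpha)
    return x_bar > c if mu0 < mu1 else x_bar < c

# Example: H0: mu = 5 versus H1: mu = 3 with sigma = 2, n = 10, alpha = 0.05
print(critical_value(5.0, 3.0, 2.0, 10, 0.05))
print(reject_h0(3.5, 5.0, 3.0, 2.0, 10, 0.05))
```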

Type II Error

\[\begin{split} H_0: \mu=\mu_0 \\ H _1: \mu=\mu_1 \\ \mu_0<\mu_1 \end{split}\]

\[\begin{split} \begin{aligned} \beta &= P (\text { Type II Error }) \\ &= P \left(\text { Fail to Reject } H _0 \text { when false }\right) \\ &= P \left(\overline{ X } \leq \mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }} \text { when } \mu=\mu_1\right) \\ &= P \left(\overline{ X } \leq \mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }} ; \mu_1\right) \end{aligned}\end{split}\]

\[\begin{split} \begin{aligned} \beta &= P \left(\left(\frac{\overline{X} -\mu_1}{\sigma / \sqrt{ n }}\right) \leq \frac{\mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }}-\mu_1}{\sigma / \sqrt{ n }} ; \mu_1\right) \\ &= P \left( Z \leq \frac{\mu_0+ z _\alpha \frac{\sigma}{\sqrt{ n }}-\mu_1}{\sigma / \sqrt{ n }}\right) \end{aligned} \end{split}\]
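The last expression simplifies to $\beta=\Phi\left(z_\alpha+\frac{\mu_0-\mu_1}{\sigma/\sqrt{n}}\right)$, where $\Phi$ is the standard normal cdf. A minimal sketch of that computation follows; the function name `type_ii_error` and the numbers in the example ($\mu_0=3$, $\mu_1=5$, $\sigma=2$, $\alpha=0.05$) are assumptions made only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0: float, mu1: float, sigma: float, n: int, alpha: float) -> float:
    """beta for the test that rejects H0: mu = mu0 when x_bar > mu0 + z_alpha * sigma / sqrt(n)."""
    if not mu0 < mu1:
        raise ValueError("this formula is for the case mu0 < mu1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    se = sigma / sqrt(n)
    # P(Z <= (mu0 + z_alpha * se - mu1) / se) = Phi(z_alpha + (mu0 - mu1) / se)
    return NormalDist().cdf(z_alpha + (mu0 - mu1) / se)

# beta and power for increasing sample sizes
for n in (5, 10, 20):
    beta = type_ii_error(3.0, 5.0, 2.0, n, 0.05)
    print(f"n = {n:2d}: beta = {beta:.4f}, power = {1.0 - beta:.4f}")
```

With $\alpha$ held fixed, a larger sample size shrinks $\sigma/\sqrt{n}$, which lowers $\beta$ and raises the power.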

Composite vs Composite Hypothesis

\[\begin{split} \begin{aligned} & X _1, X _2, \ldots, X _{ n } \sim N \left(\mu, \sigma^2\right), \sigma^2 \text { known } \\ & H _0: \mu \leq \mu_0 \quad \text { vs } \quad H _1: \mu>\mu_0 \end{aligned} \end{split}\]

Step One: Choose an estimator for $\mu$.
Step Two: Choose a test statistic. Reject $H_0$, in favor of $H_1$, if $\bar{X}>c$, where $c$ is to be determined.
Step Three: Find $c$.
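As a sketch of how Step Three works out here, use the definition of $\alpha$ as a maximum over the null hypothesis. The probability $P(\overline{X}>c ; \mu)$ increases with $\mu$, so its maximum over $\mu \leq \mu_0$ is attained at $\mu=\mu_0$:

\[\begin{split} \begin{aligned} \alpha &=\max _{\mu \leq \mu_0} P \left(\overline{ X }> c ; \mu\right) \\ &= P \left(\overline{ X }> c ; \mu_0\right) \\ &= P \left( Z >\frac{ c -\mu_0}{\sigma / \sqrt{ n }}\right) \end{aligned} \end{split}\]

Therefore $\frac{c-\mu_0}{\sigma/\sqrt{n}}=z_\alpha$, which gives $c=\mu_0+z_\alpha \frac{\sigma}{\sqrt{n}}$. This is the same cutoff as in the simple versus simple case with $\mu_0<\mu_1$.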

One-Tailed Tests

Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution with mean $\mu$ and known variance $\sigma^2$. Consider testing the hypotheses

\[ H _0: \mu \geq \mu_0 \quad H _1: \mu<\mu_0 \]

where $\mu_0$ is fixed and known.
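By the same argument as in the upper-tailed case, mirrored to the left tail, a level-$\alpha$ test rejects $H_0$ when $\overline{X}<\mu_0-z_\alpha \frac{\sigma}{\sqrt{n}}$. The sketch below applies that rule. The function name `lower_tailed_z_test` is hypothetical, and the example reuses the eight observations from earlier (fifth value read as 5.85) with $\mu_0=3$ and $\sigma^2=2$ assumed purely for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def lower_tailed_z_test(sample, mu0: float, sigma: float, alpha: float):
    """Test H0: mu >= mu0 versus H1: mu < mu0 with known sigma.

    Returns (x_bar, c, reject), where H0 is rejected when x_bar < c.
    """
    n = len(sample)
    x_bar = mean(sample)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)
    c = mu0 - z_alpha * sigma / sqrt(n)
    return x_bar, c, x_bar < c

data = [2.73, 1.14, 3.98, 2.15, 5.85, 1.97, 2.54, 2.03]
x_bar, c, reject = lower_tailed_z_test(data, mu0=3.0, sigma=sqrt(2.0), alpha=0.05)
print(f"x_bar = {x_bar:.3f}, c = {c:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

Under these assumed values the sample mean lies above the cutoff, so the test fails to reject $H_0$: a sample mean somewhat below 3 is not, by itself, strong evidence that $\mu<3$.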