# Estimators and Sampling Distributions

We have learned many different distributions for random variables and all of those distributions had parameters: the numbers that you provide as input when you define a random variable.

What if we don’t know the values of the parameters? What if, instead of knowing the random variables, we have many examples of data generated from the same underlying distribution? In this chapter we are going to learn formal ways of estimating parameters from data.

**These ideas are critical for artificial intelligence. Almost all modern machine learning algorithms work like this:**

1. Specify a probabilistic model that has parameters.
2. Learn the values of those parameters from data.

The estimation methods in this chapter assume that your data are independent and identically distributed (IID) samples.

## Random Sample

A collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent.

Random Sample = \(X_1, X_2, X_3, ..., X_n\)

For example, suppose that \(X_1, X_2, X_3, ..., X_n\) is a random sample from the gamma distribution with parameters \(\alpha\) and \(\beta\).

**E.g**

A good example is a succession of throws of a fair coin: The coin has no memory, so all the throws are “independent”. And every throw is 50:50 (heads:tails), so the coin is and stays fair - the distribution from which every throw is drawn, so to speak, is and stays the same: “identically distributed”.
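The coin example can be sketched in a few lines of Python. This is only an illustration: the seed, the sample size, and the helper name `flip_fair_coin` are arbitrary choices, not part of the original text.

```python
import random

# Fixed seed so the run is repeatable (an arbitrary choice for this sketch).
rng = random.Random(0)

def flip_fair_coin(rng):
    """Return 1 for heads, 0 for tails, each with probability 0.5."""
    return 1 if rng.random() < 0.5 else 0

# Every flip uses the same distribution (identically distributed) and
# does not look at earlier flips (independent).
n = 10_000
sample = [flip_fair_coin(rng) for _ in range(n)]

heads_fraction = sum(sample) / n
print(f"fraction of heads in {n} flips: {heads_fraction:.3f}")
```

With a large number of flips, the fraction of heads should land near 0.5.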

### Independent and identically distributed random variables (IID)

Random Sample == IID

Note

What are biased and unbiased estimators? An estimator is unbiased if its expected value equals the true population parameter. It is biased if its expected value differs from the true population parameter.

## Parameters

Before we dive into parameter estimation, first let’s revisit the concept of parameters. Given a model, the parameters are the numbers that yield the actual distribution.

In the case of a Bernoulli random variable, the single parameter was the value p.

In the case of a Uniform random variable, the parameters are the a and b values that define the min and max value.

We are going to use the notation \(\theta\) to denote a vector of all the parameters.

| Distribution | Parameters |
|---|---|
| Bernoulli(p) | \(\theta = p\) |
| Poisson(λ) | \(\theta = \lambda\) |
| Uniform(a,b) | \(\theta = (a,b)\) |
| Normal | \(\theta = (\mu,\sigma)\) |
| \(Y = wX + b\) | \(\theta = (w,b)\) |
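To make the role of \(\theta\) concrete, here is a minimal sketch in which each model is a sampling function that takes its parameters as input. The parameter values and the function names are made up for illustration, and only three rows of the table are covered.

```python
import random

rng = random.Random(1)  # arbitrary seed for repeatability

# Hypothetical parameter values, one theta per model.
theta_bernoulli = 0.3        # p
theta_uniform = (2.0, 5.0)   # (a, b)
theta_normal = (10.0, 2.0)   # (mu, sigma)

def sample_bernoulli(p, rng):
    """Draw 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

def sample_uniform(theta, rng):
    """Draw from Uniform(a, b)."""
    a, b = theta
    return rng.uniform(a, b)

def sample_normal(theta, rng):
    """Draw from a normal with mean mu and standard deviation sigma."""
    mu, sigma = theta
    return rng.gauss(mu, sigma)

print(sample_bernoulli(theta_bernoulli, rng))
print(sample_uniform(theta_uniform, rng))
print(sample_normal(theta_normal, rng))
```

Changing \(\theta\) changes the distribution the data come from. Parameter estimation runs in the opposite direction: given the data, recover \(\theta\).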

## Sampling Distributions

\(\theta\) will denote a generic parameter.

**E.g**

\(\theta = \mu , \theta = p , \theta = \lambda , \theta = (\alpha, \beta)\)

**Estimator**

\(\hat{\theta}\) = a random variable (a function of the random sample), for example

\(\hat{\theta}=\bar{X}\)

**Estimate**

\(\hat{\theta}\) = an observed number (the estimator evaluated on the observed data), for example

\(\hat{\theta}=\bar{x} = 42.5\)
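The distinction between an estimator and an estimate can be illustrated with a short sketch. The population used here (a normal with mean 42 and standard deviation 5) and the name `sample_mean_estimator` are assumptions chosen only for the example.

```python
import random

def sample_mean_estimator(data):
    """The estimator: a rule that maps any sample to a number."""
    return sum(data) / len(data)

rng = random.Random(2)  # arbitrary seed for repeatability

# Each new sample gives a different estimate, which is why the
# estimator itself is treated as a random variable.
estimates = []
for trial in range(3):
    data = [rng.gauss(42.0, 5.0) for _ in range(20)]  # hypothetical population
    estimate = sample_mean_estimator(data)            # an observed number
    estimates.append(estimate)
    print(f"sample {trial + 1}: estimate = {estimate:.2f}")
```

The function is the estimator. Each printed value is one estimate.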

We want our estimator of \(\theta\) to be correct “on average.”

\(\bar{X}\) is a random variable with its own distribution and its own mean or expected value.

We would like the expected value of the sample mean, \(E[\bar{X}]\), to equal the true mean or population mean \(\mu\), that is, \(E[\bar{X}] = \mu\).

Important

If this is true, we say that \(\bar{X}\) is an unbiased estimator of \(\mu\).

In general, \(\hat{\theta}\) is an unbiased estimator of \(\theta\) if \(E[\hat{\theta}] = \theta\).

That is a very desirable property.

### Mean

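Let \(X_1, X_2, \ldots, X_n\) be a random sample from any distribution with mean \(\mu\).

That is, \(E[X_i] = \mu\) for \(i = 1, 2, \ldots, n\). Then, by linearity of expectation,

\(E[\bar{X}] = E\left[\frac{1}{n} \sum_{i=1}^{n} X_{i}\right] = \frac{1}{n} \sum_{i=1}^{n} E\left[X_{i}\right] = \frac{1}{n} \sum_{i=1}^{n} \mu = \frac{1}{n} n \mu = \mu\)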

We have shown that, no matter what distribution we are working with, if its mean is \(\mu\), then \(\bar{X}\) is an unbiased estimator for \(\mu\).
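A simulation can make this claim tangible. The sketch below assumes a Uniform(0, 10) population (so \(\mu = 5\)), with an arbitrary sample size and number of trials. It approximates \(E[\bar{X}]\) by averaging the sample means of many independent samples.

```python
import random

def average_of_sample_means(n, trials, rng):
    """Draw `trials` samples of size n from Uniform(0, 10) and
    return the average of their sample means."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.uniform(0.0, 10.0) for _ in range(n)]
        total += sum(sample) / n  # sample mean of this sample
    return total / trials

rng = random.Random(3)  # arbitrary seed for repeatability
true_mu = 5.0           # mean of Uniform(0, 10)

approx_expectation = average_of_sample_means(n=5, trials=20_000, rng=rng)
print(f"true mean: {true_mu}, average of sample means: {approx_expectation:.3f}")
```

Even with samples of only five observations, the average of the sample means should sit close to the true mean. Individual sample means still vary; it is their expectation that equals \(\mu\).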


As an example, let \(X_1, X_2, \ldots, X_n\) be a random sample from the exponential distribution with rate \(\lambda\).

Let \(\bar{X}=\frac{1}{n} \sum_{i=1}^{n} X_{i}\) be the sample mean. We know, for the exponential distribution, that \(E[X_i]=\frac{1}{\lambda}\).

Then \(E[\bar{X}] = \frac{1}{\lambda}\), so \(\bar{X}\) is an unbiased estimator of \(\frac{1}{\lambda}\).
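The exponential case can be checked numerically with Python's standard `random.expovariate`, which takes the rate as its argument. The rate \(\lambda = 2\), the sample size, and the number of trials are arbitrary values picked for this sketch.

```python
import random

rng = random.Random(4)  # arbitrary seed for repeatability

lam = 2.0        # rate of the exponential distribution (assumed value)
n = 10           # size of each random sample
trials = 20_000  # number of independent samples

# Approximate E[X-bar] by averaging the sample means of many samples.
total = 0.0
for _ in range(trials):
    sample = [rng.expovariate(lam) for _ in range(n)]
    total += sum(sample) / n
avg_sample_mean = total / trials

print(f"1 / lambda = {1 / lam}, average of sample means = {avg_sample_mean:.4f}")
```

The average of the sample means should be close to \(\frac{1}{\lambda} = 0.5\).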

### Variance

Let \(X_1, X_2, \ldots, X_n\) be a random sample from any distribution with mean \(\mu\) and variance \(\sigma^2\).

We already know that \(\bar{X}\) is an unbiased estimator for \(\mu\).

What can we say about the variance of \(\bar{X}\)?

Because the \(X_i\) are independent, the variance of their sum is the sum of their variances, so

\(Var[\bar{X}]=Var\left[\frac{1}{n} \sum_{i=1}^{n} X_{i}\right] =\frac{1}{n^{2}} Var\left[\sum_{i=1}^{n} X_{i}\right] =\frac{1}{n^{2}} \sum_{i=1}^{n} Var\left[X_{i}\right]\)

\(=\frac{1}{n^{2}} \sum_{i=1}^{n} \sigma^{2} = \frac{1}{n^{2}} n \sigma^{2} =\frac{\sigma^{2}}{n}\)
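The result \(Var[\bar{X}] = \frac{\sigma^2}{n}\) can also be checked by simulation. The sketch below assumes a normal population with \(\mu = 0\) and \(\sigma = 3\) (so \(\sigma^2 = 9\)). The helper name `variance_of_sample_mean` and the trial count are illustrative choices.

```python
import random
from statistics import variance

def variance_of_sample_mean(n, trials, rng, mu=0.0, sigma=3.0):
    """Estimate Var[X-bar] empirically from `trials` samples of size n."""
    sample_means = []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(sum(sample) / n)
    return variance(sample_means)

rng = random.Random(5)  # arbitrary seed for repeatability
sigma_squared = 9.0     # sigma = 3

var_n9 = variance_of_sample_mean(n=9, trials=10_000, rng=rng)
var_n36 = variance_of_sample_mean(n=36, trials=10_000, rng=rng)

print(f"n = 9:  simulated {var_n9:.3f}, formula {sigma_squared / 9:.3f}")
print(f"n = 36: simulated {var_n36:.3f}, formula {sigma_squared / 36:.3f}")
```

Quadrupling the sample size should cut the variance of the sample mean to roughly a quarter, as the formula predicts.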