Sampling and Point Estimation
This week shifts the focus from probability to statistics. You will start by learning the concepts of a sample and a population, along with two fundamental results about them: the law of large numbers and the central limit theorem. In lesson 2, you will learn the first and simplest estimation method in statistics: point estimation. You will see how maximum likelihood estimation, the most common point estimation method, works, and how regularization helps prevent overfitting. You'll then learn how Bayesian statistics incorporates prior beliefs into the way data is evaluated and conclusions are reached.
Population and Sample
A population is the entire group of individuals or items we want to study.
A sample is the smaller subset (of the population) we observe or measure.
Remember, we always pick samples randomly.
Sample sets should be independent: how we draw the second set must not depend on what was drawn in the first (e.g., we should not exclude items already picked in the first set when drawing the second).
Because each set is a fresh random draw from the full population, the same item can appear in more than one set.
Samples should also be identically distributed, i.e., all drawn from the same population distribution; a minimal sketch follows below.
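A minimal sketch of i.i.d. sampling, assuming NumPy; the avocado-price population here is simulated purely for illustration, not real data:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical population: prices of all avocados sold in the U.S. (simulated).
population = rng.normal(loc=1.5, scale=0.3, size=100_000)

# Each sample set is an independent draw from the FULL population, so an
# item may appear in more than one set, and both sets share the same
# distribution (identically distributed).
sample_a = rng.choice(population, size=30)
sample_b = rng.choice(population, size=30)  # does NOT exclude sample_a's items
```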
What is the population of your study?
All avocados sold in the U.S.

In your study about the price of avocados in the United States, what is the sample of your study?
Avocados sold in the 4 stores you selected.

Sample Mean
To estimate the population mean, we compute the mean of a sample.
The same idea applies to estimating a population proportion; a sketch follows below.
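A quick sketch, again with a simulated stand-in population (the $1.50 threshold and all numbers are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulated stand-in for the avocado-price population (illustrative numbers).
population = rng.normal(loc=1.5, scale=0.3, size=100_000)
sample = rng.choice(population, size=50)

# The sample mean estimates the population mean.
print(sample.mean(), population.mean())

# The sample proportion estimates the population proportion,
# e.g. the fraction of avocados priced above $1.50.
print((sample > 1.5).mean(), (population > 1.5).mean())
```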
Which sample is most likely to generate a mean closest to the population mean?
- A random sample of 6 people
- A convenience sample of the first 6 people
1
A random sample is more likely to represent the population because it avoids bias and ensures that each individual has an equal chance of being selected. Therefore, the mean calculated from a random sample is more likely to be closer to the population mean.
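A small simulation can make the bias concrete. This sketch assumes a hypothetical population list that is sorted by the measured value, so the "first 6 people" are systematically unrepresentative:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical population in which the "first people you meet" differ
# systematically, e.g. a visitor log sorted by the quantity being measured.
population = np.sort(rng.exponential(scale=40, size=10_000))

convenience = population[:6]                    # first 6 individuals
random_sample = rng.choice(population, size=6)  # 6 chosen at random

print(population.mean(), convenience.mean(), random_sample.mean())
# The convenience sample is badly biased low; the random sample is not.
```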
When comparing sample means, which statement is generally true when comparing a small sample to a large sample?
- The mean of a small sample (e.g. sample size = 2) is more likely to represent the population mean.
- The mean of a large sample (e.g. sample size = 6) is more likely to represent the population mean.
- The means of small and large samples are equally likely to represent the population mean.
- The sample size does not affect the representativeness of the sample mean.
2
Larger samples produce more reliable estimates of population parameters because they reduce the impact of sampling variability. Therefore, the mean of a large sample is more likely to be close to the population mean.
Sample Proportion
As the sample size increases, the sample proportion gets closer to the population proportion.
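A tiny sketch of this convergence, with a hypothetical population proportion of 0.3:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

p = 0.3  # hypothetical population proportion
for n in (10, 100, 10_000):
    sample = rng.random(n) < p  # n Bernoulli(p) draws
    print(n, sample.mean())     # the sample proportion approaches 0.3
```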
Sample Variance
If the first dataset has a sample variance of 1.7, what do you think this second, more spread-out dataset's sample variance will be? Remember, variance is the average squared deviation from the mean.
About 50
The points are about 7 away from the sample mean, on average, so you'd expect the sample variance to be about $7^2$ or roughly 50.

The video introduces a small game: we pull out a 1, 2, or 3 and win that amount, so the population mean is 2.
The third slide shows that when the sample size is small, the sample variance computed with n in the denominator systematically underestimates the population variance.
Using n − 1 instead of n in the denominator of the sample variance formula (Bessel's correction) fixes this bias: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$.
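A simulation sketch of Bessel's correction using the lesson's 1/2/3 game, whose population variance is $2/3$; the sample size and trial count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

values = np.array([1, 2, 3])   # the 1/2/3 game from the lesson
pop_var = np.var(values)       # population variance = 2/3

n, trials = 2, 100_000
samples = rng.choice(values, size=(trials, n))

biased = samples.var(axis=1, ddof=0).mean()    # divide by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n - 1 (Bessel)

print(pop_var, biased, unbiased)  # ≈ 0.667, ≈ 0.333, ≈ 0.667
```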
Law of Large Numbers
The larger the sample size, the better the sample mean estimates the population mean.
What do you think will happen to the average of the samples as the sample size increases?
The average of the samples will get closer to the population mean
Great job! This illustrates the Law of Large Numbers.
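A minimal sketch of the law of large numbers using the 1/2/3 game, whose population mean is 2:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# The 1/2/3 game again: the population mean is 2.
draws = rng.choice([1, 2, 3], size=10_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

# The running average drifts toward 2 as more draws accumulate.
print(running_mean[[9, 99, 9_999]])  # after 10, 100, and 10,000 draws
```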
Central Limit Theorem - Discrete Random Variable
Central Limit Theorem: as n (the sample size) gets larger, the distribution of the sum or average of the samples looks more and more like a Gaussian (normal) distribution.
What can we say about the probability distribution of the number of heads flipped when the number of coin flips increases?
The probability distribution becomes bell-shaped.
Great job! As the number of coin flips increases, the probability distribution of the number of heads flipped tends to follow a bell-shaped curve, which is characteristic of the normal distribution. Continue the video to hear the explanation.
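A sketch of this effect, assuming NumPy and Matplotlib: plot the distribution of the number of heads for increasing numbers of flips and watch the bell shape emerge.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=4)

# Distribution of the number of heads in n fair coin flips, many repetitions.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, (2, 10, 100)):
    heads = rng.binomial(n=n, p=0.5, size=100_000)
    ax.hist(heads, bins=np.arange(n + 2) - 0.5)  # one bin per head count
    ax.set_title(f"n = {n} flips")
plt.show()
```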


Central Limit Theorem - Continuous Random Variable
What can we say about the distribution of the average wait time of calls as n increases?
The distribution becomes bell-shaped.
As the sample size n increases, the distribution of the averages of wait time calls tends to follow a bell-shaped curve, which is characteristic of the normal distribution.
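A sketch of the same effect for a skewed continuous variable, using hypothetical exponential wait times with a mean of 3 minutes:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=5)

# Skewed population: exponential call wait times (mean 3 minutes, made up).
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, (1, 5, 30)):
    averages = rng.exponential(scale=3.0, size=(100_000, n)).mean(axis=1)
    ax.hist(averages, bins=60)
    ax.set_title(f"average of n = {n} wait times")
plt.show()
```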

As n grows, so do the mean and the variance of the sum (e.g., of the total number of heads).
A common rule of thumb is that we need at least 30 samples for the normal approximation to hold.
Fewer samples are needed when the underlying distribution is symmetric than when it is skewed.
As in the example above, with a uniform distribution the normal shape appears quickly.
We need to standardize because the mean of the sample average always equals the population mean, while its variance, $\sigma^2/n$, depends on n.
Standardizing therefore makes it easier to compare the distribution of $Y_n$ (the average of all n experiments) across different values of n.
- Think of this as the same reason we standardize different features (e.g., height vs. weight): it lets us compare distributions measured in different units.
As n increases, the distribution of the standardized average gets close to a standard normal distribution.
The central limit theorem states that as n goes to infinity, the standardized average follows a standard normal distribution; in practice, the approximation is usually good for n around 30 or higher. A minimal check follows below.
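A minimal numerical check of the standardized average, $Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$, reusing the hypothetical exponential wait times (scale 3, so $\mu = \sigma = 3$):

```python
import numpy as np

rng = np.random.default_rng(seed=6)

mu = sigma = 3.0  # exponential(scale=3): mean 3, standard deviation 3
n = 30

# 100,000 sample averages, each over n = 30 wait times.
xbar = rng.exponential(scale=3.0, size=(100_000, n)).mean(axis=1)

# Standardize: subtract the mean, divide by sigma / sqrt(n).
z = (xbar - mu) / (sigma / np.sqrt(n))
print(z.mean(), z.std())  # ≈ 0 and ≈ 1, as the CLT predicts
```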
All the information above is based on Probability & Statistics for Machine Learning & Data Science on Coursera, by DeepLearning.AI.