

Probability & Statistics for Machine Learning & Data Science (13)


Describing probability distributions and probability distributions with multiple variables

Describing Distributions


Quantiles and Box-Plots


We need to look at the data not only numerically but also visually.


Visualizing data: Box-Plots

[Figure: Box plots]

A box plot visualizes the data through its quantiles. To draw one we need the minimum, the maximum, the first (25%), second (50%, the median), and third (75%) quartiles, and the interquartile range (IQR), which is the third quartile minus the first quartile.
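As a minimal sketch of the numbers behind a box plot, the snippet below computes the five values and the IQR using only Python's standard library. The function name `five_number_summary` and the sample values are made up for illustration, and the "inclusive" quartile method is one assumed convention among several; other tools may give slightly different quartiles on small samples.

```python
import statistics

def five_number_summary(data):
    """Return the values a box plot is drawn from, plus the IQR."""
    # method="inclusive" interpolates between observed values, so the
    # quartiles always stay within the observed minimum and maximum.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "q1": q1,        # 25% quartile
        "median": q2,    # 50% quartile
        "q3": q3,        # 75% quartile
        "max": max(data),
        "iqr": q3 - q1,  # third quartile minus first quartile
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data
summary = five_number_summary(sample)
print(summary)
```

The box spans `q1` to `q3`, the line inside the box marks `median`, and the whiskers extend toward `min` and `max`.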

Visualizing data: Kernel density estimation


Continuous data can be plotted with a histogram, but a histogram is made of bars and does not give a smooth curve.

One way to approximate the probability density function (PDF) with a smooth curve is kernel density estimation (KDE).

[Figure: KDE]

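The small blue Gaussian curves, one centered on each data point, are called kernels. Adding them up and scaling by the number of points gives the smooth KDE curve.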
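The idea of placing one Gaussian kernel on each data point and averaging them can be sketched in a few lines of plain Python. This is an illustrative sketch, not the course's implementation: the names `gaussian_kernel` and `kde`, the data values, and the bandwidth of 0.5 are all assumptions chosen for the example.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel shape."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, data, bandwidth):
    """Estimate the density at x by averaging one kernel per data point."""
    n = len(data)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return total / (n * bandwidth)

data = [1.0, 1.5, 2.0, 4.0, 4.2]  # hypothetical observations
bandwidth = 0.5                   # assumed kernel width

# Evaluate the estimate on a grid from -3 to 9 in steps of 0.01.
step = 0.01
grid = [i * step for i in range(-300, 901)]
density = [kde(x, data, bandwidth) for x in grid]

# A density should integrate to about 1; check with a Riemann sum.
area = sum(density) * step
print(round(area, 3))
```

A smaller bandwidth makes each kernel narrower and the curve bumpier, while a larger one smooths the curve at the cost of detail.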

Visualizing data: Violin Plots


Violin plots are useful because they combine a KDE with a box plot, so a single plot shows both the shape of the distribution and its quartiles.

Visualizing data: QQ plots


Histograms are useful in visualizing the distribution of the data.

In the example newspaper dataset, the histogram shows that the data is not Gaussian. Most of the data sits on the left and a long tail extends to the right, which tells us that the dataset is right-skewed.

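For the sales data, both the histogram and the QQ plot indicate that the data is approximately Gaussian. A QQ plot compares the quantiles of the data with the quantiles of a Gaussian distribution; when the points fall close to a straight line, the data is close to Gaussian.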
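To make the QQ plot concrete, the sketch below computes the points such a plot would show: each sorted observation paired with the matching quantile of a standard Gaussian. The helper name `qq_points`, the sales figures, and the plotting positions `(i - 0.5) / n` are assumptions for illustration; plotting libraries may use slightly different positions.

```python
from statistics import NormalDist

def qq_points(data):
    """Pair each sorted observation with a standard Gaussian quantile."""
    n = len(data)
    ordered = sorted(data)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed convention: the i-th smallest value sits at probability (i - 0.5) / n.
    theoretical = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

sales = [52.1, 47.3, 50.0, 61.8, 39.5, 55.4, 44.9]  # hypothetical values
for x, y in qq_points(sales):
    print(f"{x:+.3f}  {y:.1f}")
```

Plotting the second column against the first gives the QQ plot. Points that bend away from a straight line, especially at the ends, suggest skew or heavy tails.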

 

All the information provided is based on the Probability & Statistics for Machine Learning & Data Science course on Coursera from DeepLearning.AI.
