Describing probability distributions and probability distributions with multiple variables
Describing Distributions
Quantiles and Box-Plots
We need to look at the data not only numerically but also visually.
Visualizing data: Box-Plots
A box plot visualizes the data through its quantiles. To draw one we need the minimum, the maximum, the first quartile (25%), the median (50%), and the third quartile (75%), plus the interquartile range (IQR), which is the third quartile minus the first quartile.
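As a rough illustration of the quantities a box plot is built from, here is a minimal sketch using NumPy. The sample values are made up for this example and are not from the course.

```python
import numpy as np

# Hypothetical sample, invented purely for illustration.
data = np.array([2, 4, 4, 5, 7, 8, 9, 10, 12, 30], dtype=float)

# First quartile, median, and third quartile
# (np.percentile uses linear interpolation by default).
q1, median, q3 = np.percentile(data, [25, 50, 75])

# Interquartile range: third quartile minus first quartile.
iqr = q3 - q1

summary = {
    "min": data.min(),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": data.max(),
    "iqr": iqr,
}
print(summary)
```

Different tools interpolate quantiles slightly differently, so the quartiles reported by another library may not match these exactly.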
Visualizing data: Kernel density estimation
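Continuous data can be plotted with histograms, but a histogram is made of discrete bars rather than a smooth curve.
One way to approximate the probability density function (PDF) with a smooth curve is kernel density estimation (KDE).
In the course figure, the small blue Gaussian curves, one centered on each data point, are called kernels; adding them up and normalizing gives the estimated density.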
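The idea of placing a Gaussian kernel on each data point and averaging can be sketched as follows. This is a minimal hand-rolled version, not the course's implementation; the function name `gaussian_kde_estimate`, the sample values, and the bandwidth of 0.5 are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kde_estimate(samples, grid, bandwidth):
    """Average of Gaussian kernels centered on each sample (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j]: distance from grid point i to sample j, in units of the bandwidth.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One Gaussian kernel per sample, evaluated at every grid point.
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    # Averaging the kernels keeps the total area equal to 1.
    return kernels.mean(axis=1)

# Hypothetical data and bandwidth, chosen only for illustration.
samples = np.array([1.0, 1.5, 2.0, 4.0, 4.2])
grid = np.linspace(-3.0, 9.0, 1201)
density = gaussian_kde_estimate(samples, grid, bandwidth=0.5)

# Sanity check: a density should integrate to roughly 1 (Riemann sum).
area = density.sum() * (grid[1] - grid[0])
print(area)
```

The bandwidth controls the smoothness: a smaller value gives a bumpier curve that follows individual points, while a larger value gives a smoother but less detailed one.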
Visualizing data: Violin Plots
Violin plots are useful because they combine a KDE with a box plot: the outline shows the estimated density, and the box plot drawn inside it shows the quartiles.
Visualizing data: QQ plots
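Histograms are useful for visualizing the distribution of the data, and a QQ (quantile-quantile) plot complements them by comparing the quantiles of the data with the quantiles of a theoretical distribution, here a Gaussian. If the points lie close to a straight line, the data are approximately Gaussian.
With the example newspaper dataset, we can see that the data do not follow a Gaussian distribution. Most of the values sit on the left, with a longer tail to the right, which tells us that the dataset is right-skewed.
With the sales data, both the histogram and the QQ plot indicate that the data are approximately Gaussian.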
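To make the QQ-plot idea concrete, here is a minimal sketch that computes the points of a normal QQ plot by hand. The course datasets are not reproduced here, so the two samples below are synthetic stand-ins generated with NumPy; the helper name `normal_qq_points` and the plotting positions (i - 0.5)/n are assumptions (one common convention among several).

```python
import numpy as np
from statistics import NormalDist

def normal_qq_points(samples):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot."""
    ordered = np.sort(np.asarray(samples, dtype=float))
    n = ordered.size
    # Plotting positions (i - 0.5) / n: an assumed, commonly used convention.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    # Standardize the sample so Gaussian data fall near the line y = x.
    standardized = (ordered - ordered.mean()) / ordered.std(ddof=1)
    return theoretical, standardized

# Synthetic stand-ins (NOT the course data): one Gaussian, one right-skewed.
rng = np.random.default_rng(0)
datasets = {
    "gaussian_like": rng.normal(loc=50.0, scale=5.0, size=500),
    "right_skewed": rng.exponential(scale=10.0, size=500),
}

straightness = {}
for name, sample in datasets.items():
    theoretical, observed = normal_qq_points(sample)
    # Correlation near 1 means the QQ points lie close to a straight line.
    straightness[name] = np.corrcoef(theoretical, observed)[0, 1]
    print(name, straightness[name])
```

Plotting `theoretical` against `observed` gives the QQ plot itself. The correlation is only a rough numeric summary of how straight the points are; in the right-skewed sample the points bend away from the line in the tails.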
All the information provided is based on the Probability & Statistics for Machine Learning & Data Science | Coursera from DeepLearning.AI