Coursera/Mathematics for ML and Data Science

Probability & Statistics for Machine Learning & Data Science (13)

by Fresh Red 2024. 9. 26.
    Describing probability distributions and probability distributions with multiple variables

    Describing Distributions

    Quantiles and Box-Plots

    We need to look at the data not only numerically but also visually.

    Visualizing data: Box-Plots

    Figure: Box plots

    A box plot visualizes the data through its quantiles. To draw one we need the minimum, the maximum, the first quartile (25%), the median (50%), and the third quartile (75%), plus the interquartile range (IQR), which is the third quartile minus the first quartile.
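
As a minimal sketch of the numbers behind a box plot, the snippet below computes the five-number summary and the IQR with Python's standard library. The function name `five_number_summary` and the sample values are hypothetical, and the "inclusive" quantile method is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(data):
    """Return min, quartiles, max, and IQR for a list of numbers."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    # method="inclusive" treats the data as the whole population
    # (an assumption made here for illustration).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        "iqr": q3 - q1,  # third quartile minus first quartile
    }


# Hypothetical sample data, not taken from the course.
summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

For the values 1 through 9 this gives Q1 = 3, median = 5, Q3 = 7, and therefore IQR = 4.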

    Visualizing data: Kernel density estimation

    Continuous data can be plotted with a histogram, but a histogram is made of bars and does not give a smooth curve.

    One way to approximate the probability density function (PDF) with a smooth curve is kernel density estimation (KDE).

    Figure: KDE

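    In the figure, the small blue Gaussian curves, one centered on each data point, are called kernels. Averaging them gives the KDE approximation of the PDF.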
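
The idea of averaging Gaussian kernels can be sketched in a few lines using only the standard library. The function name `gaussian_kde`, the sample points, and the bandwidth value are assumptions for illustration. In practice the bandwidth (the standard deviation of each kernel) has to be chosen, and it controls how smooth the curve is.

```python
from statistics import NormalDist


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density at a point x."""
    # One Gaussian kernel per data point, centered on that point.
    kernels = [NormalDist(mu=x, sigma=bandwidth) for x in data]

    def density(x):
        # The KDE is the average of all kernel densities at x.
        return sum(k.pdf(x) for k in kernels) / len(kernels)

    return density


# Hypothetical data and bandwidth, not taken from the course.
estimate = gaussian_kde([1.0, 2.0, 4.0], bandwidth=0.5)
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(x, round(estimate(x), 4))
```

Because every kernel integrates to 1 and the kernels are averaged, the resulting curve also integrates to 1, so it is a valid density estimate.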

    Visualizing data: Violin Plots

    Violin plots are useful because they combine a KDE with a box plot, so the shape of the distribution and its quartiles appear in a single figure.

    Visualizing data: QQ plots

    Histograms are useful for visualizing the distribution of the data.

    With the example newspaper dataset, the histogram shows that the data does not follow a Gaussian distribution. Most of the data sits on the left, with a long tail extending to the right, which tells us that the dataset is right-skewed.
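
Skewness can also be checked numerically rather than by eye. The sketch below computes the moment-based sample skewness, which is positive for right-skewed data and close to zero for symmetric data. The function name `sample_skewness` and both datasets are hypothetical; they are not the newspaper data from the course.

```python
from statistics import fmean, pstdev


def sample_skewness(data):
    """Moment-based skewness: mean of the cubed standardized values."""
    m = fmean(data)
    s = pstdev(data)  # population standard deviation; must be non-zero
    return fmean([((x - m) / s) ** 3 for x in data])


# Hypothetical data: a long tail to the right versus a symmetric sample.
right_skewed = [1, 1, 2, 2, 3, 10]
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_skewed))  # positive value
print(sample_skewness(symmetric))     # approximately zero
```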

    With the sales data, both plots suggest that the data is approximately Gaussian: the histogram is bell-shaped, and the points in the QQ plot fall close to a straight line.
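
A QQ plot pairs each sorted, standardized data value with the quantile a standard Gaussian would have at the same position. The sketch below computes those pairs without plotting them. The function name `qq_points`, the plotting positions `(i + 0.5) / n`, and the sample values are assumptions for illustration; plotting libraries may use slightly different positions.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(data):
    """Return (theoretical quantile, standardized sample value) pairs."""
    n = len(data)
    m = fmean(data)
    s = stdev(data)  # needs at least two distinct values
    # Standardize and sort the sample so it is comparable to N(0, 1).
    sample = sorted((x - m) / s for x in data)
    # Quantiles of the standard Gaussian at evenly spaced probabilities.
    standard_normal = NormalDist()
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


# Hypothetical data, not the sales data from the course.
for theo, obs in qq_points([2.0, 4.0, 4.0, 5.0, 7.0]):
    print(round(theo, 3), round(obs, 3))
```

If the data is close to Gaussian, the two values in each pair are similar, so the points lie near the line y = x when plotted.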

    All the information in this post is based on the course Probability & Statistics for Machine Learning & Data Science | Coursera from DeepLearning.AI