Correlation vs. Covariance | Standardization of Data | with example in Python/NumPy

  • Published 17 Jun 2024
  • The Multivariate Normal/Gaussian uses the Covariance Matrix to describe the interdependency of feature dimensions. Are the covariances in the off-diagonal elements related to correlation? To find out, we need to standardize our data. Here are the notes: raw.githubusercontent.com/Cey...
    Here you can find an interactive web plot for the Multivariate Normal: share.streamlit.io/ceyron/num...
    It is common that multiple feature dimensions in high-dimensional data are not independent. Often there is a linear relationship between them, whose strength is measured by correlation. We can investigate it by looking at the correlation coefficients, i.e., the off-diagonal elements of the correlation matrix. This matrix is computed as the covariance matrix of standardized data. Standardization refers to first centering the data by its empirical mean and then dividing by the empirical standard deviation in each dimension.
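    A minimal sketch of this procedure in NumPy (the variable names and toy data are illustrative, not taken from the video):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset (illustrative): two feature dimensions with a linear relationship
x = rng.normal(loc=2.0, scale=1.5, size=1000)
y = 0.8 * x + rng.normal(loc=0.0, scale=0.5, size=1000)
data = np.stack([x, y])  # shape (n_features, n_samples), rows are features

# Standardize: center by the empirical mean, then divide by the empirical
# standard deviation in each dimension (broadcasting over the sample axis);
# ddof=1 matches the unbiased normalization that np.cov uses by default
standardized = (data - data.mean(axis=1, keepdims=True)) / data.std(
    axis=1, ddof=1, keepdims=True
)

# The covariance matrix of the standardized data is the correlation matrix
correlation = np.cov(standardized)

# Cross-check against NumPy's built-in routine
assert np.allclose(correlation, np.corrcoef(data))
print(correlation)
```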
    -----
    Information on Nonlinear Relationships:
    There are quite a few interesting examples in which the data obviously has a relationship between the feature dimensions but evaluates to a zero correlation, see here: en.wikipedia.org/wiki/Correla...
    As always, nonlinearity greatly increases the difficulty of figuring out these relationships. One way of doing so could be manifold learning/nonlinear dimensionality reduction: scikit-learn.org/stable/modul... or en.wikipedia.org/wiki/Nonline...
    These techniques help because they discover (nonlinear) manifolds embedded in the high-dimensional space (e.g., a 2D plane in 3D space).
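    A quick illustration of such a case (a sketch with made-up data, not the video's code): y is fully determined by x, yet the correlation coefficient is close to zero because the relationship is quadratic, not linear:

```python
import numpy as np

rng = np.random.default_rng(0)

# x is distributed symmetrically around zero; y depends on x deterministically
x = rng.uniform(-1.0, 1.0, size=100_000)
y = x**2

# Pearson correlation only captures *linear* dependence, so it is ~0 here
print(np.corrcoef(x, y)[0, 1])
```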
    -------
    📝 : Check out the GitHub Repository of the channel, where I upload all the handwritten notes and source-code files (contributions are very welcome): github.com/Ceyron/machine-lea...
    📢 : Follow me on LinkedIn or Twitter for updates on the channel and other cool Machine Learning & Simulation stuff: / felix-koehler and / felix_m_koehler
    💸 : If you want to support my work on the channel, you can become a Patron here: / mlsim
    -------
    ⚙️ My Gear:
    (Below are affiliate links to Amazon. If you decide to purchase the product or something else on Amazon through this link, I earn a small commission.)
    - 🎙️ Microphone: Blue Yeti: amzn.to/3NU7OAs
    - ⌨️ Logitech TKL Mechanical Keyboard: amzn.to/3JhEtwp
    - 🎨 Gaomon Drawing Tablet (similar to a WACOM Tablet, but cheaper, works flawlessly under Linux): amzn.to/37katmf
    - 🔌 Laptop Charger: amzn.to/3ja0imP
    - 💻 My Laptop (generally I like the Dell XPS series): amzn.to/38xrABL
    - 📱 My Phone: Fairphone 4 (I love the sustainability and repairability aspect of it): amzn.to/3Jr4ZmV
    If I had to purchase these items again, I would probably change the following:
    - 🎙️ Rode NT: amzn.to/3NUIGtw
    - 💻 Framework Laptop (I do not get a commission here, but I love the vision of Framework. It will definitely be my next Ultrabook): frame.work
    As an Amazon Associate I earn from qualifying purchases.
    -------
    Timestamps:
    00:00 Introduction
    01:51 Components of Covariance Matrix
    03:38 Estimating the Covariance Matrix
    06:37 Limitation of Covariances for dependency
    07:12 Correlation instead of Covariance
    07:28 Standardization
    10:37 Standardized Data Matrix
    11:29 Correlation Matrix
    12:33 Discussing correlations
    14:30 Python: Creating linear dataset
    16:12 Python: Concatenate into data matrix
    16:51 Python: Pure Covariance of the data
    17:48 Python: Standardizing the data
    21:22 Python: Using Broadcasting
    22:26 Python: Calculating correlation matrix
    23:22 Python: Correlation Matrix by NumPy
    24:06 Final Remarks on nonlinear dependencies
    25:06 Outro

COMMENTS • 8

  • @danielnovikov1355
    @danielnovikov1355 3 years ago +1

    Great content. Your explanations and short demos are terrific.
    A great follow-up would be explaining the Fisher information matrix.
    This topic could tie statistics and machine learning together even more strongly.

    • @MachineLearningSimulation
      @MachineLearningSimulation  3 years ago

      Thanks a lot for the feedback :)
      I will put the Fisher information matrix on my to-do list. Thanks for the contribution. There is quite a lot ahead of it in the queue, so it will take me some time to get the video out.

  • @albankurti300
    @albankurti300 2 years ago +1

    you're good

  • @krithiksingh7715
    @krithiksingh7715 1 year ago +1

    Amazing! Thanks a lot! It gives deeper insights. Could you let me know if this interactive plot was made in Python? Is the code available on your GitHub?

    • @MachineLearningSimulation
      @MachineLearningSimulation  1 year ago

      Hi,
      Thanks for the comment :).
      You are probably referring to this interactive plot: englishessential-pmf-pdfmultivariate-normal-interactiv-q1rxuf.streamlitapp.com/
      It is based on this python script: github.com/Ceyron/machine-learning-and-simulation/blob/main/english/essential_pmf_pdf/multivariate_normal_interactive_chart.py
      It employs Streamlit (a super nice library) and is hosted on the Streamlit cloud. 😊

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 3 years ago +1

    Future video recommendation: a connection to eigenvalues and eigenvectors would be interesting.

    • @MachineLearningSimulation
      @MachineLearningSimulation  3 years ago +1

      Thanks for the contribution, I will note it down. :)
      Videos on, for instance, Principal Component Analysis (PCA) will come in the future.