How to Build ML Solutions (w/ Python Code Walkthrough)

  • Published 6 Jun 2024
  • 👉 More on Full Stack Data Science: • Full Stack Data Science
    This is the 4th video in a series on Full Stack Data Science. Here, I explain why experimentation is critical to the ML lifecycle and walk through the development of a semantic search tool for my YouTube videos.
    More Resources:
    💻 Example Code: github.com/ShawhinT/UA-cam-B...
    🤖 RAG: • How to Improve LLMs wi...
    📚Text Embeddings: • Text Embeddings, Class...
    References:
    [1] / software-2-0
    [2] arxiv.org/abs/2012.07919
    --
    Book a call: calendly.com/shawhintalebi
    Homepage: shawhintalebi.com/
    Socials
    / shawhin
    / shawhintalebi
    / shawhint
    / shawhintalebi
    The Data Entrepreneurs
    🎥 YouTube: / @thedataentrepreneurs
    👉 Discord: / discord
    📰 Medium: / the-data
    📅 Events: lu.ma/tde
    🗞️ Newsletter: the-data-entrepreneurs.ck.pag...
    Support ❤️
    www.buymeacoffee.com/shawhint
    Introduction - 0:00
    Why ML is Different - 0:39
    Role of Experimentation - 3:04
    Semantic Search (Design Choices) - 5:09
    Example Code: Semantic Search of YT Videos - 8:17
    Preview of Final Product - 10:06
    Step 1: Experimentation & Evaluation - 11:17
    Step 2: Build Video Index - 34:14
    Step 3: Build UI - 35:49
    What's Next? - 43:43
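The pipeline in the timestamps (experiment with embeddings, build a video index, serve queries) can be sketched end-to-end. This is a toy illustration, not the video's actual code: it swaps the 384-dimensional sentence-transformer embeddings used in the walkthrough for a simple bag-of-words stand-in so it runs without a model download, and the video titles are made up.

```python
import numpy as np

# Toy stand-in for a sentence-transformer encoder: a normalized
# bag-of-words vector. In the real pipeline, a pretrained model
# (e.g. a 384-dim sentence transformer) plays this role.
def embed(text: str, vocab: dict[str, int]) -> np.ndarray:
    vec = np.zeros(len(vocab))
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

# Hypothetical video titles standing in for the channel's catalog.
titles = [
    "fine tuning large language models",
    "intro to text embeddings",
    "semantic search with embeddings",
]
vocab = {w: i for i, w in enumerate(sorted({w for t in titles for w in t.split()}))}

# "Build Video Index" step: one embedding per video, stacked into a matrix.
index = np.stack([embed(t, vocab) for t in titles])

# Query time: embed the query and rank videos by cosine similarity
# (vectors are unit-normalized, so a dot product is the cosine).
query_vec = embed("search with text embeddings", vocab)
scores = index @ query_vec
ranked = [titles[i] for i in np.argsort(scores)[::-1]]
print(ranked[0])
```

The same embed-index-rank skeleton underlies the real tool; the experimentation step in the video is about choosing the embedding model and similarity threshold, which this sketch hard-codes away.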

COMMENTS • 4

  • @ShawhinTalebi
    @ShawhinTalebi  27 days ago

    More on Full Stack Data Science 👇
    👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosWmAt-AMK0MBgh3GeSvbCmL.html
    💻 Example Code: github.com/ShawhinT/UA-cam-Blog/tree/main/full-stack-data-science/data-science

  • @kreddy8621
    @kreddy8621 26 days ago

    Brilliant, thanks

  • @Tenebrisuk
    @Tenebrisuk 26 days ago

    Great video, really interesting.
    A question on the encoding process. Does condensing transcripts into an embedding with 384 dimensions lose much information, or does the encoding process truncate the text at a point?
    How would something like this manage a lengthy transcript where you cover several different topics?
    Does the embedding get too "noisy" in that case to be able to really stand above your threshold if only perhaps 5 lines out of 100 contain the information relating to the search?

    • @ShawhinTalebi
      @ShawhinTalebi  26 days ago

      That's a great question. Whether (much) information is lost depends on the specific use case. For example, if you have simple text chunks that either say "True" or "False", then even a one-dimensional embedding will preserve all the information. However, as you're describing, the longer the chunks, the more information can be lost. This is why experimentation is so critical: you can't really know 1) how much "information" is preserved by the embeddings and 2) how that impacts your use case, without just trying it out.
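      The "5 lines out of 100" scenario from the question can be made concrete with synthetic vectors. This is a hypothetical setup, not measured data: random 384-dimensional unit vectors stand in for real line embeddings, five "relevant" lines point near the query direction, and the rest are unrelated. It shows how mean-pooling a whole transcript into one embedding dilutes the signal compared with scoring chunks individually.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 384  # same dimensionality as the embedding model discussed above

def unit(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

# The query topic is a random unit vector; relevant lines point near it,
# while the other 95 lines point in unrelated random directions.
query = unit(rng.normal(size=dim))
relevant = [unit(query + 0.5 * unit(rng.normal(size=dim))) for _ in range(5)]
irrelevant = [unit(rng.normal(size=dim)) for _ in range(95)]

# One embedding for the whole transcript, approximated as mean pooling
# over all 100 line embeddings: the 5 relevant lines get drowned out.
doc_vec = unit(np.mean(relevant + irrelevant, axis=0))

# Chunk-level search instead scores each line and keeps the best match.
best_chunk = max(relevant + irrelevant, key=lambda v: float(v @ query))

print(f"whole-transcript similarity: {doc_vec @ query:.2f}")
print(f"best-chunk similarity:       {best_chunk @ query:.2f}")
```

      The best chunk scores far higher than the pooled document vector, which is why chunking long transcripts before embedding is a common design choice; whether it matters for a given corpus is exactly the kind of thing the experimentation step is meant to reveal.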