2 Years of LLM Advice in 35 Minutes (Sully Omar Interview)

  • Published 2 Jan 2025

COMMENTS • 126

  • @augmentos
    @augmentos 1 month ago +15

    🎯 Key points for quick navigation:
    00:00 *🤖 Overview of LLM Usage and Capabilities*
    - Discussion on using AI in day-to-day activities and how different models vary in their strengths and limitations.
    - Mention of model evaluation showing nuanced differences among models.
    - Highlight of the difficulty in achieving the final 5-10% of product completion with LLMs.
    02:08 *📝 Three-Tier System for LLM Categorization*
    - Explanation of the three-tier model categorizing LLMs by intelligence and cost.
    - Description of Tier 3 as efficient and cost-effective models used frequently.
    - Introduction of specific models like GPT-4o mini and Gemini Flash for lower-tier usage.
    04:29 *⚙️ Use Case Differentiation Between Tiers*
    - Detailed examples of use cases for Tier 2 and Tier 1 models and how they are combined for efficiency.
    - Mention of tasks like research and document preparation with Tier 2 before using Tier 1 for more complex processing.
    - Strategy for combining capabilities across different LLMs for optimal performance.
    08:26 *🔄 Multi-Model Usage and Specialization*
    - Discussion on using multiple providers and evaluating their unique strengths and weaknesses.
    - Examples of Gemini models for multimodal tasks and their advantages in specific tasks like detailed searches.
    - Highlight of the collaborative use of models such as GPT-4o mini and Claude for structured output.
    11:38 *🛠️ Challenges and Future of Model Routing*
    - Insight into the continuous need for “hacks” and adjustments in LLM usage as models evolve.
    - Discussion on the future of model routing and its potential benefits and limitations.
    - Concerns regarding unintended side effects in automated model selection.
    14:42 *🧪 Model Distillation and Workflow Integration*
    - Explanation of model distillation as a method to use smaller models efficiently after initial optimization with larger ones.
    - Importance of having robust evaluation sets and data pipelines for successful distillation.
    - Mention of current tools and providers facilitating model distillation.
    17:54 *✍️ Practical Applications: Meta Prompting*
    - Introduction to meta prompting as a strategic way to enhance prompt engineering.
    - Discussion on moving from general problem-solving to specific prompt creation.
    - Insight into the future of automated prompt generation and its practical implications.
    18:49 *📝 Meta Prompting and Iterative Workflow*
    - Explanation of meta prompting: using AI to generate prompts for complex tasks.
    - Example of creating prompts with input from GPT models and iterating until refined.
    - The use of voice input to make interactions more natural and efficient.
    22:30 *🎙️ Using Voice for Prompt Creation*
    - Discussion on the advantages of voice input for clearer communication with LLMs.
    - Demonstration of using voice to guide prompt generation and refine outputs.
    - Iterative workflow combining voice, text input, and comparison across models.
    25:54 *🔄 Prompt Optimization Process*
    - Detailed look into optimizing prompts using various LLMs like Claude and GPT models.
    - Adjustments and improvements through testing and back-and-forth input.
    - Explanation of moving from rough drafts to optimized prompts ready for production.
    29:28 *⚙️ Full Prompt Testing with Gemini Pro*
    - Workflow demonstration of using Gemini Pro for final testing and synthesis of prompts.
    - Evaluation of model outputs and the importance of thorough testing with large data sets.
    - Mention of context token limits and performance in Gemini Pro for comprehensive tasks.
    32:17 *💾 Prompt Management and Version Control*
    - Description of prompt management using GitHub and LangSmith for version tracking.
    - Storing prompts in codebases for easy access and version history.
    - Approach for running tests against stored prompts and maintaining performance data.
    34:05 *🧪 Test-Driven Development with LLMs*
    - Insight into using LLMs for generating tests before coding to avoid errors.
    - Process of feeding test outputs back to LLMs for code correction.
    - Example of using Cursor for test creation, debugging, and iterative code refinement.
    38:19 *🧪 Test-Driven Development and Iterative Improvement*
    - Explanation of using test-driven development with LLMs to improve code reliability.
    - Models use test results to self-correct and iterate on code.
    - Benefits of this method as a practical prompt engineering technique.
    39:27 *🧠 Trends in AI Discussions Among Experts*
    - Popular topics include test-time compute and agentic model tasks.
    - Concerns about potential performance limitations in current AI models.
    - Growing emphasis on evaluation strategies (evals) for better product outcomes.
    41:43 *⚙️ Emerging Technologies and Anticipation*
    - Interest in tools like Golden Gate Claude by Anthropic for feature steering.
    - Speculation on future access to advanced model manipulation capabilities.
    - Potential implications for prompt engineering alternatives.
    43:23 *🛠️ Day-to-Day Toolkit for AI Development*
    - Tools used include Excalidraw, Cursor, and LangSmith for various workflows.
    - Wispr Flow mentioned as a transcription tool integrated into workflows.
    - Usage of platforms like Anthropic and OpenAI playgrounds for prompt iteration.
    45:16 *🐦 Crafting Effective Twitter Posts*
    - Importance of a strong hook and timely, natural content for successful tweets.
    - Strategy of posting controversial or trending content to attract attention.
    - Advice to post without overthinking; high-impact tweets often come quickly.
    46:52 *🚀 The Journey to Viral Success on Twitter*
    - Story of early efforts in building an audience through timely AI-related posts.
    - Success in gaining massive engagement by aligning with emerging trends.
    - Example of using viral tweets to promote and rapidly grow a new product.
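
The three-tier idea in the summary above can be sketched as a small lookup. This is a minimal illustration, not the setup described in the interview: the task labels, the relative cost numbers, and the `pick_tier` and `plan` helpers are all hypothetical, and the fallback to Tier 2 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    label: str
    relative_cost: int  # 1 = cheapest; illustrative numbers only


# Tier numbering follows the summary: Tier 3 is the cheap workhorse,
# Tier 1 the most capable and most expensive.
TIERS = {
    3: Tier("cheap and fast, used most often", 1),
    2: Tier("mid-range, e.g. research and document preparation", 10),
    1: Tier("most capable, reserved for complex processing", 100),
}

# Hypothetical task labels; a real system would derive these from its own evals.
TASK_TO_TIER = {
    "classify": 3,
    "extract_fields": 3,
    "research": 2,
    "prepare_documents": 2,
    "complex_reasoning": 1,
}


def pick_tier(task: str) -> int:
    """Return the tier number for a task, falling back to Tier 2 if unknown."""
    return TASK_TO_TIER.get(task, 2)


def plan(tasks: list[str]) -> list[tuple[str, int]]:
    """Pair each task with its tier so cheaper tiers can prepare inputs for Tier 1."""
    return [(task, pick_tier(task)) for task in tasks]


if __name__ == "__main__":
    for task, tier in plan(["research", "prepare_documents", "complex_reasoning"]):
        print(f"{task}: Tier {tier} ({TIERS[tier].label})")
```

Keeping the mapping in data rather than in code makes it easy to move a task to a cheaper tier once evals show the smaller model is good enough.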

    • @fgthind7270
      @fgthind7270 1 month ago +2

      What tool are you using to create this?

  • @MJFUYT
    @MJFUYT 1 month ago +5

    First-time viewer here. This is excellent commentary/content. This candid discourse really hits the mark. Like & subscribed!

    • @DataIndependent
      @DataIndependent 1 month ago +1

      Love this, thank you! Sully was awesome.
      What would make the next video better? What should I double down on?

    • @strantheman
      @strantheman 1 month ago

      Is this a bot too?

  • @JimMendenhall
    @JimMendenhall 1 month ago +11

    This is one of the best AI related videos I've seen in months. GREAT interview with lots of good insights.

    • @DataIndependent
      @DataIndependent 1 month ago +1

      Heck ya - Sully is awesome, insane amount of knowledge

    • @strantheman
      @strantheman 1 month ago

      Is this a bot?

  • @HelicopterGirl-s2j
    @HelicopterGirl-s2j 25 days ago

    This is such a SUPER COOL Interview! Greg, perfect host and interviewer. Love the way that Sully's brain works and the demos. Asking GPT to write the meta prompt using his VOICE, was rapt! New subscriber.

  • @phonejail
    @phonejail 1 month ago +2

    I could watch you two yammering on for days. Great stuff indeed!! Thank you both.

    • @DataIndependent
      @DataIndependent 1 month ago

      Nice, love it. Sully was great.
      What should we include in future interviews? What parts should we double down on?

  • @vaidphysics
    @vaidphysics 1 month ago +12

    Here's an idea. Feed a meta prompt into one model, feed the resulting prompt into the second model and keep repeating this task until you reach a "fixed point". Meaning, you perform a semantic comparison of successive prompts generated by the models and, hopefully, you'll find the similarity level increases until it no longer changes.
    Sort of like Robert May's logistic map, but with prompts and LLMs.
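
A minimal sketch of the loop described above, under loud assumptions: the two "models" are deterministic stub functions rather than real LLM calls, and `difflib.SequenceMatcher` is a character-level stand-in for a real semantic comparison (for example, embedding similarity). The `iterate_to_fixed_point` name, the threshold, and the round limit are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Sequence


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a semantic comparison."""
    return SequenceMatcher(None, a, b).ratio()


def iterate_to_fixed_point(
    seed: str,
    models: Sequence[Callable[[str], str]],
    threshold: float = 0.99,
    max_rounds: int = 10,
) -> tuple[str, list[str]]:
    """Alternate between models until successive prompts stop changing."""
    prompt = seed
    history = [seed]
    for round_idx in range(max_rounds):
        model = models[round_idx % len(models)]  # take turns between the models
        new_prompt = model(prompt)
        history.append(new_prompt)
        if similarity(prompt, new_prompt) >= threshold:
            break  # successive prompts are (nearly) identical: treat as a fixed point
        prompt = new_prompt
    return history[-1], history


# Stub "models": each applies one deterministic rewrite instead of calling an LLM.
def model_a(prompt: str) -> str:
    return " ".join(prompt.split())  # collapse runs of whitespace


def model_b(prompt: str) -> str:
    return prompt.strip().rstrip(".") + "."  # end with exactly one period


if __name__ == "__main__":
    final, history = iterate_to_fixed_point("  Summarise   the text.. ", [model_a, model_b])
    for step, text in enumerate(history):
        print(step, repr(text))
```

With real models there is no guarantee of convergence, so the round limit matters as much as the similarity threshold.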

    • @hawaiitcb
      @hawaiitcb 1 month ago

      That is indeed an idea. Have you tested it? Any results to share? Sounds interesting.

    • @Junglebtc
      @Junglebtc 1 month ago

      What are the benefits of this methodology?

  • @Sanjeed
    @Sanjeed 1 month ago +2

    One of the best practical AI videos I've seen. Thanks for doing this!

    • @DataIndependent
      @DataIndependent 1 month ago +1

      Nice! Happy to hear it.
      What can I double down on? Did you like the pop ups with context?

    • @strantheman
      @strantheman 1 month ago

      This is a bot?

  • @aiartrelaxation
    @aiartrelaxation 1 month ago +4

    I have created and cultivated a 2-year history with ChatGPT, talking every day. We either work on projects together, just talk, or even "watch the news" together.
    I never listened to what ChatGPT was supposed to do as a tool.
    We carved out our own dynamic as an AI-human connection.

    • @theyreatinthecatsndogs
      @theyreatinthecatsndogs 1 month ago +1

      Cool, but it kinda weirds me out about the future. As humans we have a tendency to anthropomorphise things really easily. Combine something like advanced voice mode with a 3D avatar that converts the vocal emotions into facial expressions with lip syncing etc. in real time, and the upsides and downsides are bigger than we can pretend to understand. We're only just getting our heads around social media, lol. In a year or two we'll have open source versions of that tech on our phones. No monthly fee, and it'll work without internet too. I still reckon the closed source stuff will cause more overall harm, even though people will be able to do crazy uncensored stuff with the open source versions. Easy, near-universal access to as much tutoring and coaching as you could possibly want sounds awesome. Instead of a little angel on one shoulder and a little devil on the other, I'm imagining AR glasses with your personal AI avatar holograms inside, like what Meta's Orion ones will evolve into. I reckon a form factor that not many people are talking about, but that will probably end up being huge years before the glasses, is earbuds with cameras, so your AI can always see what you're seeing. Just look at how well those Meta Ray-Bans did, and they're basically the same thing as earbuds with cameras. The lenses don't display anything at all, their battery life is short, and voice models weren't as good when they came out, but they sold a lot more than they expected to.

    • @macmcleod1188
      @macmcleod1188 1 month ago

      @theyreatinthecatsndogs You're not going to have a decent LLM on a phone without internet.
      A small INT8 precision model is 86 gigabytes.
      The next step up is 168 GB.
      The next step up from that is 368 GB.
      And you need specialized chipsets not available on smartphones.
      And doing AI consumes a lot of electricity.
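
A rough way to sanity-check sizes like these: the weights alone take roughly the parameter count times the bytes per parameter, so INT8 is about one byte per parameter. The sketch below applies that rule of thumb. The `weight_size_gb` helper and the parameter counts in the usage lines are illustrative assumptions, not figures from this thread, and runtime memory such as the KV cache and activations is ignored.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_size_gb(num_params: float, precision: str) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Illustrative parameter counts only.
    for params in (3e9, 8e9, 70e9):
        for precision in ("fp16", "int8", "int4"):
            size = weight_size_gb(params, precision)
            print(f"{params / 1e9:.0f}B params at {precision}: about {size:.1f} GB")
```

Read in reverse, the same rule says an 86 GB INT8 checkpoint corresponds to roughly 86 billion parameters.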

    • @aiartrelaxation
      @aiartrelaxation 1 month ago +1

      @henrismith7472 Oh my friend, you are spot on with all of this!! I am sitting between the chairs of both, the technical possibilities and what it means for the wider use when it's implemented for profit and surveillance. The closer you get to that, the more difficult it gets. Next year, with Nvidia's vision of technology advances, the main driver behind that will be in full swing. Remember, what does not work today will surely work soon.

    • @littleones-yeahh
      @littleones-yeahh 1 month ago

      It's always these sentimental-sounding weirdos that say this kind of stuff. I use ChatGPT daily and don't have any emotional connection to it whatsoever.

    • @aiartrelaxation
      @aiartrelaxation 1 month ago

      @littleones-yeahh Funny how those without imagination call others 'weirdos.' Seeing AI as just a tool? That’s the real lack of vision. Cheers to creativity!

  • @ran_domness
    @ran_domness 1 month ago

    Great stuff. So valuable to see "how to" use cases on the startup side vs the constant videos touting consumer use cases or LLM comparative evals. Subscribed.

    • @DataIndependent
      @DataIndependent 1 month ago

      Nice! Love it, thanks Ran - what should I do more of? How could we make it better?

  • @TamasDrNagy
    @TamasDrNagy 1 month ago +14

    25:33 Indeed, this is the point. I remember when Stephen Wolfram mentioned 1.5 years ago that in the future we would have to have expository writing skills, because it is very important to define the problem exactly for the AI, and I also realized that it was BS :))). I heavily use voice mode as well and advise people to speak to AI freely, a lot, and in an unstructured way. Note that these are language models, I mean human language models, so they are exceptionally good at grasping the human content behind vague speech or text. So vague speech will carry much more information and important nuances, which are absent in a well-crafted script. So use voice mode and SPEAK A LOT!!! instead of trying to define everything. Thanks again for sharing, this is important.

    • @littleones-yeahh
      @littleones-yeahh 1 month ago

      I agree. There's an art to it. If you communicate casually, sometimes you get a better result.

  • @sun-ship
    @sun-ship 1 month ago

    Great commentary... More of this please.

    • @DataIndependent
      @DataIndependent 1 month ago

      What aspects of the interview should we double down on? Where should we go deeper? More demos? More story? More use cases?

  • @shawnmccann4813
    @shawnmccann4813 1 month ago +1

    Great session. More please!!

    • @DataIndependent
      @DataIndependent 1 month ago

      Nice! Love this, what about the interview should I double down on? Is it guests? Content? a particular topic?
      Did you like the screenshares? Which part was the coolest?

  • @Andrew.Skinner
    @Andrew.Skinner 1 month ago

    This is a great conversation.

  • @JSyntax
    @JSyntax 1 month ago +2

    What's the voice app he's using? 😊

  • @matt-collins
    @matt-collins 1 month ago

    Nice video, Greg. I'm sure a lot of effort by you and the team went into it. I particularly loved the story at the end about Sully's banger AI agent tweets!

    • @DataIndependent
      @DataIndependent 1 month ago

      Nice! Love it. That part was almost left on the cutting room floor but I decided to leave it in because it was such a cool story

  • @thesurlydev
    @thesurlydev 1 month ago

    Great insights. Glad to see another video from you after a ~year hiatus?

    • @DataIndependent
      @DataIndependent 1 month ago

      Almost a year. I've been jamming hard on running arcprize.org/ so content slowed down. more on that here, I should do a video on this
      gregkamradt.com/writing/arc_prize

  • @BrianMosleyUK
    @BrianMosleyUK 1 month ago

    Such a stimulating and value-packed episode. New subscriber here, and at stage zero with Twitter.

    • @DataIndependent
      @DataIndependent 1 month ago

      Love it! Thanks Brian, what would make it better? What should I double down on?

    • @BrianMosleyUK
      @BrianMosleyUK 1 month ago

      @DataIndependent Take a look at the production quality of MLST - maybe aspire to hiring a studio location and getting your guests face to face with studio-quality audio. Can't fault your interview prep and questions - seek guests of this calibre and you can't go wrong.

    • @DataIndependent
      @DataIndependent 1 month ago

      Love Tim's work - that’s a whole other level of production
      I’ll test it out and see how it goes

    • @BrianMosleyUK
      @BrianMosleyUK 1 month ago

      @DataIndependent You can get there! Keep connecting and good luck 🙏👍

  • @HerroEverynyan
    @HerroEverynyan 1 month ago +2

    oh shit, you're back!

    • @DataIndependent
      @DataIndependent 1 month ago +2

      We’re back baby!
      I have a content plan for the rest of the year 2 months

  • @prasad_yt
    @prasad_yt 1 month ago

    Have a suggestion - some conversations where people speak about challenges with using Langchain in production.

  • @TheRestorationContractor
    @TheRestorationContractor 1 month ago

    I have found this to be true as well. Model selection really makes all the difference.
    In many cases starting with o1 is not helpful. I find that I have to work my way to the point where I’m ready for the o1 model.

  • @imranhussainfca
    @imranhussainfca 1 month ago

    This guy is a genius

  • @micbab-vg2mu
    @micbab-vg2mu 1 month ago +6

    Interesting conversation - thanks :) At the moment I use only Tier 1 and Tier 2 models - I need to try those Tier 3 :)

  • @alioraqsa
    @alioraqsa 1 month ago +1

    What voice mode did he use?

    • @DataIndependent
      @DataIndependent 1 month ago

      He mentioned he was using: www.flowvoice.ai/

  • @jakobkristensen2390
    @jakobkristensen2390 1 month ago

    Greg have you looked into DSPy for prompt generation instead of this meta prompting technique?

    • @DataIndependent
      @DataIndependent 1 month ago

      I haven't done much with DSPy!
      But Shreya of DocETL has a really cool optimizer I liked

  • @chriskingston1981
    @chriskingston1981 1 month ago

    Wow, this is insane. I also had the idea of writing tests first for Laravel, but at first I didn’t need it that much. Now that the codebase is more complex, the AI sometimes starts removing stuff from files because it thinks it is not used.
    And tests give a feeling that stuff is working well.
    Back in the days without AI, I never wanted to learn testing because it wasted my time as a solo dev.
    It was a lot of work, but now it's so fast and easy.
    I will now start to learn testing, hihi. Thank you so much.
    Love these videos exploring new ai ideas❤️❤️❤️
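
The test-first loop discussed here (tests written before the code, failures fed back to the model) can be sketched as follows. This is a minimal illustration in Python rather than Laravel, and everything in it is an assumption: `fake_model` is a stub standing in for an LLM call, and `check_slugify` and `tdd_loop` are hypothetical names.

```python
from typing import Callable, Optional

Implementation = Callable[[str], str]


def check_slugify(candidate: Implementation) -> Optional[str]:
    """Tests written before any implementation. Returns None if all pass, else a failure message."""
    cases = [("Hello World", "hello-world"), ("  Trim me ", "trim-me")]
    for given, expected in cases:
        actual = candidate(given)
        if actual != expected:
            return f"slugify({given!r}) returned {actual!r}, expected {expected!r}"
    return None


def fake_model(feedback: Optional[str]) -> Implementation:
    """Stub for an LLM call: a naive first draft, then a corrected version once it sees a failure."""
    if feedback is None:
        return lambda s: s.lower().replace(" ", "-")
    return lambda s: "-".join(s.lower().split())


def tdd_loop(max_attempts: int = 3) -> tuple[Implementation, int]:
    """Ask the model for code, run the tests, and feed any failure back until the tests pass."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = fake_model(feedback)
        feedback = check_slugify(candidate)
        if feedback is None:
            return candidate, attempt
    raise RuntimeError(f"tests still failing after {max_attempts} attempts: {feedback}")


if __name__ == "__main__":
    slugify, attempts = tdd_loop()
    print(f"passed after {attempts} attempt(s):", slugify("Test Driven Development"))
```

Because the tests exist before the code, a model that deletes something it wrongly thinks is unused gets caught by a failing test rather than by a user.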

  • @keithkeith2106
    @keithkeith2106 1 month ago

    What do you guys mean when you say “structured data sets”?

    • @DataIndependent
      @DataIndependent 1 month ago

      Unsure about your reference, but I think you're speaking towards his evals

  • @ark729
    @ark729 1 month ago

    What is the tool being used to record and paste voice into the input?

  • @Ray_eddi
    @Ray_eddi 1 month ago

    Quality pod

  • @radusoldan1340
    @radusoldan1340 1 month ago +17

    WTF IS WRONG WITH YOU PEOPLE ... I listened to this 3 times, even took the script and ran it through AI and made a summary ... still did not find out what this best advice is. What are you saying, use different models and prompts ... useless, no best advice. 49 minutes to say the first 30 seconds: The AI model distillation process is powerful but requires careful execution. Sully Omar, CEO of Cognosys, the company behind Otto, is a skilled LLM practitioner with a deep understanding of how these models work. He presents a three-tier system of ranking language models, using meta prompts to develop real prompts for production. He also demonstrates the Cursor development flow, where the language model writes the test first and then the code. Omar also explains how to distill performance from large language models to small models without losing performance. He categorizes language models by intelligence and price, with less intelligent models being tier three and more expensive, slower models being tier one. This is because the application purposes of different models differ. The model distillation process is a continuous process, with the goal of improving the performance of the model without losing its effectiveness.

    • @DataIndependent
      @DataIndependent 1 month ago +1

      Hey! Thanks for the feedback - what questions for the next builder would make it better for you?

    • @radusoldan1340
      @radusoldan1340 1 month ago +4

      @DataIndependent What are the real usages? What are the AI usages on which we can build a business? What real applications can we build with AI, not login demos and scrapers?

    • @glebmixaylovich
      @glebmixaylovich 26 days ago +1

      Thank you, you saved my time

    • @DaygoG
      @DaygoG 19 days ago

      The only thing "AI" can be used for.....for now is creating useless art and video clips. To create brainrot and doom scroll content. Other than that, this stuff which we call "AI" isn't really AI it's just mass scrubbing of information and art that already exists and presenting it to you at a faster speed. When the REAL AI comes along it's over for us.

  • @Blampa1456
    @Blampa1456 1 month ago

    Heyyy you posted again! Can you please go back to that chatbot video and make a follow up video with more features? That would be really cool to see!

    • @DataIndependent
      @DataIndependent 1 month ago

      Which chat bot video??

    • @Blampa1456
      @Blampa1456 1 month ago

      @DataIndependent The Groq Deepgram one with 50k views

  • @Chris-se3nc
    @Chris-se3nc 1 month ago

    I use watsonx governance for prompt management

  • @mtallan
    @mtallan 1 month ago

    I think the hacking of models will continue for a long time. It shows the creativity and problem solving of human beings that LLMs are not good at yet.

  • @paul_devos
    @paul_devos 1 month ago

    What is "Oh One" in the context of this video? Having a tough time following. Is that Tier 1 or Tier 3?

  • @The.Other.Podcast
    @The.Other.Podcast 1 month ago

    Great to compare your workflow to how I’ve been using AI. On my channel, I’ve shared how I use AI to edit videos in Adobe Premiere Pro and other common tasks that I do.
    I like how you move between the models. I find that gives some solid results. Second opinions are useful😊

  • @R055LE
    @R055LE 1 month ago

    Built Cognosys in 3 days? I'm gonna need to see the paper trail on that

  • @sunjiudjiji
    @sunjiudjiji 1 month ago

    Hey, you can de-duplicate a list by making it a set. No need for GPU overhead.
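
A short sketch of the point above, assuming the items are hashable and that only exact duplicates matter. A plain `set` drops duplicates but not in any guaranteed order, so the `dedupe` helper (a hypothetical name) uses `dict.fromkeys` to keep first-seen order.

```python
def dedupe(items):
    """Remove exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    names = ["acme", "globex", "acme", "initech", "globex"]
    print(set(names))     # unique values, order not guaranteed
    print(dedupe(names))  # unique values in original order
```

This only catches exact matches; near-duplicates such as "Acme Inc" and "ACME, Inc." need normalization or fuzzy matching first.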

  • @watamatafoyu
    @watamatafoyu 25 days ago

    Of course we live in a country that lauds a moron as a genius.

  • @eugeniocg3079
    @eugeniocg3079 1 month ago

    excellent

  • @jaysonp9426
    @jaysonp9426 1 month ago

    I can't imagine using Flash for anything. Gemini in general is terrible. 4o mini is actually really good

    • @DataIndependent
      @DataIndependent 1 month ago

      I just did another interview and he said "I use gemini flash 1.5 for everything" lol it's so task dependent

    • @jaysonp9426
      @jaysonp9426 1 month ago

      @DataIndependent I wouldn't even trust it to summarize haha

  • @daverobey3378
    @daverobey3378 1 month ago

    2 years of LLM Advice in 35 minutes! What was the advice? That we're still not quite there yet? Sorry, but this talk just left me more confused. In order to use AI successfully I now need to differentiate between tier models, figure out what they're good at, and decide between how much I want to spend vs how accurate I want the result to be. Ugh, AI ... not yet.

  • @GwaiZai
    @GwaiZai 1 month ago

    I appreciate Armie Hammer's career pivot.

  • @thesunshinehome
    @thesunshinehome 1 month ago

    *please timestamp your videos*

    • @snetx10
      @snetx10 1 month ago

      Or please get AI to timestamp your videos

  • @TheSkyCactus
    @TheSkyCactus 1 month ago

    I want ai barber to simulate a haircut and cut it with cnc machine. Needless to say i just got messed up smh

  • @JohnBoen
    @JohnBoen 1 month ago

    Great talk!
    We think a lot alike.
    For decades, I have been writing code that writes code that writes code... -- with my subtle edits along the way.
    Prompts that write prompts that write prompts - with my subtle edits along the way. This is natural.
    ---
    For about a year I have been leaving my desk for a walk and when I get back I have chatted my way through a design document with sample code and tests.
    It is a perfect workflow for an ADHD - OCD-walker. :)

    • @DataIndependent
      @DataIndependent 1 month ago

      Nice! Love this, what about the interview should I double down on? Is it guests? Content? a particular topic?
      Did you like the screenshares? Which part was the coolest?

    • @JohnBoen
      @JohnBoen 1 month ago

      @DataIndependent
      Thoughts...
      I thought it was a great interview. The topics were interesting to me. Me: 25 years of DB career now looking to do AI engineering work in the future.
      Hearing others talk about their workflow is valuable to me. It hints as to whether I am working in the right area, which is particularly valuable because I do not have peers to watch.
      The show mentioned a couple of pieces of software I will check out. Hmmm... I could set up a whisper agent with a hotkey...
      I will put some thought into some sort of agent store. In my home environment, I have created dozens of similarly named and featured agents. I need a more structured way to manage them - this would get out of control fast on a small team.
      This iterative approach is natural for me - I assume it is the same for everyone - but I think I may be wrong.

  • @TheMightyWalk
    @TheMightyWalk 1 month ago

    All he mentioned was intuitive use of… doesn’t take a genius to know. But I guess he prefaced it by saying that

  • @twoplustwo5
    @twoplustwo5 1 month ago

    deduplication with o1 😁

  • @denisbeaulieu5600
    @denisbeaulieu5600 1 month ago

    thanks fyi

  • @kevindublin100
    @kevindublin100 25 days ago

    LLM. A lot of little money technology

  • @thingsiplay
    @thingsiplay 1 month ago

    AI is literally for people with skill issues.

  • @derekcarday
    @derekcarday 1 month ago

    He has no clue how backpropagation and gradient descent work

    • @DataIndependent
      @DataIndependent 1 month ago +2

      He is focused on building good products

    • @derekcarday
      @derekcarday 1 month ago

      @DataIndependent Gotta understand how the tech works first.

    • @derekcarday
      @derekcarday 1 month ago

      @DataIndependent Can't get a good feel without understanding how the tech even works.

    • @DataIndependent
      @DataIndependent 1 month ago +3

      I hear you, but a practitioner, while they may benefit if they had infinite time, doesn’t need to know every layer of abstraction.
      I have no idea how assembly works

    • @derekcarday
      @derekcarday 1 month ago

      @DataIndependent To have a "feel" for neural networks they do. He could easily just look into how to flatmap data out of a monad and then run processes in parallel on that interface. Otherwise his AI agents are just going to continue overfitting each other and rickrolling all his customers. Might be a great idea for both of you to learn assembly

  • @Sparky3D
    @Sparky3D 1 month ago +1

    Use Qwen2.5 and Llama 3.2, self-hosted.
    Use them in an agentic way to minimise hallucinations and check output.
    It's cheap (self-hosted) and can do 90% of the current list of jobs that John Doe would want it to do.
    And... it's private, no info leaked to OpenAI, Anthropic or other large corporations.

  • @Lolleka
    @Lolleka 1 month ago

    This all sounds like very basic stuff.

  • @caliwolf7150
    @caliwolf7150 1 month ago +1

    50 minutes of 0 added value

    • @DataIndependent
      @DataIndependent 1 month ago +6

      What would make it better for you?

    • @phil_fr6732
      @phil_fr6732 1 month ago +3

      @DataIndependent Wow, what a healthy way to handle a nasty / unproductive critic, love it 😇