Robust Text-to-SQL With LangChain: Claude 3 vs GPT-4

  • Published 26 Nov 2024

COMMENTS • 16

  • @andaldana (3 months ago)

    Great video! Always something new to learn!

  • @Shai_Di (6 months ago) +1

    This is really interesting, but I have some concerns about this method; I'd love to hear what you think about them:
    1. We are always sending the entire schema as context. If we connect a large database to this "application", we will waste a ton of tokens on that. The agent that LangChain built gradually decides which tables might be relevant, thus reducing the number of tokens used as context. How would you approach something like this?
    2. Sometimes table and column names are not very intuitive to the LLM, and without sampling the data it can assume properties, values or anything else. So the user still has to review the query and make sure it makes sense, which is exactly what we are trying to avoid by using AI for queries. What do you think about adding an intermediate step that samples the relevant data?

    • @FunKillerForFun (1 month ago)

      All these LLM-to-SQL setups are toys, not production ready. They are just not mature enough. The database itself needs to be well documented together with the business context.
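
    On the two concerns above, LangChain's SQLDatabase has two relevant knobs: include_tables limits which tables are reflected into the prompt, and sample_rows_in_table_info appends a few example rows to the table info so the model sees real values instead of guessing them. A minimal sketch, assuming a Postgres e-commerce database with placeholder connection string and table names:

      # Minimal sketch; connection string and table names are placeholders.
      from langchain_community.utilities import SQLDatabase

      db = SQLDatabase.from_uri(
          "postgresql+psycopg2://user:pass@localhost:5432/ecom",
          include_tables=["orders", "customers"],  # send only the tables you need
          sample_rows_in_table_info=3,             # append a few real rows per table
      )

      # The table info passed to the LLM now contains only the selected tables,
      # each followed by three sample rows.
      print(db.get_table_info())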

  • @abhinabaghose3380 (2 months ago)

    What has been your experience with text-to-pandas (DataFrame)? Is it better than text-to-SQL in terms of complexity?
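
    For comparison, the text-to-pandas route typically goes through LangChain's experimental pandas agent, which writes and executes Python against an in-memory DataFrame instead of generating SQL. A rough sketch, with a placeholder CSV file and question:

      # Rough sketch; file name and question are placeholders.
      import pandas as pd
      from langchain_openai import ChatOpenAI
      from langchain_experimental.agents import create_pandas_dataframe_agent

      df = pd.read_csv("orders.csv")

      agent = create_pandas_dataframe_agent(
          ChatOpenAI(model="gpt-4o", temperature=0),
          df,
          allow_dangerous_code=True,  # the agent executes the Python it generates
          verbose=True,
      )

      agent.invoke("What is the average order value per month?")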

  • @kelvinadungosi1579 (7 months ago)

    Hi, great tutorial! How would you implement chat functionality, where you can ask follow-up questions?

    • @rabbitmetrics (7 months ago)

      Thanks! I would use ChatMessageHistory to manage the conversation and catch the traceback; this is needed for more advanced queries.
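
      A minimal sketch of what that could look like: wrap the existing SQL chain (assumed here as sql_chain, with a "history" placeholder in its prompt) in RunnableWithMessageHistory so that follow-up questions see the earlier turns.

        # Minimal sketch; sql_chain is assumed to be the existing question -> SQL chain,
        # and its prompt is assumed to contain a MessagesPlaceholder named "history".
        from langchain_community.chat_message_histories import ChatMessageHistory
        from langchain_core.runnables.history import RunnableWithMessageHistory

        store = {}  # session_id -> ChatMessageHistory

        def get_history(session_id: str) -> ChatMessageHistory:
            if session_id not in store:
                store[session_id] = ChatMessageHistory()
            return store[session_id]

        chat_chain = RunnableWithMessageHistory(
            sql_chain,
            get_history,
            input_messages_key="question",
            history_messages_key="history",
        )

        config = {"configurable": {"session_id": "demo"}}
        chat_chain.invoke({"question": "Which customers ordered the most last month?"}, config=config)
        chat_chain.invoke({"question": "And what did they order?"}, config=config)  # follow-up resolved via history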

  • @TheBestgoku (7 months ago)

    This is function calling, but instead of JSON you get a SQL query. Am I missing something?

    • @rabbitmetrics (7 months ago) +1

      That is one way to think of it. But in this case LangChain is handling the parsing of the LLM output (note the "model.bind(stop=["\nSQLResult:"])" in the chain). When you generate SQL or any other code, you'll find that the code is often returned in quotes or with some text explaining it. The trick is to minimize this by parsing the output in a suitable way.
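
      For reference, the chain shape being referred to looks roughly like this (a condensed sketch, with db assumed to be the connected SQLDatabase and the prompt simplified):

        # Condensed sketch; db is assumed to be the connected SQLDatabase.
        from langchain_core.output_parsers import StrOutputParser
        from langchain_core.prompts import ChatPromptTemplate
        from langchain_openai import ChatOpenAI

        template = """Based on the table schema below, write a SQL query that answers the question.
        Use the following format:

        Question: question here
        SQLQuery: SQL query to run
        SQLResult: result of the query
        Answer: final answer

        {schema}

        Question: {question}
        SQLQuery:"""
        prompt = ChatPromptTemplate.from_template(template)

        model = ChatOpenAI(model="gpt-4o", temperature=0)

        sql_chain = (
            prompt
            # Stop before the model starts writing a "SQLResult:" section, so the
            # output is the bare query rather than query plus explanation.
            | model.bind(stop=["\nSQLResult:"])
            | StrOutputParser()
        )

        query = sql_chain.invoke({
            "schema": db.get_table_info(),
            "question": "How many orders were placed in 2024?",
        })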

  • @mrchongnoi (2 months ago)

    I am late to the game on this video. I have been working on a text-to-SQL project. In most of the examples I have viewed, the LLM can understand the context of the columns from their names. In the project I am working on, the column names only hint at what the data is or how it is used. The schema I have has a date, a reference date, and a delivery date; delivery date is obvious, but there are other fields whose names are not indicative of their values. What happens when you have multiple tables with a large schema? My approach is to use the LLM to build the SQL and not to synthesize the answer, as the amount of data could be quite large.

    • @premmanu6557 (2 months ago)

      You should be able to define some of the fields in the prompt with examples. That way the model can work out what each field means.
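
      As a sketch of that idea: hand-written column notes and example values can be injected into the prompt next to the schema. The column names and descriptions below are made up for illustration.

        # Sketch only; column names and descriptions are illustrative.
        from langchain_core.prompts import ChatPromptTemplate

        column_notes = """Column notes:
        - orders.ref_date: date the order was booked (not shipped)
        - orders.dlv_date: delivery date
        - orders.status: one of 'OPEN', 'SHIPPED', 'CANCELLED'"""

        template = """Based on the table schema and column notes below, write a SQL query
        that answers the question.

        {schema}

        {column_notes}

        Question: {question}
        SQLQuery:"""

        prompt = ChatPromptTemplate.from_template(template).partial(column_notes=column_notes)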

  • @lionhuang9209 (8 months ago) +1

    Where can we download the code file?

    • @rabbitmetrics (8 months ago)

      There's a link below the video to the Colab notebook with the code and a written tutorial, including how to generate the ecom tables.

  • @sahinakhtar2246 (2 months ago)

    I'm unable to see the table names using "print(db.get_usable_table_names())" even though the database connected successfully; it just prints an empty array []. What should I do?
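
    One common cause (an assumption here, since the setup isn't shown): the tables live in a non-default schema, so reflection finds nothing. SQLDatabase accepts a schema argument; a sketch with placeholder names:

      # Sketch; URI and schema name are placeholders.
      from langchain_community.utilities import SQLDatabase

      db = SQLDatabase.from_uri(
          "postgresql+psycopg2://user:pass@localhost:5432/ecom",
          schema="ecommerce",  # point at the schema that actually holds the tables
      )
      print(db.get_usable_table_names())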

  • @SR-zi1pw (8 months ago)

    What happens if it drops a table while hallucinating?

    • @MaxwellHay (8 months ago) +2

      Read-only role.

    • @rabbitmetrics (7 months ago)

      As mentioned, make sure to restrict the access scope and permissions.
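
      A minimal sketch of that setup, assuming Postgres and placeholder credentials: create a read-only role once as an admin, then point the SQLDatabase at that role only.

        # Minimal sketch; role, passwords and database name are placeholders.
        from sqlalchemy import create_engine, text
        from langchain_community.utilities import SQLDatabase

        admin = create_engine("postgresql+psycopg2://admin:adminpass@localhost:5432/ecom")
        with admin.begin() as conn:
            conn.execute(text("CREATE ROLE llm_reader LOGIN PASSWORD 'readonly'"))
            conn.execute(text("GRANT USAGE ON SCHEMA public TO llm_reader"))
            conn.execute(text("GRANT SELECT ON ALL TABLES IN SCHEMA public TO llm_reader"))

        # The chain only ever sees this connection, so a hallucinated DROP TABLE
        # fails with a permission error instead of deleting data.
        db = SQLDatabase.from_uri("postgresql+psycopg2://llm_reader:readonly@localhost:5432/ecom")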