Simon Willison

Videos

VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5
1.1K views · 14 days ago
VERDAD - verdad.app/ - is a new project from Rajiv Sinclair and Public Data Works that aims to identify misinformation broadcast on US radio stations by archiving their audio, transcribing and translating it, and hunting for potential misinformation topics using LLMs. In this interview we dive deep into how the project works and what they've learned from building it so far. 00:00 Introduction to...
“Teresa T” the juvenile humpback whale - in Pillar Point Harbor, Half Moon Bay
2K views · 2 months ago
Extracting unstructured text and images into database tables with GPT-4 Turbo and Datasette Extract
11K views · 7 months ago
Demonstrating datasette-extract, a new Datasette plugin that uses GPT-4 Turbo and GPT-4 Vision to extract structured data. github.com/datasette/datasette-extract datasette.io/ www.datasette.cloud/ The events table created in this video: simon.datasette.site/content/events
Datasette Enrichments
1.6K views · 11 months ago
More details here: simonwillison.net/2023/Dec/1/datasette-enrichments/
Embeddings: What they are and why they matter
25K views · 1 year ago
Extensive notes to accompany this talk: simonwillison.net/2023/Oct/23/embeddings/
When Zeppelins Ruled The Earth
2.5K views · 1 year ago
Slides and audio from a talk I gave about the history of Zeppelins at Skillswap on Speed in Brighton on 29th October 2008
Prompt Injection, explained
20K views · 1 year ago
Full transcript and notes at simonwillison.net/2023/May/2/prompt-injection-explained/
Datasette ChatGPT Plugin
3K views · 1 year ago
simonwillison.net/2023/Mar/24/datasette-chatgpt-plugin/
Bellingcat Hackathon: Action Transcription
4.3K views · 2 years ago
github.com/simonw/action-transcription
Datasette: a big bag of tricks for solving interesting problems using SQLite
2.9K views · 2 years ago
A ten-minute introduction to Datasette and sqlite-utils, presented at "Have you tried rubbing a database on it?" on 29th April 2022
How to build, test and publish an open source Python library (without sign language)
763 views · 3 years ago
Video with sign language is here: ua-cam.com/video/VMnLXynUqys/v-deo.html Write-up here: simonwillison.net/2021/Nov/4/publish-open-source-python-library/
Datasette Desktop initial demo
1.2K views · 3 years ago
More about this demo here: simonwillison.net/2021/Aug/30/datasette-app/
Using Datasette with Jupyter to publish your data (JupyterCon 2020)
451 views · 3 years ago
Notes here: gist.github.com/simonw/656c21b5800d5e4624dec9930f00e093
Datasette - an ecosystem of tools for working with small data
1.1K views · 3 years ago
Joining CSV and JSON data using the "sqlite-utils memory" command
1.7K views · 3 years ago
Django SQL Dashboard
3K views · 3 years ago
Git scraping: tracking changes to a scraped data source using GitHub Actions
4.9K views · 3 years ago
Introduction to Datasette and sqlite-utils
15K views · 3 years ago
Barn the Spoon makes a wooden spoon at Monki Gras 2013
3.7K views · 11 years ago
Czech muscle-bus doing press-ups outside the Business Design Centre in Islington
1.8K views · 12 years ago
How to use OpenID
8K views · 12 years ago

COMMENTS

  • @gautame
    @gautame 4 days ago

    Brilliant stuff!

  • @user-kt1iz4vc3x
    @user-kt1iz4vc3x 5 days ago

    This guy’s like Jack Black meets Nicholas Hoult.

  • @sadburger
    @sadburger 14 days ago

    This was a really nice interview and interesting project. It’s incredible the superpowers that we developers have gained over the last two years. Things that you could’ve asked for 10 years ago, and I would’ve said maybe with a year and a few million dollars’ worth of headcount, are now an API call away. I have LLMs integrated into nearly every part of my workflow and my tooling. The way I work now looks almost nothing like the way it used to. I want to know more about the price difference between Gemini Flash and Whisper for transcription, particularly with all the many flavors of local Whisper that are available. I’ll have to do some research on this.

    • @swillison
      @swillison 14 days ago

      OpenAI charge $0.006 / minute for their Whisper API - so an hour of audio would cost 36 cents. Gemini 1.5 Flash is $0.075 for 1 million tokens and every second of audio is charged as 25 tokens, which means an hour is 90,000 tokens and hence costs just 0.675 cents - so it's over 50x cheaper!

    • @ftk525
      @ftk525 14 days ago

      @swillison If you use GPU spot instances yourself you can run Whisper large v3 turbo at about a penny per hour. Since this project only requires timestamping, and appears to have a high tolerance for timestamps not being exactly accurate, I would think your guest would be well served with just Whisper tiny, which you can run at roughly 10x on a single CPU - basically free.
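
The per-hour figures quoted in @swillison's reply above can be checked with a few lines of arithmetic. This is a minimal sketch in Python that uses only the prices and the 25-tokens-per-second rate stated in that reply (they may have changed since); the constant and function names are illustrative and do not come from any library.

```python
# Figures as quoted in the reply above; treat them as assumptions, not current prices.
WHISPER_USD_PER_MINUTE = 0.006
GEMINI_FLASH_USD_PER_MILLION_TOKENS = 0.075
GEMINI_TOKENS_PER_AUDIO_SECOND = 25


def whisper_cost_usd(hours: float) -> float:
    """Cost of transcribing `hours` of audio at the quoted per-minute price."""
    return hours * 60 * WHISPER_USD_PER_MINUTE


def gemini_audio_tokens(hours: float) -> int:
    """Number of tokens billed for `hours` of audio at the quoted per-second rate."""
    return round(hours * 3600 * GEMINI_TOKENS_PER_AUDIO_SECOND)


def gemini_cost_usd(hours: float) -> float:
    """Cost of sending `hours` of audio as input at the quoted per-token price."""
    tokens = gemini_audio_tokens(hours)
    return tokens / 1_000_000 * GEMINI_FLASH_USD_PER_MILLION_TOKENS


if __name__ == "__main__":
    hours = 1.0
    whisper = whisper_cost_usd(hours)
    gemini = gemini_cost_usd(hours)
    print(f"Whisper API: {whisper * 100:.3f} cents per hour")
    print(f"Gemini 1.5 Flash: {gemini * 100:.3f} cents per hour "
          f"({gemini_audio_tokens(hours):,} tokens)")
    print(f"Ratio: {whisper / gemini:.1f}x")
```

The sketch counts audio input tokens only, as the reply does; any output tokens for the transcript itself would add to the Gemini side.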

  • @scottieapplseed
    @scottieapplseed 1 month ago

    Fantastic tool and fun examples that actually demonstrate fun little use cases.

  • @arpitgarg5172
    @arpitgarg5172 1 month ago

    Is there a place where one could explore all the published datasettes?

  • @andrewrecchia4103
    @andrewrecchia4103 3 months ago

    ua-cam.com/video/CQbkhYg2DzM/v-deo.htmlsi=bBBGXe5F6RoeMtS_

  • @andrewrecchia4103
    @andrewrecchia4103 3 months ago

    三 1

  • @mikecourian
    @mikecourian 3 months ago

    Amazing work Simon! Thank you!

  • @alokranjan
    @alokranjan 3 months ago

    Can I use this plugin with MySQL? I'm getting errors.

  • @MatthewTerry-suade
    @MatthewTerry-suade 4 months ago

    Thank you, very informative

  • @bbcc2960
    @bbcc2960 4 months ago

    Thank you for sharing.

  • @energyexecs
    @energyexecs 4 months ago

    Thank you Simon Willison. Great information. I especially like how you demonstrated the development of your own tools. Finally, my thought is that your presentation, in an executive summary format, would educate policy makers in both the enterprise and government sectors who seem to fear AI. For example, my company has an early policy that employees are not allowed to use AI or ChatGPT. At the same time, my use case to leverage RAG to augment our LLM was accepted by our AI Review Committee. My thought is that enterprise companies will be careful and prudent in the rollout of LLMs and AI tools because they will want “security rails” in place. Thank you.

  • @canadianrepublican1185
    @canadianrepublican1185 4 months ago

    Thank you.

  • @schalkdormehl3057
    @schalkdormehl3057 4 months ago

    Mask, in 2023...

  • @Tony_Indiana
    @Tony_Indiana 5 months ago

    I just took a poll. And people said if you could show us OSINT using a model like mistral that is mostly uncensored (Dolphin/Instruct) or whatever your preference is then gpt4. Everyone who responded agreed that would be something we would pay for. 2024 tips and tricks LLMs and OSINT. But there are advantages to uncensored.

  • @tutacat
    @tutacat 5 months ago

    This is the best fundamental way of describing embeddings.

  • @tutacat
    @tutacat 5 months ago

    This is what Microsoft Recall wants to do.

  • @tutacat
    @tutacat 5 months ago

    This man is truly based.

  • @codenocode
    @codenocode 5 months ago

    this is really nice! thanks for sharing.

  • @Clammer999
    @Clammer999 5 months ago

    I’m totally new to embeddings and this video inspired me to want to learn even more!

  • @codenocode
    @codenocode 6 months ago

    I've recently stumbled across your work, which I read about in Gergely's book "The Software Engineer's Guidebook". Fantastic find. Love the creativity here.

  • @enigmeta
    @enigmeta 6 months ago

    Love this! Would be useful to mention you need to run datasette in --root mode in order to make modifications, it took me a while to find this.

  • @Speejays2
    @Speejays2 6 months ago

    Is it possible to replace the OpenAI API key with a local vision model instead?

  • @monKeman495
    @monKeman495 7 months ago

    there should be some kind of authorized base restriction on internal llm tokens to normal public

  • @_ramen
    @_ramen 7 months ago

    very great demo, thanks for sharing! this is an excellent example of practical use of embeddings.

  • @brcosmin
    @brcosmin 7 months ago

    Thanks for linking yourself on ycombinator, very interesting talk and quite engaging delivery.

  • @QINGCHARLES
    @QINGCHARLES 7 months ago

    The future is wild. Imagine how good this will be 6 months or a year from now.

  • @MichelBinkhorst
    @MichelBinkhorst 7 months ago

    New to Datasette. Just installed it on OSX with Homebrew, and added the Extract plugin, but I'm not seeing the 'database actions' button. Am I missing something?

    • @jmottishaw
      @jmottishaw 7 months ago

      Same here on Windows in a fresh venv.

  • @AP-hv5dh
    @AP-hv5dh 7 months ago

    🔥

  • @subinalex88
    @subinalex88 7 months ago

    Nice

  • @ecosse64
    @ecosse64 7 months ago

    That's fantastic. Does it work across multiple websites and in different languages? For example, if you wanted to provide a list of specific events in a country where both English and Spanish or Italian are spoken but have a single database in English.

  • @kai.diefenbach
    @kai.diefenbach 7 months ago

    Awesome!

  • @anne-marieroy8812
    @anne-marieroy8812 7 months ago

    Thanks, very interesting and useful.

  • @sebastianwagner5843
    @sebastianwagner5843 7 months ago

    Things start to become magical.

  • @muddasirkhan805
    @muddasirkhan805 8 months ago

    This was so good! Please do more of these - I am still in awe!! Thank you!

  • @zgintasz2
    @zgintasz2 8 months ago

    "vibes-based search" lol. love the term you invented.

  • @korolyovPavel
    @korolyovPavel 9 months ago

    Cool

  • @curtisblake261
    @curtisblake261 10 months ago

    Impressive fast talking and fast scrolling. A lot of knowledge and experience for sure. I guess I'll have to do some digging if I want to really benefit from this lecture.

  • @asiddiqi123
    @asiddiqi123 10 months ago

    Why are you wearing a mask 😷?

    • @KayButtonJay
      @KayButtonJay 9 months ago

      Maybe he doesn’t want to get people sick, genius.

    • @schalkdormehl3057
      @schalkdormehl3057 4 months ago

      @KayButtonJay He wouldn't if he didn't wear a mask either.

  • @rileydavidjesus
    @rileydavidjesus 11 months ago

    What a genius

  • @silentbob1236
    @silentbob1236 11 months ago

    Looks great, but setting it up is not easy... I have installed plugins, created a config file for the API key, and started Datasette, but nothing ever changes. A setup video for Windows or Linux demonstrating how to set up plugins would be appreciated!

    • @swillison
      @swillison 11 months ago

      Did you run Datasette with the --root option and click the link to sign in as root? That's the most likely cause for it not working. Feel free to open an issue on GitHub if that doesn't help - and I agree, I need to build a tutorial for this.

  • @BillyRichardson
    @BillyRichardson 11 months ago

    Very cool! Thanks for sharing.

  • @sennetor
    @sennetor 11 months ago

    Synthetic data 😁

  • @johnh6959
    @johnh6959 1 year ago

    OMG, the pause button got a workout. Cheers!

  • @miikalewandowski7765
    @miikalewandowski7765 1 year ago

    What does the semantic vectorization of a word look like, in a mathematical sense? Is it like every word has its spatial ID (coordinate) and gets kind of multiplied with a vector array of associative IDs?