Simon Willison
Joined 9 Nov 2011
simonwillison.net
Civic Band, presented by Philip James during Datasette Public Office Hours, 15th November 2024
Detailed notes: simonwillison.net/2024/Nov/16/civic-band/
Views: 309
Videos
VERDAD - tracking misinformation in radio broadcasts using Gemini 1.5
1.1K views, 14 days ago
VERDAD - verdad.app/ - is a new project from Rajiv Sinclair and Public Data Works that aims to identify misinformation broadcast on US radio stations by archiving their audio, transcribing and translating it and hunting for potential misinformation topics using LLMs. In this interview we dive deep into how the project works and what they've learned from building it so far. 00:00 Introduction to...
“Teresa T” the juvenile humpback whale - in Pillar Point Harbor, Half Moon Bay
2K views, 2 months ago
Extracting unstructured text and images into database tables with GPT-4 Turbo and Datasette Extract
11K views, 7 months ago
Demonstrating datasette-extract, a new Datasette plugin that uses GPT-4 Turbo and GPT-4 Vision to extract structured data. github.com/datasette/datasette-extract datasette.io/ www.datasette.cloud/ The events table created in this video: simon.datasette.site/content/events
Datasette Enrichments
1.6K views, 11 months ago
More details here: simonwillison.net/2023/Dec/1/datasette-enrichments/
Embeddings: What they are and why they matter
25K views, 1 year ago
Extensive notes to accompany this talk: simonwillison.net/2023/Oct/23/embeddings/
When Zeppelins Ruled The Earth
2.5K views, 1 year ago
Slides and audio from a talk I gave about the history of Zeppelins at Skillswap on Speed in Brighton on 29th October 2008
Prompt Injection, explained
20K views, 1 year ago
Full transcript and notes at simonwillison.net/2023/May/2/prompt-injection-explained/
Datasette ChatGPT Plugin
3K views, 1 year ago
simonwillison.net/2023/Mar/24/datasette-chatgpt-plugin/
Bellingcat Hackathon: Action Transcription
4.3K views, 2 years ago
github.com/simonw/action-transcription
Datasette: a big bag of tricks for solving interesting problems using SQLite
2.9K views, 2 years ago
A ten minute introduction to Datasette and sqlite-utils presented at "Have you tried rubbing a database on it?" on 29th April 2022
How to build, test and publish an open source Python library (without sign language)
763 views, 3 years ago
Video with sign language is here: ua-cam.com/video/VMnLXynUqys/v-deo.html
Write-up here: simonwillison.net/2021/Nov/4/publish-open-source-python-library/
Datasette Desktop initial demo
1.2K views, 3 years ago
More about this demo here: simonwillison.net/2021/Aug/30/datasette-app/
Using Datasette with Jupyter to publish your data (JupyterCon 2020)
451 views, 3 years ago
Notes here: gist.github.com/simonw/656c21b5800d5e4624dec9930f00e093
Datasette - an ecosystem of tools for working with small data
1.1K views, 3 years ago
Joining CSV and JSON data using the "sqlite-utils memory" command
1.7K views, 3 years ago
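The idea behind the `sqlite-utils memory` command is to load CSV and JSON data into a temporary in-memory SQLite database and then join it with ordinary SQL. The following is a minimal stdlib-only Python sketch of that idea, not the sqlite-utils implementation; the sample data, table names and the `load_rows` helper are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs standing in for a CSV file and a JSON file on disk.
CSV_TEXT = "id,name\n1,Ada\n2,Ben\n"
JSON_TEXT = '[{"person_id": 1, "city": "London"}, {"person_id": 2, "city": "Brighton"}]'


def load_rows(conn, table, rows):
    """Hypothetical helper: create `table` from the first row's keys, then insert all rows."""
    columns = list(rows[0].keys())
    col_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE "{table}" ({col_sql})')
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )


conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
load_rows(conn, "people", list(csv.DictReader(io.StringIO(CSV_TEXT))))
load_rows(conn, "places", json.loads(JSON_TEXT))

# CSV values arrive as text ("1") while JSON gives integers (1), hence the CAST.
results = conn.execute(
    "SELECT people.name, places.city FROM people "
    "JOIN places ON places.person_id = CAST(people.id AS INTEGER) "
    "ORDER BY people.name"
).fetchall()
print(results)
```

Without the `CAST`, the join would match nothing, because SQLite does not treat the text `'1'` and the integer `1` as equal in these untyped columns.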
Git scraping: tracking changes to a scraped data source using GitHub Actions
4.9K views, 3 years ago
Introduction to Datasette and sqlite-utils
15K views, 3 years ago
Barn the Spoon makes a wooden spoon at Monki Gras 2013
3.7K views, 11 years ago
Czech muscle-bus doing press-ups outside the Business Design Centre in Islington
1.8K views, 12 years ago
Brilliant stuff!
This guy’s like Jack Black meets Nicholas Hoult.
This was a really nice interview and an interesting project. It’s incredible what superpowers we developers have gained over the last two years. Things that, if you’d asked for them 10 years ago, I would’ve said might take a year and a few million dollars’ worth of headcount are now an API call away. I have LLMs integrated into nearly every part of my workflow and my tooling. The way I work now looks almost nothing like the way it used to. I want to know more about the price difference between Gemini Flash and Whisper for transcription, particularly with all the many flavors of local Whisper that are available. I’ll have to do some research on this.
OpenAI charge $0.006 / minute for their Whisper API - so an hour of audio would cost 36 cents. Gemini 1.5 Flash is $0.075 for 1 million tokens and every second of audio is charged as 25 tokens, which means an hour is 90,000 tokens and hence costs just 0.675 cents - so it's over 50x cheaper!
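The arithmetic in that reply can be checked with a few lines of Python. This is a minimal sketch that hard-codes the rates quoted in the comment (they may have changed since); like the comment, it counts only the audio input tokens sent to Gemini and ignores the output tokens for the transcript itself.

```python
# Rates as quoted in the comment above; check current pricing before relying on them.
WHISPER_USD_PER_MINUTE = 0.006          # OpenAI Whisper API
GEMINI_FLASH_USD_PER_MILLION = 0.075    # Gemini 1.5 Flash input tokens
GEMINI_TOKENS_PER_AUDIO_SECOND = 25     # audio billed at 25 tokens per second


def whisper_cost(hours: float) -> float:
    """Cost in USD to transcribe `hours` of audio with the Whisper API."""
    return hours * 60 * WHISPER_USD_PER_MINUTE


def gemini_flash_input_cost(hours: float) -> float:
    """Input-token cost in USD for `hours` of audio sent to Gemini 1.5 Flash.

    Output tokens (the generated transcript) are billed separately and are
    not included here.
    """
    tokens = hours * 3600 * GEMINI_TOKENS_PER_AUDIO_SECOND  # 90,000 for one hour
    return tokens / 1_000_000 * GEMINI_FLASH_USD_PER_MILLION


w = whisper_cost(1)
g = gemini_flash_input_cost(1)
print(f"Whisper: ${w:.4f}/hour, Gemini Flash input: ${g:.5f}/hour, ratio: {w / g:.1f}x")
```

For one hour this works out to 36 cents versus 0.675 cents, a ratio of a little over 53x, consistent with the "over 50x cheaper" figure.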
@swillison If you use GPU spot instances yourself, you can run Whisper large v3 turbo at about a penny per hour. Since this project only requires timestamping, and appears to have a high tolerance for timestamps not being exactly accurate, I would think your guest would be well served by Whisper tiny, which you can run at roughly 10x real-time on a single CPU: basically free.
Fantastic tool and fun examples that actually demonstrate fun little use cases.
Is there a place where one could explore all the published datasettes?
ua-cam.com/video/CQbkhYg2DzM/v-deo.htmlsi=bBBGXe5F6RoeMtS_
Amazing work Simon! Thank you!
Can I use this plugin with MySQL? I'm getting errors.
Thank you, very informative
Thank you for sharing.
Thank you, Simon Willison. Great information. I especially like how you demonstrated the development of your own tools. Finally, my thought is that your presentation, in an executive summary format, would educate policy makers in both the enterprise and government sectors who seem to fear AI. For example, my company has an existing early policy that employees are not allowed to use AI or ChatGPT. At the same time, my use case of leveraging RAG to augment our LLM was accepted by our AI Review Committee. My thought is that enterprise companies will be careful and prudent in the rollout of LLMs and AI tools because they will want “security rails” in place. Thank you.
Thank you.
Mask, in 2023...
I just took a poll. People said they would like you to show OSINT work using a mostly uncensored model like Mistral (Dolphin/Instruct), or whatever your preference is, rather than GPT-4. Everyone who responded agreed that would be something we would pay for: 2024 tips and tricks for LLMs and OSINT. There are advantages to uncensored models.
This is the best fundamental way of describing embeddings.
This is what Microsoft Recall wants to do.
This man is truly based.
This is really nice! Thanks for sharing.
I’m totally new to embeddings and this video inspired me to want to learn even more!
I've recently stumbled across your work which I read about in Gergely's book "Software Engineering Guidebook". Fantastic find. Love the creativity here.
Love this! It would be useful to mention that you need to run Datasette with the --root option in order to make modifications; it took me a while to find this.
Is it possible to replace the OpenAI API key with a local vision model instead?
There should be some kind of authorization-based restriction on internal LLM tokens for the general public.
Great demo, thanks for sharing! This is an excellent example of a practical use of embeddings.
Thanks for linking yourself on ycombinator, very interesting talk and quite engaging delivery.
The future is wild. Imagine how good this will be 6 months or a year from now.
New to Datasette. Just installed it on OSX with Homebrew, and added the Extract plugin, but I'm not seeing the 'database actions' button. Am I missing something?
Same here, on Windows in a fresh venv.
🔥
Nice
That's fantastic. Does it work across multiple websites and in different languages? For example, say you wanted to list specific events in a country where both English and Spanish or Italian are spoken, but keep a single database in English.
Awesome!
Thanks, very interesting and useful.
Things start to become magical.
This was so good! Please do more of these - I am still in awe!! Thank you!
"Vibes-based search" lol. Love the term you invented.
Cool
Impressive fast talking and fast scrolling. A lot of knowledge and experience for sure. I guess I'll have to do some digging if I want to really benefit from this lecture.
Why are you wearing a mask 😷?
Maybe he doesn’t want to get people sick, genius.
@KayButtonJay He wouldn't if he didn't wear a mask either.
What a genius
Looks great, but setting it up is not easy... I have installed plugins, created a config file for the API key, and started Datasette, but nothing ever changes. A setup video for Windows or Linux demonstrating how to set up plugins would be appreciated!
Did you run Datasette with the --root option and click the link to sign in as root? That's the most likely cause for it not working. Feel free to open an issue on GitHub if that doesn't help - and I agree, I need to build a tutorial for this.
Very cool! thanks for sharing
Synthetic data 😁
OMG, the pause button got a workout. Cheers!
What does the semantic vectorization of a word look like, in a mathematical sense? Is it like every word has its spatial ID (coordinate) and gets kind of multiplied with a vector array of associative IDs?
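An embedding model turns a piece of text into a fixed-length list of floating point numbers (a vector). Texts with related meanings tend to end up with vectors pointing in similar directions, and a common way to compare two of them is cosine similarity. The sketch below uses made-up four-dimensional vectors purely for illustration; real models return hundreds or thousands of numbers per text, and `EMBEDDINGS` and `most_similar` are hypothetical names.

```python
import math

# Toy "embeddings", invented for illustration. Individual dimensions of a real
# embedding have no human-readable meaning.
EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0, 0.3],
    "puppy": [0.8, 0.2, 0.1, 0.4],
    "car": [0.1, 0.9, 0.7, 0.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word):
    """Rank the other words by cosine similarity to `word`, highest first."""
    query = EMBEDDINGS[word]
    scores = {
        other: cosine_similarity(query, vec)
        for other, vec in EMBEDDINGS.items()
        if other != word
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(most_similar("dog"))
```

With these toy numbers, "puppy" scores about 0.98 against "dog" while "car" scores about 0.16, which is the kind of ranking a similarity search over real embeddings relies on.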