Almost 20K views 😳 Part 2: ua-cam.com/video/PMLg6Rr8fcU/v-deo.html
please upload part 2 sir
I believe this method can be used to automate certain routine processes, but only if the price of GPT-4V is reasonable. For example, say you need to send 10,000 screenshots with a resolution of 1920x1080 pixels to GPT-4V in one day - how much will it cost?🤔🤓
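For a rough estimate, assuming the tile-based image pricing OpenAI published for gpt-4-vision-preview at the time (85 base tokens plus 170 per 512px tile at high detail, $0.01 per 1K input tokens - check current rates before relying on this):

```python
import math

def vision_image_tokens(width: int, height: int) -> int:
    # Scale to fit within 2048x2048 (no upscaling)
    scale = min(1.0, 2048 / max(width, height))
    width, height = width * scale, height * scale
    # Scale the shortest side down to 768 (again, no upscaling)
    scale = min(1.0, 768 / min(width, height))
    width, height = width * scale, height * scale
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

tokens = vision_image_tokens(1920, 1080)   # 6 tiles -> 1105 tokens
cost = tokens / 1000 * 0.01                # ~$0.011 per screenshot
print(f"10,000 screenshots/day: ~${cost * 10_000:.0f}")  # ~$110/day
```

So on the order of $110/day for the images alone, before any prompt and output tokens.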
Didn’t expect a coding video to be this entertaining. Love the frank display of your thought process.
As a fellow dev, your tutorial helps with the excitement and the anxiety. I knew I could do this myself but kept procrastinating, and eventually some tasks end up as a mental block in WFH mode. Just forcing myself to watch a fella do something like this really helps, thank you!
This guy has superpowers. He can talk and code at the same time!
I love how much of the process of programming he includes in the demo
Seriously impressive. I'm a NodeJS API engineer and you're writing that JS code faster than me!
Thanks! Fast doesn't equal good, though 😅
I just wanted to tell you that you are doing great and I really like your format.
Thank you very much!
Use the retry library and set a low timeout; you can use a simple decorator. If the timeout needs to be high, which isn't very pleasant, consider running multiple requests concurrently and waiting only for the first result, as in the sketch below.
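A minimal sketch of the first-result-wins approach, assuming the async client from openai v1 (`ask_model` and `first_result` are just illustrative helper names):

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def ask_model(prompt: str) -> str:
    response = await client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def first_result(prompt: str, copies: int = 3) -> str:
    # Fire several identical requests; return whichever finishes first
    tasks = [asyncio.create_task(ask_model(prompt)) for _ in range(copies)]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # abandon the slower requests
    return done.pop().result()
```

Note that you still pay for every request you fire, so this only makes sense when latency matters more than cost.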
A fabulous video that has been of great help in orienting our new collaborators. Your generosity is highly valued!
This was super cool! Don't mind the long format at all. Would love to see you evolve this concept in another video.
I've already filmed the next one. It'll definitely be long form 😅
With full-page screenshots, maybe create an assistant that looks at my bookmarks and their tags, and based on my question tries to get me the info from the page.
Really appreciate your information and style. Learning much!
Thanks for watching!
So for cookies, you just need to know what cookie is being set; in many cases it's likely just a matter of causing the same effect in Puppeteer. One way is to add to the cookie store directly (I'm sure Puppeteer has a way to do this). An alternative is specifying a "user data directory" for Puppeteer, so you can actually agree to things like cookies once and have it persist. Consent popups are often easy to locate using standard HTML locators, simply because the popup is usually set as a priority load event and is often a div/container with a name/id containing the word "consent" or "cookie", so a regex can find these reasonably easily. Use Puppeteer to locate the "OK" button and click it. With that reusable user data directory, for any site you only have to check once whether you have or haven't accepted consent: if not, click it; if so, just scrape.
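A minimal sketch of the cookie-store approach, written with pyppeteer (the Python port of Puppeteer; in Node the equivalents are `page.setCookie()` and the `userDataDir` launch option). The cookie name and value here are made up - copy the real ones from DevTools after accepting manually once:

```python
import asyncio
from pyppeteer import launch

async def main():
    # userDataDir persists cookies (and consent clicks) between runs
    browser = await launch(userDataDir="./scraper-profile")
    page = await browser.newPage()

    # Pre-seed a consent cookie directly (name/value are hypothetical)
    await page.setCookie({
        "name": "cookie_consent",
        "value": "accepted",
        "domain": "example.com",
    })

    await page.goto("https://example.com")
    await page.screenshot(path="screenshot.png", fullPage=True)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
```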
Legend has it, he’s still trying to find out what the weather is like in Alaska…
Great video. I have a question: can you suggest a way to select dynamically generated element IDs using Puppeteer and OpenAI?
I'm only up to 15:00, but the issue you had at this point is that it CAN read Sam Altman's birthdate - it just doesn't know what today's date is. You can feed it the date in your request, generated with `date()` or whatever.
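For example, in Python (assuming the date goes into the system prompt; where exactly you inject it is up to you):

```python
from datetime import date

# Tell the model what "today" is so it can compute ages from birthdates
system_prompt = (
    f"Today's date is {date.today().isoformat()}. "
    "Answer questions based on the attached screenshot."
)
```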
Chain of thought is mostly meant to improve accuracy, not to fix something you could do with a proper single prompt.
It's interesting that this is exactly what I was looking for. Last night I spent a few hours asking Copilot how to implement the same libraries. Thanks for the tutorial!
No typescript and no copilot? This was a more wholesome time.
a Master in the Arts of coding!
Thanks for the video. Great work.
But where is the token authorization for using the gpt-4 preview?
I don't get the added functionality compared to Google in this demo. Help me out.
What is the weather like in Alaska?
Couldn't you use backoff to handle the error when the API is stuck?
In package.json you can set "type": "module".
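That is, in package.json (this makes Node treat .js files as ES modules, so `import` works without renaming files to .mjs):

```json
{
  "type": "module"
}
```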
So you need to use GPT-3.5 Turbo to get exact answers instead of GPT-4? Weird.
I still don’t have access to the vision API : (
This is so cool and nerdy! Maybe the best channel to follow to learn more and more about the OpenAI API. Difficult but entertaining to follow.
what is the weather like in alaska?
This is awesome. I love your videos. Please keep these videos going, especially this one. I learned so much.
Thank you! More to come :)
Very clever. Congratulations!
If you add something like "Strictly based on the information from the screenshot", you get answers based only on the information it gets from the screenshot.
A little speed-up might be to use the Python requests package to try and fetch the URL first before running Puppeteer - then short-circuit invalid domains, 404s, etc. Also, when doing a completion you can pass `request_timeout=10` or whatever and it'll kill the call. Sometimes it even works.... ;-)
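A minimal sketch of that pre-flight check (`url_reachable` is just an illustrative helper name):

```python
import requests

def url_reachable(url: str, timeout: float = 5.0) -> bool:
    # Cheap pre-flight check before spinning up Puppeteer
    try:
        response = requests.head(url, timeout=timeout, allow_redirects=True)
        if response.status_code >= 400:
            # Some servers reject HEAD; fall back to a streamed GET
            response = requests.get(url, timeout=timeout, stream=True)
        return response.status_code < 400
    except requests.RequestException:
        return False

if url_reachable("https://en.wikipedia.org/wiki/Sam_Altman"):
    ...  # only now launch Puppeteer and take the screenshot
```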
Thanks, I'll try that. Yeah, you can set the request_timeout, but you still have to handle the error by having some recursive function that retries the request if it fails. And I don't have time to implement that. It would take like a minute, lol
I replied to this and YouTube removed it (I think!) - but the Python package 'tenacity' (or the original retry) is worth a look (I'll skip the URL as I think that's what made YouTube remove/hide my comment)
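A minimal tenacity sketch, assuming the v1 openai client (the decorator settings are just an example):

```python
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

client = OpenAI()

@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, max=10))
def get_completion(messages):
    # Retried on any exception, with exponential backoff between attempts
    return client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=messages,
    )
```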
What is the current weather in the world?
Crazy good content! Thank you!
For getting Sam Altman's age, would it help if you stated that the screenshot is taken today? ChatGPT may be hesitant to assume this.
I'd like to see a video from you about navigating websites with Puppeteer. Now that you ask, I'd like a tutorial on how to make it follow links, fill out forms, crawl four or more links deep into a website, handle session cookies, automate and run loops, etc. :-)
very interesting, thanks for sharing!
Great video! Can these libraries handle auth, like the Azure OAuth flow, in order to browse to the page?
A good way is to include a timestamp in the user role message. It will help it calculate Sam Altman's age easily!
Yes, but only because it knows his birthday already (even without the Wikipedia screenshot)
I was wondering how this is different from the web-search capability of ChatGPT Plus right now.
In other words, if I asked GPT to look for an answer on the web, would it struggle to do so?
Is this a hacky way to get better web search via an API-like method because it's not enabled yet in the OpenAI dev tools?
Anyway, I really like the video. Can we use Selenium to do this also?
I wouldn't call this easy web scraping, but this was very hilarious with all the bugs
Is it possible to use Selenium? At least it's Python, so you don't need to switch between two languages.
Yes, it should work too. I just have more experience with Puppeteer (never tried Selenium)
You're already in javascript for puppeteer. Why do the gymnastics of writing your main logic in python?
That's a good point and in the next video I in fact switch to JavaScript only. I prefer Python, though
The Vision API downsamples the image... that's why it cannot recognise small fonts.
Also, I had checked out a Patreon chat (paid), but now I'm just unable to find it. Is it gone?
I'm not on Patreon but I'm on BuyMeACoffee and you can find a link in the description
@unconv Thank you for the good job. I am improving and using it.
There are some pieces that don't work as of today, and I fixed them.
Great video, dude. I'm gonna rewatch it later. I've got a project this might help with.
I appreciate your efforts mannn...
Where do you put the OpenAI key? I can't find anywhere to put it; I tried searching. I'm getting a "billing not active" error.
It grabs it from the OPENAI_API_KEY environment variable. You can set it on Linux by running "export OPENAI_API_KEY=YOUR_API_KEY" and if you're on Windows, I believe you can use "setx" or "set" instead of "export"
Great video! However, I noticed a few instances where you mentioned not having prior experience with certain tasks, but then you later showcased projects where the code was already complete. For example, at 9:29 in the video. This seems a bit contradictory and might confuse some viewers
Thanks! Which tasks did I say I didn't have prior experience with?
@unconv, this is my first time viewing a video on your channel. I observed that you started by looking through the documentation as if it was new to you, despite already having the answer in another file. This struck me as unusual, but I understand it might have been part of your process. When the documentation didn't seem to help, you referred to your existing project. I don't mean this in a negative way; it's just my personal observation from watching this video for the first time
I've used Puppeteer multiple times in the past, but I never remember the boilerplate stuff. I didn't want to jump directly to my own previously written code, because I want to do things from scratch in my videos, not leaving out any steps. And I want to show how I go about researching stuff. But I get that it might have been confusing - although I suspect it would have been even more confusing if I had directly copy-pasted my old code.
Take a screenshot (without closing the Puppeteer session) and ask ChatGPT whether the page looks loaded or not, instead of relying on networkidle0, timeouts, etc.
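A minimal sketch of that check, assuming the v1 Python client (each call costs roughly a thousand image tokens, so use it sparingly):

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def page_looks_loaded(screenshot_path: str) -> bool:
    # Ask the vision model whether a screenshot looks fully rendered
    with open(screenshot_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Does this page look fully loaded? Answer YES or NO."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return "YES" in response.choices[0].message.content.upper()
```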
Hey man! I would need something like this hosted on a server of some sort, like AWS or Heroku. Is that possible if I build this and deploy it? I need it to scale up to 1,000 requests daily.
A lot of websites will block requests from AWS servers, so you would probably need some sort of proxy server in between.
What is the GPT-4 Vision API?
Great video!
Interesting experiments with the GPT Vision API and Puppeteer. I have a couple of questions and a suggestion:
1. Could you share some insights on the cost aspect of using the GPT Vision API for this project? I'm curious about the pricing and whether it's feasible.
2. Also, have you considered combining classical web scraping methods with the Vision API in a synergistic way? Specifically, using traditional scraping to gather initial data and then employing the Vision API to verify or correct this data where needed. I think this could potentially address some of the limitations of both methods. What are your thoughts on this approach? Looking forward to hearing your thoughts!
Thanks! On the day I filmed the video, my API costs were $0.58. The next day I maxed out the limit of 100 messages of the gpt-4-vision-preview while testing and the total cost for that day was $2.15. These costs include some other API calls as well, though.
Combining classical web scraping and Vision API seems like a good idea. I'll have to look into that when I run into an issue scraping something.
You should try the JSON response mode. You can request a response shaped like this in the system prompt: {data: ExpectedDataInterface, error: ErrorInterface | null}. Good luck!
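A minimal sketch of JSON mode with the v1 Python client. Note that `response_format` was supported on gpt-4-1106-preview but not on the vision preview model at the time, so this fits the text-only steps of the pipeline (the schema in the system prompt is just an example):

```python
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    response_format={"type": "json_object"},  # forces syntactically valid JSON
    messages=[
        {"role": "system", "content": (
            'Respond in JSON: {"data": <the extracted data or null>, '
            '"error": <an error message or null>}'
        )},
        {"role": "user", "content": "Extract Sam Altman's birthdate from this text: ..."},
    ],
)
result = json.loads(response.choices[0].message.content)
```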
The LLM was wrong about what the light on the motorcycle means, since the headlight is ALWAYS on. A simple but important mistake.
BWAHAHAHAHA! the struggle (programming: errors = WTF!!!!) is real.
A day in the life of code building... Awesome video!
Also, I think this is better suited for the Assistants API. I made a private investigator that uses functions. One calls the Serper API, and if it finds a LinkedIn page, it crawls it, strips the HTML, and sends it to get summarized along with the link snippets. The other function gets details on an image URL you ask it to view, using GPT-4 Vision. And I could make those functions run in parallel.
Could you share more details, I'm trying to build similar functionality
I made a drinking game out of the word Alaska. I died.
First, thank you. And a question: how many tokens does this scraping method use?
I haven't checked exactly but it seems to be around $0.017 per scrape based on my API usage during building this
I believe it's not telling you his age because it is trying to provide a precise age, i.e. his current age given his birth date. Don't ask what his age is, but what age the page (or the author of the page) says he is.
14:50 I don't think that's a good idea, because you will burn a lot of tokens (input and output), so it's better to scrape the URLs into a vector store.
Thank you for this helpful video! Can you please try the same task with the functions tool? Thanks!
Excellent Job
Why did you mix Python + JS? I don't see a requirement for it; you could have used a single programming language, JavaScript or Python, and executed the same task within a single project.
Can this work for Instagram scraping ?
No. Instagram doesn't allow repetitive actions.
Is there a reason you don't use copilot?
It often guides me to directions I don't want to go. Also, I'm still learning Python so I'd rather practice my memorization
Remove the word "like" and ask "What is the weather in Alaska?" The question you asked leads to an answer such as "colder than a commercial freezer".
Good point 😂
"In Alaska's land, where coders seek the weather's tale,
They type and query, 'neath the aurora's bright veil.
With every line of code, they ask the sky's mood,
Hoping for sunshine, but prepared for the cold and brood."
Why not do everything in JS? So confusing.
Thank you.
Just kisses for you; so freakin' loved how you explained and debugged along with us.
GPT-4 Vision API limits?
100 requests per day
Why aren't you using the AI to help you code?🤔🤷
I think that he wants to explain the code to us by writing. I use ChatGPT to write code as I'm not a programmer myself, but I find myself learning to code anyway because I still need to understand what I actually need. It's also tiring to pass every small error to chat; it's easier to make adjustments yourself. However, to do that, you need to understand the code at some level.
I actually have Copilot but usually I disable it because it often guides me to directions I don't want to go. Especially when making videos, if Copilot suggests a different way than I was going to go, I get distracted. And I'm still learning Python, so I want to actually learn it. If I always use Copilot, I can get the job done but I probably won't memorize the syntax.
@unconv I think he meant letting ChatGPT generate the entire code, not Copilot.
Because fully AI generated code is unusable
@yungjerky Not anymore it isn't. Never used Grimoire?
I'm not going to watch an hour of you coding, but I will share that you can get an image of each element, and Selenium would probably be a good choice to use for this - something like the sketch below.
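A minimal sketch of per-element screenshots with Selenium's Python bindings (the selector is just an example):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Selenium can screenshot a single element instead of the whole page
element = driver.find_element(By.CSS_SELECTOR, "#content")
element.screenshot("element.png")  # crops to the element's bounding box

driver.quit()
```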
Thank you for the Video.
But the way you keep re-typing the question (instead of copy-pasting it) makes me frustrated 😖
Sorry about that 😄
Nice content, but you should just copy-paste the code; we know you can code well behind the scenes, don't worry. Keep doing great!
I see that 0420 there... in 00:31:50 : )
If humans didn't all re-invent the wheel every hour, there would be a huge database of every query: response: list of problems: links to solutions (if they ever figured it out). That would save humanity unlimited man-hours... but probably put OpenAI out of business.
Cool video
I would just use the scraping way and de-HTML it. I've never seen someone with so many problems calling an API.
Great video; just one suggestion: the repetition of what you're typing literally every time is a bit much.
Thanks! I'll try to avoid that in the future (and mistakes leading to repetition in general)
To have productive programming, AI has to return what you want in 100% of cases. It has to be better than a human at deduction.
Great video; at last I see someone on YT who struggles with the API as much as I do…
I know the topic of the video is the Vision API, but you could get better results using a terminal web browser like Lynx, piping the result to a text file and giving ChatGPT that text as context.
Just an idea. 😉
I was gonna dismiss your suggestion by saying one does not simply use Lynx in 2023 since it doesn't support JavaScript, which many websites require nowadays. But testing it out just now, all the examples I showed in this video could have worked with Lynx (based on its output). I don't know how I would extract links and input fields with Lynx, though, to make it crawl subpages. Perhaps all those pages were server side rendered, so I might as well have used Curl.
"Hopefully this is not a Malware" :D :D
This could be the best kodi addon ever
👍
Seems very inefficient to do it that way. Yes, it's an interesting concept, but you can do it all in Python, and your logic can be simplified to get results.
It's not hard to make a scraper. In fact, you probably only need an HTTP request, not a full-on instance of Chrome.
Want more
bro sounds like an AI. Good video tho
Instant fork, all your code belong to us
aaaaa this was frustrating as hell
You would go bankrupt if you used the GPT-4 Vision API to scrape the web... just link your credit card and start scraping.
coding 😛
Using the seed as if it was a hyperparameter shows how little you know about the stuff you're talking about, congrats!
I mean, if you know more about it than me, you could maybe explain further or link to some more information about the subject
@hidroman1993 What a stupid reply, guide him at least if you know better...
It is intolerable how badly you prepared for the video. You can't teach people like that.
This isn't Unconventional Teaching
It is a more natural way as a developer; it is much better that way. I learnt debugging from it.
This is definitely the practical way to watch and learn. I like your style. You are showing the humanity of future coding
I love this approach - similar to how good developers actually code. Keep it up unconv
Meh, he is teaching how to troubleshoot. If you want direct directions just read the API documentation.