The ultimate AI SCRAPER is Finally COMPLETE!!

  • Published Jan 2, 2025

COMMENTS •

  • @MinaEllis-XXAI 2 months ago +4

    What a great video! You are awesome! Great demonstration, thank you. This is the first video I have seen and I have already decided to subscribe!

    • @redamarzouk 2 months ago

      thanks for the comment, really means a lot!

    • @MinaEllis-XXAI 2 months ago

      @@redamarzouk My name is Mina

    • @redamarzouk 2 months ago

      @@MinaEllis-XXAI oh sorry the comment is fixed now.

  • @UberLinny 2 months ago +3

    Love the work, but I really wish the pagination would go beyond one page and cycle through the "Next" buttons to add to the CSV, plus the ability to save/load presets so I could come back every week and scrape the same data. But overall, great job.
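A pagination loop like the one requested here can be sketched in Python; `fetch`, the page URLs, and the row fields below are hypothetical stand-ins, not part of Scrape-Master:

```python
import csv

def scrape_all_pages(fetch, start_url, max_pages=50):
    """Follow "Next" links until exhausted, accumulating rows from each page.

    fetch(url) must return (rows, next_url), with next_url set to None on
    the last page; max_pages guards against pagination loops.
    """
    rows, url, pages = [], start_url, 0
    while url and pages < max_pages:
        page_rows, url = fetch(url)
        rows.extend(page_rows)
        pages += 1
    return rows

def save_csv(rows, path, fieldnames):
    """Write all accumulated rows to a single CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

The key design point is that each page's rows are appended to one list before a single CSV write, rather than one file per page.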

  • @eliteandhonor 2 months ago +2

    Thank you this is exactly what I have been wanting

  • @EduardoCortez81 2 months ago +1

    You are a wizard! Thanks for this

  • @hannespi2886 2 months ago +3

    Here for the final version!

  • @oussineabar7970 1 month ago +1

    Peace be upon you and God's mercy.
    May God reward you with goodness and have mercy on your parents for these efforts of yours to benefit us ❤

  • @Lazbel 2 months ago +1

    What could be causing this error? "ValueError: The provided formatted data is a string but not valid JSON." I'm scraping Facebook Marketplace listings in my Marketplace feed (using attended mode, with Gemini 1.5 Flash). It seems to be working fine until it gets to this point.
    I also sometimes get this other error: "AttributeError: Unknown field for Candidate: finish_message. Did you mean: 'finish_reason'?". I think it has something to do with Gemini's safety settings? If that's the case, is there any way to set Gemini's safety categories to "Block none"? That might help Gemini avoid scraping errors due to false positives.
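For reference, the google-generativeai SDK does accept safety settings as plain dicts; a minimal sketch, noting that the SDK exposes four configurable harm categories (the model name below is just an example):

```python
# Relax every configurable Gemini safety category to BLOCK_NONE. The SDK
# accepts safety settings as a list of {"category", "threshold"} dicts.
SAFETY_OFF = [
    {"category": category, "threshold": "BLOCK_NONE"}
    for category in (
        "HARM_CATEGORY_HARASSMENT",
        "HARM_CATEGORY_HATE_SPEECH",
        "HARM_CATEGORY_SEXUALLY_EXPLICIT",
        "HARM_CATEGORY_DANGEROUS_CONTENT",
    )
]

# Usage sketch (requires the google-generativeai package and an API key):
# import google.generativeai as genai
# model = genai.GenerativeModel("gemini-1.5-flash", safety_settings=SAFETY_OFF)
```

Whether this fixes the ValueError is a separate question; safety blocks typically surface as empty candidates rather than malformed JSON.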

  • @explosiveenterprises1479 2 months ago +1

    This is great, but what would take it over the top is letting me go to each individual Amazon item, open it, scrape some specific fields, then go back to the list and move on to the next item. Obviously not just for Amazon, that's just an example.

  • @streamstudiosllc 2 months ago +4

    The Docker Container's name was "Stoic Beaver". You win

  • @kavinaychand858 2 months ago +1

    love this project

  • @Peakcoder 1 month ago

    Hey man, thanks for all your efforts. It helps me a lot, especially for finding new ways/updates on packages/frameworks for coding. Lots of love from India, Arunachal Pradesh ❤

  • @GrantSolomon-w5e 20 days ago

    With Amazon scraping, I can only scrape information that is visible on the search results page (product title, ratings, etc.). Is there a way to scrape information that can only be viewed by clicking on the product? For example, how could I scrape all of the product descriptions that come up when I search for iPhone cases on Amazon? Product descriptions can't be viewed from the search page, and can only be seen after clicking on the product. Similarly, for job boards, job descriptions can only be viewed after clicking on the individual jobs. How would I scrape all of the job descriptions for marketing roles on a company job board?
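The click-through pattern asked about here is usually done in two phases: collect detail-page links from the results page, then visit each link for the full description. A minimal sketch with an injected `fetch(url) -> html` callable so the same logic works with requests, Selenium page sources, or attended-mode snapshots; the regexes and `/item/` URLs are illustrative assumptions, not Amazon's or any job board's real markup:

```python
import re

def extract_links(listing_html):
    """Pull detail-page hrefs out of the search/listing page HTML."""
    return re.findall(r'href="(/item/[^"]+)"', listing_html)

def extract_description(detail_html):
    """Grab the description block from a detail page (illustrative regex)."""
    m = re.search(r'<div id="description">(.*?)</div>', detail_html, re.S)
    return m.group(1).strip() if m else ""

def scrape_descriptions(fetch, listing_url, base="https://example.com"):
    """Phase 1: gather links from the listing; phase 2: visit each one."""
    descriptions = {}
    for link in extract_links(fetch(listing_url)):
        descriptions[link] = extract_description(fetch(base + link))
    return descriptions
```

In practice you'd paginate the listing first (as requested in an earlier comment) and then run phase 2 over the combined link list.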

  • @DennisDrzosga-xr9ny 1 month ago

    Hey Reda,
    Really nice video!
    Is it also possible to read the data of potential leads from the imprint or other subpages if one only has the main URL and not the exact subpage? Or is there a better way?
    Best regards
    Dennis

  • @suupaauozaden3463 2 months ago +1

    Hey, thanks for the work. I used Gemini for the Wikipedia images with the field tag "JPG URLs"; it took all images, svg and png included. It would be nice to actually allow for a prompt so it can format the URLs, like removing the 'thumbnail/' string and the part right of the last '/' of the URL.
    Basically, condition or post-process the output so you save tokens elsewhere.

    • @redamarzouk 2 months ago +3

      You basically want control over the output as well, with a little text box where you can give instructions on the output table?
      If that's the case, it's a great idea.
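The post-processing described above can also be done outside the model with plain string handling. A sketch for the Wikipedia case, assuming the real path segment is `thumb/` and the original file URL is recovered by dropping that segment plus everything after the last `/` (function names are hypothetical):

```python
def full_size_url(url):
    """Turn a Wikipedia thumbnail URL into the full-size image URL.

    E.g. .../commons/thumb/a/ab/Cat.jpg/220px-Cat.jpg
      -> .../commons/a/ab/Cat.jpg
    """
    if "/thumb/" not in url:
        return url
    head, _, tail = url.partition("/thumb/")
    original = tail.rsplit("/", 1)[0]  # drop the trailing "220px-..." part
    return f"{head}/{original}"

def keep_jpgs(urls):
    """Filter out svg/png results that slip past a 'JPG URLs' field prompt."""
    return [u for u in urls if u.lower().endswith((".jpg", ".jpeg"))]
```

Doing this in code rather than in the prompt matches the commenter's point about saving tokens.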

  • @Sam_On_Tech 1 month ago

    Awesome work and a good explanation of how to install and work with this app. I'm trying to pull the Docker image and start from that, however I'm running into this error:
    Unable to pull redamarzouk/scrape-master:latest
    no matching manifest for linux/arm64/v8 in the manifest list entries
    Any idea why this error is coming up?
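That message usually means the image was published for linux/amd64 only, so an arm64 host (e.g. Apple Silicon) finds no matching manifest. If that is the case here, forcing the platform runs the image under emulation; the port mapping below is an assumption, not the app's documented setting:

```shell
# Pull and run the amd64 image on an arm64 host via emulation.
docker pull --platform linux/amd64 redamarzouk/scrape-master:latest
docker run --platform linux/amd64 -p 8501:8501 redamarzouk/scrape-master:latest
```

Emulated amd64 containers run noticeably slower than native ones; a multi-arch image built with `docker buildx` would be the long-term fix.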

  • @d2d2505 19 days ago

    Hi Reda,
    How can we run local Llama within Docker? Would a separate container with the server work?
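A separate container generally does work for this. A hypothetical docker-compose sketch where an Ollama service serves the model and the scraper reaches it by service name; the `OLLAMA_BASE_URL` variable and the scraper service details are assumptions, not Scrape-Master's actual configuration:

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama:/root/.ollama   # persist downloaded models
    ports:
      - "11434:11434"          # Ollama's default API port
  scraper:
    image: redamarzouk/scrape-master:latest
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434  # reach Ollama by service name
    depends_on:
      - ollama
volumes:
  ollama:
```

The important part is that containers on the same compose network resolve each other by service name, so the scraper should point at `http://ollama:11434` rather than `localhost`.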

  • @shankar9063 19 days ago

    I'm trying to collect multiple headers and their data. Since the headers are large paragraphs, the LLM couldn't provide the full paragraph data. How do I handle this issue?

  • @VaibhavShewale 2 months ago +1

    Why didn't you add the part number? I'm watching the last part directly.

  • @DRMEDAHMED 2 months ago

    Could the computer control from Anthropic be useful in this context? And how would it be integrated, a Linux VM or just a modified browser?

  • @towhidurrahman8961 2 months ago

    I have a question about "Enable Attended Mode." Is it possible to render the Chrome browser, along with the opened tab, on the frontend while performing web scraping?

  • @hichemlaribi252 2 months ago +1

    Thanks for the video! Can you upload your Dockerfile to the GitHub repo please?

  • @knmplace 2 months ago

    In the Docker version I'm getting the Chrome error shown in your video. Wouldn't that be a dependency built into the image when it's built, or do we need to install it into the container itself? Great app! Can't wait to try it out.

    • @redamarzouk 2 months ago

      I've run the Docker container on a different machine with no ChromeDriver and it worked.
      The code itself is responsible for installing the ChromeDriver. If it still doesn't work for you, please download ChromeDriver and put it in the corresponding folder in your project root.

    • @knmplace 2 months ago +1

      @@redamarzouk Great, thank you, I will give it another try, starting from scratch, and update you. Thank you again.

  • @iiiBog 2 months ago

    Does it work on a Linux Docker server (I've tried with no luck)?
    Also, it would be nice to have a web UI field for the local AI server URL, for when Ollama/AI Studio is deployed on a different server in the local network.

  • @apsaraG-k7r 1 month ago

    I was trying with soup and selenium to understand basic scraping. On the review page, when I click "see more reviews", Amazon asks me to log in to continue. How do I handle this? Any suggestions?

    • @redamarzouk 1 month ago

      in this case you'll have to use attended mode and log in yourself.
      but adding a session persistence feature to Scrape-Master would be neat

  • @tommynguyen4253 2 months ago

    I have a list of keywords in Excel, and I want to put each one into the search box, then do the scraping.
    Is that possible with the tool?
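One way to approach the keyword workflow above, assuming the Excel sheet is exported to CSV first: read one keyword per row and build a search URL per keyword to feed into the scraper. The URL template and column name are placeholder assumptions:

```python
import csv
from urllib.parse import quote_plus

def keywords_from_csv(path, column="keyword"):
    """Read non-empty keywords from one column of a CSV export."""
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f) if row[column].strip()]

def search_urls(keywords, template="https://example.com/search?q={}"):
    """Build one URL-encoded search URL per keyword."""
    return [template.format(quote_plus(k)) for k in keywords]
```

Each resulting URL can then be submitted to the scraper as a separate job, instead of typing keywords into the search box by hand.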

  • @pejdaniel-l2o 1 month ago

    Great job. Could you please record a video on how we can deploy this app in the Azure cloud?

  • @Kevinsmithns 1 month ago

    What if you want to scrape all B2B info on SERPs, will it work?

  • @tylertheeverlasting 2 months ago +1

    Add claude computer use to attended mode to make that part automated

    • @redamarzouk 2 months ago

      You know, that's actually a great idea. The only issue is that in the first video I used GPT-4o, if I remember correctly, and the price was a bit high for every scrape.
      I got grilled in the comments by people saying that it's too expensive and that this AI scraping approach will never touch the traditional way of scraping.
      So great idea, but I'm waiting on Claude to launch the next cheaper version, and then let's see.

  • @benmoussaimane6902 1 month ago +1

    Please can you do a video on how to use Playwright and AgentQL for scraping? Thanks sir

  • @asanadaniel497 10 hours ago

    Where can I find the app or website please?

    • @redamarzouk 3 hours ago

      sadly you can only use it if you set it up yourself on your machine.
      I'm preparing an app that can be used as a service; that will be coming soon.

  • @nitzanbegger6250 2 months ago

    Does pagination work?

  • @toufiqqureshi4668 1 month ago

    Please tell me how I can give you my code to get a review and recommendations from you.

    • @redamarzouk 1 month ago

      I have a Discord channel link in the description, but since I have a full-time job it's really hard to keep up with all the comments and requests.

  • @s6yx 2 months ago

    Why no Claude?

  • @hamburger--fries 2 months ago

    The issue is unstructured data, such as an H1 field with no explanation. This happens with old HTML-1.

    • @redamarzouk 2 months ago

      the scraper doesn't expect structured data in the input; it takes the whole website and structures it according to the fields you expect.

    • @hamburger--fries 2 months ago

      @@redamarzouk For example, if I was scraping a PDF or an image I'd need additional Python libraries, and as a base I'd need BeautifulSoup.

  • @toufiqqureshi4668 1 month ago

    Brother, I need your help. I work for a revenue management company that has around 200 hotels as clients, and they want to do competition analysis for their clients' hotels against competing hotels. So they want a web scraping project that scrapes data for those 200 hotels from MakeMyTrip at the same time, and they have proxies as well. I have written a code script, but no one has reviewed it or told me how it could be better. Please help me, bro.

  • @lanvinpierre 2 months ago

    Why is it asking for OpenAI keys if I'm trying to use Llama locally?

    • @redamarzouk 2 months ago

      For this final version I didn't even bother trying the local model, since I've been scraping Amazon, eBay and other websites with really long token contexts.
      I'll check the code to see if it needs an OpenAI API key even with a local model.

    • @lanvinpierre 2 months ago

      @@redamarzouk Thanks! I have no idea about coding but have managed to use Ollama and AnythingLLM, so anything that could work with that API would help a lot of users. Thank you!

  • @GundamExia88 2 months ago

    Docker Compose should be the way to go!!! =) Btw, great video, good job!! I wonder, if using Docker, whether you could use something like Kasm for unattended mode that brings up a browser in Kasm.

  • @AndersNøhrHolmstrøm 2 months ago

    Amazing! Now make it Mac 🥳

    • @AndersNøhrHolmstrøm 2 months ago

      According to ChatGPT I can run it in Docker on my Mac. Wish me luck!

    • @V3racious3 1 month ago

      @@AndersNøhrHolmstrøm According to an idiot, you're an idiot.