How to Scrape Websites Like a Pro Using DeepSeek (Beginner-Friendly)

Поділитися
Вставка
  • Опубліковано 6 лют 2025
  • In this video, I dive into the world of web scraping using DeepSeek and show you how incredibly affordable it can be. We'll start by setting up DeepSeek, integrating it with the open-source crawler Crawl for AI, and then move on to scraping a website to get structured data in no time. Along the way, I break down the costs, compare token usage with different language models, and explain why DeepSeek is a game-changer for startups that rely on consistent, reliable, and cheap data scraping.
    You'll see step-by-step instructions on configuring the DeepSeek API, creating your key, and using the crawler to extract data like a pro. I also highlight some of the cool features of Crawl for AI, like excluding external links, handling iframes, and configuring prompts to get super-precise results. At the end, we scrape the leaderboard from Chatbot Arena to demonstrate the power of this setup, resulting in structured JSON data that's perfect for databases or frontend applications.
    Disclaimer:
    While I mention in the intro that this "feels illegal," it's important to clarify that scraping is not inherently illegal. However, you must always review and adhere to the policies of the websites you scrape. Be responsible with the data you collect, and ensure you’re not violating terms of service or ethical guidelines. Use tools like these wisely and respectfully.
    For more context, the leaderboard used in this demonstration is from the open-source project Chatbot Arena (huggingface.co..., and the scraping of (web.lmarena.ai...) was done purely for demonstrational purposes.

КОМЕНТАРІ • 16

  • @thegreninja7675
    @thegreninja7675 15 днів тому +2

    Damn, its like hindi dub of Leonardo Grigorio video on this Deepseek AI for scraping.

    • @CodeLit-Ahaskar
      @CodeLit-Ahaskar  15 днів тому

      Haha it is, thanks! Do you recommend how I can improve?

  • @titusfx
    @titusfx 16 днів тому +1

    Hi, you could use similar soft to adobe noise reducer. Is free for X amount of minutes, you can split the audio and then concatenate again. Or use a local model to do it. No need to buy a mic in your earlies steps. But is true that all youtubes says that good audio is important. 💪

  • @epic_miner
    @epic_miner 19 днів тому

    😮😮😮 great bro

  • @abhaykumar3548
    @abhaykumar3548 2 дні тому +1

    can you make a video on the process from scatch to feeding those scrap data to an AI agent and asking question as they would be subject expert through the scrap data. My goal is to build an AI agent(free) and feed the data from a website and ask query related to that.

    • @CodeLit-Ahaskar
      @CodeLit-Ahaskar  2 дні тому

      Sure thing bro!
      It’s actually a nice idea,
      I’ll ping you here after posting!

    • @CodeLit-Ahaskar
      @CodeLit-Ahaskar  53 хвилини тому

      Here you go Abhay:
      ua-cam.com/video/NNvs2cdZyyc/v-deo.html

  • @ItsGauravPundir
    @ItsGauravPundir 17 днів тому +2

    This is cool can you make some videos AI agents? Also long videos with step by step things will be great ❤
    Also do add shorts to the Chanel to grow it quickly.

  • @whisky961
    @whisky961 18 днів тому +1

    Title & description is in english.
    Your into started in eglish.
    Then you started transitioning into a different language. Come on.

    • @CodeLit-Ahaskar
      @CodeLit-Ahaskar  17 днів тому +1

      Hey man, I am so sorry I didn't specify in the thumbnail. My videos are for hindi speakers. Sorry for any inconvenience caused.
      Here’s a quick overview of the content:
      Deepseek is way cheaper than open ai. It just charges 0.014$ for 1 million tokens. But do not think this 1 Million tokens is a lot, as many companies are scraping tons of websites every minute. Also web scrapers crawl the html and look for hyperlinks as well.
      Now, I discuss how crawl4ai is a library which we can use for web scraping purposes.
      This is the complete code I have used in the video:
      github.com/Ahaskar04/crawl4ai
      Just update with your deepseek api key and you are good to go.
      You can get your deepseek api key from here:
      (considering how effective this is, a recharge of 2$ will last a really long time)
      platform.deepseek.com/api_keys
      At the end of the video I show a demo of this code by scrapping this website:
      web.lmarena.ai/leaderboard
      You can define the url you want to scrap in line 9 of the given code.
      Lemme know if you have any doubts in the code, I am more than happy to help you out!
      Text me on instagram(@code_lit25) or here in the comments :)
      All the best!!

    • @whisky961
      @whisky961 17 днів тому +1

      @CodeLit-Ahaskar it's all good man. It's just a little annoying and I've seen a lot of videos doing the exact same thing.
      But thanks for the clarification and good luck with your next videos.

  • @akashkumargupta9807
    @akashkumargupta9807 16 днів тому

    Bro buy a mic