If you want to build robust RAG applications based on your own datasets, this is for you: prompt-s-site.thinkific.com/courses/rag
Thanks for mentioning ScrapeGraphAI! I'm one of the co-founders. We have implemented new features like a code generator for scraping, to minimize the number of LLM calls on sites that share a structure across different pages. We are also preparing something big related to knowledge graphs (KG), stay tuned :))))
Thanks for mentioning Crawl4AI! I'm adding some new features, such as extracting all media tags (video, image, audio), Breadth-First Search (BFS) crawling, and more. I do this with the aim of generating quality data without relying on large language models (LLMs). I think firing up GPUs and a model with billions of parameters just to crawl data from a page is a bit over the top. Developers can use LLMs themselves once they have the right raw data from web sources.
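For readers curious what BFS crawling means in practice, here is a minimal sketch of the traversal itself, using a made-up in-memory link graph instead of real HTTP fetches (in a real crawler, `get_links` would download each page and extract its `<a href>` targets):

```python
from collections import deque

# Mocked link graph standing in for real fetches; purely illustrative.
LINKS = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/about": ["/"],
    "/blog/post-1": [],
    "/blog/post-2": ["/blog"],
}

def bfs_crawl(start, get_links, max_depth=2):
    """Visit pages level by level, never revisiting a URL."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # don't enqueue links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

print(bfs_crawl("/", LINKS.__getitem__))
# ['/', '/blog', '/about', '/blog/post-1', '/blog/post-2']
```

The `seen` set is what keeps a crawl from looping forever on pages that link back to each other.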
Crawl4AI is shaping up pretty nicely. I will do a deep dive on it.
Yes PLEASE, do videos on Crawl4AI and ScrapeGraphAI! And thank you for everything you do and your time 🙏
Yes, it's on my list.
I just use Selenium WebDriver and JavaScript or jQuery to interact with pages and get the parts I want. If they use Cloudflare or other bot blocking, you can run JS in the console, use the `copy` command, and paste the result into a text file.
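For anyone who wants the same "grab only the parts I need" effect without driving a browser, here is a stdlib-only sketch that pulls the text of chosen tags out of already-downloaded HTML (tag names and the sample page are just illustrations; with Selenium you would feed in `driver.page_source`):

```python
from html.parser import HTMLParser

class TagTextExtractor(HTMLParser):
    """Collect the text inside every occurrence of the given tags."""
    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0          # > 0 while inside a wanted tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html, tags=("h1", "p")):
    parser = TagTextExtractor(tags)
    parser.feed(html)
    return parser.chunks

sample = "<h1>Title</h1><nav>skip me</nav><p>Body text.</p>"
print(extract_text(sample))  # ['Title', 'Body text.']
```

The same idea is what a `document.querySelectorAll(...)` snippet in the browser console gives you, just done in Python after the fact.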
Is there any learning path you can recommend? I'm generating reports from a website using Python and looking for an alternative. Thanks in advance.
For Jina Reader, the API key is free for 1 million tokens, which was about 570 sites for me. After that, $10 buys 500 million tokens' worth, about 250k sites, which is totally insane. Just pay the tiny amount for much better rate limits.
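Working through those numbers (all figures taken from the comment above; actual Jina pricing may differ), the free tier's implied rate puts the paid tier at roughly 285k sites, the same ballpark as the 250k quoted:

```python
# Figures from the comment; check Jina's pricing page for current rates.
free_tokens = 1_000_000
sites_on_free_tier = 570
tokens_per_site = free_tokens / sites_on_free_tier       # ~1,754 tokens/site

paid_usd = 10
paid_tokens = 500_000_000
sites_on_paid_tier = paid_tokens / tokens_per_site       # ~285,000 sites
usd_per_1000_sites = paid_usd / (sites_on_paid_tier / 1000)

print(round(tokens_per_site), round(sites_on_paid_tier), round(usd_per_1000_sites, 3))
```

Roughly 3.5 cents per thousand sites, which is why paying beats staying on the free tier.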
Great review. Please do a review on ScrapeGraphAI. Maybe a comparison to Uncle Code's Crawl4AI? I like Crawl4AI and hope UC incorporates PDF options.
thanks, yes, both of them are on my TODO list.
Thank you so much for sharing this valuable information. It is absolutely helpful.
Glad it was helpful!
Super handy, thanks 🙏
Nice comparison! Please continue work on scraping for AI applications. Hot topic!
thanks, will do
Scrapegraph is pretty amazing, highly recommended
Can you make a detailed video on scrapegraphai? It’s kinda buggy right now for me
Thank you so much for sharing this valuable information. It is absolutely helpful. But as far as Jina AI is concerned, is it possible to specify in the code the number of pages that I want to scrape? Sometimes the PDF file has more than 500 pages.
I am not sure. Their API seems to be very simple, and I haven't noticed any customization options yet.
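For what it's worth, my understanding is that Jina's Reader is called by prefixing the target URL with `https://r.jina.ai/` and optionally passing an API key as a bearer token. A minimal sketch (double-check the endpoint and header against their docs; I haven't seen a page-limit parameter either, and the key value here is a placeholder):

```python
from urllib.request import Request

READER_PREFIX = "https://r.jina.ai/"

def reader_request(target_url, api_key=None):
    """Build a Reader request; responses come back as LLM-friendly markdown."""
    req = Request(READER_PREFIX + target_url)
    if api_key:  # optional: raises the free-tier rate limits
        req.add_header("Authorization", f"Bearer {api_key}")
    return req

req = reader_request("https://example.com/docs", api_key="jina_PLACEHOLDER")
print(req.full_url)  # https://r.jina.ai/https://example.com/docs
```

Pass the built request to `urllib.request.urlopen(req)` (or the equivalent in `requests`) to actually fetch the markdown.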
Crawl4ai sounds perfect!
Thank you. It would be great if you could dive deeper into ScrapeGraph, specifically the knowledge graph feature.
thanks, will look into it.
I need these materials very much. Can you share the code and API, brother?
link to the notebook is in the video description.
@engineerprompt Thanks, this is crucial... best to you, dude.
Do any of these solutions work on sites you have to log in to? You can give them a url, but if the site requires you to log in, you will not be able to scrape further.
Good question, I am not sure. You might have to add authentication to these yourself.
@engineerprompt If any of these solutions are Chromium-based, then one could load the page, go through the authentication process, and select the page to be scraped, then invoke the scraping tool.
Probably a silly question, but how is all this complicated process better than doing a simple copy-paste from the URL?
There are a couple of reasons.
1. Even if you were to just copy and paste, in most cases you would not preserve the structure; there will be tables, images, etc., which will mess up the formatting.
2. Even if copy-paste were to give you perfect results, you can't scale that to 100s or 10,000s of webpages. With these automated tools, you just need to provide a list of URLs and they will be able to parse at scale.
Good luck copy-pasting and cleaning millions of pages for LLM feeding.
Also, good luck with manual updating :)
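The "parse at scale from a list of URLs" point above can be sketched with a thread pool and a pluggable fetcher. Everything here (`scrape_all`, `fake_fetch`) is illustrative, with a stand-in fetcher so the sketch runs offline:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_all(urls, fetch, max_workers=8):
    """Fetch many pages in parallel; `fetch` is any url -> text callable
    (requests.get, a Jina Reader call, a headless browser, ...)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))

# Stand-in fetcher so the sketch runs offline; swap in a real one.
def fake_fetch(url):
    return f"<markdown for {url}>"

urls = [f"https://example.com/page/{i}" for i in range(5)]
pages = scrape_all(urls, fake_fetch)
print(len(pages))  # 5
```

The point is the shape, not the fetcher: once fetching is a function of a URL, scaling from one page to ten thousand is just a longer list.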
brilliant 🙌🙌
thanks :)
Thanks a lot! :)
Are there any scrapers available for LinkedIn and Instagram?
I am not aware of any.
great thx!
The android in the thumbnail looks like he's DJing. Like he's ready to drop a sick beat...NOW!
We must create order from the messiness! 😎🤖
Agree :)