I Made a FAST Search Engine
- Published 10 Jun 2024
- Get $15 free credits with BrightData: brdta.com/conaticus1
BrightData YouTube Channel: @BrightData
TF-IDF Blog Post: janav.wordpress.com/2013/10/2...
Lemmatization Word Lists: github.com/michmech/lemmatiza...
Crawler Repository: github.com/conaticus/search-e...
API Repository: github.com/conaticus/search-e...
Client Repository: github.com/conaticus/search-e...
Discord: / discord
Github: github.com/conaticus
Twitter: / conaticus
Join this channel to get access to perks:
/ @conaticus
0:00 Intro
0:20 BrightData
2:10 Inverse Term Frequency & Indexing
6:41 Page Ranking & Lemmatization - Science & Technology
Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1
no
no
no
no thanks
no
I don't know what this guy said, and I was still mind-blown by all the effort he puts in
Thanks much so 🙏 It would not be possible without your support
I’m impressed, can’t wait to see you build a multithreaded web server in assembly
Why do I find it super funny 😅😅😅.
@@da40au40 Me too :D
it's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of webpages.
@@DanskeCrimeRiderTV google also wastes time deciding whether or not you are allowed to see certain sites
@@KibitoAkuya what does that have to do with anything? Google is still faster at querying trillions of results than this.
That's really impressive, I can't even figure out how to run it.
Nice username
Just added some instructions to the READMEs if you're interested :)
@@conaticus thanks, I'll do that
SERBIA MENTIONED 🎉🎉🎉
Now waiting for Russia 🥰
@@europa_the_last_battle>goes to comments
>sees meme comment
>looks at replies
>only a LARPer replied
lol
@@RealMephres this aint 4chan nga
that name rings a bell, maybe from some kind of Serbian movie?
@@MAXHASS-ph5ib tell that to the LARPer dawg
Love your content. You and your quality have really improved. Keep it up ❤
Thanks so much, your support means a lot ♥
7:40 flashing those questionable websites in a sponsored video is quite the move
You scared of porn?
This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own
The problem is this approach is susceptible to SEO spamming/invisible SEO keywords
Yeah for sure, realistically it should be moderated based on user interaction as well
Nice video and nice code, keep up the good work!
filter out JS for another 10x bandwidth savings
alternatively use an adblocker. (can puppeteer do that? It's just chromium right?)
This is very impressive, what was the size of the database when indexing is finished? Seems like it would be quite big
Let's go another conaticus video
Love this dude and his video projects
🙏
Subscribed & notifications on :)
you deserve more recognition bruh
Google also does the same but with distributed computing to reduce the overall time.
Just scale the database horizontally and mimic Google's approach
Why did you choose TF-IDF instead of word2vec or any context aware model?
+1 Would like to know
such a cool video! i love the way how you explain what you are doing :)
random question but what is your editor font?
Appreciate it :) I'm using Jetbrains Mono it's free to download
Please finish your file explorer in rust fully, because the idea of it is awesome. Love your videos, content is very engaging 🎉
great video, gave me ptsd from my information retrieval class though
Awesome effort ✨
thats insane, hows this only at 12k views
You can use a Chrome-like TLS config to avoid getting blocked by Cloudflare in a lot of cases; using a browser for scraping isn't viable when talking about scanning the internet.
how much did you pay for the web scraping service in total?
is this engine online (or could it be put online for other users) so others could enjoy it too?
or was it just a peek, or something you made because you were bored?
Supa dope. I would like to use this search engine of yours
How much did the scraping cost if it wasn't free?
>goes to youtube homepage
>finds this video
>yipeee
>oh
>lets try it
Impressive, seriously!
I believe it's "inverted indexing", as inverse indexing is something else.
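For anyone unfamiliar with the term: an inverted index maps each term to the set of documents that contain it, so a query only touches the postings for its own words. The following is a minimal Python sketch, not the code from the video's repositories; the function names, the whitespace tokenizer, and the sample pages are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(pages):
    """Map each lowercase token to the set of URLs whose text contains it.

    `pages` is a hypothetical {url: text} dict; tokenizing on whitespace is a
    simplification (no punctuation stripping, no lemmatization).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for token in text.lower().split():
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term of the query (boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Made-up sample data for illustration only.
pages = {
    "a.example/rust": "Rust is a fast systems language",
    "b.example/search": "Building a fast search engine",
    "c.example/cooking": "Slow cooking recipes",
}
index = build_inverted_index(pages)
```

With this sample data, `search(index, "fast")` returns the two URLs whose text contains "fast", and adding more terms narrows the result by intersecting posting sets.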
Super good editing 🫡🫡🫡🫡
Would not be possible without your breathtaking animations 😄
Well of course it is very fast, it only has like 200 websites
3:07 Best pronunciation of Euclidean I have ever heard :P
Where?
@@CrazyDiamondo I added a timestamp
ain't see rust there!
very nice, built something similar for my info retrieval class. we have to use okapi bm25 formula for the ranking but overall very similar. scrape, tokenize, parse, inverted index, rank
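As a rough illustration of the Okapi BM25 ranking mentioned above, here is a short Python sketch. It uses the common textbook form of the formula with the non-negative IDF variant, and the defaults `k1=1.5` and `b=0.75` are conventional choices, not values taken from the video or from the commenter's course. The function name and the sample documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with BM25.

    docs:  list of token lists (assumed already tokenized and normalized)
    query: list of query tokens
    """
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        counts = Counter(d)
        score = 0.0
        for term in query:
            f = counts[term]
            if f == 0:
                continue  # the term contributes nothing to this document
            # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1)
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


# Made-up sample data for illustration only.
docs = [
    ["rust", "search", "engine"],
    ["rust", "rust", "web", "server"],
    ["cooking", "recipes"],
]
scores = bm25_scores(docs, ["rust", "search"])
```

Compared with plain TF-IDF, the `k1` term caps how much repeated occurrences of a word can raise the score, and `b` controls how strongly long documents are penalized.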
How did you manage to get a node.js memory leak??
🔥🔥🔥
Programming 🤝 martincitopants…match made in heaven
yk what would be funny? making the slowest search engine possible without like halting the program for a set time, just with maths
oh my fuck i saw this on your github last night
W ad plug, it's 100% relevant and actually necessary to fulfill the premise of this vid.
Bro managed to memleak in js
Good! The world needs a new Google Search, one that's more like how it was in the 2000s.
Nice job :D
Remember, never return an over 18 site without an over 18 word in the search request
what should I know or learn to create projects like these?
HTML for website creation
CSS page designing
Javascript for making website dynamic and for backend
SQL for indexing
Rust for fast backend services
Rewrite your genetic code in Rust.
i would rather be bug free so i will pass
Create your own database engine for shits and giggles
B+Trees 💀
first time watching a vid of yours ...
i have one question : why are you vibrating ??
Cause he is vibrator
i love this channel
Next time use the Common Crawl dataset ;)
Now make your own email system to go along with it. 😉
Nice, you re-invented the lucene library
🍎 👀
.. Apple being like "when will it be ready?".
You should host it
Lol. Got notif after clicking the video.
good vid
you seem ok
how can i install this search engine?
Instructions are on the Github repos :)
If only windows file explorer could do the same
For this we have thing named Everything :)
how do you edit your vids
Allen uses adobe after effects for the amazing animations - I just use Davinci to cut things up 😁
@@conaticus ok thx
nice
da goat
Auto solve captcha you say🧐
at a desert
"some fucking genius" lmao
discord clone when
🔥🔥🔥
I was looking for that algorithm and didn't know its name.
Bro make a compiler programming language
I found a worthy opponent
why disallow and user-agent matter? can't you just scrap everything?
You can but it might be illegal
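For context, `Disallow` and `User-agent` are the robots.txt directives a site uses to say which paths which crawlers should stay out of. A small Python sketch of checking them with the standard library's `urllib.robotparser` follows; the robots.txt content, the crawler names, and the URLs are invented for illustration, and the video's crawler (written in JavaScript) may handle this differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from
# https://<host>/robots.txt before fetching any other page on that host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def make_parser(robots_txt):
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

# The generic "*" rules apply to any crawler that has no dedicated section.
allowed_home = parser.can_fetch("MyCrawler", "https://example.com/index.html")
allowed_private = parser.can_fetch("MyCrawler", "https://example.com/private/data")

# A crawler named in its own section follows that section instead.
allowed_badbot = parser.can_fetch("BadBot", "https://example.com/index.html")
```

A polite crawler would skip any URL for which `can_fetch` returns `False` for its own user-agent string.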
Good
Liked and subbed
@google acquire this man
6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀
Why is there Rust in the thumbnail? This was written in Javascript
Used Rust for the API and TF-IDF matching - decided not to keep in much of the footage for that as it was already explained in the animations
Cant wait for you to rewrite JS in binary 🎉🎉
shockedd
Bro sounds like WilburSoot
hub 🎉🎉
What did u mean by the websites u shouldn’t have searched
MAKE LONGER VIDEOS
What are the consequences of scrapings sites you aren't allowed to?
Probably not much on its own as long as you're not violating copyright - however it is courteous not to scrape sites forbidden by the robots.txt
wastes their resources and yours
1:06 automatically solve captchas? i knew these things exist just to waste our time and energy
damn
what TF is IDF ?!!
idk man but watching it makes me feel smart
Term frequency (how often a given word shows up in a specific document) - inverse document frequency (a measure of how rare the word is across all documents, so words that appear in nearly every document count for less). The wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
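To make the definition concrete, here is a minimal Python sketch of one common TF-IDF variant: term frequency as count divided by document length, and IDF as the natural log of (number of documents / number of documents containing the term). Other weightings exist, and this is not necessarily the variant used in the video. The function name and sample documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs: list of token lists (assumed already tokenized and normalized).
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # TF (share of this document) times IDF (rarity across documents).
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Made-up sample data for illustration only.
docs = [
    "the fast search engine".split(),
    "the slow web crawler".split(),
    "the search index".split(),
]
scores = tf_idf(docs)
```

In this sample, "the" appears in every document, so its IDF is log(1) = 0 and it scores zero everywhere, while a word unique to one document such as "fast" scores higher than "search", which appears in two.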
rust is a real badass❤❤
Great video 😊
FYI: bright data is an Israeli company 😮
Make a better version of VSCode.
105
then brightdata makes captchas useless
Captcha's effectiveness has been in question for quite some time now.
You made a search engine for porn?! Thats disgusting... is it on GitHub?! 👀
All open source and ready to play around with 😂
we had a hackathon where we basically had to implement TF/IDF - also a search engine of a sort, but for files. we did the interface in python and all mathematics processing in C++. It would have been a fun experience if not for the time limit. we struggled really hard, on test data our solution worked faster by an order or two than most other participants, but... we somehow failed on the exam data. we failed fucking IO. and won nothing. I fucking hate hackathons since then. fuck IDF.
also maybe this happened because i had written 75% of the code, while 4 other members did almost nothing. It was (their) responsibility to handle IO, and mine to handle mathematics and processing. I hate working in teams. I know noone cares but i might as well just burst out all of the rage I have towards that experience. once again, fuck team work, fuck hackathons, fuck my teammates, fuck everything and everyone
skill issue
@@skorp5677 exactly
This is just an ad for BrightData. Compared to previous videos very low effort.
Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi
Thank you, I'll do this if I ever pronounce it again 😂
SRBIJAAAAAA
this result dont make any sense xha... very fast
Still not fast and scalable enough. The result is not even relevant, you made bing not google
wow really? Im also surprised one single guy didnt manage to make a product rivaling Google
how is this impressive? Of course it's gonna be faster. You aren't querying billions or even trillions of web pages unlike Google? So this search engine isn't even faster than Google...
It wasn't meant to be impressive it was meant to be informative and entertaining 👍
@@conaticus your thumbnail implies it is faster than Google. And I believe the original title did too.
You need to learn how to sync up your audio and video.