I Made a FAST Search Engine

  • Published 10 Jun 2024
  • Get $15 free credits with BrightData: brdta.com/conaticus1
    BrightData YouTube Channel: @BrightData
    TF-IDF Blog Post: janav.wordpress.com/2013/10/2...
    Lemmatization Word Lists: github.com/michmech/lemmatiza...
    Crawler Repository: github.com/conaticus/search-e...
    API Repository: github.com/conaticus/search-e...
    Client Repository: github.com/conaticus/search-e...
    Discord: / discord
    Github: github.com/conaticus
    Twitter: / conaticus
    Join this channel to get access to perks:
    / @conaticus
    0:00 Intro
    0:20 BrightData
    2:10 Inverse Term Frequency & Indexing
    6:41 Page Ranking & Lemmatization
  • Science & Technology

COMMENTS • 176

  • @conaticus
    @conaticus 2 months ago +39

    Start building awesome projects with $15 free credits using BrightData today: brdta.com/conaticus1

  • @lifeofme702
    @lifeofme702 2 months ago +294

    I don't know what this guy said, and I was still mind-blown by all the effort this guy puts in

    • @conaticus
      @conaticus 2 months ago +17

      Thanks so much 🙏 It would not be possible without your support

  • @jaymarksum6542
    @jaymarksum6542 2 months ago +286

    I’m impressed, can’t wait to see you build a multithreaded web server in assembly

    • @da40au40
      @da40au40 2 months ago +8

      Why do I find it super funny 😅😅😅.

    • @ArthursHD
      @ArthursHD 2 months ago +2

      @@da40au40 Me too :D

    • @DanskeCrimeRiderTV
      @DanskeCrimeRiderTV 2 months ago +2

      it's not impressive. Of course querying a few hundred or even a hundred thousand web pages isn't as complicated or slow a task as querying trillions of web pages.

    • @KibitoAkuya
      @KibitoAkuya 2 months ago

      @@DanskeCrimeRiderTV Google also wastes time deciding whether or not you are allowed to see certain sites

    • @DanskeCrimeRiderTV
      @DanskeCrimeRiderTV 1 month ago +1

      @@KibitoAkuya what does that have to do with anything? Google is still faster at querying trillions of results than this.

  • @asm_x86
    @asm_x86 2 months ago +72

    That's really impressive, I can't even figure out how to run it.

    • @ZuperPotato
      @ZuperPotato 2 months ago +9

      Nice username

    • @conaticus
      @conaticus 2 months ago +17

      Just added some instructions to the READMEs if you're interested :)

    • @asm_x86
      @asm_x86 2 months ago +4

      @@conaticus thanks, I'll do that

  • @greensporevalley
    @greensporevalley 2 months ago +407

    SERBIA MENTIONED 🎉🎉🎉

    • @europa_the_last_battle
      @europa_the_last_battle 2 months ago +12

      Now waiting for Russia 🥰

    • @RealMephres
      @RealMephres 2 months ago +16

      @@europa_the_last_battle >goes to comments
      >sees meme comment
      >looks at replies
      >only a LARPer replied
      lol

    • @MAXHASS-ph5ib
      @MAXHASS-ph5ib 2 months ago +20

      @@RealMephres this aint 4chan nga

    • @jawadmansoor6064
      @jawadmansoor6064 2 months ago +1

      that name rings a bell, maybe from some kind of Serbian movie?

    • @RealMephres
      @RealMephres 2 months ago +5

      @@MAXHASS-ph5ib tell that to the LARPer dawg

  • @coderx8634
    @coderx8634 2 months ago +27

    Love your content. You and your quality have really improved. Keep it up ❤

    • @conaticus
      @conaticus 2 months ago +2

      Thanks so much, your support means a lot ♥

  • @ccost
    @ccost 2 months ago +62

    7:40 flashing those questionable websites in a sponsored video is quite the move

  • @coderan5029
    @coderan5029 1 month ago +1

    This is basically what we learned in my big data class, but we used map-reduce to do the TF-IDF calculations, so it's impressive you figured this out on your own

  • @devinlauderdale9635
    @devinlauderdale9635 2 months ago +34

    The problem is this approach is susceptible to SEO spamming/invisible SEO keywords

    • @conaticus
      @conaticus 2 months ago +11

      Yeah for sure, realistically it should be moderated based on user interaction as well

  • @rafaelpereiracoias1047
    @rafaelpereiracoias1047 1 month ago +1

    Nice video and nice code, keep up the good work!

  • @6IGNITION9
    @6IGNITION9 2 months ago +6

    filter out JS for another 10x bandwidth savings
    alternatively use an adblocker. (can puppeteer do that? It's just chromium right?)

  • @polyshrub
    @polyshrub 2 months ago +2

    This is very impressive, what was the size of the database when indexing is finished? Seems like it would be quite big

  • @ExpandedCuber
    @ExpandedCuber 2 months ago +5

    Let's go another conaticus video

  • @foqsi_
    @foqsi_ 2 months ago +2

    Love this dude and his video projects

  • @MySachincool
    @MySachincool 1 month ago

    Subscribed & notifications on :)
    you deserve more recognition bruh

  • @R_Y_Z_E_N
    @R_Y_Z_E_N 1 month ago +1

    Google also does the same, but with distributed computing to reduce the overall time.
    Just scale the database horizontally and mimic Google's approach

  • @SG-kn2jl
    @SG-kn2jl 2 months ago +5

    Why did you choose TF-IDF instead of word2vec or any context aware model?

    • @skorp5677
      @skorp5677 2 months ago +1

      +1 Would like to know

  • @GermanTimecrafter
    @GermanTimecrafter 2 months ago +1

    such a cool video! i love the way you explain what you are doing :)
    random question but what is your editor font?

    • @conaticus
      @conaticus 2 months ago

      Appreciate it :) I'm using JetBrains Mono, it's free to download

  • @turb0004
    @turb0004 2 months ago +1

    Please finish your file explorer in rust fully, because the idea of it is awesome. Love your videos, content is very engaging 🎉

  • @madalenaferreira3018
    @madalenaferreira3018 1 month ago

    great video, gave me ptsd from my information retrieval class though

  • @iritesh
    @iritesh 2 months ago

    Awesome effort ✨

  • @stayhappy-forever
    @stayhappy-forever 2 months ago +2

    thats insane, hows this only at 12k views

  • @miro5182
    @miro5182 1 day ago

    You can use a Chrome-like TLS config to avoid getting blocked by Cloudflare in a lot of cases; using a browser for scraping isn’t viable when talking about scanning the internet.

  • @yorailevi6747
    @yorailevi6747 2 months ago

    how much did you pay for the web scraping service in total?

  • @maksymilianglowacki1409
    @maksymilianglowacki1409 1 month ago

    is this engine online, or (would it be possible to put it online for other users) so others could also enjoy it?
    or was it just a peek or something you made because (you were bored or something)

  • @a6gitti
    @a6gitti 2 months ago

    Supa dope. I would like to use this search engine of yours

  • @thekwoka4707
    @thekwoka4707 2 months ago

    How much did the scraping cost if it wasn't free?

  • @synapsenova299-fp7tf
    @synapsenova299-fp7tf 2 months ago

    >goes to youtube homepage
    >finds this video
    >yipeee
    >oh
    >lets try it

  • @Nerdimo
    @Nerdimo 2 months ago

    Impressive, seriously!

  • @jsalsman
    @jsalsman 2 months ago

    I believe it's "inverted indexing", as inverse indexing is something else.
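For readers unfamiliar with the term, an inverted index maps each term to the documents that contain it, so a query only touches the documents listing its terms. The following is a minimal sketch, assuming a regex tokenizer and made-up sample documents; the names `tokenize` and `build_inverted_index` are hypothetical and not taken from the video's repositories.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase alphanumeric runs are good enough as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: number of occurrences in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


# Hypothetical sample documents, for illustration only.
docs = {
    "a": "Rust is fast",
    "b": "Search engines index pages fast",
    "c": "Rust search engine",
}
index = build_inverted_index(docs)
print(sorted(index["fast"]))  # ids of documents containing "fast"
print(sorted(index["rust"]))  # ids of documents containing "rust"
```

Note that "engines" and "engine" end up as separate terms here, which is the gap lemmatization is meant to close.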

  • @allenfpascua
    @allenfpascua 2 months ago

    Super good editing 🫡🫡🫡🫡

    • @conaticus
      @conaticus 2 months ago

      Would not be possible without your breathtaking animations 😄

  • @lonelybookworm
    @lonelybookworm 2 months ago +3

    Well of course it is very fast, it only has like 200 websites

  • @80sVectorz
    @80sVectorz 2 months ago +1

    3:07 Best pronunciation of Euclidean I have ever heard :P

  • @gopallohar5534
    @gopallohar5534 1 month ago +1

    ain't see rust there!

  • @jugurtha292
    @jugurtha292 2 months ago +5

    very nice, built something similar for my info retrieval class. we have to use okapi bm25 formula for the ranking but overall very similar. scrape, tokenize, parse, inverted index, rank
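The ranking step mentioned in this comment can be sketched with the standard Okapi BM25 formula. This is an illustrative sketch only: it assumes documents are already tokenized, uses the common defaults k1 = 1.5 and b = 0.75, and uses the non-negative IDF variant ln((N - n + 0.5) / (n + 0.5) + 1). The function name `bm25_scores` and the sample data are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return scores


# Hypothetical tokenized documents, for illustration only.
docs = {
    "a": ["rust", "is", "fast"],
    "b": ["fast", "search", "engine", "in", "rust"],
    "c": ["cooking", "pasta", "at", "home"],
}
scores = bm25_scores(["rust", "search"], docs)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # document "b" matches both query terms, "c" matches none
```

Compared with plain TF-IDF, the k1 term makes repeated occurrences saturate, which also blunts simple keyword stuffing somewhat.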

  • @ethanstewart1011
    @ethanstewart1011 1 month ago

    How did you manage to get a node.js memory leak??

  • @dreamsofcode
    @dreamsofcode 2 months ago +11

    🔥🔥🔥

  • @alexmoses3215
    @alexmoses3215 1 month ago

    Programming 🤝 martincitopants…match made in heaven

  • @gammongaming9081
    @gammongaming9081 1 month ago

    yk what would be funny? making the slowest search engine possible without like halting the program for a set time, just with maths

  • @errplane_
    @errplane_ 2 months ago +5

    oh my fuck i saw this on your github last night

  • @user-xl2om2up2x
    @user-xl2om2up2x 1 month ago +2

    W ad plug, it's 100% relevant and actually necessary to fulfill the premise of this vid.

  • @HyperCodec
    @HyperCodec 2 months ago +2

    Bro managed to memleak in js

  • @MortonMcCastle
    @MortonMcCastle 2 months ago

    Good! The world needs a new Google Search, one that's more like how it was in the 2000s.

  • @gaimnbro9337
    @gaimnbro9337 2 months ago

    Nice job :D

  • @Raven-fu1zz
    @Raven-fu1zz 2 months ago

    Remember, never return an over 18 site without an over 18 word in the search request

  • @animeworld4775
    @animeworld4775 2 months ago

    what are the things I should know or learn to create projects like these?

    • @GONDWANA-de4od
      @GONDWANA-de4od 2 months ago +1

      HTML for website creation
      CSS page designing
      Javascript for making website dynamic and for backend
      SQL for indexing
      Rust for fast backend services

  • @AquaQuokka
    @AquaQuokka 2 months ago +19

    Rewrite your genetic code in Rust.

    • @pyyrr
      @pyyrr 2 months ago

      i would rather be bug free so i will pass

  • @joenutt1232
    @joenutt1232 2 months ago +3

    Create your own database engine for shits and giggles

  • @mahrezjanati3426
    @mahrezjanati3426 2 months ago

    first time watching a vid of yours ...
    i have one question : why are you vibrating ??

    • @-rate6326
      @-rate6326 2 months ago

      Cause he is vibrator

  • @carlitosdummy
    @carlitosdummy 2 months ago

    i love this channel

  • @callowaysutton
    @callowaysutton 1 month ago

    Next time use the Common Crawl dataset ;)

  • @SlimyFrog123
    @SlimyFrog123 2 months ago

    Now make your own email system to go along with it. 😉

  • @a224kkk
    @a224kkk 1 month ago +1

    Nice, you re-invented the lucene library

  • @gamedirection_us
    @gamedirection_us 2 months ago

    🍎 👀
    .. Apple being like "when will it be ready?".

  • @lazarusNoob
    @lazarusNoob 1 month ago

    You should host it

  • @larry_berry
    @larry_berry 2 months ago

    Lol. Got notif after clicking the video.

  • @TheRealMangoDev
    @TheRealMangoDev 2 months ago

    good vid

  • @fangg194
    @fangg194 2 months ago

    you seem ok

  • @daemonkisure2952
    @daemonkisure2952 2 months ago

    how can i install this search engine?

    • @conaticus
      @conaticus 2 months ago

      Instructions are on the Github repos :)

  • @monotonedevelopment
    @monotonedevelopment 2 months ago +1

    If only windows file explorer could do the same

    • @SandWire
      @SandWire 1 month ago +1

      For this we have a tool named Everything :)

  • @playtatus1758
    @playtatus1758 2 months ago

    how do you edit your vids

    • @conaticus
      @conaticus 2 months ago

      Allen uses Adobe After Effects for the amazing animations - I just use DaVinci to cut things up 😁

    • @playtatus1758
      @playtatus1758 2 months ago

      @@conaticus ok thx

  • @igrb
    @igrb 1 month ago

    nice

  • @dylhack
    @dylhack 2 months ago

    da goat

  • @humanontheinternet6510
    @humanontheinternet6510 1 month ago

    Auto solve captcha you say🧐

  • @thescratchguy428
    @thescratchguy428 2 months ago

    at a desert

  • @binpersonal
    @binpersonal 2 months ago +1

    "some fucking genius" lmao

  • @_DarkLiquid
    @_DarkLiquid 2 months ago +1

    discord clone when

  • @Serhii_Volchetskyi
    @Serhii_Volchetskyi 1 month ago

    🔥🔥🔥
    I was looking for that algorithm and didn't know its name.

  • @deepfan14
    @deepfan14 15 days ago

    Bro make a compiler programming language

  • @v037_
    @v037_ 1 month ago

    I found a worthy opponent

  • @Faeest
    @Faeest 2 months ago

    why do disallow and user-agent matter? can't you just scrape everything?

    • @skorp5677
      @skorp5677 2 months ago

      You can but it might be illegal

  • @Tech_Code127-76
    @Tech_Code127-76 2 months ago

    Good

  • @Macellaio94
    @Macellaio94 2 months ago

    Liked and subbed

  • @Xanmattauri
    @Xanmattauri 2 months ago

    @google acquire this man

  • @ALTERRAa8
    @ALTERRAa8 2 months ago

    6:08 nahhhhhhhhhhh whats bro even searching 💀💀💀💀

  • @sleepybraincells
    @sleepybraincells 2 months ago +3

    Why is there Rust in the thumbnail? This was written in Javascript

    • @conaticus
      @conaticus 2 months ago +2

      Used Rust for the API and TF-IDF matching - decided not to keep in much of the footage for that as it was already explained in the animations

  • @iCrimzon
    @iCrimzon 9 days ago

    Cant wait for you to rewrite JS in binary 🎉🎉

  • @J0Y22
    @J0Y22 2 months ago

    shockedd

  • @neologicalgamer3437
    @neologicalgamer3437 2 months ago +1

    Bro sounds like WilburSoot

  • @trolIface_
    @trolIface_ 1 month ago +1

    hub 🎉🎉

  • @danielisop3182
    @danielisop3182 1 month ago

    What did u mean by the websites u shouldn’t have searched

  • @Ayymoss
    @Ayymoss 2 months ago

    MAKE LONGER VIDEOS

  • @chiroyce
    @chiroyce 2 months ago

    What are the consequences of scraping sites you aren't allowed to?

    • @conaticus
      @conaticus 2 months ago +1

      Probably not much on its own as long as you're not violating copyright - however it is courteous not to scrape sites forbidden by the robots.txt

    • @trollinqu
      @trollinqu 2 months ago +1

      wastes their resources and yours
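As a rough illustration of honoring robots.txt before fetching a page, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example-crawler` user agent, and the example.com URLs are made up for the example, and the `allowed` helper is hypothetical; a real crawler would download each site's own robots.txt instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "example-crawler", "https://example.com/blog/post"))
print(allowed(ROBOTS_TXT, "example-crawler", "https://example.com/private/data"))
```

The first call prints True and the second prints False, because only paths under /private/ are disallowed for every user agent.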

  • @Miluum
    @Miluum 1 month ago

    1:06 automatically solve captchas? i knew these things exist just to waste our time and energy

  • @monkshee
    @monkshee 2 months ago

    damn

  • @juniordevmedia
    @juniordevmedia 2 months ago +2

    what TF is IDF ?!!

    • @neofox2526
      @neofox2526 2 months ago

      idk man but watching it makes me feel smart

    • @jamesbarret4240
      @jamesbarret4240 2 months ago +1

      Term frequency (how often a given word shows up in a specific document) times inverse document frequency (a measure of how rare that word is across all documents, so words that appear everywhere count for less). The Wikipedia article is pretty good: en.wikipedia.org/wiki/Tf-idf
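To make the definition concrete, here is a minimal sketch of one common TF-IDF weighting: term count divided by document length, multiplied by ln(N / document frequency). The exact variant used in the video is not stated in this thread, so the formula choice, the `tf_idf` function name, and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {doc_id: {term: weight}} for already tokenized documents."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    weights = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical tokenized documents, for illustration only.
docs = {
    "a": ["the", "fast", "search", "engine"],
    "b": ["the", "slow", "crawler"],
    "c": ["the", "fast", "crawler"],
}
weights = tf_idf(docs)
print(weights["a"]["the"])  # 0.0, since "the" appears in every document
print(weights["a"]["search"] > weights["a"]["fast"])  # rarer term weighs more
```

A word that occurs in every document gets weight zero, which is why common filler words do not dominate the ranking.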

  • @user-fj5ts6sz1f
    @user-fj5ts6sz1f 2 months ago

    rust is a real badass❤❤

  • @AhmedMahmoud-ec4kz
    @AhmedMahmoud-ec4kz 1 month ago

    Great video 😊
    FYI: bright data is an Israeli company 😮

  • @_sohom
    @_sohom 2 months ago

    Make a better version of VSCode.

  • @latrapa918
    @latrapa918 2 months ago

    105

  • @susannerudolph8469
    @susannerudolph8469 2 months ago +2

    then brightdata makes captchas useless

  • @ph03n1x_dev
    @ph03n1x_dev 2 months ago +1

    You made a search engine for porn?! Thats disgusting... is it on GitHub?! 👀

    • @conaticus
      @conaticus 2 months ago

      All open source and ready to play around with 😂

  • @konstantinsotov6251
    @konstantinsotov6251 2 months ago

    we had a hackathon where we basically had to implement TF/IDF - also a search engine of a sort, but for files. we did the interface in python and all mathematics processing in C++. It would have been a fun experience if not for the time limit. we struggled really hard, on test data our solution worked faster by an order or two than most other participants, but... we somehow failed on the exam data. we failed fucking IO. and won nothing. I fucking hate hackathons since then. fuck IDF.
    also maybe this happened because I had written 75% of the code, while 4 other members did almost nothing. It was (their) responsibility to handle IO, and mine to handle mathematics and processing. I hate working in teams. I know no one cares but I might as well just burst out all of the rage I have towards that experience. once again, fuck team work, fuck hackathons, fuck my teammates, fuck everything and everyone

  • @lukamajcenic1172
    @lukamajcenic1172 2 months ago

    This is just an ad for BrightData. Compared to previous videos very low effort.

  • @kavinbharathi
    @kavinbharathi 2 months ago +1

    Not to be the 🤓☝️ guy, but "Jana Vembunarayanan" is pronounced 'Ja' as in 'Jarvis' and 'na' as usual. Just fyi

    • @conaticus
      @conaticus 2 months ago +1

      Thank you, I'll do this if I ever pronounce it again 😂

  • @vrljk
    @vrljk 2 months ago

    SRBIJAAAAAA

  • @deadshadow759
    @deadshadow759 1 month ago

    this result dont make any sense xha... very fast

  • @planktonfun1
    @planktonfun1 2 months ago +22

    Still not fast and scalable enough. The result is not even relevant, you made bing not google

    • @LaugeHeiberg
      @LaugeHeiberg 1 month ago +8

      wow really? Im also surprised one single guy didnt manage to make a product rivaling Google

  • @DanskeCrimeRiderTV
    @DanskeCrimeRiderTV 2 months ago +2

    how is this impressive? Of course it's gonna be faster. You aren't querying billions or even trillions of web pages unlike Google? So this search engine isn't even faster than Google...

    • @conaticus
      @conaticus 2 months ago +2

      It wasn't meant to be impressive it was meant to be informative and entertaining 👍

    • @DanskeCrimeRiderTV
      @DanskeCrimeRiderTV 2 months ago +2

      @@conaticus your thumbnail implies it is faster than Google. And I believe the original title did too.

  • @avi7278
    @avi7278 29 days ago

    You need to learn how to sync up your audio and video.