Turning My AWFUL Code Into Professional Code
- Published May 17, 2024
- Start the FREE Software Development Introduction Course with CourseCareers Now: coursecareers.com/a/conaticus...
Course Careers Testimonies: @coursecareers
Project Repository: github.com/conaticus/search-e...
Discord: / discord
Github: github.com/conaticus
Twitter: / conaticus
Join this channel to get access to perks:
/ @conaticus
0:00 Intro
0:48 Project Showcase
1:27 Indexing 1 Million Sites
3:28 Keyword Analysis
5:15 Time Complexity Optimisation
8:05 Importance of Performance & Low Abstractions
11:22 Making Plurals Redundant
13:12 More F*ckery
15:11 Outro
- Science & Technology
Start the FREE Software Development Introduction Course with CourseCareers Now: coursecareers.com/a/conaticus?course=software-dev-fundamentals
I always find it interesting seeing people look back at their own old work, it's impressive how far you've come!
Thanks so much
the 6 month effect is fr crazy
tip: you can use something like "Levenshtein distance" to calculate how different two words are (how many letters you have to remove, add, or change to make them the same), and then factor that into how the page is ranked. That gives you an easier way to deal with plurals than a huge dictionary. For example, "elephant" and "elephants" have a distance of 1, so you can set some cap, say 2, and include any word with a distance of 2 or less. You could end up with unrelated words, but limiting how far away a word can be keeps their number low. You could also modify the algorithm so that removing/adding a letter is considered closer to the original word than replacing a letter.
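The commenter's idea can be sketched with the standard dynamic-programming Levenshtein distance (a generic textbook implementation, not code from the video):

```javascript
// Levenshtein distance via a single-row dynamic-programming table.
// Returns the minimum number of insertions, deletions, and
// substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // dp[j] holds the distance between the current prefix of a
  // and b.slice(0, j).
  const dp = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // value of dp[i-1][j-1] from the previous row
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const temp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1,     // delete a[i-1]
        dp[j - 1] + 1, // insert b[j-1]
        prev + (a[i - 1] === b[j - 1] ? 0 : 1) // substitute (or match)
      );
      prev = temp;
    }
  }
  return dp[b.length];
}

// "elephant" vs "elephants" differ by one insertion, so the distance
// is 1 and a cap of 2 would treat them as the same keyword.
```

A ranking pass could then accept any indexed word whose distance to the query term is under the cap, as the comment suggests.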
Fuzzy Searching... Yes!
Caching... Absolutely Yes!
You should learn about Vector Embedding, data clustering, and sharding...
the best quote in the world goes to conaticus with the quote of "More fu*kery".
I basically go back and build the same project every year, and every time it's incredible to see how far I've come. Definitely recommend this to any developers out there as a confidence boost at the very minimum
this channel is a gem.
Fun fact, if you go to any random Wikipedia page and click on the first link in the article (not redirection/disambiguation links, pronunciation links, or Wiktionary links) you will most likely eventually end up on the page "Philosophy", and then loop through Philosophy forever. (A few pages lead to their own loops, but most pages lead to Philosophy.)
The reason this happens is that the first link (that isn't a pronunciation link) almost always leads to a more general page. So if you start on a page about a town, you'll get the town's more general location, and so on, and Philosophy is the most general topic there is.
(Gets on Wikipedia)
Oh my gosh it worked. I can't believe it.
Tip: look into pre-processing steps like stemming or lemmatizations. For more fancy algorithms maybe look into tf-idf, bm25 or something like lsh
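To illustrate what a preprocessing step like stemming does, here is a toy suffix-stripper (far cruder than a real Porter stemmer; the suffix list is made up for the sketch):

```javascript
// Toy stemmer: strip a known suffix so that inflected forms collapse
// to one index key. Real stemmers (e.g. Porter) have many more rules.
function naiveStem(word) {
  // Order matters: longer suffixes are tried before shorter ones.
  const suffixes = ["ing", "edly", "ed", "ly", "es", "s"];
  for (const suffix of suffixes) {
    // Require a stem of at least 3 letters so "is" doesn't become "i".
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, word.length - suffix.length);
    }
  }
  return word;
}

// "elephants" and "elephant" now map to the same key, so no plural
// dictionary is needed for simple cases.
```

Stemming at index time and again at query time makes both sides agree on the same normalized token, which is the idea behind the tip above.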
5:55 A set is just an array with hashes, which you have to go through to find the hash you're looking for, so it's still O(n) complexity, but it allows you to use binary search, which gives O(log(n))
Well acshually
A set is not just an array. It's typically backed by either a balanced binary search tree or a hash table.
A tree-backed set gives O(log n) lookup and insertion.
A hash table is kinda like an array, but you calculate the index in constant time with a hash function, so both insertion and lookup are O(1). Hash functions are costly in time, however. That constant time is constantly big
Either way, looping through a title of 10 words or fewer is much more performant than transforming it into another data structure and doing lookups on it. Asymptotics aren't everything
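The point about small inputs can be shown with two equivalent membership checks (the title data here is hypothetical):

```javascript
// For a short title, both approaches give the same answer; the plain
// array scan is often faster in practice because building a Set hashes
// every word up front, while includes() just compares a few strings.
const titleWords = ["how", "to", "build", "a", "search", "engine"];

// O(n) linear scan — n is tiny here, so the constant factor dominates.
const hasLinear = titleWords.includes("search");

// O(1) average lookup, but only after an O(n) build step with hashing.
const titleSet = new Set(titleWords);
const hasSet = titleSet.has("search");
```

The asymptotic win of the Set only pays off once n is large enough to outweigh the hashing and construction costs.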
I did/am doing a DB search engine using Postgres currently, it's been fun. The most optimal way is to use a text search vector and match by keyword; the only issue is users want accuracy within a single letter rather than a single word or trigram. I've basically got to the point of using pg_trgm and similarity matching with the ILIKE operator. Then I made a query-creator function for each table in our database (30+ tables) so it can easily be extended in the future. Coupled that with some GiST indexes, debounce time, and Redis for caching on the backend... So far it's been performant, thank god. Probably done a million things wrong, but as a junior on my own with this one it's the best I can do lol
3:48 even the correction is wrong?
should be `if (occurances
5:56 Hashsets have a time complexity of O(log n) right?
"For HashSet, LinkedHashSet, and EnumSet, the add(), remove() and contains() operations cost constant O(1) time thanks to the internal HashMap implementation."
@@piroliroblack1219 Yea sorry, I should have said specifically the has() method is O(log n)
@@mediumdifficulty1875 has() is the same as contains(), I'd think.
Also, these time complexity discussions never consider memory latency, which is quite a big flaw imo.
By algorithmic complexity, it's O(1), by memory O(log(n)) (the actual curve is like a staircase, and depends on what was accessed previously).
@@mediumdifficulty1875 has() = contains()
why didn't you rerecord parts where you added notes?
4:55 this isn't O(n*n) right?
Because we're not looping through `wordOccurances` multiple times, for every iteration through `wordOccurances` we are iterating through each word, so that's like `k`
Admittedly I don't know what time complexity that would boil down to, but definitely not O(n*n), because we're not looping through `wordOccurances` for every iteration of looping through `wordOccurances`
Make a video about it where you do it in rust
4:34 the third loop isnt it O(n) ?
Inside the loop are calls to .includes() which is O(n) in itself, easy to miss 😄
@@conaticus ahh true got it, thanks!
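The hidden cost the replies mention can be sketched like this (the stop-word and word lists are hypothetical example data, not from the video):

```javascript
// Array.prototype.includes() is itself a linear scan, so calling it
// inside a loop turns an apparently O(n) pass into O(n * m).
const stopWords = ["the", "a", "of", "and"];
const words = ["the", "history", "of", "search", "engines"];

// O(n * m): every word scans the whole stopWords array.
const keywordsSlow = words.filter((w) => !stopWords.includes(w));

// O(n + m): one pass to build the Set, then O(1) average lookups.
const stopSet = new Set(stopWords);
const keywordsFast = words.filter((w) => !stopSet.has(w));
```

Both produce the same result; the Set version just avoids re-scanning the stop-word list for every word.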
Some comments:
- Do not use `map` with side-effecting functions like `set.add(e)`, since the semantics of the map method imply that a new array is returned (here you discarded it). Either use `forEach`, or use `map` and pass the resulting array to the `Set` constructor.
- You can avoid the `words.map(async ...)` by making the lambda return a Promise, and then using `Promise.all` on the resulting array
- You can pipeline `replace` methods on strings, avoiding the `str = ...` assignment
- As a possible optimization, I would change the `.json` db into a plain JavaScript object and remove the `pluralFor` value; that is, use `db = { "": "" }`. You only have ~60,000 keys
- There is no need for `|| undefined` after `return title?.innerText `
There are probably a lot more "details" that could be cleaned up but that's what I found in a couple of minutes
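A hypothetical before/after applying a few of these suggestions (names like `extractWords` and `processWord` are invented for the sketch, not from the repo):

```javascript
// Stand-in tokenizer for the example.
const extractWords = (text) => text.split(/\s+/).filter(Boolean);

// Suggestion: build the Set from the array directly instead of calling
// set.add() inside map() and discarding map's return value.
const uniqueWords = new Set(extractWords("to be or not to be"));

// Suggestion: for async work per word, map to Promises and await them
// all at once with Promise.all.
async function processWord(word) {
  return word.toLowerCase(); // stand-in for real async work
}
function processAll(words) {
  return Promise.all(words.map((w) => processWord(w)));
}

// Suggestion: chain replace() calls instead of reassigning `str = ...`.
const cleaned = "Hello, World!".replace(/,/g, "").replace(/!/g, "");
```

Each of these keeps the behavior identical while making the intent (transform vs. side effect) explicit.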
what’s your vscode font?
JetBrains Mono :)
building a search engine is already waaaay out of my league to do, i wouldn't even bother having a bad code
I use the finally keyword when I don't care about catching
Feel your pain here... Sometimes I wonder how the code I write is even capable of running.
Edit: 2:54 I feel the same about finally blocks. Perhaps someone, someday, could actually tell us the real reason they exist. Perchance.
@conaticus Finally blocks run after either the try block or the catch block, whichever path the code takes. They exist to deduplicate cleanup code that needs to run whether or not the try block throws, such as closing a file. This is especially useful when the catch block throws another exception for the calling function to handle: throwing means the rest of the function won't run, so resources would leak unless cleanup happens explicitly in both the try and catch blocks. In many modern languages this feature has largely been replaced by the defer keyword, which is more versatile and ensures cleanup code runs whenever the function returns, whether normally or with an exception.
Code in the finally block runs even if you return inside the try or catch block
@@niels.m But the code after both blocks will run either way.
@@apoorvaaditya9491 So you're saying a finally block also catches exceptions in the catch block?
@@ExediceWhyNot Yes. And if for whatever reason the flow breaks (return, break, etc.) it also gets run (though you can probably refactor these in almost all cases so they happen outside the try/catch). It's usually for things you really want to make sure get run, like closing sockets, files, or database connections...
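The behavior described in this thread can be traced with a minimal sketch (`readResource` and its trace log are made up for the example):

```javascript
// The finally block runs on every exit path: normal completion,
// a caught exception, or an early return/break.
const trace = [];

function readResource(shouldFail) {
  try {
    trace.push("open");
    if (shouldFail) throw new Error("boom");
    trace.push("use");
  } catch (err) {
    trace.push("recover");
  } finally {
    trace.push("close"); // runs on both paths, like closing a file
  }
}

readResource(false); // trace gains: open, use, close
readResource(true);  // trace gains: open, recover, close
```

Whether the try succeeds or the catch handles an error, "close" is appended both times, which is exactly the deduplication the comment above describes.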
What happened to the unhinged conaticus
he'll be back for search engine attempt 2
@@conaticus Do we get to see you throwing your wallet out the window again?
O of N? What? What's that?
It's Big O notation, which describes how long a function takes to execute based on the size of its input. (I'm not 100% sure, but you can Google it.)
O of n, or O(n), is a way to describe the time complexity of an operation in relation to the number of data items (n) you are processing.
O(n) means that the time taken scales linearly in proportion to the number of data.
e.g. for (let i = 0; i < array.length; i++) { console.log(array[i]) } --> this does 1 operation per element in the set.
If you have 2 elements in the array, this does 2 operations. If you have 100 elements in the array, this does 100 operations. If you have 1,000,000 elements, 1,000,000 operations
O(n^2) means that as the number of elements increases, the number of operations grows quadratically.
O(n^2) does n operations per element in the array.
e.g.
for (let i = 0; i < array.length; i++) {
for (let j = 0; j < array.length; j++) {
console.log(array[j])
}
} --> For each element of the array, this prints each element of the array.
If you have 2 elements in the array, this would do 4 operations
If you have 10 elements in the array, this would do 100 operations
If you have 100 elements, it would do 10,000 operations.
If you have 1,000,000 elements, it would do 1 trillion operations
Baguette
OH YEAH HEARTED COMMENT WHOOOO RAHHHH
first uwu
2:54 ok, I'm out
Why do you look like if Ted Nivison and Mr. Bean had a kid
At 6:30 you changed what the code does, and probably worsened the performance. It's probably more performant to loop over a 25-character string than to do that O(1) lookup; hash functions are really expensive. And now you're looking for exact matches of the word: where before the code would match "paint" with "painted", now it won't
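The behavioral difference this comment describes, sketched with hypothetical data (not the video's actual variables):

```javascript
// Substring matching vs. exact membership: these answer different
// questions, so swapping one for the other changes search results.
const pageWords = new Set(["freshly", "painted", "walls"]);

// Before: substring search — "paint" matches inside "painted".
const substringHit = [...pageWords].some((w) => w.includes("paint"));

// After: exact Set lookup — "paint" is not a member, so no match.
const exactHit = pageWords.has("paint");
```

So the switch to a Set isn't a pure optimization; it silently drops partial-word matches unless something like stemming normalizes the words first.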
This video about performance is great, and completely 180ed how I see these things: ua-cam.com/video/5rb0vvJ7NCY/v-deo.htmlsi=qSKlTWOVoAxvn8We
ok
Saw xnxx 😂
saw ph
You can't make JS code professional. Just saying..
you need to learn some sql brah... stop using prisma or other ORMs
Why? Less sql => cleaner code
@@nixt1247 but knowing SQL can still help you optimize your code because you have better knowledge of what's going on
I just looked back at the code for a React app I made a few months back, and god it's disgusting