Big O myths busted! (Time complexity is complicated)

strager

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 2 жов 2024

КОМЕНТАРІ • 645

@strager_ Рік тому ⁺⁸³
Source code of all of the techniques (for educational purposes):
github.com/strager/big-oh-lines
Corrections:
19:23 I show vec![0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 12, 12, 12, 12, ...], But this is incorrect. The Vec items should be line numbers, not offsets. (The keys are the offsets!) So the Vec would actually look like this: vec![1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, ...]. (Thanks to @polyfoxgames9006 for pointing this out.)
Pop quizzes:
04:24 average time complexity? answer at 18:34
09:42 another way to preprocess? answer at 19:09
12:23 why does binary search 2x? answer at 20:14
14:17 time complexity of SIMD algo? answer at 20:47
Lessons:
01:13 1: Try it!
03:28 2: Check your loop bounds
03:46 3: Stress test your algorithms
06:09 4: Big O is not the full story
07:42 5: O(n) ≠ O(L)
09:17 6: Preprocess your data
12:09 7: Big O is approximate
13:53 8: Big O is for big inputs
17:44 9: SIMD is awesome
@C0ckfuck Рік тому
I think that you should mention that big O notation really is really nothing more than a set of functions of a "certain shape", and that the shape of your expression measuring what you want (running time, space complexity, number of steps etc.) is in that set if it is asymptotically dominated by functions in that set.
More importantly, I think it is quite harmful to bring up the notion of the average case runtime for a deterministic algorithm over a distribution of inputs as if it is says anything meaningful - since how do you define the input distribution, and what does it tell you? Nothing at all for a deterministic algorithm.
You can flip the picture and consider an algorithm like the one you describe but where its input is considered a cyclical buffer (just start over when you reach the end...). Then you could have it pick an index uniformly at random and do average case analysis based on the random choice. This is a good example of both probabilistic analysis of running time and of the advantages of a randomized algorithm.
@strager_ Рік тому ⁺²
> I think that you should mention that big O notation really is really nothing more than
Plenty of other videos explain it this way. I wanted to explain it a different way which I think is more intuitive.
> I think it is quite harmful to bring up the notion of the average case runtime for a deterministic algorithm
I brought up average time complexity after talking about best and worst case. I wanted to show that, given a best case of O(1) and worst case of O(n), average case is *not* necessarily 'in the middle' of best and worst. Average case time complexity was not O(n/2) or O(sqrt(n)) or something; it was just O(n).
@C0ckfuck Рік тому
@@strager_ But wait, O(n/2)=O(n). I think intuition is fine, but I think it's important to mention that that it is not the whole picture.
Regarding your use of average case analysis, my point is that, sure, you can perform a thought experiment - but I think it is important to mention that it only has any merits when comparing algorithms w.r.t. a "meaningful" distribution and that it is very hard to define. My concern would be that viewers would be inclined to use uniform distributions over input spaces when they see it done this way when this is not mentioned. Anyways, take it for what it's worth. People seem to like your video, and so do I.
@kameronbriggs235 Рік тому ⁺²
I'm primarily specialized in graphics programming & self taught. Got major ADD issues, and can't even remember the months cuz what's the point?
Your formatting is perfect. It keeps me entertained, and you're good at speaking, all while teaching and representing ideas in simple ways (: sick af dawg, dis gnarly.
@strager_ Рік тому ⁺²
> I think it is important to mention that it only has any merits when comparing algorithms w.r.t. a "meaningful" distribution and that it is very hard to define.
Agreed. I didn't make that nuance clear in my video.
@moe_31415 Рік тому ⁺²⁰⁹
Big O is not bad nor a lie, you just have to understand it. It does not tell you how fast an algorithm is. It only tells you how it scales with respect to its input size
@ApesAmongUs Рік тому ⁺²¹
Yea, it took me a moment to understand what he was saying in the screen shot, since it never dawned on me that anyone had ever believed the simplified (and therefore incorrect) "myth" as he was using it.
@Garentei Рік тому ⁺³²
The whole video is a strawman take on Big O. Debunking and correcting claims nobody actually believes, and if they do, they are misunderstanding it.
@ClaudioParraGonzalez Рік тому ⁺⁷
this video proves that youtube is full of people that teaches from ignorance.
@Meneer456 Рік тому ⁺⁵
C'mon guys, everyone uses some way to lure people to watch their video.
All in all, great video! A lot can be learned from this video, I think you'd be a great teacher if you aren't already.
@jordixboy Рік тому ⁺⁴
@@Meneer456 yeah, why people so toxic and calling him strawman huh? bad day? Why dont you get in front of a camera, lets see how that goes.
@jacklegate3324 Рік тому ⁺²¹⁴
The biggest thing to take away is that big O is not equivalent to the actual performance, it is an upper bound for the growth rate of an algorithm. When comparing algorithms look at the behavior when you double the amount of input.
@louisrobitaille5810 Рік тому ⁺³
Usually it becomes way more obvious if you decuple it instead of just doubling it 😅.
@KingJangOng Рік тому ⁺²
yea everyone seems to forget this
@tear728 Рік тому ⁺¹⁷
Isn't that the entire point of big O lol how could anyone forget this
@KingJangOng Рік тому ⁺¹⁰
@@tear728 they never actually learn what it is
@EthanBradley1231 Рік тому ⁺⁹
@@tear728 I can't remember who it was but I recently saw a video where someone said they almost failed their Google interview because *the interviewer* didn't understand this. The question was something like "Algorithm A runs in O(n), algorithm B runs in O(logn). What's faster?" They correctly answered that B is better asymptotically but A might be faster for small inputs, and the interviewer insisted that was wrong.
@skrundz Рік тому ⁺²⁰⁶
This is like how they discovered a new “efficient” multiplication algorithm for large numbers but the constants are so huge it’s only faster once you’re multiplying numbers over 4.5 trillion bits long
@strager_ Рік тому ⁺⁴⁷
Yup! Luckily, I haven't needed to deal with data sizes that big. 😅
@RepChris Рік тому ⁺⁶⁰
Yup. Those "galactic algorithms", as theyre called, dont really have any direct real world use; they however are useful for the theortical part of computer science such as by showing that certain bounds can be broken, or by giving a framework of techniques by which practical algorithms can be newly developed or improved further. Theres also the chance that at one point well have to deal with such huge inputs, but as with abstract mathematics the usual way they become useful is by laying the groundworks.
@kuhluhOG Рік тому ⁺⁹
@@RepChris probably if you want to have highly accurate (as accurate as Quantum Physics let's you be) Physics simulations, these can be useful
@RepChris Рік тому ⁺⁸
@@kuhluhOG Yeah, but that falls under the "we might get to that amount of input data at some point" category, as we currently very much do not use even remotely the amount of input in any application to make a galactic algorithm faster; that is somewhat tautological since thats a part of the definition of what a galactic algorithm even is.
@kuhluhOG Рік тому ⁺⁴
@@RepChris it was more of a comment on your sentence "Theres also the chance that at one point well have to deal with such huge inputs"
@atiedebee1020 Рік тому ⁺³²²
This is one of the greatest programming channels, keep up the good work!
@superspartanman4480 Рік тому ⁺⁸
The greatest
@wernersmidt3298 Рік тому ⁺²¹
Every person I've seen using comic sans in a presentation is an absolute beast. Like you have to earn your right to use it.
@strager_ Рік тому ⁺¹⁹
@wernersmidt3298 It's an honor to be compared to Simon Peyton Jones!
@fredoverflow Рік тому ⁺³
@@strager_ So... are you gonna make a Monad tutorial next? 🙃
@fireballs619 Рік тому ⁺²⁸
I'm a physics PhD student who has to code all the time despite having taking very few formal courses on it. I find your videos super helpful and have been learning a lot!
@43chord Рік тому ⁺¹
Let me guess, fellow HEP enjoyer?
@fireballs619 Рік тому ⁺³
@@43chord cosmology :)
@mastershooter64 Рік тому
@@fireballs619 what do you do in cosmology?
@fireballs619 Рік тому
@@mastershooter64 CMB analysis mostly
@randolphbusch7777 Рік тому ⁺⁷
The reason you don't need the log base is because you can convert any log to any other log using a constant factor - which is ignored in the O-notation.
log_e(t) = log_2(t) / log_2(e)
log_2(e) is a constant. In O-notation, log_2(t) and log_2(t)/C are the same, which means log_2(t) and log_e(t) are the same.
@qubyt Рік тому ⁺⁶⁰
One of, if not the best big O notation video out there. Bravo and thank you!
@sepulous Рік тому ⁺³⁶
It's important to keep in mind that Big O is for classifying the scalability of algorithms, not their performance in a given context. Big O analysis is not always a substitute for a benchmark.
@kasuha Рік тому ⁺³⁸
O notation might be for big data but different O complexities have different Ns that count as big. N=50000 may not matter for difference between O(N log N) and O(N log log N) but it is big enough to make a difference for O(N^2).
I still have PTSD from times when I spent hours upon hours waiting for my Windows to update because someone in Microsoft deep down the history decided that number of updates will never be big and used some kind of bubblesort to sort them.
@strager_ Рік тому ⁺⁹
> different Ns that count as big
Agreed! But the notation doesn't give you *any* clue where this point might be.
@levsonc Рік тому ⁺²
Actually, there are still some places in Windows like this, that someone have 10 minutes opening Start menu in some conditions. Or Explorer starting to slow down when doing something like deletion in a folder with thousands of files. (Explorer meaning desktop as well.)
@gianni50725 9 місяців тому
You know, I think it's dumb they used bubble sort (at least use insertion sort!), but I really have to wonder how many updates you missed if bubble sort was enough to slow down the time to update significantly.
@NikitaKaramov Рік тому ⁺¹⁹⁵
When I got taught big O at the university, many people (including me) failed to comprehend the actual meaning of it (some still don't). One could replace, like, 2 lectures worth of material with this one video. Keep up the good work!
@strager_ Рік тому ⁺¹¹
Thanks!
@marloelefant7500 Рік тому ⁺³³
I personally always found O notation one of the easiest things in theoretical computer science.
@bluepancake975 Рік тому ⁺⁶
at our uni we weren't even taught what big O actually means (what is explained in this video), instead we were taught to blindly follow all the myths...
@nexdemise4182 Рік тому ⁺⁴
The easiest way to understand big O in my opinion is to not just focus on the most significant term, but rather write out the entire time complexity, so instead of O(n^2) it'll be something like 4n^2+80n+20n*ln(n)+35. That kinda helps understand that the constants do matter and can overpower the n and which term is the most significant for a given n. Like O(n) isn't better than O(n^2) if O(n) is 8,000,000n and O(n^2) is n^2 unless n > 8,000,000.
Like take this scenario, you have some counting function that relies on sorting and it's O(n*ln(n)), is it worth the effort in switching to a priority queue to get that O(n)? That depends on what you expect n to be. If this stuff will involve at most a thousand items, probably not unless there's some serious time critically going on (which generally there won't be), if it's going to involve billions then yeah.
@theqwertycoder_alt Рік тому ⁺⁴
Big O notation says that, given a function f and a function g, where f(n) is in O(g(n)), there exists some constants c and k such that f(n)≤c*g(n) for all n>k. I never found it hard to understand or formalize, but I can see why it seems hard to understand. In general, big O notation only matters when you have differing time complexities and large values of n. It tells you that, as n grows larger, a function that is not in O(g(n)) is going to necessarily be larger than a function that is in O(g(n)).
@Yotanido Рік тому ⁺⁷⁸
Didn't learn anything new about Big O, though the reminder is nice. What I did learn, though, was how easy it actually is to use SIMD, at least for simple applications like this.
I've tried looking into it in the past and just got super confused.
I have to say, a video specifically on SIMD would be awesome.
@strager_ Рік тому ⁺⁴⁴
> a video specifically on SIMD would be awesome.
My previous video on hash tables also contained some SIMD stuff. That hash table SIMD was more advanced, but I didn't walk through the code. ua-cam.com/video/DMQ_HcNSOAI/v-deo.html
I do want to share more SIMD knowledge in future videos. 👍
@Dorumin Рік тому ⁺⁶
The portable SIMD struct that nightly rust has is awesome.
@treelibrarian7618 Рік тому ⁺¹
@@Dorumin yeah, intel should've named the vbroadcast intructions "splat" instead lol
@Dorumin Рік тому
@@treelibrarian7618 I was talking more about the structured approach to lanes and its ops over intrinsic instructions on small buffers, plus the nice Into/Froms. The names are funny though, what do you think about the swizzle trait? :P
@andremaldonado7410 Рік тому
@@strager_ I would love to see it, something I'm very interested in learning more about
@AnimeKing-m2t Місяць тому
You are not just a good instructor but also a good video creator/editor.
Explanation: 10/10,
Video Quality for creating graphs and all : 10/10,
Depth of topic: 10/10.
AMAZING...
@trudyandgeorge Рік тому ⁺³
Mate, you've got the nack for educating. I'm loving the channel.
@bibliophile5863 Рік тому ⁺⁹
Obligatory comment noting that in the manually unrolled loop, the code may reach outside of the array bounds, resulting in a crash, if you're lucky. This is still one of the best complexity explanations I've seen.
@strager_ Рік тому ⁺¹²
> the code may reach outside of the array bounds, resulting in a crash
Yes, that is true. I didn't explain this in the video, but when generating the table, I added 7 dummy elements to prevent out-of-bounds issues. github.com/strager/big-oh-lines/blob/537e6a048d10c89ee6ea421c48c4ecc7229d2d01/bol_linelinear8/src/lib.rs#L24-L26
@k6l2t Рік тому
@@strager_ alternatively you can just modulus the SIMD size, and do the non-SIMD naive stuff on the remainder, if you don't want to do array resizing nonsense
@angryanonymous4082 Рік тому ⁺²
This was randomly recommended to me, and it was one of the best explanations on big O I've seen
@andytroo Рік тому ⁺¹⁶
my favorite "non linear" algorithm is manipulations within the Disjoint-set data structure - it has an average and worst case of O(a(n)) where a is the inverse Ackermann function -- functionally less than 5 for any number you can write.
@minerscale Рік тому ⁺⁴
Wow that's amazing hahaha. At that point it may as well be constant time. But no, an infinite input size will have an infinite lookup time. Incredible.
@riverm5889 Рік тому
I think they use the inverse ackermann function for amortized analysis which basically just treats it as a constant :)
@NaourassDerouichi Рік тому ⁺²
I've followed the making of on Twitch. Glad this is out!
@최강재-y9c Рік тому ⁺²
Everytime I watch your vid, everytime I get a tons of insights from you.
Thanks and please keep going!
@SteinGauslaaStrindhaug Рік тому ⁺⁸
In my experience Big O is very rarely a useful metric in web frontend code, since frontend deals mostly with interactivity the number of "things" is relatively small and low latency and minimising network traffic is almost always more important than number crunching efficiency. In fact for almost all DOM related things; if the number of objects is even barely approaching a size large enough for Big O to be meaningful; the DOM tree is probably so large it will kill most mobile browsers for excessive memory use. Also since your code runs on any random client you don't control how fast it's hardware is so most number crunching is better done on the server (in which case it's the backend developers problem 😉)
@strager_ Рік тому ⁺³
This is a great point!
@vishwanathbondugula4593 Рік тому ⁺¹⁹
Hey while we are traversing through the string, as soon as we hit a
char we can store a counter which increments for every line and if we want to show the line number just use the counter variable which will not take any time
@NamelessStudiosInc Рік тому ⁺⁵
definetly possible, but usually in compilers there is a single pre processing step before the representation of the code changes. Compile time errors, as far as I know, are usually not found until the second step of compilation
@strager_ Рік тому ⁺³⁴
> we can store a counter which increments for every line
That is true, but in the real world it's usually more complicated than that.
Context: A compiler in practice does not always report errors immediately. It might need to parse some code then report an error in the past. (For example, Rust cannot report a use-of-undeclared-function error until it reads the entire file because the function might be declared after the call.) Therefore the compiler needs to attach location information to each token and AST node.
In addition to line information, a compiler also wants to report column information. The compiler *could* attach two numbers (line and column) to each token and AST node. But some IDEs want byte offsets, so we might need to store three numbers (line, column, and offset from beginning). That's a non-trivial amount of data to add to *every* token and AST node, so memory usage goes up.
To reduce memory usage, a compiler could attach only *one* number (offset) to each token and AST node, and compute the line number and column number as needed. (Or not compute anything if the IDE just needs a byte offset.) Hence the table-based solutions discussed in the video.
But that's for a traditional batch compiler. Nowadays compilers are integrated into editors. Compilers are given *updates*. LSP (Language Server Protocol) updates communicate line and column information *into* the compiler. Having the line table data structure makes this operation much cheaper.
I didn't discuss any of this in the video because I didn't want to bore people who aren't interested in compiler frontends. But it's a good discussion. =]
@polyfoxgames9006 Рік тому ⁺⁷
For the array lookup, wouldn't you store the line number at each position not the offset?
@strager_ Рік тому ⁺⁴
You're right! I messed up the explanation. 🤦‍♀️
@AlphishCreature Рік тому ⁺¹
I really like how this video explains how better big O doesn't necessarily mean better performance for data one's working with, while at the same time it doesn't overdramatise things. As video states, big O describes the general shape of performance curve with increased data, but it isn't be all and end all. And as SIMD example shows, even if it has the exact same big O parameters as line table, it makes up for it with what I call "small c", i.e. those "insignificant" constant factors like the slope.
Personally, my intuition is to prefer structures and algorithms with better big O parameters for key operations - the idea being that for most scenarios either the data is so small extra constant costs are negligible, or the data is so big that better big O matters. But like most things, it's not an absolute rule and in the end, it takes real life performance testing to get the most complete picture.
@alejandrom.4680 Рік тому ⁺¹
As a CS student, I truly appreciate this way of explaining theorethical computer science. Thank you so much!
@enclave2k1 Рік тому ⁺⁶
Certified strager classic.
@maksimsivyy5684 Рік тому ⁺¹
I was expecting to see SIMD binary search :D
But the video is a great refresher, thanks for it!
Рік тому ⁺³⁰
Great explanation!
One aspect that isn't talked about much is that due to laws of physics the actual *random* access time (in seconds) to a pre-computed table of size n is O(sqrt(n)), not O(1). As a result there are algorithms that are actually slower in real time for large inputs even though they are predicted to be faster by big-O.
There's a great article called "The myth of RAM" that explains it deeply.
@strager_ Рік тому ⁺⁴
In modern computers, are cache effects what you mean by sqrt(n)?
The simplicity of RAM (Random Access Machine) which we normally use for complexity analysis is indeed deceiving. My case study didn't lend itself to an explanation of RAM (or CPU caches), so I kept it simple and focused on other issues with time complexity.
Рік тому ⁺⁵
@@strager_ yes, it's cache but the article I mentioned explains it can't be fixed by making cache bigger because of quantum physics (maximum amount of information in black hole is proportional to its surface area and nothing can be denser).
And yes, I understand that this is introductory video so advanced stuff like this wouldn't fit in there.
@elkeospert9188 Рік тому
If I understood that "great article" correct (I only had a quick view on it) the reason for that O(sqrt(n)) is that memory (RAM) is typically flat - so the double amount of memory needs the double area. If these area is a square when doubling the memory causes that the width of this square is increasing by factor SQRT(2) - and therefore the average physical distance the information has to "travel" is proportional to SQRT(2) of the size of the memory.
As information could not travel faster than the speed of light that really would mean that the access to memory is getting slower the larger the memory is....
On the other side transistor density on chips is still increasing - double memory therefore not mean that the double area is needed - that might will no longer true in the future but for the next years O(1) makes more sense than O(SQRT(n)) for memory access.
Рік тому
@@elkeospert9188 almost, circle is a better representation but when we use big-O we don't write constants, so we drop Pi.
The article has other parts, where the author explains that quantum physics dictates it will never be O(1) and stay O(sqrt(n)). Increased density of transistors helps only by a constant. If you need to process 100PB of data, you can't fit it into RAM and will need to use other storage which will be slower.
@jsgfitz Рік тому
@@strager_ Even ignoring physics fundamentals, if you're using virtual memory the pagetable lookups are O(ln n) so the O(1) claim for lookup tables was always suspect. Of course caches - incl TLB - mitigate this in practice, so long as you fit within it.
@letsaram Рік тому
Your tone and editing make this comfy.
@kennethbeal Рік тому ⁺⁷
Loved your ending! Take that, Rust Foundation! :) Thank you for the Big O lesson as well.
@mikhaililin3033 Рік тому ⁺¹¹
This channel is pure gold. Clear explanations, starting from simple problems and going to more sophisticated ones, step by step, while not focusing on little details for too long and constantly tossing in some new questions for viewer to think about. Thanks for your effort
@dariokartal9453 Рік тому
And zero content fluff!
@aronas1353 Рік тому ⁺¹
I love your videos, your explanation is straight forward and comprehensive.
@VioletCatastrophe Рік тому ⁺¹
As someone with a background in mathematics, this was all... kind of trivial to me. But I see misconceptions about big O notation all the time from programmers so I can imagine this being very much needed. It's really just about limiting behavior, the same tricks mathematicians use to tame infinities, divisions by zero, and all that. It feels absolutely absurd to say it... but the actual specifics of implementation, and real world performance metrics are FAR more important than a cool mathematical trick used for shenangians. Anyway, very good video, and a very needed one to try to push against the absurd hyperfocus on big O time complexity over more relevant discussions about overheads and actual implementation.
@lc9245 Рік тому ⁺¹
I remember there was someone who also made a video about how they were frowned upon by an interviewer for mentioning that Big O is just about how algorithms scales and not comparing how they perform in all cases. It shocked me because it seems almost common knowledge.
@Cubinator73 Рік тому ⁺⁴
7:51 "[...] and, in fact, you have fewer lines than you have bytes"
A file containing only
always has one more line than bytes. In fact, an empty file has no bytes at all, but one (empty) line.
@strager_ Рік тому ⁺¹
Factual.
@dnullify100 Рік тому ⁺³
You are one of the best computer science educators ive seen.
Ive been self teaching CS, being a former mechanical engineer. So far this video, ans your hash table video have filled in large gaps in understanding concretely, as ive dealved deeper into the low level fundamentals
@LeoPartIsntCool Рік тому ⁺²
so basically programmers leave out constants that would tell them the full picture, and then they are suprised that it doesnt give them the full picture
@strager_ Рік тому ⁺¹
Yup!
@PragandSens Рік тому ⁺⁸
Amazing dude.... how do you make those graphics? In case you use Adobe Premiere or any other editing software for it... what software do you recommend for drawing graphics like it?
@strager_ Рік тому ⁺⁵
I used Microsoft PowerPoint and Motion Canvas.
I love PowerPoint. I'm new to on Motion Canvas.
@kuqmua755 Рік тому ⁺⁴
What about choosing algorithm every time checking file size? For small and large files - different algorithms
@strager_ Рік тому ⁺³
That's an interesting idea! Hybrid algorithms are something I should think about more. The idea seems to work well with sorting, for example.
@leddoo Рік тому ⁺³
@@strager_ yeah, it's pretty common in generic sorting algorithms, that can't make any assumptions about the data, but should perform as well as possible in all cases regardless.
for the line numbers, i would prefer the simplicity of only using a single algorithm. we're on the low end of the latency spectrum already. taking 50 ns vs 25 ns to compute line numbers for < 1000 errors doesn't really matter. (it's different for the sorting algorithm, cause some user may want to sort millions of tiny arrays, and that should still work.)
@tiarkrezar Рік тому ⁺¹
It's worth it with almost every divide and conquer-style algorithm due to the constant factors that O(n) notation doesn't show and the way caches improve performance on small datasets.
But you don't just pick one algorithm at the start, instead you keep dividing the problem into smaller and smaller chunks, and you don't stop at the base case of 1 element. You stop earlier, once you reach some minimum number of elements that's better handled by the naive algorithm. Then you use the naive algorithm on those small chunks, and let the main algorithm make the higher-level decisions.
@kuqmua755 Рік тому
@@tiarkrezar it's just another algorithm for choosing better algorithms
@elkeospert9188 Рік тому
@@strager_ Another idea is to "optimize" binary search by checking inside the binary search function if the search range contains less than (8 or 10 or something like that) elements and if so do a linear search.
On the other side.
Calculating line numbers in small files is anyway "fast enough" - even if the normal binary search is a little bit slower than linear search for small amount of lines it does not matter.
@davidkormendi354 Рік тому
Really love your optimization videos and the way you explain. Keep it up my man!
@silvis5749 Рік тому ⁺⁶
Best big O notation i ever watched, thank you very much. ❤
@brecoldyls Рік тому
Really good video! I found this explanation of big o a little unconventional but good. I especially enjoyed your visuals and your dive into SIMD at the end. Thanks!
@Klosterhasi Рік тому
this was a really smart way of making me watch a video on big O notation nice job
@ivanraulsanchezdiaz7499 Рік тому ⁺⁵
I met you on twitch and you even helped me with a terrible code that I wrote, I already knew you were like a genius but what you are doing on this youtube channel surpasses everything!
@TheTyrori Рік тому ⁺¹
Extremely informative and intuitive presentation. Thank you! :)
However, I'll admit I'm not as familiar with SIMD stuff, so the speed at which that was moved through left me with some questions.
Will probably look into that on my own, but I would love to see something on that if it is a topic you'd want to cover in the future too!
@strager_ Рік тому
> the speed at which that was moved through left me with some questions.
Thanks for the feedback!
@alienrenders Рік тому ⁺³⁸
Big O is meant for describing limiting behaviour as it reaches infinity. We never use infinite elements. So there are plenty of algorithms that could have worse Big O notation and be faster because we use fewer elements. I personally never liked Big O notation in Computer Science because it often detracts from practical algorithms. Nice presentation.
On computers, it's really difficult to know what will speed something up sometimes :)
@oscarfriberg7661 Рік тому ⁺⁵
Big O is also meant to describe worst case scenarios. Maybe the worst case is vanishingly rare for your particular use case, so optimizing for Big O might not lead to any speed improvements. Context matters!
At best, Big O can be used as a guidance, but should never replace real analysis and measurements on actual real world data. You lose so much information about an algorithm by trying to describe it with just a few symbols.
@shiinondogewalker2809 Рік тому
@@oscarfriberg7661 even in that case, it's specifically for the case where the input/problem size is sufficiently large
@gillbates2685 Рік тому ⁺³
@@shiinondogewalker2809 people who complain about big O's practical value often have no understanding what it means. Big O doesn't need anything in your program to go to infinity. The information it gives you becomes true for input size above N, what N is, depends on your formula. Could be 10, could be 10^10.
People who think "it only matters for infinite value" remind me of debacle at Microsoft, some "practical programmers" who thought the number of updates will never be "sufficiently large" so they used the wrong sort algorithm leading to millions of people waiting hours, days for windows updates 😂
@gillbates2685 Рік тому ⁺²
@@oscarfriberg7661 big O tells you the upper bound, while big Omega telling you the lower bound. So any argument against big O citing worst case scenario could be reverted into argument for big Omega, now you optimize for lower bound, for large input your algorithm will always be at least as good as the value you optimize for :))
And to many people big O and big Omega and big theta are all "Big O notations", so when they spout "Big O are useless" they mean any asymptotic notations.
@gillbates2685 Рік тому ⁺²
"Big O notation in Computer Science because it often detracts from practical algorithms"
Big O detracts from practical algorithms the same way the ability to count detracts from practical algorithm
There are two stages when you design an algorithm
the theory stage, where you use reasonings in discrete mathematics to derive the right computation then formally prove the time/space complexity (if you skip this stage it's because some smart person already did it for you and wrote it in a textbooks/papers somewhere then other people repeat it into wisdom and eventually it reaches you without you having to open a textbook).
And the implementation stage, where you take into account what the compiler does, what the CPU does.
Big O doesn't care about algorithm or its implementation, it cares about functions f(n), g(n) on (typically) natural numbers. It is a mathematical concept, much like counting, both deal with natural numbers, and both are used to estimate.
Now guess which stage are big O and counting more often used in :))
Probably stage 1, but it does help in stage 2 too, it gives you a direction on which practical things to try when you don't have all the technical details of your machine, what kind of optimization is done on your CPU etc.
because in practice, you don't start out knowing all these technical details. If you already knew like in this video, what kind of sampling distribution your input data have, how fast your algorithm runs on this specific CPU, then why would you go back to look at the more general idea which makes no such assumption about your system? if the aim is to disprove the theory using practical implementation I suggest at least understand the theory first. e.g. big O never implies O(log N) is faster than O(N^2), that's stupid, an algorithm in O(1) is also in O(N^2)...
Or of course you could spend time running your algorithm on billion different input sizes, that's very practical too :))
Now do I need someone who is good at discrete math and algorithm on my team? no I don't. Do I want someone who doesn't understand the basics such as what big O notations tell you... absolutely not, that's a recipe for disaster.
@vyldim3401 Рік тому ⁺²
I know this was just educational, I know this was supposed to showcase Big O and different approaches to programming. BUT if I dont say this, I'll lose sleep over it so here goes. You can just increment a variable by 1 everytime you encounter a newline while compiling. So when it encounters an error, you have the line number ready. Time complexity O( time to reach error ), Space Complexity O(1). You are designing a compiler here, I'm sure you can add a line or two to the code can't you?
@strager_ Рік тому
You would need to store that line number somewhere. Storing line+offset or line+column uses more memory than just storing the byte offset. That memory consumption is multiplied by every AST node, so it's actually O(n) space usage.
Also, the line tables I describe in the video are handy for implementing LSP.
@remirousselet6867 Рік тому ⁺¹
Really nice video!
Maybe one thing I would've added is how we could switch to a different algorithm based on the offset. Like "if (offset is very large) binarySearch() else simdLineTable()"
@strager_ Рік тому
Yup! Hybrid algorithms would give you the best of both worlds (in theory).
@ItzAnameOk Рік тому ⁺³
Banger!
@henrycgs Рік тому ⁺²
very important video. I've seen a lot of novice programmers misusing big O notation... people like to put multipliers inside thinking it means anything, but O(1 billionth * n) really is the same as O(1 billion n). I think you could also have mentioned that big O is merely an upper bound. this means anything that is O(n) is also O(n³), and anything that is O(log n) is also O(n!).
@strager_ Рік тому ⁺²
> I think you could also have mentioned that big O is merely an upper bound. this means anything that is O(n) is also O(n³), and anything that is O(log n) is also O(n!).
I think this confuses engineers more than it helps. In the world of computer science this matters, but not so much when we're talking about code.
@henrycgs Рік тому ⁺¹
@@strager_ yeah, I guess. coming from a cs background myself, I'm a bit too used to big os, little os, big omegas, little omegas... and theta. which is what most people think big O is.
@DavidSpry Рік тому
The past two videos have been great!
@thefireagen Рік тому
I do not know why I do not find this kind of videos in my TL.
Thanks!
@hamzadlm6625 Рік тому ⁺¹
I love you man, keep on uploading please
@ianglenn2821 Рік тому ⁺¹
Subscribed for two reasons, counting the trailing zeros at 17:21 was clever, this helped me solve a different problem, but also that crab booty haha extra thicc
@aaronbredon2948 Рік тому
One catch with big O notation on current hardware is that current technology and physics limits how big your data can be before you start to have to split your data.
And some of the most efficient big O algorithms have such a large constant that you never get to the point where the algorithm is efficient compared to a more naive solution that allows preloading of data from slower access methods.
@annaczgli2983 Рік тому ⁺¹
Pls post more on YT. Your videos are so interesting.
@viraatchandra8498 Рік тому ⁺⁵
Thanks for this Amazing Video!! I am a fan of your channel, and love the kind of content you have been putting out recently Its really good to see you uploading more often, cant wait to see what you share with us next!
@mendeljacks Рік тому ⁺⁵
How does the compiler know the difference between
from source code and
from a user defined string within the code is there escaping?
@strager_ Рік тому ⁺¹⁸

in a string literal would appear as two bytes: '\' and 'n'.
A newline characte would appear as one byte. (For simplicity, I ignored Windows-style line endings in this video.)
@set.theory 10 місяців тому
I am no expert on algorithms or complexity notation; however, I seem to understand there are other notations than Big O which communicate different growth information.
Would any such notation have provided sufficient information to compare the various algorithms used at relatively small input sizes? If so, then perhaps using these notations more often could help address these common misconceptions with Big O, painting a fuller picture.
In some instances, such as with sorting implementations, it is beneficial to fall back to algorithms with greater time complexity (in Big O notation) as they behave better for small input sizes.
One such instance is merge sort implementations, which can fall back to e.g. insertion sort for small inputs. I believe the Rust standard library adopts the same approach in its std::sort implementation for slices.
I find this topic pretty fascinating and extremely interesting to think about! Thank you for making this video, it is very well-produced.
@hughmcateer3369 Рік тому ⁺²
Great video
@dudaseifert Рік тому ⁺²
I've been able to pass some O(n²) but highly optimized code in some contests that were expecting O(nlogn) algorithms. It's all about finding the worst case scenario
@МихаилБатищев-ю7у Рік тому ⁺²
Really good video. Thank you!
@fridgedigga Рік тому ⁺⁵
Really loving the past few videos. Learned a ton. Thanks.
@saeedgnu Рік тому ⁺¹
I was hoping to see the combination of binary search for large L and SIMD for small L and how the graphs compare.
@strager_ Рік тому
A hybrid algorithm is a great idea. I'll keep it in mind for next time!
@Warwipf Рік тому ⁺²
I didn't know there were misunderstandings about this topic. I guess my uni did a good job (thanks Prof. Schöning
@d0ener4life Рік тому ⁺²
The worst algorithm I ever wrote had a complexity of 4^n. It worked instantly for anything less then 10, took 30 seconds for n=12 and when it hadn't finished after several minutes when I tried n=13, I decided to abort and look into writing a better algorithm.
@MrPixifan Рік тому ⁺²
Would it be worth it to SIMD the binary search, or would the decrease in spatial locality outweigh the benefit? I bet for small files, it's not worth it, but perhaps for large ones it is.
@NamelessStudiosInc Рік тому ⁺³
how would you go about simd the binary search? you can't predict the next element to be compared, I guess you could compare to many different split points but at that point it wouldn't be binary rather n-ary
@strager_ Рік тому ⁺¹
Do you know of a SIMD implementation of binary search?
@SimGunther Рік тому
@@strager_ Maybe for searching multiple trees simultaneously in a branchless way, but I'm not sure what use that'd be for.
EDIT: DemoFox has a blog on this from 2017
@MrPixifan Рік тому
@@strager_ No. The thought was this:
For array of length n and needle x:
1. Calculate n/9, 2n/9, 3n/9, 4n/9, 5n/9, 6n/9, 7n/9, 8n/9
2. Compare x to the above items
(I have no experience with SIMD, hardly knew about it before your video. Maybe division is not a possible SIMD instruction? Something else wrong?)
@strager_ Рік тому ⁺³
@MrPixifan It sounds like you are proposing an n-ary search (e.g. 8-way instead of 2-way). It's certainly doable, but I have heard that a 4-ary search is less efficient at the scale we are talking about in the video: github.com/scandum/binary_search#quaternary-binary-search
Perhaps more experimentation is needed.
@ssmith5048 Рік тому ⁺⁴
Fantastic presentation and explanation! Although I watched you put this together, seeing the final version was even more engaging than I expected. Great work and thank you!
@MaxCoplan Рік тому ⁺³
I definitely busted my Big O 😮 watching this video. Amazing!
@masheroz Рік тому ⁺¹
For the SIMD example, you explicitly used a simd library. Does C++ or other languages have similar options?
@strager_ Рік тому
There has been talk of adding SIMD to the C++ standard library, but I don't think it's in any version yet.
In C++, I wrap the x86 and ARM intrinsics myself. There are probably open source libraries out there which do this for you.
@hawks3109 Рік тому ⁺²
big O is a theoretical bound that nulls out the constants. You should take big O with a grain of salt and learn the trade offs. For example, bubble sort is actually faster than quick sort for low N values. This is due to the overhead of the constants (constants caused by recursive function calls mostly). For a sufficiently large N, the time saved by the algorithm itself overcomes the time lost by doing the function overhead, but as N get's smaller, you lose those time savings since the time saved by the algorithm is diminished. There is also worst case situations that come into play. It's all about what you know about your situation. In general big O is a great starting place, but context of your data ALWAYS matters.
@tandlose Рік тому ⁺¹
Like how good is this channel
@r75shell Рік тому
In one algorithmic task which I've prepared (idea of task is not mine), in analysis which I wrote it was time complexity O(9^n) which I also proven. But! On inputs up to n equal to 10^5 it was running within 2-3 seconds :)
@strager_ Рік тому
Yikes.
@ttrss Рік тому ⁺⁴
great channel, these are really interesting . Also why is it so rare for the compiler to optimise with SIMD instructions, are those comparison blocks just not a common code pattern?
@strager_ Рік тому ⁺¹
> Also why is it so rare for the compiler to optimise with SIMD instructions
In this particular case, there are a few reasons why autovectorization is a problem:
* The compiler cannot reasonably assume that the Vec's length is a multiple of 8. Supporting any Vec is complicated.
* Because of the early return, and without profiling data, the compiler might think that it's common for the *first* element to return, therefore SIMD isn't worth it.
* The compiler is written by humans with limited time, so the autovectorizer is implemented for common code patterns. I suspect my algorithm isn't common. (Maybe it's common with '==', but not with '
@HyuLilium Рік тому
Thanks to this video I've learned what Big O notation is, and what pitfalls to look out for. But I've not learned how to figure how to do what you did in this video. I basically only learned about the algorithms specifically in your video, which you have measured and calculated worst case, best case and average case, and used file size etc. I wouldn't know how to do that hard work, a great video would be on how to learn to do that hard work you did here in this video. Otherwise we still only rely on big O for algorithms that the internet lists for us.
@strager_ Рік тому ⁺¹
> But I've not learned how to figure how to do what you did in this video. [...] I wouldn't know how to do that hard work
What I did: Throw data at the algorithm, run it, measure the time, and plot a graph. Look at the graph's shape. From the graph, you can reasonably estimate the time complexity.
There are surely other videos or tutorials which teach how to analyze algorithms and compute time complexity. I don't have any particular recommendations.
@m.m3633 Рік тому ⁺¹
Grrate video! How do you make those smooth animations and transitions when code changes? It makes it much more intuitive
@strager_ Рік тому
I used Microsoft PowerPoint morph transitions.
@m.m3633 Рік тому
@@strager_ Thank you
@Starckoo Рік тому ⁺⁵
Quality content! 😀
@SurfingOnTheGuitar Рік тому ⁺²
Great video!! btw is there a mistake in 19:26 where instead of '12' in the slice it should be '2's representing the second line?
@strager_ Рік тому ⁺¹
Yup! I mentioned this issue in the 'Corrections' section of my pinned comment.
@astralfoxy1787 Місяць тому
3:04 - Slope is not zero, slope is 1. If it is zero - the "y" can be only zero. And be is not 1, obviously, cause "b" is "y intersection", but "y" intersect in 0 coordinate, what mean the "b" is 0.
@cw2448 Рік тому ⁺²
This video is an excellent explanation of Big O notation and what it really means for our algorithms. Great stuff here!
@BeccaYetiammo Рік тому ⁺³
Wow, I’m actually planning on optimizing a batch processing API for tomorrow and I came across this video. Learned new insights that will help me make decisions on optimization. Thank you for this! 🙌
@padraighill4558 Рік тому ⁺²
the base of the logarithm doesn't matter because it is just a multiple and we are already omitting coefficient!
also as a mathematician I would have loved to see Landau's Big-O notation in its actual form.
we say that f(x) = O( g(x) ) as x -> ∞ if
there exists an M > 0 and an N > 0 such that
if n > N then f(x)
@strager_ Рік тому ⁺¹
Yeah, I didn't explain clearly why we don't care about the log base.
@ryanvs Рік тому ⁺²
Nice video. The only nitpick is disregarding the read latency for the file. It seems like you are only considering the comparisons and not the read latency. What if the file is local vs network vs wan vs web? Big O really kicks in when you add latency.
@strager_ Рік тому ⁺²
Throughout my video, the file was already read into memory. I didn't want to make the video too complicated by talking about cache misses, page faults, etc.
@mash808 Рік тому ⁺³
This is probably the most helpful time complexity video I've watched to date (and I've watched a lot). Thanks so much.
@user-vx1vl1ci1e Рік тому ⁺²
This was an absolute treat to find in my notifications after work. Thank you
@parthsharma7366 Рік тому
U earned a subscriber!!🤟🔥🔥
Keep growing!!😊
@casperes0912 Рік тому ⁺²
Pedantic point. Best case would not use O. It uses Θ. So in that first example, the O(n) is fine but O(1) is wrong. It's Θ(1)
@strager_ Рік тому
> Best case would not use O.
Best case time complexity can be described with O or Ω or Θ or whatever you want. So can the worst case and the average case. O(1) is appropriate here.
@dorcohen3522 Рік тому
You cover basic approach but dropping out some important aspects
1. You can and should include coefficients in your analysis if possible, sometimes it makes all the difference. It's kind of a misconception that algorithms of the same magnitude are somewhat equal. It's very clear from the definition - the big O means that we can bound one function with the other up to an integer coefficient, but this coefficient can be very large. From a heuristic point of view, this matters alot.
2. We also might care about constants if they are big enough, and we don't omit them. Also in more theoretical approach you can define your own "constants" although they are not really constants in a global sense, but only for a scenario.
3. Worst case analysis is one thing, but we actually might care about average case and best case as well, mainly average case.
4. The underlying microarchitecture / memory hierarchy etc matters a lot. You can have the best algorithm which falls short compared to a naive one which is easy ti implement and verify but utilized vector operations etc. Or exploits memory related stuff. I feel this aspect is being ignored
@strager_ Рік тому
> 1. You can and should include coefficients in your analysis if possible [...]
>
> 2. We also might care about constants if they are big enough [...]
Agreed, but in this video I wanted to focus on RAM big O, not other models.
@strager_ Рік тому
> 4. The underlying microarchitecture / memory hierarchy etc matters a lot. [...] I feel this aspect is being ignored
I did want to discuss this. However, it felt awkward when most of the rest of the content was focused on big O. To prevent confusion, I left out discussion of cache effects, branch predictors, etc.
The video did discuss vector operations (SIMD) though. I did this to show how an O(n) solution can be faster than an O(log n) solution for small enough n. (The constant factors for binary search were small enough at
@dorcohen3522 Рік тому
@@strager_ I appreciate the response. You have fair points
@oscarfriberg7661 Рік тому ⁺²
When talking about Big O notation it's also worth to point out that it's only describing worst case input. It's possible that an algorithm can have a bad Big O, but the conditions for it to hit the "worst case" time is rare in practice.
One example is the Simplex algorithm used in Linear Programming. It's Big O time complexity is exponential. If you're following Big O blindly, you might be led to believe this algorithm incredibly slow, and you should look for some faster algorithm. And a polynomial time algorithm exists for Linear Programming. Better use that one instead!
However, if you actually benchmark both algorithms you will find that there's no clear winner. The exponential time algorithm is even better in many cases. Why? Turns out the worst case of the Simplex algorithm is rare. Actually, it's actually performing polynomially on average.
So in this case, following Big O blindly is incredibly misleading. This is quite understandable: you will lose a lot of valuable information about a complicated algorithm if you try to describe it with just a few symbols. Use Big O as a guidance, but it should never be a replacement for actual benchmarking. I've worked with many programs that are quadratic in theory, but performs linear in practice using real data.
@strager_ Рік тому
> When talking about Big O notation it's also worth to point out that it's only describing worst case input.
This is incorrect. You can talk about worst case, average case, and best case independently using big O notation.
For example, bubble sort has best case O(n) time complexity (comparisons).
> you will lose a lot of valuable information about a complicated algorithm if you try to describe it with just a few symbols.
Agreed.
@oscarfriberg7661 Рік тому ⁺¹
@@strager_ That’s true. Big O as a notation is quite general and is not limited to computer science. My point is that worst case analysis, which Big O is often associated with, can often be misleading.
Using Big O for best case can also be misleading. Search algorithms like binary search typically have O(1) best case.
Using Big O for average case can be tricky. It all depends on which average. An algorithm might perform one way if the input is uniformly distributed, but another way if it’s geometrically distributed. Your average case might not be the same as my average case. It’s not always trivial to decide which probability distribution best describes your real world input.
@MaxCoplan Рік тому ⁺³
Great video! I'm curious about some of the code and would like to take a look at the benchmarks. Is the code available somewhere?
@strager_ Рік тому ⁺¹
Oops, I wrote a pinned comment but forgot to publish it! Thanks for the reminder.
@protosssc790 Рік тому ⁺²
Thank you for the helpful information learning these subject alone sometimes feel depress but watching contents like this gives me hope thank you again!!
@guiAI Рік тому ⁺²
Normally don't comment. Normally find these types of videos boring. Im commenting, i love these videos, i loved the other optimization video you did. In my opinion you should do more of these style, with the same quality, no need to rush to have video quantity. Good job man. 👍
@felchs Рік тому ⁺¹
If you're not developing databases, graph algorithms, and high demand processing, these big o are not really remembered in daily programming. At the end of the day, 99.9% of programmers just use APIs.
@peter8261 Рік тому ⁺¹
Dude showing the y=mx+b and the graph and then showing in as o(n) shattered my brain. That makes so much sense.
@emjizone Рік тому
I learned about SIMD thanks to you. But I already hate it because it's so hardware specific.
@strager_ Рік тому ⁺¹
It's usually easy to provide non-SIMD fallbacks in case the hardware doesn't support the operations efficiently.
I wouldn't say SIMD is "so" hardware specific. If you're talking about microarchitecture performance, then you'd probably hate all software because all software performance is hardware specific. 😆
@ryanjupton Рік тому ⁺²
Man, that's one of the most succinct, best explanations I've heard. Keep posting!!!
@changthunderwang7543 Рік тому ⁺²
Can't you make a map where each index of
is mapped to its line number. Then at any point in the code, you can scan ahead to the next
and then do an O(1) lookup for what line it is? Then you get O(n) table creation time, O(L) memory, and~O(1) lookup time,
@strager_ Рік тому
That's an interesting algorithm. I never thought of that.
Worst case lookup is O(n) because a file with only one line would need to possibly scan the entire file to reach the end. More generally, I think the time complexity is O(s) where s is the average length of each line. (On average, is is constant, thus O(1). At worst, s=n, thus O(n).)
@changthunderwang7543 Рік тому ⁺¹
@@strager_ yeah, it was basically your precompute-every-answer table minus the redundancy
@trainerprecious1218 Рік тому ⁺²
quality content at my youtube feed at last!
@RuskiRozpierdalacz Рік тому ⁺²
That simd Rust library seems really nice, so far I only used intrinsics in C , but those are too much work for casual playing.
@strager_ Рік тому ⁺¹
Agreed, intrinsics take a lot of effort. But Rust SIMD is missing *many* things and might limit your thinking. Tradeoffs!
@David-gj6dc Рік тому ⁺²
I already have a CS degree but your point about your input mattering is so important. Like for sorting algorithms, people tend to think about nlog(n) solutions as being the best possible. And it is true for any comparison based sort that is the best possible, in the WORST case. But not all of those algorithms are equal. Quick sort is typically the default sort but that is because it is good in the average case case. It degenerates to n^2 in the worst case though. Likewise sure insertion sort is n^2 in the worst case on paper. But it approaches linear time the more sorted the input is. So you might opt for that in small inputs or mostly sorted inputs. You can't think of just the worst case to gain optimal performance.
@pepehimovic3135 Рік тому ⁺¹
I think it was an insightful video.
Near the start when you say O(1) “best case”, shouldn’t that be big-Ω and not big-O?
@strager_ Рік тому ⁺¹
Big O is appropriate when talking about best case time complexity.
Big Ω or Θ would also be appropriate, but I didn't want to dig into that topic in this video.

Наступне

Автоматичне відтворення

Faster than Rust and C++: the PERFECT hash table