Having done some research into the speed claim by decompiling both the Rust and Mojo binaries, I saw that Mojo optimized out everything, so the resulting binary calls main and returns without doing any work. I confirmed this by making a program with only a main function and running it through the same benchmark, which led to the exact same results. Rust does the same optimization in any version equal to or newer than 1.75.0, and its execution is so fast that it cannot be properly measured. In older Rust versions, the compiler would emit the entire recursion, and the body of the loop looks something like: allocate 336 bytes, deallocate 336 bytes, check branch condition, call recursion function. This is slower than Mojo because it is actually doing work: it has to call the allocator multiple times and manage a branch. Also note that the compiler did do TCO, as shown by the deallocation happening before the branch condition check. This is a case where the benchmark was cherry-picked to something Rust did not optimize at compile time, while also lying about Rust not doing TCO, as well as using non-idiomatic Rust (as shown in the video, `vec![0; size]` is preferred over `Vec::with_capacity(size)`, and results in a faster execution time than Mojo). If they cannot properly explain why their language has the faster execution in the benchmark, and have to assume things about how Rust works to make something believable, then I do not see how any claims they make can be trusted in the foreseeable future. I am willing to change my mind on the matter if the article writer is able to explain this discrepancy, but for now, here's the truth about those benchmarks.
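For anyone who has not seen the benchmark, its shape is roughly the following. This is only a sketch of what the comment describes, not the article's exact code: the name `recursive`, the accumulator argument and the `% 42` sizing are assumptions for illustration, and `black_box` is added so the optimizer cannot simply delete the buffer.

```rust
use std::hint::black_box;

/// Hypothetical reconstruction of the benchmark's shape: allocate a small
/// buffer, then recurse with the call as the last expression.
/// `allocated` carries the running element count so nothing happens after
/// the recursive call in the source.
fn recursive(n: usize, allocated: usize) -> usize {
    if n == 0 {
        return allocated;
    }
    // Zero-initialised buffer, the form the comment calls idiomatic.
    let stuff: Vec<u64> = vec![0; n % 42];
    // Ask the optimizer to treat the buffer as observed.
    black_box(&stuff);
    let len = stuff.len();
    // `stuff` is still alive here, so in source order its drop is scheduled
    // after the call returns. Whether the optimizer may move that drop
    // before the call is exactly what the thread is arguing about.
    recursive(n - 1, allocated + len)
}
```

Calling `recursive(100, 0)` returns the total number of elements allocated across all 100 calls. Rust does not guarantee tail-call elimination, so the only way to know what a given compiler version does with this is to read the generated assembly.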
@@deiminator2 No engineer worth their time in dolla bills should reach for recursion where they don't need the full stack trace, because even in Mojo you cannot rely on the optimizer in all cases, that's an NP complete problem, not happening. (a side note, a surprising number of optimizations are NP complete problems, we've always just optimized for the most common case) Mojo is targeting the AI Python dev who does not think about these things, so they optimized for their target use case, it's really just that simple. Rust targets syst... well everything really. Unfortunately, neither Rust nor Python nor Mojo is going to edge C++ within the -real- technical AI space. (it's a joke, calm down, I'm a theoretician too). Also, another side note, but MLIR (IR optimized for tensor/AI use case) can be used by Rust as well, it'd not be harder than the GCC back end. I'd be concerned about the creator getting his ego all up in him and gating MLIR optimizations behind Mojo, but I *really* don't think they will. This article is probably just marketing towards those same AI Python devs who don't understand LLVM vs MLIR or SIMD in the first place.
9:20 C# is in an even weirder spot, it's between tiers 1, 2 and 3. It can compile to native code, it can run without a garbage collector and it can run in IL mode. It's insane
@@Mr.BinarySniper ? I haven't written C# in 3 years, just following its progress. No need for assumptions. It's still shit since Microsoft can't get their asses together to get a fucking UI system that will not get deprecated within a week by Microsoft itself.
@@plaintext7288 i'm talking about pure C# not even unity or anything like that. since .NET 8 has AOT support which unity effectively does with their IL2CPP
26:13 You typically see a chip-wide downclock when running AVX instructions on a lot of chips. You also have an overhead on loading the input and fetching the results in many cases. In my experiments I typically see a 40X improvement, not a 64X improvement, but it's constantly creeping towards that 64 number with each new architecture.
IIRC, the biggest problems with AVX512 appeared when you mixed them with scalar (or less vectorized) instructions but it could've gotten better since then.
That’s not true anymore. It happened on a range of Intel CPUs only, but not on newer ones. As Turalcar says, it was the mix of AVX and scalar, because AVX-512 running at those clocks was much faster than scalar at normal clocks. It was also for AVX-512 only, but AVX-512 has been expanded massively since then, thanks to Intel again 😂
I think it was Cloudflare that posted in their blog, btw. They were doing AVX-512 encryption at the proxies, which was slowing down all the other things the proxy does, basically all routing and HTTP.
9:25 What is kind of missing in this pyramid are insane languages like haskell, that explicitly don't surface things like execution order and other details to the user and therefore are able to do aggressive optimizations. Haskell can often get within ~5% of C performance, which is kind of insane for a high-level language.
Looking into some of the internals for Haskell is kind of crazy. The more I learn about it the less I'm surprised that it beats C in some cases. There are definitely workloads not well suited for Haskell, but its semantics allow for some very aggressive optimization that C cannot allow.
As an AI researcher using Python all the time... my compute-heavy workloads are already running on C++ under the hood. dataset.map isn't the same as a for loop... and if I use a for loop or two, it takes a few seconds once or twice. Sure thing, that is good enough. The developers who integrate a C library below Python via cffi are the real heroes. That stuff is tough, and I am also working on something like that.
This is true. The reason I never found Python to be slow for most tasks is because all the libraries which do common heavy lifting tasks are written in some lower level language by folks who are experts in the domain, so the code ends up running faster than I would be able to do it myself in a lower level language anyway.
@@AggressivesnowmaN For today, Cython 3.1 is very fast for extension libraries and Nim also has great Python interoperability and speed very close to C. Mojo will take a while.
40:22 Vec::new simply doesn't allocate. Until you push to it, it just holds a dangling (non-null) pointer and a capacity of 0. That's really helpful, as that way it can simply be used in Default impls without having to do any allocations.
I found that helpful when using `std::mem::take` to get around a borrow checker issue without a needless allocation. The current borrow checker might not need a workaround for whatever it was that I was doing.
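A small sketch of both points: an empty `Vec` has capacity 0, and `std::mem::take` swaps such an empty `Vec` in while handing the old contents out. The `Batch` struct and `drain_batch` function are made-up names for illustration only.

```rust
use std::mem;

/// Hypothetical struct used only for illustration.
/// Deriving Default gives `items` an empty Vec, which owns no heap memory.
#[derive(Default)]
struct Batch {
    items: Vec<u32>,
}

/// Moves the items out of `batch` through a mutable reference.
/// `mem::take` leaves `Vec::default()` behind, so no new allocation is made.
fn drain_batch(batch: &mut Batch) -> Vec<u32> {
    mem::take(&mut batch.items)
}
```

After `drain_batch(&mut b)`, `b.items` is empty with capacity 0 and the returned `Vec` owns the original buffer.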
45:51 Doing SIMD manually in rust really isn't fun, but often you really don't need to do it anyway. If you structure your code correctly (i.e. have it have the right alignments and sizes and stuff), LLVM is *really* good at optimizing code for your platform. So instead of doing SIMD manually you structure your code *as if* you want to use SIMD and let the compiler do the rest.
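As a rough illustration of "structure the code as if you wanted SIMD", here is a sketch with a made-up function `mul_add`. Whether LLVM actually vectorizes it depends on the target and compiler version, so treat it as something to check in the assembly rather than a guarantee.

```rust
/// Element-wise `a[i] * b[i] + c`, written so the optimizer has an easy time.
/// Processes only as many elements as all three slices have in common.
fn mul_add(a: &[f32], b: &[f32], c: f32, out: &mut [f32]) {
    // Re-slice everything to one common length up front, so the compiler can
    // prove every index below is in bounds and drop per-element checks.
    let n = a.len().min(b.len()).min(out.len());
    let (a, b, out) = (&a[..n], &b[..n], &mut out[..n]);
    // Simple, branch-free body over contiguous data.
    for i in 0..n {
        out[i] = a[i] * b[i] + c;
    }
}
```

The design choice is the up-front re-slicing: the loop body stays a plain arithmetic expression with nothing that depends on previous iterations.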
31:17 The reason why Rust uses drop flags is actually really interesting: if a value gets dropped conditionally (for example when one branch has a drop(value) and another does not), Rust still has to keep track of whether the value was dropped or not at the end of the scope. The solution to remove this overhead would be, rather than keeping track of whether the value was dropped, to always drop the value at the earliest point in time at which it is no longer used, so in this example also in the other branch that does *not* call drop(value). (As a partially deinitialized value can't be used anyway, that would be totally doable.) The big downside is that this would make the deinitialization of values slightly non-deterministic, which is also the reason this method was decided against. (Though sometimes I think it would have been nice if Rust had just decided to always do the deinitialization as soon as possible, but there might also be other considerations like code size.)
@@nii-san5485 Not necessarily. As long as you don't drop/move the guard (for example by giving away ownership to another function), there is not really any reason for rust to drop the guard early. But *if* you really want to always just insert drops into the program as soon as a value is no longer used (I would call that non-lexical drops), you could just have an explicit drop at the end of the guard to make sure that you explicitly use it and it doesn't get dropped early. If you want to know more about this whole topic, there is a blog article on faultlore called "Destroy All Values: Designing Deinitialization in Programming Languages" going in depth on this topic.
@remrevo3944 I never considered how conditional drops worked under the hood thx for linking that article. But I'm saying with your suggestion of dropping after the last use, a guard would instantly be dropped always unless an explicit drop was added or they're special cased to drop at end of lexical scope.
Which you did consider now that I'm reading back. I still like "lexical" drop as it's a bit more intuitive , e.g. the value just "goes out of scope" which most programmers will already have a feel for 😁
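The guard case discussed here can be sketched with a `Mutex`. The two function names are made up for illustration; the point is only that today a guard stays alive until the end of its lexical scope, not until its last use, unless it is dropped explicitly.

```rust
use std::sync::Mutex;

/// Returns (locked right after the guard's last use, locked after the scope ends).
fn guard_lives_until_scope_end() -> (bool, bool) {
    let m = Mutex::new(0);
    let locked_after_last_use;
    {
        let mut guard = m.lock().unwrap();
        *guard += 1; // last use of `guard`
        // The guard is still alive, so a second lock attempt fails.
        locked_after_last_use = m.try_lock().is_err();
    } // `guard` is dropped here, at the end of its lexical scope
    let locked_after_scope_end = m.try_lock().is_err();
    (locked_after_last_use, locked_after_scope_end)
}

/// Returns true if the mutex is free again right after an explicit drop.
fn explicit_drop_releases_early() -> bool {
    let m = Mutex::new(0);
    let guard = m.lock().unwrap();
    drop(guard); // ends the guard's life before the scope does
    let free = m.try_lock().is_ok();
    free
}
```

Under a "drop after last use" rule, the first element of the tuple would flip, which is why guards would need an explicit drop or a special case.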
@@stevenhe3462 It's more about how "press" (or a dude/dudette with a blog) makes things up. One Primeagen is now multiple "Rust experts". This will get repeated more and more, and next thing you know people believe all kinds of stuff said by "Rust experts", very few of them actually hauling their arse to verify what was said and if it's actually true and for which values of "win" it is true. At some iteration, someone will seriously ascribe this quote to Klabnik.
TCO does not "unroll it into a loop", it "reuses the stack frame". When you call a non-inlined function you allocate a stack frame (by moving the stack pointer); this holds space for the arguments and the return value and possibly a little other bookkeeping. TCO reuses all the values in the stack frame, including the return address, and as a side effect never has to grow the stack. It can only be used in very exacting conditions (everything the function does must be able to be aggregated into a return value and/or argument variables, and the stack frame must be identical in size). Its real benefit is not so much performance (though it does give some), it's that it prevents a stack overflow from happening (which happens when your stack pointer overflows the preallocated stack space of your application). In this case, the vec initialization creates memory on the heap. If the next call were to rewrite the contents of that variable, the underlying vector on the heap would be leaked. Mojo is greedy in that since _stuff is never used and deleted on its "last use" there is nothing that is not stack allocated, whereas in Rust it cannot. Rust can use TCO, just not with heap allocated variables. Granted, I would try an explicit drop in there after the vec init and see what happens.
The observation here would be that TCO was effective on Mojo, TCO was not effective on Rust, but it's smart about stack sizes and had enough to not have a stack overflow, and JavaScript did not have TCO at all.
Somewhat off topic but a big thing here is that python also doesn't have TCO (at least not out of the box), so dynamic programming is better to do in mojo than python for essentially only some keywords added.
I’m keen to see what a really capable Mojo dev vs a really capable Rust dev can build in a fixed time window and the performance of the 2 solutions. Hell, throw a C++ and Zig dev in there too. Effort is the biggest constraint in my work.
Contrived use case of the recursive example. In production Rust you'd use a for loop and allocate the Vec once outside and not destroy it for each iteration.
That is fair but the point is that someone in the AI space newly learning Rust will need some level of understanding these things, not to mention there possibly being other cases where the lack of tail call optimization can lead to performance issues. It's true that you still need knowhow to create performant Mojo but someone new to Mojo is less likely to fall into these obscure and minor pitfalls sprinkled throughout the journey of learning something new.
That TCO example they did is so bad. Rust does have the issue of not always being able to do TCO, but that example just does it, because stuff is never used or black_boxed. To demonstrate the issue you would need something like a function that counts sheep down starting at n, prints a message for each sheep counted, and returns the total number of sheep that have been counted. This function does not get TCO, because the addition is performed after the recursive call and uses its result, so the compiler does not automatically optimize it.
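The sheep function described above might look like this. It is a sketch with made-up names, next to an accumulator variant where the recursive call really is the last thing that happens. Rust guarantees tail-call elimination for neither form; the second is merely the shape an optimizer is able to turn into a loop.

```rust
/// Not a tail call: the `+ 1` runs after the recursive call returns,
/// so every frame has to stay alive until the recursion bottoms out.
fn count_sheep(n: u32) -> u32 {
    if n == 0 {
        return 0;
    }
    println!("sheep {n}");
    count_sheep(n - 1) + 1
}

/// Tail-recursive variant: the running total travels in `counted`,
/// so nothing is left to do once the recursive call returns.
fn count_sheep_acc(n: u32, counted: u32) -> u32 {
    if n == 0 {
        return counted;
    }
    println!("sheep {n}");
    count_sheep_acc(n - 1, counted + 1)
}
```

Both return the same count; `count_sheep_acc` is started with `counted` set to 0.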
There is an additional difference between semaphore and mutex. Semaphore can be controlled by anyone who got access to it, while mutex can only be released by the thread that locked it. Mutex is more of a monitor rather than semaphore.
For those that don’t know C# can compile to native code just like Go. In terms of speed C#, Java and Go are very similar for many tasks. For pure synthetic benchmarks it’s C# > Go > Java. You have greater memory control in C# than the competitors.
Umm no. Is not that optimized compared to Go. You can not use a lot of libraries due to libraries not being AOT compatible. Like bro, I love C# but I’m not afraid of using another language.
@@peanutcelery neither am I but C# can give u full memory control while Go just can’t. As time passes many libs will support AOT. Minimal APIs already do it in .NET 8
C# is a failure as a language, it barely floats, and only because of business adoption and the Microsoft sales department. You are kinda right about Go > Java if you ignore most things that make Java interesting. And a comparison of Go and Java is also kinda a pre-junior level of understanding. Did you know that you can actually tweak both Go and Java for your exact application? The Discord team actually wrote a guide about optimizing Go garbage collection, and you also have CGO to go beyond that. The idea of thinking about C# as something more than just office work and games for Windows is something that only C# developers can have.
@@d1namis Your take about C# is very misleading. I don't know which type of field you work in, but C# is everywhere but in ML. Yes that's right, everywhere. There are many places in the world where there is more demand for C# devs than Java. That should tell you something. Go, while nimble, "easy" and easy to grok, has major problems, especially around FFI and FIPS compliance. It would be a fool's mistake to write any high-performance application that needs interop with libs already written in C or C++ with Go. C# was made with this in mind! JNI is a terrible mess, which is why Java added JNA. C#'s threading model is just like Rust's, which aligns with C. Have you ever written software that needs to be audited by the government? Well, guess which lang is easiest to pass? All because MS has first-party support for SO many things that it makes the std in Go laughable. You rarely need third-party libs. I'm telling you this as I have written software which runs on Air Force One. Web Dev is not all Dev. Anyone who thinks C# is "dying" has been living under a rock for the past decade. I'm also aware of C#'s shortcomings too
I don't like that "owned" thing in Mojo, because the caller of the function may not be aware that a copy of the string is being made! That's why I like that in Rust you have to explicitly use foo.clone() at the call site, making it immediately obvious to whoever reads that code for the first time what is going on.
They said that it transfers ownership but reverts to clone under the hood if you try to modify it, unless you use the caret in the function call, then it strictly transfers ownership (at least that's what I understood). So it seems like a quality-of-life thing, and it's up for debate for implicit vs explicit is better here
It's always explicit when a copy isn't being made because you have to use the transfer (^) operator to move ownership. It is less clear when a copy is made vs when it's an immutable borrow, you'd have to check the original function definition
One thing that doesn't seem to get mentioned at all - mojo is proprietary and not open source? That alone means it's a non-starter for so many projects and use cases that I highly doubt it will reach any kind of critical mass as a language to replace and / or supplement Python / C++ / Rust in the ML space (or any other space for that matter).
Oh, yeah that's a deal breaker imo. But considering that most (afaik) Ai statisticians don't care too much about what lies under the hood (their code is abysmal at times), they might not really care about whether it's open source, but that also means they might not care about using MOJO either
I think they promised to open source it later. I'm not holding my breath. Also, if it compiles to an executable, that's a bad thing for AI. In Python I can take the code of llama and hack around it. When I download LLM models I always review the implementation (as most of them are (a) copies of llama, (b) small). If models were delivered as an exe it would be terrible: I run Linux, many AI researchers run Linux, but many LLM users run Windows, so who knows what executable should be uploaded on HF, and an .exe can't be reviewed as easily as Python code.
@@XxZeldaxXXxLinkxX they might not care, but the ones funding them will care about the fact that they won't have SWEs by their side to help them, and what those SWEs care about also matters a lot
27:00 i believe odin does that with their built vector classes and matrices. able to get the power out of simd but have it implicitly built in. Zig is also getting them as well, but Odin is technically 1.0 already.
I like Mojician. Also, not surprising that Chris Lattner is iterating on the intermediate language design from LLVM seeing as he was a core (founding?) dev on it for a long time.
At 17:25 did the article confuse move semantics and copy semantics? Rust moves by default to avoid copying the string. In the case of a copy the original foo would be available for dbg!(foo) and there wouldn't be a compiler error. Primeagen should point this out.
28:57 What he might have omitted there is that RAII does not necessarily mean heap allocation. Basically, if you are not using new, the memory for the object will be allocated on the stack. And allocating on the stack is one instruction, freeing is one instruction, no matter how many objects are allocated in the function. So this is far better than GC (if you forget that C# allows you to put structs on the stack). On the other hand, yeah, malloc/free can feel slower than a GC in many cases.
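A minimal sketch of the move-versus-copy point, with made-up function names: passing a `String` by value moves it, and the only way to keep using the original is an explicit `.clone()` at the call site.

```rust
/// Takes ownership of the String; the caller's binding is moved-from.
fn consume(s: String) -> usize {
    s.len()
}

/// Calls `consume` twice: once with an explicit clone, once with a move.
fn move_then_clone() -> (usize, usize) {
    let foo = String::from("hello");
    // An explicit clone at the call site keeps `foo` usable afterwards.
    let a = consume(foo.clone());
    // This call moves `foo`; no bytes of the string are copied.
    let b = consume(foo);
    // dbg!(foo); // would not compile: error[E0382], use of moved value `foo`
    (a, b)
}
```

Uncommenting the `dbg!(foo)` line reproduces the kind of compiler error the comment refers to.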
47:30 I think "Would you switch to typescript if introducing this new syntax would allow it to run 100x faster than javascript?" would be even closer analogy. And even I would stop writing vanilla JavaScript if TypeScript were actually faster.
10:35 actually got me there prime, damn, ''ruby is not skill issue, it's just slow'' Phew xD, really messed up with my head, got me in the first half, had to recheck the video for the graph xD
He's saying tail call optimization isn't possible because of the scope-level deferred memory management. Meaning for that specific use tail call optimization isn't possible for Rust. Whereas for Mojo, given they are not deferring scope cleanup to the end of each call (end of scope), it easily facilitates tail call optimization. In other words Mojo will provide tail call optimization for generalized use whereas Rust only provides it in subsets where scope does not require memory allocation clean up until scope destruction. This is also why the "drop" didn't fix the issue because it's not changing the deferred scope destruction. In Rust, "drop" flags the allocation for scope destruction. Effectively changing nothing.
1. Rust is definitely doing TCO in the example he shows, since there's no way that the program will create 1 billion stack frames without hitting the limit. 2. I don't see any reason that allocating heap memory in a recursive function would make TCO impossible in Rust, since converting a TCO-able function into a loop is literally just declaring the arguments as normal variables, then putting the function body into a while loop, and the compiler can just put the clean-up step before jumping to the next iteration.
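The manual transformation described in point 2 can be sketched like this. The function name, the `% 42` sizing and the returned element count are assumptions for illustration; it shows the hand-written loop form, not what any particular compiler emits.

```rust
use std::hint::black_box;

/// Loop form of an "allocate, then recurse" function: the argument becomes
/// a mutable variable and the body moves into a `while` loop.
/// Returns the total number of elements allocated.
fn recursive_as_loop(mut n: usize) -> usize {
    let mut allocated = 0;
    while n != 0 {
        // Same work as one level of the recursion.
        let stuff: Vec<u64> = vec![0; n % 42];
        black_box(&stuff); // keep the buffer observable to the optimizer
        allocated += stuff.len();
        n -= 1;
        // `stuff` is dropped here, at the end of the iteration, which is the
        // clean-up step that runs before jumping back to the loop condition.
    }
    allocated
}
```

Written this way the heap allocation poses no problem: each iteration frees its buffer before the next one starts.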
@@OnFireByte I agree, I would very much like to better understand what's going on here. It could be the author is falsely attributing TCO failure to some underlying semantic implementation detail within Rust. I guess a review of the generated code is the only way to know for sure what's actually going on there.
3. Drop flags only show up when you have a drop that can't be statically determined (e.g. if a variable is only dropped when a runtime condition is true). There are not going to be any drop flags compiled into that code, and explicitly adding an unconditional drop before the recursive call _should_ cause the implicit drop at the end of scope to be omitted.
@@GrantGryczan My understanding is the drop always adds a bit flag and nothing more. The drop is then evaluated at end of scope. Which means in this case, the drop remains regardless of the drop flag. Resulting in no change as the drop is already deferred from scope destruction. In other words, the drop is saying I want you to do what you're already planning on doing.
@@justanothercomment416 As I said in my last comment, that is incorrect; the drop flag is only needed in very few cases. Take a look at the official nomicon documentation on drop flags. It explains this well and is very short and easy to understand.
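To make the disagreement concrete, here is a sketch of the one situation where a drop flag is needed: a value that is moved into `drop` on only one branch. The `Noisy` type and its log are made up for illustration. An unconditional `drop(value)` needs no flag at all, because the compiler knows statically that the value is gone afterwards.

```rust
use std::cell::RefCell;

/// Hypothetical type that records its own destruction in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Returns the order of events for a value that is dropped early only
/// when `early` is true.
fn conditional_drop(early: bool) -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let value = Noisy { name: "value dropped", log: &log };
        if early {
            // Moved into `drop` on this path only, so whether `value` is
            // still initialised at the end of the scope is a runtime fact.
            drop(value);
        }
        log.borrow_mut().push("end of scope reached");
        // If `value` is still initialised, its destructor runs here.
    }
    log.into_inner()
}
```

With `early == true` the destructor runs immediately at the `drop` call, before the end of the scope is reached, and it does not run a second time when the scope ends.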
~40:00 Maybe Rust, in the current iteration of the compiler in release mode, can detect that vec is not used and DCE it out of existence. A more proper test would be to do something with vec. Honestly, at that point it'd be necessary to look at the assembly code. (I'm not a fan of their copies, it seems it can create a lot of headaches if some things are copied deeply and some not, but hopefully they thought of it and there will be no auto_ptr 2.0 but with every copyable type)
I'd love to see Prime react to an interview Chris Lattner did on how Mojo works and what's happening behind the scenes. I don't think most people realize that Mojo is simply using Python as the syntactical glue for a completely different set of backend processes. To use a car analogy, if Python is a Toyota Camry, its syntax is the paint job and decaling on the outside of the car. Mojo is a Formula 1 car that uses the same paint color as Python's Toyota Camry but it also has cool racing stripes and decals for the superset features/syntax. When I hear people talk about Mojo it's as if they think the language is still a Python Toyota Camry with some aftermarket mods to make it faster, which leads them to think it can't possibly be as fast as something like Rust, which to them is a racecar. Nope, Mojo is an actual Formula 1 car with a familiar Python paint job. That's about all they have in common. Which clears up why there's so much appeal. Mojo is basically telling Python programmers you just need to learn a few new concepts and some additional syntax and you'll be able to drive this Formula 1 car. What's even better is that it'll feel almost as easy to drive as your Toyota Camry, which you can still drive whenever you want.
@@PRIMARYATIAS Copy on write is a convenience to the programmer that has runtime performance implications. It is not a bad tradeoff for many uses, but does impact ultimate performance.
I just tuned in to bits of this, but at 42:02, prime says "you do with_capacity() if you want to create a vector that does not resize itself" - errr, not my understanding. with_capacity(n) just initialises the vector with the capacity to hold n elements (it does the allocation up front), but it can and will resize itself if you try to put more than n elements in the vector, and this will result in a reallocation of additional memory to hold the larger number of elements, which has performance implications. Just checked the Rust docs, and they refer to it as "with at least the specified capacity". This is a pretty basic concept imo - perhaps prime was having a bad day....
The point prime made at @19:24 about ownership being orthogonal to the type is actually quite good. I wish Rust did this the other way round. It seems they fell into a trap trying to make references similar to C++ references. They could have required you to say things like `ref` and `owned/copy/clone`, and also removed the idea of implicit copies and required you to always .clone() something.
42:52 My guess is that, because they delete objects as soon as they aren't in use, and because the vec's never in use, they never allocate until you write code that uses it.
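A quick sketch of that behaviour, with a made-up function name: the capacity is a lower bound reserved up front, and pushing past it simply grows the vector.

```rust
/// Reserves room for `n` bytes, then pushes `n + 1` of them.
/// Returns (capacity before pushing, length after, capacity after).
fn grow_past_capacity(n: usize) -> (usize, usize, usize) {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    // The docs only promise "at least" `n`, so this may be larger.
    let before = v.capacity();
    for i in 0..=n {
        // Once the reserved space runs out the Vec reallocates on its own.
        v.push(i as u8);
    }
    (before, v.len(), v.capacity())
}
```

For `n = 4` the length afterwards is 5 and the final capacity is at least 5; the exact growth factor is an implementation detail, so the sketch does not assume one.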
I honestly LOVE that Mojo programmers are called Mojicians. It just sounds cool. A programming magician. Honestly sounds better than Rustaceans and Pythonistas.
1. you can just quote the command `hyperfine "node src/index.js"` 2. The point about dropping I think is because the drop for the Vec is put after the recursive call, so it's preventing the TCO?
Why do you harp about Arc over and over? It is a way for multiple threads/tasks to safely share the same resource. It is not specific to Rust, other languages have that too.
It's because it's a tool that effectively steamrolls over the borrow checker. Yeah there's legitimate uses, but you can just use it as a "fuck it just take the damn variable". Using it introduces overhead and performance reduction
@@XxZeldaxXXxLinkxX You use it when you need to use it, if you want to access the same resource from multiple threads/tasks. Yes you can misuse it, but in most technologies there are tools that can be misused. My comment was about Primeagen's constant harping about Arc as if that were Rust's way of doing most data flow/access, which it is not.
@@maniacZesci he's not harping on Rust, he's harping on the people that do that (as a crutch). Like harping on the people that use "as any" in TypeScript. Just memeing pretty much
@22:13 I don't quite get that point they made. If every object has an identity then there must be some indirection happening there. If every object has an identity then that means there is a lookup to get any object (aside from direct memory location). If the lookup is insignificant then the argument is fine. But if you always have to first look up the object's id and then get its location in memory, that's obviously a penalty that you don't have to pay with pinning in Rust. The idea of pinning is that data can be at 0x01 and it moves to 0x9. Anyone who assumed it was at 0x01 needs to know that it's now at 0x9. Pinning allows you to not have to allocate dynamic memory all the time.
Forget about the AI buzzword bingo, but if Mojo becomes a general purpose language which can be compiled and still interact with the Python ecosystem (even if the library calls have to be interpreted and GC'd, obviously), it would still be a win for me! Yes, maybe their claims about performance are false, but if it is good enough, at least as fast as Go, and supports all normal Python features (even if, for example, structs are typed while Python classes are untyped but can still have things like inheritance), it would still be the optimal language, maybe not for AI developers, but for the average Web/Backend/Enterprise developer.
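For reference, the legitimate use being described looks roughly like this, a sketch with a made-up function name. `Arc` gives each thread its own handle to the same value, and the `Mutex` inside provides the actual synchronisation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter
/// `increments` times, and returns the final total.
fn shared_counter(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Cloning the Arc bumps an atomic reference count; it does not
            // copy the counter itself.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

The overhead mentioned in the thread is real but small: an atomic increment per clone and a lock per access. The complaint is about reaching for it when plain ownership or a borrow would do.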
9:33 Couldn't explain it any better. The higher tier requires skill because even though you can produce correct code in C, if you end up freeing memory more often than a garbage collected language your program will be slower. It's not only the fact that the freeing is done manually, it's because you can do it less often and in moments where it bothers the least. I call the top tier languages deterministic: you know what they are doing at a given moment.
20:30 I see this as argument that is logically "I don't like the syntax of Rust" and the implicit case (without "&" or "owned" or "^") should be different than it's for Rust. That is, skill issue. That said, Mojo seems to have lots of good ideas so it's definitely yet another language worth learning.
Correct C++ is garbage collected, kinda... There are these things called smart pointers that use the constructor/destructor paradigm to automatically delete when they go out of scope
Rust also has that, it’s RC/ARC. Definitely not a traditional GC like GC tier language (mark and sweep that need to stop the world) but yeah you could say that
@@OnFireByte I'm not saying Rust doesn't have it, I'm just saying calling C++ a manual memory language is wrong if you follow best practices (which is to not use raw pointers unless you don't transfer ownership).
That's a lie. If C++ had GC it would be possible to make an equivalent of Python's `class GraphNode: linked: List["GraphNode"]`. That's impossible in C++: you need to manually come up with an explicit strategy for who owns what in a graph and for clearing up memory.
* You can't use unique_ptr, because many nodes can link to the same node.
* You can't use shared_ptr, because graphs have cycles, so the nodes in a cycle keep each other alive even after every outside reference has gone out of scope.
* It's impossible to use only weak_ptr, because somebody needs to hold a non-weak ref.
So you need to manually make a graph class to handle ownership, because C++ has no GC. You don't need to do any of that with a GC. "My program doesn't leak memory, kinda" doesn't count when it ~kinda does.
@@AM-yk5yd it’s just how you define GC. Many people consider reference counting to be GC because they define GC as just a system that automatically and safely deallocates memory at runtime, but yeah, RC isn’t GC if you want to say that a GC needs to be able to deallocate every piece of data that isn’t reachable from a root node (tracing GC). It’s just definition anyway
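Since Rust's `Rc` came up as the counterpart of `shared_ptr`, the cycle problem can be sketched there too. The `Node` type and function names are made up for illustration. Two strong links in a cycle keep each other alive, while making the back edge a `Weak` lets both nodes be freed, at the cost of deciding by hand which edge is the weak one.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical graph node: `next` is an owning link, `prev` a non-owning one.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    prev: RefCell<Option<Weak<Node>>>,
}

fn node() -> Rc<Node> {
    Rc::new(Node { next: RefCell::new(None), prev: RefCell::new(None) })
}

/// a -> b is strong, b -> a is weak: dropping both handles frees both nodes.
fn weak_back_edge_frees_cycle() -> bool {
    let (a, b) = (node(), node());
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.prev.borrow_mut() = Some(Rc::downgrade(&a));
    let probe = Rc::downgrade(&b);
    drop(b);
    let alive_via_a = probe.upgrade().is_some(); // `a` still owns `b`
    drop(a);
    let freed = probe.upgrade().is_none();
    alive_via_a && freed
}

/// Both edges strong: the cycle outlives every handle, i.e. it leaks.
fn strong_cycle_leaks() -> bool {
    let (a, b) = (node(), node());
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.next.borrow_mut() = Some(Rc::clone(&a));
    let probe = Rc::downgrade(&a);
    drop(a);
    drop(b);
    let leaked = probe.upgrade().is_some();
    leaked
}
```

Both functions return `true`. A tracing GC would reclaim the second cycle as well, which is the distinction being argued over in the comments.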
The "people writing Python aren't gonna move to Rust if Mojo becomes a thing" isn't true I think (saying that as one of the people in that domain that actually writes Rust right now). Sometimes the problem with Python isn't speed but correctness - there have definitely been instances where I couldn't be confident that the Python code was doing the right thing, that I hadn't missed some edge cases etc., and from what I've heard Mojo hardly improves on Python in that domain. Mojo may take some use away from Rust but it can't replace it - even in the ML / AI domain
What leads to this correctness? Obviously not memory management (cause Python has automatic memory management). So it isn't Rust's ownership and borrowing. Is it simply the existence of strong typing? Mojo has strong typing if wanted (and it is often required for high performance Mojo code). Is it the more ML features of Rust (its powerful enum type and pattern matching)? Genuinely curious what you think leads to this gain in correctness.
@@brendanhansknecht4650 Rust has inherited a lot of ML-isms (as in SML not AI), basically stuff like algebraic dt, hindley-milner types, optionals etc allow you to encode lot of extra information and guard rails into the type system. Mojo can’t have this because it would break compatibility with python esque stuff on fundamental level.
@@brendanhansknecht4650 I think it's that it's generally very explicit and ekes out edge cases - and that it's strongly and *statically* typed, yes; and that it has quite an expressive type system. I'm not going to accidentally put a "regular" unsigned into a place where a nonzero one is required, for example; I can make algorithms that fail with NaNs take a floating point type that doesn't have NaNs, can use sum types where they're a good fit, ... Python is of course also strongly typed but the dynamism takes away a lot. Regarding the memory management: if you get into writing more optimized Python you actually start to care about memory management even in Python. I feel like there's not really a lot - if anything - gained here with Python over Rust.
@@SVVV97 cool. So those are the same reasons why I would say I prefer Rust over Python. That said, people who write Python are a gigantic market. Most of them aren't in the same boat. I think for most people who write Python, Mojo is much more interesting. Assuming Mojo is complete, it would give them: 1. Instant performance gains without changing their code at all 2. A way to add strong static types. On top of that, adding types increases the performance even more. 3. The Python people I interact with don't understand the benefits that come from ML. They have never used a nice sum type. So they don't know what they are missing in Rust and other ML descendant languages. That said, I do hope that Mojo adds good ML-style types and pattern matching to Python. Would be super happy if they just copy the Rust enum type or similar. 4. Assuming Modular as a company is successful, it also gives them access to state of the art machine learning tooling. All of this with only incrementally changing their Python code. I think for most people I know that program in Python that is a way bigger sell than Rust. Rust isn't something they are considering learning. It is just something they hope someone else learns to make them nicer libraries. Anyway, all this really just to point out the target market of Mojo, which is quite large (cause the Python ecosystem is huge). I think it only lightly overlaps with the Rust market. Aside, I don't fully understand Mojo's memory model, but it has ownership, borrowing, and no GC. That said, if I understand correctly, it will have to fall back on reference counting more often than Rust.
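Two of those guard rails in a small sketch: `NonZeroU32` is a real standard library type, while `per_item`, `Reading` and `describe` are made-up names for illustration.

```rust
use std::num::NonZeroU32;

/// Division that cannot divide by zero: the type of `count` rules it out,
/// so the check happens once, where the NonZeroU32 is constructed.
fn per_item(total: u32, count: NonZeroU32) -> u32 {
    total / count.get()
}

/// Hypothetical sum type: a reading is either a value or missing with a reason.
enum Reading {
    Value(f64),
    Missing(&'static str),
}

/// The match must cover every variant, otherwise the code does not compile.
fn describe(r: &Reading) -> String {
    match r {
        Reading::Value(v) => format!("value {v}"),
        Reading::Missing(why) => format!("missing: {why}"),
    }
}
```

`NonZeroU32::new(0)` returns `None`, so a zero count can never reach `per_item`, and adding a third `Reading` variant later turns every incomplete `match` into a compile error.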
@@brendanhansknecht4650 "1. Instant performance gains without changing their code at all" Doesn't seem to be that true, at least not unqualifiedly true. There might be some cases where that happens, particularly where python's design leads to things being excruciatingly slow (e.g. loops) but all the examples they have of mojo going blazingly fast (TM) are using the new syntax. "2. A way to add strong static types. On top of that, adding types increases the performance even more." That seems to be built upon python's type annotations, which is understandable, but those are kind of a bad fit for python in general due to their strongly nominal nature in a language that's structural to an extreme. Getting those type annotations right is often non-trivial for this reason, and I don't see what mojo is doing to improve on that. They should've gone with something like C++ concepts or Rust traits instead, that is, syntactic and semantic constraints on types rather than explicitly named types, in most cases. "3. (...) They have never used a nice sum type." Related to the above, seems like a bad fit for such a structural-heavy language. "cause the python ecosystem is huge" I think it remains to be seen how much of an advantage that really is in the end. I suspect people will find that python's dynamic features will make moving to Mojo harder than might've been anticipated from the sales pitch.
If you compile targeting a native CPU, Rust will typically auto-generate SIMD code for you, which you can see on Compiler Explorer with quite simple code. It becomes more fiddly if you want something more platform-independent, or if you have dynamic input sizes, which always means you get a couple of items left over at the end of the array (the remainder of array_size / simd_block_size); then you need to write painful hand-cranked stuff. But if you know what platform you are running on and compile for it, you get most of the benefit without writing specialist code, just as Mojo does.
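A minimal sketch of the "blocks plus remainder" shape described above, assuming a block width of 8 `f32` lanes (the constant `LANES` and the function name `sum_blocks` are made up for illustration). Whether the compiler actually emits SIMD instructions for the inner loop depends on the target and optimization flags, so treat this as a shape that is friendly to auto-vectorization rather than a guarantee:

```rust
/// Assumed block width; a real choice would depend on the target CPU.
const LANES: usize = 8;

/// Sums a slice in fixed-size blocks, then handles the leftover items.
fn sum_blocks(data: &[f32]) -> f32 {
    // One partial sum per lane, so each lane can be updated independently.
    let mut acc = [0.0f32; LANES];

    let chunks = data.chunks_exact(LANES);
    // The items left over after array_size / block_size full blocks.
    let tail = chunks.remainder();

    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }

    // Fold the lanes together, then add the remainder one item at a time.
    let mut total: f32 = acc.iter().sum();
    for &x in tail {
        total += x;
    }
    total
}
```

Calling `sum_blocks(&[1.0; 19])` processes two full blocks of 8 and a tail of 3. Building with something like `-C target-cpu=native` is the "compile for the platform you run on" case from the comment.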
If adding 15% more learning to get 100x performance were an irresistible reason to migrate from Python, then everyone would be writing Nim or Julia
Yeah, I’m quite lost in this Mojo vs Rust discussion. Which use cases are we talking about, and which developers? Say we take the claim that Mojo has hardware-level performance seriously. Should BLAS and TensorFlow be reimplemented in Mojo? In that case, I don’t think the familiarity would be a strong selling point. If it’s on the Python side of things, then most of the runtime is spent inside libraries anyway, so what kind of performance gain are we talking about here: instead of 7154 seconds, it will take 7127 (if we are generous)?
If you are worried about speed the language is probably the last place you should be looking. Especially as a web developer. If you are that concerned you arent going to be swapping out languages as a fashion statement especially when 50 year old languages will do the job and have been doing the job for those with those concerns. I will never understand some developers. I sometimes think that they really would be more comfortable in a congregation than pretending to be an engineer.
@@sacredgeometry Yes Prime does webdev. Mojo is exciting for me though as a physicist because I have a lot of gripes with the current tooling at our disposal. Numpy, Numba etc are all excellent but I believe that Python is not the right tool for high performance scientific software. That has always been C/C++ and of course, Fortran. So when I'm asked to build complex models in Python from scratch (because that's what the community is accustomed to) it's a pain to make it as performant as those compiled languages. That's why I started looking towards Julia and intend to use it as my primary language for my own scientific development until Mojo becomes widely available/open-source. And when it does, we'll see if it is indeed better or not. But if it goes the MATLAB proprietary way, then Julia is our best bet.
If you are worried about raw performance/latency you ARE limited to high performance languages like C/C++/Rust/... If you are programming tight real time control loops or even a game engine you just can't afford running a garbage collector(java) or a slow interpreter(python). Python is awesome but if I can i will use the c backend of a library as it can be a 100x faster (protobuf is a good example)
@@robstamm60 Absolutely. Time and performance critical software exists but as I said: The people writing game engines aren't constantly hunting for new languages. Almost all of the embedded developers I know think the overhead/ abstraction of C++ is too much and that C is perfectly well suited to their jobs. They aren't looking to replace 30+ years of experience every few months to hop on the new hype train.
So the problem with tail-call optimization in this instance is that they added an extra semicolon, that's it. That's a feature that's in C as well: tail-call optimization only happens when you're returning the final expression. Also, the reason `Vec::new()` is faster is that the allocation gets optimized away.
I was curious and did some reading, because I wanted to know how Mojo can claim pass-by-reference as a default, and also better semantics. Maybe folks in the know could clear things up for me? I see that Mojo currently has implementations of neither explicit lifetimes, nor enforcement around taking immutable references when a mutable reference is still alive. I'm also unclear on how Mojo ending lifetimes at point of last call makes reference lifetime semantics easier to reason about. Does that just mean that every reference is an RC by default? Also, doesn't the eventual implementation of mutable borrow enforcement have the potential to introduce a lot of complexity into this system? And, if passing by `owned` sometimes references, and sometimes moves, doesn't that also force the programmer to understand how the Mojo compiler works to identify performance bottlenecks, and get an extra level of complication when mutable borrows are enforced? It must be that Mojo will eventually *actually* move the value if it's mutated and has other references alive, so that introduces a third branch of implicit behavior that you might have to track. At least, presumably, package authors will. It's logically impossible that Mojo can be faster than Rust (or even faster than Go above a certain level of borrowing complexity?), have less implicit copying (or much more use of reference counting), and have simpler ownership semantics, right? Unless they've found some new proofs around ownership that unlock fundamentally different approaches to resolving ownership. I just want to make sure I'm not missing something/too stupid to understand it before I form a strong opinion about how much this cart is being put before the horse. EDIT: I see now that Mojo doesn't support *returning* references, because that's obviously what causes the need for explicit lifetimes.
That removes the need for RCs, but surely reasoning about lifetimes becomes *more* difficult, and not less, when ownership doesn't necessarily last until the end of a scope?
I guess if the idea is that Mojo's explicit purpose is supporting ML, and it isn't worried about being natively integrated into larger application stacks, a lot of those things might not be issues. Seeing as you're unlikely to have to worry about mutable references and having large numbers of RCs when you're mostly doing matrix operations
Wait, isn't that reference behavior a standard python thing? I see no difference here: in python, unless you start explicitly modifying the variable, it's passed as a ref (for example, pop() and append() on the list, if you don't assign something to that reference).
@@retereum mojo isn't garbage collected, it's borrow checked, like rust is. So you have to know (whether through the user manually tracking, or through proofs built into the compiler) that the underlying memory you're pointing to is still safe to access
@@olazawho Unless they want to be in a similar situation as python where the code eventually gets rewritten as C++, they'll have to write a bunch of business logic around it. It better be at least as good as Rust for that. That business logic probably won't be the perf bottleneck but being able to reason about ownership and lifetimes is still important. I don't think reasoning about lifetimes is any more difficult than with scoping, but you do lose out on functionality (can't stick cleanup logic in an ad-hoc destructor, can't tie it to locks, etc). IMO it's not worth it.
One more important thing to realise: if you know Python and have learned Rust, you are closer to learning Mojo, because Mojo is also introducing features from Rust like ownership and borrowing. Adding such features will have a skill-issue impact on Python developers interested in learning Mojo, because at least you need to learn those concepts before using them.
The Vec with capacity allocates the vector whereas the new Vec is removed by the compiler because it's never used. I do not know Rust but from a general compiler viewpoint, this would be logical. Rust might even "zero" out the memory allocated to the Vec of capacity. Mojo seems to just wait to allocate until a value is pushed to the vector meaning it never allocates any memory for the Vector in the given example.
This article is pure marketing. These guys should have just taken the L and walked away. They make a lot of false arguments and they cherry picked that final benchmark, with something that was completely simulated. In the mojo example, the compiler actually just calls the main function and then returns because no work is being done in the program. The same happens in the rust variant when you use the vector macro instead of the with capacity call. Rust can use the same back end as mojo and it also can be optimized for SIMD. This idea that somehow a Python dev is going to have zero friction learning mojo and also get the better performance that rust is absurd. With the straight up false statements that they made in this article, I'm not going to believe anything that these people write in the future.
that level of skill issue when running a benchmark is indicative of one of two things, 1. incompetence when evaluating the performance of optimized code or 2. dishonesty Both make me incredibly skeptical they have the capacity to deliver on their claims.
"Future proof for 50 years" sounds like a dumb prediction, that includes a ramp up and ramp down of usage like we've seen with C, and by the time those 50 years are reached (if that even happens) a new and better language will have been developed. There is nothing lost by learning the language, especially if you already use python, but it's a hyperbolic statement.
I don't understand why anyone would learn Mojo when you have Nim. I mean it's the same idea- python-like syntax but compiled. It's even faster than Rust though in many benchmarks and the ecosystem is more mature.
Mojo is proprietary and not python. Codon has the same issue. The skill issue with effective SIMD programming is not a syntax issue. If a programmer has the intelligence to program with SIMD, GPU, lifetimes, manual memory management effectively, they can certainly overcome superficial syntax differences such as indentation vs curly braces. When thinking of memory management techniques for AI, borrow checking seems like a generally bad fit since it is often paired with the general heap allocator. Arena (bump) allocation probably makes more sense for performance. Languages like Zig/Odin/Jai have better deterministic memory management control. Rust is not flexible when it comes to manual memory management, although certainly just a few "unsafe" blocks away from hacking something together.
( ts-10:58 ) I don't know about the Rust comment. I don't think you need a great amount of skill to use Rust to make code fast; it kind of gives you a bunch of hints, and with basic benchmarks of your functions it takes like minutes to figure out when you're doing something problematic, never mind when the compiler shouts at you and tells you that you need to change this or that thing. I kind of think Rust lets people with low skill levels make things fast; it's that middle spectrum between C++ and JavaScript/Python. I'd agree Golang makes it easy to do a bunch, but I think that bunch is focused on skilled system admins and DevOps engineers, who arguably have enough overall understanding and a logically trained way of thinking that they should be considered highly skilled people anyway.
@@serena_m_ which in my humble opinion is worse. Sorry but making two different ways to write functions will be so confusing. They also have two types of objects, the standard class and structs. I think this is messy and will make things more difficult, for people coming from python.
About vec![0; 42], it actually memsets the first 42 elements. So it allocates and sets, so it might allocate on the first push. With capacity only allocates, so as long as you push less than capacity, you're guaranteed to not allocate.
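To make the difference between the three constructors in this thread concrete, here is a small sketch that only inspects `len()` and `capacity()`; the helper names `shapes` and `push_within_capacity_keeps_buffer` are made up for illustration. It says nothing about what the optimizer does with an unused vector in a release build, only what the API guarantees:

```rust
/// Returns (len, capacity) for Vec::new(), Vec::with_capacity(n) and vec![0; n].
fn shapes(n: usize) -> [(usize, usize); 3] {
    let empty: Vec<i32> = Vec::new(); // no heap allocation yet
    let reserved: Vec<i32> = Vec::with_capacity(n); // space for n items, len stays 0
    let zeroed: Vec<i32> = vec![0; n]; // n items, all set to zero
    [
        (empty.len(), empty.capacity()),
        (reserved.len(), reserved.capacity()),
        (zeroed.len(), zeroed.capacity()),
    ]
}

/// Pushes n items into a vector reserved for n items and reports
/// whether the underlying buffer stayed at the same address.
fn push_within_capacity_keeps_buffer(n: usize) -> bool {
    let mut v: Vec<i32> = Vec::with_capacity(n);
    let before = v.as_ptr();
    for i in 0..n {
        v.push(i as i32);
    }
    before == v.as_ptr()
}
```

With `n = 42`, `shapes` reports a length of 0 for the first two vectors and 42 for the third, and the reserved vector accepts 42 pushes without reallocating. A push onto `vec![0; 42]` may need to grow the buffer, since its length already equals what was requested.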
From the official Mojo manual: "Mojo uses a third approach called “ownership” that relies on a collection of rules that programmers must follow when passing values. The rules ensure there is only one “owner” for each chunk of memory at a time, and that the memory is deallocated accordingly. In this way, Mojo automatically allocates and deallocates heap memory for you, but it does so in a way that’s deterministic and safe from errors such as use-after-free, double-free and memory leaks. Plus, it does so with a very low performance overhead." So it's much closer to Rust than Java or J# or JS.
mojo is going to end up like julia where it’s mostly a meme and you only end up getting fast code if you spend a bunch of time fussing around trying to wrangle the runtime to do what you want no such thing as free lunch
also the focus on tail call optimization as a selling point is kinda meme-worthy in and of itself nobody who’s serious about performance is using recursion and relying on TCO to begin with, and if they are it’s because what they’re doing couldn’t be translated to a for loop without extra memory anyway
@@yevgeniygrechka6431 I agree. Right now Mojo is doing it right. A slower REPL for dev, and static compilation for running the code. No idea why Julia devs bet in pure dynamic language with JIT. Sure it gives you nice features, but makes it worse than Python for small task, that are most of the tasks you do.
🎯 Key Takeaways for quick navigation:

00:00 *🚀 Mojo's speed compared to Rust and Python*
- Mojo claims to be 50% faster than Rust, particularly relevant for AI experiments and fast, throwaway code.
- Discussion on the skepticism and response regarding Mojo's speed compared to Rust.
- Mojo aims to provide Python developers with performance benefits similar to Rust without a steep learning curve.

02:04 *🛠️ Technical aspects of Mojo and performance considerations*
- Mojo is built on modern compiler technology (MLIR) and aims to meet Python developers' needs while optimizing performance.
- Discussion on skill levels, performance trade-offs, and the professional reality of time constraints.
- Mojo's goal is to optimize code performance without requiring developers to extensively learn new paradigms or languages.

03:31 *🚀 Potential impact of Mojo's performance in AI development*
- Potential adoption of Mojo in AI development if it delivers on promised speed improvements over Rust.
- Discussion on the value proposition for data scientists, ML engineers, and researchers in choosing Mojo over Rust.
- Considerations for Mojo's adoption in AI infrastructure and its competition with Rust in the AI space.

06:30 *🤔 Comparison between Mojo and Rust in code ergonomics and performance optimization*
- Comparison of code ergonomics between Mojo and Rust, focusing on simplicity and efficiency for developers.
- Explanation of technical concepts such as automatic reference counting (ARC), mutex, and performance optimization strategies in Rust and Mojo.
- Consideration of overhead and performance trade-offs in idiomatic code writing in Mojo and Rust.

11:34 *🔍 Evaluating performance benchmarks and considerations in Mojo and Rust*
- Discussion on the challenges of comparing performance benchmarks between Mojo and other languages.
- Explanation of memory management concepts, LLVM optimization, and code efficiency in Mojo compared to Rust.
- Considerations for optimizing code performance and reducing overhead in both Mojo and Rust for real-world applications.

20:00 *💻 Rust's improved default behavior and borrow checker:*
- Rust's default behavior after a move leads to more efficient code and reduces borrow checker conflicts.
- Dynamic programming background engineers can work without roadblocks and expect desired behavior with optimal performance.

21:10 *🤔 Understanding pinning in Rust:*
- Pinning in Rust determines whether an object can be moved in memory.
- The concept of pinning, while crucial, can be confusing for many Rust developers.

22:07 *📚 Mojo's explanation of pinning and self-referential structs:*
- Mojo provides a clear explanation of pinning and its implications for self-referential structs.
- Pinning is essential for ensuring data validity and memory location stability in async Rust.

23:01 *🚀 Mojo's advantages over Rust and LLVM:*
- Mojo leverages MLIR, a modern compiler stack, for improved performance and support for GPUs.
- Chris Lattner's background with LLVM and MLIR contributes to Mojo's innovative approach.

24:37 *💡 Mojo's focus on performance and ease of use:*
- Mojo aims to provide fast iteration times and enjoyable programming experiences.
- The language's design appeals to AI and compiler enthusiasts, offering both power and simplicity.

25:36 *💻 SIMD optimizations in Mojo:*
- Mojo's native support for SIMD optimizations improves performance for operations on large datasets.
- Examples demonstrate how Mojo simplifies SIMD operations compared to traditional coding approaches.

27:29 *🔧 Eager destruction and memory management in Mojo:*
- Mojo's approach to memory management, including eager destruction, aligns with efficient resource utilization in AI applications.
- The language's design reduces complexities related to object lifetimes and destruction.

30:02 *🔄 Tail call optimization and overhead reduction in Mojo:*
- Mojo eliminates overhead associated with drop flags and optimizes memory management to improve performance.
- Tail call optimization and elimination are handled differently in Mojo compared to Rust, leading to potential performance gains.

32:07 *🧪 Testing and comparison between Rust and Mojo:*
- Practical testing and comparisons between Rust and Mojo reveal insights into performance characteristics and compiler behavior.
- Understanding memory allocation strategies and compiler optimizations is crucial for evaluating language performance.

43:49 *🛠️ Rust's Destructor Optimization in Mojo Explained*
- Rust's destructors are called when a value goes out of scope, impacting tail call optimization.
- Mojo's early destruction allows optimization with tail call even for heap-allocated objects.
- Curiosity about Rust's stack allocation guarantees and potential optimizations.

45:04 *🧠 Mojo's Performance and Language Ergonomics*
- Mojo delivers exceptional performance, surpassing expectations with significant speed improvements.
- Highlight of Rust's high-level ergonomics despite being a systems-level language.
- Discussion on the ease of use and adoption of Mojo for developers.

46:38 *🔄 Challenges and Solutions in AI Programming*
- Complications faced in AI programming with Rust's SIMD, including slow compilation and resistance from Python-centric AI researchers.
- Mention of attempts to address these challenges with Swift for TensorFlow at Google.
- Insights into the complexities of integrating new languages into AI development workflows.

48:19 *🚀 Mojo's Potential and Future Development*
- Recognition of Mojo's optimal performance for system engineers but ongoing development for dynamic features expected by Python programmers.
- Comparison with Rust as an immediate production choice versus Mojo's potential for future AI advancements.
- Speculation on Mojo's evolution, including the development of AI-specific libraries and a robust standard library akin to Go.

Made with HARPA AI
So, as far as I understand it, Rust doesn't implement TCO when you are allocating because it isn't really "safe" and can lead to unintended behavior. The article "The Story of Tail Call Optimizations in Rust" has a little bit on this, though it's quite old. The reason Vec::new doesn't do this is because Vec::new only lazily allocates, therefore TCO can be applied, but Vec::with_capacity DOES do heap allocation and according to the rust devs if they did TCO there, this might lead to undefined behavior. Though the speedup when using vec! is weird, since vec! is just using Vec::with_capacity and fill under the hood... Maybe some optimizations?
If we start to account for skill issues, then Java can be as fast as Rust/C++ or even faster (after warmup), because with enough skill you can write garbage-free code and do manual memory allocations/deallocations. And the part that can make it faster is JIT optimizations, which can be tailored to the current specific use case, like loop unrolling or operation reordering, which C++ or Rust simply cannot do, because they don't know how the code they produce will be used every time you run a program.
To prove the TCO example, you could write a for loop that allocates the vector the same number of times. I mean if the idea of TCO and TCE is to make recursive algorithms work as iterative then this should be a fair example of the advantages of having that optimization. My understanding is that since stack variables are eagerly destructed, every time you stop using a variable, the stack pointer decrements so when you get to the end of the function, your next stack starts where the old one was. This improves locality and you can work exclusively in cache, making the mojo version significantly faster; you're playing with registers at that point.
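A rough sketch of that comparison in Rust, assuming the benchmark amounts to "allocate a vector, drop it, recurse". The function names, the `allocated` accumulator and the 336-byte size (borrowed from the decompilation comment elsewhere in the thread) are illustrative and not taken from the article. `std::hint::black_box` is used on a best-effort basis to discourage the compiler from deleting the otherwise unused allocation; whether the recursive version becomes a loop is still up to the optimizer:

```rust
use std::hint::black_box;

/// Recursive form: the vector is dropped explicitly before the final call,
/// so nothing is left to run after the call returns.
fn recursive(n: usize, size: usize, allocated: usize) -> usize {
    if n == 0 {
        return allocated;
    }
    let v: Vec<u8> = Vec::with_capacity(size);
    black_box(&v); // hint: treat the vector as used
    drop(v);
    recursive(n - 1, size, allocated + 1)
}

/// Iterative form doing the same number of allocations.
fn iterative(n: usize, size: usize) -> usize {
    let mut allocated = 0;
    for _ in 0..n {
        let v: Vec<u8> = Vec::with_capacity(size);
        black_box(&v);
        allocated += 1;
        // v is dropped here, at the end of each iteration
    }
    allocated
}
```

Timing `recursive(n, 336, 0)` against `iterative(n, 336)` in a release build would show whether the two end up doing comparable work; if they do, the remaining cost is the allocator calls rather than the recursion.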
Looking back at that blogpost, there is a very incorrect usage of commas throughout, and it dips its toes into Oxford commas seemingly at random. Doesn’t Modular have anyone to proofread?
I work with AI, I use both Python and Rust. I don't know Mojo (yet). This debate irritated me quite a bit - good debate, but I mildly disagree 🙂 We don't use python for its speed! Python is good language to "configure" frameworks like Keras, Torch, Tensorflow or Scikit - that are implemented in c++. Rust is a great replacement for that c++, not for python. Will Mojo be that c++ replacement? I have doubts. Can you trust a language rooted in Python-like prototyping to write hardcore numerical libraries? I would need some more convincing. When somebody says that it is 50% faster than Rust, that does not elicit trust - it just creates hype. On the other hand, to replace Python, Mojo would need to have a library support comparable to python - why would you use it otherwise? Again - we don't use python for its speed... Funny enough - Rust's speed or safety may as well not be the main reason to use it. I have started to rewrite some of my python code to Rust not to gain speed, but mainly for the excellent type-system and secondary for its ability to compile to wasm.
agree on the type system, and add traits and pattern matching for me (I know, Python 3.10+ has it too, but it feels like it was an afterthought). I like Rust's approach to writing software more simply because of these language design choices (plus testing and examples). In addition, I get amazing speed and memory safety, which I welcome.
If safety was the only concern then they wouldn’t be trying to replace C for decades. It’s a little bit more complicated than that. The best joke about that is that the essence of computing is about separation of church and state.
If Mojo can have Pydantic data structs with validation, HTTP libs for serving and posting, database connectors and a kafka connector or something in addition to the AI stuff on the standard library, it could potentially be THE lang for AI powered web
I don't know why you can't use Go and Rust together as appropriate, java/Rust, HTML/etc. Mojo ... (mainframe does do assemble), Mojo < internet>, heavy data management: Rust, Java for human interactive code to the other systems.
18:09 Literally the first thing you learn in Rust is how to pass variables mutably or immutably through references, in both languages, idiomatically, you should be writing code that aligns with said language's definition of borrowing and owning. I don't think this is a good example, plus, in Mojo, you don't actually know if functions pass arguments immutably or mutably unless you look at the function signature. If anything, it shows that Rust is less ambiguous.
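For readers who have not written Rust, here is a small sketch of the point about ambiguity: in Rust the borrow kind shows up both in the signature and at the call site. The function names are made up for illustration:

```rust
/// Immutable borrow: can read the values, cannot change them.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Mutable borrow: may modify the caller's vector in place.
fn append_total(values: &mut Vec<i32>) {
    let t: i32 = values.iter().sum();
    values.push(t);
}

/// Takes ownership: the caller can no longer use the vector afterwards.
fn consume(values: Vec<i32>) -> usize {
    values.len()
}
```

A caller writes `total(&v)`, `append_total(&mut v)` and `consume(v)`, so the three cases can be told apart without opening the function definitions.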
", in Mojo, you don't actually know if functions pass arguments immutably or mutably " I'm guessing immutable arguments are annotated with "in" as opposed to "inout" or "owned". They seem to have taken that from Herb Sutter's presentations on parameter passing in C++.
Two words: Time Dilation. There's always a sense of being relative. Use what works. The time variants between Rust and Mojo are going to be too close and you'll not really lose unless maybe if you are targeting a process that has an advantage. Mojo will probably target ML solutions and solve them with simple solutions. Whatever you use the other can be it's good looking sibling.
GC requires indirect access. Direct allocation/deallocation can cause fragmentation. Rust tends to have larger contiguous structs than copy-on-write memory management. Explicit memory management can run with far smaller memory usage.
CNC machines are programmed with C here in Germany. Depending on the CPU and how Mojo works, there will be a bright future for it. But I don't know how this will work on a CNC machine, because they need to execute one instruction after the other and not in parallel; that's how a CNC works
Frankly, the whole who-is-faster debate: don't give a flying fuck. As someone who was probably going to be programming in python or MatLab for her entire career I just see Mojo 🔥 as an absolute win. And that's really what their pitch should be: "hey python devs, ready for a language that is written the same way as the one you already use and is 8x faster with exactly the same code and could do more once you learn the arcane runes?"
Something must have gone wrong. All vec![v; n] does is call vec::from_elem(v, n), which calls Vec::with_capacity(n) and on the returned Vec, Vec::extend_with(n, v). How can just calling Vec::with_capacity(n) be worse than that?
JUNIOR MOJO DEVELOPER REQUIRED. Must have 15 years MOJO development experience. Apply within.
I had this same experience someone in LinkedIn said he has 20 years of exp in ReactJS.
@@mac.ignacio JS makes sense, but not React LOL.
🤣@@mac.ignacio
has to pay minimum wage for EXPOSURE.
Dude, i've got a 40yo experience of Basic. That counts !
The article: *quotes Prime's own points back at him*
Prime: "I totally agree"
Chadagen
Greenhairgen
You have to love yourself before you can love others
nailed it
Needs to add "I stand by my words" to his vocabulary. But for now he agrees with himself.
Having done some research about the speed claim, decompiling both of the rust and mojo binaries, I saw that mojo optimized out everything, making the resulting binary call main and return without doing any work; this was proven by making a program with only a main function, and running it through the same benchmark test, which led to the exact same results. Rust does the same optimization in any version equal to or newer than 1.75.0, and its execution is so much faster that it cannot be properly measured. In older rust versions, the compiler would create the entire recursion, and the content inside the loop looks something like: allocate 336 bytes, deallocate 336 bytes, check branch condition, call recursion function; this is slower than mojo because it is actually doing work by having to call the allocator multiple times, and having to manage a branch; also note that the compiler did do TCO, shown by having the deallocation happen before the branch condition check.
This is a case where the benchmark was cherry-picked to something rust did not optimize at compile time, while also lying about rust not doing TCO, as well as using non-idiomatic rust (as shown in the video, `vec![0; size]` is preferred over `Vec::with_capacity(size)`, and results in a faster execution time than mojo).
If they cannot properly explain why their language has the faster execution in the benchmark, having to assume things about how rust works to make something believable, then I do not see how any claims they make will be trusted in the foreseeable future. I am willing to change my mind in the matter if the article writer is able to explain themselves about this discrepancy, but for now, here's the truth about those benchmarks.
type shit
thanks for the explanation, I was wondering if some compile-time shenanigans were happening
And all of that to justify an L
@@deiminator2 No engineer worth their time in dolla bills should reach for recursion where they don't need the full stack trace, because even in Mojo you cannot rely on the optimizer in all cases, that's an NP complete problem, not happening. (a side note, a surprising number of optimizations are NP complete problems, we've always just optimized for the most common case)
Mojo is targeting the AI Python dev who does not think about these things, so they optimized for their target use case, it's really just that simple. Rust targets syst... well everything really. Unfortunately, neither Rust nor Python nor Mojo is going to edge C++ within the -real- technical AI space. (it's a joke, calm down, I'm a theoretician too).
Also, another side note, but MLIR (IR optimized for tensor/AI use case) can be used by Rust as well, it'd not be harder than the GCC back end. I'd be concerned about the creator getting his ego all up in him and gating MLIR optimizations behind Mojo, but I *really* don't think they will. This article is probably just marketing towards those same AI Python devs who don't understand LLVM vs MLIR or SIMD in the first place.
Expected
9:20 C# is in an even weirder spot, its between 1, 2 and 3 tier.
It can compile to native code, it can run without a garbage collector and it can run in IL mode. its insane
Yeah C# is hard to classify b/c there's a project for seemingly everything. .NET can't do X...or can it?
a c# fanboy detected 😁
@@Mr.BinarySniper ? haven't written c# in 3 years, just following its progress.
No need for assumptions. Its still shit since Microsoft cant get their asses together to get a fucking UI system that will not get deprecated within a week by Microsoft itself.
Iirc it has two entries in the first tier - one which you described + Unity C#-to-C++ so it truly is a magical language 😂
@@plaintext7288 i'm talking about pure C#
not even unity or anything like that. since .NET 8 has AOT support which unity effectively does with their IL2CPP
Green hair = rust expert
Green is what color your white sink will turn if there is Rust in your water ! 🤔 🤔 I think you on to something
Actually patina expert
Cyan hair 😫
Pretty much. It's just called Rust dev hair.
@@mattmmilli8287That’s verdigris, not rust.
26:13 you typically see a chip wide downclock when running AVX instructions on a lot of chips. You also have an overhead on the loading the input/fetching the results in many cases.
In my experiments I typically see a 40X improvement, not a 64X improvement but it's constantly creeping towards that 64 number by each new architecture.
IIRC, the biggest problems with AVX512 appeared when you mixed them with scalar (or less vectorized) instructions but it could've gotten better since then.
That’s not true anymore. It only happened on a range of Intel CPUs, not on newer ones. As Turalcar says, the problem was the mix of AVX and scalar code, because AVX-512 running at those clocks was still much faster than scalar at normal clocks. It also only affected AVX-512, but AVX-512 has been expanded massively since then, thanks to Intel again 😂
I think it was Cloudflare that posted in their blog, btw. They were doing AVX-512 encryption at the proxies, which was slowing down all the other things the proxy does, basically all routing and HTTP.
9:25 What is kind of missing in this pyramid are insane languages like haskell, that explicitly don't surface things like execution order and other details to the user and therefore are able to do aggressive optimizations.
Haskell can often get within ~5% of C performance, which is kind of insane for a high-level language.
With only the downside that you're writing Haskell 😢
sometimes beats C in a few benchmarks...
@@funprog if something beats C in a benchmark, the C code is poorly written
Looking into some of the internals for Haskell is kind of crazy. The more I learn about it the less I'm surprised that it beats C in some cases. There are definitely workloads not well suited for Haskell, but its semantics allow for some very aggressive optimization that C cannot allow.
Highly optimized Haskell looks funny, but yeah the amount of good work that went into optimization side of things in GHC is crazy.
I'm totally learning Mojo to be able to call myself a Mojito
lol, but I like Mojician much better.
As an AI researcher using Python all the time... my compute-heavy workloads are already running on C++ under the hood. dataset.map isn't the same as a for loop... and if I use a for loop or two, it takes a few seconds once or twice. Sure thing, that is good enough.
The developers who integrate a C library below Python via cffi are the real heroes. That stuff is tough, and I am also working on something like that.
True, I figure maybe they want to win performance over bindings overhead? Because most AI code in Python is just calling C / CUDA libs anyway
This is true. The reason I never found Python to be slow for most tasks is because all the libraries which do common heavy lifting tasks are written in some lower level language by folks who are experts in the domain, so the code ends up running faster than I would be able to do it myself in a lower level language anyway.
But you may find you need a custom task that runs quickly. Then you may want to pick a simple, fast language like Mojo or Go.
That's what mojo is all about. Making the world under the hood pythonic by building around the new compiler.
@@AggressivesnowmaN For today, Cython 3.1 is very fast for extension libraries, and Nim also has great Python interoperability and speed very close to C. Mojo will take a while.
40:22 Vec::new simply doesn't allocate. Until you push to it, it just holds a dangling placeholder pointer and a capacity of 0. That's really helpful, because it can be used in Default impls without doing any allocation.
I found that helpful when using `std::mem::take` to get around a borrow checker issue without a needless allocation. The current borrow checker might not need a workaround for whatever it was that I was doing.
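For anyone who wants to check the no-allocation claim, here is a minimal sketch. `Buffer` and `drain_all` are made-up names for illustration; only `Vec::new`/`Default`, `capacity` and `std::mem::take` come from the standard library.

```rust
use std::mem;

/// Hypothetical buffer type, used only for illustration.
#[derive(Default)]
struct Buffer {
    items: Vec<u32>,
}

impl Buffer {
    /// Hands the collected items to the caller and leaves an empty,
    /// unallocated Vec behind. No clone and no new allocation is needed.
    fn drain_all(&mut self) -> Vec<u32> {
        mem::take(&mut self.items)
    }
}

fn main() {
    // Default (like Vec::new) only sets up an empty handle; capacity stays 0.
    let mut buf = Buffer::default();
    assert_eq!(buf.items.capacity(), 0);

    buf.items.extend([1, 2, 3]);
    let taken = buf.drain_all();
    assert_eq!(taken, vec![1, 2, 3]);

    // The field was replaced by a fresh empty Vec, again without allocating.
    assert_eq!(buf.items.capacity(), 0);
}
```

`mem::take` works here precisely because the replacement value, `Vec::default()`, is free to create.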
45:51 Doing SIMD manually in rust really isn't fun, but often you really don't need to do it anyway. If you structure your code correctly (i.e. have it have the right alignments and sizes and stuff), LLVM is *really* good at optimizing code for your platform.
So instead of doing SIMD manually you structure your code *as if* you want to use SIMD and let the compiler do the rest.
And writing SIMD friendly code in Rust is super easy, e.g. chunks/chunks_exact.
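A rough sketch of what "structure your code as if you want SIMD" can look like with `chunks_exact`. The function name and the block width of 8 are arbitrary choices for illustration, and whether the inner loop actually gets vectorized depends on the target and compiler flags, so check the generated assembly before relying on it.

```rust
/// Sums a slice in fixed-width blocks so the optimizer has an easy
/// shape to vectorize. LANES = 8 is an arbitrary illustrative choice.
fn sum_chunked(data: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];

    let chunks = data.chunks_exact(LANES);
    // Elements that do not fill a whole block are handled separately.
    let tail = chunks.remainder();

    for chunk in chunks {
        // Every chunk has exactly LANES elements, so this is a
        // fixed-length, branch-free inner loop.
        for (a, x) in acc.iter_mut().zip(chunk) {
            *a += *x;
        }
    }

    acc.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

fn main() {
    let data: Vec<f32> = (1..=20).map(|i| i as f32).collect();
    println!("sum = {}", sum_chunked(&data));
}
```

Note that splitting the sum across lanes changes the order of the floating-point additions, so results can differ slightly from a plain left-to-right sum on inputs that are not exactly representable.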
31:17 The reason why rust uses drop flags is actually really interesting:
If a value gets dropped conditionally, (for example when one branch has a drop(value) and another does not) rust still has to keep track of whether the value was dropped or not at the end of the scope.
The way to remove this overhead would be to stop tracking whether the value was dropped and instead always drop it at the earliest point where it is no longer used. In this example that means inserting a drop in the other branch, the one that does *not* call drop(value). (A partially deinitialized value can't be used anyway, so that would be totally doable.)
The big downside is that this would make the deinitialization of values slightly non-deterministic, which is also the reason this method was decided against.
(Though sometimes I think it would have been nice if rust had just decided to just do the deinitialization always as soon as possible, but it might also have some other considerations like code size.)
you would also have to special case guard patterns (e.g. MutexGuard)
@@nii-san5485 Not necessarily. As long as you don't drop/move the guard (for example by giving away ownership to another function), there is not really any reason for rust to drop the guard early.
But *if* you really want to always just insert drops into the program as soon as a value is no longer used (I would call that non-lexical drops), you could just have an explicit drop at the end of the guard to make sure that you explicitly use it and it doesn't get dropped early.
If you want to know more about this whole topic, there is a blog article on faultlore called "Destroy All Values: Designing Deinitialization in Programming Languages" going in depth on this topic.
@remrevo3944 I never considered how conditional drops worked under the hood thx for linking that article. But I'm saying with your suggestion of dropping after the last use, a guard would instantly be dropped always unless an explicit drop was added or they're special cased to drop at end of lexical scope.
Which you did consider now that I'm reading back. I still like "lexical" drop as it's a bit more intuitive , e.g. the value just "goes out of scope" which most programmers will already have a feel for 😁
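To make the conditional-drop case from this thread concrete, here is a small sketch. `Noisy` and `run` are invented for illustration; the point is only that on one path the value is gone before the end of the scope and on the other it is not, so the compiler has to remember which happened.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Returns the order of events for a value that is dropped early
/// only when `early` is true.
fn run(early: bool) -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let value = Noisy { name: "value", log: Rc::clone(&log) };
        if early {
            drop(value); // moved out on this path only
        }
        log.borrow_mut().push("end of scope".to_string());
        // At the closing brace the compiler must know whether `value`
        // is still live. Conceptually that is a hidden boolean: the drop flag.
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", run(true));
    println!("{:?}", run(false));
}
```

With lexical drops the second case runs the destructor at the closing brace; under the "drop at last use" idea discussed above it would run right after the `if` instead.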
Prime is called "Rust Experts" now.
Totally disagree. Can't even read a simple missing generic argument error message Lol.
@@stevenhe3462 It's more about how "press" (or a dude/dudette with a blog) makes things up. One Primeagen is now multiple "Rust experts". This will get repeated more and more, and next thing you know people believe all kinds of stuff said by "Rust experts", very few of them actually hauling their arse to verify what was said and if it's actually true and for which values of "win" it is true.
At some iteration, someone will seriously ascribe this quote to Klabnik.
@@stevenhe3462 Now he switches to Go lol xD I call him a language hopper (similar to distro hopper)
TCO does not "unroll it into a loop", it "reuses the stack frame". When you call a non-inlined function you allocate a stack frame (by moving the stack pointer); the frame holds space for the arguments, the return value and possibly a little other bookkeeping. TCO reuses everything in the stack frame, including the return address, and as a side effect the stack never has to grow. It can only be used under very exacting conditions (everything the function does must be expressible through the return value and/or the argument variables, and the stack frame must be identical in size). Its real benefit is not so much performance (though it does give some); it's that it prevents a stack overflow, which happens when your stack pointer runs past the stack space preallocated for your application.
In this case, the vec initialization creates memory on the heap. If the next call were to rewrite the contents of that variable, the underlying vector on the heap would be leaked. Mojo is greedy in that since _stuff is never used and deleted on its "last use" there is nothing that is not stack allocated, whereas in rust it cannot. Rust can use TCO, just not with heap allocated variables.
Granted, I would try an explicit drop in there after the vec init and see what happens.
The observation here would be that TCO was effective on Mojo; TCO was not effective on Rust, but it's smart about stack sizes and had enough to not hit a stack overflow; and JavaScript did not have TCO at all.
Somewhat off topic but a big thing here is that python also doesn't have TCO (at least not out of the box), so dynamic programming is better to do in mojo than python for essentially only some keywords added.
I’m keen to see what a really capable Mojo dev vs a really capable Rust dev can build in a fixed time window and the performance of the 2 solutions. Hell, throw a C++ and Zig dev in there too.
Effort is the biggest constraint in my work.
Rust definitely has TCO in that example; there's no way it can create something like 1 billion stack frames without hitting the stack limit if it didn't turn the recursion into a loop.
Contrived use case of the recursive example. In production Rust you'd use a for loop and allocate the Vec once outside and not destroy it for each iteration.
That is fair but the point is that someone in the AI space newly learning Rust will need some level of understanding these things, not to mention there possibly being other cases where the lack of tail call optimization can lead to performance issues. It's true that you still need knowhow to create performant Mojo but someone new to Mojo is less likely to fall into these obscure and minor pitfalls sprinkled throughout the journey of learning something new.
That TCO example they did is so bad. Rust does have the issue of not guaranteeing TCO, but in that example it just happens anyway, because stuff is never used or black_boxed. To demonstrate the problem you need something like a function that counts sheep down starting at n, prints a message for each sheep counted, and returns the total number of sheep counted. That call is not a tail call, because the addition is performed after the recursive call returns and uses its result, so the compiler does not automatically optimize it.
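The sheep-counting function described above could look roughly like this. All three function names are made up, and Rust makes no promise about tail calls either way: the optimizer may or may not turn any of the recursive forms into a loop, so treat this as a sketch of the shapes rather than a statement of what the compiler will do.

```rust
/// Not a tail call: the `1 +` runs after the recursive call returns,
/// so each level needs to keep its own frame around.
fn count_sheep(n: u64) -> u64 {
    if n == 0 {
        return 0;
    }
    println!("sheep {n}");
    1 + count_sheep(n - 1)
}

/// Tail-recursive form: the running total travels in an accumulator and
/// the recursive call is the last thing the function does.
fn count_sheep_acc(n: u64, total: u64) -> u64 {
    if n == 0 {
        return total;
    }
    println!("sheep {n}");
    count_sheep_acc(n - 1, total + 1)
}

/// The loop a tail-call optimizer would effectively produce.
fn count_sheep_loop(mut n: u64) -> u64 {
    let mut total = 0;
    while n > 0 {
        println!("sheep {n}");
        n -= 1;
        total += 1;
    }
    total
}

fn main() {
    assert_eq!(count_sheep(3), 3);
    assert_eq!(count_sheep_acc(3, 0), 3);
    assert_eq!(count_sheep_loop(3), 3);
}
```

If stack depth matters in real code, the loop version is the only one of the three that is safe regardless of optimization level.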
There is an additional difference between semaphore and mutex. Semaphore can be controlled by anyone who got access to it, while mutex can only be released by the thread that locked it. Mutex is more of a monitor rather than semaphore.
For those that don’t know C# can compile to native code just like Go. In terms of speed C#, Java and Go are very similar for many tasks. For pure synthetic benchmarks it’s C# > Go > Java. You have greater memory control in C# than the competitors.
C# made colored functions
Umm no. Is not that optimized compared to Go. You can not use a lot of libraries due to libraries not being AOT compatible. Like bro, I love C# but I’m not afraid of using another language.
@@peanutcelery neither am I but C# can give u full memory control while Go just can’t. As time passes many libs will support AOT. Minimal APIs already do it in .NET 8
C# is a failure as a language; it barely stays afloat, and only because of business adoption and Microsoft's sales department. You are kinda right about Go > Java if you ignore most of the things that make Java interesting. And comparing Go and Java like that is kind of a pre-junior level of understanding. Did you know that you can actually tweak both Go and Java for your exact application? The Discord team actually wrote a guide about optimizing Go garbage collection, and you also have cgo to go beyond that. The idea of C# as something more than just office work and games for Windows is something only C# developers can have.
@@d1namis Your take about C# is very misleading. I don't know which field you work in, but C# is everywhere except ML. Yes, that's right, everywhere. There are many places in the world where there is more demand for C# devs than Java devs. That should tell you something. Go, while nimble, "easy" and easy to grok, has major problems, especially around FFI and FIPS compliance.
It would be a fool's mistake to write any high-performance application that needs interop with libs already written in C or C++ in Go. C# was made with this in mind! JNI is a terrible mess, which is why JNA exists. C#'s threading model is just like Rust's, which aligns with C.
Have you ever written software that needs to be audited by the government? Well, guess which lang is easiest to pass? All because MS has first-party support for SO many things that it makes the std in Go laughable. You rarely need third-party libs. I'm telling you this as someone who has written software which runs on Air Force One.
Web dev is not all dev. Anyone who thinks C# is "dying" has been living under a rock for the past decade. I'm also aware of C#'s shortcomings.
I don't like that "owned" thing in Mojo, because the caller of the function may not be aware that a copy of the string is being made! That's why I like that in Rust you have to explicitly write foo.clone() at the call site, making it immediately obvious to whoever reads that code for the first time what is going on.
They said that it transfers ownership but reverts to clone under the hood if you try to modify it, unless you use the caret in the function call, then it strictly transfers ownership (at least that's what I understood).
So it seems like a quality-of-life thing, and it's up for debate whether implicit or explicit is better here.
@@XxZeldaxXXxLinkxX this
It's always explicit when a copy isn't being made because you have to use the transfer (^) operator to move ownership. It is less clear when a copy is made vs when it's an immutable borrow, you'd have to check the original function definition
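For comparison, this is what the Rust side of that argument looks like. `shout` is a hypothetical function that takes ownership of its argument, roughly the role an `owned` parameter plays in the Mojo discussion above; the copy only exists because the caller wrote `.clone()`.

```rust
/// Takes ownership of its argument and is free to mutate it.
fn shout(mut s: String) -> String {
    s.push('!');
    s
}

fn main() {
    let foo = String::from("hello");

    // The copy is spelled out where the call happens.
    let loud = shout(foo.clone());
    assert_eq!(loud, "hello!");

    // The original is untouched and still usable.
    assert_eq!(foo, "hello");
}
```

Passing `foo` without `.clone()` would hand the string over instead of copying it, and the caller could no longer use it afterwards.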
One thing that doesn't seem to get mentioned at all - mojo is proprietary and not open source? That alone means it's a non-starter for so many projects and use cases that I highly doubt it will reach any kind of critical mass as a language to replace and / or supplement Python / C++ / Rust in the ML space (or any other space for that matter).
Completely agree
Oh, yeah that's a deal breaker imo. But considering that most (afaik) Ai statisticians don't care too much about what lies under the hood (their code is abysmal at times), they might not really care about whether it's open source, but that also means they might not care about using MOJO either
I think they promised to open source it later.
I'm not holding my breath.
Also, if it compiles to an executable, that's a bad thing for AI. In Python I can take the llama code and hack around in it.
When I download LLM models I always review the implementation (most of them are (a) copies of llama and (b) small). If models were delivered as an exe it would be terrible: I run Linux, many AI researchers run Linux, but many LLM users run Windows, so who knows which executable should be uploaded to HF, and an .exe can't be reviewed as easily as Python code.
@@XxZeldaxXXxLinkxX They won't care, but the people developing the libraries they're relying on will care.
@@XxZeldaxXXxLinkxX they might not care but the one funding them will care
The fact that they won't have SWEs by their side to help them, since those SWEs do care, also matters a lot.
27:00
I believe Odin does that with its built-in vector and matrix types: you get the power of SIMD, but it's implicitly built in. Zig is getting them as well, but Odin is technically 1.0 already.
I like Mojician. Also, not surprising that Chris Lattner is iterating on the intermediate language design from LLVM seeing as he was a core (founding?) dev on it for a long time.
Didn't Chris invent/create the Swift programming language?
@@vectoralphaSec Yep. It's hard to imagine anyone on the planet who is more qualified for doing this than him.
I'm sorry but Mojician is hilariously awesome
Hell yeah, that's what I'm saying. Mojicians just sounds cool, at least to me. I like it better than Rustaceans and Pythonistas.
At 17:25 did the article confuse move semantics and copy semantics? Rust moves by default to avoid copying the string. In the case of a copy, the original foo would still be available for dbg!(foo) and there wouldn't be a compiler error. Primeagen should point this out.
28:57 What he might have omitted there is that RAII does not necessarily mean heap allocation. Basically, if you are not using new, the memory for the object will be allocated on the stack. And allocating on the stack is one instruction, freeing is one instruction, no matter how many objects are allocated in the function. So this is far better than GC (if you forget that C# allows you to put structs on the stack). On the other hand, yeah, malloc/free can feel slower than a GC in many cases.
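A short sketch of the distinction, since the article's own snippet isn't reproduced here. `double` and `byte_len` are hypothetical helpers: one takes a `Copy` type, the other takes a `String`, which is moved.

```rust
/// i32 implements Copy, so the caller keeps its value.
fn double(n: i32) -> i32 {
    n * 2
}

/// String is not Copy, so passing it by value moves it into the function.
fn byte_len(s: String) -> usize {
    s.len()
}

fn main() {
    let n = 21;
    assert_eq!(double(n), 42);
    dbg!(n); // fine: `n` was copied into the call

    let foo = String::from("hello");
    assert_eq!(byte_len(foo), 5);
    // dbg!(foo); // compile error E0382: `foo` was moved by the call above
}
```

The compiler error on the last line is the evidence that no copy happened: if the string had been copied, `foo` would still be valid.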
47:30 I think "Would you switch to typescript if introducing this new syntax would allow it to run 100x faster than javascript?" would be even closer analogy. And even I would stop writing vanilla JavaScript if TypeScript were actually faster.
I just checked the article and the benchmarks got actually updated
What was he saying at 20:35 - 20:43? Sounded like a lot of annoying beeps. Is he a robot? Love the content!
10:35 actually got me there prime, damn, ''ruby is not skill issue, it's just slow''
Phew xD, really messed up with my head, got me in the first half, had to recheck the video for the graph xD
He's saying tail call optimization isn't possible because of the scope-level deferred memory management. Meaning for that specific use tail call optimization isn't possible for Rust. Whereas for Mojo, given they are not deferring scope cleanup to the end of each call (end of scope), it easily facilitates tail call optimization. In other words, Mojo will provide tail call optimization for generalized use whereas Rust only provides it in subsets where scope does not require memory allocation cleanup until scope destruction.
This is also why the "drop" didn't fix the issue because it's not changing the deferred scope destruction. In Rust, "drop" flags the allocation for scope destruction. Effectively changing nothing.
1. Rust is definitely doing TCO in the example he shows, since there's no way the program creates 1 billion stack frames without hitting the limit.
2. I don't see any reason that allocating heap memory in recursive function will make TCO impossible in rust, since converting TCO-able function into loop is literally just declaring argument as normal variable, then put the function body into the while loop, and compiler can just put the clean-up step before jumping to next loop.
@@OnFireByte I agree, I would very much like to better understand what's going on here. It could be the author is falsely attributing TCO failure to some underlying semantic implementation detail within Rust.
I guess a review of the generated code is the only way to know for sure what's actually going on there.
3. Drop flags only show up when you have a drop that can't be statically determined (e.g. if a variable is only dropped when a runtime condition is true). There are not going to be any drop flags compiled into that code, and explicitly adding an unconditional drop before the recursive call _should_ cause the implicit drop at the end of scope to be omitted.
@@GrantGryczan My understanding is the drop always adds a bit flag and nothing more. The drop is then evaluated at end of scope. Which means in this case, the drop remains regardless of the drop flag. Resulting in no change as the drop is already deferred from scope destruction.
In other words, the drop is saying I want you to do what you're already planning on doing.
@@justanothercomment416 As I said in my last comment, that is incorrect; the drop flag is only needed in very few cases. Take a look at the official nomicon documentation on drop flags. It explains this well and is very short and easy to understand.
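Points 2 and 3 above can be sketched as follows. This is not the benchmark from the article; the function names are invented and the buffer size (42 `u64`s) is arbitrary. It only shows that a tail-recursive function which allocates and unconditionally drops a Vec has a straightforward loop equivalent, with the cleanup happening before the jump to the next iteration.

```rust
/// Recursive shape under discussion: allocate a buffer, release it,
/// then recurse in tail position. `allocated` counts the buffers created.
fn recursive(n: u64, allocated: u64) -> u64 {
    if n == 0 {
        return allocated;
    }
    let stuff: Vec<u64> = Vec::with_capacity(42);
    drop(stuff); // explicit, unconditional drop before the call
    recursive(n - 1, allocated + 1)
}

/// Hand-written loop with the same behavior: the arguments become
/// ordinary variables and the cleanup runs before the next iteration.
fn as_loop(mut n: u64) -> u64 {
    let mut allocated = 0;
    while n != 0 {
        let stuff: Vec<u64> = Vec::with_capacity(42);
        drop(stuff);
        n -= 1;
        allocated += 1;
    }
    allocated
}

fn main() {
    println!("{}", recursive(1_000, 0));
    println!("{}", as_loop(1_000));
}
```

Whether the compiler performs this rewrite on the recursive version is a separate question; as suggested in the thread, looking at the generated code is the only way to know for a given compiler version.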
~40:00 Maybe rust in current iteration of the compiler in release mode can detect that vec is not used and DCE it out of existence. More proper test would be to do something with vec. Honestly, at that time it'd be necessary to look at assembler code.
(I'm not a fan of their copies. It seems like it can create a lot of headaches if some things get copied deeply and some not, but hopefully they thought of it and there will be no auto_ptr 2.0 but with every copyable type.)
I'd love to see Prime react to an interview Chris Lattner did on how Mojo works and what's happening behind the scenes. I don't think most people realize that Mojo is simply using Python as the syntactical glue for a completely different set of backend processes. To use a car analogy, if Python is a Toyota Camry, its syntax is the paint job and decaling on the outside of the car. Mojo is a Formula 1 car that uses the same paint color as Python's Toyota Camry, but it also has cool racing stripes and decals for the superset features/syntax. When I hear people talk about Mojo it's as if they think the language is still a Python Toyota Camry with some aftermarket mods to make it faster, which leads them to think it can't possibly be as fast as something like Rust, which to them is a racecar. Nope, Mojo is an actual Formula 1 car with a familiar Python paint job. That's about all they have in common. Which clears up why there's so much appeal. Mojo is basically telling Python programmers you just need to learn a few new concepts and some additional syntax and you'll be able to drive this Formula 1 car. What's even better is that it'll feel almost as easy to drive as your Toyota Camry, which you can still drive whenever you want.
It sounds like mojo copy on write is like swift. What rust offers is contiguous memory layouts which gains from cache hits.
IIRC Matlab and Octave (its open source counterpart) also do copy on write.
@@PRIMARYATIAS Copy on write is a convenience to the programmer that has runtime performance implications. It is not a bad tradeoff for many uses, but does impact ultimate performance.
I just tuned in to bits of this, but at 42:02, prime says "you do with_capacity() if you want to create a vector that does not resize itself" - errr, not my understanding. with_capacity(n) just initialises the vector with the capacity to hold n elements (it does the allocation up front), but it can and will resize itself if you try to put more than n elements in the vector, and this will result in a reallocation of additional memory to hold the large number of elements, which has performance implications. Just checked the Rust docs, and they refer to it as "with at least the specified capacity". This is a pretty basic concept imo - perhaps prime was having a bad day....
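This is easy to check. A minimal sketch, where `fill_past_capacity` is a made-up helper: it reserves space for `initial` elements, pushes `pushes` elements, and reports the capacity before, the final length, and the capacity after.

```rust
/// Returns (capacity before pushing, final length, capacity after pushing).
fn fill_past_capacity(initial: usize, pushes: usize) -> (usize, usize, usize) {
    let mut v: Vec<usize> = Vec::with_capacity(initial);
    let before = v.capacity(); // "at least the specified capacity"
    for i in 0..pushes {
        v.push(i); // grows (reallocates) whenever the capacity is exhausted
    }
    (before, v.len(), v.capacity())
}

fn main() {
    let (before, len, after) = fill_past_capacity(4, 5);
    assert!(before >= 4);
    assert_eq!(len, 5); // the fifth push succeeded, nothing is fixed-size
    assert!(after >= 5);
    println!("capacity {before} -> {after}, len {len}");
}
```

The exact capacities are an implementation detail, which is why the sketch only asserts lower bounds.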
the point prime made at @19:24 about the ownership being orthogonal to the type is actually quite good. I wish rust did this the other way round.
It seems they fell into a trap trying to make references similar to C++ references.
They could have required you to say things like `ref` and `owned/copy/clone`,
and also removed the idea of implicit copies and required you to always .clone() something.
Bro I’m never forgetting what a mutex or semaphore is after that godly explanation. 13:00
@@harikrishnanb7273 13:00 :D
That “explain it to me like i am 4 because i am too dumb to be 5” had me crying
42:52 My guess is that, because they delete objects as soon as they aren't in use, and because the vec is never in use, they never allocate until you write code that uses it.
I honestly LOVE that Mojo programmers are called Mojicians. It just sounds cool. A programming magician. Honestly sounds better than Rustaceans and Pythonistas.
1. you can just quote the command `hyperfine "node src/index.js"`
2. The point about dropping I think is because the drop for the Vec is put after the recursive call, so it's preventing the TCO?
Why do you harp on about Arc over and over? It is a way for multiple threads/tasks to safely share the same resource. It is not specific to Rust; other languages have that too.
It's because it's a tool that effectively steamrolls over the borrow checker. Yeah there's legitimate uses, but you can just use it as a "fuck it just take the damn variable". Using it introduces overhead and performance reduction
@@XxZeldaxXXxLinkxX You use it when you need to use it, if you want to access the same resource from multiple threads/tasks.
Yes you can misuse it but in most technologies there are tools that can be misused. My comment reflected constant Primeagen's harping about Arc like that is Rust's way of doing most of data flow/access, which is not.
@@maniacZesci he's not harping on Rust, he's harping on the the people that do that (as a crutch ) . Like harping on the people that use "as any" in typescript. Just memeing pretty much
@@XxZeldaxXXxLinkxX fair enough I might have missed that, not a big fan of reaction videos so I don't follow his channel closely.
@22:13 I don't quite get that point they made.
if every object has an identity then there must be some indirection happening there.
if every object has an identity then that means there is a lookup to get any object ( aside from direct memory location ).
If the lookup is insignificant then the argument is fine. But if you always have to first look up the object's id and then get its location in memory, that's obviously a penalty that you don't have to pay with pinning in Rust.
The problem pinning addresses is that data can be at 0x01 and then move to 0x9; anyone who assumed it was at 0x01 would need to know that it's now at 0x9.
Pinning allows you to not have to allocate dynamic memory all the time.
I'm all about the idiot-matic...i write code, come back 6 months later and think: "which idiot wrote this?...oh"
then you rewrite it better, come back in 6 months and think "which idiot wrote this" and rewrite it how you had it the first time.
That’s me after 48 hours
Haha okay found my peer group in here
🤣👌
Forget about the AI buzzword bingo, but if Mojo becomes a general-purpose language which can be compiled and still interact with the Python ecosystem (even if the library calls have to be interpreted and GC'd, obviously), it would still be a win for me! Yes, maybe their claims about performance are false, but if it is good enough, at least as fast as Go, and supports all normal Python features (even if, for example, structs are typed while Python classes are untyped but can still have things like inheritance), it would still be the optimal language, maybe not for AI developers, but for the average web/backend/enterprise developer.
9:33 Couldn't have explained it any better. The higher tier requires skill, because even though you can produce correct code in C, if you end up freeing memory more often than a garbage-collected language would, your program will be slower. It's not only that the freeing is done manually; it's that you can do it less often and at moments where it bothers the least. I call the top-tier languages deterministic: you know what they are doing at any given moment.
20:30 I see this as argument that is logically "I don't like the syntax of Rust" and the implicit case (without "&" or "owned" or "^") should be different than it's for Rust. That is, skill issue.
That said, Mojo seems to have lots of good ideas so it's definitely yet another language worth learning.
Correct C++ is garbage collected, kinda... There are these things called smart pointers that use the constructor/destructor paradigm to automatically delete when they go out of scope.
Rust also has that, it’s RC/ARC.
Definitely not a traditional GC like GC tier language (mark and sweep that need to stop the world) but yeah you could say that
@@OnFireByte I'm not saying Rust doesn't have it, I'm just saying that calling C++ a manual memory language is wrong if you follow best practices (which is to not use raw pointers unless you don't transfer ownership).
That's a lie.
If C++ had GC it would be possible to make equivalent of python's
class GraphNode:
    linked: List["GraphNode"]
Impossible in C++: you need to manually come up with an explicit strategy for who owns what in a graph and clean up the memory.
* You can't use unique_ptr because many nodes can link to the same node.
* You can't use shared_ptr because graphs have cycles, which means that if you have A->BC and pass A and A goes out of scope, B and C survive
* It's impossible to use only weak_ptr because somebody needs to hold a non-weak reference.
So you need to manually write a graph class to handle ownership, because C++ has no GC.
You don't need to do anything of that in GC. "My program doesn't leak memory kinda" doesn't count. when it ~kinda does.
@@AM-yk5yd no it's not they're called smart pointers.
@@AM-yk5yd It’s just how you define GC. Many people consider reference counting to be GC because they define GC as any system that automatically and safely deallocates memory at runtime, but yeah, RC isn’t GC if you say that a GC needs to be able to deallocate all data that isn’t reachable from a root node (tracing GC). It’s just a definition anyway.
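The cycle problem argued about in this thread can be shown with Rust's `Rc`/`Weak`, used here as a stand-in for C++ `shared_ptr`/`weak_ptr` since the two behave alike in this respect. `Node` and both helper functions are invented for illustration; the first one leaks on purpose.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Hypothetical graph node: strong links own their targets,
/// weak links only observe them.
struct Node {
    strong: RefCell<Vec<Rc<Node>>>,
    weak: RefCell<Vec<Weak<Node>>>,
}

fn new_node() -> Rc<Node> {
    Rc::new(Node {
        strong: RefCell::new(Vec::new()),
        weak: RefCell::new(Vec::new()),
    })
}

/// Builds a <-> b with strong links both ways, drops the local handles,
/// and reports whether `a` is still alive afterwards.
fn strong_cycle_leaks() -> bool {
    let a = new_node();
    let b = new_node();
    a.strong.borrow_mut().push(Rc::clone(&b));
    b.strong.borrow_mut().push(Rc::clone(&a));
    let probe = Rc::downgrade(&a);
    drop(a);
    drop(b);
    probe.upgrade().is_some() // true: the cycle keeps both nodes alive
}

/// Same shape, but the back edge is weak, so nothing is kept alive.
fn weak_back_edge_leaks() -> bool {
    let a = new_node();
    let b = new_node();
    a.strong.borrow_mut().push(Rc::clone(&b));
    b.weak.borrow_mut().push(Rc::downgrade(&a));
    let probe = Rc::downgrade(&a);
    drop(a);
    drop(b);
    probe.upgrade().is_some() // false: both nodes were freed
}

fn main() {
    println!("strong cycle leaks: {}", strong_cycle_leaks());
    println!("weak back edge leaks: {}", weak_back_edge_leaks());
}
```

This is the sense in which reference counting is not a tracing GC: someone still has to decide which edges are strong and which are weak.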
The "people writing Python aren't gonna move to Rust if Mojo becomes a thing" claim isn't true, I think (saying that as one of the people in that domain who actually writes Rust right now). Sometimes the problem with Python isn't speed but correctness: there have definitely been instances where I couldn't be confident that the Python code was doing the right thing, that I hadn't missed some edge cases etc., and from what I've heard Mojo hardly improves on Python in that domain. Mojo may take some use away from Rust but it can't replace it, even in the ML / AI domain.
What leads to this correctness? Obviously not memory management (because Python has automatic memory management). So it isn't Rust's ownership and borrowing. Is it simply the existence of strong typing? Mojo has strong typing if wanted (and it is often required for high-performance Mojo code). Is it the more ML-like features of Rust (its powerful enum type and pattern matching)?
Genuinely curious what you think leads to this gain in correctness.
@@brendanhansknecht4650 Rust has inherited a lot of ML-isms (as in SML not AI), basically stuff like algebraic dt, hindley-milner types, optionals etc allow you to encode lot of extra information and guard rails into the type system. Mojo can’t have this because it would break compatibility with python esque stuff on fundamental level.
@@brendanhansknecht4650 I think it's that it's generally very explicit and ekes out edge cases, and that it's strongly and *statically* typed, yes; and that it has quite an expressive type system. I'm not going to accidentally put a "regular" unsigned into a place where a nonzero one is required, for example; I can make algorithms that fail with NaNs take a floating point type that doesn't have NaNs, can use sum types where they're a good fit, ... Python is of course also strongly typed but the dynamism takes away a lot.
Regarding the memory management: if you get into writing more optimized python you actually start to care about memory management even in Python. I feel like there's not really a lot - if anything - gained here with python over rust.
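The nonzero and no-NaN points from this comment can be sketched like this. `NonZeroU32` is the real standard-library type; `NotNan`, `per_item` and `largest` are hypothetical names written for this example (crates offering a similar NaN-free float wrapper exist, but none is assumed here).

```rust
use std::num::NonZeroU32;

/// Division that cannot hit a zero divisor: the type rules it out.
fn per_item(total: u32, count: NonZeroU32) -> u32 {
    total / count.get()
}

/// Hypothetical wrapper for floats that are guaranteed not to be NaN.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
struct NotNan(f64);

impl NotNan {
    /// The only way to build a NotNan; NaN is rejected here, once.
    fn new(x: f64) -> Option<Self> {
        if x.is_nan() { None } else { Some(NotNan(x)) }
    }
}

/// Largest value in the slice, or None if the slice is empty.
fn largest(values: &[NotNan]) -> Option<NotNan> {
    // partial_cmp cannot fail here because NaN was excluded at construction.
    values.iter().copied().max_by(|a, b| a.partial_cmp(b).unwrap())
}

fn main() {
    // A plain 0 cannot sneak in: construction is where the check happens.
    assert!(NonZeroU32::new(0).is_none());
    let count = NonZeroU32::new(5).unwrap();
    assert_eq!(per_item(10, count), 2);

    assert!(NotNan::new(f64::NAN).is_none());
    let xs: Vec<NotNan> = [1.0, 3.0, 2.0].iter().filter_map(|&x| NotNan::new(x)).collect();
    println!("{:?}", largest(&xs));
}
```

The edge case is handled exactly once, at the boundary, and every function downstream can rely on it without re-checking.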
@@SVVV97 cool. So those are the same reason why I would say I prefer rust over python.
That said, people who write python is a gigantic market. Most of them aren't in the same boat. I think for most people who write python, Mojo is much more interesting. Assuming mojo is complete, it would give them:
1. Instant performance gains without changing their code at all
2. A way to add strong static types. On top of that, adding types increases the performance even more.
3. To the python people I interact with, they don't understand the benefits that come from ML. They have never used a nice sum type. So they don't know what they are missing in rust and other ML descendant languages. That said, I do hope that mojo adds good ml style types and pattern matching to python. Would be super happy if they just copy the rust enum type or similar.
4. Assuming modular as a company is successful, it also gives the access to state of the art machine learning tooling
All of this with only incrementally changing their python code. I think for most people I know that program in python that is a way bigger sell than rust. Rust isn't something they are considering learning. It is just something they hope someone else learns to make them nicer libraries.
Anyway, all this really just to point out the target market of mojo, which is quite large (cause the python ecosystem is huge). I think it only lightly overlaps with the rust market.
Aside, I don't fully understand Mojo's memory model, but it has ownership, borrowing, and no GC. That said, if I understand correctly, it will have to fall back on reference counting more often than Rust.
@@brendanhansknecht4650 "1. Instant performance gains without changing their code at all"
Doesn't seem to be that true, at least not unqualifiedly true. There might be some cases where that happens, particularly where python's design leads to things being excruciatingly slow (e.g. loops) but all the examples they have of mojo going blazingly fast (TM) are using the new syntax.
"2. A way to add strong static types. On top of that, adding types increases the performance even more."
That seems to be built upon python's type annotations, which is understandable, but those are kind of a bad fit for python in general due to their strongly nominal nature in a language that's structural to an extreme. Getting those type annotations right is often non-trivial for this reason, and I don't see what mojo is doing to improve on that. They should've gone with something like C++ concepts or Rust traits instead, that is, syntactic and semantic constraints on types rather than explicitly named types, in most cases.
"3. (...) They have never used a nice sum type."
Related to the above, seems like a bad fit for such a structural-heavy language.
"cause the python ecosystem is huge"
I think it remains to be seen how much of an advantage that really is in the end. I suspect people will find that python's dynamic features will make moving to Mojo harder than might've been anticipated from the sales pitch.
If you compile targeting a native CPU, Rust will typically auto-generate SIMD code for you, which you can see on Compiler Explorer with quite simple code. It becomes more fiddly if you want something more platform independent, or if you have dynamic input sizes, which always means a couple of items are left over at the end of the array (the remainder of array_size / simd_block_size). Then you need to write painful hand-cranked code. But if you know what platform you are running on and compile for it, you get most of the benefit without writing specialist code, just as Mojo does.
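The "leftover items" problem described above can be sketched in plain Rust as follows. This is only an illustration: the function name `blocked_sum` and the block size `LANES = 8` are arbitrary choices, and whether the main loop is actually vectorized depends on the compiler version and flags (for example `-C target-cpu=native`).

```rust
// Arbitrary block size for this sketch; real SIMD widths depend on the target.
const LANES: usize = 8;

// Sum a slice in fixed-size blocks, then handle the leftover tail separately.
fn blocked_sum(data: &[f32]) -> f32 {
    // One accumulator per lane, so each lane's additions are independent.
    let mut acc = [0.0f32; LANES];
    let chunks = data.chunks_exact(LANES);
    // Items left over when data.len() is not a multiple of LANES.
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }
    let mut total: f32 = acc.iter().sum();
    for x in tail {
        total += *x;
    }
    total
}
```

Using `chunks_exact` plus `remainder` keeps the tail handling explicit without any `unsafe` or platform intrinsics.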
He always reminds me of one of the voice actors in elder scrolls. Especially when he speaks the way he does in the first second of this video
If "learn 15% more to get 100x performance" were an irresistible value proposition for migrating from Python, then everyone would be writing Nim or Julia
Yeah, I’m quite lost in this Mojo vs Rust discussion. Which use cases are we talking about, and which developers? Say we take the claim that Mojo has hardware-level performance seriously. Should BLAS and TensorFlow be reimplemented in Mojo? In that case, I don’t think the familiarity would be a strong selling point. If it’s on the Python side of things, then most of the runtime is spent inside libraries anyway, so what kind of performance gain are we talking about here: instead of 7154 seconds, it will take 7127 (if we are generous)?
If you are worried about speed, the language is probably the last place you should be looking, especially as a web developer. If you are that concerned, you aren't going to be swapping out languages as a fashion statement, especially when 50-year-old languages will do the job and have been doing the job for those with those concerns.
I will never understand some developers. I sometimes think that they really would be more comfortable in a congregation than pretending to be an engineer.
Not all do webdev. Mojo is not for webdev. It's for AI and scientific programming
@@kinomonogatari Oh no I know. I mean that seems to be primarily what the Primeagen does ... isn't it? If I am wrong then ignore it.
@@sacredgeometry Yes Prime does webdev. Mojo is exciting for me though as a physicist because I have a lot of gripes with the current tooling at our disposal. Numpy, Numba etc are all excellent but I believe that Python is not the right tool for high performance scientific software. That has always been C/C++ and of course, Fortran. So when I'm asked to build complex models in Python from scratch (because that's what the community is accustomed to) it's a pain to make it as performant as those compiled languages. That's why I started looking towards Julia and intend to use it as my primary language for my own scientific development until Mojo becomes widely available/open-source. And when it does, we'll see if it is indeed better or not. But if it goes the MATLAB proprietary way, then Julia is our best bet.
If you are worried about raw performance/latency you ARE limited to high performance languages like C/C++/Rust/... If you are programming tight real-time control loops or even a game engine, you just can't afford to run a garbage collector (Java) or a slow interpreter (Python). Python is awesome, but if I can, I will use the C backend of a library, as it can be 100x faster (protobuf is a good example)
@@robstamm60 Absolutely. Time and performance critical software exists but as I said: The people writing game engines aren't constantly hunting for new languages.
Almost all of the embedded developers I know think the overhead/ abstraction of C++ is too much and that C is perfectly well suited to their jobs.
They aren't looking to replace 30+ years of experience every few months to hop on the new hype train.
So the problem with tail-call optimization in this instance is that they added an extra semicolon, that's it. That's a feature that's in C as well, tail-call optimization only happens when you're returning the final expression.
Also, the reason Vec::new() is faster is the allocation gets optimized away.
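A small Rust sketch of the difference this comment points at: `Vec::new()` is documented not to allocate until something is pushed, while `Vec::with_capacity(n)` reserves room immediately. The helper name `capacities` is made up for illustration, and this only shows the library-level behaviour, not what the optimizer does with it.

```rust
// Returns (capacity of Vec::new(), capacity of Vec::with_capacity(n)).
fn capacities(n: usize) -> (usize, usize) {
    // No heap allocation happens here; capacity starts at 0.
    let lazy: Vec<u8> = Vec::new();
    // Room for at least `n` elements is reserved up front.
    let eager: Vec<u8> = Vec::with_capacity(n);
    (lazy.capacity(), eager.capacity())
}
```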
I did some reading because I was curious how Mojo can claim pass-by-reference as a default, and also better semantics. Maybe folks in the know could clear things up for me? I see that Mojo currently has implementations of neither explicit lifetimes, nor enforcement around taking immutable references when a mutable reference is still alive. I'm also unclear on how Mojo ending lifetimes at point of last call makes reference lifetime semantics easier to reason about. Does that just mean that every reference is an RC by default? Also, doesn't the eventual implementation of mutable borrow enforcement have the potential to introduce a lot of complexity into this system? And, if passing by `owned` sometimes references, and sometimes moves, doesn't that also force the programmer to understand how the Mojo compiler works to identify performance bottlenecks, and get an extra level of complication when mutable borrows are enforced? It must be that Mojo will eventually *actually* move the value if it's mutated and has other references alive, so that introduces a third branch of implicit behavior that you might have to track. At least, presumably, package authors will. It's logically impossible that Mojo can be faster than Rust (or even faster than Go above a certain level of borrowing complexity?), have less implicit copying (or much more use of reference counting), and have simpler ownership semantics, right? Unless they've found some new proofs around ownership that unlock fundamentally different approaches to resolving ownership. I just want to make sure I'm not missing something/too stupid to understand it before I form a strong opinion about how much this cart is being put before the horse.
EDIT: I see now that Mojo doesn't support *returning* references, because that's obviously what causes the need for explicit lifetimes. That removes the need for RCs, but surely reasoning about lifetimes becomes *more* difficult, and not less, when ownership doesn't necessarily last until the end of a scope?
I guess if the idea is that Mojo's explicit purpose is supporting ML, and it isn't worried about being natively integrated into larger application stacks, a lot of those things might not be issues. Seeing as you're unlikely to have to worry about mutable references and having large numbers of RCs when you're mostly doing matrix operations
Wait, isn't that reference behavior a standard Python thing? I see no difference here: in Python, unless you explicitly rebind the variable, it's passed by reference (for example, pop() and append() on a list modify the original if you don't assign something new to that reference).
@@retereum mojo isn't garbage collected, it's borrow checked, like rust is. So you have to know (whether through the user manually tracking, or through proofs built into the compiler) that the underlying memory you're pointing to is still safe to access
@@olazawho Unless they want to be in a similar situation as python where the code eventually gets rewritten as C++, they'll have to write a bunch of business logic around it. It better be at least as good as Rust for that. That business logic probably won't be the perf bottleneck but being able to reason about ownership and lifetimes is still important.
I don't think reasoning about lifetimes is any more difficult than with scoping, but you do lose out on functionality (can't stick cleanup logic in an ad-hoc destructor, can't tie it to locks, etc). IMO it's not worth it.
32:20 TCO i know, but what is TCE tail call elimination?
One more important thing to realise: if you know Python and have learned Rust, you are closer to learning Mojo, because Mojo is also introducing features from Rust like ownership and borrowing. Adding such features will be a skill issue for Python developers interested in learning Mojo, because at the least you need to learn those concepts before using them.
the compatibility of linux distros is only limited to ubuntu, as a debian user I haven't experienced what it's like to program using mojo.
The Vec with capacity allocates the vector whereas the new Vec is removed by the compiler because it's never used. I do not know Rust but from a general compiler viewpoint, this would be logical. Rust might even "zero" out the memory allocated to the Vec of capacity.
Mojo seems to just wait to allocate until a value is pushed to the vector meaning it never allocates any memory for the Vector in the given example.
This article is pure marketing. These guys should have just taken the L and walked away. They make a lot of false arguments and they cherry-picked that final benchmark, with something that was completely simulated. In the Mojo example, the compiler actually just calls the main function and then returns because no work is being done in the program. The same happens in the Rust variant when you use the vector macro instead of the with_capacity call. Rust can use the same back end as Mojo and it can also be optimized for SIMD. This idea that somehow a Python dev is going to have zero friction learning Mojo and also get better performance than Rust is absurd.
With the straight up false statements that they made in this article, I'm not going to believe anything that these people write in the future.
that level of skill issue when running a benchmark is indicative of one of two things, 1. incompetence when evaluating the performance of optimized code or 2. dishonesty
Both make me incredibly skeptical they have the capacity to deliver on their claims.
To be honest though, Ruby also comes with a JIT compiler (a recent new feature) so it can be sped up if needed.
"Future proof for 50 years" sounds like a dumb prediction, that includes a ramp up and ramp down of usage like we've seen with C, and by the time those 50 years are reached (if that even happens) a new and better language will have been developed.
There is nothing lost by learning the language, especially if you already use python, but it's a hyperbolic statement.
mojo sucks for a few reasons
1. closed source
2. auth needed to use it
3. terrible setup on linux
if this is true it's DOA
@@madsen4617 DOA ?
9:34
Bro angered all assembly programmers with a short sentence.
"C is just fancy assembly"
Such blasphemy !
and then there is the "eye hovering over the pyramid" category: Hardware Description Languages, like Verilog and VHDL.
I don't understand why anyone would learn Mojo when you have Nim. I mean it's the same idea- python-like syntax but compiled. It's even faster than Rust though in many benchmarks and the ecosystem is more mature.
Modular pinned Primeagen who then passed Modular by reference.
Has he already covered with the community the Julia Language ?
Mojo is proprietary and not python. Codon has the same issue. The skill issue with effective SIMD programming is not a syntax issue. If a programmer has the intelligence to program with SIMD, GPU, lifetimes, manual memory management effectively, they can certainly overcome superficial syntax differences such as indention vs curly braces.
When thinking of memory management techniques for AI, borrow checking seems like a general bad fit since it is often paired with the general heap allocator. Arena (bump) allocation probably makes more sense for performance. Languages like Zig/Odin/Jai have better deterministic memory management control. Rust is not flexible when it comes to manual memory management, although certainly just a few "unsafe" blocks away from hacking something together.
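As a rough illustration of the arena idea mentioned above, here is an index-based arena in safe Rust. It is a sketch, not a real bump allocator: the `Arena` type and its methods are hypothetical names, and handing out indices instead of references is a simplification to stay within safe code.

```rust
// Index-based arena sketch: values live in one growing buffer and are all
// released together by `reset`, instead of being freed one by one.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn with_capacity(n: usize) -> Self {
        Arena { items: Vec::with_capacity(n) }
    }

    // "Allocate" by appending; the returned index acts as a handle.
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(value);
        self.items.len() - 1
    }

    fn get(&self, id: usize) -> Option<&T> {
        self.items.get(id)
    }

    // Drop every value at once; the backing buffer is kept for reuse.
    fn reset(&mut self) {
        self.items.clear();
    }
}
```

The design choice here is that individual values are never freed on their own, which is what makes this style cheap for batch workloads.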
(ts 10:58) I don't know about the Rust comment. I don't think you need a great amount of skill to use Rust to make code fast. It kind of gives you a bunch of hints, and with basic benchmarks of your functions it takes minutes to figure out when you're doing something problematic, never mind when the compiler shouts at you and tells you that you need to change this or that.
I kind of think Rust lets people with low skill levels make things fast; it's that middle of the spectrum between C++ and JavaScript/Python. I'd agree Golang makes it easy to do a bunch, but I think that bunch is focused on skilled system admins and DevOps engineers, who arguably have enough overall understanding and a logical enough way of thinking that they should be considered highly skilled people anyway.
respect to mojo for using "fn" instead of "def"
It actually has both: def remains the same as regular Python, while fn has new Mojo semantics & optimizations
@@serena_m_ which in my humble opinion is worse. Sorry but making two different ways to write functions will be so confusing. They also have two types of objects, the standard class and structs. I think this is messy and will make things more difficult, for people coming from python.
fn is bad because it's the nth element of the f sequence. Are we paying by the character now?
About vec![0; 42], it actually memsets the first 42 elements. So it allocates and sets, so it might allocate on the first push. With capacity only allocates, so as long as you push less than capacity, you're guaranteed to not allocate.
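The distinction drawn above between `vec![0; n]` and `Vec::with_capacity(n)` can be checked with a short Rust sketch. The function names are made up for illustration, and the sketch only asserts what the standard library documents: a vector will not reallocate while its length stays within its capacity.

```rust
// Length right after `vec![0; n]`: the n elements already exist.
fn zeroed_len(n: usize) -> usize {
    let v = vec![0u8; n];
    v.len()
}

// Push `n` items into a pre-sized Vec and report whether the capacity moved.
fn pushes_reallocate(n: usize) -> bool {
    let mut v: Vec<u8> = Vec::with_capacity(n);
    let before = v.capacity();
    for i in 0..n {
        v.push(i as u8);
    }
    v.capacity() != before
}
```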
From the official Mojo manual: "Mojo uses a third approach called “ownership” that relies on a collection of rules that programmers must follow when passing values. The rules ensure there is only one “owner” for each chunk of memory at a time, and that the memory is deallocated accordingly. In this way, Mojo automatically allocates and deallocates heap memory for you, but it does so in a way that’s deterministic and safe from errors such as use-after-free, double-free and memory leaks. Plus, it does so with a very low performance overhead."
So it's much closer to Rust than Java or J# or JS.
mojo is going to end up like julia where it’s mostly a meme and you only end up getting fast code if you spend a bunch of time fussing around trying to wrangle the runtime to do what you want
no such thing as free lunch
also the focus on tail call optimization as a selling point is kinda meme-worthy in and of itself
nobody who’s serious about performance is using recursion and relying on TCO to begin with, and if they are it’s because what they’re doing couldn’t be translated to a for loop without extra memory anyway
Julia would have been ok if it could properly statically compile.
@@yevgeniygrechka6431 I agree. Right now Mojo is doing it right. A slower REPL for dev, and static compilation for running the code. No idea why Julia devs bet on a purely dynamic language with JIT. Sure it gives you nice features, but it makes it worse than Python for small tasks, which are most of the tasks you do.
Hopefully less buggy though. Also Mojo is kind of already winning by having 0-indexed arrays.
🎯 Key Takeaways for quick navigation:
00:00 *🚀 Mojo's speed compared to Rust and Python*
- Mojo claims to be 50% faster than Rust, particularly relevant for AI experiments and fast, throwaway code.
- Discussion on the skepticism and response regarding Mojo's speed compared to Rust.
- Mojo aims to provide Python developers with performance benefits similar to Rust without a steep learning curve.
02:04 *🛠️ Technical aspects of Mojo and performance considerations*
- Mojo is built on modern compiler technology (MLIR) and aims to meet Python developers' needs while optimizing performance.
- Discussion on skill levels, performance trade-offs, and the professional reality of time constraints.
- Mojo's goal is to optimize code performance without requiring developers to extensively learn new paradigms or languages.
03:31 *🚀 Potential impact of Mojo's performance in AI development*
- Potential adoption of Mojo in AI development if it delivers on promised speed improvements over Rust.
- Discussion on the value proposition for data scientists, ML engineers, and researchers in choosing Mojo over Rust.
- Considerations for Mojo's adoption in AI infrastructure and its competition with Rust in the AI space.
06:30 *🤔 Comparison between Mojo and Rust in code ergonomics and performance optimization*
- Comparison of code ergonomics between Mojo and Rust, focusing on simplicity and efficiency for developers.
- Explanation of technical concepts such as automatic reference counting (ARC), mutex, and performance optimization strategies in Rust and Mojo.
- Consideration of overhead and performance trade-offs in idiomatic code writing in Mojo and Rust.
11:34 *🔍 Evaluating performance benchmarks and considerations in Mojo and Rust*
- Discussion on the challenges of comparing performance benchmarks between Mojo and other languages.
- Explanation of memory management concepts, LLVM optimization, and code efficiency in Mojo compared to Rust.
- Considerations for optimizing code performance and reducing overhead in both Mojo and Rust for real-world applications.
20:00 *💻 Rust's improved default behavior and borrow checker:*
- Rust's default behavior after a move leads to more efficient code and reduces borrow checker conflicts.
- Dynamic programming background engineers can work without roadblocks and expect desired behavior with optimal performance.
21:10 *🤔 Understanding pinning in Rust:*
- Pinning in Rust determines whether an object can be moved in memory.
- The concept of pinning, while crucial, can be confusing for many Rust developers.
22:07 *📚 Mojo's explanation of pinning and self-referential structs:*
- Mojo provides a clear explanation of pinning and its implications for self-referential structs.
- Pinning is essential for ensuring data validity and memory location stability in async Rust.
23:01 *🚀 Mojo's advantages over Rust and LLVM:*
- Mojo leverages MLIR, a modern compiler stack, for improved performance and support for GPUs.
- Chris Lattner's background with LLVM and MLIR contributes to Mojo's innovative approach.
24:37 *💡 Mojo's focus on performance and ease of use:*
- Mojo aims to provide fast iteration times and enjoyable programming experiences.
- The language's design appeals to AI and compiler enthusiasts, offering both power and simplicity.
25:36 *💻 SIMD optimizations in Mojo:*
- Mojo's native support for SIMD optimizations improves performance for operations on large datasets.
- Examples demonstrate how Mojo simplifies SIMD operations compared to traditional coding approaches.
27:29 *🔧 Eager destruction and memory management in Mojo:*
- Mojo's approach to memory management, including eager destruction, aligns with efficient resource utilization in AI applications.
- The language's design reduces complexities related to object lifetimes and destruction.
30:02 *🔄 Tail call optimization and overhead reduction in Mojo:*
- Mojo eliminates overhead associated with drop flags and optimizes memory management to improve performance.
- Tail call optimization and elimination are handled differently in Mojo compared to Rust, leading to potential performance gains.
32:07 *🧪 Testing and comparison between Rust and Mojo:*
- Practical testing and comparisons between Rust and Mojo reveal insights into performance characteristics and compiler behavior.
- Understanding memory allocation strategies and compiler optimizations is crucial for evaluating language performance.
43:49 *🛠️ Rust's Destructor Optimization in Mojo Explained*
- Rust's destructors are called when a value goes out of scope, impacting tail call optimization.
- Mojo's early destruction allows optimization with tail call even for heap-allocated objects.
- Curiosity about Rust's stack allocation guarantees and potential optimizations.
45:04 *🧠 Mojo's Performance and Language Ergonomics*
- Mojo delivers exceptional performance, surpassing expectations with significant speed improvements.
- Highlight of Rust's high-level ergonomics despite being a systems-level language.
- Discussion on the ease of use and adoption of Mojo for developers.
46:38 *🔄 Challenges and Solutions in AI Programming*
- Complications faced in AI programming with Rust's SIMD, including slow compilation and resistance from Python-centric AI researchers.
- Mention of attempts to address these challenges with Swift for TensorFlow at Google.
- Insights into the complexities of integrating new languages into AI development workflows.
48:19 *🚀 Mojo's Potential and Future Development*
- Recognition of Mojo's optimal performance for system engineers but ongoing development for dynamic features expected by Python programmers.
- Comparison with Rust as an immediate production choice versus Mojo's potential for future AI advancements.
- Speculation on Mojo's evolution, including the development of AI-specific libraries and a robust standard library akin to Go.
Made with HARPA AI
17:58 `copy` would be a more intuitive (less confusing) choice for the `owned` keyword.
Get owned
What is the Rust book shown at the beginning of the video? 5:03
zero to production in rust, it's mentioned on the screen
So, as far as I understand it, Rust doesn't implement TCO when you are allocating because it isn't really "safe" and can lead to unintended behavior. The article "The Story of Tail Call Optimizations in Rust" has a little bit on this, though it's quite old.
The reason Vec::new doesn't do this is because Vec::new only lazily allocates, therefore TCO can be applied, but Vec::with_capacity DOES do heap allocation and according to the rust devs if they did TCO there, this might lead to undefined behavior.
Though the speedup when using vec! is weird, since vec! is just using Vec::with_capacity and fill under the hood... Maybe some optimizations?
If we start to account for skill issues, then Java can be as fast as Rust/C++ or even faster (after warmup), because with enough skill you can write garbage-free code and do manual memory allocations/deallocations. And the part that can make it faster is JIT optimizations, which can be tailored to the current specific use case, like loop unwinding or operation reordering, which C++ or Rust simply cannot do, because they don't know how the code they produce will be used every time you run a program.
They've actually changed the article on part for Tail Call Optimization.
it might be worth rereading that part.
To prove the TCO example, you could write a for loop that allocates the vector the same number of times. I mean if the idea of TCO and TCE is to make recursive algorithms work as iterative then this should be a fair example of the advantages of having that optimization.
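A minimal Rust sketch of the comparison suggested above: the same number of allocations done once through recursion in tail position and once through a plain loop. The function names and the `done` counter are made up for illustration, `std::hint::black_box` is used only to discourage the compiler from removing the allocation, and Rust does not guarantee that the recursive call is actually turned into a jump.

```rust
use std::hint::black_box;

// Recursive form: allocate, drop, then recurse as the final expression.
fn recursive(remaining: u32, size: usize, done: u32) -> u32 {
    if remaining == 0 {
        return done;
    }
    let v: Vec<u8> = Vec::with_capacity(size);
    black_box(&v); // keep the allocation observable
    drop(v); // free before the recursive call
    recursive(remaining - 1, size, done + 1)
}

// Iterative form doing the same allocations in a for loop.
fn iterative(count: u32, size: usize) -> u32 {
    let mut done = 0;
    for _ in 0..count {
        let v: Vec<u8> = Vec::with_capacity(size);
        black_box(&v);
        done += 1;
    }
    done
}
```

Timing both with the same `count` and `size` would show how much of the benchmark difference comes from recursion itself rather than from the allocator.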
My understanding is that since stack variables are eagerly destructed, every time you stop using a variable, the stack pointer decrements, so when you get to the end of the function, your next stack frame starts where the old one was. This improves locality and you can work exclusively in cache, making the Mojo version significantly faster; you're playing with registers at that point.
Looking back at that blogpost, there is a very incorrect usage of commas throughout, and it dips its toes into Oxford commas seemingly at random. Doesn’t Modular have anyone to proofread?
I have no idea what's going on in this video, but I find it fascinating
I work with AI, I use both Python and Rust. I don't know Mojo (yet). This debate irritated me quite a bit - good debate, but I mildly disagree 🙂
We don't use python for its speed! Python is good language to "configure" frameworks like Keras, Torch, Tensorflow or Scikit - that are implemented in c++. Rust is a great replacement for that c++, not for python. Will Mojo be that c++ replacement? I have doubts. Can you trust a language rooted in Python-like prototyping to write hardcore numerical libraries? I would need some more convincing. When somebody says that it is 50% faster than Rust, that does not elicit trust - it just creates hype. On the other hand, to replace Python, Mojo would need to have a library support comparable to python - why would you use it otherwise? Again - we don't use python for its speed...
Funny enough - Rust's speed or safety may as well not be the main reason to use it. I have started to rewrite some of my python code to Rust not to gain speed, but mainly for the excellent type-system and secondary for its ability to compile to wasm.
Agree on the type system, and add traits and pattern matching for me (I know, Python 3.10+ has it too, but it feels like it was an afterthought). I like Rust's approach to writing software more, simply because of these language design choices (plus testing and examples). In addition, I get amazing speed and memory safety, which I welcome.
@@orestdubay6508 cry libtard, Rust is superior 🦀🦀🦀🦀🦀
If safety was the only concern then they wouldn’t be trying to replace C for decades. It’s a little bit more complicated than that. The best joke about that is that the essence of computing is about separation of church and state.
11:19 Does anyone else think there is some amount of bias toward Rust here??
What happened to the earlier upload? Was watching it partway, refreshed the page, and suddenly it got privated lol (it was about HTNX)
If Mojo can have Pydantic data structs with validation, HTTP libs for serving and posting, database connectors and a kafka connector or something in addition to the AI stuff on the standard library, it could potentially be THE lang for AI powered web
I don't see why you can't use Go and Rust together as appropriate, Java/Rust, HTML/etc. Mojo ... (mainframe does do assembly),
Mojo < internet>, heavy data management: Rust ,Java for human interactive code to the other systems.
There’s a “tailcall” crate which adds an annotation (derived trait) to functions.
18:09 Literally the first thing you learn in Rust is how to pass variables mutably or immutably through references, in both languages, idiomatically, you should be writing code that aligns with said language's definition of borrowing and owning. I don't think this is a good example, plus, in Mojo, you don't actually know if functions pass arguments immutably or mutably unless you look at the function signature. If anything, it shows that Rust is less ambiguous.
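For readers who have not seen it, the Rust side of this point looks roughly like the sketch below: mutability shows up both in the function signature and at the call site. The function names are made up for illustration.

```rust
// Read-only borrow: this function cannot modify the values.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Mutable borrow: the signature says the vector may be changed.
fn add_one(values: &mut Vec<i32>) {
    values.push(1);
}

fn demo() -> (i32, usize) {
    let mut v = vec![1, 2, 3];
    let before = total(&v); // `&v` at the call site: shared, read-only
    add_one(&mut v); // `&mut v` at the call site: visibly mutable
    (before, v.len())
}
```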
20:32 Nice
", in Mojo, you don't actually know if functions pass arguments immutably or mutably "
I'm guessing immutable arguments are annotated with "in" as opposed to "inout" or "owned". They seem to have taken that from Herb Sutter's presentations on parameter passing in C++.
Btw, who noted that background replacement has treated his gray hair as background allowed to see through, probably because of the hoody
Two words: Time Dilation. There's always a sense of being relative. Use what works. The time variants between Rust and Mojo are going to be too close and you'll not really lose unless maybe if you are targeting a process that has an advantage. Mojo will probably target ML solutions and solve them with simple solutions. Whatever you use the other can be its good looking sibling.
GC requires indirect access. Direct allocation/deallocation can cause fragmentation. Rust tends to have larger contiguous structs than copy-on-write memory management. Explicit memory management can run with far smaller memory usage.
I think Primeagen misread that as "Explain that like I'm 5 years into a Computer Science program"
Mojo for Html " ... literal nonsense ... ", Java for heavy lifting communication to Rust (to maintain code/data on a large scale, etc.)
Modular makes mojo, they of course will say whatever to make mojo more relevant
CNC machines are programmed with C here in Germany. Depending on the CPU and how Mojo works, there will be a bright future for it. But I don't know how this will work on a CNC machine, because they need to execute one step after the other and not in parallel; that's how a CNC works
Frankly, the whole who-is-faster debate: don't give a flying fuck. As someone who was probably going to be programming in Python or MATLAB for her entire career, I just see Mojo 🔥 as an absolute win.
And that's really what their pitch should be: "hey python devs, ready for a language that is written the same way as the one you already use and is 8x faster with exactly the same code and could do more once you learn the arcane runes?"
Something must have gone wrong. All vec![v; n] does is call vec::from_elem(v, n), which calls Vec::with_capacity(n) and on the returned Vec, Vec::extend_with(n, v).
How can just calling Vec::with_capacity(n) be worse than that?
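To make the question above concrete, here is a Rust sketch that builds the same vector through the macro and "by hand". The function names are made up, and `resize` stands in for the internal fill step, so this shows equivalent results rather than the standard library's actual code path.

```rust
// The macro form: n zeroed elements.
fn via_macro(n: usize) -> Vec<u8> {
    vec![0u8; n]
}

// Hand-written form: reserve first, then fill up to length n.
fn by_hand(n: usize) -> Vec<u8> {
    let mut v = Vec::with_capacity(n);
    v.resize(n, 0u8);
    v
}
```

Both produce identical contents, so any timing difference in the benchmark has to come from what the optimizer can prove about each form, not from the resulting data.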