The New Massively Parallel Language

ThePrimeTime

Додати в
- Мій плейлист
- Переглянути пізніше
Поділитися

Поділитися

Вставка

Розмір відео:

Показувати елементи керування програвачем

Автоматичне відтворення

Автоповтор

Опубліковано 25 вер 2024
Recorded live on twitch, GET IN
Links
/ 1791213162525524076
By: / victortaelin
• Mind-bending new progr...
By: / @fireship
My Stream
/ theprimeagen
Best Way To Support Me
Become a backend engineer. Its my favorite site
boot.dev/?prom...
This is also the best way to support me is to support yourself becoming a better backend engineer.
MY MAIN YT CHANNEL: Has well edited engineering videos
/ theprimeagen
Discord
/ discord
Have something for me to read or react to?: / theprimeagenreact
Kinesis Advantage 360: bit.ly/Prime-K...
Hey I am sponsored by Turso, an edge database. I think they are pretty neet. Give them a try for free and if you want you can get a decent amount off (the free tier is the best (better than planetscale or any other))
turso.tech/dee...
Наука та технологія

КОМЕНТАРІ • 468

@infernoplayzgames1684 4 місяці тому ⁺⁸¹³
"I'm not gonna use this..." you say that, then six months later we will see a "Learning the Bend Language" video.
@dylancode 4 місяці тому ⁺⁵⁰
How can we take the language seriously when the promotional videos use light mode??!!
@mrs8683 4 місяці тому ⁺¹⁶
@@dylancode light mode is better for showcase
@dylancode 4 місяці тому ⁺³
@@mrs8683 Maybe in some cases... Depends what you prefer I guess!
@Machine_State 4 місяці тому ⁺⁴
@@dylancode Light mode is fine actually -Light mode enjoyer
@crimiusXIII 4 місяці тому
Probably some AoC is my guess
@origamitraveler7425 4 місяці тому ⁺⁴³⁹
This thing is built on top of another of the guy's projects called HVM, which was a massively parallel runtime originally envisioned as a compilation target for Haskell. But it's a whole platform, you can write any language on top of it. It's pretty cool.
@joszuarotschild2270 4 місяці тому ⁺³⁰
I love that guy as he managed to implement this, enable it to be incoroprated in custom projects via design and finally promote it out (at least enough to be heard by Prime) at the same time.
@zweitekonto9654 4 місяці тому
HASKELL MENTIONED
@Adowrath 4 місяці тому ⁺⁹
When somebody mentioned Bend to me, I got immediatly reminded of the original HVM - was glad to see it's a continuation of that idea in HVM2. Amazing work, wish I was intelligent enough to understand how it works.
@ZeroInDaHouse Місяць тому
There is a massive market out there for parallel computing and the current libraries or '''toolkits" to use them are a joke. I hope there will be support for AMD GPUs.
@Sandromatic 26 днів тому
If I can run haskell on this I'd love to do that, bend's lack of types really hurts understanding for this kind of code tbh. Or heck, just give me ADT in bend, that's all I need.
@paryonix3d6657 4 місяці тому ⁺²⁵³
Mutex: To freely quote Kevlin Henney here: "They should really have called them Bottleneck instead. No one would be like - Oh, I think we just need to add two bottlenecks here in those classes and we are fine" :D.
@alkeryn1700 3 місяці тому ⁺²
sure, but there are many cases where not using them is not an option for correctedness.
@dipereira0123 4 місяці тому ⁺¹⁷³
This language is so advanced that we dont even have a problem for it to solve (YET)
@ThePrimeTimeagen 4 місяці тому ⁺⁶⁵
But you can imagine
@irakli2166 4 місяці тому ⁺⁸
I think the language could be ideal for creating a solver for linear programming, where you need a lot of parallel calculations. This could be a huge business opportunity.
@timmygilbert4102 3 місяці тому ⁺¹²
The problem exist, it's called video games
@dtracers 3 місяці тому ⁺⁹
hobbyist wanting to get into GPU programming without needing to get into cuda
@colbyboucher6391 3 місяці тому ⁺¹
@@timmygilbert4102There are almost certainly no graphics libraries in Bend.
@Kram1032 4 місяці тому ⁺⁹⁵
interaction combinators are super simple. The basic form is just:
"If two of *the same* combinator meet, connect up their input wires"
"it two *different* combinators meet, copy them"
and somehow that suffices to
- be Turing complete
- calculate things as efficiently as possible
And furthermore, these compute locations can happen completely independently from each other, i.e. in parallel.
There ARE limitations on this. Namely, you have to structure your code in a form that properly takes advantage of this. However, I *think* that's where Bend comes in.
HVM2 is basically typed interaction combinators. Bend probably is a layer that attempts to minimize your having to think about the specific function form. And the design with folds and recursions rather than loops *also* is all about increasing the chance to write in a style that the interaction combinator parallelism can actually be taken advantage of.
I recommend at least trying to skim the mentioned paper. You don't have to understand every last bit of the theory in order to understand at least roughly how these interaction combinator graphs work. It's really clever stuff, and pretty long-known too. What's new here is this effort to make it run seamlessly on a GPU.
@oleonardohn 4 місяці тому ⁺³
There are no evidences for the second claim. In fact, the author clearly distorted the meaning of optimal evaluation in the context of lambda calculus. Take a look at what Lévy means by optimal evaluation in his 1978 paper, then read what Asperti says in 1998 about the ineficiency of optimal evaluation.
@Hector-bj3ls 4 місяці тому ⁺⁵
Super simple he says... proceeds to use words no one with an IQ under 9000 understands.
@Hector-bj3ls 4 місяці тому ⁺¹³
Specifically:
What is a combinator?
Why can they interact?
What interactions are we talking about?
What does it mean to copy one?
What does it mean "to connect up their input wires"?
What is a compute location?
What is a function form?
@nandans2506 4 місяці тому ⁺⁴
It's just pure functional programming to be honest
@Kram1032 4 місяці тому ⁺¹⁶
@@Hector-bj3ls all of those things are best explained through pictures I'm afraid. The diagrams in the paper are pretty good though.
Here are my best tries:
You can think of a combinator as almost like an assembly instruction for computations.
I don't think "why" they interact is a good question: They do so "by definition". The questions is really "how". There are different kinds of combinators that interact in various ways. These interaction combinators are among the simplest though.
I don't actually know why they are called interaction combinators rather than anything else, but I'd say they interact with each other? Whenever two combinators encounter each other, they annihilate or "pass through each other" in a copy step. - What that means is very easy to show in a diagram and very hard to only put in words.
However, you can see that diagram in this video. It's the one where a pair of combinators turns into four.
The triangles are the combinators. They have one input and two outputs each, given by the lines.
A compute location is whenever two combinators are directly connected along the input.
The term "function form" isn't anything rigoros. That was my handwavy term.
Basically, a seemingly magical property of these combinators is, that you can compute a lot of things in a *sublinear* number of steps. Sometimes even in O(1)
And the programs you write for that often look quite naive.
A toy example is to apply "not" to a bit like quadrillions of times. - Obviously, it's quite easy to say whether the outcome is the same or the opposite as you start with simply by looking at your enormous number in the quadrillions and check the least significant bit (check if it's even or odd)
However, *naively* you might just go "oh apply not and then not and then not ..." across those quadrillions of steps. It'd take way too long to do that sequentially, even on modern GPUs.
But by the magic of Interaction Combinators, defined in certain precise ways, you can take even this really stupid naive program and it'll give you the result in essentially as much time as the parity check would've done.
However, whether this works depends on how you define your "not" function.
IIRC it was like this:
If you have your classic church-encoded bools:
True = \t f -> t
False = \t f -> f
basically just a pair of singletons, where the left pair is considered True, and the right pair is considered False
Then there are two seemingly equivalent versions of Not:
not_a = \b -> b False True
not_b = \b t f -> b f t
The first definition explicitly assigns True to False and False to True.
The second one simply says "swap what you get".
Using the first definition, you will *not* achieve any speedup with repeat application. It will stay linear.
The second definition, however, will allow interaction combinators to discover *sub expressions* that are identical.
So when you get something like
not not not not not not not not True
the first definition will just sequentially do:
not = not_a
not ( not ( not ( not ( not ( not ( not ( not True )))))))
which is linear.
But the second will end up doing something like
not = not_b
(((not . not) . (not . not)) . ((not . not) . (not . not))) True
which can be simplified to:
a = (not . not)
b = (a . a)
x = (b . b) True
effectively using only three steps.
By construction, it is guaranteed that your combinators will *always* find this optimal form with the fewest numbers of steps, so long as your function is defined in a way that can be turned into a tree like that.
The algorithm it finds is equivalent to Exponentiation by Squaring where you go
a = x * x
b = a * a
x^8 = b * b
It'll do that without you having to cleverly think about how to do this efficiently yourself rather than using an expensive pow(x, 8)
So long as your function has a reasonably tree-like form (or whatever), it will find the optimal way. And moreover, every single compute step is completely local, so if you ever have multiple places that can be reduced at the same time, those places can be safely evaluated in parallel without fear of race conditions or anything like that.
I think that's also the reason why bend doesn't directly offer loops and wants you to use recursion instead. It makes it so you are kinda forced into a style that's more likely to take advantage of this kind of speedup. Perhaps with time some syntactic sugar will make that more straight forward.
@Flakelolz 4 місяці тому ⁺³⁷
13:45 Clippy warns you about this things. The specific lint rule is called "let_underscore_future". Clippy should be mandatory when writing Rust imo, it has so many helpful warnings.
@edumorangobolcombr 4 місяці тому ⁺⁴⁶⁵
Another Brazilian language 🇧🇷
@alexandrecolautoneto7374 4 місяці тому ⁺⁵⁹
BR mentioned, carai
@aldi_nh 4 місяці тому ⁺²⁰
vamos
@josemarques3454 4 місяці тому ⁺¹⁰
Brasil sil sil sil
@zerorusher 4 місяці тому ⁺⁵
Vai Taelin!
@greidinger-reis 4 місяці тому ⁺¹⁴
BRAZIL MENTIONED
@victormurphy3511 4 місяці тому ⁺³¹
Adult education in mathematics ... Fast Python ... Loops that don't suck ass ... Check Check Check. I am delighted, as I'm going to use for Weather and Climate analysis... Thank you for the video and your point of view as when I came across this language I thought it was to good to be true. Awesome.
@victormurphy3511 4 місяці тому ⁺²
@@bbourbaki Grift ?
@oldtools6089 3 місяці тому ⁺²
You can't use math to make yourself a seer. To predict the weather requires personality.
@victormurphy3511 3 місяці тому
@oldtools6089 then i said your call is important to us 🤣🤣🤣 yes math is actually a lie because airline companies are a conspiracy as they use flying carpets and magic because we all live on a flat earth... 😉 ... lol 😆 🤣 😂
@drdca8263 3 місяці тому
@@oldtools6089lol, no
@willothy 4 місяці тому ⁺⁴⁹
12:59 I usually use `.ok()` to ignore an error instead of assigning it to underscore. This will give you a warning if the type changes from Result to impl Future, etc.
@meandsee 4 місяці тому ⁺²⁵
Literally in December “Advent of Code in bend”
@novantha1 4 місяці тому ⁺⁹³
As someone who is the target audience of this language, that call out at the end felt super weird, lol
@reviraemusic 4 місяці тому ⁺²
As a brazilian aware of how Ideology works, that also felt weird.
@eNtrozx 4 місяці тому ⁺²
Out of curiosity, how are you the target audience? what do you do?
@novantha1 4 місяці тому ⁺¹⁷
@@eNtrozx Distributed linear algebra across a variety of devices with the intent of deploying various MLP and Transformer blocsk in edge and consumer facing applications.
This language is really interesting because it's a lot more ergonomic than writing GPU kernels. Even if it was, say, 10% slower, if it saves you 90% of the dev time it's worth it because CUDA engineers and ML engineers are pretty expensive right now.
@SpacialCow 4 місяці тому ⁺¹
Weird in which sense?
@graham8316 4 місяці тому
Why weird? I thought it was A+ v funny
@TecnocraciaLTDA 4 місяці тому ⁺⁶
The creator is a friend of mine, he is called Victor Taelin, he is from Brazil and he is a very skilled language creator, he made more than 5 programming languages already and he is a workaholic as sh*t
@Jaood_xD 4 місяці тому ⁺²³
"Mutalisks" sounds like a scary flying creature from star craft 😁
@lmnk 4 місяці тому ⁺²
That's just hydralisks but they also mutated from all the radiation and stuff (like ninja turtles, y'know), so now they are mutalisks
@oserodal2702 4 місяці тому ⁺⁷⁹
Lowkey, Bend isn't even that guy's most impressive work (it's HVM).
@KBIZWORLD 4 місяці тому ⁺³
Real
@PeterFaria 4 місяці тому ⁺²⁸
Coolest thing I’ve tried yet. Can’t wait to see where it goes
@monolith-zl4qt 4 місяці тому
I am a prophet: it'll go absolutely nowhere!
@danielandresvasquezprieto9902 4 місяці тому ⁺⁵
Perhaps the most impressive part about this all, and is something that's not mentioned in the video, is that the HVM compiler can in fact compile from other languages (given that touring computation based languages are equivalent to lambda computation ones, and lambda computation ones can be then converted to the interaction combinators)
So you can like write javascript or python or whatnot c, and compile ir with the HVM and boom, you haven't changed a single line of code and your program runs on gazillion threads and fucking fast
@SimGunther 4 місяці тому ⁺¹⁷
Those interaction combinators look so cool and mind bending
@dmolast-kv8jg 4 місяці тому ⁺¹¹
21:46 ah yes, the first phase, denial
@reviraemusic 4 місяці тому ⁺⁴⁵
Brazilians are really proud of this, Prime.
@Ekce 4 місяці тому ⁺⁶⁸
It's really cool but I noticed that while their relative benchmarks compare against CPU they also mention that their "single core performance is still extremely sub-par". So that 57x speedup may not really be accurate.
I do have an advanced education in mathematics, but I don't really understand how this language works. I'll have to spend time reading through their paper. My initial expectation was that the paper would describe the language in terms of category theory/type theory (especially after seeing the string diagram looking diagrams) but that's not the case.
There are a lot of new domain specific languages (DSLs) being created and studied in research labs all over the world these days (for many applications including concurrency, database updating, parallelism, message passing, etc..). Many of them offer many promising specialized features but most are ultimately just research projects not intended for widespread use. In the future, the hope is that they will lead to good and mature languages for many things. I don't know if that's the case with Bend but it would be good to get an interview with the developers to learn more about it as there is very little information on their website (e.g., who is behind it, is there a future roadplan, is there funding, etc..).
Also on a sidenote, I don't think the transpiler/compiler distinction is really that meaningful these days. For instance, the Java compiler is really just a transpiler down to Java bytecode, which is then interpreted by the JVM (unless you're running on ARM in which case the JVM is implemented as a stack machine in microcode and in that case the Java compiler really is a compiler).
@alainzscheile6913 4 місяці тому ⁺⁶
oh their parallel performance is also still extremely sub-par (compared e.g. to haskell, I haven't seen a comparison with Python single-core or multiprocess perf yet)
@VivekYadav-ds8oz 4 місяці тому ⁺¹⁴
Well the parallel performance is just a multiple of the single-core performance, so that statement stlll makes sense. If they improve their single-core perofrmance, all of the metrics will improve, but (ideally) would stay in the same ratio.
@MartinJaniczek 4 місяці тому ⁺¹
Bend compiles Lambda Calculus to Interaction Combinators (see also Interaction Nets), and HVM is a parallel runtime for these.
@isodoubIet 4 місяці тому ⁺²
Guy Steele spent several years in Sun's labs designing precisely this (it was called Fortress then, suggesting a "secure fortran"). Eventually he got to some problems they couldn't really solve and the whole project was scrapped.
@IsmeGenius 4 місяці тому
@@VivekYadav-ds8oz Their single-core is in Rust, multi-core in C and massive-core in CUDA, they will have to improve each thing separately.
@commarchinin 3 місяці тому ⁺³
Oh no, Fireship missed that python threads aren't parallel... The Global Interpreter Lock means it's only ever one thread running at once in python, so python threads only help with IO bound stuff
@complexity5545 4 місяці тому ⁺⁴
Bend is basically haskell's design and approach to parallelism.
Interesting.
@nicholasfigueiredo3171 4 місяці тому ⁺⁷
This will be super-useful to me. I was programming in python for machine learning and a lot of time I had to write C++ code and a wrapper but parallelizing is fucking hard. I only did this when I needed to do some sort of data manipulation that numpy/pandas/matplotlib didn't allow(they covered all data analysis up to date) but when you want to modify the sub-arrays inside an image they sometimes fall short. I would need to fallback into python for the manipulation but not kidding python is AT LEAST 3 orders of magnitude slower then whatever non-parallelized crap I write in C++ to do the same job and any non-parallelized crap I write in C++ is already considered slow to be useful in consumer hardware.
@vitalyl1327 4 місяці тому ⁺¹
Write in OpenCL then. C++ is not suitable.
@nicholasfigueiredo3171 4 місяці тому
@@vitalyl1327 The problem was the data transfer since I don't know OpenCL very well so when I did that it was slow swapping the data between the openGL part of the code and the numpy part. That's the reason I used C++ together with PyCuda. I believe they refactored my old code to CUDA PYTHON/Cupy some magician did the manipulation in Cupy directly if I am not mistaken. I will prob have to read a lot of code to understand when I get back to the company in october thanks for the suggestion. Also if you know how to that transfer faster or some documentation showing that would be very helpful OpenGL would allow us to use a couple of AMD GPUS we have sitting there idle even if the perfomance isnt as good
@JohnUrbanic-m3q 3 місяці тому ⁺³
Don't call it "massively parallel" unless it is at least multi-node, and tens of thousands of cores (with the potential to go to millions). Like MPI, for example. This is just multi-threading.
@Oscaragious 4 місяці тому ⁺⁹
"Everyone virtually in this chat will never use language"
We'll see.
@Hebruwu 4 місяці тому ⁺¹
Do you use CUDA, OpenMP, or MPI? If not, I doubt you’ll need it. Seems like a language targeted at heavily computational fields
@Oscaragious 4 місяці тому
@@Hebruwu Yet, it's not.
@isodoubIet 4 місяці тому ⁺¹
@@Hebruwu I use CUDA regularly; roll my own parallelism on the cpu side. Little bit of tbb here and there. There's basically a 0% chance that something like this would be competitive with our custom solutions.
@tui3264 4 місяці тому ⁺¹
most people in chat writes javascript, lol they don't need performance not gpu programming 😅
@MaxHaydenChiz 4 місяці тому ⁺⁶
Fold is more than just reduce. It's "universal" in that you can have full Turing completeness without imperative for loops using folds. They map well to modern hardware. This is a promising idea that's complex to execute. I hope he succeeds though. This isn't something that should require Haskell or other exotic tooling in order to use.
@tophy9865 3 місяці тому
All loops can be implemented using just recursion. The lambda calculus is turning complete. You don't need loops at all. I have no idea what you mean that it's more universal.
@MaxHaydenChiz 3 місяці тому
@@tophy9865 I didn't mention recursion. I was talking about folds. Folds are a universal construct. So are certain combinators. Typically, in toy languages used to study programming language theory, you may have a language with branches and loops, but no recursion. There are other ways to achieve similar results.
@tophy9865 3 місяці тому
@@MaxHaydenChiz I'm still not sure what you mean when you say it's universal. My point is that omitting loops is quite normal for Turing complete languages. All you need is something that can do the same thing. Fold (which can be implemented as a general recursive function) is one way to do this.
@MaxHaydenChiz 3 місяці тому
@@tophy9865 We are talking past each other. Prime said that fold was "just reduce". Folds are not "just reduce". They are more general. Any loop can be replaced by a fold. (This is more or less how Scala implements loops under the hood IIRC.) Not all loops can be replaced by a reduce.
Yes. We could *also* replace all loops with recursion. But that is not relevant to my statement that you can have an imperative language that only understands folds but not loops and not recursion and still be Turing complete.
We could also use the combinators in Haskell's G-machine for that matter. Or any of a large number of other primitives.
So I don't understand what point you are trying to make. But hopefully I've made mine clear.
@tophy9865 3 місяці тому
@@MaxHaydenChiz Can you explain how it's more general? Do you just mean that it works on any stream?
@isodoubIet 4 місяці тому ⁺⁹
This is a language where easy-to-parallelize things are trivial and hard-to-parallelize things are impossible.
@pranavbadrinathan6693 4 місяці тому ⁺⁸
Ah the `bend` keyword seems to be a sort of generator for recursive datatypes? the HOF `fold` (left and right) consumes recursive data types, and `bend` produces them. The example shown in the video at 20:36 creates a binary tree with depth 3, for example. Really cool!
If only these languages were not python like lol. I hate the syntax, as I much prefer curly braces and the flexibility that comes with that than whitespace-scoping in python-like languages. Also, TYPES! I sincerely hope there is a way to enforce strict typing in this language.
@gralha_ 4 місяці тому ⁺³
There's also a haskell/ml-like syntax you can use
@agustinpizarro 3 місяці тому
What do you use braces for? In the end you are indenting your code properly, right?
@pranavbadrinathan6693 3 місяці тому ⁺¹
@@agustinpizarro Preference, mostly. I have used a ton of Python and GDScript, but I simply can't get used to the indentation based scoping. It's so hard to tell which block you are in if you have multiple levels of indentation, vs using braces where it's super easy to tell with just a glance. While it's just plain bad to have too many levels of indentation, braces-scopes are so much easier to distinguish with lots of indentation, especially if you are going from an indented line to a non indented one. Also, accidentally indenting something to a different scope than expected is much easier with indentation, and I've done it far too many times.
Also, if you really need, you can have an arbitrary scope with braces, which you can't with indentation. Python will just error. And yes, I have used this before.
@agustinpizarro 3 місяці тому
@@pranavbadrinathan6693 Unless you have syntax highlighting fir your braces in your IDE and you are looking for the right color of the brace (which not always works), i still wonder how having extra characters added (braces) without any strict indentation rules, helps you find that you are in the right scope.
@pranavbadrinathan6693 3 місяці тому ⁺¹
@@agustinpizarro Distinguishing between 3-4 indentation levels on screen at once is easier with braces imo. I can see a closing brace and know the next line is the previous scope. I usually use 4 space indents, and find it harder to figure out scope changes as quickly.
@BryonLape 4 місяці тому ⁺¹
Multi-threading is straight forward when you stay out of the "shared mutable state" quadrant.
@solagratiasolafide 4 місяці тому ⁺¹
Prime: I don't know that lambda calculus terminology.
Also Prime: I believe it's called a catamorphic operation.
😂
@ansidhe 4 місяці тому ⁺³
Another feeling that I just got: it may be a magic box analogous to Kolmogorov-Arnold Networks that just apply analytical approach to perceptron activation functions to reduce multiple simple ones into a single one that is more complex mathematically. As long as you can keep within the consistent boundaries of mathematical rigor, analytical calculus can work magic for you, like the Fourier transform, convolutions or Taylor series. Or the Karnaugh maps.
@patrickl5290 4 місяці тому
What is analytical calculus in this context?
@ansidhe 4 місяці тому
@@patrickl5290 Whenever the term „calculus” comes up, it mosty gets associated with integrals and derivatives - whereas in fact it has a much broader meaning, as in lambda calculus for example. What I had in mind here though was functional analysis as opposed to numerical methods or matrix algebra. To give a more practical example: if there is a known integral of a function or if a compound or convolution of functions can be analytically transformed into a simpler form over the whole number domain - you win without any doubts. The real life problem is that usually the reality very rarely conforms to ideals and the analytical approach has to give way to numerical approximations that are a brute force approach essentially. Replacing bilions of params with hundreds or thousands but which describe more complex analytical building blocks - like b-splines or trig-functions (for potential Fourier transform approach proposed in the comments here) - would be a great reduction in numerical load.
@Zoragna 4 місяці тому ⁺²
There are programming models that detect dependencies which are "high-level" where you give a sequence of intructions and it becomes a parallel program (OpenMP kinda gets you there, task-based runtimes, ...)
@sharkinahat 4 місяці тому ⁺⁶
Me brain too smooth for this.
@tomlee7073 4 місяці тому ⁺¹
1 problem: If you're someone that programs in Bend, are you then known as a Bender?
@sentinelav 4 місяці тому ⁺⁶
I'm slowly reading through the original paper on Interaction Combinators and it's blowing my mind. If you can comprehend it, Bend represents the future of high-performance software. Moore's law has found a stopgap in the domain of groundbreaking theory.
@palmberry5576 4 місяці тому
??? It doesn’t fundamentally speed up anything though, it just seems to represent more things as trees (kinda? Not even though really?)
@sentinelav 4 місяці тому
@@palmberry5576 Have you had a chance to check out the original paper? They've essentially created the parallel equivalent of a Turing machine.
@Takyodor2 4 місяці тому
It's not going to get better performance compared to handwritten parallel code. The promise is to make it easier to write fast-ish parallel code.
@sentinelav 4 місяці тому
@@Takyodor2 I don't know, the ability to massively parallelise even basic arithmetic across 16k GPU threads is a massive challenge in any coding language.
@Takyodor2 4 місяці тому ⁺¹
@@sentinelav It's really not. If you know C++, and read about the memory model of GPU's for a few hours, you could easily parallelize problems with inherent parallelism in CUDA (or OpenCL). On the other hand, if the problem _doesn't_ trivially translate into massive parallelism, you will either A) not be able to utilize all those GPU threads, and get better performance with fewer CPU threads, or B) perform a whole lot of extra work on the GPU, synchronizing and performing duplicate calculations, and get better performance with fewer CPU threads. Many threads does not mean automatically faster!
4 місяці тому ⁺¹
About the rust underscore issue @13:00 - set the type of the underscore to the type of the returning function and when it changes you get the error as intended. For example let _ : Result = run_fun(); You can also tell clippy to bug you if you forget to do so.
@TheLouisEric 4 місяці тому ⁺³
They missed a great opportunity here. The main process exit command should have been over();
@xslashsdas 4 місяці тому ⁺⁵
Bend Over LMAO
@andrewdunbar828 4 місяці тому ⁺²⁰
a transpiler is a compiler. it's a subcategory.
@trickeddev 4 місяці тому
w take
@nevelis 4 місяці тому ⁺¹¹
Trans piler awareness month coming up
@vitalyl1327 4 місяці тому
I hate idiots who even use the term "transpiler". Dumb code monkeys invented it, it has nothing to do with any academic CS.
@oldtools6089 3 місяці тому
Thank you. Sometimes I feel like everyone just wants there to be compilers and runtimes. Get freaky. Believe in yourself.
@notsojharedtroll23 3 місяці тому
@@nevelis😂😂😂
@MMetalRain 4 місяці тому ⁺¹
I would guess the downside is memory usage, I'd assume Bend needs to build full execution graph (interaction net) in memory and then make at least partial copies when net is reduced. More traditional algorithm can loop over structures and modify them in-place. However not having to think parallelism so hard can produce unexpected speedups in some algorithms.
@nicgeorge6126 4 місяці тому ⁺³
Programming languages are becoming as common as JavaScript frameworks
@dead-claudia 3 місяці тому ⁺¹
inb4 bend compiles to js
@dsdy1205 3 місяці тому ⁺¹
Finally, something to run my teetering tkinter monolith at faster than 5 seconds per frame
@agustinpizarro 3 місяці тому ⁺¹
FP with thread optimizations has entered the room.
@MorningNapalm 4 місяці тому ⁺¹
LOL, that async comic is like smoking. There were 99 non-smokers in a room. Then a smoker entered. Then there were 100 smokers.
@dillbourne 3 місяці тому
"Check out this new thing! it's easy! just python!" - proceeds to write a hieroglyphic script that makes python look sane.
@ajmgdaj 4 місяці тому ⁺¹
In all the outrage the one chatter goin "We STUPID?" somehow ... got me right in the chuckles ... seriously, when you are reading this it is still going on...
@marcsh_dev 11 днів тому
A similar (sounding at least) language, is Halide.
Its built on top of the dreaded C++, but it is really cool. Its a bit more "manual", but provides a straight forward way to test different layouts of the same data.
@paxdriver 4 місяці тому ⁺¹
Parallel is only hard if you're worried about linearly scaling ram and power, otherwise you could just make a hard copy of every process at the time of splitting, respawn on failure.
@JackDespero 3 місяці тому ⁺¹
I want to learn it to call myself a code bender.
@shiccup 4 місяці тому ⁺²
Doesn't this mean that nontraditional accelerator cards are viable now? Like cards from tenstorrent which are specifically for ai. The main disadvantage was that it didn't have cuda but if you run bend you would actually take full advantage of all the parallel cores so now you get better than 4090 performance for 1/4 the price???
@rdubrdub 4 місяці тому ⁺³
Seems like the peak Advent of Code language
@jimhrelb2135 4 місяці тому ⁺¹
So a more structured MapReduce.
It's so good because now you get the benefits of a type system among ADT abstraction, with the benefits of how many infrastructure is in place ready for MapReduce-like environment.
Big if true
@gavinpaull1793 2 місяці тому
I’m definitely gonna try using this. I work in data science at a university and I think this could definitely speed up my terribly implemented parallel python scripts for linear algebra and entropy analysis.
@EvanEdwards 4 місяці тому ⁺¹
My late wife could have used this in her daily work. She was obsessed with h bar (Planck's constant over 2π, or Dirac's constant) because she used it all the time in her code. When she died, a military officer came to my door and asked me very politely to let them know if she had any notes or material from her work in DC so they could come and collect it. If you're not doing that kind of work, it is unlikely you'll be using Bend.
@brain.yogurt.5.56 4 місяці тому ⁺²
One point that not mentioned at all is that python doesn't run in parallel even if you have parallel code, because of the Global Interpreter Lock
@oxyht 4 місяці тому
Multiprocessing?
@agustinpizarro 3 місяці тому
Until you use 3.12... or numba
@darianschnose6177 4 місяці тому ⁺¹
I'm absolutely going to use it! Not professionally probably, but just because it's neat. I'll probably write a couple applications then go back to JS
@Kiyuja 4 місяці тому ⁺¹
while we probably wont just jump head first into this, maybe it lays the groundwork for something amazing in the future. What I meant to say is: Bend me over 'til I fold.
@edusandalo 4 місяці тому ⁺⁶
22:48 the idea of the language is exactly the opposite of what you're saying here! bend's whole existence is to try to make parallelism simpler, so you _don't_ need the adult education in mathematics to use it
@TurtleKwitty 4 місяці тому ⁺¹
He meant more in the sense of, if you're not a mathematician you likely don't have a usecase for this no matter how cool it sounds
@kiseitai2 4 місяці тому ⁺¹
It would be nice if this lead to another form of optimization in compilers. Although it looks like the user is expected to prime the program by writing an operation in a specific form, maybe someone can map common patterns into patterns that the hvm can then parallelize. The computation kernels might need a further pass in which you turn the instructions into simd compatible operations so our cpus can cook hooooot. Haha. But yes, I won’t be using it directly. I struggle as it is to convince people to let write a small utility in Rust.
@isodoubIet 4 місяці тому
It's a pure functional language so yes math phds are the only audience.
@edusandalo 4 місяці тому ⁺¹
@@isodoubIet we're trying to make the language so you don't need a math phd to use it! for example, the `fold` operation could be defined mathematically as a catamorphism like primagen said in the video, but we're adding plenty of syntax sugar to make it less daunting
@larjasoul 2 місяці тому
if the package manager ends up slapping, this language will take over
@souchy7 4 місяці тому ⁺¹³
Fucking nightmare without types
@atijohn8135 4 місяці тому ⁺²
a pr0n bot copied your comment and got promoted to the top of the comment section
@souchy7 4 місяці тому
@@atijohn8135 that's funny
@ShrinathBalakrishnan-rs2ko 3 місяці тому
There is a part of my codebase that has a complexity of O(n4) which to some degree has been optimised using Dynamic Programming (i.e, caching) to O(n3), but this calculation has to run many times a minute and with a growing dataset, I was considering moving this to CUDA.... After a lot of research, I hit upon the C++ CUDA wall. I know that if I stepped into it, even for this part of the codebase, it would be a minefield area for the rest of the team including myself if I don't look at it for a while. This sounds interesting to me. Thanks mate!
@nilssab 3 місяці тому
Given the focus on tensors in new hardware, it would be really cool if it bent the code so that anything that could be written as tensor multiplication, is.
@JaredHatesGames 4 місяці тому
The start of this video is an extremely concise summary of learning any programming language
@adamrak7560 4 місяці тому ⁺⁵
I see one big problem: Bend solves the easier part of the problem only.
If you give me a complicated algorithm written perfectly in Bend, I can _easily_ port it into CUDA to run on thousands of cores.
But actually reimagining the algorithm into Bend can be extremely difficult, because the whole parallelizing part is difficult.
@vidal9747 4 місяці тому ⁺¹
I think the whole point is not making it inherently sequential. But, I would let them update a little and mature before giving it a try.
@sburton84 4 місяці тому
Yeah, it's not really that different to what something like SYCL does, except SYCL is used from C++ rather than being its own language. It can run parallel tasks on various backends including CPU or GPU, but the hard part is always going to be translating your algorithm into a suitable parallelised form.
@user-shilov- 3 місяці тому
With Go I used multithreading for a data extract as a beginner and honestly it was pretty easy. It was only extracting chunks in parellel because of some dependencies but go routines made it pretty simple. I didn't know how lucky I was to be using Go I guess.
@seedmole 3 місяці тому
Shaders are sick, I delved into them earlier this year, nothing else ever felt so complicated to get started.. in the end you get multiple layers, mainly one doing all the vertices in parallel and one doing all the pixel colors in parallel, and it uses a notation basically the same as running through a for loop across the x axis with one inside it for running across the y axis (or vice versa, i forget which order they actually use -- and i'm even more lost on how the vertices are all indexed).
@newsofthenerd 4 місяці тому ⁺¹
Glasgow Haskell Compiler mentioned! Bend is just functional programming.
@DrJaneLuciferian 3 місяці тому
I can see a use for it in trading strategy backtesting. I've been looking better solutions than jumping into the deep end with elixir.
@Dekharen 4 місяці тому
The point about parallelism is SO true.
I mean it's true of so many interactive elements of code, or even libraries at time. Like, interacting with the raw output from a speaker in Rust; look at the example and it's the easiest thing in the world, but try to do something with it and you run into race conditions, you are moving data around in closures, and need to keep track of them while they get multithreaded ; you're probably going to end up multithreading both receiving the data and processing it, while needing for it to be in order, and so many other possible issues.
@EvanBoldt 4 місяці тому ⁺⁷
0:54 single letter variable names should never be used and showing off a demo with them is wild.
@beofonemind 4 місяці тому ⁺¹
Been waiting for you to go over this.
@picalquers 4 місяці тому
Its from the etherium dev that already solved all the worlds blockchain problem in magical blockchain scam land ...
@luigi4622 3 місяці тому
i loved how wonderful was the conclusion in the end of the video 🤣
@Raziaar 4 місяці тому
Even a well known algorithm, name your shit better than single character things, jesus.
@klex3905 4 місяці тому
This is actually kind of special. Maybe most people don't target algo's in everyday work, but for those people who do, like game physic engineers, maybe some other who need to work with some evaluations/simulations etc.
This makes their lives so much more incredibly quicker, MAYBE less reliable, depends on the actual output of it's compiler. But I'm here for it, physics rendering in general, if this optimizes instead of just 'facilitating' could honestly change a lot quicker than we've ever seen.
@paladynee 3 місяці тому
I can't wait for the time where they are done optimizing and the thread creation overhead is the only culprit of performance
@lukeskywalker7029 4 місяці тому
Great final take at the end. Cant wait to put my hands on it and benchmark it against CUDA and Triton!
@krilektahn8861 4 місяці тому
Something to consider would be the substantial power draw increase for 5x faster by spawning 1000x more threads. An M3 doesn't peak anywhere near a 4090.
@francescotomba1350 4 місяці тому
Parallelism is easy, untill you have a single, multithreaded machine. When you go REALLY parallel, such as in HPC clusters using MPI, here, here you experience real pain. Our heroes are MPI developers. Trust me.
@MoonDweller1337 4 місяці тому ⁺⁵
Just learn Haskell Primeagen
@jly_dev 4 місяці тому
Python with one-letter variable names and no type hints drives me bonkers
@SL3DApps 4 місяці тому ⁺¹
How would you even handle UI updates in a fully parallel language? Every UI component needs to be rendered separately and need to allow updating at specific synchronized times. This language seems very unlikely to help with front-end and be more of a backend booster.
@osiristeam6959 4 місяці тому ⁺⁴
If you can replace loops with "folds", then you can replace "folds" with loops and achieve a more developer friendly experience, instead of needing to wrap my head around recursion for every loop I want to write. I guess there is a reason why they don't do this, feel free to tell me.
@theshadower5565 4 місяці тому ⁺²
Because when talking about consuming recursive values, imperative loop is a specialized version of Bend's fold, that only handles one branch at a time. On the other hand, fold consumes all branches of a recursive structure (like a tree) in parallel, which can't be achieved like a good old loop which runs linearly.
@c.Orange 4 місяці тому
"Currently not working on Windows, please use WSL2 as a workaround." 😭
@kevinmcfarlane2752 4 місяці тому
That applies to most new tech! Tends to be macOS and Linux only, from a dev perspective. E.g., blockchain SDKs. And often WSL2 doesn’t quite work and you need to go full Linux VM.
@ehhhhhhhhhh 4 місяці тому
6:05 First 16 threads, 12x speedup. 1000x more threads... get a 4x speedup. I feel like I'm supposed to be wowed by their 57x number, but I'm mostly puzzled by why 1-16 gets such a huge speedup, but the GPU gets the expected speedup of only 4 more cores.
@kevinmcfarlane2752 4 місяці тому
See the actor-based Pony language. Taking a long time to get to 1.0 though.
@zolitakacs6306 4 місяці тому
pytorch with extra hoops
@schrummy14 4 місяці тому ⁺³
I do want to see where this goes, but it would need to incorporate MPI to have any hope in replacing C/C++.
Edit:
Looks like the current max int (2024/05/24) is a 24 bit int. For CFD simulations, that may be able to handle a mesh with a few million cells, but bend would need to support at least 32 bit ints (64 bit ints for 1 billion cell meshes). Hopefully, just early day issues.
@isodoubIet 4 місяці тому
> Looks like the current max int (2024/05/24) is a 24 bit int.
wat
@oat1000 Місяць тому
This feels like knot theory to me
@mfc1190 4 місяці тому ⁺¹
Parallelism isn’t hard if you can wrap your head around the units of data you can parallelize, and where they need to converge. Define a unit of data (can be a million points, can be one point), fan out, then decide when you need to fan back in when that unit of data becomes dependent on other units of data. If it never does, then don’t wait for other data.
@zweitekonto9654 4 місяці тому ⁺²
This has the same energy of "if you're homeless just buy a house"
@mfc1190 4 місяці тому
@@zweitekonto9654 I don’t understand why
@psychodriveskip 4 місяці тому
Oh, we are making more HPC languages? Nice.
@mpilosov 4 місяці тому
spent a day trying to write a matrix dot product in this. absolutely infuriating.
@CèliaCasanovas 14 днів тому
tbh running calculations in parallel in c#, kotlin or c++ without having to do much is already fairly easy. or so i thought
@marcosmoreira4277 4 місяці тому
I'm gonna make a lib on it to measure weather conditions, it's gonna be called "Air Bending"
@gammalgris2497 4 місяці тому
The memory footprint would be interesting. More interesting is at which point there is a real benefit, i.e. how to write the code to really benefit from parallelism. The small example (see formulas) would rather cause more overhead than benefit.
@worgenzwithm14z 4 місяці тому
immutable by default data and controlled side effects makes parallelization easier
@wendersonbarros1900 4 місяці тому ⁺²²
1 more language made by a brazilian 🇧🇷
@MeriaDuck 4 місяці тому
In my favorite language, Java, I did a short test LongStream.range(0, Integer.MAX_VALUE). parallel ().sum(). One hundred times in a 3 seconds.AMD 3900X CPU.
a) thats 30ms average
b) Only specify what I want, not how to do that.
Plain old hotspot jvm (openjdk 22) doing actual magic.
Give that, I'm eagerly waiting for java to unlock gpu programming.
@SilverDashie 4 місяці тому
It's always dumb when you can't read the code because types magically don't exist.
@ds-1111 4 місяці тому
"Just run this code without any modifications and it can run on Cuda" - Yeah, but the code is using special data type and is written in case statement when it would normally be a loop in any other language. So you have to use the entire new language and rethink everything in terms of case statements. :/

Наступне

Автоматичне відтворення

Python Sucks And I LOVE It | Prime Reacts