When he started with "I created a new language", I was expecting a boring, egocentric talk about a language created because someone hates a specific problem of a specific language. To my surprise, his thoughts on how different languages work were outstanding! Roc looks promising.
Yeah! Given the trajectory of hardware and the steadily improving language-development toolchains, I suspect language development may be growing stronger as a technique after decades in the wilderness. This doesn't solve the truly hard problem of language popularity, which constrains progress in industry, but hopefully, with a focus on interop, a polyglot strategy can let teams choose languages for results more often, less confined by mandates for consistency or by whatever gives employers the biggest pool of pre-trained prospective employees. One day!
@@chromosundrift "This doesn't solve the truly hard problem of language popularity" This is not a hard problem, because it's a social problem, not an engineering one.
The real hard problem is how to scale knowledge learning.
One of the few good things about JS was that you kind of started your career in an FP language, at least until it was contaminated by classes.
Looks really interesting; I like what I've learned of the language so far
@@monad_tcp social problems are usually more difficult to solve than any other kind of problem, because people have quirks, wants, needs, etc.
I just realized that Roc's logo is a bird made from the tangram from Elm's logo. That's really cool.
This guy is a legend! I have always wanted to make a functional language with all those optimizations to beat imperative ones and he is doing just that!
Yeah, I thought about doing those optimizations when I was writing F# code, wondering what if I could use F# everywhere and ditch C++ for those specific parts where I needed performance.
I always enjoy Richard's talks. Very smart and a very good communicator. A rare combination.
rfeldman is one of the most thoughtful people i've seen in the software talk/blog space that i've come across. he does a lot of rhickey-style hammock time, you can tell. can't wait to see where this project goes.
6:24 -- hey, that's my paper! :D
This is going to change the perspective of functional programming languages. I’m so excited for the future of FP
this is insane! i can't wait to try out this language
Doesn't Swift have all the same optimisations?
@@thedeemon Hmm.. I don't think it does FBIP optimization but the bigger problem is that it's not purely functional.
That was an impressive talk. Will watch out for more about the roc-lang
Another language that implements this FBIP (functional but in place) technique is Lean. It is mostly a research language with a very advanced type system for doing mathematics in.
And also Koka.
Appreciate this talk so much. I am ready to Roc.
I might be a bit biased, but the situation he's describing at 23:30 really sounds like the exact place a Rust-style borrow checker could fix things. If you're already running all this static compile-time reference-counting machinery, why not communicate it back to the user in the language design itself, by making them choose between `array2 = array.clone().map(...)` and `array2 = array.into_map(...)` (with whatever syntax makes more sense given context)?
Roc beating Go at quick sort has sold me on this language. Like what???
Just 4 words:
Give
Us
The
Compiler
Seriously, as a C++ fanboy, I very much like this endeavor. I hope you will succeed in competing with the heavy-duty langs
Send him an email and you'll get access to the repo. You can already build the compiler locally
It should be noted that Haskell does have safe mutable arrays with Control.Monad.ST.
Well, they're safe in the sense that they're isolated from the rest of your program, but inside your ST monad you can absolutely have exactly the same bugs as an imperative program, and if you're using accessors that don't do bounds checking (for speed), you have the same problems as languages like C++.
It's better than those languages in the sense that you can limit the code that can do unsafe things that you have to be really cautious of, but I wouldn't really call that "safe mutable arrays".
@@chaddaifouche536 All safe language features have unsafe implementation details; it is a distinction without a difference.
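For readers who haven't seen ST: here is a minimal sketch of the idea using the standard array package (the quadratic exchange sort is just for illustration). The type of runST guarantees the mutable array can't escape, so the function is pure from the outside:

import Control.Monad (forM_, when)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray, getElems)

-- Pure from the outside; in-place mutation on an unboxed array inside.
sortInPlace :: [Int] -> [Int]
sortInPlace xs = runST $ do
  let n = length xs
  arr <- newListArray (0, n - 1) xs :: ST s (STUArray s Int Int)
  forM_ [0 .. n - 2] $ \i ->
    forM_ [i + 1 .. n - 1] $ \j -> do
      a <- readArray arr i
      b <- readArray arr j
      when (b < a) $ do
        writeArray arr i b   -- swap in place; invisible to callers
        writeArray arr j a
  getElems arr

Note that readArray and writeArray are the bounds-checked accessors; the caveat above kicks in once you reach for unsafeRead/unsafeWrite.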
@4:10 Doesn't Swift also use LLVM to compile to binary?
It does - this was a mistake! I decided to add the last row at the last minute and didn't realize it retroactively made the preceding row inaccurate. 😅
Haha yep, the creator of Swift being the creator of LLVM.
I didn't know about this. I love it, especially because I am an F# programmer, and Elm is so close that I could write a source-to-source compiler.
Languages like Clojure have tools that get you close to full mutable performance, without the risks and problems that come with mutability.
Small note - CPython technically executes bytecode on the CPython VM. Sure, the compilation step happens at execution time, but it's not executing the AST.
Yeah, but that bytecode is so close to the AST that it’s really just a more compact representation, much closer to interpreting the AST than Java, C# or Erlang.
@@aDifferentJT To be fair, C#'s IL and Java's JVM bytecode are also close to the original language's AST (CIL is object-oriented and is "just" the result of lowering higher-level features like generics or async/await), so the comparison has more meat to it than first meets the eye.
The difference is that Python directly interprets bytecode without optimizations (which is why projects like PyPy exist), whereas Java's and C#'s runtimes use JITs and optimizing compilers as a second step before the actual execution. Quite famously, Python doesn't do TCO because it makes debugging harder by completely transforming the runtime execution flow away from what's written in source, and the decision was made to be friendlier to beginners instead.
@@SolarLiner It's all a rather fuzzy example, as at no time does .NET execute the bytecode; it translates everything to machine code before any execution. So it's on the same level as LLVM and LLVM's bitcode
@@SolarLiner Makes me wish they did, but only as a toggleable feature after debugging. I like Python and mostly program in it for convenience, but the speed losses hurt, and working with CPython is rough compared to intelligent optimization. I'll look at PyPy; I'll take whatever resources I can get.
How would the Haskell version of quicksort perform in Roc? I mean the short one, since that's what I would write in a functional language
You wouldn't want to write the short 'QuickSort' in Haskell. Something like merge sort is much more idiomatic in Haskell.
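For reference, a sketch of the short top-down version (bottom-up merging is better on real linked lists, as noted further down this thread, but this is the shape most people write):

msort :: Ord a => [a] -> [a]
msort []  = []
msort [x] = [x]
msort xs  = merge (msort front) (msort back)
  where
    -- split in half, sort each half, then merge the sorted halves
    (front, back) = splitAt (length xs `div` 2) xs
    merge [] ys = ys
    merge xs [] = xs
    merge (x:xs) (y:ys)
      | x <= y    = x : merge xs (y:ys)
      | otherwise = y : merge (x:xs) ys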
You can get the benefits of LLVM compilation AND utilize all the runtime hotspot information available to languages that aren't a static binary.
This exists for Java and JVM languages like Clojure and Kotlin under the VM "Azul Platform Prime".
It utilizes all the information you get from seeing how the program is being used in production, and is able to combine that information with the ability to convert hot live code into optimized LLVM-compiled code.
You mention garbage collection pauses.
That same VM doesn't have pauses when garbage collecting.
APL is a value-based language, but it does allow side effects, particularly in indexed assignment, i.e. mutating some values of an existing array. However, if I write B
It's one thing to do it dynamically, looking at the current reference-count value. It's quite another to make the decision at compile time.
GC pauses are a function of the implementation and in more advanced runtimes, the algorithm can be chosen or tuned to reduce or remove the prevalence or severity of pauses. For example, the Hotspot JVM has had GC algo options for many years that include aggressively continuous/parallel GC that provides minimal GC latency. "Stop the world" GC used to be the only option but those days are long gone. The significance of GC pauses can be very application-specific and it will usually be traded off against global throughput or workload-aware opportune moments for minimally disruptive GC timing.
@Locria Cyber How did Doligez, Leroy and Gonthier do it then for Caml? How did Click, Tene and Wolf do it for Java (producing Pauseless)?
@Tree by a lake I believe you are mistaken on this point. Also, I have gradually become convinced that the theoretical, if not already the practical, average performance of managed runtimes with GC, dynamic JITs, supple memory models etc. is higher than that of the traditional static compilation and manual memory management of C and friends. Certainly this is where we are going on modern CPU architectures; internally, they have so much going on these days, extending performance with tactics isomorphic to these kinds of magic, that what we learned about CPU performance decades ago is oversimplified to the point of being misleading. Embedded systems and microcontrollers may still exhibit the old-school performance profiles, but for high-end metal, the unfortunate lesson is ultimately "you're never going to understand how it works". As with Machine Learning models compared to explicitly programmed algorithms, we now must make do with empirical measurement to confirm performance and behaviour, since staring at code and knowing big O just doesn't tell us how well it will run in the wild.
@@chromosundrift GC and JIT are pretty orthogonal topics.
I think making an interface between Elm and GraalVM would probably be an easier way to get a binary compilation method, rather than remaking the whole language with a different compile target.
This is fantastic! I appreciate this as someone just starting to work on the same kinds of problems. I actually had the same in-place mutation idea to get around the performance cost; another idea is to allow move semantics to make it explicit, but I haven't tested that yet.
Boxing is a value representation that "wraps" CPU-native values like floating-point numbers in some metadata, adding an overhead of some bits/bytes and making access to the actual value less direct: a lot more machine code is involved for even basic, CPU-supported operations like multiplying two floats (the container specifying: is it a float? an integer? signed or unsigned? or maybe some kind of nil? or a distant address where the actual value resides?). It's not about whether something lives on the heap or on the stack.
In all of the languages mentioned in the talk, boxed values are on the heap.
@@technicalmachine1671 Boxed values are only necessary if you need a dynamic data type. Do you know how often in my life I have needed a runtime type check between a float and an integer? Zero times. This is overhead for overhead's sake.
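In Haskell terms (the language used for snippets elsewhere in this thread), the difference is visible in a plain data declaration; this is standard strictness plus the UNPACK pragma, nothing exotic:

-- Boxed: each field is a pointer to a heap object wrapping the double.
data PointBoxed = PointBoxed Double Double

-- Unboxed: strict, unpacked fields store the raw 64-bit floats inline,
-- so reading a coordinate is a single load instead of a pointer chase.
data PointUnboxed = PointUnboxed {-# UNPACK #-} !Double {-# UNPACK #-} !Double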
7:11 Leaking is a surprisingly good strategy for command-line applications that run for a couple of seconds on behalf of the user. Just leak the memory and let the operating system clean up the pages; it's very, very efficient
Hackers will love you, too. ;-)
What's the reason for requiring that the array not be referenced elsewhere, proven statically or via reference counting? If there's another reference, it's super cheap (O(N)) to copy the array into another one, which by definition has no other references, compared to the actually expensive O(N log N) work of the sorting. Same with other tasks such as matrix multiplication. On the other hand, there's a whole depth of memory-locality optimizations. Simple example: matrix multiplication is faster with one of the matrices transposed, even if the transpose takes an extra operation, because values read consecutively will now be contiguous in memory, resulting in better use of the CPU cache hierarchy. More complex example: FORTRAN packages divvy up large matrices into smaller matrices recursively, partly for memory locality and partly for other reasons, and then combine the results. There are some fine-grained considerations there that would be hard to automate via low-hanging-fruit optimizations like allowing mutation when it's guaranteed that nobody can notice.
I'm assuming that copying won't *always* be faster. So I imagine the challenge would be to identify when copying is a speed increase and when it is unnecessary.
If something is never referenced, then what is the point of changing it? The thing of value can't then be the object itself, but some derivative of it.
Logically, the algorithm as coded does a copy for every swap. Doing an actual copy at runtime for every swap is not going to be cheap. So they have to detect that they can safely do the swaps in place.
Yes, before you run the whole algorithm, you can do a copy of your input, and that will 'only' affect the constant factors in your O(n log n) runtime.
Though, of course, if you are happy to do sorting that's not in-place, why are you using quicksort? Especially in a functional language.
6:49 - Loved the cheerful laugh
How is opportunistic in-place mutation any different from Swift's copy-on-write semantics? Swift's collection types (and all primitive types) are also immutable (semantically) by default and aren't copied unless more than one variable points at them.
"Can we write a quicksort in pure functional language that is beautiful, elegant and concise..."
Yes, we definitely can
partition arr pivot low high = loop arr low low
  where
    loop arr parIx i
      | i >= high           = (swap arr high parIx, parIx)
      | arr[i] < arr[pivot] = loop (swap arr i parIx) (parIx + 1) (i + 1)
      | otherwise           = loop arr parIx (i + 1)
(This is a translation of the JS function; it gets optimized into in-place updates.)
That looks really promising! I'm still not sure about the convenience of using pure-FP languages, but it's definitely a great leap forward.
Pure FP is already very convenient, once you get used to it. Try eg Haskell.
Absolute legend. I hope it targets wasm/wasi too.
I just watched a different Roc video in which they demonstrated wasm as a target (platform)
A great talk 👏🏿👏🏿
At the compiler level I don't see the difference between Swift and Rust. Both use LLVM to compile to machine code, no?
And the opportunistic in-place mutation seems very similar to what Swift does with its 'copy on write' strategy for mutating value types.
> At the compiler level I don't see the difference between Swift and Rust. Both use LLVM to compile to machine code, no?
Haskell's GHC can also use LLVM. Doesn't mean Haskell and Swift and Rust all have the same compiler.
LLVM is a Virtual Machine
The compiler is split in two:
A) Frontend (compiles to LLVM IR)
B) Backend (compiles to machine code)
No, it's not a virtual machine. It doesn't execute code like e.g. the JVM does. LLVM is a compiler framework & toolset
Never disappointed by talks given by Richard. 👍
Next step: make the idiomatic Haskell quicksort in-place.
Code: qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
There is nothing idiomatic about quick sort on linked lists. Even in Haskell.
@@DeGuerre I guess it depends on how you understand idiomatic.
If nothing else, it is the shortest definition of a qsort-like algorithm I have seen.
Aren't you missing the recursive call?
@@DavidAguileraMoncusi He is calling qsort twice recursively, so no, it's not missing.
@@JosefCZE If you want something idiomatic that does the same comparisons as quicksort, you'd probably want to do something that builds a binary search tree. That's very idiomatic in Haskell.
Alternatively, if you just want idiomatic sorting of linked lists in Haskell, something resembling bottom up merge sort might be the way to go.
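A sketch of the binary-search-tree approach mentioned above (the names here are mine): inserting every element and reading the tree back in order performs essentially the same comparisons as a non-randomized quicksort, but is naturally expressed with immutable data:

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- insert into a BST; smaller elements go left, the rest go right
insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x (Node l v r)
  | x < v     = Node (insert x l) v r
  | otherwise = Node l v (insert x r)

-- in-order traversal yields the elements in sorted order
toList :: Tree a -> [a]
toList Leaf         = []
toList (Node l v r) = toList l ++ (v : toList r)

treeSort :: Ord a => [a] -> [a]
treeSort = toList . foldr insert Leaf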
Haskell has some support for opportunistic in-place mutation via stream fusion. I would not be surprised if one could write similarly performant code with purely functional vectors, without "cheating" with explicitly mutable vectors :)
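A minimal example of what fusion buys you, using the vector package's Data.Vector.Unboxed: the pipeline below compiles down to a single loop, with no intermediate vector materialized for the filter or the map:

import qualified Data.Vector.Unboxed as V

-- sum, map and filter fuse into one pass over the input vector
sumSquaresOfEvens :: V.Vector Int -> Int
sumSquaresOfEvens = V.sum . V.map (\x -> x * x) . V.filter even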
I was surprised when i learned that FP forbids all mutability. I figured that the point of a pure function is predictable effects, i.e. the result doesn't depend on anything besides its input, and there is no effect on the program besides the returned result. The general state of the program is not mutable by most functions. That's great. But requiring that every data structure within the local scope of a function also be immutable sounds extreme. Do we really gain anything from eschewing for-loops because we prohibit in-place incrementation of variables on principle? Why can't we just have the best of both worlds without a fuss?
You can replace most mutable structures with immutable ones without much hassle. For loops are a great example: they're replaced by a for-loop function that covers almost every use of an imperative, mutable for statement.
I think making everything immutable stems from some of the theoretical foundations of FP. They were interested in simplifying programs a LOT, going so far as to compile languages into a "point-free" form that was not just without variables, but where the functions had no arguments! (SKI combinators). I like your idea; I think a language where all function arguments and return values were immutable, but the functions themselves could have mutable local variables, would give you a really great balance between efficiency and safety.
@@wmhilton-old I think Rust and Scala both have this, but you have to annotate the functions as pure.
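For the curious, the combinators mentioned above are tiny; a minimal Haskell sketch (any closed lambda term can be rewritten using just S and K, with identity derivable as S K K):

-- S distributes an argument to two functions; K discards its second argument.
s :: (a -> b -> c) -> (a -> b) -> a -> c
s f g x = f x (g x)

k :: a -> b -> a
k x _ = x

-- identity, recovered point-free: s k k x = k x (k x) = x
i :: a -> a
i = s k k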
Are there any papers about this? Is this related to the ideas used in Lean 4?
A few corrections: 3:38 Python is usually not interpreted from source at run-time. The first time, it is compiled into bytecode and stored, and from then on the bytecode is executed unless a change in the source code requires recompilation. As for the LLVM distinction, it is irrelevant: it's a compiler like gcc or others, has about the same stages, and is as good for optimization, so no real speed boost there. Lastly, "un/boxing" usually refers to switching between an interface/base-class representation and the actual object representation. In some languages it is done on the heap. Here the speaker seems to be talking about heap allocation (not boxing per se), which can have additional cost when system calls are necessary. When they're not necessary, it doesn't need to be slower than the stack; both are indirect accesses to memory.
There's a great talk between Rich Hickey and Brian Beckman where Rich goes into pretty good detail about how he implemented his opportunistic mutation in Clojure. Fascinating stuff.
Clojure doesn't have opportunistic mutation; it's all on the user to call functions to convert from one data type to another. You call `transient` to make a transient collection and `persistent!` to go back. The Clojure compiler isn't nearly advanced enough to do something like this automatically.
One of the things I love is thinking of stuff like this, waiting four years, and seeing somebody else do it. I've been thinking about this since I started programming professionally, and you pulled it off! :+)
It's always cool to see these things happen. Though didn't the gent mention that none of the functions run were optimized, for the graph around the 19:00 mark? I wonder if that's still true for the specially optimized version; if not, it's not quite a fair comparison, is it?
Python has interpreted bytecode. It doesn't interpret the source directly.
very good talk
I want to learn Roc. How do I get into it? Any internship/trainee opportunities?
This is very exciting! Will Roc have high-level functional concepts that Elm lacks, like type classes, true monads etc.?
Don’t think so
You might want to check out a language called Koka =)
I did not know that there are people who like functional programming.
Any particular reason why you didn't try F#, which already supports Elm-like programming for web and mobile and has a similar syntax to Elm? Academic research languages... I assume you put F# in this category, in which case I can only assume you never looked at it. We're about to release our third pure-F# application to the public. I'd suggest you check it out, because Roc surely seems to match the term "re-inventing the wheel" :) Kidding aside, good presentation as always.
F# does not compile to binary
@@klasus2344 Is compiling to binary a useful feature or a property? Not trying to be arrogant, but whether it compiles to binary or not means little to me. The features you get from compiling to binary may possibly be worthwhile. What are your main concerns, if I may ask (I'm curious)? CPU cycles? Mem footprint?
@@islandman9619 Not "compiling to binary" means a runtime is required. That puts a lower limit on how small and simple the code can be, and rules out uses such as device drivers or embedded code.
I believe he mentions F# in one of his other talks, though I can't recall what his reasons were for being opposed to it. It may have been simply that F# is functional-first, not pure-functional. It's also worth noting that this talk is *not* a comprehensive overview of Roc, so there are many language features which aren't shown off here and which differentiate it from F#.
@@alxjones One reason he mentioned was that the ecosystem around F# is mostly not functional, because you end up using all C# libraries most of the time.
Well, Rails as a framework is the mother of all overhead; you can achieve 10x performance and remain in Ruby by dropping Rails and switching to Roda and Sequel, or even a pure Rack app. In many cases, making multiple tiny Rack apps could be a lot faster.
Playbackspeed 0.85x 😅
One more optimization to consider: as all functions are pure, you can also schedule some of them on a GPU when you have to run a lot of them in parallel.
That doesn't really sound like the kind of parallelism where a GPU is going to excel, though (at least on current hardware). Unless big contiguous chunks of memory are being crunched more or less in lockstep, parallelising over CPU cores is probably going to perform better.
Ha-ha, "schedule"! You mean have a whole new compiler backend, compile to a totally different execution model, arrange all the data transfers between the main CPU code and GPU workers automatically and hope it works well.
Not if you still think in terms of yesterday's limited paradigms... In reality, yes, you can squeeze a lot more performance out of certain platforms, like Apple's M-series SoCs with their unified memory architecture, especially when we are talking about cryptographic work, the kind that cryptocurrency mining uses.
Well, it doesn't need to be a GPU, but yeah, that's the main point of the "resurgence" of functional programming lately: being able to parallelize better. Although, as you say, it seems like doing things with "pure functions" is easy... it is not.
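On the CPU side at least, purity does make the parallelism easy to express. A minimal sketch with Haskell's parallel package (heavy is a made-up stand-in workload); because the function is pure, evaluating the results in parallel cannot change the answer:

import Control.Parallel.Strategies (parMap, rdeepseq)

heavy :: Int -> Int
heavy n = sum [1 .. n]   -- stand-in for real work

-- compile with -threaded and run with +RTS -N to use multiple cores
main :: IO ()
main = print (parMap rdeepseq heavy [1000000, 2000000, 3000000])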
Link to the Roc talk mentioned so you don't have to type it out: ua-cam.com/video/cpQwtwVKAfU/v-deo.html
I’d like to see the comparison of the TodoMVC with Svelte (also compiles to js)
I believe high-level "declarative" language + optimization will eventually always beat handcrafted code, in speed of execution, in speed of writing, and especially in correctness.
- We are not there yet, and maybe AI will be writing all SW before we get there, although I would say that would still count as "computer transforming a description of what we want into an efficient compiled binary"
- Anyway, yeah: Roc looks really promising, great talk, very excited to see its future!
Is there any high-level declarative language that's reasonably adopted?
I am genuinely curious as I like the “declarative” part :)
F#, Scala, Elixir, Elm and to a certain degree Ruby.
Surprised this pun hasn't already been made: Roc 'n roll
Seriously looks cool though :)
So it's a highly tweaked Haskell implementation and a highly optimised Roc version being compared to basic implementations in imperative languages. My takeaway is that this concedes imperative outperforms FP...
Well, swapping 2 values in memory will always be faster than copying an entire 10-million-value array with 2 values swapped. So yes, unoptimized pure FP will always be inefficient for big array sorts. There's no debate about that. But the video shows that you can use the pure FP paradigm, with its safety guarantees, and still be competitive if you use heavy optimization at compile time.
The Roc version isn't optimized at all except by the compiler. Our goal was to port the imperative algorithm as directly as possible, given the language primitives available!
@@rtfeldman Ah, this answered a question I had. I am a lazy programmer, so this matters a lot to me. Thanks!
Since it's possible to write it identically to a for loop, can you then just write the for loop and let the compiler do the hard work of translating it into a recursive function? Sorry if this is a dumbish question.
It outperforms at certain tasks. But maintaining code across a decade with multiple coders is a different order of performance, and that's where the argument can be made that FP has it in spades over imperative coding with side effects.
Yeah, and the bottleneck will always be the coders, and the time limit versus cost when you are in a big project 😁
Java is already faster than all these languages though. Scala is the most functional language ever created and can compile to native.
Scala is a mix of OO, imperative and functional.
It can do a lot of things, and still sacrifices some neat tricks.
Type inference is fairly limited, as an example. :)
And native compilation is pretty broken, as it always has been for JVM languages.
It could be done properly with GraalVM.
"The most functional language" is probably something like Lean, Agda, Haskell or that sort.