Author of the Gleam file_streams package here. Yep a previous version of the library was painfully slow at reading large text files. This was fixed and an update released within 48hrs of the GH issue being opened, several weeks prior to this video being published on YT! Would have been great if that could have been part of the video, but regardless thanks for helping make the Gleam ecosystem better, and everyone’s invited to join the Discord to chat and learn some BEAM & FP 😎
Commenting so this goes to the top. Theo needs to see this
That is so cool🎉
Can confirm this guy is real
The original point still semi stands I would think where these kinds of things might be hangups for a new language as they get discovered along the way and resolved.
Oh damn, 48h
I really want to learn gleam but I am busy learning rust so it will have to wait 10-15 years when I become intermediate at rust.
Think about how only one thing can own some data, multiple things can borrow it, and only one can edit it at any given time. I just saved you 10 years.
That just seems like generally a good practice
yeah but in rust it's enforced by default rather than being a good idea that most people act like they follow but throw away at a moment's notice once their idea for how something should work doesn't perfectly convert to code
finna be rusty
@@tinrab now I only need 5 years to understand what “own”, “borrow” and “edit” mean in this context
Did I… just hear a TS dev say the DX of Go is bad?
Dude there is no setup - that’s the whole point. Consistent, built in formatting. The package manager is incredible, seamless, and global so that you don’t have a massive node_modules. Built-in file embedding into the binary. It doesn’t even matter that much if go build takes a second or two because you spend way less time iterating vs TS.
Are you forgetting bundlers, tsconfig, eslint, prettier, type definitions you literally have to describe for your editor…? Wtf is that if not an awful DX?
I’ve used TS a lot and have been way happier since moving to Go.
(No hard feelings here, just doesn’t make sense to me)
This was my immediate reaction as well 😅
This is because of the mustache
wholeheartedly agree, man that line irritated me. As someone who (sadly) worked with JS and TS for more than 3 years now, I've been using different runtimes, package-managers, bundlers, builders, linters, etc, etc, etc but when it came to Go having it all pre-defined by the language itself made it easier for me to focus more on the logic of the code that I was about to write, rather than concern myself with what fucking tools I use.
He really only just said the formatting for the switch case is bad.
@@ninojamestan8895 This is not about DX then
>thumbnail has the Crab
>video doesn't have Rust
Truly the clickbait of all time
Rust is good for one thing, clickbits.
It is not suitable for any other purpose
unsubscribed from this hipster channel.
A Rust implementation's speed was mentioned, but we didn't see the code
@@baxiry. System76 is building a Linux DE, COSMIC, in Rust, along with several apps like a text editor, file manager, package manager etc. The software store is significantly faster than the Elementary AppCenter that it's going to replace.
Gleam is written in Rust.
"Go's focus is not on the DX around the language" is a wild statement to me
Yeah, like I feel like the tooling around Go is actually really really good lmao
To everyone watching the video - the challenge was created for Java, and their leaderboard has sub 3 seconds - that’s with all the parsing and aggregation also
Pffff… there are C# versions in the 800ms range now
a lot of folks trash java.. but it's getting much faster... and frameworks like quarkus just make it a major player imo..
@@bjbeguithe JIT in Java is especially good. It can get very fast given that reading a file is a hot operation
It's so weird that theo is bashing on go’s “outside of code” because of no watch flag when even node only added it this year (as a stable feature) despite being an interpreted runtime. I know he doesn’t like go but this is just pure bias at this point lmao.
They seem to have tied their personality to JS. Gotta find trivial reasons to hate other languages. Bashing on Go for needing to install a package for watch, yet in the same video talking about how the language ecosystem really matters. You would think that Theo would be comfortable installing libs. Adding dependencies to solve problems is like JS devs' default mode. Go at least comes with a compiler...
Yeah and getting typescript to run is a whole process itself.
Random Gleam library dev here. I remember talking to the author of the file_streams package on Discord about there being no "read to end" function in the library, and that discussion sparked him to implement said function. I also suggested adding a way to read and write to a file without multiple opens, and he completely reworked the entire library to accommodate that!
"this is not the most readable thing for developers that were born within the last 40 years" Yeah as someone who's working on a highly Erlang-focused library (Bravo, which ports Erlang ETS to Gleam) I totally agree. I hate the fact that in case statements, you have to suffix the last line of each arm with a semicolon *except for the final arm, which must not be suffixed and errors if you put a ; there* which causes some headaches. Also had an error once where I was missing a period on a line I didn't think needed a period, and the error message was just "syntax error before" [sic] which is infuriating. Also, every single character in an Erlang string is 16 bytes in size :) (thank god Gleam uses binary arrays for strings instead of linked lists)
Great video!
16 bytes per character? Are you sure you don't mean 16 bits? That's astronomically huge, I can't fathom any possible reason why a single char should take up 128 bits.
ex Elixir dev here, just a question - isn’t gleam based on elixir? Based on what I heard from the Elixir Wizards pod, I thought it was a superset of elixir, which is obviously based on erlang, which makes all three languages able to coexist in the same app - although with some typespec headaches if gleam is involved?
@@MrManafon Not at all, Gleam has no relation to Elixir; it compiles to Erlang like Elixir does (but outputting actual readable Erlang code instead of targeting the Erlang AST), and it also has a JavaScript compilation target.
All 3 languages can indeed coexist in the same app, there is no typespec headache as far as I'm aware of, Gleam emits typespecs into the compiled Erlang code.
I've been using a Gleam package I wrote from Elixir, works great!
@@thunder____ Yes, 8 bytes for the codepoint and 8 bytes for the pointer to the next character. It's wild.
@@EraYaN Ah okay, that does explain the original comment’s mention of linked lists. That still seems absolutely wild and wasteful, I don't think even Python's strings take up that much memory (although now that I mention that, I have never actually checked…)
"gofmt's format is no one's favorite, but gofmt is everyone's favorite."
-- Go Proverb
Also, a clarification: scanner.Text() isn't reading the file. The bufio.Scanner is already reading it into an internal buffer. scanner.Bytes() would give you access to that buffer, while scanner.Text() copies the buffer's contents into a string. It's that allocation and copy that's probably causing the extra second of runtime.
gofmt's format and go's lexer. Well said though
scanner.Text() copies the contents into a string? i didn't know that! learnt something new again
This is what I like about google style guides in general. They just make one and run it on everything. Can't configure shit, so nobody is wasting time arguing about it.
that was then, but now after the Go 1.20 or so scanner.Text() uses unsafe.String under the hood, do note that even scanner.Bytes() also allocate to separate the main bytes from the returned bytes. Tho I may be wrong
@@miracleinnocent2649 I just checked the docs. You are wrong.
It would be a weird change, too. If Text() returned an unsafe string, its contents would change the next time that you called Scan(). That would be _really_ bad, not to mention not backwards compatible with code that rightfully didn't expect it to do that.
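To make the Bytes()/Text() distinction above concrete, here's a minimal Go sketch (the file name is a made-up placeholder) that counts lines via scanner.Bytes(), with the allocating scanner.Text() alternative left as a comment:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("measurements.txt") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	count := 0
	for scanner.Scan() {
		// scanner.Bytes() hands back a slice into the Scanner's internal buffer:
		// no extra allocation, but it's only valid until the next call to Scan().
		_ = scanner.Bytes()

		// scanner.Text() would instead copy that buffer into a fresh string on
		// every line - the per-line allocation blamed above for the extra second.
		// _ = scanner.Text()

		count++
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
	fmt.Println(count)
}
```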
Thanks Theo! I’m happy I could help here 💖
Thank you for your work on Gleam.
Woooo it's the Gleam team!!!!
I'm really confused by what the point of these comparisons is, because almost all of the solutions presented here use different algorithms, so it doesn't really make sense to compare the languages like this.
The slow Go version (~15s) doesn't create any strings (so no allocations) but still has to write to an internal buffer to make the result of Scan() available to the user.
The fast Go version (~0.3s) doesn't allocate anything at all besides the buffer at the very beginning; the only thing it does is count the number of newline bytes.
The JS and Gleam versions have to create string objects in memory to represent the lines, so they have to call the allocator at least a billion times.
Do you think any JS dev understands a word you just said 😂
This means other languages cannot do what Go can do.
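For contrast, a rough sketch of the allocation-free approach described above for the fast (~0.3s) Go version: read fixed-size chunks and count newline bytes, never materializing lines at all (path and buffer size are arbitrary choices, not taken from the video):

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("measurements.txt") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, 1<<20) // one 1 MiB buffer, allocated once up front
	lines := 0
	for {
		n, err := f.Read(buf)
		lines += bytes.Count(buf[:n], []byte{'\n'}) // count newline bytes in this chunk
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
	}
	fmt.Println(lines)
}
```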
This was a great video. I recently tried Gleam for a while and it was similar to playing a remastered game and thinking it doesn't look that different from the original, and then looking for screenshots. I worked with Erlang for a while, and the DX Gleam offers is much much nicer. I hope they can keep working on it and it becomes the default language for highly-concurrent problems.
You also made me want to go back to go again. I'll have to stop watching your videos.
That’s not a lack of “early returns” in Gleam, that’s just a result of functional programming being expression-based - you can’t love functional programming without at least understanding that functions are expressions, there’s no such thing as a “return”, a function is just an expression
also gleam has a function called bool.guard that, when used in a "use-expression", simulates early returns in a functional way.
Also great explanation! Totally agree! In gleam i never felt i needed the return keyword and this shows why.
I made a simple C program to count lines (and one to write newline characters to a file).
It took about 0.157 seconds (0.120 user time) to count 100 million newlines.
and it took about 0.357 seconds (0.007 user time) to write the file with 100 million newlines.
There are probably lots of ways to optimize it, but it works.
btw, for 10 billion it takes 1.663 seconds to write (0.194 user time). and 2.143 seconds to read (1.658 user time)
It's funny you posted this today. I have been working in Go quite often recently for work and went to write a quick dirty Python script. Had to get used to not having formatting just fix itself.
thankfully you can also set this up in python as well (I think it was called "black" or something like that).
doesn't go ship with lsp, formatter, test suite, cross-arch-compiler, package manager, cache for hosted package, docker images, etc and so much more? I wouldn't say they don't care about dx, but yeah they could definitely add watch mode.
Gleam is a really attractive language.
If you're a complete beginner to programming, i wouldn't suggest it, as it is functional (no "for loops" or "if statements") and also doesn't have great learning material yet.
but if you're already a programmer... it's lovely.
The lack of learning materials can be definitely a reason but it being functional? I would argue that it's actually better to start with functional programming as it's higher level. People who learn how to think "functional" also become better developers in my experience.
16:48 basically options/results by default
Also omg Giacomo being the goat once again (refers to prime's reaction on new gleam features)
Trying my best to help the community! 💕
Aaaa! Isn't the fact that the speed of the rust implementation is measured on another computer, using unknown code on a different dataset, measured by unknown methods, a little bit of a confounding factor here?
Yeah, I would have liked to have seen Theo run a Rust implementation on his own machine at least (and in release mode, not debug mode, just to be sure).
And with the --release flag...
Looool yeah, there’s no shot that rust code was close to correct with a runtime like that
I like seeing all of these tradeoffs and potentials in the languages.
Afaik, due to tail call optimization happening in this example (not sure if erlang does it, I just suppose it does), the overhead of the function being called that many times gets optimized away under the hood. It should be just as performant as looping over something in other languages from a pure optimization standpoint.
FYI: haven't looked into this for like 3 years, so it might be wrong. Also tried stuff like this with Scala, looking at the Java bytecode after compilation to see what is being optimized and how. Please let me know if there is something wrong with my understanding of this.
Edit: stopped watching to write comment and now feel stupid for saying the same thing chat said 25:30
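A rough illustration of the transformation being described, written in Go only because Go comes up elsewhere in this thread - Go itself does not do tail-call optimization (it just grows the goroutine stack), but a TCO runtime like the BEAM effectively rewrites the first form into the second:

```go
package main

import "fmt"

// Tail-recursive form: the running total travels in the parameter, so there is
// nothing left to do after the recursive call returns.
func countRec(remaining, acc int) int {
	if remaining == 0 {
		return acc
	}
	return countRec(remaining-1, acc+1)
}

// What a tail-call-optimizing runtime (e.g. the BEAM) effectively turns it into.
// Go does not perform this rewrite; the recursive version above still pushes
// a frame per call, it just grows the stack as needed.
func countLoop(remaining int) int {
	acc := 0
	for remaining != 0 {
		remaining--
		acc++
	}
	return acc
}

func main() {
	fmt.Println(countRec(1_000_000, 0))
	fmt.Println(countLoop(1_000_000))
}
```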
I just noticed my pictures in the video at the end on the Gleam sponsors page...
lightweight famous
Thanks for supporting the development of gleam
I was surprised that an Erlang module was needed. I don’t remember ever having to use Erlang in Elixir. And it would have been no problem as I have used Erlang a lot longer than Elixir.
It might have to do with the compilation targets. Gleam compiles to Erlang while Elixir compiles to the Beam AST.
Could you share how much memory you have and what your disk speed is? I suspect the really good go result is impossible unless the entire file is cached in your memory already.
Im pretty sure there is some cacheing going on which throws the results off.
There is definitely caching going on. On gen4 nvme drives, it would typically take at least a second to read the file, and probably more like two seconds.
25:02 it's because BEAM is using tail call optimization, it will actually turn that recursive code into a loop on machine level
It's lovely to see Theo so heartbroken when he realizes that the numbers don't match his personal preferences. We all have been there (hello Haskell devs). I know it hurts, here is a hug
(づ。◕‿‿◕。)づ
Gleam is in a rough chicken-egg situation right now but if they can build the libraries and the ecosystem and stick with it, I think they have a shot. Because wow, what great syntax and features. Not even mad about implicit returns.
it's basically rust syntax
Good video because I would have never tried that experiment. I wouldn't even try to find an article to look up the experiment. But casually, I looked at it. Also, you need to clear the disk and RAM cache every time you test it (in a professional setup/test).
Your editor is hilarious with that "we're cutting that..."
[ parallel ] is your friend, [ parallel ] will throw that stuff across all the cores. Doing this type of stuff is why c, perl, and python were made. Perl and Python exist to load C functions and run them; I guess you could do it in Go (if you're using a web backend or something).
Add Haskell to your list. Out of all the languages I've learned, it has been the most transformative and had a huge impact on how I write JS / TS.
Do you mean explicitly Haskell or functional programming concepts in general? I learned a bit of Haskell to the point where I understood what monads were. Also Lazy Evaluation by default was just soo beautiful to me.
Sadly I kinda gave up on Haskell (at least for now). I feel many things are just too complicated. Maybe not because it's supposed to be complicated but because they found really beautiful, theoretical ways of doing things.
Also the community does not try to make Haskell accessible. Fighting the tooling, unqualified imports in tutorials (or in general) and heavy use of abbreviations or single letter variable names make it really hard to understand already written code.
My favourite moment with Haskell was when i created a little game and realized that processing inputs was just "read every input there will be" into a list and then just doing list processing. Lazy evaluation is just magical!
What are your favourite aspects about Haskell and what concepts transformed your way of thinking? And how did it impact how you write code in other languages?
@@Ryuu-kun98 The thing that I like about Haskell the most is pattern matching and its syntax. It makes FP so much easier. You're able to break down problems into smaller bite-sized pieces that are easier to understand. A lot of languages make this difficult because the syntax is so verbose that you aren't going to write little 1-line, 20-character functions to break a problem down. There's a certain amount of overhead to functions in many languages, so you have to be above a certain threshold before you think it is worth it.
I like the fact that they use category theory to break down problems. It is really universal, but I also find it quickly gets complicated when you try to combine many things. What I've noticed is that Haskell is very strict and limited with its type system compared to something like Typescript which is largely more imperative. The limitation means to add something new you have to really fit it in to existing category theory. But if you do, it tends to be much more robust, fit extremely well, and have lots of emergent properties.
The $ and infix operators are really convenient too. Saves so much code and once you understand it, makes code so much easier to read and cleaner.
Looks like the issue has been closed on github with a fix, can you make an updated video to see your opinion on it
Who needs soy when you can look at theo thumbnails
How’s a basic Elixir implementation, btw?
Almost everything you've liked in both languages exists in Rust , AND, it formats the switch statements properly, not like those Go savages.
Eh, I'm a Gopher, so take this as you will, but Golang people don't really care about how gofmt formats Go code. What matters most to us is that there's exactly one way that code is expected to be formatted, and Go does that very well.
Saw a recent stream that showed your “to do” list, and one item was to make the same app in several languages/frameworks (eg TS, php, rails, etc). Has that video been released yet? Super interested in it. Thanks!
would love to hear your thoughts trying Zig for this
But how fast is java (or kotlin if you hate java) when trying this with a try-with-resources statement on the file stream?
So, at 13GB of file in 0.32s, you would be reading it at a speed of around 40GB/s. How does this math check out when an SSD generally has a max read speed of ~7GB/s ?
If someone has an answer please? Or I'll get back to calculus😂
He probably ran it after it was already cached in memory from his previous runs. I don't know what OS he used, but it probably uses unused memory for disk cache. Ideally, it would be tested after flushing the cache. In Linux, this can usually be done with sysctl (e.g. sync, then sysctl -w vm.drop_caches=3).
The testing methods are highly flawed.
The OS basically cached a big chunk (or probably the whole file, depending on the available ram) in its buffers, which meant the next run was not disk IO bound and just read stuff from the much faster ram.
You never benchmark this stuff without flushing all system buffers.
In my processor design courses we even had to reboot after every benchmark to get reliable results, granted we were timing cpu caches but here you can get away with just force flushing the filesystem caches.
@@pranjalkushwaha197: It’s part of the 1brc (1 billion row challenge) specifications that the file will be fully in the disk cache.
The OS cached the file from the first time he read it. That's why you can't run one instance and call it good for benchmarking. You have to be very careful about caching, and variances between runtimes just to name a few "traps for young players". I have a linux box with ZFS pool on it and 512GB of ram - by default ZFS uses up to 50% of the RAM for a disk cache, so you can get CRAZY read speeds on HUGE files if you try the same benchmark more than once.
The JavaScript church won’t forget about this unloyal event !
The more i learn other langs, the more i love typescript
@@tuananhdo1870 stockholm syndrome
😂😂😂
he shall be 👹
@@tuananhdo1870 Me too
Partitioning the file is a reasonable suggestion, but I don't think it's possible to do for a random offset with a utf-8 file. Is the text file utf-8 or only ascii?
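For what it's worth, splitting at arbitrary byte offsets works for UTF-8 too, because the newline byte 0x0A never appears inside a multi-byte sequence; you just scan forward from each offset to the next line boundary. A rough Go sketch (file name and partition count are made up):

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// alignToNextLine returns the first byte offset at or after `offset` that
// starts a new line. Safe for UTF-8: continuation bytes are all >= 0x80,
// so a 0x0A byte can only ever be a real newline.
func alignToNextLine(f *os.File, offset int64) (int64, error) {
	if offset == 0 {
		return 0, nil // the file itself starts on a line boundary
	}
	if _, err := f.Seek(offset, io.SeekStart); err != nil {
		return 0, err
	}
	skipped, err := bufio.NewReader(f).ReadBytes('\n')
	if err != nil {
		return 0, err
	}
	return offset + int64(len(skipped)), nil
}

func main() {
	f, err := os.Open("measurements.txt") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		panic(err)
	}

	const parts = 8 // arbitrary partition count
	for i := int64(0); i < parts; i++ {
		start, err := alignToNextLine(f, i*info.Size()/parts)
		if err != nil {
			panic(err)
		}
		fmt.Println("partition", i, "starts at byte", start)
		// A worker per partition would read from its aligned start up to the
		// aligned start of the next partition, so it only ever sees whole lines.
	}
}
```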
what about buffers in JS?
I replied somewhere else but I wrote it in deno and got 20 seconds for the line count. Not sure what he did in bun to get hour+ run lol
why in god's name is the 'open issue' badge in your github red??
I was wondering that too!
Wait what is quirky and global about the package management in Go?
I think Theo is still talking about pre go modules. Also there is a lot of tooling around the language. We can tell he isn't using `gopls` which would have formatted the imports into a block.
btw, doing recursion 1 billion times in that case is likely just optimized into a while loop, since it's storing the result in the parameters rather than on the stack ( func(x + 1) vs 1 + func() )
so it would optimize it down to something like
int x = 0;
while (true) {
if (ok) x +=1;
else break;
}
Thank you for recognizing defer's OP status.
yeah, and being aware of your cycle and how they can affect you during the week is important too
I expect that reading a file line by line will be quite a bit slower than reading binary blocks. The slowness is in the library, not in the language.
Also: it does not matter if your library doesn't know UTF-8. For applications like this, working with UTF-8 is exactly the same as working with ASCII, since the newline byte 0x0A never appears inside a multi-byte UTF-8 sequence.
Biggest file we have to parse regularly at work is a 1TB csv that arrives every business day. I talked to the vendor and suggested they use a binary format like Parquet (or, frankly, anything else, HDF5, ORC, I dunno, fucking Avro) to deliver it. They had never heard of parquet or any of the other formats.
I _might_ be wrong here since I haven't looked at the implementation, but I would guess that that rust implementation was taking long because it was unbuffered, since I remember by default the File API on rust's stdlib doesn't provide any kind of buffering. The fix IIRC is pretty simple, just wrap the File object with a "buffer object", also provided from stdlib
It is in fact, buffered.
@@Mordinel Not according to the documentation on their website. Says you have to use BufReader to get buffered IO.
Funny how js devs interact with compiled languages
recursion and stack machine, tail call optimization, register overflow?
It's ok Theo, everyone experiments in college
have you tried squeak? A lot of the modern programs have taken stuff from this programmer.
squeak language?... you mean smalltalk?
I like go except for the error handling. I know that exceptions are a parallel flow control but not having it is so annoying.
Would be interesting to see SIMD solutions using Bun/Zig or Mojo
Would love more gleam videos!
Watching Theo delete character by character in the command line is painful, does CTRL+U and CTRL+K (or I guess CMD+U and CMD+K or whatever) not work on MacOS?
My bindings were broken when I filmed this, I’ve since fixed them
or use vim plugin for terminal commands😅
This is the reason overall that I have liked using Golang. At work we use C# and Python. I've started introducing Go for some things and have really loved it. I could have built the same things in C# or Python, but I know it would have taken me longer to be as bug free, memory leak free, and performant as my Go code. In languages like C#, C++, Python, Rust, it is very easy to do the wrong thing. In Go I've found that usually anything is performant "enough", meaning that unless I see a need to optimize it I can just leave it.
I work with C# and so far never had a moment where I considered moving to go. If I had to write some really fast command line tool, I would first try C# AOT compilation, and if that didn't work out (it has limitations on language features you can use), then I would consider something else. But probably I would go straight to Zig or Rust - I don't see Go as THAT much faster, and let's be honest, DX for C# beats Go so much, especially with Rider with some nice plugins.
Also, there are some things where C# is just the fastest :D XML parsing is one example, even Rust libraries are slower than the out of the box XML parser from the C# standard library
@@Qrzychu92 So I think you misunderstood my overall point. You have code performance, dev time, and experience or "skill issues". Rust, C#, probably Zig(haven't used it so unsure), dev time is generally on the slower side. Then you have the skill issues where it takes more experience in those languages to get good performance and it can be easy to cripple performance, even if unintentionally. If I want to maximize performance I would use one of those languages, but If I only need adequate performance I can get that with less dev time in Go. That's my experience anyway. I can also do it while almost entirely using the standard library. TLDR, less work to reach a higher baseline.
@@gsgregory2022 that's where we disagree, probably due to different sets of skills and skill issues :D
IMO, C# is one of the most productive dev environments out there. Maybe except something like T3 stack if you are doing a full stack app.
Hot reload, nuget (yes, I count that as a feature), single command dotnet build, best on the market LSP (especially if you use Jetbrains R# or Rider, VS is also VERY good), LINQ, library for everything, being able to use the same language for desktop apps, command line, servers, lambda functions, to a certain degree websites with blazor - it's awesome. Oh, and the debugger!
While at a breakpoint, you can change the code, and drag the "current executing line" up to run the new version without restarting. MAGIC. That includes UI tests written in Playwright for example - awesome.
I never used go in a proper professional context, but those few times where I tried to replicate something I had already done in C# the experience was MEH at best.
Only downside is that if you use a library that is not AOT compatible, the deploy size is around 40MB with embedded runtime. Not bad for a whole virtual machine in there to be honest, but still a lot. With AOT it's usually under 10MB, unless you include some big native lib like OpenCV.
So, I stay with C# :) plus it's getting better every single year
Technically you can tail call optimize in javascript. I've seen the implementation and it's very icky.
you can think of the cases of a switch like code "labels" in go, and that's the reason for the goto-like formatting
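To make that concrete, this is how gofmt leaves a switch: the case labels sit flush with the switch keyword, much like goto labels, so each body only costs one level of indentation (trivial made-up example):

```go
package main

import "fmt"

func main() {
	lineCount := 1_000_000_000
	// gofmt keeps each `case` flush with the `switch` itself, the way labels
	// sit flush with surrounding code, so the bodies only add one indent level.
	switch lineCount % 3 {
	case 0:
		fmt.Println("divisible by three")
	case 1, 2:
		fmt.Println("not divisible by three")
	default:
		fmt.Println("unreachable")
	}
}
```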
You should try a similar approach in typescript as well: not reading lines but hunting for newlines.
Suggestion: the JS solution could be faster and discussed more in depth, with more searching for good libs done :)
i started go recently and i remember there being nothing to it and everything just working out of the box. and complaining about the package manager i dont really understand either. seems at least as natural or perhaps more natural than js-based tooling
Weirdly I found watch isn't very useful unless you are working on something that requires instant feedback on a bunch of small changes - very useful in the JS world as you're building UIs, not so much in this case, I think. Especially because the JS watch does more than just build/compile, it updates the page automatically, which was the truly annoying part back in the day.
I hate how sync and async are completely used backwards in JS. In English and any other language with common roots for a, syn and chrono (italian: asincrono; spanish: asincrónico; french: asynchrone; greek: a = not, syn = together, chronos = time) it means exactly the opposite.
The terminology is UTTERLY counter-intuitive. Async functions ARE synchronous, they happen at the same time, and that's their power. They DO NOT wait, they DO NOT depend, they DO NOT hold the user waiting, but all those NOTs refer to anything but synchronicity.
Time to check out roc-lang?
When using case you should use let ret = case ... { } because then you can make ret your last expression to return :) hope this helps!
All languages are weird, even the ones you love.
tail calls just optimize to roughly the same asm as loops
is it "poimandres" colorscheme?
"Fast" go result is just wrong. How could you read 13GiB file in 0.3s, doesn't it mean that you reading file at 45Gb/s? Where could I buy such a good ssd? Joking ofcourse, it's file system caching, that rig your results.
12:46 looks like the editor forgot to move the censor thing when theo pressed enter.
yeah I noticed that also, but he also showed his screenshot tweet of it when he was showing twitter earlier in the video anyway lol
Oh god I cringed when I saw the switch indenting in Go, glad to know you also agree. Still think custom lint rules are worth it, though, as long as a large majority of your team agrees
btw what's your font??
I tried this challenge in gleam right when it first released 1.0. One of the biggest things that was killing me was the inability to mutate anything. Even a variable wrapped in closure doesn't actually mutate the variable. I made a function which set a variable and returned an iterator. The iterator would stream through the file and yield each row. Every time the wrapped iterator would advance to the next step, the variable which existed in scope through closure would always be set to the initial value. If I set the variable to something else and then printed it from within the same step, it would show that I changed the value. On the next iteration, it would be reset back to the original value. I tried a whole bunch of stuff to optimize this further, but didn't succeed. I have some other ideas floating around in my head, but I haven't had a chance to go back to it yet. Not having early returns was difficult. The hardest thing was just not having access to regular old fashioned loops. Also, you can't just push to an array or move things around in one. Any time you modify anything, it'll copy the contents to a new array and make the change there. So you really gotta think about every single thing you do, in a way I'm not used to.
yeah its a totally diff way of thinking.
also, the language simply wasnt designed to excel in this challenge.
its focus (for now) is concurrency and web.
i make proprietary CLI tools for a living and fell in love with Gleam as soon as it released 1.0.
i have fun with the language, but havent been able to do anything useful with it for my company due to the poor CLI, file, and i/o ecosystem.
Im sure this will get better with time though, they have a really great team, and its only been v1 for a few months.
You'll find the advantages of the immutability a good thing once you get used to it. Also, a "copy" is pretty much free because unchanged data can be shared safely (structural sharing) thanks to immutability.
Yep, it's called immutability and it's the best thing about the language - although it definitely takes time getting used to
Yeah, unfortunately, arrays don't work well with functional languages because arrays aren't immutable-friendly unlike trees and linked lists. Though there are pure functional data structures that simulate arrays with logarithmic updates and access like finger tree and Okasaki's random access list
What vscode theme is he using ?
Dev: "Hey server, watch me make mistakes and pass them to my customers, please!" Go: "Yeah I'm not doing that. Go get a package to do that for you."
Would love to see you try this with Bend on the GPU.
Putting a file on the GPU that large will consume so much bandwidth. And compute doesn't work well for something that serial.
@@monkev1199 interesting. Hadn't considered that. Thanks for the perspective
Go would be so perfect if it just had sum types and pattern matching.
And no nulls
I would argue defer is bad - you should use decomposition instead. If you're putting file I/O in the same function body that does the text processing, you're failing to apply the single responsibility principle properly. The concern of how text is acquired is separate and different from the concern of counting lines. We need fewer patterns embedded in our languages that deter SOLID programming, not more. :/
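A minimal sketch of that decomposition in Go (function names are mine, not from the video): the counting logic only ever sees an io.Reader, while the file concern - including the deferred close - lives in its own small function:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// countLines knows nothing about files: it just consumes any io.Reader.
func countLines(r io.Reader) (int, error) {
	scanner := bufio.NewScanner(r)
	n := 0
	for scanner.Scan() {
		n++
	}
	return n, scanner.Err()
}

// countLinesInFile owns the single responsibility of acquiring the text.
func countLinesInFile(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close() // defer still handles cleanup, but only in the I/O layer
	return countLines(f)
}

func main() {
	n, err := countLinesInFile("measurements.txt") // hypothetical path
	if err != nil {
		panic(err)
	}
	fmt.Println(n)
}
```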
It's funny, I'm fairly new to programming and don't have much experience with the JS world, but to me the whole JS ecosystem is very strange, with this bulky package manager npm and the dependency hell. For each project it downloads the packages again into my project folder (maybe I just don't know how to store them in one place instead of this cluster fuck). The compile time just sucks and I get dozens of deprecated warnings, no matter which packages I use. The syntax is strange and the type system of TS is just an illusion. I don't get why it is still used so much. Humans are just lazy and don't want change?
Meanwhile the installation of go was pretty straightforward, it even comes with a built-in package manager and an awesome LSP, formatter etc. For beginners this is the way to go IMO.
i have a 27.9 GB file with billion rows and ur go approach takes 7s on linux😢(rust takes 13s)
Anyone knows what font is he using in vscode?
Regarding gofmt formatting in a not perfect way - as Rob Pike himself said, the way gofmt formats is not ANYONE's favourite way, not even his - but it's a decent middle ground that still makes for consistent and easy parsing. It's "good enough".
Chat: "Theo, why are you not using ?" Theo: ""
How much time does it take in Python?
You can literally just install Air for go
If there is something great about go it's the very good standard library.
i hoped there would be some rust but go and gleam sound interesting too.
As a typescript dev I find it funny to have a build system at all.
Go has one of the most complete set of built in tools out of any language. There is plenty to not like about it, DX focus is absolutely not one of them
I hate gofmt doing no space between math operators, but switches with a single indent are the way to go. If you're a true never-nester then that's all of your indent budget gone in a single operation and you wouldn't be able to use a switch within a for loop, etc. Rustfmt doing two indents for matches makes me not want to use match at all haha
10:36 C people also do that, but I've never been convinced
What is that font?
"so ive been trying other languages"
javascript in the thumbnail
ok
You should have used deno instead of node
I swear if I ever get tech money you'll know. doesn't change me being a nerd, whether I get tech money or not.
Waiting for RUST and ZIG benchmarking
I dont see assembly or binary on that list...
i'll be seeing you tonight sir.. on the streets that are 2 inches wide and homes that are 51 stories tall.