Thanks!
Wow thank you so much Glenn!!! I really appreciate your support of the channel!
“This is probably the most undersubscribed Rust channel I’ve seen in a while. Please all of you go out there and spread the word so this guy gets millions of subscribers” is my version of the opening dialogue 😊
Thanks again for more amazing content
LoL thanks so much for the kind words MrKeebs!
what are some other rust channels you would recommend?
@@Moof__ "No Boilerplate" is good, even if a bit cult-like.
This comes at the right time with the right project for me. I just finished a Telegram bot that used OpenAI. I'll totally go with this solution to cut down on costs. I cannot thank you enough for sharing this with the world.
I'd love to try it out 🙌. I hope Rust will get more popularity in the AI domain soon.
It's worth a try. and me too!
AI researchers are too dumb to write rust 🤓
Exceptional. Thanks for this.
Wow thank you so much for your support Dennis!! Really happy you liked the video
Overall great video. I can think of a few business reasons people are not deploying bots like this:
1. The bot may not be able to answer business-specific questions well, because those questions may not be in the training set.
2. To solve point 1, one has to somehow retrain the bot (for every business it is deployed to!), but businesses may not be willing to do so because it takes time and money and they have to hand over their call transcripts.
3. This change will impact head count in a big company. We all know what that means. So it is only applicable to smaller companies, which don't necessarily have that many calls.
Either way, it is cool technology, but I don't see the business case. I believe in market efficiency to a certain degree: if it is really that great, someone should have done it or is doing it, given that the technology is not new and everything is open source, as you mentioned.
I've been building a customer support tool with GPT Neo, this stuff is very powerful
Nice! What are the biggest challenges you've encountered so far?
@@codetothemoon So I'd like to be able to answer any customer question based on any number of documents the business has provided, without needing to rely on keywords to narrow down the documents. Basically, a good semantic search narrows the results down, and GPT phrases the answer properly.
@@isheanesunigelmisi8400 Good luck with that..
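For readers wondering what "semantic search to narrow results down" might look like in code, here is a minimal sketch. It assumes each document and the customer question have already been turned into embedding vectors by some model (not shown here), and it simply picks the document whose vector has the highest cosine similarity to the query. The function names `cosine_similarity` and `best_match` and the toy vectors are made up for illustration; this is not the approach used in the video.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero length (norm).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Index of the document embedding most similar to the query,
/// or None if there are no documents.
fn best_match(query: &[f32], documents: &[Vec<f32>]) -> Option<usize> {
    documents
        .iter()
        .enumerate()
        .map(|(i, doc)| (i, cosine_similarity(query, doc)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

fn main() {
    // Toy 2-dimensional "embeddings" (hypothetical values, real ones
    // would come from an embedding model and have many more dimensions).
    let documents = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.7, 0.7]];
    let query = [0.1, 0.9];
    println!("best document index: {:?}", best_match(&query, &documents));
}
```

The selected document would then be placed in the text generation prefix so the model can phrase an answer from it.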
This was a really great video. Thanks for putting it together!
Thanks for watching Brad!
Would be interesting to see if this could use the newer Bloom model. Any reason it wasn't? Hardware?
Great question, I'm currently looking into this. What I do know is that rust-bert seems to accept model files in a different format than the Transformers library in the Python world, and Hugging Face only offers the ".ot" format for a small subset of all the models they host. They don't offer the .ot format for Bloom - I'm not sure if that is because nobody has gotten around to doing the conversion or if there is some technical limitation within rust-bert that precludes such a thing.
as of oct 30, 2022 brew gives a warning that libtorch is deprecated and recommends using the pytorch package instead.
I can't execute cargo run. Did you manage to solve it?
Here's some libtorch error
This video deserves more views
The space background! I like it!
Thanks! Was worried it might be a little distracting, glad to hear it didn't miss the mark 🙃
great content mate, keep it up!
Thanks Pablo!!
Very interesting. Is there any way to speed up the text generation? Considering the amount of processing power and time it takes, this is not really practical as it is.
I think having a CUDA capable GPU (I think that means Nvidia only, somebody keep me honest here) is the best way to speed things up. To your point - I think that would be the only way to realistically use this in production. I haven't tried it locally, but if you see how quickly Hugging Face and OpenAI respond in their web interfaces, it seems like it's near instantaneous.
compiling with `--release` for starters will help a lot :p
I'm planning to make a personal AI assistant that has tie-ins to Obsidian for note taking and to various IoT/WoT devices around my home, so that I can hopefully have an audio interface to take notes, control my home, etc.
nice, that sounds like a fun project! definitely report back on how it goes if you can!
@@codetothemoon of course! Progress is relatively slow as I work on other projects and slowly fumble my way through developing it. As of now I just have the vosk model set up in Rust to do voice-to-text conversion, and I'm working on using the cpal crate to get audio directly from a mic. Once I can get input from the mic and translate it to text, I can start looking into an LLM to feed that text into.
good work king, love you
❤️❤️❤️
Great content as usual. Having tinkered with AI and Solana/Rust for the last two years, I agree the possibilities are limitless. The problem is making viable apps for real-life use cases with limited resources.
What is generating your "quick fixes" options?
Mine only ever shows "no quickfixes found".
I am going to see if this can be used to build a domain-specific Q/A system that can answer questions about the rules that apply to a situation. :-) I'll let you know how it goes
That sounds like a really interesting use case, can't wait to hear about the results!
links missing from description. pls dont hurt my feeling like that again
💔 oh no! I forgot - what link did I promise? lmk and I'll add it :)
@@codetothemoon haha looks like I was hoping for the link to gpt neo on huggingface. no worries though, i found it 😁
This is such a cool and exciting video, but I cannot run this demo on my machine, which has 16G RAM. I ran it from the console and had a system monitor open - it consumed all 14.2G of RAM that my system had to offer. Could you maybe do an updated version with Stanford Alpaca or something, and also maybe talk a little bit about how a developer could go about modifying this to create something new?
thanks! Yeah this model is resource hungry! There seem to have been a ton of developments in the space since this video was made, and I definitely plan on doing more on the topic.
Amazing! Looking forward to it 👀
Huh, I wonder if Rust can work with Stable Diffusion, considering Rust has a transformers port
Good question, I haven't played with stable diffusion yet - it sure looks incredible. Would be cool to be able to use it in a Rust stack
Fantastic video. Is there any advantage of using this rust method compared to a direct python module?
you get to use rust instead of python. I can think of no greater benefit you could possibly want
Thanks Mian - yeah I don't think there is much performance benefit as the inference is being done by the same low level code whether you're invoking it via Python or Rust. But yeah, some might see the Rust language itself as a huge advantage 😎
can anyone tell me what is the .. operator doing at 6:14?
edit: found it, it's "update struct syntax"
correct! It definitely comes in handy sometimes!
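For anyone else who got stuck on the `..` at that point, here is a small sketch of struct update syntax. The `GenerationSettings` struct and its fields are hypothetical stand-ins, not the actual config type from the video; the point is only that `..Default::default()` fills in every field you did not set explicitly.

```rust
/// Hypothetical settings struct, used only to illustrate the syntax.
#[derive(Debug, Clone, PartialEq)]
struct GenerationSettings {
    max_length: usize,
    temperature: f64,
    do_sample: bool,
}

impl Default for GenerationSettings {
    fn default() -> Self {
        GenerationSettings {
            max_length: 20,
            temperature: 1.0,
            do_sample: true,
        }
    }
}

/// Overrides one field and takes the remaining fields from the default value.
fn custom_settings() -> GenerationSettings {
    GenerationSettings {
        max_length: 100,
        // Struct update syntax: copy all other fields from this expression.
        ..Default::default()
    }
}

fn main() {
    let settings = custom_settings();
    println!("{:?}", settings);
}
```

The expression after `..` can be any value of the same struct type, so it also works with an existing instance instead of `Default::default()`.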
is there any github code of this sample? please
github.com/Me163/youtube/tree/main/bert_test
I'm curious about how you edit your videos
I used to edit them myself, but I've used a few fantastic editors for the last 6 or 7, so I probably won't be much help here :(
What's the font in your terminal?
I like it 👍
Thanks, I actually wasn't sure what I was using. I just looked it up and it's Monaco!
I'm glad you answered, cause I wasn't able to find it by myself. The best match was Osaka Mono, but it's not the same.
I wonder if this thing can generate code. Can you publish this to your github repo?
good question, I haven't tried that! For some reason I neglected to commit the code for this one, here it is github.com/Me163/youtube/tree/main/bert_test
Can this model be exported to ONNX?
I went to set this up on Ubuntu 22.04, and everything was fine until I went to execute "cargo run". Seems like there's a libcurl related problem. I have libcurl installed, but it seems to be complaining about inflate/deflate issues, so I checked to make sure I had zlib installed too, and I did. Anybody else run into this issue?
I didn't have a problem using wget
I had some problems on Ubuntu 22.04, but that was because I tried to specify the libtorch path manually (like he showed at the beginning).
But I installed libtorch using:
sudo apt install -y libtorch-dev libtorch-test libtorch1.8
So, theoretically, the installation location is already in my PATH, and the OS knows where to search for it.
So when I just don't specify the libtorch installation location manually, it works just fine.
This is a really interesting technology. I am definitely going to mess with it a little.
What about hard rules? Let's say you have a data contract for cell phones and you offer 5 GB of included data every month and unused capacity will be lost. How can we enforce the model to obey this hard constraint?
I think the more advanced models should have no problem adhering to these hard rules, but I'm not sure about GPT-Neo 2.7B. This is the sort of rule I would explain in the text generation prefix. I would expect it to adhere to it the vast majority of the time, but I'm not sure about 100% of the time. Maybe with the right tuning parameters!
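A rough sketch of what "explaining the rule in the text generation prefix" could look like: the rules are written out in plain language ahead of the conversation, and the combined string is what gets passed to the model. The `build_prompt` function and the exact wording of the prefix are assumptions made for illustration, not the code from the video, and as noted above this does not guarantee the model obeys the rule.

```rust
/// Builds a prompt that states the hard rules before the customer's message.
/// The layout ("Customer:" / "Agent:") is an illustrative convention only.
fn build_prompt(rules: &[&str], customer_message: &str) -> String {
    let mut prompt = String::from("The following rules always apply:\n");
    for rule in rules {
        prompt.push_str("- ");
        prompt.push_str(rule);
        prompt.push('\n');
    }
    prompt.push_str("\nCustomer: ");
    prompt.push_str(customer_message);
    // Leave the agent's turn open for the model to complete.
    prompt.push_str("\nAgent:");
    prompt
}

fn main() {
    let rules = [
        "Each plan includes 5 GB of data per month.",
        "Unused data does not carry over to the next month.",
    ];
    let prompt = build_prompt(&rules, "Can I keep the data I didn't use last month?");
    println!("{}", prompt);
}
```

Since adherence is not guaranteed, a production system would likely also need to check the generated answer against the rule before sending it.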
How much RAM does it usually need?
Good question, I wasn't actually watching my resource usage when I was trying it out. My guess is that it would use at least as much memory as the model occupies on disk, so at least 10GB.
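As a back-of-the-envelope check on that guess, the weights alone need roughly (number of parameters) x (bytes per parameter). The sketch below assumes a model with about 2.7 billion parameters stored as 32-bit floats; both numbers are assumptions, and real usage will be higher because of activations and other runtime overhead.

```rust
/// Rough lower bound on the memory needed just to hold the weights, in bytes.
fn weight_bytes(parameters: u64, bytes_per_parameter: u64) -> u64 {
    parameters * bytes_per_parameter
}

fn main() {
    let parameters: u64 = 2_700_000_000; // assumption: about 2.7 billion parameters
    let bytes_per_parameter: u64 = 4; // assumption: weights stored as 32-bit floats
    let bytes = weight_bytes(parameters, bytes_per_parameter);
    // Decimal gigabytes (1 GB = 1e9 bytes).
    println!("~{:.1} GB for the weights alone", bytes as f64 / 1e9);
}
```

Under those assumptions the result is about 10.8 GB, which is consistent with the "at least 10GB" estimate above and with the reports of 8G and 16G machines running out of memory.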
dude...so sick
I agree!
It will be
Worked, thx
Nice Mar!
How much RAM did it use? My attempt used almost all of it - 7 of 8G RAM - plus another 15G of virtual memory, and then crashed :(
Am I doing something wrong with my local resources, or does it really need 23G+ RAM?
Damn you type fast
150GB+ RAM!??
switch from emacs to vscode hh :)
So far I use vscode for all my videos, primarily because I get the sense that it's what most people use. When prototyping I usually use Helix or neovim
@@codetothemoon sure thing and i like ur content, keep up
thank you :)
That is it. I'm done thinking in this world. ;) Onto another universe. Oh, wait, no AI there yet. Hm, maybe I'll stick around for a bit... ;) here.
I agree, dimensions that have AI are vastly preferable to those that don't
I'm working on an OpenAI content generator. The process you described could be great for building synthetic training data on the cheap. Could then use Fluvio for Rust-based data pipelines. The day of Rust AI adoption is quickly approaching 🤗
can't wait!
you are very good
thank you very much!
i think i'll win next cc, thank you
cc ?
Creative Commons ?
Muh dude, please no more

    loop {
        let mut line = String::new();
        std::io::stdin().read_line(&mut line).unwrap();
        // ...
    }

either do (ergonomic; needs `use std::io::BufRead;`)

    for line in std::io::stdin().lock().lines().map(Result::unwrap) {
        // ...
    }

or (efficient; also needs `use std::io::BufRead;`, and `stdin` must be `mut`)

    let mut line = String::new();
    let mut stdin = std::io::stdin().lock();
    loop {
        line.clear();
        stdin.read_line(&mut line).unwrap();
        // ...
    }

Creating the String inside the loop means it will allocate every time (once it begins being written to, and potentially more than once per line for long lines).
Calling `std::io::stdin()` always checks lazy initialization.
Reading from a `Stdin` instead of a `StdinLock` locks a mutex, even if the next line is already in the BufReader's buffer!
In this video the performance impact is absolutely dwarfed by running the model, but this kind of REP loop is something you do in a lot of your videos, so switching to either of the other approaches would make sense.
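For completeness, here is one way the buffer-reuse variant from the comment above could look as a full program. It is a sketch, not the code from the video: the `repl` function name and the "you said:" echo are placeholders for whatever the loop body actually does (for example, running the model). Writing it against the `BufRead` and `Write` traits is a choice made here so the loop can be exercised without a real terminal.

```rust
use std::io::{self, BufRead, Write};

/// Reads lines from `input` until end of input, reusing a single `String`
/// buffer, and echoes each line to `output`. Returns the number of lines read.
fn repl<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<usize> {
    let mut line = String::new(); // allocated once, reused for every line
    let mut count = 0;
    loop {
        line.clear(); // drops the old contents but keeps the capacity
        // read_line returns Ok(0) at end of input
        if input.read_line(&mut line)? == 0 {
            break;
        }
        // Placeholder for the real work done with each line.
        writeln!(output, "you said: {}", line.trim_end())?;
        count += 1;
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Lock stdin and stdout once, outside the loop.
    let stdin = io::stdin().lock();
    let stdout = io::stdout().lock();
    repl(stdin, stdout)?;
    Ok(())
}
```

Unlike the `unwrap()` versions above, this one stops cleanly at end of input instead of looping forever on empty reads.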
This gentleman sounds exactly like zuk 🤣
Hah! Thankfully I don't see running a $300B company anywhere in my immediate future...
keep doing good work.
Thanks Tunç!
correction: i'm making a multi-billion $ start-up with this =P (appreciating the encouragement) 😀
I wish I was!
nix-shell -p libtorch-bin ;)
"SaaS service"
In retrospect, "SaaS product" probably would have been less redundant
@@codetothemoon yeah lol, nice tutorial either way
Suggestion: You could edit the sound of your keyboard when sped up into something pleasant.
Hah, yeah that makes sense I'll figure something out!
@@codetothemoon Pls no! It's very pleasant! I love it
I hope it doesn't change. I love the sped up keyboard sounds!! It's so satisfying
I never understood why some people like keyboard sounds. For me personally, the quieter the better. Let alone loud and sped up lol...
Park it, Shinde
I'm going to use it to scam grandmas out of their hard earned bitcoins.
(1) I'm not sure how easy it'll be to find grandmas with bitcoins (2) I'm sure you can think of something better
next video: praise to .eth
Are we talking about the top level domain .eth?
Could this be used to type out quick and dirty Rust code? Like a cross platform mobile app.