Links
gamefromscratch.com/nvidia-chatrtx-easy-local-custom-llm-ai-chat/
-----------------------------------------------------------------------------------------------------------
*Support* : www.patreon.com/gamefromscratch
*GameDev News* : gamefromscratch.com
*GameDev Tutorials* : devga.me
*Discord* : discord.com/invite/R7tUVbD
*Twitter* : twitter.com/gamefromscratch
-----------------------------------------------------------------------------------------------------------
I have used LM Studio; it will run on lower-end hardware, but expect very bad performance. You can try models that have been quantized, which will perform better but will be less precise (they can degenerate into random text). And I do not remember it having an easy way to reference files.
About RAG, be aware that you want a model that has been trained on the general subject. That is: if a model is specialized in poetry, it probably won't do well with code even if you give it all the textbooks. Why? Because it is trying to rhyme; that is the pattern it learned.
On the other hand, with the convenience of ChatRTX, you should be able to give it your project files - those that you would not dare upload to an online AI - and have it give you results based on them, specific to what you are doing. And let that be another reason to put in comments and choose good variable names: the better the context you can give the AI, the better the results.
Finally, do not forget: Garbage in, garbage out.
The quote from HAL made me have to check this video. A local LLM is a cool idea. Regarding NVidia, they basically bailed on Linux, which really sucks, but hopefully that won't stop this from being made available on Linux soon.
Does it work with PyCharm code, like Python and Lua code, for game and game engine development?
Turn off the internet. What does it respond with?
The same. This doesn't use the internet...
Well, they've been Stupidly Simple for a couple of years now with the WebUIs like Oobabooga.
So this isn't exactly anything impressive.
Yoh,
Will definitely spend some time playing with this. I was using Chat to help me write a story & lore bible, but Chat can only remember so much before you have to start a new conversation.
Not to mention Chat's constant need to equivocate over nuanced or political ideas. I spent so much time getting it to see holes in its logic. No doubt this system will still have issues, but at least I won't have to keep starting over.
Thanks as always
RAG does not go to the web. With retrieval-augmented generation, the local data you provide is embedded (converted to numerical form) and stored in a vector database of some sort. Then, when you make a request to the chatbot, your query is also embedded using the same method as the data you previously embedded. A semantic search is performed (generally using cosine distance between the vectors) and the relevant data is sent to the LLM in its context window so it can base its response on the content in your data. This is done specifically to prevent hallucinations by the LLM, since it has never seen the data in your documents.
'Vector database' sounds like the cloud or third-party data-mining operatives who are only too happy to pay for the privilege. People also have to understand that one of the AI models here is linked to Meta, which is owned by Mark Zuckerberg, who is famous for sharing people's data.
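To make the embed-then-search flow from the comment above concrete, here is a toy, fully local sketch. The fixed vocabulary and bag-of-words counting are stand-ins for a real trained embedding model, and the document strings are made up for illustration; only the shape of the pipeline is the point.

```python
import math

# Toy vocabulary; a real system uses a trained embedding model instead.
VOCAB = ["godot", "physics", "forces", "shaders", "glsl", "rigid",
         "bodies", "language"]

def embed(text):
    # "Embedding": convert text into a numerical vector (here, word counts).
    words = text.lower().replace(".", "").split()
    return [float(words.count(w)) for w in VOCAB]

def cosine_similarity(a, b):
    # Semantic search typically ranks stored vectors by cosine similarity
    # to the query vector.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# The "vector database": local documents are embedded once, up front.
docs = [
    "Rigid bodies in Godot respond to physics forces.",
    "Shaders are written in a GLSL-like language.",
]
index = [(doc, embed(doc)) for doc in docs]

# At query time, the question is embedded the same way, the closest
# document is retrieved, and it goes into the LLM's context window.
query = embed("how do physics forces work in godot")
best_doc, _ = max(index, key=lambda pair: cosine_similarity(query, pair[1]))
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: ..."
```

Real systems swap in a proper embedding model and an approximate-nearest-neighbor index, but the pipeline is the same: embed, search, and put the winning chunks into the prompt.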
I use the Linux version of Ollama through WSL with OpenWebUI as the frontend; it already has RAG functionality, and everything is installed with basically three commands. Llama3 8B works great, and I can hook it into VS Code through the Continue extension and have a personal local Copilot.
Came to comment something similar.
Now if OpenWebUI gives the functionality to fine-tune models further...
Is there a good resource for a smooth-brain to get started on this track? Thanks!!! 🙏
@@rewindcat7927 NetworkChuck has recently done a video explaining everything. Set up Ollama and then simply install the Continue extension in VS Code.
Would it run well with a 6GB RTX?
@@jimmiealencar7636 Yes, but unless you run natively under Windows, you also need 16GB of system RAM. Llama3 8B uses about 3.5GB of VRAM on my 3050. And if everything fails, you can always run it on CPU only, but it will be slow.
Can confirm this does *not* install on a 2000 series RTX card. Tried on my 2080Ti and the installer goes nope.
How can I fetch the Unreal Engine docs to PDF?
🤔
Getleft or another website downloader, and then an HTML-to-PDF converter.
@@UltimatePerfection Recently they moved the docs to forums LMAO 🥵
It's not working
At this point it is still a demo. It does not remember the context (your previous questions) or learn anything in the long run. It's just a kind of smart explorer... It can speak your language. It can go over the internet to, say, make a summary of a website page (more or less accurate and buggy). It says it can access YouTube videos (to provide information about them), but it can't find the video you are asking for, only some other random video (?), so it's useless for now.
My experience is that these small models fall apart really quickly, especially when it comes to generalized questions. For programming, they seem to do a bit better, but the difference is still quite noticeable if you ask small and large models the exact same question.
The first "oh, hey, this actually feels pretty close to at least ChatGPT 3.5" for me was Llama 3 70B, clocking in at 42GB in size. I can only fit about half of that on my GPU, and with the rest running on CPU it's pretty slow, like 2 tokens per second.
I have this, and while it runs nicely on my 4090, I still use online tools for PDFs and general research. I understand if you want to keep files private, but with a Chrome extension I can use all the main AI platforms across multiple devices. I also found Gemini 1.5 Pro better for large 700-page PDFs.
As a man who has countless useful epubs and pdfs, this looks very useful. I especially like that it will give you a list of its sources; not exactly a full citation yet but very usable. However, I'm not terribly keen on it being Windows only and it's asking for a hefty graphics card and a lot of disk space for something I can do by hand. I think this is good news and it shows that Nvidia is, if haphazardly, listening to the real concerns with LLMs.
Otherwise it's becoming an extremely tired subject.
What's your primary source for said PDFs and files? I need to start building a collection, with the way things are going.
@@MurphyArtPrints Oh, I've scanned a number of them, and many others are from independent epub sellers like Humble Bundle and a few (legal) torrents. I'm with you on proprietary ebook viewers; it may be more durable and portable than paper but you never know when someone's going to pull the plug.
@@micmacha Not sure how scanned PDFs are gonna perform... I don't know if this does OCR, I'd be surprised if it did. Also, what did you mean by "something I can do by hand"?
That is a bad app. For example, it cannot take into account the previous chat when answering a follow-up question.
No API, no custom model loading, just a simple UI, no updater... (I already downloaded it three times to update it, ~30GB each time; yes, 30GB FOR MISTRAL!)
You mentioned copilot. Does this work as well as copilot if you give it your code folder to train on?
Interesting usage for AI. Seems like it could be handy. For me, I really want an easier time localizing my game. I still need to figure out the optimal way to do that.
Does it support the Arabic language?
Ollama is also an interesting option. It supports Linux, Windows, and Mac. AMD support is in preview on Linux and Windows. It sets up a server that can be accessed via an API or a simple cli chat interface.
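As a rough illustration of the API mentioned above, here is how a request to Ollama's local HTTP endpoint can be built from Python. The endpoint path and field names follow Ollama's documented /api/generate route (verify against the version you install); nothing is sent over the network until you actually open the request.

```python
import json
import urllib.request

def build_ollama_request(model, prompt, host="http://localhost:11434"):
    # Ollama serves a local HTTP API; /api/generate takes a model name
    # and a prompt. "stream": False asks for one complete JSON reply
    # instead of a stream of partial ones.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_ollama_request("llama3", "Explain RAG in one sentence.")

# With an Ollama server running locally, send it like this:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Because it is just local HTTP, the same server can back a CLI chat, a web frontend like OpenWebUI, or an editor extension.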
Nice! but you're too fast man!
It's not training on your documentation/dataset; it's document retrieval. It literally just takes pieces of your documents and inserts them into the prompt.
No, it doesn't.
This is not how RAG works...
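Whichever way that debate lands, the "pieces inserted into the prompt" step is mundane string assembly. A minimal sketch, where the template wording and chunk contents are made up for illustration:

```python
def build_rag_prompt(question, retrieved_chunks):
    # The model never "learns" these chunks; it just reads them as
    # extra context in front of the question, once per request.
    context = "\n---\n".join(retrieved_chunks)
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "How do I enable physics?",
    ["Chunk from page 12 of your PDF...", "Chunk from page 40..."],
)
```

This is also why nothing persists between sessions unless the chunks are retrieved again: the model's weights are untouched.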
That RAG or 'sanity checker' means that it's possible your data is being distributed to 3rd parties for analysis.
No. You can run this in airplane mode. Also, if you're worried about what gets sent where, just open wireshark and check it yourself.
@@flrn84791 What do I look for in Wireshark to see whether my data is going elsewhere ?
I have a feeling this thing needs NVidia hardware that has RTX. My GTX 1060 won't run it.
At 1:20 there are the system requirements: GeForce RTX 30/40 series, RTX Ampere (the ones like the RTX A2000 and RTX A4000) and the Ada-generation GPUs (but those are not for us mere peasants).
the 1060 is ancient.
@@hipflipped yes, it is. What's your point?
ChatRTX's installer has lots of bugs that were never fixed. My PC has Windows 11 24H2, 192GB of DDR5, and a 4090 installed.
I use GPT4All for local LLMs.
AI: Hello user, what are you doing?
Please upgrade your Nvidia graphics card... or you can't continue using our AI service.
Train it on Unreal Engine 5?
You can, if you can get a text or PDF version of the documentation, or enough Unreal Engine books as PDFs. Really, it's a matter of dumping as much documentation into the dataset folder as you can source.
Can one add multiple file/folder locations? Or is it really just one folder that has to be the root? Can it use symbolic links or folder/file shortcuts?
Why do they hate linux so much
How does it handle cross-referencing? What happens if you ask a math question, then ask how to calculate the same thing in Godot? It would need to know and understand the first question and how it applies to the second, instead of just looking up a direct answer in the documentation you give it.
AnythingLLM is probably one of the best RAG chat-with-your-documents programs. It's open source, the developers are dedicated, and it packs a TON of configuration options.
LM Studio is great... I use it quite often. There is also Ollama, but as far as I know it doesn't have a UI; still, it's easy to use.
Thanks for showing this! It seems one other downside is that you shouldn't have too many editors open or in use while using it. I wonder what would break first when local compute is maxed out.
One thing to add: ChatRTX requires Windows 11. 70% of the market is on Windows 10, so it's only for a limited number of users.
Thanks for that, lmao. I'm on Win10 because 11 breaks my dev software and kills my performance. Shame this is Win11 only.
LMDE 6 is the future, Windows can go to hell
It runs fine on win10 for me (using rtx 3070)
I am running this on Windows 10. Working fine.
Get real. I haven't seen a single Windows 10 PC on the market. Not even the cheap ones competing with Chromebooks.
Easy to install and use, easy to train on my own data? Man this thing is gonna be killer for brainstorming and worldbuilding!
This might be the first use case of LLMs I'm interested in. Local is necessary to address the huge environmental cost of GenAI, and the ability to parse your own documentation is interesting.
Llama 3 with Pinokio works great for this as well.
This actually sounds pretty cool.
Dang it... I have so much text in markdown format that is useless as training data for this 😭
Mmh, just change the extension to .txt? :D Also, that's probably something Nvidia will change at some point to add support for different text formats: HTML, markdown, code, etc.
Too slow, given it is 7b running on a GPU.
Does it read image-based PDFs? Or do you have to convert the PDFs into a readable format?
Most likely it doesn't; it isn't stated anywhere, and some people commented on the NVIDIA page that it doesn't see the files.
Win 11 only? Lol, no...
I've got 2060 with 6 GB :(
Nice video. This is good content
Excellent! Thanks.
So Kobold but Nvidia?
This is very interesting 🤔
Need this with agents
They're late to the party: no Llama3, only Windows, only a basic chat interface. Open-source RAG tools are already here.
Did you even watch the first 30 seconds? It's an easier alternative to all the open-source build-it-yourself ones. I think that's great.
@@r6scrubs126 It would have been great (and still late) if it had all the things I mentioned in my comment. I watched it, THEN commented. The menu shows Llama2 13B (@2:05), no Llama3; it's only for window$ (@1:17); and the chat UI is basic (not even sure it does markdown tables). RAG tools are getting common now. If you're happy because you don't know the open-source tools, no problemo!
Llama 3 is not even open source by definition, Mistral is doing a better job
Llama3 isn't open source??
Seriously, I was using it for an entrepreneurial application.
Any HELPFUL comments from those who are already experts on this topic about the better LLMs to use with this from the standpoint of game dev? Since Mike admitted this is not his area of expertise.
Neat idea, but the unnecessarily high system requirements make it prohibitive for most people.
I can run Ollama with llama3 with lower system requirements and make my own GUI.
I like some things about AI, but this is getting pretty boring now.
That's like saying to a new parent that watching their child breathing must be boring.
This is a new type of life developing in front of your eyes. This is History. I do find History boring, but seeing it happen every day is on a different level.
OMG, this is way cooler than I thought.
Awesome video! thank you for sharing