FINALLY! Open-Source "LLaMA Code" Coding Assistant (Tutorial)
- Published Oct 6, 2024
- This is a free, 100% open-source coding assistant (Copilot) based on Code LLaMA living in VSCode. It is super fast and works incredibly well. Plus, no internet connection is required!
Download Cody for VS Code today: srcgr.ph/ugx6n
Join My Newsletter for Regular AI Updates 👇🏼
www.matthewber...
Need AI Consulting? ✅
forwardfuture.ai/
Rent a GPU (MassedCompute) 🚀
bit.ly/matthew...
USE CODE "MatthewBerman" for 50% discount
My Links 🔗
👉🏻 Subscribe: / @matthew_berman
👉🏻 Twitter: / matthewberman
👉🏻 Discord: / discord
👉🏻 Patreon: / matthewberman
Media/Sponsorship Inquiries 📈
bit.ly/44TC45V
Llama code 70b video coming soon!
Can you make a video with a guide on fine-tuning a local LLM?
codellama 70b has been amazing for me so far. It's definitely SOTA among local code models. Can't wait to see tunes and merges like Phind or DeepSeek in the near future.
Will you cover miqu 70b too? Rumors aside, it's closest to GPT-4 for any local model yet and I predict it produces a surprise or two if you put it through your normal benchmarks.
@@orangeraven3869 How is it compared to the latest GPT4 build?
LOL cant wait!
How about the current huggingface leader, the moreh momo 72b model?
Need to sign in to use the plugin. No thanks. That is not completely local.
Are you saying you had to login to authenticate your license to use a local instance of their software for free? 🤯
@@carktok Yes, and that is a deal breaker for many people, believe it or not.
Yeah, but that's mainly because they're paying for the OpenAI and Claude 2 completions APIs so you can use them at no cost. Also, I think you can self-host Cody without logging in to Sourcegraph if you want to.
Cody is open source; you can run it completely locally.
@@vaisakh_km It's not open-source if they force you to login.
My machine, my rules!
I typically don't rely too heavily on AI when coding. I use TabbyML, which has a limited model, but it works for me. It's completely open-source and includes a VSCode extension too. It's free and doesn't require login. I use the DeepSeekCoder 6.7B model locally.
thanks for the hint, I was looking for that. I hate that cloud crap.
Cheers for the suggestion.
VSCode? Why not VSCodium?
Just tried tabby, thanks!
How did you run it? Can you give me the steps 😅?
Matt Williams, a member of the ollama team, shows how to make this work 100% free and open source in his video "writing better code with ollama"
thanks for the heads up, man
This is how I am set up. Works great!
link please?
@@LanceJordan "writing better code with ollama"
BTW, there's an issue on YT with putting links into a comment (even YT links); seemingly a lot of comments with links go on the missing list!
Does it have knowledge of the entire codebase?
The problem with Cody is that it only does autocomplete with local models, which you can do with many VSCode extensions like LLaMA Coder and many more. All the nice features use the online version, which is extremely limited in the number of requests on the free plan (a bit of expansion of the monthly numbers would make it easier to test, or to build a serious interest later leading to a paid plan). There is also a considerable number of extensions that do those same nice features (chat, document, smells, refactoring, explain, and tests) all in one extension and for free, using local models (Ollama or OpenAI-compatible endpoints). Cody does these features a little better and has better interaction with the codebase, probably due to the bigger context window (at least from my tests) and a nicer implementation/integration in VSCode, but unless you pay you're not really going to benefit from them because of the low number of free requests, which aren't really enough to seriously dive in.
Matthew just confirmed in a post above that the limitations of the free tier do not apply if you run the model locally.
Can you suggest any particular alternative among those "different number of extensions"?
@@alx8439 There are many; I've tested a few so far, but I'm not using them at the moment, so I don't remember the names.
What I did was search for extensions with names like "chat, gpt, AI, code, llama"; many will come up, and then you have to test them one by one (this is what I did). I suggest you go for those that already show customization options in the description and screenshots, like a base URL for Ollama or OpenAI-compatible local servers.
I think one of them has "genie" in the name.
I'm curious about those too @@alx8439
I'll answer myself then: Twinny, Privy, Continue, TabbyML
How is it local if I have to authorize with a 3rd party 😮
it is not, it is clickbait
nothing is free dude.
Pay a little money and have fun my people!
Even if its not world changing breakthroughs, the speed at which all this tech is expanding can not be overstated. I remember one of the research labs was talking about how every morning they would wake up and another lab had solved something they had just started/were about to start. This is a crazy time to be alive, stay healthy everyone.
Is it Cody that understands? I think it is the LLM that does. Also, why $9 if I am running everything locally?
I know it is a sponsored video, but is there any open source alternative to Cody extension?
We need a completely local solution, because Cody may use telemetry and gather some information behind the scenes.
Continue does chat and fix, but doesn’t do autocompletion, and is quite unstable. There is another one that does autocomplete with ollama (LlamaCode).
You have collama which is a fork of Cody and uses llama.cpp
@@Nik.leonard Phind, best free one.
Twinny, Privy, TabbyML
One more issue with Cody is that it can only take 15 files as context at a time, while I need an assistant that can take the whole project folder.
The only time I'm coding is while on a flight. I'm so glad I can use an LLM from now on!
About 40 minutes of battery life.
Yep, I ran LLMs on my 15-watt 7520U laptop. My 5900HX would gobble the battery even faster, I think.
Wait, you have GitHub Copilot enabled there too, which shows up in your editor.
Are you sure the completion itself is provided by Cody with the local model, and not by the GitHub Copilot extension?
In the video there's a completion to accept, and it has Cody's icon, so you can see that the code is generated by Cody.
Since you have to sign in, does it send any data upstream when you use local models?
OK, it worked, kinda funny: I wrote the first two lines and the last line, and Cody did the rest after I told it to "generate fibonacci sequence code"... thanks, might be useful some day. A bit flimsy, but interesting. Next I'll try whether it can translate code too.
{ prints Fibonacci numbers below 100; note the integer result is never assigned }
function Fibonacci: integer;
var
  a, b, c: integer;
begin
  a := 0;
  b := 1;
  while b < 100 do
  begin
    writeln(b);
    c := a + b;
    a := b;
    b := c;
  end;
end;
Am I misunderstanding something, or are you advertising this as an open-source solution while it's still dependent on a 3rd-party service? What exactly is Cody? I would have assumed that if it is completely local, it's just a plugin that lets you use local models on your machine. Yet you describe it as having multiple versions with different features in each tier, including a paid tier. How exactly does that qualify as open source?
He shows you how fast it is.
Unfortunately Ollama is still not on Windows, and Linux/Mac isn't an option for me, so I'll have to give this a pass unless it can work with LM Studio.
Has anyone manage to make it work with LM Studio?
I'm looking for a local code assistant. I don't mind supporting the project, with a license for example, but I don't want to log in each use or at all. How often does this phone-home? Will it work if my IDE is offline? Pass.
wait for llama code 70b tutorial
@@ryzikx that should require > 40GB VRAM.
@@ryzikx 70b would require 2x 4090 or 3090.
34b takes 1.
Tabby and code llama can do that , let me find the link to the playlist
Don't think this is an option unless you've got a pretty good graphics card. I set mine up and gave it something to autocomplete. I heard my Mac's CPU fan going crazy, and it took about 20 secs to get a 5-token suggestion (it was correct tho :P).
Get an M3 Max 64 or 96GB MacBook Pro. The inference speed is really good. For development it seems like you need a really good Mac nowadays.
It’s not actually fully offline. It still uses their services for embedding and caching even when using local models.
Can you please also include steps for those who are using Windows
Thanks so much! We need a video on how to train a local model via LM Studio / VS / Python
or just use oobabooga and stop using junk?
Nice! Seems to work pretty well on my linux laptop. Would be great if I could save my 10 euros a month for copilot.
Awesome video! This video series is the best source for cutting edge practical AI applications bar none. Thanks for all the work you do.
the default model for cody/ollama to use is deepseek-coder:6.7b-base-q4_K_M. You have to change this in raw json settings if you want to use a different model.
how would you do that? appreciate the help.
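A sketch of what that raw settings.json change can look like. The key names below are taken from the Cody version current around the time of this video and may have changed since, so treat them as an assumption to verify against the extension's own settings list:

```json
{
  // Use the experimental local Ollama provider for autocomplete.
  "cody.autocomplete.advanced.provider": "experimental-ollama",
  // Point Cody at the local Ollama server and pick the model;
  // pull it first with: ollama pull codellama:13b-code
  "cody.autocomplete.experimental.ollamaOptions": {
    "url": "http://localhost:11434",
    "model": "codellama:13b-code"
  }
}
```

VSCode's settings.json accepts comments (JSONC), so the inline notes above are valid there.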
Enterprise AI is the best alternative for OpenAI, always helpful with coding questions
Despite the extension being available in the marketplace of VSCodium, after registration, it attempts to open regular Visual Studio Code (VSC) and doesn't function properly. It's unfortunate to encounter developers creating coding helpers that turn out to be broken tools.
Apparently they only support the given models. And the llama option actually only uses codellama-13b. Basically it can't run something like Mistral or other llama models. Am I right?
The code for the Fibonacci function is correct in the sense of a specification. But as an implementation it's totally inefficient, with exponential O(2^n) running time.
(In functional languages where all functions are referentially transparent, results can be cached transparently; this is called "memoizing". Python doesn't do it automatically, though the standard library's functools.lru_cache comes close.)
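As a quick illustration of the memoization point in Python (a generic sketch using the standard library's functools.lru_cache, not code from the video):

```python
from functools import lru_cache

def fib_naive(n: int) -> int:
    # Naive recursion: the same subproblems are recomputed over and
    # over, giving exponential O(2^n) running time.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    # Same recursion, but each fib(k) is cached after its first
    # computation, so the total work drops to O(n) calls.
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

print(fib_naive(10))  # 55
print(fib_memo(30))   # 832040
```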
I followed your instructions and I failed at 2:38; because I'm using Linux, I'm seeing a different output. And thanks for your assistance.
Thanks for this awesome tutorial, how to do that for Windows os??
🎯 Key Takeaways for quick navigation:
00:00 💻 *Introduction to Local Coding Assistants*
- Introduction to the concept of a local coding assistant and its advantages,
- Mention of the coding assistant "Cody" set up with "Ollama" for local development.
01:07 🔧 *Setting Up the Coding Environment*
- Guide on installing Visual Studio Code and the Cody extension,
- Instructions on signing in and authorizing the Cody extension for use.
02:00 🚀 *Enabling Local Autocomplete with Ollama*
- Steps to switch from GPT-4 to local model support using Ollama,
- Downloading and setting up the Ollama model for local inference.
03:39 🛠️ *Demonstrating Local Autocomplete in Action*
- A practical demonstration of the local autocomplete feature,
- Examples include writing a Fibonacci method and generating code snippets.
05:27 🌟 *Exploring Additional Features of Cody*
- Description of other useful features in Cody not powered by local models,
- Examples include chatting with the assistant, adding documentation, and generating unit tests.
07:04 📣 *Conclusion and Sponsor Acknowledgment*
- Final thoughts on the capabilities of Cody and its comparison to GitHub Copilot,
- Appreciation for Cody's sponsorship of the video.
Made with HARPA AI
I thought I was getting a video on Llama, not a sales pitch for cody or w/e it's called (i dont care)
thanks for the video, this is absolutely a blessing of an assistant
If you love open source and hate products which are having strings attached and spying on you, prefer to use VSCodium instead of VSCode which is having a lot of telemetry included by default
Does anyone know if the 500 autocompletions per month on the Free tier also apply if we run codellama locally?
You get unlimited code completions with a local model.
@matthew_berman it said "Windows version is coming soon", so I had to stop at the download step and cannot continue this tutorial.
Not everyone has a Linux machine or a powerful Mac.
Could you warn people about prerequisites before starting new videos? That would help, thanks.
Is it possible to use it in a Windows and WSL system? If yes how we should install LLaMA?
Same steps
If someone gets an error when running `ollama pull codellama:7b-code` in the terminal, just close and reopen your VSCode.
ollama : The term 'ollama' is not recognized as the name of a cmdlet, function, script file, or operable program
is there a working tutorial for windows 10?
restart your pc
Now the only thing I need to figure out is how to add a command to the Cody pop-up menu or something for "translate from Go language to Pascal language", so I don't have to re-type this constantly... testing a big translation now...
What an amazing new development! Thanks for you video.
A question: can I use this to completely translate a Python code repository to C++, with the goal of making it run faster? How exactly would we go about doing this?
Please make a video on using GPT-Pilot with LLaMA locally. I appreciate all the videos you do; they have taught me a lot.
Please make a tutorial on how to use AlphaGeometry
This is so cool, but doesn't the Cody login kind of invalidate the local benefits? A 3rd party still gets access to your code.
Yes. I don't know how, or whether, the code is retained long term once you start chatting with your codebase. Plus, the free version allows a very limited number of requests per month: 500 autocomplete requests (which you would probably burn through in a day or two, given that a request fires a few seconds after you stop typing). That part is solvable with a local model, but then you only have 20 chat messages or built-in commands per month, which makes them useless unless you choose the paid plan.
How can you not enjoy creating unit tests?
Who's here for the Chaos that comes when he starts evaluating the models on BabyAGI?
Cody is love, I'll continue using it even after 14th of February because it's just that amazing.
Hi Matthew, it seems Cody "Pro" is not free anymore, or at least it isn't obvious how to get it free with either the current stable or the beta. Arch Linux and VSCodium here. I would like more AutoGen Studio LOW-CODE real-world examples, including agents that do web search + retrieval and embed the results together with uploaded PDFs to get an offline RAG with online updated search.
*sigh*Any idea when Ollama is coming to windows??
correction. By default, it uses Claude 2.0
It worked. I made a folder, went there, and ran the command: PS G:\Models\ollama> ollama pull codellama:7b-code
So, Pro is free for 2 more days? 😁
Thanks for this awesome tutorial
What about Windows? Any way to run llama?
It installed itself on Windows in something messy like: C:\Users\skybu\AppData\Local\Programs\Ollama
This ollama pull is apparently for downloading a model... I guess it can be stored anywhere...
Nova editor needs support for this.
Llama? More like lame clickbait. It's not local if you're required to login.
Damn... That is super dope.
Mr Berman, thanks for this wonderful tutorial. Adding the terminal commands you use to the description below would make it easier to copy and follow along. Thank you again 🤗.
Love your channel Matthew! For me however, 100% Local is not having to have an account with an external vendor to run your coding assistant completely locally. I'm looking for just that.
You lost me at the terminal step. How do you get into ollama? Is that its folder?
It now looks like once code is selected from pull down list: CodyCompletionProvider:initialized: experimental-ollama/codellama:7b-code
can it be run using text generation web UI
OH WOW! After installing the official Go language extension from Google (and also the OmniPascal language extension), CODY can translate from Go to Pascal! LOL VERY IMPRESSIVE! THANKS!!!! =D
So the local one is the 7b version, not the 70b? Or is it a typo in the release?
70b was released and it can be run locally, but it is a massive model and should require around 40GB VRAM.
You should’ve mentioned your sponsor at the beginning of the video, or when you started talking about Cody.
That's fucked up.
ollama for Windows is available now
Hi, thanks! I use ChatGPT 3.5 for generating Python code by just describing what I want. It kind of works... In your opinion, is the solution you propose better than GPT-3.5?
Way better than gpt3.5, gpt3.5 is pretty outdated even for simple tasks.
What to do when you love the idea but you have a windows machine?
Is there another interesting alternative to Ollama that is available for Windows?
Why does this extension force you to sign in if you are using a self-hosted Ollama service? That's really weird. Seems like they just want to collect your data.
because they plan to monetize
It also automatically opened a command prompt... I can proceed from there... plus there is an item in the start menu... probably linked to this messy installation.
can you select the OpenAI one and run it through LM Studio locally too?
Does this method also allow completion for large code bases, like you covered in a previous tutorial using universal-ctags? Or do you still have to download and use universal-ctags? I think it was your aider-chat tutorial. I do not work with Python, so using this VSCode extension and Cody is much better for me (front-end developer using HTML, CSS and JS).
Sucks that only 2 weeks after this video was posted, Cody already makes it very hard to 'not pay' and use 100% locally...
Solution: Use Continue... Cody doesn't allow self-hosting anymore without an Enterprise ($$$) license.
I have a similar setup, but I'm encountering difficulty getting Cody to function offline. Despite specifying the local model (codellama) and disabling telemetry, the logs indicate that it's still attempting to connect to sourcegraph for each operation.
I get some strange window that says "edit instruction code"; I guess I have to tell it what to do... "generate fibonacci sequence code", perhaps?
It does not work for me. The extension seems to prefer connecting to Sourcegraph over the internet, even though it shows codellama selected from unstable-ollama. Inference simply does not work if I unplug the wire.
Try other, better extensions. There are a number of truly open-source ones which run locally, unlike this gimmick: Privy, Twinny, TabbyML, Continue, and many more.
And Ollama doesn't work for Windows. meh. skip.
A local coding assistant sounds great indeed, but unfortunately the data policies of VSCode make your code transparent to Microsoft anyway; if you want to keep your code and the project you are working on under NDA, you will likely have to fix that as well.
1:17 - Um, why Visual Studio Code when there's VSCodium available?
I'm saving this
Lost me at sign in
Maybe I should have left out the word "from" so it's not confused with a language; re-trying with 400 lines of code.
Please tell me this thing has a mute button, I don't want to listen to robot small talk
Thank you very much for your reply. What type of HW is required to run this model locally?
Can you write in the Title for what OS this Tutorial is?
It's not working... I get an error where it says it can't find the model :(
However, I did not yet install the Go extension; maybe if the Go extension is installed, Cody can then do code translation from Go? Hmm, not sure yet... probably not... but just maybe...
Only real issue with this model (70b) right now is that it's slow as hell on consumer hardware. I've got a 4090 and 128GB of RAM and it's still super slow lol.
Same here. 70b takes 2x 4090s to fit in VRAM.
Great, but what are the requirements for Microsoft servers and clients?
Well, at least it works function by function. Not sure yet how good its translation is... I'm new to Go...
That's great! But how is it magically linked to Ollama? How do I specify another Ollama-hosted model (13b/34b)?
This is awesome
Good tutorial!
Why isn't it working??
It's just cycling my message with cody
small models are the future
OK, I see ollama is a command-line tool and it has a "pull" command, so this is not git pull.
I am using 2017 MacBook Air. Will using it be instantaneous?
Is it better than chatgpt4?
Could someone send me a link to learn about all the concepts of LLMs (models, data, parameters, and how they work in general), to support the community and understand how AI works?
It failed to do a big translation... so this is a bit awkward... not sure how to tell whether a text file is within 8000 tokens; I guess by counting words or whatever... awkward...