OMG, you actually did it, thank you so much man! Know that everyone is searching for the Qwen2 models on UA-cam. If you make another video, for tomorrow or whenever, testing and showcasing how we can get the best out of it for our custom data, like a RAG app, or how we can tune it with agents and build workflows... you are the creative content creator here, man. Much love and respect ❤❤
I set this all up in a docker container with help from this vid. Thanks for the content. There were some differences from being in docker, but easily figured out.
Love the channel! Would love to get links in the description, and chapters as well.
Thanks and all the best!
Suggestion: add the shell commands to the description. 👍 if you agree. Love your videos!
thank you for the contribution codeking
Would be awesome if you could make a video about where to find live, updated LLM rankings. Every week new LLMs come out, but it's very hard to compare them against other LLMs to see which one performs best.
Thank you for the videos.
Great guy this guy
This is super helpful. Please, can you make a tutorial about AutoGPT using this LLM or any local LLM? I think that will be really powerful.
Users can define templates.
Thank you!! Is there any way to use this for a Devin-like experience?
I'll cover a Devin-like experience in the upcoming days. Stay tuned!
@@AICodeKing love your video! Keep it up! Thank you!!
4:17 What config file is this?
Did not find it on the first try, but it is right there on my Ubuntu system :)
Installed the pip packages (had to use sudo due to permissions)... that config file doesn't exist for me, even with sudo.
Also, this all seems to target local (on-box) servers. Is there anything I can point at a central server? I've set up a host with GPUs, an Ollama Docker container with GPU access, and models, but I've had trouble connecting to it properly. I followed your video for twinny/codestral: chat works, FIM doesn't.
Maybe check out aider and show how to connect to different providers (Gemini, local, Groq, since those require extra steps unlike GPT and Claude)? aider is benchmarked and seems to surpass Devin and others. Seems cool!
Haha! I have already done a video on it 4-5 days ago. Check it out!
@@AICodeKing I am tripping, nvm, thanks lol. Maybe focus not only on coding but also other cool tools or UIs like you did before? I really want to see an Open Interpreter tutorial!
Yes, it's on my lineup. Will try to cover it this week!
😍
If it is that good, why does it rank worse than Llama 3 70B in the Hugging Face arena?
Still not working; it forces me to use an OPENAI_API_KEY. Followed your video. Using Windows 10, and it gets stuck when using sgpt.
Update: in the .sgptrc file I set OPENAI_API_KEY=not-needed.
It works, sort of, but now another issue comes up:
LiteLLM:ERROR: utils.py:2297 - Model=qwen2:1.5b not found in completion cost map. Setting 'response_cost' to None
Yo bro, what am I watching, an AI tools review or National Geographic?
Which code editor are you using on the right side, with the fancy sidebar? What is it?
Lightning AI
Just wondering: what are the minimum requirements for those Ollama models to run on a PC? I want to try it, but I'm pretty scared because I don't know how many resources it will consume.
It would be good to make a video explaining this.
It won't take much space on disk, about 5-6 GB. To run the 7B model you'll need about 16 GB of RAM or 8 GB of VRAM, while the 1.5B model will run fine on even 8 GB of RAM.
You can calculate the memory requirement of an LLM exactly via a formula you can find on the internet, or you can use this estimate for the memory needed in GB:
model parameter count [in billions] * bits per parameter / 8 * (1 + overhead) + (input token count + output token count) / 1024. The overhead depends on the storage type and size of the model: smaller models have more overhead, and complicated storage formats also have more overhead.
An example for a multi turn conversation with a lot of input text:
7B model, 4 bit quantization, 4096 input tokens, 512 output tokens, overhead 15%:
7 * 4 / 8 * (1 + 0.15) + (4096 + 512) / 1024 = 8.5 GB.
An example for code autocompletion:
1.5B model, 4 bit quantization, 512 input tokens, 64 output tokens, overhead 20%:
1.5 * 4 / 8 * (1+ 0.2) + (512 + 64) / 1024 = 1.46 GB
Besides the required amount of memory, the data transfer speed ("bandwidth") of the memory is also extremely important for running LLMs. New graphics card memory has at most about 1000 GB/s of bandwidth (e.g. the GeForce RTX 3090 Ti). If you use RAM instead of graphics card memory, the available bandwidth is much lower: DDR5 RAM reaches about 90 GB/s with Intel CPUs for socket 1700. That means running an LLM is roughly 11 times faster on a new graphics card than on a new CPU, which is why these new graphics cards are so expensive.
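The estimate above is easy to turn into a small calculator. A minimal sketch in Python, using the same formula and the commenter's assumed overhead values:

```python
def llm_memory_gb(params_billion: float, bits_per_param: int,
                  input_tokens: int, output_tokens: int,
                  overhead: float) -> float:
    """Rough RAM/VRAM estimate in GB for running a quantized LLM:
    weight storage (with format overhead) plus a per-token context term."""
    weights = params_billion * bits_per_param / 8 * (1 + overhead)
    context = (input_tokens + output_tokens) / 1024
    return weights + context

# Multi-turn chat: 7B model, 4-bit quantization, 4096 in / 512 out, 15% overhead
chat = llm_memory_gb(7, 4, 4096, 512, 0.15)      # ≈ 8.5 GB
# Code autocompletion: 1.5B model, 4-bit, 512 in / 64 out, 20% overhead
autocomplete = llm_memory_gb(1.5, 4, 512, 64, 0.2)  # ≈ 1.46 GB
print(f"chat: {chat:.1f} GB, autocomplete: {autocomplete:.2f} GB")
```

Both results match the worked examples above; the overhead percentages are rules of thumb, not measured values, so treat the output as a ballpark figure.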
Why is my Qwen answering in Chinese after running "ollama run codeqwen"? I have to manually tell it to answer in English, and repeat that every session. I'm using VSC on macOS.
You have to run "ollama run qwen2", not codeqwen.
@@AICodeKingThanks a lot, will check 😁
@@AICodeKing But I really wonder why the specialized, bigger, earlier CodeQwen 1.5 7B could or could not be better than the later Qwen2 7B?!
🥵
I'm on Windows 11. Does anyone know where the settings file is? (.sgptrc is Linux-only, I think?)
I managed to get it working. It was in the c:/users/[user]/sgpt folder.
I have an error associated with the OpenAI API key after modifying the .sgptrc file.
Try adding a dummy word like "ABCD" to the OpenAI API key variable in .sgptrc file. That should fix it.
@@AICodeKing This leads to a 401 Unauthorized client error, any ideas?
Fixed: you need to change openai_api_host to the local Ollama server address.
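For anyone hitting the same 401 error: a .sgptrc along these lines tends to work for local Ollama setups. This is a sketch, not a definitive config; the key names (DEFAULT_MODEL, API_BASE_URL) are assumptions based on recent shell-gpt versions and may differ in yours, and the port is just Ollama's default.

```
# .sgptrc (Linux: ~/.config/shell_gpt/, Windows: under your user profile)
DEFAULT_MODEL=ollama/qwen2:1.5b
OPENAI_API_KEY=not-needed
API_BASE_URL=http://localhost:11434
```

The API key only needs to be non-empty; the base URL is what actually redirects requests away from OpenAI to the local server.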
I can't install sgpt and litellm on Debian...
Why?
I use the sgpt Go port.
GitHub: tbckr/sgpt