These videos are really cool. I'm not a beginner, far from it, but it is soooo nice to get this information in such a distilled manner, and from a person that clearly knows what they are talking about. So natural!
And she has a great personality.😂
I see a lot of explainer videos and yours are the best! Just great content delivery and tone, perfection all around!
Can we use any Llama-based model? In the destination, can we use the LLM we have downloaded? I mean a custom LLM based on Llama.
Can you provide the WireGuard instructions you mentioned? Btw, perfect tutorial :)
That looks like a nice way to run an LLM for my personal use, but I’d also like to try out one of the bigger models.
Is that doable at all?
Or will I need to stick to models that fit within the 40 GB of GPU memory on the A100, for instance?
How big are you talking about? Generally, the amount of VRAM you need is the parameter count times the parameter size in bits, divided by 8 to get bytes, plus about 20% overhead (a rough calculation is sketched after the list below). Check this short for more info: ua-cam.com/users/shortstCE-awsKmmg
In general though:
* 13B or lower: any GPU works, no caveats
* 30B or lower: any GPU works, but you need at least Q8 or FP8 quantization
* 70B or lower: use the A100 80GB or the L40S
* greater than 80B: it depends; if you're lucky it'll work on one GPU, if not you'll need to use multiple GPUs
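To make that rule of thumb concrete, here's a tiny back-of-the-envelope sketch in Python (the 20% overhead is the fudge factor from the formula above, not an exact figure):

def vram_gb(params_billion, bits_per_param):
    # parameter count * bits per parameter / 8 -> bytes, plus ~20% overhead
    bytes_needed = params_billion * 1e9 * bits_per_param / 8
    return bytes_needed * 1.2 / 1e9  # convert to GB

# e.g. a 70B model: ~168 GB at FP16, ~84 GB at Q8, ~42 GB at Q4
for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit: ~{vram_gb(70, bits):.0f} GB")

The real number also depends on context length and KV cache, so treat the estimate as a floor rather than an exact requirement.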
Cool vid, thanks!
ollama run llama3 why is fly cool?
Great video 👌👌
Thank you 👍
I don't understand, how is this self-hosting? Isn't this cloud hosting?
Hey, I tried setting this up but I have this error:
2024-08-24T00:27:36.386 runner[***] ord [info] Machine started in 3.517s
2024-08-24T00:27:37.133 app[***] ord [info] INFO Main child exited normally with code: 1
2024-08-24T00:27:37.152 app[***] ord [info] INFO Starting clean up.
2024-08-24T00:27:37.266 app[***] ord [info] INFO Umounting /dev/vdc from /root/.ollama
2024-08-24T00:27:37.268 app[***] ord [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2024-08-24T00:27:37.269 app[***] ord [info] [ 3.718685] reboot: Power down
Any ideas on what would cause this?
I got it, I had to play around with the memory sizes
@@TheloniousBird Which memory size? Can you explain?
@@dareljohnson5770 In the fly.toml, under vm -> memory, I had to set it to 16 where it was originally set to 8.
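For anyone hitting the same error, that change would look roughly like this in fly.toml (a sketch assuming Fly's [[vm]] section format and gigabyte units; the exact value you need depends on the model you're loading, so check Fly's docs for your config layout):

[[vm]]
  memory = "16gb"  # was "8gb"; the machine was running out of memory loading the model

Redeploying after bumping the memory should let the main process survive model load instead of exiting with code 1 as in the log above.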