Secrets to Self-Hosting Ollama on a Remote Server

  • Published 8 Jun 2024
  • 👋 Hey Tech Enthusiasts! Today, I'm thrilled to share a complete guide on self-hosting the Llama 3 language model using Google Cloud! Whether you're using GCP, AWS, or Azure, the concepts remain the same. 🌐💻
    🔧 What You'll Learn:
    Creating a Linux VM: How to set up a virtual machine with GPU support on Google Cloud.
    Installing Ollama: Step-by-step instructions on installing Ollama and running Llama 3 on your VM.
    Remote Access Activation: Tips on how to make your Ollama server accessible and secure (see the Python sketch after the timestamps below).
    UI Integration: How to build and integrate a chatbot user interface to interact with your Llama model.
    🎬 In this video, I take you through each step, from VM creation to the exciting part of chatting with your own AI. Don't miss out on learning how to fully control your AI’s environment and keep your data in-house. Perfect for developers and tech enthusiasts looking for hands-on AI deployment experience!
    👍 Like this video if you find it helpful and subscribe to stay updated with more content on artificial intelligence and technology. Ring that bell for notifications on new uploads!
    🔗 Resources:
    Sponsor a Video: mer.vin/contact/
    Do a Demo of Your Product: mer.vin/contact/
    Patreon: / mervinpraison
    Ko-fi: ko-fi.com/mervinpraison
    Discord: / discord
    Twitter / X : / mervinpraison
    Code: mer.vin/2024/05/ollama-remote...
    📌 Timestamps:
    0:00 - Introduction
    1:32 - Setting Up a Virtual Machine with GPU
    2:01 - Adjusting VM Settings and Storage
    2:17 - Installing NVIDIA Drivers
    4:02 - Installing Ollama
    4:30 - Enabling Remote Access
    6:53 - Final Steps: App Integration and Testing
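    🧪 A minimal Python sketch of what the remote-access step enables, assuming the firewall opens TCP 11434 and Ollama listens on 0.0.0.0 (the IP below is a placeholder):

        import requests

        SERVER_IP = "203.0.113.10"  # placeholder: your VM's external IP

        # Ask the remote Llama 3 model for a completion via Ollama's REST API
        resp = requests.post(
            f"http://{SERVER_IP}:11434/api/generate",
            json={"model": "llama3", "prompt": "Hello!", "stream": False},
            timeout=120,
        )
        print(resp.json()["response"])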
    #OLlama #Host #RemoteServer #HowToInstallOLlama #InstallingOLlama #PublishOLlama #OLLamaDeploy #DeployOLlama #DeployOLlamaServer #DeployOLlamaGoogleCloud #GCPOllama #GCloudOLlama #GoogleCloudOLlama #OLlamaGoogleCloud #OLlamaAzure #OLlamaAWS #OLlamaLlama3 #Llama3 #Llama3Deploy #Llama3Publish #Llama3GoogleCloud #SelfHostLlama3 #SelfHost #Host #Hosting #HostOLLama #HostLlama #HostingOLLama #OLlamaHosting
  • Howto & Style

COMMENTS • 59

  • @60pluscrazy
    @60pluscrazy 29 days ago

    Mervin, keep up the good work. Significant videos 🎉🎉🎉

  • @AlloMission
    @AlloMission 29 days ago

    Thanks, really amazing as always! You do a great job.

  • @thebluefortproject
    @thebluefortproject 15 days ago

    Great video, thanks a lot!

  • @christophercelaya
    @christophercelaya 28 days ago

    I love these types of projects!

  • @rccmhalfar
    @rccmhalfar 29 days ago

    So much for self-hosting! GCloud!

    • @MervinPraison
      @MervinPraison  29 days ago

      I chose Google Cloud as an example. You can follow the same procedure on any computer to set it up.

  • @GoWithAndy-cp8tz
    @GoWithAndy-cp8tz 7 days ago

    This way is more expensive than OpenAI with GPT-4o or even GPT-4 Turbo... Anyway, the video is very detailed and I appreciate it. Cheers!

  • @sophiedelavelle5958
    @sophiedelavelle5958 29 days ago +1

    This is amazing ahah

  • @jmsdvs
    @jmsdvs 29 days ago

    Great video! Would love a tutorial on installing a server on your home PC to access from anywhere! Thanks again!

    • @MervinPraison
      @MervinPraison  29 days ago +1

      You could just follow the same steps on your home PC to make it the server.
      I chose Google Cloud just as an example.

    • @jmsdvs
      @jmsdvs 29 days ago +1

      @MervinPraison I figured that was the case. I'm a little bit of a noob when it comes to servers, so I appreciate the response!

  • @jeetendrachauhan3236
    @jeetendrachauhan3236 28 days ago

    I did the same experiment yesterday with an AWS EC2 instance with 32 GB of memory (without a GPU), and the output was amazing.

  • @rccmhalfar
    @rccmhalfar 29 days ago +1

    Still, respect for delivering such content. What I'm after is an automatic Ollama deployment on a cloud provider that bills per minute, where the machine is shut off after the prompt is consumed to save on consumption, so you would only be billed for the minutes used. Would like to see that.

    • @JoeSmith-kn5wo
      @JoeSmith-kn5wo 29 days ago

      I will say automating the deployment is pretty straightforward, but shutting down the server after an API call does not make much sense. You would no longer be able to send API requests to the Ollama server after it is shut down; you would need to manually restart the server that is hosting Ollama.
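      One rough way to approximate the auto-stop idea despite that limitation is a watchdog on the VM that powers it off once the GPU has been idle for a while; restarting still has to happen from outside the VM (cloud console or API). A sketch, with assumed thresholds:

        import subprocess
        import time

        IDLE_LIMIT = 600  # assumed: power off after 10 idle minutes
        POLL = 30         # seconds between checks
        idle = 0

        while True:
            # Read current GPU utilization (percent) from nvidia-smi
            util = subprocess.check_output(
                ["nvidia-smi", "--query-gpu=utilization.gpu",
                 "--format=csv,noheader,nounits"]).decode().strip()
            idle = idle + POLL if int(util.splitlines()[0]) == 0 else 0
            if idle >= IDLE_LIMIT:
                subprocess.run(["sudo", "shutdown", "-h", "now"])
                break
            time.sleep(POLL)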

  • @classictablet9149
    @classictablet9149 15 days ago +1

    How many concurrent calls does this accept? Can you please comment on this topic?
    Thanks!
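    One rough way to probe this against your own server (placeholder IP below): fire several simultaneous requests and time them. Ollama queues requests by default, and newer versions can serve several at once via the server-side OLLAMA_NUM_PARALLEL environment variable.

        import concurrent.futures
        import time

        import requests

        SERVER_IP = "203.0.113.10"  # placeholder: your VM's external IP

        def ask(i):
            # Time one blocking generate call against the remote server
            t0 = time.time()
            requests.post(f"http://{SERVER_IP}:11434/api/generate",
                          json={"model": "llama3",
                                "prompt": f"Count to {i}",
                                "stream": False},
                          timeout=300)
            return time.time() - t0

        # Launch 8 requests at once and print each one's latency
        with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
            for latency in pool.map(ask, range(8)):
                print(f"{latency:.1f}s")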

  • @ben_stace
    @ben_stace 29 days ago

    What version of Llama are you running, 8B or 70B? Thank you as well for the great video.

  • @mikew2883
    @mikew2883 29 days ago

    This is great! Have you been able to set up the Ollama Web UI remotely as well?

    • @MervinPraison
      @MervinPraison  29 days ago +2

      Yes, it should be easy. Maybe I'll plan to create a video about the Ollama Web UI and remote setup.

    • @mikew2883
      @mikew2883 29 days ago

      @MervinPraison That would be awesome! 👍

  • @farexBaby-ur8ns
    @farexBaby-ur8ns 29 days ago

    I think it's better to buy or build a machine to host the AI server, but I didn't know there was a costlier way to do this via Google Cloud. I was also familiar with using Open WebUI, but had never seen the Chainlit option. So, good value with this vid. Kudos!

    • @MervinPraison
      @MervinPraison  29 days ago

      Yes, you can buy a machine yourself and configure it, but the graphics cards might cost a lot and managing an AI server is tedious. It is also not easily scalable.
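      For anyone who, like the comment above, hasn't seen Chainlit: a minimal sketch of a Chainlit chat app wired to a remote Ollama server (the host IP and model name are placeholder assumptions; start it with "chainlit run app.py"):

        import chainlit as cl
        from ollama import Client

        client = Client(host="http://203.0.113.10:11434")  # placeholder VM IP

        @cl.on_message
        async def main(message: cl.Message):
            # Forward each chat message to the remote Llama 3 model
            resp = client.chat(model="llama3",
                               messages=[{"role": "user",
                                          "content": message.content}])
            await cl.Message(content=resp["message"]["content"]).send()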

  • @ckgonzales16
    @ckgonzales16 23 days ago

    Can I use Firebase instead of GCloud?

  • @AdarshSingh-rm6er
    @AdarshSingh-rm6er 1 day ago

    Can you please help me host Llama 3 on my website/proxy? I have a VM created by my company on GCP. I installed and ran Llama 3 on the local system, but I don't know how to host it on the website. My senior used nginx and it's showing a 403 error. Please help. I can share the code and config if you want.

  • @hasstv9393
    @hasstv9393 28 days ago +1

    Is it possible to make it SaaS-ready?

    • @MervinPraison
      @MervinPraison  28 days ago +1

      This is a starting point. Also implement enough security to make it production-ready.

    • @hasstv9393
      @hasstv9393 28 days ago +1

      @MervinPraison Can you show how to do that, so that we can provide this software as a service?

  • @jiuvk8393
    @jiuvk8393 28 days ago

    is the "$204" exactly what you have to pay regardless of how many people use the app per day or per month?

    • @MervinPraison
      @MervinPraison  28 days ago

      Once the number of users increases massively, you might need to increase the spec, which would cost more.
      But this is a good starting point.

    • @BamiCake
      @BamiCake 26 days ago

      So, in essence, it would cost up to $2,400 a year to self-host an LLM on GCP?

  • @Epirium
    @Epirium 29 days ago

    Can you make a video on hosting Ollama remotely for free?

  • @collinsalomon
    @collinsalomon 29 days ago +1

    Amazing video! But you need to turn off your VM!!! :)🤣

    • @MervinPraison
      @MervinPraison  29 days ago

      Thanks for letting me know. I did turn it off after recording the video :)

  • @phutrinh686
    @phutrinh686 28 days ago

    How much per month to host it? The running cost will kill your wallet. No thanks.

  • @williamwong8424
    @williamwong8424 28 days ago

    Is the API key really just a fake API key? We don't need to find somewhere to get an API key?

    • @williamwong8424
      @williamwong8424 28 days ago

      And just the base URL will do?

    • @MervinPraison
      @MervinPraison  28 days ago

      Just any key is fine. It's not locked based on the API key.
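      Concretely, an OpenAI-compatible client then only needs the base URL; the key can be any placeholder string. A minimal sketch (server IP assumed):

        from openai import OpenAI

        client = OpenAI(
            base_url="http://203.0.113.10:11434/v1",  # placeholder server IP
            api_key="any-string-works",               # Ollama ignores the key
        )

        reply = client.chat.completions.create(
            model="llama3",
            messages=[{"role": "user", "content": "Say hello"}],
        )
        print(reply.choices[0].message.content)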

  • @basilbrush7878
    @basilbrush7878 29 days ago

    Nice idea, but surely it's cheaper and more efficient to use Groq?

    • @MervinPraison
      @MervinPraison  29 days ago +1

      Yes, possibly. But some people prefer to have end-to-end control for more security. This can also be used for testing, for users within a private network, and for more privacy.

  • @impactsoft2928
    @impactsoft2928 29 days ago

    Is Google Cloud free?

    • @MervinPraison
      @MervinPraison  29 days ago

      They should provide approximately $300 in credits to get started on Google Cloud.

  • @jobasti
    @jobasti 29 days ago +6

    I really wanted this video to teach non-tech-savvy people how to host their own Ollama instance. I really tried to like this video for what it could be, but sadly it is as you delivered it to us. Honest feedback: this video feels more like a G-Cloud ad with no explanation whatsoever. You leave a lot of unanswered questions:
    Why Google Cloud when the title is "Self-Hosted"? (Self-hosted means YOU host something YOURSELF.)
    Why show $204 a month first, then add options, and then place your camera over the final price?
    Why Ubuntu 22.04 LTS when 23.10 will have newer GPU drivers?
    Why say "copy and paste" 5 times in 15 seconds? Why not just explain what you do, or not say anything?
    Why not give a security warning that copy-pasting that Ollama command as root could be dangerous, and tell people to read the script code first?
    Why choose a Tesla T4? How many tokens does it handle? Why did you choose this GPU, which makes it so expensive?
    Why did you change from 10GB to 100GB? How many models do you want to host there? How big are the models?
    Where is the information for the user to make an informed and educated decision?
    If you use Chainlit, please explain beforehand what it is.
    Nice security warning around the firewall and network topic, thanks for that! I really don't want to come across as an a**hole, but your other videos have been much more detail-oriented and better planned and executed for me as a viewer and a DevOps person. Thanks for your time!

    • @d.d.z.
      @d.d.z. 29 days ago +1

      I'd like to see Mervin's response.

    • @MervinPraison
      @MervinPraison  29 days ago +7

      Thanks for your feedback. All your questions are very valid. My original intent was to show beginners how to host Ollama on a remote server.
      I could have just shown the remote server and focused on setting up Ollama, but to demonstrate every step end to end, I had to cover the Google Cloud setup as well.
      Google Cloud is just one option; it could be any server, even your own local PC acting as the server. All the other steps remain similar.
      Why Google Cloud when the title is "Self-Hosted"?
      "Self-hosted" generally means running services on servers you control, but that can include cloud servers you manage, like Google Cloud.
      Why show $204 a month first, then add options, and then place your camera over the final price?
      The initial price shown is the base rate; the added options increase the cost.
      Why Ubuntu 22.04 LTS when 23.10 will have newer GPU drivers?
      Ubuntu 22.04 LTS is a Long-Term Support release, offering stability and 5 years of support, which is preferable for servers over newer, less tested versions. This is based on my research and some testing, and it will surely change in the next few months.
      Why say "copy and paste" multiple times in a short period without further explanation?
      That was to keep the tutorial short and focused on Ollama rather than on the Nvidia driver setup. Here is what I copied to install the Nvidia driver: cloud.google.com/compute/docs/gpus/install-drivers-gpu#secure-boot
      Why no security warning about copying and pasting commands as root?
      This is a valid concern. Running scripts as root can be risky. Always read and understand scripts before executing them, especially with root privileges.
      Why choose a Tesla T4 GPU? How many tokens does it handle?
      I chose the Tesla T4 because it was one of the cheapest options and cost less to produce this video demo. A100s and H100s are the best.
      Why did the storage change from 10GB to 100GB?
      Llama 3 8B takes approximately 5GB by itself, and other tools take roughly another 5GB, so I increased the disk to 100GB to have enough storage if required.
      Where is the information for the user to make an informed and educated decision?
      I understand the video doesn't cover graphics cards or the detailed Nvidia driver setup; when I started recording, my intention was just to show how to set up Ollama on a remote server and integrate it with a local application.
      Yes, I should have explained what Chainlit is, valid point. Thanks for letting me know and for your detailed feedback.
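      To make the "read before you run as root" advice above concrete, a small sketch that downloads the Ollama install script to a file for review instead of piping it straight into the shell:

        import urllib.request

        # Save the installer locally so it can be inspected first
        url = "https://ollama.com/install.sh"
        urllib.request.urlretrieve(url, "install.sh")

        # Print the script for review before running "sh install.sh" yourself
        print(open("install.sh").read())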

    • @jobasti
      @jobasti 27 days ago +1

      @@MervinPraison "Thanks for letting me know and for your detailed feedback." Sure anytime! - Dont get the wrong impression of my many questions, i like what you are doing! Please keep up the good work! I just like bit more detail if that is possible =)