Code › github.com/nonoesp/live/tree/main/0113/google-gemma
↓ Timestamps!
00:00 Introduction
01:05 Find Models in Hugging Face
01:28 Terms
01:57 Install the Hugging Face CLI
02:21 Login
02:55 Download Models
03:51 Download a Single File
04:50 Download a Single File as a Symlink
05:25 Download All Files
06:32 Hugging Face Cache
07:00 Recap
07:29 Using Gemma
08:02 Python Environment
08:47 Run Gemma 2B on the CPU
12:13 Run Gemma 7B on the CPU
13:07 CPU Usage and Generating Code
17:24 List Apple Silicon GPU Devices with PyTorch
18:59 Run Gemma on Apple Silicon GPUs
23:52 Recap
24:25 Outro
Thanks for watching!
Subscribe to this Luma calendar for future live events! lu.ma/nono
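For reference, below is a minimal sketch of the kind of script the video walks through (something along the lines of a run_cpu.py). It is not the exact code from the linked repo; it assumes transformers and torch are installed and that you have accepted the terms for the gated google/gemma-2b-it model on Hugging Face.

# Minimal sketch: generate text with Gemma 2B using Hugging Face Transformers.
# Picks the Apple Silicon GPU (MPS) when available, otherwise falls back to the CPU.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-2b-it"  # gated model; requires accepting Google's terms
device = "mps" if torch.backends.mps.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

inputs = tokenizer("Write a haiku about machine learning.", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))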
Thank you so much! I was able to run gemma-2b-it. Great model. Love how Google is releasing this open source rather than closed source (unlike ClosedAI's ChatGPT).
Nice! Happy to hear you were able to run Gemma. =)
I'm still on Monterey, so the GPU doesn't work.
Yeah, can't wait to update to Sonoma and use the full power of the M1 Pro.
Awesome stuff! Thank you, Nono.
Thank you, Nadiia! Glad you found this useful. =)
"Why is it that when I run 2B it's very slow on my Mac Air M2, usually taking over 5 minutes to generate a response? But on Ollama, it's very smooth?"🤨
Hey! It's likely because they're running the models with C++ (llama.cpp or gemma.cpp) instead of running them with Python. It's much faster, and I've yet to try gemma.cpp. Let us know if you experiment with this!
Nono
@ Can you link gemma.cpp? I haven't Googled it yet, but it would be nice if you could.
github.com/google/gemma.cpp
Hi, I am getting this error:
Traceback (most recent call last):
  File "C:\Users\Priyank Pawar\gemma\.env\Scripts\run_cpu.py", line 2, in
    from transformers import AutoTokenizer, AutoModelForCausalLM
ModuleNotFoundError: No module named 'transformers'
I installed transformers but I'm still getting the error.
Hey, Priyank! Did you try exiting the Python environment and activating it again after installing transformers?
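A quick sketch that can help confirm whether the active environment actually sees transformers (a hypothetical check, not from the video):

# Print which interpreter is running and whether transformers is importable.
import importlib.util
import sys

print(sys.executable)  # the Python interpreter the current environment uses
print(importlib.util.find_spec("transformers") is not None)  # True if transformers is importable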
I have followed everything. I get this error when trying to run on GPU:
RuntimeError: User specified an unsupported autocast device_type 'mps'
I have confirmed the mps is available and have reinstalled everything
Hey! If you've confirmed mps is available, you must be running on Apple Silicon, right? If you are, and you've set up the Python environment as explained, can you share what machine and configuration you're using? I've only tested this on an M3 Max MacBook Pro.
Other people have mentioned the GPU not being available on macOS versions prior to Sonoma. Are you on the latest update?
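In case it helps with debugging, a minimal sketch for checking MPS support from PyTorch:

# Check whether PyTorch can use the Apple Silicon GPU (MPS backend).
import torch

print(torch.backends.mps.is_built())      # True if this PyTorch build includes MPS support
print(torch.backends.mps.is_available())  # True if the MPS device can actually be used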
@Hello! Thank you for your response. Yes, I have an M1 Max with Sonoma 14.3.1. I also tried all the models in case there was an issue with the number of parameters.
@alkiviadispananakakis4697 I had the same issue on an M2 Pro. I just fixed it by downgrading transformers to 4.38.1. Now my only problem is that it's unbelievably slow to run!
Nice! The only thing that may be faster is running gemma.cpp.
Hi, I'm trying to make an order-delivery chatbot. I made one with GPT by giving it APIs, but I think it will cost too much. That's why I want to train a model. What do you suggest?
Hey!
I would recommend you try all the open LLMs available at the moment and assess which one works best for you in terms of the cost of running it locally, speed of inference, and performance. Ollama is a great resource because you can try many of them in one app. Gemma is a great option, but you should also look at Llama 2, Mistral, Falcon, and other open models.
I hope this helps!
Nono
Thanks a lot!
Great Content!
Hi Ludovico!
Thanks so much for letting me know.
I'm glad you found the content useful. =)
Cheers!
Nono
What is your machine?
Hi, Maxyan!
I'm using a MacBook Pro M3 Max (14-inch, 2023) with 1TB SSD, 64GB Unified Memory, and 16 cores (12 performance and 4 efficiency).
Nono
😍 Thanks! That is very detailed.
I am trying to run with the CPU. I am getting this error:
Gemma's activation function should be approximate GeLU and not exact GeLU. Changing the activation function to `gelu_pytorch_tanh`. If you want to use the legacy `gelu`, edit the `model.config` to set `hidden_activation=gelu` instead of `hidden_act`. See github.com/huggingface/transformers/pull/29402 for more details.
Loading checkpoint shards: 0%| | 0/2 [00:00
Hey, Aamir! What machine are you running on?
@ I am using an Intel Core Ultra 5 (14th gen).
I've only run Gemma on Apple Silicon so I can't guide you too much. Hmm.
Can you provide the code for when symlinks are not used and the model is just downloaded as a repo? How do I point the code at that repo?
I have copied the folder into the environment and just added something like:
tokenizer = AutoTokenizer.from_pretrained("./gemma-7b-it")
No matter whether you symlink or download the files to the folder, you should be able to load the files in the same way.
To download the files (without symlinks) you can add the flag --local-dir LOCAL_PATH_HERE and not use the --local-dir-use-symlinks flag.
Note that even when you don't symlink, the large files, i.e., the model weights, will still be symlinked because they are huge.
I hope that helps! =)
Nono
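For completeness, a minimal sketch of the same workflow from Python with huggingface_hub, assuming access to the gated google/gemma-7b-it repo; the --local-dir CLI flag corresponds to the local_dir argument of snapshot_download:

# Download the repo files into a local folder and load the tokenizer from it.
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer

snapshot_download(repo_id="google/gemma-7b-it", local_dir="./gemma-7b-it")
tokenizer = AutoTokenizer.from_pretrained("./gemma-7b-it")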