What is the UI at 2:01? It looks like you may be playing a video provided by Meta, but I'd like to test the model's multimodal capabilities myself, and I don't think Open WebUI supports that.
In the end, the vision encoder translates the picture into text, which is then inserted into the input, and then the normal LLM machinery (attention, perceptrons and so on) takes place. Is that correct? If so, does the vision encoder output natural language or some other format?
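(Not an official answer, just a toy sketch of how most open vision-language models wire this up: the encoder usually emits continuous embeddings rather than text, and those get projected into the LLM's embedding space. Llama 3.2's vision variant reportedly feeds them in through cross-attention adapter layers; the module names and dimensions below are purely illustrative, not Meta's actual code.)

```python
# Illustrative only: a vision "adapter" that turns image patches into vectors
# living in the LLM's hidden space. No natural-language caption is produced.
import torch
import torch.nn as nn

class ToyVisionAdapter(nn.Module):
    def __init__(self, patch_pixels=3 * 14 * 14, vision_dim=1024, llm_dim=4096):
        super().__init__()
        # Stand-in for a real vision encoder (e.g. a ViT); here just a linear map.
        self.vision_encoder = nn.Linear(patch_pixels, vision_dim)
        # Projection from vision feature space into the LLM's embedding space.
        self.projector = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_patches: torch.Tensor) -> torch.Tensor:
        # image_patches: (batch, num_patches, patch_pixels) flattened pixel patches
        features = self.vision_encoder(image_patches)   # (B, P, vision_dim)
        return self.projector(features)                 # (B, P, llm_dim)

adapter = ToyVisionAdapter()
fake_patches = torch.randn(1, 256, 3 * 14 * 14)
text_token_embeddings = torch.randn(1, 32, 4096)        # embeddings of the prompt tokens
image_embeddings = adapter(fake_patches)
combined = torch.cat([image_embeddings, text_token_embeddings], dim=1)
print(combined.shape)  # torch.Size([1, 288, 4096])
```

So, as far as public descriptions go, the answer is "some other format": vectors in the model's hidden space (which the language model attends to), not a natural-language description.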
For those of you not understanding how he ran it on his phone: he used Llama 3.2 to write the code for the app and build it himself. Just load the AI on your computer, tell it you want to run the 1-billion-parameter model on your iPhone or Android, and the AI will walk you through it.
The 1B phone version hasn't been released yet.
Mervin, can you test Molmo? It's also a multimodal model.
Can’t find the Llama mobile app anywhere…
Thanks
Please make a video on how to run these small models on Android.
I tested both the 1B and the 3B on my laptop; the results seem very mediocre compared to Gemma and Phi.
How do I install the 1B version on my iPhone?
Not sure about iPhone, but it's easy on a Mac or a Windows gaming PC if you use Ollama.
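If it helps, here's a minimal sketch of calling the 1B model through Ollama's Python client. It assumes the Ollama daemon is running and you've already pulled the model ("llama3.2:1b" is the tag Ollama lists for the 1B text model; adjust if yours differs).

```python
# Minimal sketch: chat with a locally served Llama 3.2 1B model via Ollama.
# Assumes `ollama serve` is running and `ollama pull llama3.2:1b` has completed.
import ollama

response = ollama.chat(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "Why are small on-device LLMs useful?"}],
)

# Older clients return a dict; newer ones also allow response.message.content.
print(response["message"]["content"])
```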
Yes, during the video he demos it on his iPhone. How? Can you explain how? Thank you.
@2010Sisko Not sure where to download that Llama app from.
@2010Sisko The example code was in the GitHub repo for iPhone; other than that, it's not clear how to make it work on other mobile devices.
Hi
The 3.2 models aren't very good, nowhere near as good as the 3.1 models. 3.2 90B isn't comparable to 3.1 70B; 3.1 70B is much better!
That doesn't make much sense; please qualify your comments. Do you have tests or reports to share?