Why are you, like, the only one reporting this in a timely manner on YouTube? Who _are_ you and why am I not subbed? No really, community, who is this guy?
I look at this channel and it implements about 90% of the scary thoughts I've had lately but never had time to implement, including bringing back people from the dead. Awesome!
hahah I really appreciate the kind words! I like to make videos on new stuff I find cool and am glad others are interested too haha
Very cool stuff, that magentic-one looks incredible.
Thank you! Yes, it is really a fascinating thing to watch the agents interact and how the orchestrator gets them to do things, etc. This is something I am going to continue playing with as I have wanted an autonomous browsing agent like this for a while now.
very cool! :) exciting to see how fast this is all moving!
Thanks very much, it certainly is!
Ty a lot for your works !! Keep going !!
Thanks for the kind words, will do indeed!
Please let me know if you are planning to publish the changes on GitHub. If already published, please share the link. Thanks!
@DanishFarooqWani Hello, the repo is here; I had also shared it in response to your other comment, just want to make sure that appeared haha: github.com/OminousIndustries/autogen-llama3.2
You have made a really great thing. The visual part is handled by Llama 3.2 Vision. Can the coder model be changed to another model with coding expertise?
Thanks very much! Yes, I believe it can be, and that is something I wanted to attempt myself: essentially have a different model that may be a little smarter handle certain tasks, leaving the vision model to handle web images. It would of course be best to just use a large multimodal model like the 90B vision model, but since that isn't realistic for a lot of people, I think delegating specific tasks like coding to a more domain-specific model would be a good idea. It would be a bit of work to implement, but based on what I saw in the codebase it would not be unrealistic at all.
Very cool. Can we use local model routing so it uses a specific model for coding (DeepSeek Coder), for image vision (Llama 3.2 / SAM 1 / ...), for video vision (SAM 2 / ...), for writing (ideally also differentiating between types of writing: technical, creative, summarization, ...), for image generation (Flux dev / ...), and so on?
In theory, yes, this is possible, though it would require some more in-depth modification of the way the repo functions.
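For anyone curious what that routing could look like, here is a rough sketch of the idea against Ollama's OpenAI-compatible endpoint. The role names and model tags are just placeholders for illustration, not how this repo is actually wired up:

```python
# Minimal sketch of per-role model routing against a local Ollama server.
# Role names and model tags below are illustrative, not the repo's actual config.
from openai import OpenAI

OLLAMA = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

ROLE_MODELS = {
    "coder": "deepseek-coder-v2",   # code generation
    "vision": "llama3.2-vision",    # screenshots / web images
    "writer": "llama3.1",           # general text
}

def ask(role: str, prompt: str) -> str:
    """Send a prompt to whichever local model is registered for this role."""
    resp = OLLAMA.chat.completions.create(
        model=ROLE_MODELS[role],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

In magentic-one itself this would mean giving each agent its own client/model pair rather than one shared client, which is the "more in-depth modification" mentioned above.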
Following up on this amazing work, can you please provide the link to the fork with this modification?
Thanks very much, yes the fork is here: github.com/OminousIndustries/autogen-llama3.2
I pulled the original repo and then updated the files from your repo, then I set the OpenAI API key to "ollama" (without setting it I get an error message that it needs to be set), and now I'm getting this error:
File "autogen/python/.venv/lib/python3.10/site-packages/openai/_base_client.py", line 1634, in _request
raise self._make_status_error_from_response(err.response) from None
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ollama. You can find your API key at ...
Any ideas?
Based on that, it seems like the src/autogen_magentic_one/utils.py file may not have been updated correctly. I would honestly clone my whole repo and try it to rule out an issue stemming from implementing the changes manually into the original repo.
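For reference, a 401 with "Incorrect API key provided: ollama" means the request is still going to api.openai.com rather than the local Ollama server. A minimal sketch of what an Ollama-backed client setup roughly looks like (check the fork's utils.py for the exact code, which may differ):

```python
# Rough sketch of an OpenAI client pointed at Ollama instead of OpenAI's API.
# See src/autogen_magentic_one/utils.py in the fork for the real implementation.
from openai import OpenAI

# If base_url still points at api.openai.com, the dummy "ollama" key triggers
# the 401 above. Pointing it at the local Ollama server avoids that.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                       # placeholder; Ollama ignores the value
)
```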
I think the only question I would have is how you eventually got it to work. I mean, it's great you got it working but I would love to get that same functionality working on my own hardware. Based on what I am seeing, you have a fairly beefy system so I am not sure I would have enough GPU power to replicate what you did.
It is not as resource intensive as a lot of other models, which is nice. The only real stressor was the model in Ollama, which was using around 11-12 GB of VRAM. The rest of my system is just generic components: a 12th-gen i7, a cheap motherboard and 32 GB of RAM. I see the blog post for the model says it requires 8 GB of VRAM, so perhaps there is a way to get it to work on that: ollama.com/blog/llama3.2-vision
If that was possible, it would be quite feasible to run this all on a relatively cheap (in local-LLM terms) system with a 12 GB 3060 card. As for replicating it on one's own system, assuming the hardware is there, I think just installing it per the instructions in the magentic-one readme, but with my fork instead, would let anyone get it working without having to wrangle with any code.
Hi! This new agentic framework looks really nice! I went through some obstacles with installation, but now it's running smoothly. I wanted to see whether it's capable of scraping dynamic content. I provided it a link and asked it to scroll down the page to extract all the submission card titles. It was going well until it had to execute the Selenium code the "coder" agent had written to scroll down, and the issue was missing packages: "selenium" and "webdriver". It identified the issue and the coder even provided steps to install those dependencies; however, it didn't manage to install the packages and got stuck in a loop where it constantly asked me to execute the code, saw the missing-package error, asked me again, saw the error, again and again.
@OminousIndustries I really want it to do advanced scraping as it is a great use case!
Interesting and very valuable feedback, thanks for that. I find that it gets a bit confused when needing to install dependencies too; I noticed the same thing, where the coder would understand the missing-package error thrown by trying to run the code, but after that would not really "get" that it needed to install the package. I believe this may be a limitation of the model, though I wonder if it could be bypassed by manually installing the necessary dependencies for the scraping you will be doing into the Docker container the agents are running in.
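For anyone hitting the same loop, one possible workaround (not something the repo does out of the box) is to have the generated script, or a small wrapper run inside the agents' container, install missing packages before importing them, or to simply bake them into the container image. A rough Python sketch of that guard:

```python
# Hypothetical pre-install guard to break the "missing package" loop:
# install anything that isn't importable before the scraping code runs.
import importlib
import subprocess
import sys

def ensure(package: str) -> None:
    """Install a package into the current environment if it isn't importable."""
    try:
        importlib.import_module(package)
    except ImportError:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])

ensure("selenium")

from selenium import webdriver  # now safe to import
```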
Yes there are a lot of great use cases for this, especially web based.
Very cool. It would be better if you could share the code, so that it would be useful for everyone.
Thank you. The code is public and shared, you can see it here: github.com/OminousIndustries/autogen-llama3.2
Do you have a github repo where I can copy your changes?
Yes, here is a link: github.com/OminousIndustries/autogen-llama3.2
Hey bud, you got a discord?
I do, though I don't use it too often so it may take me a day or so to see new messages, username is: omns.ind
Yeah, Qwen2-VL and Pixtral are better than 3.2V. You really need to use a larger model.
Good points. I wanted to try with 3.2 vision to start but definitely going forward I want to use a larger model. Getting extra gpus for 90b is semi-tempting but I do not want to rebuild my pc hahah