Why I'm Staying Away from Crew AI: My Honest Opinion

  • Published 13 Jan 2025

COMMENTS • 141

  • @HarpaAI
    @HarpaAI 7 months ago +2

    🎯 Key Takeaways for quick navigation:
    00:30 *🧩 Explanation of Multihop Questions*
    - Multihop questions are designed to be challenging by requiring preceding knowledge.
    - Questions are structured as linear or parallel decompositions.
    - Understanding the structure of multihop questions is essential for building effective agent workflows.
    06:39 *🗂️ Overview of Agent Workflow in Crew AI*
    - The agent workflow in Crew AI consists of a planning agent, search agent, integration agent, and reporting agent.
    - Each agent has a distinct role in the workflow, from breaking down questions to organizing information and delivering responses.
    - Feedback loops between agents help refine the investigation process and ensure accuracy in responses.
    10:25 *🤖 Setting up Tasks and Descriptions in Crew AI*
    - Tasks in Crew AI are assigned to specific agents and include detailed descriptions, expected outputs, tools required, and contextual information.
    - Different agents have different responsibilities, such as conducting searches, organizing information, or delivering final responses.
    - Providing clear and concise descriptions for tasks helps guide the behavior of each agent in the workflow.
    23:29 *🤖 Types of workflows in Crew AI*
    - Explaining sequential and hierarchical workflow structures in Crew AI.
    - Highlighting the role of the manager LLM in hierarchical workflow.
    - Differentiating between sequential and hierarchical operations.
    24:47 *🔄 Testing multi-agent workflow in Crew AI*
    - Setting up tasks in Crew AI for multi-agent workflow testing.
    - Tracking OpenAI API usage costs before running the workflow.
    - Comparing the speed of agent workflow in Crew AI to Autogen in a production scenario.
    41:21 *💰 Cost analysis of complexity in question answering*
    - Analyzing the cost of answering a two to three-hop question in Crew AI.
    - Expressing concerns about the high cost of search operations in Crew AI.
    - Discussing the potential challenges of using Crew AI in production due to cost implications.
    48:08 *🤖 Issues with Crew AI*
    - Crew AI lacks interpretability compared to Autogen
    - Inconsistency in the output of multi-agent workflows
    - High cost for running workflows limits practical use cases
    50:03 *🛠️ Pros and Cons of Crew AI*
    - Easy setup for multi-agent workflows in Crew AI
    - Suitable for experimentation and prototyping, not for production
    - High cost and inconsistency are critical barriers to adopting Crew AI
    50:30 *🔮 Future of Multi-Agent Frameworks*
    - Multi-agent frameworks like Crew AI and Autogen are limited to current model capabilities
    - As language models improve, the need for multi-agent frameworks may diminish
    - Custom workflows may be more beneficial for specific production applications
    Made with HARPA AI
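
The four-role workflow summarized above (planning, search, integration, reporting, run in sequence) can be sketched in plain Python. This is a minimal illustration and not Crew AI's API: the stage functions are hypothetical stand-ins for LLM and search-tool calls, and `run_sequential` is an assumed helper name.

```python
from typing import Callable, Dict, List, Tuple

Context = Dict[str, str]
Stage = Tuple[str, Callable[[str, Context], str]]


def planner(question: str, ctx: Context) -> str:
    # Stand-in for an LLM call that decomposes the question into hops.
    return f"sub-questions for: {question}"


def searcher(question: str, ctx: Context) -> str:
    # Stand-in for a web-search tool call driven by the plan.
    return f"search results for [{ctx['planner']}]"


def integrator(question: str, ctx: Context) -> str:
    # Stand-in for an LLM call that organizes the search results.
    return f"organized notes from [{ctx['searcher']}]"


def reporter(question: str, ctx: Context) -> str:
    # Stand-in for an LLM call that writes the final response.
    return f"final answer based on [{ctx['integrator']}]"


# Hypothetical stage order mirroring the roles named in the summary.
STAGES: List[Stage] = [
    ("planner", planner),
    ("searcher", searcher),
    ("integrator", integrator),
    ("reporter", reporter),
]


def run_sequential(question: str, stages: List[Stage]) -> Context:
    """Run each stage in order; every stage sees all earlier outputs."""
    ctx: Context = {}
    for name, stage in stages:
        ctx[name] = stage(question, ctx)
    return ctx
```

Because the control flow here is an ordinary loop, it is always clear which stage produced which output, which is the interpretability property the summary says is hard to get from the framework.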

  • @HassanAllaham
    @HassanAllaham 8 months ago +54

    This is one of the best videos I have ever seen related to AI. Let me list some points:
    1- Do not ever expect to have acceptable costs as long as you are depending on ClosedAI
    2- I am with you that such frameworks are not production-ready
    3- In my opinion, such a framework can be useful if an easy way to modify the hidden prompts is available.
    4- Such a framework can be useful if there is a manager agent (the only one that needs to have a strong LLM) and the other agents depend on a small LLM/s.. breaking the task into small "easy" tasks should make this able to be done using small LLM/s.. (Open source small LLM/s).
    5- The custom tools availability of each agent should help (The agent who does the main search should be different from the agent who reads what is inside each result of the search) - Specialization leads to creativity - By this, we can add just for one agent a direction to force it to mention the URLs of info sources from which it built its answer..
    6- I think no agent flow would be able to be truly autonomous as long as it does not have a self-reflection mechanism i.e. self-improvement mechanism.
    7- When trials or expectations show that the result from one agent might be loooooong, I think it would be better to add an agent just to summarize this result and replace the original result by the summarization in the workflow history.
    Anyway, thanks for the good content. 🌹🌹🌹
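
Points 4 and 7 in the comment above (a strong model only for the manager, and summarizing long results before they enter the workflow history) could look roughly like this. It is a sketch under assumptions: the model names are placeholders, and `truncate_summary` stands in for a real call to a small summarization model.

```python
from typing import Callable, List


def model_for(role: str) -> str:
    # Placeholder names: only the manager gets the strong (expensive) model.
    return "strong-model" if role == "manager" else "small-model"


def truncate_summary(text: str, limit: int = 80) -> str:
    # Stand-in for a small summarization model; here it simply truncates.
    return text[:limit].rstrip() + " [summarized]"


def append_result(
    history: List[str],
    result: str,
    max_chars: int = 200,
    summarize: Callable[[str], str] = truncate_summary,
) -> None:
    """Store the result, or its summary when it exceeds max_chars."""
    history.append(summarize(result) if len(result) > max_chars else result)
```

Keeping only the summary in the history means later agents pay for fewer input tokens, at the price of losing detail that the summarizer drops.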

    • @Data-Centric
      @Data-Centric  8 months ago +3

      Thanks for your comment, I'm glad you found it helpful. Broadly agree with the points you've raised here.

    • @MindVaccine
      @MindVaccine 8 months ago +4

      I really appreciate your comment and agree with your points, especially #1 and #4. I disagree with one of @Data-Centric's conclusions, where he says that the future is large AI models that can just perform these tasks directly. While he may be right about the ability, that is not the issue. The issue will be one of cost. As you said, if I break the problem into simple tasks/agents, I get to use simple - and cheap - opensource LLMs. There are times when a large model is required, perhaps for planning or reporting agents, but you really want to keep the ClosedAI model usage to a minimum.
      Furthermore, I expect that the opensource models will improve as well, so that as they continuously improve, each of my agents will improve over time as well. Given enough advancement, the opensource AI models may be able to replace any usage of closedAI, saving even more money in the long term.
      And then looking to the future, I can expect a time when the cost of training/fine-tuning models will come down. I will actually be able to use my agents, and the data I collect in production, as training content to train/fine-tune my own models that are based on free opensource models. Now I can create models that will outperform any closed-ai model for even less money. And I still have the option of using a closed-ai should I need it.
      And for my last point, I know how enthusiastic the industry is about ChatGPT4, but my experience with it is not so positive. Yes, it has very broad knowledge, but I find it is horrible at following instructions. I have to wonder if those hyping it have actually used it for anything more than just chatting. I would be interested in other opinions on this...

    • @HassanAllaham
      @HassanAllaham 8 months ago +1

      @MindVaccine I agree with you.
      Related to "future is large-ai models" I do not think this is true but it may become true in the "far" future.
      This is because the essential principle that today's LLMs are built on is calculating the probability of the next tokens (prediction of the next tokens), which in itself cannot be a true imitation of human intelligence. We have to understand the real meaning of "probability". Mathematically this depends on the "resolution" of the numbers expressing it, i.e. how many digits after the decimal point "." your GPU can use... This in turn depends on the way computation is processed today = binary.
      I believe this may change in the future only when analog computers coupled with quantum-based hardware become publicly available. Only then you can expect to reach what is called AGI.
      When you read research you find the words: "We discovered" This means discovered by chance and as a result of trials. You don't read "We expected and calculated then checked that calculated expectation to find it true".. Well, why? ... Simply you cannot get the exact same answer from the same question.. (The result of probability)
      If the research keeps running in the same direction there will always be the need to have a huge computation power... There are 100s of very simple questions that you may ask GPT4 and you will get very strange wrong answers... There are many simple prompts in which you can jailbreak LLMs (Overcome the guards' locks)...
      The same above-mentioned questions you may ask to a "fine-tuned" small open-source LLM and you can get good answers.
      To be sure that a task performed by an LLM will be 99.95% successful, you need to fine-tune that LLM for the task. While fine-tuning large AI models like GPT4 requires huge computation power, you can fine-tune a small open-source LLM on a good consumer-grade PC. That is the real meaning of "Specialization leads to creativity" above, and that's why I believe a specialized small LLM is better than a giant one.
      In my opinion, the only good usage of the large LLM is to generate good clean datasets needed to fine-tune the small open-source LLM. Other than this, the usage of something like GPT4 is not cost-effective at all.
      Although the benchmarking methods used to estimate LLM performance are not good enough to reflect the real results in the real apps, the big variations between the results of different LLMs can be used to say which is better... and one can find a small open-source LLM which can win the competition against GPT4 in a specific task.
      Concerning following instructions, that is why there is an "instruct" version of many LLMs, which means that this version was trained to obey the user's instructions. The problem becomes clear when you use an LLM with an increased context window: the larger the context window, the more "horrible" the LLM will be at following instructions. In my opinion, one weak benchmarking practice is using only one needle-in-the-middle when benchmarking such LLMs. In the case of GPT4 this problem is not related to the context window size, since it is backed by huge computation power, but to the way ClosedAI adds the guarding locks.

    • @mickelodiansurname9578
      @mickelodiansurname9578 8 months ago +2

      Or alternatively simply TELL the model in the prompt to be concise. Simply prompt it to conserve tokens - or add in prompting to give it a budget it also needs to keep an eye on!
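
The "give it a budget" idea in the reply above could be sketched as follows. Everything here is an assumption for illustration: `TokenBudget` is a hypothetical helper, and the one-token-per-word estimate is a crude placeholder for a real tokenizer.

```python
class TokenBudget:
    """Tracks an approximate token allowance and advertises it in prompts."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.spent = 0

    @property
    def remaining(self) -> int:
        return max(self.total - self.spent, 0)

    def charge(self, text: str) -> None:
        # Crude estimate: one token per whitespace-separated word.
        self.spent += len(text.split())

    def wrap(self, prompt: str) -> str:
        # Append the conciseness instruction and the remaining allowance.
        return (
            f"{prompt}\n\nBe concise. You have roughly "
            f"{self.remaining} tokens left for this task."
        )
```

Whether a model actually respects such an instruction is disputed further down the thread, so a hard limit on the caller's side would still be needed.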

    • @MindVaccine
      @MindVaccine 8 months ago

      @HassanAllaham Good to know I'm not alone. I really think these closedAIs are all just hype. I have been using TheBloke/Mistral-7B-Instruct-v0.2-Q8_0, and, given the right prompt, it gives me consistent results and follows my instructions. Try the same with any of the large closedAI models and you will get very inconsistent results that are all over the place. I just don't see any of the closedAIs being useful in production today. And those pushing these closedAI models, I don't think any of them have tried using them in a production environment - like I said, it is all just hype! And I'm getting these results without training. Once I have good training data, I don't see how any closedAI could compete far into the future. Again, there is nothing that says I can't use a model like GPT4 if I really have the need.
      By the way, just telling GPT4 to be concise is hopeless. The more they train it so as to not be jailbroken, the worse the results are. In my opinion, they are not just training it to refuse certain requests, they are training it to never give a consistent answer. Telling a model to not respond for some agenda, right or wrong, will, in my opinion, always degrade the model's performance. My experience is that GPT4 is getting WORSE WITH TIME, not better. And I have not seen any benchmark for consistency of a model.
      And then there is the elephant in the room, context size. The larger the context, the better for almost any application. But use a large context with GPT4, and watch your tokens wash away. And the larger the context, the worse the performance. But I am starting to see some of the opensource models getting larger context windows without sacrificing performance. This is where I think opensource models will shine and outperform their closed source brothers. Just so many more people experimenting and trying out new methods.

  • @elcaribbeannomad2079
    @elcaribbeannomad2079 8 months ago +10

    I started using CrewAI one month ago and I detected all the same problems John detailed in the video. Not having control over the flow of the software is hard to handle when you are used to designing and implementing complex solutions and algorithms. I think João (the founder of CrewAI) knows it, and that's why in the "Draft Gmail New Emails" example he introduced LangGraph to have a little more control over the flow, but it doesn't solve the inefficient and unneeded token utilization. It is important to know that these problems are not exclusive to CrewAI; Autogen also suffers from the same disease. The idea of developing with the future of LLMs in mind is something I didn't have on my radar, and it makes total sense to me. Great video, keep working John!

  • @NoCodeFilmmaker
    @NoCodeFilmmaker 8 months ago +7

    Phi-3 through Ollama on a Pi5 using CrewAI, everything local and running acceptably
    You need to also add prompt logic to reduce unnecessary searches.

    • @Sergio-rq2mm
      @Sergio-rq2mm 8 months ago +1

      I'll have to check this out, but I had pretty terrible results running local models with crewai. I used LLAMA3, Mixtral, Mistral, etc. Never tried Phi-3, though I did try it with Langgraph and it couldn't consistently use the tools that were available to it.
      I'm curious as to your experience and what you have tested with it

    • @daviddiligentful
      @daviddiligentful 8 months ago +4

      I have lots of problems with local llms with search tools. In fact, it never is able to use the search tools in the first place...

    • @shimotown
      @shimotown 7 months ago +1

      Prompt logic where? There are decisions being made within the LLM chain already. Do you mean hardwiring logic with Python?

  • @roccov1972
    @roccov1972 5 months ago

    Hi. This was incredibly helpful and useful. I've been watching a lot of YouTube videos on multi-agent workflows recently, and you are by far the best. You explain things very clearly and well, and make learning these concepts much easier than all the others. Thanks man. Please keep these kinds of videos coming.

  • @Max_Jean
    @Max_Jean 8 months ago +18

    This has been my experience with the frameworks. Going custom is where I’m likely going to end up

    • @rudomeister
      @rudomeister 8 months ago +1

      With custom tools and agents, everything is possible. Why not get a team to create their own group-hierarchy tree of agents after demand? It's possible to do. Giving agents sub-processing with the prompt "Hack NASA", with 100% subprocessing commands, yes, is possible as well. But it doesn't help blaming the old laptop running inside the TV-bench when CIA breaks into your door.. haha

    • @HassanAllaham
      @HassanAllaham 8 months ago

      @rudomeister Dangerous funny example of one of the most powerful techniques that can be used to get the maximum power of AI

    • @Maisonier
      @Maisonier 8 months ago

      Custom how?

    • @Max_Jean
      @Max_Jean 8 months ago +1

      @Maisonier frameworks are just a set of patterns and abstractions packaged up in a library. Not saying it's easy, but building a smaller-scope library instead of using third-party frameworks is pretty common in software engineering across the board/industries. Most frameworks come from what people are often building on their own. Someone just packages it up. Harrison from Langchain himself has said this, for example.

    • @thatsalot3577
      @thatsalot3577 8 months ago

      @Maisonier Most of these frameworks just add extra boilerplate prompts that are appended to your queries to get a specific behaviour; they turn the flaky text->text LLMs into something with proper, predictable input and output formats.
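
The pattern described in the reply above (append a boilerplate format instruction, then force the free-text reply into a predictable shape) can be sketched with the standard library alone. The suffix wording, the key names, and `ask_structured` are assumptions for illustration and are not taken from any particular framework; `llm` is any callable that maps a prompt string to a reply string.

```python
import json
from typing import Callable, Dict

# Hypothetical boilerplate that a wrapper appends to every query.
FORMAT_SUFFIX = (
    "\n\nRespond only with a JSON object containing the keys "
    '"answer" and "sources".'
)
REQUIRED_KEYS = ("answer", "sources")


def ask_structured(
    llm: Callable[[str], str], query: str, max_attempts: int = 3
) -> Dict:
    """Call the model until it returns JSON with the required keys."""
    prompt = query + FORMAT_SUFFIX
    for _ in range(max_attempts):
        reply = llm(prompt)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError:
            continue  # Malformed reply: retry with the same prompt.
        if isinstance(data, dict) and all(k in data for k in REQUIRED_KEYS):
            return data
    raise ValueError("no well-formed reply after retries")
```

Each retry is another paid model call, which is one way hidden wrapper logic can inflate token usage without the caller noticing.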

  • @Whiskey9o5
    @Whiskey9o5 8 months ago +3

    I just found you and subscribed. You came to the same conclusion I have. I dug into these frameworks and came to the same conclusion to build my own framework from the ground up for my use cases. The other part is that I use the Julia language.

  • @pizzaiq
    @pizzaiq 7 months ago +16

    I think this "agent swarms" idea is not the way to go, at all. It makes no sense. You don't need 15 agents to answer simple questions. I think the solution is to reduce use of LLMs for everything that can be solved with designated function calls. The way to go is to build functions that address desired use cases and use LLMs for summarization of results. If you combine knowledge databases, search engines, function calls and summarization, this can be done at a fraction of the cost. I can see a scenario where a single agent instance is well instructed to run a sequence of logical functions. Even if a few agents do it, it would be OK. In this case you'd use a lot fewer tokens.
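
The approach proposed in the comment above (ordinary functions for everything deterministic, one model call for summarization) might look like this. It is a sketch under assumptions: the lookup tables are tiny hard-coded stand-ins for a knowledge database, and `summarize` is whatever callable wraps the model.

```python
from typing import Callable

# Tiny hard-coded tables standing in for a real knowledge database.
CAPITALS = {"France": "Paris", "Japan": "Tokyo"}
CURRENCIES = {"France": "euro", "Japan": "yen"}


def lookup_capital(country: str) -> str:
    return CAPITALS.get(country, "unknown")


def lookup_currency(country: str) -> str:
    return CURRENCIES.get(country, "unknown")


def answer(country: str, summarize: Callable[[str], str]) -> str:
    """Gather facts with plain functions, then make a single model call."""
    facts = [
        f"capital={lookup_capital(country)}",
        f"currency={lookup_currency(country)}",
    ]
    # The only model call: turn the collected facts into prose.
    return summarize("; ".join(facts))
```

The number of model calls is fixed at one per question here, so the token cost is predictable in a way that an open-ended agent loop is not.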

    • @Data-Centric
      @Data-Centric  7 months ago

      Thanks for the feedback.

    • @BillBaran
      @BillBaran 6 months ago

      That's what tools are for, along with using different LLMs for different agents. You have some agents that use 3 turbo, some 4o or Claude, and others local models

    • @benrayward3437
      @benrayward3437 4 months ago

      Horse vs lambo

  • @________4682
    @________4682 5 months ago

    I think the high cost in your example is mainly due to the use of gpt-4.0-turbo and the use of a long set of agents against very simple questions that would not need multiple agents.
    You can use an open LLM if cost is an issue. If speed is your main concern, I don't know.
    Either way, we can learn from the open core of crewAI and build our own.

  • @oelberdomingos
    @oelberdomingos 8 months ago +1

    One of the problems is in the RAG model: it does not reason well. There are more approaches today, such as graph search (much better for "wikipedia" content), which you can combine with other workflows to get better reasoning. But I don't know if you can use them with CrewAI

  • @carinebruyndoncx5331
    @carinebruyndoncx5331 8 months ago +1

    I feel the reasoning example is more an llm test, than a kind of automation where you would use a multi agent framework

  • @johnpaulgorman
    @johnpaulgorman 7 months ago

    Yes I hit the same issues and so many more when using it with local ollama models. Looping issues, unable to find co-workers, lots of errors thrown by the framework based on language model response mismatch. The issues list is also growing rapidly with little sign of being resolved even for small issues or items already resolved. Lack of observability was a real pain. The lack of local llmops was also a pain; I linked to an external llmops site but found it useless given the local model outputs.

  • @tonyppe
    @tonyppe 7 months ago

    I have a similar simple setup while I play and learn. Start small, get it working. There are a TON of things that aren't common knowledge or are undocumented in crewai but it does work. And while each LLM gives different results, and even the same LLM will give different results for the exact same prompt, I have had successes with open source local LLMs.
    The power of it is when you start getting a more complex config. This is where you now need to be a software dev and data science grad. I am neither of these. So I find it an extremely difficult hurdle to get over and I get stuck a lot.

  • @JulianHarris
    @JulianHarris 8 months ago +3

    Seems to me the main issues highlighted are 1. Value over direct prompts 2. Prompt engineering issues (specifically trying to get it to provide references in this case) 3. Cost 4. Latency.
    1. You’re never going to get fresh results from an LLM: web search is essential for this
    2. To me I don’t see much difference in the prompt engineering issues than you’d normally have. I wonder what prompt would result in references included? DSPy tries to automate this issue btw
    3. Cost is on a downward trend. I’d love to know for example how claude/haiku performs, or llama-3-70b.
    4. Latency: For sure it is a batch / offline / task switch scenario totally agree. For now. Try it with groq though. The LPU is 10x faster.

    • @Data-Centric
      @Data-Centric  8 months ago

      You have some solid points, thanks for raising. I tried DSPy several months ago, I might revisit it. Just on your first point, there was a web search tool used by one of the agents.

    • @prodigroup
      @prodigroup 8 months ago

      Try Guidance “guidance-ai” and DSPy for comparison.

    • @prodigroup
      @prodigroup 8 months ago

      Try Guidance “guidance-ai” in comparison with DSPy.

  • @RafiDude
    @RafiDude 8 months ago +4

    Very good analysis! Could you please do a similar video on LangGraph?

  • @free_thinker4958
    @free_thinker4958 8 months ago +1

    Personally speaking, I found that langgraph adds more controllability if you want to use current agentic frameworks. I also noticed that memory in agents plays a big role in self-improving and learning from past experiences; I tried the memory feature in crewai and it is not that bad, and likewise in autogen (teachability feature). I would like it if you could do a similar video with the agency swarm framework this time; it looks promising and has more controllability.

  • @benh8199
    @benh8199 8 months ago +3

    What are your thoughts on Agency Swarm by VRSEN? I haven’t used the framework yet but the author claims you can customize all prompts, including the framework prompts. In the author’s videos he also claims autogen and crewai are not good for production, whereas agency swarm is. Would love to hear your evaluation/opinion about this framework.

    • @littledovecitydust
      @littledovecitydust 5 months ago

      i have tried using Agency Swarm but it's having issues with OpenAI token limit, and just crashes every 10 minutes.

  • @gremlinsaregold8890
    @gremlinsaregold8890 2 months ago

    So a few months ago I would have 'sort of' agreed with you on cost alone. However, these frameworks have come a long way in a few months. So with gpt-4o-mini you are looking at about 45 cents per million tokens. Also, your agentic setup could be done with one agent and two tasks, with two tools... And you did not make the expected output explicit enough.
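
The arithmetic behind the comment above is simple enough to write down. The sketch below takes the commenter's figure of about 45 cents per million tokens as given and assumes a single flat rate; real pricing usually charges input and output tokens differently. The function names are illustrative.

```python
# USD per million tokens, the figure quoted in the comment (assumed flat).
PRICE_PER_MILLION = 0.45


def run_cost_usd(tokens: int, usd_per_million: float = PRICE_PER_MILLION) -> float:
    """Cost of a run that consumes `tokens` tokens at a flat price."""
    return tokens / 1_000_000 * usd_per_million


def tokens_for_budget(
    budget_usd: float, usd_per_million: float = PRICE_PER_MILLION
) -> int:
    """Whole tokens affordable within `budget_usd` at the same flat price."""
    return int(budget_usd / usd_per_million * 1_000_000)
```

At that rate a workflow would have to consume two million tokens to cost 90 cents, so per-query cost is driven far more by the choice of model than by the framework overhead alone.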

  • @ARCAED0X
    @ARCAED0X 8 months ago +3

    Hey Data Centric, great video breakdown you have here. When these GPT models first came out from OpenAI, I thought I should be doing everything with AI, all of the workflows I want to get done. But in fact I was wrong. Really you need to know the ins and outs of your workflow and you need a way to force the system to produce reproducible outputs. This comes from narrowing the problem space or providing the system with solutions to mimic. I'm thinking about making a video about this to demonstrate, or a short blog post of sorts. I don't believe AGI will be the solution to all our problems; narrow solutions to our individual problems are the way.
    The best we could get with the next model from OpenAI is faster, cheaper inference + better understanding of prompts, so that we may simply declare what we want the AI to do like a person, and pair this up heavily with automation of the parts of the system we know and understand well. Think of AI as a small bridge and not necessarily a car to take you somewhere

  • @randyh647
    @randyh647 8 months ago

    In my experience it has trouble with the third round of adding features to existing code, then I switch to a regular ai and just give it small tickets. Also playing around with different models may work too!

  • @Bana888
    @Bana888 8 months ago +1

    Great walkthrough. Really like the level of details. Like the analysis and recommendations at the end. Awesome video. Keep up the great work.

  • @madhudson1
    @madhudson1 8 months ago +1

    Great video. Had success with some hobby projects using langgraph. Experimented with crewai, but felt exactly like you mentioned regarding loss of control

  • @JCMShadow1994
    @JCMShadow1994 8 months ago +1

    Thanks for covering this topic. Most discussions aren’t honest about their usefulness in prod

  • @MarkoTManninen
    @MarkoTManninen 8 months ago

    I have been thinking of using intermediate decision-making agents. At the moment, the agent needs to decide if the knowledge is general, what tool to use, construct arguments, explain its reasoning, etc. It seems to be too much. Possibly cheaper models, maybe even free local models, can do a lot of the repetitive tasks and produce simple control flow logic so that unnecessary steps can be skipped.
    I tested agents 10 months ago and concluded that they were for deep pockets. Already the development phase becomes expensive, not to speak of production. Hence I do count on local models getting on par with gpt4 this year. Then the best models can do the heavy lifting and the inference that requires state-of-the-art reasoning.

  • @andrewowens5653
    @andrewowens5653 8 months ago +5

    Thanks, that was a great explanation and tutorial. I could literally write 10,000 words in response, but that would not be practical. I've studied all the trends in AI since 1977, but I've also spent 25 years studying cognitive neuroscience and related subjects, so I have a different take on the way AI should be implemented. My own personal research involves the creation of a brain inspired cognitive architecture. I'm considering the possibility of putting a very small language model at the core. The system would be designed to learn from experience, instead of being force fed the internet. Anyway, my experience with crew AI has been anything but good so far. Have you considered using something like Ludwig for fine-tuning and LoRAX for serving on your local system? If you could get that to work, you could save 95% of your expensive ChatGPT calls. You could use ChatGPT to create a custom Training Data Set for back propagation or PEFT of a smaller open source model. I'm looking forward to your future content. Thanks again.

    • @ayoubfr8660
      @ayoubfr8660 7 months ago +1

      A conversation with you regarding ai and few ideas would be invaluable man!

  • @niceplace123
    @niceplace123 7 months ago +1

    Hm.. I just fed your questions to Mistral's regular chat with large model without any agents, and it provided the answers. Am I doing something wrong?

  • @airobsmith
    @airobsmith 7 months ago

    I tend to agree with the future of agents. Bigger closed source models are going to get smarter and will probably out-perform multiple agents in an agentic workflow. But the rise of open source and local models with fine-tuned specialisation and hallucination reduction techniques could still perhaps make a competitive solution that runs much cheaper and more privately.

  • @creativelearnersacademy7588
    @creativelearnersacademy7588 29 days ago

    Skills issues? I did build and deploy working CrewAI agents as a Flask API

  • @jlcasesES
    @jlcasesES 3 months ago

    🎯 Key points for quick navigation:
    00:00 *📝 Introduction and purpose of the video*
    - Initial explanation of why Crew AI is not suitable for production.
    - Description of the multi-agent methodology for answering multihop questions.
    - Invitation to subscribe and share experiences in the comments.
    02:07 *❓ Understanding multihop questions*
    - Definition and origin of multihop questions.
    - Practical examples of decomposing complex questions.
    - Importance of prior knowledge for obtaining complete answers.
    06:39 *🗺️ Agent workflow diagram*
    - Detailed description of the agent workflow in Crew AI.
    - Roles of each agent: planning, search, integration, and reporting.
    - Interaction and collaboration between agents to resolve multihop questions.
    10:25 *💻 Configuring the Python script in Crew AI*
    - Explanation of the Python script configuration.
    - Handling and assigning the essential API keys.
    - Choice of the GPT-4 Turbo model and parameter settings for greater accuracy.
    14:22 *🔧 Defining agents and tasks*
    - Configuring the agents' roles and specific goals.
    - Assigning and describing in detail the tasks for each agent.
    - Use of tools and the context needed to execute tasks.
    21:24 *🚀 Finalizing and adjusting Crew AI*
    - Final adjustments to the task and context configuration.
    - Defining specific tools for each agent.
    - Preparing and completing the setup to run Crew AI efficiently.
    23:29 *🤖 Organizing agents in AI*
    - Comparison between sequential and hierarchical agent execution.
    - Implementing a manager LLM to assign tasks to different agents.
    - The manager's access to task descriptions and overall goals.
    24:47 *💰 Tracking API costs*
    - Monitoring OpenAI API usage during the tests.
    - Initial and additional costs associated with the queries.
    - Importance of evaluating the cost-efficiency of workflows.
    26:07 *🛠️ Testing the multi-agent workflow*
    - Running a query about the currency used in the death of Billy JS.
    - Delegating tasks to the Searcher agent to identify relevant information.
    - Detailed analysis of the logical steps and the results obtained.
    32:36 *⚙️ Evaluating efficiency and costs*
    - Analysis of the 30-cent cost for a two- to three-step query.
    - Discussion of viability in production given the high costs.
    - Comparison with alternatives such as Autogen in terms of efficiency.
    37:01 *🔍 Problems with citations and agent transparency*
    - Missing citations despite instructions to include them.
    - Difficulty identifying which agent handles each task.
    - Comparison with Autogen and its better handling of task distribution.
    39:18 *📈 Test with more complex questions*
    - Attempt to answer a more challenging question about McDonaldization.
    - Continued efficiency problems and missing citations.
    - Observations on repetitive behavior and token consumption.
    47:55 *🛠️ Crew AI problems for production*
    - Run time too long compared with other tools.
    - Lack of interpretability in agent management.
    - Inconsistent behavior and high operating cost.
    50:03 *✅ Positive aspects of Crew AI*
    - Easy to configure multi-agent workflows with little code.
    - Useful for experimentation and prototyping.
    - Fast initial implementation.
    50:30 *🔮 Future of multi-agent frameworks*
    - Language models will advance to the point where a single agent is sufficient.
    - Recommendation to develop custom workflows for greater control.
    - Doubts about the long-term viability of frameworks such as Crew AI and Autogen.
    Made with HARPA AI

  • @TestMyHomeChannel
    @TestMyHomeChannel 8 months ago

    Great educational video about CrewAI. As for the high costs, what if we use a local LLM like llama3? I assume these types of agentic applications do not require sophisticated reasoning and Llama 3 could be sufficient. Second, as for the missing features, what if some knowledgeable programmers like you or others could improve CrewAI to add what is needed for following the process and displaying the citations? Thanks for the video.

  • @tigreytigrey8537
    @tigreytigrey8537 6 months ago +2

    Unfortunately it's just not possible to RELIABLY integrate these into complex business processes. You need to be able to make SMALL TWEAKS to the process without having to play The Price Is Right with prompts. Cool concept. FAR from ready. We're not even CLOSE. The fact that people think so is overly optimistic. AI right now is good for ONE SINGLE ACTION and a lot of times it's not even good at THAT, since there is no way to LOCK IN a successful process.

  • @yoyartube
    @yoyartube 8 місяців тому +2

    You can set the model to streaming and you'll start to see the result output sooner.

    • @ShaunPrince
      @ShaunPrince 8 місяців тому +1

      You don't need the streaming output for AI agents. That gimmick is just for people that use chatbots. Disabling streaming output will give you more performance and use fewer resources. Also less code to deal with.

    • @yoyartube
      @yoyartube 8 місяців тому +1

      @@ShaunPrince I absolutely said streaming is needed in this case but you are right it isn't.
      I didn't know streaming was a gimmick; I will tell my users streaming is a gimmick and they should look at a blank page while the completion loads.
      Which performance is worse when streaming is enabled? Which benchmarks are you referring to exactly? What is the mechanism that causes performance (whatever that means in this case) to be degraded? Please help I don't want to lose performance.
      It's also good that I can take out the 2 whole lines of code to enable streaming, this will be better.
      Thanks for this useful input.

  • @nicocesar
    @nicocesar 8 місяців тому

    Very honest review and I agree with most points. The conclusion is spot on. As humans we organize work with standardized processes, and I feel these frameworks are matching that and wrapping agents around it. I wonder if we will find another way to organize work in the future with more powerful agents.

  • @KEVALKANKRECHA
    @KEVALKANKRECHA 4 дні тому

    how to use hierarchical process in multi agent

  • @avg_ape
    @avg_ape 8 місяців тому

    Outstanding review. Thank you. Yes, the frameworks are nice to learn from and test proof of concepts (POCs). However, I see the frameworks to be analogous to 'no-code' app platforms - easy to deploy but a challenge to scale and improve.

  • @anishneunaha6312
    @anishneunaha6312 5 місяців тому

    With Llama 3.1, I would love to see you test it with the open source model, instead of OpenAI

  • @strength9621
    @strength9621 8 місяців тому +2

    Just the type of channel I’ve been looking for

  • @tonycarter8440
    @tonycarter8440 7 місяців тому

    New subscriber here, great content! I'm evaluating several other Agent frameworks, your insight was very valuable.

  • @JulianHarris
    @JulianHarris 8 місяців тому +2

    What about using Claude Haiku? It’s a bunch cheaper I think

    • @gani2an1
      @gani2an1 6 місяців тому

      and less clever

  • @m12652
    @m12652 7 місяців тому +1

    2:51 why would you need to know who the first president was to find out who the second was? Couldn't you just ask "who was the second president...?"?
    And why would you need to know which part of the UK to know what currency it uses... they all use pound sterling?

  • @darkesco
    @darkesco 7 місяців тому

    I need to find a good AI Agent framework. I successfully accessed comfyUI through API and want to have agents generate images for me as well as evaluate them for quality. I started with CrewAI, but curious to know which agentic software is better for this type of project.

  • @brandonwinston
    @brandonwinston 8 місяців тому +4

    I'm going with Langgraph myself after looking into it and Autogen and CrewAI. I loved Autogen though!

    • @Data-Centric
      @Data-Centric  8 місяців тому +2

      Haven't used Lang graph yet myself, but I'll check it out.

    • @andydataguy
      @andydataguy 8 місяців тому +1

      Langgraph is awesome. It's tops for orchestration @Data-Centric

    • @jarad4621
      @jarad4621 8 місяців тому

      Look into agency swarm, now that it can do other models its going to be epic

  • @vastvitamins1966
    @vastvitamins1966 8 місяців тому

    Great video. I think most of the problems you had may be due to a lack of prompt engineering. For example, specify the type of output you want, such as a maximum output length (say, one paragraph); this should save on tokens. Also, when dealing with agents, it's important to repeat important info a few times. But as I said earlier, this is a great video. A good topic to bring to people's attention. Thanks for sharing.

    • @Data-Centric
      @Data-Centric  8 місяців тому +1

      Thanks, appreciate the insights here!

  • @MartinBlaha
    @MartinBlaha 8 місяців тому

    I stopped experimenting with CrewAI after my initial experiments which produced over USD 10 of API costs on just one afternoon. Sure, it's probably me not knowing enough about CrewAI. But I think my biggest issue is the lack of transparency about how the agentic process is being executed. Again, I'm just a beginner. But for now, it's just a black box to me.
    As next, I'll be experimenting with AutoGen and local LLMs ;-)
    Thanks for sharing your thoughts.

    • @gani2an1
      @gani2an1 6 місяців тому +1

      Same thing here. Any progress on your end since then?

    • @MartinBlaha
      @MartinBlaha 6 місяців тому

      ​@@gani2an1 yes. I started using LangChain and stick to it. I'm not an expert, but with the help of Claude 3.5 Sonnet I got my prototype working and develop it now further. So for now, it's just LangChain and pure Python - for me it works and I understand what's happening under the hood.

  • @MrGluepower
    @MrGluepower 7 місяців тому

    Multi-hop questions are such a niche use case, and you are presenting them as a core feature of multi-agent systems. Sure, that use case might be challenging, but in the real corporate world the usual workflows are not of that format. Each step is simple, but there are 10,000 steps.

    • @Data-Centric
      @Data-Centric  7 місяців тому

      We are building automated workflows for clients and we are not using crew AI. For your simple steps, RPA or something like Zapier can work. The use is niche, but it was a way to showcase what these agentic workflows can do without requiring access to corporate data.

  • @alexemilcar6525
    @alexemilcar6525 7 місяців тому +1

    Great explanation video, but bad use case for multi agents framework

  • @gileneusz
    @gileneusz 7 місяців тому

    Thank you for the great feedback on this framework. They are getting better and better, but LLM quality is the bottleneck... we need true AGI, but that would be maybe next year 😆

  • @vroep6529
    @vroep6529 8 місяців тому +2

    In my experience Claude is much cheaper / better than GPT-4; Sonnet has consistently given me great results, and even Haiku too. I am, however, running it on a custom Python solution where it recursively breaks down and summarizes what it is doing along the way. By using these smaller, cheaper models I am able to do many parallel prompts, which I believe creates a better end output. It was, for instance, able to create a web scraping module, test it, and later use it by itself to draw information related to another question (this cost under 10 cents in total). It does still have problems, and it is sort of capped to doing small projects / tasks, as it debugs stuff line by line (it tries running the program and captures the output, so it can only fix ONE error at a time, so as problem complexity increases the cost/time scales exponentially). Interesting video, I appreciate your content.
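The run-and-capture step described in this comment can be sketched roughly as follows. This is only an illustration, not the commenter's actual code: `run_and_capture` is a hypothetical helper that runs a snippet in a fresh Python interpreter and returns the final line of the traceback. Because a crashing program stops at its first exception, a loop built on this sees one error per run, which is the limitation the comment describes.

```python
import subprocess
import sys
from typing import Optional


def run_and_capture(source: str, timeout: float = 10.0) -> Optional[str]:
    """Run `source` in a fresh interpreter.

    Returns None if the program exits cleanly, otherwise the last line
    written to stderr (normally the exception type and message).
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the captured bytes to str
        timeout=timeout,      # avoid hanging on a runaway program
    )
    if result.returncode == 0:
        return None
    lines = result.stderr.strip().splitlines()
    return lines[-1] if lines else "unknown error"
```

In an agent loop, the returned line would be fed back to the model together with the source, and the run repeated until the helper returns `None`.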

    • @vroep6529
      @vroep6529 8 місяців тому

      On another important note, I get much better responses when I provide JSON objects and demand responses as JSON objects; only at the end will it summarize into actual natural language. It seems the models have a much easier time understanding what exactly to do if they reformat your task into a JSON object themselves. This way it can interconnect stuff that would be hard to hardcode; for instance, it might need an action which is not available yet, or maybe something is not a question but a task instead, etc.
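A minimal sketch of the JSON-in, JSON-out pattern from this comment, using only the standard library. The key names (`type`, `steps`, `answer`) and both helper functions are assumptions made for illustration; the commenter does not say which schema they use, and the call to the model itself is left out.

```python
import json

# Hypothetical reply schema; the original comment does not specify one.
REQUIRED_KEYS = {"type", "steps", "answer"}


def build_prompt(task: dict) -> str:
    """Serialize the task as JSON and ask for a JSON-only reply."""
    return (
        "Respond with a single JSON object containing the keys "
        '"type", "steps", and "answer".\n'
        "Task:\n" + json.dumps(task, indent=2, sort_keys=True)
    )


def parse_reply(reply: str) -> dict:
    """Parse a model reply and reject anything that does not match the schema."""
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

Validating the reply before passing it to the next step means a malformed response fails loudly at the boundary instead of silently corrupting later prompts.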

  • @justrobiscool4473
    @justrobiscool4473 8 місяців тому

    Have you tried the maestro framework?

  • @augmentos
    @augmentos 8 місяців тому

    Thank you! More honest takes needed in agentic space!!!!

  • @Mohamed-sq8od
    @Mohamed-sq8od 7 місяців тому

    if you base your whole opinion of crewai on the cost of openai tokens, use a local model or ollama :p

  • @lavamonkeymc
    @lavamonkeymc 8 місяців тому

    Did you test with Langgraph??

  • @stuj1279
    @stuj1279 7 місяців тому +1

    I disagree that you need to know the first President of Namibia to answer the question of who succeeded him/her. You just need to know who the second President of Namibia was.

  • @trsd8640
    @trsd8640 8 місяців тому

    Thank you a lot for this video. Very important!

  • @federico-bi2w
    @federico-bi2w 4 місяці тому

    Thank you! really helpful! 😄

  • @mr.hackathon
    @mr.hackathon Місяць тому

    This is a very good analysis!

  • @jarad4621
    @jarad4621 8 місяців тому +3

    Looks good. I'd like you to test Agency Swarm, whose entire point is to be production capable; it works, but I'm not sure it's there yet. I've also heard agents aren't quite there yet overall: they're inefficient, and their use cases are currently very specific, especially ones where costs don't matter as much, so they're not viable with the top paid models doing simple tasks like research. You would always use a mix of models, an Opus CEO with a swarm of Haiku for example: right person, right job. GPT-4 for everything is a bad user choice, but it still shows a point. However, combine agents with a free local but still good model like phi3 or llama 8b and one individual could automate insane quantities of work at no cost. That's where the magic is, not GPT-4, or maybe only if the manager is a top model overseeing one complex section and the rest are Haikus or cheap models. As for the models not being as good, I've seen a video about how smaller models can reach close to GPT-4 levels when placed into agentic systems and patterns with reflection, feedback, collaboration (I forget the rest), processes that ensure quality.

    • @Data-Centric
      @Data-Centric  8 місяців тому

      Thanks for your comment. I haven't tried Agency Swarm yet, I'll look into it. The last point you raised is interesting: a mixture of models with a set workflow of collaboration, reflection, and feedback.

  • @jasonb_
    @jasonb_ 8 місяців тому +1

    My advice is to use the sequential process (and try to minimize prompts for goals/tasks) for Q/A style applications. Hierarchical seems buggy and, yes, not ready for production. Crew is quite a fresh project with plenty of space to grow.

    • @Data-Centric
      @Data-Centric  8 місяців тому

      The only issue with the sequential approach is implementing feedback loops.

    • @jasonb_
      @jasonb_ 8 місяців тому

      @@Data-Centric You can use delegation and max_iter in the agent setup with sequential.
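To make the idea of a bounded feedback loop concrete, here is a plain-Python sketch. It is not CrewAI's API: `run_with_feedback`, `worker`, and `reviewer` are hypothetical names, and the `max_iter` parameter only borrows its name from the comment above. The worker produces a draft, the reviewer either accepts it (returns `None`) or returns feedback, and the loop stops after a fixed number of attempts so token spend stays capped.

```python
from typing import Callable, Optional, Tuple


def run_with_feedback(
    worker: Callable[[str, Optional[str]], str],
    reviewer: Callable[[str], Optional[str]],
    task: str,
    max_iter: int = 3,
) -> Tuple[str, int]:
    """Retry `worker` with reviewer feedback until accepted or `max_iter` is hit.

    Returns the last draft and the number of attempts used.
    """
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    feedback: Optional[str] = None
    draft = ""
    for attempt in range(1, max_iter + 1):
        draft = worker(task, feedback)   # feedback is None on the first attempt
        feedback = reviewer(draft)       # None means the reviewer accepted the draft
        if feedback is None:
            return draft, attempt
    return draft, max_iter               # budget exhausted; return the best effort
```

In a real workflow both callables would wrap model calls; here they are left as parameters so the loop's control flow can be tested without any API.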

    • @williamwong8424
      @williamwong8424 8 місяців тому +1

      @@Data-Centric do u mean because it's sequential, you can't give feedback during middle of the process or when u give feedback at the end of the process, it is too late? when u have to run the agents again, it has to start all over

  • @st.3m906
    @st.3m906 8 місяців тому +2

    I like Crew AI to get a low fidelity idea of how the system will work before I make it and to also get an idea of where things will go wrong so I can fix it in production.
    Secondly, it's a pretty good tool for copywriting imo. I don't like the use of tools that much - it makes it more stupid in my experience.

  • @DavidYang-kd8qr
    @DavidYang-kd8qr 8 місяців тому +2

    Very informative ❤

  • @ilanlee3025
    @ilanlee3025 7 місяців тому

    Excellent video. Liked and subscribed

  • @gileneusz
    @gileneusz 7 місяців тому

    43:12 if it would be scraped by JINA AI it would be smaller and cheaper

  • @roccov1972
    @roccov1972 5 місяців тому

    One more thing: I've been waiting for my access to CrewAI since April (it's now August). I'm still on their waitlist, with no updates since signing up. Any idea why I can't get access? Thanks again.

  • @machinelearning6817
    @machinelearning6817 4 місяці тому

    Subscribed to you my friend !!!!

  • @nikosnos790
    @nikosnos790 8 місяців тому

    that's all good and very detailed, as far as building that system, but what is the answer to the question of the TITLE ? you are not staying away mate, you are building agents with it, i will suggest you to be more clear with your titles next time, just give us what the title say or change it. hope this is helpful, keep it going

  • @AI-Wire
    @AI-Wire 7 місяців тому

    There is a problem with your logic in the first question. One does not need to first know who the first president of Namibia was in order to know who succeeded him. One can simply learn who was the second president.

  • @BradleyKieser
    @BradleyKieser 8 місяців тому

    Always really good content from this guy, he's brilliant.

  • @6lack5ushi
    @6lack5ushi 7 місяців тому

    I love this so much!!!
    Is this a bad answer to the Napoleon “Napoleon occupied the city where the mother of the woman who brought Louis XVI style to the court died in **1804**.
    Output from operation 0002: Marie Antoinette, the Queen consort of King Louis XVI of France, is known for bringing a more extravagant and luxurious style to the French court during her husband's reign. She was born an Archduchess of Austria and was the youngest daughter of Empress Maria Theresa and Emperor Francis I.
    Marie Antoinette's mother, Empress Maria Theresa, died on November 29, 1780, at the Hofburg Palace in Vienna, Austria. She died of natural causes at the age of 63, having ruled the Habsburg Empire for 40 years. Her death greatly affected Marie Antoinette, who was very close to her mother despite living in France since her marriage to Louis XVI in 1770.
    Output from operation 0000: Marie Antoinette, the Queen consort of King Louis XVI of France, is known for bringing a more extravagant and luxurious style to the French court during her husband's reign. Born an Archduchess of Austria, she was the youngest daughter of Empress Maria Theresa and Emperor Francis I.
    Empress Maria Theresa, Marie Antoinette's mother, died on November 29, 1780, at the Hofburg Palace in Vienna, Austria. She died of natural causes at the age of 63, having ruled the Habsburg Empire for 40 years. Her death greatly affected Marie Antoinette, who was very close to her mother despite living in France since her marriage to Louis XVI in 1770.
    Napoleon occupied the city where Marie Antoinette's mother, Empress Maria Theresa, died in **1804**.”
    Off by a year??? But this is exactly why we built our own in house solution and don’t rely on crew ai or any other multi agent framework

  • @pollywops9242
    @pollywops9242 7 місяців тому

    This seemed like cumbersome to me but 0.30 cent for 1(!) prompt omg
    I'm just using models that are less great because I'm just messing around and can't justify spending more and more on it

  • @st.3m906
    @st.3m906 8 місяців тому +2

    I think you'll like LangGraph if you find this is the main issue with Crew AI

    • @Data-Centric
      @Data-Centric  8 місяців тому +1

      I'll definitely try lang graph.

  • @ingo-eichhorst
    @ingo-eichhorst 8 місяців тому

    Yea. I do not like the magic that is going on behind the scenes. Autogen, CrewAI and LangGraph do not even have good verbose logging that would give you a chance to understand it. And LangSmith is a locked-in nightmare.

  • @6lack5ushi
    @6lack5ushi 7 місяців тому

    The Billy Giles answer is interesting: there is a Billy Giles who died in New York, in America. “He died at Mount Sinai Hospital in New York City on Sept. 25, 2021 after an eight year struggle with progressive anti-MAG peripheral neuropathy”
    But I’m guessing this is the wrong Billy Giles

    • @6lack5ushi
      @6lack5ushi 7 місяців тому

      At the same time Google will say Belfast?! So truth becomes the crux in these questions

  • @iamdanfleser
    @iamdanfleser 8 місяців тому +1

    There is an option in vs code to auto save files.

  • @Pure_Science_and_Technology
    @Pure_Science_and_Technology 8 місяців тому +6

    The usual black box disaster. Always code your own.

  • @DonG-1949
    @DonG-1949 8 місяців тому

    Some of those multi hop question examples are ungrammatical, borderline nonsensical. Was that paper peer reviewed

  • @GregPeters1
    @GregPeters1 8 місяців тому

    Well done

  • @michaelthompson8251
    @michaelthompson8251 7 місяців тому

    consider.
    use cases that would work given crewai

  • @RetiredVet1
    @RetiredVet1 5 місяців тому

    CrewAI has made some significant changes. It would be interesting to see if your opinion has changed or if they fixed any of your concerns.

  • @watchdog163
    @watchdog163 7 місяців тому

    It's you! You're going to create Skynet! 🤣

  • @lucatamburrano4611
    @lucatamburrano4611 18 годин тому

    Thank you for this video that pinpoints some issues of crewAI. I think that some problems have been addressed lately in the framework with the concept of flows.
    Here the link to the video that explains how flows work: ua-cam.com/video/8PtGcNE01yo/v-deo.html
    I would love to know your feedback on that, since crewAI is the only framework I'm still testing (the learning curve for these frameworks is steep if you are trying to use them for complex real-world use cases), and I must say that now that flows are there you have better control over what's going on, which I agree is one of the biggest issues of this new generation of frameworks.
    Thanks again for this video (imagine that I ended up here looking for some info on creating an answers merger in crewAI :D )

  • @madhudson1
    @madhudson1 8 місяців тому

    It shows promise, but found it too unreliable, especially 'tooling'

  • @ThomasTomiczek
    @ThomasTomiczek 8 місяців тому +1

    It is not prohibitively expensive - it is naive.
    First, even using OpenAi you do not need to use the Turbo 4 model for everything. Web search? Use 3.5 to extract the information from the result - tool use and data preparation is a typical case for lower models.
    Second, your use of the planner is naive - every step should have corrective reviews, even the planner. Is the resulting plan something that can be optimized?
    But generally - yes, we are not there yet, in functionality and cost. If CrewAI is handing a complete context over - that is a fundamental issue.
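The point about not using the most expensive model for every step can be sketched as a simple routing table. The step names and the mapping are assumptions made for illustration; only the idea of sending tool use and data preparation to a cheaper model comes from the comment. Whether the cheaper model is good enough for a given step has to be checked case by case.

```python
# Hypothetical step names mapped to OpenAI model identifiers.
# Cheap model for extraction-style work, stronger model for planning and reporting.
ROUTES = {
    "plan": "gpt-4-turbo",
    "search": "gpt-3.5-turbo",
    "extract": "gpt-3.5-turbo",
    "report": "gpt-4-turbo",
}

# Unknown steps fall back to the stronger model so quality degrades last.
DEFAULT_MODEL = "gpt-4-turbo"


def pick_model(step: str) -> str:
    """Return the model configured for a workflow step."""
    return ROUTES.get(step, DEFAULT_MODEL)
```

Keeping the mapping in one place makes it easy to swap a single step to a stronger model when the cheaper one underperforms.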

    • @Data-Centric
      @Data-Centric  8 місяців тому

      Surely adding additional agents to review after every step will increase the cost, latency, and probably the reliability of the overall workflow? I agree you probably could use some smaller models for parts of the workflow. However, I did try using 3.5-turbo initially and had subpar results, specifically with the web search agent.

    • @BoominGame
      @BoominGame 8 місяців тому

      Yes but say you are testing while developing it will cost you hundreds of dollars just in test calls. I mostly use local AIs.

    • @ThomasTomiczek
      @ThomasTomiczek 8 місяців тому

      @@BoominGame Oh, I agree - except that local AI has SERIOUS problems. Crazy low context window, no complex prompts in most cases. There is a moat, actually. Anyhow, the idea that AI - now - would be cheaper than a human equivalent makes little sense. Over time, yes, but now independence and "works 24/7" are core elements. Within a year or two, prices will be down another 90% - but imagine even how much a minimum wage worker gets.
      But really, we need decent local AI that can follow complex prompts and handle a 100k context without falling apart. And yes, it can take 2x48gb ram - "local" does not have to mean low end (which soon is higher thanks to DDR7 coming in way higher per chip capacities). But Llama was quite - well, 8000 context is not cutting it.

    • @BoominGame
      @BoominGame 8 місяців тому

      @@ThomasTomiczek Yes, but to run samples and test the mechanics with smaller sets or whatever, you can iterate as much as you want. I have 2 x RTX A4000, so it's also quite fast to work with. Then down the line, if customers want to use an API and pay for it, bless their little hearts, but I ain't paying for testing my developments.
      Also I am using Mixtral llava3, and before that solar; they are not that bad. As a rule of thumb they are all equally dumb in the same way, I don't see major differences besides the context indeed. But those guys have 50000 GPUs, not 2...

    • @ThomasTomiczek
      @ThomasTomiczek 8 місяців тому

      @@BoominGame Well, I found that most of my prompts are not working outside of OpenAI so far - and I cannot make them simple enough to work. I hope some of the fine tuning focuses on that - as well as long context training.

  • @ilanlee3025
    @ilanlee3025 7 місяців тому

    Funnily enough as an ADHD who hates reading a lot I can give you a tip to save a lot of money. You can prompt the chat agents how long you want the answer to be. You can say "keep the answer to 1 paragraph"

  • @bambanx
    @bambanx 4 місяці тому +1

    Flowise

  • @MiguelCayazaya
    @MiguelCayazaya 7 місяців тому

    agent a.i. is at toy level.

  • @DarxKies
    @DarxKies 8 місяців тому

    Your hand movements give the impression of lashing at the viewer, and it is distracting. Try putting the camera higher or further away.

  • @TSKTECHIN
    @TSKTECHIN 8 місяців тому

    totally agree !! @crewai is no good for production, very inconsistent results, my experience till now is not so great and when using GPT4 we can run a huge bill.. 😞, Thanks for the honest review 🙏 of this tool which has long way to go.. 😞 😛
    when using gpt-3.5 model, the results are inconsistent and throws error when running on different data sets or param, just no way of debugging the error..
    ```
    File "", line 1, in
    File "C:\ProgramData\miniconda3\Lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f576'
    ```
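The traceback above is truncated, so the exact failing call is unknown. Assuming the error comes from writing model output that contains an emoji (U+1F576) to a Windows stream encoded as cp1252, one possible workaround is to replace unencodable characters before printing. `to_encodable` is a hypothetical helper, not part of CrewAI.

```python
def to_encodable(text: str, encoding: str = "cp1252") -> str:
    """Replace characters the target encoding cannot represent with '?'.

    Round-tripping through encode/decode keeps every character the
    encoding supports and substitutes the rest, so a later print() or
    write() with that encoding cannot raise UnicodeEncodeError.
    """
    return text.encode(encoding, errors="replace").decode(encoding)
```

An alternative that keeps the emoji is to switch the stream itself to UTF-8, either with `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+) or by setting the `PYTHONUTF8=1` environment variable before starting Python.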