EXTRA NOTES:
- While I do crop pauses from the video, I intentionally did not crop the inference. So, you'll notice the inference on Mac is much slower than on GPU.
- I wrongly say "Gained Adversarial Network" instead of "Generative Adversarial Network" in a few places.
any other bugs/errors, lmk
Not to be pedantic, but the inference on Mac *does* happen on GPU. Just not on a CUDA-compatible GPU.
Both PyTorch/MPS and MLX do leverage the GPU (a quick sanity check is sketched below).
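Just to illustrate the point (a minimal sketch, not from the video's code; the tensor sizes are arbitrary):

```python
import torch

# On Apple Silicon, PyTorch can route tensor ops to the integrated GPU via the MPS backend.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

x = torch.randn(1024, 1024, device=device)
y = x @ x  # this matmul runs on the Mac's GPU when the device is "mps"
print(device, y.shape)
```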
@@TheLokiGT I welcome the pedantic comments! Thanks
I suppose the Mac is an integrated CPU/GPU, but yeah it would have been clearer for me to just say CUDA
@@TrelisResearch I also uploaded a short video the other day just to show that the setup you showcased is not slow, even on a modestly specced machine. I think it was slow on yours because you were doing screen capture, which is, counterintuitively, a resource hog. For some reason the comment was deleted by YouTube.. Let's try again: ua-cam.com/video/C1OGQnlo1M0/v-deo.html (not a disguised ad, there is basically nothing on my channel, I don't make YouTube content).
@@TheLokiGT cheers for testing this, I appreciate it.
I just tried again now without recording and perhaps got a marginal speedup, but not much. I'm on a Mac M1 with 8GB (and running the pipeline uses almost 7 GB). What specs do you have?
@@TrelisResearch M1 Max, 32GB. In the asitop terminal window you can see which resources are being used and how heavily.
The important thing, especially on an M1 with 8GB, is that no part of the models (both the main LLM and Parler) gets offloaded to swap. Otherwise the whole machinery gets a massive slowdown.
Thanks for the awesome content ;-)
Thank you for continuing to provide informative, in-depth content. I've learned so much from your video. Keep it coming!
Excellent presentation. Eventually, people will be able to select parts of conversations to merge into a shared knowledge graph. A digital platform that could merge millions of simultaneous conversations could become a form of collective intelligence.
OK, just for reference: I tested both Parler and Melo on my Mac Studio (M1 Max, 32GB URAM, 24-core GPU). Over 60% of my URAM was already occupied before I tried to run the S2S pipeline, and I didn't want to quit anything. Using Gemma2-2b-it (from HF's MLX community).
Parler: too slow, it stops in the middle of answers for 2-3 seconds.
Melo: works fine. No issues whatsoever; it's very quick to respond.
All in all, the T2S part is fine, while the S2T sometimes misunderstands words, but that's on Whisper (no, it's not my pronunciation, since no human being misunderstands what I say).
Thanks
Gonna get an M2 Air next month. Just about the right time :)
Hey Trelis, can you share the research paper at 6:45?
Sure, just added it to the description.
Can you tell me which one I should buy, please? The $39 advanced repo? I like your practices, but first I want to test them and then implement overall in fintech, healthcare, robotics and computer vision. Or should I go with the $100 gig? In which case, which package of repos, even if there are 4 at $400? I want to start. My main fear is that I don't know folder structures outside of a Colab; they look divided into test, train, data etc., but I don't get the whole understanding.
Howdy, check the page for each product/repo on Trelis.com. If you have more specific questions, you can post there.
It’s good feedback. I’ll maybe try to add a video overview to each of the repos to help out. Lemme see if I can do that later this week.
Also, the easiest way to see the content is to find the repo product page on trelis.com and scroll down to see the included videos. They show exactly what’s there for each vid
Good job! tks
Why is this so slow? You can get much faster, almost realtime transcribing with Whisper, and you can certainly get faster LLM processing.
Hey @flethacker, yes, I agree that you can get realtime ASR and TTT (text-to-text) systems, but the real latency issues occur with TTS. Although, if you look, there's also a segment in the video using only GPUs for inference, and that's much faster than running everything on the Mac.
TTS requires the most processing in general, because we are generating richer data (audio) from less rich data (text).
Also, FYI, the other issues with using remote GPUs are communication lag and network latency. For solving these we mentioned approaches like using a GPU closer to the source and preferring UDP over TCP (rough sketch below).
I hope that helps.
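For example, here's a rough sketch of the UDP idea (the endpoint and chunk size are hypothetical placeholders, not from the repo; UDP trades reliability for lower latency, so dropped packets are simply lost):

```python
import socket

UDP_ENDPOINT = ("203.0.113.10", 9999)  # assumed address/port of the remote GPU server
CHUNK_SIZE = 4096  # bytes of raw audio per datagram

def stream_audio_over_udp(pcm_bytes: bytes) -> None:
    """Send raw PCM audio in small datagrams; no retransmission, so no head-of-line blocking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for i in range(0, len(pcm_bytes), CHUNK_SIZE):
            sock.sendto(pcm_bytes[i:i + CHUNK_SIZE], UDP_ENDPOINT)
    finally:
        sock.close()
```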
@@sharmarohan03 I looked at MeloTTS yesterday. The CLI is slower than the web Gradio demo page they provide. So the issue here is that the CLI is not optimized. It needs to be streaming the audio as it creates it (rough sketch below).
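To illustrate the streaming idea (a rough sketch, not Melo's actual API; the chunk generator is hypothetical and you'd swap in real TTS calls):

```python
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 24000  # adjust to whatever rate the TTS model actually outputs

def play_streaming(chunks):
    """Play audio chunks as they are produced instead of waiting for the full clip.

    `chunks` is any generator yielding mono float32 numpy arrays, e.g. a wrapper
    that calls the TTS model sentence by sentence and yields each result.
    """
    with sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32") as stream:
        for chunk in chunks:
            # reshape to (frames, 1) so it matches the mono output stream
            stream.write(np.asarray(chunk, dtype=np.float32).reshape(-1, 1))
```

That way playback starts as soon as the first sentence is synthesized, instead of after the whole answer.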