FiveBelowFiveUK
United Kingdom
Joined 1 Feb 2024
We demonstrate and guide you through new technologies available to artists and creatives that can be used today, de-mystifying transformers and artificial intelligence so that you can spend more time expressing your art and turn your ideas into reality faster!
Available for collaborations. Check out my playlist of recommended AI YouTubers too!
If I missed anyone, please tell me so I can add them to the list ;)
"Golden Son" Mochi1-Preview
This is mostly a single prompt, rendered at 100 steps in 3-second segments on an RTX 4090 with ComfyUI and Mochi1-Preview; no video enhancement or upscaling was done.
Used Shuffle Studio for the main edit, then added some intro/outro clips using other prompts from the Golden Surfer prompt list, created in the last video tutorial.
Song: JXHNSXNT - "Unleashed"
Workflow Pack: civitai.com/models/886896
Mochi1 Article: civitai.com/articles/8313
Mochi1 Wrapper: github.com/kijai/ComfyUI-MochiWrapper
Shuffle Video Studio: github.com/MushroomFleet/Shuffle-Video-Studio
Join this channel to get access to perks:
ua-cam.com/channels/Q2548DcqVLjUlpO6RrVGsg.htmljoin
discord: discord.gg/uubQXhwzkj
www.fivebelowfive.uk
- Workflow Packs:
Foda FLUX pack civitai.com/models/620294
Koda KOLORS pack civitai.com/models/589666
RotoMaker civitai.com/models/586617
Loki Swap/Live Portrait civitai.com/models/539936
SD3 Pack civitai.com/models/515549
Lyra Audio Tools civitai.com/models/579066
Looped Motion civitai.com/models/410919
Trio Triple Latents civitai.com/models/381021
Nova Relighting civitai.com/models/565597
Virgil SVD1.1 civitai.com/models/331664
Merge Models civitai.com/models/432863
cosXL Convertor civitai.com/models/420384
Ananke Hi-Res civitai.com/models/352117
- SDXL LoRAs
civitai.com/models/475595/eighties
civitai.com/models/460522/thorra-sdxl-public
civitai.com/models/384333/helldivers-2-style
civitai.com/models/401458/not-the-true-world
civitai.com/models/405640/paul-allens-character-lora
civitai.com/models/339881/assassinkahb-sdxl-test-model
civitai.com/models/320332/assassinkahb-cascade-test-lora
- Introducing Playlist (music/video)
ua-cam.com/play/PLrDILB8WjRg2i7GSJ7Syd-DhukLgEJcy3.html
- Checkpoint Merging
ua-cam.com/video/JJgYp-fpGi4/v-deo.html
- cosXL / cosXL-edit conversion
ua-cam.com/video/aJrkPsuwQkU/v-deo.html
ua-cam.com/video/E1VW_A8DhY8/v-deo.html
- 3D Generation
ua-cam.com/video/Sb49FGc0664/v-deo.html
- New Diffusion Models (August '24)
Flux1:
Kolors:
SD3-medium:
Stable Cascade:
ua-cam.com/video/0_maNKAlIJQ/v-deo.html
ua-cam.com/video/0Iu8VC4he1Q/v-deo.html
SDXS-512:
ua-cam.com/video/IlDJQ8PimyU/v-deo.html
cosXL & cosXL-edit:
ua-cam.com/video/_M6pfypp5x8/v-deo.html
- Stable Cascade series:
ua-cam.com/video/vfjhkn7mv0U/v-deo.html
- Image Model Training
datasets ua-cam.com/video/ah1KRtjDFk4/v-deo.html
colab ua-cam.com/video/7vRS_Ls6OwY/v-deo.html
local ua-cam.com/video/7vRS_Ls6OwY/v-deo.html
civitai ua-cam.com/video/dmSZ5TWEWTQ/v-deo.html
civitai ua-cam.com/video/WPKPO-2WFK8/v-deo.html
- Music with Audacity
ua-cam.com/video/mPwoy1GweKk/v-deo.html
ua-cam.com/video/Qb3iIA-KqsE/v-deo.html
- DJZ custom nodes (aspectsize node)
ua-cam.com/video/MnZnP0Fav8E/v-deo.html
stable diffusion cascade
stable diffusion lora training
comfyui nodes explained
comfyui video generation
comfyui tutorial 2024
best comfyui workflows
comfyui image to image
comfyui checkpoints
civitai stable diffusion tutorial
An excellent reason to buy an RTX4090 with 24GB of VRAM.
amzn.to/3WMvFbU
Views: 277
Videos
Mochi1 - Pro Video Gen at home, huge improvements with Spatial Tiling VAE !
1.5K views · 4 hours ago
My latest Donut-Mochi-Pack-V7 takes advantage of the VAE Spatial Tiling decoder; this gives better quality than V6 (no tiling) and runs on even less GPU power. Shuffled AI Demo Video: ua-cam.com/video/4XrAMXHPjBM/v-deo.html Donut Workflow Pack: civitai.com/models/886896?modelVersionId=1014341 How to use my LLM system to build highly advanced Video prompts: github.com/MushroomFleet/LLM-Base-Promp...
Mochi Outputs Demo V7~(shuffled)
313 views · 4 hours ago
This demo video is paired with the Donut-Mochi-Pack-V7. It was created with Shuffle Video Studio's new comfyui output converter script. I built the tool to create quick-and-dirty montage reels/shorts; in simple mode it includes all the clips, so we can add all our Mochi AI video outputs (of matching height x width) into a single folder and have a script rename them, so that they a...
I wrote Video Shuffle Studio in Python with LLM, you can too !
533 views · 7 hours ago
It's time for another Story - this time we look at LLMs harnessed to write code, which can be used to perform complex tasks and save you time with repetitive and mundane actions. Project: github.com/MushroomFleet/Shuffle-Video-Studio Video (demo): ua-cam.com/video/ElKTb3J2OkI/v-deo.html I needed to make some randomised montage videos for a few projects, and there was no viable solution for wha...
Shuffle Video Studio - Example
132 views · 7 hours ago
This is an example referenced by the Tutorial/Guide video here: Project Source (licensed under Apache 2.0) github.com/MushroomFleet/Shuffle-Video-Studio Note: the central video is the output from the tool. The video was originally filmed with a Sony DCR-PC110E PAL camera (1997). I have added the outer expansion to show how this can be easily adapted for widescreen format, even if you shoot in 5:4 AR ...
Get Started with E2/F5 Text to Speech on Pinokio.computer (One Click install)
367 views · 12 hours ago
Today I want to draw attention to a platform that makes your life easy. It's not the usual comfyUI we are used to, but it allows us to get started with some lesser-known but awesome AI projects! We will be giving a quickstart / explanation / demonstration with the Text to Speech (TTS) voice cloning system E2/F5-TTS. This can clone voices with only 15 seconds of audio, and offers debate between t...
Don't sleep on Mochi1 - This is Pro Video Gen with ComfyUI on Local & also Runpod !
7K views · 14 hours ago
It was a wild three days of research and development that ended with a big workflow pack, the winner being the V6 versions; however, we will add to this and improve it further. For now we look at the process of the research that was done, in the spirit of community collaboration. There is a lot to cover, and please remember V1 - V5 were all research workflows! I gave full written guides in my ar...
SD3.5 is Here! Get started today with #comfyui
1.1K views · 1 day ago
Here are some easy-to-follow instructions for how to get set up with SD3.5, SD3.5 Turbo and the FP8 version for low-power GPUs. Installation Guide: civitai.com/articles/8306 Thoda Workflow Pack: civitai.com/models/880430 Join this channel to get access to perks: ua-cam.com/channels/Q2548DcqVLjUlpO6RrVGsg.htmljoin discord: discord.gg/uubQXhwzkj www.fivebelowfive.uk - Workflow Packs: Foda FLUX pa...
create video prompts from your images with this time saving workflow
641 views · 1 day ago
Many impressive video generators are out there right now; since they offer the option to use your own images, and we have many, I wanted a time-saving tool. We can ask this workflow to give us video prompts. It uses my prompt template to guide OneVision, which looks at each image and then creates a detailed video prompt. Find the Workflow here: github.com/MushroomFleet/DJZ-Workflows/blob/main...
Holographic Filesystem combined with my ACE Cognitive Enhancer [LLM]
414 views · 14 days ago
The full project will be expanded here: github.com/MushroomFleet/LLM-Base-Prompts (base prompt shown in this video, ACE R&D Study, more Holographic Filesystem (HFS) details to be added over time) I had been sitting on this discovery for a while and decided it was time to release it; I still have a lot of research to do, but it needed to be published. I already showed my discoveries in the Areas of...
AI Introduction Presentation@DeanCloseSchool on 14th October 2024
469 views · 14 days ago
This week I was invited to visit Dean Close School in Gloucestershire, UK and host a talk on Generative AI. My aim was to introduce the basic concepts and give them a "crash course primer", as part of an AI Week event the faculty was running. Special thanks to the staff and all involved with organizing this weeklong event; it was a pleasure to host the opening day. This talk was given live in t...
Origin of Species (CyberSocietyV66) LORA Training
304 views · 21 days ago
In this video we take a look at a basic process which can be undertaken in order to produce custom/unique styles by way of simple re-combination evolution of datasets. In this example we create 4 precursor models with the intention of using them as we would mix paint. This creates the look which you envisioned from the outset. Join this channel to get access to perks: ua-cam.com/channels/Q2548D...
How you can Make music with Udio AI & Master with Audacity/Audition
200 views · 28 days ago
Start making music today with Udio / Suno AI. Mastering Quickstart guide with Audacity & Audition. All Track Examples from Iconoclaust Project: 09-24: www.udio.com/playlists/saCWyw8x8KNkmjUo1FmnPp EP: soundcloud.com/tomino-sama/sets/iconoclaust-ep 10-24: www.udio.com/playlists/3VkTqdoGxwqa3HvDbzL7xw LP: soundcloud.com/tomino-sama/sets/iconoclaust-lp Join this channel to get access to perks: ua-...
Start using Runpod today with my Template for Flux with #comfyui
880 views · 1 month ago
Today I am sharing the Runpod templates I am using, with my custom provisioning script; I will explain what that is and how you can create your own if needed. My templates will include all 50 Flux Loras that I have trained so far, my own DJZ Nodes pack & ~250 of my workflows for various image models. DJZ-OneTrainer & DJZ-KohyaSS are also available for trainers familiar with those projects ;) Inf...
Dataset Recaption & Mutation, plus 25 new Lora Releases #comfyui
602 views · 1 month ago
Dataset Recaption & Mutation, plus 25 new Lora Releases #comfyui
Create Custom Nodes with Zero Code | #Claude #comfyui
645 views · 1 month ago
Create Custom Nodes with Zero Code | #Claude #comfyui
Organise your generations with Project File Path Node | DJZ-Nodes #comfyui
357 views · 1 month ago
Organise your generations with Project File Path Node | DJZ-Nodes #comfyui
Video Generation at Home! with CogVideoX5b in #comfyui
2K views · 1 month ago
Video Generation at Home! with CogVideoX5b in #comfyui
Soda Pack - a New SDXL Workflow Collection
705 views · 1 month ago
Soda Pack - a New SDXL Workflow Collection
Training Flux Lora with Tensor.Art | Event
679 views · 1 month ago
Training Flux Lora with Tensor.Art | Event
Dataset Tools added to Flux pack | Foda 19
809 views · 2 months ago
Dataset Tools added to Flux pack | Foda 19
Vid2Vid Relighting with Flux Animation | Foda Pack V18
955 views · 2 months ago
Vid2Vid Relighting with Flux Animation | Foda Pack V18
Clone all my Packs directly into #comfyui | DJZ Workflows
375 views · 2 months ago
Clone all my Packs directly into #comfyui | DJZ Workflows
Zenkai Prompt & Wildcards | DJZ-Nodes #comfyui
526 views · 2 months ago
Zenkai Prompt & Wildcards | DJZ-Nodes #comfyui
Image Size Adjuster | DJZ NODES #comfyui
411 views · 2 months ago
Image Size Adjuster | DJZ NODES #comfyui
Relighting Images - Create Animated Scenes with Flux Nova! | Foda Pack v17
1.7K views · 2 months ago
Relighting Images - Create Animated Scenes with Flux Nova! | Foda Pack v17
Swap as many faces as Needed! Multi-Person Face Swapping | Loki Pack V5
963 views · 2 months ago
Swap as many faces as Needed! Multi-Person Face Swapping | Loki Pack V5
Smooth star surfing! 🤩
Can I use this on Forge?
How much VRAM does this tool need?
It depends on the setup. I showed Q8_0 quantized with CLIP FP16 on CPU, so that is ~20GB, running on a 4090/3090. However, there are many quantized setups, and people in the community are running under 16GB. I cannot confirm it, but it may be possible to squeeze down to 12GB if you offload CLIP to CPU, although that required over 32GB of system RAM. There are so many optimization options these days that it's hard to test them all. For the safest bet, and the best quality locally, 20GB is where I stake the signpost on this one.
WOOOWWWWW!!
thanks !
I need to seriously learn this
You know I love to share the goods :)
What do you think about Eric Hartford from the UK, who brings people uncensored LLMs?
I had not heard about this and would love to learn more! I do believe it is what is needed: we are limiting the potential of the tech by censoring datasets; machine learning is highly capable and we are failing to truly comprehend the power of these LLMs.
@FiveBelowFiveUK He is the creator of Dolphin LLM & others without any restrictions, on Hugging Face. A Prometheus of our time.
It is interesting. As far as I know, Mochi 1 has not been released!!?
Correct! We are spearheading a community-led research effort, thanks to the hard work of Kijai who brought us the wrapper. I'm pretty much finding the best settings with the help of all involved, so our workflows have the best params for quality and efficiency on a local GPU. Find my open research thread here: github.com/kijai/ComfyUI-MochiWrapper/issues/26 We just received the VAE Encoder, so image-to-video workflows are now up, but no video until tomorrow, because I'm so hungry and it got too late tonight :)
Good work... How long did these take to render on the 4090?
I used 100 steps, CFG 7 and 3 seconds (73 frames), so those choices added a lot to each segment of the same prompt: ~20 minutes each. I queued it up, left it running for some hours and came back to around 40 videos, then used my randomizer (Shuffle Video Studio) to create two patterns which were put together. It's just one prompt, mostly on random seed; the prompts were built in the last video with my ACE-HoloFS system and Claude. Decoding the latents with the new spatial tiling VAE was the easy part and takes 30 seconds each. So it was all done today, but it does take a long time when you double the frames and add some seconds :) I'm happy with it though, as we are seeing real progression for local video generation! There should be more to talk about tomorrow too: I recently updated my workflow packs to include a working Image to Video setup, which uses the new VAE Encoder!
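A rough back-of-envelope for the timings above, just to make the arithmetic explicit (the per-segment figures are the ones quoted in the reply; the total is my own illustration):

```python
# Figures quoted above: ~20 min to render and ~30 s to decode each 73-frame segment.
render_minutes_per_segment = 20
decode_minutes_per_segment = 0.5
segments = 40  # "around 40 videos" after leaving the queue running

total_hours = segments * (render_minutes_per_segment + decode_minutes_per_segment) / 60
print(f"~{total_hours:.1f} hours of unattended generation for {segments} segments")
# ~13.7 hours of unattended generation for 40 segments
```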
@@FiveBelowFiveUK look fwd to your post tomorrow. Thanks.
That was a great deep dive! There's too much surface-level shit polluting YouTube.
thanks so much !
Stupid custom nodes dependencies in this workflow broke torch on my Comfy. 😬 Words cannot express how frustrated and angry I am right now.
None of these nodes affect the installation of, or changes to, Torch. I can only advise users of ComfyUI portable, which uses embedded Python. If you receive errors like "Torch was not compiled", something else is going wrong. If you need help you are welcome to come and ask, but it seems like you need to look at how you launched ComfyUI; a workflow cannot break Torch when it is not listed in the requirements.txt. Chances are you just didn't launch it as you had on other occasions.
Building a Video Shuffle Studio with AI: A Python Project Using LLMs

This video details the process of creating a Python-based Video Shuffle Studio using the Claude LLM as a coding assistant. The goal is to create a tool that can split long videos into short segments, shuffle them randomly or based on file size, and then rejoin them into a new, randomized video.

* 0:00 Introduction: The creator introduces the project, highlighting the need for a video shuffling tool that can handle long videos and offer various randomization options.
* 2:26 Problem Definition: Existing video editing software and tools lack the desired functionality for easily creating randomized montages from long videos.
* 6:13 Exploring Adobe Scripting: The creator initially attempts to use Adobe After Effects scripting, but ultimately abandons this approach due to its limitations with long videos and the complexity of creating extensions for Premiere Pro.
* 10:40 Python Project Setup: The creator decides to build the tool in Python, utilizing libraries like MoviePy and NumPy. A virtual environment (venv) is set up for project management.
* 13:15 Video Splitting: The first script, `video_splitter.py`, is created to split a video into segments of a user-defined length. The initial versions are hardcoded, but later iterations incorporate user input for file name and segment length.
* 18:42 Simple Shuffle: A shuffling script, `shuffle_splits.py`, is introduced. The initial "simple mode" renames clips with random hex codes to achieve randomization when ordered by file name.
* 22:32 Size Reward Shuffle: A more sophisticated "size reward" shuffle mode is added. This mode orders clips by file size, assuming larger files contain more visually interesting content, and allows the user to select a percentage threshold (e.g., top 20%) to filter out less interesting clips.
* 31:18 CUDA Acceleration: The video splitting process is accelerated using CUDA to leverage GPU processing for faster performance, especially with high-resolution or long videos.
* 36:16 Resolving Errors: The creator demonstrates how to use the LLM to debug errors encountered during development, showcasing how Claude can identify issues and suggest code modifications.
* 40:44 Professional Answers: The LLM demonstrates its ability to provide insightful feedback, advising against unnecessary optimizations like CUDA for the shuffling script, as it would not yield significant performance improvements.
* 46:38 Shuffle Script Refinement: The shuffle script is further enhanced with options like progress bars and memory usage optimization.
* 50:50 Shuffle Demo: The creator demonstrates the functionality of the splitting and shuffling scripts using a test video.
* 53:29 Joiner Script: A `shuffle_joiner.py` script is created to rejoin the shuffled clips into a single output video.
* 56:04 Harnessing Hallucination: The LLM's ability to dynamically update the project's `readme.md` file based on changes made throughout the development process is highlighted.
* 57:01 I-Frames and Compression Issues: The creator identifies compression artifacts in the rejoined videos, likely due to i-frame loss during splitting and rejoining. Attempts to address this using CUDA for the joiner script encounter limitations and lead to LLM hallucination.
* 1:03:43 Joiner Demo: The joiner script is demonstrated, successfully creating a randomized montage video from the shuffled clips.
* 1:07:34 Tool Instructions: The creator provides a final overview of the toolset, outlining the steps involved in using the scripts and batch files to create randomized montage videos.

The video concludes by emphasizing the potential of LLMs as coding assistants, enabling individuals with limited programming experience to build functional tools and bring their ideas to life. The project is released under the Apache 2 license, encouraging further development and adaptation by the community.

I used gemini-1.5-pro-exp-0827 on rocketrecap dot com to summarize the transcript. Cost (if I didn't use the free tier): $0.05. Input tokens: 33701. Output tokens: 849.
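The summary above describes a "simple mode" shuffle that just renames clips with random hex codes so an alphabetical sort becomes a random order, plus a "size reward" filter that keeps the largest files. A minimal sketch of those two ideas (my own illustration, not the actual `shuffle_splits.py` from the repo; the folder name and extension are assumptions):

```python
import secrets
from pathlib import Path

def simple_shuffle(clip_dir: str, extension: str = ".mp4") -> None:
    """Rename every clip to a random hex prefix so that sorting by
    file name yields a random playback order."""
    for clip in sorted(Path(clip_dir).glob(f"*{extension}")):
        # 8 random bytes -> 16 hex characters; collision risk is negligible here
        clip.rename(clip.with_name(f"{secrets.token_hex(8)}{extension}"))

def size_reward_filter(clip_dir: str, keep_fraction: float = 0.2,
                       extension: str = ".mp4") -> list[Path]:
    """Return the largest clips by file size, on the assumption that
    bigger files tend to contain more visually interesting motion."""
    clips = sorted(Path(clip_dir).glob(f"*{extension}"),
                   key=lambda p: p.stat().st_size, reverse=True)
    keep = max(1, int(len(clips) * keep_fraction))
    return clips[:keep]

if __name__ == "__main__":
    simple_shuffle("splits")             # "splits" is a hypothetical folder of segments
    print(size_reward_filter("splits"))  # top 20% of clips by size
```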
thanks again for adding this transcription ! I know they are very useful for everyone !
Spooky awesome!!❤️🇲🇽❤️
The song is so sick the video is amazing.
Whaaaa?
An RTX 4060 with 8GB VRAM can tank that?
YES! In fact some users with a 3090 or a 16GB GPU were also able to run it; further optimisations can be made, at the expense of speed, to fit inside even less VRAM.
@@FiveBelowFiveUK Woah! Nice man! That's what I have been waiting for... Just got into this AI world and Mochi'1 for me, is the best video generator out there for local play.
What?😮😱 happy Hellawin to all!🎃👻
Very nice bro, a comprehensive showcase! Were they all generated locally? What GPU are you using? How long does it take for you per video?
Yes and yes. I'm doing some tricks in the workflow, such as offloading the CLIP model to CPU, which removes a good amount of load from the VRAM allocation; we also used the Q8_0 (there is also a Q4_0 which halves the load again). I'm using a 4090, but users in my group are running it on even a 3090 and <16GB GPUs (with Q8). These videos were generated at 2 seconds (49 frames) with 50 steps, so they only took 5 minutes each. I have not used any video enhancers or upscalers, to give a good idea of the output quality, and I did not remove any videos from the sequence that was generated, so what you see here has no cherry picking, to give an idea of consistency.
really?! bla bla bla! most useless video iv ever seen *((((
Wicked idea mate, been following your ComfyUI content. I'm having a mare installing everything required; I keep getting a 'module not found' error. Install-SVS-cuda only worked when I moved it to the Python folder, since the install isn't on PATH. Well out of my depth, Python newbie bro, sorry.
The project should be using a virtual environment; this is why the .bat files load the environment before running the scripts. I hope to improve this, but I'm not sure what is not working for you exactly. Could you swing by our Discord and post in my help section? I'd be glad to help, as others may be affected by this and I might be able to change something to address it :) links in the description!
fire
LLMs' ability to write code is absolutely amazing. I've got no background in the field other than being a lifelong nerd. I got into generative AI and just learning game development from YouTube videos a little over a year ago. Since then, with the help of ChatGPT, I've started getting into app development in Python. I wrote a simple interface that allowed me to select a folder or a single file, and it would append the frame count to the end of each filename using FFMPEG. Very convenient for some of the video workflows for Comfy. I can give ChatGPT my idea and it suggests how we can build it and holds my hand through the hard parts. I've built two different versions of a very functional voice assistant for Windows that works with OpenAI's API. I literally started coding 12 months ago to teach my kid about game development. The world's changing fast.
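As an illustration of the kind of small utility described above, here is a minimal sketch that appends a frame count to video filenames. It is not the commenter's actual tool; it assumes `ffprobe` (part of FFmpeg) is on PATH and that the clips sit in a hypothetical "outputs" folder:

```python
import subprocess
from pathlib import Path

def frame_count(video: Path) -> int:
    """Ask ffprobe how many frames the first video stream contains."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-count_frames", "-show_entries", "stream=nb_read_frames",
         "-of", "default=nokey=1:noprint_wrappers=1", str(video)],
        capture_output=True, text=True, check=True)
    return int(out.stdout.strip())

def append_frame_counts(folder: str, extension: str = ".mp4") -> None:
    """Rename clip.mp4 -> clip_123f.mp4, where 123 is the frame count."""
    for video in Path(folder).glob(f"*{extension}"):
        frames = frame_count(video)
        video.rename(video.with_name(f"{video.stem}_{frames}f{extension}"))

if __name__ == "__main__":
    append_frame_counts("outputs")  # hypothetical folder of ComfyUI video outputs
```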
That's a great achievement; keep embracing AI and it will take us all to new heights! Stop by my Discord if you get time, I'd love to see this tool you have built too :)
it took some time but u really grew on me and by now ur probaply my fav creator on ai stuffs thx man
Thanks so much - it's an honor !
Oh, I see! I once made something similar as a Gradio interface (didn't share it). It uses ffmpeg to randomly reshuffle a long video into parts of a specific length. What I was missing was content awareness (like spoken-word context), then I stopped working on it and moved to other things.. some ppl (me) are shifty sometimes 😁 - The process of 'arguing' with the AI is funny (and rather lengthy) - but now coding AI is well improved! - Good work, everyone! And as always.
Yes, a valid next step I want to look into is putting this tool into a Gradio-style interface; this would make it much easier to use too. I wanted to get the functionality nailed in Python first and then build the GUI over the top. Thanks for your kind comments, and I totally agree you have to wrangle the LLM or else it will gaslight you hahaha :)
@@FiveBelowFiveUK 😆- Yeah! There is a very simple window interface which is not Gradio (I forget the name, but it has a feather as icon) - It's a simple window accommodating buttons checkbox sliders, that sort of thing. Sometimes this is all you need. When I was arguing with the LLM, some of the misunderstandings were about how Gradio works (It's a bit spoiled, so to speak and tends to store data in 'windows USER TEMP' which I very much dislike for the lack of space on C:) So it may be useful to have simpler interface and ask the LLM to always store TEMP data within its own folders. Your dedication and sharing is amazing! - Thank you ever so much!! 🤩
(PERPLEXITY) Answer: The simple window interface you are referring to, characterized by a feather icon in the top left corner, is likely associated with the Tkinter library in Python. Tkinter is the standard GUI toolkit for Python, allowing developers to create windowed applications.
I don't understand what it does 😅, gonna watch your other video about how to make. BTW I am having lots of fun in comfy playing with RyanOnTheInside (Flex) nodes which do synergy of audio/video/other - you may want to give it a view if you have not heard of them. - Not affiliated but well impressed - something new and exciting and different - nice! And as always.
Hi I'm from Egypt I like this content ai create how to make a design interface comfy UI flow and nods ❤
thanks for watching !!
Is there anything like ElevenLabs where I can record my own voice and apply it to the synthesized voice?
I'm a huge fan of voice to voice; if you can do good impressions, this can get you across the line. I showcased Kits.ai in the past, but it's a paid service. We are trying to show off more local stuff, so RVC is the best imho; the trouble is that getting it working locally can be tricky. If I find a good one you can expect a feature on this channel! With all the AI projects, you might notice I tend to go around in big circles; it's impossible to keep up with it all :)
Nice tutorial mate :) I got it running on Runpod but the results had big ghosting, not usable; also the i2v did not look at my image at all. Also the Mochi model is not stored on the network storage, so each time it starts it loads it again :( . Also, when I just use the decode side, the tile VAE is not selected and then I get an OOM with an A40 :/ cheers janosch
Thanks for the feedback, it's really helpful for everyone to see what results you have! I must admit that the nodes for the Mochi loader are pulling in the models every time, but I think I can improve that by updating my Runpod provisioning script! Expect an update on that front soon ;) Regarding the decoder OOMs with no tiling: I think this was a 100GB VRAM model (!) so we are still squeezing it in even on a 48GB card. I wanted to offer the "full fat" option for those people that use Runpod as a primary platform. I only decode 2-second clips without the VAE tiling; video VAE decoding takes an insane amount of VRAM due to all the frames.
@@FiveBelowFiveUK No result yet mate hahah, I think my Runpod had a headache. Also I was not aware that there is an i2v for Mochi? Perhaps that's why it's not working as intended? But keep up the great work :)
Great work mate! 👏👏
Thank you! Cheers!
Hello man, after I click synthesize my voice takes forever to process. I don't know what I am doing wrong; I followed all the steps correctly and installed it two times, still no luck. Can you help me? BTW I did put my test voice within 15 seconds and all. Please let me know if you know of any issues like this, thank you.
Hm, I have not seen this problem on my PC, and it's a new platform for me also. All I can suggest is to try and see if they have a support forum or Discord where you could ask? If it was me I would have done the same thing: delete it, reinstall it and hope it works.
It's all good, but a video on how to train models on your own dataset is really needed, because most models exclude a huge corpus of data. For example architectural photos, ornamental works, engraving prints: no model can generate such things, especially in perfect quality. To train models, the dataset must be prepared and thoroughly described by another AI, which is not easy with libraries of thousands of photos.
I think you meant billions of images; however, the models we are using no longer contain copyrighted images. Only SD1.5 and a small amount of SD2.1, and an even smaller portion of the SDXL dataset, contained recent works by living artists IIRC. When I speak about "training on your own work", we mean by that "finetuning checkpoints" or "finetuning LoRA patch models" for various diffusion models. You are not wrong, but in practice a good fine tune will only produce estimations of your work when prompted for that purpose. FLUX1 is the leading base model, and if you have enough data (your work) and you caption it accurately (tutorials on this are on this channel), it is totally a viable route artists are using right now! Personally, if I had the budget I would train a true clean base model on only public domain works; this has been the goal for some time now, although we cannot guarantee this due to the time it would take to audit 5 billion images by human eyes (~20 years with no sleep)!
@FiveBelowFiveUK wow, you're right. I'm interested in that finetuning technique for sure. Do you know Eric Hartford, the famous publisher of uncensored LLMs in UK?
I've not watched yet but can you do img2video with this?
AFAIK there is no VAE encoder, so all you can do is "Vision to Image", which will approximate an image using a complex description/captions. However, this is only an update away if the author decides to add this; I was even tempted to try writing one myself. It's still early days, so I decided to cover a few other things before coming back to it. Hope that helps!
It's stuck at the node MimicMotion GetPose. It tries to download a model but it can't: ComfyUI\\custom_nodes\\ComfyUI-MimicMotionWrapper\\models\\DWPose\\.cache\\huggingface\\download\\dw-ll_ucoco_384_bs5.torchscript.pt.d86a0b2b59fddc0901a7076e9f59c9f8602602133ed72511c693fd11eea23d91.incomplete'. The problem continues even if I download it myself and put it in the folder.
The filename definitely does not contain all the numbers etc. with the .incomplete extension; you were right to try re-downloading the files. "dw-ll_ucoco_384_bs5.torchscript.pt" is the full filename. It's using the Hugging Face cache, so you might need to delete that file and have it refetch. I worked with mobile internet for years, so I feel your pain; I've seen this before. Often the Hugging Face cache is not in the ComfyUI path but somewhere else in Windows (if you are even on Windows). A direct download might not work because of how the HF cache works. Good luck!
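A small sketch of the cleanup being suggested: scan a cache folder for leftover `.incomplete` download fragments and delete them so the file is refetched on the next run. The path below is taken from the error message above; the exact cache location varies per machine, so treat it as an assumption:

```python
from pathlib import Path

def purge_incomplete_downloads(cache_dir: str) -> None:
    """Delete leftover *.incomplete fragments so the downloader refetches them."""
    removed = 0
    for fragment in Path(cache_dir).rglob("*.incomplete"):
        print(f"removing {fragment}")
        fragment.unlink()
        removed += 1
    print(f"removed {removed} partial download(s)")

if __name__ == "__main__":
    # Cache folder quoted in the error above; adjust to wherever your cache lives.
    purge_incomplete_downloads(
        r"ComfyUI\custom_nodes\ComfyUI-MimicMotionWrapper\models\DWPose\.cache")
```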
@@FiveBelowFiveUK I just used another node and it worked: AdvancedLivePortrait. I don't like it because it needs to have two animation nodes after the image. I'll try to fix this problem another time. Do you know if there are some nodes for FaceFusion 3? They have LivePortrait behind it now. I need something to generate audio from text and lip-sync it with an image; do you have any tutorial for ComfyUI?
Does this work with an AMD GPU?
I would not like to guess, as I do not have an AMD GPU; all I can say is try it and see? If it doesn't work, let us know, because it will help others :)
Can I do it on my 3090?
Yes, there are people in my Discord who are using a 3090 with this model; you would use the V6 Fast settings or V5 (Q4/Q8 + T5 FP8 Scaled). I will be making new versions to support lower VRAM this week; I had some other things to cover first :) I explain all the different setups in the articles if you can't wait :)
@FiveBelowFiveUK Thanks broski
Great video, thanks a million! The new SD 3.5 is out now, would be awesome to train that model. Have you tried it at all?
Yes I have. I released a new "Thoda-SD35-Pack" on my github (DJZ-Workflows) and also on CivitAI; check the more recent video releases to catch the links. I have put out fixed workflows for SD3.5 Medium and Large already! Training will get a video as soon as it's more practical for everyone to access :)
@ awesome thank u! Would love to figure out how to train SD 3.5. I trained 1.5 with the Dreambooth Notebook before but it’s been a while and maybe there’s new tools out for training.
Does anybody know what this means: LayerUtility: PurgeVRAM? I can't use 'install missing nodes' on this; I restarted and searched.
github.com/chflame163/ComfyUI_LayerStyle -- this should be what you need; I think it's in the ComfyUI Manager. It is unloading the VRAM to help fit the models into VRAM on your GPU; I place them there to help with the crazy load these video models use. Full details in the links in the description! Feel free to ask more questions if you have them :)
All nodes in your workflows are giving float errors, how can I solve this?
Sounds like you need to update PyTorch to at least 2.5.0; use the .bat in the ComfyUI update folder, "update with dependencies".
@@FiveBelowFiveUK ok, thanks I'll update. Do you have any optimized workflow to run on an rtx 3060 12 GB vRAM?
I like 🎉 the passion and precision of useful data Good looking Brother 👌
🤝
A wonderful deep dive which is very rare to see! Is video2video possible with Mochi?
Until there is a VAE encoder, all we can do is use IPAdapter or Vision-type nodes to get tokens from an image and then prompt them; this is only an estimation, not true img2video. But I don't see why a future update would not add this in time; I think it's less than a week old. I'll be covering it for sure! ~ and thanks so much :)
@@FiveBelowFiveUK Great! The most relieving thing is that there's no technical reason that the model itself would not bend to that purpose. This model is already so good that it's only a question of time before open-source AI-based generation of video will be crazy epic! Most models seem to fail in motion, and video2video is a huge help with that. For example, I have been creating music videos mostly using Runway, but with ComfyUI we will get more control instead of playing an expensive lottery. In my opinion, to have a genuinely usable video tool for music videos and movies instead of animated portraits and talking heads, these inputs would be needed: 1. Driving video for motion 2. Reference image for background 3. Reference image for character 4. Text prompt for controlling background action, camera movement and such. Keep on keeping on!
Get Started with E2/F5 Text to Speech on Pinokio-dot-computer

* 0:02 Introduction: The video introduces voice cloning using a platform called Pinokio-dot-computer, emphasizing its ease of use and accessibility for AI projects.
* 1:07 AI Modules on Pinokio: Pinokio offers various AI modules for diverse applications like image processing (Live Portrait, Aura SR, Flux), audio processing (AudioCraft), and more. Notably, it includes ComfyUI and Whisper, tools previously explored by the channel.
* 3:43 Running E2/F5 TTS: Demonstrates the one-click installation and launch of the E2/F5 TTS module within Pinokio. It highlights the platform's automated setup and isolated environment, ensuring it doesn't interfere with the user's system Python.
* 5:09 E2/F5 TTS Explained: Explains the core features of E2/F5 TTS, emphasizing its ability to clone voices with only 15 seconds of audio. It also details the advanced settings, including options for manual transcription and speed adjustment.
* 8:45 Podcast Mode: This section introduces the podcast mode, allowing users to create debates or conversations between cloned voices. It suggests using an LLM (like Claude) to generate scripts with pre-defined character personalities.
* 11:39 Speech Styles: The video highlights the multi-style feature, allowing users to clone different emotional tones (e.g., surprise, sadness, anger) for a single voice, opening up possibilities for character development and storytelling.
* 13:54 Voice Chat: Briefly demonstrates the voice chat feature, enabling real-time conversations with an AI using the cloned voice.
* 15:37 Diamond - AI World Modeling: Introduces the Diamond module, showcasing its capability to create AI-powered game environments and simulations based on existing games like Counter-Strike and Pac-Man.
* 17:38 Creating Voices: Demonstrates the process of creating a voice clone using a 15-second audio sample. The presenter uses their own voice for the example and experiments with mono and stereo audio formats, noting that results may vary depending on audio quality and voice characteristics.
* 24:37 LLM Debates: Explains the process of using an LLM (like Claude) to generate debate scripts for the podcast mode, emphasizing the use of XML formatting for defining speaker roles and guidelines.
* 26:17 Distilled Prompt: Shows how to refine LLM prompts for generating debate scripts, aiming for concise and versatile templates adaptable for various topics and characters.
* 27:00 Prompted Script: Demonstrates generating a debate script using the refined prompt in Claude, specifying the topic and speaker names.
* 31:12 Result: Combines the generated script with cloned voices in the podcast mode to create a simulated debate, showcasing the potential of E2/F5 TTS for producing audio content.

I used gemini-1.5-pro-exp-0827 on rocketrecap dot com to summarize the transcript. Cost (if I didn't use the free tier): $0.03. Input tokens: 21385. Output tokens: 639.
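The podcast-mode section above mentions prompting an LLM with XML-style formatting to define speaker roles and guidelines. A minimal sketch of what such a prompt template could look like; the tag names and structure here are illustrative assumptions, not the exact template shown in the video:

```python
def build_debate_prompt(topic: str, speaker_a: str, speaker_b: str) -> str:
    """Assemble an LLM prompt that asks for a two-speaker debate script,
    using XML-style tags to define each speaker's role and the output format."""
    return f"""<task>Write a short debate script on: {topic}</task>
<speakers>
  <speaker name="{speaker_a}" style="calm, factual, concise"/>
  <speaker name="{speaker_b}" style="passionate, rhetorical, witty"/>
</speakers>
<guidelines>
  Alternate speakers every turn, 6 turns total.
  Prefix every line with the speaker name followed by a colon,
  so each line can be routed to the matching cloned voice in E2/F5-TTS.
</guidelines>"""

if __name__ == "__main__":
    print(build_debate_prompt("open source vs closed AI models", "Alice", "Bob"))
```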
Thanks for providing the Summary !! it's a big help for people
Change the MIKE!
Watts the problem
Yeah, I keep on looking away from the mic haha. It's a new mic, see; I'm still getting used to it being on the desk and not on a boom arm.
hahaha :)
My favorite stick figure is back!
This guy is brilliant
Big Thanks everyone :)
34:32 where to put those flash attention files?
Because it's so technical I have not even covered it yet, but the short story is you have to have the files in the ComfyUI folder (where the startup .bat files for ComfyUI are) and you do "pip install filename.whl", but this can break things, so again, I hope to return to this in a future update.
amazing. btw, how do you create your video avatar? that's awesome
What do you mean? It's just me :) I used to use After Effects and rotoscoping, but now it's all in a ComfyUI workflow; if you watch the first video on the channel, the secret is hiding in plain sight. Depth + OpenPose, and a wireless mouse :)
amazing study on this amazing video model! thank you brother
pleasure is all mine - thanks for watching !
🙂🙂
I think people who recommend cloud/subscription services that cost money in any way do not understand why most people are interested in generating locally. The whole idea of generating locally is that it's free; no additional payments are required beyond the one they've already made for the PC.
I agree to some degree, but cloud services like Minimax also generate much faster than any high-end PC can. Putting 5 in a queue and generating 2 at a time, sometimes within a few minutes, simply isn't happening locally. I choose to use both cloud and local for now.
It's not only about money. It is about as much privacy and safety of your data as possible (assuming it is even remotely possible with AI technology).
Sure, but using a 48GB VRAM card on Runpod is cheaper than my electric bill, so if it's a matter of money some people might want to take that into account.
@@quercus3290 I understand a lot of the reasoning and calculations behind it, but I always struggle when I ask myself 'why'. Some people probably make models, videos and images as a hobby; I make images for fun myself, it literally replaced gaming for me. Though I can't help but notice that there are also a lot of people that start with generative AI thinking they're gonna make some cash, or fame, or both, sinking money into it with expensive time-constrained services.
@@TheGalacticIndian What's privacy anyway?
Setting up the CUDA toolkit and Visual Studio is a pain in the ass
Agreed; that is why I have not even covered adding flash attention. To be honest it's the difference between 20 minutes and 10 minutes, I can wait :)
@@FiveBelowFiveUK is there any alternatives that you would recommend?
Thanks! Good work. Which workflow gives the best result after your testing?
For maximum quality, the latent sideloading workflows in V6 are almost perfect; certainly peak as far as this model is concerned, and you can also double the steps. Some people use 200 steps but I found 50-100 was enough. 49 frames was enough for my use case, but I know some want longer clips. V6 Fast is a good example for shorter videos. However, the reason it required so much power to decode the latent files (Runpod) was that I did not use VAE tiling; this is insane, because tiling was how this was even able to run locally in the first place. I wanted to show that the quality always suffered from ghosting no matter what tiling setting you used. It's the seams showing, you see.
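To illustrate why tiled decoding trades VRAM for seams: the decoder runs on overlapping spatial tiles and the results are feather-blended back together, so any mismatch between neighbouring tiles shows up as ghosting along the overlaps. A toy sketch of the idea follows; a stand-in upsampler plays the role of the real Mochi VAE decoder, and the tile sizes, scale factor and blend weights are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def fake_decode(latent_tile: torch.Tensor) -> torch.Tensor:
    """Stand-in for the real VAE decoder: just upsamples 8x spatially."""
    return F.interpolate(latent_tile, scale_factor=8, mode="nearest")

def tiled_decode(latent: torch.Tensor, tile: int = 32, overlap: int = 8) -> torch.Tensor:
    """Decode a latent in overlapping spatial tiles and feather-blend the seams."""
    _, c, h, w = latent.shape
    scale = 8  # output pixels per latent cell (matches fake_decode)
    out = torch.zeros(1, c, h * scale, w * scale)
    weight = torch.zeros_like(out)
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            y0 = min(y, max(h - tile, 0))
            x0 = min(x, max(w - tile, 0))
            decoded = fake_decode(latent[:, :, y0:y0 + tile, x0:x0 + tile])
            th, tw = decoded.shape[-2:]
            # linear ramps on the borders so overlapping tiles cross-fade
            ramp_y = torch.minimum(torch.arange(th) + 1,
                                   torch.arange(th).flip(0) + 1).clamp(max=overlap * scale)
            ramp_x = torch.minimum(torch.arange(tw) + 1,
                                   torch.arange(tw).flip(0) + 1).clamp(max=overlap * scale)
            mask = (ramp_y[:, None] * ramp_x[None, :]).float()
            out[:, :, y0 * scale:(y0 + tile) * scale,
                x0 * scale:(x0 + tile) * scale] += decoded * mask
            weight[:, :, y0 * scale:(y0 + tile) * scale,
                   x0 * scale:(x0 + tile) * scale] += mask
    return out / weight.clamp(min=1e-8)

if __name__ == "__main__":
    latent = torch.randn(1, 4, 60, 106)  # toy latent, not real Mochi dimensions
    print(tiled_decode(latent).shape)
```

Larger overlaps smooth the seams at the cost of decoding more redundant pixels, which is exactly the quality/VRAM trade-off discussed above.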