How to Make the PERFECT Dataset for RVC AI Voice Training

  • Published 26 Oct 2024

COMMENTS • 366

  • @Jarods_Journey
    @Jarods_Journey  1 year ago +53

    The end of the video got cut off -_-. I only had about 10 seconds left, so when I get the chance I'm just going to link a Short so that you guys can see the rest of the video lol

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +7

      Finishing the Data Curation Video...

    • @rytraccount4553
      @rytraccount4553 1 year ago +3

      @@Jarods_Journey Your audiosplitter code exports 44.1 kHz audio. How do I make it export 48 kHz? I am losing quality with this code!
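A note on the resampling question above: converting 44.1 kHz output up to 48 kHz won't recover quality that was already lost, since resampling is just interpolation. A minimal illustrative sketch of that interpolation (not the audiosplitter's actual code - in practice you'd use ffmpeg, e.g. `ffmpeg -i in.wav -ar 48000 out.wav`, or a library resampler):

```python
# Toy linear-interpolation resampler for a mono sample list.
# Illustration only; real tools use proper band-limited resampling.

def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sample sequence by linear interpolation."""
    if not samples:
        return []
    ratio = src_rate / dst_rate
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        pos = i * ratio                      # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

if __name__ == "__main__":
    one_second = [0.0] * 44100
    print(len(resample_linear(one_second, 44100, 48000)))  # 48000
```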

  • @IIStaffyII
    @IIStaffyII 1 year ago +11

    Wow, I am amazed by this channel. A few weeks ago I was searching for diarization of voices but had no luck finding a good fit.
    Not only do you have a very good tutorial, you seem to be knowledgeable and up to date with everything (as up to date as one can be when things are moving this quickly).

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +3

      Too many things, too fast. Appreciate it :D, this is the realm of open source.

    • @brianlink391
      @brianlink391 9 months ago

      @Jarods_Journey Love you, bro! Thanks a ton. I didn't even know this existed!

  • @ohheyvoid
    @ohheyvoid 1 year ago +27

    Just found your channel last night, and your workflows are so clear and to the point. Quickly becoming my go-to for voice2voice workflows. Thank you for your work.

  • @M4rt1nX
    @M4rt1nX 1 year ago +23

    Thank you Jarod.
    If people don't want to use Git, they can just download the zip and unpack it at the preferred location. 😉

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Solid tip, thanks Luz! Totally skipped my mind.

  • @keisaboru1155
    @keisaboru1155 1 year ago +39

    How do you combine voices to create a totally unique one?

  • @joshuashepherd7189
    @joshuashepherd7189 1 year ago +4

    OMG Jarod! Your video tutorials are getting better and better. I love seeing a new release from you! Thanks for all your hard work!

  • @OthiOthi
    @OthiOthi 1 year ago +2

    Jarod managed to help me figure out a strange problem that I was not able to figure out at all. He's got my sub. Thanking you kindly!

  • @ShiinoAndra
    @ShiinoAndra 8 months ago +1

    Just found your channel, and I want to say I'm so deep into the rabbit hole that I instantly recognized all the voices you used for conversion at the start 😂

  • @ZitronenChan
    @ZitronenChan 1 year ago +8

    Your channel and the AI Hub have helped me a lot in getting started. I just trained a model with 2 hours of audio from Fauna's last stream in RVC v2 for 1000 epochs, and it came out very well.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Haha awesome, glad to hear!

    • @paradym777
      @paradym777 1 year ago +1

      Is there a way I can get a copy of it? (>

    • @VexHood
      @VexHood 8 months ago

      How much better is that than 300? Does that prevent static sounds if you don't use pretrained generators?

  • @TLabsLLC-AI-Development
    @TLabsLLC-AI-Development 8 months ago

    Bro. This channel is amazing. I've been around and you are needed by many. Welcome.

  • @JobzenPadayattil
    @JobzenPadayattil 1 year ago

    Hey bruh, I'm getting some errors while converting trained data to output: an ffmpeg error plus a dtype/type error... (ffmpeg is already installed.)

  • @cubicstorm81
    @cubicstorm81 4 months ago

    For those receiving an error with the "split_audio" script not creating the .srt file as per the above tutorial: run it in an Anaconda or Python prompt, let it download the required dependencies, and it will work as you need.
    Thank you for a great tutorial!

    • @david7327
      @david7327 6 days ago

      How does that work? Because I've got the exact same problem.

  • @m0nkeyb0i666
    @m0nkeyb0i666 5 months ago +8

    Copied from the issues section; worked for me.
    Running split_audio.py threw this error:
    Exception has occurred: FileNotFoundError
    [Errno 2] No such file or directory: 'D:\ai\programs\audiosplitter_whisper\data\output\1.srt'
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 96, in extract_audio_with_srt
    subs = pysrt.open(srt_file)
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 150, in process_audio_files
    extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 180, in main
    process_audio_files(input_folder, settings)
    File "D:\ai\programs\audiosplitter_whisper\split_audio.py", line 183, in
    main()
    FileNotFoundError: [Errno 2] No such file or directory: 'D:\ai\programs\audiosplitter_whisper\data\output\1.srt'
    Additionally, the terminal was saying something about not having or not finding cublas64_12 (I can't remember exactly what it said)
    The error is thrown because the program can't find the srt file, because it can't make the srt file, and this is caused by a mismatch of CUDA versions. Torch (or something) has CUDA 11, but the script (or whatever) needs CUDA 12. I'm not a programmer, I don't know exactly what is what. All I know is that I fixed it.
    To fix this, do the following.
    Download and install CUDA 12 developer.nvidia.com/cuda-12-0-0-download-archive
    Navigate to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin"
    Copy cublas64_12.dll, cublasLt64_12.dll, cudart64_12.dll
    Navigate to "...\audiosplitter_whisper\venv\Lib\site-packages\torch\lib"
    Paste the dlls into this folder
    Now when you run split_audio.py, it will be able to create the srt file, fixing the issue with not being able to find said file.
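The DLL-copy workaround above can be sketched as a small script. This is a hedged sketch of the commenter's fix, not an official one: the CUDA path assumes the default v12.0 install location, and the venv path assumes the folder layout from the video - adjust both to your machine.

```python
# Sketch: copy the CUDA 12 runtime DLLs into torch's lib folder so whisperx
# can load cublas64_12.dll. Paths below are assumptions; edit them as needed.
import shutil
from pathlib import Path

DLLS = ["cublas64_12.dll", "cublasLt64_12.dll", "cudart64_12.dll"]

def copy_cuda_dlls(cuda_bin, torch_lib):
    """Copy the CUDA 12 DLLs that exist in cuda_bin into torch_lib; return copied names."""
    cuda_bin, torch_lib = Path(cuda_bin), Path(torch_lib)
    copied = []
    for name in DLLS:
        src = cuda_bin / name
        if src.exists():
            shutil.copy2(src, torch_lib / name)
            copied.append(name)
    return copied

if __name__ == "__main__":
    copied = copy_cuda_dlls(
        r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0\bin",
        r".\audiosplitter_whisper\venv\Lib\site-packages\torch\lib",
    )
    print("copied:", copied)
```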

  • @ControllerCommand
    @ControllerCommand 1 year ago

    Your channel is amazing. I was looking for this for a long time.

  • @smokey4049
    @smokey4049 1 year ago +4

    Hey, thanks for your awesome series of tutorials! As someone who is pretty new to this, it really helps out a ton. Would it be possible for you to make a tutorial on how to train an RVC v2 voice with the dataset I just created? Thanks again and keep up the great work!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Appreciate it! Respective tutorials already exist, so I'd go check those out! ua-cam.com/play/PLknlHTKYxuNshtQQQ0uyfulwfWYRA6TGn.html

  • @CaptainCarlossi
    @CaptainCarlossi 19 days ago

    Hello, me again with two small questions:
    1. The file format of choice is of course WAV, but what should it be for the best quality? 44.1 kHz or 48 kHz? Mono or stereo? (My recordings are in mono, but I could duplicate the channels and create a "pseudo-stereo track" if that produces better results.)
    2. Your audiosplitter_whisper is good for my spoken sound files, but what is the best way to split the sung recordings? I think that because of the continuous singing there is not always a silence every 10 seconds (or less). What could you recommend to me? Or do you know a current, nice how-to that describes everything in detail for achieving the best quality? (These are really my last questions :) )

  • @jr-2nd
    @jr-2nd 3 months ago +7

    I'm from the future: Don't install Python 3.12, use 3.10.

  • @VegascoinVegas
    @VegascoinVegas 6 months ago

    Exactly what I needed to know

  • @lockdot2
    @lockdot2 1 year ago +1

    I am still working on it, I have decided to do this on the worst quad core CPU there is, the 1.3 GHz, with no turbo, 4 core, 4 thread AMD Sempron 3850. I spent a bit over a week getting clean audio to save on the Ultimate Vocal Remover. I am using 12 hours of talking.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      There is probably a way to do this on Colab, but ATM, Colab is a hassle I don't wanna have to deal with :(. Good luck on it 🫡

    • @lockdot2
      @lockdot2 1 year ago +1

      @@Jarods_Journey Thanks! It's going somewhat smoothly; I got 5 errors in the CPU part in Visual Studio Code, but I am just going to pretend they don't exist and move on with it. Lol.

  • @MaorStudio
    @MaorStudio 1 year ago

    Thank you so much. King!

  • @matthewpaquette
    @matthewpaquette 1 year ago

    Great tutorial!!

  • @RayplayzFN
    @RayplayzFN 7 months ago +4

    This is an error I got: RuntimeError: Library cublas64_12.dll is not found or cannot be loaded

    • @MrAcapella
      @MrAcapella 7 months ago +1

      SAME! :(

    • @Timiny118
      @Timiny118 6 months ago

      I had this same error, but I ended up having the file from a previous installation of alltalk_tts. I'm sure you could find it elsewhere though. I ended up placing it in "audiosplitter_whisper\venv\Lib\site-packages\torch\lib" and everything worked as it did in his video.

  • @sukhpalsukh3511
    @sukhpalsukh3511 1 year ago

    Great, thank you for this video!

  • @pilpinpin322
    @pilpinpin322 1 year ago +1

    Thank you so much! It's a clear video and we can see that you know what you are doing! I have a small question regarding the .wav files of the dataset: is it better to encode them in stereo or in mono? Or does it make no difference to the program?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      I don't think it makes a difference, but I read somewhere that it should be done in stereo. I believe it flattens them anyway, so it doesn't really matter after it's been processed.
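For illustration, a common way stereo gets "flattened" to mono is by averaging the two channels. Whether RVC's preprocessing does exactly this isn't shown here; this toy sketch just shows why the mono/stereo distinction rarely survives preprocessing:

```python
# Toy stereo-to-mono downmix: average the two channels sample by sample.
# (Illustrative only; not RVC's actual preprocessing code.)

def stereo_to_mono(left, right):
    """Average two equal-length channel sample lists into one mono list."""
    if len(left) != len(right):
        raise ValueError("channel length mismatch")
    return [(l + r) / 2 for l, r in zip(left, right)]

print(stereo_to_mono([1.0, 0.0], [0.0, 0.0]))  # [0.5, 0.0]
```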

    • @pilpinpin322
      @pilpinpin322 1 year ago

      @@Jarods_Journey Thank you very much! One last question: is it better to segment the sounds into files of 10 seconds each, or to cut them into complete sentences (and therefore have files of very variable duration)? Thanks for your work!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      @@pilpinpin322 :) Complete sentences work best so you don't get weird clippings, but if you run out of VRAM, you'll need to split into smaller segments.

    • @pilpinpin322
      @pilpinpin322 1 year ago

      @@Jarods_Journey Thanks for the fast response! Even if there are very small sentences of 1 second, like "Yes, I agree!"?

  • @fuuka69420
    @fuuka69420 1 year ago

    Hey, another banger video mate!
    Do you reckon it's wise to keep the sound of breaths, such as when they inhale or exhale? Or do I only need the parts where the source voice talks or sings? Let me know your thoughts and keep up the cool vids!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Whatever is included in the split audio should be fine. It may cut out some of the breathing at the end or beginning of a sentence, but everything else in between is fine to keep :)!

  • @CaptainCarlossi
    @CaptainCarlossi 19 days ago

    Hello. Thanks for your great videos. One question: I am from Germany and have WAV files spoken or sung in English AND in German. For your tool / whisperx I can handle them separately by changing the language. But my question is about RVC: for training a new model, can I mix those different languages together? I always did that, and now I realize that maybe this wasn't a good idea? Or does that not matter for RVC? Thanks in advance ;)

    • @Jarods_Journey
      @Jarods_Journey  19 days ago +1

      This is fine; RVC doesn't look at text to train - it's strictly extracting features from the audio provided. The only thing is it may sound accented. For example, if I train a model on Japanese audio and use it to convert English speech, it may not sound 100% native English.

    • @CaptainCarlossi
      @CaptainCarlossi 19 days ago

      @@Jarods_Journey Cool. Thank You for your quick reply ;)

  • @Dante02d12
    @Dante02d12 1 year ago +3

    Hey there! Thank you for all those videos! I hadn't realized UVR5 had advanced options, lol.
    Hey, I have a question that can look silly but is serious: is it really required to train for _hundreds_ of epochs? I have had absolutely great results with only 50 epochs. What exactly do more epochs bring?
    Meanwhile, the issues I have also happen with models trained for hundreds or thousands of epochs, because most of my problems come from the way I clean the audio I want to clone.
    I also noticed my feminine voices tend to break at growls. Is it required to have growling audio in the dataset used for training? Or is there a secret sauce to make any voice do growls?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Appreciate it! A finished epoch indicates that the model has seen every sample once. Increasing epochs just repeats this process for X number of epochs. It's all data-dependent, as you don't always need more epochs for a good model.
      As for growls: in general, they seem to be harder for the models to infer, and my anecdotal experience is that all models kind of struggle with them. I have yet to try training with growls, but I want to try a similar experiment with laughing, because often laughing just sounds weird 😂
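The definition above (one finished epoch = the model has seen every sample once) can be sketched with a toy counter - no actual training, just the bookkeeping:

```python
# Toy sketch of epochs vs. samples seen: total samples seen = epochs * dataset
# size, regardless of batch size. The "model" here is just a counter.

def samples_seen(dataset_size, epochs, batch_size):
    seen = 0
    for _ in range(epochs):
        # iterate the dataset in batches; a short final batch still counts
        for start in range(0, dataset_size, batch_size):
            seen += min(batch_size, dataset_size - start)
    return seen

print(samples_seen(dataset_size=1000, epochs=3, batch_size=64))  # 3000
```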

  • @whimblaster
    @whimblaster 1 year ago +1

    Do I need to sing in the audio for the dataset, or is talking enough (like reading something from the web)? Thanks; apart from that, great tutorial. ^^

  • @dookiepost
    @dookiepost 5 months ago

    If you get an error when running whisperx, make sure you have version 12 of the NVIDIA CUDA toolkit installed.

  • @wugglie
    @wugglie 1 year ago +2

    For some reason I keep getting an error where it cannot open the vocals.srt file. Did I miss a step? There is no vocals.srt file generated in the output folder for audiosplitter.

    • @battletopia
      @battletopia 1 year ago

      I'm having the same problem. Did you manage to sort this out?

  • @kaant21
    @kaant21 1 year ago +1

    Don't forget to change the PowerShell execution policy back to default when you are done with this.

  • @bruhby6276
    @bruhby6276 1 year ago

    Thanks for your content! Why would I use WhisperX though? Is it just for data management, or does it actually help RVC train?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      For curating better data: by using subtitle timing, there may be less chance of audio samples being empty noise.

  • @matrixxman187
    @matrixxman187 1 year ago +1

    I have 3 minutes of studio-quality lossless vocals I would like to use for training. Is that sufficient?
    Additionally, there are some interviews on YouTube of the same artist speaking at length, but I was concerned whether the lower-quality mp3 stuff should be avoided for these purposes. Thanks for your video! Very informative.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +2

      Muffled audio should be excluded but if the voice sounds good enough you can include it. 3 minutes may be okay, but idk, you just gotta try it out mate 🤟.
      10 minutes or more is recommended but you can use less sometimes and it'll be fine.

  • @PowerRedBullTypology
    @PowerRedBullTypology 1 year ago +1

    Jarod, do you know if there is software or a website that lets you make a new voice out of other voices? Like blend them into a new voice? Especially RVC-type voices (since I know those best), but I'd be curious about others too.

  • @EclipsixGaming
    @EclipsixGaming 1 month ago

    I recommend using software like Audacity for post-processing the audio; it helps with clarity and with random noise.

  • @nexgen91
    @nexgen91 8 months ago

    I have audiosplitter_whisper installed and VS Code opened, trying to run debugging as per 12:00 in the video, and am getting the following error: "configuration 'python:file' is missing in 'launch.json'". Any idea what might be going on? BTW: it appears to work if I run "python split_audio.py" in PowerShell.

  • @Metalovania
    @Metalovania 4 months ago

    Hi! I followed your tutorial and managed to set everything up and run the script without getting any errors, but the problem is that I didn't get the expected amount of segments... I tried the script with three different audios. The first one, of about 4 minutes, got me an output of 35 seconds' worth of segments; the second one, also about 4 minutes, got an output of 1 min 36 sec total; and the third, a bit over 2 minutes, got 55 seconds. Do you know what the issue could be? Also, I tested speaker diarization with another audio but it didn't go very well. It had 4 different speakers, which it separated into only 2, and all 4 speakers were in both folders.

  • @shampun2281
    @shampun2281 1 year ago +1

    They have been updated and now it is not possible to sort files by speakers. Can you look at the new version and tell me what can be done? Is it possible to use the old version somehow?

  • @gamecreator7214
    @gamecreator7214 1 month ago

    The whisper repo no longer has setup-cpu and setup-cuda. Do I just download later versions, or is there a newer tutorial?

  • @victoroam
    @victoroam 7 months ago +1

    11:01 I don't know why, but it keeps giving me the same error (No module named 'pysrt'), even though 'pysrt' is already installed.

  • @CalmEchos-r6l
    @CalmEchos-r6l 13 days ago

    ERROR: Could not find a version that satisfies the requirement torch==2.0.0+cu118 (from versions: none)
    ERROR: No matching distribution found for torch==2.0.0+cu118

  • @alphaxeu
    @alphaxeu 1 year ago

    Ultimate Vocal Remover is struggling with some tracks; I can hear the instrumental in the background with Kim Vocal 1. Is there a model where the vocals come out perfect? Great vid!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      The vocal removers are really good, but they're not 100%, unfortunately. That's very hard to achieve, and I'm sure there are brilliant minds working towards it, but it doesn't exist ATM. You may be able to get better results with ensemble mode, but you'll have to research a bit on the best combos: github.com/Anjok07/ultimatevocalremovergui/issues/344

  • @chaunguyenthanh6664
    @chaunguyenthanh6664 6 months ago

    Hi Jarods, can I use large-v3 model instead of large-v2?

  • @Arc-Trinity
    @Arc-Trinity 10 months ago +1

    ERROR: Error [WinError 2] The system cannot find the file specified while executing command git version
    ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH? lol

  • @enoticlive9103
    @enoticlive9103 1 year ago +2

    Hi! I'm from another country and I don't really understand English, but this topic is very interesting! How can I teach a model to speak my language better?

  • @Grom76300
    @Grom76300 1 year ago

    I thought this included both the separation and the training, but all those GB of programs are only for isolating the voice, daym!

  • @oliviosih347
    @oliviosih347 1 year ago +1

    Mine says: Failed to create virtual environment. Error: [Errno 13] Permission denied

    • @itschepi
      @itschepi 1 year ago

      Did you solve that?

    • @kaililkendrick783
      @kaililkendrick783 8 months ago

      Same error for me. Unable to get past this.

    • @IPartyWithUrMom
      @IPartyWithUrMom 7 months ago +1

      Fixed it. Uninstall Python and reinstall no later than 3.9.

  • @zafkieldarknesAnimation
    @zafkieldarknesAnimation 1 year ago +1

    Hello, please help me with this error:
    (Requested float16 compute type, but the target device or backend do not support efficient float16 computation.)

    • @battletopia
      @battletopia 1 year ago

      I am having similar issues; did you ever figure it out?

  • @denblindedjaligator5300
    @denblindedjaligator5300 8 months ago

    Just a question: how high is your batch size when you train? Is it something where, if you set it too high, you get an imprecise model? If I have a dataset of one hour, what should my batch size be?

  • @alexjet5890
    @alexjet5890 2 months ago

    How do you use the dataset created following this tutorial with AI voice cloning 3.0?
    You don't explain how to use it.
    Can you make a video?

  • @MFSCraft
    @MFSCraft 1 year ago +1

    Is there some kind of vocaloid-like interface so that I have some control over how certain words sound? It would be cool to have a TTS that could run the trained RVC voices.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      ATM, I don't know of any that use RVC voices, though I'm bound to see it happen someday.

  • @michaelcasado
    @michaelcasado 6 months ago

    Everything points to using Windows for these. Or am I missing something? I am on macOS, and all the stuff is .bat and .exe, or Google Colab sandboxes running things. Is there no UI to this date that also runs on macOS? Have I perhaps missed it?

  • @davidmaldonado9254
    @davidmaldonado9254 1 year ago +1

    Thank you for your amazing videos; they really help me understand how everything works. Just one question: I'm having some problems when running the "split_audio" script. It seems it isn't creating the .srt file of the audio, and when it tries to call the file it runs into an error. Do you know what it could be?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Whisperx may not have been downloaded correctly. I would try rerunning the setup file to get this going. One other thing you can do is type whisperx into the console after activating the venv to see if it got installed.

    • @davidmaldonado9254
      @davidmaldonado9254 1 year ago +1

      @@Jarods_Journey Thanks! I'll try uninstalling everything and installing again, because now the setup is showing an error when it previously didn't.

    • @nadaup6023
      @nadaup6023 1 year ago +1

      @@davidmaldonado9254 Did you manage to solve it? I have the same problem.

    • @Zielloss
      @Zielloss 1 year ago +1

      Run VS Code as admin.

    • @el-bicente
      @el-bicente 1 year ago

      I think I had the same problem using the CUDA installation. If your debugger tells you that it can't find the .srt file when running the split_audio script, check your terminal logs. If you have an error like this:
      "ValueError: Requested float16 compute type, but the target device or backend do not support efficient float16 computation."
      then it means that your GPU does not support FP16 execution.
      To fix it, go to line 26 in the split_audio script, which should be: return 'cuda', "float16" - and replace "float16" with "float32" or "int8".
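The fix described above can be generalized into a small selection helper instead of a hard-coded value. This is a hypothetical sketch (the function name and the supports_fp16 flag are made up for illustration); in audiosplitter_whisper the equivalent hard-coded return lives around line 26 of split_audio.py:

```python
# Hypothetical sketch: choose a whisperx-style (device, compute_type) pair
# based on what the hardware supports, rather than hard-coding "float16".

def pick_compute_type(cuda_available, supports_fp16):
    """Return (device, compute_type) for whisperx-style model loading."""
    if not cuda_available:
        return "cpu", "int8"          # CPU path: int8 is the safe default
    if supports_fp16:
        return "cuda", "float16"      # fast path on GPUs with FP16 support
    return "cuda", "float32"          # older GPUs: fall back to float32

print(pick_compute_type(True, False))  # ('cuda', 'float32')
```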

  • @rytraccount4553
    @rytraccount4553 1 year ago +6

    The code does not generate an srt file for me from a single WAV, and I get a FileNotFoundError: No such file or directory: 'D:vocalsplittest/data\\output\\song.srt

    • @rytraccount4553
      @rytraccount4553 1 year ago

      Apparently this is an issue with whisperx, as some devices like mine do not support this float type, making this code unusable :(

    • @lerian7669
      @lerian7669 1 year ago +1

      same problem

    • @smokey4049
      @smokey4049 1 year ago +1

      Yeah, have the same problem. Hopefully it will be fixed soon

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Can you try setting it up with setup-cpu? My laptop has an i7-8650U and works with this setup; it switches it over to int8 instead of float16.

    • @colinosoft
      @colinosoft 9 months ago

      Maybe it's too late, but I solved it with "pip install -r requirements-cuda.txt" (in my case I have an Nvidia graphics card; if you use CPU, replace it with "requirements-cpu.txt"). For some reason there is a missing package that is not installed when running "setup-cuda.py". Always run the command within the virtual environment created previously with "venv".

  • @chranman1855
    @chranman1855 1 year ago +2

    I'm getting a FileNotFoundError in Visual Studio Code, where it cannot find srt_file. I followed your tutorial step by step, but I'm sure I did something wrong since I don't get the same results when I run the program. Since I have no Python experience, I'm not sure what I did wrong here.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Some people have reported that it'll work if you try running VS Code in admin mode.

    • @chranman1855
      @chranman1855 1 year ago

      @@Jarods_Journey Thank you for responding! I will try that.

    • @colinosoft
      @colinosoft 9 months ago

      Maybe it's too late, but I solved it with "pip install -r requirements-cuda.txt" (in my case I have an Nvidia graphics card; if you use CPU, replace it with "requirements-cpu.txt"). For some reason there is a missing package that is not installed when running "setup-cuda.py". Always run the command within the virtual environment created previously with "venv".

  • @Skurios18
    @Skurios18 8 months ago

    Just a maybe random question: I was having issues installing the audio splitter and thought it was because I hadn't installed NVIDIA's CUDA toolkit, so I ended up installing it, but it was something else that was giving me the error. So my question is: should I uninstall this CUDA toolkit? I don't know what it does exactly, or whether it could harm my configuration or GPU in the future.

  • @KrazyGen
    @KrazyGen 1 year ago +1

    I'm trying to do my own voice and got some decent results, but it can't handle higher pitches. Should I add more samples with my voice in a higher pitch, or give it more samples with my normal voice and train it for longer? I have it trained using the Harvard Sentences from a previous video and I did 300 epochs.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      You can try adding samples of higher pitch. It's mainly going to be good at speaking in the pitch and timbre of the voice you train it with, so if your voice is naturally deeper, it's not going to know how to handle you suddenly speaking high.

  • @GaypataponALT
    @GaypataponALT 10 months ago

    I have 954 audio files in my training folder; is that a bit too much for RVC to train on?

  • @sonofforehead
    @sonofforehead 1 year ago +1

    Hey, at 12:22 I get a similar error, but .\venv\Scripts\activate doesn't seem to fix it. Are there any other solutions? It's giving me an error saying "FileNotFoundError", highlighting "subs = pysrt.open(srt_file)".
    Here's most of the error (there's more, just basically the same thing):
    "Exception has occurred: FileNotFoundError
    [Errno 2] No such file or directory: 'C:Users/myuser/OneDrive/Desktop/deleteme/audiosplitter_whisper/data\\output\\MyDataSet.srt'
    File "C:Users\myuser\OneDrive\Desktop\deleteme\audiosplitter_whisper\split_audio.py", line 101, in extract_audio_with_srt
    subs = pysrt.open(srt_file)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^"
    Also great video so far!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      Something happened when trying to make the srt file. Make sure that whisperx downloaded and the setup ran without issue.
      You may also have to run VS Code in admin mode.

    • @colinosoft
      @colinosoft 9 months ago

      Maybe it's too late, but I solved it with "pip install -r requirements-cuda.txt" (in my case I have an Nvidia graphics card; if you use CPU, replace it with "requirements-cpu.txt"). For some reason there is a missing package that is not installed when running "setup-cuda.py". Always run the command within the virtual environment created previously with "venv".

  • @21f.a.c.e.s
    @21f.a.c.e.s 8 months ago

    Unfortunately, I don't see any CUDA setup file in the cloned directory. Any help?

  • @LosantoBeats
    @LosantoBeats 1 year ago

    Does it matter if my source audio is chopped up? For example incomplete words/sentences etc..

  • @kratoos0.0
    @kratoos0.0 11 months ago +2

    When I run the script, this is my error: no module named 'yaml'

    • @maxikittikat
      @maxikittikat 10 months ago

      I had to manually go through the pain of finding this out. Basically: make sure you're not in the virtual environment (type "deactivate" to be sure). Then, for everything that isn't installed or says the module name isn't found, look up the install command online and add "--use-pep517" to it - e.g. "pip install PyYAML --use-pep517" for yaml.

  • @youngtrapgod6375
    @youngtrapgod6375 10 months ago

    Can this be done for so-vits? Because RVC loses the human element in my voice when I try making cover songs.

  • @NorasHobbyverse
    @NorasHobbyverse 1 year ago +1

    Thanks for trying, but this thing has failed for me multiple times and I'm tired of trying to troubleshoot it. Is it that hard to just make an executable for people to use? I don't know jack about code and can't fix it when it doesn't do the same thing your computer does, even when following all the steps.

  • @handsomebanana4060
    @handsomebanana4060 1 year ago +1

    What if my voice doesn't speak any of the default languages? I have found a phoneme-based ASR model that suits me, but how do I use it in your code? Anyway, great tutorial!

    • @Jarods_Journey
      @Jarods_Journey  1 year ago +1

      Ah... I haven't dabbled in that area yet and don't know how it works for other, non-supported languages. I would test it as a command-line script first to see if you can get it working that way. I believe the --align_model argument would need to be used.

  • @caleb8857
    @caleb8857 1 year ago

    When running it like at 13:00, it says `failed to align segment ("!!!!!!!!!!"): no characters in this segment found in model dictionary, resorting to original...` multiple times, and once it finished, the folder had no segmented audio and was just empty. How do I fix this?

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      I think this is a language issue: if your audio files have multiple languages in use, that causes issues with whisperx, as does an unsupported language. Beyond that, please reference the whisperx GitHub issues page for more details, as I'm not sure what else causes this.

  • @paulovictor5123
    @paulovictor5123 6 months ago +1

    Hey Jarods, much appreciation for your tutorials. I'm facing an issue when running split_audio.py. I'm using a Spanish database, followed all your steps, and changed conf.yaml to language: "es". But when I run the split_audio.py script, I get this: Exception has occurred: FileNotFoundError
    [Errno 2] No such file or directory: 'D:\\Documentos\\VoiceCloning - AudioSplitter\\audiosplitter_whisper\\data\\Vocals\\output\\100_Salmo 53_(Vocals).srt'
    File "D:\Documentos\VoiceCloning - AudioSplitter\audiosplitter_whisper\split_audio.py", line 96, in extract_audio_with_srt
    subs = pysrt.open(srt_file)
    File "D:\Documentos\VoiceCloning - AudioSplitter\audiosplitter_whisper\split_audio.py", line 150, in process_audio_files
    extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
    File "D:\Documentos\VoiceCloning - AudioSplitter\audiosplitter_whisper\split_audio.py", line 180, in main
    process_audio_files(input_folder, settings)
    File "D:\Documentos\VoiceCloning - AudioSplitter\audiosplitter_whisper\split_audio.py", line 183, in
    main()
    Can you help me out?

    • @ann96662
      @ann96662 6 months ago

      Did you fix the issue?

  • @MadFakto
    @MadFakto 4 months ago

    Which Video Player do you use?

  • @edwincloudusa
    @edwincloudusa 1 year ago

    Can you make a video on how to keep the emotions from the original source voice? I have everything working beautifully for a clean and perfect voice clone, but my source audio has some strong emotional acting (anger/fear/happiness etc.) that is not represented in the cloned audio. Thanks.

  • @LosantoBeats
    @LosantoBeats 1 year ago

    Can I use talking + singing audio to create my model, or should it be split into two separate models: one for the singing voice and one for the talking voice? I am having trouble finding clean singing audio for my model and am considering using talking audio from interviews etc.

    • @Jarods_Journey
      @Jarods_Journey  1 year ago

      You can use both. As long as it's the same voice, it should be fine.

  • @fountainbird
    @fountainbird Рік тому

    Thanks for the vid, although I'm confused. I understand the UVR step to isolate vocals; I would generally then use that as the dataset. What is the benefit of the next step of splitting the file up? Is that all it does? What else is happening that I don't know about? I've generally just used longer clean audio files for training. Thanks for enlightening me :)

    • @Jarods_Journey
      @Jarods_Journey  Рік тому +1

      By splitting it, we solve the biggest issue of CUDA running out of memory, as I don't believe RVC splits larger audio files into more digestible chunks. Splitting allows us to control this issue and, additionally, get rid of any silence in the audio samples. Then there's also the fact that you can easily remove any bad data from the audio file that you may not want in the training set.
      If you're running it just fine with UVR without the out-of-memory issue though, you should be good to go there, but splitting just gives you a bit more freedom with the data.

  • @moddest7123
    @moddest7123 Рік тому

    Hey Jarod. Slight issue when cloning the audiosplitter_whisper. I don't get the .git file at the top. Just the rest of the files. How do I fix that?

  • @MinerCold-w1s
    @MinerCold-w1s Рік тому

    Hello @jarod, I got this error while it's creating the output and vocal audio sets:
    CUDA is available. Running on GPU.
    The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
    The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
    Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.0.6. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file C:\Users\kit\.cache\torch\whisperx-vad-segmentation.bin`
    Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
    Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.
    >>Performing transcription...
    Traceback (most recent call last):
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\Scripts\whisperx-script.py", line 33, in
    sys.exit(load_entry_point('whisperx==3.1.1', 'console_scripts', 'whisperx')())
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\transcribe.py", line 159, in cli
    result = model.transcribe(audio, batch_size=batch_size)
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\asr.py", line 288, in transcribe
    for idx, out in enumerate(self.__call__(data(audio, vad_segments), batch_size=batch_size, num_workers=num_workers)):
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 124, in __next__
    item = next(self.iterator)
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\lib\site-packages\transformers\pipelines\pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\lib\site-packages\transformers\pipelines\base.py", line 1028, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\asr.py", line 228, in _forward
    outputs = self.model.generate_segment_batched(model_inputs['inputs'], self.tokenizer, self.options)
    File "C:\Users\kit\Desktop\nvc\audiosplitter_whisper\venv\lib\site-packages\whisperx\asr.py", line 138, in generate_segment_batched
    result = self.model.generate(
    RuntimeError: CUDA failed with error out of memory

  • @Imatastychickennugget
    @Imatastychickennugget Рік тому

    Hey, is it bad if there are low sounds of people slamming doors or making pop-like noises in the background? (They get loud on purpose every time I sing.)
    I can't get rid of those, or of plosives from breathing, but you can still hear my voice :/

    • @PeteJohnson1471
      @PeteJohnson1471 Рік тому +2

      make space cakes and give them out, start recording an hour later. You should be good for a few hours whilst they are all monging on the sofa ;-)
      I feel for your situation, that the people around you can't be reasonable with you for ten or so minutes.
      Maybe show them some videos of what you are looking to do, and offer to make them a voice, on the proviso that they just shut up for 10 minutes whilst you do yours?
      Good luck

  • @Random_person_07
    @Random_person_07 Рік тому

    Just a question: does it remove the background voice of another speaker if there is another speaker speaking behind the target speaker?

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      Unfortunately it does not, overlapping speech and disentanglement is still a research in progress field

    • @Random_person_07
      @Random_person_07 Рік тому

      @@Jarods_Journey One last question: what does speaker diarization do? Like cut out each speaker? Nvm, you explained it in the video

  • @denblindedjaligator5300
    @denblindedjaligator5300 9 місяців тому

    Hi, I have a question about RVC. I'm trying to train a model where I have chosen no pitch, and it sounds autotuned. How can I fix it? How does the learning rate work? What is batch size?

    • @Jarods_Journey
      @Jarods_Journey  9 місяців тому

      Not too sure about this unfortunately

  • @ShelfxYT
    @ShelfxYT Рік тому

    Do you have any voice modifications like the ones in the video played in real time? to use the same discord for example voicemod/clownfish ?

  • @supersonicunitedsupersonic8531
    @supersonicunitedsupersonic8531 11 місяців тому

    I have a source track with background noise, and of course I can solve that using UVR5 or other voice isolation VSTs, but there are also segments with a lot of voice reverb, and when I decrease that reverb it cuts low-mid frequencies from the voice. What should I do in that situation? Maybe I need to find a reference with good EQ and try to improve the target data using EQ matching?

    • @Jarods_Journey
      @Jarods_Journey  11 місяців тому

      In this case, you're in a tough spot, because if you can't clean the data, it may have some murkiness in the final output. As much as you can, you want to get your audio as clean as possible before training.

  • @grasshoffers
    @grasshoffers 9 місяців тому

    I do not think I have CUDA... just CPU, but I got the error: ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
    ERROR: No matching distribution found for torch

  • @aboodghanem1679
    @aboodghanem1679 11 місяців тому

    Hello dear, I would like your help regarding voice cloning via Google Colab. Should the data be uploaded as mono or stereo WAV, and should it be 16-bit or 24-bit?

  • @TheChipMcDonald
    @TheChipMcDonald Рік тому +1

    1) what/how/can I change this to have multiple data directories (if I want to tweak/add on a later retry, and as a way of keeping things organized). I presume I can make a subdirectory like the "vocal" ones for each unique dataset?
    2) can I bypass the audio split step if I've exported my dataset in

    • @Jarods_Journey
      @Jarods_Journey  Рік тому +1

      1. Each file you put in the data folder will be exported to its own segmented folder in the output folder. Once finished here, I recommend moving the finished files to some somewhere else on your PC.
      2. Yes, no need
      3. The exported files (segmented pieces) are coded in by me and organized to export to the folder you chose at the start. Means unlimited freedom if you wanted to modify the code
      4. It sorta is a batch process; what additional feature are you looking for? From the question, I'm assuming you just want to choose an input and an output folder, right? Since it makes a folder per file name, I can see this being a bit cumbersome to have to manually move them into one directory, but this is for sorting reasons.
      A 3060 is good as it can utilize CUDA. Imo, the 3060 gives more flexibility due to its 12GB of VRAM, so this would be the cheaper option to go with compared to something like a 3070 or 3060 Ti.

    • @TheChipMcDonald
      @TheChipMcDonald Рік тому

      @@Jarods_Journey 1) ok 2) ok 3) ah; following along without actually doing it makes it easy to discount where you started at, ahrgh, sorry 4) by batch, effectively automating starting Visual Studio, getting to the point where training ui begins... or in essence, an actual app ala UVC that does the environment setup, python behind the scenes. I want to copy my dataset over, then jump to a ui to start training.... and ideally the same ui to manage models, inference. Installing python, visual studio etc. are one time things I don't mind - I'm thankful you've done these tutorials, but the steps, steps, steps, steps, steps just to get to starting training seems automatable?
      My interest is in music, singing replacement; and what happens by tweaking the dataset, getting to what I hear in my head. Which I want bad enough to jump through hoops (and buy a new pc I previously didn't need, lol) but.... gahhh... it's like being a kid again, configuring AUTOEXEC.BAT and CONFIG.SYS for hours, only to be burned out by the time you get Wolfenstein to run in SVGA with a hand-me-down SoundBlaster 16 card....

    • @TheChipMcDonald
      @TheChipMcDonald Рік тому

      ​@@Jarods_JourneyThanks

    • @Jarods_Journey
      @Jarods_Journey  Рік тому +1

      @@TheChipMcDonald Gotcha! The RVC web-UI is actually pretty close, it's literally just missing the data curation side of things as it comes in a downloadable release too.
      A few more quality of life things later like file browsers instead of paths, etc. and I think we're looking at a very robust and easy to follow workflow. I'll definitely keep the channel updated WHEN someone comes out with something that has all of the puzzle pieces put together. 🙏

  • @gokulkrish3839
    @gokulkrish3839 Рік тому

    Do we need a high-spec GPU to do the things you showed in the video?

    • @Jarods_Journey
      @Jarods_Journey  Рік тому +1

      Anything that is an Nvidia 3060 12 GB or above should be fine; even 20-series cards still work too. Anything that is not Nvidia often has issues, so I don't recommend those.

  • @jeremybauchet6845
    @jeremybauchet6845 Рік тому

    Hello! I've followed the tutorial closely three times, but I keep getting one error at line 101: "Exception has occurred: FileNotFoundError". It seems to be looking for an srt file? Also, the terminal says "Requested float16 compute type, but the target device or backend do not support efficient float16 computation."

    • @Jarods_Journey
      @Jarods_Journey  Рік тому +1

      That means no SRT file was generated by WhisperX. Try redoing the setup with setup-cpu.py, as your GPU probably doesn't support float16. That, or in the code you can change float16 to int8 wherever it appears. I'll need to work on a fix for this.

    • @jeremybauchet6845
      @jeremybauchet6845 Рік тому

      @@Jarods_Journey Thank you ! I'll try so.
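The float16-to-int8 fallback described above can be sketched as a small helper (the helper itself and the "large-v2" model choice are assumptions for illustration, not code from the actual script; WhisperX's `load_model` does accept a `compute_type` argument):

```python
def pick_compute_type(device, supports_fp16):
    """Choose a compute type the hardware can actually run.

    float16 only pays off on GPUs with efficient fp16 support; int8 is
    the safe fallback on CPUs and older GPUs, avoiding the
    "Requested float16 compute type..." warning.
    """
    if device == "cuda" and supports_fp16:
        return "float16"
    return "int8"

# Hypothetical usage:
# model = whisperx.load_model("large-v2", device,
#                             compute_type=pick_compute_type(device, supports_fp16))
```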

  • @user-no3dt5iu3u
    @user-no3dt5iu3u 4 місяці тому

    Hi Jarod, I still get an error even after I do .\venv\Scripts\activate
    Exception has occurred: FileNotFoundError
    [Errno 2] No such file or directory: 'C:\\Users\\Mah\\Desktop\\AudioSplitter_Whisper\\audiosplitter_whisper\\data\\output\\Vocals.srt'
    File "C:\Users\Mah\Desktop\AudioSplitter_Whisper\audiosplitter_whisper\split_audio.py", line 96, in extract_audio_with_srt
    subs = pysrt.open(srt_file)
    File "C:\Users\Mah\Desktop\AudioSplitter_Whisper\audiosplitter_whisper\split_audio.py", line 150, in process_audio_files
    extract_audio_with_srt(audio_file_path, srt_file, speaker_segments_dir)
    File "C:\Users\Mah\Desktop\AudioSplitter_Whisper\audiosplitter_whisper\split_audio.py", line 180, in main
    process_audio_files(input_folder, settings)
    File "C:\Users\Mah\Desktop\AudioSplitter_Whisper\audiosplitter_whisper\split_audio.py", line 183, in
    main()
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Mah\\Desktop\\AudioSplitter_Whisper\\audiosplitter_whisper\\data\\output\\Vocals.srt'
    Does it matter that I'm running OS encryption with VeraCrypt?

  • @Helleshellblade
    @Helleshellblade 8 місяців тому

    I get an error while installing
    ERROR: Could not find a version that satisfies the requirement torch==2.0.0+cu118 (from versions: 2.2.0, 2.2.0+cpu, 2.2.0+cu118, 2.2.0+cu121, 2.2.1, 2.2.1+cpu, 2.2.1+cu118, 2.2.1+cu121)
    ERROR: No matching distribution found for torch==2.0.0+cu118

    • @mrkatlet
      @mrkatlet 7 місяців тому

      use Python 3.10.11, that's what Jarod did
      it definitely worked for me after I used that specific version

  • @3ool0ne
    @3ool0ne 2 місяці тому

    hey, can you do an update on this video?
    Are there any new tools and methodologies that replace what is outlined in this video?

  • @olaitanluvsojewale
    @olaitanluvsojewale Рік тому

    Hello, I got a few questions..
    So I have access to 6ch audio with the voice I want to clone, and I'm extracting it all manually using Adobe Audition.
    1. Using UVR helps remove any lingering bg noise but sometimes a little noise will remain. It is not that noticeable, so is it okay to have a little noise or will that affect the model?
    2. I know to remove long silences, but what about the small gaps between when the character is actually speaking, should I remove that too so it is just a continuous stream of talking with not even 0.5 second breaks? And what about the sounds when a character isn't actually speaking, e.g. growls or hums, or breathy sounds like laughing, that naturally have some silence in there.

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      My observation is that a little bit of noise is okay; it shouldn't be that noticeable. In one case, though, I have a model where I can hear the background noise that wasn't removed in the output. It's hard to get it perfect.
      2. The little gaps are fine. As for growls and whatnot, I'd say to cut those out, but I haven't actually tried, so I can't say for certain.

    • @olaitanluvsojewale
      @olaitanluvsojewale Рік тому

      @@Jarods_Journey Thank You!

  • @olaitanluvsojewale
    @olaitanluvsojewale Рік тому

    One more question... for now... if that's okay?
    Say I wanted to be excessive and get the cleanest, most accurate, almost perfect result possible on the first train, and I had 1.5 or even 2 hours max of audio data, and my PC could probably handle it (for context, I have an NVIDIA GeForce RTX 3060 graphics card and 32GB of RAM). What is the max number of epochs you'd recommend I train for?

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      Dunno, the big answer is "it depends". Just try training for 10 epochs and hear how it sounds. Train at other epoch counts and try those as well. You're looking for the lowest epoch # that sounds good.

    • @olaitanluvsojewale
      @olaitanluvsojewale Рік тому

      @@Jarods_Journey Oh okay then 🤔 Thank you a lot! I really appreciate you taking the time to answer

  • @basspig
    @basspig Рік тому

    How important is it to remove silence between the speaker's words or does it matter at all?

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      It may help reduce some artifacting, but oftentimes you can leave some silence in there and it'll be fine.

    • @basspig
      @basspig Рік тому

      @@Jarods_Journey Does the artifacting sound like octave shifts/cracking/falsetto effects? That's been a problem with some of the voice models I've made, and some that I've downloaded and tried using.
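To illustrate the silence-removal idea from this thread, here is a toy RMS gate on raw sample values (frame size and threshold are arbitrary assumptions; the actual splitter works on audio files via WhisperX timestamps, not like this):

```python
import math

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def strip_silence(samples, frame_size=4, threshold=0.01):
    """Drop leading and trailing frames whose RMS falls below the threshold.

    Interior quiet frames between loud ones are kept, matching the advice
    above that small gaps between words are fine to leave in.
    """
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    loud = [i for i, f in enumerate(frames) if rms(f) >= threshold]
    if not loud:
        return []  # nothing but silence
    kept = frames[loud[0]:loud[-1] + 1]
    return [s for f in kept for s in f]
```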

  • @EduardMicu
    @EduardMicu 9 днів тому

    Is this only for Nvidia?

  • @foxey461
    @foxey461 Рік тому

    🔥🔥

  • @nazersonic6938
    @nazersonic6938 Рік тому

    Thanks for the helpful video. I have a GTX 1660 Ti with 6GB VRAM and CUDA says I am out of memory. Is there a low VRAM option like in Stable Diffusion, or am I stuck with using the CPU?

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      There are some low-VRAM options built into WhisperX that have to be passed; you would have to modify the script to do that. I'll get around to adding it when I get the chance

  • @oxanaivanova8007
    @oxanaivanova8007 7 місяців тому

    ModuleNotFoundError: No module named 'yaml'. How do I fix it??
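'yaml' comes from the third-party PyYAML package, which the setup scripts normally install into the venv; running `pip install pyyaml` inside the activated venv usually fixes this. A minimal sketch of the kind of conf.yaml read the script performs (the 'language' key is taken from an earlier comment; the helper itself is hypothetical):

```python
import yaml  # pip install pyyaml (inside the activated venv)

def load_language(conf_text):
    """Parse the 'language' key from conf.yaml-style text, defaulting to English."""
    settings = yaml.safe_load(conf_text) or {}
    return settings.get("language", "en")
```

For example, `load_language('language: "es"')` returns `"es"`, matching the Spanish-dataset setup described above.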

  • @RolanRoyce
    @RolanRoyce 11 місяців тому +1

    That was so complicated it was ridiculous. Why don't you actually write a program that just does all of that by clicking "split"? What about slicer-gui-windows-v1.2.1? Will that do the same thing?

  • @RoxWinted
    @RoxWinted Рік тому

    hello, I'm asking anyone right now because I got a bit lost. I'm trying to make the AI voice not glitch out whenever I'm doing long vowels, so it doesn't look for all of them at once and sound like a mess. So far I thought you have to train them to sound better, but I think that's not the case. Can someone explain what I have to do to achieve this?

  • @xerotivi
    @xerotivi Рік тому

    Just found your channel and wanted to ask if you know any way to follow these steps on Mac. As a student, the only computer I have is my MacBook Air M1. I watched your video where you show how to use RVC on Colab, and I want to learn how I can create my own dataset and remove vocals from songs.

    • @Jarods_Journey
      @Jarods_Journey  Рік тому +1

      You can run this on CPU using setup-cpu, though I haven't tried it myself since I don't have a Mac. You could technically do all this in Colab as well, but you'll have to set that up yourself

    • @xerotivi
      @xerotivi Рік тому

      @@Jarods_Journey I will spend some time on it, and if I find a way, I will post here for others.

  • @Malkovitz_
    @Malkovitz_ Рік тому

    Thanks for the tutorial. Could you please explain how to replace the Whisper model with one that was trained on my native language?

    • @Malkovitz_
      @Malkovitz_ Рік тому

      BTW, I already found the model, but it's still a mystery on how to use it with your script

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      Sorry mate, I haven't looked into this area and don't quite know how to do it either. You have to tell WhisperX the location of the alignment model you're using, but that's as far as I know.

  • @ItsMeCharkey
    @ItsMeCharkey Рік тому

    hi, when I tried this I got this message:
    Failed to align segment ("!!!!!!!!!!!!!!!!!!"): no characters in this segment found in model dictionary, resorting to original...

    • @ItsMeCharkey
      @ItsMeCharkey Рік тому

      I also got this message too
      Model was trained with pyannote.audio 0.0.1, yours is 2.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
      Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118. Bad things might happen unless you revert torch to 1.x.

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      Both error messages are fine, and you should still be getting an output file at the end. If you're not, though, I believe that's a WhisperX limitation where it can't align some words

    • @ItsMeCharkey
      @ItsMeCharkey Рік тому

      @@Jarods_Journey I have the wav files, but I don't have the vocals with the small audio files.
      Is there a way to fix that?

  • @Overneed-Belkan-Witch
    @Overneed-Belkan-Witch Рік тому

    Hi Jarods, I'm currently working on a project doing an audiobook with a cloned voice, where I will be the voice.
    How good will the training be if I have an i5 and a GTX 1060 6GB? Is this enough?

    • @Jarods_Journey
      @Jarods_Journey  Рік тому

      That GPU might be rough... you might want to train on Google Colab. The training quality should be the same; just the training time will be different

    • @Overneed-Belkan-Witch
      @Overneed-Belkan-Witch Рік тому

      @@Jarods_Journey Thanks for the tips

  • @КП-31мнЛавінськийГліб

    In my recording, the script starts using phrases from the recording instead of SPEAKER_00 and SPEAKER_01. What can cause that problem?