Training NVIDIA StyleGAN2 ADA under Colab Free and Colab Pro Tricks

  • Published 4 Nov 2024

COMMENTS • 153

  • @QOVESStudio
    @QOVESStudio 3 years ago +30

    I'd love to see a 4k GAN

  • @r00t_sh3ll
    @r00t_sh3ll 3 years ago +3

    Thank you for this Jeff, amazing as always

  • @lukashauser3896
    @lukashauser3896 3 years ago +14

    Great video, thank you! In case someone sees "UserWarning: semaphore_tracker": This issue appeared since the PyTorch 1.9 release on 15th June and can be fixed by downgrading to the previous version:
    !pip install torch==1.8.1 torchvision==0.9.1

    • @eldendoge2302
      @eldendoge2302 3 years ago +2

      Downgrading the PyTorch version worked for me as well! I would also like to add that I used a completely new colab notebook. Following from a brand new notebook, I first !pip installed torch==1.8.1 torchvision==0.9.1, then git cloned the repo, and finally !pip installed ninja. From there I was able to successfully run the train.py file.
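
      A rough sketch of that sequence as Colab cells (the repo URL below is assumed to be NVIDIA's official stylegan2-ada-pytorch; adjust it to whichever fork the notebook actually uses):

      # Fresh Colab runtime: pin PyTorch before anything else touches it.
      !pip install torch==1.8.1 torchvision==0.9.1
      # Clone the StyleGAN2-ADA PyTorch code (URL assumed, not from the comment).
      !git clone https://github.com/NVlabs/stylegan2-ada-pytorch.git
      # ninja is needed to build the repo's custom CUDA extensions.
      !pip install ninja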

    • @loveyoutenderly9768
      @loveyoutenderly9768 3 years ago

      Sir, I love you so much.

    • @HeatonResearch
      @HeatonResearch  2 years ago +1

      Thank you so much for tracking this down! I've incorporated the above line into the repo.

  • @FailTrainS
    @FailTrainS 3 years ago +2

    Thanks so much! This is going to be exceptionally helpful in implementing save states for the first time in my cyclegan!

  • @kumarprateek1279
    @kumarprateek1279 3 years ago +2

    Thanks for these Gans videos

  • @datalabwork
    @datalabwork 3 years ago +2

    Thank you for updating us...

  • @kiachi470
    @kiachi470 3 years ago +1

    This is amazing to hear and see to be honest

  • @masterpig5s
    @masterpig5s 3 years ago

    This feels like a really useful resource.

  • @PicaPauDiablo1
    @PicaPauDiablo1 3 years ago +1

    Much appreciate it Dr Heaton

  • @mahdidarvish6023
    @mahdidarvish6023 3 years ago

    Very nice video; thanks for the detailed explanation.

  • @fernandocanepari3795
    @fernandocanepari3795 3 years ago +1

    Great video! Thanks a lot!

  • @andrewh5640
    @andrewh5640 3 years ago +1

    another great video. thanks.

  • @veazix
    @veazix 3 years ago +4

    I want to thank you for getting me into actually attempting this. It's been quite the learning experience. But it seems every time I have a hurrah moment it's immediately followed by a slow disappointing sigh. I've tried using anaconda environments on Windows 10 to get this going but that ended with: "RuntimeError: MALFORMED INPUT: lanes dont match".
    Colab gives me: "UserWarning: semaphore_tracker: There appear to be 34 leaked semaphores to clean up at shutdown". As far as my very limited understanding goes I'm assuming this means that the workers aren't working right :P Might be something to do with only having one CPU core per runtime?

    • @zyxwvutsrqponmlkh
      @zyxwvutsrqponmlkh 3 years ago

      I have experienced similar issues. I have a decent GPU and the code runs in Colab, but I can't get it to run on my PC. I even tried installing Linux.

  • @saicharanmarrivada5077
    @saicharanmarrivada5077 3 years ago

    Great video

  • @YCJ1207
    @YCJ1207 3 years ago +3

    I got these errors at the training step on both Colab and my PC; does anybody know how to fix them?
    UserWarning: conv2d_gradfix not supported on PyTorch 1.10.0+cu111. Falling back to torch.nn.functional.conv2d().
    warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')
    RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented

    • @crankghod9575
      @crankghod9575 2 years ago

      I'm getting this same error. Does anyone know how to fix it?

  • @MrDavidrees87
    @MrDavidrees87 2 years ago

    Hi Jeff, this video was great, thank you for creating it! Have you tried Projected GANs? It's much, much faster. I'd be interested to hear what you think. I would love it if you made a video showing how you can use that but still save the data to your Gdrive like you have here. Thanks for taking the time to share your knowledge.

  • @KlimovArtem1
    @KlimovArtem1 3 years ago +2

    Now, this is very useful, thank you! Do you share your own GANs that you’ve trained on fish, Minecraft and Xmas trees?

    • @HeatonResearch
      @HeatonResearch  3 years ago +1

      Yes I do! They are all here, the ones starting with "pretrained" github.com/jeffheaton?tab=repositories

    • @KlimovArtem1
      @KlimovArtem1 3 years ago

      @@HeatonResearch Now, I wonder if it would be possible to use your trained models with this project: colab.research.google.com/github/orpatashnik/StyleCLIP/blob/main/notebooks/StyleCLIP_global.ipynb (not sure what to replace there besides 'ffhq' though, maybe you can help?). It would be cool to see if it can generate, for example, decent-looking Xmas tree pictures from your model, where you usually get only messy-looking pictures. Or if it can generate a fish based on a description of its color and shape.

  • @artificial-ryan
    @artificial-ryan 2 years ago +1

    This was a very informative video and I thank you for sharing your knowledge of this. I had this running on the colab notebook very well last month, but once I renewed my subscription and tried again today I got an error about how many workers colab is using, as follows :
    Loading training set...
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:477: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
    Would you happen to know a fix for this, because the training does seem to freeze and not go anywhere on initial startup. I've waited ~30 minutes.
    Edit: nvm it just took very long lol, thanks again!

  • @davis2018
    @davis2018 3 years ago +2

    I've been training for a few days. What should I do if the FID value increases little by little without decreasing?

  • @meminesis
    @meminesis 3 years ago +1

    Thanks so much for this tutorial. There are really only a few out there. But I still have some doubts I hope you can reply to:
    - How many images did you feed it with? Is there an optimal number? Like, let's say, 200, 500, 1000?
    - Where can I VISUALLY see how training is progressing? Do the fakes_init or fakes000000 update at some point? Do I get a visual collage only if I make a resume point?
    - Most importantly, HOW can I make a resume point? How do you come up with the network file, is it from the last training? Where can I find it? What about the resume folder? Does the SNAP number mean that after tick 10 it will make a resume point? Will this create the fakes000100.png file?
    - WHEN is training to be considered finished? Is it based on how many pics you feed the dataset, i.e. more pics = longer training, or simple/similar pics = quick training?
    - After training is finished, does this mean I will get a trained .pkl file and can generate GANs from it locally on my computer?
    - In the "Perform initial training" section there is a second script to run, "...-- snap 25 -- resume...". What is that? You don't mention it in the video.
    It would be great if you could manage to make a tutorial from start to end, including resuming, pauses, and issues on the network (cutting out waiting times obviously), in order to make the process clearer for newbies like me. Thanks so much. You're being incredibly helpful.

    • @HeatonResearch
      @HeatonResearch  3 years ago +2

      I like at least a couple thousand images. Many of the NVIDIA Labs projects use 50K+. The individual files do not update, but further fakesxxxxxx files will be generated; that is how I track progress. You can resume, and that is critical; I do it in the video: just restart from your last checkpoint. Just make sure you are checkpointing often enough that one happens BEFORE Colab shuts you down. Training is "done" when you reach the specified number of kimg; however, you can stop as soon as the images look good, which is what I generally do. I will think about a start-to-end video! Good idea.
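
      As a minimal sketch of what such a resume command might look like (the paths, run folder, and snapshot name below are placeholders, not values from the video):

      # Resume from the most recent network snapshot written to Drive.
      # --snap is the number of ticks between checkpoints; keep it small enough
      # that at least one snapshot lands BEFORE Colab disconnects the session.
      !python train.py --gpus=1 --snap=10 \
        --data=/content/drive/MyDrive/data/gan/dataset \
        --outdir=/content/drive/MyDrive/data/gan/experiments \
        --resume=/content/drive/MyDrive/data/gan/experiments/00001-dataset-auto1/network-snapshot-000100.pkl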

    • @meminesis
      @meminesis 3 years ago

      @@HeatonResearch thanks Jeff, so very helpful! Eternally grateful for this.

  • @klarinooo
    @klarinooo 3 years ago

    Thanks Jeff, great video. Quick question: training stops after the first tick (training duration 4 mins on a Google Colab T4). Any ideas on what might be the problem? No metrics are displayed and no errors either...

  • @dimbachallenge
    @dimbachallenge 3 years ago

    Hi Jeff, love your videos and love the way you teach. A question: the IMAGE_PATH variable is set to the "fish" directory - is that correct? Shouldn't it point to the "circuit" directory, since that is the one we are working on? Or am I missing something?

    • @HeatonResearch
      @HeatonResearch  3 years ago

      It would point to whatever directory has your source images. Fish and circuits are both image sets that I've used in the past.

  • @idoshevet7644
    @idoshevet7644 3 years ago

    Amazing! Thanks so much! First time exploring training GANs and you made it so clear and easy to understand :) One question though: I trained it on a dataset of black glyphs on a white background (images are 256x256 RGB). I was expecting the generated images to look somewhat like the glyphs, but they were RGB shapeless blobs. Is there anything I can do to generate images more similar to the training dataset? Thanks!

  • @jonjon5342
    @jonjon5342 3 years ago +1

    Thanks for this video Jeff! Can I ask you a question? When you run this training, I'm not really clear on what pkl file is being used initially as a base. Or are you training this model from absolute scratch? If you're training from absolute scratch, is there a way to start with an existing pkl I'd like to use, and where would I input this? (Say the ffhq model or wikiart if I were to do something more organic.) Thanks so much!

    • @HeatonResearch
      @HeatonResearch  3 years ago +1

      If you do not specify a pkl file with --resume, a neural network with a random weight initialization is used. So yes, in this case I was going from absolute scratch.

    • @jonjon5342
      @jonjon5342 3 years ago +1

      @@HeatonResearch Thanks so much - makes sense! Can I ask you one more question? Once I started the training it says it's training for 25000 kimg. As I understand it, that's 25000 images. What does that mean exactly? As you mentioned, your dataset was around 3000 images or so, and the one I'm doing is around 2000. What does 25000 mean - where is that coming from, or will the model turn my dataset's number of images into 25000 internally?

    • @HeatonResearch
      @HeatonResearch  3 years ago +3

      So, it randomly samples from your images. Plus, due to ADA, it is also creating new images from your training set. Your training data could be only a couple thousand images, yet with ADA there may be millions of images in total. The kimg is how many real images the discriminator has seen (including augmented images). Also, it's K, so 100 kimg is 100 * 1000 images. Normally you think in epochs (number of complete cycles over the training set), but with augmentation an epoch is either meaningless or arbitrary.
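
      A quick back-of-the-envelope illustration of the kimg idea (the dataset size here is just an example, not from the video):

      dataset_size = 2_000               # example: a couple thousand training images
      total_kimg = 25_000                # default training length in the repo
      images_seen = total_kimg * 1000    # real (possibly augmented) images shown to the discriminator
      print(images_seen / dataset_size)  # ~12500 "epoch equivalents", hence kimg rather than epochs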

    • @jonjon5342
      @jonjon5342 3 years ago

      @@HeatonResearch Jeff - Thank you! When generating images after the training is done - What is the max value I can input for the seed variable? I assume the minimum is 0? or can it be a negative number as well?

  • @Janosch2702
    @Janosch2702 3 years ago +1

    I'm new to the whole styleGAN topic and was super happy to find such a well prepared notebook and tutorial. Unfortunately after I perform the Initial Training and the first Tick starts to run it crashes with the following warnings. Any idea how to fix this?
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
    cpuset_checked))
    /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
    return forward_call(*input, **kwargs)
    /usr/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 34 leaked semaphores to clean up at shutdown
    len(cache))

    • @Gl1tsyg3m
      @Gl1tsyg3m 3 years ago

      I'm also getting this message, no idea how to fix it :/

  • @alenespoli97ify
    @alenespoli97ify 3 years ago

    Incredibly useful video, thanks for real. I just have a question: when using the 'resume' snippet we just call 'train.py' again. This got me thinking whether the resume continues exactly from where the last network snapshot pickle file stopped, or whether it kind of makes a new training instance on top of it? Sort of like building another layer, instead of continuing the construction?

  • @johanhagner2341
    @johanhagner2341 3 years ago +1

    @Jeff Heaton Would you mind showing how to run a Jupyter notebook like this on AWS/Google Compute Engine if you want to move away from Colab?

    • @HeatonResearch
      @HeatonResearch  3 years ago +2

      Working on doing a spot instance tutorial. I've been using spot for a while now; it is surprisingly stable and 1/3 the cost.

  • @WonderRob
    @WonderRob 3 years ago +1

    Can you do the same tutorial for StyleGAN3?

  • @sepeslurdes1918
    @sepeslurdes1918 3 years ago

    Here you have my like, good man!.. How many images did you use for your chips & boards dataset?

    • @HeatonResearch
      @HeatonResearch  3 years ago

      Thanks for the like! It was small, only around 2.5k.

  • @samstewart6087
    @samstewart6087 3 years ago +1

    Hi Jeff, love this video but I'm having an issue when I try to pair the notebook with my dataset. I'm getting a message that reads 'No such file or directory' when I try to run the ls command, and the link itself doesn't work. Any idea why this is?

  • @galaxyburn40k
    @galaxyburn40k 3 years ago

    Hi there Jeff. So I've got this running, but it only seems to run for about five minutes and do a single tick. The log looks like this:
    tick 0 kimg 0.0 time 50s sec/tick 13.6 sec/kimg 3410.17 maintenance 36.6 cpumem 5.38 gpumem 11.32 augment 0.000
    Evaluating metrics...
    /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
    return forward_call(*input, **kwargs)
    Admittedly I'm only trying this on 25 pictures - could that be the problem?

  • @nina_________________
    @nina_________________ 3 years ago +1

    Hi! Thanks a lot for your videos, so helpful. I have only one problem that's driving me crazy. I've imported my personal folders from Drive; when I modify the path for resuming the training, another folder is created in "experiment" with a linearly growing number, but the .pkl file keeps the same name every time, so I believe it leads to a sort of loop in the training process... What would be the most probable error? I hope I've explained the problem clearly, thank you a lot again.

  • @deewakarchakraborty4027
    @deewakarchakraborty4027 3 years ago

    We want 4k Now😍

  • @anthonycursan5558
    @anthonycursan5558 3 years ago

    Hey Jeff! Thank you so much for this video. I'm trying to train NVIDIA StyleGAN3 on my own dataset, but I would like to add pictures as input. Can that be done with StyleGAN? I'm trying to generate a picture of a flower using two other flowers (what would be the result of this fusion). Of course I have a dataset with the two parent flowers & the child. Do you think there is any way to do this with this tool?

  • @charlesbaoumar
    @charlesbaoumar 2 years ago

    Hey, is it possible to add a description of your specific image requirements? First I got an error that they all should be square, so I cropped all of them; now I get Error: Image width/height after scale and crop are required to be power-of-two
    0% 0/250 [00:00

  • @kitschita
    @kitschita 1 year ago +1

    Hey I'm getting "derivative for aten::grid_sampler_2d_backward is not implemented" error, do you know how to fix it?

    • @psychologienerd7546
      @psychologienerd7546 1 year ago

      Same problem, because Colab does not allow using torch version 0.8 anymore

  • @rp.1408
    @rp.1408 2 years ago

    Hi, why do I get this error: AttributeError: 'Logger' object has no attribute 'isatty', when I perform initial training. And I don't see any pkl files on my drive

  • @maxernststockburger9420
    @maxernststockburger9420 3 years ago

    Hey Jeff, would love to try this but I can't find the link to your notebook. Am I missing something?

  • @preciousakpan674
    @preciousakpan674 3 years ago

    Thank you so much for this.
    Please, how do you know when to stop the training process?

    • @petern1575
      @petern1575 3 years ago

      Were you able to figure this out?

    • @preciousakpan674
      @preciousakpan674 3 years ago

      @@petern1575 no, i just stopped it after some days

  • @catherinele8417
    @catherinele8417 3 years ago +1

    Hi! This is an interesting video! I tried to run your notebook but I got this error while converting images: "Image width/height after scale and crop are required to be power-of-two". So I looked at the .py file, and the error comes up because: width != 2 ** int(np.floor(np.log2(width))). This check seems so weird... Can you give us the image shape of your dataset so I can resize my images? Thanks a lot!
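
    In case it helps, a rough preprocessing sketch (my own helper, not part of the notebook) that center-crops and resizes every image to a 256x256 square, which satisfies the power-of-two check quoted above; the folder names are placeholders:

    from pathlib import Path
    from PIL import Image

    SRC, DST, SIZE = Path("raw_images"), Path("processed_images"), 256
    DST.mkdir(exist_ok=True)
    for p in SRC.glob("*"):
        img = Image.open(p).convert("RGB")
        side = min(img.size)                    # largest centered square
        left = (img.width - side) // 2
        top = (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side))
        img.resize((SIZE, SIZE), Image.LANCZOS).save(DST / (p.stem + ".png"))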

  • @syeedafatima8557
    @syeedafatima8557 3 years ago

    I have kept it training for over a month now; sometimes I get a P100 but mostly I get a V100. How much longer will I have to keep training it? Can @Jeff Heaton please tell me the minimum and maximum number of days?

  • @ayeshaimran
    @ayeshaimran 1 year ago

    Can the same be applied to StyleGAN3? Because the version of PyTorch used by StyleGAN2 is no longer supported by Colab…

  • @cuentadefernandoleyra6088
    @cuentadefernandoleyra6088 2 years ago

    Hello, when I launch the initial training it gives me a problem when it tries to construct a network: Constructing networks...
    Setting up PyTorch plugin "bias_act_plugin"... Done.
    /content/stylegan2-ada-pytorch/torch_utils/ops/conv2d_gradfix.py:55: UserWarning: conv2d_gradfix not supported on PyTorch 1.10.0+cu111. Falling back to torch.nn.functional.conv2d().
    warnings.warn(f'conv2d_gradfix not supported on PyTorch {torch.__version__}. Falling back to torch.nn.functional.conv2d().')
    Why is this incompatible? I am using the same stuff as you.

  • @meminesis
    @meminesis 3 years ago

    Please Jeff, can you tell me how to lower the learning rate from default 0.002 to 0.001 in order to get a more stable lossless training?

  • @chun-julee6942
    @chun-julee6942 3 years ago

    Hi Mr. Jeff, I want to say that I followed your video step by step. However, the command line stops at the initial fake images, and it hasn't updated so far (I didn't get fakes_00100 or anything like that), so what should I do? Thank you!

  • @deadprivacy
    @deadprivacy 3 years ago

    I'm stuck: I hit copy for the auth code... it copies from the Google sign-in page, and then it won't paste into the text field... it just will not paste... How? I've looked online but found no solution. Is it a keyboard shortcut? It's a keyboard shortcut, panic over... It will only paste in properly if you put it in Notepad first when copied from a docx...

  • @syeedakudhsiafatima1430
    @syeedakudhsiafatima1430 3 years ago

    How many days of training are needed if a Tesla V100 GPU is used?

  • @Champignon1000
    @Champignon1000 3 years ago

    I found that using a third neural network, with the objective of keeping the training balanced between the generator and the discriminator by augmenting both generated and real images, works very well. Setups that would otherwise become unstable very fast are very stable now. The "augmenter", the neural network that keeps the game stable, only has 1 layer with a kernel size of 3 or 5; its objective is to keep the loss of the generator at around 0.65.

  • @danav09
    @danav09 2 years ago

    Can you post a DL link to the circuit images for those of us who can't pull from the Flickr API?

  • @catherinele8417
    @catherinele8417 3 years ago

    Hi! Finally it works with 256x256! Thanks to this video I can do my work! Does anyone know how to save and load the model's weights after training, so I can reuse the model afterwards? Thanks!

  • @Laszer271
    @Laszer271 3 years ago

    Is there some code for running StyleGAN-ADA with resolutions other than powers of two? It doesn't matter if it's on TensorFlow or PyTorch. I saw there are such options for StyleGAN2 but haven't found anything working for StyleGAN-ADA. I have a dataset at 32x48 and it feels like a huge waste to pad those images to 64x64. Also, have you used transfer learning for StyleGAN-ADA, and if you did, do you have some tips on it? Like which model is best to transfer-learn from? In the paper they said that similarity between datasets doesn't matter that much with transfer learning, but the diversity of the datasets does.

    • @HeatonResearch
      @HeatonResearch  3 years ago

      That is a much deeper dive, but something I am investigating. I would love to do a GAN at a normal video aspect ratio, such as HD or 4K. It would require rearchitecting the core StyleGAN neural network, and likely at least one of their CUDA functions.

  • @joanacastro751
    @joanacastro751 2 years ago

    I've done the initial training but I can't find the pkl files to put in next (they aren't in the experiments folder). Am I doing something wrong?

  • @New-j8i
    @New-j8i 1 year ago

    Sir, after training I got blurred images even though my image dataset is at 512 resolution. Please suggest why the generated images are blurred.

  • @keenhon7373
    @keenhon7373 2 years ago

    Hi, for the image size, can I have a width and height that are not the same? For example, 256 * 512.

  • @ntechnology7715
    @ntechnology7715 2 years ago

    So I followed this (I'm using the free version of Colab) and my session crashed because it ran out of GPU space. Any tips on how to prevent this?

  • @syeedakudhsiafatima1430
    @syeedakudhsiafatima1430 3 years ago

    I have run the notebook for days but it has generated no results, and moreover now it's showing some warnings and I am not able to continue training... Please, can anyone answer?

  • @gabrielamunguia4428
    @gabrielamunguia4428 3 years ago

    Thanks a lot for the video; it has cleared up a lot of doubts for me, especially being very new to this GAN world. I have been trying to perform my initial training with your notebook. The first steps run perfectly, and I also have another notebook to clean, verify, crop and slice my images, but I'm having problems with the initial training. It starts the first tick (tick 0 kimg 0.0 time 1m 32s sec/tick 17.2 sec/kimg 4308.67 maintenance 74.5 cpumem 4.74 gpumem 11.32 augment 0.000), but then when it starts to evaluate the metrics I get this message that I'm not really sure how to fix, or where I can change the number of workers:
    Evaluating metrics...
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
    cpuset_checked))
    /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
    return forward_call(*input, **kwargs)
    /usr/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 34 leaked semaphores to clean up at shutdown
    len(cache))
    I know it is very hard to review every doubt that users have, but if you could give me a pointer on where to start or what I can do in these cases, I'll thank you a lot. :)

    • @davescott6065
      @davescott6065 3 years ago

      Were you able to fix this? I found online people suggesting changing batch size will help, but that didn't solve it for me.

  • @preciousakpan674
    @preciousakpan674 3 years ago

    Please, how are predictions made after the training?

  • @hideoakb
    @hideoakb 3 years ago

    Dude, I need to create a transformer-neural-network-based chatbot for a project submission in 15 days. What should I do? Learn it, or just use something to make my project?🙏🙏🙏

  • @KlimovArtem1
    @KlimovArtem1 3 years ago

    Is it possible to train a GAN based on another already-trained one? Like, if I take the one for faces (there are a few publicly available, as far as I know), and then add more face photos, let's say, some fantasy creatures, would that work?

    • @HeatonResearch
      @HeatonResearch  3 years ago +1

      Yes, transfer learning often helps. This is also how you resume training. Use --resume
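
      A hedged sketch of what that might look like; stylegan2-ada-pytorch's train.py accepts either a path to a .pkl file or one of a few built-in aliases such as ffhq256 (check the --resume help text for the exact list), and the paths below are placeholders:

      # Start from a pretrained face network instead of random weights.
      !python train.py --gpus=1 \
        --data=/content/drive/MyDrive/data/gan/dataset \
        --outdir=/content/drive/MyDrive/data/gan/experiments \
        --resume=ffhq256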

    • @KlimovArtem1
      @KlimovArtem1 3 years ago

      @@HeatonResearch cool! But what if the new image set is very very different from the first one? Like, you trained it on fish, and then re-trained it on people faces. Will it make any sense, make it better or worse in any way?

  • @nguyenanhnguyen7658
    @nguyenanhnguyen7658 3 years ago

    It is $2,500 US/month for a full 1 x V100 on Google or AWS :) I have run for 4,000 ticks now and fid50k_full is still somewhere around 122... :( (custom dataset, 1M furniture images)

  • @muyyhhit838
    @muyyhhit838 3 years ago

    Jeff I've occasionally seen P100 in the free colab instances.

    • @HeatonResearch
      @HeatonResearch  3 years ago

      Interesting, thanks! I am sure they are all slowly marching up. Can't wait to spot an A100 in the wild on Pro; wonder how long that will take?

  • @ebinthomas9085
    @ebinthomas9085 2 years ago +1

    Generating gan link not working...

  • @rahimnealyakoob5968
    @rahimnealyakoob5968 3 years ago

    How do you generate the GAN videos? Do you have a notebook?

  • @ajayhiremath629
    @ajayhiremath629 2 months ago

    What about the PyTorch version?
    Is it already installed in Colab?

    • @HeatonResearch
      @HeatonResearch  2 months ago

      No, Colab does not have this installed by default. It does take some setup, and PyTorch does sometimes introduce breaking changes for Stylegan.

  • @dbboyes8286
    @dbboyes8286 2 years ago

    Thank you for making a great tutorial series on StyleGAN2 ADA.
    But I have a question about resuming training: I want to know how to make the resumed training start from the latest kimg. Can I set that with --kimg?
    For example:
    1. I have done my 10th snapshot and got network-xxxxx-0040.pkl
    2. I resume from network-xxxxx-0040.pkl
    3. But it starts training at 0 kimg
    So I want to know how to make it start at 40 kimg.
    Thanks to anyone who answers this. (Sorry for my bad English :( )

  • @scottrocha539
    @scottrocha539 2 years ago

    I keep resuming but it seems like it starts over but at the same time it looks slightly better. Is this normal?

  • @joshuawilson2294
    @joshuawilson2294 3 years ago

    Please help, I am using Colab but the model won't load.
    Training options:
    {
    "num_gpus": 1,
    "image_snapshot_ticks": 10,
    "network_snapshot_ticks": 10,
    "metrics": [
    "fid50k_full"
    ],
    "random_seed": 0,
    "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "/content/drive/MyDrive/data/gan/images/",
    "use_labels": false,
    "max_size": 1920,
    "xflip": true,
    "resolution": 256
    },
    "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 1,
    "prefetch_factor": 2
    },
    "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": {
    "num_layers": 2
    },
    "synthesis_kwargs": {
    "channel_base": 16384,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
    }
    },
    "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": {
    "mbstd_group_size": 4
    },
    "channel_base": 16384,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
    },
    "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [
    0,
    0.99
    ],
    "eps": 1e-08
    },
    "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.0025,
    "betas": [
    0,
    0.99
    ],
    "eps": 1e-08
    },
    "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 0.8192
    },
    "total_kimg": 25000,
    "batch_size": 16,
    "batch_gpu": 16,
    "ema_kimg": 5.0,
    "ema_rampup": 0.05,
    "ada_target": 0.6,
    "augment_kwargs": {
    "class_name": "training.augment.AugmentPipe",
    "xflip": 1,
    "rotate90": 1,
    "xint": 1,
    "scale": 1,
    "rotate": 1,
    "aniso": 1,
    "xfrac": 1,
    "brightness": 1,
    "contrast": 1,
    "lumaflip": 1,
    "hue": 1,
    "saturation": 1
    },
    "run_dir": "/content/drive/MyDrive/data/gan/experiments/00004--mirror-auto1-ada"
    }
    Output directory: /content/drive/MyDrive/data/gan/experiments/00004--mirror-auto1-ada
    Training data: /content/drive/MyDrive/data/gan/images/
    Training duration: 25000 kimg
    Number of GPUs: 1
    Number of images: 1920
    Image resolution: 256
    Conditional model: False
    Dataset x-flips: True
    Creating output directory...
    Launching processes...
    Loading training set...
    Num images: 3840
    Image shape: [3, 256, 256]
    Label shape: [0]
    Constructing networks...
    Setting up PyTorch plugin "bias_act_plugin"... Done.
    Setting up PyTorch plugin "upfirdn2d_plugin"... Done.
    Generator Parameters Buffers Output shape Datatype
    --- --- --- --- ---
    mapping.fc0 262656 - [16, 512] float32
    mapping.fc1 262656 - [16, 512] float32
    mapping - 512 [16, 14, 512] float32
    synthesis.b4.conv1 2622465 32 [16, 512, 4, 4] float32
    synthesis.b4.torgb 264195 - [16, 3, 4, 4] float32
    synthesis.b4:0 8192 16 [16, 512, 4, 4] float32
    synthesis.b4:1 - - [16, 512, 4, 4] float32
    synthesis.b8.conv0 2622465 80 [16, 512, 8, 8] float32
    synthesis.b8.conv1 2622465 80 [16, 512, 8, 8] float32
    synthesis.b8.torgb 264195 - [16, 3, 8, 8] float32
    synthesis.b8:0 - 16 [16, 512, 8, 8] float32
    synthesis.b8:1 - - [16, 512, 8, 8] float32
    synthesis.b16.conv0 2622465 272 [16, 512, 16, 16] float32
    synthesis.b16.conv1 2622465 272 [16, 512, 16, 16] float32
    synthesis.b16.torgb 264195 - [16, 3, 16, 16] float32
    synthesis.b16:0 - 16 [16, 512, 16, 16] float32
    synthesis.b16:1 - - [16, 512, 16, 16] float32
    synthesis.b32.conv0 2622465 1040 [16, 512, 32, 32] float16
    synthesis.b32.conv1 2622465 1040 [16, 512, 32, 32] float16
    synthesis.b32.torgb 264195 - [16, 3, 32, 32] float16
    synthesis.b32:0 - 16 [16, 512, 32, 32] float16
    synthesis.b32:1 - - [16, 512, 32, 32] float32
    synthesis.b64.conv0 1442561 4112 [16, 256, 64, 64] float16
    synthesis.b64.conv1 721409 4112 [16, 256, 64, 64] float16
    synthesis.b64.torgb 132099 - [16, 3, 64, 64] float16
    synthesis.b64:0 - 16 [16, 256, 64, 64] float16
    synthesis.b64:1 - - [16, 256, 64, 64] float32
    synthesis.b128.conv0 426369 16400 [16, 128, 128, 128] float16
    synthesis.b128.conv1 213249 16400 [16, 128, 128, 128] float16
    synthesis.b128.torgb 66051 - [16, 3, 128, 128] float16
    synthesis.b128:0 - 16 [16, 128, 128, 128] float16
    synthesis.b128:1 - - [16, 128, 128, 128] float32
    synthesis.b256.conv0 139457 65552 [16, 64, 256, 256] float16
    synthesis.b256.conv1 69761 65552 [16, 64, 256, 256] float16
    synthesis.b256.torgb 33027 - [16, 3, 256, 256] float16
    synthesis.b256:0 - 16 [16, 64, 256, 256] float16
    synthesis.b256:1 - - [16, 64, 256, 256] float32
    --- --- --- --- ---
    Total 23191522 175568 - -
    Discriminator Parameters Buffers Output shape Datatype
    --- --- --- --- ---
    b256.fromrgb 256 16 [16, 64, 256, 256] float16
    b256.skip 8192 16 [16, 128, 128, 128] float16
    b256.conv0 36928 16 [16, 64, 256, 256] float16
    b256.conv1 73856 16 [16, 128, 128, 128] float16
    b256 - 16 [16, 128, 128, 128] float16
    b128.skip 32768 16 [16, 256, 64, 64] float16
    b128.conv0 147584 16 [16, 128, 128, 128] float16
    b128.conv1 295168 16 [16, 256, 64, 64] float16
    b128 - 16 [16, 256, 64, 64] float16
    b64.skip 131072 16 [16, 512, 32, 32] float16
    b64.conv0 590080 16 [16, 256, 64, 64] float16
    b64.conv1 1180160 16 [16, 512, 32, 32] float16
    b64 - 16 [16, 512, 32, 32] float16
    b32.skip 262144 16 [16, 512, 16, 16] float16
    b32.conv0 2359808 16 [16, 512, 32, 32] float16
    b32.conv1 2359808 16 [16, 512, 16, 16] float16
    b32 - 16 [16, 512, 16, 16] float16
    b16.skip 262144 16 [16, 512, 8, 8] float32
    b16.conv0 2359808 16 [16, 512, 16, 16] float32
    b16.conv1 2359808 16 [16, 512, 8, 8] float32
    b16 - 16 [16, 512, 8, 8] float32
    b8.skip 262144 16 [16, 512, 4, 4] float32
    b8.conv0 2359808 16 [16, 512, 8, 8] float32
    b8.conv1 2359808 16 [16, 512, 4, 4] float32
    b8 - 16 [16, 512, 4, 4] float32
    b4.mbstd - - [16, 513, 4, 4] float32
    b4.conv 2364416 16 [16, 512, 4, 4] float32
    b4.fc 4194816 - [16, 512] float32
    b4.out 513 - [16, 1] float32
    --- --- --- --- ---
    Total 24001089 416 - -
    Setting up augmentation...
    Distributing across 1 GPUs...
    Setting up training phases...
    Exporting sample images...
    Initializing logs...
    2021-07-07 07:40:41.581569: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
    Training for 25000 kimg...
    tick 0 kimg 0.0 time 38s sec/tick 7.0 sec/kimg 435.39 maintenance 30.7 cpumem 4.21 gpumem 12.63 augment 0.000
    Evaluating metrics...
    /usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
    cpuset_checked))
    /usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py:1051: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /pytorch/c10/core/TensorImpl.h:1156.)
    return forward_call(*input, **kwargs)
    /usr/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 28 leaked semaphores to clean up at shutdown
    len(cache))

    • @vyang6002
      @vyang6002 3 years ago +1

      I had the same problem before; then I realized it is the PyTorch version difference. You have to use PyTorch 1.7.1.
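
      If it helps, the matching install for that release would be something along these lines (torchvision 0.8.2 is the companion release for torch 1.7.1):

      !pip install torch==1.7.1 torchvision==0.8.2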

  • @JavierPortillo1
    @JavierPortillo1 2 years ago

    Hi, I think it would be possible to get the server time and save a checkpoint one hour before shutdown.

    • @HeatonResearch
      @HeatonResearch  2 years ago

      That would require a change to StyleGAN, since it checkpoints based on number of steps, rather than anything related to time. But agree that would be optimal.

  • @DEFINITE444
    @DEFINITE444 3 years ago

    Sir, great tutorial, but I'm having an issue with an image, with the error:
    Inconsistant color format: /content/drive/MyDrive/data/gan/images/test/1.png

  • @deewakarchakraborty4027
    @deewakarchakraborty4027 3 years ago +2

    Sir, I am trying to train a StyleGAN2 but it stops after only 1 tick.

    • @abcdefg-zl7ew
      @abcdefg-zl7ew 3 years ago

      I have the same problem! Please let me know if you find a solution :)

    • @deewakarchakraborty4027
      @deewakarchakraborty4027 3 years ago +1

      @@abcdefg-zl7ew yes I solved the problem

    • @deewakarchakraborty4027
      @deewakarchakraborty4027 3 years ago +1

      @@abcdefg-zl7ew what is the size of your dataset

    • @abcdefg-zl7ew
      @abcdefg-zl7ew 3 years ago

      @@deewakarchakraborty4027 around 4000 pictures

    • @deewakarchakraborty4027
      @deewakarchakraborty4027 3 years ago

      @@abcdefg-zl7ew Try modifying the batch size; whatever batch size you have, it is unable to load it into memory and thus the system crashes.

  • @pedrom9414
    @pedrom9414 1 year ago +1

    How can I add fine-tuning to the fake images?

  • @khaledkaddal
    @khaledkaddal 3 years ago

    4k 4k 4k 4k, pleeeaaaase !

  • @jacksmith1098
    @jacksmith1098 3 years ago

    Must the images be JPEGs, or could they be PNGs?

    • @HeatonResearch
      @HeatonResearch  3 years ago

      Either works fine for input, but the stylegan image converter will convert them to PNG.

  • @petern1575
    @petern1575 3 years ago

    When do I stop training?

  • @loveyoutenderly9768
    @loveyoutenderly9768 3 years ago

    I have a question. Does it have to be a square? Thank you.

    • @HeatonResearch
      @HeatonResearch  3 years ago +1

      StyleGAN2 ADA is built to process square images. GANs in general do not have such a requirement. It would take considerable modification to StyleGAN2 to process non-square images, and considerable inefficiencies would be introduced if the images were non-square. Additionally, the way StyleGAN2 is built, the dimensions MUST be a power of two. So you jump right from 1024 to 2048, to 4096, to 8192, etc.
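
      A quick way to sanity-check an image before running the dataset tool (a hypothetical helper, not part of StyleGAN2; the file name is a placeholder):

      from PIL import Image

      w, h = Image.open("example.png").size
      is_pow2 = w > 0 and (w & (w - 1)) == 0   # power-of-two test via bit trick
      assert w == h and is_pow2, f"{w}x{h} must be a square power of two"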

    • @loveyoutenderly9768
      @loveyoutenderly9768 3 years ago

      @@HeatonResearch great thanks!!!

  • @marcoaccardi
    @marcoaccardi 3 years ago

    Looking for a solution to get a 4K GAN

  • @charlesbaoumar
    @charlesbaoumar 2 years ago

    Not sure what to do

  • @psychologienerd7546
    @psychologienerd7546 1 year ago

    Does not work on Colab anymore...

  • @TeaMHackeRPiratE
    @TeaMHackeRPiratE 3 years ago

    What is the Colab Pro disk space size, please?

  • @cuentadefernandoleyra6088
    @cuentadefernandoleyra6088 2 years ago

    There is a torch incompatibility.

  • @KlimovArtem1
    @KlimovArtem1 3 years ago

    I wonder if they allow training it on NSFW images 😈

    • @HeatonResearch
      @HeatonResearch  3 years ago +1

      lol, yes that has been done. Just google GAN p*rn.

  • @veazix
    @veazix 2 years ago +1

    Seems there's a new issue. I was able to recreate it on a fresh google account, notebook, images and the issue persists.
    File "/usr/local/lib/python3.7/dist-packages/tensorboard/compat/__init__.py", line 42, in tf
    from tensorboard.compat import notf # noqa: F401
    ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.7/dist-packages/tensorboard/compat/__init__.py)
    During handling of the above exception, another exception occurred:
    File "/usr/local/lib/python3.7/dist-packages/jax/_src/pretty_printer.py", line 54, in _can_use_color
    return sys.stdout.isatty()
    AttributeError: 'Logger' object has no attribute 'isatty'
    I'm too noob to figure out what's wrong but uninstalling tensorboard seems to allow training to continue. (Skipping tfevents export: No module named 'tensorboard')
    Funny story is I paid for colab pro+ while I was training with the free GPU and only got the error when I tried training with my pro+ plan.

    • @shashankpalle5409
      @shashankpalle5409 2 years ago

      I am also facing the same issue. Did you resolve it?

    • @veazix
      @veazix 2 years ago

      ​@@shashankpalle5409 In the install StyleGAN2 tab add a # before it installs tensorboard
      #!pip install tensorboard==1.14.0
      I'm not an expert so I don't know how to fix it properly.

    • @veazix
      @veazix 2 years ago +1

      @@shashankpalle5409 I did some more testing and it appears my previous suggestion does not work and you have to manually uninstall tensorboard.
      !pip uninstall tensorboard.
      Or, you can change the version to something that stylegan will ignore like tensorboard==1.14.1

    • @oz1178
      @oz1178 2 years ago +1

      ​@@veazix Thank you so much!!! I have spent way too long trying all kinds of things to get my copy of Jeff's notebook to work - and your solution to uninstall tensorboard solved the problem :) [It's weird though, because up until about 5 days ago, I had run the notebook at least 20 times with no problems. Then something changed, which your solution fixed] Thank, again!

    • @shashankpalle5409
      @shashankpalle5409 2 years ago +2

      Thanks a lot @Devdabomber it solved the issue.

  • @thegoldenboy4681
    @thegoldenboy4681 2 years ago

    I got this huge error list when trying to train my model for the first time.
    ImportError: cannot import name 'notf' from 'tensorboard.compat' (/usr/local/lib/python3.7/dist-packages/tensorboard/compat/__init__.py)
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
    File "/content/stylegan2-ada-pytorch/train.py", line 538, in
    main() # pylint: disable=no-value-for-parameter
    File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
    File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
    File "/usr/local/lib/python3.7/dist-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
    File "/content/stylegan2-ada-pytorch/train.py", line 531, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
    File "/content/stylegan2-ada-pytorch/train.py", line 383, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
    File "/content/stylegan2-ada-pytorch/training/training_loop.py", line 240, in training_loop
    stats_tfevents = tensorboard.SummaryWriter(run_dir)
    File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 220, in __init__
    self._get_file_writer()
    File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 251, in _get_file_writer
    self.flush_secs, self.filename_suffix)
    File "/usr/local/lib/python3.7/dist-packages/torch/utils/tensorboard/writer.py", line 61, in __init__
    log_dir, max_queue, flush_secs, filename_suffix)
    File "/usr/local/lib/python3.7/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 72, in __init__
    tf.io.gfile.makedirs(logdir)
    File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
    File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
    File "/usr/local/lib/python3.7/dist-packages/tensorboard/lazy.py", line 50, in load_once
    module = load_fn()
    File "/usr/local/lib/python3.7/dist-packages/tensorboard/compat/__init__.py", line 45, in tf
    import tensorflow
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/__init__.py", line 51, in
    from ._api.v2 import compat
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/__init__.py", line 37, in
    from . import v1
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/__init__.py", line 30, in
    from . import compat
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/compat/__init__.py", line 37, in
    from . import v1
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/compat/v1/__init__.py", line 47, in
    from tensorflow._api.v2.compat.v1 import lite
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/__init__.py", line 9, in
    from . import experimental
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/experimental/__init__.py", line 8, in
    from . import authoring
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/_api/v2/compat/v1/lite/experimental/authoring/__init__.py", line 8, in
    from tensorflow.lite.python.authoring.authoring import compatible
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/authoring/authoring.py", line 43, in
    from tensorflow.lite.python import convert
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/convert.py", line 29, in
    from tensorflow.lite.python import util
    File "/usr/local/lib/python3.7/dist-packages/tensorflow/lite/python/util.py", line 51, in
    from jax import xla_computation as _xla_computation
    File "/usr/local/lib/python3.7/dist-packages/jax/__init__.py", line 59, in
    from .core import eval_context as ensure_compile_time_eval
    File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 47, in
    import jax._src.pretty_printer as pp
    File "/usr/local/lib/python3.7/dist-packages/jax/_src/pretty_printer.py", line 56, in
    CAN_USE_COLOR = _can_use_color()
    File "/usr/local/lib/python3.7/dist-packages/jax/_src/pretty_printer.py", line 54, in _can_use_color
    return sys.stdout.isatty()
    AttributeError: 'Logger' object has no attribute 'isatty'

    • @thegoldenboy4681
      @thegoldenboy4681 2 years ago

      Also, I can't seem to find any pkl files; I don't think the program created them for some reason.

    • @cheaddaca3532
      @cheaddaca3532 2 years ago +1

      Same error here.
      I fixed it by running the following:
      !pip uninstall jax jaxlib
      !pip install jax[cpu]==0.3.10

    • @thegoldenboy4681
      @thegoldenboy4681 2 years ago

      @@cheaddaca3532 I see, when and where should I add the cell to run this?

    • @cheaddaca3532
      @cheaddaca3532 2 years ago +1

      @@thegoldenboy4681 before you run the initial training

    • @thegoldenboy4681
      @thegoldenboy4681 2 years ago

      @@cheaddaca3532 Thank you