I strongly encourage everyone to generate their own data, but I've also posted my dataset to HuggingFace here - huggingface.co/datasets/robertcowher/farama-kitchen-sac-hrl-youtube/tree/main
Top videos! Did I understand correctly? You recorded 30,000 interactions using the joystick?
Great question, and yes, but there's some nuance. Every timestep counts as one memory, so continuous joystick actions generate a lot of them very quickly. I averaged right around 200 timesteps per task once I got good at it, so filling a 30K memory buffer takes about 150 successful completions of that task. I experimented with human memory buffers from 20K to 60K for various tasks and found 30K to be a good minimum buffer size to succeed at all tasks. To that end, the can_sample method we've coded here looks for batch_size (64) * 500, or 32,000 memories, which works out to about 160 completions at that pace. You could lower that multiplier and experiment with less data.
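For anyone who wants to play with that threshold, here is a minimal sketch of the check and the arithmetic behind it. It is not the code from the series: the class name `HumanMemoryCounter`, the `warmup_batches` parameter, and the `episodes_needed` helper are all hypothetical names made up for illustration, and the buffer here only counts timesteps instead of storing real transitions.

```python
import math


class HumanMemoryCounter:
    """Hypothetical stand-in for a replay buffer; it only counts stored timesteps."""

    def __init__(self, warmup_batches=500):
        # Assumed name for the multiplier applied to batch_size.
        self.warmup_batches = warmup_batches
        self.mem_ctr = 0

    def store_transition(self):
        # A real buffer would store state, action, reward, next state, done here.
        self.mem_ctr += 1

    def can_sample(self, batch_size=64):
        # Sampling is allowed once batch_size * warmup_batches memories exist.
        return self.mem_ctr >= batch_size * self.warmup_batches


def episodes_needed(threshold, avg_steps_per_episode=200):
    """Rough number of task completions needed to reach the threshold."""
    return math.ceil(threshold / avg_steps_per_episode)


if __name__ == "__main__":
    buffer = HumanMemoryCounter()
    for _ in range(64 * 500):
        buffer.store_transition()
    print(buffer.can_sample())           # True once 32,000 timesteps are stored
    print(episodes_needed(64 * 500))     # 160 completions at ~200 steps each
    print(episodes_needed(64 * 300))     # 96 completions with a multiplier of 300
```

Dropping the multiplier from 500 to 300, for example, lowers the threshold to 19,200 memories, which is just under the 20K low end of the range I experimented with.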
I was going to do this at the end of the series, but I went ahead and pinned a comment with my dataset. I still recommend doing some of your own data generation to get the full process, but it's there to save you some time.