Using GNU parallel painlessly -- from basics to bioinformatics job orchestration
- Published 27 Sep 2022
- Basics of GNU parallel, how to use it to run scripts and functions, all the way to using parallel to orchestrate launching jobs with different parameters like we often do in bioinformatics.
You can find all the code from this video here:
omgenomics.com/parallel - Science and Technology
Looks pretty promising, it seems GNU parallel is able to easily call a single command over a list of arguments, with the added bonus of parallelization. I need to try that command out tomorrow. Thank you for the video!
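A minimal sketch of that idea, assuming three made-up sample names and `echo` as a stand-in for the real command. The loop handles one argument at a time; the GNU parallel line runs the same command once per argument, concurrently. The guard skips the parallel call when the tool is not installed.

```shell
# Sequential baseline: one argument at a time.
# sample1..sample3 are hypothetical names used only for illustration.
count=0
for s in sample1 sample2 sample3; do
    echo "processing $s"
    count=$((count + 1))
done

# GNU parallel version: {} is replaced by each argument after :::,
# and the jobs run concurrently (by default, one job per CPU core).
if command -v parallel >/dev/null 2>&1; then
    parallel echo "processing {}" ::: sample1 sample2 sample3
fi
```

Output lines from the parallel run may appear in any order, since jobs finish independently.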
Very good instruction. As a beginner in the field of bioinformatics, this helps me a great deal! Thank you so much!
Thanks for this great video! 👌
If I may add several other Parallel features I find very useful :
- instead of '--verbose', I use '--dry-run' to see what I run
- option '--keep-order' can be useful sometimes, to tell parallel to keep output order the same as input (but arguments are not actually processed in order, it affects only output order)
- all its built-in replacement strings like '{.}' (path+filename without extension) ; '{/}' (equivalent to 'basename /path/to/file.ext') ; '{//}' (equivalent to dirname) ; '{/.}' (equivalent to 'basename /path/to/file.ext .ext')
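The features in this list can be tried out with something like the sketch below. The path `/path/to/file.ext` is the hypothetical one from the comment. The plain-shell lines compute what each replacement string should expand to, and the guarded section asks GNU parallel to print (not run) the command it would build.

```shell
# Hypothetical input path, for illustration only.
f=/path/to/file.ext

# Plain-shell equivalents of the replacement strings.
no_ext="${f%.*}"                      # {.}  -> /path/to/file
base="$(basename "$f")"               # {/}  -> file.ext
dir="$(dirname "$f")"                 # {//} -> /path/to
base_no_ext="$(basename "$f" .ext)"   # {/.} -> file
echo "$no_ext $base $dir $base_no_ext"

if command -v parallel >/dev/null 2>&1; then
    # --dry-run prints each command instead of executing it.
    parallel --dry-run echo {.} {/} {//} {/.} ::: "$f"

    # --keep-order prints results in input order (3, 1, 2) even though
    # the job for 1 finishes first.
    parallel --keep-order 'sleep 0.{}; echo {}' ::: 3 1 2
fi
```

Running with `--dry-run` first is a cheap way to confirm the substitutions before launching real jobs.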
Thank you, my colleagues have been telling me to use parallel for months now, and I'm finally able to start scripting with it.
To test for the correct version of GNU Parallel in a script, use `parallel --minversion 20220622 && echo GNU Parallel version 20220622 or newer is installed`. This is a supported use, which is guaranteed to work in the future. The `grep` trick @01:10 is not guaranteed to work in the future.
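Inside a script, that check might look like the sketch below. The `status` variable and the messages are illustrative choices, not part of GNU Parallel. The extra `command -v` test is an added assumption so that a missing binary is reported the same way as an old one.

```shell
# Minimum version taken from the comment above; adjust as needed.
required=20220622

# --minversion exits 0 when the installed GNU Parallel is at least
# that version, and non-zero otherwise.
if command -v parallel >/dev/null 2>&1 &&
   parallel --minversion "$required" >/dev/null 2>&1; then
    status="ok"
    echo "GNU Parallel $required or newer is installed"
else
    status="missing-or-too-old"
    echo "GNU Parallel $required or newer is required" >&2
fi
```

A real script would probably `exit 1` in the `else` branch rather than continue.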
Really good video, thanks!
Love You Dr. Nattestad🍀
Thanks! That was really helpful.
Great video!
Very useful video!
thanks!!
Do an advanced video where you take the most cryptic GNU Parallel command you have ever used in real life, and decipher what it does.
Do the same for the most 'clever' use. Maybe you replaced a 3 pages script with a single line? Maybe GNU Parallel made it possible to do something that was infeasible to do otherwise?
I used it to create 'dark mode' PDFs with a random Java program I found online by splitting books or long papers into individual pages and then parallelizing the dark page creation. I am not a Java programmer, and I was able to parallelize a serial program without rewriting it. (Now I just use Dark Reader in the browser.) Other things I would use GNU parallel for are cropping/resizing/recompressing images and normalizing/converting audio files with sox or FFmpeg.
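As a rough sketch of the audio-conversion case, assuming two made-up file names and a plain `ffmpeg -i input output` conversion. This is not the commenter's actual command. `--dry-run` is used so nothing is converted and neither the files nor ffmpeg need to exist.

```shell
# Hypothetical input file; nothing is read or written by this sketch.
first=talk1.wav

# What {.}.mp3 expands to for that file, computed in plain shell.
first_out="${first%.*}.mp3"
echo "$first -> $first_out"

if command -v parallel >/dev/null 2>&1; then
    # One ffmpeg job per input file; {.} is the input name without
    # its extension. Drop --dry-run to actually run the conversions.
    parallel --dry-run ffmpeg -i {} {.}.mp3 ::: talk1.wav talk2.wav
fi
```

The same pattern applies to any serial per-file tool: the tool stays unchanged and parallel supplies one file per job.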
How good at genetics do I need to be if I want to study bioinformatics at master's level? I am currently studying biomedical sciences at undergraduate level and am looking at what to do next. I have struggled with genetics my whole degree. Bioinformatics interests me a lot, even though I'm terrible at genetics, mainly because of the computing and statistical/analytical aspect of it. Any advice would be greatly appreciated!
I am in a graduate program in bioinformatics now, coming from a Biotechnology undergraduate background. We have more students from a computer science background than from biology. I have noticed that even if you are not good at genetics, you can still lean towards non-genetic work such as machine learning, big data analysis (data science), and disease modelling, among others. However, you also seem worried about the statistical aspect of it, which makes it trickier. But you will pick up the basics very fast, and you may find it easy in the end.
If you prefer reading: `man parallel_tutorial`