@@DannyArends Thanks a lot. I am not a pro in Linux, but I do understand what you wrote. I tried the normal steps shown in your YouTube videos (sessions #1 & #2), and when I do ls (from within the bin folder), it shows everything in green except fasterq-dump and tabix, which appear in red. When I browse to the folder containing tabix and fasterq-dump, they only work when I type "./tabix". This seems weird. The files are there, but the ln command within bin is not recognizing them (I am on Ubuntu 22.10). So I searched for a solution and that is what I found. I am very sorry if my answer is irrelevant or has nothing to do with your kind answer. My conclusion is to prefer Debian and follow your steps exactly, and that Ubuntu may not be good for some bioinformatics tools. Thanks again. Mohamed
Generally, links being shown in red means the target of the link doesn't exist. You can check this by doing an ls command with -lathr or something similar, which shows the target location for each link. Make sure the link points to the executable. Delete the red links that point to a non-existing path, then link again. If the ln command gives an error or doesn't create a link, 99% of the time it's a typo in the from path.
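The check described above can be sketched as a small shell helper. This is only an illustration: list_broken_links is a hypothetical function name, and the paths in the usage comments are placeholders rather than the ones from the video.

```shell
# List symbolic links in a directory whose target does not exist
# (these are the ones a coloured ls typically shows in red).
# Usage: list_broken_links DIR
list_broken_links() {
  local dir="$1" link
  for link in "$dir"/*; do
    # -L: the entry is a symlink; ! -e: following it leads nowhere
    if [ -L "$link" ] && [ ! -e "$link" ]; then
      printf '%s -> %s\n' "$link" "$(readlink "$link")"
    fi
  done
}

# Example usage (placeholder paths, adjust to your own setup):
# list_broken_links "$HOME/bin"
# rm "$HOME/bin/tabix"                          # remove the broken link
# ln -s /full/path/to/tabix "$HOME/bin/tabix"   # link again: target first, link name second
```

Using the tab key while typing the target path makes it much harder to end up with a broken link in the first place.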
@@DannyArends Hi Danny, I followed your steps and it worked. I have no explanation. I first removed the links that I made using -sf, then added them again like you did in the video, and it works (really, very strange; I tried this before on two computers and on both the links to tabix and fasterq-dump did not work). Anyway, thank you very much. Mohamed
The real error should be mentioned before, the "build failed" is not a real error it just lets you know it couldn't create the jar file. I can help you with this, but I would need to see the full build command you used, as well as all output. Please drop it by email (my email is listed in the about section of my channel)
Hello, the video was extremely helpful and easy to follow. I installed everything and at the end, once I open a new terminal to check samtools or STAR, it tells me "bash: samtools: command not found". What's the problem? Also, I initially took the Debian ISO and not the DVD file that you used.
Hi, figured it out. /home/Rahul/software/ is the right one. I copy pasted directly which has danny in it. All are working now except STAR which has a red symbol. Any leads are helpful
Did you update the .bashrc file to add the ~/bin folder to your $PATH? See: gist.github.com/DannyArends/04d87f5590090dfe0dc6b42e5e1bbe15 (0_installSoftware.sh), lines 83 to 97, where we make symbolic links in ~/bin and then use nano to update the .bashrc file.
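For reference, a minimal sketch of what adding ~/bin to $PATH can look like without opening an editor. The exact line used in the gist is not reproduced here, so the export line below is an assumption, and add_bin_to_path is a hypothetical helper name.

```shell
# Append a PATH line for ~/bin to a shell startup file, unless it is already there.
# Usage: add_bin_to_path FILE
add_bin_to_path() {
  local rcfile="$1"
  # Single quotes keep $HOME and $PATH unexpanded, so they are resolved at login time
  local line='export PATH="$HOME/bin:$PATH"'
  touch "$rcfile"
  # -F: fixed string, -x: match the whole line, -q: quiet
  if ! grep -Fxq "$line" "$rcfile"; then
    printf '%s\n' "$line" >> "$rcfile"
  fi
}

# Example usage:
# add_bin_to_path "$HOME/.bashrc"
# . "$HOME/.bashrc"           # reload, or simply open a new terminal
# command -v samtools STAR    # prints the resolved path of each tool that is found
```

Running the helper twice does not duplicate the line, which keeps the .bashrc tidy when you redo the setup.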
A red symbol? That probably means the link isn't pointing to the correct location. Remove the link and add it again, using the tab key to auto complete paths will prevent some failures like typos and capitalization issues.
@@DannyArends Thank you so much for the fast response. I did update .bashrc file initially, but after I updated my name and added all 5 files again, I didn't do it
In Ubuntu you need to run vdb-config --interactive from the bin folder at the root of the extracted archive. That is inside your sratoolkit folder if you created one with mkdir, otherwise it will be in the root of your /software folder. (Maybe it's just my machine, but it is the most annoying program ever.)
@@DannyArends No worries it's very similar. Had to interrupt myself because it was a very long install and my day started, tomorrow I'll resume and try part 2. Thanks for the great work!
Hi Danny, thanks for sharing this video! I'm a beginner in this field and am following your tutorial step-by-step. However, I'm stuck at the STAR software at the moment. I can't seem to compile the software. Error is as below:
'rm' -f STAR.o Parameters.o
g++ -c -O3 -std=c++11 -fopenmp -D'COMPILATION_TIME_PLACE="2024-03-14T10:26:24+08:00 :/home/farr/software/STAR/source"' -D'GIT_BRANCH_COMMIT_DIFF="On branch master ; commit b1edc1208d91a53bf40ebae8669f71d50b994851 ; diff files: "' -pipe -Wall -Wextra STAR.cpp
STAR.cpp: In function ‘void usage(int)’:
STAR.cpp:52:45: error: ‘parametersDefault’ was not declared in this scope
52 | cout.write(reinterpret_cast(parametersDefault),
| ^~~~~~~~~~~~~~~~~
STAR.cpp:53:20: error: ‘parametersDefault_len’ was not declared in this scope
53 | parametersDefault_len);
| ^~~~~~~~~~~~~~~~~~~~~
make: *** [Makefile:100: STAR.o] Error 1
How do I solve this error?
Seems like the master branch is currently "broken", the quickest solution is to just download the binary distribution of the release page. The latest compiled version for linux is: github.com/alexdobin/STAR/releases/download/2.7.10a_alpha_220818/STAR_2.7.10a_alpha_220818_Linux_x86_64_static.zip Just unzip it and put the STAR binary file in your ~/bin folder
Hi @DannyArends, thanks for sharing the detailed video. I set up my own Linux for RNA-seq by following your instructions. However, I was wondering if there is any reason why we create the primary_assembly using R?
The answer is that the Ensembl FTP server doesn't provide a primary assembly for Saccharomyces cerevisiae to download, while it does for e.g. mouse/human and other commonly used model organisms. For Saccharomyces only the top-level genome build is provided. Top-level builds include all chromosomes (aka the primary assembly), but also regions not assembled into chromosomes (contigs) and N-padded haplotype/patch regions. According to the Ensembl documentation, when no primary assembly is provided it's because the top-level one is complete, so in this case we could have used the top-level one (since it'll be identical to the primary assembly). For most genomes (e.g. mouse) there will be a difference, and for alignment you're going to use the primary assembly in 99% of the cases. If you use the top level for alignment, you're going to have to deal with these additional regions later on in the analysis, which creates additional complexity in the pipeline, and 99% of people ignore these regions anyway. I just added the step of building it, since it's not difficult and I think it shows how you can use any genome/reference in FASTA format to align against. (For more info see: ftp.ensembl.org/pub/release-108/fasta/saccharomyces_cerevisiae/dna/README)
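To make the idea concrete: building a primary assembly boils down to keeping only the chromosome records of the top-level FASTA file. The video does this in R; the sketch below uses awk instead, purely as an illustration. keep_sequences is a hypothetical helper, and the sequence names and file names in the usage comment are placeholders that you would replace with the chromosome names found in your own FASTA headers.

```shell
# Keep only the FASTA records whose sequence name (first word after '>')
# appears in a space-separated list. Reads stdin, writes stdout.
# Usage: keep_sequences "NAME1 NAME2 ..." < toplevel.fa > primary_assembly.fa
keep_sequences() {
  awk -v names="$1" '
    BEGIN { n = split(names, a, " "); for (i = 1; i <= n; i++) keep[a[i]] = 1 }
    /^>/  { name = substr($1, 2); printing = (name in keep) }  # header line: decide for this record
    printing { print }                                         # header and sequence lines of kept records
  '
}

# Example usage (placeholder names):
# grep '^>' toplevel.fa                     # inspect which sequence names are present
# keep_sequences "I II III" < toplevel.fa > primary_assembly.fa
```

Any FASTA file produced this way can then be used as the reference to align against.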
If you're using the binary, you can't have this compilation error, since you can skip the compilation (no need to build the binary, since you downloaded it). Just download the binary, put it in ~/bin and then run STAR from the command line. You can skip the make commands to build STAR.
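A rough sketch of the binary route, for those who prefer to copy and paste. The download link is the one from the earlier comment; the folder layout inside the zip is not something I want to promise, so the hypothetical helper install_star_from simply searches the unzipped folder for a file named STAR. The folder name star_binary is a placeholder.

```shell
# Copy the STAR executable from an already unzipped release folder into a bin directory.
# Usage: install_star_from UNZIPPED_DIR BIN_DIR
install_star_from() {
  local src="$1" bindir="$2" star
  # The layout inside the archive is not assumed: take the first file named STAR
  star=$(find "$src" -type f -name STAR | head -n 1)
  if [ -z "$star" ]; then
    echo "no STAR binary found under $src" >&2
    return 1
  fi
  mkdir -p "$bindir"
  cp "$star" "$bindir/STAR"
  chmod +x "$bindir/STAR"
}

# Example usage:
# wget https://github.com/alexdobin/STAR/releases/download/2.7.10a_alpha_220818/STAR_2.7.10a_alpha_220818_Linux_x86_64_static.zip
# unzip STAR_2.7.10a_alpha_220818_Linux_x86_64_static.zip -d star_binary
# install_star_from star_binary "$HOME/bin"
# STAR --version
```

No make step is involved, so the compilation error cannot occur on this route.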
UPDATE APRIL 2024
Thanks for the engagement, comments and feedback. Due to updates to the STAR and PICARD tools, two additional steps (git checkout) are required to get the versions used in the video. I have updated the "0_installSoftware" script to make sure the correct versions are used. Please let me know if you get stuck on any additional steps.
From the bottom of my heart, after 30 years of informatics, the sratoolkit cache setup was the most excruciating thing I have ever done. I urgently need a drink now.
Thanks Danny! Learning at 40 was never easier thanks to your videos!
Wow, thanks so much. Good to hear you found the series useful and informative.
AWESOME video... Thank you so much for taking the time to demonstrate all of this from scratch -- that was **a lot** of work, and it was invaluable to see this process performed in real time. I will be watching Part 2 tomorrow.
Thanks for leaving a comment. It is indeed a lot of effort, all in all it turned out to be ~8 hours of streaming across 3 sessions. However, if you're doing something you love it fortunately doesn't feel like work.
Thank you Danny for going in-depth with the RNA-sequence tutorial - so detailed and easy to replicate for a beginner. This is super helpful.
Glad it was helpful! Thanks for leaving a comment.
Updated the description with the link to the code on GitHub, and presentation in PDF on OneDrive:
Code on Github: gist.github.com/DannyArends/04d87f5590090dfe0dc6b42e5e1bbe15
Presentation on OneDrive: 1drv.ms/b/s!AtYWSYRMmSHZh4gFsR1904Y-Cce04Q?e=Q8dtRl
Thank you for this great video. Newcastle is my hometown, hope you are enjoying it there!
I'm still discovering new things every day, but from what I've seen I think I'm going to enjoy living here.
Great Video. Thank you. I really appreciate this walk through.
Glad it was helpful!
Thank you from the bottom of my heart 🙂
You're welcome, glad that you're enjoying the content!
Thank you Danny for your very excellent guide. Looking forward to your new lectures =))
Thanks for leaving a comment :)
Thank you very much Dr. Arends for the tutorials. Just wanted to add something for Ubuntu users, because the folders that created in Ubuntu are kind of different from centos linux.
In "Ubuntu", the PATH for "./vdb-config --interactive" or "./fasterq-dump" will be similar to this "/software/sratoolkit/sratoolkit.3.1.1-ubuntu64/bin".
Thanks for the info, every Linux flavor is slightly different indeed.
@@DannyArends True, Dr. Arends. Thank you for providing the chance for learning and sharing our experience.
Thank you for uploading this! Your tutorials are really helpful :)
Thanks, happy you liked it. In the next one we'll start aligning some sequences, I thought it would be good to show the whole process.
Great course sir, thank you.
Glad you like it, thanks for leaving a comment !
I appreciate your work. I am new to RNA seq and I am finding it very interesting
Thank you so much... +Sub
Awesome, thank you! Good to hear you're enjoying the lectures.
Excellent, excellent, excellent. Thank you, a zillion times. As always, a very instructive and educational style for beginners (I am a biologist who loves programming and coding). Looking forward to tomorrow's session, IN SHAA ALLAH. I have so many (naïve) questions. The first one: is this session (and the coming ones) for everyone? (Can I replicate it for my students? Also, I wish one day I will be able to publish a paper on RNA-seq.) I ask this because I am under the impression that you direct these videos at your own students. Pardon me for my ignorance and good luck. Mohamed
Edit:
…….. Can I replicate it for my students….. Of course, with all the credits to you and your channel and links.
I put the lectures online so everyone can learn from them. I think education should be broadly available to everyone. For this lecture I just start from the very basics, setting up the tools needed for RNAseq. Tomorrow we'll have part 2 where we'll start building a pipeline for RNAseq read alignment.
Of course, feel free to use and resample; credits would be highly appreciated.
@@DannyArends Thank you very much for your kind response and reply. What makes you stand out from others is that you explain the command lines. I have watched a lot (not every YouTube video, but many) and you are among the very few who explain the meaning of each command. I do not ask for so many details, because that would be impossible for a public video, but a balance between the two is preferred. In addition, people from a biology background are mostly lost in the Linux environment, with so many errors happening (apart from typos).
Thanks, I try to be as thorough and complete as possible. It's why I avoid blindly using packages like dplyr and such, and tend to focus on teaching people to use for and while loops in R. When someone understands the basics on a fundamental level, more advanced manipulation statements come easier. The same holds for the command line.
Thank You So Much, Sir
I would like to appreciate this initiative of yours and obviously, it's great, btw can you please specify the configuration of the desktop or laptop in terms of RAM or processor as minimum requirement in order to perform the rna seq analysis all standalone. Again thnx in advance.
This depends on what you are sequencing (mostly the size of the genome). For bacteria an i5 with 4gb RAM would be enough. If you're doing humans, an i7 with 32gb RAM will be needed to do a handful of samples in a reasonable amount of time. For 100s of samples an HPC cluster is needed so you can distribute jobs to many machines.
@@DannyArends Heartiest thnx for your informative reply. I would like to work with the rna seq analysis of various plant species genome like soyabean, common bean, jute specially, so, Is the configuration of i5 with 16GB RAM considered good for performing rna seq analysis in these crops genome standalone?
Should work, but it'll take some time to run the analysis since you'll probably only be able to do one sample at a time.
@@DannyArends Thnx from the core of my heart for your enlightening reply.
Thanks! It is a very insightful video. How can I follow your streams live at the time you make a video? I am an MSc in Bioinformatics and interested in following along online.
If you are subscribed to the channel, you'll be informed about upcoming live streams. Generally I post the stream announcement ~ 1 week before the actual stream takes place, so people can plan to attend.
Would you please explain the following to me: when you start installing gatk (at 1:29 hr), you said you prefer to compile it yourself, but due to its size and the time it takes you would download and extract it instead. So, what is the difference between the two methods? Thanks in advance. Mohamed
When compiling it from source, you can more easily update it: just a simple git update (git pull) followed by a gradle command. The added bonus is 1) you don't have to check the website to see if there is an update and 2) you have access to the source code when an error occurs, which helps because the documentation online can lag behind.
@@DannyArends Thank you very much. Mohamed
sorry for the disturbance, the link that you have provided for debian is 12.6.0, but what you have used in the video is 11.5.0, can you please provide the link for 11.5.0?
No bother. Yeah, it seems a newer version was released. You can always get the older versions from the archives; a direct link to the 11.5.0 netinst image: cdimage.debian.org/mirror/cdimage/archive/11.5.0/amd64/iso-cd/debian-11.5.0-amd64-netinst.iso
@@DannyArends thanks a ton
While testing the file I am running into trouble: "Error: Invalid or corrupt jarfile gatk-4.4.0.0/gatk".
How do I resolve this?
What is your full command? It seems you're calling java on the folder, not the .jar file. If you are, and the error persists, redownload the gatk and extract it, a corruption can occur during download sometimes.
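One way to see what java should be pointed at is to list the .jar files that the extracted folder really contains. A minimal sketch: list_jars is a hypothetical helper, and gatk-4.4.0.0 is simply the folder name from the error message above. As far as I know, the file called gatk in that folder is a launcher script rather than a jar, which would also explain the message.

```shell
# Print every .jar file found under a folder, one per line.
# Usage: list_jars DIR
list_jars() {
  find "$1" -type f -name '*.jar' | sort
}

# Example usage:
# list_jars gatk-4.4.0.0
# java -jar <one of the paths printed above> --help
```

If the list comes back empty, the download or extraction went wrong and redownloading is the next step.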
Hi Danny, i have 16gb RAM memory in my laptop, will i be able to do RNA seq?
For smaller data sets and genomes, 16 GB will be enough (e.g. yeast, bacteria, bees, some plants). For mouse or human, 16 GB is probably not going to be enough, and 32 / 64 GB is going to be the minimum.
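To see how much memory the Linux side actually has available (this also works inside a virtual machine or WSL, where the number can be lower than what is installed in the machine), something like the sketch below can be used. total_ram_gib is a hypothetical helper; it reads the standard Linux /proc/meminfo file and rounds to whole GiB.

```shell
# Print the total RAM in GiB (rounded), read from a meminfo-style file.
# Defaults to /proc/meminfo, which exists on Linux only.
# Usage: total_ram_gib [FILE]
total_ram_gib() {
  # MemTotal is reported in kB; divide twice by 1024 to get GiB
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Example usage:
# echo "This system has roughly $(total_ram_gib) GiB of RAM"
```

Compare the number against the guidance above: around 16 for small genomes, 32 or more for mouse or human.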
Thank you so much for the wonderful video! I am trying to do this in WSL2, but as I am using a network drive, it is a bit hard to follow the steps... I found out that creating soft links is not allowed on an SMB-connected drive, and WSL2 is very slow when writing to the mounted drives. Would this be critical in the further steps? Thank you in advance!
I haven't tried this in wsl2, mostly because I dual-boot to Linux to do bioinformatics related analysis. In theory you could run the whole analysis pipeline in windows itself since all tools are available for windows as well. You could *probably* go the wsl2 route, but it might need some tweaks or workarounds. Even then, like virtual box, the performance will not be anywhere near what's needed for real analysis. So, all in all, it's easiest to follow along on linux/virtual box.
I chose a virtual box for this since my streaming setup is windows based and installing wsl2 needs a reboot which breaks the stream, so I decided a virtual box was the easiest to do a stream like this.
@@DannyArends I need to analyse actual dataset in the future, so I'll try again with the dual-boot & hard drive. I really appreciate your response!
I get an error after the "make" for STAR: STAR.cpp:52:45: error 'parametersDefault' was not declared in this scope and also STAR.cpp:53:20 error: 'parameterDefault_len' was not declared in this scope. How can I get around this?
You're going to have to use an older version of the STAR aligner. I've had several reports now that mentioned STAR not compiling, I think it's due to them changing their build based on a newer version of linux.
So two options:
1) try installing a newer linux version
2) grab an older binary version of STAR and use that. (Another comment on here mentions the version that still works.)
I'll see if I can figure out what the issue is and make another video with the solution when I do.
This is the comment I was referring to:
"Seems like the master branch is currently "broken", the quickest solution is to just download the binary distribution of the release page. The latest compiled version for linux is: github.com/alexdobin/STAR/releases/download/2.7.10a_alpha_220818/STAR_2.7.10a_alpha_220818_Linux_x86_64_static.zip
Just unzip it and put the STAR binary file in your ~/bin folder"
Hello! This video is super helpful for a beginner, but I failed to start the virtual box. The computer runs Windows 7, by the way, and the Debian is 32-bit instead of 64-bit. Is there any way I can avoid this problem?
Virtual box runs fine on windows 7, you do need to install Debian with a 64bit version otherwise you're not going to be able to run the tools. 32bit OS versions are not suitable for large files.
@@DannyArends Thank you so much for the prompt reply! I will try on my mac to see how it goes then!
Good luck !
Hi Prof. Danny, thank you for this excellent video. I have an issue regarding the update to the bash file: although I added the code at the end of the bash file, I'm not able to execute the commands. For example, when I execute "STAR" I get "Command 'STAR' not found, but can be installed with: sudo apt install rna-star". I tried this after conda deactivate. The command works inside the conda environment, but not outside of it.
Thanks for the compliment. Things with $PATH settings get quite complicated when conda is involved, since it takes over the whole environment. Feel free to send me an email with a copy of your .bashrc file so I can take a look at it.
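Before sending the file over, it can already help to look at what $PATH contains with and without conda active. A small sketch, where path_entries is a hypothetical helper that just prints the entries in lookup order:

```shell
# Print each directory of a PATH-style string on its own line, in lookup order.
# Usage: path_entries "$PATH"
path_entries() {
  printf '%s\n' "$1" | tr ':' '\n'
}

# Example usage:
# path_entries "$PATH"     # earlier entries win when a tool exists in several places
# type -a STAR             # bash builtin: lists every STAR found via PATH
# conda deactivate
# path_entries "$PATH"     # check whether your ~/bin folder is still listed
```

If ~/bin is missing from the second listing, the line in .bashrc is either not there or not being read.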
Hello Professor, knocking on your door with another question. I have upgraded my i5 laptop's RAM from 8 GB to 16 GB, and I am wondering what I should do: dual boot, VirtualBox, WSL in Windows, or standalone Linux for performing RNA-seq analysis on some plant genomes? I am using Windows 10 now. Which option would be preferable? Thnx in advance.
I'd probably go for WSL on Windows 10, since it is convenient and reasonably performant. Dual boot is nice when you have the HDD space for it (sequencing data is big), and VirtualBox just has too little performance for real genome sizes.
@@DannyArends Heartiest thnx for your prompt response, Professor. And sorry to bother you again. I would like to know if I go for WSL in Windows 10 then will I have my full 16 gb ram support for rna seq data analysis? My laptop has 1 TB HDD. so Is it enough for my laptop to efficiently handle the pressure of dual booting?
WSL allows for full memory usage
@@DannyArends A tons of thnx to you Professor.
What to do if the compilation of trimmomatic has not been done?
In that case just download trimmomatic v0.39 from here: www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.39.zip and extract it. Make sure to update the script to reflect that you're using 0.39 not 0.40-rc1
@@DannyArends thank you so much and also the virtual box version what you have used in the tutorial and the one in the pdf is different, is it fine?
The version of virtual box should not matter, the important part is to use the same Debian version
Thank you for your video. When I installed Trimmomatic, PICARD tools, etc. and then tested the installation, it always showed something like "Unable to access jarfile picard-2.27.5-SNAPSHOT-all.jar". I use an M1 Mac and installed Debian bullseye. How can I fix this problem? Thank you in advance!
Hi there, if you use the file browser, can you see the picard jar file in the folder? The "Unable to access" error generally means java cannot find the file that you're telling it to execute. So make sure you're in the right location and that you can see the file using ls. Alternatively, you can give the full path to the file: java -jar /home/username/Software/picard/picard.jar
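The check described above can be sketched as a small wrapper that looks for the jar before handing it to java. run_jar is a hypothetical helper, and the path under ~/Software is only the example location from this reply, so adjust it to wherever your jar actually is.

```shell
# Hypothetical helper: run a jar only if the file is really there.
run_jar() {
  jar="$1"; shift
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  java -jar "$jar" "$@"
}

# Assumption: picard.jar lives in ~/Software/picard (example path only).
run_jar "$HOME/Software/picard/picard.jar" || echo "check the jar path with ls, or fix the error shown above"
```

Using the full path means the command no longer depends on which folder the terminal happens to be in.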
Thank you very much for your quick reply!
Also, sorry again for bothering you continuously, but would it be possible to make videos on topics like 23andMe (or similar), exome analysis and microarray analysis (or is there already a plan for any of these)? If not, that is OK, this is just another "bothering naïve" suggestion.
Some of these topics are covered in the bioinformatics lecture series here on my channel. But I'm always open to suggestions.
Hi Danny, sorry again. I am just adding this comment in case it helps someone, or maybe my situation is unusual. In the step of making links, the ln step does not work for tabix and fasterq-dump. (Again, one of the pains for biologists learning Linux.) Anyway, I googled and I think I found the solution: add f to s, so the command is "ln -sf path". Thanks. Mohamed (I forgot to mention that I am on Ubuntu, dual boot. Also, I think the code for tabix3 is not on GitHub).
The f (force) should only be needed when you're linking on top of an already existing file, link, or folder. It's not recommended to just overwrite what was already there, especially since it's relatively common to switch the from and to sides of the command. Perhaps you had tabix/fasterq-dump already linked, and the f was needed to overwrite the existing link?
@@DannyArends Thanks a lot. I am not a pro in Linux, but I do understand what you wrote. I tried the normal steps shown in your YouTube videos (sessions #1 & #2), and when I do ls (from within the bin folder), it shows everything in green except fasterq-dump and tabix, which appear in red. When I browse to the folder containing tabix and fasterq-dump, they only work when I type "./tabix". This seems weird. The files are there, but the ln command within bin is not recognizing them (I am on Ubuntu 22.10). So I searched for a solution, and that is what I found. I am very sorry if my answer is irrelevant or has nothing to do with your kind answer. My conclusion is to prefer Debian and follow your steps exactly, and that Ubuntu may not be good for some bioinformatics tools. Thanks again. Mohamed
Generally, links being shown in red means the target of the link doesn't exist. You can check this by running ls with -lathr (or similar), which shows the target location for each link. Make sure the link points to the executable. Delete the red links when they point to a non-existing path, then link again. If the ln command gives an error or doesn't create a link, 99% of the time it's a typo in the from path.
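The check can be tried safely in a throwaway folder. The sketch below uses made-up stand-in files (not the real tabix or fasterq-dump) and a deliberate typo to show what a broken link looks like and how to find it.

```shell
# Work in a temporary directory so nothing in ~/bin is touched.
demo=$(mktemp -d)
mkdir "$demo/software" "$demo/bin"
printf '#!/bin/sh\necho tabix-ok\n' > "$demo/software/tabix"   # stand-in for the real tool
chmod +x "$demo/software/tabix"

# Correct link: ln -s FROM TO (the existing file first, the link name second).
ln -s "$demo/software/tabix" "$demo/bin/tabix"
# Typo in the from path ("sofware"): ln still creates the link, but it points nowhere.
ln -s "$demo/sofware/fasterq-dump" "$demo/bin/fasterq-dump"

# Show where each link points.
ls -l "$demo/bin"

# A link is broken when it is a symlink (-L) whose target does not exist (! -e).
broken=""
for link in "$demo/bin"/*; do
  if [ -L "$link" ] && [ ! -e "$link" ]; then
    echo "broken: $link -> $(readlink "$link")"
    broken="$broken $(basename "$link")"
  fi
done
```

Note that ln -s does not complain about a non-existing from path, which is why the typo only shows up later as a red link.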
@@DannyArends thank you. Will test this and come back. Mohamed
@@DannyArends Hi Danny, I followed your steps and it worked. I have no explanation. First I removed the links that I had made using -sf, then I added them again as you did in the video, and it works (really very strange: I had tried this before on two computers, and on both the links to tabix and fasterq-dump did not work). Anyway, thank you very much. Mohamed
working for today
Enjoy work !
+1
Hi Professor,
I encountered an error, BUILD FAILED, while running the ./gradlew shadowJar command to install PICARD. Could you kindly help me solve this?
The real error should be mentioned before that. "BUILD FAILED" is not the real error; it just lets you know that the jar file could not be created.
I can help you with this, but I would need to see the full build command you used, as well as all output. Please drop it by email (my email is listed in the about section of my channel)
@@DannyArends Thanks for your answer, Professor. The problem is solved: it was caused by using Java 11, while PICARD requires Java 17.
@@hnisarbiotech How did you solve this problem? How do you get Java 17?
@@hnisarbiotech How did you solve this issue?
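To see which Java version is active before building, something like the sketch below may help. The Java 17 requirement is taken from the comment above rather than verified here, and java_major is a hypothetical helper that only parses the first line printed by java -version.

```shell
# Hypothetical helper: extract the major version from a `java -version` line.
# Handles both the current numbering ("17.0.9") and the legacy one ("1.8.0_292").
java_major() {
  printf '%s\n' "$1" | sed -E 's/.*version "([0-9]+)(\.([0-9]+))?.*/\1 \3/' | {
    read -r first second
    if [ "$first" = "1" ]; then echo "$second"; else echo "$first"; fi
  }
}

if command -v java >/dev/null 2>&1; then
  line=$(java -version 2>&1 | head -n 1)   # java prints its version on stderr
  echo "Java major version: $(java_major "$line")"
else
  echo "java not found on PATH"
fi
```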
Hello, the video was extremely helpful and easy to follow. I installed everything, and at the end, when I open a new terminal to check samtools or STAR, it says "bash: samtools: command not found". What's the problem?
Also, I initially took the Debian ISO and not the DVD file that you used.
Hi, I figured it out: /home/Rahul/software/ is the right path. I had copy-pasted the path directly, which has danny in it. Everything works now except STAR, which has a red symbol. Any leads would be helpful.
Did you update the .bashrc file to add the ~/bin folder to your $PATH? See: gist.github.com/DannyArends/04d87f5590090dfe0dc6b42e5e1bbe15 (0_installSoftware.sh), lines 83 to 97, where we make symbolic links in ~/bin and then use nano to update the .bashrc file.
A red symbol? That probably means the link isn't pointing to the correct location. Remove the link and add it again. Using the tab key to autocomplete paths will prevent some failures such as typos and capitalization issues.
@@DannyArends Thank you so much for the fast response. I did update the .bashrc file initially, but after I updated my name and added all 5 files again, I didn't do it again.
Looks like I have two STAR folders: one in software and one in home. Should I remove one?
In Ubuntu you need to run vdb-config --interactive from the bin folder at the root of the extracted archive. That folder will be inside the sratoolkit folder if you created one with mkdir; otherwise it will be at the root of your /software folder. (Maybe it's just my machine, but it is the most annoying program ever.)
Thanks for the info, I tend to run a Debian-based OS.
@@DannyArends No worries, it's very similar. I had to interrupt myself because it was a very long install and my day started. Tomorrow I'll resume and try part 2. Thanks for the great work!
Hi Danny, thanks for sharing this video! I'm a beginner in this field and am following your tutorial step-by-step.
However, I'm stuck at the STAR software at the moment. I can't seem to compile the software. Error is as below:
'rm' -f STAR.o Parameters.o
g++ -c -O3 -std=c++11 -fopenmp -D'COMPILATION_TIME_PLACE="2024-03-14T10:26:24+08:00 :/home/farr/software/STAR/source"' -D'GIT_BRANCH_COMMIT_DIFF="On branch master ; commit b1edc1208d91a53bf40ebae8669f71d50b994851 ; diff files: "' -pipe -Wall -Wextra STAR.cpp
STAR.cpp: In function ‘void usage(int)’:
STAR.cpp:52:45: error: ‘parametersDefault’ was not declared in this scope
52 | cout.write(reinterpret_cast(parametersDefault),
| ^~~~~~~~~~~~~~~~~
STAR.cpp:53:20: error: ‘parametersDefault_len’ was not declared in this scope
53 | parametersDefault_len);
| ^~~~~~~~~~~~~~~~~~~~~
make: *** [Makefile:100: STAR.o] Error 1
How do I solve this error?
Seems like the master branch is currently "broken". The quickest solution is to just download the binary distribution from the releases page. The latest compiled version for Linux is: github.com/alexdobin/STAR/releases/download/2.7.10a_alpha_220818/STAR_2.7.10a_alpha_220818_Linux_x86_64_static.zip
Just unzip it and put the STAR binary file in your ~/bin folder
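Putting a prebuilt binary in place could look like the sketch below. install_binary is a hypothetical helper, and the demo uses a stand-in file in a temporary folder; for the real thing the source would be the STAR binary from the unzipped release and the destination would be ~/bin.

```shell
# Hypothetical helper: copy a prebuilt binary into a bin folder and make it executable.
install_binary() {
  src="$1"; dest_dir="$2"
  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/" && chmod +x "$dest_dir/$(basename "$src")"
}

# Demo with a stand-in file (not the real STAR binary) in a temporary folder.
work=$(mktemp -d)
printf '#!/bin/sh\necho "STAR stand-in"\n' > "$work/STAR"
install_binary "$work/STAR" "$work/bin"

# The copy should be listed with the executable bit set.
ls -l "$work/bin"
```

With a prebuilt binary there is nothing to compile, so the make step is not needed.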
Hi @DannyArends, thanks for sharing the detailed video. I set up my own Linux system for RNA-seq by following your instructions. However, I was wondering whether there is a reason why we create the primary_assembly using R?
The answer is that the Ensembl FTP server doesn't provide a primary assembly for Saccharomyces cerevisiae to download, while it does for e.g. mouse/human and other commonly used model organisms.
For Saccharomyces only the top-level genome build is provided. Top-level builds include all chromosomes (aka the primary assembly), but also regions not assembled into chromosomes (contigs) and N-padded haplotype/patch regions. According to the Ensembl documentation, when no primary assembly is provided it's because the top-level one is complete, so in this case we could have used the top-level one (since it'll be identical to the primary assembly). For most genomes (e.g. mouse) there will be a difference, and for alignment you're going to use the primary assembly in 99% of cases.
If you use the top-level build for alignment, you're going to have to deal with these additional regions later on in the analysis, which creates additional complexity in the pipeline, and 99% of people ignore these regions anyway.
I just added the step of building it, since it's not difficult and I think it shows how you can use any genome/reference in FASTA format to align against.
(For more info, see: ftp.ensembl.org/pub/release-108/fasta/saccharomyces_cerevisiae/dna/README)
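The video builds the primary assembly in R. As a rough alternative illustration of the same idea (keep the chromosomes, drop everything else), here is an awk sketch. filter_fasta is a hypothetical helper, and the headers and sequences in the demo file are made up, so check the real sequence names with grep '^>' before using anything like this.

```shell
# Hypothetical helper: keep only the named sequences from a FASTA file.
# usage: filter_fasta "name1 name2 ..." input.fa
filter_fasta() {
  awk -v keep="$1" '
    BEGIN { n = split(keep, names, " "); for (i = 1; i <= n; i++) want[names[i]] = 1 }
    /^>/  { id = substr($1, 2); printing = (id in want) }  # sequence ID is the first word after ">"
    printing                                               # print header and sequence lines of wanted records
  ' "$2"
}

# Demo on a tiny made-up top-level file with two chromosomes and one scaffold.
fa_dir=$(mktemp -d)
cat > "$fa_dir/toplevel.fa" <<'EOF'
>I dna:chromosome
ACGT
>scaffold_1 dna:scaffold
NNNN
>II dna:chromosome
GGCC
EOF

filter_fasta "I II" "$fa_dir/toplevel.fa" > "$fa_dir/primary.fa"
grep '^>' "$fa_dir/primary.fa"
```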
@@DannyArends Tried the latest compiled version, but the same error appeared. 😞
If you're using the binary, you can't have this compilation error, since you can skip the compilation (no need to build the binary, since you downloaded it).
Just download the binary, put it in ~/bin and then run STAR from the command line. You can skip the make commands to build STAR.