She is a great teacher of bioinformatics!!! - This from a retired professor of computational medicine and bioinformatics at Michigan...
I literally have an assignment on this that I have to work on today, you're a godsend!
I just started my master's degree in Computational Biology, and these videos are kind of inspiring! Coming from an undergrad in Biotechnology, I have a lot of work to do, and I hope I can reach a good level of bioinformatics skills in the next two years! Thank you again for the content.
Where are you taking your master's degree? I'm a medical biotech student at Federico II in Naples and have applied for a traineeship in bioinformatics (translational genomics). As an autodidact, I'm learning PCA and multivariate analysis.
@francescosilvestro2092 I'm doing my master's degree in Trento. It has very well-respected research groups in the field.
I'm not a biologist, just here for the really cool bioinformatics videos you do! Thanks
I'm a computer scientist, very fun to watch these. Will try it out
Wow, this was a really awesome video! Especially for me, since I'm doing my PhD in population genetics. Looking forward to more like this.
Very informative video, thanks a lot! Could you explain how you got the number of SNPs?
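The thread doesn't answer how the SNP count was obtained. One common approach is to count VCF data lines whose REF and ALT alleles are all single bases; the sketch below does that in plain Python, assuming an uncompressed or gzip-compressed VCF. It may not match exactly how the video counted them (for example, whether a multiallelic site counts once). On the command line, `bcftools view -H -v snps file.vcf.gz | wc -l` gives a similar count.

```python
import gzip
from typing import Iterable

BASES = {"A", "C", "G", "T"}


def is_snp(ref: str, alt_field: str) -> bool:
    """True if REF and every ALT allele are single nucleotides."""
    alts = alt_field.split(",")
    return ref in BASES and all(alt in BASES for alt in alts)


def count_snps(lines: Iterable[str]) -> int:
    """Count SNP records in an iterable of VCF lines, skipping header lines."""
    count = 0
    for line in lines:
        if line.startswith("#"):  # '##' meta lines and the '#CHROM' header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 5 and is_snp(fields[3], fields[4]):
            count += 1
    return count


def count_snps_in_file(path: str) -> int:
    """Stream a .vcf or .vcf.gz file line by line instead of loading it into memory."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_snps(handle)


# Toy records for illustration only (not real data).
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "22\t100\t.\tA\tG\t.\tPASS\t.",    # SNP
    "22\t200\t.\tG\tA,T\t.\tPASS\t.",  # multiallelic SNP, counted once
    "22\t300\t.\tC\tCT\t.\tPASS\t.",   # insertion, not counted
]
print(count_snps(example))
```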
Thanks a lot for the great video! I look forward to seeing more such content!
Thanks Maria, your content is really great!
thanks! looking forward to seeing more
thank you so much! super comprehensive
Hi Maria! Great video. I have my own data in VCF format; is there a way I could plot it together with the rest of the data you've shown here? I look forward to any guidance or tips on how to do that.
You could do the same thing, just swapping in your own VCF, then in the Colab notebook load them both and combine them with pd.concat. Check the pandas documentation for more details.
@OMGenomics Thanks Maria! I'll try it over the weekend. Will get back to you if I face any issues :)
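The pd.concat suggestion could look roughly like this, assuming both VCFs have already been turned into genotype tables with samples as rows, variants as columns and alternate-allele counts (0/1/2) as values, keyed by the same variant IDs on the same reference build. All sample and variant names below are made up for illustration.

```python
import pandas as pd

# Toy genotype tables (hypothetical names): rows are samples, columns are variants,
# values are alternate-allele counts.
reference_panel = pd.DataFrame(
    {"22:100": [0, 1, 2], "22:200": [1, 1, 0], "22:300": [0, 0, 1]},
    index=["ref_1", "ref_2", "ref_3"],
)
my_samples = pd.DataFrame(
    {"22:100": [2, 0], "22:200": [0, 1], "22:400": [1, 1]},
    index=["mine_1", "mine_2"],
)

# Stack the samples (axis=0). join="inner" keeps only variants present in both
# tables, so no column is entirely missing for one of the two groups.
combined = pd.concat([reference_panel, my_samples], axis=0, join="inner")

# Keep a label per sample so the two sources can be coloured differently in a plot.
source = pd.Series(
    ["reference"] * len(reference_panel) + ["mine"] * len(my_samples),
    index=combined.index,
)
print(combined)
print(source.value_counts())
```

Using `join="outer"` instead would keep every variant but fill the gaps with NaN, which would then need handling before PCA.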
Sorry, I'm an amateur researcher, and I study and compare ancient samples and populations. I mainly use GEDmatch and MyTrueAncestry (MTA). Could you tell me what data format MTA uses in its database? Full BAM files downloaded from archives, or a minified version of them?
I ask because very strange results usually appear when comparing archaic and recent samples.
Sorry if this is a stupid question. I just want to know whether simple TXT-file-based genotype samples are suitable for scientific testing.
The point is that I found a downloadable WGS database of Hungarian medieval rulers, and I also want to perform higher-level tests and analyses with the BAM files.
I’m not actually familiar with MTA or its data format, but I just googled it, and it looks like it takes data from various services. Does that include 23andMe and/or Ancestry? If so, those would be SNP genotyping data, so you wouldn’t have full BAM files: there are no sequencing reads, just the SNP genotypes. You can go back and forth between these and a VCF by converting SNP rs IDs to their genomic locations, though I don’t know which tool to use for this off the top of my head.
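A minimal sketch of the SNP-genotype-to-VCF direction, assuming a tab-separated raw file with rsid, chromosome, position and genotype columns (the layout 23andMe-style raw downloads commonly use) and a hypothetical `lookup` table giving each rsID's reference and alternate allele. The raw genotypes don't say which allele is the reference, so that information has to come from an external resource matching the file's genome build. This sketch does not handle strand flips, build differences, or writing a VCF header.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical lookup: rsID -> (REF, ALT) on the same genome build and strand as the
# raw file. In practice this would be built from a variant database; here it's a toy.
AlleleLookup = Dict[str, Tuple[str, str]]


def genotype_to_gt(genotype: str, ref: str, alt: str) -> str:
    """Convert a two-letter call such as 'AG' to a VCF GT string such as '0/1'."""
    codes = {ref: "0", alt: "1"}
    if len(genotype) != 2 or any(base not in codes for base in genotype):
        # No-call ('--'), single-letter (haploid) call, or an allele the lookup
        # doesn't know (e.g. a strand flip): mark as missing.
        return "./."
    return "/".join(sorted(codes[base] for base in genotype))


def raw_to_vcf_rows(lines: Iterable[str], lookup: AlleleLookup) -> Iterator[str]:
    """Yield VCF data lines (CHROM..FORMAT plus one sample column) from raw rows
    of the form: rsid <tab> chromosome <tab> position <tab> genotype."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # comments / header
        rsid, chrom, pos, genotype = line.rstrip("\n").split("\t")
        if rsid not in lookup:
            continue  # cannot assign REF/ALT without the lookup
        ref, alt = lookup[rsid]
        gt = genotype_to_gt(genotype, ref, alt)
        yield "\t".join([chrom, pos, rsid, ref, alt, ".", ".", ".", "GT", gt])


# Toy input (made-up IDs and positions).
lookup = {"rs_toy1": ("A", "G"), "rs_toy2": ("C", "T")}
raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs_toy1\t1\t1000\tAG",
    "rs_toy2\t1\t2000\t--",
    "rs_toy3\t1\t3000\tTT",
]
for row in raw_to_vcf_rows(raw, lookup):
    print(row)
```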
@OMGenomics Thank you very much for your reply, I really appreciate it. This matches what I had guessed so far.
In short, depending on the subscription, MTA makes a certain number of archaic samples available to its subscribers, up to a maximum of 700. I then upload my 23andMe, FTDNA, or MyHeritage raw data and can compare myself to these 700 ancient individuals.
But the problem is that with some of them I match on up to 7 segments and 240 centimorgans, which I think is impossible with a person who lived 800 years ago. It's like being a close cousin of someone who lived 25 generations ago.
Since I am not an IT specialist, I can only assume that this contradiction is caused by the different data formats. So I think the matches shown on MTA are not real.
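The intuition that 240 cM is implausible can be checked with a back-of-the-envelope expectation. A sketch, assuming a haploid autosomal genetic map of roughly 3,400 cM (an approximate figure) and a single direct ancestor: a parent shares about one full haploid genome's worth, and each extra generation halves the expected amount. It ignores pedigree collapse and the short segments members of the same population share by chance, both of which can make short shared segments more common than this estimate suggests.

```python
# Approximate haploid autosomal genetic map length in centimorgans (an assumption;
# published estimates vary somewhat).
AUTOSOMAL_CM = 3400.0


def expected_shared_cm(generations: int, genome_cm: float = AUTOSOMAL_CM) -> float:
    """Expected cM shared with one specific direct ancestor `generations` back.

    generations=1 is a parent (shares roughly one whole haploid genome);
    each additional generation halves the expectation.
    """
    if generations < 1:
        raise ValueError("generations must be >= 1")
    return genome_cm * 0.5 ** (generations - 1)


for g in (1, 2, 3, 10, 25):
    print(f"{g:>2} generations back: ~{expected_shared_cm(g):.6f} cM expected")
```

At 25 generations the expectation is a tiny fraction of a centimorgan, orders of magnitude below 240 cM.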
@OMGenomics Or, for example, what you say is confirmed by the fact that a few days ago King Béla III's mitochondrial DNA was assigned haplogroup T2b2b1. It stayed that way for a couple of days until it was updated to H1b, which is what it actually is. So this company really is working with data that lacks essential genetic information.
Interesting! I asked the hive mind on Twitter, so I hope my extended network includes enough ancient DNA experts to help check your concerns.
@OMGenomics Thank you very much, that's very cool! I'll be very interested in the experts' opinions.
Dear Maria, thanks for this video. I think it was very insightful for biologists like me on how we can check RNA-seq data against subject genotype (e.g., when that information isn't available in the metadata). After watching it, I was wondering why there isn't much research applying dimensionality reduction techniques to whole exome sequencing (WES) data. Wouldn't it also be interesting to try to stratify gene expression profiles by potential disease-causing variants? I would love to hear your opinion on this. Cheers
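For context, PCA only needs a samples-by-variants genotype matrix, so in principle the same approach applies to genotypes called from WES as from WGS; exome data just covers far fewer, mostly coding, variants. A minimal NumPy sketch via SVD on synthetic data from two simulated groups with different allele frequencies; it's not necessarily how the video computes its PCA.

```python
import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2):
    """PCA of a samples x variants matrix of alt-allele counts (0/1/2) via SVD.

    Returns (scores, explained_variance_ratio) for the first n_components PCs.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X[:, X.std(axis=0) > 0]        # drop monomorphic variants (no information)
    X = X - X.mean(axis=0)             # centre each variant
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the PCs
    ratio = S**2 / np.sum(S**2)
    return scores, ratio[:n_components]


# Synthetic example: two groups of 30 samples whose allele frequencies differ.
rng = np.random.default_rng(0)
n_variants = 500
freq_a = rng.uniform(0.05, 0.95, n_variants)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_variants), 0.01, 0.99)
group_a = rng.binomial(2, freq_a, size=(30, n_variants))
group_b = rng.binomial(2, freq_b, size=(30, n_variants))
genotypes = np.vstack([group_a, group_b])

scores, ratio = genotype_pca(genotypes)
print(scores.shape, ratio.round(3))
```

Many population-genetics PCAs also scale each variant by its expected standard deviation before the SVD; this sketch only centres.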
How do I come up with a bioinformatics topic for my final thesis?
This was so satisfying to watch!
God bless you! Anyway, you already are a goddess! Thank youuu
Nice!! Thank you so much!
Thank you so much!
Thanks a lot for this nice video.
The file was too big for my VirtualBox Linux VM. Any advice?
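One way to work within a small VM is to avoid loading the whole file: stream the gzipped VCF line by line and write out a smaller subset (one chromosome, or every Nth variant). A sketch using only the standard library; `every` and `chrom` are illustrative parameters, thinning changes which variants go into the analysis, and this won't help if the VM simply lacks the disk space to hold the download.

```python
import gzip
from typing import Iterable, Iterator, Optional


def thin_vcf_lines(lines: Iterable[str], every: int = 10,
                   chrom: Optional[str] = None) -> Iterator[str]:
    """Yield all header lines, then every `every`-th data line (optionally one chromosome only)."""
    seen = 0
    for line in lines:
        if line.startswith("#"):
            yield line  # keep '##' meta lines and the '#CHROM' header
            continue
        if chrom is not None and line.split("\t", 1)[0] != chrom:
            continue
        if seen % every == 0:
            yield line
        seen += 1


def thin_vcf_file(in_path: str, out_path: str, every: int = 10,
                  chrom: Optional[str] = None) -> None:
    """Stream in_path (.vcf.gz) to out_path (.vcf.gz) without holding it in memory."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in thin_vcf_lines(src, every, chrom):
            dst.write(line)
```

For example, `thin_vcf_file("input.vcf.gz", "thinned.vcf.gz", every=20, chrom="22")` with your own (hypothetical here) file names. Python's `gzip` writes plain gzip rather than bgzip, so the output can be read line by line but not tabix-indexed.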
Very cool!
How long should it take to download? It's been a reaaaally long time and it's still loading
Hey, I can't open the link provided for the 1000 Genomes VCFs! It says it can't connect??
Hey! I just checked and it was working for me. Can you include the exact command you ran?
Great video.
What software are you using for taking notes and writing Python scripts?
VS Code; the longer name is Visual Studio Code.
nice.
Hi ma'am, this is a very important topic.
I got lost at the 2-minute mark because the link doesn't work for me :( Do you know how I can fix that? It just gives me a blank page.
Which link? By the way, everything you need is in the GitHub repo I linked in the description.
You can download the VCF files directly from your Bash terminal. You'll just need to type the command in manually, as shown at 3:30.
Also if you visit her repo you'll see she shared the commands there as well.
👍👍👍👍
It would be helpful if the video were broken up into chapters so we could click on the part we're actually interested in.
Yeah, I didn't have time to do that before, but I just finished adding those timestamps. Enjoy!
@OMGenomics Thanks so much!
Hello, thanks for this interesting video. I want to learn bioinformatics; can I find any help here, my friends?
Yes, watch Maria's videos in order: 1. What is bioinformatics, 2. Getting started in bioinformatics, 3. Five steps ...
It would be awesome if you could redo exactly what you did in R, but in Python.
What do you mean? Which thing I did in R?
@OMGenomics OMG, thank you so much for your reply. I'm a big fan of your OMGenomics show. I watched all of your R videos, and the one called Plotting in R for Biologists is really helpful for beginners. If you have time, I would appreciate it if you could teach us plotting in Python for biologists. I'd also love it if you could release a video on how to deal with batch-effect correction in genomic data analysis. Thanks!!
@aewe4239 W3Schools has good intro Python material.
@aewe4239 She's working in Python throughout this video.
What if your VCF contains variants where some samples have ./. genotypes (no calls)? The code you posted doesn't appear to work for this type of data. Any suggestions? Thanks
Ah yes, handling missing data. You can assume they are 0/0 or exclude those loci or the samples entirely, depending on the consequences. If it’s only a minority of loci, excluding them might be best. Assuming 0/0 can be a good solution when they’re scattered across most loci and most samples.
@OMGenomics Thanks, great video
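The options in the reply (drop the affected loci or samples, or assume 0/0) could be sketched in pandas like this, assuming the GT strings have been pulled into a samples-by-variants table. Anything that isn't a plain diploid biallelic call (including './.', but also e.g. '0/2') becomes NaN here, and the `strategy` names are made up for illustration.

```python
import pandas as pd

# Diploid, biallelic GT strings -> alternate-allele count. Anything else ('./.',
# '.', '0/2', ...) is treated as missing.
GT_TO_DOSAGE = {
    "0/0": 0, "0/1": 1, "1/0": 1, "1/1": 2,
    "0|0": 0, "0|1": 1, "1|0": 1, "1|1": 2,
}


def gt_to_dosage(gt: pd.DataFrame) -> pd.DataFrame:
    """Map a samples x variants table of GT strings to 0/1/2, with NaN for no-calls."""
    return gt.apply(lambda column: column.map(GT_TO_DOSAGE)).astype(float)


def handle_missing(dosage: pd.DataFrame, strategy: str) -> pd.DataFrame:
    """Apply one of the missing-data strategies discussed in the thread."""
    if strategy == "drop_loci":      # remove variants with any no-call
        return dosage.dropna(axis=1)
    if strategy == "drop_samples":   # remove samples with any no-call
        return dosage.dropna(axis=0)
    if strategy == "assume_ref":     # treat no-calls as 0/0
        return dosage.fillna(0.0)
    raise ValueError(f"unknown strategy: {strategy}")


# Toy table (made up): rows are samples, columns are variants.
gt = pd.DataFrame(
    {"var1": ["0/0", "0/1", "1/1"], "var2": ["./.", "0/1", "0/0"], "var3": ["0|1", "1|1", "./."]},
    index=["s1", "s2", "s3"],
)
dosage = gt_to_dosage(gt)
print(dosage.isna().mean())  # fraction of no-calls per variant, to help pick a strategy
print(handle_missing(dosage, "drop_loci"))
```

A middle ground, such as dropping only loci above some missingness threshold and filling the rest, follows the same pattern.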
Hello, I'm going to create and publish an AWS blog post that implements the same concept using AWS tooling: S3, AWS HealthOmics, and SageMaker notebooks. Would you like to participate? I really like your channel and the way you present, by the way! Great work!
Please be my mentor.