Genes and geography -- a bioinformatics project

  • Published 16 Jan 2025

COMMENTS • 63

  • @Telemed911 3 years ago +32

    She is a great teacher of bioinformatics!!! - This from a retired professor of computational medicine and bioinformatics at Michigan...

  • @lilacspring2556 3 years ago +9

    I literally have an assignment on this that I have to work on today, you're a godsend!

  • @alessandrogagliardi7470 2 years ago +4

    I just started my master's degree in Computational Biology and these videos are kind of inspiring! Coming from an undergrad in Biotechnology, I have a lot of work to do, and I hope I can build good bioinformatics skills in the next two years! Thank you again for the content.

    • @francescosilvestro2092 1 year ago

      Where are you taking the master's degree? I'm a medical biotech student at Federico II, Naples, who has requested a traineeship in bioinformatics (translational genomics). As an autodidact, I'm learning PCA and multivariate analysis.

    • @alessandrogagliardi7470 1 year ago +1

      @francescosilvestro2092 I'm taking the master's degree in Trento. It has very respectable research groups in the field.

  • @Kitsune152 1 year ago +1

    I'm not a biologist, just here for the really cool bioinformatics videos you do! Thanks

  • @AbhinavSrivastava-xe7xi 1 year ago +2

    I'm a computer scientist, very fun to watch these. Will try it out

  • @santiagomedina8585 3 years ago +5

    Wow, this was a really awesome video! Especially for me, doing my PhD in population genetics. Looking forward to more like this.

  • @balqeesmansour6692 2 years ago +2

    Very informative video, thanks a lot. Could you explain how you got the number of SNPs?

  • @furkanmtorun 3 years ago

    Thanks a lot for the great video! I look forward to seeing more such content!

  • @patricioperez1985 3 years ago +1

    Thanks Maria, your content is really great!

  • @yasintopcu4042 3 years ago

    thanks! looking forward to seeing more

  • @latinadna 3 years ago

    thank you so much! super comprehensive

  • @indologyandindianhistory673 2 years ago +1

    Hi Maria! Great video. I have my own data in VCF format; is there a way I could plot it together with the rest of the data you've shown here? Looking forward to any guidance or tips on how to do that.

    • @OMGenomics 2 years ago +1

      You could do the same, just swapping out the VCF for your own. Then in the Colab you could load them both and pd.concat them. Check the pandas documentation for more details.

    • @indologyandindianhistory673 2 years ago

      @OMGenomics thanks Maria! I'll try it over the weekend. Will get back to you if I face any issues :)
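
The pd.concat suggestion in this thread could look roughly like the sketch below. It is not the code from the video or the repo: the two tables, the variant IDs, the sample names, and the genotype values are all made up for illustration, and it assumes each VCF has already been converted to a table with one row per sample and one column per variant.

```python
import pandas as pd

# Hypothetical genotype tables: rows are samples, columns are variant IDs,
# values are alternate-allele counts (0, 1 or 2). All values are made up.
reference = pd.DataFrame(
    {"rs1": [0, 1, 2], "rs2": [2, 2, 0], "rs3": [1, 0, 0]},
    index=["ref_A", "ref_B", "ref_C"],
)
own = pd.DataFrame(
    {"rs1": [1, 0], "rs3": [2, 1], "rs9": [0, 0]},
    index=["my_1", "my_2"],
)

# Label each sample's origin so the two groups can be coloured separately later.
source = pd.Series(
    ["reference"] * len(reference) + ["own"] * len(own),
    index=list(reference.index) + list(own.index),
    name="source",
)

# Stack the samples; join="inner" keeps only variants present in both tables,
# which avoids NaN columns for variants genotyped in just one dataset.
combined = pd.concat([reference, own], axis=0, join="inner")

print(combined)
print(source)
```

Using join="inner" is a design choice here: the default outer join would keep every variant and fill the gaps with NaN, which most dimensionality-reduction routines will not accept without further handling.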

  • @louisvalois3863 2 years ago +1

    Sorry, I'm an amateur researcher and I study and compare ancient samples and populations. I mainly use GEDmatch and MyTrueAncestry. Maybe you can tell me what data format MTA uses in its database? Full BAM files downloaded from archives, or a minified version of them?
    I ask because very strange results usually appear when comparing archaic and recent samples.
    Sorry if this is a stupid question. I just want to know whether simple TXT-file-based gene samples are suitable for scientific testing.
    The point is that I found the downloadable WGS database of Hungarian medieval rulers, and I also want to perform higher-level tests and analyses with BAM files.

    • @OMGenomics 2 years ago +1

      I’m not actually familiar with MTA or its data format, but I just googled it, and it looks like it takes data from various services. Does that include 23andMe and/or Ancestry? In that case those would be SNP data, so you wouldn’t have full BAM files, because there are no sequencing reads but rather just the SNP genotypes. You can get back and forth between these and a VCF by converting SNP rs IDs to their genomic locations, though I don’t know what tool to use for this off the top of my head...

    • @louisvalois3863 2 years ago

      @OMGenomics Thank you very much for your reply, I really appreciate it. This matches what I had guessed so far.
      In short, depending on the subscription, MTA makes a certain number of archaic samples available to its subscribers. The maximum is 700 samples. Then I upload my 23andMe, FTDNA or MyHeritage raw data, and I can compare myself to these 700 ancient people.
      But the problem is that with some people I can match up to 7 segments and 240 centimorgans, which I think is impossible with a person who lived 800 years ago. It's like being a first cousin of a person who lived 25 generations ago.
      Since I am not an IT specialist, I can only assume that this contradiction is caused by the different data formats. So I think the matches shown in MTA are not real.

    • @louisvalois3863 2 years ago

      @OMGenomics Or, for example, what you say is confirmed by the fact that a few days ago King Béla III's mitochondrial DNA was listed as T2b2b1. It stayed that way for a couple of days until it was updated to H1b, which is what it actually was. So this company really is working with data that lacks essential genetic information.

    • @OMGenomics 2 years ago +1

      Interesting! I asked the hive mind on Twitter, so I hope my extended network includes enough ancient DNA experts to help check your concerns.

    • @louisvalois3863 2 years ago

      @OMGenomics Thank you very much, it's very cool. I will be very interested in expert opinions.

  • @felipenunezvillena2141 1 year ago

    Dear Maria, thanks for this video. I think it was very insightful for biologists like me on how we can control RNA-seq data for subject genotype (i.e. when that info is not available through the metadata). After seeing the video I was wondering why there isn't much research on applying dimensionality reduction techniques to Whole Exome Sequencing (WES) data. Wouldn't it also be interesting to attempt to stratify gene expression profiles based on potential disease-causing variants? I would love to hear your opinion on this subject. Cheers

  • @MhreteabWeldebrhan 1 year ago

    How do I get a bioinformatics title for my final thesis?

  • @alejandrogonzalesdezavala6930 3 years ago

    This was so satisfying to watch!

  • @BrenaCedraz 3 years ago

    God bless you, though you already are a goddess anyway! Thank youuu

  • @leandronascimento5552 3 years ago

    Nice!! Thank you so much!

  • @onatovonatovic526 3 years ago

    Thank you so much!

  • @angezoclanclounon1751 3 years ago

    Thanks a lot for this nice video.

  • @frankr2007 1 year ago

    The file was too big for my VirtualBox Linux, any advice?

  • @nextgengenomics 3 years ago

    Very cool!

  • @franciscoromogaray3076 1 year ago

    How long should it take to download? It's been a reaaaally long time and it's still loading

  • @zahraazkiar7209 7 months ago

    Hey, I can't open the link provided for the 1000 Genomes VCF! It says it can't connect??

    • @OMGenomics 7 months ago

      Hey! I just checked and it was working for me. Can you include the exact command you ran?

  • @beeryya 3 years ago

    Great video.

  • @MrKasshiff 3 years ago

    What software are you using for taking notes and writing Python scripts?

    • @OMGenomics 3 years ago +1

      VSCode; the longer name is Visual Studio Code.

  • @samifawcett4246 3 years ago +1

    nice.

  • @praveenrathore315 3 years ago

    Hi Ma'am, this is a very important topic.

  • @alessiailas4929 3 years ago

    I got lost at the 2 min mark, because the link doesn't work for me :( Do you know how I can fix that? It just gives me a blank page.

    • @OMGenomics 3 years ago

      Which link? Btw everything you need is on the GitHub repo I linked in the description.

    • @kevinalexis9886 2 years ago

      You can download the VCF files directly from your Bash terminal. You'll just need to type the command in manually, as shown at 3:30 in the video.
      Also, if you visit her repo you'll see she shared the commands there as well.

  • @zhengyu2763 3 years ago

    👍👍👍👍

  • @lilacspring2556 3 years ago +1

    Would be helpful if the video was broken up into parts so we can click on the bit of the video we're actually interested in

    • @OMGenomics 3 years ago +6

      Yea I didn't have time to do that before, but I just finished adding those time points now. Enjoy!

    • @lilacspring2556 3 years ago

      @OMGenomics thanks so much!

  • @saharmosallam3449 3 years ago

    Hello, thanks for this interesting video. I want to learn bioinformatics; can I find any help here, my friends?

    • @islamsalah4314 2 years ago

      Yes, watch Maria's videos in order: 1. What is bioinformatics 2. Getting started in bioinformatics 3. Five steps ...

  • @aewe4239 3 years ago

    It would be awesome if you could reproduce exactly what you did in R in Python.

    • @OMGenomics 3 years ago

      What do you mean? Which thing I did in R?

    • @aewe4239 3 years ago

      @OMGenomics OMG, thank you so much for your reply. I would like to tell you that I am a big fan of your OMGenomics show. I watched all of your R videos, and the one called Plotting in R for Biologists is really helpful for beginners. If you have time, I would appreciate it if you could teach us plotting in Python for biologists. I would also personally ask whether you could release a video on how to deal with batch-effect correction in genomics data analysis. Thanks!!

    • @austinkunch710 3 years ago

      @aewe4239 w3schools has good intro Python stuff

    • @frangarcia1699 2 years ago

      @aewe4239 She is working in Python throughout this video.

  • @robertb2664 2 years ago

    What if your VCF contains variants where some samples have ./. genotypes (no calls)? The code you posted does not appear to work for this type of data. Any suggestions? Thanks

    • @OMGenomics 2 years ago

      Ah yes, handling missing data. You can assume they are 0/0 or exclude those loci or the samples entirely, depending on the consequences. If it’s only a minority of loci, excluding them might be best. Assuming 0/0 can be a good solution when they’re scattered across most loci and most samples.

    • @robertb2664 2 years ago

      @OMGenomics thanks, great video
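
The two options mentioned in this thread for ./. genotypes (excluding affected loci, or assuming 0/0) can be sketched as follows. This is an illustration only, not the code posted with the video: the helper name gt_to_count, the sample and SNP names, and the genotype values are all hypothetical, and it assumes the GT fields have already been pulled out of the VCF into a table of strings with loci as rows and samples as columns.

```python
import numpy as np
import pandas as pd


def gt_to_count(gt):
    """Convert a VCF GT string such as '0|1' or '1/1' to a count of
    non-reference alleles. Any no-call allele ('.') makes the result NaN."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return np.nan
    # Every allele other than "0" is counted as alternate (a simplification
    # for multi-allelic sites).
    return float(sum(a != "0" for a in alleles))


# Made-up genotype calls: rows are loci, columns are samples.
calls = pd.DataFrame(
    {
        "s1": ["0/0", "0/1", "./."],
        "s2": ["1/1", "./.", "0/0"],
        "s3": ["0/1", "0/0", "0/0"],
    },
    index=["snp1", "snp2", "snp3"],
)

counts = calls.apply(lambda col: col.map(gt_to_count))

# Option 1: exclude every locus that has at least one no-call.
complete_loci = counts.dropna(axis=0, how="any")

# Option 2: treat no-calls as homozygous reference (0/0).
filled = counts.fillna(0)

print(complete_loci)
print(filled)
```

Which option fits depends on how the missing calls are distributed, as the reply above notes: dropping loci loses little when only a few are affected, while filling with 0 keeps more data when no-calls are scattered thinly across many loci and samples.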

  • @KostasTzouvanas 2 days ago

    Hello, I will create and publish an AWS blog post that executes the same concept, but through AWS tooling: S3, AWS HealthOmics, SageMaker Notebooks. Do you want to participate? I really like your channel and the way you present, by the way! Great work!

  • @elvisnnaemeka6722 3 years ago

    Please be my mentor.