How to build a machine learning model to predict antimicrobial peptides (End-to-end Bioinformatics)

  • Published 25 Aug 2024
  • Antimicrobial resistance is an urgent global health problem, as existing drugs are becoming ineffective for treating microbial infections. In this video, I will show you how to build an end-to-end bioinformatics project in which we build a machine learning model to predict antimicrobial peptides. Specifically, we will retrieve 2 datasets consisting of antimicrobial peptides (positive set) and non-antimicrobial peptides (negative set). Then we will compute peptide features to describe the peptides quantitatively, followed by model building and, finally, model interpretation, where we shed light on the key features for predicting antimicrobial peptides.
    👉 Code: github.com/dat...

COMMENTS • 80

  • @DataProfessor
    @DataProfessor  3 years ago +8

    👉Watch this next (How to learn data science in 2021) ua-cam.com/video/oR670Txwh88/v-deo.html
    Hi friends! I hope this video was helpful, please make sure to LIKE, SUBSCRIBE, COMMENT and SHARE!
    ** Support this Channel
    🌟 Buy me a coffee www.buymeacoffee.com/dataprofessor
    🌟 Download Kite for FREE www.kite.com/get-kite/?
    👉 Subscribe to this YouTube channel ua-cam.com/users/dataprofessor
    👉 Join the Newsletter of Data Professor newsletter.dataprofessor.org

  • @emmanuelonah4596
    @emmanuelonah4596 2 years ago +5

    Discovering you is the best thing that has ever happened to me. Needless to say, my bio- and cheminformatics skills have increased significantly. Thank you so much, DataProfessor.

    • @DataProfessor
      @DataProfessor  2 years ago

      That’s awesome Emmanuel, glad that the content was helpful 😊

  • @jasdeep003
    @jasdeep003 2 years ago +1

    We biologists sincerely appreciate your efforts and love you Data Professor !!

    • @DataProfessor
      @DataProfessor  2 years ago +1

      My pleasure, glad to hear that it’s helpful :)

  • @baotao922
    @baotao922 9 months ago

    Starting my self-learning bioinformatics journey, and your channel is just awesome! Thank you for the guidance ❤

  • @angsumandas1
    @angsumandas1 3 years ago +11

    Thank you for honoring our request. Biologists love Data Professor.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      A pleasure, thanks for the support 😆

  • @Ali-pf9or
    @Ali-pf9or 3 years ago +1

    I watched it before but now I'm here to comment and share it!

    • @DataProfessor
      @DataProfessor  3 years ago

      Awesome, thanks again for watching, commenting helps with the algorithm! :)

  • @mariopaul6505
    @mariopaul6505 3 years ago +1

    It's awesome how well you show how to use data science in the bioinformatics field Chanin! Great video!

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi Mario, glad to hear from you! Means a lot, thank you for your kind words 😊

    • @mariopaul6505
      @mariopaul6505 3 years ago +2

      @@DataProfessor You're welcome!

  • @salma-amlas
    @salma-amlas 3 months ago +1

    Can you explain what is meant by negative and positive? The feature labels here are a bit confusing.
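As the video description puts it, the "positive set" is the antimicrobial peptides and the "negative set" is the non-antimicrobial peptides, so positive/negative is the class the model predicts rather than a feature. A minimal sketch of how such labels might be attached to sequences; the sequences are arbitrary and the helper name `label_peptides` is hypothetical:

```python
def label_peptides(positive_seqs, negative_seqs):
    """Return (sequence, label) pairs; the label strings are illustrative."""
    labelled = [(seq, "positive") for seq in positive_seqs]
    labelled += [(seq, "negative") for seq in negative_seqs]
    return labelled

# Arbitrary sequences, for illustration only
pos = ["KKLLKKLLKK", "GLFDIVKKVV"]
neg = ["MSTEEDDQQA"]

dataset = label_peptides(pos, neg)
for seq, label in dataset:
    print(label, seq)
```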

  • @suecheng3755
    @suecheng3755 3 years ago

    Thank you. I am going to build some ML tools for drug discovery with bioinformatic data analysis, based on your great work!

  • @TinaHuang1
    @TinaHuang1 3 years ago +2

    So awesome data professor!!

  • @Ashis_Udgata
    @Ashis_Udgata 1 month ago +1

    I am getting this error message:
    TypeError Traceback (most recent call last)
    in ()
         19     return df
         20
    ---> 21 feature = feature_calc(pos, neg, aac) # AAC
         22 #feature = feature_calc(pos, neg, dpc) # DPC
         23 feature
    in feature_calc(po, ne, feature_name)
          6 def feature_calc(po, ne, feature_name):
          7     # Calculate feature
    ----> 8     po_feature = feature_name(po)
          9     ne_feature = feature_name(ne)
         10     # Create class labels
    TypeError: 'str' object is not callable
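This message usually means that the name passed as `feature_name` refers to a string rather than a function at the time of the call, for example because `aac` was reassigned to a file name earlier in the notebook. That is only a likely cause, not a confirmed diagnosis. A minimal sketch that reproduces the message and shows one fix; `aac_stub` is a hypothetical stand-in, not the Pfeature function:

```python
def aac_stub(path):
    """Hypothetical stand-in for a feature function; returns a marker string."""
    return f"features computed from {path}"

def feature_calc(po, ne, feature_name):
    # feature_name must be callable; it is applied to each input in turn
    return feature_name(po), feature_name(ne)

# Cause: the function name has been rebound to a string
aac = "aac_output.csv"
try:
    feature_calc("pos.fa", "neg.fa", aac)
except TypeError as err:
    message = str(err)
    print(message)

# Fix: pass the function object itself and keep its name free of other uses
aac = aac_stub
result = feature_calc("pos.fa", "neg.fa", aac)
print(result)
```

In a notebook, restarting the runtime and rerunning the cells in order clears a stale rebinding of this kind.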

  • @VyshnavieRSarma-rb7ur
    @VyshnavieRSarma-rb7ur 3 years ago +1

    Thanks a lot Data Professor... This is very useful to us. Thank you

  • @qaziacademy3048
    @qaziacademy3048 2 years ago

    Such a beautiful project. I learnt a lot. Thank you, Data Professor.

  • @abdulmalikaliyu8876
    @abdulmalikaliyu8876 3 years ago +2

    Thank you, Data Professor 🙏, for this video on AMP activity prediction. I am working on AMPs and have synthesised 2. My question: if we want to predict an activity value such as the MIC or IC50 of an AMP, can you suggest how to get the data so I can try it out?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      There are curated databases that compile bioactivity data for AMPs; some examples are YADAMP, DRAMP, DBAASP, APD3, etc.

  • @Ajeet-Yadav-IIITD
    @Ajeet-Yadav-IIITD 2 years ago +2

    2:00 G.P.S. Raghava is my professor at IIITD, India

    • @DataProfessor
      @DataProfessor  2 years ago

      Awesome, he’s a great professor 😊

  • @zadilkhwaja
    @zadilkhwaja 3 years ago +3

    Any suggestions on what I could modify in this project on my own?

    • @DataProfessor
      @DataProfessor  3 years ago +3

      That's a good question! In the video I showed how to use the amino acid composition as the peptide feature. You can use other peptide features; there are more than 20 different types to select from. You can even do a comparison to see which of the 20+ peptide features provides the best performance. You can also use lazypredict to evaluate 30+ different ML algorithms on this dataset (see this video on how I used lazypredict: ua-cam.com/video/ZdDUwlwJNi0/v-deo.html).

  • @asadmustafa2133
    @asadmustafa2133 3 years ago

    Thank you for this great video! Could you please make a video on preparing input data for predicting antibiotic resistance with machine learning? How would bioinformatics tools be used to create the input data? Please.

  • @user-hm5bj3be5o
    @user-hm5bj3be5o 1 year ago

    Hi Dataprofessor, thanks again for your inspiring video. I have one additional question.
    What technique should we use to scale feature values when combining several different types of individual features? Can we simply concatenate the matrices without performing any further processing?
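One common option, sketched below, is to standardize each column (zero mean, unit variance) before concatenating, so that feature types on different scales contribute comparably. Tree-based models such as random forests are largely insensitive to feature scale, so plain concatenation can also be acceptable there. The matrices and the helper names `standardize_columns` and `hstack` are invented for illustration.

```python
from statistics import mean, pstdev

def standardize_columns(rows):
    """Z-score each column of a list-of-lists matrix; constant columns become 0."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]

def hstack(left, right):
    """Concatenate two matrices column-wise; rows must describe the same peptides."""
    if len(left) != len(right):
        raise ValueError("both matrices need one row per peptide")
    return [l + r for l, r in zip(left, right)]

# Two toy feature blocks for the same three peptides (values are invented)
block_a = [[10.0, 0.0], [20.0, 5.0], [30.0, 10.0]]  # e.g. percentages
block_b = [[0.001], [0.002], [0.003]]               # e.g. a small-valued index

combined = hstack(standardize_columns(block_a), standardize_columns(block_b))
print(combined)
```

To avoid leaking information, the mean and standard deviation would normally be estimated on the training split only and then applied to the test split.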

  • @luctiber
    @luctiber 3 years ago

    Hi Data Professor, thanks for creating and sharing these amazing videos. I need your insight on a question that has been bothering me: how would you tackle the following use case? Let's say I am analysing the sales of a shop, and the shopkeeper's question is which items should be presented in the same area as others to increase average revenue. To help the analysis, I have the locations of all the items in the shop over the last 10 years and, on the other hand, all the purchases my customers made over the last 10 years. How would you conduct the analysis? Geolocation of the items combined with basket analysis? Thanks for your recommendation.

  • @anchitadassarma7812
    @anchitadassarma7812 6 months ago

    Please provide the code for other features like ABC, AAI, TPC, etc. I tried, but the code does not run properly and takes a very long time.

  • @nirajkc6514
    @nirajkc6514 1 year ago

    I have a master's in medical microbiology and also a master's degree in data science. Please make a few more projects.

  • @notknown9307
    @notknown9307 3 years ago +3

    This is gonna be viral

    • @TinaHuang1
      @TinaHuang1 3 years ago +1

      hahaha this made me laugh

    • @DataProfessor
      @DataProfessor  3 years ago

      Haha, thanks! It’s literally antimicrobial 😆

  • @clivedarwell5732
    @clivedarwell5732 2 years ago

    Hello - do you know why I'm getting "'wget' is not recognized as an internal or external command" when running Jupyter through Anaconda?
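`wget` is a Unix command-line tool that is not included with Windows by default, which is likely why the command is not recognized there. One workaround, sketched here under the assumption that the files are plain HTTP(S) downloads, is to fetch them with Python's standard library instead. The helper name `download` and the URL in the comment are placeholders.

```python
from pathlib import Path
from urllib.request import urlretrieve

def download(url, destination):
    """Download url to destination and return the destination as a Path."""
    destination = Path(destination)
    urlretrieve(url, destination)
    return destination

# Placeholder URL; substitute the dataset link used in the notebook
# download("https://example.org/peptides.fa", "peptides.fa")
```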

  • @sorias8
    @sorias8 3 years ago +1

    Great video! I work with AMPs and I was really thinking of ways to play with the data. One thing I'm curious about: if you had one more entry, e.g. another peptide you synthesized, you could predict the outcome, no? You just need to add another row to the DataFrame, right?
    Thank you for such a comprehensive video. :)

    • @DataProfessor
      @DataProfessor  3 years ago +2

      Thanks for watching. A new, unknown peptide is handled much like the test set shown in the video: you compute the same peptide features for the new peptide, then apply the trained model to make a prediction; think model.predict(X_new).
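For a new peptide the label is unknown, so only the feature matrix is passed to `predict`. The sketch below uses a tiny nearest-centroid classifier as a stand-in for the trained model (it is not the model from the video); scikit-learn estimators follow the same `fit(X, y)` / `predict(X)` convention. The class name `ToyCentroidModel` and all feature values are invented.

```python
class ToyCentroidModel:
    """Toy stand-in classifier with a scikit-learn style fit/predict interface."""

    def fit(self, X, y):
        grouped = {}
        for row, label in zip(X, y):
            grouped.setdefault(label, []).append(row)
        # One centroid (column-wise mean) per class
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]

# Invented two-feature training data
X_train = [[30.0, 5.0], [28.0, 7.0], [5.0, 25.0], [8.0, 22.0]]
y_train = ["positive", "positive", "negative", "negative"]
model = ToyCentroidModel().fit(X_train, y_train)

# Features of a new peptide, computed the same way and in the same column order
X_new = [[27.0, 6.0]]
prediction = model.predict(X_new)
print(prediction)
```

The key constraint is that the new row has the same features, in the same column order, as the training data.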

  • @kashafnaz_
    @kashafnaz_ 2 years ago

    Sir, the output of AAC is quantitative, but our input data is text-based. How is it possible to represent it numerically, what do the values (from low to high) mean, and what do they show? Please reply.
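Amino acid composition (AAC) turns a text sequence into numbers by counting how often each of the 20 standard amino acids occurs, expressed here as a percentage of the sequence length, so every peptide becomes a 20-value numeric vector. A minimal pure-Python sketch of that idea follows; it is not the Pfeature implementation, and the helper name `amino_acid_composition` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(sequence):
    """Return {residue: percentage of the sequence} for the 20 standard residues."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return {aa: 100.0 * sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS}

# Arbitrary example sequence: 6 lysines (K) and 4 leucines (L)
composition = amino_acid_composition("KKLLKKLLKK")
print(composition["K"], composition["L"], composition["A"])
```

A higher value simply means that residue makes up a larger share of the peptide.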

  • @MrTasukae
    @MrTasukae 3 years ago

    I would like to know whether this can be adapted for peptide sequence prediction. In my case, I have positive and negative epitopes, the same as in the example. I tried to concat the data, but with more than 3 it gets a bit weird. Would you mind guiding me on that?

  • @fs6716
    @fs6716 9 months ago

    Thanks!

    • @DataProfessor
      @DataProfessor  9 months ago

      Thanks @fs6716 for the kind support!

  • @harshkumarsingh5815
    @harshkumarsingh5815 2 years ago

    Thanks, professor. I had one question: can we use CD-HIT to compare two separate databases (say AMP and non-AMP) and then check which peptide sequences are similar to each other across the two databases?

  • @biochemistrybee2494
    @biochemistrybee2494 2 years ago +1

    Very nice video, sir. Can we do this for anti-inflammatory peptides and anticancer peptides also?

    • @DataProfessor
      @DataProfessor  2 years ago

      Yes definitely, just replace the dataset and calculate the descriptors.

    • @biochemistrybee2494
      @biochemistrybee2494 2 years ago +1

      @@DataProfessor thank you so much sir. Your video made my research work very worthy. Thank you very much

  • @abdelrahmanmedhat1392
    @abdelrahmanmedhat1392 2 years ago

    Hello, how did you install Pfeature manually?

  • @romashagupta5287
    @romashagupta5287 3 years ago

    Thank you for the video, sir. But there is an issue when running the data split step: I get an error saying "Input contains NaN, infinity or a value too large for dtype('float64')". Please help.
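scikit-learn raises this error when the feature matrix contains missing or infinite values, which can happen if a feature could not be computed for some sequence. One way to locate and drop such rows before splitting, sketched with invented data and a hypothetical helper `drop_non_finite_rows`:

```python
import math

def drop_non_finite_rows(X, y):
    """Keep only rows of X (and matching labels) whose values are all finite."""
    kept_X, kept_y, dropped = [], [], []
    for index, (row, label) in enumerate(zip(X, y)):
        if all(math.isfinite(value) for value in row):
            kept_X.append(row)
            kept_y.append(label)
        else:
            dropped.append(index)
    return kept_X, kept_y, dropped

# Invented data: rows 1 and 2 contain NaN and infinity respectively
X = [[10.0, 5.0], [float("nan"), 3.0], [7.0, float("inf")], [2.0, 8.0]]
y = ["positive", "positive", "negative", "negative"]

X_clean, y_clean, dropped_rows = drop_non_finite_rows(X, y)
print(dropped_rows)
```

With a pandas DataFrame, `df.isna().sum()` gives a per-column count of missing values and `df.dropna()` removes the affected rows.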

  • @username42
    @username42 3 years ago +2

    Can we relate the info in this video to molecular docking somehow? Or have you ever performed molecular docking?

    • @DataProfessor
      @DataProfessor  3 years ago +2

      Yes definitely we can do that! You know your bioinformatics! If interested I can make a video showing that

    • @username42
      @username42 3 years ago +1

      @@DataProfessor thanks that would be great!

    • @DataProfessor
      @DataProfessor  3 years ago +1

      @@username42 👌 ok

  • @wildrifter2583
    @wildrifter2583 3 years ago +1

    Great video again as always

  • @Anmkhan-oh3iw55
    @Anmkhan-oh3iw55 1 year ago

    Hi, I am getting an error when importing LazyClassifier.
    Can anyone help me with that?

  • @PythonCodeMan
    @PythonCodeMan 3 years ago +2

    Great...

  • @kashafnaz_
    @kashafnaz_ 2 years ago +1

    Sir, inspired by you, I chose an MS in data science after a BS in bioinformatics. Is it the right decision?

  • @adarigirishkumar6567
    @adarigirishkumar6567 3 years ago +1

    Hi Data Professor, what if I want to apply more than one function, like aac_wp + ddr_wp? Is that possible?

    • @DataProfessor
      @DataProfessor  3 years ago

      Yes, definitely. You can rerun the code block, replacing the function name accordingly, to get the other descriptors.

  • @yangzhang2999
    @yangzhang2999 3 years ago +1

    Thank you for the awesome tutorial! I noticed that in the PyPI repository there is a 'Pepfeature' lib; just wondering if you know how it compares to the Pfeature lib. Thanks a lot.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      I looked at the documentation noted in the Dissertation.pdf file and Pepfeature is more targeted towards generating peptide features for epitope predictions while Pfeature is more general purpose.

    • @yangzhang2999
      @yangzhang2999 3 years ago +1

      @@DataProfessor Thanks a lot for the fast reply! Got it.

  • @user-hm5bj3be5o
    @user-hm5bj3be5o 1 year ago

    Thank you for your useful video. However, I couldn't install the Pfeature package on Windows to import and use it in a Jupyter notebook. Any suggestions? Thank you.

    • @DataProfessor
      @DataProfessor  1 year ago

      I’d recommend using Colab to run the calculation, as it runs on Linux.

  • @muditarora9860
    @muditarora9860 3 years ago +1

    What if we want a particular topic, say leukemia? Does it help?
    Can you comment?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Nope, this dataset is based on peptides that have been experimentally tested to have antimicrobial activity. However, if we want to predict peptides that have activity towards leukemia then we need to use another dataset that likewise contains data on peptides that have been experimentally tested for their anti-leukemic activity.

    • @muditarora9860
      @muditarora9860 3 years ago

      @@DataProfessor Would you recommend any GitHub link for the same?

  • @kashafnaz_
    @kashafnaz_ 3 years ago +1

    What does the Gini score represent here?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Gini gives the relative feature importance.
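More precisely, a random forest's Gini-based importance reflects how much splits on a feature reduce Gini impurity, averaged over the trees and normalized across features. The impurity of a node is 1 minus the sum of squared class fractions. A small sketch of the impurity and of the decrease produced by a single split, using invented labels and hypothetical helper names:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Weighted drop in Gini impurity when parent is split into left and right."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children

# Invented labels: a balanced parent node and one candidate split
parent = ["pos", "pos", "pos", "neg", "neg", "neg"]
left, right = ["pos", "pos", "pos", "neg"], ["neg", "neg"]

decrease = impurity_decrease(parent, left, right)
print(round(gini_impurity(parent), 3), round(decrease, 3))
```

A feature whose splits produce large decreases across many trees ends up with a high importance score.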

  • @kashafnaz_
    @kashafnaz_ 3 years ago +1

    amazing

  • @kashafnaz_
    @kashafnaz_ 3 years ago +1

    Can you convert this project into a Streamlit web application and also teach us the deployment method?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi, I've created 2 videos showing how to deploy Streamlit web apps here:
      - ua-cam.com/video/kXvmqg8hc70/v-deo.html
      - ua-cam.com/video/zK4Ch6e1zq8/v-deo.html

  • @theivaprakasham
    @theivaprakasham 3 years ago

    Thank you for the detailed video on peptide-based bioactivity classification. I will be experimenting on this dataset with other deep learning techniques in the future too. Can you educate us on how to transform similar works/projects into a research-grade conference/journal paper? Though I can understand the ML part of the related articles during my literature survey, I find the results are presented in a detailed pharmaceutical format, which I don't understand at first look.

  • @sitonaixillya7886
    @sitonaixillya7886 3 years ago

    Your accuracy and MCC are lower than in the report, though.
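Accuracy and the Matthews correlation coefficient (MCC) are both computed from the confusion matrix, so differences from a published report can come from a different train/test split, random seed, or feature set. A sketch of the two formulas with invented counts; the helper names are hypothetical:

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc_from_counts(tp, tn, fp, fn):
    """MCC for binary classification; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Invented confusion-matrix counts
tp, tn, fp, fn = 90, 85, 15, 10
acc = accuracy(tp, tn, fp, fn)
mcc = mcc_from_counts(tp, tn, fp, fn)
print(round(acc, 3), round(mcc, 3))
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which makes it more informative than accuracy when the classes are imbalanced.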

  • @zhangjiabin-ww7fl
    @zhangjiabin-ww7fl 1 year ago +1

    Hello professor, I tried to install the Pfeature package using your command, but I got this error:
    "/usr/local/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'zip_safe'
    warnings.warn(msg)
    running install
    running build
    running build_py
    copying Pfeature/__init__.py -> build/lib/Pfeature
    error: can't copy 'Pfeature/ONTAINER-LICENSE': doesn't exist or not a regular file"
    Could you help me fix it? I'd appreciate it a lot!

    • @Fiza-ub3zt
      @Fiza-ub3zt 3 months ago

      I am facing the same problem

    • @nikitatiwari8520
      @nikitatiwari8520 12 days ago

      Did you get a solution? I am facing the same issue.