I honestly learned more from your videos than from my master's degree. I can't stop watching!
This series keeps me sane during this pandemic, thanks a lot!
Keep following us for more content!
I don't understand why I landed on these videos this late in my life, but I thank God I did. I'm going through all your text analytics videos.
Hey Dave, you have extraordinary teaching skills; the way you explained everything is just awesome. Thanks a lot!
Hi Dave, Thank you very much for posting quality videos. Your tutorials helped me a lot to understand the basic fundamentals of text analytics. I did a certificate course via an online training company, but I could not pick up the topics very well and left it half-learned; at some point I felt my money was wasted. But then I came across your videos, which are up to the mark and quite easy to learn from and apply. Thank you for your great service.
Hope to learn more from you, and I wish you a great life ahead.
Hello Dave,
You’ve made a positive difference in my life.
After I saw your videos I learned many things. Your way of teaching is awesome. I applied much of the code I learned here to my projects and got very good results. Your sacrifices don't go unnoticed.
:) Thank you
Impressive rhetorical abilities. Precise language. Beautiful.
Please post more teaching materials about R programming, you have great demonstration skills, Dave!
Really good lessons!
Thank you for all this free stuff!
Stay tuned with us for more tutorials, Rafael.
Very intuitive and detailed explanation.
Can't wait for the next part. Loving the elementary approach to teaching this stuff!
Good to hear Dave. Thanks again!
@Lord Knight - Glad you liked the video. We will be releasing videos each week until the series is complete. Video #3 will be up early next week. Stay tuned!
Dave
What do you have in mind for next series?
@jonimatix - This is currently still in planning, stay tuned!
Great work, thank you, and congratulations!
People like you make a dent in this world!
Dave, you are doing a great job
Hey, Dave
Your hard work is really paying off in creating content that is of great value to all the users. I am really looking forward to watching all the video lectures on data mining. I would also like a video on how to make sense of data specifically from Facebook pages and groups.
Cheers!
@Sumit Dargan - Thank you for the kind words, glad you liked the videos. Your request is duly noted!
I love how you explain things so clearly! More text analysis videos please! :) I'm going to watch the other ones, but I hope you go through how to use different lexicons and other classifiers like Naive Bayes for theme analysis. Anyway, thank you so much!
@Jayl - Glad you like the videos. There will be a new one each week until the series is complete. This series will focus on using tree-based mechanisms, in particular the mighty random forest. However, the feature engineering will work with any algorithm (e.g., Naive Bayes) that can be trained to perform binary classification.
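A minimal illustrative sketch of that point (not Dave's exact code): the same document-feature matrix can be handed to caret::train() with different methods, e.g. a random forest or Naive Bayes. The toy texts and settings below are assumptions for illustration only.
library(quanteda)
library(caret)

texts  <- c("win a free prize now", "free cash call now",
            "are we still meeting tomorrow", "lunch was great see you soon")
labels <- factor(c("spam", "spam", "ham", "ham"))

toks     <- tokens(texts, remove_punct = TRUE)
train_df <- convert(dfm(toks), to = "data.frame")[, -1]   # drop the document-id column
train_df$Label <- labels

ctrl  <- trainControl(method = "none")                    # skip resampling for this toy example
model <- train(Label ~ ., data = train_df, method = "rf", # swap "rf" for "nb" (klaR) to try Naive Bayes; the tuneGrid changes too
               trControl = ctrl, tuneGrid = data.frame(mtry = 2))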
Just love the way of your teaching!
Simply superb, excellent explanation!
Excellent videos! Thank you so much.
Amazing!
Dave! Great video. I have one question regarding the index function. I don't get the same twenty values as you do after having set the same random seed. This affects the text messages I see when looking at the next video about HTML-escaped ampersand characters. Any suggestions on what I should do?
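One possible cause (a hedged guess, not a confirmed diagnosis): R 3.6.0 changed the default algorithm behind sample(), so the same seed can produce different indices on different R versions, and whatever builds the index (e.g., createDataPartition() or sample()) relies on that RNG. A minimal sketch of reverting to the old behaviour:
RNGkind(sample.kind = "Rounding")   # pre-3.6.0 sampling; R warns that this sampler is slightly non-uniform
set.seed(32984)
sample(1:5574, 20)                  # 5574 is roughly the number of SMS messages; the count here is illustrative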
Hey Dave! I've been watching your videos ever since the Titanic data set, which really got me into using R and data science! I love the way you teach; it's simple and to the point! I love how you apply and acknowledge the 80/20 rule. I was thinking about learning Python to do some data analytics as well, but your videos keep winning out over my decision to switch from R to Python.
I have a question, though. In your opinion, how do you value data science bootcamps with regard to job preparation? Are there any benefits to learning Python over R?
Thanks again for all the work you put into your videos. It's 12am right now and I'm eating oatmeal while watching this haha!
@PSNzzirGrizzHD - First of all, thanx for watching the videos and I am glad that you have found them useful. To answer the common question of Python vs. R, I always tell folks the same thing:
1 - It is better to be awesome at one language rather than OK at both.
2 - If you already know Python, stick with that. If you know R, stick with that. See #1 above.
3 - Certain geographies/industries prefer one over the other. For example, if you are targeting Silicon Valley companies then Python is the way to go. See #1.
Regarding bootcamps, I can only comment on the Data Science Dojo bootcamp where I presently teach as my day job. Our students find our week-long curriculum an excellent way to bootstrap into foundational data science skills and start their journey. However, it is only the start. Becoming a great data scientist (which I do not classify myself as) requires long-term, concerted effort.
HTH,
Dave
Hey, Dave.
I'm starting with text mining and your videos are absolutely genius.
I decided to tackle text data in my native language. Sadly, quanteda doesn't support my language (Polish). I managed to get my hands on Polish stop words and remove them, but I cannot get past the stemming step. I have an array containing the stems and the generally written words, but I have problems applying the stems to my tokens. I tried asking on Stack Overflow but without any success.
@Michał Siarkiewicz - Unfortunately, the quanteda package (as do most R packages) takes a dependency on the Snowball stemmer (i.e., the SnowballC package). The Snowball stemmer doesn't appear to have support for Polish at this time. I performed a quick Google search and found the following document that may assist in finding Polish stemmers:
www.cs.put.poznan.pl/dweiss/site/publications/download/ltc_092_weiss_2.pdf
HTH,
Dave
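A minimal sketch for checking what the Snowball stemmer supports, assuming only the SnowballC and quanteda packages; the token object name below is hypothetical:
library(SnowballC)
getStemLanguages()                                       # lists the languages the Snowball stemmer supports
wordStem(c("running", "runs"), language = "english")     # example of stemming with a supported language
# If Polish support is ever added, quanteda could stem tokens directly, e.g.:
# tokens_wordstem(my_tokens, language = "polish")        # hypothetical; not supported at the time of writing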
This is really, really good. I just wish there was a version of this with Apache Spark and Scala/Java.
Great lesson!
Thank you very much for this. Would it be possible to have a full lesson on social media mining, especially with Facebook? I think companies have a constantly growing interest in knowing about their customers' needs.
Thank you for your quick reply David. Ideally, that would include extracting relevant data from Facebook, sentiment analysis through likes, etc. Besides, an analysis of people's reaction on a certain post/video/photo could bring about useful insights.
Overall, any significant data and learning from Facebook is of utmost importance.
Thank you for your willingness to do that.
@Hamza MIGHRI - Thank you for the suggestion. Could you elaborate on what specific topics would be most useful to you/others in such a tutorial?
Hey! I have been following up your videos for text analytics in R and I am unable to run the index function. Can you please help?
Really good video. Thanks a lot!
Another good video, thanks!
I enjoyed your video :) Keep it up!
@MisterBassBoost - Thanx, glad you liked it!
Great tutorial, really helpful.
I notice that the dataset shown in the video has pre-defined labels (ham and spam). I want to ask how to deal with a situation where a dataset does not have labels, besides manually labeling every text document?
That would call for unsupervised ML techniques.
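A rough sketch of one unsupervised option (k-means clustering on TF-IDF weights); the toy texts and the choice of k = 2 are purely illustrative:
library(quanteda)

texts <- c("free prize waiting claim now", "win cash now text back",
           "meeting moved to friday", "see you at the station")
toks  <- tokens(texts, remove_punct = TRUE)
mat   <- as.matrix(dfm_tfidf(dfm(toks)))    # weighted document-feature matrix

set.seed(123)
clusters <- kmeans(mat, centers = 2)$cluster
clusters                                    # cluster ids can then be inspected and labelled by hand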
I have a question: is the set.seed(32984) number chosen arbitrarily?
I have the same question.
Hey Dave! Excellent. I have learned the complete text analytics series; can we have some lessons on Google Analytics as well?
Hi Dave,
Thank you for the video. May I know if we can use sample.split instead of createDataPartition? I just want to know if there is an advantage of one over the other, or if there is a specific reason you used createDataPartition.
Also wishing you a very Happy New Year! God bless!
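Not an official answer, just a sketch: both functions produce a stratified split, so either should work; the labels below are toy data and the seed is arbitrary.
library(caret)     # createDataPartition()
library(caTools)   # sample.split()

labels <- factor(rep(c("ham", "spam"), times = c(80, 20)))

set.seed(32984)
idx    <- createDataPartition(labels, p = 0.7, list = FALSE)  # row indices of the training set
train1 <- labels[idx]

set.seed(32984)
mask   <- sample.split(labels, SplitRatio = 0.7)              # TRUE/FALSE mask for the training set
train2 <- labels[mask]

prop.table(table(train1)); prop.table(table(train2))          # both preserve the ham/spam ratio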
Hi Dave, you are an excellent teacher. I've been following your video series since R programming for excel users.
I need your help on this video.
While running the code, I'm unable to find the createDataPartition function in RStudio even though I have installed the caret package.
When I execute the command library(caret), I get this error:
Error: package or namespace load failed for ‘caret’ in loadNamespace(j
@Neha Nandwani - You are too kind, glad you like the videos! The error above indicates that you need to install the "SparseM" package. You can use the following code to do so:
install.packages("SparseM")
HTH,
Dave
Oh, yes, I could have guessed that from the error message itself. Thanks, Dave, for replying. I'm tuned in to your YouTube data science video series. Do you have plans to conduct a boot camp in India?
The GitHub link is not working; can you please share the latest one?
The R code link is broken
Error when I use this code with my data - Error in terms.formula(formula, data = data) :
duplicated name 'document' in data frame using '.'
What does this mean, and how can it be solved? Thanks.
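A guess at the cause, not a certain diagnosis: the data frame passed to the formula has two columns named 'document' (often a token literally called "document" colliding with the document-id column from the dfm conversion), and the '.' in the formula refuses duplicated names. A common fix is to force unique, syntactically valid column names; the data frame name below is hypothetical:
names(train_df) <- make.names(names(train_df), unique = TRUE)   # train_df is your modelling data frame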
Hi David, good video! What is the problem statement or objective regarding ham and spam?
@Satish Bhonagiri - If I understand your question correctly, the goal is to create a binary classification model that can accurately predict whether a new (unseen) SMS text message is ham or spam.
Perfect
@Masoud Paydar - Thank you for the compliment and glad you liked the video!