I’m new to Python, and think the teaching here is excellent!
Hat Tip ... making my way thru this whole series during a quarantine time of cv
Nice!
cough cough, aaaaaaaaaaa chhhhhhhh ooooooooooooo, so? what did you say?
Very good lecture series....
Thank you!
Thank you. You saved me. I have a subject at my university which is connected with data science. I was given a book where I had to find information about how to do the labs, but there was no information about how to use pandas in Python. So I started looking for documentation on the internet, but it was difficult to understand because my English is not perfect. Then, just as I was losing hope, I found your channel. I was very impressed. I understand every one of your lessons. Thank you a lot.
Thank you so much for sharing! You are very welcome, and I appreciate your kind words!
Really great. The videos are short and use practical data instead of simulated data. I spent about half a day and now I am at lesson 15! Thanks for this great tutorial!
Awesome! Glad you are enjoying the series!
Hi
Your videos on pandas with a Q&A approach are unique and excellent. I just started watching them, and they are helping me get into the nuts and bolts of pandas.
The way you explain things is awesome, thank you!
Thanks!
Very nice and clearly explained.
Thank you so much 🙂
Your videos are amazing
Thank you so much 😀
You are a good teacher.
Thank you!
Thank you very much for your videos.
You are very welcome
Awesome... I have seen a number of tutorials on different sites and on UA-cam... but this video series on pandas is awesome. Now that I have completed this series, I feel very comfortable working with pandas... Thanks a lot!
That's great to hear! You're very welcome!
Wow. I've never seen such a powerful one-line command. Great examples.
Thanks!
Excellent teacher!
Thank you!
I know I am years behind, but I find these tutorials very helpful. I have a question about transposing a DataFrame. When is the best time to transpose? Is it before cleaning it up or after? The headers seem confused after the transpose and the index is gone, so sorting and cleaning become a challenge.
Thanks very much! I learned a lot from this video series. Is there one like this for matplotlib? Looking forward to it!
Yes, that would be great!
Thanks for the suggestion! I don't have any plans for a video series about matplotlib, though I do cover pandas plotting (which uses matplotlib) in this video series, specifically in videos 14, 15, and 25: ua-cam.com/play/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y.html
Data School, that is a pity. You are an excellent teacher, and matplotlib would be so useful for me since I am a consultant working at a Big Four firm and I need data visualization tools for presentations.
Are you still here, on this planet?
Really good video. I learned a lot.
Glad to hear that!
Thank you for the videos, they were so helpful!
You're welcome!
Many thanks for your help!
You're welcome!
Congratulations on sharing the code used in the video with us, and many thanks for that. It is very useful to me. My warm regards to you.
You're very welcome!
Awesome Video. You are a boon to humanity.
Thank you so much! :)
I like it... it's very well explained.
Thank you!
I did not know about the normalize=True parameter. Cool!
Great video!
Thanks!
the best of the best!!!!
Thank you!
Awesome tutorials. Can you please do a video on the map and pivot functions?
I don't have a video that covers pivot, but I do have this video that covers map: ua-cam.com/video/P_q0tkYqvSk/v-deo.html
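For anyone who wants a quick taste of map before watching: Series.map translates each value through a dictionary (or a function). A minimal sketch, assuming a tiny made-up stand-in for the movies data; the audience labels are purely illustrative.

```python
import pandas as pd

# Tiny made-up stand-in for the IMDB movies data
movies = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'content_rating': ['R', 'PG-13', 'G'],
})

# map looks up each value in the dictionary; values missing from it become NaN
audience = {'R': 'adults', 'PG-13': 'teens', 'G': 'everyone'}
movies['audience'] = movies.content_rating.map(audience)
print(movies)
```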
Hi, I really liked your method of teaching. It is high on value and practicality. I am looking for a Python course that would let me apply data analysis skills in the HR field. Do you offer something along those lines?
One problem I always run into is aligning the x-ticks to the bins when plotting a histogram or a bar chart. Maybe you could do a video going through different histograms/bar charts and different methods of aligning the x-ticks? There's a lot of information on this online, but most of it is fairly confusing and always specific to one situation. Thanks!
Thanks for your suggestion, I'll consider it!
Hi Kevin, I wanted to know whether every result produced by a method will be a Series. For instance, value_counts produces a Series.
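Not every method returns a Series. A small sketch, using a made-up genre column, that checks the return types of a few common methods: value_counts and describe give a Series, unique gives an array (not a Series), and nunique gives a plain number.

```python
import pandas as pd

genre = pd.Series(['Drama', 'Comedy', 'Drama', 'Crime'], name='genre')

counts = genre.value_counts()   # Series: one count per distinct value
summary = genre.describe()      # Series: count, unique, top, freq
uniques = genre.unique()        # an array of distinct values, not a Series
n_unique = genre.nunique()      # a single number

for name, result in [('value_counts', counts), ('describe', summary),
                     ('unique', uniques), ('nunique', n_unique)]:
    print(name, type(result).__name__)
```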
Question:
After we have modified our dataframe can we save it as a csv file again?
You can use to_csv, I've got an example here: ua-cam.com/video/ylRlGCtAtiE/v-deo.html
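A minimal sketch of that round trip, assuming a small made-up DataFrame. An in-memory buffer stands in for a file so the example runs without touching the disk; in practice you would pass a path instead, e.g. movies.to_csv('movies_modified.csv', index=False), where the filename is hypothetical.

```python
import io
import pandas as pd

movies = pd.DataFrame({'title': ['Movie A', 'Movie B'], 'duration': [154, 142]})

# Modify the DataFrame: add a derived column
movies['duration_hours'] = (movies.duration / 60).round(2)

# index=False leaves the row labels out of the CSV
buffer = io.StringIO()
movies.to_csv(buffer, index=False)
print(buffer.getvalue())

# Reading the text back gives the same columns
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
```

DataFrame.to_json and DataFrame.to_excel follow the same pattern; to_excel additionally needs an Excel engine such as openpyxl installed.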
Please suggest some sites where I can practice some questions.
www.datacamp.com/courses/analyzing-police-activity-with-pandas?tap_a=5644-dce66f&tap_s=280411-a25fc8
Crosstab seems to generate a dataframe which resembles a pivot table in Excel. Correct?
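That comparison seems fair for counting: pd.crosstab builds a frequency table of two columns, much like an Excel pivot table that counts rows. A sketch on made-up data that produces the same counts both ways, with crosstab and with pivot_table (the title column is only there to give pivot_table something to count).

```python
import pandas as pd

movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E'],
    'genre': ['Drama', 'Drama', 'Comedy', 'Crime', 'Drama'],
    'content_rating': ['R', 'PG-13', 'PG-13', 'R', 'R'],
})

# crosstab: how often each (genre, content_rating) pair occurs
table = pd.crosstab(movies.genre, movies.content_rating)
print(table)

# The same counts with pivot_table; missing combinations are filled with 0
pivot = movies.pivot_table(index='genre', columns='content_rating',
                           values='title', aggfunc='count', fill_value=0)
print(pivot)
```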
I have a very interesting question, and I have full faith that only you can answer it, if my comment gets read. The question is: when we apply astype to change str to category, the memory size shrinks, but how does astype work internally? In other words, without using any built-in function, how can I change the data type of a given Series?
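The idea behind the category dtype can be sketched by hand: store each distinct string once, then replace every row by a small integer code pointing into that list. This is only an illustration of the concept under that assumption, not how pandas implements astype internally; the genre values are made up.

```python
import pandas as pd

genre = pd.Series(['Drama', 'Comedy', 'Drama', 'Crime', 'Drama'])

# Store each distinct string once (sorted, which matches pandas' default order)
categories = sorted(set(genre))
position = {value: i for i, value in enumerate(categories)}

# Replace every row by the integer position of its string
codes = [position[value] for value in genre]
print(categories, codes)

# Compare with what pandas produces
as_category = genre.astype('category')
print(list(as_category.cat.categories), list(as_category.cat.codes))
```

The memory saving comes from the codes being small integers, while the long strings are kept only once in the categories list.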
Hi, thank you so much for this awesome video. I have a question. How did you get the output showing all the grid borders in the pane? It's so cool, but I searched a lot and couldn't figure out how to configure this feature. I already subscribed to your channel and liked your video, of course.
Glad the video was helpful to you! The output style you are seeing in the video is the default style from the Jupyter Notebook, which is what I use as my Python environment.
Nice video series! But I have looked through all of the series so far, and there is no video about writing data to common file types like Excel sheets, JSON, or CSV. Can you make a video about data storage after data processing?
Thanks for the suggestion! I'll consider that for the future.
Note that I do cover writing to a CSV file in this video: ua-cam.com/video/ylRlGCtAtiE/v-deo.html
Question 1:
percent = movies.genre.value_counts(normalize=True) * 100
How do I append '%' to each calculated value?
You're awesome btw :D :D
Glad you like the videos! I think you could accomplish this with display options: ua-cam.com/video/yiO43TQ4xvc/v-deo.html
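If string output is acceptable, one alternative to a display option is to format each number as text with a trailing '%'. A sketch on a made-up genre column; note that the result is text, so any sorting or arithmetic should happen before this step.

```python
import pandas as pd

genre = pd.Series(['Drama'] * 5 + ['Comedy'] * 3 + ['Crime'] * 2, name='genre')

percent = genre.value_counts(normalize=True) * 100

# Two decimals plus a '%' sign; the values become strings
labels = percent.map('{:.2f}%'.format)
print(labels)
```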
Suppose I display a DataFrame, say drinks. I cannot see the whole table in my notebook, and I can't scroll through it either, but in your video I can see a scroll bar. Moreover, my table has no lines between the columns, so it looks like a tab-separated table, but yours does.
Why are there these differences? Please help.
I seriously love you, man. Your videos are awesome.
Ha! Thanks so much for your support!! :)
Hey buddy, I need one more bit of help :) Do you have any series in which you solve Kaggle tasks or analyticalvidya.com tasks? If yes, can you give me a link to it? If not, I request that you make a whole series dedicated to Kaggle or Analytical Vidya tasks :) That would be very useful for everyone :)
I don't have a series like that. Thanks for the suggestion, I'll consider it for the future! :)
My latest video might be of interest to you: "How do I use pandas with scikit-learn to create Kaggle submissions?" ua-cam.com/video/ylRlGCtAtiE/v-deo.html
This really made me cry *Tears of Joy* :)
When should we use crosstab?
Those 2 people who gave a thumbs down are jealous of you... because they can't explain things like you do.
Ha! :)
When we use the describe method in pandas, it gives several insights about the data, and the standard deviation is one of them. Does it give a single standard deviation value, or what? Please explain.
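For a Series, describe reports one 'std' value for the whole column (for a DataFrame, one per numeric column): the sample standard deviation, the same number Series.std() returns. A sketch with made-up durations chosen so the result can be checked by hand.

```python
import math
import pandas as pd

duration = pd.Series([100, 110, 120, 130, 140], name='duration')

summary = duration.describe()

# Mean is 120; squared deviations sum to 1000; divided by n - 1 = 4 gives 250
print(summary['std'])
print(duration.std())
print(math.sqrt(250))
```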
Thanks very much! Can you also make a video on the plotnine library? It is the ggplot of Python.
Thanks for your suggestion!
Thank you for this amazing video! But how can I get the number of movies in the range of 100-150 minutes, and so on?
Hi, I think you may be looking for this, just in case it is still useful for you:
range = movies.genre.value_counts()
range[(range>=100) & (range
Does this help? ua-cam.com/video/2AFGPdNn4FM/v-deo.html
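The question is about duration rather than genre counts, so a filter on the duration column fits it more directly. A sketch with made-up durations: Series.between counts one range (inclusive on both ends by default), and pd.cut with value_counts handles "and so on" by putting every movie into a bucket. The bucket edges are illustrative.

```python
import pandas as pd

movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E', 'F'],
    'duration': [95, 100, 125, 150, 151, 200],
})

# 100 <= duration <= 150
in_range = movies[movies.duration.between(100, 150)]
print(len(in_range))

# Buckets are right-closed: (50, 100], (100, 150], (150, 200], (200, 250]
buckets = pd.cut(movies.duration, bins=[50, 100, 150, 200, 250])
bucket_counts = buckets.value_counts(sort=False)
print(bucket_counts)
```

Note the boundary difference: between(100, 150) includes 100, while the bucket (100, 150] does not.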
How do I explore the IMDB dataset by genre and actor? It's a bit tricky, because the actors_list column holds the names of the three main actors as one string in a list. I know how to get there in Python without using Pandas, and it would be pretty cumbersome. How would I do that with Pandas?
I think it would be kind of like plain Python: df = df.actors[0] would be the first name, I think. I'm new to this stuff too...
You would probably have to use pandas string methods with regular expressions to parse that column into 3 new columns.
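As an alternative to regular expressions, and assuming the column holds text formatted like a Python list literal (e.g. "[u'Name One', u'Name Two', u'Name Three']"; this format is an assumption), ast.literal_eval can turn each string into a real list, and explode (pandas 0.25+) then gives one row per movie and actor. The titles and actor names below are made up.

```python
import ast
import pandas as pd

movies = pd.DataFrame({
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'genre': ['Drama', 'Crime', 'Drama'],
    'actors_list': [
        "[u'Actor One', u'Actor Two', u'Actor Three']",
        "[u'Actor Two', u'Actor Four', u'Actor Five']",
        "[u'Actor One', u'Actor Six', u\"Actor O'Seven\"]",
    ],
})

# Parse each string into a Python list (literal_eval accepts the u'' prefix
# and handles names containing apostrophes)
movies['actors'] = movies.actors_list.apply(ast.literal_eval)

# One row per (movie, actor) pair
by_actor = movies.explode('actors')

# Ordinary pandas tools now work, e.g. number of movies per genre and actor
counts = by_actor.groupby(['genre', 'actors']).size()
print(counts)
```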
I want to know: why Series, when there are DataFrames? What's the specific purpose of a Series that cannot be achieved with DataFrames?
A Series is like a DataFrame column. It exists because it is the building block for a DataFrame. Does that help?
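A small sketch of that relationship, using made-up drinks data: selecting one column with single brackets returns a Series, while double brackets return a one-column DataFrame.

```python
import pandas as pd

drinks = pd.DataFrame({'country': ['A', 'B'], 'beer_servings': [89, 245]})

col = drinks['beer_servings']      # single brackets -> Series
frame = drinks[['beer_servings']]  # double brackets -> one-column DataFrame

print(type(col).__name__, type(frame).__name__)

# A Series has one dtype and one name; a DataFrame is a set of such
# columns sharing one index
print(col.dtype, col.name)
```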
How do you iterate through all the values in a series?
I actually answer that question in a video: ua-cam.com/video/B-r9VuK80dk/v-deo.html
Hope that helps!
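For a quick reference, two common ways to loop over a Series, shown on a made-up genre column: iterating directly yields the values, and items() yields (index label, value) pairs. Vectorized methods are usually preferable when they exist.

```python
import pandas as pd

genre = pd.Series(['Drama', 'Comedy', 'Crime'], name='genre')

# Iterating a Series directly yields its values
for value in genre:
    print(value)

# items() yields (index label, value) pairs
for label, value in genre.items():
    print(label, value)
```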
I have been enjoying your pandas series.
Ha! :) Thanks for the suggestion - I'll consider it for the future!
Thanks for the videos. I have a dataset where, for example, I first have to filter for all the movies with genre 'Crime' and then multiply the duration of every crime movie by 2 (for Pulp Fiction, I want to make the duration 154 * 2 = 308). This is just an example. Please help me out.
Sorry, I won't be able to help you with this - good luck!
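One way to do this, sketched on a small made-up table (only the Pulp Fiction duration comes from the comment): build a boolean mask for the crime movies, then select those rows and the duration column together with .loc and update them in place. Other rows are left unchanged.

```python
import pandas as pd

movies = pd.DataFrame({
    'title': ['Pulp Fiction', 'Movie B', 'Movie C'],
    'genre': ['Crime', 'Drama', 'Crime'],
    'duration': [154, 120, 100],
})

# Rows where genre is Crime
is_crime = movies.genre == 'Crime'

# Select rows and column in one .loc call, then assign the doubled values
movies.loc[is_crime, 'duration'] = movies.loc[is_crime, 'duration'] * 2
print(movies)
```

Using a single .loc call for both the row filter and the column avoids chained indexing, which may modify a copy instead of the original DataFrame.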
Man! I am processing a data file where the column headers contain ":", "::", underscores, and spaces. I have a row called units and 2 as limits (floats or integers). Some values are being read as strings and I want to convert them into numbers. I am having a hard time, because the method you described in one of your tutorials uses df.headername.dtype(float), etc., except I cannot use this notation because the header name is a messy combination of characters. Any idea what to do?
Thanks,
You can use bracket notation to select columns instead: df['headername'] - does that help?
@dataschool Thank you for your reply. So, df['parameterName'].dtype(float) will work? I will give it a try.
I think perhaps you mean to use astype rather than dtype. Hope that helps!
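Putting the two replies together: bracket notation selects a column whatever characters its name contains, and astype converts it. For columns that contain some unparseable text, pd.to_numeric with errors='coerce' is a tolerant alternative that turns such entries into NaN. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical messy column names read in as strings
df = pd.DataFrame({
    'Volt:: max limit': ['1.5', '2', 'n/a'],
    'temp: min_limit': ['10', '20', '30'],
})

# Bracket notation works for any column name; astype converts clean text
df['temp: min_limit'] = df['temp: min_limit'].astype(float)

# to_numeric with errors='coerce' turns unparseable text into NaN
df['Volt:: max limit'] = pd.to_numeric(df['Volt:: max limit'], errors='coerce')
print(df.dtypes)
```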
Hi sir,
I have one question, if you don't mind:
let's say we have a movies dataset. The question is:
how do I read the data from row number 12 to, for example, row number 84?
Thanks
This may help: ua-cam.com/video/xvpNA7bC8cs/v-deo.html
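A sketch of two options, assuming rows are counted from 0 and "12 to 84" is inclusive, with made-up data. If the DataFrame is already loaded, iloc slices by position and excludes the stop position. If you only want those rows read from the file, read_csv can skip the first data lines (keeping the header) and stop after a fixed number of rows.

```python
import io
import pandas as pd

movies = pd.DataFrame({'row': range(100)})  # stand-in with 100 rows

# Positions 12 through 84 inclusive: the stop position 85 is excluded
subset = movies.iloc[12:85]
print(len(subset))

# Reading only those rows from CSV text: file line 0 is the header,
# so skipping lines 1-12 skips data rows 0-11; nrows keeps 73 rows
csv_text = 'row\n' + '\n'.join(str(i) for i in range(100))
from_file = pd.read_csv(io.StringIO(csv_text),
                        skiprows=list(range(1, 13)), nrows=73)
```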
Hello Sir! genre.value_counts() works fine, but when I turn on normalize using genre.value_counts(normalize=True), it raises an error. It shows --> OptionError: "No such keys(s): 'compute.use_numexpr'"
Could you please help me understand the reason for this error?
Not sure, sorry!
Is there a way to get .describe() to include mode and median values?
Also, I really liked the .value_counts(normalize=True) described at 3:50.
Is there a way to get this to actually display %? E.g.:
Drama: 28.40 %
Comedy: 15.93%
(Visualizations are great thanks!)
For numeric columns, the describe method will always display the median (it's marked as "50%"), but it will never display the mode. For categorical columns, the describe method will always display the mode (it's marked as "top").
Regarding your question about displaying percentages, there may be a way to achieve this by changing a display option, but I'm not sure: pandas.pydata.org/pandas-docs/stable/options.html
Glad you like the visualizations! I hope to have more in later videos.
I just released a video discussing display options, which may be of interest to you: ua-cam.com/video/yiO43TQ4xvc/v-deo.html
However, it doesn't look like there's a simple way to display the value_counts with the percentages. Sorry!
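Since describe has no mode row for numeric columns, one option is to compute median and mode separately and append them. Series.mode() returns every most-frequent value, which also matters when there is a tie: describe's "top" row for text columns shows only one value. A sketch with made-up durations that contain a tie.

```python
import pandas as pd

duration = pd.Series([90, 100, 100, 120, 130, 130], name='duration')

summary = duration.describe()

# The median is already in describe's output as the '50%' row
print(summary['50%'], duration.median())

# mode() returns all most-frequent values, so ties are not hidden
print(duration.mode().tolist())

# One way to show both next to the describe output
extra = pd.Series({'median': duration.median(),
                   'mode': ', '.join(str(m) for m in duration.mode())})
combined = pd.concat([summary, extra])
print(combined)
```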
How do I get the movies whose genre has a frequency of 1 in the Series we get after applying value_counts() to the genre column?
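A sketch on made-up data: filter the value_counts result to keep the genres that occur once, take their names from the index, then select the movies in those genres with isin. The same pattern works for any threshold, such as counts > 100.

```python
import pandas as pd

movies = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D', 'E'],
    'genre': ['Drama', 'Drama', 'Western', 'Comedy', 'Comedy'],
})

counts = movies.genre.value_counts()

# Genres occurring exactly once (the index holds the genre names)
rare_genres = counts[counts == 1].index

# Movies belonging to those genres
rare_movies = movies[movies.genre.isin(rare_genres)]
print(rare_movies)
```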
THERE IS A BIG, HUGE PROBLEM WITH DESCRIBE: I am using Python 3.8.5, the latest so far. I ran describe() on a list of 1000 names, and the result was terrible: it gave me a different modal value for the name each time.... I sincerely think the documentation for describe should be refined to explain the situation with multimodal Series.
Hi there! I want to look into value_counts for movies.duration. My code:
>> movies.duration.describe()
output:
count 979.000000
mean 120.979571
std 26.218010
min 64.000000
25% 102.000000
50% 117.000000
75% 134.000000
max 242.000000
Name: duration, dtype: float64
>> movies.duration.value_counts(bins=3, sort=False)
output:
(63.821000000000005, 123.333] 589
(123.333, 182.667] 362
(182.667, 242.0] 28
Name: duration, dtype: int64
Why does the interval start with 63.821000000000005 instead of the min 64? Can I suppress the very long decimals and how? Thanks!
Great questions, but I'm not sure... good luck!
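A likely explanation: with bins=3, pandas builds equal-width bins between the minimum and maximum, and because the intervals are closed on the right it nudges the lower edge slightly below the minimum so that 64 itself lands in the first bin; the long tail of digits is ordinary floating-point rounding. One way to get readable labels, sketched here with made-up durations and illustrative edges, is to choose round bin edges yourself with pd.cut and then count.

```python
import pandas as pd

duration = pd.Series([64, 90, 117, 134, 150, 200, 242], name='duration')

# Explicit round edges give right-closed labels like (60, 120]
binned = pd.cut(duration, bins=[60, 120, 180, 250])
counts = binned.value_counts(sort=False)
print(counts)
```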
0 Ithaca, NY
1 Willingboro, NJ
2 Holyoke, CO
3 Abilene, KS
4 New York Worlds Fair, NY
What if we just want to see the reports in NY? What code do we use?
Start with a string method to split the state into another column, and then filter. Hope that helps!
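A sketch of that approach, using the five locations shown above; the DataFrame name ufo and the column name location are assumptions. rsplit on the last ", " keeps the state abbreviation even when the city name itself contains commas.

```python
import pandas as pd

ufo = pd.DataFrame({'location': ['Ithaca, NY', 'Willingboro, NJ', 'Holyoke, CO',
                                 'Abilene, KS', 'New York Worlds Fair, NY']})

# Split on the last ', ' and keep the part after it as the state
ufo['state'] = ufo.location.str.rsplit(', ', n=1).str[-1]

# Then filter on the new column
ny_reports = ufo[ufo.state == 'NY']
print(ny_reports)
```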
This is a great series. You have put a lot of effort into it. I would surely like to buy you a coffee.
I have a question. How can I find a movie which has the same duration as movie xyz but a higher rating?
Thanks! If you want to pay me back, please just subscribe to my email newsletter, and maybe tell a friend too! www.dataschool.io/subscribe/
Regarding your question, I'm not sure if this is what you are looking for?
movies[movies.duration == 120].sort_values('star_rating')
Newsletter subscription done. Thanks for answering, but it's not exactly what I was looking for. Let me get back to you with the data and the question to explain it better.
movies.loc[movies.duration == 109, ['title', 'content_rating']].sort_values('content_rating')
# Try that line of code... I assume you know the duration (109, for instance).
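If the duration is not known in advance, it can be looked up from the reference movie first. A sketch on made-up data, where 'Movie X' plays the role of the hypothetical movie xyz: fetch its duration and rating, then keep movies with the same duration and a strictly higher star_rating.

```python
import pandas as pd

movies = pd.DataFrame({
    'title': ['Movie X', 'Movie Y', 'Movie Z', 'Movie W'],
    'duration': [120, 120, 120, 95],
    'star_rating': [8.0, 8.5, 7.5, 9.0],
})

# Reference movie (the title is hypothetical)
ref = movies.loc[movies.title == 'Movie X'].iloc[0]

# Same duration, strictly higher rating
better = movies[(movies.duration == ref['duration']) &
                (movies.star_rating > ref['star_rating'])]
print(better)
```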
Using a histogram for categorical data is not right. In my opinion, for this example you should use a bar chart, because here the minutes are labels (strings represented as numbers). Am I correct?
duration is a numerical variable, and thus a histogram makes sense.
I got this error message "module 'pandas' has no attribute 'read_cvs' "
It's 'read_csv', instead of 'read_cvs'.
Please tell me why I cannot plot in IDLE. No graph appears in my output. And thanks for this video!
Like this:
>>> movies.duration.plot(kind='hist')
>>> movies.duration.value_counts().plot(kind='bar')
and no picture appears. Please help, thanks!
I think all you need to do is this:
import matplotlib.pyplot as plt
plt.show()
Let me know if that helps!
Thanks a lot for your videos. My question is a general one: in a real production environment, how can I use this notebook? I mean, if I want to schedule a job and need to see the data as DataFrames with some graphs, is there any way to schedule it and see the results in a notebook?
Great question! I would tend not to use the notebook for production code. However, I'm not sure what IDE would be ideal for the situation you are describing - I'm sorry!
The .plot() method does not display the graph on its own... I need to run matplotlib.pyplot.show() to display the graph each time.
Whether or not the plots will appear automatically depends on your environment. In the case of the Jupyter notebook, they will show up as long as you have run the following command during your session:
%matplotlib inline
Thanks... actually, I missed this point in your video.
Also, your lectures are the best... thanks for sharing your knowledge.
You're very welcome! :)
movies.genre.value_counts() > 100
This returns a Boolean Series. How do we get the genres whose count is greater than 100?
Great question!
movies.genre.value_counts()[movies.genre.value_counts() > 100].index
NON NOSTROMO HERE: sorry, are we still on Earth? Yes, I am running the tutorial on my super-wide screen, and it is wonderful while I kind of chat with my students at 12:09 after midnight; they are just getting into the Google educational platform, while I am translating plenty of useful tutorials like this one... NON NOSTROMO HERE, over...
print("lot of laughs"*12)
I just watched 4 ADS to be able to open this video - it's "amazing"!
Thank you!
You didn't explain unique() and crosstab() enough. What do they do?
Thanks for your suggestion! Maybe I'll cover that more in a future video.
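A brief sketch of unique and its counterpart nunique on a made-up genre column: unique returns the distinct values in order of first appearance (as an array, not a Series), and nunique returns how many there are.

```python
import pandas as pd

genre = pd.Series(['Drama', 'Comedy', 'Drama', 'Crime', 'Comedy'], name='genre')

# Distinct values, in order of first appearance
print(genre.unique())

# Number of distinct values
print(genre.nunique())
```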