How do I explore a pandas Series?

Data School

Published 13 Jan 2025

COMMENTS • 136

@donaldcunningham5280 4 years ago ⁺³
I’m new to Python, and think the teaching here is excellent!
@TR3NDSETR 4 years ago ⁺⁹
Hat Tip ... making my way thru this whole series during a quarantine time of cv
@dataschool 4 years ago ⁺²
Nice!
4 years ago
cough cough, aaaaaaaaaaa chhhhhhhh ooooooooooooo, so? what did you say?
@20tanishq10 1 year ago ⁺¹
Very good lecture series....
@dataschool 1 year ago
Thank you!
@maksymshylo8136 6 years ago ⁺¹
Thank you. You saved me. I have a subject at my university which is connected with data science. I was given a book where I had to find some information about how to do labs, but there was no information about how to use pandas in Python. So I started to look for documentation on the internet, but it was difficult to understand because my level of English is not perfect. So, losing hope, I found your channel. I was very impressed. I understand every one of your lessons. Thank you a lot.
@dataschool 6 years ago
Thank you so much for sharing! You are very welcome, and I appreciate your kind words!
@junjiang6895 6 years ago
Really great. Videos are short and use practical data instead of simulated data. Spent about half a day and now I am at lesson 15! Thanks for this great tutorial!
@dataschool 6 years ago
Awesome! Glad you are enjoying the series!
@kmnm9463 4 years ago
Hi
Your videos on pandas with Q & A approach is unique and excellent. Just started watching them. Helping me in getting into the nuts and bolts of pandas.
@khalidsultani6006 4 years ago ⁺¹
The way you explain is very awesome thank you
@dataschool 4 years ago ⁺¹
Thanks!
@allwinaark8001 3 years ago ⁺¹
Very nice and clearly explained.
@dataschool 3 years ago
Thank you so much 🙂
@FabioRBelotto 2 years ago ⁺¹
Your videos are amazing
@dataschool 2 years ago ⁺¹
Thank you so much 😀
@bilaaaacoo 1 year ago
You are a good teacher.
@dataschool 1 year ago
Thank you!
@Anokhetoons 4 years ago
Thank you very much for your videos.
@dataschool 4 years ago
You are very welcome
@rockingtwinsvedansh-vedans7414 6 years ago
Awesome....I have seen a number of tutorials on different sites and on YouTube...But this video series on pandas is awesome...now, after completing this series, I feel very comfortable working with pandas... Thanks a lot....
@dataschool 6 years ago
That's great to hear! You're very welcome!
@spicytuna08 6 years ago
wow. never seen such a powerful one line command. great examples.
@dataschool 6 years ago
Thanks!
@wazka1234 5 years ago ⁺¹
Excellent teacher!
@dataschool 5 years ago
Thank you!
@tawandamukarati4051 2 years ago ⁺¹
I know I am years behind, but I find these tutorials very helpful. I have a question on transposing a dataframe. When is the best time to transpose? Is it before cleaning it up or after? The headers seem confused after the transpose and the index is gone, so sorting and cleaning become a challenge.
@Alicengood 8 years ago ⁺¹²
Thanks very much! I learnt a lot from this series of video. Is there one like this for matplotlib?? Look forward to it !
@kleczekr 8 years ago
Yes, that would be great!
@dataschool 8 years ago ⁺³
Thanks for the suggestion! I don't have any plans for a video series about matplotlib, though I do cover pandas plotting (which uses matplotlib) in this video series, specifically in videos 14, 15, and 25: ua-cam.com/play/PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y.html
@carriemu1437 4 years ago
Data School that is a pity, you r an excellent teacher, matplotlib would be so useful for me since i am a consultant working in big four, and i need data visualization tools for presentation
4 years ago
are still here, in this planet?
@eularmaths9043 4 years ago
really good video. Learn a lot
@dataschool 4 years ago
Glad to hear that!
@meryelbahhat7886 7 years ago
thank you for the videos they were so helpful
@dataschool 7 years ago
You're welcome!
@ItsWithinYou 3 years ago
Many thanks for your help!
@dataschool 3 years ago
You're welcome!
@s.baskaravishnu22 7 years ago
I very much congratulate you for sharing code used in video with us. Many thanks for that. It is very much useful to me. My warm regards to you.
@dataschool 7 years ago
You're very welcome!
@tanmaykorgaonkar963 5 years ago
Awesome Video. You are a boon to humanity.
@dataschool 5 years ago
Thank you so much! :)
@redhaducos2441 3 years ago
I like it ... it's very well explained
@dataschool 3 years ago
Thank you!
@danielmonroy6874 4 years ago
I did not know about that normalize=True function. COOL!
@kennethstephani692 1 year ago
Great video!
@dataschool 1 year ago
Thanks!
@jhonalexander3438 1 year ago
the best of the best!!!!
@dataschool 1 year ago
Thank you!
@TheAnubhav27 7 years ago
Awesome tutorials.Can you please do a video on map function and pivot function
@dataschool 7 years ago
I don't have a video that covers pivot, but I do have this video that covers map: ua-cam.com/video/P_q0tkYqvSk/v-deo.html
@1979ligesh 4 years ago ⁺¹
Hi, I really liked your method of teaching. It is high on value addition and practicality. I am looking for a course on Python which can enable me to use the data analysis skill in HR field. Do you offer something on similar lines?
@millionaire12345 6 years ago
One problem I always run into is aligning the x-ticks to the bins when plotting a histogram or a bar chart. Maybe you can do a video going through different histogram/bar charts and using different methods of aligning the x-ticks? There's a lot of information on this online but most of it fairly confusing and is always unique to the situation. Thanks!
@dataschool 6 years ago
Thanks for your suggestion, I'll consider it!
@jacksonthounaojam5420 4 years ago
Hi Kevin, wanted to know if any results produced using method will be a series? For instance the one using value_counts produces series.
@kostasnikoloutsos5172 7 years ago ⁺²
Question:
After we have modified our dataframe can we save it as a csv file again?
@dataschool 7 years ago ⁺¹
You can use to_csv, I've got an example here: ua-cam.com/video/ylRlGCtAtiE/v-deo.html
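A minimal sketch of the to_csv suggestion, assuming a small stand-in DataFrame. The column names and the file name movies_modified.csv are illustrative and not taken from the video.

```python
import os
import tempfile

import pandas as pd

# Stand-in data; in practice this is the DataFrame you have already modified.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "duration": [142, 175],
})
movies["duration_hours"] = movies["duration"] / 60  # an example modification

# index=False keeps the row index out of the file.
out_path = os.path.join(tempfile.gettempdir(), "movies_modified.csv")
movies.to_csv(out_path, index=False)

# Reading the file back confirms the round trip.
reloaded = pd.read_csv(out_path)
print(reloaded)
```

Leaving out index=False would write the index as an extra unnamed first column, which then shows up when the file is read again.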
@mahima891 6 years ago ⁺¹
Please suggest some sites from where I can practice some questions
@dataschool 6 years ago ⁺¹
www.datacamp.com/courses/analyzing-police-activity-with-pandas?tap_a=5644-dce66f&tap_s=280411-a25fc8
@vishesharora2352 4 years ago
Crosstab seems to generate a dataframe which resembles a pivot table in Excel. Correct?
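To illustrate the comparison, here is a small sketch showing that pd.crosstab does return a DataFrame of counts, with one row per value of the first column and one column per value of the second. The data is made up for the example.

```python
import pandas as pd

# Made-up stand-in for the movies data.
movies = pd.DataFrame({
    "genre": ["Crime", "Crime", "Drama", "Drama", "Drama", "Comedy"],
    "content_rating": ["R", "R", "PG-13", "R", "PG-13", "PG"],
})

# Rows are genres, columns are ratings, cells are counts (0 where a pair never occurs).
table = pd.crosstab(movies["genre"], movies["content_rating"])
print(type(table).__name__)
print(table)
```

That layout is what makes it read like a count-based pivot table.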
@dolla_shrey 2 years ago
i have a very interesting question and have full faith only you can answer , if my comment got read. so question is when we apply astype to change str to category then size reduces but how astype is working internally, in other words without using any in built function how can i change the datatype of given series.
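As a conceptual sketch only (this is not pandas' actual implementation): the category dtype can be thought of as storing each distinct string once, plus a small integer code per row. The hand-rolled version below uses a plain dict and is compared with what astype('category') exposes through .cat.categories and .cat.codes.

```python
import pandas as pd

genre = pd.Series(["Drama", "Crime", "Drama", "Comedy", "Crime", "Drama"])

# Hand-rolled encoding: one entry per distinct string, plus an integer per row.
categories = sorted(set(genre))
lookup = {value: code for code, value in enumerate(categories)}
codes = [lookup[value] for value in genre]

# The built-in conversion exposes the same two pieces.
as_category = genre.astype("category")
print(as_category.cat.categories.tolist())
print(as_category.cat.codes.tolist())
```

The memory saving comes from repeating small integers instead of repeating full strings, so it is largest when a column has few distinct values.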
@RJ-yf3qs 7 years ago
Hi thank you so much for this awesome video. I have a question. How did you get the output showing all borders of the grid in the pane? It's so cool, but I searched a lot and didn't get any idea how to configure this feature. I already subscribed to your channel and liked your video of course.
@dataschool 7 years ago
Glad the video was helpful to you! The output style you are seeing in the video is the default style from the Jupyter Notebook, which is what I use as my Python environment.
@tongyang5427 8 years ago
Nice video series! But I look through all of the series so far. There is no one relating to writing data into some common types of files like excel sheet, json, csv. Can you make a video about data storage after data processing?
@dataschool 8 years ago
Thanks for the suggestion! I'll consider that for the future.
Note that I do cover writing to a CSV file in this video: ua-cam.com/video/ylRlGCtAtiE/v-deo.html
@marwanelghitany8875 7 years ago ⁺¹
Question 1:
percent = movies.genre.value_counts(normalize = True) * 100
how to append '%' to each calculated value ?
You're awesome btw :D :D
@dataschool 7 years ago ⁺¹
Glad you like the videos! I think you could accomplish this with display options: ua-cam.com/video/yiO43TQ4xvc/v-deo.html
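Besides display options, one possible approach is to format the values as strings. This is a sketch with made-up data, and note that the result is no longer numeric, so it suits display rather than further calculation.

```python
import pandas as pd

# Made-up genre column standing in for movies.genre.
genre = pd.Series(["Drama", "Drama", "Crime", "Comedy"])

percent = genre.value_counts(normalize=True) * 100

# Format each value with two decimals and a trailing percent sign.
labeled = percent.map("{:.2f}%".format)
print(labeled)
```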
@pradneshkalkar5525 4 years ago
Suppose I display a dataframe suppose drinks, then I cannot see the whole table in my notebook and neither I can scroll through, but in your video I can see the scroll bar, moreover in my table there are no lines in between columns and hence it looks like a tab separated table, but in your video there are.
Why are there differences in these?Pls help
@PankajMishra-ey3yh 8 years ago ⁺⁵
I seriously love you man. Your videos are awesome
@dataschool 8 years ago ⁺²
Ha! Thanks so much for your support!! :)
@PankajMishra-ey3yh 8 years ago
Hey buddy one more support I want :) do you have any series in which you solve kaggle tasks or analyticalvidya.com tasks ? If yes can you give me link to that.If not I request you to make a whole series which is dedicated to Kaggle or analytical vidya tasks :) That will be very much useful for all :)
@dataschool 8 years ago ⁺¹
I don't have a series like that. Thanks for the suggestion, I'll consider it for the future! :)
@dataschool 8 years ago ⁺¹
My latest video might be of interest to you: "How do I use pandas with scikit-learn to create Kaggle submissions?" ua-cam.com/video/ylRlGCtAtiE/v-deo.html
@PankajMishra-ey3yh 8 years ago ⁺²
This really made me cry *Tears of Joy* :)
@sucharitha9365 4 years ago
when we use crosstab?
@abhishek-hb1vg 6 years ago
Those 2 people that has given thumbs down are jealous of u..... because they can't explain like u do.
@dataschool 5 years ago
Ha! :)
@rajbir_singh0517 4 years ago
when we use describe function in pandas it gives several insights about the data. standard deviation is one of them. does it give one std deviation value or what? please explain.
@kashishjain78 5 years ago
Thanks very much, Can you also make video on plotnine library it is ggplot of python
@dataschool 4 years ago
Thanks for your suggestion!
@girisankar9390 5 years ago
thank you for this amazing video ! but how can i get the number of movies in the range of 100-150 min and so on ?
@Patriciaunik 5 years ago
Hi, I think that you may be looking for this..just in case this is still useful for you :
range = movies.genre.value_counts()
range[(range>=100) & (range
@dataschool 4 years ago
Does this help? ua-cam.com/video/2AFGPdNn4FM/v-deo.html
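For counting movies per duration range, a sketch along these lines may help. The sample durations and bin edges are made up; Series.between and pd.cut are the pandas calls doing the work.

```python
import pandas as pd

# Made-up durations in minutes, standing in for movies.duration.
duration = pd.Series([64, 99, 100, 120, 150, 151, 200, 242])

# Count for a single range, inclusive on both ends.
in_100_150 = duration.between(100, 150).sum()

# Counts for several consecutive ranges via explicit bin edges.
# Intervals are right-closed: (50, 100], (100, 150], (150, 200], (200, 250].
binned = pd.cut(duration, bins=[50, 100, 150, 200, 250]).value_counts().sort_index()

print(in_100_150)
print(binned)
```

Note the difference at the edges: between includes 100, while the (100, 150] bin from pd.cut does not.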
@SpokenStuff 6 years ago
How do I explore the IMDB dataset by genre and actor? It's a bit tricky, because the actors_list column holds the names of the three main actors as one string in a list. I know how to get there in Python without using Pandas, and it would be pretty cumbersome. How would I do that with Pandas?
@sodapopinski9922 6 years ago
I think it would kind of be just like python df = df.actors[0] would be the first name, I think im new to this stuff also...
@dataschool 6 years ago
You would probably have to use pandas string methods with regular expressions to parse that column into 3 new columns.
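As an alternative to regular expressions, and assuming each actors_list entry is a string that looks like a Python list literal (for example "[u'Name One', u'Name Two', u'Name Three']"), the strings can be parsed with ast.literal_eval and then expanded to one row per actor. The titles and actor names below are placeholders.

```python
import ast

import pandas as pd

# Placeholder rows; the actors_list format is an assumption about the dataset.
movies = pd.DataFrame({
    "title": ["Film A", "Film B"],
    "genre": ["Crime", "Drama"],
    "actors_list": [
        "[u'Actor One', u'Actor Two', u'Actor Three']",
        "[u'Actor Two', u'Actor Four', u'Actor Five']",
    ],
})

# Turn each string into a real Python list, then give every actor its own row.
movies["actors"] = movies["actors_list"].apply(ast.literal_eval)
per_actor = movies.explode("actors")

# Number of movies per (genre, actor) pair.
counts = per_actor.groupby(["genre", "actors"]).size()
print(counts)
```

If the column is formatted differently, literal_eval will raise an error, and the string-method approach from the reply is the fallback.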
@suhasinil3291 5 years ago
want to know why series? when there is data frames. what's the specific purpose of series that can not be achieved through data frames?
@dataschool 4 years ago
A Series is like a DataFrame column. It exists because it is the building block for a DataFrame. Does that help?
@SharanyaVRaju 8 years ago
How do you iterate through all the values in a series?
@dataschool 8 years ago
I actually answer that question in a video: ua-cam.com/video/B-r9VuK80dk/v-deo.html
Hope that helps!
@New2Golf 8 years ago
I have been enjoying your pandas series.
@dataschool 7 years ago
Ha! :) Thanks for the suggestion - I'll consider it for the future!
@logeshkumarapandian 6 years ago
Thanks for the videos. I have a dataset where for example, i have to first filter out all the movies with genre 'crime', and multiply the duration of all the crime genre into 2 (like for pulp fiction, I want to make the duration as 154 * 2 = 308). This is just an example. Please help me out
@dataschool 6 years ago
Sorry, I won't be able to help you with this - good luck!
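One way this kind of conditional update is commonly done is with a boolean mask and .loc. The sketch below uses a tiny made-up table in which only the Pulp Fiction row (duration 154) comes from the question.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["Pulp Fiction", "Film B", "Film C"],  # Film B and Film C are placeholders
    "genre": ["Crime", "Drama", "Crime"],
    "duration": [154, 120, 100],
})

# Boolean mask selecting the rows to change.
is_crime = movies["genre"] == "Crime"

# Update only the duration of those rows; other rows are left untouched.
movies.loc[is_crime, "duration"] = movies.loc[is_crime, "duration"] * 2
print(movies)
```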
@youcefyahiaoui1465 6 years ago
Man! I am processing a data file where the column headers contain ":", "::", underscores & spaces. I have a row called units and 2 as limits (float or integers). Some values are being read as strings and I want to convert them into numbers. I am having a hard time as the method you described in one of your tutorials uses df.headername.dtype(float) etc... Except I cannot use this notation as headername is a messy combination of characters. Any idea on what to do?
Thanks,
@dataschool 6 years ago
You can use bracket notation to select columns instead: df['headername'] - does that help?
@youcefyahiaoui1465 6 years ago
@@dataschool Thank you for your reply. So, df['parameterName'].dtype(float) will work? I will give it a try.
@dataschool 5 years ago
I think perhaps you mean to use astype rather than dtype. Hope that helps!
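Putting the two replies together, a short sketch with a deliberately messy, made-up column name: bracket notation selects the column and astype converts it. pd.to_numeric is shown as an alternative for columns where some entries may not parse.

```python
import pandas as pd

# "Vout::max (V)" is an invented header with colons, spaces and parentheses.
df = pd.DataFrame({
    "Vout::max (V)": ["1.5", "2.25", "3"],
    "unit_name": ["a", "b", "c"],
})

# Bracket notation works for any column name; astype does the conversion.
df["Vout::max (V)"] = df["Vout::max (V)"].astype(float)

# errors="coerce" turns unparseable entries into NaN instead of raising.
messy = pd.Series(["1.5", "n/a", "3"])
cleaned = pd.to_numeric(messy, errors="coerce")

print(df.dtypes)
print(cleaned)
```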
@khaledbenotmane728 6 years ago
hi sir
I have one question if you wish:
let's consider we have a dataset of movies, the question is:
How do I read data from row number 12 to, for example, row number 84?
thanks
@dataschool 6 years ago
This may help: ua-cam.com/video/xvpNA7bC8cs/v-deo.html
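A small sketch of row-range selection on an already loaded DataFrame, assuming a default integer index and zero-based row numbers (the 100-row frame is made up):

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})

# Position-based: iloc excludes the end, so 12:85 covers positions 12 through 84.
by_position = df.iloc[12:85]

# Label-based: loc includes both endpoints (here the labels are the default 0..99).
by_label = df.loc[12:84]

print(len(by_position), len(by_label))
```

With a default index the two selections are identical; they differ once the index has been changed or the rows have been reordered.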
@sandeep9244 5 years ago
Hello Sir! , genre.value_counts() is working fine but when I set normalize on using genre.value_counts(normalize=True), it is raising an error. It shows --> OptionError: "No such keys(s): 'compute.use_numexpr'"
Could you please help me in understanding the reason for such an error
@dataschool 5 years ago
Not sure, sorry!
@joescanlon7502 8 years ago
Is there a way to get .describe() to include mode and median values?
Also, I really liked the .value_counts(normalize = True) described at 3.50
Is there a way to get this to actually display %? IE:
Drama: 28.40 %
Comedy: 15.93%
(Visualizations are great thanks!)
@dataschool 8 years ago
For numeric columns, the describe method will always display the median (it's marked as "50%"), but it will never display the mode. For categorical columns, the describe method will always display the mode (it's marked as "top").
Regarding your question about displaying percentages, there may be a way to achieve this by changing a display option, but I'm not sure: pandas.pydata.org/pandas-docs/stable/options.html
Glad you like the visualizations! I hope to have more in later videos.
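If a single summary with explicit median and mode rows is wanted for a numeric column, one option is to extend the Series that describe returns. This is a sketch with made-up values, not a built-in describe option.

```python
import pandas as pd

duration = pd.Series([90, 100, 100, 120, 150])

summary = duration.describe()

# The median duplicates the "50%" row, just under a clearer label.
summary["median"] = duration.median()

# mode() returns a Series because ties are possible; this keeps the first value.
summary["mode"] = duration.mode().iloc[0]

print(summary)
```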
@dataschool 8 years ago
I just released a video discussing display options, which may be of interest to you: ua-cam.com/video/yiO43TQ4xvc/v-deo.html
However, it doesn't look like there's a simple way to display the value_counts with the percentages. Sorry!
@suvakantasamal3713 4 years ago
HOW TO GET THE MOVIES HAVING FREQUENCY 1 FROM THE SERIES WHICH WE GET AFTER APPLYING THE value_counts() to the Genre column???
4 years ago
THERE IS A BIG, HUGE PROBLEM WITH DESCRIBE: I am using python 3.8.5, so far the latest, I tried to describe() a list of 1000 names, and the result was terrible, it sent me different modal value for the name.... I sincerely think this message attached to describe should be refined in order to explain the situation with multimodal series
@wlancer8826 6 years ago
Hi there! I want to look into the value_counts for movies.duration, my codes:
>> movies.duration.describe()
output:
count    979.000000
mean     120.979571
std       26.218010
min       64.000000
25%      102.000000
50%      117.000000
75%      134.000000
max      242.000000
Name: duration, dtype: float64
>> movies.duration.value_counts(bins=3, sort=False)
output:
(63.821000000000005, 123.333]    589
(123.333, 182.667]               362
(182.667, 242.0]                  28
Name: duration, dtype: int64
Why does the interval start with 63.821000000000005 instead of the min 64? Can I suppress the very long decimals and how? Thanks!
@dataschool 6 years ago
Great questions, but I'm not sure... good luck!
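A possible explanation, based on the pd.cut documentation: when bins is given as an integer, the range is extended slightly so that the minimum and maximum fall inside the right-closed intervals, which is why the first edge sits just below 64. The sketch below uses made-up durations and the precision parameter of pd.cut to shorten the displayed edges.

```python
import pandas as pd

# Made-up durations sharing the min (64) and max (242) from the question.
duration = pd.Series([64, 102, 117, 134, 242])

# Three equal-width bins; precision=1 limits the decimals shown in the labels.
binned = pd.cut(duration, bins=3, precision=1)
counts = binned.value_counts().sort_index()
print(counts)

# The left edge of the first interval is below the smallest value.
first_left_edge = binned.cat.categories[0].left
print(first_left_edge, duration.min())
```

Passing explicit edges such as bins=[60, 125, 185, 245] avoids the computed edges entirely, at the cost of choosing the boundaries by hand.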
@trying2code 3 years ago
0 Ithaca, NY
1 Willingboro, NJ
2 Holyoke, CO
3 Abilene, KS
4 New York Worlds Fair, NY
what if we just want to see reports in NY what code do we use ?
@dataschool 3 years ago
Start with a string method to split the state into another column, and then filter. Hope that helps!
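The split-then-filter suggestion could look roughly like this, using the five values from the question. The column names city and state are chosen for the example.

```python
import pandas as pd

location = pd.Series([
    "Ithaca, NY",
    "Willingboro, NJ",
    "Holyoke, CO",
    "Abilene, KS",
    "New York Worlds Fair, NY",
])

# Split on the last ", " so the state lands in its own column.
parts = location.str.rsplit(", ", n=1, expand=True)
parts.columns = ["city", "state"]

# Keep only the rows whose state is NY.
ny_reports = parts[parts["state"] == "NY"]
print(ny_reports)
```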
@sudhu26 7 years ago
This is a great series. you have put a lot of efforts. I surely would like to buy you a coffee.
I have a question. how can i find a movie which has same duration as movies xyz but has a higher rating?
@dataschool 7 years ago
Thanks! If you want to pay me back, please just subscribe to my email newsletter, and maybe tell a friend too! www.dataschool.io/subscribe/
Regarding your question, I'm not sure if this is what you are looking for?
movies[movies.duration == 120].sort_values('star_rating')
@sudhu26 7 years ago
newsletter subscription done. Thanks for answering. not exactly what i was looking for. let me get back to you with data and question again to explain.
@marwanelghitany8875 7 years ago
movies[(movies.duration == 109)].loc[:,['title' , 'content_rating']].sort_values(['content_rating'])
#try that line of code ... I assume u know the duration ... 109 for instance.
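When the duration is not known in advance, it can be looked up from the reference movie first. A sketch with placeholder titles, assuming the columns are named title, duration and star_rating as in the snippets above:

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["Film X", "Film Y", "Film Z", "Film W"],  # placeholder titles
    "duration": [120, 120, 120, 95],
    "star_rating": [8.0, 8.5, 7.9, 9.0],
})

# Row for the reference movie ("xyz" in the question).
reference = movies.loc[movies["title"] == "Film X"].iloc[0]

# Same duration, strictly higher rating.
matches = movies[
    (movies["duration"] == reference["duration"])
    & (movies["star_rating"] > reference["star_rating"])
]
print(matches)
```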
@rezaenergy 6 years ago
using histogram for categorical objects is not true, in my opinion for this example you should use a bar chart because in this example the minutes are labels (the string which is represented as a number). am I correct?
@dataschool 6 years ago
duration is a numerical variable, and thus a histogram makes sense.
@dodjicohovi4097 7 years ago
I got this error message "module 'pandas' has no attribute 'read_cvs' "
@dataschool 7 years ago ⁺¹
It's 'read_csv', instead of 'read_cvs'.
@zht6956 8 years ago
Please tell me why i can not plot in my IDLE. There is no graph in my result. And thanks for this video!
Just like this:
>>> movies.duration.plot(kind='hist')
>>> movies.duration.value_counts().plot(kind='bar')
and there is no any picture.Please help.,thanks
@dataschool 8 years ago
I think all you need to do is this:
import matplotlib.pyplot as plt
plt.show()
Let me know if that helps!
@sachinkerala 7 years ago
Thanks a lot for your videos. My Question is in General : In a real production environment , how can I use this note book - I mean to ask , If I am want to schedule a Job and If I need to see the data as dataframes with some graph , Is there any way to schedule and see in notebook
@dataschool 7 years ago
Great question! I would tend not to use the notebook for production code. However, I'm not sure what IDE would be ideal for the situation you are describing - I'm sorry!
@HimanshuGupta-yv4lf 7 years ago
.plot() method does not prints the graph on its own ..........I need to run matplotlib.pyplot.show() to print the graph each time .
@dataschool 7 years ago ⁺¹
Whether or not the plots will appear automatically depends on your environment. In the case of the Jupyter notebook, they will show up as long as you have run the following command during your session:
%matplotlib inline
@HimanshuGupta-yv4lf 7 years ago ⁺¹
Thanks.......actually I missed this point from your video.
Also your lectures are the best........thanks for sharing your knowledge.
@dataschool 7 years ago ⁺¹
You're very welcome! :)
@AK-ud4ur 6 years ago
movies.genre.value_counts()> 100
This returns a Boolean, how do we get those genre whose value_counts is greater than 100
@dataschool 6 years ago ⁺¹
Great question!
movies.genre.value_counts()[movies.genre.value_counts() > 100].index
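The same idea can be written so that value_counts is computed only once. The sketch below uses a made-up genre column and a threshold of 2 in place of the 100 used on the full dataset.

```python
import pandas as pd

# Made-up genre column standing in for movies.genre.
genre = pd.Series(["Drama"] * 5 + ["Crime"] * 3 + ["Comedy"])

counts = genre.value_counts()
threshold = 2  # the thread uses 100 on the full dataset

# Filter the counts with a boolean mask, then read the genre names off the index.
frequent = counts[counts > threshold]
frequent_genres = frequent.index.tolist()

# The same pattern finds genres that occur exactly once.
single_genres = counts[counts == 1].index.tolist()

print(frequent_genres, single_genres)
```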
4 years ago
NON NOSTROMO HERE: sorry, we are on Earth yet¿¿¿¿¿?????? C.¿.? yes, I am running the tutorial with my super wide screen, and it is wonderful while I kind of chat with my students at........ 12:09 after midnight, they are just getting inside the Google educational platform, while I am translating plenty of useful tutorials like this one........ NON NOSTROMO HERE, over.........
print("lot of laughs"*12)
@annawilson3824 5 years ago ⁺¹
I just watched 4 ADS to be able to open this video - it's "amazing"!
@dataschool 4 years ago
Thank you!
@firasbayazed7479 6 years ago
you didn't explain enough the unique() what does it do and the crosstab()
@dataschool 6 years ago
Thanks for your suggestion! Maybe I'll cover that more in a future video.
