Woo!!! 5 hours of Pandas practice, what could be better. Hope you all enjoy!
Will watch this on repeat until I am an expert. Thank you.
@@jonpounds1922 haha my man 💪
It will not make you an expert
The examples are trivial. Take the average age of each animal (a SQL "group by", for instance): try doing it this way with a million animals while computing the variance at the same time.
You are a sight for sore programming eyes Keith! We cannot thank you enough for this!!
Awesome work and skills Keith. Thank you, great effort
Are you crazy man, a 5-hour+ course only for pandas? Man, your dedication to teaching is amazing
I appreciate the support!
Dude I freaking LOVE your content. I am so stoked to see this video and have it bookmarked for the rest of my data science career lol
haha love that! Glad you like the content :)
5 hours of pandas puzzles??? Just what I need!
never thought I'd ever say that in my life tbh
@@ngoclinhvu5381 Haha very fair. I found these exercises very educational for me personally, so hope that you do as well!
This is what I was looking for.
Thanks Keith
Please guys give this video a like if you haven't, it takes a lot of work to create such a masterpiece. Welcome back Keith🎉.
That small dance at 3:37:27 was a pleasant surprise out of nowhere. You are amazing :D
Hahahah glad you enjoyed that 😂
Subscribed.... the way you work through it is genuine, making mistakes and then learning
I'm too old for all the Minecraft or Fortnite streams, so here I am and loving it :-)
Hahaha love this
you are so genuine and humble!
Great to have you back Keith! Going to watch it over the next couple of days and it’s gonna be my sort of bible I guess for future reference haha
Love that! I found the exercises very educational myself.
It is great to have you back teaching 🎉
This is the kind of content that makes YouTube the great source of learning it is!
This video would be really helpful.
Keep up the great work!😊
I'm so happy you had to Google something easy so quickly and I knew the answer 😂 It made me feel that I'm really learning and it's ok to seek help if needed lol. Amazing video. Thank you
Hi Keith, genuinely appreciate you solving all these pandas problems. I'm not sure if you already have, but I was wondering if you could also do one on the 100 NumPy problems? Again, thanks for your work.
19:30 I do this a lot, by passing a dict to the agg function after grouping (it allows you to assign multiple operations to several cols at once). E.g. df.groupby("animal").agg({"age": "mean"})
This is super useful, thank you for the tip!
this works too, df.groupby('animal').mean('age')
@@thiagosiqueira4690 I think that doesn't work for grouping by multiple columns and adding a specific function for every column
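A small sketch of the dict-based agg tip from this thread, using a made-up animals table (the column names here are just for illustration). Mapping a column to a list of functions is what lets one call apply several aggregations to several columns at once:

```python
import pandas as pd

df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog"],
    "age": [2, 4, 1, 3],
    "visits": [1, 3, 2, 2],
})

# A dict maps each column to one aggregation (or a list of them),
# so several columns get several operations in a single call:
out = df.groupby("animal").agg({"age": ["mean", "max"], "visits": "sum"})
```

When any column gets a list of functions, the result has MultiIndex columns like ("age", "mean"), which is how you address the individual aggregations afterwards.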
5 hours pandas video is crazyyyy. Must give a thumb up!
OMG keith you are a lifesaver! thank you!
24:27 We can use groupby to count the animals in the following way...
df.groupby('animal')['animal'].count()
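For comparison, a quick sketch (with a made-up animal column) showing that value_counts gives the same numbers as the groupby count above, just sorted by frequency:

```python
import pandas as pd

df = pd.DataFrame({"animal": ["cat", "dog", "cat", "snake", "cat"]})

# groupby + count, as in the comment above:
by_group = df.groupby("animal")["animal"].count()

# value_counts produces the same counts, sorted most-frequent first:
by_value_counts = df["animal"].value_counts()
```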
Thank you so much!! Whatever you are doing is actually life-changing for people like me who are self-learning! Thank you!!
Just came here to say a big THANK YOU 🙌🏻🙌🏻
Excellent! Thank you very much for this video!! Please more with this format 👏
This is really an excellent channel on Python like "techie talkee"
1:23:56 Here's the updated pandas code for question 27: df.groupby(['grps'])['vals'].nlargest(3).groupby(level=0).sum()
Thanks!
Thanks Keith! This is so goodd
At 33:57 - "22) Filter duplicate integers" - Might as well try: pd.DataFrame(data=df['A'].unique(),columns=['A'])
Man, you're crazy 🤣🤣🤣🤣🤣🤣🤣🤣🤣. This is awesome! Thanks for a colossal and great video!!!🎉🎉
Thanks man, I was just looking for getting into Pandas.
50:08 In problem 23 they used df.sub to be able to specify the axis and make the subtraction row-wise.
I don't know how the - operator behaves: does it always go row-wise, always column-wise, or does it choose based on the input?
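To answer the question above with a small sketch: when one operand is a Series, the - operator aligns it on the DataFrame's column labels, which is why df.sub(..., axis=0) is needed for a row-wise subtraction. The toy DataFrame below is just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=list("abc"))
row_means = df.mean(axis=1)  # a Series indexed 0, 1

# The - operator aligns a Series on the COLUMN labels; row_means has
# index labels 0 and 1, which match no column, so everything is NaN:
misaligned = df - row_means

# df.sub with axis=0 aligns on the index instead, giving a true
# row-wise subtraction:
row_wise = df.sub(row_means, axis=0)
```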
more of this buddy enjoyed each second
For question 27, this is something I find easy to code and understand
df = df.sort_values(by=['grps', 'vals'], ascending=[True, False])
df = df.groupby('grps').head(3).reset_index(drop=True)
df.groupby('grps')['vals'].sum()
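A quick check that the sort-then-head approach and the nlargest approach give the same answer, assuming the puzzle's grps/vals column names and some made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    "grps": list("aaabbbb"),
    "vals": [12, 345, 3, 1, 45, 14, 4],
})

# Sum of the three largest values per group, two equivalent ways:
via_nlargest = df.groupby("grps")["vals"].nlargest(3).groupby(level=0).sum()
via_sort = (df.sort_values(["grps", "vals"], ascending=[True, False])
              .groupby("grps").head(3)
              .groupby("grps")["vals"].sum())
```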
another solution to puzzle 26:
for i in [0, 1, 2, 3, 4]:
    a = df.iloc[i].sort_values(ascending=True).index[7]
    print(a)
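A vectorized version of the loop above, assuming a 5x10 DataFrame as in the puzzle (the random data and column names here are placeholders), where position 7 of an ascending sort over 10 columns is the column holding the row's third-largest value:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 10)), columns=list("abcdefghij"))

# Same idea as the loop, in one pass with apply: for each row, take
# the label at position 7 of the ascending sort of its values.
third_largest_col = df.apply(lambda row: row.sort_values().index[7], axis=1)
```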
Thank you for making such wonderful videos on Python.🙏 Please make some videos on PySpark also.
Great video! However I believe your solution for q23 was wrong. You subtracted the mean of the entire DataFrame instead of the mean of each row. It worked for your example of np.ones because the entire DataFrame had the same mean as the mean of each row (a mean of 1). You want a solution that subtracts a different value for each row, namely the row's mean.
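A sketch of the fix this comment describes, with made-up values whose row means differ (so the bug would actually show): subtracting the row means with axis=0 leaves every row centered at zero.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 10.0], "b": [2.0, 20.0], "c": [3.0, 30.0]})

# Subtract each row's OWN mean; axis=0 aligns the Series of row
# means with the DataFrame's index rather than its columns:
demeaned = df.sub(df.mean(axis=1), axis=0)
```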
Thank you for this Keith .
Love the long form !
Thank you so much sir for this pandas session
This is awesome! can you do for other libraries too please!!!
Looking forward to numpy puzzles now!!
Blindly subscribed
Bro, you are a genius!!!
Man you are crazy....but amazing... !! love from india.. !!
Is it ok if I am struggling with the questions that are labeled hard, or am I supposed to be able to answer all of them without searching functions up?
Very normal! I'm always looking things up in real-world projects. I would say that you should be able to get to the point where you can answer the problems without looking up the specific answer, but by looking up the functions that will help you implement the logic to get to the answer.
What do you think is a good method to concatenate a string value from a dataframe column to another dataframe column by index key? Example: df_1 rows 10, 20, 26, 30, 40, column 5 (string) concatenated to df_2 rows 9, 19, 25, 29, 39, column 1?
Thanks dude🎉it is helpful
in your solution to no. 24, the operation df - df.mean(axis=1) won't work directly, because the - operator aligns the Series on the column labels and the dimensions won't line up
Can you do the numpy one also
pure gold 🤩
the answer to question 23 is incorrect: it asks for the ROW mean, while df.mean() gives you the column mean
Very Nice ⚡thanks a lot ⚘
Thanks Lot Keith...😘
This is awesome. Thanks. ❤
You are very welcome!
thanks for this...needed
Awesome!! Can you please make one for Pyspark? :)
Thanks for this. Do you know of an app or website to practice or improve my scripting skills?
I think I have a better solution for filtering rows which contain the same integer as the row above
df = pd.DataFrame({'A': [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})
def precedingDuplicateCheck(row):
    if row.name == 0:
        return False
    prev = df.loc[row.name - 1, 'A']
    return prev == row['A']
df_new = df[~df.apply(precedingDuplicateCheck, axis=1)]
df_new
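A loop-free alternative to the apply approach above: shift() moves the column down one row, so a direct comparison flags any row equal to the one above it. Same example data as in the comment:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7]})

# Each value is compared with its predecessor; the first row survives
# because any value compared to the shifted-in NaN is "not equal":
df_new = df[df["A"] != df["A"].shift()]
```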
Such good videos!!!
You look great 🎉 and thanks for posting this video.
Thank you! 😊
could somebody explain q23? the way he is doing it, I think it's wrong, because "df.mean()" is going to give us the mean values with respect to the individual columns, not the rows, and in the subtraction each mean value is then subtracted from the corresponding column. we have to use "df.mean(axis=1)" and take care of the axis in the subtraction as well.
I have done it like this: df.subtract(df.mean(axis=1), axis="index"). Please correct me if I am wrong.
His solution is wrong, I agree, but not for the reason you say. His solution calculated the mean of the entire DataFrame but he should've subtracted the mean for each row (axis=0).
Edit: Never mind, I agree with you.
for the 22nd quiz it could be done as simply as
for i,k in df.items():
df=set(k)
A couple of issues with the set solution.
One is that a set is not guaranteed to preserve the order that numbers are inserted into it, so even though it worked in this example, if we changed the numbers to [3,3,2,2,1,1,8,8,9], the output would show {1,2,3,8,9} instead of [3,2,1,8,9].
Another issue with the set solution is that if a number reappears later in the list, it will be dropped from the result. So if we had [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 2, 2] as our input, your solution would output {1,2,3,4,5,6,7} while the correct solution would be [1,2,3,4,5,6,7,2].
Hope this is helpful!
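A small sketch illustrating both issues with the set approach, using itertools.groupby, which collapses only consecutive repeats and so keeps both the order and the reappearance of 2:

```python
from itertools import groupby

nums = [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 2, 2]

# set() keeps a single copy of each value and guarantees no insertion
# order, so the trailing run of 2s is lost entirely:
as_set = set(nums)

# groupby yields one key per run of equal CONSECUTIVE values,
# preserving order and the second run of 2s at the end:
deduped = [key for key, _ in groupby(nums)]
```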
Weird to see you open VSCode with the UI when the CLI command is the easiest one to add to your flow once you're in that directory after cloning.
code .
opens the current directory in VSCode.
Great video!!
Hi guys, is there something similar or equivalent for SQL and scikit-learn? Thank you in advance!
great work
I believe the answer for the 23rd problem is not correct, because in that example the mean of each column is also one
Do the same for all the popular libraries.
Great content, solution of 23 I believe is wrong.
I agree, he subtracted the mean of the entire DataFrame instead of each row. It worked for his example of np.ones because the entire DataFrame had the same mean as the mean of each row (1).
Thank you for your video, Keith!
Question for you: do you think it's too late to get a data science job in 2024?
The job market is challenging right now, but data science positions aren't going anywhere. You definitely can still get a data science job in 2024.
That being said, I wouldn't only look for data science positions. There are a lot of software engineering & data engineering roles that use a similar skillset that can be less competitive to land. I'd recommend keeping track of the most popular skills on job openings for all these types of roles, and tailor what you learn moving forward based on that.
I also recommend trying to network with people that are working at companies you find interesting. You'll give yourself a much better chance at landing a data science job if you are referred by someone already at a company you are applying to. With job postings on a site like LinkedIn it can be really difficult to progress in the process because so many people apply.
thank you very much
I love your content
great video! however, regarding the usage of the terminal to create directories etc. at 0:59, can anyone recommend some youtube videos or sources to get more familiar with it? thanks a bunch! good luck getting good at pandas everybody :)
Great tutorial!
Bro, can you do it for other libraries like numpy, seaborn, and matplotlib? Please!!!!!
Thank you 😭🩵🩵
sir, could you make a similar video for numpy?
1:10:03
You must be really serious to learn and post this class
Plz make a practice video on "NUMPY" and "MATPLOTLIB".
PEOPLE WHO WANT THIS VIDEO "LIKE THIS COMMENT"
Great job boss 🎉🎉🎉🎉🎉
Nice video and content. Can you also come up with a similar video for PySpark?
Thank you! As of now I don't have immediate plans for a PySpark video, but I'll look more into it.
Legend
🎉🎉Cool!!!!
Finally❤😂
Is it okay to collab?
🎉🎉🎉🎉
Wow OMG ...
At 49 seconds there's a disgusting sound!
pants