Who needs the Python docs when you have such an amazing teacher?
True.
Exactly Brother
True
the teacher
I hope everyone finds this video helpful. The next video of the series will be posted tomorrow at the same time. The next video will cover how to create pie charts.
I'd like to thank Brilliant for sponsoring this series. If you'd like to check them out then you can sign up with this link and get 20% off your premium subscription:
brilliant.org/cms
As usual lovely!!!!!!!
It's a great tutorial; the only thing I was missing is how to add the total value on top of each bar (which can be trickier for a stacked bar chart).
Thank you, sir, for providing top-class tutorials for free.
Hello Corey!
Please can you advise:
1. How did you clean the data in the "LanguagesWorkedWith" column so that you could generate this clean data?
2. After splitting it and saving it to a CSV file separate from the main one, this is the output: [(" 'JavaScript'", 53020), (" 'HTML/CSS'", 39761), (" 'Java'", 29863), ("['Bash/Shell/PowerShell'", 28340), (" 'SQL']", 28178), (" 'Python'", 26185), (" 'PHP'", 20394), (" 'SQL'", 19094), (" 'TypeScript']", 16091), ("['HTML/CSS'", 15322)]
[Finished in 33.6s]
3. Given the output above, how can I get the exact total number of occurrences for each language? It doesn't look like the counts are being combined.
Thank you,
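A possible cause of the stray brackets and quotes in the output above, assuming each cell of the saved CSV holds the text of a Python list (for example `"['HTML/CSS', 'SQL']"`): splitting that text on commas leaves the brackets, quotes, and leading spaces attached, so `'SQL'` and `'SQL']` are counted as different languages. A minimal sketch that parses each cell back into a list with `ast.literal_eval` before counting; the sample cells are made up for illustration:

```python
import ast
from collections import Counter

# Hypothetical cells as they might look after saving split lists to a CSV:
# each cell is the *string* form of a Python list.
cells = [
    "['Bash/Shell/PowerShell', 'JavaScript', 'SQL']",
    "['HTML/CSS', 'JavaScript', 'Python']",
    "['JavaScript', 'SQL']",
]

language_counter = Counter()
for cell in cells:
    # literal_eval turns "['a', 'b']" back into the list ['a', 'b']
    languages = ast.literal_eval(cell)
    language_counter.update(lang.strip() for lang in languages)

print(language_counter.most_common(3))
```

If the original `;`-separated column is still available, splitting it directly with `split(';')` avoids this round-trip entirely.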
Where is the CSV for this? I don't see it in the description. Thank you!
In case you don't know, the shortcut for the multi-cursor trick at 8:13 in Jupyter Notebook is *Ctrl + left mouse click* on each line in turn. You can then type on several lines at the same time.
Nice! Thanks from 2 years later!
Alt + left mouse click in VS Code.
thanks u
@@jeffery_tang
This series is much better than the curses on Udemy I paid for. Thank you very much.
what "curses"
@@DendrocnideMoroides wannabe savage
23:40 here's that one liner if anybody's interested. Personally, I like this more.
languages, popularity = map(list, zip(*language_counter.most_common(15)))
Really nice! Could you please explain what the "*" symbol does?
nice
Or just: list(zip(*language_counter.most_common(15))). Map is unnecessary as list() automatically maps over an Iterable
@@jg9193 But if you don't use map(list, iterable), then languages and popularity will be tuples, so you can't use reverse() for the rest of the tutorial. Or, without map: languages, popularity = [list(e) for e in zip(*language_counter.most_common(15))]
@@corben3348 Fair point, I didn't think of that. That said, he could just do languages[::-1] instead of languages.reverse() to reverse a tuple
Then again, using list() would even be unnecessary if he did that
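To illustrate what the `*` does in the one-liner above: `*pairs` unpacks the list of `(language, count)` tuples into separate arguments to `zip`, which then groups all the first elements together and all the second elements together. A small sketch with made-up counts:

```python
from collections import Counter

language_counter = Counter({"JavaScript": 5, "Python": 3, "SQL": 2})
pairs = language_counter.most_common(3)
# pairs == [('JavaScript', 5), ('Python', 3), ('SQL', 2)]

# zip(*pairs) is the same as zip(('JavaScript', 5), ('Python', 3), ('SQL', 2))
names, counts = zip(*pairs)
print(names)   # ('JavaScript', 'Python', 'SQL')
print(counts)  # (5, 3, 2)

# zip yields tuples; map(list, ...) turns each one into a list so .reverse() works
languages, popularity = map(list, zip(*language_counter.most_common(3)))
languages.reverse()
popularity.reverse()
print(languages, popularity)  # ['SQL', 'Python', 'JavaScript'] [2, 3, 5]
```

This is also why the tuple version from the reply needs either `list()` or slicing (`[::-1]`) before reversing.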
At 8:12, when you selected multiple locations and simultaneously typed the same code on multiple lines, my world just expanded!
This series, along with the pandas one, has taken my skills to a new level.
Nobody teaches like you. You are the best. Amazing delivery of information, truly useful tutorials. Thank you so much.
Corey, you are a great teacher. You have a rare ability to explain calmly. Much appreciating your efforts.
Man, you are awesome. Everything I have learned about Python started from your channel. I wish you all the very best and every success, as you make everyone happy. Keep up the excellent work; we all rely on you heavily.
Thanks! That's very kind of you.
Thank you very much bro, Greetings from Azerbaijan.
Right from reading data from a csv file to plotting it, you helped a lot of people.
I think your videos are more understandable than those on the rest of the YouTube channels.
The great thing about your tutorials is that, beyond the main topic, you learn a lot of useful tricks, modules, etc.
Such a great Python instructor with an angelic voice. Thank you so much 😊
Corey Schafer saves my life once again...
Deep gratitude for your work, man!
This is the best content on YouTube, thank you so much.
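As you mentioned, zip can also be used (here cnt is the Counter of languages):
language = cnt.most_common(10)  # list of (language, count) tuples
language.reverse()              # so the most popular bar ends up at the top of barh
language_X, language_Y = list(zip(*language))  # tuple of names, tuple of counts
plt.barh(language_X, language_Y)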
Another great video, thank-you. A Pandas series of videos would be awesome!
Excellent tutorial Corey! Real life stuff and practical, including the use of Counter. It's important to show these data preparation steps. Very helpful indeed, thank you.
I can't express how amazing this video is. What a great teacher you are. 🔥🔥
For those wondering how to obtain the CSV file: once you've clicked on it and see all of the data in your web browser, just right-click and choose Save As.
Haha, that was very useful, thanks!
Thanks so much!
Thanks a lot Corey. Really your videos are endless treasure.
Here's a way to plot bar charts for more than one dataset on the same plot without needing NumPy. Just use the built-in map function:
width = 0.25  # width of each bar
plt.bar(list(map(lambda x: x - width/2, age_x)), salaries1, color='k', width=width)  # shifted left
plt.bar(list(map(lambda x: x + width/2, age_x)), salaries2, color='r', width=width)  # shifted right
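The same idea written with list comprehensions instead of `map`/`lambda`, as a sketch; `age_x` here is a short made-up list rather than the video's data, and only the bar positions are computed:

```python
age_x = [25, 26, 27]
width = 0.25  # width of each bar

# Shift the first series half a bar to the left and the second half a bar right
left_positions = [x - width / 2 for x in age_x]
right_positions = [x + width / 2 for x in age_x]

print(left_positions)   # [24.875, 25.875, 26.875]
print(right_positions)  # [25.125, 26.125, 27.125]
```

These lists can be passed as the first argument of `plt.bar(..., width=width)` for each dataset, exactly like the `map` versions in the comment above.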
Your videos are just sprinkled with little golden nuggets! I love it ❤
Very informative video, good job Mr Corey
2 weeks later and still not a single dislike on this video
best matplotlib tutorial ever!
Really nice work over here, the most important man on youtube for me.
This is pure gold.
What I really like about your videos, Corey, is that I can learn Python and English ;D
Thanks!!
I can't believe we need this hack to make a bar chart.
Great video.
What a perfect lesson, fast and insightful pieces of knowledge...
Amazing content, Corey. The way you simplify the material and explain it is awesome, many thanks. Could you also do a video showing your setup and how you make your videos? Thanks!!!
This is the most fantastic lecture on Python and pandas I've ever seen!!!
Thank you!!!
thank you Brilliant for supporting Corey
Your explanation is awesome...thank you so much ...A great teacher for a lifetime...
thank you for always showing the clear code before abbreviating
Very helpful video. The pandas method is much simpler and easier to understand. Thanks Corey!
Thank you so much, sir. Really glad I found your playlist and didn't waste time on other platforms.
Programming is so fun.
This is gold! Thank you very much for doing this, you have incredible talent to explain complicated stuff in an easy manner, keep up good work :)))
Sad fact: if you want to open a CSV file in PyCharm, you have to pay for PyCharm Professional (~$230) :(
btw you are the best teacher I've ever seen
At 29:31, does it create a list of the variables? If I want to remove or edit an element (like with pop()), can I use list methods?
Great explanation...thanks a lot Corey sir
Thanks for this. Great lesson. As you say, creating multiple bars seems extraordinarily hacky. I would have thought this would be easily dealt with by a plotting library
Thank you for your work. I enjoy every lesson.
That's true......you are an amazing teacher. This was very helpful
Hi Corey... God bless you.
You are a lifesaver for people like me.
Counter() is the best thing I learned today
Great video! Thank you man
I just came across this series of videos. They are extremely good :-)
@Corey Schafer I came up with the function below, which handles the bar offsets for multiple bar plots by itself, in case anybody wants to use it:
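import numpy as np
from matplotlib import pyplot as plt

ages_x = np.asarray([25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])
count = 5            # number of bar series drawn side by side
width = 0.8 / count  # bar width for each series; a whole group spans 0.8

def width_cal(position):
    """Return the x positions for bar series `position` (0-based)."""
    shift = np.array([])
    if count < 2:
        return ages_x
    if count % 2 == 0:
        # even count: offsets are +/-width/2, +/-3*width/2, ...
        for i in range(1, count, 2):
            shift = np.append(shift, width / 2 * i)
        shift = np.sort(np.append(shift, np.negative(shift)))
    else:
        # odd count: offsets are 0, +/-width, +/-2*width, ...
        for i in range(0, count, 2):
            shift = np.append(shift, width / 2 * i)
        shift = np.unique(np.sort(np.append(shift, np.negative(shift))))
    shift = np.around(shift, decimals=3)
    return ages_x + shift[position]

# dev_y is the all-developers salary list from the video
plt.bar(width_cal(0), dev_y, width=width, color='#444444', label="All Devs")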
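For comparison, a shorter sketch of the same centring idea, assuming `n` series drawn side by side around each tick: series `i` (0-based) is shifted by `(i - (n - 1) / 2) * width`, which gives the same offsets as `width_cal` above (before its rounding step). `bar_positions` is a hypothetical helper name, not a Matplotlib function:

```python
import numpy as np

def bar_positions(x, i, n, width):
    """Return x positions for series i (0-based) out of n side-by-side series."""
    offset = (i - (n - 1) / 2) * width  # symmetric around each original x
    return np.asarray(x, dtype=float) + offset

ages_x = [25, 26, 27]
n = 5
width = 0.8 / n
for i in range(n):
    print(i, bar_positions(ages_x, i, n, width))
```

Because the formula is linear in `i`, it handles both odd and even `n` without separate branches.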
Hi Corey, thank you for the wonderful session. I am stuck at this point with the last example:
import csv
import numpy as np
import pandas as pd
from collections import Counter
from matplotlib import pyplot as plt
plt.style.use("fivethirtyeight")
data = pd.read_csv('data.csv')
ids = data['Responder_id']
lang_responses = data['LanguagesWorkedWith']
language_counter = Counter()
for response in lang_responses:
    language_counter.update(response.split(';'))
languages = []
popularity = []
for item in language_counter.most_common(15):
    languages.append(item[0])
    popularity.append(item[1])
languages.reverse()
popularity.reverse()
plt.barh(languages, popularity)
plt.title("Most Popular Languages")
# plt.ylabel("Programming Languages")
plt.xlabel("Number of People Who Use")
plt.tight_layout()
plt.show()
I am getting this error: AttributeError: 'float' object has no attribute 'split'. Please explain.
You are amazing. Waiting for your data science (ML, AI) course... THANKS A LOT!
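A likely explanation, assuming some respondents left this question blank: `pd.read_csv` stores empty cells as `NaN`, which is a `float`, so `response.split(';')` fails on those rows. A minimal sketch that skips missing values with `Series.dropna()`; the small DataFrame is a made-up stand-in for `data.csv`:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv; respondent 2 left the question blank,
# which pandas represents as NaN (a float)
data = pd.DataFrame({
    "Responder_id": [1, 2, 3],
    "LanguagesWorkedWith": ["Python;SQL", np.nan, "Python;JavaScript"],
})

lang_responses = data["LanguagesWorkedWith"]
print(type(lang_responses[1]))  # <class 'float'>

language_counter = Counter()
# dropna() removes the NaN rows, so every remaining response is a string
for response in lang_responses.dropna():
    language_counter.update(response.split(";"))

print(language_counter)
```

In the script above, the equivalent change would be iterating over `data['LanguagesWorkedWith'].dropna()` instead of the raw column.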
At 9:30 you correct the numbers of the x-axis with plt.xticks()
Couldn't we just have circumvented that problem by saying
x_indexes = np.array(ages_x)
instead of
x_indexes = np.arange(len(ages_x))
Since that would have given us an array with the original numbers that we could add/subtract the width to/from?
Is there any benefit to the plt.xticks() solution (other than seeing how xticks work)?
I thought the exact same thing, why is he making his life more complicated than necessary? There is no problem with adding/subtracting offsets directly from the ages np.array, it just works. It makes it less hacky, too.
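A sketch of the approach suggested in the two comments above, assuming all series share the same numeric `ages_x`: offset directly from a NumPy array of the ages, so the axis already shows the real ages and no `plt.xticks()` call is needed. Only the positions are computed here:

```python
import numpy as np

ages_x = [25, 26, 27, 28, 29]
width = 0.25

x = np.array(ages_x)  # the real ages, not 0..n-1 indexes
left = x - width      # first series
middle = x            # second series
right = x + width     # third series

print(left)   # [24.75 25.75 26.75 27.75 28.75]
print(right)  # [25.25 26.25 27.25 28.25 29.25]
```

Each array then goes to `plt.bar(..., width=width)`. The index-plus-`xticks()` approach from the video is still the one to use when the x values are not numbers (for example category names).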
You're making machine learning interesting, thank you
Very nice your explanations. Congratulations.
Great tutorial sir
Thank you, professor. Love from India. You know what, I don't like reading the documentation anymore now that I've seen your videos.
You explain things really well, kudos!
You can also do this to get the languages and popularity lists:
languages = list(map(lambda x: x[0], language_counter.most_common(15)))
print(languages)
popularity = list(map(lambda x: x[1], language_counter.most_common(15)))
print(popularity)
At 6:11, even without replacing "ages_x" with "x_indexes" and applying plt.xticks(ticks=x_indexes, labels=ages_x) at 9:38, I get the same result, provided we convert ages_x from a list to an np.ndarray. With that approach we don't even need xticks().
Can we do that instead of what is shown or am i missing a point?
Thank you man, appreciate the effort and time you've put in creating such amazing content as these.
Thank you very much. It's a great tutorial as always.
Great video as always! Really helpful for detailed explanation.
Thank you for sharing your knowledge!
The best on YouTube 👏
These videos are great! Coming from R (and ggplot) I was a tad skeptical that Python could emulate R when it came to data viz, but I stand corrected.
You're right
Great, amazing video
thank you very much, very clear and straight to the point!
great tutorial! the best!! thanks for teaching us!
great instructor
This is the best Corey; Thank you very much from my 🧠 and ❣
At 9:11, since we adjusted the offsets for the different bars, why did the x-axis still show only 0, 1, 2...? Shouldn't it have shown (0-0.5), 0, 0+0.5?
Another great video from you, Corey. Thank you, you make my day every day!!
Amazing video !
For unpacking counter.most_common(x) you can use:
for a, b in counter.most_common(x): or for a, b in counter.items():
because both give you (key, count) tuples that are already "zipped",
meaning you can iterate over both values at once (a is tuple[0], b is tuple[1]).
I hope it helps you, yes, you out there.
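A short illustration of the loop unpacking described above, with made-up data. One difference worth noting: `most_common(n)` returns a list sorted by count and limited to `n` items, while `items()` is a view in insertion order, so the order of the pairs can differ:

```python
from collections import Counter

counter = Counter(["SQL", "Python", "Python", "JavaScript", "Python", "SQL"])

languages, popularity = [], []
# Each item is a (language, count) tuple, unpacked into two names
for language, count in counter.most_common(2):
    languages.append(language)
    popularity.append(count)
print(languages, popularity)  # ['Python', 'SQL'] [3, 2]

# items() yields the same kind of pairs, but in insertion order and unsorted
for language, count in counter.items():
    print(language, count)
```

For the bar chart, `most_common()` is the one to use, since it gives both the ranking and the cut-off.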
Great videos. I'm so grateful...
Such a great help, thankyou so much!
great video.
Another great video. Thanks!!
I don't understand why we needed NumPy. What's its purpose? (4:40)
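A small illustration of why NumPy is used at that point, assuming the goal is shifting every x position by the bar width: subtracting a float from a plain Python list raises a `TypeError`, while a NumPy array applies the subtraction to each element:

```python
import numpy as np

ages_x = [25, 26, 27]
width = 0.25

try:
    ages_x - width  # plain lists don't support element-wise arithmetic
except TypeError as err:
    print("list:", err)

x_indexes = np.arange(len(ages_x))  # array([0, 1, 2])
print(x_indexes - width)            # every index shifted left by 0.25
```

A list comprehension (as in the `map` comment earlier in the thread) also works; NumPy just makes the shift a single expression.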
Another great tutorial. Thank you. However, using a Jupyter Notebook, I am having a problem with plt.bar, plt.barh. The error I receive is "unsupported operand type(s) for -: 'str' and 'float'.
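One possible cause, assuming the x (or y) values passed to `plt.bar`/`plt.barh` are strings such as language names: as far as I know, Matplotlib only accepts string categories directly from version 2.1 onward, and an older version may try to subtract half the bar width from each string, which is exactly this `TypeError`. A version-independent sketch plots at numeric positions and attaches the names as tick labels; the data is made up, and the two plotting lines are left as comments so the sketch runs without a figure:

```python
languages = ["SQL", "Python", "JavaScript"]  # hypothetical, already reversed
popularity = [2, 3, 5]
height = 0.8

# Assumption: roughly what an older barh() ends up computing with strings
try:
    languages[0] - height / 2
except TypeError as err:
    print(err)  # unsupported operand type(s) for -: 'str' and 'float'

# Workaround: numeric positions 0..n-1, with the names as tick labels
positions = list(range(len(languages)))
print(positions)  # [0, 1, 2]
# plt.barh(positions, popularity)
# plt.yticks(positions, languages)
```

Upgrading Matplotlib (and checking it with `matplotlib.__version__`) is the other option.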
I just found the Python legend. Thank God.
At 8:20, how did you place cursors on 3 lines at the same time and type on them simultaneously? That's so handy.
hold ctrl-shift (mac) or ctrl-alt (windows)
thank you for python tutorial
Corey. Million thanks bro
Thank you for the series of video! :)
Thank you!!!! You are an excellent teacher.
Please do a tutorial on numpy as well, it would be super helpful, by the way awesome content😁
Thank you so much for your hard work! You are a great teacher and your video tutorials are a valuable resource :)
Thank you lot sir 😃
We can not thank you enough..still thanks a ton Corey.
I have an interesting observation at 9:48. In the plt.xticks(...) method, when I use the ticks and labels keywords, it gives me an AttributeError. It works when I pass the arguments without keywords. Perhaps it has something to do with my Matplotlib version...
Same happened with me
Yes, some older versions of Matplotlib have this problem.
Thank you guy, I had the same problem
Thank you very much. Please, please come back!
great tutorial, thanks
Great video. In my Jupyter notebook the bars are drawn on separate plots, not on the same plot, although I'm following the same steps. Am I missing something?
that feel when I paused tutorial to figure out how to extract languages and popularity from language_counter and later it turns out that you've done that exactly in the same way, lol
How can I also list the percentage values along the y-axis next to the language names, as in the plot on the Stack Overflow website (towards the end of the video)?
We can also use the dictionary's keys() and values() to get the x and y axes: x_axis = list(dict.keys())
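A sketch of building labels like the ones on the Stack Overflow chart, assuming the percentage means the share of all survey respondents who use each language; the counts and total below are made up. The resulting strings can be passed to `plt.barh` in place of `languages`:

```python
languages = ["SQL", "Python", "JavaScript"]  # hypothetical data
popularity = [50, 40, 70]
total_respondents = 100  # hypothetical; in practice, the number of survey rows

labels = [
    f"{lang} ({count / total_respondents:.1%})"
    for lang, count in zip(languages, popularity)
]
print(labels)  # ['SQL (50.0%)', 'Python (40.0%)', 'JavaScript (70.0%)']
```

To print values at the end of each bar instead, Matplotlib 3.4 and later provide `ax.bar_label()`.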
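A sketch of the keys()/values() approach, with one caveat: a plain Counter's keys are in insertion order, not sorted by popularity, so building a dict from `most_common(n)` first keeps the top-n ordering. Using a name other than `dict` also avoids shadowing the built-in type. The data is made up:

```python
from collections import Counter

language_counter = Counter(["SQL", "Python", "Python", "JavaScript", "Python", "SQL"])

top = dict(language_counter.most_common(2))  # dicts preserve insertion order
x_axis = list(top.keys())
y_axis = list(top.values())
print(x_axis, y_axis)  # ['Python', 'SQL'] [3, 2]
```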