Matplotlib Boxplots | Creating Single and Multiple Boxplots in Python
- Published 24 Aug 2021
- Matplotlib boxplots can be used for a variety of tasks which include: outlier detection, understanding the data range and distribution, and understanding whether the data is skewed. In this video, we take a look at creating basic boxplots with matplotlib, without the need for seaborn or any other high-level libraries.
We also look at creating boxplots of multiple columns with different ranges using a simple Python for loop.
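As a rough sketch of the loop-over-subplots idea described above (not the notebook from the video): each column gets its own axis, so columns with very different ranges do not squash each other. The DataFrame, its column names, and the value ranges are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical well-log style data; column names and ranges are assumptions
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "GR": rng.normal(75, 25, 200),
    "RHOB": rng.normal(2.45, 0.1, 200),
    "NPHI": rng.normal(0.25, 0.05, 200),
    "DT": rng.normal(90, 15, 200),
})

# One subplot per column, each with its own y-axis scale
fig, axes = plt.subplots(1, len(df.columns), figsize=(12, 4))

for i, ax in enumerate(axes.flat):
    ax.boxplot(df.iloc[:, i])
    ax.set_title(df.columns[i], fontsize=12, fontweight="bold")
    ax.set_xticks([])  # the default x tick label "1" carries no information here

fig.tight_layout()
fig.savefig("boxplots.png")
```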
If you haven't already, make sure you subscribe to the channel: / @andymcdonald42
If you have enjoyed this video and want to say thanks, feel free to buy me a coffee at the following link: buymeacoffee.com/andymcdonaldgeo
----
The notebook for this video can be found on my GitHub repository at: github.com/andymcdgeo/Andys_Y...
Libraries used in this video:
pandas: pandas.pydata.org
matplotlib: matplotlib.org
Books I Recommend:
As an Amazon Associate I earn from qualifying purchases. By buying through any of the links below I will earn commission at no extra cost to you.
PYTHON FOR DATA ANALYSIS: Data Wrangling with Pandas, NumPy, and IPython
UK: amzn.to/3HNycJ9
US: amzn.to/3DL7qPv
FUNDAMENTALS OF PETROPHYSICS
UK: amzn.to/3l1PgSf
PETROPHYSICS: Theory and Practice of Measuring Reservoir Rock and Fluid Transport Properties
UK: amzn.to/30UNWZS
US: amzn.to/3DNqBbd
WELL LOGGING FOR EARTH SCIENTISTS
UK: amzn.to/3FHsbfn
US: amzn.to/3CILAuE
GEOLOGICAL INTERPRETATION OF WELL LOGS
UK: amzn.to/3l2v2HV
US: amzn.to/30UOTkU
-----
Thanks for watching, if you want to connect you can find me at the links below:
/ andymcdonaldgeo
/ geoandymcd
/ andymcdonaldgeo
www.andymcdonald.scot/
Sign up to my newsletter at:
fabulous-founder-2965.ck.page...
#matplotlib #petrophysics #python #boxplots #welllogs #jupyternotebooks #geoscience - Science & Technology
I always know my answer is certain with Andy! Thank you for your great videos, I've learnt a lot from you. You're a genius
We are very lucky to be able to see these useful videos. Thank you Andy🙏🙏
Thanks. I am glad you like them!
I liked it so much! Please, keep doing such videos, you're saving my nerves..
Thanks. I have plenty more to come 😁
Thanks for the informative video.
super helpful thank you Andy
Happy to help
Thank you really helpful
Thank you so much for your video!
No worries!
Thank you Andy 👍🏻
Any time 👍
Thanks.
Hi Andy! I have been trying to make a box plot with a simple break in the y-axis and have not been able to. Any tips?
Lovely
Hi Andy, looking at the last exercise using subplots, would this still work if the columns had a different number of data points from each other? I've tried something similar to this video, except reading a simple CSV file containing a few columns of data, with some columns having more data points than others, and the box plots for the columns with fewer data points (padded with NaN) just don't show up at the end. Is there a way around this? If I plot the data separately or on the same graph (same axis) there is no problem; only the subplots with fewer data points fail to plot. Thanks
Hi S Lee. I am not 100% certain on this and would have to try. But some plots don't handle NaN values, and you unfortunately have to remove them by dropping them.
This seems to be the case in this Stack Overflow question, which sounds similar to what you are experiencing:
stackoverflow.com/questions/44305873/how-to-deal-with-nan-value-when-plot-boxplot-using-python
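A minimal sketch of the "drop the NaNs" approach suggested above, applied column by column so each subplot keeps all of its own valid points. As far as I know, matplotlib's `ax.boxplot` does not skip NaN values itself, which would explain the empty subplots. The DataFrame below is invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical columns of unequal length; the shorter ones are padded with NaN,
# which is what reading an uneven CSV with pandas typically produces
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "B": [2.0, 4.0, 6.0, np.nan, np.nan, np.nan],
    "C": [10.0, 12.0, np.nan, 14.0, 30.0, np.nan],
})

fig, axes = plt.subplots(1, len(df.columns), figsize=(9, 3))

counts = {}
for ax, col in zip(axes.flat, df.columns):
    values = df[col].dropna()  # drop NaN per column, not per row
    counts[col] = len(values)
    ax.boxplot(values)
    ax.set_title(f"{col} (n={len(values)})")

fig.tight_layout()
```

Dropping per column rather than calling `df.dropna()` on the whole frame avoids throwing away valid rows from the longer columns.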
Really nice video! Please share the original DataFrame you used.
Thank you! I am learning and your videos help a lot! I tried to use your code for my dataset, but I ran into an error and do not understand where the problem is.
TypeError Traceback (most recent call last)
in
4
5 for i, ax in enumerate(axes.flat):
----> 6 ax.boxplot(data1.iloc[:,i])
7 ax.set_title(data1.columns[i], fontsize=20, fontweight='bold')
8 ax.tick_params(axis='y', labelsize=14)
TypeError: unsupported operand type(s) for +: 'method' and 'float'
Man, you explained it so well
Thanks. I am glad it helped :)
Hi, thanks a lot for the content! I need help with a boxplot... Could you tell me how to show the points inside the boxplot and annotate a number for each point? I have a dataset of only 49 points
No problem. One way to do that is to add a jitter plot on top of the box plot. I'm not so sure annotating each point would be a good idea, as it may become too cluttered.
You can see an example here:
www.python-graph-gallery.com/36-add-jitter-over-boxplot-seaborn
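The linked example uses seaborn. A similar effect can be sketched with matplotlib alone by scattering the raw points with a small random horizontal offset around the box position. The sample values and the jitter width of 0.08 are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(50, 10, 49)  # hypothetical sample of 49 points

# A single boxplot is drawn at x = 1 by default, so jitter the points around 1
x_jitter = 1 + rng.uniform(-0.08, 0.08, size=values.size)

fig, ax = plt.subplots(figsize=(4, 5))
ax.boxplot(values, showfliers=False)  # hide flier markers; the scatter shows every point
ax.scatter(x_jitter, values, alpha=0.6, s=20, zorder=3)
ax.set_xticks([1])
ax.set_xticklabels(["sample"])
```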
Thank you!!
You're welcome!
Helped me out a lot, thank you! 😀😀😀
After identifying the outliers in the boxplot, what Python command can we use to remove them from our analysis?
Hi Chi, you can use a small piece of code, like the one below, to remove the outliers identified by the boxplot.
# Calculate the quartiles
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
# Calculate the IQR
IQR = Q3 - Q1
# Remove rows where any column falls outside the 1.5 * IQR fences
df_clean = df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]
Source: stackoverflow.com/questions/50461349/how-to-remove-outlier-from-dataframe-using-iqr
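The snippet above assumes every column in `df` is numeric. A self-contained variant that only tests the numeric columns, so that text columns such as a well name do not break the comparison, might look like the sketch below. The function name `remove_iqr_outliers`, the parameter `k`, and the sample data are all hypothetical.

```python
import pandas as pd

def remove_iqr_outliers(df, k=1.5):
    """Drop rows where any numeric column lies outside Q1 - k*IQR .. Q3 + k*IQR."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (numeric < (q1 - k * iqr)) | (numeric > (q3 + k * iqr))
    # Keep only rows with no flagged value in any numeric column
    return df[~is_outlier.any(axis=1)]

# Made-up example: one obvious spike in GR, plus a non-numeric column
df = pd.DataFrame({
    "GR": [40, 45, 50, 55, 60, 400],
    "WELL": ["A"] * 6,
})
df_clean = remove_iqr_outliers(df)
```

The factor `k=1.5` matches the default whisker length (`whis=1.5`) in matplotlib's boxplot, so the rows removed correspond to the points drawn as fliers.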
Really nice video! Thanks.
It would be great if you could also explain how to interpret the graphics. For instance, what does it mean to have a lot of outliers in the GR log?
Again, thank you very much.
I mean... probably some of them are just wrong data, but maybe some outliers represent a particular lithology.
Thanks Jose. I am planning to cover that in a small series on outlier detection in the near future. These initial videos are focusing on how to create the plots with Python.
I also covered this topic very briefly at this year's SPWLA conference and in more detail in my Data Quality paper, which you can find at the link below.
www.researchgate.net/publication/351607547_Data_Quality_Considerations_for_Petrophysical_Machine_Learning_Models
You are correct that some of the outliers could be incorrectly measured data, which could be a result of tool/sensor issues, borehole washout, system issues, etc. But they could also reflect a particular lithology; for example, a spike in the GR data may be caused by a hot sand or hot shale. That is why we need to treat these outlier detection methods with caution and use our domain expertise to make the final decision.
Hi Andy, Is there any way to remove the outliers?
Yes, there is. You can apply the boxplot equations to a dataframe and remove points that way: datascience.stackexchange.com/questions/54808/how-to-remove-outliers-using-box-plot
Hi, could it be that using a box plot and interquartile range is not always a good idea? For example, the formation can have any number of combinations, and fluid properties may vary too. That may result in a very wide data spread. Could a point outside the range be true and represent a unique rock type? Shouldn't we confirm that from the mud log?
Yes. That is very possible. Any outliers detected by these methods should always be checked to confirm that they are real outliers. When applying boxplots to petrophysical data I often do it by filtering for specific formations/ rock types.
The key is not to use one method in isolation. Same principle as not trying to do an analysis based on a single curve.
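One way to sketch the "filter by formation or rock type" idea is to compute the IQR fences within each group rather than over the whole dataset. The column names `FORMATION` and `GR`, the helper `flag_outliers`, and the values are invented for illustration.

```python
import pandas as pd

# Made-up data: a low-GR sand and a high-GR shale, each with one odd reading
df = pd.DataFrame({
    "FORMATION": ["Sand"] * 6 + ["Shale"] * 6,
    "GR": [30, 32, 35, 36, 38, 120, 110, 115, 120, 125, 130, 40],
})

def flag_outliers(series, k=1.5):
    """Return a boolean Series marking values outside the k * IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# Fences computed separately for each formation
df["GR_OUTLIER"] = (
    df.groupby("FORMATION")["GR"].transform(flag_outliers).astype(bool)
)

# Fences computed over all formations at once, for comparison
global_flags = flag_outliers(df["GR"])
```

In this made-up data the per-formation fences flag the 120 in the sand and the 40 in the shale, while the global fences flag nothing because the two populations together produce a very wide IQR. Any flagged point would still need checking against other data before it is removed.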
@AndyMcDonald42 In the case of this field, would you suggest doing the outlier analysis based on the geological age of the rock? This data is present in the dataset.
Also, is it possible to figure out whether the log data is processed or not? I mean, whether all the necessary corrections have been applied by the logging company? Just looking at the PEF data, I can see the mud has barite, and PEF readings are off the chart. It makes me think resistivity and other data might not have been corrected for the borehole environment either. That would definitely mess up the model training.
data["income"].plot(kind="box"); but it doesn't show me the y and x axes. Does anybody know why that is?
5:01 Even with the plt. command, the boxplot gets plotted but without the y and x axes...
I’m not sure. Have you checked over your data to make sure it’s ok and you are calling the correct column? I believe anything like nans should be handled by the plotting.
If you are still having trouble, Stack Overflow is a great place to get help, and it allows you to share your code and data, which you can't really do here.
@AndyMcDonald42 It was my dark theme. I had to do plt.figure(facecolor="white"). :D
@cypherecon5989 Glad you got it sorted. It's always the small things that catch us out. 😁
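For anyone hitting the same thing, the fix described above can be sketched as follows: the axes and tick labels were most likely being drawn in a dark colour on a dark or transparent figure background, so setting the figure face colour to white makes them visible. The `income` values are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({"income": [30, 35, 40, 42, 50, 55, 120]})  # hypothetical values

# White figure background so dark tick labels stay readable in a dark-themed editor
fig, ax = plt.subplots(facecolor="white")
data["income"].plot(kind="box", ax=ax)
```

Passing `ax=ax` explicitly just makes sure the pandas plot lands on the figure that was given the white background.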
I can send you the data and the type of boxplot I'm talking about
comment