Matplotlib Boxplots | Creating Single and Multiple Boxplots in Python

  • Published 24 Aug 2021
  • Matplotlib boxplots can be used for a variety of tasks which include: outlier detection, understanding the data range and distribution, and understanding whether the data is skewed. In this video, we take a look at creating basic boxplots with matplotlib, without the need for seaborn or any other high-level libraries.
    We also look at creating boxplots of multiple columns with different ranges using a simple Python for loop.
    If you haven't already, make sure you subscribe to the channel: / @andymcdonald42
    If you have enjoyed this video and want to say thanks, feel free to buy me a coffee at the following link: buymeacoffee.com/andymcdonaldgeo
    ----
    The notebook for this video can be found on my GitHub repository at: github.com/andymcdgeo/Andys_Y...
    Libraries used in this video:
    pandas: pandas.pydata.org
    matplotlib: matplotlib.org
    Books I Recommend:
    As an Amazon Associate I earn from qualifying purchases. By buying through any of the links below I will earn commission at no extra cost to you.
    PYTHON FOR DATA ANALYSIS: Data Wrangling with Pandas, NumPy, and IPython
    UK: amzn.to/3HNycJ9
    US: amzn.to/3DL7qPv
    FUNDAMENTALS OF PETROPHYSICS
    UK: amzn.to/3l1PgSf
    PETROPHYSICS: Theory and Practice of Measuring Reservoir Rock and Fluid Transport Properties
    UK: amzn.to/30UNWZS
    US: amzn.to/3DNqBbd
    WELL LOGGING FOR EARTH SCIENTISTS
    UK: amzn.to/3FHsbfn
    US: amzn.to/3CILAuE
    GEOLOGICAL INTERPRETATION OF WELL LOGS
    UK: amzn.to/3l2v2HV
    US: amzn.to/30UOTkU
    -----
    Thanks for watching, if you want to connect you can find me at the links below:
    / andymcdonaldgeo
    / geoandymcd
    / andymcdonaldgeo
    www.andymcdonald.scot/
    Sign up to my newsletter at:
    fabulous-founder-2965.ck.page...
    #matplotlib #petrophysics #python #boxplots #welllogs #jupyternotebooks #geoscience
  • Science & Technology
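The loop-based approach mentioned in the description (one boxplot per column, each with its own y-axis range) can be sketched roughly as follows. This is a minimal sketch rather than the notebook from the video: the DataFrame is synthetic, and the column names `GR`, `RHOB` and `DT` are assumptions chosen only to give three different value ranges.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumed): three columns with very different ranges
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "GR": rng.normal(75, 20, 200),
    "RHOB": rng.normal(2.45, 0.1, 200),
    "DT": rng.normal(90, 15, 200),
})

# One subplot per column so each box gets its own y-axis scale
fig, axes = plt.subplots(1, len(df.columns), figsize=(12, 5))

# Simple for loop: draw column i on subplot i
for i, ax in enumerate(axes.flat):
    ax.boxplot(df.iloc[:, i])
    ax.set_title(df.columns[i], fontsize=14, fontweight="bold")
    ax.tick_params(axis="y", labelsize=10)

fig.tight_layout()
```

Plotting all three columns on a single axis would squash `RHOB` into a flat line next to `GR`, which is why separate subplots are used here.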

COMMENTS • 44

  • @johnowusukonduah2305
    @johnowusukonduah2305 1 year ago +1

    I always know my answer is certain with Andy! Thank you for your great videos, I've learnt a lot from you. You're a genius

  • @mohammadkeshtkar9655
    @mohammadkeshtkar9655 2 years ago +4

    We are very lucky to be able to see these useful videos. Thank you Andy🙏🙏

  • @sabrinakadirova7084
    @sabrinakadirova7084 2 years ago +1

    I liked it so much! Please, keep doing such videos, you're saving my nerves..

  • @yippiyee1
    @yippiyee1 2 years ago

    Thanks for the informative video.

  • @mjones410
    @mjones410 2 years ago

    super helpful thank you Andy

  • @coldtea9755
    @coldtea9755 2 years ago

    Thank you really helpful

  • @timut1830
    @timut1830 2 years ago

    Thank you so much for your video!

  • @anamalbulushi5332
    @anamalbulushi5332 2 years ago

    Thank you Andy 👍🏻

  • @vito135c
    @vito135c 1 year ago +1

    Thanks.

  • @alirezarahnama2096
    @alirezarahnama2096 4 months ago

    Hi Andy! I have been trying to make a box plot with a simple break in the y-axis and have not been able to. Any tips?

  • @gamuchiraindawana2827
    @gamuchiraindawana2827 3 months ago

    Lovely

  • @slee3083
    @slee3083 2 years ago

    Hi Andy, looking at the last exercise using subplots, would this still work if the columns had different numbers of data points from each other? I've tried something similar to this video, except reading a simple CSV file containing a few columns of data, with some columns having more data points than others, and the box plots for the columns with fewer data points (padded with NaN) just don't show up at the end. Is there a way around this? If I plot the data separately or on the same graph (same axis) there is no problem; only the subplots with fewer data points don't plot at all. Thanks

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago

      Hi S Lee. I am not 100% certain on this and would have to try it. But some plots don't handle NaN values, and unfortunately you have to remove them by dropping them.
      This seems to be the case with this stackoverflow question which sounds similar to what you are experiencing
      stackoverflow.com/questions/44305873/how-to-deal-with-nan-value-when-plot-boxplot-using-python
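A minimal sketch of that workaround, assuming the issue is NaN padding in the shorter columns: drop the NaN values column by column just before plotting, so no rows are lost from the longer columns. The DataFrame and the column names `A` and `B` below are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative data (assumed): column B has fewer valid points, padded with NaN
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 50.0],
    "B": [2.0, 3.0, np.nan, np.nan, 4.0],
})

fig, axes = plt.subplots(1, len(df.columns), figsize=(8, 4))

for ax, col in zip(axes.flat, df.columns):
    # Drop NaNs for this column only, rather than df.dropna(),
    # which would also discard valid rows from the other columns
    values = df[col].dropna()
    ax.boxplot(values)
    ax.set_title(f"{col} (n={len(values)})")

fig.tight_layout()
```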

  • @balajig8522
    @balajig8522 2 years ago

    Really nice video! Please share the original DataFrame you used.

  • @annadomas2484
    @annadomas2484 1 year ago +1

    Thank you! I am learning and your videos help a lot! I tried to use your code on my dataset but ran into an error, and I do not understand where the problem is.
    TypeError Traceback (most recent call last)
    in
    4
    5 for i, ax in enumerate(axes.flat):
    ----> 6 ax.boxplot(data1.iloc[:,i])
    7 ax.set_title(data1.columns[i], fontsize=20, fontweight='bold')
    8 ax.tick_params(axis='y', labelsize=14)
    TypeError: unsupported operand type(s) for +: 'method' and 'float'

  • @kararshah6056
    @kararshah6056 1 year ago

    man u explained sooooooooooo good

  • @espanolaturitmoint
    @espanolaturitmoint 1 year ago

    Hi, thanks a lot for the content! I need help with a boxplot... Could you tell me how to show the points inside the boxplot and annotate a number for each point? I have a dataset of only 49 points.

    • @AndyMcDonald42
      @AndyMcDonald42  1 year ago

      No problem. One way to do that is to add a jitter plot on top of the box plot. I'm not so sure annotating each point would be a good idea, as it may become too cluttered.
      You can see an example here
      www.python-graph-gallery.com/36-add-jitter-over-boxplot-seaborn
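The linked example uses seaborn. A rough matplotlib-only equivalent, in keeping with the video, might look like the sketch below. The 49 values are randomly generated stand-ins, and the jitter width of 0.04 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumed): 49 points, as in the question
rng = np.random.default_rng(0)
values = rng.normal(50, 10, 49)

fig, ax = plt.subplots(figsize=(4, 6))

# Hide the boxplot's own outlier markers so points are not drawn twice
ax.boxplot(values, showfliers=False)

# A single boxplot sits at x = 1 by default; scatter the raw points
# around that position with a little random horizontal jitter
x = rng.normal(1, 0.04, size=len(values))
ax.scatter(x, values, alpha=0.6, s=20)
```

If numbering the points is still wanted, a loop calling `ax.annotate(str(i), (x[i], values[i]))` would add a label per point, although with 49 points the labels are likely to overlap.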

  • @nzambabignoumba445
    @nzambabignoumba445 2 years ago

    Thank you!!

  • @victorjohnlaobena7099
    @victorjohnlaobena7099 2 months ago

    Helped me out a lot, thank you! 😀😀😀

  • @chisoo6903
    @chisoo6903 2 years ago +1

    After identifying the outliers in the boxplot, what Python command could we use to remove them from our analysis?

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago +1

      Hi Chi, you can use a small piece of code, like the one below, to remove the outliers identified by the boxplot.
      # Calculate the quartiles
      Q1 = df.quantile(0.25)
      Q3 = df.quantile(0.75)
      # Calculate the IQR
      IQR = Q3 - Q1
      # Remove the outliers
      df_clean = df[~((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).any(axis=1)]
      Source: stackoverflow.com/questions/50461349/how-to-remove-outlier-from-dataframe-using-iqr
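To see what that snippet does, here is a small self-contained run on made-up numbers. The values and the column names `GR` and `RHOB` are assumptions for illustration, and the snippet assumes every column in `df` is numeric.

```python
import pandas as pd

# Made-up data (assumed): one obvious spike in GR, none in RHOB
df = pd.DataFrame({
    "GR": [60.0, 65.0, 70.0, 72.0, 75.0, 80.0, 400.0],
    "RHOB": [2.40, 2.42, 2.45, 2.47, 2.50, 2.52, 2.55],
})

# Per-column quartiles and interquartile range
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1

# True wherever a value falls outside the 1.5 * IQR limits,
# the same default rule matplotlib uses to place boxplot whiskers
is_outlier = (df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))

# Drop any row that has an outlier in at least one column
df_clean = df[~is_outlier.any(axis=1)]
print(df_clean)
```

Note that `.any(axis=1)` removes the whole row, so a spike in one column also discards the valid readings from the other columns at that row.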

  • @josedavidbastoaguirre2099
    @josedavidbastoaguirre2099 2 years ago

    Really nice video! Thanks.
    It would be great if you could also explain how to interpret the plots. For instance, what does it mean to have a lot of outliers in the GR log?
    Again, thank you very much.

    • @josedavidbastoaguirre2099
      @josedavidbastoaguirre2099 2 years ago +1

      I mean... probably some of them are just wrong data, but maybe some outliers represent a particular lithology.

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago +1

      Thanks Jose. I am planning to cover that in a small series on outlier detection in the near future. These initial videos are focusing on how to create the plots with Python.
      I also covered this topic very briefly at this year's SPWLA conference and in more detail in my Data Quality paper, which you can find at the link below.
      www.researchgate.net/publication/351607547_Data_Quality_Considerations_for_Petrophysical_Machine_Learning_Models
      You are correct that some of the outliers could be incorrectly measured data, which could be a result of tool/sensor issues, borehole washout, system issues...etc. But they could potentially reflect a particular lithology, for example a spike in the GR data may be caused by a hot sand/hot shale. That is why we need to treat some of these outlier detection methods with caution and also use our domain expertise to make the final decision.

  • @iliusmondal2098
    @iliusmondal2098 2 years ago

    Hi Andy, Is there any way to remove the outliers?

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago

      Yes there is. You can apply the boxplot equations to a dataframe and remove points that way : datascience.stackexchange.com/questions/54808/how-to-remove-outliers-using-box-plot

  • @19neetish
    @19neetish 1 year ago

    Hi, could it be that using a box plot and the interquartile range is not always a good idea? For example, the formation can have any number of combinations, and fluid properties may vary too. That may result in a very wide data spread. Could a point outside the range be genuine and represent a unique rock type? Shouldn't we confirm that from the mud log?

    • @AndyMcDonald42
      @AndyMcDonald42  1 year ago +1

      Yes. That is very possible. Any outliers detected by these methods should always be checked to confirm that they are real outliers. When applying boxplots to petrophysical data I often do it by filtering for specific formations/rock types.
      The key is not to use one method in isolation. Same principle as not trying to do an analysis based on a single curve.
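One way to sketch the per-formation idea is to compute the IQR limits within each group instead of over the whole well. This is only an illustration under assumptions: the column names `FORMATION` and `GR` and all of the values are invented, and flagged points would still need checking against other curves and domain knowledge.

```python
import pandas as pd

# Invented data (assumed): two formations with very different GR levels
df = pd.DataFrame({
    "FORMATION": ["A"] * 6 + ["B"] * 6,
    "GR": [20.0, 22.0, 24.0, 25.0, 27.0, 90.0,
           110.0, 115.0, 118.0, 120.0, 124.0, 30.0],
})

# Quartiles computed per formation and broadcast back to each row
grouped = df.groupby("FORMATION")["GR"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1

# Flag rather than delete, so each point can be reviewed before removal
df["GR_OUTLIER"] = (df["GR"] < q1 - 1.5 * iqr) | (df["GR"] > q3 + 1.5 * iqr)
print(df[df["GR_OUTLIER"]])
```

With these numbers, a single IQR computed over both formations together flags nothing, while the grouped version flags the 90 in formation A and the 30 in formation B.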

    • @19neetish
      @19neetish 1 year ago

      @@AndyMcDonald42 In the case of this field, would you suggest doing the outlier analysis based on the geological age of the rock? This data is present in the dataset.
      Also, is it possible to figure out whether the log data has been processed or not? I mean whether all the necessary corrections have been applied by the logging company. Just looking at the PEF data, I can see the mud has barite, and the PEF readings are off the chart. It makes me think the resistivity and other data might not have been corrected for the borehole environment either. That would definitely mess up the model training.

  • @cypherecon5989
    @cypherecon5989 2 years ago

    I used data["income"].plot(kind="box"); but it doesn't show me the y and x axes. Does anybody know why that is?

    • @cypherecon5989
      @cypherecon5989 2 years ago

      5:01 Even with the plt. command the boxplot gets plotted, but without the y and x axes...

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago

      I'm not sure. Have you checked over your data to make sure it's OK and that you are calling the correct column? I believe anything like NaNs should be handled by the plotting.
      If you are still having trouble, Stack Overflow is a great place to get help, and it allows you to share your code and data, which you can't really do here.

    • @cypherecon5989
      @cypherecon5989 2 years ago

      @@AndyMcDonald42 it was my dark theme. I had to do plt.figure(facecolor="white"). :D

    • @AndyMcDonald42
      @AndyMcDonald42  2 years ago +1

      @@cypherecon5989 Glad you got it sorted. It's always the small things that catch us out. 😁
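For anyone hitting the same thing, the fix described above can be sketched like this. The `income` values are invented, and the explanation is an assumption based on the thread: with a dark editor or notebook theme, the default dark tick labels can be invisible against the background unless the figure is given an explicit face colour.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented data (assumed) standing in for the commenter's "income" column
data = pd.DataFrame({"income": [32000, 41000, 45000, 52000, 58000, 120000]})

# Give the figure an explicit white background so the axes, ticks
# and labels stay readable whatever theme the notebook uses
fig, ax = plt.subplots(facecolor="white")
data["income"].plot(kind="box", ax=ax)
```

Calling `plt.figure(facecolor="white")` before plotting, as in the comment, is another way to set the figure background.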

  • @sanisalisu4929
    @sanisalisu4929 2 months ago

    I can send you the data and the type of boxplot I'm talking about.

  • @GreyHatGenX
    @GreyHatGenX 11 months ago

    comment