How to make a scatter plot with a trendline and stats in Python
- Published 14 Oct 2024
- Get a chart with a linear regression line of best fit and the equation of the line, the r-squared value and the p-value.
---------------------------------------------------------------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x = np.array([2,4,5,7,10])
y = np.array([1,5,4,8,11])
plt.scatter(x,y)
slope, intercept, rvalue, pvalue, stderr = stats.linregress(x,y)
plt.plot(x,slope*x+intercept)
plt.annotate("y=%.3fx+%.3f\nR$^2$=%.3f\np=%.3f"%(slope,intercept,rvalue**2,pvalue),xy=(0.15,0.7),xycoords='figure fraction')
plt.show()
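As a quick check on the annotated R² value, here is a minimal sketch that recomputes it from the residuals with the usual formula R² = 1 - SS_res/SS_tot and compares it with rvalue**2 from linregress. It reuses the x and y arrays from the script above; the names y_fit, ss_res, ss_tot and r_squared are only illustrative.

```python
import numpy as np
from scipy import stats

# Same sample data as in the script above
x = np.array([2, 4, 5, 7, 10])
y = np.array([1, 5, 4, 8, 11])

slope, intercept, rvalue, pvalue, stderr = stats.linregress(x, y)

# Predicted y values on the fitted line
y_fit = slope * x + intercept

# Residual and total sums of squares
ss_res = np.sum((y - y_fit) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)

# For a straight-line least-squares fit this should match rvalue**2
r_squared = 1 - ss_res / ss_tot
print(r_squared, rvalue ** 2)
```

The p-value reported by linregress is for a test whose null hypothesis is that the slope is zero, so a small p-value suggests the trend is unlikely to be due to chance alone.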
If you are doing this with dates, scipy can't handle dates, so you have to use:
import matplotlib.dates as mdates
new_x = mdates.date2num(x)
to convert your dates into numbers.
Then the slope will be in units of y per day.
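A minimal sketch of the date conversion described above, assuming the dates are Python datetime objects. The sample dates and y values are made up for illustration and are not from the video.

```python
from datetime import datetime

import matplotlib.dates as mdates
from scipy import stats

# Hypothetical sample data: four consecutive days, y rising by 2 each day
dates = [datetime(2024, 1, d) for d in (1, 2, 3, 4)]
y = [1.0, 3.0, 5.0, 7.0]

# date2num turns datetimes into floats measured in days
new_x = mdates.date2num(dates)

# Regress on the numeric dates; the slope is the change in y per day
result = stats.linregress(new_x, y)
slope = result.slope
print(f"slope = {slope:.3f} y-units per day")
```

To draw the trendline, you can plot the original dates on the x axis and use slope * new_x + intercept for the y values of the line.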
Thank you so much for the thorough tutorial!!! Love your voice!!!
Thanks, this gave a really nice process of construction. It got me playing for a few hours after watching, and here is my update:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def plot_regression(x, y, x_label='X', y_label='Y', title='Linear Regression'):
    # Accept lists as well as arrays so that slope * x works below
    x, y = np.asarray(x), np.asarray(y)
    # Scatter plot of the raw data
    plt.scatter(x, y, c='black', marker='x', label='Data Points', alpha=0.7, s=20)
    # Linear regression with dark blue line and narrower width
    slope, intercept, rvalue, pvalue, stderr = stats.linregress(x, y)
    plt.plot(x, slope * x + intercept, c='darkblue', label='Regression Line', linewidth=1)
    # 1st to 100th percentiles of x and y, overlaid in red
    quantile_values = np.arange(1, 101)
    x_quantiles = np.percentile(x, quantile_values)
    y_quantiles = np.percentile(y, quantile_values)
    plt.scatter(x_quantiles, y_quantiles, c='red', marker='x', s=20, label='Quantiles')
    # Annotation
    annotation_text = (f"y = {slope:.3f}x + {intercept:.3f}\n"
                       f"R$^2$ = {rvalue**2:.3f}\n"
                       f"p = {pvalue:.3f}")
    plt.annotate(annotation_text, xy=(0.15, 0.7), xycoords='axes fraction', fontsize=10,
                 bbox=dict(facecolor='white', edgecolor='black', boxstyle='square,pad=0.5'))
    # Labels, title, and legend
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.title(title)
    # Position legend outside the plot area to the right
    plt.legend(loc='upper left', bbox_to_anchor=(1, 1))
    # Add light grid lines
    plt.grid(True, which='both', linestyle='--', linewidth=0.5, color='grey', alpha=0.5)
    # Adjust layout to accommodate the legend
    plt.tight_layout(rect=[0, 0, 0.85, 1])
    # Show plot
    plt.show()
How can I do this for a log-log plot?
numpy.log10(x) will take the base-10 log of everything in your x array, and you can do the same with y. numpy.log(x) will take the natural log (ln). If you need another base, there is numpy.log2(x), or you can divide by the log of the base, e.g. numpy.log(x)/numpy.log(b). I think the rest should be the same; you just need to make it clear that you are plotting and doing statistics on the log of your data.
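A minimal sketch of that log-log approach, assuming all x and y values are positive. The power-law data here (y = 3 * x**2) is hypothetical and chosen only so the result is easy to read: in log space the fitted slope is the exponent and 10**intercept is the prefactor.

```python
import numpy as np
from scipy import stats

# Hypothetical data following a power law y = 3 * x**2
x = np.array([1.0, 10.0, 100.0, 1000.0])
y = 3.0 * x ** 2

# Take base-10 logs of both variables before fitting
log_x = np.log10(x)
log_y = np.log10(y)

slope, intercept, rvalue, pvalue, stderr = stats.linregress(log_x, log_y)

# The fit is log10(y) = slope * log10(x) + intercept,
# which is the same as y = 10**intercept * x**slope
print(f"exponent = {slope:.3f}, prefactor = {10 ** intercept:.3f}")
```

To plot it, you can pass log_x and log_y to plt.scatter and plt.plot in place of x and y, and label the axes as log10(x) and log10(y) so it is clear the statistics refer to the logged data.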