Pdf Plotting Concern
I tried the following manual approach: dict = {'id': ['a','b','c','d'], 'testers_time': [10, 30, 15, None], 'stage_1_to_2_time': [30, None, 30, None], 'activated_time' : [40, None,
Solution 1:
The first approach gives you a probability mass function. The second gives you a probability density - hence the name probability density function (pdf). Hence both are correct, they just show something different.
If you evaluate the pdf over a larger range (e.g. 10 times the standard deviation), it will look much like an expected gaussian curve.
import pandas as pd
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt
dict = {'id': ['a','b','c','d'], 'testers_time': [10, 30, 15, None], 'stage_1_to_2_time': [30, None, 30, None], 'activated_time' : [40, None, 45, None],'stage_2_to_3_time' : [30, None, None, None],'engaged_time' : [70, None, None, None]}
df = pd.DataFrame(dict, columns=['id', 'testers_time', 'stage_1_to_2_time', 'activated_time', 'stage_2_to_3_time', 'engaged_time'])
df= df.dropna(subset=['testers_time']).sort_values('testers_time')
mean = np.mean(df['testers_time'])
std = np.std(df['testers_time'])
x = np.linspace(mean - 5*std, mean + 5*std)
fit = stats.norm.pdf(x, mean, std)
print(fit)
plt.plot(x, fit, marker='.', linestyle='-')
plt.hist(df['testers_time'], normed='true')
plt.show()
Post a Comment for "Pdf Plotting Concern"