Pdf Plotting Concern

March 31, 2024 Post a Comment

I tried the following manual approach: dict = {'id': ['a','b','c','d'], 'testers_time': [10, 30, 15, None], 'stage_1_to_2_time': [30, None, 30, None], 'activated_time' : [40, None,

Solution 1:

The first approach gives you a probability mass function. The second gives you a probability density - hence the name probability density function (pdf). Hence both are correct, they just show something different.

If you evaluate the pdf over a larger range (e.g. 10 times the standard deviation), it will look much like an expected gaussian curve.

import pandas as pd
import scipy.stats as stats
import numpy as np
import matplotlib.pyplot as plt

dict = {'id': ['a','b','c','d'], 'testers_time': [10, 30, 15, None], 'stage_1_to_2_time': [30, None, 30, None], 'activated_time' : [40, None, 45, None],'stage_2_to_3_time' : [30, None, None, None],'engaged_time' : [70, None, None, None]} 
df = pd.DataFrame(dict, columns=['id', 'testers_time', 'stage_1_to_2_time', 'activated_time', 'stage_2_to_3_time', 'engaged_time'])

df= df.dropna(subset=['testers_time']).sort_values('testers_time')

mean = np.mean(df['testers_time'])
std = np.std(df['testers_time'])
x = np.linspace(mean - 5*std, mean + 5*std)

fit = stats.norm.pdf(x, mean, std)  
print(fit)

plt.plot(x, fit, marker='.', linestyle='-')
plt.hist(df['testers_time'], normed='true')      

plt.show()

Python stackoverflow Examples

Pdf Plotting Concern

Solution 1:

Post a Comment for "Pdf Plotting Concern"