
Python, Pandas & Chi-Squared Test Of Independence

I am quite new to Python as well as Statistics. I'm trying to apply the Chi Squared Test to determine whether previous success affects the level of change of a person (percentage w

Solution 1:

A few corrections:

  • Your expected array is not correct. Each expected count is (row total x column total) / grand total, so you must divide by observed.sum().sum(), which is 1284, not 1000.
  • For a 2x2 contingency table such as this, the degrees of freedom is 1, not 8.
  • Your calculation of chi_squared_stat does not include a continuity correction. (Omitting it is not necessarily wrong; that is a judgment call for the statistician.)
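As a rough sketch of what the corrected manual calculation might look like (without the continuity correction), assuming observed is a pandas DataFrame holding the 2x2 table from the question. The variable names row_totals, col_totals, grand_total and p_value are illustrative and not taken from the original code:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

# Observed 2x2 contingency table (values from the question).
observed = pd.DataFrame(
    [[129.3, 260.17], [182.7, 711.83]],
    index=["Yes - changed strategy", "No"],
    columns=["Previously Successful", "Previously Unsuccessful"],
)

row_totals = observed.sum(axis=1).to_numpy()
col_totals = observed.sum(axis=0).to_numpy()
grand_total = observed.sum().sum()  # 1284, the divisor for the expected counts

# Expected count for each cell: (row total * column total) / grand total.
expected = np.outer(row_totals, col_totals) / grand_total

# Degrees of freedom: (rows - 1) * (columns - 1), which is 1 for a 2x2 table.
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

# Pearson chi-squared statistic, with no continuity correction.
chi_squared_stat = ((observed.to_numpy() - expected) ** 2 / expected).sum()

# p-value: upper tail of the chi-squared distribution with dof degrees of freedom.
p_value = chi2.sf(chi_squared_stat, dof)

print(chi_squared_stat, p_value, dof)
```

These values should agree with what chi2_contingency(observed, correction=False) reports.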

All the calculations that you perform by hand (expected matrix, test statistic, degrees of freedom, p-value) are also computed by scipy.stats.chi2_contingency:

In [65]: observed
Out[65]: 
                        Previously Successful  Previously Unsuccessful
Yes - changed strategy                  129.3                   260.17
No                                      182.7                   711.83

In [66]: from scipy.stats import chi2_contingency

In [67]: chi2, p, dof, expected = chi2_contingency(observed)

In [68]: chi2
Out[68]: 23.383138325890453

In [69]: p
Out[69]: 1.3273696199438626e-06

In [70]: dof
Out[70]: 1

In [71]: expected
Out[71]: 
array([[  94.63757009,  294.83242991],
       [ 217.36242991,  677.16757009]])

By default, chi2_contingency uses a continuity correction when the contingency table is 2x2. If you prefer to not use the correction, you can disable it with the argument correction=False:

In [73]: chi2, p, dof, expected = chi2_contingency(observed, correction=False)

In [74]: chi2
Out[74]: 24.072616672232893

In [75]: p
Out[75]: 9.2770200776879643e-07
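To make the difference between the two results concrete, here is a minimal sketch of the Yates continuity correction done by hand: each |observed - expected| is reduced by 0.5 before squaring. It assumes every cell differs from its expected count by at least 0.5, which holds for this table; the names diff, yates_stat and yates_p are illustrative only.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Observed 2x2 table from the question.
observed = np.array([[129.3, 260.17], [182.7, 711.83]])

# Expected counts under independence.
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Yates correction: shrink each absolute deviation by 0.5 before squaring.
# Assumption: all deviations are at least 0.5, as they are for this table.
diff = np.abs(observed - expected)
yates_stat = ((diff - 0.5) ** 2 / expected).sum()
yates_p = chi2.sf(yates_stat, 1)  # 1 degree of freedom for a 2x2 table

# Compare with SciPy's default (correction=True) result.
scipy_stat, scipy_p, _, _ = chi2_contingency(observed)
print(yates_stat, scipy_stat)
print(yates_p, scipy_p)
```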

Solution 2:

Degrees of freedom = (rows - 1) x (columns - 1). For a 2x2 table that is (2 - 1) x (2 - 1) = 1.
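The same formula applies to larger tables. The sketch below uses a hypothetical helper, degrees_of_freedom, and a made-up 3x4 table of counts (neither is from the question) to check the formula against the dof value that chi2_contingency returns:

```python
import numpy as np
from scipy.stats import chi2_contingency


def degrees_of_freedom(table):
    """Return (rows - 1) * (columns - 1) for a 2-D contingency table."""
    n_rows, n_cols = np.asarray(table).shape
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical 3x4 table, used only to illustrate the formula.
example = np.array([
    [10, 20, 30, 40],
    [15, 25, 35, 45],
    [20, 30, 40, 50],
])

manual_dof = degrees_of_freedom(example)       # (3 - 1) * (4 - 1)
_, _, scipy_dof, _ = chi2_contingency(example)  # dof as computed by SciPy
print(manual_dof, scipy_dof)
```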

