Ok so we have to do some statistics next week. Chi Square, how difficult can it be? I have seldom used Chi-square, as my work has been mainly with repeated data, before and after, t-test or the non-parametric, regressions and correlations, and the occasional ANOVA (for which one would usually ask a professional’s help). But Chi square is one of those simpler ones…contingency tables…like Mendelian tables. Puh.

So I open the Zar chapter, but the formulas and the numbers just make my head spin.

C’mon guys. This is 21st Century. We have software to do this! Let me show the students the Way!

Um, software.

There are tons of statistics software out there. I have personally used SPSS, Origin, and GraphPad Prism. I have also used Excel, although for more complex stuff I used macros developed by others. I liked GraphPad a lot, because it was so easy. And I know there are trial versions out there. We can do this!

Download the trial. Go to page 470 of the book. How easy can it be? Yellow and green flowers. Observed versus expected.

Data table in Prism

After that, I click Analyze and choose Chi-Square (or I can also click directly on the Chi sign).

Chi square analysis of example 22.1 of the Zar book in Prism.

Easy, right? It is not significant. The flowers have the same distribution. Nice.

Look back into the book. Darn. The Chi-square value is different. It says it is different from the expected distribution. Darn. I must have put the numbers wrong.

I will save you the next half an hour. I invert the columns. I read the book and it warns about computers doing it with only 1 degree of freedom, so I add Yates’ correction. Still, the value is different. A faint hope arises in me. I have seen errors in textbooks. Maybe…maybe. Whip up the calculator and follow the instructions.

This is not so difficult after all. I am just subtracting expected from observed, squaring the result, and then dividing by expected. Then I sum both. Yes, indeed it is 4.32. Look up the corresponding Chi value in the table. I don’t remember when was the last time I actually used a Chi-square table. Indeed, it is 3.84. Yep, it is lower. Yep, then the observed distribution is different (alternate hypothesis accepted).

Oh well. Let’s try Excel. At the end of the day, most people have access to Excel. Put in the numbers. It is not as easy as Prism, as I have to insert a function and choose among several that start with CHI but settle for CHISQ.TEST. However the data input is clearer, I have to choose between the actual and the expected range. Highlight, click Enter. A number comes up, 0.03766692. What the heck?

Another half an hour. I look up the Help section. They have an example, which I run dutifully. In the comments, it gives me the Chi square value and the probability. However, I only get the probability number:

CHISQ.TEST returns the probability that a value of the χ2 statistic at least as high as the value calculated by the above formula could have happened by chance under the assumption of independence. 

Long story short, I decide to run (completely humbled by now) Zar’s formula on the Excel example. As I crunch the numbers using the formula bar, I realize I can set up a nice template for the class to use, although I would still prefer they go through the number crunching themselves. I get the correct Chi value. Then I realize that whoever wrote the Help section was not very helpful: the probability given (which is less than 0.05) means that there is no “chance” in the calculated Chi square being higher = it is indeed independent. Why not writing it in a simple straightforward way?

Emboldened I run the green-yellow numbers in Excel using my little setup. Then, even cockier, I run example 22.3 without looking at the result. I am not doing it step by step anymore, I insert the complete calculation for one set and then just copy it for the rest. I get the correct result and I am content. Not only because it turned it ok but I have actually refreshed the knowledge, and it came back to me.

Excel analysis of examples 22.2/Zar, the Chi-square example in Excel, and the 22.3/Zar.

But what about Prism? I am sad. I decide to run the Excel example numbers, and the results come out ok. However, Zar’s 22.3 comes out wrong. I poke around in the internet and see reports of some bugs. That said, I have lost faith in it, at least for these calculations. If you know what is going on let me know!

What are the learning experiences from this episode?

  1. Thou shall be humble. I have a story about it but I save it for the next time.
  2. If you want to learn something well, learn from the bottom, ideally from scratch. This is a big deal, by the way. So much in science these days is done using kits and ready-made stuff that we forget or never learn the principle underneath. If everything goes well, it’s fine, but if it does not, how do you troubleshoot?
  3. Use more than one method when experimenting. Address different angles.
  4. Follow the scientific method- if there is a question, formulate a hypothesis and test it.
  5. Do blind tests to double check your results.
  6. Double-check, double-check, double-check.

I wrote this down to share with you a problem-solving experience, something we will be start doing next week. Being very critical of methods and thorough with data is absolute necessity of science. If you think this is something that only happens to beginners, check this Nature article out. Sloppiness, unfortunately, is becoming common, especially in the current very competitive Publish or perish culture.

Anyway, get ready to play with Excel next week! It should be fun 🙂