Lies, darn lies, and statistics

OK, so we have to do some statistics next week. Chi-square, how difficult can it be? I have seldom used chi-square, as my work has been mainly with repeated measures (before-and-after data, analyzed with the t-test or its non-parametric counterpart), regressions and correlations, and the occasional ANOVA (for which one would usually ask for a professional’s help). But chi-square is one of the simpler ones… contingency tables… like Mendelian tables. Puh.

So I open the Zar chapter, but the formulas and the numbers just make my head spin.

C’mon guys. This is the 21st century. We have software to do this! Let me show the students the Way!

Um, software.

There are tons of statistics packages out there. I have personally used SPSS, Origin, and GraphPad Prism. I have also used Excel, although for more complex stuff I used macros developed by others. I liked GraphPad a lot, because it was so easy. And I know there are trial versions out there. We can do this!

Download the trial. Go to page 470 of the book. How easy can it be? Yellow and green flowers. Observed versus expected.

Data table in Prism

After that, I click Analyze and choose Chi-Square (or I can also click directly on the Chi sign).

Chi-square analysis of Example 22.1 from the Zar book in Prism.

Easy, right? It is not significant. The observed distribution does not differ from the expected one. Nice.

Look back into the book. Darn. The chi-square value is different, and the book says the observed distribution differs from the expected one. Darn. I must have entered the numbers wrong.

I will save you the next half hour. I swap the columns. I read the book, and it warns about computer calculations with only 1 degree of freedom, so I add Yates’ correction. Still, the value is different. A faint hope arises in me. I have seen errors in textbooks before. Maybe… maybe. Whip out the calculator and follow the instructions.

This is not so difficult after all. I am just subtracting expected from observed, squaring the result, and dividing by expected. Then I sum the two values. Yes, indeed it is 4.32. Look up the critical value in the table. I don’t remember the last time I actually used a chi-square table. For 1 degree of freedom at the 0.05 level, it is indeed 3.84. Yep, that is lower than 4.32. Yep, so the observed distribution is different from the expected one (null hypothesis rejected, alternative hypothesis accepted).
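The hand calculation is easy to script. Below is a minimal Python sketch, not Zar’s or any software package’s implementation. The counts (84 yellow and 16 green, tested against a 3:1 ratio) are my assumption, picked because they reproduce the 4.32 above; check them against the book. The `yates` flag applies Yates’ continuity correction, and 3.841 is the tabulated critical value for 1 degree of freedom at α = 0.05.

```python
# Chi-square goodness-of-fit statistic, computed the "by hand" way:
# for each category, (observed - expected)^2 / expected, then sum.

def chi_square_gof(observed, expected, yates=False):
    """Return the chi-square statistic for observed vs. expected counts.

    If yates is True, apply Yates' continuity correction: each category
    contributes (|O - E| - 0.5)^2 / E, with |O - E| - 0.5 floored at 0.
    The correction is meant for tests with 1 degree of freedom.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        diff = abs(o - e)
        if yates:
            diff = max(diff - 0.5, 0.0)
        total += diff ** 2 / e
    return total


# ASSUMED counts (check against the book): 84 yellow, 16 green,
# expected 3:1, i.e. 75 and 25 out of 100 plants.
observed = [84, 16]
expected = [75.0, 25.0]

CRITICAL_1DF_005 = 3.841  # table value for 1 df, alpha = 0.05

chi2 = chi_square_gof(observed, expected)
chi2_yates = chi_square_gof(observed, expected, yates=True)
print(f"chi-square = {chi2:.2f}")                      # 4.32
print(f"with Yates' correction = {chi2_yates:.3f}")    # 3.853
print("reject H0" if chi2 > CRITICAL_1DF_005 else "do not reject H0")
```

With these assumed counts, Yates’ correction pulls the statistic down to about 3.85, which is still just above 3.84, so the conclusion does not change, but it becomes a close call.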

Oh well. Let’s try Excel. At the end of the day, most people have access to Excel. Put in the numbers. It is not as easy as Prism, as I have to insert a function and choose among several that start with CHI; I settle on CHISQ.TEST. However, the data input is clearer: I have to select the actual range and the expected range. Highlight, press Enter. A number comes up: 0.03766692. What the heck?
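That number turns out to be a probability, not a chi-square value. As a sketch of what such a function does (not Excel’s actual code), the hypothetical `chisq_test_like` below computes the statistic and then returns only the upper-tail p-value, assuming a single row or column of counts so that the degrees of freedom are the number of categories minus 1. `chi2_sf` uses the closed-form chi-square tail probability for integer degrees of freedom, built from the standard library’s `math.erfc`, `math.exp`, and `math.gamma`. The counts are the same assumed 84/16 versus 75/25 as the green-yellow example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with a
    positive integer number of degrees of freedom (closed-form series)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #   + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chisq_test_like(actual, expected):
    """Hypothetical analogue of a CHISQ.TEST-style call for one row or
    column of counts: returns the p-value, NOT the chi-square statistic."""
    if len(actual) != len(expected) or len(actual) < 2:
        raise ValueError("need two equal-length lists with at least 2 cells")
    stat = sum((a - e) ** 2 / e for a, e in zip(actual, expected))
    df = len(actual) - 1
    return chi2_sf(stat, df)


p = chisq_test_like([84, 16], [75.0, 25.0])
print(f"p = {p:.8f}")
```

With those assumed counts it prints roughly 0.0377, in line with the number Excel gave; below 0.05, so the same conclusion as the hand calculation.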

Another half hour. I look up the Help section. It has an example, which I run dutifully. Its comments give both the chi-square value and the probability. However, I only get the probability:

CHISQ.TEST returns the probability that a value of the χ² statistic at least as high as the value calculated by the above formula could have happened by chance under the assumption of independence.

Long story short, I decide (completely humbled by now) to run Zar’s formula on the Excel example. As I crunch the numbers in the formula bar, I realize I can set up a nice template for the class to use, although I would still prefer that they go through the number crunching themselves. I get the correct chi-square value. Then I realize that whoever wrote the Help section was not very helpful: the probability given (which is less than 0.05) means that a chi-square value this high would be unlikely to arise by chance if the variables were independent, so independence is rejected. Why not write it in a simple, straightforward way?
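For a contingency table like the one in the Excel Help example, the expected counts are not typed in but derived from the margins: row total × column total / grand total, with (rows − 1) × (columns − 1) degrees of freedom. The sketch below uses a hypothetical 2 × 3 table, not the Help example’s numbers. For exactly 2 degrees of freedom the upper-tail probability has the simple form exp(−χ²/2), which the code relies on, so it only covers that case.

```python
import math


def expected_counts(table):
    """Expected count for each cell under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# HYPOTHETICAL 2 x 3 table (rows: two groups; columns: agree/neutral/disagree).
table = [[30, 10, 10],
         [15, 20, 15]]

stat, df = chi_square_independence(table)
assert df == 2  # the closed-form p-value below is valid only for df == 2
p = math.exp(-stat / 2)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
if p < 0.05:
    print("Unlikely under independence: reject H0 (the variables look associated)")
else:
    print("Consistent with independence: do not reject H0")
```

Here the statistic is about 9.33 and p is about 0.0094. A p-value that small says the data would be surprising if the two variables were independent, which is why independence is rejected rather than confirmed.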

Emboldened, I run the green-yellow numbers in Excel using my little setup. Then, even cockier, I run Example 22.3 without looking at the result. I am not doing it step by step anymore: I enter the complete calculation for one set and then just copy it to the rest. I get the correct result, and I am content. Not only because it turned out OK, but because I have actually refreshed the knowledge, and it came back to me.

Excel analysis of Zar’s Example 22.2, the chi-square example from Excel Help, and Zar’s Example 22.3.

But what about Prism? I am sad. I run the Excel example numbers, and the results come out OK. However, Zar’s Example 22.3 comes out wrong. I poke around on the internet and see reports of some bugs. Either way, I have lost faith in it, at least for these calculations. If you know what is going on, let me know!

What are the lessons learned from this episode?

  1. Thou shalt be humble. I have a story about that, but I will save it for next time.
  2. If you want to learn something well, learn it from the bottom up, ideally from scratch. This is a big deal, by the way. So much in science these days is done using kits and ready-made tools that we forget, or never learn, the underlying principle. If everything goes well, that’s fine, but if it does not, how do you troubleshoot?
  3. Use more than one method when experimenting. Address different angles.
  4. Follow the scientific method: if there is a question, formulate a hypothesis and test it.
  5. Do blind tests to double check your results.
  6. Double-check, double-check, double-check.

I wrote this down to share a problem-solving experience with you, something we will start doing next week. Being very critical of methods and thorough with data is an absolute necessity in science. If you think this only happens to beginners, check out this Nature article. Sloppiness, unfortunately, is becoming common, especially in the current, very competitive publish-or-perish culture.

Anyway, get ready to play with Excel next week! It should be fun 🙂

My favorite protein assignment: tumor necrosis factor (TNF)

This is an example of how to organize the protein assignment page.

Human tumor necrosis factor

Tumor necrosis factor (also called cachectin, TNF-alpha, or TNFSF2) is a cytokine originally described as a mediator of septic shock, but it is now considered a master regulator of cell death, survival, and organogenesis.

Tumor necrosis factor (TNF; also named TNFα) is a type II transmembrane protein with an intracellular amino terminus. It has signaling potential both as a 26 kDa membrane-integrated protein and as a soluble cytokine released after cleavage by the protease TACE; its soluble form is a trimer of 17 kDa subunits. There are two TNF receptors: TNFR1, which is found on most cells in the body, and TNFR2, which is primarily expressed on cells of hematopoietic origin. TNFR1 is activated by both forms of TNF, while TNFR2 primarily binds transmembrane TNF. TNF receptors can also be shed and act as soluble TNF-binding proteins, competing with cell-surface receptors for free ligand and thus inhibiting TNF action (Locksley et al., 2001; Hehlgans et al., 2005).

The signaling pathways mediated by the two receptors diverge slightly. TNFR1 is considered to mediate more systemic effects; depending on context, its activation can lead to cell proliferation or cell death. In contrast to TNFR1, TNFR2 lacks a death domain. Its biological role is still not fully understood, although recent evidence suggests that it can modulate the actions of TNFR1 on immune and endothelial cells. Transmembrane TNF can function as both ligand and receptor: soluble TNF receptors can bind to the cytokine on the cell surface and generate reverse signaling (Balkwill, 2009).

TNF Sequence (from PDB)

Gene

The human TNF gene (TNFA) was cloned in 1985 (Lloyd et al., 1985). It maps to the short arm of chromosome 6 (6p21.3), close to or within the MHC (major histocompatibility complex) region. It spans about 3 kb and contains 4 exons. The 3′ UTR of the TNF-alpha transcript contains an AU-rich element (ARE), providing a means of post-transcriptional control.

Protein

The protein is translated as a 233-amino-acid (26 kDa) type II transmembrane protein, which is then cleaved by the protease TACE (ADAM17). Both forms exist as trimers. The 17 kDa TNF protomers (185 amino acids long) are composed of two antiparallel β-pleated sheets with antiparallel β-strands, forming a ‘jelly roll’ β-structure typical of the TNF family.

Here are two renderings of TNF from the PDB site. For 3D effects, however, you may want to visit the Jmol view.

PDB rendering of TNF

molecular view of TNF

Another rendering of TNF

Protein Data Bank (PDB) entry: 1TNF

Uniprot entry: P01375 

NCBI RefSeq: NP_000585.2

How to critically read an article

OK, so we had the first class meeting yesterday. It went well, although that genome review article was a bit heavy on information and took too long to discuss. My plan for today is to organize the next discussion so that it is more structured: a short intro/review, and then assigning students roles for the discussion.

Which makes me think about a completely different thing, but along similar lines. In the conflict resolution workshops I regularly facilitate, we do many communication exercises. One of the most effective ones is called four-part listening, in which participants are divided into small groups and listen to a story narrated by one person in the group. These stories are usually personal experiences of conflict. The other members listen for different aspects of the story: one focuses on the facts, another on the emotions, and the third on the values implied in the story. At the end, the story is analyzed by putting all three aspects together.

So maybe that could also be a good way to analyze a scientific article: looking at different aspects, including what is done, why it is done, how it is done, and which major topics it touches.

And most importantly, a CRITICAL reading. Of course it is easiest for those who are in the field, but even an outsider has tools to analyze a scientific article.

<Disclaimer: I am not an expert in this particular field, so these tools are limited to my general bio expertise. I have no inside knowledge, so I may miss lots of details and nuances only obvious to the specialist. However, these are my recommendations to a Bio grad student confronted with a peer reviewed specialized article.>

Above is a YouTube video about ENCODE. It is a bit long, but it can give you a quick idea of the project. And it makes this post more colorful 🙂

OK, back to the tools of critical reading. These include, but are not limited to:

  • Background of the article/project. For example, yesterday’s gene definition article talks about the ENCODE project. One has to look up what exactly the project is, and a quick internet search immediately brought up the ENCODE website.
  • Methods. This article was, as many of you correctly noted, a review/opinion article. There is a quick reference to the methodology involved (tiling arrays). The article mentions the methodological complexities and cites several references, such as Emanuelsson et al., 2007, Rozowsky et al., 2007, and the original ENCODE report. A quick look at the Wikipedia article gives an explanation of the method and a whole series of references. Do you need to look them all up? Well, not right now, but if you are going DEEP into a topic, you have to look up the original references. Thou shalt not trust a review article’s summary of a methodology. And methodology is sometimes critical. Plenty of strange published data are due to contaminated cultures, incorrectly bred mouse lines, and a lack of good controls. Has anybody heard of the XMRV and chronic fatigue syndrome fiasco? The arsenic life controversy? There are still people out there who doubt that prions exist.
  • The basics. Quiz yourself while reading. Consider it a mental workout. When you read about concepts that you know you have seen before, pause before going on. Splicing. Transposons. UTRs. Try to recall what you remember. Look them up if needed. Read what the article says. Things may have changed since you last studied them. Take a mental note, and try rewiring those old synapses. It is hard, I know.
  • The new stuff. Well, you have to read, double-check, and enhance your mind.
  • Afterlife of the article. If you go to the article’s official page and scroll down, you will find its official afterlife: related articles and articles that have cited it. Those can mean good things (the article is relevant and many consider it a good reference) or bad things (the article is crappy and there are a lot of comments and rebuttals). In this article’s case, some cite it positively, while others consider it reductionist. To see more personal takes on articles, google the article and look for blog posts about it (once you know the field, you know whose blogs to look for), such as this one, which is skeptical of it (similar to the discussion we had in class about possible artifacts), although the comments are quite interesting to read too. By the way, I subscribed to this particular blog (Sandwalk). OMG, those Monday Molecule riddles! Very proud I figured out B12. Seeing the Co atom in the middle helped. But I digress.

Structure of vitamin B12

Well this turned out to be quite a long posting! Hope this helps for the future. Your comments are welcome!

Course starts today!

Yep, the course starts today. I am happy I did as much as I could before spring break, as primitive camping and off-road driving do not lend themselves very well to science. I did read one of my chosen articles for the first week, about the definition of a gene from a historical perspective, and I am very content with the choice. The article by Gerstein et al., titled “What is a gene, post-ENCODE? History and updated definition”, is indeed an ideal article to start the discussion in a molecular biology course. How the concept of the gene has changed, from the abstract “units” of hereditary information envisioned by Mendel to the code of a biological (and slightly sloppy) operating system! The full text of the article is available free online here.

The other choice was not as satisfying. My idea is to combine classic articles with some newer and even controversial ones, so the classic was one dedicated to the details of meiosis. Meiosis seemed a particularly good topic, as in general bio classes we tend to go over the results of meiosis but not so much the hows. Roeder’s excellent and detailed article goes really deep… very deep. I gave up reading it after the third page, as I had to read each sentence three times due to the high information density. I am skipping it the first week; maybe later, depending on students’ topic choices. Here is the article, available for free.

Which brings me to a reflection that I hope to discuss with students today: scientific communication. I hope this newer generation of scientists realizes how important it is to be able to communicate science effectively, across disciplines and even to the lay public. While there is a place for condensed scientific writing understandable only to the initiated, the times have a-changed, and scientists need to become better communicators and educators. And here is a long but worthwhile reflection on the matter: Professor Vincent Racaniello’s acceptance speech for the Peter Wildy Prize for Microbiology Education.

He talks, among other things, about social media. I visited my Twitter feed yesterday, and one of the scientists I follow was tweeting about Cell reviews on genetics, each and every one of them great, and one caught my eye: a review of epigenetics. The article is available here, and I decided to bring it to class today for discussion, as it seems the perfect complement to the gene article. So far, so good! Looking forward to today 🙂
