By Dr. Robert Gerwien, formerly of the Department of Biology, Bates College

RESOURCE MATERIALS INDEX

| WHY? | inferential statistics | types of data | central tendency | measures of variation |

| parametric statistics | assumptions of... | t-test | ANOVA | correlation and regression |

| nonparam. statistics | assumptions of... | chi-square one sample test | chi-square 2-sample test | other nonparam. tests |

WHICH TEST DO I USE?: Flow Chart


Introduction

As maturing biologists, you will spend much of your professional lives collecting data and deciding what to do with them. Unfortunately, this task has caused many in our profession to oscillate between anxiety and apoplexy, and that need not be the case. This guide is meant to alleviate your pain and make statistics approachable for the non-mathematically inclined.


INFERENTIAL STATISTICS

Your goal as a scientist is to find answers to questions that interest you. This is often accomplished by Hypothesis Testing. For example, if you are interested in the effect of a certain drug on human physiology, one question you could ask is: what effect does this drug have on heart rate? Based on personal familiarity, library research, or intuition, you may think that this drug will cause an increase in heart rate. This best-guess answer to our experimental question is known as an Experimental or Research Hypothesis, and it plays a central role in the scientific method. Experimental hypotheses refer to proximate (mechanistic) or ultimate (evolutionary) causation of biological phenomena. Statistical Hypotheses, on the other hand, are statements about how general our observed phenomena may be. To use the language of statisticians, they are statements about population parameters (a population in this case is a complete set of individuals, objects, or measurements having some common observable characteristic). The important thing to remember about statistical hypotheses is that they can be evaluated by statistical tests.

There are two types of statistical hypotheses that you need to be concerned with: a Null Hypothesis and an Alternative Hypothesis. A null hypothesis is a hypothesis of no difference (hence the word null). In the above example, our null hypothesis could be stated as: "there is no difference in heart rate between individuals given the drug and our control group." The alternative hypothesis is simply the opposite: "there is a difference in heart rate between individuals given the drug and controls." These are two mutually exclusive hypotheses, and both must be stated prior to analyzing your data. If there is one take-home message of this entire manuscript, this is it: all a statistical test does is assign a probability value to your null hypothesis. In other words, it tells you how likely it is that you would see data like yours if the null hypothesis were true.


Researchers take a gamble with hypothesis testing. There is always a chance that they might make a mistake because they are dealing with probabilities. There are two types of mistakes that we could make. We could either reject our null hypothesis when it is really true (a Type-I error or α-error) or fail to reject our null hypothesis when it is really false (a Type-II error). Intuitively, I hope, you should recognize that it is worse to make a Type-I error (saying there is a difference when there is not) than a Type-II error (failing to detect a difference). In order to minimize the risk of making a Type-I error, we generally set our cutoff probability level for rejecting the null hypothesis at a low value. This cutoff value is known as the α-level or level of significance. This level is usually set at 0.05 for no other reason than it is generally accepted to be a reasonable level of risk. An α-level of 0.05 means that we accept a 5% risk of rejecting the null hypothesis when it is actually true. If our statistical analysis yields a probability level less than 0.05, we reject our null hypothesis and accept our alternative hypothesis. If it is greater than 0.05, we have failed to reject our null hypothesis (note that it is never appropriate to "accept" your null hypothesis). The former case (p < 0.05) is generally referred to as a significant difference.


TYPES OF DATA

Scientists collect data (plural of datum) in order to answer questions. The type of data collected will be an important determinant of which statistical test you decide to use. Biology deals with things we count or measure. As such, there are two types of data we need to be concerned with. Things that we count, discrete numbers, include all types of categorical data. The number of individuals seen at a given time, the number of individuals with a certain color, and the number of males and females in your class are all examples of discrete data. With this type of data an individual can belong to one and only one category; for example, no one individual can be both a male and a female. These categories are discrete, and the category each individual belongs to is "known without error."

Data that we measure, continuous numbers, are not known without error. For example, the size distribution of pines in a forest depends on how precisely the trees were measured. Continuous variables include commonly measured parameters such as lengths, weights, volumes, times, rates, etc. Because continuous variables have a distribution, they have the advantage of being analyzable by more powerful statistical methods. However, they are often more difficult to understand intuitively.


Let's return to the example given in the first section: the effect of a drug on heart rate. Our experiment consists of two treatment groups: individuals given the drug and controls given a placebo. There are 10 replicates in each treatment group, and the results (in beats per minute) are presented below.

 

Heart Rate (beats per minute)

| Treatment         | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 |
| Drug              | 76 | 88 | 72 | 83 | 85 | 81 | 94 | 90 | 78 | 85 |
| Placebo (Control) | 76 | 78 | 76 | 74 | 83 | 71 | 79 | 81 | 68 | 88 |

What can we conclude from this experiment? Does the drug cause the heart to beat faster? It looks like it might, but we should be hesitant to conclude anything yet. The difficulty arises because our data are variable. In order to understand continuous data, you should approach their interpretation in three steps: 1) plot the data, 2) summarize the data, and 3) analyze the data.

Pictorial tools offer a convenient way to visualize complex numerical systems. Thus, our first step in interpreting these data will be to draw a graph called a histogram, or frequency distribution. Let's consider the control group. Start by ordering the data and then grouping them into convenient size classes:

Control (ordered): 68, 71, 74, 76, 76, 78, 79, 81, 83, 88

The range goes from 68 to 88. We can group these data into intervals of five beats per minute and then graph the results as a bar graph:

 

| Interval | Frequency |
| 66-70    | 1         |
| 71-75    | 2         |
| 76-80    | 4         |
| 81-85    | 2         |
| 86-90    | 1         |

This distribution has a nice symmetrical shape. If we were to take an infinite number of samples and the graph still exhibited this symmetry, the data are said to be "normally distributed." Normal, in this case, does not mean natural or expected; it is simply the name given to these types of distributions. You should examine a picture of a normal distribution in any statistics textbook to note its structure. This distribution will become very important later on.
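If you would like to build this kind of frequency table on a computer, here is a minimal Python sketch; the bin boundaries and variable names are my own choices, not part of the original exercise:

```python
# Bin the control heart rates into the same 5-bpm intervals used above
# and count how many observations fall into each.
control = [76, 78, 76, 74, 83, 71, 79, 81, 68, 88]

intervals = [(66, 70), (71, 75), (76, 80), (81, 85), (86, 90)]
for low, high in intervals:
    frequency = sum(1 for rate in control if low <= rate <= high)
    print(f"{low}-{high}: {frequency}")
# Prints frequencies of 1, 2, 4, 2, and 1, matching the table above.
```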


The next step in interpreting your data is to summarize them. There are two approaches to numerically summarizing your data. First, we need to find the single number that best represents the data. This will be our estimate of central tendency. Then we need a way to estimate the spread around that central tendency.

There are three common measures of central tendency: the mean, the median, and the mode. The mode is the data value that occurs most frequently, and the median is the data value that occurs at the precise middle of all data points. While these two metrics are very important and can be more appropriate than the mean in many cases, the mean is the most commonly used measure in biology. As most of you are aware, the mean is the numerical average of all the data points. To calculate it, you add up all the values and divide by the total number of values you added. In our experiment, the mean of the control group is 77.4 while the mean of individuals given the experimental drug is 83.2.
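Python's standard library can compute all three measures; here is a quick sketch using the heart rate data (nothing beyond the built-in statistics module is assumed):

```python
import statistics

control = [76, 78, 76, 74, 83, 71, 79, 81, 68, 88]
drug    = [76, 88, 72, 83, 85, 81, 94, 90, 78, 85]

print(statistics.mean(control))    # 77.4
print(statistics.mean(drug))       # 83.2
print(statistics.median(control))  # 77.0, the midpoint of 76 and 78
print(statistics.mode(control))    # 76, the most frequent value
```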


The mean is perhaps the most important single measure you can use to represent variable data. However, using only the mean you have no idea how much variation there is in the data. Therefore, the next step in summarizing the data is to develop a measure of how much spread there is around the mean. Let's start by simplifying our data set, taking only the first five observations of the control group:

 

Heart Rate (beats per minute)

| Treatment | 1  | 2  | 3  | 4  | 5  | Mean |
| Control   | 76 | 78 | 76 | 74 | 83 | 77.4 |

Since we are interested in the spread of the data around the mean, the most intuitive thing to do is to take the difference between the mean and each observation:

76 - 77.4 = -1.4
78 - 77.4 = 0.6
76 - 77.4 = -1.4
74 - 77.4 = -3.4
83 - 77.4 = 5.6

What we want to do is estimate the mean deviation from the mean (whew!). However, you will note that the sum of the above differences is equal to zero (as is always the case), so their simple average tells us nothing. A standard way around this is to take the average of the squared deviations, like so:

s² = Σ(xᵢ - x̄)² / n = [(-1.4)² + (0.6)² + (-1.4)² + (-3.4)² + (5.6)²] / 5 = 47.2 / 5 = 9.44

This value is called the variance (s²) and is a perfectly acceptable measure of variation. However, our original data were in beats per minute while this is in (beats per minute)². To get something with the same units as our original data, take the square root:

s = √9.44 ≈ 3.07 beats per minute

This last value is called the standard deviation, and its formula is given as:

s = √[ Σ(xᵢ - x̄)² / n ]

The standard deviation and the variance are very useful estimates of variation. However, calculated with n in the denominator they are biased: they systematically underestimate the variation in the population, and the problem is worst for small samples. To alleviate this, you should always use the unbiased standard deviation and variance, obtained by changing the denominator from n to n - 1 as follows:

s² = Σ(xᵢ - x̄)² / (n - 1)        s = √[ Σ(xᵢ - x̄)² / (n - 1) ]

For the five control observations above, the unbiased variance is 47.2 / 4 = 11.8 and the unbiased standard deviation is √11.8 ≈ 3.44.
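A short Python check of these numbers, again using only the standard statistics module (the variable name is mine):

```python
import statistics

subset = [76, 78, 76, 74, 83]  # the first five control observations

print(statistics.pvariance(subset))  # 9.44, variance with n in the denominator
print(statistics.pstdev(subset))     # about 3.07
print(statistics.variance(subset))   # 11.8, unbiased variance (n - 1)
print(statistics.stdev(subset))      # about 3.44, unbiased standard deviation
```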


STATISTICAL ANALYSES

Having collected your data and, if necessary, graphed and summarized them, you are now ready to begin the analysis. Remember, all these tools do is provide you with a way to assign probability values to your null hypothesis (if p < 0.05, reject the null; if p > 0.05, fail to reject the null). For the majority of you, the bulk of your statistical analyses will be done on a computer, and you will be tempted to ignore the following equations. I want to caution you that it is very dangerous to use an equation you do not understand.


I. Parametric Statistics

Parametric analyses are the oldest and most commonly used type of analysis. They are capable of handling large data sets and very complex experimental designs quite easily. Furthermore, the most common ones (correlation, t-test, analysis of variance) are available on every statistical package for the computer as well as many scientific calculators. All parametric statistics have three common assumptions that must be met before proceeding.

  • First, all observations are independent of other observations. This assumption is the product of a carefully designed experiment and needs no formal testing.
  • Second, data are normally distributed, which can easily be checked by examining your frequency distribution.
  • The final assumption is that the variances in the different treatment groups are the same. There are several statistical tests available to check this assumption (e.g. the F-max test, Bartlett's test), and many statistical programs run them alongside the parametric analysis; a quick way to do this yourself is sketched below. If you do not have access to a computer, it is also acceptable to examine your standard deviations and look for treatment groups with standard deviations that are much larger (e.g. an order of magnitude larger) than the others.
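For example, here is a minimal homogeneity-of-variance check in Python, assuming the SciPy library is available (the choice of Bartlett's test and the variable names are mine):

```python
from scipy import stats

drug    = [76, 88, 72, 83, 85, 81, 94, 90, 78, 85]
control = [76, 78, 76, 74, 83, 71, 79, 81, 68, 88]

# Bartlett's test: the null hypothesis is that the group variances are equal.
statistic, p_value = stats.bartlett(drug, control)
print(statistic, p_value)  # a p-value above 0.05 gives no reason to doubt equal variances
```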


A. Student’s T-Test

This analysis is used when you are comparing two different samples, as is the case with the heart rate experiment outlined above. Recall that our null hypothesis is that there is no difference in heart rate between individuals receiving the experimental drug and those receiving a placebo. To determine the probability that this hypothesis is correct, we will use the following equation:

t = (x̄₁ - x̄₂) / √( s₁²/n₁ + s₂²/n₂ )

where x̄₁ and x̄₂ are the means, s₁² and s₂² are the unbiased variances, and n₁ and n₂ are the sample sizes of treatment groups one and two, respectively. If n₁ = n₂ = n (the same number of observations in each treatment group), this equation simplifies to:

t = (x̄₁ - x̄₂) / √[ (s₁² + s₂²) / n ]

The means and variances for our heart rate experiment are summarized in the following table:

| Treatment | Mean | Variance (unbiased) | n  |
| Drug      | 83.2 | 44.62               | 10 |
| Control   | 77.4 | 33.82               | 10 |

Substituting these values into the above equation, we find that t = 2.07. Next we look up the critical value of t in the table at the end of this chapter. The critical value is determined by the level of significance (usually 0.05) and the degrees of freedom (df), which for this test is calculated as df = n₁ + n₂ - 2. If the t-statistic we calculated from the above equation is equal to or greater than the critical value, we reject our null hypothesis and say that "there are significant differences between these treatment groups." In our case t_crit(α = 0.05, df = 18) = 2.101. This is larger than our calculated value, so we have failed to reject our null hypothesis: we cannot safely conclude that the drug had an effect on heart rate.
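If you would rather let the computer do the arithmetic, here is a minimal sketch in Python, assuming SciPy is installed (the variable names are mine):

```python
from scipy import stats

drug    = [76, 88, 72, 83, 85, 81, 94, 90, 78, 85]
control = [76, 78, 76, 74, 83, 71, 79, 81, 68, 88]

# Two-sample t-test assuming equal variances, as in the hand calculation above.
t, p = stats.ttest_ind(drug, control)
print(t, p)  # t is about 2.07; p falls just above 0.05, so we fail to reject the null
```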


B. Analysis of Variance (ANOVA)

Analysis of variance is used to determine if differences exist between more than two treatment groups. While the computations involved in ANOVA are not difficult, they are beyond the scope of this guide. Interested persons are advised to consult any introductory statistics text to see how it is done.

The assumptions of ANOVA are identical to those of the t-test, and the calculated statistic is called an F-value, which has a probability value associated with it. As with the t-test, if our probability value is less than 0.05 we reject our null hypothesis (in this case, that there is no difference among the treatment groups). This p-value only tells us whether there are significant differences among our groups; it does not tell us where those differences are. In other words, in an experiment with five treatment groups and a significant p-value, we know that there are some differences among these groups, but we do not know specifically which groups are different. As a result, ANOVA is usually performed in conjunction with a post hoc multiple comparisons test (e.g. Bonferroni's test or Tukey's test) that will tell you precisely where the differences lie.
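As an illustration only, here is how a one-way ANOVA might be run in Python with SciPy; the third (high-dose) group is hypothetical data I have added purely to show a three-group comparison:

```python
from scipy import stats

placebo   = [76, 78, 76, 74, 83, 71, 79, 81, 68, 88]  # control data from above
low_dose  = [76, 88, 72, 83, 85, 81, 94, 90, 78, 85]  # drug data from above
high_dose = [85, 91, 88, 95, 90, 87, 96, 92, 89, 94]  # hypothetical third group

f, p = stats.f_oneway(placebo, low_dose, high_dose)
print(f, p)  # if p < 0.05, follow up with a post hoc test (e.g. Tukey's) to locate the differences
```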


C. Correlation and Regression

In some cases we are not interested in whether there is a difference between two groups; instead, we want to know whether two variables are related. For example, suppose you have the following data on sprint speed and muscle mass for a variety of lizards:

| Lizard | Muscle mass (g) | Sprint speed (m/s) |
| 1      | 5               | 12                 |
| 2      | 4               | 10                 |
| 3      | 6               | 14                 |
| 4      | 7               | 15                 |
| 5      | 3               | 7                  |
A good experimental question for this type of data is: "does sprint speed increase with muscle mass?" In order to answer this question, the appropriate analysis is a correlation method called Pearson's r. The r statistic ranges from -1.00 (a perfect negative correlation) to 1.00 (a perfect positive correlation). A negative correlation means that as one variable increases, the other decreases; a positive correlation means that as one variable increases, so does the other. When r = 0.00 there is no relationship between the two variables. The null hypothesis for these types of experiments is that there is no relationship between the two variables, in other words that r = 0.00. This test has the same three assumptions as other parametric analyses, but it also has the additional assumption that the relationship between the two variables is linear.


The calculations for Pearson’s r are a rather laborious process and most will prefer to use a computer to carry out the computations. An important thing to remember is that correlation does not imply causation. In the above example, there is a strong correlation between muscle mass and sprint speed (convince yourself on a computer). However, it would be wrong to conclude that an increase in muscle mass causes an increase in sprint speed. Perhaps fast sprinters have bigger muscles because they sprint more (sprinting causes greater musculature) or perhaps a third, unmeasured, variable is causing the relationship between sprint speed and muscle mass.
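Here is one way to "convince yourself on a computer," as a minimal Python sketch assuming SciPy is installed:

```python
from scipy import stats

muscle_mass  = [5, 4, 6, 7, 3]      # grams
sprint_speed = [12, 10, 14, 15, 7]  # meters per second

r, p = stats.pearsonr(muscle_mass, sprint_speed)
print(r, p)  # r is close to +1, a strong positive correlation
```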

In order to determine causation you must change the way you conduct your experiment. If we wish to examine whether an increase in muscle mass causes an increase in sprint speed, we need to manipulate muscle mass and examine the effects on sprint speed. If we are able to design such an experiment, the appropriate analysis is a regression, which is computationally similar to Pearson's r (and equally laborious). In a regression analysis, the test statistic is the coefficient of determination (R²), which ranges from 0% to 100%. An R² of 75% means that "75% of the variation in the dependent variable (the variable you measure) is explained by variation in the independent variable (the variable you manipulate)." It is important to realize that running a regression does not by itself make the relationship causative. In order to demonstrate causation (and use a regression) you must perform a controlled experiment or have a very good a priori reason for assuming causation.
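A minimal regression sketch in Python with SciPy, reusing the lizard numbers purely for illustration (remember that these data were not collected by manipulating muscle mass):

```python
from scipy import stats

muscle_mass  = [5, 4, 6, 7, 3]      # treated here as the independent variable
sprint_speed = [12, 10, 14, 15, 7]  # treated here as the dependent variable

result = stats.linregress(muscle_mass, sprint_speed)
print(result.slope, result.intercept)  # the fitted line: speed = slope * mass + intercept
print(result.rvalue ** 2)              # R^2, the coefficient of determination (about 0.97 here)
```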


 

 

II. Nonparametric Statistics

Most nonparametric statistics have been developed relatively recently. They are very simple to use, do not require large data sets, and have few underlying assumptions. Although they are not as powerful as parametric statistics (i.e. they are not very good at detecting small differences between groups), in most cases they are perfectly acceptable. Nonparametric tests all assume independence of observations; in other words, your previous observation does not influence subsequent observations. If you counted the number of pine trees in a forest in 1996 and again in 1997, you would not have two independent measures of the number of pines, because the number observed in 1996 will strongly influence the number observed in 1997. However, counts of pines from two different locations will be independent of each other.


A. Chi-Squared One Sample Test

The chi-square one sample test is used with discrete data to determine whether observed frequency counts differ from expected frequency counts. To determine the χ² value, use the following formula:

χ² = Σ (Oᵢ - Eᵢ)² / Eᵢ,  summed over the k categories

where O is the observed frequency, E is the expected frequency, and k is the number of categories. For example, suppose you perform the following monohybrid cross:

vg+vg × vg+vg

where vg+ is the wild-type allele and vg is the allele for vestigial wings. Our null hypothesis is that there is no difference between the observed ratio and the ratio expected from a monohybrid cross (i.e. 3:1). The progeny of this cross are scored as follows:

Wild type = 750 flies
Vestigial = 125 flies

These are our observed frequencies. The expected ratio from a monohybrid cross is 3:1. Therefore, our expected number of wild-type flies is (3/4)(875) = 656.25 and our expected number of vestigial flies is (1/4)(875) = 218.75. Substituting these values into the above equation gives χ² = 53.6. The degrees of freedom (df) for this test is the number of categories minus one (k - 1), which here equals 1. Turning to the statistics table, the critical value of χ² for 1 df and a significance level of 0.05 is 3.84. Since our calculated value is so much larger than the critical value, we reject our null hypothesis.
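The same test can be run in Python, assuming SciPy is installed; a sketch using the observed and expected counts from the cross above:

```python
from scipy import stats

observed = [750, 125]        # wild type, vestigial
expected = [656.25, 218.75]  # the 3:1 ratio applied to all 875 progeny

chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(chi2, p)  # chi-square is about 53.6, far beyond the critical value of 3.84
```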


B. Chi-Squared Two Sample Test

Suppose we have a set of observations that can be classified according to two different types of attributes. For example, suppose we have captured all of the salamanders in a forest and classified them according to species and the type of substrate they were found on. We may be interested in knowing whether there is an association between species and substrate. Our null hypothesis, of course, is that there is no difference in substrate choice among the different species. You collect the following data:

Salamander Species (observed counts)

| Substrate | S1 | S2  | Total |
| Log       | 50 | 5   | 55    |
| Leaf      | 20 | 100 | 120   |
| Total     | 70 | 105 | 175   |
We can use the same formula to calculate χ²; the only difference between this analysis and the previous one is the way we calculate our expected frequencies (E). To calculate E for each cell, multiply the cell's row total by its column total and divide by the grand total (the probability of two independent events occurring simultaneously is equal to the product of the individual probabilities). For example, the expected number of S1 salamanders associated with logs is (70)(55)/175 = 22. The remaining expected values are shown in the table below.

Salamander Species (expected counts)

| Substrate | S1 | S2  | Total |
| Log       | 22 | 33  | 55    |
| Leaf      | 48 | 72  | 120   |
| Total     | 70 | 105 | 175   |

Our degrees of freedom are df = (number of rows - 1)(number of columns - 1) = 1. If you plug these values into the chi-square equation, you find that the calculated value (86.6) is much larger than the critical value from the table (3.84), so we can reject the null hypothesis and conclude that different species prefer different types of substrate.
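In Python, SciPy's contingency-table routine can reproduce both the expected counts and the χ² value; here is a sketch, with the continuity correction turned off so the result matches the hand calculation above:

```python
from scipy import stats

# Rows are substrates (log, leaf); columns are species (S1, S2).
observed = [[50, 5],
            [20, 100]]

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(chi2, dof)  # about 86.6 with 1 degree of freedom
print(expected)   # the expected counts: [[22, 33], [48, 72]]
```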


C. Other Nonparametric Tests

Other nonparametric tests that are available in most computer programs also deserve some mention. Wilcoxon's rank sums test and the Mann-Whitney U test look for differences between two treatment groups (i.e. a nonparametric t-test). The Kruskal-Wallis test looks simultaneously for differences among more than two treatment groups (a nonparametric ANOVA). Spearman's correlation is a nonparametric correlation analysis. All of these analyses can be used instead of a parametric analysis when the data are not normally distributed. However, these analyses do require that the original data be continuous, and many also have a homogeneity of variance assumption that must be met. In general, these tests should be chosen over parametric alternatives when sample sizes are small (fewer than 10-20 replicates).
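All of these are available in SciPy; a brief sketch reusing the heart rate and lizard data from earlier sections (assuming SciPy is installed):

```python
from scipy import stats

drug    = [76, 88, 72, 83, 85, 81, 94, 90, 78, 85]
control = [76, 78, 76, 74, 83, 71, 79, 81, 68, 88]

print(stats.mannwhitneyu(drug, control))  # nonparametric comparison of two groups
print(stats.kruskal(drug, control))       # works for two or more groups
print(stats.spearmanr([5, 4, 6, 7, 3], [12, 10, 14, 15, 7]))  # rank correlation
```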

CHOICE OF ANALYSIS (pdf version of flow chart)

Your choice of statistical analysis should be made prior to conducting an experiment. There is little sense in collecting data that you can't analyze properly. Use the following flow chart to help you decide which statistic to use.

[Flow chart: Which test do I use? See the PDF version linked above.]




