formerly: Department of Biology, Bates College

[Table: heart rates (beats per minute) for the two treatment groups, Drug and Placebo (Control)]
What can we conclude from this experiment? Does the drug cause the heart to beat faster? It looks like it might, but we should be hesitant to conclude anything yet. The difficulty arises because our data are variable. In order to understand continuous data, you should approach their interpretation in three steps: 1) plot the data, 2) summarize the data, and 3) analyze the data.
Pictorial tools offer a convenient way to visualize complex numerical systems. Thus, our first step in interpreting these data will be to draw a graph called a histogram, or frequency distribution. Let's consider the control group. Start by ordering the data and then grouping them into convenient size classes:
[Table: control-group heart rates in ascending order]
The data range from 68 to 88 beats per minute, so we can group them into intervals of five beats per minute and plot the results as a bar graph:
[Figure: histogram of control-group heart rates, grouped into intervals of five beats per minute]
This graph has a nice symmetrical shape. If we were to take an infinite number of samples and the graph still exhibited this symmetry, the data are said to be "normally distributed." Normal, in this case, does not mean natural or expected. It is simply a name given to these types of distributions. You should examine a picture of a normal distribution in any statistical textbook to note its structure. This distribution will become very important later on.
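The grouping step described above (order the data, then count how many observations fall into each five-beat interval) can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table`, the starting value of 65, and the use of just the five control observations that appear later in this guide are all assumptions, not part of the original data set.

```python
def frequency_table(values, start, width):
    """Count how many whole-number observations fall into each size class.

    Classes are start..start+width-1, start+width..start+2*width-1, and so on.
    Enough classes are created to cover the largest observation.
    """
    ordered = sorted(values)                      # step 1: order the data
    n_classes = (ordered[-1] - start) // width + 1
    counts = [0] * n_classes
    for v in ordered:                             # step 2: assign each value to a class
        counts[(v - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width - 1, c)
            for i, c in enumerate(counts)]


# Hypothetical input: five control heart rates (beats per minute).
control_subset = [76, 78, 76, 74, 83]

table = frequency_table(control_subset, start=65, width=5)

# Step 3: print a crude text histogram, one row per size class.
for low, high, count in table:
    print(f"{low}-{high}: {'#' * count} ({count})")
```

With the full data set you would pass all ten control values instead of the five shown here; the class width of five matches the intervals used in the text.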
The next step in interpreting your data is to summarize them. Numerically summarizing data involves two steps. First, we need to find the single number that best represents the data. This will be our estimate of central tendency. Then we need a way to estimate the spread around that central tendency.
There are three common measures of central tendency: the mean, the median, and the mode. The mode is the data value that occurs most frequently, and the median is the data value that falls at the precise middle of all data points when they are ordered. While these two metrics are very important and can be more appropriate than the mean in many cases, the mean is the most commonly used measure in biology. As most of you are aware, the mean is the numerical average of all the data points. To calculate it, you add up all the values and divide by the total number of values you added. In our experiment, the mean of the control group is 77.4 beats per minute, while the mean of individuals given the experimental drug is 83.2.
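As a quick check of the three measures described above, the following sketch uses Python's standard `statistics` module. The input is an assumption: it is the five control heart rates that this guide uses later for the variance calculation, not the full control group.

```python
import statistics

# Five control heart rates (beats per minute); a subset used for illustration.
control_subset = [76, 78, 76, 74, 83]

mean_rate = statistics.mean(control_subset)      # sum of values / number of values
median_rate = statistics.median(control_subset)  # middle value of the ordered data
mode_rate = statistics.mode(control_subset)      # most frequent value

print("mean:", mean_rate)
print("median:", median_rate)
print("mode:", mode_rate)
```

For these five values the median and the mode coincide, while the mean is pulled upward slightly by the single high reading of 83.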
The mean is perhaps the most important single measure you can use to represent variable data. However, using only the mean, you have no idea how much variation there is in the data. Therefore, the next step in summarizing the data is to develop a measure of how much spread there is around the mean. Let's start by simplifying our data set by taking only the first five observations of the control:
Heart Rate (observations 1-5)

| Treatment | 1 | 2 | 3 | 4 | 5 | Mean |
|---|---|---|---|---|---|---|
| Control | 76 | 78 | 76 | 74 | 83 | 77.4 |

Since we are interested in the spread of the data around the mean, the most intuitive thing to do is to take the difference between each observation and the mean:
76-77.4 = -1.4
78-77.4 = 0.6
76-77.4 = -1.4
74-77.4 = -3.4
83-77.4 = 5.6

What we want to do is estimate the mean deviation from the mean (whew!). However, you will note that the sum of the above differences is equal to zero (as is always the case). A simple way around this is to take the average of the squared deviations like so:

$$s^2 = \frac{(-1.4)^2 + (0.6)^2 + (-1.4)^2 + (-3.4)^2 + (5.6)^2}{5} = \frac{47.2}{5} = 9.44$$
This value is called the variance ($s^2$) and is a perfectly acceptable measure of variation. However, our original data were in beats per minute, while this is in (beats per minute)². To get something with the same units as our original data, take the square root:

$$s = \sqrt{s^2} = \sqrt{9.44} \approx 3.07$$
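This last value is called the standard deviation, and its formula is given as:

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}$$

where $x_i$ are the individual observations, $\bar{x}$ is their mean, and $n$ is the number of observations.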
The standard deviation and the variance are very useful estimates of variation. However, when they are calculated with n in the denominator, both tend to underestimate the variation in the population from which the sample was drawn, and this bias is worst for small samples. To alleviate this problem, you should always use the unbiased standard deviation and variance. To unbias these equations, change the denominator from n to n-1 as follows:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} \qquad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$
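The calculations in this section can be traced step by step in Python. The sketch below uses the five control observations from the text; the variable names are illustrative only. It computes the deviations from the mean, then the variance and standard deviation with both n and n-1 in the denominator.

```python
import math

# First five control heart rates (beats per minute), as in the text.
control_subset = [76, 78, 76, 74, 83]

n = len(control_subset)
mean = sum(control_subset) / n

# Deviation of each observation from the mean; these always sum to zero
# (up to floating-point rounding).
deviations = [x - mean for x in control_subset]
sum_of_deviations = sum(deviations)

# Sum of squared deviations, the numerator shared by both variance formulas.
sum_sq = sum(d ** 2 for d in deviations)

variance_n = sum_sq / n                 # denominator n
sd_n = math.sqrt(variance_n)

variance_unbiased = sum_sq / (n - 1)    # denominator n-1
sd_unbiased = math.sqrt(variance_unbiased)

print(f"variance (n):   {variance_n:.2f}  sd: {sd_n:.2f}")
print(f"variance (n-1): {variance_unbiased:.2f}  sd: {sd_unbiased:.2f}")
```

The n-1 versions are always somewhat larger than the n versions, and the gap is most noticeable in small samples like this one.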
Having collected your data and, if necessary, graphed and summarized them, you are now ready to begin analyzing them. Remember, all these tools do is provide you with a way to assign probability values to your null hypothesis (if p < 0.05, reject the null; if p > 0.05, fail to reject the null). For the majority of you, the bulk of your statistical analyses will be done on a computer, and you will be tempted to ignore the following equations. I want to caution you that it is very dangerous to use an equation you do not understand.
I. Parametric Statistics
Parametric analyses are the oldest and most commonly used type of analysis. They are capable of handling large data sets and very complex experimental designs quite easily. Furthermore, the most common ones (correlation, t-test, analysis of variance) are available on every statistical package for the computer as well as many scientific calculators. All parametric statistics have three common assumptions that must be met before proceeding.
- First, all observations are independent of other observations. This assumption is the product of a carefully designed experiment and needs no formal testing.
- Second, the data are normally distributed, which can be checked by examining your frequency distribution.
- The final assumption is that the variances in the different treatment groups are the same. There are several statistical tests available to test this assumption (e.g. the F-Max Test, Bartlett's Test) and they are often done with parametric analyses on many statistical programs. However, if you do not have access to a computer, it is perfectly acceptable to examine your standard deviations and look for treatment groups with standard deviations that are much larger (e.g. an order of magnitude larger) than the others.
A. t-Test
The t-test is used when you are comparing two different samples, as is the case with the heart rate experiment outlined above. Recall that our null hypothesis is that there is no difference in heart rate between individuals receiving the experimental drug and those receiving a placebo. To determine the probability that this hypothesis is correct, we will use the following equation:
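$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$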
where $\bar{x}_1$ and $\bar{x}_2$ are the means, $s_1^2$ and $s_2^2$ are the unbiased variances, and $n_1$ and $n_2$ are the sample sizes of treatment groups one and two, respectively. If $n_1 = n_2 = n$ (the same number of observations in each treatment group), this equation simplifies to:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n}}}$$
The means and variances for our heart rate experiment are summarized in the following table:
| Treatment | Mean | Variance (unbiased) | n |
|---|---|---|---|
| Drug | 83.2 | 44.62 | 10 |
| Control | 77.4 | 33.87 | 10 |
Substituting these values into the above equation, we find that t = 2.07. Next, we look up the critical value of t in the table at the end of this chapter. The critical value is determined by the level of significance (usually 0.05) and the degrees of freedom (df), which are calculated for this test as $df = n_1 + n_2 - 2$. If the t-statistic that we calculated from the above equation is equal to or greater than the critical value, we reject our null hypothesis and say that "there are significant differences between these treatment groups." In our case, $t_{crit}(\alpha = 0.05, df = 18) = 2.101$. This number is larger than our calculated value; therefore, we fail to reject our null hypothesis. We cannot safely conclude that the drug had an effect on heart rate.
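The substitution above can be checked with a short Python sketch that works directly from the summary table. It assumes the pooled-variance form of the two-sample t statistic, which is consistent with df = n1 + n2 - 2 and with the equal-variance assumption stated earlier; the function name `pooled_t` is illustrative. The critical value 2.101 is copied from the text rather than computed.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic using a pooled (unbiased) variance.

    Returns the t statistic and its degrees of freedom (n1 + n2 - 2).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Summary statistics from the heart rate experiment (drug vs. control).
t_stat, df = pooled_t(83.2, 44.62, 10, 77.4, 33.87, 10)

T_CRIT = 2.101  # critical value for alpha = 0.05, df = 18, taken from the text

reject_null = abs(t_stat) >= T_CRIT

print(f"t = {t_stat:.2f}, df = {df}, reject null: {reject_null}")
```

Because the two groups have the same number of observations, the pooled form gives the same value as the simplified equal-n equation.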
B. Analysis of Variance (ANOVA)
Analysis of variance is used to determine if differences exist between more than two treatment groups. While the computations involved in ANOVA are not difficult, they are beyond the scope of this guide. Interested persons are advised to consult any introductory statistics text to see how it is done.
The assumptions of ANOVA are identical to those of the t-test, and the calculated statistic is called an F-value, which has a probability value associated with it. As with the t-test, if our probability value is less than 0.05, we reject our null hypothesis (in this case, that there is no difference among the treatment groups). This p-value only tells us whether there are significant differences among our groups. It does not tell us where these differences are. In other words, in an experiment with five treatment groups and a significant p-value, we know that there are some differences among these groups, but we do not know specifically which groups are different. As a result, ANOVA is usually performed in conjunction with a post hoc multiple comparisons test (e.g. Bonferroni's test or Tukey's test) that will tell you precisely where the differences lie.
C. Correlation and Regression
In some cases we are not interested in whether or not there is a difference between two groups. Instead, we want to know whether two variables are related. For example, suppose you have the following data on sprint speed and muscle mass for a variety of lizards:
| Lizard | Muscle mass (g) | Sprint speed (m/s) |
|---|---|---|
| 1 | 5 | 12 |
| 2 | 4 | 10 |
| 3 | 6 | 14 |
| 4 | 7 | 15 |
| 5 | 3 | 7 |

A good experimental question for this type of data is: "Does sprint speed increase with muscle mass?" To answer this question, the appropriate analysis is a correlation method called Pearson's r. The r statistic has a range of values from -1.00 (a perfect negative correlation) to 1.00 (a perfect positive correlation). A negative correlation means that as one variable increases in size, the other decreases. A positive correlation means that as one variable increases, so does the other. When r = 0.00, there is no linear relationship between the two variables. The null hypothesis for these types of experiments is that there is no relationship between the two variables; in other words, r = 0.00. This test has the same three assumptions as other parametric analyses, but it also has the additional assumption that the relationship between the two variables is linear.
The calculations for Pearson's r are rather laborious, and most will prefer to use a computer to carry out the computations. An important thing to remember is that correlation does not imply causation. In the above example, there is a strong correlation between muscle mass and sprint speed (convince yourself on a computer). However, it would be wrong to conclude that an increase in muscle mass causes an increase in sprint speed. Perhaps fast sprinters have bigger muscles because they sprint more (sprinting causes greater musculature), or perhaps a third, unmeasured variable is causing the relationship between sprint speed and muscle mass.
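One way to "convince yourself on a computer" is the sketch below, which computes Pearson's r for the lizard data from its standard definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` is illustrative and not taken from any particular statistics package.

```python
import math

# Lizard data from the table above.
muscle_mass = [5, 4, 6, 7, 3]        # grams
sprint_speed = [12, 10, 14, 15, 7]   # metres per second


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(muscle_mass, sprint_speed)
print(f"Pearson's r = {r:.3f}")
```

The result is close to 1.00, which is the strong positive correlation the text refers to. It says nothing about which variable, if either, is the cause.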
In order to determine causation, you must change the way you conduct your experiment. If we wish to examine whether an increase in muscle mass causes an increase in sprint speed, we need to manipulate muscle mass and examine the effects on sprint speed. If we are able to design such an experiment, the appropriate analysis would then be a regression, which is computationally similar to Pearson's r (and equally laborious). In a regression analysis, the test statistic is called the coefficient of determination ($R^2$). The coefficient of determination has a range of values from 0% to 100%. An $R^2$ of 75% means that "75% of the variation in the dependent variable (the variable you measure) is due to variation in the independent variable (the variable you manipulate)." It is important to realize that running a regression does not by itself make a relationship causative. In order to demonstrate causation (and use a regression), you must perform a controlled experiment or have a very good a priori reason for assuming causation.
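To make the regression statistic concrete, the sketch below fits an ordinary least-squares line to the lizard data and computes R² as one minus the ratio of residual to total sum of squares. Treating muscle mass as the independent variable here is purely for illustration, since the lizard data in this guide were observed rather than manipulated. All names are illustrative.

```python
# Lizard data from the correlation example.
muscle_mass = [5, 4, 6, 7, 3]        # independent variable (g)
sprint_speed = [12, 10, 14, 15, 7]   # dependent variable (m/s)

n = len(muscle_mass)
mean_x = sum(muscle_mass) / n
mean_y = sum(sprint_speed) / n

# Least-squares slope and intercept.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(muscle_mass, sprint_speed))
s_xx = sum((x - mean_x) ** 2 for x in muscle_mass)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# R^2 = 1 - (residual sum of squares / total sum of squares).
predicted = [intercept + slope * x for x in muscle_mass]
ss_residual = sum((y - p) ** 2 for y, p in zip(sprint_speed, predicted))
ss_total = sum((y - mean_y) ** 2 for y in sprint_speed)
r_squared = 1 - ss_residual / ss_total

print(f"speed = {intercept:.2f} + {slope:.2f} * mass, R^2 = {r_squared:.1%}")
```

For a regression with a single independent variable, R² equals the square of Pearson's r for the same data, which is why the two analyses are computationally so similar.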
II. Nonparametric Statistics
Most nonparametric statistics have only recently been developed. They are very simple to use, do not require large data sets, and have few underlying assumptions. Although they are not as powerful as parametric statistics (i.e. they are not very good at detecting small differences between groups), in most cases they are perfectly acceptable. Nonparametric tests all assume independence of observations. In other words, your previous observation does not influence subsequent observations. If you counted the number of pine trees in a forest in 1996 and again in 1997, you would not have two independent measures of the number of pines, because the number observed in 1996 will strongly influence the number observed in 1997. However, counts of pines from two different locations will be independent of each other.
A. Chi-Squared One Sample Test
The chi-squared one sample test is used with discrete data to determine if observed frequency counts differ from expected frequency counts. To determine $\chi^2$ values, use the following formula:

$$\chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ is the observed frequency and $E_i$ is the expected frequency in category $i$, and $K$ is the number of categories. For example, suppose you perform the following monohybrid cross:
$vg^{+}vg \times vg^{+}vg$

where $vg^{+}$ is the wild type allele and $vg$ is the allele for vestigial wings. Our null hypothesis is that there is no difference between the observed ratio and the expected ratio of a monohybrid cross (i.e. 3:1). The progeny of this cross are scored as follows:
Wild Type = 750 flies
Vestigial = 125 flies

These are our observed frequencies. The expected ratio from a monohybrid cross is 3:1. Therefore, our expected number of wild type flies is (3/4)(875) = 656.25, and our expected number of vestigial flies is (1/4)(875) = 218.75. Substituting these values into the above equation, $\chi^2 = 53.6$. The degrees of freedom (df) for this test are the number of categories minus one (K-1). Turning to the statistics table, the critical value of $\chi^2$ for 1 df and a significance level of 0.05 is 3.84. Since our calculated value is so much larger than the critical value, we reject our null hypothesis.
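The fly example can be reproduced with the sketch below, which sums (observed - expected)² / expected over the two phenotype categories. The helper name `chi_squared` is illustrative, and the critical value 3.84 is copied from the text rather than computed.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed progeny counts: wild type, vestigial.
observed = [750, 125]
total = sum(observed)

# Expected counts under the 3:1 ratio of a monohybrid cross.
expected = [total * 3 / 4, total * 1 / 4]

chi2 = chi_squared(observed, expected)
df = len(observed) - 1

CHI2_CRIT = 3.84  # critical value for 1 df at the 0.05 level, from the text
reject_null = chi2 > CHI2_CRIT

print(f"expected = {expected}, chi-squared = {chi2:.1f}, df = {df}")
```

Most of the total comes from the vestigial category, where the observed count falls well short of the expected count.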
B. Chi-Squared Two Sample Test
Suppose we have a set of observations that can be classified according to two different types of attributes. For example, suppose we have captured all of the salamanders in a forest and classified them according to species and the type of substrate they were found on. We may be interested in knowing if there is an association between species and substrate. Our null hypothesis is that there is no difference in substrate choice among the different species. We collect the following data:
| Substrate | Species S1 | Species S2 | Total |
|---|---|---|---|
| Log | 50 | 5 | 55 |
| Leaf | 20 | 100 | 120 |
| Total | 70 | 105 | 175 |

We can use the same formula to calculate $\chi^2$. The only difference between this analysis and the previous one is the way we calculate our expected frequencies (E). To calculate E for each cell, multiply the cell's row total by its column total and divide by the grand total (the probability of two independent events occurring simultaneously is equal to the product of the individual probabilities). For example, the expected number of S1 salamanders associated with logs is (70)(55)/175 = 22. The remaining expected values are shown in the table below.
| Substrate | Species S1 | Species S2 | Total |
|---|---|---|---|
| Log | 22 | 33 | 55 |
| Leaf | 48 | 72 | 120 |
| Total | 70 | 105 | 175 |

Our degrees of freedom are df = (number of rows - 1)(number of columns - 1) = 1. If you plug these values into the chi-squared equation, you find that the calculated value (86.6) is much larger than the critical value from the table (3.84), so we can reject the null hypothesis and conclude that different species prefer different types of substrate.
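The expected frequencies and the chi-squared value for the salamander table can be worked through with the following sketch. It derives each expected count as (row total)(column total)/(grand total), exactly as described above; the variable names are illustrative.

```python
# Observed counts: rows are substrates (Log, Leaf), columns are species (S1, S2).
observed = [
    [50, 5],
    [20, 100],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell under independence of species and substrate.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Same chi-squared formula as the one sample test, summed over every cell.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected:", expected)
print(f"chi-squared = {chi2:.1f}, df = {df}")
```

The same code works for larger tables; only the `observed` list changes, and the degrees of freedom follow from its dimensions.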
Other nonparametric tests that are available in most computer programs also deserve some mention. Wilcoxon's Rank Sum Test and the Mann-Whitney U Test are analyses that test for differences between two treatment groups (i.e. a nonparametric t-test). The Kruskal-Wallis test tests simultaneously for differences among more than two treatment groups (a nonparametric ANOVA). Spearman's correlation is a nonparametric correlation analysis. All of these analyses can be used instead of a parametric analysis when the data are not normally distributed. However, these analyses do require that the original data be continuous, and many also have a homogeneity of variance assumption that must be met. In general, these tests should be chosen over parametric alternatives when sample sizes are small (fewer than 10-20 replicates).
CHOICE OF ANALYSIS
Your choice of statistical analysis should be made prior to conducting an experiment. There is little sense in collecting data that you can't analyze properly. Use the following flow chart to help you decide which statistic to use.
Modified 9-30-02 gja
Department of Biology, Bates College, Lewiston, ME 04240