Homework #7 Solution

Problem 1 - 5 pts

bovids <- read.table("../../static/datasets/bovid_occurrences_table.txt", 
                     header = TRUE, sep = "\t")
# use the taxon names as row names
rownames(bovids) <- bovids$taxon
# drop the last column (column 9) so that only the count columns remain
bovids <- bovids[, -9]

A chi-square test of independence is a good choice here because all the expected cell frequencies are well above 5. It can therefore be used to test whether the row variable (taxon) and the column variable (site) are associated.

myTest <- chisq.test(bovids)

Just to make sure, let's check that our expected values are high enough.

Remember, any expected values less than 5 would call for the use of Fisher’s exact test. All of our expected values are well above 5 (the smallest is about 76.7), so we are fine using chi-square.

myTest$expected
##                 site1    site2    site3    site4     site5    site6    site7
## Gazella      168.3372 138.7644 141.4942 221.5681  88.71824 76.88915 161.9677
## Connochaetes 188.1332 155.0828 158.1336 247.6239  99.15127 85.93110 181.0146
## Tragelaphus  167.9099 138.4122 141.1351 221.0058  88.49307 76.69400 161.5566
## Aepyceros    215.6197 177.7406 181.2371 283.8022 113.63741 98.48576 207.4611
##                 site8
## Gazella      184.2610
## Connochaetes 205.9296
## Tragelaphus  183.7933
## Aepyceros    236.0162
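
The expected counts printed above are what we would see if rows and columns were independent: each cell is its row total times its column total, divided by the grand total. As a minimal sketch of that arithmetic, the block below uses a small made-up table (`toy`, with hypothetical taxon and site labels, not the bovid data) and compares the hand calculation with what `chisq.test()` reports.

```r
# Hypothetical 2 x 3 table of counts (illustration only, not the bovid data)
toy <- matrix(c(10, 20, 30,
                20, 30, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("taxonA", "taxonB"),
                              c("siteX", "siteY", "siteZ")))

# Expected count under independence:
# row total * column total / grand total
expected_manual <- outer(rowSums(toy), colSums(toy)) / sum(toy)

# Compare the hand calculation with the values chisq.test() stores
toy_test <- chisq.test(toy)
all.equal(expected_manual, toy_test$expected)

# Rule-of-thumb check: are all expected counts at least 5?
all(toy_test$expected >= 5)
```

The same rule-of-thumb check could be run on the real test object with `all(myTest$expected >= 5)`.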

Now we look at the results of the test and interpret them.

myTest
## 
## 	Pearson's Chi-squared test
## 
## data:  bovids
## X-squared = 508.66, df = 21, p-value < 2.2e-16

The null hypothesis is that there is no association between site and taxon. The p-value from the chi-square test is vanishingly small (p < 2.2e-16), which means that a test statistic at least as extreme as ours (X-squared = 508.66, df = 21) would be extremely unlikely if the null hypothesis were true. We therefore reject the null hypothesis: there is strong evidence that site and taxon are associated.
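
For reference, the degrees of freedom and the p-value in the printed output can be recovered by hand. The sketch below copies the statistic from the output above and assumes the table has 4 taxa (rows) and 8 sites (columns), as shown in the expected-value matrix. The variable names are illustrative only.

```r
# Values copied from the printed chisq.test() output
x_squared <- 508.66
n_taxa  <- 4   # rows in the table
n_sites <- 8   # columns in the table

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1)
df <- (n_taxa - 1) * (n_sites - 1)
df

# p-value: upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# R prints "< 2.2e-16" when the p-value falls below this threshold
p_value < 2.2e-16
```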