Problem 1
In 25 words or less, what does a p-value represent? Be careful!!!!
Problem 2
Write your own Monte Carlo statistical analysis to test the hypothesis that a person’s foot size is correlated with a person’s stature (height). Use this fake data set on foot size for people living in the USA.
A. Use ggplot to create a scatter plot with height on the x-axis and foot length on the y-axis, and add a line showing the relationship between the variables. Make sure your plot has an informative title and make sure it looks good!
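As a rough starting point for part A, the sketch below builds a scatter plot with a fitted straight line. The real data set is not reproduced here, so the code simulates stand-in data; the data frame name feet, the column names height and foot_length, and the units are all assumptions that you would replace with whatever the actual file uses.

```r
library(ggplot2)

# Simulated stand-in data: the column names and units are assumptions,
# not the contents of the real data set.
set.seed(1)
n <- 100
height <- rnorm(n, mean = 170, sd = 10)             # cm (assumed)
foot_length <- 0.15 * height + rnorm(n, sd = 1.5)   # cm (assumed)
feet <- data.frame(height = height, foot_length = foot_length)

p <- ggplot(feet, aes(x = height, y = foot_length)) +
  geom_point(alpha = 0.6) +
  # Straight-line (least-squares) fit showing the overall relationship
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Foot length versus height (simulated data)",
       x = "Height (cm)",
       y = "Foot length (cm)") +
  theme_minimal()

print(p)
```

A linear fit is only one reasonable choice for "a line showing the relationship"; a loess smoother would also satisfy the prompt.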
B. Do the analysis:
- Your test statistic is the absolute value of the correlation coefficient (we will discuss correlation coefficients in more detail next week). To calculate the correlation coefficient of two vectors x and y, use the cor() function like this: cor(x, y). You can calculate the absolute value of this number with abs().
- For each iteration, shuffle the foot length values randomly, and calculate the test statistic for the shuffled foot lengths and the original height variable.
- Repeat this 10,000 times, storing the test statistic each time in a vector called cors.
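The shuffling steps above can be sketched as a simple loop. This is one possible layout, not the required one. The vectors height and foot_length are simulated here as an assumption; in your own analysis they would come from the data set.

```r
# Simulated stand-in data (assumed names and values, not the real data set)
set.seed(42)
n <- 100
height <- rnorm(n, mean = 170, sd = 10)
foot_length <- 0.15 * height + rnorm(n, sd = 1.5)

n_iter <- 10000
cors <- numeric(n_iter)   # pre-allocate storage for the test statistics

for (i in seq_len(n_iter)) {
  # sample() with no size argument returns a random permutation,
  # which breaks any pairing between foot length and height
  shuffled <- sample(foot_length)
  cors[i] <- abs(cor(height, shuffled))
}
```

Shuffling only one of the two variables is enough, because what matters is destroying the pairing between them.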
C. Make a histogram of cors to get a visual sense of the values. Make sure you have meaningful axis labels, a title, and that the plot looks good!
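A histogram for part C might look something like the following sketch. It regenerates cors from simulated data so that it runs on its own; the variable names other than cors are assumptions, and the bin count and labels are just one reasonable choice.

```r
library(ggplot2)

# Simulated stand-in data and shuffled correlations (assumed setup)
set.seed(42)
n <- 100
height <- rnorm(n, mean = 170, sd = 10)
foot_length <- 0.15 * height + rnorm(n, sd = 1.5)
cors <- replicate(10000, abs(cor(height, sample(foot_length))))

h <- ggplot(data.frame(cors = cors), aes(x = cors)) +
  geom_histogram(bins = 50, fill = "grey40", colour = "white") +
  labs(title = "Absolute correlations under shuffling (simulated data)",
       x = "Absolute correlation of height with shuffled foot length",
       y = "Number of iterations") +
  theme_minimal()

print(h)
```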
D. Calculate the value of the test statistic for the original (unshuffled) data, and save it to a variable called original.cor.
E. Calculate the proportion of the elements in cors that are greater than or equal to original.cor.
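Parts D and E come down to two short lines once cors exists. The sketch below again uses simulated stand-in data, so height, foot_length, and the resulting numbers are assumptions for illustration only. The name p_value is also hypothetical.

```r
# Simulated stand-in data and shuffled correlations (assumed setup)
set.seed(42)
n <- 100
height <- rnorm(n, mean = 170, sd = 10)
foot_length <- 0.15 * height + rnorm(n, sd = 1.5)
cors <- replicate(10000, abs(cor(height, sample(foot_length))))

# Part D: test statistic for the original, unshuffled pairing
original.cor <- abs(cor(height, foot_length))

# Part E: cors >= original.cor is a logical vector, and the mean of a
# logical vector is the proportion of TRUE values
p_value <- mean(cors >= original.cor)
```

Taking the mean of a logical vector is equivalent to sum(cors >= original.cor) / length(cors).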
F. Write a couple of sentences interpreting these results in terms of the relationship between the two variables and in terms of the null hypothesis. Be specific!
Problem 3
Is it possible to get a p-value of zero when using parametric (i.e. frequentist) statistical methods? What about when using a Monte Carlo framework? Justify your answers.