Module 1 Modules

1.1 Temperature and Cricket Chirps

This module examines the relationship between ambient temperature and the rate of cricket ‘chirps’.

1.1.1 Topics

  • Simple data input
  • Scatter plot
  • Linear regression modeling

1.1.3 Data Entry

1.1.5 Linear Regression

## 
## Call:
## lm(formula = chirps ~ temp)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6181 -0.6154  0.0916  0.7669  1.5549 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.45931    2.98920   0.154 0.880239    
## temp         0.20300    0.03754   5.408 0.000119 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.986 on 13 degrees of freedom
## Multiple R-squared:  0.6923, Adjusted R-squared:  0.6686 
## F-statistic: 29.25 on 1 and 13 DF,  p-value: 0.0001195

1.1.6 Conclusions

  • The p-value for temperature (0.000119) is below the 0.05 significance level (95% confidence), which indicates the relationship is statistically significant.
  • The multiple R-squared is 0.69 (adjusted R-squared 0.67), so about 69% of the variation in chirps can be explained by temperature.
  • The equation for the line of best fit is \(Chirps = 0.2(Temperature) + 0.46\)
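
As a minimal sketch of how the line of best fit can be used, the coefficients printed in the regression summary can be plugged into the equation to estimate a chirp rate. The helper name `predictChirps` and the example temperature of 80 are illustrative assumptions and are not part of the original module.

```r
# Coefficients copied from the lm() summary printed above
intercept <- 0.45931
slope <- 0.20300

# Hypothetical helper: estimate chirps from temperature using the fitted line
predictChirps <- function(temp) {
  intercept + slope * temp
}

# Example temperature chosen only for illustration
estimate <- predictChirps(80)
cat("Estimated chirps at temperature 80 = ", round(estimate, 2), "\n")
```

Estimates are only trustworthy for temperatures inside the range that was actually measured.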

1.2 Hardy-Weinberg Genotype and Phenotype Solver

1.2.1 Topics

  • Concatenated print statements
  • Rounding

1.2.2 Calculations and Results

## Population Size =  30
## Frequency for genotype RR =  0.5
## Frequency for genotype Rr =  0.17
## Frequency for genotype rr =  0.33
## Count for allele R =  35
## Count for allele r =  25
## Total Allele Count =  60
## Frequency for allele R =  0.58
## Frequency for allele r =  0.42
## Frequency for Phenotype Red =  0.67
## Frequency for Phenotype Yellow =  0.33
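
The results above can be reproduced with a few lines of arithmetic, rounding, and concatenated print statements. In this sketch the genotype counts (15, 5, and 10) are an assumption inferred from the printed frequencies and the population size of 30, the variable names are hypothetical, and red is treated as the dominant phenotype because the printed red frequency equals the RR and Rr frequencies combined.

```r
# Genotype counts inferred from the printed frequencies (assumption)
countHomDominant <- 15   # RR
countHet <- 5            # Rr
countHomRecessive <- 10  # rr

populationSize <- countHomDominant + countHet + countHomRecessive

# Each individual carries two alleles
countAlleleR <- 2 * countHomDominant + countHet
countAlleler <- 2 * countHomRecessive + countHet
totalAlleles <- countAlleleR + countAlleler

# Round frequencies to two decimal places for printing
freqAlleleR <- round(countAlleleR / totalAlleles, 2)
freqAlleler <- round(countAlleler / totalAlleles, 2)
freqRed <- round((countHomDominant + countHet) / populationSize, 2)
freqYellow <- round(countHomRecessive / populationSize, 2)

# cat() concatenates the label and the value into one printed line
cat("Population Size = ", populationSize, "\n")
cat("Count for allele R = ", countAlleleR, "\n")
cat("Count for allele r = ", countAlleler, "\n")
cat("Frequency for allele R = ", freqAlleleR, "\n")
cat("Frequency for allele r = ", freqAlleler, "\n")
cat("Frequency for Phenotype Red = ", freqRed, "\n")
cat("Frequency for Phenotype Yellow = ", freqYellow, "\n")
```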

1.3 Sea Anemone Body Mass vs Egg Count

1.3.1 Topics

  • Simple data entry
  • Correlation
  • Linear Regression

1.3.2 Source

Data provided courtesy of Will Ryan, PhD, FSU Biology. Thesis: “The role of seasonal and geographic temperature variation on the life cycle of the clonal sea anemone Diadumene lineata (Verrill 1869).”

1.3.3 Data Entry

1.3.4 Correlation Analysis

## [1] 0.9775228

1.3.5 Linear Regression

## 
## Call:
## lm(formula = eggs ~ mass)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -336.37 -215.27   44.44  129.45  442.68 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -683.1      194.8  -3.508  0.00989 ** 
## mass           206.0       16.8  12.267 5.48e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 263.8 on 7 degrees of freedom
## Multiple R-squared:  0.9556, Adjusted R-squared:  0.9492 
## F-statistic: 150.5 on 1 and 7 DF,  p-value: 5.484e-06
## y ~ -683.14 + 206.04 * mass
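
The correlation and regression results are linked: for a regression with a single predictor, the multiple R-squared is the square of the correlation coefficient. The sketch below checks this using only the numbers printed above. The helper name `predictEggs` and the example mass of 10 are illustrative assumptions.

```r
# Correlation coefficient copied from the correlation analysis output
r <- 0.9775228

# With one predictor, R-squared equals the squared correlation
rSquared <- r^2
cat("Squared correlation = ", round(rSquared, 4), "\n")

# Coefficients copied from the printed line of best fit
intercept <- -683.14
slope <- 206.04

# Hypothetical helper: estimate egg count from body mass
predictEggs <- function(mass) {
  intercept + slope * mass
}

# Example mass chosen only for illustration
cat("Estimated eggs at mass 10 = ", predictEggs(10), "\n")
```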

1.4 Simple Clustering by Characteristics

Cluster analysis groups organisms by how close they are to each other, with closeness measured by the distance formula. The formula works for 2D (x,y) coordinates, 3D (x,y,z) coordinates, or coordinates with even more dimensions. Here, we examine the closeness of several species using measurements of their characteristics. Each measurement adds one dimension to the coordinates, and the distance calculation stays the same however many there are. NOTE: Cluster analysis is somewhat subjective, and the number of clusters is ultimately at the discretion of the researcher. Results are therefore not conclusive on their own and should be considered along with other data.
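
The idea can be sketched in a few lines of base R. The species names and measurements below are made up for illustration and are not the module's data. The clustering method ("average") and the choice of two clusters are also assumptions.

```r
# Hypothetical measurements for four made-up species (not the module's data)
traits <- data.frame(
  length = c(10.0, 11.0, 30.0, 31.0),
  width  = c(4.0, 4.5, 12.0, 12.5),
  mass   = c(2.0, 2.2, 9.0, 9.4),
  row.names = c("speciesA", "speciesB", "speciesC", "speciesD")
)
traitMatrix <- as.matrix(traits)

# Distance formula written out by hand for one pair of species
manualDistance <- sqrt(sum((traitMatrix["speciesA", ] - traitMatrix["speciesB", ])^2))

# dist() applies the same formula to every pair of species
distances <- dist(traitMatrix, method = "euclidean")

# Join the closest species first and draw the resulting tree
cluster <- hclust(distances, method = "average")
plot(cluster)

# The researcher picks the number of clusters; 2 is an arbitrary choice here
groups <- cutree(cluster, k = 2)
print(groups)
```

Changing `k` changes the grouping, which is why the number of clusters should be justified with other evidence.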

1.4.1 Data Entry

1.5 Simple T-test and Descriptive Statistics

1.5.1 Data Entry

1.5.2 T-test

## 
##  Welch Two Sample t-test
## 
## data:  dugout and ormond
## t = -2.8129, df = 54.408, p-value = 0.006817
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.2159643 -0.2040357
## sample estimates:
## mean of x mean of y 
##  4.476667  5.186667

1.5.3 Simple Descriptive Statistics

## Dugout Lake Sample Size =  30
## Dugout Lake Sample Mean =  4.476667
## Dugout Lake Sample Standard Deviation =  0.84269
## Dugout Lake Sample Standard Error =  0.1538534
## Ormond Lake Sample Size =  30
## Ormond Lake Sample Mean =  5.186667
## Ormond Lake Sample Standard Deviation =  1.095991
## Ormond Lake Sample Standard Error =  0.2000996
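
The descriptive statistics and the t-test output are connected. The sketch below uses only the printed sample sizes, means, and standard deviations to recompute the standard errors, the Welch t statistic, and the Welch degrees of freedom. The variable names are hypothetical, and small differences from the printed values come from rounding in the output.

```r
# Values copied from the printed descriptive statistics
nDugout <- 30
meanDugout <- 4.476667
sdDugout <- 0.84269
nOrmond <- 30
meanOrmond <- 5.186667
sdOrmond <- 1.095991

# Standard error = standard deviation / square root of sample size
seDugout <- sdDugout / sqrt(nDugout)
seOrmond <- sdOrmond / sqrt(nOrmond)

# Welch t statistic: difference in means over the combined standard error
tStatistic <- (meanDugout - meanOrmond) / sqrt(seDugout^2 + seOrmond^2)

# Welch-Satterthwaite approximation for the degrees of freedom
degreesFreedom <- (seDugout^2 + seOrmond^2)^2 /
  (seDugout^4 / (nDugout - 1) + seOrmond^4 / (nOrmond - 1))

cat("Dugout Lake Sample Standard Error = ", seDugout, "\n")
cat("Ormond Lake Sample Standard Error = ", seOrmond, "\n")
cat("t = ", round(tStatistic, 4), " df = ", round(degreesFreedom, 3), "\n")
```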

1.6 Genetic Drift

1.6.1 Data Entry

library(stringr)

generation0<-c("Aa","Aa","aa","Aa","AA","Aa","aa","Aa","Aa","aa","AA","aa","Aa","aa","AA","Aa","aa","AA","Aa","aa")
generation1<-c("AA","aa","Aa","aa","AA","Aa","aa","AA","Aa","aa","Aa","Aa","aa","Aa","AA","Aa","Aa","Aa","Aa","aa")
generation2<-c("Aa","Aa","aa","Aa","AA","Aa","Aa","Aa","Aa","aa","aa","AA","aa","AA","AA","Aa","aa","Aa","Aa","AA")
generation3<-c("aa","AA","aa","AA","AA","Aa","aa","Aa","Aa","AA","aa","AA","aa","AA","AA","Aa","aa","Aa","Aa","AA")
generation4<-c("Aa","Aa","Aa","Aa","Aa","Aa","Aa","Aa","Aa","aa","Aa","Aa","Aa","Aa","aa","Aa","Aa","Aa","aa","Aa")
generation5<-c("aa","AA","aa","Aa","Aa","AA","AA","Aa","AA","Aa","aa","aa","AA","Aa","Aa","aa","aa","Aa","Aa","aa")
generation6<-c("Aa","Aa","aa","Aa","Aa","Aa","AA","aa","Aa","aa","AA","Aa","Aa","AA","AA","Aa","aa","AA","Aa","aa")
generation7<-c("Aa","Aa","Aa","Aa","aa","AA","Aa","Aa","aa","AA","AA","aa","Aa","aa","aa","AA","aa","AA","Aa","aa")
generation8<-c("aa","aa","aa","Aa","Aa","Aa","aa","AA","Aa","aa","Aa","AA","aa","Aa","Aa","Aa","Aa","aa","Aa","aa")
generation9<-c("Aa","aa","Aa","Aa","aa","Aa","AA","Aa","Aa","AA","Aa","Aa","AA","AA","aa","Aa","Aa","Aa","AA","Aa")

#Gather each generation into its own data frame
data0<-data.frame(string=c(generation0), stringsAsFactors=FALSE)
data1<-data.frame(string=c(generation1), stringsAsFactors=FALSE)
data2<-data.frame(string=c(generation2), stringsAsFactors=FALSE)
data3<-data.frame(string=c(generation3), stringsAsFactors=FALSE)
data4<-data.frame(string=c(generation4), stringsAsFactors=FALSE)
data5<-data.frame(string=c(generation5), stringsAsFactors=FALSE)
data6<-data.frame(string=c(generation6), stringsAsFactors=FALSE)
data7<-data.frame(string=c(generation7), stringsAsFactors=FALSE)
data8<-data.frame(string=c(generation8), stringsAsFactors=FALSE)
data9<-data.frame(string=c(generation9), stringsAsFactors=FALSE)

1.6.2 Calculations

#Count the A alleles in each data set
data0$countA <- str_count(data0$string, "A")
data1$countA <- str_count(data1$string, "A")
data2$countA <- str_count(data2$string, "A")
data3$countA <- str_count(data3$string, "A")
data4$countA <- str_count(data4$string, "A")
data5$countA <- str_count(data5$string, "A")
data6$countA <- str_count(data6$string, "A")
data7$countA <- str_count(data7$string, "A")
data8$countA <- str_count(data8$string, "A")
data9$countA <- str_count(data9$string, "A")

#Count the a alleles in each data set
data0$counta <- str_count(data0$string, "a")
data1$counta <- str_count(data1$string, "a")
data2$counta <- str_count(data2$string, "a")
data3$counta <- str_count(data3$string, "a")
data4$counta <- str_count(data4$string, "a")
data5$counta <- str_count(data5$string, "a")
data6$counta <- str_count(data6$string, "a")
data7$counta <- str_count(data7$string, "a")
data8$counta <- str_count(data8$string, "a")
data9$counta <- str_count(data9$string, "a")

#Sum up the A alleles for each generation
data0A<-sum(data0[2])
data1A<-sum(data1[2])
data2A<-sum(data2[2])
data3A<-sum(data3[2])
data4A<-sum(data4[2])
data5A<-sum(data5[2])
data6A<-sum(data6[2])
data7A<-sum(data7[2])
data8A<-sum(data8[2])
data9A<-sum(data9[2])

#Sum up the a alleles for each generation
data0a<-sum(data0[3])
data1a<-sum(data1[3])
data2a<-sum(data2[3])
data3a<-sum(data3[3])
data4a<-sum(data4[3])
data5a<-sum(data5[3])
data6a<-sum(data6[3])
data7a<-sum(data7[3])
data8a<-sum(data8[3])
data9a<-sum(data9[3])

#Gather the sums for each allele type for plotting a line
countTrendA<-c(data0A,data1A,data2A,data3A,data4A,data5A,data6A,data7A,data8A,data9A)
countTrenda<-c(data0a,data1a,data2a,data3a,data4a,data5a,data6a,data7a,data8a,data9a)

#Calculate the allele frequencies for each generation
freqTrendA<-countTrendA/(length(generation0)*2)
freqTrenda<-countTrenda/(length(generation0)*2)
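
The same calculation can be written more compactly by keeping the generations in a list and applying one counting function to each. This is only a sketch of an alternative, not the module's method. It uses base R pattern matching in place of `str_count`, the helper name `countAllele` is hypothetical, and only the first two generations are repeated here to keep the example short.

```r
# First two generations copied from the data entry step
generations <- list(
  generation0 = c("Aa","Aa","aa","Aa","AA","Aa","aa","Aa","Aa","aa",
                  "AA","aa","Aa","aa","AA","Aa","aa","AA","Aa","aa"),
  generation1 = c("AA","aa","Aa","aa","AA","Aa","aa","AA","Aa","aa",
                  "Aa","Aa","aa","Aa","AA","Aa","Aa","Aa","Aa","aa")
)

# Hypothetical helper: total occurrences of one allele letter in a generation
countAllele <- function(genotypes, allele) {
  matches <- regmatches(genotypes, gregexpr(allele, genotypes, fixed = TRUE))
  sum(lengths(matches))
}

# Allele counts per generation
countTrendA <- sapply(generations, countAllele, allele = "A")
countTrenda <- sapply(generations, countAllele, allele = "a")

# Each individual carries two alleles
freqTrendA <- countTrendA / (lengths(generations) * 2)
freqTrenda <- countTrenda / (lengths(generations) * 2)

# Plot both frequency trends as lines, one point per generation
generationNumber <- seq_along(generations) - 1
plot(generationNumber, freqTrendA, type = "l", ylim = c(0, 1),
     xlab = "Generation", ylab = "Allele frequency")
lines(generationNumber, freqTrenda, lty = 2)
```

Adding the remaining generations to the list extends the trend without any further changes to the code.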

1.7 Hominin Skull Analysis

1.7.1 Data Entry

1.11 Temperature Measurement Error

##      hands          sticker       thermometer   
##  Min.   :55.00   Min.   :55.00   Min.   :92.00  
##  1st Qu.:64.00   1st Qu.:62.00   1st Qu.:94.00  
##  Median :65.00   Median :64.00   Median :97.00  
##  Mean   :66.85   Mean   :63.85   Mean   :96.23  
##  3rd Qu.:67.00   3rd Qu.:65.00   3rd Qu.:98.00  
##  Max.   :85.00   Max.   :71.00   Max.   :99.00

1.14 Population Growth Curve

##     t        N
## 1   0 100.0000
## 2   1 124.0000
## 3   2 151.9744
## 4   3 183.7090
## 5   4 218.5723
## 6   5 255.4797
## 7   6 292.9617
## 8   7 329.3542
## 9   8 363.0760
## 10  9 392.9043
## 11 10 418.1513
## 12 11 438.6864
## 13 12 454.8248
## 14 13 467.1529
## 15 14 476.3597
## 16 15 483.1165
## 17 16 488.0105
## 18 17 491.5211
## 19 18 494.0216
## 20 19 495.7937
## 21 20 497.0450
## 22 21 497.9262
## 23 22 498.5458
## 24 23 498.9808
## 25 24 499.2859
## 26 25 499.4998
## 27 26 499.6497
## 28 27 499.7547
## 29 28 499.8283
## 30 29 499.8798
## 31 30 499.9158
## 32 31 499.9411
## 33 32 499.9588
## 34 33 499.9711
## 35 34 499.9798
## 36 35 499.9859
## 37 36 499.9901
## 38 37 499.9931
## 39 38 499.9951
## 40 39 499.9966
## 41 40 499.9976
## 42 41 499.9983
## 43 42 499.9988
## 44 43 499.9992
## 45 44 499.9994
## 46 45 499.9996
## 47 46 499.9997
## 48 47 499.9998
## 49 48 499.9999
## 50 49 499.9999
## 51 50 499.9999
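
The table above is consistent with a discrete logistic growth model. The growth rate of 0.3 and carrying capacity of 500 used in this sketch are assumptions inferred from the printed values (for example, 100 + 0.3 × 100 × (1 − 100/500) = 124). They are not stated in the module, and the variable names are hypothetical.

```r
# Parameters inferred from the printed table (assumptions)
growthRate <- 0.3
carryingCapacity <- 500
startingSize <- 100
steps <- 50

# Population size at each time step, starting at t = 0
N <- numeric(steps + 1)
N[1] <- startingSize

# Discrete logistic growth: growth slows as N approaches the carrying capacity
for (i in seq_len(steps)) {
  N[i + 1] <- N[i] + growthRate * N[i] * (1 - N[i] / carryingCapacity)
}

growth <- data.frame(t = 0:steps, N = N)
print(head(growth))

# Draw the S-shaped growth curve
plot(growth$t, growth$N, type = "l", xlab = "t", ylab = "N")
```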

1.16 Predator-Prey Relationships

##    Deer Resources Cougars
## 1     1         9       1
## 2     2         8       2
## 3     3         7       3
## 4     4         6       2
## 5     5         5       3
## 6     6         6       2
## 7     7         7       3
## 8     6         8       2
## 9     5         4       4
## 10    4         5       5
## 11    3         6       3
## 12    2         7       4
## 13    1         6       2
## 14    2         5       3
## 15    3         4       4
## 16    4         3       5
## 17    5         4       6
## 18    6         5       5
## 19    7         6       4
## 20    8         7       5
## 21    7         8       6
## 22    6         6       7
## 23    5         6       6
## 24    4         5       7
## 25    3         4       8
## 26    2         3       9
## 27    1         2       8
## 28    2         3       7
## 29    3         4       6
## 30    4         5       5
## 31    5         6       6
## 32    6         7       5
## 33    7         8       4
## 34    8         9       3
## 35    7         8       4
## 36    6         6       5

1.17 Everglades Python Impact

#Source: Dorcas, M. E., Willson, J. D., Reed, R. N., Snow, R. W., Rochford, M. R., Miller, M. A., … Hart, K. M. (2012). 
#Severe mammal declines coincide with proliferation of invasive Burmese pythons in Everglades National Park. 
#Proceedings of the National Academy of Sciences of the United States of America, 109(7), 2418–2422. http://doi.org/10.1073/pnas.1115226109

#Source: http://www.nps.gov/ever/learn/nature/burmesepythonremoval.htm
#Annual Tally of Burmese Pythons Removed In and Around Everglades National Park (Including Big Cypress, Everglades City, Marco Island, Key Largo, etc) by
#Authorized Agents*, Park Staff, and Park Partners
#Note: Compilation statistics were taken over by the U.S. Geological Survey in 2013

#Create data for the years on the x axis
year  <-c("1993","1994","1995","1996","1997","1998","1999","2000", "2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012")

#Create data vectors for each animal, filling in for missing data with NA
python <-c(NA,NA,NA,NA,NA,NA,NA,2, 3, 14,23,70,94,170,248,343,367,322,169,152)
deer   <-c(3, 4, 19,14,11,4, 9, NA,NA,NA,0,0,0,3,0,0,11,10,0,NA)
raccoon<-c(22,48,60,30,44,23,39,NA,NA,NA,0,3,2,1,0,0,0,0,3,NA)
opossum<-c(6, 21,23,12,18,14,9, NA,NA,NA,0,0,0,0,0,0,2,3,0,NA)
rabbit <-c(1, 1, 13,2, 13,3, 5, NA,NA,NA,0,0,0,0,0,0,0,0,0,NA)
bobcat <-c(3, 2, 1, 1, 0, 1, 1, NA,NA,NA,0,0,0,0,0,0,2,2,0,NA)

#Create multiple bar plots, one per organism with custom y axis limits
barplot(python, main="Python",xlab="Year", col=c("red"), names.arg=year, ylim=c(0,400))
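
The same pattern can be repeated for the mammals with a loop. This is a hedged sketch: the y axis limit (zero to the largest observed count), the bar colour, and the choice to show only deer and raccoon are assumptions, since the original limits for these plots are not shown.

```r
# Years and two of the mammal vectors, copied from the data above
year <- c("1993","1994","1995","1996","1997","1998","1999","2000","2001","2002",
          "2003","2004","2005","2006","2007","2008","2009","2010","2011","2012")
mammals <- list(
  Deer    = c(3, 4, 19, 14, 11, 4, 9, NA, NA, NA, 0, 0, 0, 3, 0, 0, 11, 10, 0, NA),
  Raccoon = c(22, 48, 60, 30, 44, 23, 39, NA, NA, NA, 0, 3, 2, 1, 0, 0, 0, 0, 3, NA)
)

# One bar plot per animal
for (animal in names(mammals)) {
  counts <- mammals[[animal]]
  # Assumed y axis limit: zero to the largest observed count, ignoring NA
  upperLimit <- max(counts, na.rm = TRUE)
  barplot(counts, main = animal, xlab = "Year", col = "blue",
          names.arg = year, ylim = c(0, upperLimit))
}
```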

1.18 Punnett Square Dice Simulation

## Expected AA genotype frequency =  0.25
## Expected Aa genotype frequency =  0.5
## Expected aa genotype frequency =  0.25
## Expected Dominant phenotype frequency =  0.75
## Expected Recessive phenotype frequency =  0.25
## Expected Dominant:Recessive phenotype Ratio =  3 : 1
## Group AA genotype frequency =  0.32
## Group Aa genotype frequency =  0.5
## Group aa genotype frequency =  0.18
## Group Dominant phenotype frequency =  0.82
## Group Recessive phenotype frequency =  0.18
## Group Dominant:Recessive phenotype Ratio =  4.555556 : 1
## Class AA genotype frequency =  0.224
## Class Aa genotype frequency =  0.514
## Class aa genotype frequency =  0.262
## Class Dominant phenotype frequency =  0.738
## Class Recessive phenotype frequency =  0.262
## Class Dominant:Recessive phenotype Ratio =  2.816794 : 1
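
The expected values can be derived from a Punnett square. The sketch below assumes a cross between two heterozygous (Aa) parents, which is inferred from the expected frequencies of 0.25, 0.5, and 0.25 rather than stated in the module. The variable names are hypothetical.

```r
# Gametes from each parent; an Aa x Aa cross is assumed
parent1 <- c("A", "a")
parent2 <- c("A", "a")

# Punnett square: every gamete of parent 1 paired with every gamete of parent 2
square <- outer(parent1, parent2, paste0)
print(square)

# "aA" and "Aa" are the same genotype, so write both as "Aa"
genotypes <- as.vector(square)
genotypes[genotypes == "aA"] <- "Aa"

# Expected genotype frequencies
freqHomDominant <- mean(genotypes == "AA")
freqHet <- mean(genotypes == "Aa")
freqHomRecessive <- mean(genotypes == "aa")

# The dominant phenotype appears in AA and Aa individuals
freqDominant <- freqHomDominant + freqHet
freqRecessive <- freqHomRecessive
expectedRatio <- freqDominant / freqRecessive
cat("Expected Dominant:Recessive phenotype Ratio = ", expectedRatio, ": 1\n")

# Same ratio calculation applied to the group frequencies printed above
groupRatio <- 0.82 / 0.18
cat("Group Dominant:Recessive phenotype Ratio = ", groupRatio, ": 1\n")
```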

1.19 Sample Size Effects

## Mean 20 =  7.142857
## Standard Error 20 =  0.5660461
## 95% Confidence Interval 20 =  6.010765 8.274949
## Mean 40 =  6.926829
## Standard Error 40 =  0.41817
## 95% Confidence Interval 40 =  6.090489 7.763169
## Mean 60 =  6.966667
## Standard Error 60 =  0.3460566
## 95% Confidence Interval 60 =  6.274553 7.65878
## Mean 80 =  6.425
## Standard Error 80 =  0.3254865
## 95% Confidence Interval 80 =  5.774027 7.075973
## Mean 100 =  6.24
## Standard Error 100 =  0.3022207
## 95% Confidence Interval 100 =  5.635559 6.844441

## CI Range 20 =  2.264184
## CI Range 40 =  1.67268
## CI Range 60 =  1.384226
## CI Range 80 =  1.301946
## CI Range 100 =  1.208883
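
The printed intervals match the mean plus or minus two standard errors, and each CI range equals four standard errors. The multiplier of 2 is an assumption inferred from the output (a common approximation to 1.96), and the variable names are hypothetical. The sketch below recomputes the intervals from the printed means and standard errors.

```r
# Values copied from the printed output
sampleSize <- c(20, 40, 60, 80, 100)
sampleMean <- c(7.142857, 6.926829, 6.966667, 6.425, 6.24)
standardError <- c(0.5660461, 0.41817, 0.3460566, 0.3254865, 0.3022207)

# Multiplier inferred from the printed intervals (assumption)
multiplier <- 2

# 95% confidence interval: mean plus or minus two standard errors
lower <- sampleMean - multiplier * standardError
upper <- sampleMean + multiplier * standardError
ciRange <- upper - lower

intervals <- data.frame(sampleSize, lower, upper, ciRange)
print(intervals)
```

The range shrinks as the sample size grows because the standard error falls as more observations are included.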

1.20 Phylogenetic Analysis of Gamma Fibrinogen

##       [,1]                       [,2]    
##  [1,] "Physeter_catodon"         "U86643"
##  [2,] "Physeter_catodon"         "U86644"
##  [3,] "Ailurus_fulgens"          "U86645"
##  [4,] "Ailurus_fulgens"          "U86646"
##  [5,] "Ovis_dalli"               "U86647"
##  [6,] "Alces_alces"              "U86648"
##  [7,] "Giraffa_camelopardalis"   "U86649"
##  [8,] "Tragulus_napu"            "U86650"
##  [9,] "Delphinapterus_leucas"    "U86651"
## [10,] "Physeter_catodon"         "U86652"
## [11,] "Balaenoptera_physalus"    "U86653"
## [12,] "Hexaprotodon_liberiensis" "U86654"
## [13,] "Sus_scrofa"               "U86655"
## [14,] "Pecari_tajacu"            "U86656"
## [15,] "Camelus_dromedarius"      "U86657"
## [16,] "Tapirus_indicus"          "U86658"
## [17,] "Equus_przewalskii"        "U86659"
## [18,] "Crocuta_crocuta"          "U86660"
## [19,] "Canis_latrans"            "U86661"
#To standardize sequence lengths, choose the first and last nucleotide number
minLength<-1
maxLength<-424

#truncate all sequences so they are the same length
all$U86643 <- all$U86643[c(minLength:maxLength)]
all$U86644 <- all$U86644[c(minLength:maxLength)]
all$U86645 <- all$U86645[c(minLength:maxLength)]
all$U86646 <- all$U86646[c(minLength:maxLength)]
all$U86647 <- all$U86647[c(minLength:maxLength)]
all$U86648 <- all$U86648[c(minLength:maxLength)]
all$U86649 <- all$U86649[c(minLength:maxLength)]
all$U86650 <- all$U86650[c(minLength:maxLength)]
all$U86651 <- all$U86651[c(minLength:maxLength)]
all$U86652 <- all$U86652[c(minLength:maxLength)]
all$U86653 <- all$U86653[c(minLength:maxLength)]
all$U86654 <- all$U86654[c(minLength:maxLength)]
all$U86655 <- all$U86655[c(minLength:maxLength)]
all$U86656 <- all$U86656[c(minLength:maxLength)]
all$U86657 <- all$U86657[c(minLength:maxLength)]
all$U86658 <- all$U86658[c(minLength:maxLength)]
all$U86659 <- all$U86659[c(minLength:maxLength)]
all$U86660 <- all$U86660[c(minLength:maxLength)]
all$U86661 <- all$U86661[c(minLength:maxLength)]

#change the names in the list
names(all)<-names

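#Compute a matrix of pairwise distances from DNA sequences
#dist.dna() comes from the ape package, which must be loaded first with library(ape)
#model="raw" returns the proportion of sites that differ between sequences
#model="N" returns the number of sites that differ between sequences
matrix<-dist.dna(all, model="N")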

#Run a clustering method
cluster <- hclust(matrix, method="ward.D")

par(cex=0.9, mar=c(13, 3, 1, 1))

#Plot the dendrogram
plot(as.dendrogram(cluster), horiz=FALSE)
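
To illustrate what a count-of-differing-sites distance measures, the sketch below builds one by hand in base R and clusters it with the same method. The three short sequences and species names are made up for illustration and are not the GenBank data. The helper `countDifferences` is hypothetical and, unlike a dedicated package function, does nothing special for gaps or ambiguous bases.

```r
# Made-up aligned sequences of equal length (not the GenBank data)
sequences <- list(
  speciesA = c("a", "c", "g", "t", "a", "c"),
  speciesB = c("a", "c", "g", "t", "a", "g"),
  speciesC = c("t", "c", "g", "a", "a", "g")
)

# Hypothetical helper: number of sites at which two sequences differ
countDifferences <- function(x, y) {
  sum(x != y)
}

# Fill a square matrix with the count for every pair of species
n <- length(sequences)
differences <- matrix(0, nrow = n, ncol = n,
                      dimnames = list(names(sequences), names(sequences)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    differences[i, j] <- countDifferences(sequences[[i]], sequences[[j]])
  }
}
print(differences)

# Convert to a dist object, cluster it, and plot the dendrogram
sketchCluster <- hclust(as.dist(differences), method = "ward.D")
plot(as.dendrogram(sketchCluster), horiz = FALSE)
```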

1.21 GenBank Access

##      [,1]                      [,2]      
## [1,] "Echinococcus_granulosus" "AB921054"
## [2,] "Echinococcus_granulosus" "AB921090"
## [3,] "Echinococcus_ortleppi"   "AB921055"
## [4,] "Echinococcus_canadensis" "AB921058"
## [5,] "Echinococcus_canadensis" "AB921068"
## [6,] "Echinococcus_canadensis" "AB921075"
## [7,] "Echinococcus_canadensis" "AB921083"

1.22 Gentry Survey Data

## 
## Call:
## lm(formula = texas ~ area)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.74711 -0.86012  0.05576  0.63086  2.06463 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -8.4880     3.5217  -2.410  0.04249 *  
## area          3.7425     0.5722   6.541  0.00018 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.258 on 8 degrees of freedom
## Multiple R-squared:  0.8425, Adjusted R-squared:  0.8228 
## F-statistic: 42.79 on 1 and 8 DF,  p-value: 0.0001802
## 
## Call:
## lm(formula = penn ~ area)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4146 -0.4031  0.3181  0.9616  2.4717 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -31.3134     5.2508  -5.964 0.000337 ***
## area          7.1315     0.8531   8.360 3.18e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.876 on 8 degrees of freedom
## Multiple R-squared:  0.8973, Adjusted R-squared:  0.8844 
## F-statistic: 69.88 on 1 and 8 DF,  p-value: 3.178e-05
## 
## Call:
## lm(formula = finland ~ area)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46670 -0.13383  0.01548  0.15191  0.37033 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   1.2355     0.7326   1.687  0.13018   
## area          0.5665     0.1190   4.760  0.00143 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2617 on 8 degrees of freedom
## Multiple R-squared:  0.739,  Adjusted R-squared:  0.7064 
## F-statistic: 22.66 on 1 and 8 DF,  p-value: 0.001427
## 
## Call:
## lm(formula = costaRica ~ area)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.651 -3.303 -1.064  4.079  6.732 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -160.630     13.584  -11.82 2.40e-06 ***
## area          39.281      2.207   17.80 1.02e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.853 on 8 degrees of freedom
## Multiple R-squared:  0.9754, Adjusted R-squared:  0.9723 
## F-statistic: 316.8 on 1 and 8 DF,  p-value: 1.017e-07