Advanced regression modeling
Modeling of bear weight based upon bodily measurements using mixed effects and generalized least squares modeling accounting for covariance-variance structures and within-group correlation.
Based upon predicted vs. observed, residuals, residual sum of squares, and AIC, this script determines that bear weight is modeled best by a cubic-splines transformed mixed-effect model in the form of:
lme(weight ~ chestGSc + lengthSc + headWSc + ageNA +
splinepoints_chest + splinepoints_length,
random = ~ 1 | name/age,
weights=varExp(form=~as.vector(chestGSc + splinepoints_chest)),
correlation=corCAR1(form = ~ age | name))
where:
- chestGSc = scaled chest girth
- lengthSc = scaled length
- headWSc = scaled head width
- ageNA = dummy variable, indicating whether or not age was imputed
- splinepoints_chest = cubic splines transformation of chest girth, with knots at 7, 25, and 90 percent
- splinepoints_length = cubic splines transformation of length, with knots at 10, 35, and 75 percent
- autocorrelation is modeled with an order of 1, with a continuous time covarariate of age grouped by the name of each bear
- within-group variance is modeled using the exponential variance of scaled chest girth plus its spline transformation
1 Read in Data
require(pacman)
p_load(ggplot2, nlme, lme4, reshape2, dplyr, car)
bears <- read.fwf("Bears.full.dat", skip=20,
col.names=c("id", "age", "month", "sex", "headL", "headW",
"neckG", "length", "chestG", "weight", "obs", "name"),
widths=c(9, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 10),
strip.white=T, na.string = "")
Missing bear names are set to “Unnamed 1” and “Unnamed 2” and relevant columns are converted to factors
bears$name <- as.character(bears$name)
bears[87,12] <- "Unnamed1"
bears[88,12] <- "Unnamed2"
bears$name <- as.factor(bears$name)
bears$month <- as.factor(bears$month)
bears$sex <- as.factor(bears$sex)
attach(bears)
2 Correlations
Exploratory correlations show strong relationships between possible predictor variables and bear weight.
cor(weight, bears[,5:9])
headL headW neckG length chestG
[1,] 0.8333214 0.7556092 0.9432771 0.8746451 0.9659937
3 Initial plotting
Initial plots show nonlinear relationships for length, chest, and neck.
bearsplotdata <- bears[,0:12]
bearsplotdata <- bearsplotdata[,-c(1,4,11,12)]
bearsplotdata <- melt(bearsplotdata, id.vars=c("weight"))
bearsplot1 <- ggplot(data=bearsplotdata, aes(y=weight, x=value))+
theme_bw()+
geom_point()+
theme(axis.text.x = element_blank(),
axis.ticks.x = element_blank(),
strip.background = element_blank(),
panel.grid.minor = element_blank()) +
facet_wrap(~variable, ncol=3, scales="free_x")
print(bearsplot1)
4 Transformations
Transform length, chest, and neck based upon observed relationships in the initial plots.
bears$lengthtransf <- 1/bears$length
bears$chestGtransf <- bears$chestG^2
bears$neckGtransf <- bears$neckG^2
5 Scale response and predictor variables
The measurements and weights that we are using in the model vary by more than an order of magnitude. In order for the model to converge, we need to make sure that the predictors and response variables have similar ranges.
All predictor variables were scaled by subtracting the mean and dividing by the standard deviation.
bears$headLSc <- scale(bears$headL)
bears$neckGSctransf <- scale(bears$neckGtransf)
bears$neckGSc <- scale(bears$neckG)
bears$headWSc <- scale(bears$headW)
bears$lengthSc <- scale(bears$length)
bears$lengthSctransf <- scale(bears$lengthtransf)
bears$chestGSc <- scale(bears$chestG)
bears$chestGSctransf <- scale(bears$chestGtransf)
6 Deal with missing age data
I chose a model-based imputation as outlined in Andrew Gelman’s 2012 textbook “Data Analysis Using Regression and Multilevel/Hierarchical Models”. I create a dummy variable for future models indicating whether or not the age data is imputed, and then infer the value based upon the relationship between age and head width.
I build a linear model relating age to scaled head width, resulting in an estimate of 21.532. This means that, for every 1 unit change in the standard deviation of head width, age increases by 21.532 months. I then apply this relationship to the missing age data.
As a means to check whether the imputation was successful, I compare the mean of the imputed age to the mean of the known age. I then compare the correlation between age and a variable not used to impute age (chest girth) for imputed and known age.
Age is similar between the two (43.4 vs 46.8), as are the correlations (0.734 vs. 0.731). Thus, I continue to use the imputed age data in the modeling process.
bears$ageNA <- 0
bears$ageNA[is.na(bears$age)] <- 1
summary(lm(bears$age[bears$ageNA==0] ~ bears$headWSc[bears$ageNA==0]))
Call:
lm(formula = bears$age[bears$ageNA == 0] ~ bears$headWSc[bears$ageNA ==
0])
Residuals:
Min 1Q Median 3Q Max
-63.441 -14.084 -6.891 9.156 79.171
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 45.893 2.808 16.341 < 2e-16 ***
bears$headWSc[bears$ageNA == 0] 21.532 2.656 8.106 4.65e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 25.44 on 81 degrees of freedom
Multiple R-squared: 0.4479, Adjusted R-squared: 0.441
F-statistic: 65.7 on 1 and 81 DF, p-value: 4.653e-12
meanage <- mean(bears$age[bears$ageNA==0])
bearschange <- bears$headWSc[bears$ageNA==1]*21.532
bears$age[bears$ageNA==1] <- meanage + bearschange
bears$age <- jitter(bears$age, amount=0.0001)
mean(bears$age[bears$ageNA==1])
[1] 46.83522
mean(bears$age[bears$ageNA==0])
[1] 43.43373
cor(bears$chestG[bears$ageNA==0], bears$age[bears$ageNA==0])
[1] 0.7342366
cor(bears$chestG[bears$ageNA==1], bears$age[bears$ageNA==1])
[1] 0.7314328
bears$agescale <- scale(bears$age)
7 Linear model
I start by building a simple linear model to get a base AIC score and model structure moving forward.
lm1 <- lm(weight ~ chestGSctransf + sex + neckGSctransf + lengthSctransf + headLSc
+ headWSc, data=bears)
summary(lm1)
Call:
lm(formula = weight ~ chestGSctransf + sex + neckGSctransf +
lengthSctransf + headLSc + headWSc, data = bears)
Residuals:
Min 1Q Median 3Q Max
-58.473 -11.112 -1.404 11.803 74.150
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 192.8548 2.1479 89.788 < 2e-16 ***
chestGSctransf 74.9734 5.4441 13.772 < 2e-16 ***
sex2 -2.2553 4.1094 -0.549 0.5840
neckGSctransf 31.8555 6.0351 5.278 5.03e-07 ***
lengthSctransf -6.6563 3.6866 -1.806 0.0732 .
headLSc -2.0311 4.2655 -0.476 0.6347
headWSc -0.9657 2.9858 -0.323 0.7469
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.76 on 136 degrees of freedom
Multiple R-squared: 0.9662, Adjusted R-squared: 0.9647
F-statistic: 648.2 on 6 and 136 DF, p-value: < 2.2e-16
AIC(lm1)
[1] 1282.124
7.1 Variable Inflation Factors
Variable inflation factors indicate that chestG and neckG are highly colinear.
vif(lm1)
chestGSctransf sex neckGSctransf lengthSctransf headLSc
9.762581 1.193243 11.997357 4.476711 5.993064
headWSc
2.936599
Indeed, chestG and neckG have a 0.94 correlation.
cor(bears$chestGSctransf, bears$neckGSctransf)
[,1]
[1,] 0.9406729
I keep chestG because it has a slightly higher correlation with weight. I remove neckG.
cor(bears$chestGSctransf, bears$weight)
[,1]
[1,] 0.9773564
cor(bears$neckGSctransf, bears$weight)
[,1]
[1,] 0.9532861
8 Mixed-effect model with power variance function
I first build a mixed-effects model with all of the potential predictor variables and no controls for heteroskedasticity or longitudinality.
I group the data by name and age, and allow the slope and intercept to vary by groups.
lme_init <- lme(weight ~ chestGSctransf + lengthSctransf + headWSc + headLSc,
random = ~ 1 | name/age, data=bears)
Remove insignificant variable (headLSc) and begin trying various weighting choices.
I choose between the varPower and varExp variance function classes based upon visual inspections of the variance in the initial facetted plot.
lme.power <- lme(weight ~ chestGSctransf + lengthSctransf + headWSc,
random = ~ 1 | name/age,
weights=varPower(form=~as.vector(lengthSctransf)), data=bears)
lme.exp <- lme(weight ~ chestGSctransf + lengthSctransf + headWSc,
random = ~ 1 | name/age,
weights=varExp(form=~as.vector(lengthSctransf)), data=bears)
Choose the varExp function to model variance-covariance structure, based upon its lower AIC score.
AIC(lme.power, lme.exp)
df AIC
lme.power 8 1296.343
lme.exp 8 1279.278
8.1 Residuals
Residuals for the linear mixed-effect model.
resid <- resid(lme.exp)
residplot <- ggplot(data=bears, aes(y=resid, x=weight))+
geom_point()+
geom_hline(yintercept=0, color="blue")+
theme_bw()
print(residplot)
8.2 Remove outliers
Based upon the residuals from the above linear model, I identified samples 37 and 132 as potential outliers. Sample 37 is the oldest bear by 25%, and sample 132 is the heaviest bear by nearly 10%.
There were no other reasons to remove sample 37 other than its abnormal residuals, and so it was kept in the data.
Sample 132 had been measured 10 months earlier, during which they gained 70 pounds while losing chest girth and length. Based on the unlikelihood of increasing weight by 20% while becoming smaller in size, I chose to remove sample 132.
bearsRM <- bears[-132,]
lme3 <- lme(weight ~ chestGSctransf + lengthSctransf + headWSc,
random = ~ 1 |name/age, weights=varExp(form=~as.vector(lengthSctransf)),
data=bearsRM)
8.3 Residuals
resid3 <- resid(lme3)
residplot3 <- ggplot(data=bearsRM, aes(y=resid3, x=weight))+
geom_point(size=1.7)+
geom_hline(yintercept=0, color="blue", size=1.25)+
theme_bw()
print(residplot3)
8.4 Chest vs. weight - observed vs. predicted
plot1 <- ggplot(data=bearsRM, aes(y=weight, x= chestGSc))+
geom_point()+
theme_bw()+
geom_point(aes(y=lme3$fitted[,1], x=chestGSc), color="blue")
print(plot1)
8.5 Fitted vs. actual
fitted3 <- fitted(lme3)
fittedvsactual <- ggplot(data=bearsRM, aes(y=fitted3, x=weight))+
theme_bw()+
geom_point(size=2.3)+
geom_smooth(method="lm", se=FALSE)
print(fittedvsactual)
9 Generalized least squares
Next, bear weight is modeled by generalized least squares.
Model one has no correlation structure, while models two and three do. Model two includes a dummy variable for whether age was imputed.
gls1 <- gls(weight ~ chestGSctransf + lengthSctransf + headWSc,
weights=varExp(form=~as.vector(lengthSctransf)), data=bearsRM)
gls2 <- gls(weight ~ chestGSctransf + lengthSctransf + headWSc + ageNA,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
gls3 <- gls(weight ~ chestGSctransf + lengthSctransf + headWSc,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~age | name), data=bearsRM)
9.1 AIC
AIC suggests that model two (with correlation structure and the imputed dummy variable) is the best thus far.
AIC(gls1, gls2, gls3)
Warning in AIC.default(gls1, gls2, gls3): models are not all fitted to the
same number of observations
df AIC
gls1 6 1268.551
gls2 8 1267.498
gls3 7 1270.551
9.2 Residuals
resid.gls <- resid(gls2)
residplot.gls <- ggplot(data=bearsRM, aes(y=resid.gls, x=weight))+
geom_point(size=1.7)+
geom_hline(yintercept=0, color="blue", size=1.25)+
theme_bw()
print(residplot.gls)
10 Cubic Splines
Up until this point, chest girth and body length have been modeled using x^2 and 1/x transformation, respectively.
In this section, we replace these simple transformations with the cubic splines transformation, which allows for different transformations at different regions of the distribution of x.
10.1 Function
The following equation is Eq 2.25 in Harrell p. 20
natural.spline.comp<-function(X,t,j) {
k<-length(t)
pmax(0,X-t[j])^3-pmax(0,X-t[k-1])^3*(t[k]-t[j])/(t[k]-t[k-1])+
pmax(0,X-t[k])^3*(t[k-1]-t[j])/(t[k]-t[k-1])
}
10.2 Chest Girth
10.2.1 Knots
To select proper knot placement for chest girth, I begin with 0.10, 0.50, and 0.90, and then tweak the placements and choose the placement that results in the lowest AIC score.
attach(bearsRM)
knots <- quantile(chestGSc, probs=c(0.10,0.50,0.90), na.rm=TRUE)
knots_2 <- quantile(chestGSc, probs=c(0.05,0.30,0.90), na.rm=TRUE)
knots_3 <- quantile(chestGSc, probs=c(0.07,0.25,0.90), na.rm=TRUE)
knots_4 <- quantile(chestGSc, probs=c(0.10,0.35,0.75), na.rm=TRUE)
splinepoints <- natural.spline.comp(chestGSc, knots, 1)
splinepoints_2 <- natural.spline.comp(chestGSc, knots_2, 1)
splinepoints_3 <- natural.spline.comp(chestGSc, knots_3, 1)
splinepoints_4 <- natural.spline.comp(chestGSc, knots_4, 1)
10.2.2 Spline point selection
- Model 1 = 0.10, 0.50, 0.90
- Model 2 = 0.05, 0.30, 0.90
- Model 3 = 0.07, 0.25, 0.90 << Best
- Model 4 = 0.10, 0.35, 0.75
gls_spline1 <- gls(weight ~ chestGSc + lengthSctransf + headWSc + ageNA + splinepoints,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
gls_spline2 <- gls(weight ~ chestGSc + lengthSctransf + headWSc + ageNA + splinepoints_2,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
gls_spline3 <- gls(weight ~ chestGSc + lengthSctransf + headWSc + ageNA + splinepoints_3,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
gls_spline4 <- gls(weight ~ chestGSc + lengthSctransf + headWSc + ageNA + splinepoints_4,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
10.2.3 Spline point AIC
AIC(gls_spline1 ,gls_spline2 ,gls_spline3 ,gls_spline4)
df AIC
gls_spline1 9 1233.156
gls_spline2 9 1228.796
gls_spline3 9 1228.490
gls_spline4 9 1232.279
10.3 Length - spline points
To select proper knot placement for chest girth, I begin with 0.10, 0.50, and 0.90, and then tweak the placements and choose the placement that results in the lowest AIC score.
knots_l_1 <- quantile(lengthSc, probs=c(0.10,0.50,0.90), na.rm=TRUE)
knots_l_2 <- quantile(lengthSc, probs=c(0.05,0.30,0.90), na.rm=TRUE)
knots_l_3 <- quantile(lengthSc, probs=c(0.07,0.25,0.90), na.rm=TRUE)
knots_l_4 <- quantile(lengthSc, probs=c(0.10,0.35,0.75), na.rm=TRUE)
splinepoints_l_1 <- natural.spline.comp(lengthSc, knots, 1)
splinepoints_l_2 <- natural.spline.comp(lengthSc, knots_2, 1)
splinepoints_l_3 <- natural.spline.comp(lengthSc, knots_3, 1)
splinepoints_l_4 <- natural.spline.comp(lengthSc, knots_4, 1)
- Model 1 = 0.10, 0.50, 0.90
- Model 2 = 0.05, 0.30, 0.90
- Model 3 = 0.07, 0.25, 0.90
- Model 4 = 0.10, 0.35, 0.75 << Best
gls_spline_l_1 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_1,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
gls_spline_l_2 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_2,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
gls_spline_l_3 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_3,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
gls_spline_l_4 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_4,
weights=varExp(form=~as.vector(lengthSctransf)),
correlation=corCAR1(form = ~ age |name), data=bearsRM)
AIC(gls_spline_l_1, gls_spline_l_2, gls_spline_l_3, gls_spline_l_4)
df AIC
gls_spline_l_1 10 1210.123
gls_spline_l_2 10 1210.157
gls_spline_l_3 10 1209.300
gls_spline_l_4 10 1209.536
10.4 Restructure variance
Because the cubic splines transformation resulted in a lower AIC score and better model fit than the x^2 and 1/x transformation, we must revisit the manner in which within-group and between-group heteroskedasticity is being modeled.
Namely, we must consider that modeling this variance with a cubic splines transformation may also be better than modeling it with the x^2 or 1/x transformation.
To test for this, we build four potential models.
- Option 1 = model based upon splinepoints for chest girth
- Option 2 = model based upon splinepoints for length
- Option 3 = model based upon chest girth plus its splinepoints
- Option 4 = model based upon length plus its splinepoints
gls_spline_opt1 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_4,
weights=varExp(form=~as.vector(splinepoints_3)),
correlation=corCAR1(form = ~ age | name), data=bearsRM)
gls_spline_opt2 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_4,
weights=varExp(form=~as.vector(splinepoints_l_4)),
correlation=corCAR1(form = ~ age | name), data=bearsRM)
gls_spline_opt3 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_4,
weights=varExp(form=~as.vector(chestGSc + splinepoints_3)),
correlation=corCAR1(form = ~ age | name), data=bearsRM)
gls_spline_opt4 <- gls(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_3,
weights=varExp(form=~as.vector(lengthSc + splinepoints_l_3)),
correlation=corCAR1(form = ~ age | name), data=bearsRM)
AIC(gls_spline_opt1, gls_spline_opt2, gls_spline_opt3, gls_spline_opt4)
df AIC
gls_spline_opt1 10 1203.178
gls_spline_opt2 10 1212.882
gls_spline_opt3 10 1201.739
gls_spline_opt4 10 1211.680
gls_spline <- gls_spline_opt3
11 Restructure LME
Because the cubic splines transformation improved the model fit of the GLS so much, this section revisits the LME and applies the same cubic splines transformation to the best-fit LME model from above.
lme_final <- lme(weight ~ chestGSc + lengthSc + headWSc +
ageNA + splinepoints_3 + splinepoints_l_4,
random = ~ 1 |name/age,
weights=varExp(form=~as.vector(chestGSc + splinepoints_3)),
correlation=corCAR1(form = ~ age | name), data=bearsRM)
12 Comparison of models
12.1 AIC
cat(" AIC LM ", round(AIC(lm1),2),
"\n", "AIC GLS ", round(AIC(gls2),2), "\n",
"AIC GLS spline ", round(AIC(gls_spline), 2), "\n",
"AIC LME spline ", round(AIC(lme_final),2),
"\n")
AIC LM 1282.12
AIC GLS 1267.5
AIC GLS spline 1201.74
AIC LME spline 1202.34
12.2 Residuals
resid_lm <- resid(lm1)
resid_gls_spline <- resid(gls_spline)
resid_gls <- resid(gls2)
resid_lme <- resid(lme_final)
residplot_lm <- ggplot(data=bears, aes(y=resid_lm, x=weight))+
geom_point()+
geom_hline(yintercept=0, color="blue")+
theme_bw()+
ggtitle("Linear model")
residplot_gls_spline <- ggplot(data=bearsRM, aes(y=resid_gls_spline, x=weight))+
geom_point()+
geom_hline(yintercept=0, color="blue")+
theme_bw()+
ggtitle("GLS - cubic splines")
residplot_gls <- ggplot(data=bearsRM, aes(y=resid_gls, x=weight))+
geom_point()+
geom_hline(yintercept=0, color="blue")+
theme_bw()+
ggtitle("GLS")
residplot_lme <- ggplot(data=bearsRM, aes(y=resid_lme, x=weight))+
geom_point()+
geom_hline(yintercept=0, color="blue")+
theme_bw()+
ggtitle("LME - spline")
multiplot(residplot_lm, residplot_gls_spline, residplot_gls, residplot_lme, cols=2)
12.3 Predicted vs observed
fitted_lm <- fitted(lm1)
fitted_gls_spline <- fitted(gls_spline)
fitted_gls <- fitted(gls2)
fitted_lme <- fitted(lme_final)
fitted_plot_lm <- ggplot(data=bears, aes(y=fitted_lm, x=weight))+
theme_bw()+
geom_point(size=1.7)+
geom_smooth(method="lm", se=FALSE)+
xlab("")+
ylab("Fitted weight")+
ggtitle("Linear model")+
theme(panel.grid.minor=element_blank())
fitted_plot_gls_spline <- ggplot(data=bearsRM, aes(y=fitted_gls_spline, x=weight))+
theme_bw()+
geom_point(size=1.7)+
geom_smooth(method="lm", se=FALSE)+
ggtitle("GLS - spline")+
xlab("Observed weight")+
ylab("Fitted weight")+
theme(panel.grid.minor=element_blank())
fitted_plot_gls <- ggplot(data=bearsRM, aes(y=fitted_gls, x=weight))+
theme_bw()+
geom_point(size=1.7)+
geom_smooth(method="lm", se=FALSE)+
ggtitle("GLS")+
xlab("")+
ylab("")+
theme(panel.grid.minor=element_blank())
fitted_plot_lme <- ggplot(data=bearsRM, aes(y=fitted_lme, x=weight))+
theme_bw()+
geom_point(size=1.7)+
geom_smooth(method="lm", se=FALSE)+
ggtitle("LME - spline")+
ylab("")+
xlab("Observed weight")+
theme(panel.grid.minor=element_blank())
multiplot(fitted_plot_lm, fitted_plot_gls_spline, fitted_plot_gls, fitted_plot_lme, cols=2)
12.4 Residual sum of squares
RSS_lm <- sum(resid(lm1)^2)
RSS_gls <- sum(resid(gls2)^2)
RSS_gls_spline <- sum(resid(gls_spline)^2)
RSS_lme_spline <- sum(resid(lme_final)^2)
cat(" RSS LM ", round(RSS_lm,2),
"\n", "RSS GLS ", round(RSS_gls,2), "\n",
"RSS GLS spline ", round(RSS_gls_spline, 2), "\n",
"RSQ LME spline ", round(RSS_lme_spline,2),
"\n")
RSS LM 58628.63
RSS GLS 69851.49
RSS GLS spline 49865.17
RSQ LME spline 37288.98
13 Final model
The linear mixed effects model is chosen to be the best model for bear weight based upon its low AIC score and low residual sum of squares.
lme(weight ~ chestGSc + lengthSc + headWSc + ageNA +
splinepoints_chest + splinepoints_length,
random = ~ 1 | name/age,
weights=varExp(form=~as.vector(chestGSc + splinepoints_chest)),
correlation=corCAR1(form = ~ age | name))
AIC(lme_final)
[1] 1202.342
(RSS_lme_spline <- sum(resid(lme_final)^2))
[1] 37288.98
fitted_plot_lme <- ggplot(data=bearsRM, aes(y=fitted_lme, x=weight))+
theme_bw()+
geom_point(size=2.3)+
geom_smooth(method="lm", se=FALSE)+
ggtitle("Fitted vs observed")
residplot_lme <- ggplot(data=bearsRM, aes(y=resid_lme, x=weight))+
geom_point()+
geom_hline(yintercept=0, color="blue")+
theme_bw()+
ggtitle("Residuals")
multiplot(residplot_lme, fitted_plot_lme)
summary(lme_final)
Linear mixed-effects model fit by REML
Data: bearsRM
AIC BIC logLik
1202.342 1237.206 -589.1712
Random effects:
Formula: ~1 | name
(Intercept)
StdDev: 7.369669
Formula: ~1 | age %in% name
(Intercept) Residual
StdDev: 0.00180367 11.38655
Correlation Structure: Continuous AR(1)
Formula: ~age | name/age
Parameter estimate(s):
Phi
0.2
Variance function:
Structure: Exponential of variance covariate
Formula: ~as.vector(chestGSc + splinepoints_3)
Parameter estimates:
expon
0.09857261
Fixed effects: weight ~ chestGSc + lengthSc + headWSc + ageNA + splinepoints_3 + splinepoints_l_4
Value Std.Error DF t-value p-value
(Intercept) 137.34347 4.600698 98 29.852749 0.0000
chestGSc 18.39338 7.128357 37 2.580312 0.0140
lengthSc 24.93949 5.021722 37 4.966323 0.0000
headWSc 4.99183 2.174034 37 2.296115 0.0274
ageNA 1.85700 3.131285 98 0.593048 0.5545
splinepoints_3 17.38913 2.233030 37 7.787238 0.0000
splinepoints_l_4 0.49507 1.937311 37 0.255544 0.7997
Correlation:
(Intr) chstGS lngthS hedWSc ageNA spln_3
chestGSc 0.527
lengthSc -0.066 -0.810
headWSc 0.105 -0.095 -0.117
ageNA -0.417 0.066 -0.222 -0.064
splinepoints_3 -0.459 -0.823 0.709 -0.086 0.001
splinepoints_l_4 -0.166 0.511 -0.735 0.024 0.105 -0.741
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.63632686 -0.53081533 0.04505409 0.48008005 2.50417096
Number of Observations: 142
Number of Groups:
name age %in% name
100 142