Thursday 11 May 2017

Random Slopes: Now That Rbrul Has Them, You May Want Them Too


I've made the first major update to Rbrul in a long time, adding support for random slopes. In most cases, models with random intercepts perform better than those without them, but a recent paper (Barr et al. 2013) has convincingly argued that intercepts alone are not enough: for each fixed effect in a mixed model, one or more corresponding random slopes should also be considered.

So what are random slopes and what benefits do they provide? If we start with the simple regression equation y = ax + b, the intercept is b and the slope is a. A random intercept allows b to vary; if the data is drawn from different speakers, each speaker essentially has their own value for b. A random slope allows each speaker to have their own value for a as well.
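To make the distinction concrete, here is a minimal Python sketch. It is not Rbrul code; the function names and all the numbers are invented for illustration. Each speaker gets their own b, and optionally their own a, drawn from a normal distribution around a community-wide value.

```python
import random

def simulate_speakers(n_speakers, mean_b, sd_b, mean_a, sd_a, seed=1):
    """Draw one (intercept b, slope a) pair per speaker.

    sd_b > 0 corresponds to random intercepts, sd_a > 0 to random slopes.
    Setting sd_a = 0 forces every speaker to share the same slope.
    All values are hypothetical.
    """
    rng = random.Random(seed)
    return [(rng.gauss(mean_b, sd_b), rng.gauss(mean_a, sd_a))
            for _ in range(n_speakers)]

def predict(b, a, x):
    """The simple regression line y = ax + b for one speaker."""
    return a * x + b

# Random intercepts only: b varies by speaker, a is fixed at 1.5.
intercept_only = simulate_speakers(5, mean_b=0.0, sd_b=1.0, mean_a=1.5, sd_a=0.0)

# Random intercepts and slopes: both b and a vary by speaker.
with_slopes = simulate_speakers(5, mean_b=0.0, sd_b=1.0, mean_a=1.5, sd_a=0.5)

for b, a in with_slopes:
    print(f"b = {b:+.2f}, a = {a:+.2f}, y at x=1: {predict(b, a, 1):+.2f}")
```

In the first data set the speakers' lines are parallel; in the second they can fan out or cross.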

The sociolinguistic literature usually concedes that speakers can vary in their intercepts (average values or rates of application). But at least since Guy (1980), it has been suggested or assumed that the speakers in a community do not vary in their slopes (constraints). As we saw last week, though, in some data sets the effect of following consonant vs. vowel on t/d-deletion varies by speaker more than might be expected by chance.

In the Buckeye Corpus, the estimated standard deviation, across speakers, of this consonant-vs.-vowel slope was 0.70 log-odds; in the Philadelphia Neighborhood Corpus, it was 0.67. A simulation reproducing the number of speakers, number of tokens, balance of following segments, overall following-segment effect, and speaker intercepts produced a median standard deviation of only 0.10 for Ohio and 0.16 for Philadelphia. Speaker slopes as dispersed as the ones actually observed would occur very rarely by chance (Ohio, p < .001; Philadelphia, p = .003).1
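The logic of such a simulation can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the procedure that produced the numbers above: every speaker gets the same number of tokens per environment, the per-speaker slope is a smoothed empirical log-odds difference rather than a mixed-model estimate, and all names and parameter values are hypothetical. Because raw per-speaker estimates include sampling noise, the standard deviations this sketch produces are not comparable to the ones reported above.

```python
import math
import random
import statistics

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def empirical_logit(hits, n):
    # Add 0.5 to each cell so that 0% or 100% rates stay finite.
    return math.log((hits + 0.5) / (n - hits + 0.5))

def slope_sd_under_null(intercepts, shared_slope, n_per_cell, rng):
    """SD of per-speaker slope estimates when the true slope is identical."""
    slopes = []
    for b in intercepts:
        # Tokens before a vowel (baseline) and before a consonant.
        hits_v = sum(rng.random() < inv_logit(b) for _ in range(n_per_cell))
        hits_c = sum(rng.random() < inv_logit(b + shared_slope)
                     for _ in range(n_per_cell))
        slopes.append(empirical_logit(hits_c, n_per_cell)
                      - empirical_logit(hits_v, n_per_cell))
    return statistics.stdev(slopes)

def monte_carlo_p(observed_sd, intercepts, shared_slope, n_per_cell,
                  n_sims=200, seed=1):
    """Share of null simulations whose slope SD reaches the observed one."""
    rng = random.Random(seed)
    null_sds = [slope_sd_under_null(intercepts, shared_slope, n_per_cell, rng)
                for _ in range(n_sims)]
    exceed = sum(sd >= observed_sd for sd in null_sds)
    # The +1 terms keep the estimated p-value away from exactly zero.
    return (exceed + 1) / (n_sims + 1), statistics.median(null_sds)

# Hypothetical community: 20 speakers, 50 tokens per environment.
rng = random.Random(0)
intercepts = [rng.gauss(0.0, 1.0) for _ in range(20)]
p_value, null_median = monte_carlo_p(0.70, intercepts, shared_slope=1.0,
                                     n_per_cell=50, n_sims=100)
print(p_value, null_median)
```

The observed statistic passed in should be computed the same way as the simulated one; here 0.70 is only a placeholder.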

If rates and constraints can vary by speaker, it is important not to ignore speaker when analyzing the data. In assessing between-speaker effects – gender, class, age, etc. – ignoring speaker is equivalent to assuming that every token comes from a different speaker. This greatly overestimates the significance of these between-speaker effects (Johnson 2009). The same applies to intercepts (different rates between groups) and slopes (different constraints between groups). The figure below illustrates this.

Figure: By keeping track of speaker variation, random intercepts and slopes help provide accurate p-values (left). Without them, data gets "lumped" and p-values can be meaninglessly low (right).

Especially if your data is unbalanced, there are other benefits to using random slopes if your constraints might differ by speaker (or by word, or another grouping factor); these will not be discussed here. Mixed-effects models with random slopes not only control for by-speaker constraint variation, they also provide an estimate of its size. Mixed models with only random intercepts, like fixed-effects models, rather blindly assume the slope variation to be zero, and are only accurate if it really is. No doubt, this "Shared Constraints Hypothesis" (Guy 2004) is roughly, qualitatively correct: for example, all 83 speakers from Ohio and Philadelphia showed more deletion before consonants than before vowels (except one person with only two tokens!). But the hypothesis has been taken for granted far more often than it has been supported with quantitative evidence.

Rbrul has always fit models with random intercepts, allowing users to stop assuming that individual speakers have equal rates of application of a binary variable (or the same average values of a continuous variable). Now Rbrul allows random slopes, so the Shared Constraints Hypothesis can be treated like the hypothesis it is, rather than an inflexible axiom built into our software. The new feature may not be working perfectly, so please send feedback to danielezrajohnson@gmail.com (or comment here) if you encounter any problems or have questions. Also feel free to be in touch if you have requests for other features to be added in the future!

1. These models did not control for other within-subjects effects that could have increased the apparent diversity in the following-segment effect.


P.S. A major drawback to using random slopes is that models containing them can take a long time to fit, and sometimes they don't fit at all, causing "false convergences" and "singular convergences" that Rbrul reports with an "Error Message". There is not always a solution to this – see here and here for suggestions from Jaeger – but it is always a good idea to center any continuous variables, or at least keep the zero-point close to the center. For example, if you have a date-of-birth predictor, make 0 the year 1900 or 1950, not the year 0. Add random slopes one at a time so processing times don't get out of hand too quickly. Sonderegger has suggested dropping the correlation terms that lmer() estimates (by default) among the random effects. While this speeds up model fitting considerably, it seems to make the questionable assumption that the random effects are uncorrelated, so it has not been implemented.
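As a small illustration of the centering advice, here is a Python sketch. The `center` helper and the birth years are made up for this example; in practice you would do the equivalent in R before passing the data to Rbrul.

```python
def center(values, reference=None):
    """Subtract a reference point (default: the mean) from each value."""
    if reference is None:
        reference = sum(values) / len(values)
    return [v - reference for v in values]

# Hypothetical dates of birth for five speakers.
birth_years = [1921, 1948, 1950, 1967, 1989]

# Option 1: put the zero-point at a round year near the middle of the data.
centered_on_1950 = center(birth_years, reference=1950)

# Option 2: center on the sample mean.
centered_on_mean = center(birth_years)

print(centered_on_1950)  # [-29, -2, 0, 17, 39]
print(centered_on_mean)  # [-34.0, -7.0, -5.0, 12.0, 34.0]
```

Either way, the intercept now refers to a speaker born near the middle of the sample rather than to someone born in the year 0.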

P.P.S. Like lmer(), Rbrul will not stop you from adding a nonsensical random slope that does not vary within levels of the grouping factor. For example, a by-speaker slope for gender makes no sense because a given speaker is – at least traditionally – always the same gender. If speaker is the grouping factor, use random slopes that can vary within a speaker's data: style, topic, and most internal linguistic variables. If you are using word as a grouping factor, it is possible that different words show different gender effects; using a by-word slope for gender could be revealing.
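One way to catch such a slope before fitting is to check whether the predictor ever takes more than one value within a level of the grouping factor. The Python sketch below shows the idea; the `varies_within` helper, the column names, and the four tokens are all hypothetical, and neither Rbrul nor lmer() performs this check for you.

```python
from collections import defaultdict

def varies_within(rows, group, predictor):
    """Fraction of groups in which the predictor takes two or more values.

    A result of 0.0 means a by-group random slope for this predictor
    cannot be estimated meaningfully.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[group]].add(row[predictor])
    varying = sum(len(values) > 1 for values in seen.values())
    return varying / len(seen)

# Four made-up tokens.
tokens = [
    {"speaker": "s1", "gender": "f", "style": "casual", "word": "west"},
    {"speaker": "s1", "gender": "f", "style": "careful", "word": "and"},
    {"speaker": "s2", "gender": "m", "style": "casual", "word": "west"},
    {"speaker": "s2", "gender": "m", "style": "careful", "word": "and"},
]

print(varies_within(tokens, "speaker", "gender"))  # 0.0: nonsensical slope
print(varies_within(tokens, "speaker", "style"))   # 1.0: fine
print(varies_within(tokens, "word", "gender"))     # 1.0: fine
```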

P.P.P.S. I also added the AIC (Akaike Information Criterion) to the model output. The AIC is the deviance plus two times the number of parameters. Comparing the AIC values of two models is an alternative to performing a likelihood-ratio test. The model with lower AIC is supposed to be better.
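The arithmetic is simple enough to sketch in a few lines of Python. The deviances and parameter counts below are invented, and how parameters are counted for the random effects is an assumption of this example (here, adding one random slope is taken to add a variance and a correlation).

```python
def aic(deviance, n_params):
    """AIC as described above: deviance plus two times the parameter count."""
    return deviance + 2 * n_params

# Hypothetical models fit to the same data.
aic_intercepts_only = aic(deviance=1000.0, n_params=4)
aic_with_slope = aic(deviance=990.0, n_params=6)

print(aic_intercepts_only)  # 1008.0
print(aic_with_slope)       # 1002.0
```

In this made-up case the random slope lowers the deviance by 10 at a cost of 2 extra parameters, so the model with the slope has the lower AIC.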


References:

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. Random effects structure for confirmatory hypothesis testing: keep it maximal. Journal of Memory and Language 68: 255-278.

Guy, Gregory R. 1980. Variation in the group and the individual: the case of final stop deletion. In W. Labov (ed.), Locating language in time and space. New York: Academic Press. 1-36.

Guy, Gregory R. 2004. Dialect unity, dialect contrast: the role of variable constraints. Talk presented at the Meertens Institute, August 2004.

Johnson, Daniel E. 2009. Getting off the GoldVarb standard: introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass 3(1): 359-383.
