# Exercise 1 with solution - New York City /r/ with Rbrul - GoldVarb-esque logistic regression with categorical predictor
#
# Start R.
# Load Rbrul.
source("http://www.danielezrajohnson.com/Rbrul.R")
# Start Rbrul.
rbrul()
#
# Open an Internet browser.
# Go to "http://www.danielezrajohnson.com/ds.csv".
# Save page on Desktop or in other directory as "ds.csv".
# From the Rbrul main menu, choose "1" to "load/save data".
1
# If prompted to save current data, press "Enter" for "No".
# Navigate to Desktop or other directory.
[navigate]
# Enter the number to the left of the "ds.csv" file.
[number]
# "What separates the columns?" Enter "c" for commas.
c
# The MAIN MENU should appear again, with "current data structure" as
# follows:
# Current data structure:
# r (integer with 2 values): 1 0
# store (factor with 3 values): Saks Macy's Klein's
# emphasis (factor with 2 values): normal emphatic
# word (factor with 2 values): fouRth flooR
#
# This is the Labov "fourth floor" department store data collected in
# 1962.
#
# Question 1: Which of the factor groups have a statistically significant
# effect on the use of /r/, which of their factors favor or disfavor /r/,
# and to what extent?
#
# Enter "5" for "modeling" and the MODELING MENU will open.
5
# Enter "1" to "choose variables".
1
# Enter "1" to choose "r" as the response, or dependent variable.
1
# Press "Enter" as "r" is a binary response.
# Enter "1" to choose "1" (presence of /r/) as the application value.
1
# Enter "2", "3", "4" to choose "store", "emphasis", and "word" as the
# potential predictors.
2
3
4
# "Are any of these predictors continuous?" The answer is no, they are all
# categorical/factors, so press "Enter".
# "Any grouping factors (random effects)?" The answer is no, these are all
# ordinary fixed effects, so press "Enter".
# "Consider an/another pairwise interaction between predictors?" Why not? Let's
# consider the possible interaction of every pair of predictors. Enter "2", "3"
# (store:emphasis), then "2", "4" (store:word), then "3", "4" (emphasis:word).
2
3
2
4
3
4
#
# The MODELING MENU should now show, under "Current variables are:"
# response.binary: r (1 vs. 0)
# fixed.factor: store emphasis word
# fixed.interaction: store:emphasis store:word emphasis:word
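#
# For orientation, the variable selection above corresponds roughly to the
# model formula below. This is only a sketch, typed at a plain R prompt in a
# separate session (not at the Rbrul menu). The names "full_formula",
# "model_terms" and "ds" are made up for illustration, and Rbrul's own
# internals may differ.
full_formula <- r ~ store + emphasis + word +
  store:emphasis + store:word + emphasis:word
# List the terms: three main effects followed by three pairwise interactions.
model_terms <- attr(terms(full_formula), "term.labels")
print(model_terms)
# With the data loaded as a data frame (hypothetically named "ds"), a
# comparable fixed-effects logistic regression could be fitted with:
#   glm(full_formula, data = ds, family = binomial)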
#
# Enter "5" for "step-up/step-down".
5
# A long output results. The most relevant part of the output, near the
# end, is:
#
# BEST STEP-UP MODEL WAS WITH store (1.08e-18) + word (8.18e-09) [A]
#
# STEP-UP AND STEP-DOWN MATCH!
#
# STEPPING DOWN:
#
# $store
#  factor  logodds  tokens  1/1+0  centered weight
#    Saks    0.900     177  0.475            0.711
#  Macy's    0.436     336  0.372            0.607
# Klein's   -1.337     216  0.097            0.208
#
# $word
#  factor  logodds  tokens  1/1+0  centered weight
#   flooR    0.493     347  0.412            0.621
#  fouRth   -0.493     382  0.228            0.379
#
# In both step-up and step-down, the groups selected as significant are
# STORE and WORD. The significance levels (p-values) are shown in
# parentheses; for example, 1.08e-18 for STORE means p = 1.08 x 10 raised
# to the power of -18.
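#
# The "e" notation can be checked at a plain R prompt (outside Rbrul).
# A small sketch; the names "p_store" and "p_word" are illustrative only,
# with the values copied from the output above.
p_store <- 1.08e-18
p_word <- 8.18e-09
# "e-18" is shorthand for "times 10 to the power of -18", so dividing by
# 10^-18 recovers the leading digits.
print(p_store / 10^-18)
# Both values fall far below 0.05.
print(c(store = p_store, word = p_word) < 0.05)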
#
# EMPHASIS is not selected, since its p-value is 0.07, which is not below
# the conventional 0.05 threshold (this figure can be seen at step-up Run 6
# or step-down Run 9).
#
# The only interaction considered in the step-up model is STORE:WORD,
# after STORE and WORD are individually added. This interaction is not
# significant (p = 0.34).
# In the step-down model, all the interaction effects are dropped first,
# as well as the EMPHASIS main effect.
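#
# Stepwise decisions of this kind are typically based on likelihood-ratio
# tests between nested models. The sketch below shows that calculation at a
# plain R prompt (outside Rbrul). It assumes, rather than confirms, that this
# is the test behind the p-values above, and "lr_test_p" is a made-up helper
# name.
lr_test_p <- function(deviance_smaller, deviance_larger, df_added) {
  # The drop in deviance is compared against a chi-square distribution
  # with as many degrees of freedom as parameters were added.
  pchisq(deviance_smaller - deviance_larger, df = df_added, lower.tail = FALSE)
}
# STORE:WORD would add (3 - 1) * (2 - 1) = 2 parameters. A deviance drop
# equal to the 95% chi-square critical value for 2 degrees of freedom gives
# p = 0.05; smaller drops give larger p-values.
critical_drop <- qchisq(0.95, df = 2)
print(lr_test_p(critical_drop, 0, df_added = 2))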
#
# We see the same output as from GoldVarb in Exercise 1, in the "centered
# weight" column. For STORE, Saks strongly favors /r/ (0.711), Macy's mildly
# favors /r/ (0.607), and Klein's strongly disfavors /r/ (0.208). Within
# WORD, the unstressed "fourth" disfavors /r/ (0.379), while stressed
# "floor" favors /r/ (0.621).
#
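# The factor weights relate directly to the "logodds" column of the same
# tables: a centered factor weight is the inverse-logit of the corresponding
# log-odds. A quick check at a plain R prompt (outside Rbrul), with values
# copied from the tables above; the object names are illustrative only.
store_logodds <- c("Saks" = 0.900, "Macy's" = 0.436, "Klein's" = -1.337)
word_logodds <- c("flooR" = 0.493, "fouRth" = -0.493)
# plogis(x) computes exp(x) / (1 + exp(x)).
store_weights <- round(plogis(store_logodds), 3)
word_weights <- round(plogis(word_logodds), 3)
print(store_weights)  # 0.711 0.607 0.208, matching "centered weight"
print(word_weights)   # 0.621 0.379
#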
# The Rbrul output also presents results in log-odds, and shows token
# numbers and response proportions in the same table as the factor weights.
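#
# The token counts and proportions can be cross-checked at a plain R prompt
# (outside Rbrul). A small sketch with values copied from the tables above;
# the object names are illustrative only.
store_tokens <- c("Saks" = 177, "Macy's" = 336, "Klein's" = 216)
store_prop_r <- c("Saks" = 0.475, "Macy's" = 0.372, "Klein's" = 0.097)
word_tokens <- c("flooR" = 347, "fouRth" = 382)
word_prop_r <- c("flooR" = 0.412, "fouRth" = 0.228)
# Both factor groups partition the same data set, so the totals agree.
print(sum(store_tokens))
print(sum(word_tokens))
# The token-weighted mean of the proportions gives the overall rate of /r/,
# which should come out the same (up to rounding) for either grouping.
print(weighted.mean(store_prop_r, store_tokens))
print(weighted.mean(word_prop_r, word_tokens))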
#
# Enter "9" to return to the main menu.
9