Statistical Analysis of Speech Data
Daniel Ezra Johnson
Sociolinguistics Summer School 3
University of Glasgow
6 July 2011

This workshop consists of a set of exercises in R. The first half deals with graphing and the second half with regression. Please download the two workshop scripts, graphing.R and regression.R, to a local directory on your computer. They are plain text files that you will open as scripts in R, following along with the instructions and commands during the workshop.

The following readings may be helpful or interesting before or after the workshop. Anything by William Cleveland or Edward Tufte is recommended for general reading on graphing and information design. Douglas Bates and Frank Harrell are good sources about regression modeling.

This is the introduction to Cleveland's The Elements of Graphing Data.
This is the homepage for the R graphing package ggplot2.
This is a good on-line introduction to R.
This is a handy reference card of R commands.
This and this are a good two-part introduction to regression modeling in R.
This and this are two articles I have written in favor of mixed-effects regression.

Before the workshop, make sure that you have R installed and working on your computer!
If you have never installed R, use one of the following links to download R for Mac or Windows.

R has a command-line interface. You can enter commands by typing them directly at the > prompt. For example, we can type 2 + 2, press Return/Enter, and obtain the result, 4.

> 2 + 2
[1] 4


A better way to run commands in R is to copy lines from a text file, or script, that you will usually keep open next to the main R window. The Mac version of R calls this window a "document" while the Windows version calls it a "script". You can highlight text and use Command-Enter (Mac) or Ctrl-R (Windows) to run simple commands or longer programs from your script window.
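
As a minimal sketch of what a few lines in a script window might look like, the example below stores some numbers, computes their mean, and prints it. The variable names and the duration values are made up for illustration and are not part of the workshop scripts. Highlight all the lines and run them together with the key combination described above.

```r
# Lines beginning with # are comments; R ignores them

# Made-up vowel durations in milliseconds, for illustration only
durations <- c(85, 92, 110, 78)

# Compute the mean duration and print it
mean_duration <- mean(durations)
print(mean_duration)
```

Keeping commands in a script rather than typing them at the prompt means you can rerun or revise the whole analysis later.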

Besides the main R functions, we will be using several additional packages. You can install packages from the menu bar (try "Packages & Data" > "Package Installer" on Mac, or "Packages" > "Install package(s)" on Windows), or you can use the function install.packages(). For example, to install the package ggplot2, we would type:

> install.packages("ggplot2")

When we install packages, a window may open asking us to choose a mirror, that is, the site from which the package will be downloaded. Choose any mirror; it does not matter which one.

Please install the following packages, which we will be using in the workshop:

ggplot2 (for graphing)
lme4 (for mixed-effects regression modeling)
plyr (for data manipulation)

(Note for Windows users: If you receive an error message saying that you don't have permission to install packages, you may have to close R and re-open it "as administrator" (right-click on the icon to reveal this option). Once you have installed packages, you can start R in the normal way.)
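
To check whether the three packages are already installed, something like the following sketch may help. The helper function missing_packages() is a hypothetical name introduced here for illustration; it is not part of R or of the workshop scripts. The install line is left commented out so that nothing is downloaded until you choose to run it.

```r
# Packages used in this workshop
workshop_packages <- c("ggplot2", "lme4", "plyr")

# Hypothetical helper: return the names of packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Show which workshop packages, if any, still need to be installed
to_install <- missing_packages(workshop_packages)
print(to_install)

# Uncomment the next line to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```

If the printed result is character(0), all three packages are already installed.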

The data to be used in the workshop is from Scotland. We have a file of vowel measurements from Orkney and judgments of post-vocalic /r/ from Gretna. The Orkney data was collected by Meredith Tamminga and the Gretna data was collected as part of the AISEB project at the University of York. Download both .csv data files to a local directory on your computer.
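
Once the data files are downloaded, a .csv file can be read into R with read.csv(). The sketch below assumes a file called "orkney.csv" in the current working directory; that file name is a placeholder, so substitute the actual name of the file you downloaded.

```r
# Placeholder file name: replace with the name of the file you downloaded
data_file <- "orkney.csv"

if (file.exists(data_file)) {
  # Read the file into a data frame and inspect it
  orkney <- read.csv(data_file)
  str(orkney)    # column names and types
  head(orkney)   # first six rows
} else {
  # Report where R looked, so the path can be corrected with setwd()
  message("File not found in ", getwd(), ": check the name or use setwd()")
}
```

The function getwd() shows the directory where R is currently looking for files, and setwd() changes it.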

We will be using Rbrul, so download this text file to a local directory on your computer. We will not be discussing Rbrul in detail, but I am happy to set up other times to meet and talk about it, if there is interest.
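
Because Rbrul is distributed as a plain text file of R code, it can be loaded with source(). The sketch below assumes the file was saved as "Rbrul.R" in the current working directory; the file name is an assumption, so adjust it to match what you downloaded.

```r
# Assumed file name: replace with the name of the Rbrul file you downloaded
rbrul_file <- "Rbrul.R"

if (file.exists(rbrul_file)) {
  # Run the code in the file, making its functions available in this session
  source(rbrul_file)
} else {
  message("Rbrul file not found in ", getwd())
}
```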

Finally, Bill Haddican and I are working on a survey, and would appreciate it very much if you felt like participating. You can also distribute it to less linguistically savvy friends and family members!

To begin the workshop, start R. For the first half of the workshop, open the script "graphing.R" from wherever you have saved it; for the second half, open "regression.R".