Title: | Basic Statistics and Data Analysis |
---|---|
Description: | Data sets for book "Basic Statistics and Data Analysis" by Larry J. Kitchens. |
Authors: | Alan T. Arnholt [aut, cre], Ben Evans [aut] |
Maintainer: | Alan T. Arnholt <[email protected]> |
License: | GPL-3 |
Version: | 1.2.2 |
Built: | 2024-11-07 03:11:09 UTC |
Source: | https://github.com/alanarnholt/bsda |
Data used in Exercise 6.39
Abbey
Abbey
A data frame/tibble with 50 observations on one variable
daily price returns (in pence) of Abbey National shares
Buckle, D. (1995), Bayesian Inference for Stable Distributions, Journal of the American Statistical Association, 90, 605-613.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Abbey$price)
qqline(Abbey$price)
t.test(Abbey$price, mu = 300)
hist(Abbey$price, main = "Exercise 6.39",
     xlab = "daily price returns (in pence)", col = "blue")
Data used in Exercise 10.1
Abc
Abc
A data frame/tibble with 54 observations on two variables
a numeric vector
a character vector with values A, B, and C
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(response ~ group, col = c("red", "blue", "green"), data = Abc)
anova(lm(response ~ group, data = Abc))
Data used in Exercises 1.23 and 2.79
Abilene
Abilene
A data frame/tibble with 16 observations on three variables
a character variable with values Aggravated assault, Arson, Burglary, Forcible rape, Larceny theft, Murder, Robbery, and Vehicle theft
a factor with levels 1992 and 1999
number of reported crimes
Uniform Crime Reports, US Dept. of Justice.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(mfrow = c(2, 1))
barplot(Abilene$number[Abilene$year == "1992"],
        names.arg = Abilene$crimetype[Abilene$year == "1992"],
        main = "1992 Crime Stats", col = "red")
barplot(Abilene$number[Abilene$year == "1999"],
        names.arg = Abilene$crimetype[Abilene$year == "1999"],
        main = "1999 Crime Stats", col = "blue")
par(mfrow = c(1, 1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Abilene, aes(x = crimetype, y = number, fill = year)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))
## End(Not run)
Data used in Exercise 8.57
Ability
Ability
A data frame/tibble with 400 observations on two variables
a factor with levels girls and boys
a factor with levels hopeless, belowavg, average, aboveavg, and superior
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
CT <- xtabs(~gender + ability, data = Ability)
CT
chisq.test(CT)
Data used in Exercise 8.51
Abortion
Abortion
A data frame/tibble with 51 observations on the following 10 variables:
a character variable with values alabama, alaska, arizona, arkansas, california, colorado, connecticut, delaware, dist of columbia, florida, georgia, hawaii, idaho, illinois, indiana, iowa, kansas, kentucky, louisiana, maine, maryland, massachusetts, michigan, minnesota, mississippi, missouri, montana, nebraska, nevada, new hampshire, new jersey, new mexico, new york, north carolina, north dakota, ohio, oklahoma, oregon, pennsylvania, rhode island, south carolina, south dakota, tennessee, texas, utah, vermont, virginia, washington, west virginia, wisconsin, and wyoming
a character variable with values midwest, northeast, south, and west
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a factor with levels Low and High
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~region + rate, data = Abortion)
T1
chisq.test(T1)
Data used in Exercise 1.28
Absent
Absent
A data frame/tibble with 20 observations on one variable
days absent
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
CT <- xtabs(~ days, data = Absent)
CT
barplot(CT, col = "pink", main = "Exercise 1.28")
plot(ecdf(Absent$days), main = "ECDF")
Data used in Example 7.14 and Exercise 10.7
Achieve
Achieve
A data frame/tibble with 25 observations on two variables
mathematics achievement score
a factor with 2 levels boys and girls
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
anova(lm(score ~ gender, data = Achieve))
t.test(score ~ gender, var.equal = TRUE, data = Achieve)
Data used in Exercise 9.15
Adsales
Adsales
A data frame/tibble with six observations on three variables
a character vector listing month
a numeric vector containing number of ads
a numeric vector containing number of sales
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(sales ~ ads, data = Adsales, main = "Exercise 9.15")
mod <- lm(sales ~ ads, data = Adsales)
abline(mod, col = "red")
summary(mod)
predict(mod, newdata = data.frame(ads = 6), interval = "conf", level = 0.99)
Data used in Exercises 1.66 and 1.81
Aggress
Aggress
A data frame/tibble with 28 observations on one variable
measure of aggressive tendency, ranging from 10 to 50
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
with(data = Aggress, EDA(aggres))
# OR
IQR(Aggress$aggres)
diff(range(Aggress$aggres))
Data used in Exercises 1.91 and 3.68
Aid
Aid
A data frame/tibble with 51 observations on two variables
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
average monthly payment per person in a family
US Department of Health and Human Services, 1993.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Aid$payment, xlab = "payment",
     main = "Average monthly payment per person in a family",
     col = "lightblue")
boxplot(Aid$payment, col = "lightblue")
dotplot(state ~ payment, data = Aid)
Data used in Exercise 6.60
Aids
Aids
A data frame/tibble with 295 observations on three variables
time (in months) from HIV infection to the clinical manifestation of full-blown AIDS
age (in years) of patient
a numeric vector
Kalbfleisch, J. and Lawless, J. (1989), An analysis of the data on transfusion related AIDS, Journal of the American Statistical Association, 84, 360-372.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
with(data = Aids, EDA(duration))
with(data = Aids, t.test(duration, mu = 30, alternative = "greater"))
with(data = Aids, SIGN.test(duration, md = 24, alternative = "greater"))
Data used in Exercise 1.12
Airdisasters
Airdisasters
A data frame/tibble with 141 observations on the following seven variables
a numeric vector indicating the year of an aircraft accident
a numeric vector indicating the number of deaths of an aircraft accident
a character vector indicating the decade of an aircraft accident
2000 World Almanac and Book of Facts.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(las = 1)
stripchart(deaths ~ decade, data = Airdisasters,
           subset = decade != "1930s" & decade != "1940s",
           method = "stack", pch = 19, cex = 0.5, col = "red",
           main = "Aircraft Disasters 1950 - 1990",
           xlab = "Number of fatalities")
par(las = 0)
Data for Example 2.9
Airline
Airline
A data frame/tibble with 11 observations on three variables
a character variable with values Alaska, Amer West, American, Continental, Delta, Northwest, Pan Am, Southwest, TWA, United, and USAir
a numeric vector
complaints per 1000 passengers
Transportation Department.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
with(data = Airline,
     barplot(complaints, names.arg = airline, col = "lightblue", las = 2))
plot(complaints ~ ontime, data = Airline, pch = 19, col = "red",
     xlab = "On time", ylab = "Complaints")
Data used in Exercise 5.79
Alcohol
Alcohol
A data frame/tibble with 14 observations on one variable
age when individual started drinking
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Alcohol$age)
qqline(Alcohol$age)
SIGN.test(Alcohol$age, md = 20, conf.level = 0.99)
Data used in Exercise 8.22
Allergy
Allergy
A data frame/tibble with 406 observations on two variables
a factor with levels insomnia, headache, and drowsiness
a factor with levels seldane-d, pseudoephedrine, and placebo
Marion Merrell Dow, Inc., Kansas City, Mo. 64114.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~event + medication, data = Allergy)
T1
chisq.test(T1)
Data used in Exercise 5.58
Anesthet
Anesthet
A data frame/tibble with 10 observations on one variable
recovery time (in hours)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Anesthet$recover)
qqline(Anesthet$recover)
with(data = Anesthet, t.test(recover, conf.level = 0.90)$conf)
Data used in Exercise 2.96
Anxiety
Anxiety
A data frame/tibble with 20 observations on two variables
anxiety score before a major math test
math test score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(math ~ anxiety, data = Anxiety, ylab = "score", main = "Exercise 2.96")
with(data = Anxiety, cor(math, anxiety))
linmod <- lm(math ~ anxiety, data = Anxiety)
abline(linmod, col = "purple")
summary(linmod)
Data used in Examples 9.2 and 9.9
Apolipop
Apolipop
A data frame/tibble with 15 observations on two variables
number of cups of coffee per day
level of apolipoprotein B
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(apolipB ~ coffee, data = Apolipop)
linmod <- lm(apolipB ~ coffee, data = Apolipop)
summary(linmod)
summary(linmod)$sigma
anova(linmod)
anova(linmod)[2, 3]^.5
par(mfrow = c(2, 2))
plot(linmod)
par(mfrow = c(1, 1))
Data for Exercise 1.119
Append
Append
A data frame/tibble with 20 observations on one variable
fees for an appendectomy for a random sample of 20 hospitals in North Carolina
North Carolina Medical Database Commission, August 1994.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
fee <- Append$fee
ll <- mean(fee) - 2*sd(fee)
ul <- mean(fee) + 2*sd(fee)
limits <- c(ll, ul)
limits
fee[fee < ll | fee > ul]
Data for Exercise 10.60
Appendec
Appendec
A data frame/tibble with 59 observations on two variables
median costs of appendectomies at hospitals across the state of North Carolina in 1992
a vector classifying each hospital as rural, regional, or metropolitan
Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(cost ~ region, data = Appendec, col = c("red", "blue", "cyan"))
anova(lm(cost ~ region, data = Appendec))
Data for Exercises 2.1, 2.26, 2.35 and 2.51
Aptitude
Aptitude
A data frame/tibble with 8 observations on two variables
aptitude test scores
productivity scores
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(product ~ aptitude, data = Aptitude, main = "Exercise 2.1")
model1 <- lm(product ~ aptitude, data = Aptitude)
model1
abline(model1, col = "red", lwd = 3)
resid(model1)
fitted(model1)
cor(Aptitude$product, Aptitude$aptitude)
Data for Exercises 5.120, 10.20 and Example 1.16
Archaeo
Archaeo
A data frame/tibble with 60 observations on two variables
number of years before 1983 - the year the data were obtained
Ceramic Phase numbers
Cunliffe, B. (1984) and Naylor and Smith (1988).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(age ~ phase, data = Archaeo, col = "yellow",
        main = "Example 1.16", xlab = "Ceramic Phase", ylab = "Age")
anova(lm(age ~ as.factor(phase), data = Archaeo))
Data for Exercise 10.58
Arthriti
Arthriti
A data frame/tibble with 51 observations on two variables
time (measured in days) until an arthritis sufferer experienced relief
a factor with levels A, B, and C
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(time ~ treatment, data = Arthriti,
        col = c("lightblue", "lightgreen", "yellow"), ylab = "days")
anova(lm(time ~ treatment, data = Arthriti))
Data for Exercise 1.107
Artifici
Artifici
A data frame/tibble with 15 observations on one variable
duration (in hours) for transplant
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Artifici$duration, 2)
summary(Artifici$duration)
values <- Artifici$duration[Artifici$duration < 6.5]
values
summary(values)
Data for Exercise 10.51
Asprin
Asprin
A data frame/tibble with 15 observations on two variables
time (in seconds) for aspirin to dissolve
impurity of an ingredient with levels 1%, 5%, and 10%
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(time ~ impurity, data = Asprin, col = c("red", "blue", "green"))
Data for Exercise 7.52
Asthmati
Asthmati
A data frame/tibble with nine observations on three variables
asthmatic relief index for patients given a drug
asthmatic relief index for patients given a placebo
difference between the placebo and drug
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Asthmati$difference)
qqline(Asthmati$difference)
shapiro.test(Asthmati$difference)
with(data = Asthmati,
     t.test(placebo, drug, paired = TRUE, mu = 0, alternative = "greater"))
Data for Example 2.2 and Exercises 2.43 and 2.57
Attorney
Attorney
A data frame/tibble with 88 observations on three variables
U.S. attorneys' office staff per 1 million population
U.S. attorneys' office convictions per 1 million population
a factor with levels Albuquerque; Alexandria, Va; Anchorage; Asheville, NC; Atlanta; Baltimore; Baton Rouge; Billings, Mt; Birmingham, Al; Boise, Id; Boston; Buffalo; Burlington, Vt; Cedar Rapids; Charleston, WVA; Cheyenne, Wy; Chicago; Cincinnati; Cleveland; Columbia, SC; Concord, NH; Denver; Des Moines; Detroit; East St. Louis; Fargo, ND; Fort Smith, Ark; Fort Worth; Grand Rapids, Mi; Greensboro, NC; Honolulu; Houston; Indianapolis; Jackson, Miss; Kansas City; Knoxville, Tn; Las Vegas; Lexington, Ky; Little Rock; Los Angeles; Louisville; Memphis; Miami; Milwaukee; Minneapolis; Mobile, Ala; Montgomery, Ala; Muskogee, Ok; Nashville; New Haven, Conn; New Orleans; New York (Brooklyn); New York (Manhattan); Newark, NJ; Oklahoma City; Omaha; Oxford, Miss; Pensacola, Fl; Philadelphia; Phoenix; Pittsburgh; Portland, Maine; Portland, Ore; Providence, RI; Raleigh, NC; Roanoke, Va; Sacramento; Salt Lake City; San Antonio; San Diego; San Francisco; Savannah, Ga; Scranton, Pa; Seattle; Shreveport, La; Sioux Falls, SD; South Bend, Ind; Spokane, Wash; Springfield, Ill; St. Louis; Syracuse, NY; Tampa; Topeka, Kan; Tulsa; Tyler, Tex; Washington; Wheeling, WVa; and Wilmington, Del
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(mfrow = c(1, 2))
plot(convict ~ staff, data = Attorney, main = "With Washington, D.C.")
plot(convict[-86] ~ staff[-86], data = Attorney,
     main = "Without Washington, D.C.")
par(mfrow = c(1, 1))
Data for Exercise 7.46
Autogear
Autogear
A data frame/tibble with 20 observations on two variables
number of defective gears in the production of 100 gears per day
a factor with levels A and B
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
t.test(defectives ~ manufacturer, data = Autogear)
wilcox.test(defectives ~ manufacturer, data = Autogear)
t.test(defectives ~ manufacturer, var.equal = TRUE, data = Autogear)
Data for Exercise 7.40
Backtoback
Backtoback
A data frame/tibble with 24 observations on two variables
a numeric vector
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
wilcox.test(score ~ group, data = Backtoback)
t.test(score ~ group, data = Backtoback)
Data for Exercise 1.11
Bbsalaries
Bbsalaries
A data frame/tibble with 142 observations on two variables
1999 salary for baseball player
a factor with levels Angels, Indians, Orioles, Redsoxs, and Whitesoxs
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stripchart(salary ~ team, data = Bbsalaries, method = "stack",
           pch = 19, col = "blue", cex = 0.75)
title(main = "Major League Salaries")
Data for Exercises 1.124 and 2.94
Bigten
Bigten
A data frame/tibble with 44 observations on the following four variables
a factor with levels Illinois, Indiana, Iowa, Michigan, Michigan State, Minnesota, Northwestern, Ohio State, Penn State, Purdue, and Wisconsin
graduation rate
factor with two levels 1984-1985 and 1993-1994
factor with two levels athlete and student
NCAA Graduation Rates Report, 2000.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(rate ~ status, data = subset(Bigten, year == "1993-1994"),
        horizontal = TRUE, main = "Graduation Rates 1993-1994")
with(data = Bigten, tapply(rate, list(year, status), mean))
Data for Exercise 1.49
Biology
Biology
A data frame/tibble with 30 observations on one variable
test scores on the first test in a beginning biology class
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Biology$score, breaks = "scott", col = "brown", freq = FALSE,
     main = "Problem 1.49", xlab = "Test Score")
lines(density(Biology$score), lwd = 3)
Data for Example 1.10
Birth
Birth
A data frame/tibble with 51 observations on three variables
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
live birth rates per 1000 population
a factor with levels 1990 and 1998
National Vital Statistics Report, 48, March 28, 2000, National Center for Health Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
rate1998 <- subset(Birth, year == "1998", select = rate)
stem(x = rate1998$rate, scale = 2)
hist(rate1998$rate, breaks = seq(10.9, 21.9, 1.0), xlab = "1998 Birth Rate",
     main = "Figure 1.14 in BSDA", col = "pink")
hist(rate1998$rate, breaks = seq(10.9, 21.9, 1.0), xlab = "1998 Birth Rate",
     main = "Figure 1.16 in BSDA", col = "pink", freq = FALSE)
lines(density(rate1998$rate), lwd = 3)
rm(rate1998)
Data for Exercise 8.55
Blackedu
Blackedu
A data frame/tibble with 3800 observations on two variables
a factor with levels Female and Male
a factor with levels High school dropout, High school graudate, Some college, Bachelor's degree, and Graduate degree
Bureau of Census data.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~gender + education, data = Blackedu)
T1
chisq.test(T1)
Data for Exercise 7.84
Blood
Blood
A data frame/tibble with 15 observations on the following two variables
blood pressure recorded from an automated blood pressure machine
blood pressure recorded by an expert using an at-home device
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
DIFF <- Blood$machine - Blood$expert
shapiro.test(DIFF)
qqnorm(DIFF)
qqline(DIFF)
rm(DIFF)
t.test(Blood$machine, Blood$expert, paired = TRUE)
Data for Exercise 10.14
Board
Board
A data frame/tibble with 7 observations on three variables
1999 salary (in $1000) for board directors
a factor with levels A, B, and C
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(salary ~ university, data = Board,
        col = c("red", "blue", "green"), ylab = "Income")
tapply(Board$salary, Board$university, summary)
anova(lm(salary ~ university, data = Board))
## Not run:
library(dplyr)
dplyr::group_by(Board, university) %>%
  summarize(Average = mean(salary))
## End(Not run)
Data for Example 7.22
Bones
Bones
A data frame/tibble with 70 observations on two variables
bone density measurements
a factor with levels active and nonactive
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
t.test(density ~ group, data = Bones, alternative = "greater")
t.test(rank(density) ~ group, data = Bones, alternative = "greater")
wilcox.test(density ~ group, data = Bones, alternative = "greater")
Data for Exercise 9.53
Books
Books
A data frame/tibble with 17 observations on two variables
number of books read
spelling score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(spelling ~ book, data = Books)
mod <- lm(spelling ~ book, data = Books)
summary(mod)
abline(mod, col = "blue", lwd = 2)
Data for Exercises 10.30 and 10.31
Bookstor
Bookstor
A data frame/tibble with 72 observations on two variables
money obtained for selling textbooks
a factor with levels A, B, and C
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(dollars ~ store, data = Bookstor,
        col = c("purple", "lightblue", "cyan"))
kruskal.test(dollars ~ store, data = Bookstor)
Data for Exercises 2.15, 2.44, 2.58 and Examples 2.3 and 2.20
Brain
Brain
A data frame/tibble with 28 observations on three variables
a factor with levels African elephant, Asian Elephant, Brachiosaurus, Cat, Chimpanzee, Cow, Diplodocus, Donkey, Giraffe, Goat, Gorilla, Gray wolf, Guinea Pig, Hamster, Horse, Human, Jaguar, Kangaroo, Mole, Mouse, Mt Beaver, Pig, Potar monkey, Rabbit, Rat, Rhesus monkey, Sheep, and Triceratops
body weight (in kg)
brain weight (in g)
P. Rousseeuw and A. Leroy, Robust Regression and Outlier Detection (New York: Wiley, 1987).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(log(brainweight) ~ log(bodyweight), data = Brain,
     pch = 19, col = "blue", main = "Example 2.3")
mod <- lm(log(brainweight) ~ log(bodyweight), data = Brain)
abline(mod, lty = "dashed", col = "blue")
Data for Exercise 1.73
Bumpers
Bumpers
A data frame/tibble with 23 observations on two variables
a factor with levels Buick Century, Buick Skylark, Chevrolet Cavalier, Chevrolet Corsica, Chevrolet Lumina, Dodge Dynasty, Dodge Monaco, Ford Taurus, Ford Tempo, Honda Accord, Hyundai Sonata, Mazda 626, Mitsubishi Galant, Nissan Stanza, Oldsmobile Calais, Oldsmobile Ciere, Plymouth Acclaim, Pontiac 6000, Pontiac Grand Am, Pontiac Sunbird, Saturn SL2, Subaru Legacy, and Toyota Camry
total repair cost (in dollars) after crashing a car into a barrier four times while the car was traveling at 5 miles per hour
Insurance Institute of Highway Safety.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Bumpers$repair)
stripchart(Bumpers$repair, method = "stack", pch = 19, col = "blue")
library(lattice)
dotplot(car ~ repair, data = Bumpers)
Data for Exercise 8.25
Bus
Bus
A data frame/tibble with 29363 observations on two variables
a factor with levels absent and present
a factor with levels am, noon, pm, swing, and split
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~attendance + shift, data = Bus)
T1
chisq.test(T1)
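The chi-square test of independence used above compares observed counts with the counts expected if the two factors were unrelated. The following is a minimal sketch of that calculation in base R. The table `tab` is hypothetical and is not the Bus data; the names `expected`, `stat`, and `deg_free` are introduced only for illustration.

```r
# Hypothetical 2 x 3 table of counts; these numbers are NOT the Bus data.
tab <- matrix(c(30, 70, 45, 55, 25, 75), nrow = 2,
              dimnames = list(attendance = c("absent", "present"),
                              shift = c("am", "noon", "pm")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-square statistic, its degrees of freedom, and the p-value
stat <- sum((tab - expected)^2 / expected)
deg_free <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(stat, df = deg_free, lower.tail = FALSE)

# The same test via chisq.test(); no continuity correction is applied
# because the table is larger than 2 x 2.
res <- chisq.test(tab)
print(c(by_hand = stat, chisq_test = unname(res$statistic)))
```

Both routes should agree, which makes the hand calculation a convenient check on how the statistic is assembled.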
Data for Exercises 5.104 and 6.43
Bypass
Bypass
A data frame/tibble with 17 observations on two variables
a factor with levels Carolinas Med Ct, Duke Med Ct, Durham Regional, Forsyth Memorial, Frye Regional, High Point Regional, Memorial Mission, Mercy, Moore Regional, Moses Cone Memorial, NC Baptist, New Hanover Regional, Pitt Co. Memorial, Presbyterian, Rex, Univ of North Carolina, and Wake County
median charge for coronary bypass
Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Bypass$charge)
t.test(Bypass$charge, conf.level = 0.90)$conf.int
t.test(Bypass$charge, mu = 35000)
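The 90% confidence interval extracted from `t.test()` above can also be built by hand from the sample mean, the standard error, and a t quantile. The sketch below assumes a hypothetical vector `charge`; it is not the Bypass data, and the variable names are illustrative only.

```r
# Hypothetical charges (in dollars); NOT the Bypass data.
charge <- c(31200, 35800, 29950, 40100, 33750, 36400, 38250, 30500)

n <- length(charge)
xbar <- mean(charge)
se <- sd(charge) / sqrt(n)

# A 90% two-sided interval leaves 5% in each tail.
tcrit <- qt(0.95, df = n - 1)
ci_manual <- c(xbar - tcrit * se, xbar + tcrit * se)

# The same interval as reported by t.test()
ci_ttest <- as.numeric(t.test(charge, conf.level = 0.90)$conf.int)

print(ci_manual)
print(ci_ttest)
```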
Data for Exercise 7.83
Cabinets
Cabinets
A data frame/tibble with 20 observations on three variables
a numeric vector
estimate for kitchen cabinets from supplier A (in dollars)
estimate for kitchen cabinets from supplier B (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
DIF <- Cabinets$supplA - Cabinets$supplB
qqnorm(DIF)
qqline(DIF)
shapiro.test(DIF)
with(data = Cabinets, t.test(supplA, supplB, paired = TRUE))
with(data = Cabinets, wilcox.test(supplA, supplB, paired = TRUE))
rm(DIF)
Data for Exercises 6.55 and 6.64
Cancer
Cancer
A data frame/tibble with 64 observations on two variables
survival time (in days) of terminal patients treated with vitamin C
a factor indicating type of cancer with levels breast, bronchus, colon, ovary, and stomach
Cameron, E and Pauling, L. 1978. “Supplemental Ascorbate in the Supportive Treatment of Cancer.” Proceedings of the National Academy of Science, 75, 4538-4542.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(survival ~ type, data = Cancer, col = "blue")
stomach <- Cancer$survival[Cancer$type == "stomach"]
bronchus <- Cancer$survival[Cancer$type == "bronchus"]
boxplot(stomach, ylab = "Days")
SIGN.test(stomach, md = 100, alternative = "greater")
SIGN.test(bronchus, md = 100, alternative = "greater")
rm(bronchus, stomach)
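A one-sample sign test of a hypothesized median reduces to a binomial test on the number of observations above that median. The sketch below illustrates the idea with base R's `binom.test()`. It is not necessarily how `SIGN.test()` is implemented, and the vector `x` is hypothetical rather than taken from the Cancer data.

```r
# Hypothetical survival times (in days); NOT the Cancer data.
x <- c(124, 42, 25, 45, 412, 51, 1112, 46, 103, 876, 146, 340, 396)
md0 <- 100  # hypothesized median

# Count observations above and below md0; values equal to md0 are dropped.
above <- sum(x > md0)
below <- sum(x < md0)

# Under H0 each retained observation is above md0 with probability 0.5.
sign_p <- binom.test(above, above + below, p = 0.5,
                     alternative = "greater")$p.value
print(c(above = above, below = below, p.value = sign_p))
```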
Data for Exercises 10.28 and 10.29
Carbon
Carbon
A data frame/tibble with 24 observations on two variables
carbon monoxide measured (in parts per million)
a factor with levels SiteA, SiteB, and SiteC
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(CO ~ site, data = Carbon, col = "lightgreen")
kruskal.test(CO ~ site, data = Carbon)
Data for Exercise 1.116
Cat
Cat
A data frame/tibble with 17 observations on one variable
reading score on the California Achievement Test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Cat$score)
fivenum(Cat$score)
boxplot(Cat$score, main = "Problem 1.116", col = "green")
Data for Exercises 7.34 and 7.48
Censored
Censored
A data frame/tibble with 121 observations on three variables
survival time (in days) of patients with small cell lung cancer
a factor with levels armA and armB indicating the treatment a patient received
the age of the patient
Ying, Z., Jung, S., Wei, L. 1995. “Survival Analysis with Median Regression Models.” Journal of the American Statistical Association, 90, 178-184.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(survival ~ treatment, data = Censored, col = "yellow")
wilcox.test(survival ~ treatment, data = Censored, alternative = "greater")
Data for Examples 1.11, 1.12, 1.13, 2.11 and 5.1
Challeng
Challeng
A data frame/tibble with 25 observations on four variables
a character variable indicating the flight
date of the flight
temperature (in degrees Fahrenheit)
number of failures
Dalal, S. R., Fowlkes, E. B., Hoadley, B. 1989. “Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure.” Journal of the American Statistical Association, 84, No. 408, 945-957.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Challeng$temp)
summary(Challeng$temp)
IQR(Challeng$temp)
quantile(Challeng$temp)
fivenum(Challeng$temp)
# Repeat the summaries after dropping the lowest temperature
stem(sort(Challeng$temp)[-1])
summary(sort(Challeng$temp)[-1])
IQR(sort(Challeng$temp)[-1])
quantile(sort(Challeng$temp)[-1])
fivenum(sort(Challeng$temp)[-1])
par(mfrow = c(1, 2))
qqnorm(Challeng$temp)
qqline(Challeng$temp)
qqnorm(sort(Challeng$temp)[-1])
qqline(sort(Challeng$temp)[-1])
par(mfrow = c(1, 1))
Data for Example 5.3
Chemist
Chemist
A data frame/tibble with 50 observations on one variable
starting salary (in dollars) for chemistry majors
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Chemist$salary)
Data for Exercise 6.41
Chesapea
Chesapea
A data frame/tibble with 16 observations on one variable
surface salinity measurements (in parts per 1000) for station 11, offshore from Annapolis, Maryland, on July 3-4, 1927.
Davis, J. (1986) Statistics and Data Analysis in Geology, Second Edition. John Wiley and Sons, New York.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Chesapea$salinity)
qqline(Chesapea$salinity)
shapiro.test(Chesapea$salinity)
t.test(Chesapea$salinity, mu = 7)
Data for Exercise 8.35
Chevy
Chevy
A data frame/tibble with 67 observations on two variables
a factor with levels 1988-90 and 1991-93
a factor with levels much better than average, above average, average, below average, and much worse than average
Insurance Institute for Highway Safety and the Highway Loss Data Institute, 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~year + frequency, data = Chevy)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 10.15
Chicken
Chicken
A data frame/tibble with 13 observations on three variables
weight gain over a specified period
a factor with levels ration1, ration2, and ration3
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(gain ~ feed, col = c("red", "blue", "green"), data = Chicken)
anova(lm(gain ~ feed, data = Chicken))
Data for Exercises 6.49 and 7.47
Chipavg
Chipavg
A data frame/tibble with 30 observations on three variables
thickness of the oxide layer for wafer1
thickness of the oxide layer for wafer2
average thickness of the oxide layer of the eight measurements obtained from each set of two wafers
Yashchin, E. 1995. “Likelihood Ratio Methods for Monitoring Parameters of a Nested Random Effect Model.” Journal of the American Statistical Association, 90, 729-738.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Chipavg$thickness)
t.test(Chipavg$thickness, mu = 1000)
boxplot(Chipavg$wafer1, Chipavg$wafer2, names = c("Wafer 1", "Wafer 2"))
shapiro.test(Chipavg$wafer1)
shapiro.test(Chipavg$wafer2)
t.test(Chipavg$wafer1, Chipavg$wafer2, var.equal = TRUE)
Data for Exercise 10.9
Chips
Chips
A data frame/tibble with 30 observations on eight variables
first measurement of thickness of the oxide layer for wafer1
second measurement of thickness of the oxide layer for wafer1
third measurement of thickness of the oxide layer for wafer1
fourth measurement of thickness of the oxide layer for wafer1
first measurement of thickness of the oxide layer for wafer2
second measurement of thickness of the oxide layer for wafer2
third measurement of thickness of the oxide layer for wafer2
fourth measurement of thickness of the oxide layer for wafer2
Yashchin, E. 1995. “Likelihood Ratio Methods for Monitoring Parameters of a Nested Random Effect Model.” Journal of the American Statistical Association, 90, 729-738.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
with(data = Chips,
     boxplot(wafer11, wafer12, wafer13, wafer14,
             wafer21, wafer22, wafer23, wafer24, col = "pink"))
Data for Example 10.4
Cigar
Cigar
A data frame/tibble with 100 observations on two variables
amount of tar (measured in milligrams)
a factor indicating cigarette brand with levels brandA, brandB, brandC, and brandD
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(tar ~ brand, data = Cigar, col = "cyan", ylab = "mg tar")
anova(lm(tar ~ brand, data = Cigar))
Data for Exercise 2.27
Cigarett
Cigarett
A data frame/tibble with 16 observations on two variables
mothers' estimated average number of cigarettes smoked per day
children's birth weights (in pounds)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(weight ~ cigarettes, data = Cigarett)
model <- lm(weight ~ cigarettes, data = Cigarett)
abline(model, col = "red")
with(data = Cigarett, cor(weight, cigarettes))
rm(model)
This program simulates random samples from which it constructs confidence intervals for one of the parameters mean (Mu), variance (Sigma), or proportion of successes (Pi).
CIsim(samples = 100, n = 30, mu = 0, sigma = 1, conf.level = 0.95, type = "Mean")
samples |
the number of samples desired. |
n |
the size of each sample. |
mu |
if constructing confidence intervals for the population mean or
the population variance, mu is the population mean (i.e., type is one of
either |
sigma |
the population standard deviation. |
conf.level |
confidence level for the graphed confidence intervals, restricted to lie between zero and one. |
type |
character string, one of |
Default is to construct confidence intervals for the population mean. Simulated confidence intervals for the population variance or population proportion of successes are possible by selecting the appropriate value in the type argument.
The graph depicts the simulated confidence intervals. The number of confidence intervals that do not contain the parameter of interest is counted and reported in the console.
Alan T. Arnholt
CIsim(100, 30, 100, 10)
# Simulates 100 samples of size 30 from
# a normal distribution with mean 100
# and standard deviation 10. From the
# 100 simulated samples, 95% confidence
# intervals for the Mean are constructed
# and depicted in the graph.
CIsim(100, 30, 100, 10, type = "Var")
# Simulates 100 samples of size 30 from
# a normal distribution with mean 100
# and standard deviation 10. From the
# 100 simulated samples, 95% confidence
# intervals for the variance are constructed
# and depicted in the graph.
CIsim(100, 50, .5, type = "Pi", conf.level = .90)
# Simulates 100 samples of size 50 from
# a binomial distribution where the population
# proportion of successes is 0.5. From the
# 100 simulated samples, 90% confidence
# intervals for Pi are constructed
# and depicted in the graph.
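The idea behind the simulation can be sketched in a few lines of base R: draw repeated samples, build an interval from each, and count how many intervals miss the true parameter. This sketch uses t intervals from `t.test()`, which is an assumption and may differ from what `CIsim()` does internally; the object names `ci` and `missed` are illustrative.

```r
set.seed(1)  # arbitrary seed, used only so the run is reproducible

samples <- 100     # number of simulated samples
n <- 30            # size of each sample
mu <- 100          # true population mean
sigma <- 10        # true population standard deviation
conf.level <- 0.95

# One row per simulated sample: lower and upper confidence limits
ci <- t(replicate(samples, {
  x <- rnorm(n, mean = mu, sd = sigma)
  as.numeric(t.test(x, conf.level = conf.level)$conf.int)
}))

# Number of intervals that fail to contain the true mean
missed <- sum(ci[, 1] > mu | ci[, 2] < mu)
print(missed)
```

With a 95% confidence level, roughly 5 of 100 intervals are expected to miss, though the count varies from run to run.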
Data for Exercise 9.7
Citrus
Citrus
A data frame/tibble with nine observations on two variables
age of children
percent peak bone density
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(percent ~ age, data = Citrus)
summary(model)
anova(model)
rm(model)
Data for Exercise 10.16
Clean
Clean
A data frame/tibble with 45 observations on two variables
residual contaminants
a factor with levels A, B, and C
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(clean ~ agent, col = c("red", "blue", "green"), data = Clean)
anova(lm(clean ~ agent, data = Clean))
Data for Exercises 10.24 and 10.25
Coaxial
Coaxial
A data frame/tibble with 45 observations on two variables
signal loss per 1000 feet
factor with three levels of coaxial cable: typeA, typeB, and typeC
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(signal ~ cable, data = Coaxial, col = c("red", "green", "yellow"))
kruskal.test(signal ~ cable, data = Coaxial)
Data for Exercise 7.55
Coffee
Coffee
A data frame/tibble with nine observations on three variables
workers' productivity scores without a coffee break
workers' productivity scores with a coffee break
with minus without
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Coffee$differences)
qqline(Coffee$differences)
shapiro.test(Coffee$differences)
t.test(Coffee$with, Coffee$without, paired = TRUE, alternative = "greater")
wilcox.test(Coffee$with, Coffee$without, paired = TRUE, alternative = "greater")
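A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which is why the normality checks above are run on the differences. The sketch below shows that equivalence on hypothetical scores; the vectors `without` and `with_break` are made up and are not the Coffee data.

```r
# Hypothetical productivity scores for nine workers; NOT the Coffee data.
without    <- c(23, 35, 29, 33, 43, 32, 41, 30, 38)
with_break <- c(28, 38, 29, 37, 42, 36, 45, 35, 40)

# Within-pair differences (with minus without)
d <- with_break - without

# Paired test on the two columns ...
paired <- t.test(with_break, without, paired = TRUE, alternative = "greater")
# ... and the one-sample test on the differences
one_sample <- t.test(d, mu = 0, alternative = "greater")

print(c(paired = unname(paired$statistic),
        one_sample = unname(one_sample$statistic)))
```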
Data for Exercise 5.68
Coins
Coins
A data frame/tibble with 12 observations on one variable
yearly returns on each of 12 possible investments
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Coins$return)
qqline(Coins$return)
Computes all possible combinations of n objects taken k at a time.
Combinations(n, k)
n |
a number. |
k |
a number less than or equal to |
Returns a matrix containing the possible combinations of n objects taken k at a time.
Combinations(5, 2)
# The columns in the matrix list the values of the 10 possible
# combinations of 5 things taken 2 at a time.
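For comparison, base R offers `combn()`, which also returns one combination per column, and `choose()`, which gives the count without enumerating. This is a sketch of the base R equivalents only; it makes no claim about how `Combinations()` is implemented.

```r
# Enumerate all combinations of 5 objects taken 2 at a time (one per column)
combos <- combn(5, 2)
print(combos)

# Number of combinations, computed two ways
n_enumerated <- ncol(combos)
n_formula <- choose(5, 2)
print(c(enumerated = n_enumerated, formula = n_formula))
```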
Data for Exercises 1.13 and 7.85
Commute
Commute
A data frame/tibble with 39 observations on three variables
a factor with levels Atlanta, Baltimore, Boston, Buffalo, Charlotte, Chicago, Cincinnati, Cleveland, Columbus, Dallas, Denver, Detroit, Hartford, Houston, Indianapolis, Kansas City, Los Angeles, Miami, Milwaukee, Minneapolis, New Orleans, New York, Norfolk, Orlando, Philadelphia, Phoenix, Pittsburgh, Portland, Providence, Rochester, Sacramento, Salt Lake City, San Antonio, San Diego, San Francisco, Seattle, St. Louis, Tampa, and Washington
year
commute times
Federal Highway Administration.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
library(lattice)  # provides stripplot(), dotplot(), and bwplot()
stripplot(year ~ time, data = Commute, jitter = TRUE)
dotplot(year ~ time, data = Commute)
bwplot(year ~ time, data = Commute)
stripchart(time ~ year, data = Commute, method = "stack", pch = 1, cex = 2,
           col = c("red", "blue"), group.names = c("1980", "1990"),
           main = "", xlab = "minutes")
title(main = "Commute Time")
boxplot(time ~ year, data = Commute, names = c("1980", "1990"),
        horizontal = TRUE, las = 1)
Data for Exercises 1.68 and 1.82
Concept
Concept
A data frame/tibble with 28 observations on one variable
Tennessee self concept scores
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
summary(Concept$self)
sd(Concept$self)
diff(range(Concept$self))
IQR(Concept$self)
summary(Concept$self/10)
IQR(Concept$self/10)
sd(Concept$self/10)
diff(range(Concept$self/10))
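The example above computes each measure of spread twice, once on the raw scores and once on the scores divided by 10. Dividing the data by a positive constant divides the standard deviation, range, and interquartile range by the same constant. The sketch below checks this on a hypothetical vector `self`, which is not the Concept data.

```r
# Hypothetical self-concept scores; NOT the Concept data.
self <- c(28, 35, 41, 44, 47, 52, 55, 60)
scaled <- self / 10

# Spread of the rescaled scores
spread_scaled <- c(sd = sd(scaled), iqr = IQR(scaled),
                   range = diff(range(scaled)))

# Spread of the original scores, divided by the same constant
spread_original <- c(sd = sd(self), iqr = IQR(self),
                     range = diff(range(self))) / 10

print(rbind(spread_scaled, spread_original))
```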
Data for Example 7.17
Concrete
Concrete
A data frame/tibble with 20 observations on two variables
compressive strength (in pounds per square inch)
factor with levels new and old indicating the method used to construct a concrete block
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
wilcox.test(strength ~ method, data = Concrete, alternative = "greater")
Data for Exercise 7.77
Corn
Corn
A data frame/tibble with 12 observations on three variables
corn yield with new method
corn yield with standard method
new minus standard
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(Corn$differences)
qqnorm(Corn$differences)
qqline(Corn$differences)
shapiro.test(Corn$differences)
t.test(Corn$differences, alternative = "greater")
Data for Exercise 2.23
Correlat
Correlat
A data frame/tibble with 13 observations on two variables
a numeric vector
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(y ~ x, data = Correlat)
model <- lm(y ~ x, data = Correlat)
abline(model)
rm(model)
Data for Exercise 6.96
Counsel
Counsel
A data frame/tibble with 18 observations on one variable
standardized psychology scores after a counseling process
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Counsel$score)
t.test(Counsel$score, mu = 70)
Data for Exercise 1.34
Cpi
Cpi
A data frame/tibble with 20 observations on two variables
year
consumer price index
Bureau of Labor Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(cpi ~ year, data = Cpi, type = "l", lty = 2, lwd = 2, col = "red")
barplot(Cpi$cpi, col = "pink", las = 2, main = "Problem 1.34")
Data for Exercises 1.90, 2.32, 3.64, and 5.113
Crime
Crime
A data frame/tibble with 102 observations on three variables
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, DC, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
a factor with levels 1983 and 1993
crime rate per 100,000 inhabitants
U.S. Department of Justice, Bureau of Justice Statistics, Sourcebook of Criminal Justice Statistics, 1993.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(rate ~ year, data = Crime, col = "red")
Data for Exercise 7.62
Darwin
Darwin
A data frame/tibble with 15 observations on three variables
number of pot
height of plant (in inches) after a fixed period of time when cross-fertilized
height of plant (in inches) after a fixed period of time when self-fertilized
Darwin, C. (1876) The Effect of Cross- and Self-Fertilization in the Vegetable Kingdom, 2nd edition, London.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
differ <- Darwin$cross - Darwin$self
qqnorm(differ)
qqline(differ)
shapiro.test(differ)
wilcox.test(Darwin$cross, Darwin$self, paired = TRUE)
rm(differ)
Data for Example 2.22
Dealers
Dealers
A data frame/tibble with 122 observations on two variables
a factor with levels Honda, Toyota, Mazda, Ford, Dodge, and Saturn
a factor with levels Replaces unnecessarily and Follows manufacturer guidelines
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
xtabs(~type + service, data = Dealers)
T1 <- xtabs(~type + service, data = Dealers)
T1
addmargins(T1)
pt <- prop.table(T1, margin = 1)
pt
barplot(t(pt), col = c("red", "skyblue"), legend = colnames(T1))
rm(T1, pt)
Data for Exercise 1.27
Defectiv
Defectiv
A data frame/tibble with 20 observations on one variable
number of defective items produced by the employees in a small business firm
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~ number, data = Defectiv)
T1
barplot(T1, col = "pink", ylab = "Frequency",
        xlab = "Defective Items Produced by Employees", main = "Problem 1.27")
rm(T1)
Data for Exercise 2.75
Degree
Degree
A data frame/tibble with 1064 observations on two variables
a factor with levels Health, Education, Foreign Language, Psychology, Fine Arts, Life Sciences, Business, Social Science, Physical Sciences, Engineering, and All Fields
a factor with levels 1970 and 1990
U.S. Department of Health and Human Services, National Center for Education Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~field + awarded, data = Degree)
T1
barplot(t(T1), beside = TRUE, col = c("red", "skyblue"), legend = colnames(T1))
rm(T1)
Data for Exercise 10.55
Delay
Delay
A data frame/tibble with 80 observations on two variables
the delay time (in minutes) for 80 randomly selected flights
a factor with levels A, B, C, and D
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(delay ~ carrier, data = Delay, main = "Exercise 10.55", ylab = "minutes", col = "pink")
kruskal.test(delay ~ carrier, data = Delay)
Data for Exercise 1.26
Depend
Depend
A data frame/tibble with 50 observations on one variable
number of dependent children in a family
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~ number, data = Depend)
T1
barplot(T1, col = "lightblue", main = "Problem 1.26",
        xlab = "Number of Dependent Children", ylab = "Frequency")
rm(T1)
Data for Exercise 5.21
Detroit
Detroit
A data frame/tibble with 40 observations on one variable
the educational level (in years) of a sample of 40 auto workers in a plant in Detroit
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Detroit$educ)
Data used for Exercise 8.50
Develop
Develop
A data frame/tibble with 5656 observations on two variables
a factor with levels African American, American Indian, Asian, Latino, and White
a factor with levels Two-year and Four-year
Research in Development Education (1994), V. 11, 2.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~race + college, data = Develop)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 6.47
Devmath
Devmath
A data frame/tibble with 40 observations on one variable
first exam score
Data provided by Dr. Anita Kitchens.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Devmath$score)
t.test(Devmath$score, mu = 80, alternative = "less")
Data for Exercise 3.109
Dice
Dice
A data frame/tibble with 11 observations on two variables
possible outcomes for the sum of two dice
probability for outcome x
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
roll1 <- sample(1:6, 20000, replace = TRUE)
roll2 <- sample(1:6, 20000, replace = TRUE)
outcome <- roll1 + roll2
T1 <- table(outcome)/length(outcome)
remove(roll1, roll2, outcome)
T1
round(t(Dice), 5)
rm(T1)
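The simulated proportions above approximate the exact distribution of the sum of two fair dice, which can be enumerated directly from the 36 equally likely outcomes. The sketch below does this in base R; the names `sums` and `exact` are illustrative and do not refer to columns of the Dice data.

```r
# All 36 equally likely sums of two fair six-sided dice
sums <- outer(1:6, 1:6, "+")

# Exact probability of each possible sum (2 through 12)
exact <- table(sums) / 36
print(round(exact, 5))
```

Comparing `exact` with a simulated table such as `T1` shows how closely 20,000 rolls track the theoretical probabilities.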
Data for Exercise 2.8
Diesel
Diesel
A data frame/tibble with 650 observations on three variables
date when price was recorded
price per gallon (in dollars)
a factor with levels California, CentralAtlantic, Coast, EastCoast, Gulf, LowerAtlantic, NatAvg, NorthEast, Rocky, and WesternMountain
Energy Information Administration, National Energy Information Center: 1000 Independence Ave., SW, Washington, D.C., 20585.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(las = 2)
boxplot(pricepergallon ~ location, data = Diesel)
boxplot(pricepergallon ~ location,
        data = droplevels(Diesel[Diesel$location == "EastCoast" |
                                 Diesel$location == "Gulf" |
                                 Diesel$location == "NatAvg" |
                                 Diesel$location == "Rocky" |
                                 Diesel$location == "California", ]),
        col = "pink", main = "Exercise 2.8")
par(las = 1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Diesel, aes(x = date, y = pricepergallon, color = location)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  theme_bw() +
  labs(y = "Price per Gallon (in dollars)")
## End(Not run)
Data for Exercises 1.14 and 1.37
Diplomat
Diplomat
A data frame/tibble with 10 observations on three variables
a factor with levels Brazil, Bulgaria, Egypt, Indonesia, Israel, Nigeria, Russia, S. Korea, Ukraine, and Venezuela
total number of tickets
number of tickets per vehicle per month
Time, November 8, 1993. Figures are from January to June 1993.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(las = 2, mfrow = c(2, 2))
stripchart(number ~ country, data = Diplomat, pch = 19, col = "red", vertical = TRUE)
stripchart(rate ~ country, data = Diplomat, pch = 19, col = "blue", vertical = TRUE)
with(data = Diplomat, barplot(number, names.arg = country, col = "red"))
with(data = Diplomat, barplot(rate, names.arg = country, col = "blue"))
par(las = 1, mfrow = c(1, 1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Diplomat, aes(x = reorder(country, number), y = number)) +
  geom_bar(stat = "identity", fill = "pink", color = "black") +
  theme_bw() +
  labs(x = "", y = "Total Number of Tickets")
ggplot2::ggplot(data = Diplomat, aes(x = reorder(country, rate), y = rate)) +
  geom_bar(stat = "identity", fill = "pink", color = "black") +
  theme_bw() +
  labs(x = "", y = "Tickets per vehicle per month")
## End(Not run)
Data for Exercise 1.127
Disposal
A data frame/tibble with 29 observations on one variable
pounds of toxic waste per $1000 of shipments of its products
Bureau of the Census, Reducing Toxins, Statistical Brief SB/95-3, February 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Disposal$pounds)
fivenum(Disposal$pounds)
EDA(Disposal$pounds)
Data for Exercise 2.88
Dogs
A data frame/tibble with 20 observations on three variables
a factor with levels Beagle, Boxer, Chihuahua, Chow, Dachshund, Dalmatian, Doberman, Huskie, Labrador, Pomeranian, Poodle, Retriever, Rotweiler, Schnauzer, Shepherd, Shetland, ShihTzu, Spaniel, Springer, and Yorkshire
numeric ranking
a factor with levels 1992, 1993, 1997, and 1998
The World Almanac and Book of Facts, 2000.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
cor(Dogs$ranking[Dogs$year == "1992"], Dogs$ranking[Dogs$year == "1993"])
cor(Dogs$ranking[Dogs$year == "1997"], Dogs$ranking[Dogs$year == "1998"])
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Dogs, aes(x = reorder(breed, ranking), y = ranking)) +
  geom_bar(stat = "identity") +
  facet_grid(year ~ .) +
  theme(axis.text.x = element_text(angle = 85, vjust = 0.5))
## End(Not run)
Data for Exercise 1.20
Domestic
A data frame/tibble with five observations on two variables
a factor with levels 12-19, 20-24, 25-34, 35-49, and 50-64
rate of domestic violence per 1000 women
U.S. Department of Justice.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
barplot(Domestic$rate, names.arg = Domestic$age)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Domestic, aes(x = age, y = rate)) +
  geom_bar(stat = "identity", fill = "purple", color = "black") +
  labs(x = "", y = "Domestic violence per 1000 women") +
  theme_bw()
## End(Not run)
Data for Exercises 5.14 and 7.49
Dopamine
A data frame/tibble with 25 observations on two variables
dopamine b-hydroxylase activity (units are nmol/(ml)(h)/(mg) of protein)
a factor with levels nonpsychotic and psychotic
D.E. Sternberg, D.P. Van Kammen, and W.E. Bunney, "Schizophrenia: Dopamine b-Hydroxylase Activity and Treatment Response," Science, 216 (1982), 1423-1425.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(dbh ~ group, data = Dopamine, col = "orange")
t.test(dbh ~ group, data = Dopamine, var.equal = TRUE)
Data for Exercise 1.35
Dowjones
A data frame/tibble with 105 observations on three variables
date
Dow Jones closing price
percent change from previous year
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(close ~ year, data = Dowjones, type = "l", main = "Exercise 1.35")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Dowjones, aes(x = year, y = close)) +
  geom_point(size = 0.5) +
  geom_line(color = "red") +
  theme_bw() +
  labs(y = "Dow Jones Closing Price")
## End(Not run)
Data for Exercise 8.53
Drink
A data frame/tibble with 472 observations on two variables
a factor with levels ok, tolerated, and immoral
a factor with levels for, against, and undecided
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~drinking + referendum, data = Drink)
T1
chisq.test(T1)
rm(T1)
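As a rough illustration of what chisq.test() computes for a two-way table such as T1, the sketch below rebuilds the Pearson statistic from the expected counts. The counts are hypothetical (they are not the Drink data), and the object names tab, expected, and stat are assumptions made only for this example.

```r
# Hypothetical 2 x 3 table of counts (NOT the Drink data)
tab <- rbind(ok = c(30, 20, 10), immoral = c(10, 20, 30))
colnames(tab) <- c("for", "against", "undecided")

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson chi-square statistic, degrees of freedom, and p-value
stat <- sum((tab - expected)^2 / expected)
df   <- (nrow(tab) - 1) * (ncol(tab) - 1)
p    <- pchisq(stat, df = df, lower.tail = FALSE)

# chisq.test() reports the same statistic (the continuity correction
# only applies to 2 x 2 tables)
res <- chisq.test(tab)
c(by_hand = stat, chisq.test = unname(res$statistic))
```

Comparing the two numbers is a quick way to confirm that the table was built with the intended row and column variables before interpreting the p-value.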
Data for Example 7.15
Drug
A data frame/tibble with 28 observations on two variables
number of trials to master a task
a factor with levels control and experimental
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(trials ~ group, data = Drug, main = "Example 7.15", col = c("yellow", "red"))
wilcox.test(trials ~ group, data = Drug)
t.test(rank(trials) ~ group, data = Drug, var.equal = TRUE)
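The last two calls above pair the Wilcoxon rank-sum test with a pooled t-test on the ranks. The sketch below shows the same pairing on a small made-up sample (not the Drug data), so the rank-sum statistic can be checked by hand; the vector names are assumptions for illustration.

```r
# Hypothetical data (NOT the Drug data): six subjects per group, no ties
trials <- c(12, 15, 9, 20, 17, 11,     # control
            25, 30, 22, 28, 19, 27)    # experimental
group  <- factor(rep(c("control", "experimental"), each = 6))

# Wilcoxon rank-sum test on the raw values
w <- wilcox.test(trials ~ group)

# Rank-sum statistic by hand: sum of control ranks minus n1 * (n1 + 1) / 2
rk <- rank(trials)
W_by_hand <- sum(rk[group == "control"]) - 6 * 7 / 2

# Pooled-variance t-test applied to the ranks
r <- t.test(rank(trials) ~ group, var.equal = TRUE)

c(W = unname(w$statistic), W_by_hand = W_by_hand,
  p_wilcoxon = w$p.value, p_rank_t = r$p.value)
```

The two p-values are not identical, since one comes from the exact rank-sum distribution and the other from a t distribution, but they typically lead to the same conclusion.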
Data for Exercise 2.90
Dyslexia
A data frame/tibble with eight observations on seven variables
number of words read per minute
age of participant
a factor with levels female and male
a factor with levels left and right
weight of participant (in pounds)
height of participant (in inches)
number of children in family
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(height ~ weight, data = Dyslexia)
plot(words ~ factor(handed), data = Dyslexia, xlab = "hand", col = "lightblue")
Data for Exercise 6.97
Earthqk
A data frame/tibble with 100 observations on two variables
year seismic activity recorded
annual incidence of severe earthquakes
Quenouille, M.H. (1952), Associated Measurements, Butterworth, London. p 279.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Earthqk$severity)
t.test(Earthqk$severity, mu = 100, alternative = "greater")
Function that produces a histogram, density plot, boxplot, and Q-Q plot.
EDA(x, trim = 0.05)
x | numeric vector. |
trim | fraction (between 0 and 0.5, inclusive) of values to be trimmed from each end of the ordered data. If |
For data sets containing more than 5000 observations, EDA does not print summary information to the command window; it still produces the graphical output.
The function returns various measures of center and location. The quartiles it reports follow the definitions given in BSDA, and the boxplot is drawn from those same quartiles.
Requires package e1071.
Alan T. Arnholt
EDA(rnorm(100))
# Produces four graphs for the 100 randomly
# generated standard normal variates.
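The trim argument is described in the same terms as the trim argument of base R's mean(). Assuming EDA() trims in the same way (an assumption, not something confirmed here), the effect of trimming can be previewed without BSDA on a small hypothetical vector:

```r
# Hypothetical data with one extreme value
x <- c(1, 2, 3, 4, 100)

# Ordinary mean is pulled toward the extreme value
plain <- mean(x)

# trim = 0.2 drops floor(5 * 0.2) = 1 value from each end of the
# ordered data, leaving the mean of 2, 3, and 4
trimmed <- mean(x, trim = 0.2)

c(plain = plain, trimmed = trimmed)
```

With the default trim = 0.05, a value is dropped from each end only once the sample has at least 20 observations, because the number trimmed per end is floor(n * trim).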
Data for Exercise 2.41
Educat
A data frame/tibble with 51 observations on three variables
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, DC, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
percent of the population without a high school degree
violent crimes per 100,000 population
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(crime ~ nodegree, data = Educat, xlab = "Percent of population without high school degree", ylab = "Violent Crime Rate per 100,000")
Data for Exercise 9.22
Eggs
A data frame/tibble with 12 observations on two variables
amount of feed supplement
number of eggs per day for 100 chickens
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(eggs ~ feed, data = Eggs)
model <- lm(eggs ~ feed, data = Eggs)
abline(model, col = "red")
summary(model)
rm(model)
Data for Exercises 1.92 and 2.61
Elderly
A data frame/tibble with 51 observations on three variables
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
percent of the population over the age of 65 in 1985
percent of the population over the age of 65 in 1998
U.S. Census Bureau Internet site, February 2000.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
with(data = Elderly,
     stripchart(x = list(percent1998, percent1985), method = "stack", pch = 19,
                col = c("red", "blue"), group.names = c("1998", "1985")))
with(data = Elderly, cor(percent1998, percent1985))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Elderly, aes(x = percent1985, y = percent1998)) +
  geom_point() +
  theme_bw()
## End(Not run)
Data for Exercises 2.5, 2.24, and 2.55
Energy
A data frame/tibble with 12 observations on two variables
size of home (in square feet)
kilowatt-hours per month
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(kilowatt ~ size, data = Energy)
with(data = Energy, cor(size, kilowatt))
model <- lm(kilowatt ~ size, data = Energy)
plot(Energy$size, resid(model), xlab = "size")
Data for Example 10.7
Engineer
A data frame/tibble with 51 observations on two variables
salary (in $1000) 10 years after graduation
a factor with levels A, B, and C
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(salary ~ university, data = Engineer, main = "Example 10.7", col = "yellow")
kruskal.test(salary ~ university, data = Engineer)
anova(lm(salary ~ university, data = Engineer))
anova(lm(rank(salary) ~ university, data = Engineer))
Data for Example 1.8
Entrance
A data frame/tibble with 24 observations on one variable
college entrance exam score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Entrance$score)
stem(Entrance$score, scale = 2)
Data for Exercise 1.65
Epaminicompact
A data frame/tibble with 22 observations on ten variables
a character variable with value MINICOMPACT CARS
a character variable with values AUDI, BMW, JAGUAR, MERCEDES-BENZ, MITSUBISHI, and PORSCHE
a character variable with values 325CI CONVERTIBLE, 330CI CONVERTIBLE, 911 CARRERA 2/4, 911 TURBO, CLK320 (CABRIOLET), CLK430 (CABRIOLET), ECLIPSE SPYDER, JAGUAR XK8 CONVERTIBLE, JAGUAR XKR CONVERTIBLE, M3 CONVERTIBLE, TT COUPE, and TT COUPE QUATTRO
engine displacement (in liters)
number of cylinders
a factor with levels Auto(L5), Auto(S4), Auto(S5), Manual(M5), and Manual(M6)
a factor with levels 4 (four wheel drive), F (front wheel drive), and R (rear wheel drive)
city mpg
highway mpg
combined city and highway mpg
EPA data.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
summary(Epaminicompact$cty)
plot(hwy ~ cty, data = Epaminicompact)
Data for Exercise 5.8
Epatwoseater
A data frame/tibble with 36 observations on ten variables
a character variable with value TWO SEATERS
a character variable with values ACURA, AUDI, BMW, CHEVROLET, DODGE, FERRARI, HONDA, LAMBORGHINI, MAZDA, MERCEDES-BENZ, PLYMOUTH, PORSCHE, and TOYOTA
a character variable with values BOXSTER, BOXSTER S, CORVETTE, DB132/144 DIABLO, FERRARI 360 MODENA/SPIDER, FERRARI 550 MARANELLO/BARCHETTA, INSIGHT, MR2, MX-5 MIATA, NSX, PROWLER, S2000, SL500, SL600, SLK230 KOMPRESSOR, SLK320, TT ROADSTER, TT ROADSTER QUATTRO, VIPER CONVERTIBLE, VIPER COUPE, Z3 COUPE, Z3 ROADSTER, and Z8
engine displacement (in liters)
number of cylinders
a factor with levels Auto(L4), Auto(L5), Auto(S4), Auto(S5), Auto(S6), Manual(M5), and Manual(M6)
a factor with levels 4 (four wheel drive), F (front wheel drive), and R (rear wheel drive)
city mpg
highway mpg
combined city and highway mpg
Environmental Protection Agency.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
summary(Epatwoseater$cty)
plot(hwy ~ cty, data = Epatwoseater)
boxplot(cty ~ drv, data = Epatwoseater, col = "lightgreen")
Data for Exercise 1.104
Executiv
A data frame/tibble with 25 observations on one variable
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Executiv$age, xlab = "Age of banking executives", breaks = 5, main = "", col = "gray")
Data for Exercise 1.44
Exercise
A data frame/tibble with 30 observations on one variable
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Exercise$loss)
Data for Example 7.21
Fabric
A data frame/tibble with 20 observations on three variables
a numeric vector
a character variable with values with and without
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
## Not run:
library(dplyr)   # supplies %>% and mutate(), which the pipeline below needs
library(tidyr)
tidyr::spread(Fabric, softner, softness) -> FabricWide
wilcox.test(Pair(with, without) ~ 1, alternative = "greater", data = FabricWide)
T7 <- tidyr::spread(Fabric, softner, softness) %>%
  mutate(di = with - without, adi = abs(di), rk = rank(adi), srk = sign(di) * rk)
T7
t.test(T7$srk, alternative = "greater")
## End(Not run)
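The di, adi, rk, and srk columns built above are the ingredients of the Wilcoxon signed-rank statistic. A minimal base-R sketch of the same steps follows, using hypothetical paired scores (not the Fabric data) chosen to have no ties or zero differences; the names with_s and without_s are assumptions for illustration.

```r
# Hypothetical paired softness scores (NOT the Fabric data)
with_s    <- c(36, 26, 40, 43, 27, 36, 34, 44)
without_s <- c(34, 27, 35, 37, 30, 32, 26, 37)

di  <- with_s - without_s   # paired differences
adi <- abs(di)              # absolute differences
rk  <- rank(adi)            # ranks of the absolute differences
srk <- sign(di) * rk        # signed ranks

# Signed-rank statistic: sum of the ranks attached to positive differences
V <- sum(srk[srk > 0])

# wilcox.test() reports the same statistic for the paired data
w <- wilcox.test(with_s, without_s, paired = TRUE, alternative = "greater")
c(by_hand = V, wilcox.test = unname(w$statistic))
```

Running t.test() on srk, as the example above does, gives a t-based approximation to this signed-rank test.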
Data for Exercises 5.12 and 5.111
Faithful
A data frame/tibble with 299 observations on two variables
a numeric vector
a factor with levels 1 and 2
A. Azzalini and A. Bowman, "A Look at Some Data on the Old Faithful Geyser," Journal of the Royal Statistical Society, Series C, 39 (1990), 357-366.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
t.test(time ~ eruption, data = Faithful)
hist(Faithful$time, xlab = "wait time", main = "", freq = FALSE)
lines(density(Faithful$time))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Faithful, aes(x = time, y = ..density..)) +
  geom_histogram(binwidth = 5, fill = "pink", col = "black") +
  geom_density() +
  theme_bw() +
  labs(x = "wait time")
## End(Not run)
Data for Exercise 2.89
Family
A data frame/tibble with 20 observations on two variables
number in family
cost per person (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(cost ~ number, data = Family)
abline(lm(cost ~ number, data = Family), col = "red")
cor(Family$cost, Family$number)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Family, aes(x = number, y = cost)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_bw()
## End(Not run)
Data for Exercise 8.23
Ferraro1
A data frame/tibble with 1000 observations on two variables
a factor with levels Men and Women
a character vector of 1984 president and vice-president candidates
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~gender + candidate, data = Ferraro1)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 8.23
Ferraro2
A data frame/tibble with 1000 observations on two variables
a factor with levels Men and Women
a character vector of 1984 president and vice-president candidates
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~gender + candidate, data = Ferraro2)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 1.125
Fertility
A data frame/tibble with 51 observations on two variables
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
fertility rate (expected number of births during childbearing years)
Population Reference Bureau.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Fertility$rate)
fivenum(Fertility$rate)
EDA(Fertility$rate)
Data for Exercise 5.11
Firstchi
A data frame/tibble with 87 observations on one variable
age of woman at birth of her first child
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Firstchi$age)
Data for Exercises 5.83, 5.119, and 7.29
Fish
A data frame/tibble with 1534 observations on two variables
a character variable with values smallmesh and largemesh
length of the fish measured in centimeters
R. Millar, “Estimating the Size-Selectivity of Fishing Gear by Conditioning on the Total Catch,” Journal of the American Statistical Association, 87 (1992), 962-968.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
tapply(Fish$length, Fish$codend, median, na.rm = TRUE)
SIGN.test(Fish$length[Fish$codend == "smallmesh"], conf.level = 0.99)
## Not run:
library(dplyr)   # supplies %>% and summarize()
dplyr::group_by(Fish, codend) %>%
  summarize(MEDIAN = median(length, na.rm = TRUE))
## End(Not run)
Data for Exercise 7.71
Fitness
A data frame/tibble with 18 observations on three variables
a character variable indicating subject number
a character variable with values After and Before
a numeric vector recording the number of sit-ups performed in one minute
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
## Not run:
library(dplyr)   # supplies %>% and mutate()
tidyr::spread(Fitness, test, number) -> FitnessWide
t.test(Pair(After, Before) ~ 1, alternative = "greater", data = FitnessWide)
Wide <- tidyr::spread(Fitness, test, number) %>%
  mutate(diff = After - Before)
Wide
qqnorm(Wide$diff)
qqline(Wide$diff)
t.test(Wide$diff, alternative = "greater")
## End(Not run)
Data for Statistical Insight Chapter 2
Florida2000
A data frame/tibble with 67 observations on 12 variables
a character variable with values ALACHUA, BAKER, BAY, BRADFORD, BREVARD, BROWARD, CALHOUN, CHARLOTTE, CITRUS, CLAY, COLLIER, COLUMBIA, DADE, DE SOTO, DIXIE, DUVAL, ESCAMBIA, FLAGLER, FRANKLIN, GADSDEN, GILCHRIST, GLADES, GULF, HAMILTON, HARDEE, HENDRY, HERNANDO, HIGHLANDS, HILLSBOROUGH, HOLMES, INDIAN RIVER, JACKSON, JEFFERSON, LAFAYETTE, LAKE, LEE, LEON, LEVY, LIBERTY, MADISON, MANATEE, MARION, MARTIN, MONROE, NASSAU, OKALOOSA, OKEECHOBEE, ORANGE, OSCEOLA, PALM BEACH, PASCO, PINELLAS, POLK, PUTNAM, SANTA ROSA, SARASOTA, SEMINOLE, ST. JOHNS, ST. LUCIE, SUMTER, SUWANNEE, TAYLOR, UNION, VOLUSIA, WAKULLA, WALTON, and WASHINGTON
number of votes
number of votes
number of votes
number of votes
number of votes
number of votes
number of votes
number of votes
number of votes
number of votes
number of votes
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(buchanan ~ total, data = Florida2000, xlab = "Total votes cast (in thousands)", ylab = "Votes for Buchanan")
Data for Exercise 5.76
Fluid
A data frame/tibble with 76 observations on two variables
a character variable showing kilovolts
breakdown time (in minutes)
E. Soofi, N. Ebrahimi, and M. Habibullah, 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
DF1 <- Fluid[Fluid$kilovolts == "34kV", ]
DF1
# OR
DF2 <- subset(Fluid, subset = kilovolts == "34kV")
DF2
stem(DF2$time)
SIGN.test(DF2$time)
## Not run:
library(dplyr)
DF3 <- dplyr::filter(Fluid, kilovolts == "34kV")
DF3
## End(Not run)
Data for Exercise 5.106
Food
A data frame/tibble with 40 observations on one variable
a numeric vector recording annual food expenditure (in dollars) in the state of Ohio.
Bureau of Labor Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Food$expenditure)
Data for Exercises 1.56, 1.75, 3.69, and 5.60
Framingh
A data frame/tibble with 62 observations on one variable
a numeric vector with cholesterol values
R. D'Agostino, et al., (1990) "A Suggestion for Using Powerful and Informative Tests for Normality," The American Statistician, 44, 316-321.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Framingh$cholest)
boxplot(Framingh$cholest, horizontal = TRUE)
hist(Framingh$cholest, freq = FALSE)
lines(density(Framingh$cholest))
mean(Framingh$cholest > 200 & Framingh$cholest < 240)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Framingh, aes(x = factor(1), y = cholest)) +
  geom_boxplot() +               # boxplot
  labs(x = "") +                 # no x label
  theme_bw() +                   # black and white theme
  geom_jitter(width = 0.2) +     # jitter points
  coord_flip()                   # Create horizontal plot
ggplot2::ggplot(data = Framingh, aes(x = cholest, y = ..density..)) +
  geom_histogram(fill = "pink", binwidth = 15, color = "black") +
  geom_density() +
  theme_bw()
## End(Not run)
Data for Exercise 6.53
Freshman
A data frame/tibble with 30 observations on one variable
a numeric vector of ages
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
SIGN.test(Freshman$age, md = 19)
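For readers who want to see the idea behind a sign test of a hypothesized median, the sketch below counts observations above and below md and refers the count to a Binomial(n, 0.5) distribution with base R's binom.test(). The ages are hypothetical (not the Freshman data), and dropping values equal to md is the usual textbook convention, assumed here rather than taken from SIGN.test itself.

```r
# Hypothetical ages (NOT the Freshman data)
age <- c(18, 18, 19, 17, 20, 18, 21, 18, 17, 18, 22, 18)
md  <- 19   # hypothesized median

# Observations equal to md carry no sign and are dropped
above <- sum(age > md)
below <- sum(age < md)
n     <- above + below

# Under H0 the number of values above md is Binomial(n, 0.5)
p_value <- binom.test(above, n, p = 0.5)$p.value
c(above = above, below = below, p_value = p_value)
```

A small p-value would indicate that the sample sits mostly on one side of md, which is evidence against md being the population median.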
Data for Exercise 8.54
Funeral
A data frame/tibble with 400 observations on two variables
a factor with levels Central, East, South, and West
a factor with levels less than expected, about what expected, and more than expected
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~region + cost, data = Funeral)
T1
chisq.test(T1)
rm(T1)
Data for Example 5.2
Galaxie
A data frame/tibble with 82 observations on one variable
velocity measured in kilometers per second
K. Roeder, "Density Estimation with Confidence Sets Explained by Superclusters and Voids in the Galaxies," Journal of the American Statistical Association, 85 (1990), 617-624.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Galaxie$velocity)
Data for Exercise 2.76
Gallup
A data frame/tibble with 1,200 observations on two variables
a factor with levels National, Gender: Male, Gender: Female, Education: College, Eduction: High School, Education: Grade School, Age: 18-24, Age: 25-29, Age: 30-49, Age: 50-older, Religion: Protestant, and Religion: Catholic
a factor with levels Criminal, Not Criminal, and No Opinion
George H. Gallup, The Gallup Opinion Index, Report No. 179 (Princeton, NJ: The Gallup Poll, July 1980), p. 15.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~demographics + opinion, data = Gallup)
T1
t(T1[c(2, 3), ])
barplot(t(T1[c(2, 3), ]))
barplot(t(T1[c(2, 3), ]), beside = TRUE)
## Not run:
library(dplyr)
library(ggplot2)
dplyr::filter(Gallup, demographics == "Gender: Male" | demographics == "Gender: Female") %>%
  ggplot2::ggplot(aes(x = demographics, fill = opinion)) +
  geom_bar() +
  theme_bw() +
  labs(y = "Fraction")
## End(Not run)
Data for Exercise 1.45
Gasoline
Gasoline
A data frame/tibble with 25 observations on one variable
price for one gallon of gasoline
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Gasoline$price)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Gasoline, aes(x = factor(1), y = price)) +
  geom_violin() +
  geom_jitter() +
  theme_bw()
## End(Not run)
Data for Exercise 7.60
German
German
A data frame/tibble with ten observations on three variables
a character variable indicating student number
a character variable with values Before and After to indicate when the student received experimental instruction in German
the number of errors in copying a German passage
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
## Not run:
library(dplyr)  # provides %>% and mutate()
tidyr::spread(German, when, errors) -> GermanWide
t.test(Pair(After, Before) ~ 1, data = GermanWide)
wilcox.test(Pair(After, Before) ~ 1, data = GermanWide)
T8 <- tidyr::spread(German, when, errors) %>%
  mutate(di = After - Before, adi = abs(di), rk = rank(adi), srk = sign(di)*rk)
T8
qqnorm(T8$di)
qqline(T8$di)
t.test(T8$srk)
## End(Not run)
Data for Exercise 5.24
Golf
Golf
A data frame/tibble with 20 observations on one variable
distance a golf ball is driven in yards
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Golf$yards)
qqnorm(Golf$yards)
qqline(Golf$yards)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Golf, aes(sample = yards)) +
  geom_qq() +
  theme_bw()
## End(Not run)
Data for Exercise 5.112
Governor
Governor
A data frame/tibble with 50 observations on three variables
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
a factor indicating year
a numeric vector with the governor's salary (in dollars)
The 2000 World Almanac and Book of Facts.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(salary ~ year, data = Governor)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Governor, aes(x = salary)) +
  geom_density(fill = "pink") +
  facet_grid(year ~ .) +
  theme_bw()
## End(Not run)
Data for Example 2.13
Gpa
Gpa
A data frame/tibble with 10 observations on two variables
high school gpa
college gpa
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(collgpa ~ hsgpa, data = Gpa)
mod <- lm(collgpa ~ hsgpa, data = Gpa)
abline(mod)            # add line
yhat <- predict(mod)   # fitted values
e <- resid(mod)        # residuals
cbind(Gpa, yhat, e)    # Table 2.1
cor(Gpa$hsgpa, Gpa$collgpa)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Gpa, aes(x = hsgpa, y = collgpa)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_bw()
## End(Not run)
Data for Exercise 1.120
Grades
Grades
A data frame with 29 observations on one variable
a numeric vector containing test grades
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Grades$grades, main = "", xlab = "Test grades", right = FALSE)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Grades, aes(x = grades, y = ..density..)) +
  geom_histogram(fill = "pink", binwidth = 5, color = "black") +
  geom_density(lwd = 2, color = "red") +
  theme_bw()
## End(Not run)
Data for Exercise 1.118
Graduate
Graduate
A data frame/tibble with 12 observations on three variables
a character variable with values Alabama, Arkansas, Auburn, Florida, Georgia, Kentucky, Louisiana St, Mississippi, Mississippi St, South Carolina, Tennessee, and Vanderbilt
a character variable with values Al, Ar, Au, Fl, Ge, Ke, LSt, Mi, MSt, SC, Te, and Va
graduation rate
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
barplot(Graduate$percent, names.arg = Graduate$school, las = 2, cex.names = 0.7, col = "tomato")
Data for Exercise 6.57
Greenriv
Greenriv
A data frame/tibble with 37 observations on one variable
varve thickness in millimeters
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Greenriv$thick)
SIGN.test(Greenriv$thick, md = 7.3, alternative = "greater")
Data for Exercises 6.45 and 6.98
Grnriv2
Grnriv2
A data frame/tibble with 101 observations on one variable
varve thickness (in millimeters)
J. Davis, Statistics and Data Analysis in Geology, 2nd Ed., John Wiley and Sons, New York.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Grnriv2$thick)
t.test(Grnriv2$thick, mu = 8, alternative = "less")
Data for Exercise 10.42
Groupabc
Groupabc
A data frame/tibble with 45 observations on two variables
a factor with levels A, B, and C
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(response ~ group, data = Groupabc, col = c("red", "blue", "green"))
anova(lm(response ~ group, data = Groupabc))
Data for Exercise 10.4
Groups
Groups
A data frame/tibble with 78 observations on two variables
a factor with levels A, B, and C
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(response ~ group, data = Groups, col = c("red", "blue", "green"))
anova(lm(response ~ group, data = Groups))
Data for Exercises 2.21 and 9.14
Gym
Gym
A data frame/tibble with eight observations on three variables
age of child
number of gymnastic activities successfully completed
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(number ~ age, data = Gym)
model <- lm(number ~ age, data = Gym)
abline(model, col = "red")
summary(model)
Data for Exercise 7.57
Habits
Habits
A data frame/tibble with 11 observations on four variables
study habit score
study habit score
B minus A
the signed-ranked-differences
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
shapiro.test(Habits$differ)
qqnorm(Habits$differ)
qqline(Habits$differ)
wilcox.test(Pair(B, A) ~ 1, data = Habits, alternative = "less")
t.test(Habits$signrks, alternative = "less")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Habits, aes(x = differ)) +
  geom_dotplot(fill = "blue") +
  theme_bw()
## End(Not run)
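The signed-ranked differences described for this data set can be sketched in base R. The vectors a and b below are hypothetical scores chosen for illustration, not values from Habits, and the construction assumes there are no zero differences and no tied absolute differences.

```r
# Hypothetical paired scores (illustration only, not taken from Habits)
a <- c(74, 68, 83, 55, 61, 79, 90)
b <- c(78, 65, 89, 62, 60, 84, 99)

differ <- b - a                               # B minus A
signrks <- sign(differ) * rank(abs(differ))   # rank |differences|, then restore the sign
signrks

# The sum of the positive signed ranks equals the Wilcoxon signed-rank statistic V
sum(signrks[signrks > 0])
wilcox.test(b, a, paired = TRUE)$statistic
```

Running a one-sample t-test on the signed ranks, as the example for this data set does, is a rank-based counterpart to the paired t-test on the raw differences.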
Data for Example 6.9
Haptoglo
Haptoglo
A data frame/tibble with eight observations on one variable
haptoglobin concentration (in grams per liter)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
shapiro.test(Haptoglo$concent)
t.test(Haptoglo$concent, mu = 2, alternative = "less")
Daily receipts for a small hardware store for 31 working days
Hardware
Hardware
A data frame with 31 observations on one variable
a numeric vector of daily receipts (in dollars)
J.C. Miller and J.N. Miller, (1988), Statistics for Analytical Chemistry, 2nd Ed. (New York: Halsted Press).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Hardware$receipt)
Data for Example 2.18 and Exercise 9.34
Hardwood
Hardwood
A data frame/tibble with 19 observations on two variables
tensile strength of kraft paper (in pounds per square inch)
percent of hardwood in the batch of pulp that was used to produce the paper
G. Joglekar, et al., "Lack-of-Fit Testing When Replicates Are Not Available," The American Statistician, 43(3), (1989), 135-143.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(tensile ~ hardwood, data = Hardwood)
model <- lm(tensile ~ hardwood, data = Hardwood)
abline(model, col = "red")
plot(model, which = 1)
Data for Exercise 1.29
Heat
Heat
A data frame/tibble with 301 observations on two variables
a factor with levels Utility gas, LP bottled gas, Electricity, Fuel oil, Wood, and Other
a factor with levels American Indians on reservation, All U.S. households, and American Indians not on reservations
Bureau of the Census, Housing of the American Indians on Reservations, Statistical Brief 95-11, April 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~ fuel + location, data = Heat)
T1
barplot(t(T1), beside = TRUE, legend = TRUE)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Heat, aes(x = fuel, fill = location)) +
  geom_bar(position = "dodge") +
  labs(y = "percent") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))
## End(Not run)
Data for Exercise 10.32
Heating
Heating
A data frame/tibble with 90 observations on two variables
a factor with levels A, B, and C denoting the type of oil heater
heater efficiency rating
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(efficiency ~ type, data = Heating, col = c("red", "blue", "green"))
kruskal.test(efficiency ~ type, data = Heating)
Data for Exercise 2.77
Hodgkin
Hodgkin
A data frame/tibble with 538 observations on two variables
a factor with levels LD, LP, MC, and NS
a factor with levels Positive, Partial, and None
I. Dunsmore, F. Daly, Statistical Methods, Unit 9, Categorical Data, Milton Keynes, The Open University, 18.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~type + response, data = Hodgkin)
T1
barplot(t(T1), legend = TRUE, beside = TRUE)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Hodgkin, aes(x = type, fill = response)) +
  geom_bar(position = "dodge") +
  theme_bw()
## End(Not run)
Data for Statistical Insight Chapter 5
Homes
Homes
A data frame/tibble with 65 observations on four variables
a character variable with values Akron OH, Albuquerque NM, Anaheim CA, Atlanta GA, Baltimore MD, Baton Rouge LA, Birmingham AL, Boston MA, Bradenton FL, Buffalo NY, Charleston SC, Chicago IL, Cincinnati OH, Cleveland OH, Columbia SC, Columbus OH, Corpus Christi TX, Dallas TX, Daytona Beach FL, Denver CO, Des Moines IA, Detroit MI, El Paso TX, Grand Rapids MI, Hartford CT, Honolulu HI, Houston TX, Indianapolis IN, Jacksonville FL, Kansas City MO, Knoxville TN, Las Vegas NV, Los Angeles CA, Louisville KY, Madison WI, Memphis TN, Miami FL, Milwaukee WI, Minneapolis MN, Mobile AL, Nashville TN, New Haven CT, New Orleans LA, New York NY, Oklahoma City OK, Omaha NE, Orlando FL, Philadelphia PA, Phoenix AZ, Pittsburgh PA, Portland OR, Providence RI, Sacramento CA, Salt Lake City UT, San Antonio TX, San Diego CA, San Francisco CA, Seattle WA, Spokane WA, St Louis MO, Syracuse NY, Tampa FL, Toledo OH, Tulsa OK, and Washington DC
a character variable with values Midwest, Northeast, South, and West
a factor with levels 1994 and 2000
median house price (in dollars)
National Association of Realtors.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
tapply(Homes$price, Homes$year, mean)
tapply(Homes$price, Homes$region, mean)
p2000 <- subset(Homes, year == "2000")
p1994 <- subset(Homes, year == "1994")
## Not run:
library(dplyr)
library(ggplot2)
dplyr::group_by(Homes, year, region) %>%
  summarize(AvgPrice = mean(price))
ggplot2::ggplot(data = Homes, aes(x = region, y = price)) +
  geom_boxplot() +
  theme_bw() +
  facet_grid(year ~ .)
## End(Not run)
Data for Exercise 7.78
Homework
Homework
A data frame with 30 observations on two variables
type of school, either private or public
number of hours per week spent on homework
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(time ~ school, data = Homework, ylab = "Hours per week spent on homework")
# t.test(time ~ school, data = Homework)
Data for Statistical Insight Chapter 6
Honda
Honda
A data frame/tibble with 35 observations on one variable
miles per gallon for a Honda Civic
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
t.test(Honda$mileage, mu = 40, alternative = "less")
Data for Example 10.6
Hostile
Hostile
A data frame/tibble with 135 observations on two variables
a factor with the location of the high school student (Rural, Suburban, or Urban)
the score from the Hostility Level Test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(hostility ~ location, data = Hostile, col = c("red", "blue", "green"))
kruskal.test(hostility ~ location, data = Hostile)
Data for Exercise 5.82
Housing
Housing
A data frame/tibble with 74 observations on three variables
a character variable with values Albany, Anaheim, Atlanta, Baltimore, Birmingham, Boston, Chicago, Cincinnati, Cleveland, Columbus, Dallas, Denver, Detroit, Ft Lauderdale, Houston, Indianapolis, Kansas City, Los Angeles, Louisville, Memphis, Miami, Milwaukee, Minneapolis, Nashville, New York, Oklahoma City, Philadelphia, Providence, Rochester, Salt Lake City, San Antonio, San Diego, San Francisco, San Jose, St Louis, Tampa, and Washington
a factor with levels 1984 and 1993
median house price (in dollars)
National Association of Realtors.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stripchart(price ~ year, data = Housing, method = "stack", pch = 1, col = c("red", "blue"))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Housing, aes(x = price, fill = year)) +
  geom_dotplot() +
  facet_grid(year ~ .) +
  theme_bw()
## End(Not run)
Data for Exercises 1.38, 10.19, and Example 1.6
Hurrican
Hurrican
A data frame/tibble with 46 observations on four variables
a numeric vector indicating year
a numeric vector recording number of storms
a numeric vector recording number of hurricanes
a factor with levels cold, neutral, and warm
National Hurricane Center.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~hurrican, data = Hurrican)
T1
barplot(T1, col = "blue", main = "Problem 1.38", xlab = "Number of hurricanes", ylab = "Number of seasons")
boxplot(storms ~ elnino, data = Hurrican, col = c("blue", "yellow", "red"))
anova(lm(storms ~ elnino, data = Hurrican))
rm(T1)
Data for Exercise 2.46 and 2.60
Iceberg
Iceberg
A data frame with 12 observations on three variables
a character variable with abbreviated months of the year
number of icebergs sighted south of Newfoundland
number of icebergs sighted south of Grand Banks
N. Shaw, Manual of Meteorology, Vol. 2 (London: Cambridge University Press, 1942), 7; and F. Mosteller and J. Tukey, Data Analysis and Regression (Reading, MA: Addison-Wesley, 1977).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(Newfoundland ~ `Grand Banks`, data = Iceberg)
abline(lm(Newfoundland ~ `Grand Banks`, data = Iceberg), col = "blue")
Data for Exercise 1.33
Income
Income
A data frame/tibble with 51 observations on two variables
a character variable with values Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Colunbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missour, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
percent change in income from first quarter to the second quarter of 2000
US Department of Commerce.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Income$class <- cut(Income$percent_change, breaks = c(-Inf, 0.5, 1.0, 1.5, 2.0, Inf))
T1 <- xtabs(~class, data = Income)
T1
barplot(T1, col = "pink")
## Not run:
library(ggplot2)
DF <- as.data.frame(T1)
DF
ggplot2::ggplot(data = DF, aes(x = class, y = Freq)) +
  geom_bar(stat = "identity", fill = "purple") +
  theme_bw()
## End(Not run)
Data for Exercise 7.41
Independent
Independent
A data frame/tibble with 46 observations on two variables
a numeric vector
a factor with levels A and B
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Independent$score[Independent$group=="A"])
qqline(Independent$score[Independent$group=="A"])
qqnorm(Independent$score[Independent$group=="B"])
qqline(Independent$score[Independent$group=="B"])
boxplot(score ~ group, data = Independent, col = "blue")
wilcox.test(score ~ group, data = Independent)
Data for Exercise 2.95
Indian
Indian
A data frame/tibble with ten observations on four variables
a character variable with values Blackfeet, Fort Apache, Gila River, Hopi, Navajo, Papago, Pine Ridge, Rosebud, San Carlos, and Zuni Pueblo
percent who have graduated from high school
per capita income (in dollars)
percent poverty
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(mfrow = c(1, 2))
plot(`per capita income` ~ `percent high school`, data = Indian,
     xlab = "Percent high school graduates", ylab = "Per capita income")
plot(`poverty rate` ~ `percent high school`, data = Indian,
     xlab = "Percent high school graduates", ylab = "Percent poverty")
par(mfrow = c(1, 1))
Data for Exercise 1.128
Indiapol
Indiapol
A data frame/tibble with 39 observations on two variables
the year of the race
the winner's average speed (in mph)
The World Almanac and Book of Facts, 2000, p. 1004.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(speed ~ year, data = Indiapol, type = "b")
Data for Exercises 7.11 and 7.36
Indy500
Indy500
A data frame/tibble with 33 observations on four variables
a character variable with values andretti, bachelart, boesel, brayton, c.guerrero, cheever, fabi, fernandez, ferran, fittipaldi, fox, goodyear, gordon, gugelmin, herta, james, johansson, jones, lazier, luyendyk, matsuda, matsushita, pruett, r.guerrero, rahal, ribeiro, salazar, sharp, sullivan, tracy, vasser, villeneuve, and zampedri
qualifying speed (in mph)
number of Indianapolis 500 starts
a numeric vector where 1 indicates a driver with 4 or fewer Indianapolis 500 starts and 2 indicates a driver with 5 or more Indianapolis 500 starts
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stripchart(qualif ~ group, data = Indy500, method = "stack", pch = 19, col = c("red", "blue"))
boxplot(qualif ~ group, data = Indy500)
t.test(qualif ~ group, data = Indy500)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Indy500, aes(sample = qualif)) +
  geom_qq() +
  facet_grid(group ~ .) +
  theme_bw()
## End(Not run)
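The 1/2 group coding described for this data set can be reproduced from a count of starts. This is a minimal sketch: the starts vector below is hypothetical and is not taken from Indy500.

```r
# Hypothetical numbers of Indianapolis 500 starts (illustration only)
starts <- c(1, 4, 5, 12, 0, 7)

# 1 = four or fewer starts, 2 = five or more starts
group <- ifelse(starts <= 4, 1, 2)
table(group)
```

Because the coding is numeric, wrapping it in factor() before plotting or modeling keeps the two groups from being treated as a continuous scale.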
Data for Exercises 2.12 and 2.29
Inflatio
Inflatio
A data frame/tibble with 24 observations on four variables
a numeric vector of years
average hourly wage for salaried employees (in dollars)
percent increase in hourly wage over previous year
percent inflation rate
Bureau of Labor Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(increase ~ inflation, data = Inflatio)
cor(Inflatio$increase, Inflatio$inflation, use = "complete.obs")
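A percent increase over the previous year can be computed from consecutive wage values. The sketch below uses a hypothetical wage vector, not values from Inflatio, and assumes the increase is measured relative to the previous year's wage.

```r
# Hypothetical average hourly wages for three consecutive years (illustration only)
wage <- c(10.00, 10.50, 11.34)

# Percent increase over the previous year; the first year has no predecessor
increase <- c(NA, 100 * diff(wage) / head(wage, -1))
increase
```

A leading NA of this kind is one situation in which cor() needs use = "complete.obs", since the default would return NA.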
Data for Exercises 5.91 and 6.48
Inletoil
Inletoil
A data frame/tibble with 12 observations on one variable
inlet oil temperature (Fahrenheit)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Inletoil$temp, breaks = 3)
qqnorm(Inletoil$temp)
qqline(Inletoil$temp)
t.test(Inletoil$temp)
t.test(Inletoil$temp, mu = 98, alternative = "less")
Data for Statistical Insight Chapter 8
Inmate
Inmate
A data frame/tibble with 28,047 observations on two variables
a factor with levels white, black, and hispanic
a factor with levels heroin, crack, cocaine, and marijuana
C. Wolf Harlow (1994), Comparing Federal and State Prison Inmates, NCJ-145864, U.S. Department of Justice, Bureau of Justice Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~race + drug, data = Inmate)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 8.59
Inspect
Inspect
A data frame/tibble with 174 observations on two variables
a factor with levels auto inspection, auto repair, car care center, gas station, new car dealer, and tire store
a factor with levels less than 70%, between 70% and 84%, and more than 85%
The Charlotte Observer, December 13, 1992.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~ station + passed, data = Inspect)
T1
barplot(T1, beside = TRUE, legend = TRUE)
chisq.test(T1)
rm(T1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Inspect, aes(x = passed, fill = station)) +
  geom_bar(position = "dodge") +
  theme_bw()
## End(Not run)
Data for Exercise 9.50
Insulate
Insulate
A data frame/tibble with ten observations on two variables
outside temperature (in degrees Celsius)
heat loss (in BTUs)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(loss ~ temp, data = Insulate)
model <- lm(loss ~ temp, data = Insulate)
abline(model, col = "blue")
summary(model)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Insulate, aes(x = temp, y = loss)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## End(Not run)
Data for Exercises 9.51 and 9.52
Iqgpa
Iqgpa
A data frame/tibble with 12 observations on two variables
IQ scores
Grade point average
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(gpa ~ iq, data = Iqgpa, col = "blue", pch = 19)
model <- lm(gpa ~ iq, data = Iqgpa)
summary(model)
rm(model)
Data for Examples 1.15 and 5.19
Irises
Irises
A data frame/tibble with 150 observations on five variables
sepal length (in cm)
sepal width (in cm)
petal length (in cm)
petal width (in cm)
a factor with levels setosa, versicolor, and virginica
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, Part II, 179-188.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
tapply(Irises$sepal_length, Irises$species, mean)
t.test(Irises$sepal_length[Irises$species == "setosa"], conf.level = 0.99)
hist(Irises$sepal_length[Irises$species == "setosa"],
     main = "Sepal length for\n Iris Setosa", xlab = "Length (in cm)")
boxplot(sepal_length ~ species, data = Irises)
Data for Exercise 2.14, 2.17, 2.31, 2.33, and 2.40
Jdpower
Jdpower
A data frame/tibble with 29 observations on three variables
a factor with levels Acura, BMW, Buick, Cadillac, Chevrolet, Dodge, Eagle, Ford, Geo, Honda, Hyundai, Infiniti, Jaguar, Lexus, Lincoln, Mazda, Mercedes-Benz, Mercury, Mitsubishi, Nissan, Oldsmobile, Plymouth, Pontiac, Saab, Saturn, Subaru, Toyota, Volkswagen, and Volvo
number of problems per 100 cars in 1994
number of problems per 100 cars in 1995
USA Today, May 25, 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(`1995` ~ `1994`, data = Jdpower)
summary(model)
plot(`1995` ~ `1994`, data = Jdpower)
abline(model, col = "red")
rm(model)
Data for Exercise 9.60
Jobsat
A data frame/tibble with nine observations on two variables
Wilson Stress Profile score for teachers
job satisfaction score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(satisfaction ~ wspt, data = Jobsat)
model <- lm(satisfaction ~ wspt, data = Jobsat)
abline(model, col = "blue")
summary(model)
rm(model)
Data for Exercise 4.85
Kidsmoke
A data frame/tibble with 1000 observations on two variables
a character vector with values female and male
a character vector with values no and yes
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~smoke + gender, data = Kidsmoke)
T1
prop.table(T1)
prop.table(T1, 1)
prop.table(T1, 2)
Data for Example 5.9
Kilowatt
A data frame/tibble with 51 observations on two variables
a factor with levels Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, District of Columbia, Florida, Georgia, Hawaii, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin, and Wyoming
a numeric vector giving the electricity rate per kilowatt-hour
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Kilowatt$rate)
Data for Exercise 7.68
Kinder
A data frame/tibble with eight observations on three variables
a numeric indicator of pair
reading score of kids who went to kindergarten
reading score of kids who did not go to kindergarten
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(Kinder$kinder, Kinder$nokinder)
diff <- Kinder$kinder - Kinder$nokinder
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)
Data for Exercise 10.18
Laminect
A data frame/tibble with 138 observations on two variables
a character vector indicating the area of the hospital, with values Rural, Regional, and Metropol
a numeric vector indicating cost of a laminectomy
Consumer's Guide to Hospitalization Charges in North Carolina Hospitals (August 1994), North Carolina Medical Database Commission, Department of Insurance.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(cost ~ area, data = Laminect, col = topo.colors(3))
anova(lm(cost ~ area, data = Laminect))
Data for Example 1.17
Lead
A data frame/tibble with 66 observations on two variables
a character vector with values exposed and control
a numeric vector indicating the level of lead in children's blood (in micrograms/dl)
Morton, D. et al. (1982), "Lead Absorption in Children of Employees in a Lead-Related Industry," American Journal of Epidemiology, 115, 549-555.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(lead ~ group, data = Lead, col = topo.colors(2))
Data for Exercise 7.31
Leader
A data frame/tibble with 34 observations on two variables
a character vector indicating age with values under35 and over35
score on a leadership exam
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ age, data = Leader, col = c("gray", "green"))
t.test(score ~ age, data = Leader)
Data for Example 6.12
Lethal
A data frame/tibble with 30 observations on one variable
a numeric vector indicating time survived after injection (in seconds)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
SIGN.test(Lethal$survival, md = 45, alternative = "less")
Data for Exercise 1.31
Life
A data frame/tibble with eight observations on three variables
a numeric vector indicating year
life expectancy for men (in years)
life expectancy for women (in years)
National Center for Health Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(men ~ year, type = "l", ylim = c(min(men, women), max(men, women)), col = "blue", main = "Life Expectancy vs Year", ylab = "Age", xlab = "Year", data = Life)
lines(women ~ year, col = "red", data = Life)
text(1955, 65, "Men", col = "blue")
text(1955, 70, "Women", col = "red")
Data for Exercise 2.4, 2.37, and 2.49
Lifespan
A data frame/tibble with six observations on two variables
temperature (in Celsius)
lifespan of component (in hours)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(life ~ heat, data = Lifespan)
model <- lm(life ~ heat, data = Lifespan)
abline(model, col = "red")
resid(model)
sum((resid(model))^2)
anova(model)
rm(model)
Data for Exercise 2.6
Ligntmonth
A data frame/tibble with 12 observations on four variables
a factor with levels 1/01/2000, 10/01/2000, 11/01/2000, 12/01/2000, 2/01/2000, 3/01/2000, 4/01/2000, 5/01/2000, 6/01/2000, 7/01/2000, 8/01/2000, and 9/01/2000
number of deaths due to lightning strikes
number of injuries due to lightning strikes
damage due to lightning strikes (in dollars)
Lightning Fatalities, Injuries and Damage Reports in the United States, 1959-1994, NOAA Technical Memorandum NWS SR-193, Dept. of Commerce.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(deaths ~ damage, data = Ligntmonth)
model <- lm(deaths ~ damage, data = Ligntmonth)
abline(model, col = "red")
rm(model)
Data for Exercise 10.33
Lodge
A data frame/tibble with 45 observations on six variables
a numeric vector indicating the number of vehicles that passed a site in 1 hour
a numeric vector with values 1, 2, and 3
ranks for variable traffic
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(traffic ~ site, data = Lodge, col = cm.colors(3))
anova(lm(traffic ~ factor(site), data = Lodge))
Data for Exercise 10.45
Longtail
A data frame/tibble with 60 observations on three variables
a numeric vector
a numeric vector with values 1, 2, and 3
ranks for variable score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ group, data = Longtail, col = heat.colors(3))
kruskal.test(score ~ factor(group), data = Longtail)
anova(lm(score ~ factor(group), data = Longtail))
Data for Example 7.18
Lowabil
A data frame/tibble with 12 observations on three variables
a numeric indicator of pair
score of the child with the experimental method
score of the child with the standard method
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
diff <- Lowabil$experiment - Lowabil$control
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)
Data for Exercise 9.9
Magnesiu
A data frame/tibble with 20 observations on two variables
distance between samples
concentration of magnesium
Davis, J. (1986), Statistics and Data Analysis in Geology, 2nd ed., John Wiley and Sons, New York, p. 146.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(magnesium ~ distance, data = Magnesiu)
model <- lm(magnesium ~ distance, data = Magnesiu)
abline(model, col = "red")
summary(model)
rm(model)
Data for Exercise 5.73
Malpract
A data frame/tibble with 17 observations on one variable
malpractice award (in $1000)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
SIGN.test(Malpract$award, conf.level = 0.90)
Data for Exercise 5.81
Manager
A data frame/tibble with 26 observations on one variable
random sample of advertised annual salaries of top executives (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Manager$salary)
SIGN.test(Manager$salary)
Data for Exercise 6.100
Marked
A data frame/tibble with 65 observations on one variable
percentage of marked cars in 65 Florida police departments
Law Enforcement Management and Administrative Statistics, 1993, Bureau of Justice Statistics, NCJ-148825, September 1995, p. 147-148.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Marked$percent)
SIGN.test(Marked$percent, md = 60, alternative = "greater")
t.test(Marked$percent, mu = 60, alternative = "greater")
Data for Exercise 1.69
Math
A data frame/tibble with 30 observations on one variable
scores on a standardized test for 30 tenth graders
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Math$score)
hist(Math$score, main = "Math Scores", xlab = "score", freq = FALSE)
lines(density(Math$score), col = "red")
CharlieZ <- (62 - mean(Math$score))/sd(Math$score)
CharlieZ
scale(Math$score)[which(Math$score == 62)]
Data for Exercise 5.26
Mathcomp
A data frame/tibble with 31 observations on one variable
scores of 31 entering freshmen at a community college on a national standardized test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Mathcomp$score)
EDA(Mathcomp$score)
Data for Exercise 9.24, Example 9.1, and Example 9.6
Mathpro
A data frame/tibble with 51 observations on four variables
a factor with levels Conn, D.C., Del, Ga, Hawaii, Ind, Maine, Mass, Md, N.C., N.H., N.J., N.Y., Ore, Pa, R.I., S.C., Va, and Vt
SAT math scores for high school seniors
math proficiency scores for eighth graders
a numeric vector
National Assessment of Educational Progress and The College Board.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(sat_math ~ profic, data = Mathpro)
plot(sat_math ~ profic, data = Mathpro, ylab = "SAT", xlab = "proficiency")
abline(model, col = "red")
summary(model)
rm(model)
Data for Exercise 10.13
Maze
A data frame/tibble with 32 observations on two variables
error scores for animals running through a maze under different conditions
a factor with levels CondA, CondB, CondC, and CondD
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ condition, data = Maze, col = rainbow(4))
anova(lm(score ~ condition, data = Maze))
Data for Exercise 10.52
Median
A data frame/tibble with 45 observations on two variables
a vector with values Sample1, Sample 2, and Sample 3
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(value ~ sample, data = Median, col = rainbow(3))
anova(lm(value ~ sample, data = Median))
kruskal.test(value ~ factor(sample), data = Median)
Data for Exercise 6.52
Mental
A data frame/tibble with 16 observations on one variable
mental age of 16 girls
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
SIGN.test(Mental$age, md = 100)
Data for Example 1.9
Mercury
A data frame/tibble with 25 observations on one variable
a numeric vector measuring mercury (in parts per million)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Mercury$mercury)
Data for Exercise 5.117
Metrent
A data frame/tibble with 46 observations on one variable
monthly rent in dollars
U.S. Bureau of the Census, Housing in the Metropolitan Areas, Statistical Brief SB/94/19, September 1994.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(Metrent$rent, col = "magenta")
t.test(Metrent$rent, conf.level = 0.99)$conf.int
Data for Example 5.7
Miller
A data frame/tibble with 25 observations on one variable
scores on the Miller Personality test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Miller$miller)
fivenum(Miller$miller)
boxplot(Miller$miller)
qqnorm(Miller$miller, col = "blue")
qqline(Miller$miller, col = "red")
Data for Exercise 1.41
Miller1
A data frame/tibble with 20 observations on one variable
scores on the Miller personality test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Miller1$miller)
stem(Miller1$miller, scale = 2)
Data for Exercise 9.32
Moisture
A data frame/tibble with 16 observations on four variables
a numeric vector
g of water per 100 g of dried sediment
a numeric vector
a numeric vector
Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2nd ed., John Wiley and Sons, New York, pp. 177, 185.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(moisture ~ depth, data = Moisture)
model <- lm(moisture ~ depth, data = Moisture)
abline(model, col = "red")
plot(resid(model) ~ depth, data = Moisture)
rm(model)
Data for Exercise 7.45
Monoxide
A data frame/tibble with ten observations on two variables
a vector with values manufacturer and competitor
carbon monoxide emitted
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(emission ~ company, data = Monoxide, col = topo.colors(2))
t.test(emission ~ company, data = Monoxide)
wilcox.test(emission ~ company, data = Monoxide)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Monoxide, aes(x = company, y = emission)) +
  geom_boxplot() +
  theme_bw()
## End(Not run)
Data for Exercise 7.53
Movie
A data frame/tibble with 12 observations on three variables
moral aptitude before viewing the movie
moral aptitude after viewing the movie
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Movie$differ)
qqline(Movie$differ)
shapiro.test(Movie$differ)
t.test(Movie$differ, conf.level = 0.99)
wilcox.test(Movie$differ)
Data for Exercise 7.59
Music
A data frame/tibble with 12 observations on three variables
a numeric vector measuring the improvement scores on a music recognition test
a numeric vector measuring the improvement scores on a music recognition test
method1 - method2
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Music$differ)
qqline(Music$differ)
shapiro.test(Music$differ)
t.test(Music$differ)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Music, aes(x = differ)) +
  geom_dotplot() +
  theme_bw()
## End(Not run)
Data for Exercises 2.28, 9.19, and Example 2.8
Name
A data frame/tibble with 42 observations on three variables
a factor with levels Band-Aid, Barbie, Birds Eye, Budweiser, Camel, Campbell, Carlsberg, Coca-Cola, Colgate, Del Monte, Fisher-Price, Gordon's, Green Giant, Guinness, Haagen-Dazs, Heineken, Heinz, Hennessy, Hermes, Hershey, Ivory, Jell-o, Johnnie Walker, Kellogg, Kleenex, Kraft, Louis Vuitton, Marlboro, Nescafe, Nestle, Nivea, Oil of Olay, Pampers, Pepsi-Cola, Planters, Quaker, Sara Lee, Schweppes, Smirnoff, Tampax, Winston, and Wrigley's
value in billions of dollars
revenue in billions of dollars
Financial World.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(value ~ revenue, data = Name)
model <- lm(value ~ revenue, data = Name)
abline(model, col = "red")
cor(Name$value, Name$revenue)
summary(model)
rm(model)
Data for Exercise 10.53
Nascar
A data frame/tibble with 36 observations on six variables
duration of pit stop (in seconds)
a numeric vector representing team 1, 2, or 3
a numeric vector ranking each pit stop in order of speed
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(time ~ team, data = Nascar, col = rainbow(3))
model <- lm(time ~ factor(team), data = Nascar)
summary(model)
anova(model)
rm(model)
Data for Example 10.3
Nervous
A data frame/tibble with 25 observations on two variables
a numeric vector representing reaction time
a numeric vector indicating each of the 4 drugs
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(react ~ drug, data = Nervous, col = rainbow(4))
model <- aov(react ~ factor(drug), data = Nervous)
summary(model)
TukeyHSD(model)
plot(TukeyHSD(model), las = 1)
Data for Exercise 1.43
Newsstand
A data frame/tibble with 20 observations on one variable
profit of each newsstand (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Newsstand$profit)
stem(Newsstand$profit, scale = 3)
Data for Exercise 9.63
Nfldraf2
A data frame/tibble with 47 observations on three variables
rating of each player on a scale out of 10
forty yard dash time (in seconds)
weight of each player (in pounds)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(rating ~ forty, data = Nfldraf2)
summary(lm(rating ~ forty, data = Nfldraf2))
Data for Exercises 9.10 and 9.16
Nfldraft
A data frame/tibble with 29 observations on three variables
rating of each player on a scale out of 10
forty yard dash time (in seconds)
weight of each player (in pounds)
USA Today, April 20, 1994.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(rating ~ forty, data = Nfldraft)
cor(Nfldraft$rating, Nfldraft$forty)
summary(lm(rating ~ forty, data = Nfldraft))
Data for Exercise 9.21
Nicotine
A data frame/tibble with eight observations on two variables
nicotine content (in milligrams)
sales figures (in $100,000)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(sales ~ nicotine, data = Nicotine)
plot(sales ~ nicotine, data = Nicotine)
abline(model, col = "red")
summary(model)
predict(model, newdata = data.frame(nicotine = 1), interval = "confidence", level = 0.99)
Computes and draws the area between two user-specified values under a normal distribution with a given mean and standard deviation.
normarea(lower = -Inf, upper = Inf, m, sig)
lower | the lower value |
upper | the upper value |
m | the mean for the population |
sig | the standard deviation of the population |
Alan T. Arnholt
normarea(70, 130, 100, 15)
# Finds and draws P(70 < X < 130) given X is N(100, 15).
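The area that normarea reports can be cross-checked in base R. The following is a minimal sketch, assuming only that the area is the difference of two normal CDF values; the helper name norm_area_sketch is hypothetical and is not part of BSDA, and the shading code only approximates the kind of plot described above.

```r
# Hypothetical helper (not part of BSDA): area under N(m, sig) between
# lower and upper, computed as a difference of normal CDF values.
norm_area_sketch <- function(lower = -Inf, upper = Inf, m, sig) {
  stopifnot(sig > 0, lower <= upper)
  pnorm(upper, mean = m, sd = sig) - pnorm(lower, mean = m, sd = sig)
}

area <- norm_area_sketch(70, 130, m = 100, sig = 15)
print(area)

# Draw the density and shade the region between 70 and 130.
x <- seq(100 - 4 * 15, 100 + 4 * 15, length.out = 400)
plot(x, dnorm(x, mean = 100, sd = 15), type = "l",
     xlab = "x", ylab = "density")
shade <- x[x >= 70 & x <= 130]
polygon(c(70, shade, 130),
        c(0, dnorm(shade, mean = 100, sd = 15), 0),
        col = "lightblue")
```

Because 70 and 130 lie two standard deviations on either side of the mean, the printed area should be close to the familiar two-sigma probability.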
Function to determine required sample size to be within a given margin of error.
nsize(b, sigma = NULL, p = 0.5, conf.level = 0.95, type = "mu")
b | the desired bound. |
sigma | population standard deviation. Not required if using type "pi". |
p | estimate for the population proportion of successes. Not required if using type "mu". |
conf.level | confidence level for the problem, restricted to lie between zero and one. |
type | character string, one of "mu" or "pi". |
Answer is based on a normal approximation when using type "pi".
Returns required sample size.
Alan T. Arnholt
nsize(b=.03, p=708/1200, conf.level=.90, type="pi")
# Returns the required sample size (n) to estimate the population
# proportion of successes with a 0.9 confidence interval
# so that the margin of error is no more than 0.03 when the
# estimate of the population proportion of successes is 708/1200.
# This is problem 5.38 on page 257 of Kitchens' BSDA.
nsize(b=.15, sigma=.31, conf.level=.90, type="mu")
# Returns the required sample size (n) to estimate the population
# mean with a 0.9 confidence interval so that the margin
# of error is no more than 0.15. This is Example 5.17 on page
# 261 of Kitchens' BSDA.
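For readers who want to see the arithmetic behind a sample-size calculation of this kind, here is a minimal base-R sketch. It assumes the standard textbook formulas, n = (z * sigma / b)^2 for a mean and n = p(1 - p)(z / b)^2 for a proportion, rounded up. The name n_size_sketch is hypothetical, and the sketch is not claimed to reproduce the internals or the return format of nsize.

```r
# Hypothetical helper (not part of BSDA): textbook sample-size formulas
# based on the normal approximation.
n_size_sketch <- function(b, sigma = NULL, p = 0.5, conf.level = 0.95,
                          type = c("mu", "pi")) {
  type <- match.arg(type)
  stopifnot(b > 0, conf.level > 0, conf.level < 1)
  z <- qnorm(1 - (1 - conf.level) / 2)  # two-sided critical value
  if (type == "mu") {
    if (is.null(sigma)) stop("sigma is required when type = \"mu\"")
    n <- (z * sigma / b)^2
  } else {
    n <- p * (1 - p) * (z / b)^2
  }
  ceiling(n)  # round up so the margin of error is not exceeded
}

# Same inputs as the two examples above.
n_prop <- n_size_sketch(b = 0.03, p = 708 / 1200, conf.level = 0.90, type = "pi")
n_mean <- n_size_sketch(b = 0.15, sigma = 0.31, conf.level = 0.90, type = "mu")
print(c(proportion = n_prop, mean = n_mean))
```

Rounding up rather than to the nearest integer is the conservative choice: any smaller n would give a margin of error slightly larger than the requested bound.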
Q-Q plots of randomly generated normal data of the same size as the tested data are generated and plotted on the perimeter of the graph while a Q-Q plot of the actual data is depicted in the center of the graph.
ntester(actual.data)
actual.data | a numeric vector. Missing and infinite values are allowed, but are ignored in the calculation. The length of actual.data must be at most 5000. |
Q-Q plots of randomly generated normal data of the same size as the tested data are generated and plotted on the perimeter of the graph sheet while a Q-Q plot of the actual data is depicted in the center of the graph. The p-values are calculated from the Shapiro-Wilk W-statistic. The function only works on numeric vectors containing at most 5000 observations.
Alan T. Arnholt
Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591-611.
ntester(rexp(50,1))
# Q-Q plot of random exponential data in center plot
# surrounded by 8 Q-Q plots of randomly generated
# standard normal data of size 50.
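The layout described above can be approximated in base R. The following is a rough sketch, assuming a 3 x 3 grid in which the center panel holds the actual data and the other eight panels hold simulated standard normal samples of the same size. The function name qq_grid_sketch is hypothetical, and details such as panel titles are illustrative choices rather than the behavior of ntester.

```r
# Hypothetical helper (not part of BSDA): 3 x 3 grid of normal Q-Q plots,
# actual data in the center, simulated normal samples around it.
qq_grid_sketch <- function(actual.data) {
  x <- actual.data[is.finite(actual.data)]  # drop NA, NaN, and infinite values
  n <- length(x)
  stopifnot(n >= 3, n <= 5000)              # limits imposed by shapiro.test()
  old_par <- par(mfrow = c(3, 3), mar = c(3, 3, 2, 1))
  on.exit(par(old_par))
  p_values <- numeric(9)
  for (i in 1:9) {
    if (i == 5) {                           # panel 5 is the center of the grid
      y <- x
      label <- "actual data"
    } else {
      y <- rnorm(n)
      label <- "simulated normal"
    }
    p_values[i] <- shapiro.test(y)$p.value  # Shapiro-Wilk p-value
    qqnorm(y, main = sprintf("%s (p = %.3f)", label, p_values[i]))
    qqline(y)
  }
  invisible(p_values)
}

set.seed(1)
p_vals <- qq_grid_sketch(rexp(50, 1))
```

Comparing the center panel with its eight neighbors gives a visual sense of how much departure from the reference line is ordinary sampling noise for a sample of that size.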
Data for Exercise 9.61
Orange
A data frame/tibble with six observations on two variables
harvest in millions of boxes
average price charged by California growers for a 75-pound box of navel oranges
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(price ~ harvest, data = Orange)
model <- lm(price ~ harvest, data = Orange)
abline(model, col = "red")
summary(model)
rm(model)
Data for Example 1.3
Orioles
A data frame/tibble with 27 observations on three variables
a factor with levels Albert, Arthur, B.J., Brady, Cal, Charles, dl-Delino, dl-Scott, Doug, Harold, Heathcliff, Jeff, Jesse, Juan, Lenny, Mike, Rich, Ricky, Scott, Sidney, Will, and Willis
a factor with levels Amaral, Anderson, Baines, Belle, Bones, Bordick, Clark, Conine, Deshields, Erickson, Fetters, Garcia, Guzman, Johns, Johnson, Kamieniecki, Mussina, Orosco, Otanez, Ponson, Reboulet, Rhodes, Ripken Jr., Slocumb, Surhoff, Timlin, and Webster
a numeric vector containing each player's salary (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stripchart(Orioles$`1999salary`, method = "stack", pch = 19)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Orioles, aes(x = `1999salary`)) +
  geom_dotplot(dotsize = 0.5) +
  labs(x = "1999 Salary") +
  theme_bw()
## End(Not run)
Data for Exercise 7.86
Oxytocin
A data frame/tibble with 11 observations on three variables
a numeric vector indicating each subject
mean arterial blood pressure of subject before receiving oxytocin
mean arterial blood pressure of subject after receiving oxytocin
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
diff <- Oxytocin$after - Oxytocin$before
qqnorm(diff)
qqline(diff)
shapiro.test(diff)
t.test(diff)
rm(diff)
Data for Exercise 1.32
Parented
A data frame/tibble with 200 observations on two variables
a factor with levels 4yr college degree, Doctoral degree, Grad degree, H.S grad or less, Some college, and Some grad school
a factor with levels mother and father
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~education + parent, data = Parented)
T1
barplot(t(T1), beside = TRUE, legend = TRUE, col = c("blue", "red"))
rm(T1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Parented, aes(x = education, fill = parent)) +
  geom_bar(position = "dodge") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 85, vjust = 0.5)) +
  scale_fill_manual(values = c("pink", "blue")) +
  labs(x = "", y = "")
## End(Not run)
Data for Example 9.3
Patrol
A data frame/tibble with ten observations on three variables
number of tickets written per week
patrolperson's experience (in years)
natural log of tickets
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(tickets ~ years, data = Patrol)
summary(model)
confint(model, level = 0.98)
Data for Exercise 2.20
Pearson
Pearson
A data frame/tibble with 11 observations on three variables
number indicating family of brother and sister pair
height of brother (in inches)
height of sister (in inches)
Pearson, K. and Lee, A. (1902-3), On the Laws of Inheritance in Man, Biometrika, 2, 357.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(brother ~ sister, data = Pearson, col = "lightblue")
cor(Pearson$brother, Pearson$sister)
Data for Exercise 6.95
Phone
Phone
A data frame/tibble with 20 observations on one variable
duration of long distance phone call (in minutes)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Phone$time)
qqline(Phone$time)
shapiro.test(Phone$time)
SIGN.test(Phone$time, md = 5, alternative = "greater")
Data for Exercise 1.113
Poison
Poison
A data frame/tibble with 226,361 observations on one variable
a factor with levels Alcohol, Cleaning agent, Cosmetics, Drugs, Insecticides, and Plants
Centers for Disease Control, Atlanta, Georgia.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~type, data = Poison)
T1
par(mar = c(5.1 + 2, 4.1, 4.1, 2.1))
barplot(sort(T1, decreasing = TRUE), las = 2, col = rainbow(6))
par(mar = c(5.1, 4.1, 4.1, 2.1))
rm(T1)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Poison, aes(x = type, fill = type)) +
  geom_bar() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 85, vjust = 0.5)) +
  guides(fill = "none")
## End(Not run)
Data for Example 8.3
Politic
Politic
A data frame/tibble with 250 observations on two variables
a factor with levels republican, democrat, and other
a factor with levels female and male
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~party + gender, data = Politic)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 5.59
Pollutio
Pollutio
A data frame/tibble with 15 observations on one variable
air pollution index
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Pollutio$inde)
t.test(Pollutio$inde, conf.level = 0.98)$conf.int
Data for Exercise 5.86
Porosity
Porosity
A data frame/tibble with 20 observations on one variable
porosity measurement (percent)
Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2nd edition, pages 63-65.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Porosity$porosity)
fivenum(Porosity$porosity)
boxplot(Porosity$porosity, col = "lightgreen")
Data for Exercise 9.11 and 9.17
Poverty
Poverty
A data frame/tibble with 20 observations on four variables
a factor with levels Atlanta; Buffalo; Cincinnati; Cleveland; Dayton, O; Detroit; Flint, Mich; Fresno, C; Gary, Ind; Hartford, C; Laredo; Macon, Ga; Miami; Milwaukee; New Orleans; Newark, NJ; Rochester,NY; Shreveport; St. Louis; and Waco, Tx
percent of children living in poverty
crime rate (per 1000 people)
population of city
Children's Defense Fund and the Bureau of Justice Statistics.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(poverty ~ crime, data = Poverty)
model <- lm(poverty ~ crime, data = Poverty)
abline(model, col = "red")
summary(model)
rm(model)
Data for Exercise 2.2 and 2.38
Precinct
Precinct
A data frame/tibble with eight observations on two variables
robbery rate (per 1000 people)
percent with low income
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(rate ~ income, data = Precinct)
model <- lm(rate ~ income, data = Precinct)
abline(model, col = "red")
rm(model)
Data for Exercise 5.10 and 5.22
Prejudic
Prejudic
A data frame with 25 observations on one variable
racial prejudice score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Prejudic$prejud)
EDA(Prejudic$prejud)
Data for Exercise 1.126
Presiden
Presiden
A data frame/tibble with 43 observations on five variables
a factor with levels A., B., C., D., F., G., G. W., H., J., L., M., R., T., U., W., and Z.
a factor with levels Adams, Arthur, Buchanan, Bush, Carter, Cleveland, Clinton, Coolidge, Eisenhower, Fillmore, Ford, Garfield, Grant, Harding, Harrison, Hayes, Hoover, Jackson, Jefferson, Johnson, Kennedy, Lincoln, Madison, McKinley, Monroe, Nixon, Pierce, Polk, Reagan, Roosevelt, Taft, Taylor, Truman, Tyler, VanBuren, Washington, and Wilson
a factor with levels ARK, CAL, CONN, GA, IA, ILL, KY, MASS, MO, NC, NEB, NH, NJ, NY, OH, PA, SC, TEX, VA, and VT
President's age at inauguration
President's age at death
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
pie(xtabs(~birth_state, data = Presiden))
stem(Presiden$inaugural_age)
stem(Presiden$death_age)
par(mar = c(5.1, 4.1 + 3, 4.1, 2.1))
stripchart(x = list(Presiden$inaugural_age, Presiden$death_age),
           method = "stack", col = c("green", "brown"),
           pch = 19, las = 1)
par(mar = c(5.1, 4.1, 4.1, 2.1))
Data for Exercise 9.55
Press
Press
A data frame/tibble with 20 observations on two variables
years of education
degree of confidence in the press (the higher the score, the more confidence)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(confidence ~ education_yrs, data = Press)
model <- lm(confidence ~ education_yrs, data = Press)
abline(model, col = "purple")
summary(model)
rm(model)
Data for Exercise 6.61
Prognost
Prognost
A data frame/tibble with 15 observations on one variable
Klopfer's Prognostic Rating Scale score
Newmark, C., et al. (1973), Predictive Validity of the Rorschach Prognostic Rating Scale with Behavior Modification Techniques, Journal of Clinical Psychology, 29, 246-248.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Prognost$kprs_score)
t.test(Prognost$kprs_score, mu = 9)
Data for Exercise 10.17
Program
Program
A data frame/tibble with 44 observations on two variables
a character variable with values method1, method2, method3, and method4
standardized test score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ method, col = c("red", "blue", "green", "yellow"), data = Program)
anova(lm(score ~ method, data = Program))
TukeyHSD(aov(score ~ method, data = Program))
par(mar = c(5.1, 4.1 + 4, 4.1, 2.1))
plot(TukeyHSD(aov(score ~ method, data = Program)), las = 1)
par(mar = c(5.1, 4.1, 4.1, 2.1))
Data for Exercise 2.50
Psat
Psat
A data frame/tibble with seven observations on two variables
PSAT score
SAT score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(sat ~ psat, data = Psat)
par(mfrow = c(1, 2))
plot(Psat$psat, resid(model))
plot(model, which = 1)
rm(model)
par(mfrow = c(1, 1))
Data for Exercise 1.42
Psych
Psych
A data frame/tibble with 23 observations on one variable
number of correct responses in a psychology experiment
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Psych$score)
EDA(Psych$score)
Data for Exercise 5.22 and 5.65
Puerto
Puerto
A data frame/tibble with 50 observations on one variable
weekly family income (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Puerto$income)
boxplot(Puerto$income, col = "purple")
t.test(Puerto$income, conf.level = 0.90)$conf.int
Data for Exercise 1.53, 1.77, 1.88, 5.66, and 7.50
Quail
Quail
A data frame/tibble with 40 observations on two variables
a character variable with values placebo and treatment
low-density lipoprotein (LDL) cholesterol level
J. McKean and T. Vidmar (1994), "A Comparison of Two Rank-Based Methods for the Analysis of Linear Models," The American Statistician, 48, 220-229.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(level ~ group, data = Quail, horizontal = TRUE,
        xlab = "LDL Level", col = c("yellow", "lightblue"))
Data for Exercise 7.81
Quality
Quality
A data frame/tibble with 15 observations on two variables
a character variable with values Process1 and Process2
results of a quality control test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ process, data = Quality, col = "lightgreen")
t.test(score ~ process, data = Quality)
Data for Exercise 9.8
Rainks
Rainks
A data frame/tibble with 35 observations on five variables
rainfall (in inches)
rainfall (in inches)
rainfall (in inches)
rainfall (in inches)
rainfall (in inches)
R. Picard, K. Berk (1990), Data Splitting, The American Statistician, 44, (2), 140-147.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
cor(Rainks)
model <- lm(rain ~ x2, data = Rainks)
summary(model)
Data for Exercise 9.36 and Example 9.8
Randd
Randd
A data frame/tibble with 12 observations on two variables
research and development expenditures (in millions of dollars)
sales (in millions of dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(sales ~ rd, data = Randd)
model <- lm(sales ~ rd, data = Randd)
abline(model, col = "purple")
summary(model)
plot(model, which = 1)
rm(model)
Data for Exercise 1.52, 1.76, 5.62, and 6.44
Rat
Rat
A data frame/tibble with 20 observations on one variable
survival time in weeks for rats exposed to a high level of radiation
J. Lawless, Statistical Models and Methods for Lifetime Data (New York: Wiley, 1982).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Rat$survival_time)
qqnorm(Rat$survival_time)
qqline(Rat$survival_time)
summary(Rat$survival_time)
t.test(Rat$survival_time)
t.test(Rat$survival_time, mu = 100, alternative = "greater")
Data for Example 2.6
Ratings
Ratings
A data frame/tibble with 250 observations on two variables
character variable with students' ratings of instructor (A-F)
students' grade point average
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(gpa ~ rating, data = Ratings,
        xlab = "Student rating of instructor", ylab = "Student GPA")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Ratings, aes(x = rating, y = gpa, fill = rating)) +
  geom_boxplot() +
  theme_bw() +
  theme(legend.position = "none") +
  labs(x = "Student rating of instructor", y = "Student GPA")
## End(Not run)
Data for Example 6.11
Reaction
Reaction
A data frame/tibble with 12 observations on one variable
threshold reaction time (in seconds) for persons subjected to emotional stress
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Reaction$time)
SIGN.test(Reaction$time, md = 15, alternative = "less")
Data for Exercise 1.72 and 2.10
Reading
Reading
A data frame/tibble with 30 observations on four variables
standardized reading test score
sorted values of score
trimmed values of sorted
winsorized values of score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Reading$score, main = "Exercise 1.72", col = "lightgreen",
     xlab = "Standardized reading score")
summary(Reading$score)
sd(Reading$score)
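The Reading entry lists sorted, trimmed, and winsorized versions of the score but does not show how such columns are derived. The following is a minimal base R sketch on made-up numbers; the vector `score` and the tail count `k` are illustrative assumptions, not values taken from the Reading data or from the book.

```r
# Hypothetical scores (NOT the Reading data), used only for illustration
score <- c(52, 61, 47, 98, 70, 65, 58, 73, 12, 66)

k <- 1  # assumed number of values treated in each tail

sorted <- sort(score)
n <- length(sorted)

# Trimming drops the k smallest and k largest values
trimmed <- sorted[(k + 1):(n - k)]

# Winsorizing keeps all n values but replaces the extremes
# with the nearest value that was retained
winsorized <- sorted
winsorized[1:k] <- sorted[k + 1]
winsorized[(n - k + 1):n] <- sorted[n - k]

mean(score)       # ordinary mean, sensitive to 12 and 98
mean(trimmed)     # trimmed mean
mean(winsorized)  # winsorized mean
```

With one value removed from each tail of ten observations, `mean(trimmed)` agrees with base R's `mean(score, trim = 0.1)`.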
Data for Exercises 2.10 and 2.53
Readiq
Readiq
A data frame/tibble with 14 observations on two variables
reading achievement score
IQ score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(reading ~ iq, data = Readiq)
model <- lm(reading ~ iq, data = Readiq)
abline(model, col = "purple")
predict(model, newdata = data.frame(iq = c(100, 120)))
residuals(model)[c(6, 7)]
rm(model)
Data for Exercise 8.20
Referend
Referend
A data frame with 237 observations on two variables
a factor with levels A, B, and C
a factor with levels for, against, and undecided
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~choice + response, data = Referend)
T1
chisq.test(T1)
chisq.test(T1)$expected
Data for Exercise 10.26
Region
Region
A data frame/tibble with 48 observations on three variables
pollution index
region of a county (west, central, and east)
ranked values of pollution
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(pollution ~ region, data = Region, col = "gray")
anova(lm(pollution ~ region, data = Region))
Data for Exercise 2.3, 2.39, and 2.54
Register
Register
A data frame/tibble with nine observations on two variables
age of cash register (in years)
maintenance cost of cash register (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(cost ~ age, data = Register)
model <- lm(cost ~ age, data = Register)
abline(model, col = "red")
predict(model, newdata = data.frame(age = c(5, 10)))
plot(model, which = 1)
rm(model)
Data for Exercise 7.61
Rehab
Rehab
A data frame/tibble with 20 observations on four variables
inmate identification number
rating from first psychiatrist on the inmate's rehabilitative potential
rating from second psychiatrist on the inmate's rehabilitative potential
psych1 - psych2
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(Rehab$differ)
qqnorm(Rehab$differ)
qqline(Rehab$differ)
t.test(Rehab$differ)
Data for Exercise 7.43
Remedial
Remedial
A data frame/tibble with 84 observations on two variables
a character variable with values female and male
math placement score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ gender, data = Remedial, col = c("purple", "blue"))
t.test(score ~ gender, data = Remedial, conf.level = 0.98)
t.test(score ~ gender, data = Remedial, conf.level = 0.98)$conf.int
wilcox.test(score ~ gender, data = Remedial, conf.int = TRUE, conf.level = 0.98)
Data for Exercise 1.122
Rentals
Rentals
A data frame/tibble with 45 observations on one variable
weekly apartment rental price (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Rentals$rent)
sum(Rentals$rent < mean(Rentals$rent) - 3*sd(Rentals$rent) |
    Rentals$rent > mean(Rentals$rent) + 3*sd(Rentals$rent))
Data for Exercise 5.77
Repair
Repair
A data frame/tibble with 22 observations on one variable
time to repair a wrecked car (in hours)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Repair$time)
SIGN.test(Repair$time, conf.level = 0.98)
Data for Exercise 9.59
Retail
Retail
A data frame/tibble with 10 observations on two variables
length of employment (in months)
employee gross sales (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(sales ~ months, data = Retail)
model <- lm(sales ~ months, data = Retail)
abline(model, col = "blue")
summary(model)
Data for Exercise 2.9
Ronbrown1
Ronbrown1
A data frame/tibble with 75 observations on two variables
ocean depth (in meters)
ocean temperature (in Celsius)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(temperature ~ depth, data = Ronbrown1, ylab = "Temperature")
Data for Exercise 2.56 and Example 2.4
Ronbrown2
Ronbrown2
A data frame/tibble with 150 observations on three variables
ocean depth (in meters)
ocean temperature (in Celsius)
ocean salinity level
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(salinity ~ depth, data = Ronbrown2)
model <- lm(salinity ~ depth, data = Ronbrown2)
summary(model)
plot(model, which = 1)
rm(model)
Data for Example 7.16
Rural
Rural
A data frame/tibble with 33 observations on two variables
child's social adjustment score
character variable with values city and rural
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ area, data = Rural)
wilcox.test(score ~ area, data = Rural)
## Not run:
library(dplyr)
Rural <- dplyr::mutate(Rural, r = rank(score))
Rural
t.test(r ~ area, data = Rural)
## End(Not run)
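The Rural example ranks the pooled scores and then runs a t-test on the ranks next to `wilcox.test()`. As a rough illustration of why the two are related, the sketch below computes the rank-sum statistic by hand from pooled ranks and compares it with what `wilcox.test()` reports. The data are invented (they are not the Rural data), and the variable names simply mirror the example.

```r
# Hypothetical scores for two groups (NOT the Rural data)
score <- c(42, 51, 38, 60, 47, 55, 63, 49, 58, 66)
area  <- factor(rep(c("city", "rural"), each = 5))

r <- rank(score)  # ranks over the pooled sample (no ties here)

# Rank-sum statistic for the first factor level:
# sum of its ranks minus the smallest possible rank sum
n1 <- sum(area == "city")
W  <- sum(r[area == "city"]) - n1 * (n1 + 1) / 2

wt <- wilcox.test(score ~ area)  # reports the same W
tt <- t.test(r ~ area)           # t-test applied to the ranks

W
wt$statistic
tt$statistic
```

Both tests use only the ranks, so they usually lead to similar conclusions, although their p-values are not identical.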
Data for Exercise 3.66
Salary
Salary
A data frame/tibble with 25 observations on one variable
starting salary for Ph.D. psychologists (in dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Salary$salary, pch = 19, col = "purple")
qqline(Salary$salary, col = "blue")
Data for Exercise 5.27 and 5.64
Salinity
Salinity
A data frame/tibble with 48 observations on one variable
surface-water salinity value
J. Davis, Statistics and Data Analysis in Geology, 2nd ed. (New York: John Wiley, 1986).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Salinity$salinity)
qqnorm(Salinity$salinity, pch = 19, col = "purple")
qqline(Salinity$salinity, col = "blue")
t.test(Salinity$salinity, conf.level = 0.99)
t.test(Salinity$salinity, conf.level = 0.99)$conf.int
Data for Statistical Insight Chapter 9
Sat
Sat
A data frame/tibble with 102 observations on seven variables
U.S. state
verbal SAT score
math SAT score
combined verbal and math SAT score
percent of high school seniors taking the SAT
state expenditure per student (in dollars)
year
The 2000 World Almanac and Book of Facts, Funk and Wagnalls Corporation, New Jersey.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Sat94 <- Sat[Sat$year == 1994, ]
Sat94
Sat99 <- subset(Sat, year == 1999)
Sat99
stem(Sat99$total)
plot(total ~ percent, data = Sat99)
model <- lm(total ~ percent, data = Sat99)
abline(model, col = "blue")
summary(model)
rm(model)
Data for Exercise 10.34 and 10.49
Saving
Saving
A data frame/tibble with 65 observations on two variables
problem-asset-ratio for Savings & Loans that were listed as being financially troubled in 1992
U.S. state
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(par ~ state, data = Saving, col = "red")
boxplot(par ~ state, data = Saving, log = "y", col = "red")
model <- aov(par ~ state, data = Saving)
summary(model)
plot(TukeyHSD(model))
kruskal.test(par ~ factor(state), data = Saving)
Data for Exercise 1.89
Scales
Scales
A data frame/tibble with 20 observations on two variables
variable indicating brand of bathroom scale (A, B, C, or D)
recorded value (in pounds) of a 100 pound weight
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(reading ~ brand, data = Scales, col = rainbow(4),
        ylab = "Weight (lbs)")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Scales, aes(x = brand, y = reading, fill = brand)) +
  geom_boxplot() +
  labs(y = "weight (lbs)") +
  theme_bw() +
  theme(legend.position = "none")
## End(Not run)
Data for Exercise 6.99
Schizop2
Schizop2
A data frame/tibble with 17 observations on one variable
schizophrenic's score on a second standardized exam
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Schizop2$score, xlab = "score on standardized test after a tranquilizer",
     main = "Exercise 6.99", breaks = 10, col = "orange")
EDA(Schizop2$score)
SIGN.test(Schizop2$score, md = 22, alternative = "greater")
Data for Example 6.10
Schizoph
Schizoph
A data frame/tibble with 13 observations on one variable
schizophrenic's score on a standardized exam one hour after receiving a specified dose of a tranquilizer
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Schizoph$score, xlab = "score on standardized test",
     main = "Example 6.10", breaks = 10, col = "orange")
EDA(Schizoph$score)
t.test(Schizoph$score, mu = 20)
Data for Exercise 8.24
Seatbelt
Seatbelt
A data frame/tibble with 86,759 observations on two variables
a factor with levels No and Yes
a factor with levels None, Minimal, Minor, or Major indicating the extent of the driver's injuries
Jobson, J. (1982), Applied Multivariate Data Analysis, Springer-Verlag, New York, p. 18.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~seatbelt + injuries, data = Seatbelt)
T1
chisq.test(T1)
rm(T1)
Data for Example 7.19
Selfdefe
Selfdefe
A data frame/tibble with nine observations on three variables
number identifying the woman
self-confidence score before the course
self-confidence score after the course
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
Selfdefe$differ <- Selfdefe$after - Selfdefe$before
Selfdefe
t.test(Selfdefe$differ, alternative = "greater")
Data for Exercise 1.83 and 3.67
Senior
Senior
A data frame/tibble with 31 observations on one variable
reaction time for senior citizens applying for a driver's license renewal
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Senior$reaction)
fivenum(Senior$reaction)
boxplot(Senior$reaction, main = "Problem 1.83, part d",
        horizontal = TRUE, col = "purple")
Data for Exercise 1.123
Sentence
Sentence
A data frame/tibble with 41 observations on one variable
sentence length (in months) for prisoners convicted of homicide
U.S. Department of Justice, Bureau of Justice Statistics, Prison Sentences and Time Served for Violence, NCJ-153858, April 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Sentence$months)
ll <- mean(Sentence$months) - 2*sd(Sentence$months)
ul <- mean(Sentence$months) + 2*sd(Sentence$months)
limits <- c(ll, ul)
limits
rm(ul, ll, limits)
Data for Exercises 10.11 and 10.12
Shkdrug
Shkdrug
A data frame/tibble with 64 observations on two variables
type of treatment: Drug/NoS, Drug/Shk, NoDg/NoS, or NoDrug/S
number of tasks completed in a 10-minute period
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(response ~ treatment, data = Shkdrug, col = "gray")
model <- lm(response ~ treatment, data = Shkdrug)
anova(model)
rm(model)
Data for Exercise 10.50
Shock
Shock
A data frame/tibble with 27 observations on two variables
grouping variable with values of Group1 (no shock), Group2 (medium shock), and Group3 (severe shock)
number of attempts to complete a task
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(attempts ~ group, data = Shock, col = "violet")
model <- lm(attempts ~ group, data = Shock)
anova(model)
rm(model)
Data for Exercise 9.58
Shoplift
Shoplift
A data frame/tibble with eight observations on two variables
sales (in 1000 dollars)
loss (in 100 dollars)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(loss ~ sales, data = Shoplift)
model <- lm(loss ~ sales, data = Shoplift)
summary(model)
rm(model)
Data for Exercise 6.65
Short
A data frame/tibble with 158 observations on two variables
sample number
parallax measurements (seconds of a degree)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Short$parallax, main = "Problem 6.65", xlab = "", col = "orange")
SIGN.test(Short$parallax, md = 8.798)
t.test(Short$parallax, mu = 8.798)
Data for Exercise 9.20
Shuttle
A data frame/tibble with 15 observations on two variables
number of shuttle riders
number of automobiles in the downtown area
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(autos ~ users, data = Shuttle)
model <- lm(autos ~ users, data = Shuttle)
summary(model)
rm(model)
This function tests a hypothesis using the sign test and reports linearly interpolated confidence intervals for one-sample problems.
SIGN.test(x, y = NULL, md = 0, alternative = "two.sided", conf.level = 0.95, ...)
x | numeric vector |
y | optional numeric vector |
md | a single number representing the value of the population median specified by the null hypothesis |
alternative | a character string, one of "greater", "less", or "two.sided", indicating the alternative hypothesis |
conf.level | confidence level for the returned confidence interval, restricted to lie between zero and one |
... | further arguments to be passed to or from methods |
Computes a “Dependent-samples Sign-Test” if both x and y are provided. If only x is provided, computes the “Sign-Test”.
A list of class htest_S, containing the following components:
statistic | the S-statistic (the number of positive differences between the data and the hypothesized median), with names attribute “S” |
p.value | the p-value for the test |
conf.int | a confidence interval (vector of length 2) for the true median based on linear interpolation. The confidence level is recorded as an attribute of the interval. |
estimate | a vector of length 1 giving the sample median; this estimates the corresponding population parameter |
null.value | the value of the median specified by the null hypothesis. This equals the input argument md. |
alternative | records the value of the input argument alternative: "greater", "less", or "two.sided" |
data.name | a character string (vector of length 1) containing the actual name of the input vector |
Confidence.Intervals | a 3 by 3 matrix containing the lower achieved confidence interval, the interpolated confidence interval, and the upper achieved confidence interval |
For the one-sample sign-test, the null hypothesis is that the median of the population from which x is drawn is md. For the two-sample dependent case, the null hypothesis is that the median for the differences of the populations from which x and y are drawn is md. The alternative hypothesis indicates the direction of divergence of the population median for x from md (i.e., "greater", "less", or "two.sided").
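As a rough illustration of these hypotheses, the sign-test p-value can be reproduced with base R's binomial distribution functions. This is a minimal sketch rather than the package's implementation: the helper name sign_test_sketch is hypothetical, and it assumes that observations equal to md are dropped before counting, which is the usual textbook convention.

```r
# Hypothetical helper (not part of BSDA): one-sample sign test via the
# Binomial(n, 0.5) distribution of the number of positive differences.
sign_test_sketch <- function(x, md = 0, alternative = "two.sided") {
  d <- x - md
  d <- d[d != 0]              # assumption: values equal to md are discarded
  n <- length(d)
  s <- sum(d > 0)             # S = number of positive differences
  p_less <- pbinom(s, n, 0.5)                             # P(S <= s)
  p_greater <- pbinom(s - 1, n, 0.5, lower.tail = FALSE)  # P(S >= s)
  p <- switch(alternative,
              less = p_less,
              greater = p_greater,
              two.sided = min(1, 2 * min(p_less, p_greater)))
  list(statistic = s, n = n, p.value = p)
}

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
two_sided <- sign_test_sketch(x, md = 6.5)

reaction <- c(14.3, 13.7, 15.4, 14.7, 12.4, 13.1, 9.2, 14.2, 14.4,
              15.8, 11.3, 15.0)
one_sided <- sign_test_sketch(reaction, md = 15, alternative = "less")
```

For the two-sided case the sketch doubles the smaller tail probability, which for a null proportion of 0.5 agrees with binom.test(s, n, 0.5) because the null distribution is symmetric.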
The reported confidence interval is based on linear interpolation. The lower and upper confidence levels are exact.
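The exact ("achieved") confidence levels can be illustrated with order statistics: the interval from the k-th smallest to the k-th largest observation covers the population median with probability 1 - 2 * P(X <= k - 1), where X is Binomial(n, 0.5). The sketch below computes the two achieved intervals that bracket 95% and then interpolates linearly between them. It assumes a continuous population, and the interpolation step is one plausible reading of "linear interpolation", not necessarily the exact rule SIGN.test applies.

```r
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
xs <- sort(x)
n <- length(xs)
conf <- 0.95

# Exact coverage of the interval (xs[k], xs[n - k + 1]) for the median.
achieved <- function(k) 1 - 2 * pbinom(k - 1, n, 0.5)

# Largest k whose interval still reaches the requested level, and the next k,
# whose narrower interval falls just short of it.
k_wide <- max(which(achieved(1:floor(n / 2)) >= conf))
k_narrow <- k_wide + 1

level_wide <- achieved(k_wide)
level_narrow <- achieved(k_narrow)
ci_wide <- c(xs[k_wide], xs[n - k_wide + 1])
ci_narrow <- c(xs[k_narrow], xs[n - k_narrow + 1])

# Assumed interpolation: move from the narrow interval toward the wide one
# in proportion to how far conf lies between the two achieved levels.
w <- (conf - level_narrow) / (level_wide - level_narrow)
ci_interp <- ci_narrow + w * (ci_wide - ci_narrow)
```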
Alan T. Arnholt
Gibbons, J.D. and Chakraborti, S. (1992). Nonparametric Statistical Inference. Marcel Dekker Inc., New York.
Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.
Conover, W. J. (1980). Practical Nonparametric Statistics, 2nd ed. Wiley, New York.
Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. Holden and Day, San Francisco.
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)
SIGN.test(x, md = 6.5)
# Computes two-sided sign-test for the null hypothesis
# that the population median for 'x' is 6.5. The alternative
# hypothesis is that the median is not 6.5. An interpolated 95%
# confidence interval for the population median will be computed.

reaction <- c(14.3, 13.7, 15.4, 14.7, 12.4, 13.1, 9.2, 14.2, 14.4, 15.8, 11.3, 15.0)
SIGN.test(reaction, md = 15, alternative = "less")
# Data from Example 6.11 page 330 of Kitchens BSDA.
# Computes one-sided sign-test for the null hypothesis
# that the population median is 15. The alternative
# hypothesis is that the median is less than 15.
# An interpolated 95% upper bound for the population
# median will be computed.
Data for Example 1.18
Simpson
A data frame/tibble with 100 observations on three variables
grade point average
sport played (basketball, soccer, or track)
athlete sex (male, female)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(gpa ~ gender, data = Simpson, col = "violet")
boxplot(gpa ~ sport, data = Simpson, col = "lightgreen")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Simpson, aes(x = gender, y = gpa, fill = gender)) +
  geom_boxplot() +
  facet_grid(.~sport) +
  theme_bw()
## End(Not run)
Data for Exercise 1.47
Situp
A data frame/tibble with 20 observations on one variable
maximum number of situps completed in an exercise class after 1 month in the program
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Situp$number)
hist(Situp$number, breaks = seq(0, 70, 10), right = FALSE)
hist(Situp$number, breaks = seq(0, 70, 10), right = FALSE, freq = FALSE,
     col = "pink", main = "Problem 1.47", xlab = "Maximum number of situps")
lines(density(Situp$number), col = "red")
Data for Exercise 7.65
Skewed
A data frame/tibble with 21 observations on two variables
values from a sample of size 16 from a particular population
values from a sample of size 14 from a particular population
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(Skewed$C1, Skewed$C2, col = c("pink", "lightblue"))
wilcox.test(Skewed$C1, Skewed$C2)
Data for Exercise 5.20
Skin
A data frame/tibble with 11 observations on four variables
patient identification number
graft survival time in days for a closely matched skin graft on the same burn patient
graft survival time in days for a poorly matched skin graft on the same burn patient
difference between close and poor (in days)
R. F. Woolson and P. A. Lachenbruch, "Rank Tests for Censored Matched Pairs," Biometrika, 67(1980), 597-606.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Skin$differ)
boxplot(Skin$differ, col = "pink")
summary(Skin$differ)
Data for Exercise 5.116
Slc
A data frame/tibble with 190 observations on one variable
Red blood cell sodium-lithium countertransport
Roeder, K. (1994), "A Graphical Technique for Determining the Number of Components in a Mixture of Normals," Journal of the American Statistical Association, 89, 487-495.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Slc$slc)
hist(Slc$slc, freq = FALSE, xlab = "sodium lithium countertransport",
     main = "", col = "lightblue")
lines(density(Slc$slc), col = "purple")
Data for Exercises 6.40, 6.59, 7.10, and 7.35
Smokyph
A data frame/tibble with 75 observations on three variables
water sample pH level
character variable with values low (elevation below 0.6 miles) and high (elevation above 0.6 miles)
elevation in miles
Schmoyer, R. L. (1994), Permutation Tests for Correlation in Regression Errors, Journal of the American Statistical Association, 89, 1507-1516.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
summary(Smokyph$waterph)
tapply(Smokyph$waterph, Smokyph$code, mean)
stripchart(waterph ~ code, data = Smokyph, method = "stack",
           pch = 19, col = c("red", "blue"))
t.test(Smokyph$waterph, mu = 7)
SIGN.test(Smokyph$waterph, md = 7)
t.test(waterph ~ code, data = Smokyph, alternative = "less")
t.test(waterph ~ code, data = Smokyph, conf.level = 0.90)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Smokyph, aes(x = waterph, fill = code)) +
  geom_dotplot() +
  facet_grid(code ~ .) +
  guides(fill = FALSE)
## End(Not run)
Data for Exercise 8.21
Snore
A data frame/tibble with 2,484 observations on two variables
factor with levels nonsnorer, ocassional snorer, nearly every night, and snores every night
factor indicating whether the individual has heart disease (no or yes)
Norton, P. and Dunn, E. (1985), Snoring as a Risk Factor for Disease, British Medical Journal, 291, 630-632.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~ heartdisease + snore, data = Snore)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 7.87
Snow
A data frame/tibble with 34 observations on two variables
concentration of microparticles from melted snow (in parts per billion)
location of snow sample (Antarctica or Greenland)
Davis, J., Statistics and Data Analysis in Geology, John Wiley, New York.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(concent ~ site, data = Snow, col = c("lightblue", "lightgreen"))
Data for Exercise 1.46
Soccer
A data frame/tibble with 25 observations on one variable
soccer player's weight (in pounds)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Soccer$weight, scale = 2)
hist(Soccer$weight, breaks = seq(110, 210, 10), col = "orange",
     main = "Problem 1.46 \n Weights of Soccer Players",
     xlab = "weight (lbs)", right = FALSE)
Data for Exercise 6.63
Social
A data frame/tibble with 25 observations on one variable
annual income (in dollars) of North Carolina social workers with less than five years of experience
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
SIGN.test(Social$income, md = 27500, alternative = "less")
Data for Exercise 2.42
Sophomor
A data frame/tibble with 20 observations on four variables
identification number
grade point average
SAT math score
final exam grade in college algebra
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
cor(Sophomor)
plot(exam ~ gpa, data = Sophomor)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Sophomor, aes(x = gpa, y = exam)) +
  geom_point()
ggplot2::ggplot(data = Sophomor, aes(x = sat, y = exam)) +
  geom_point()
## End(Not run)
Data for Exercise 1.84
South
A data frame/tibble with 31 observations on one variable
murder rate per 100,000 people
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(South$rate, col = "gray", ylab = "Murder rate per 100,000 people")
Data for Exercise 7.58
Speed
A data frame/tibble with 15 observations on four variables
reading comprehension score before taking a speed-reading course
reading comprehension score after taking a speed-reading course
after - before (comprehension reading scores)
signed ranked differences
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
t.test(Speed$differ, alternative = "greater")
t.test(Speed$signranks, alternative = "greater")
wilcox.test(Pair(Speed$after, Speed$before) ~ 1, data = Speed,
            alternative = "greater")
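The "signed ranked differences" column can be understood through a small base R sketch. The before and after scores below are made up for illustration and are not the Speed data; the sketch also assumes there are no zero differences and no ties, which would otherwise need the usual adjustments.

```r
# Hypothetical scores (not taken from the Speed data set).
before <- c(50, 52, 49, 55, 51)
after  <- c(56, 50, 58, 51, 62)

differ <- after - before                     # 6, -2, 9, -4, 11
signranks <- sign(differ) * rank(abs(differ))

# The Wilcoxon signed-rank statistic V is the sum of the positive signed ranks.
v_stat <- sum(signranks[signranks > 0])
v_check <- unname(wilcox.test(after, before, paired = TRUE)$statistic)
```

Running a t-test on the signed ranks, as the example for this data set does, is a rank-transformation approximation to the signed-rank test rather than the exact procedure.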
Data for Exercise 7.82
Spellers
A data frame/tibble with ten observations on two variables
character variable with values Fourth and Colleague
score on a standardized spelling test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ teacher, data = Spellers, col = "pink")
t.test(score ~ teacher, data = Spellers)
Data for Exercise 7.56
Spelling
A data frame/tibble with nine observations on three variables
spelling score before a 2-week course of instruction
spelling score after a 2-week course of instruction
after - before (spelling score)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Spelling$differ)
qqline(Spelling$differ)
shapiro.test(Spelling$differ)
t.test(Spelling$differ)
Data for Exercise 8.32
Sports
A data frame/tibble with 200 observations on two variables
a factor with levels male and female
a factor with levels football, basketball, baseball, and tennis
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~gender + sport, data = Sports)
T1
chisq.test(T1)
rm(T1)
Data for Exercise 8.33
Spouse
A data frame/tibble with 540 observations on two variables
a factor with levels not prosecuted, pleaded guilty, convicted, and acquited
a factor with levels husband and wife
Bureau of Justice Statistics (September 1995), Spouse Murder Defendants in Large Urban Counties, Executive Summary, NCJ-156831.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~result + spouse, data = Spouse)
T1
chisq.test(T1)
rm(T1)
Computes all possible samples from a given population using simple random sampling.
SRS(POPvalues, n)
POPvalues | vector containing the population values |
n | the sample size |
Returns a matrix containing the possible simple random samples of size n taken from a population POPvalues.
Alan T. Arnholt
SRS(c(5,8,3),2)
# The rows in the matrix list the values for the 3 possible
# simple random samples of size 2 from the population of 5,8, and 3.
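For comparison, the same enumeration can be sketched with base R's combn(). This assumes that SRS lists unordered samples drawn without replacement, which is consistent with the "3 possible" samples of size 2 mentioned above; it is an illustration of the idea, not the function's actual source.

```r
pop <- c(5, 8, 3)
n <- 2

# combn() returns one combination per column; transpose so that each row
# is one possible sample of size n.
samples <- t(combn(pop, n))
n_samples <- nrow(samples)

# The number of rows equals the binomial coefficient choose(N, n).
n_expected <- choose(length(pop), n)
```

The row count grows as choose(N, n), so full enumeration is practical only for small populations.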
Data for Exercise 6.93
Stable
A data frame/tibble with nine observations on one variable
time (in seconds) for horse to run 1 mile
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
SIGN.test(Stable$time, md = 98.5, alternative = "greater")
Data for Statistical Insight Chapter 1 and Exercise 5.110
Stamp
A data frame/tibble with 485 observations on one variable
stamp thickness (in mm)
Izenman, A., Sommer, C. (1988), Philatelic Mixtures and Multimodal Densities, Journal of the American Statistical Association, 83, 941-953.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Stamp$thickness, freq = FALSE, col = "lightblue",
     main = "", xlab = "stamp thickness (mm)")
lines(density(Stamp$thickness), col = "blue")
t.test(Stamp$thickness, conf.level = 0.99)
Data for Exercise 7.30
Statclas
A data frame/tibble with 72 observations on two variables
class meeting time (9am or 2pm)
grade for an introductory statistics class
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
str(Statclas)
boxplot(score ~ class, data = Statclas, col = "red")
t.test(score ~ class, data = Statclas)
Data for Exercise 6.62
Statelaw
A data frame/tibble with 50 observations on two variables
U.S. state
dollars spent per resident on law enforcement
Bureau of Justice Statistics, Law Enforcement Management and Administrative Statistics, 1993, NCJ-148825, September 1995, page 84.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Statelaw$cost)
SIGN.test(Statelaw$cost, md = 8, alternative = "less")
Data for Exercises 1.70 and 1.87
Statisti
A data frame/tibble with 62 observations on two variables
character variable with values Class1 and Class2
test score for an introductory statistics test
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ class, data = Statisti, col = "violet")
tapply(Statisti$score, Statisti$class, summary, na.rm = TRUE)
## Not run:
library(dplyr)
dplyr::group_by(Statisti, class) %>%
  summarize(Mean = mean(score, na.rm = TRUE),
            Median = median(score, na.rm = TRUE),
            SD = sd(score, na.rm = TRUE),
            RS = IQR(score, na.rm = TRUE))
## End(Not run)
Data for Exercise 6.79
Step
A data frame/tibble with 12 observations on one variable
State test of educational progress (STEP) science test score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Step$score)
t.test(Step$score, mu = 80, alternative = "less")
wilcox.test(Step$score, mu = 80, alternative = "less")
Data for Example 7.20
Stress
A data frame/tibble with 12 observations on two variables
short term memory score before being exposed to a stressful situation
short term memory score after being exposed to a stressful situation
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
diff <- Stress$prestress - Stress$poststress
qqnorm(diff)
qqline(diff)
t.test(diff)
## Not run:
wilcox.test(Pair(Stress$prestress, Stress$poststress)~1, data = Stress)
## End(Not run)
Data for Exercise 5.25
Study
A data frame/tibble with 50 observations on one variable
number of hours a week freshmen reported studying for their courses
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Study$hours)
hist(Study$hours, col = "violet")
summary(Study$hours)
Data for Exercises 2.16, 2.45, and 2.59
Submarin
A data frame/tibble with 16 observations on three variables
month
number of submarines reported sunk by U.S. Navy
number of submarines actually sunk by U.S. Navy
F. Mosteller, S. Fienberg, and R. Rourke, Beginning Statistics with Data Analysis (Reading, MA: Addison-Wesley, 1983).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(actual ~ reported, data = Submarin)
summary(model)
plot(actual ~ reported, data = Submarin)
abline(model, col = "red")
rm(model)
Data for Exercise 5.19
Subway
A data frame/tibble with 30 observations on one variable
time (in minutes) it takes a subway to travel from the airport to downtown
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Subway$time, main = "Exercise 5.19",
     xlab = "Time (in minutes)", col = "purple")
summary(Subway$time)
Data for Example 1.7
Sunspot
A data frame/tibble with 301 observations on two variables
year
average number of sunspots for the year
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(sunspots ~ year, data = Sunspot, type = "l")
## Not run:
library(ggplot2)
lattice::xyplot(sunspots ~ year, data = Sunspot,
                main = "Yearly sunspots", type = "l")
lattice::xyplot(sunspots ~ year, data = Sunspot, type = "l",
                main = "Yearly sunspots", aspect = "xy")
ggplot2::ggplot(data = Sunspot, aes(x = year, y = sunspots)) +
  geom_line() +
  theme_bw()
## End(Not run)
Data for Exercise 1.54
Superbowl
A data frame/tibble with 35 observations on five variables
name of Superbowl winning team
winning score for the Superbowl
name of Superbowl losing team
score of losing team
winner_score - loser_score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Superbowl$victory_margin)
Data for Statistical Insight Chapter 10
Supercar
A data frame/tibble with 30 observations on two variables
top speed (in miles per hour) of car without redlining
name of sports car
Car and Driver (July 1995).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(speed ~ car, data = Supercar, col = rainbow(6),
        ylab = "Speed (mph)")
summary(aov(speed ~ car, data = Supercar))
anova(lm(speed ~ car, data = Supercar))
Data for Exercise 5.63
Tablrock
A data frame/tibble with 719 observations on the following 17 variables.
date
time of day
ozone concentration
temperature (in Celsius)
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
summary(Tablrock$ozone)
boxplot(Tablrock$ozone)
qqnorm(Tablrock$ozone)
qqline(Tablrock$ozone)
par(mar = c(5.1 - 1, 4.1 + 2, 4.1 - 2, 2.1))
boxplot(ozone ~ day, data = Tablrock,
        horizontal = TRUE, las = 1, cex.axis = 0.7)
par(mar = c(5.1, 4.1, 4.1, 2.1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Tablrock, aes(sample = ozone)) +
  geom_qq() +
  theme_bw()
ggplot2::ggplot(data = Tablrock, aes(x = as.factor(day), y = ozone)) +
  geom_boxplot(fill = "pink") +
  coord_flip() +
  labs(x = "") +
  theme_bw()
## End(Not run)
Data for Exercise 5.114
Teacher
A data frame/tibble with 51 observations on three variables
U.S. state
academic year
average salary (in dollars)
National Education Association.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(mfrow = c(3, 1))
hist(Teacher$salary[Teacher$year == "1973-74"],
     main = "Teacher salary 1973-74", xlab = "salary",
     xlim = range(Teacher$salary, na.rm = TRUE))
hist(Teacher$salary[Teacher$year == "1983-84"],
     main = "Teacher salary 1983-84", xlab = "salary",
     xlim = range(Teacher$salary, na.rm = TRUE))
hist(Teacher$salary[Teacher$year == "1993-94"],
     main = "Teacher salary 1993-94", xlab = "salary",
     xlim = range(Teacher$salary, na.rm = TRUE))
par(mfrow = c(1, 1))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Teacher, aes(x = salary)) +
  geom_histogram(fill = "purple", color = "black") +
  facet_grid(year ~ .) +
  theme_bw()
## End(Not run)
Data for Exercise 6.56
Tenness
A data frame/tibble with 20 observations on one variable
Tennessee Self-Concept Scale score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Tenness$score, freq = FALSE, main = "", col = "green",
     xlab = "Tennessee Self-Concept Scale score")
lines(density(Tenness$score))
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Tenness, aes(x = score, y = ..density..)) +
  geom_histogram(binwidth = 2, fill = "purple", color = "black") +
  geom_density(color = "red", fill = "pink", alpha = 0.3) +
  theme_bw()
## End(Not run)
Data for Example 7.11
Tensile
A data frame/tibble with 72 observations on two variables
plastic bag tensile strength (pounds per square inch)
factor with run number (1 or 2)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(tensile ~ run, data = Tensile, col = c("purple", "cyan"))
t.test(tensile ~ run, data = Tensile)
Data for Exercise 5.80
Test1
A data frame/tibble with 25 observations on one variable
score on first statistics exam
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Test1$score)
boxplot(Test1$score, col = "purple")
Data for Example 9.5
Thermal
A data frame/tibble with 12 observations on two variables
temperature (degrees Celsius)
heat loss (BTUs)
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
model <- lm(loss ~ temp, data = Thermal)
summary(model)
plot(loss ~ temp, data = Thermal)
abline(model, col = "red")
rm(model)
Data for your enjoyment
Tiaa
A data frame/tibble with 365 observations on four variables
closing price (in dollars)
closing price (in dollars)
closing price (in dollars)
day of the year
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
data(Tiaa)
Data for Exercise 5.18
Ticket
A data frame/tibble with 20 observations on one variable
time (in seconds) to check out a reservation
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Ticket$time)
Data for Exercise 9.36
Toaster
A data frame/tibble with 17 observations on three variables
name of toaster
Consumer Reports score
price of toaster (in dollars)
Consumer Reports (October 1994).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(cost ~ score, data = Toaster)
model <- lm(cost ~ score, data = Toaster)
summary(model)
names(summary(model))
summary(model)$r.squared
plot(model, which = 1)
Data for Exercise 2.78
Tonsils
Tonsils
A data frame/tibble with 1,398 observations on two variables
a factor with levels Normal, Large, and Very Large
a factor with levels Carrier and Non-carrier
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~size + status, data = Tonsils)
T1
prop.table(T1, 1)
prop.table(T1, 1)[2, 1]
barplot(t(T1), legend = TRUE, beside = TRUE, col = c("red", "green"))
## Not run:
library(dplyr)
library(ggplot2)
NDF <- dplyr::count(Tonsils, size, status)
ggplot2::ggplot(data = NDF, aes(x = size, y = n, fill = status)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("red", "green")) +
  theme_bw()
## End(Not run)
Data for Exercise 5.13
Tort
Tort
A data frame/tibble with 45 observations on five variables
U.S. county
average number of months to process a tort
population of the county
number of torts
rate per 10,000 residents
U.S. Department of Justice, Tort Cases in Large Counties, Bureau of Justice Statistics Special Report, April 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
EDA(Tort$months)
Data for Exercises 1.55, 5.08, 5.109, 8.58, and 10.35
Toxic
Toxic
A data frame/tibble with 51 observations on five variables
U.S. state
U.S. region
number of commercial hazardous waste sites
percent of minorities living in communities with commercial hazardous waste sites
a numeric vector
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
hist(Toxic$sites, col = "red")
hist(Toxic$minority, col = "blue")
qqnorm(Toxic$minority)
qqline(Toxic$minority)
boxplot(sites ~ region, data = Toxic, col = "lightgreen")
tapply(Toxic$sites, Toxic$region, median)
kruskal.test(sites ~ factor(region), data = Toxic)
Data for Exercises 2.97, 5.115, and 9.62
Track
Track
A data frame with 55 observations on eight variables
athlete's country
time in seconds for 100 m
time in seconds for 200 m
time in seconds for 400 m
time in minutes for 800 m
time in minutes for 1500 m
time in minutes for 3000 m
time in minutes for marathon
Dawkins, B. (1989), "Multivariate Analysis of National Track Records," The American Statistician, 43(2), 110-115.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(`200m` ~ `100m`, data = Track)
plot(`400m` ~ `100m`, data = Track)
plot(`400m` ~ `200m`, data = Track)
cor(Track[, 2:8])
Data for Exercise 1.36
Track15
Track15
A data frame/tibble with 26 observations on two variables
Olympic year
Olympic winning time (in seconds) for the 1500-meter run
The World Almanac and Book of Facts, 2000.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(time ~ year, data = Track15, type = "b", pch = 19,
     ylab = "1500m time in seconds", col = "green")
Data for Exercise 10.44
Treatments
Treatments
A data frame/tibble with 24 observations on two variables
score from an experiment
factor with levels 1, 2, and 3
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(score ~ group, data = Treatments, col = "violet")
summary(aov(score ~ group, data = Treatments))
summary(lm(score ~ group, data = Treatments))
anova(lm(score ~ group, data = Treatments))
Data for Exercise 1.50
Trees
Trees
A data frame/tibble with 20 observations on one variable
number of trees in a grid
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Trees$number)
hist(Trees$number, main = "Exercise 1.50", xlab = "number", col = "brown")
Data for Example 10.2
Trucks
Trucks
A data frame/tibble with 15 observations on two variables
miles per gallon
a factor with levels chevy, dodge, and ford
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(mpg ~ truck, data = Trucks, horizontal = TRUE, las = 1)
summary(aov(mpg ~ truck, data = Trucks))
Performs a one-sample, standard two-sample, or Welch modified two-sample t-test based on user-supplied summary information. Output is identical to that produced with t.test.
tsum.test(
  mean.x,
  s.x = NULL,
  n.x = NULL,
  mean.y = NULL,
  s.y = NULL,
  n.y = NULL,
  alternative = "two.sided",
  mu = 0,
  var.equal = FALSE,
  conf.level = 0.95
)
mean.x | a single number representing the sample mean of x |
s.x | a single number representing the sample standard deviation for x |
n.x | a single number representing the sample size for x |
mean.y | a single number representing the sample mean of y |
s.y | a single number representing the sample standard deviation for y |
n.y | a single number representing the sample size for y |
alternative | a character string, one of "greater", "less", or "two.sided", specifying the alternative hypothesis |
mu | a single number representing the value of the mean or difference in means specified by the null hypothesis |
var.equal | logical flag: if TRUE, the standard two-sample t-test is performed; if FALSE, the Welch modified two-sample t-test is performed |
conf.level | the confidence level for the returned confidence interval; it must lie between zero and one |
If y is NULL, a one-sample t-test is carried out with x. If y is not NULL, either a standard or Welch modified two-sample t-test is performed, depending on whether var.equal is TRUE or FALSE.
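The difference between the standard and Welch modified tests can be sketched in base R. The helper t_from_summary below is hypothetical (it is not part of BSDA) and assumes the textbook pooled-variance and Welch-Satterthwaite formulas; it illustrates the idea and is not the package's implementation.

```r
# Hypothetical helper, not part of BSDA: two-sample t statistic and degrees
# of freedom from summary statistics, assuming the textbook formulas.
t_from_summary <- function(mean.x, s.x, n.x, mean.y, s.y, n.y,
                           mu = 0, var.equal = FALSE) {
  if (var.equal) {
    # Standard test: pool the two sample variances
    df <- n.x + n.y - 2
    sp2 <- ((n.x - 1) * s.x^2 + (n.y - 1) * s.y^2) / df
    se <- sqrt(sp2 * (1 / n.x + 1 / n.y))
  } else {
    # Welch modified test: separate variances, Satterthwaite df
    vx <- s.x^2 / n.x
    vy <- s.y^2 / n.y
    se <- sqrt(vx + vy)
    df <- (vx + vy)^2 / (vx^2 / (n.x - 1) + vy^2 / (n.y - 1))
  }
  c(t = (mean.x - mean.y - mu) / se, df = df)
}

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)

welch  <- t_from_summary(mean(x), sd(x), length(x), mean(y), sd(y), length(y))
pooled <- t_from_summary(mean(x), sd(x), length(x), mean(y), sd(y), length(y),
                         var.equal = TRUE)

# Reference values from base R's t.test() on the raw data
ref_welch  <- t.test(x, y)
ref_pooled <- t.test(x, y, var.equal = TRUE)
```

Only the means, standard deviations, and sample sizes enter the helper, which is why a summary-based test can reproduce the raw-data result.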
A list of class htest, containing the following components:
statistic | the t-statistic, with names attribute "t" |
parameters | the degrees of freedom of the t-distribution associated with statistic. Component parameters has names attribute "df" |
p.value | the p-value for the test |
conf.int | a confidence interval (vector of length 2) for the true mean or difference in means. The confidence level is recorded in the attribute conf.level |
estimate | vector of length 1 or 2, giving the sample mean(s) or mean of differences; these estimate the corresponding population parameters. Component estimate has a names attribute describing its elements |
null.value | the value of the mean or difference in means specified by the null hypothesis. This equals the input argument mu |
alternative | records the value of the input argument alternative: "greater", "less", or "two.sided" |
data.name | a character string (vector of length 1) containing the names x and y for the two summarized samples |
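As a rough illustration of how components of an htest object are read, the sketch below uses base R's t.test(), which returns an object of the same class. It is a stand-in and does not call tsum.test() itself; the values mu = 2 and conf.level = 0.90 are chosen only for the example.

```r
# Base R's t.test() also returns an object of class "htest"; it is used here
# as a stand-in, so this sketch runs without BSDA installed.
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
res <- t.test(x, y, mu = 2, conf.level = 0.90)

class(res)                        # "htest"
res$statistic                     # t statistic, names attribute "t"
res$p.value                       # p-value for the test
res$conf.int                      # interval for the difference in means
attr(res$conf.int, "conf.level")  # 0.9
res$estimate                      # the two sample means
res$null.value                    # hypothesized difference (the input mu)
res$alternative                   # "two.sided"
```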
For the one-sample t-test, the null hypothesis is that the mean of the population from which x is drawn is mu.
For the standard and Welch modified two-sample t-tests, the null hypothesis is that the population mean for x minus that for y is mu.
The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means for x and y) from mu (i.e., "greater", "less", or "two.sided").
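How the alternative changes the p-value can be sketched with base R's pt(), using the summary values quoted in Problem 6.31 (mean 5.6, s = 2.1, n = 16, null value 4.9). This is a hand computation under the usual one-sample t formula, not a call into BSDA.

```r
# Summary values from Problem 6.31
mean.x <- 5.6
s.x <- 2.1
n.x <- 16
mu0 <- 4.9

# One-sample t statistic and its degrees of freedom
t_stat <- (mean.x - mu0) / (s.x / sqrt(n.x))
df <- n.x - 1

p_greater <- pt(t_stat, df, lower.tail = FALSE)            # H1: mean > mu0
p_less    <- pt(t_stat, df)                                # H1: mean < mu0
p_two     <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)   # H1: mean != mu0

c(greater = p_greater, less = p_less, two.sided = p_two)
```

The one-sided p-values sum to one, and the two-sided value is twice the smaller of the two.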
Alan T. Arnholt
Kitchens, L.J. (2003). Basic Statistics and Data Analysis. Duxbury.
Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.
Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.
tsum.test(mean.x=5.6, s.x=2.1, n.x=16, mu=4.9, alternative="greater")
# Problem 6.31 on page 324 of BSDA states: The chamber of commerce
# of a particular city claims that the mean carbon dioxide
# level of air pollution is no greater than 4.9 ppm. A random
# sample of 16 readings resulted in a sample mean of 5.6 ppm,
# and s=2.1 ppm. One-sided one-sample t-test. The null
# hypothesis is that the population mean for 'x' is 4.9.
# The alternative hypothesis states that it is greater than 4.9.

x <- rnorm(12)
tsum.test(mean(x), sd(x), n.x=12)
# Two-sided one-sample t-test. The null hypothesis is that
# the population mean for 'x' is zero. The alternative
# hypothesis states that it is either greater or less
# than zero. A confidence interval for the population mean
# will be computed. Note: above returns same answer as:
t.test(x)

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
tsum.test(mean(x), s.x=sd(x), n.x=11, mean(y), s.y=sd(y), n.y=8, mu=2)
# Two-sided Welch modified two-sample t-test (var.equal is FALSE
# by default). The null hypothesis is that the population mean
# for 'x' minus that for 'y' is 2. The alternative hypothesis is
# that this difference is not 2. A confidence interval for the
# true difference will be computed.
# Note: above returns same answer as:
t.test(x, y, mu=2)

tsum.test(mean(x), s.x=sd(x), n.x=11, mean(y), s.y=sd(y), n.y=8, conf.level=0.90)
# Two-sided Welch modified two-sample t-test. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is zero.
# The alternative hypothesis is that this difference is not
# zero. A 90% confidence interval for the true difference will
# be computed. Note: above returns same answer as:
t.test(x, y, conf.level=0.90)
Data for Examples 2.1 and 2.7
Tv
Tv
A data frame/tibble with 53 observations on three variables
U.S. state
percent of students who watch more than six hours of TV a day
state average on national math test
Educational Testing Services.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(test ~ percent, data = Tv, col = "blue")
cor(Tv$test, Tv$percent)
Data for Exercise 7.54
Twin
Twin
A data frame/tibble with nine observations on three variables
score on intelligence test without drug
score on intelligence test after taking drug
twinA - twinB
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
qqnorm(Twin$differ)
qqline(Twin$differ)
shapiro.test(Twin$differ)
t.test(Twin$differ)
Data for Exercise 1.15
Undergrad
Undergrad
A data frame/tibble with 100 observations on six variables
character variable with values Female and Male
college major
college year group classification
grade point average
Scholastic Assessment Test score
number of courses dropped
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stripchart(gpa ~ class, data = Undergrad, method = "stack",
           col = c("blue", "red", "green", "lightblue"),
           pch = 19, main = "GPA versus Class")
stripchart(gpa ~ gender, data = Undergrad, method = "stack",
           col = c("red", "blue"), pch = 19,
           main = "GPA versus Gender")
stripchart(sat ~ drops, data = Undergrad, method = "stack",
           col = c("blue", "red", "green", "lightblue"),
           pch = 19, main = "SAT versus Drops")
stripchart(drops ~ gender, data = Undergrad, method = "stack",
           col = c("red", "blue"), pch = 19,
           main = "Drops versus Gender")
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Undergrad, aes(x = sat, y = drops, fill = factor(drops))) +
  facet_grid(drops ~ .) +
  geom_dotplot() +
  guides(fill = "none")  # fill = FALSE is deprecated in recent ggplot2
## End(Not run)
Data for Exercises 6.46 and 6.98
Vacation
Vacation
A data frame/tibble with 35 observations on one variable
number of days of paid holidays and vacation leave taken
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(Vacation$number, col = "violet")
hist(Vacation$number, main = "Exercise 6.46", col = "blue",
     xlab = "number of days of paid holidays and vacation leave taken")
t.test(Vacation$number, mu = 24)
Data for Exercise 1.111
Vaccine
Vaccine
A data frame/tibble with 11 observations on two variables
U.S. state
number of reported serious reactions per million doses of a vaccine
Center for Disease Control, Atlanta, Georgia.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Vaccine$number, scale = 2)
fn <- fivenum(Vaccine$number)
fn
iqr <- IQR(Vaccine$number)
iqr
Data for Exercise 8.34
Vehicle
Vehicle
A data frame/tibble with 151 observations on two variables
a factor with levels domestic and foreign
a factor with levels Much better than average, Above average, Average, Below average, and Much worse than average
Insurance Institute for Highway Safety and the Highway Loss Data Institute, 1995.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~make + rating, data = Vehicle)
T1
chisq.test(T1)
Data for Exercise 9.30
Verbal
Verbal
A data frame/tibble with 15 observations on two variables
number of library books checked out
verbal test score
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(verbal ~ number, data = Verbal)
abline(lm(verbal ~ number, data = Verbal), col = "red")
summary(lm(verbal ~ number, data = Verbal))
Data for Exercise 2.98
Victoria
Victoria
A data frame/tibble with 20 observations on three variables
year
mean annual level of Lake Victoria Nyanza
number of sunspots
N. Shaw, Manual of Meteorology, Vol. 1 (London: Cambridge University Press, 1942), p. 284; and F. Mosteller and J. W. Tukey, Data Analysis and Regression (Reading, MA: Addison-Wesley, 1977).
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(level ~ sunspot, data = Victoria)
model <- lm(level ~ sunspot, data = Victoria)
summary(model)
rm(model)
Data for Exercise 7.44
Viscosit
Viscosit
A data frame/tibble with 11 observations on two variables
viscosity measurement for a certain substance on day one
viscosity measurement for a certain substance on day two
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(Viscosit$first, Viscosit$second, col = "blue")
t.test(Viscosit$first, Viscosit$second, var.equal = TRUE)
Data for Exercise 5.6
Visual
Visual
A data frame/tibble with 18 observations on one variable
visual acuity measurement
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
stem(Visual$visual)
boxplot(Visual$visual, col = "purple")
Data for Exercise 7.80
Vocab
Vocab
A data frame/tibble with 14 observations on two variables
reading test score before formal vocabulary training
reading test score after formal vocabulary training
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
t.test(Pair(Vocab$first, Vocab$second) ~ 1)
Data for Exercise 9.18
Wastewat
Wastewat
A data frame/tibble with 44 observations on two variables
injected water (in million gallons)
number of earthquakes detected in Denver
Davis, J. C. (1986), Statistics and Data Analysis in Geology, 2nd ed., John Wiley and Sons, New York, p. 228, and Bardwell, G. E. (1970), Some Statistical Features of the Relationship between Rocky Mountain Arsenal Waste Disposal and Frequency of Earthquakes, Geological Society of America, Engineering Geology Case Histories, 8, 33-337.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(number ~ gallons, data = Wastewat)
model <- lm(number ~ gallons, data = Wastewat)
summary(model)
anova(model)
plot(model, which = 2)
Data for Exercise 1.30
Weather94
Weather94
A data frame/tibble with 388 observations on one variable
factor with levels Extreme Temp, Flash Flood, Fog, High Wind, Hurricane, Lighting, Other, River Flood, Thunderstorm, Tornado, and Winter Weather
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
T1 <- xtabs(~type, data = Weather94)
T1
par(mar = c(5.1 + 2, 4.1 - 1, 4.1 - 2, 2.1))
barplot(sort(T1, decreasing = TRUE), las = 2, col = rainbow(11))
par(mar = c(5.1, 4.1, 4.1, 2.1))
## Not run:
library(ggplot2)
T2 <- as.data.frame(T1)
T2
ggplot2::ggplot(data = T2, aes(x = reorder(type, Freq), y = Freq)) +
  geom_bar(stat = "identity", fill = "purple") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 55, vjust = 0.5)) +
  labs(x = "", y = "count")
## End(Not run)
Data for Exercise 2.11
Wheat
Wheat
A data frame/tibble with 19 observations on three variables
year
national weekly earnings (in dollars) for production workers
price for a bushel of wheat (in dollars)
The World Almanac and Book of Facts, 2000.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
par(mfrow = c(1, 2))
plot(earnings ~ year, data = Wheat)
plot(price ~ year, data = Wheat)
par(mfrow = c(1, 1))
Data for Exercise 9.34
Windmill
Windmill
A data frame/tibble with 25 observations on two variables
wind velocity (miles per hour)
power generated (DC volts)
Joglekar, et al. (1989), Lack of Fit Testing when Replicates Are Not Available, The American Statistician, 43(3), 135-143.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
summary(lm(output ~ velocity, data = Windmill))
anova(lm(output ~ velocity, data = Windmill))
Data for Exercise 6.54
Window
Window
A data frame/tibble with nine observations on two variables
window number
percent leakage from a 50 mph wind
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
SIGN.test(Window$leakage, md = 0.125, alternative = "greater")
Data for Exercise 9.23
Wins
Wins
A data frame with 12 observations on nine variables
name of team
number of wins
batting average
runs batted in
bases stolen
number of strikeouts
number of times caught stealing
number of errors
earned run average
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(wins ~ era, data = Wins)
## Not run:
library(ggplot2)
ggplot2::ggplot(data = Wins, aes(x = era, y = wins)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
## End(Not run)
Data for Exercise 7.42
Wool
Wool
A data frame/tibble with 20 observations on two variables
type of wool (Type I, Type 2)
strength of wool
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
boxplot(strength ~ type, data = Wool, col = c("blue", "purple"))
t.test(strength ~ type, data = Wool, var.equal = TRUE)
Data for Exercise 2.7
Yearsunspot
Yearsunspot
A data frame/tibble with 252 observations on two variables
average number of sunspots
date
NASA/Marshall Space Flight Center, Huntsville, AL 35812.
Kitchens, L. J. (2003) Basic Statistics and Data Analysis. Pacific Grove, CA: Brooks/Cole, a division of Thomson Learning.
plot(number ~ year, data = Yearsunspot)
This function is based on the standard normal distribution and creates confidence intervals and tests hypotheses for both one and two sample problems.
z.test(
  x,
  y = NULL,
  alternative = "two.sided",
  mu = 0,
  sigma.x = NULL,
  sigma.y = NULL,
  conf.level = 0.95
)
x | numeric vector |
y | numeric vector |
alternative | character string, one of "greater", "less", or "two.sided", specifying the alternative hypothesis |
mu | a single number representing the value of the mean or difference in means specified by the null hypothesis |
sigma.x | a single number representing the population standard deviation for x |
sigma.y | a single number representing the population standard deviation for y |
conf.level | confidence level for the returned confidence interval, restricted to lie between zero and one |
If y is NULL, a one-sample z-test is carried out with x. If y is not NULL, a standard two-sample z-test is performed.
A list of class htest, containing the following components:
statistic | the z-statistic, with names attribute "z" |
p.value | the p-value for the test |
conf.int | a confidence interval (vector of length 2) for the true mean or difference in means. The confidence level is recorded in the attribute conf.level |
estimate | vector of length 1 or 2, giving the sample mean(s) or mean of differences; these estimate the corresponding population parameters. Component estimate has a names attribute describing its elements |
null.value | the value of the mean or difference in means specified by the null hypothesis. This equals the input argument mu |
alternative | records the value of the input argument alternative: "greater", "less", or "two.sided" |
data.name | a character string (vector of length 1) containing the actual names of the input vectors x and y |
For the one-sample z-test, the null hypothesis is that the mean of the population from which x is drawn is mu.
For the standard two-sample z-test, the null hypothesis is that the population mean for x minus that for y is mu.
The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means for x and y) from mu (i.e., "greater", "less", or "two.sided").
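The one-sample case can be sketched by hand with base R's pnorm() and qnorm(). The helper z_one_sample is hypothetical (it is not BSDA's z.test()), covers only the two-sided alternative, and the null value mu = 7 is picked purely for illustration.

```r
# Hypothetical helper, not BSDA's z.test(): two-sided one-sample z-test
# with a known population standard deviation.
z_one_sample <- function(x, sigma.x, mu = 0, conf.level = 0.95) {
  n <- length(x)
  se <- sigma.x / sqrt(n)                  # standard error of the mean
  z <- (mean(x) - mu) / se                 # z statistic
  half <- qnorm(1 - (1 - conf.level) / 2) * se
  list(
    statistic = c(z = z),
    p.value = 2 * pnorm(abs(z), lower.tail = FALSE),
    conf.int = c(mean(x) - half, mean(x) + half),
    estimate = mean(x)
  )
}

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
res <- z_one_sample(x, sigma.x = 0.5, mu = 7)
res$statistic
res$p.value
res$conf.int
```

Because sigma.x is treated as known, the interval width depends only on sigma.x, the sample size, and conf.level, not on the spread of the observed data.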
Alan T. Arnholt
Kitchens, L.J. (2003). Basic Statistics and Data Analysis. Duxbury.
Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.
Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.
x <- rnorm(12)
z.test(x, sigma.x=1)
# Two-sided one-sample z-test where the assumed value for
# sigma.x is one. The null hypothesis is that the population
# mean for 'x' is zero. The alternative hypothesis states
# that it is either greater or less than zero. A confidence
# interval for the population mean will be computed.

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5., 4.1, 5.5)
z.test(x, sigma.x=0.5, y, sigma.y=0.5, mu=2)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is 2.
# The alternative hypothesis is that this difference is not 2.
# A confidence interval for the true difference will be computed.

z.test(x, sigma.x=0.5, y, sigma.y=0.5, conf.level=0.90)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is zero.
# The alternative hypothesis is that this difference is not
# zero. A 90% confidence interval for the true difference will
# be computed.
rm(x, y)
This function is based on the standard normal distribution and creates confidence intervals and tests hypotheses for both one and two sample problems based on summarized information the user passes to the function. Output is identical to that produced with z.test.
zsum.test(
  mean.x,
  sigma.x = NULL,
  n.x = NULL,
  mean.y = NULL,
  sigma.y = NULL,
  n.y = NULL,
  alternative = "two.sided",
  mu = 0,
  conf.level = 0.95
)
mean.x | a single number representing the sample mean of x |
sigma.x | a single number representing the population standard deviation for x |
n.x | a single number representing the sample size for x |
mean.y | a single number representing the sample mean of y |
sigma.y | a single number representing the population standard deviation for y |
n.y | a single number representing the sample size for y |
alternative | a character string, one of "greater", "less", or "two.sided", specifying the alternative hypothesis |
mu | a single number representing the value of the mean or difference in means specified by the null hypothesis |
conf.level | confidence level for the returned confidence interval, restricted to lie between zero and one |
If y is NULL, a one-sample z-test is carried out with x. If y is not NULL, a standard two-sample z-test is performed.
A list of class htest, containing the following components:
statistic |
the z-statistic, with names attribute |
p.value |
the p-value for the test |
conf.int |
is a confidence
interval (vector of length 2) for the true mean or difference in means. The
confidence level is recorded in the attribute |
estimate |
vector of
length 1 or 2, giving the sample mean(s) or mean of differences; these
estimate the corresponding population parameters. Component |
null.value |
the value
of the mean or difference in means specified by the null hypothesis. This
equals the input argument |
alternative |
records the value of
the input argument alternative: |
data.name |
a character string (vector of length
1) containing the names |
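To make the layout of these components concrete, the sketch below assembles an htest-style list by hand for a two-sided one-sample case, using base R only. The summary values, the method string, and the data.name string are illustrative assumptions, and the object is built manually rather than returned by zsum.test.

```r
# Hypothetical summary statistics (illustrative values, not from the book).
mean_x <- 10; sigma_x <- 2; n_x <- 4
mu <- 9; conf_level <- 0.95

se   <- sigma_x / sqrt(n_x)               # standard error of the mean
z    <- (mean_x - mu) / se                # z statistic
crit <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value

# Hand-built list mimicking the documented components of an htest object.
res <- structure(
  list(
    statistic   = c(z = z),
    p.value     = 2 * pnorm(-abs(z)),
    conf.int    = structure(mean_x + c(-1, 1) * crit * se,
                            conf.level = conf_level),
    estimate    = c("mean of x" = mean_x),
    null.value  = c(mean = mu),
    alternative = "two.sided",
    method      = "One-sample z-Test (hand-built sketch)",
    data.name   = "summarized x"
  ),
  class = "htest"
)

res$statistic                      # named numeric, name "z"
attr(res$conf.int, "conf.level")   # confidence level stored as an attribute
```

Because the list has class htest, calling `print(res)` dispatches to the standard htest print method, and individual components can be extracted with `$` as shown.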
For the one-sample z-test, the null hypothesis is that the mean of the population from which x is drawn is mu. For the standard two-sample z-tests, the null hypothesis is that the population mean for x less that for y is mu.
The alternative hypothesis in each case indicates the direction of divergence of the population mean for x (or difference of means of x and y) from mu (i.e., "greater", "less", "two.sided").
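The direction named by the alternative determines which tail of the standard normal distribution gives the p-value. A minimal base R sketch of that mapping follows. The helper name `z_p_value` is hypothetical and the code assumes the standard tail-area definitions rather than reproducing the package source.

```r
# Hypothetical helper: p-value of a z statistic under each alternative.
z_p_value <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),             # both tails
    less      = pnorm(z),                       # lower tail
    greater   = pnorm(z, lower.tail = FALSE)    # upper tail
  )
}

z <- 1.5   # illustrative statistic
p_two     <- z_p_value(z, "two.sided")
p_less    <- z_p_value(z, "less")
p_greater <- z_p_value(z, "greater")
```

For any z, the "less" and "greater" p-values sum to one, and the two-sided p-value is twice the smaller of the two.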
Alan T. Arnholt
Kitchens, L. J. (2003). Basic Statistics and Data Analysis. Duxbury.
Hogg, R. V. and Craig, A. T. (1970). Introduction to Mathematical Statistics, 3rd ed. Toronto, Canada: Macmillan.
Mood, A. M., Graybill, F. A. and Boes, D. C. (1974). Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill.
Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.
zsum.test(mean.x = 56/30, sigma.x = 2, n.x = 30, alternative = "greater", mu = 1.8)
# Example 9.7 part a. from PASWR.

x <- rnorm(12)
zsum.test(mean(x), sigma.x = 1, n.x = 12)
# Two-sided one-sample z-test where the assumed value for
# sigma.x is one. The null hypothesis is that the population
# mean for 'x' is zero. The alternative hypothesis states
# that it is either greater or less than zero. A confidence
# interval for the population mean will be computed.
# Note: returns same answer as:
z.test(x, sigma.x = 1)

x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
zsum.test(mean(x), sigma.x = 0.5, n.x = 11, mean(y), sigma.y = 0.5, n.y = 8, mu = 2)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is 2.
# The alternative hypothesis is that this difference is not 2.
# A confidence interval for the true difference will be computed.
# Note: returns same answer as:
z.test(x, sigma.x = 0.5, y, sigma.y = 0.5, mu = 2)

zsum.test(mean(x), sigma.x = 0.5, n.x = 11, mean(y), sigma.y = 0.5, n.y = 8, conf.level = 0.90)
# Two-sided standard two-sample z-test where sigma.x and
# sigma.y are both assumed to equal 0.5. The null hypothesis
# is that the population mean for 'x' minus that for 'y' is zero.
# The alternative hypothesis is that this difference is not
# zero. A 90% confidence interval for the true difference will
# be computed.
# Note: returns same answer as:
z.test(x, sigma.x = 0.5, y, sigma.y = 0.5, conf.level = 0.90)
rm(x, y)