This function is a wrapper that calls specify(), hypothesize(), and calculate() consecutively to calculate observed statistics from data. hypothesize() is only called if a point null hypothesis parameter is supplied.

Learn more in vignette("infer").
observe(
  x,
  formula,
  response = NULL,
  explanatory = NULL,
  success = NULL,
  null = NULL,
  p = NULL,
  mu = NULL,
  med = NULL,
  sigma = NULL,
  stat = c("mean", "median", "sum", "sd", "prop", "count", "diff in means",
    "diff in medians", "diff in props", "Chisq", "F", "slope", "correlation", "t", "z",
    "ratio of props", "odds ratio"),
  order = NULL,
  ...
)
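Argument | Description |
---|---|
x | A data frame that can be coerced into a tibble. |
formula | A formula with the response variable on the left and the explanatory on the right. Alternatively, a response and explanatory argument can be supplied. |
response | The variable name in x that will serve as the response. This is an alternative to using the formula argument. |
explanatory | The variable name in x that will serve as the explanatory variable. This is an alternative to using the formula argument. |
success | The level of response that will be considered a success, as a string. |
null | The null hypothesis. Options include "independence" and "point". |
p | The true proportion of successes (a number between 0 and 1). To be used with point null hypotheses when the specified response variable is categorical. |
mu | The true mean (any numerical value). To be used with point null hypotheses when the specified response variable is continuous. |
med | The true median (any numerical value). To be used with point null hypotheses when the specified response variable is continuous. |
sigma | The true standard deviation (any numerical value). To be used with point null hypotheses. |
stat | A string giving the type of the statistic to calculate. Current options include "mean", "median", "sum", "sd", "prop", "count", "diff in means", "diff in medians", "diff in props", "Chisq", "F", "slope", "correlation", "t", "z", "ratio of props", and "odds ratio". |
order | A string vector specifying the order in which the levels of the explanatory variable should be ordered for subtraction (or division for ratio-based statistics), where order = c("first", "second") means ("first" - "second"), or the analogue for ratios. |
... | To pass options like na.rm = TRUE into functions like mean(), sd(), etc. |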
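
As a hedged illustration of how success, p, and order fit together, the sketch below computes a sample proportion, a z statistic against a point null, and a difference in proportions. It assumes the gss data set bundled with infer, with a sex column whose levels are "male" and "female" and a college column whose levels are "no degree" and "degree". The object names (obs_prop, obs_z, obs_diff) are chosen for illustration only.

```r
library(infer)

# Sample proportion of respondents recorded as "female".
# `success` names the level of the response that counts as a success.
obs_prop <- observe(gss, sex ~ NULL, success = "female", stat = "prop")

# z statistic for the same proportion against a point null of p = 0.5.
# Supplying `p` is what triggers the hypothesize() step.
obs_z <- observe(
  gss,
  sex ~ NULL,
  success = "female",
  stat = "z",
  null = "point",
  p = 0.5
)

# Difference in the proportion holding a degree, female minus male.
# `order` fixes the direction of the subtraction.
obs_diff <- observe(
  gss,
  college ~ sex,
  success = "degree",
  stat = "diff in props",
  order = c("female", "male")
)

obs_prop
obs_z
obs_diff
```

Reversing order to c("male", "female") should flip the sign of the difference, which is a quick way to confirm the direction of subtraction.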
A 1-column tibble containing the calculated statistic stat.
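
Because the result is a 1-column tibble, it can be unpacked into a plain number or handed to other infer functions that accept an observed statistic. The following is a minimal sketch of that pattern, assuming the gss data set bundled with infer; the bootstrap null distribution, the seed, and the number of replicates are illustrative choices, not part of observe() itself.

```r
library(infer)

# Observed mean number of hours worked per week, as a 1 x 1 tibble.
obs_mean <- observe(gss, hours ~ NULL, stat = "mean")

# Extract the bare numeric value from the `stat` column.
obs_value <- obs_mean$stat

# Build a null distribution for mu = 40 with the core verbs
# (illustrative settings: 1000 bootstrap replicates, fixed seed).
set.seed(1)
null_dist <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")

# The tibble returned by observe() can be passed directly as obs_stat.
p_val <- get_p_value(null_dist, obs_stat = obs_mean, direction = "two-sided")
p_val
```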
Other wrapper functions: chisq_stat(), chisq_test(), prop_test(), t_stat(), t_test()
Other functions for calculating observed statistics: chisq_stat(), t_stat()
# calculating the observed mean number of hours worked per week
gss %>%
  observe(hours ~ NULL, stat = "mean")
#> Response: hours (numeric)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 41.4
# equivalently, calculating the same statistic with the core verbs
gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")
#> Response: hours (numeric)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 41.4
# calculating a t statistic for hypothesized mu = 40 hours worked/week
gss %>%
  observe(hours ~ NULL, stat = "t", null = "point", mu = 40)
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 2.09
# equivalently, calculating the same statistic with the core verbs
gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  calculate(stat = "t")
#> Response: hours (numeric)
#> Null Hypothesis: point
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 2.09
# similarly for a difference in means in age based on whether
# the respondent has a college degree
observe(
  gss,
  age ~ college,
  stat = "diff in means",
  order = c("degree", "no degree")
)
#> Response: age (numeric)
#> Explanatory: college (factor)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 0.941
# equivalently, calculating the same statistic with the core verbs
gss %>%
  specify(age ~ college) %>%
  calculate("diff in means", order = c("degree", "no degree"))
#> Response: age (numeric)
#> Explanatory: college (factor)
#> # A tibble: 1 × 1
#> stat
#> <dbl>
#> 1 0.941
# for a more in-depth explanation of how to use the infer package
if (FALSE) {
  vignette("infer")
}