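Internal functions used in textstat_keyness. Computes \(\chi^2\) with Yates' continuity correction for 2x2 tables, as well as the alternative keyness statistics (Fisher's exact test, the \(G^2\) likelihood ratio, and pointwise mutual information) described below.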
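For reference, the textbook form of the Yates-corrected statistic for a single 2x2 table is shown below. The cell labels are our own notation, not argument names: \(a\) and \(b\) are the counts of a feature in the target and reference, \(c\) and \(d\) are the counts of all other features in each, and \(N = a + b + c + d\). This page does not state exactly when the internal functions apply the correction, so treat the formula as background rather than a specification of their behaviour.

```latex
\chi^2_{\mathrm{Yates}} = \frac{N \left( \lvert ad - bc \rvert - N/2 \right)^2}{(a+b)(c+d)(a+c)(b+d)}
```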
keyness_chi2_dt(x, correction = c("default", "yates", "williams", "none"))
keyness_chi2_stats(x)
keyness(t, f, sum_t, sum_f)
keyness_exact(x)
keyness_lr(x, correction = c("default", "yates", "williams", "none"))
keyness_pmi(x)
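Argument | Description |
---|---|
x | a dfm object |
correction | the correction to apply: "default", "yates", "williams", or "none"; "yates" implements Yates' continuity correction for 2x2 tables |
t | (scalar) frequency of the feature in the target |
f | (scalar) frequency of the feature in the reference |
sum_t | total of all target words |
sum_f | total of all reference words |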
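a data.frame with one row per feature, containing the feature name, the statistic and its p-value, and the feature's counts in the target (n_target) and reference (n_reference)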
keyness_chi2_dt
uses vectorized computation from data.table objects.
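A minimal sketch of the vectorized idea, written in base R rather than data.table: every feature's 2x2 table is handled at once through vector arithmetic. The function name chi2_yates_vec is hypothetical and the a/b/c/d cell layout is our own convention. The correction follows the style of stats::chisq.test (subtract \(N/2\) from \(\lvert ad - bc \rvert\), never going below zero), which may differ from the rule keyness_chi2_dt itself uses to decide when to correct. The sign convention (positive when the target count exceeds its expected value) is an assumption that matches the signs in the example output.

```r
# Hypothetical sketch, not quanteda's implementation.
# a = feature count in target,   b = feature count in reference,
# c = other features in target,  d = other features in reference.
chi2_yates_vec <- function(a, b, c, d) {
  N <- a + b + c + d
  denom <- (a + b) * (c + d) * (a + c) * (b + d)
  # Yates correction as in stats::chisq.test for 2x2 tables:
  # subtract N/2 from |ad - bc| but never go below zero
  num <- N * pmax(abs(a * d - b * c) - N / 2, 0)^2
  chi2 <- ifelse(denom == 0, 0, num / denom)
  # assumed sign: + when the observed target count exceeds its expectation
  expected_a <- (a + b) * (a + c) / N
  sign_dir <- ifelse(a > expected_a, 1, -1)
  data.frame(chi2 = sign_dir * chi2,
             p = pchisq(chi2, df = 1, lower.tail = FALSE))
}

# Features c and d from the example dfm (target totals 17, reference 12)
chi2_yates_vec(a = c(6, 1), b = c(2, 4), c = c(11, 16), d = c(10, 8))
# chi2 is about 0.467 for feature c and about -2.040 for feature d
```

Working on whole vectors keeps the cost to a handful of arithmetic passes, which is the point of the vectorized variant compared with calling a test function once per feature.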
keyness_chi2_stats
uses element-by-element application of chisq.test.
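The element-by-element approach can be sketched with a scalar helper built on stats::chisq.test and applied to each feature with mapply. The helper name keyness_one and its return shape are hypothetical; its arguments mirror t, f, sum_t, and sum_f from the argument table, and attaching a sign based on the expected target count is an assumption chosen to match the signs in the example output.

```r
# Hypothetical helper (name and return shape are illustrative only):
# chi-squared for a single feature from scalar counts.
keyness_one <- function(t, f, sum_t, sum_f) {
  # 2x2 table: rows = this feature / all other features,
  #            columns = target / reference
  tb <- matrix(c(t, sum_t - t, f, sum_f - f), nrow = 2)
  # chisq.test warns when expected counts are small; silenced for brevity
  test <- suppressWarnings(stats::chisq.test(tb, correct = TRUE))
  expected_t <- test$expected[1, 1]
  # assumed sign convention: + when the target count exceeds its expectation
  c(chi2 = unname(test$statistic) * ifelse(t > expected_t, 1, -1),
    p = test$p.value)
}

# Feature counts taken from the example dfm (d1 = target, d2 = reference)
target    <- c(a = 3, b = 2, c = 6, d = 1, e = 1, f = 1, g = 1, h = 2)
reference <- c(a = 2, b = 1, c = 2, d = 4, e = 1, f = 1, g = 0, h = 1)

# Apply the helper element by element, one feature at a time
res <- t(mapply(keyness_one, target, reference,
                MoreArgs = list(sum_t = sum(target), sum_f = sum(reference))))
res["c", ]
```

Calling chisq.test once per feature is simple and reuses a well-tested function, but it is much slower than vector arithmetic on large vocabularies.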
keyness_exact
computes Fisher's exact test using element-by-element application of fisher.test, returning the odds ratio as the statistic.
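A minimal sketch of the same idea: build one 2x2 table per feature, run stats::fisher.test on it, and keep the odds ratio estimate (fisher.test reports the conditional maximum likelihood estimate) together with the p-value. The function name exact_keyness and the table orientation are our own choices; with this orientation an odds ratio above 1 means the feature is relatively more frequent in the target.

```r
# Hypothetical sketch, not quanteda's implementation.
exact_keyness <- function(target, reference) {
  # counts of all *other* features in each group
  other_t <- sum(target) - target
  other_r <- sum(reference) - reference
  res <- mapply(function(n_t, n_r, o_t, o_r) {
    # rows: this feature / other features; columns: target / reference
    tb <- matrix(c(n_t, o_t, n_r, o_r), nrow = 2)
    ft <- stats::fisher.test(tb)
    c(stat = unname(ft$estimate), p = ft$p.value)
  }, target, reference, other_t, other_r)
  t(res)  # one row per feature
}

# Feature counts taken from the example dfm (d1 = target, d2 = reference)
target    <- c(a = 3, b = 2, c = 6, d = 1, e = 1, f = 1, g = 1, h = 2)
reference <- c(a = 2, b = 1, c = 2, d = 4, e = 1, f = 1, g = 0, h = 1)
res <- exact_keyness(target, reference)
res
```

A feature that never occurs in the reference (g here) gets an infinite odds ratio, since the reference cell for that feature is zero.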
keyness_lr
computes the \(G^2\) likelihood ratio statistic using vectorized computation.
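A minimal vectorized sketch of the \(G^2\) statistic, \(G^2 = 2 \sum O \ln(O / E)\) over the four cells of each feature's 2x2 table, with \(0 \ln 0\) taken as 0 and no continuity or Williams correction applied. The function name g2_vec and the sign convention are assumptions; for the example's feature g it gives about 1.0933, in line with the recorded keyness_lr output.

```r
# Hypothetical sketch, not quanteda's implementation (no correction applied).
g2_vec <- function(target, reference) {
  total_t <- sum(target)
  total_r <- sum(reference)
  N <- total_t + total_r
  feat_total <- target + reference
  # observed cells per feature: feature/other x target/reference
  O <- cbind(target, reference, total_t - target, total_r - reference)
  # expected cells under independence: row total * column total / N
  E <- cbind(feat_total * total_t / N, feat_total * total_r / N,
             (N - feat_total) * total_t / N, (N - feat_total) * total_r / N)
  # 0 * log(0) is treated as 0
  terms <- ifelse(O > 0, O * log(O / E), 0)
  g2 <- 2 * rowSums(terms)
  # assumed sign: + when the target count exceeds its expectation
  sign_dir <- ifelse(target > E[, 1], 1, -1)
  data.frame(G2 = sign_dir * g2,
             p = pchisq(g2, df = 1, lower.tail = FALSE),
             row.names = names(target))
}

# Feature counts taken from the example dfm (d1 = target, d2 = reference)
target    <- c(a = 3, b = 2, c = 6, d = 1, e = 1, f = 1, g = 1, h = 2)
reference <- c(a = 2, b = 1, c = 2, d = 4, e = 1, f = 1, g = 0, h = 1)
res <- g2_vec(target, reference)
res
```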
keyness_pmi
computes the pointwise mutual information (PMI) statistic using vectorized computation.
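A minimal sketch of PMI between each feature and the target, assuming PMI is the natural log of the feature's observed target count over its expected target count under independence. That assumption reproduces the pmi column of the example output for features such as a and g. The function name pmi_vec is hypothetical, and the p-value column is not reproduced.

```r
# Hypothetical sketch, not quanteda's implementation.
pmi_vec <- function(target, reference) {
  N <- sum(target) + sum(reference)
  # expected target count of each feature under independence
  expected_t <- (target + reference) * sum(target) / N
  # natural log of observed / expected; names are carried over from target
  log(target / expected_t)
}

# Feature counts taken from the example dfm (d1 = target, d2 = reference)
target    <- c(a = 3, b = 2, c = 6, d = 1, e = 1, f = 1, g = 1, h = 2)
reference <- c(a = 2, b = 1, c = 2, d = 4, e = 1, f = 1, g = 0, h = 1)
pmi <- pmi_vec(target, reference)
pmi
```

Note that a feature absent from the target would get a PMI of -Inf under this definition.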
https://en.wikipedia.org/wiki/Yates's_correction_for_continuity
https://influentialpoints.com/Training/g-likelihood_ratio_test.htm
dfmat <- dfm(tokens(c(d1 = "a a a b b c c c c c c d e f g h h",
                      d2 = "a a b c c d d d d e f h")))
# chi-squared, vectorized via data.table
quanteda:::keyness_chi2_dt(dfmat)
#>   feature         chi2         p n_target n_reference
#> 1       a  0.004738562 0.9451192        3           2
#> 2       b  0.089303670 0.7650643        2           1
#> 3       c  0.467298378 0.4942327        6           2
#> 4       d -2.040247141 0.1531848        1           4
#> 5       e -0.065813362 0.7975330        1           1
#> 6       f -0.065813362 0.7975330        1           1
#> 7       g  0.731092437 0.3925293        1           0
#> 8       h  0.089303670 0.7650643        2           1
# chi-squared, element by element via chisq.test
quanteda:::keyness_chi2_stats(dfmat)
#>   feature       chi2.a       chi2.b    chi2.c    chi2.d        chi2.e
#> 1       a 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#> 2       b 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#> 3       c 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#> 4       d 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#> 5       e 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#> 6       f 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#> 7       g 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#> 8       h 1.626059e-31 6.775245e-32 0.4672984 -2.040247 -2.242191e-32
#>          chi2.f      chi2.g       chi2.h p.a p.b       p.c       p.d p.e p.f
#> 1 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#> 2 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#> 3 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#> 4 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#> 5 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#> 6 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#> 7 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#> 8 -2.242191e-32 2.20717e-31 6.775245e-32   1   1 0.4942327 0.1531848   1   1
#>   p.g p.h n_target n_reference
#> 1   1   1        3           2
#> 2   1   1        2           1
#> 3   1   1        6           2
#> 4   1   1        1           4
#> 5   1   1        1           1
#> 6   1   1        1           1
#> 7   1   1        1           0
#> 8   1   1        2           1
# Fisher's exact test; stat is the odds ratio
quanteda:::keyness_exact(dfmat)
#>   feature      stat         p n_target n_reference
#> 1       a 1.0689130 1.0000000        3           2
#> 2       b 1.4480473 1.0000000        2           1
#> 3       c 2.6375984 0.4083471        6           2
#> 4       d 0.1347360 0.1296366        1           4
#> 5       e 0.6966446 1.0000000        1           1
#> 6       f 0.6966446 1.0000000        1           1
#> 7       g       Inf 1.0000000        1           0
#> 8       h 1.4480473 1.0000000        2           1
# G^2 likelihood ratio
quanteda:::keyness_lr(dfmat)
#>   feature           G2         p n_target n_reference
#> 1       a  0.004751467 0.9450446        3           2
#> 2       b  0.091241952 0.7626042        2           1
#> 3       c  0.477346514 0.4896267        6           2
#> 4       d -2.028083548 0.1544152        1           4
#> 5       e -0.064900531 0.7989117        1           1
#> 6       f -0.064900531 0.7989117        1           1
#> 7       g  1.093291040 0.2957432        1           0
#> 8       h  0.091241952 0.7626042        2           1
# pointwise mutual information
quanteda:::keyness_pmi(dfmat)
#>   feature         pmi         p n_target n_reference
#> 1       a  0.02325686 0.8787910        3           2
#> 2       b  0.12861738 0.7198699        2           1
#> 3       c  0.24640041 0.6196211        6           2
#> 4       d -1.07535542 0.2997389        1           4
#> 5       e -0.15906469 0.6900191        1           1
#> 6       f -0.15906469 0.6900191        1           1
#> 7       g  0.53408249 0.4648955        1           0
#> 8       h  0.12861738 0.7198699        2           1