Quick List
The tables below are generated from the public measure properties in the codebase. Each row shows the public measure name, the defining equation used by the implementation, and a short note. Measures are grouped by variable-pair family, and the agreement metrics that were missing from the old quick list are included here too.
Binary-Binary (87)
Name |
Equation |
Note |
|---|---|---|
|
\(ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}{\frac{1}{2}\left(\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right) - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}\)
|
Adjusted Rand index for the induced contingency table; values can be negative and the binomial terms can overflow for large n. |
|
\(\left|\frac{a(c+d)}{c(a+b)}\right|\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{\sigma-\sigma'}{2n}\)
\(\sigma = \max(a,b)+\max(c,d)+\max(a,c)+\max(b,d)\)
\(\sigma' = \max(a+c,b+d)+\max(a+b,c+d)\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{\sqrt{ad}+a}{\sqrt{ad}+a+b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{\sqrt{ad}+a-(b+c)}{\sqrt{ad}+a+b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{\max(a+b,a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\)
\(E_{ij} = \frac{N_{i*} N_{*j}}{N}\)
|
Pearson chi-square statistic \(\chi^2\), computed from the observed counts \(O_{ij}\) and the counts \(E_{ij}\) expected under independence. |
|
\((R - 1)(C - 1)\)
|
Degrees of freedom for the chi-square statistic on the induced contingency table. |
|
\(\sqrt{2\left(1 - \frac{a}{\sqrt{(a+b)(a+c)}}\right)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{\sqrt{2}(ad-bc)}{\sqrt{(ad-bc)^2-(a+b)(a+c)(b+d)(c+d)}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{ad-bc}{\min((a+b)(a+c),(b+d)(c+d))}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(C = \sqrt{\frac{\chi^2}{n + \chi^2}}\)
|
Contingency coefficient computed from the chi-square statistic of the binary table. |
|
\(\frac{a}{(a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(V = \sqrt{\frac{\chi^2}{n}}\)
|
Cramer’s V as implemented for the binary 2x2 case. |
|
\(\frac{ad-bc}{\sqrt{n(a+b)(a+c)}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{2a}{2a+b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{ad-bc}{(a+b+c+d)^2}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{2}\left(\frac{1}{a+b}+\frac{1}{a+c}\right)\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\sqrt{b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{n^2(na-(a+b)(a+c))}{(a+b)(a+c)(b+d)(c+d)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{\sqrt{(a+b)(a+c)}}-\frac{\max(a+b,a+c)}{2}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a+0.5d}{a+b+c+d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{na-(a+b)(a+c)}{n \min(a+b,a+c) - (a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{na}{(a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{n(a-0.5)^2}{(a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\log a - \log n - \log \frac{a+b}{n} - \log \frac{a+c}{n}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(P_e = 1 - \frac{\max_{c} N_{* c}}{N}\)
\(P_{e|r} = 1 - \frac{\sum_r \max_{c} N_{r c}}{N}\)
\(\lambda_{B|A} = \frac{P_e - P_{e|r}}{P_e}\)
|
Goodman-Kruskal lambda \(\lambda_{B|A}\): the proportional reduction in error when predicting variable B given variable A. |
|
\(\lambda_{A|B} = \frac{\sum_c \max_r N_{rc} - \max_r N_{r*}}{N - \max_r N_{r*}}\)
|
Reverse-direction Goodman-Kruskal lambda, predicting rows from columns. |
|
\(\frac{\sigma - \sigma'}{2n-\sigma'}\)
\(\sigma = \max(a,b)+\max(c,d)+\max(a,c)+\max(b,d)\)
\(\sigma' = \max(a+c,b+d)+\max(a+b,c+d)\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a+d}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a+d}{a+0.5b+0.5c+d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{(a+d)-(b+c)}{a+b+c+d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(b+c\)
|
Also listed under the names Hamming, Canberra, Manhattan, Cityblock, and Minkowski. |
|
\(2\sqrt{1 - \frac{a}{\sqrt{(a+b)(a+c)}}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(a+d\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(a\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{a+b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{3a}{3a+b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{b + c}{a + b + c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{a+b}+\frac{a}{a+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{0.5a(2a+b+c)}{(a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{b+c}{2a+b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a^2 - bc}{(a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\chi^2 = \frac{(b-c)^2}{b+c}\)
\(p = 1 - F_{\chi^2_1}(\chi^2)\)
|
McNemar’s chi-square test on the off-diagonal disagreement counts. |
|
\(\frac{b+c}{a+b+c+d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{4(ad-bc)}{(a+d)^2+(b+c)^2}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{0.5(ab + ac) + bc}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)
|
The mutual information between two variables \(X\) and \(Y\) is denoted as \(I(X;Y)\). |
|
\(\frac{a}{\sqrt{(a+b)(a+c)}}\)
|
Also known as Fowlkes-Mallows Index. |
|
\(\frac{ad}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(OR = \frac{p_{11}p_{00}}{p_{10}p_{01}} = \frac{ad}{bc}\)
|
Odds ratio, also referred to in the code as the cross-product ratio. |
|
\(\frac{4bc}{(a+b+c+d)^2}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\sqrt{\frac{\chi^2}{n+\chi^2}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\chi^2=\frac{n(ad-bc)^2}{(a+b)(a+c)(c+d)(b+d)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{ab+bc}{ab+2bc+cd}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\sqrt{\frac{\rho}{n+\rho}}\)
\(\rho=\frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\phi = \sqrt{\frac{\chi^2}{N}}\)
|
Phi coefficient computed from the chi-square statistic and the total count \(N\). |
|
\(\frac{a+d}{a+2b+2c+d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{a+b+c+d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{n(b+c)-(b-c)^2}{(a+b+c+d)^2}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{\min(a+b,a+c)}\)
|
Simpson (or Overlap). |
|
\(\frac{(b+c)^2}{(a+b+c+d)^2}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a+d}{a+b+c+d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{a+2b+2c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{2a+2d}{2a+b+c+2d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a+d}{b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{ad}{(a+b)(a+c)(b+d)\sqrt{c+d}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{1}{4}\left(\frac{a}{a+b}+\frac{a}{a+c}+\frac{d}{b+d}+\frac{d}{c+d}\right)\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{2(a + d)}{2(a + d) + b + c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a^2}{(a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\log_{10} \frac{n\left(|ad-bc|-\frac{n}{2}\right)^2}{(a+b)(a+c)(b+d)(c+d)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(D_T = -\log_2(\mathrm{roger\_tanimoto})\)
|
Distance form derived directly from the Rogers-Tanimoto similarity (`roger_tanimoto`) in the current implementation. |
|
\(\frac{a}{2a+b+c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a}{b + c}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{a(c+d)}{c(a+b)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{na - (a+b)(a+c)}{na + (a+b)(a+c)}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(b=0\)
\(c=0\)
\(a=0\)
\(\frac{y-1}{y+1}, y={\left(\frac{da}{bc}\right)}^{\frac{\pi}{4}}\)
|
Tetrachoric correlation takes values in \([-1, 1]\): 0 indicates no agreement, 1 perfect agreement, and -1 perfect disagreement. |
|
\(T = \sqrt{\chi^2}\)
|
Current implementation returns the square root of the chi-square statistic for the binary table. |
|
\(U(X|Y) = \frac{I(X;Y)}{H(X)}\)
\(H(X) = -\sum_x P(x) \log P(x)\)
\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)
|
Uncertainty coefficient \(U(X|Y)\): the mutual information \(I(X;Y)\) normalized by the entropy \(H(X)\). |
|
\(U_\mathrm{rev} = \frac{I(X;Y)}{H(\mathrm{rows})}\)
|
Mutual information normalized by the row entropy, reversing the default direction used by uncertainty_coefficient. |
|
\(\frac{b+c}{4a+4b+4c+4d}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{ad-bc}{ad+bc}\)
\(Q = \frac{\alpha - 1}{\alpha + 1}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n; \(\alpha = \frac{ad}{bc}\) is the odds ratio, so the two forms are equivalent. |
|
\(\frac{2bc}{ad+bc}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\)
|
Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n. |
|
\(Y = \frac{\sqrt\alpha - 1}{\sqrt\alpha + 1}\)
|
Yule’s Y is based on the odds ratio (cross-product ratio) \(\alpha = \frac{ad}{bc}\). |
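As an illustration of how the binary-binary equations are evaluated, the sketch below counts the 2x2 cells from two 0/1 sequences and computes four of the listed formulas. The helper names (`cells_2x2`, `jaccard`, `dice`, `phi`, `yule_q`) are hypothetical and are not the library's API. The cell convention (a = both 1, b = first only, c = second only, d = both 0) is an assumption and should be checked against the implementation.

```python
import math


def cells_2x2(x, y):
    """Count the 2x2 cells for two equal-length 0/1 sequences.

    Assumed convention: a = both 1, b = x only, c = y only, d = both 0.
    """
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(a, b, c, d):
    # a / (a + b + c): ignores joint absences (d)
    return a / (a + b + c)


def dice(a, b, c, d):
    # 2a / (2a + b + c)
    return 2 * a / (2 * a + b + c)


def phi(a, b, c, d):
    # (ad - bc) / sqrt((a+b)(a+c)(b+d)(c+d)); undefined if any margin is 0
    return (a * d - b * c) / math.sqrt((a + b) * (a + c) * (b + d) * (c + d))


def yule_q(a, b, c, d):
    # (ad - bc) / (ad + bc)
    return (a * d - b * c) / (a * d + b * c)


x = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]
a, b, c, d = cells_2x2(x, y)
print((a, b, c, d))
print(jaccard(a, b, c, d), dice(a, b, c, d), phi(a, b, c, d), yule_q(a, b, c, d))
```

Several equations in the table divide by a margin or by a product such as `bc`, so real inputs need a guard for zero denominators; this sketch omits one for brevity.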
Confusion Matrix, Binary-Binary (29)
Name |
Equation |
Note |
|---|---|---|
|
\(ACC = \frac{TP + TN}{TP + TN + FP + FN}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(BA = \frac{TPR + TNR}{2}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(BI = TPR + TNR - 1\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(\frac{PLR}{NLR}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(F1 = \frac{2 \times PPV \times TPR}{PPV + TPR}\)
|
F1 score: harmonic mean of precision and sensitivity. |
|
\(FDR = \frac{FP}{FP + TP}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(FN\)
|
Raw false-negative count from the confusion matrix. |
|
\(FNR = \frac{FN}{FN + TP}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(FOR = \frac{FN}{FN + TN}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(FP\)
|
Raw false-positive count from the confusion matrix. |
|
\(FPR = \frac{FP}{FP + TN}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(MK = PPV + NPV - 1\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(N = TP + FN + FP + TN\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(NLR = \frac{FNR}{TNR}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(NPV = \frac{TN}{TN + FN}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(PLR = \frac{TPR}{FPR}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(PPV = \frac{TP}{TP + FP}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(PPV = \frac{TP}{TP + FP}\)
|
Alias of positive predictive value (PPV). |
|
\(\frac{TP + FN}{N}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(PT = \frac{\sqrt{TPR(1 - TNR)} + TNR - 1}{TPR + TNR - 1}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(TPR = \frac{TP}{TP + FN}\)
|
Alias of true positive rate (TPR). |
|
\(TPR = \frac{TP}{TP + FN}\)
|
Alias of true positive rate (TPR). |
|
\(TNR = \frac{TN}{TN + FP}\)
|
Alias of true negative rate (TNR). |
|
\(TN\)
|
Raw true-negative count from the confusion matrix. |
|
\(TNR = \frac{TN}{TN + FP}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(TP\)
|
Raw true-positive count from the confusion matrix. |
|
\(TPR = \frac{TP}{TP + FN}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
|
\(TS = \frac{TP}{TP + FN + FP}\)
|
Confusion-matrix metric derived from the TP, FN, FP, TN counts. |
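To show how the confusion-matrix rows relate to one another, the following sketch derives a few of the rates from the four raw counts using the standard textbook definitions. The function name `confusion_metrics` and the dictionary keys are hypothetical and do not reflect the library's property names.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Derive a few confusion-matrix metrics from the four raw counts."""
    tpr = tp / (tp + fn)          # sensitivity / recall
    tnr = tn / (tn + fp)          # specificity
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    return {
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "BA": (tpr + tnr) / 2,                 # balanced accuracy
        "BI": tpr + tnr - 1,                   # informedness
        "MK": ppv + npv - 1,                   # markedness
        "F1": 2 * ppv * tpr / (ppv + tpr),     # harmonic mean of PPV and TPR
        "MCC": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


m = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

With these counts F1 reduces to `2*TP / (2*TP + FP + FN) = 80/95`, which is a quick way to sanity-check the harmonic-mean form.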
Categorical-Categorical (9)
Name |
Equation |
Note |
|---|---|---|
|
\(ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}{\frac{1}{2}\left(\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right) - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}\)
|
Adjusted Rand index for the categorical contingency table; values can be negative and the binomial terms can overflow for large n. |
|
\(\sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\)
\(E_{ij} = \frac{N_{i*} N_{*j}}{N}\)
|
Pearson chi-square statistic \(\chi^2\), computed from the observed counts \(O_{ij}\) and the counts \(E_{ij}\) expected under independence. |
|
\((R - 1)(C - 1)\)
|
Degrees of freedom for the chi-square statistic on the induced contingency table. |
|
\(P_e = 1 - \frac{\max_{c} N_{* c}}{N}\)
\(P_{e|r} = 1 - \frac{\sum_r \max_{c} N_{r c}}{N}\)
\(\lambda_{B|A} = \frac{P_e - P_{e|r}}{P_e}\)
|
Goodman-Kruskal lambda \(\lambda_{B|A}\): the proportional reduction in error when predicting variable B given variable A. |
|
\(\lambda_{A|B} = \frac{\sum_c \max_r N_{rc} - \max_r N_{r*}}{N - \max_r N_{r*}}\)
|
Reverse-direction Goodman-Kruskal lambda, predicting rows from columns. |
|
\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)
|
The mutual information between two variables \(X\) and \(Y\) is denoted as \(I(X;Y)\). |
|
\(\phi = \sqrt{\frac{\chi^2}{N}}\)
|
Phi coefficient computed from the chi-square statistic and the total count \(N\). |
|
\(U(X|Y) = \frac{I(X;Y)}{H(X)}\)
\(H(X) = -\sum_x P(x) \log P(x)\)
\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)
|
Uncertainty coefficient \(U(X|Y)\): the mutual information \(I(X;Y)\) normalized by the entropy \(H(X)\). |
|
\(U_\mathrm{rev} = \frac{I(X;Y)}{H(\mathrm{rows})}\)
|
Mutual information normalized by the row entropy, reversing the default direction used by uncertainty_coefficient. |
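The chi-square, mutual information, and lambda equations above all start from the same contingency table and its margins. The sketch below is a plain-Python illustration under that reading; the function names are hypothetical, the table is assumed to be a list of rows of counts, and lambda follows the \(\lambda_{B|A}\) form with A on the rows and B on the columns.

```python
import math


def margins(table):
    """Return row totals, column totals, and the grand total."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


def chi_square(table):
    # sum over cells of (O - E)^2 / E with E_ij = N_i* N_*j / N
    rows, cols, n = margins(table)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n  # assumes no empty margin
            total += (observed - expected) ** 2 / expected
    return total


def mutual_information(table):
    # sum of P(x,y) log(P(x,y) / (P(x) P(y))); empty cells contribute 0
    rows, cols, n = margins(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:
                mi += (count / n) * math.log(count * n / (rows[i] * cols[j]))
    return mi


def gk_lambda(table):
    # (P_e - P_e|r) / P_e, predicting the column variable from the row variable
    rows, cols, n = margins(table)
    p_e = 1 - max(cols) / n
    p_e_given_r = 1 - sum(max(r) for r in table) / n
    return (p_e - p_e_given_r) / p_e


dependent = [[5, 0], [0, 5]]
independent = [[10, 20], [20, 40]]
for t in (dependent, independent):
    chi2 = chi_square(t)
    n = margins(t)[2]
    print(chi2, math.sqrt(chi2 / n), mutual_information(t), gk_lambda(t))
```

The two tables are limiting cases: the perfectly dependent one gives \(\phi = 1\), \(I(X;Y) = \log 2\), and \(\lambda = 1\), while the independent one gives zero for all three.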
Agreement, Categorical-Categorical (2)
Name |
Equation |
Note |
|---|---|---|
|
\(\kappa = \frac{\theta_1 - \theta_2}{1 - \theta_2}\)
\(\theta_1 = \sum_i p_{ii}\)
\(\theta_2 = \sum_i p_{i+}p_{+i}\)
|
Computes Cohen’s \(\kappa\). |
|
\(\kappa = \frac{\theta_1 - \theta_2}{1 - \theta_2}\)
\(\theta_1 = \frac{p_{ii}}{p_{i+}}\)
\(\theta_2 = p_{+i}\)
|
Cohen-Light \(\kappa\). |
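As a worked instance of the first row, the sketch below computes Cohen's \(\kappa\) from a square agreement table using \(\theta_1 = \sum_i p_{ii}\) and \(\theta_2 = \sum_i p_{i+}p_{+i}\). The function name `cohen_kappa` is hypothetical, and the table is assumed to hold raw counts with the two raters' categories in the same order on rows and columns.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater 1, cols: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    theta1 = sum(table[i][i] for i in range(k)) / n                    # observed agreement
    theta2 = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2  # chance agreement
    return (theta1 - theta2) / (1 - theta2)


ratings = [[20, 5], [10, 15]]
kappa = cohen_kappa(ratings)
print(round(kappa, 4))
```

Here \(\theta_1 = 0.7\) and \(\theta_2 = 0.5\), so \(\kappa = 0.2 / 0.5 = 0.4\).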
Binary-Continuous, Biserial (3)
Name |
Equation |
Note |
|---|---|---|
|
\(r_{pb} = \frac{(y_1 - y_0)\sqrt{pq}}{\sigma}\)
\(r_b = r_{pb}\frac{\sqrt{pq}}{\phi(\Phi^{-1}(q))}\)
|
Biserial correlation using the point-biserial term with the standard normal PDF/CDF correction from the implementation. |
|
\(r_{pb} = \frac{(y_1 - y_0)\sqrt{pq}}{\sigma}\)
|
Point-biserial correlation between a binary variable and a continuous response. |
|
\(r_{rb} = \frac{2(y_1 - y_0)}{n}\)
|
Rank-biserial statistic as currently implemented from the two group means and sample size. |
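The point-biserial equation can be checked numerically against the ordinary Pearson correlation, to which it is algebraically equal when \(\sigma\) is the population standard deviation. The sketch below assumes exactly that (a divisor of n rather than n - 1), treats \(y_1\) and \(y_0\) as the group means, and uses hypothetical function names.

```python
import math
import statistics


def point_biserial(binary, values):
    """r_pb = (mean(y | b=1) - mean(y | b=0)) * sqrt(p q) / sigma."""
    y1 = [v for b, v in zip(binary, values) if b == 1]
    y0 = [v for b, v in zip(binary, values) if b == 0]
    p = len(y1) / len(values)
    q = 1 - p
    sigma = statistics.pstdev(values)  # assumption: population SD (divide by n)
    return (statistics.mean(y1) - statistics.mean(y0)) * math.sqrt(p * q) / sigma


def pearson(x, y):
    """Plain Pearson correlation, used here only as a cross-check."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


group = [0, 0, 0, 1, 1, 1]
score = [1, 2, 3, 4, 5, 6]
r_pb = point_biserial(group, score)
r = pearson(group, score)
print(r_pb, r)
```

If the implementation uses the sample standard deviation instead, the two values differ by a factor of \(\sqrt{(n-1)/n}\).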
Categorical-Continuous (7)
Name |
Equation |
Note |
|---|---|---|
|
\(F = \frac{SS_B / (k - 1)}{SS_W / (n - k)}\)
|
One-way ANOVA F statistic with p-value returned by scipy.stats.f_oneway. |
|
\(CH = \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)} \cdot \frac{n-k}{k-1}\)
|
Calinski-Harabasz separation score from scikit-learn over the grouped continuous values. |
|
\(DB = \frac{1}{k}\sum_i \max_{j \neq i}\frac{s_i + s_j}{d_{ij}}\)
|
Davies-Bouldin index from scikit-learn over category-labelled one-dimensional samples. |
|
\(\eta = \sqrt{\eta^2}\)
|
Correlation ratio magnitude derived as the square root of eta_squared. |
|
\(\eta^2 = \frac{\sigma_{\bar{y}}^2}{\sigma_{y}^2}\)
|
Eta squared (squared correlation ratio): the variance of the group means divided by the total variance of \(y\). |
|
\(H = \frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i} - 3(N+1)\)
|
Kruskal-Wallis H statistic with p-value returned by scipy.stats.kruskal. |
|
\(s_i = \frac{b_i - a_i}{\max(a_i, b_i)}\)
|
Silhouette coefficient over category-labelled one-dimensional samples; the implementation returns the dataset average. |
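The F statistic and \(\eta^2\) rows share the same sums of squares, which the sketch below makes explicit. It assumes \(\eta^2\) is computed as \(SS_B / SS_T\) with group means weighted by group size, which is one common reading of the variance ratio above; the function name and the dictionary-of-lists input are hypothetical.

```python
def eta_squared_and_f(groups):
    """Return (eta^2, F) for a mapping of category -> list of continuous values."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group sum of squares, weighting each group mean by its size
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_within = ss_total - ss_between
    eta2 = ss_between / ss_total
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return eta2, f_stat


samples = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
eta2, f_stat = eta_squared_and_f(samples)
print(eta2, f_stat)
```

For this data \(SS_B = 54\), \(SS_W = 6\), and \(SS_T = 60\), giving \(\eta^2 = 0.9\) and \(F = 27\).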
Ordinal-Ordinal, Concordance (3)
Name |
Equation |
Note |
|---|---|---|
|
\(\gamma = \frac{\pi_c - \pi_d}{1 - \pi_t}\)
\(\pi_c = \frac{C}{n}\)
\(\pi_d = \frac{D}{n}\)
\(\pi_t = \frac{T}{n}\)
|
Goodman-Kruskal \(\gamma\) is similar to Somers’ d, but its denominator excludes all tied pairs. |
|
\(\tau = \frac{C - D}{\binom{n}{2}}\)
|
Kendall’s \(\tau\): the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. |
|
\(d_{Y \cdot X} = \frac{\pi_c - \pi_d}{\pi_c + \pi_d + \pi_t^Y}\)
\(d_{X \cdot Y} = \frac{\pi_c - \pi_d}{\pi_c + \pi_d + \pi_t^X}\)
\(\pi_c = \frac{C}{n}\)
\(\pi_d = \frac{D}{n}\)
\(\pi_t^X = \frac{T^X}{n}\)
\(\pi_t^Y = \frac{T^Y}{n}\)
|
Computes Somers’ d for two ordinal variables, in both directions. |
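All three concordance measures reduce to counting concordant, discordant, and tied pairs. The sketch below does that count by brute force and then applies the equations above. It assumes the \(\pi\) terms are pair counts divided by the total number of pairs (so the normalizer cancels), that \(T^Y\) counts pairs tied on Y only, and that \(T^X\) counts pairs tied on X only; the function name `pair_counts` is hypothetical.

```python
from itertools import combinations


def pair_counts(x, y):
    """Count concordant, discordant, and tied pairs over all n(n-1)/2 pairs."""
    c = d = tx = ty = txy = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            txy += 1          # tied on both variables
        elif dx == 0:
            tx += 1           # tied on X only
        elif dy == 0:
            ty += 1           # tied on Y only
        elif dx * dy > 0:
            c += 1            # concordant
        else:
            d += 1            # discordant
    return c, d, tx, ty, txy


x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 4, 4]
c, d, tx, ty, txy = pair_counts(x, y)
n_pairs = len(x) * (len(x) - 1) // 2

gamma = (c - d) / (c + d)                 # ties dropped from the denominator
tau = (c - d) / n_pairs                   # denominator is all pairs
somers_d_yx = (c - d) / (c + d + ty)      # keeps pairs tied on Y only
somers_d_xy = (c - d) / (c + d + tx)      # keeps pairs tied on X only
print(c, d, tx, ty, gamma, tau, somers_d_yx, somers_d_xy)
```

The pairwise loop is quadratic in the sample size, which is fine for illustration but slow for large inputs.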
Continuous-Continuous (4)
Name |
Equation |
Note |
|---|---|---|
|
\(\tau = \frac{C-D}{\sqrt{(C+D+T_x)(C+D+T_y)}}\)
|
Kendall rank correlation and p-value returned by scipy.stats.kendalltau. |
|
\(r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i (y_i-\bar{y})^2}}\)
|
Pearson linear correlation and p-value returned by scipy.stats.pearsonr. |
|
\(y = \beta_0 + \beta_1 x\)
|
Linear regression via scipy.stats.linregress; this API returns the correlation coefficient r and p-value. |
|
\(\rho = \mathrm{corr}(\mathrm{rank}(x), \mathrm{rank}(y))\)
|
Spearman rank correlation and p-value returned by scipy.stats.spearmanr. |
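The notes above say these values come from SciPy. As a dependency-free illustration of the Pearson and Spearman equations only, the sketch below reimplements them in plain Python. It is not the library's code path, returns no p-values, and the helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson r from centered sums of products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    # rho = corr(rank(x), rank(y))
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

Because `y` is a monotone but nonlinear function of `x`, the Spearman value is 1 while the Pearson value stays slightly below 1, which is the practical difference between the two rows.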