Quick List

The tables below are generated from the public measure properties in the codebase. Each row shows the public measure name, the defining equation used by the implementation, and a short note. Measures are grouped by variable-pair family, and the agreement metrics that were missing from the old quick list are included here too.

Binary-Binary (87)

Name

Equation

Note

adjusted_rand_index

\(ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}{\frac{1}{2}\left(\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right) - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}\)

Adjusted Rand index for the induced contingency table; values can be negative and the binomial terms can overflow for large n.

ample

\(\left|\frac{a(c+d)}{c(a+b)}\right|\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

anderberg

\(\frac{\sigma-\sigma'}{2n}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

baroni_urbani_buser_i

\(\frac{\sqrt{ad}+a}{\sqrt{ad}+a+b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

baroni_urbani_buser_ii

\(\frac{\sqrt{ad}+a-(b+c)}{\sqrt{ad}+a+b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

braun_banquet

\(\frac{a}{\max(a+b,a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

chisq

\(\sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\)
\(E_{ij} = \frac{N_{i*} N_{*j}}{N}\)

Chi-square statistic \(\chi^2\) of the contingency table, computed from the observed counts \(O_{ij}\) and the expected counts \(E_{ij}\).

chisq_dof

\((R - 1)(C - 1)\)

Degrees of freedom for the chi-square statistic on the induced contingency table.

chord

\(\sqrt{2\left(1 - \frac{a}{\sqrt{(a+b)(a+c)}}\right)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

cole_i

\(\frac{\sqrt{2}(ad-bc)}{\sqrt{(ad-bc)^2-(a+b)(a+c)(b+d)(c+d)}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

cole_ii

\(\frac{ad-bc}{\min((a+b)(a+c),(b+d)(c+d))}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

contingency_coefficient

\(C = \sqrt{\frac{\chi^2}{n + \chi^2}}\)

Contingency coefficient computed from the chi-square statistic of the binary table.

cosine

\(\frac{a}{(a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

cramer_v

\(V = \sqrt{\frac{\chi^2}{n}}\)

Cramer’s V as implemented for the binary 2x2 case.

dennis

\(\frac{ad-bc}{\sqrt{n(a+b)(a+c)}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

dice

\(\frac{2a}{2a+b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

disperson

\(\frac{ad-bc}{(a+b+c+d)^2}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

driver_kroeber

\(\frac{a}{2}\left(\frac{1}{a+b}+\frac{1}{a+c}\right)\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

euclid

\(\sqrt{b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

eyraud

\(\frac{n^2(na-(a+b)(a+c))}{(a+b)(a+c)(b+d)(c+d)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

fager_mcgowan

\(\frac{a}{\sqrt{(a+b)(a+c)}}-\frac{\max(a+b,a+c)}{2}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

faith

\(\frac{a+0.5d}{a+b+c+d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

forbes_ii

\(\frac{na-(a+b)(a+c)}{n \min(a+b,a+c) - (a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

forbesi

\(\frac{na}{(a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

fossum

\(\frac{n(a-0.5)^2}{(a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

gilbert_wells

\(\log a - \log n - \log \frac{a+b}{n} - \log \frac{a+c}{n}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

gk_lambda

\(P_e = 1 - \frac{\max_{c} N_{* c}}{N}\)
\(P_{e|r} = 1 - \frac{\sum_r \max_{c} N_{r c}}{N}\)
\(\lambda_{B|A} = \frac{P_e - P_{e|r}}{P_e}\)

Goodman-Kruskal’s lambda, \(\lambda_{B|A}\), is the proportional reduction in error when predicting one variable, B, given another, A.

gk_lambda_reversed

\(\lambda_{A|B} = \frac{\sum_c \max_r N_{rc} - \max_r N_{r*}}{N - \max_r N_{r*}}\)

Reverse-direction Goodman-Kruskal lambda, predicting rows from columns.

goodman_kruskal

\(\frac{\sigma - \sigma'}{2n-\sigma'}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

gower

\(\frac{a+d}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

gower_legendre

\(\frac{a+d}{a+0.5b+0.5c+d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

hamann

\(\frac{(a+d)-(b+c)}{a+b+c+d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

hamming

\(b+c\)

Hamming distance; also listed under the names Canberra, Manhattan, Cityblock, and Minkowski.

hellinger

\(2\sqrt{1 - \frac{a}{\sqrt{(a+b)(a+c)}}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

inner_product

\(a+d\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

intersection

\(a\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

jaccard

\(\frac{a}{a+b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

jaccard_3w

\(\frac{3a}{3a+b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

jaccard_distance

\(\frac{b + c}{a + b + c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

johnson

\(\frac{a}{a+b}+\frac{a}{a+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

kulcyznski_ii

\(\frac{0.5a(2a+b+c)}{(a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

kulczynski_i

\(\frac{a}{b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

lance_williams

\(\frac{b+c}{2a+b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

mcconnaughey

\(\frac{a^2 - bc}{(a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

mcnemar_test

\(\chi^2 = \frac{(b-c)^2}{b+c}\)
\(p = 1 - F_{\chi^2_1}(\chi^2)\)

McNemar’s chi-square test on the off-diagonal disagreement counts.

mean_manhattan

\(\frac{b+c}{a+b+c+d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

michael

\(\frac{4(ad-bc)}{(a+d)^2+(b+c)^2}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

mountford

\(\frac{a}{0.5(ab + ac) + bc}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

mutual_information

\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)

The mutual information between two variables \(X\) and \(Y\) is denoted as \(I(X;Y)\).

ochia_i

\(\frac{a}{\sqrt{(a+b)(a+c)}}\)

Also known as Fowlkes-Mallows Index.

ochia_ii

\(\frac{ad}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

odds_ratio

\(OR = \frac{p_{11}p_{00}}{p_{10}p_{01}} = \frac{ad}{bc}\)

Odds ratio, also referred to in the code as the cross-product ratio.

pattern_difference

\(\frac{4bc}{(a+b+c+d)^2}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

pearson_heron_i

\(\frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

pearson_heron_ii

\(\sqrt{\frac{\chi^2}{n+\chi^2}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

pearson_i

\(\chi^2=\frac{n(ad-bc)^2}{(a+b)(a+c)(c+d)(b+d)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

peirce

\(\frac{ab+bc}{ab+2bc+cd}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

person_ii

\(\sqrt{\frac{\rho}{n+\rho}}\)
\(\rho=\frac{ad-bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

phi

\(\phi = \sqrt{\frac{\chi^2}{N}}\)

Phi coefficient \(\phi\), computed from the chi-square statistic and the total count \(N\).

roger_tanimoto

\(\frac{a+d}{a+2b+2c+d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

russel_rao

\(\frac{a}{a+b+c+d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

shape_difference

\(\frac{n(b+c)-(b-c)^2}{(a+b+c+d)^2}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

simpson

\(\frac{a}{\min(a+b,a+c)}\)

Simpson (or Overlap).

size_difference

\(\frac{(b+c)^2}{(a+b+c+d)^2}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sokal_michener

\(\frac{a+d}{a+b+c+d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sokal_sneath_i

\(\frac{a}{a+2b+2c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sokal_sneath_ii

\(\frac{2a+2d}{2a+b+c+2d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sokal_sneath_iii

\(\frac{a+d}{b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sokal_sneath_iv

\(\frac{ad}{(a+b)(a+c)(b+d)\sqrt{c+d}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sokal_sneath_v

\(\frac{1}{4}\left(\frac{a}{a+b}+\frac{a}{a+c}+\frac{d}{b+d}+\frac{d}{c+d}\right)\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sorensen_dice

\(\frac{2(a + d)}{2(a + d) + b + c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

sorgenfrei

\(\frac{a^2}{(a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

stiles

\(\log_{10} \frac{n\left(|ad-bc|-\frac{n}{2}\right)^2}{(a+b)(a+c)(b+d)(c+d)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

tanimoto_distance

\(D_T = -\log_2(\mathrm{roger\_tanimoto})\)

Distance form derived directly from the Roger-Tanimoto similarity in the current implementation.

tanimoto_i

\(\frac{a}{2a+b+c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

tanimoto_ii

\(\frac{a}{b + c}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

tarantula

\(\frac{a(c+d)}{c(a+b)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

tarwid

\(\frac{na - (a+b)(a+c)}{na + (a+b)(a+c)}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

tetrachoric

\(\frac{y-1}{y+1}, \quad y=\left(\frac{ad}{bc}\right)^{\frac{\pi}{4}}\)
Special cases: \(b=0\), \(c=0\), \(a=0\)

Tetrachoric correlation ranges over \([-1, 1]\), where 0 indicates no agreement, 1 indicates perfect agreement, and -1 indicates perfect disagreement.

tschuprow_t

\(T = \sqrt{\chi^2}\)

Current implementation returns the square root of the chi-square statistic for the binary table.

uncertainty_coefficient

\(U(X|Y) = \frac{I(X;Y)}{H(X)}\)
\(H(X) = -\sum_x P(x) \log P(x)\)
\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)

Uncertainty coefficient \(U(X|Y)\): the mutual information \(I(X;Y)\) normalized by the entropy \(H(X)\).

uncertainty_coefficient_reversed

\(U_\mathrm{rev} = \frac{I(X;Y)}{H(\mathrm{rows})}\)

Mutual information normalized by the row entropy, reversing the default direction used by uncertainty_coefficient.

vari

\(\frac{b+c}{4a+4b+4c+4d}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

yule_q

\(\frac{ad-bc}{ad+bc}\)
\(Q = \frac{\alpha - 1}{\alpha + 1}\)

Yule’s Q, derived from the 2x2 contingency cells a, b, c, d; \(\alpha = \frac{ad}{bc}\) is the odds ratio.

yule_q_difference

\(\frac{2bc}{ad+bc}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

yule_w

\(\frac{\sqrt{ad}-\sqrt{bc}}{\sqrt{ad}+\sqrt{bc}}\)

Binary-binary coefficient derived from the 2x2 contingency cells a, b, c, d and total n.

yule_y

\(Y = \frac{\sqrt\alpha - 1}{\sqrt\alpha + 1}\)

Yule’s Y is based on the odds ratio (cross-product ratio), \(\alpha\).
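Most rows in this family reduce to arithmetic on the four cell counts. The following is a minimal sketch, assuming the common convention that a counts pairs where both variables are 1, b and c count the two kinds of mismatch, and d counts pairs where both are 0; this page does not spell that convention out. The helper names are illustrative and are not the codebase's API.

```python
def contingency_cells(x, y):
    """Count the 2x2 cells for two equal-length sequences of 0/1 values.

    Assumed convention: a = both 1, b = x only, c = y only, d = both 0.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(a, b, c, d):
    # a / (a + b + c); joint absences (d) are ignored
    return a / (a + b + c)


def dice(a, b, c, d):
    # 2a / (2a + b + c)
    return 2 * a / (2 * a + b + c)


def sokal_michener(a, b, c, d):
    # (a + d) / (a + b + c + d); simple matching
    return (a + d) / (a + b + c + d)


def yule_q(a, b, c, d):
    # (ad - bc) / (ad + bc); undefined when ad + bc == 0
    return (a * d - b * c) / (a * d + b * c)


x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 0, 0, 0, 1, 1, 0]
cells = contingency_cells(x, y)
print(cells)
print(jaccard(*cells), dice(*cells), sokal_michener(*cells), yule_q(*cells))
```

Ratios such as kulczynski_i or odds_ratio divide by b + c or bc, so a real implementation needs a policy for zero denominators; this sketch simply lets Python raise ZeroDivisionError.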

Confusion Matrix, Binary-Binary (29)

Name

Equation

Note

acc

\(ACC = \frac{TP + TN}{TP + TN + FP + FN}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

ba

\(BA = \frac{TPR + TNR}{2}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

bm

\(BI = TPR + TNR - 1\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

dor

\(\frac{PLR}{NLR}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

f1

\(F1 = \frac{2 \times PPV \times TPR}{PPV + TPR}\)

F1 score: harmonic mean of precision and sensitivity.

fdr

\(FDR = \frac{FP}{FP + TP}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

fn

\(FN\)

Raw false-negative count from the confusion matrix.

fnr

\(FNR = \frac{FN}{FN + TP}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

fomr

\(FOR = \frac{FN}{FN + TN}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

fp

\(FP\)

Raw false-positive count from the confusion matrix.

fpr

\(FPR = \frac{FP}{FP + TN}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

mcc

\(MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

mk

\(MK = PPV + NPV - 1\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

n

\(N = TP + FN + FP + TN\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

nlr

\(NLR = \frac{FNR}{TNR}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

npv

\(NPV = \frac{TN}{TN + FN}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

plr

\(PLR = \frac{TPR}{FPR}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

ppv

\(PPV = \frac{TP}{TP + FP}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

precision

\(PPV = \frac{TP}{TP + FP}\)

Alias of positive predictive value (PPV).

prevalence

\(\frac{TP + FN}{N}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

pt

\(PT = \frac{\sqrt{TPR(1 - TNR)} + TNR - 1}{TPR + TNR - 1}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

recall

\(TPR = \frac{TP}{TP + FN}\)

Alias of true positive rate (TPR).

sensitivity

\(TPR = \frac{TP}{TP + FN}\)

Alias of true positive rate (TPR).

specificity

\(TNR = \frac{TN}{TN + FP}\)

Alias of true negative rate (TNR).

tn

\(TN\)

Raw true-negative count from the confusion matrix.

tnr

\(TNR = \frac{TN}{TN + FP}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

tp

\(TP\)

Raw true-positive count from the confusion matrix.

tpr

\(TPR = \frac{TP}{TP + FN}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.

ts

\(TS = \frac{TP}{TP + FN + FP}\)

Confusion-matrix metric derived from the TP, FN, FP, TN counts.
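The derived metrics in this table can all be computed from the four raw counts. The sketch below is illustrative only (the function name and the dictionary return type are assumptions, not the codebase's API). It writes F1 as the harmonic mean of PPV and TPR, \(2 \cdot PPV \cdot TPR / (PPV + TPR)\), and MCC with the product \(TP \times TN\) in the numerator, which are the standard definitions.

```python
import math


def confusion_metrics(tp, fn, fp, tn):
    """Return selected metrics computed from raw confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity / recall
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # precision
    npv = tn / (tn + fn)
    return {
        "acc": (tp + tn) / (tp + fn + fp + tn),
        "ba": (tpr + tnr) / 2,
        "bm": tpr + tnr - 1,
        "mk": ppv + npv - 1,
        "f1": 2 * ppv * tpr / (ppv + tpr),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


m = confusion_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Any count combination that empties a row or column of the confusion matrix makes at least one denominator zero; the sketch does not guard against that.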

Categorical-Categorical (9)

Name

Equation

Note

adjusted_rand_index

\(ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}{\frac{1}{2}\left(\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right) - \frac{\left(\sum_i \binom{a_i}{2}\right)\left(\sum_j \binom{b_j}{2}\right)}{\binom{n}{2}}}\)

Adjusted Rand index for the categorical contingency table; values can be negative and the binomial terms can overflow for large n.

chisq

\(\sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\)
\(E_{ij} = \frac{N_{i*} N_{*j}}{N}\)

Chi-square statistic \(\chi^2\) of the contingency table, computed from the observed counts \(O_{ij}\) and the expected counts \(E_{ij}\).

chisq_dof

\((R - 1)(C - 1)\)

Degrees of freedom for the chi-square statistic on the induced contingency table.

gk_lambda

\(P_e = 1 - \frac{\max_{c} N_{* c}}{N}\)
\(P_{e|r} = 1 - \frac{\sum_r \max_{c} N_{r c}}{N}\)
\(\lambda_{B|A} = \frac{P_e - P_{e|r}}{P_e}\)

Goodman-Kruskal’s lambda, \(\lambda_{B|A}\), is the proportional reduction in error when predicting one variable, B, given another, A.

gk_lambda_reversed

\(\lambda_{A|B} = \frac{\sum_c \max_r N_{rc} - \max_r N_{r*}}{N - \max_r N_{r*}}\)

Reverse-direction Goodman-Kruskal lambda, predicting rows from columns.

mutual_information

\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)

The mutual information between two variables \(X\) and \(Y\) is denoted as \(I(X;Y)\).

phi

\(\phi = \sqrt{\frac{\chi^2}{N}}\)

Phi coefficient \(\phi\), computed from the chi-square statistic and the total count \(N\).

uncertainty_coefficient

\(U(X|Y) = \frac{I(X;Y)}{H(X)}\)
\(H(X) = -\sum_x P(x) \log P(x)\)
\(I(X;Y) = \sum_y \sum_x P(x, y) \log \frac{P(x, y)}{P(x) P(y)}\)

Uncertainty coefficient \(U(X|Y)\): the mutual information \(I(X;Y)\) normalized by the entropy \(H(X)\).

uncertainty_coefficient_reversed

\(U_\mathrm{rev} = \frac{I(X;Y)}{H(\mathrm{rows})}\)

Mutual information normalized by the row entropy, reversing the default direction used by uncertainty_coefficient.
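The chi-square, lambda, and mutual-information rows all operate on an R x C table of counts. The following sketch works on a plain list of rows; the function names mirror the measure names for readability only, and the choice of the natural logarithm is an assumption, since this page does not state the log base. Tables with an empty row or column are not handled.

```python
import math


def _margins(table):
    """Return (n, row totals, column totals) for a list-of-rows count table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return sum(row_tot), row_tot, col_tot


def chisq(table):
    n, row_tot, col_tot = _margins(table)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chisq_dof(table):
    return (len(table) - 1) * (len(table[0]) - 1)


def gk_lambda(table):
    """Proportional reduction in error when predicting the column from the row."""
    n, _, col_tot = _margins(table)
    p_e = 1 - max(col_tot) / n
    p_e_given_r = 1 - sum(max(row) for row in table) / n
    return (p_e - p_e_given_r) / p_e


def mutual_information(table):
    n, row_tot, col_tot = _margins(table)
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count > 0:  # empty cells contribute nothing to the sum
                p_xy = count / n
                mi += p_xy * math.log(p_xy / ((row_tot[i] / n) * (col_tot[j] / n)))
    return mi


table = [[30, 5], [10, 55]]
print(chisq(table), chisq_dof(table), gk_lambda(table), mutual_information(table))
```

For a table whose rows are proportional, such as `[[10, 20], [20, 40]]`, both the chi-square statistic and the mutual information come out as zero (up to rounding), which is a quick sanity check for either function.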

Agreement, Categorical-Categorical (2)

Name

Equation

Note

chohen_k

\(\kappa = \frac{\theta_1 - \theta_2}{1 - \theta_2}\)
\(\theta_1 = \sum_i p_{ii}\)
\(\theta_2 = \sum_i p_{i+}p_{+i}\)

Computes Cohen’s \(\kappa\).

cohen_light_k

\(\kappa = \frac{\theta_1 - \theta_2}{1 - \theta_2}\)
\(\theta_1 = \frac{p_{ii}}{p_{i+}}\)
\(\theta_2 = p_{+i}\)

Cohen-Light \(\kappa\).
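As a worked illustration of the Cohen's \(\kappa\) row, the sketch below computes \(\theta_1\) (observed agreement) and \(\theta_2\) (agreement expected from the margins) on a square table of counts. The function name and the list-of-rows input are assumptions made for this example, not the codebase's interface.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows and columns share categories)."""
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]        # p_{i+}
    col_p = [sum(col) / n for col in zip(*table)]  # p_{+i}
    theta_1 = sum(table[i][i] for i in range(len(table))) / n
    theta_2 = sum(r * c for r, c in zip(row_p, col_p))
    return (theta_1 - theta_2) / (1 - theta_2)


ratings = [[20, 5], [10, 15]]
print(cohen_kappa(ratings))
```

Here \(\theta_1 = 0.7\) and \(\theta_2 = 0.5\), so \(\kappa = 0.2 / 0.5 = 0.4\) up to floating-point rounding.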

Binary-Continuous, Biserial (3)

Name

Equation

Note

biserial

\(r_{pb} = \frac{(y_1 - y_0)\sqrt{pq}}{\sigma}\)
\(r_b = r_{pb}\frac{\sqrt{pq}}{\phi(\Phi^{-1}(q))}\)

Biserial correlation using the point-biserial term with the standard normal PDF/CDF correction from the implementation.

point_biserial

\(r_{pb} = \frac{(y_1 - y_0)\sqrt{pq}}{\sigma}\)

Point-biserial correlation between a binary variable and a continuous response.

rank_biserial

\(r_{rb} = \frac{2(y_1 - y_0)}{n}\)

Rank-biserial statistic as currently implemented from the two group means and sample size.
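The point-biserial row can be reproduced with the standard library. This sketch assumes that \(y_1\) and \(y_0\) are the means of the continuous values in the groups coded 1 and 0, that \(p\) is the proportion of 1s with \(q = 1 - p\), and that \(\sigma\) is the population standard deviation of all values; none of these choices is stated on this page. Under those assumptions the result equals the Pearson correlation between the two variables.

```python
import math
import statistics


def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable and a continuous one."""
    y1 = [v for b, v in zip(binary, values) if b == 1]
    y0 = [v for b, v in zip(binary, values) if b == 0]
    if not y1 or not y0:
        raise ValueError("both groups must be non-empty")
    p = len(y1) / len(values)
    q = 1 - p
    sigma = statistics.pstdev(values)  # population standard deviation (assumed)
    return (statistics.mean(y1) - statistics.mean(y0)) * math.sqrt(p * q) / sigma


group = [0, 0, 0, 1, 1, 1]
score = [1, 2, 3, 4, 5, 6]
print(point_biserial(group, score))
```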

Categorical-Continuous (7)

Name

Equation

Note

anova

\(F = \frac{SS_B / (k - 1)}{SS_W / (n - k)}\)

One-way ANOVA F statistic with p-value returned by scipy.stats.f_oneway.

calinski_harabasz

\(CH = \frac{\operatorname{tr}(B_k)}{\operatorname{tr}(W_k)} \cdot \frac{n-k}{k-1}\)

Calinski-Harabasz separation score from scikit-learn over the grouped continuous values.

davies_bouldin

\(DB = \frac{1}{k}\sum_i \max_{j \neq i}\frac{s_i + s_j}{d_{ij}}\)

Davies-Bouldin index from scikit-learn over category-labelled one-dimensional samples.

eta

\(\eta = \sqrt{\eta^2}\)

Correlation ratio magnitude derived as the square root of eta_squared.

eta_squared

\(\eta^2 = \frac{\sigma_{\bar{y}}^2}{\sigma_{y}^2}\)

Squared correlation ratio: the variance of the category means divided by the total variance of the continuous variable.

kruskal

\(H = \frac{12}{N(N+1)}\sum_i \frac{R_i^2}{n_i} - 3(N+1)\)

Kruskal-Wallis H statistic with p-value returned by scipy.stats.kruskal.

silhouette

\(s_i = \frac{b_i - a_i}{\max(a_i, b_i)}\)

Silhouette coefficient over category-labelled one-dimensional samples; the implementation returns the dataset average.
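The eta_squared and anova rows share the same between-group and within-group sums of squares. The sketch below computes both from parallel lists of labels and values. It assumes that \(\eta^2\) is the ratio \(SS_B / (SS_B + SS_W)\), that is, category means weighted by group size; the function names are illustrative, and no p-value is computed.

```python
from collections import defaultdict


def sums_of_squares(labels, values):
    """Return (SS_between, SS_within, number of groups, number of samples)."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    grand_mean = sum(values) / len(values)
    ss_between = 0.0
    ss_within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        ss_between += len(members) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in members)
    return ss_between, ss_within, len(groups), len(values)


def eta_squared(labels, values):
    ss_b, ss_w, _, _ = sums_of_squares(labels, values)
    return ss_b / (ss_b + ss_w)


def anova_f(labels, values):
    ss_b, ss_w, k, n = sums_of_squares(labels, values)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


labels = list("aaabbbccc")
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(eta_squared(labels, values), anova_f(labels, values))
```

With three well-separated groups, \(SS_B = 54\) and \(SS_W = 6\), giving \(\eta^2 = 0.9\) and \(F = 27\).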

Ordinal-Ordinal, Concordance (3)

Name

Equation

Note

goodman_kruskal_gamma

\(\gamma = \frac{\pi_c - \pi_d}{1 - \pi_t}\)
\(\pi_c = \frac{C}{n}\)
\(\pi_d = \frac{D}{n}\)
\(\pi_t = \frac{T}{n}\)

Goodman-Kruskal \(\gamma\) is similar to Somers’ d, but with tied pairs removed from the denominator.

kendall_tau

\(\tau = \frac{C - D}{\binom{n}{2}}\)

Kendall’s \(\tau\): the number of concordant pairs \(C\) minus the number of discordant pairs \(D\), divided by the total number of pairs.

somers_d

\(d_{Y \cdot X} = \frac{\pi_c - \pi_d}{\pi_c + \pi_d + \pi_t^Y}\)
\(d_{X \cdot Y} = \frac{\pi_c - \pi_d}{\pi_c + \pi_d + \pi_t^X}\)
\(\pi_c = \frac{C}{n}\)
\(\pi_d = \frac{D}{n}\)
\(\pi_t^X = \frac{T^X}{n}\)
\(\pi_t^Y = \frac{T^Y}{n}\)

Computes Somers’ d for two continuous variables.
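All three concordance measures start from the same pair counts. The sketch below counts concordant, discordant, and tied pairs by brute force over all \(\binom{n}{2}\) pairs, then forms Kendall's \(\tau\) as in the table and \(\gamma\) in its usual ratio form \((C - D)/(C + D)\). Treating any pair tied on either variable as a single tie category is a simplification made for this example.

```python
import math
from itertools import combinations


def concordance_counts(x, y):
    """Return (concordant, discordant, tied) pair counts; O(n^2) brute force."""
    c = d = t = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            c += 1
        elif sign < 0:
            d += 1
        else:
            t += 1  # tied on x, on y, or on both
    return c, d, t


def kendall_tau(x, y):
    c, d, _ = concordance_counts(x, y)
    return (c - d) / math.comb(len(x), 2)


def goodman_kruskal_gamma(x, y):
    c, d, _ = concordance_counts(x, y)
    return (c - d) / (c + d)


x = [1, 2, 2, 3]
y = [1, 2, 3, 3]
print(concordance_counts(x, y), kendall_tau(x, y), goodman_kruskal_gamma(x, y))
```

With two tied pairs out of six, \(\tau = 4/6\) while \(\gamma = 1\), which shows how ignoring ties in the denominator inflates \(\gamma\) relative to \(\tau\).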

Continuous-Continuous (4)

Name

Equation

Note

kendall

\(\tau = \frac{C-D}{\sqrt{(C+D+T_x)(C+D+T_y)}}\)

Kendall rank correlation and p-value returned by scipy.stats.kendalltau.

pearson

\(r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i (y_i-\bar{y})^2}}\)

Pearson linear correlation and p-value returned by scipy.stats.pearsonr.

regression

\(y = \beta_0 + \beta_1 x\)

Linear regression via scipy.stats.linregress; this API returns the correlation coefficient r and p-value.

spearman

\(\rho = \mathrm{corr}(\mathrm{rank}(x), \mathrm{rank}(y))\)

Spearman rank correlation and p-value returned by scipy.stats.spearmanr.
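The table notes that these measures are returned by scipy.stats; the sketch below is a pure-Python illustration of the Pearson and Spearman formulas only, without p-values. The helper names and the average-rank treatment of ties are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson r from the sums of centered products."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    # Spearman's rho is the Pearson correlation of the ranks
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

Because y is a monotone but nonlinear function of x here, Spearman's \(\rho\) is 1 while Pearson's r falls slightly below 1.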