Calculates a pairwise distance matrix between the columns of a matrix of probability distributions using the Jensen-Shannon divergence. Adapted from https://enterotype.embl.de/.
Arguments
- M
a matrix of probability distributions, one distribution per column, e.g., normalized transcript compatibility counts.
- pseudocount
a small number used to avoid division-by-zero errors.
- normalizeCounts
logical, whether to attempt to normalize by dividing by the column sums. Set to TRUE if M is, e.g., a count matrix.
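To make the role of each argument concrete, here is a minimal sketch of the computation. It assumes the function follows the approach of the enterotype tutorial cited on this page: zero entries are replaced by the pseudocount, and each distance is the square root of the Jensen-Shannon divergence between two columns. The name `jsd_sketch` and the default `pseudocount = 1e-6` are hypothetical and chosen for illustration only; this is not the package's source code.

```r
# Hypothetical re-implementation for illustration; not the package's source.
jsd_sketch <- function(M, pseudocount = 1e-6, normalizeCounts = FALSE) {
  # Optionally turn each column into a probability distribution.
  if (normalizeCounts) M <- sweep(M, 2, colSums(M), "/")
  # Assumption: zeros are replaced by the pseudocount so that log(x / y)
  # never divides by zero or takes the log of zero.
  M[M == 0] <- pseudocount
  # Kullback-Leibler divergence between two vectors (natural log).
  kld <- function(x, y) sum(x * log(x / y))
  n <- ncol(M)
  D <- matrix(0, n, n, dimnames = list(colnames(M), colnames(M)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m <- (M[, i] + M[, j]) / 2  # midpoint distribution
      js <- 0.5 * kld(M[, i], m) + 0.5 * kld(M[, j], m)
      # Guard against tiny negative values from floating-point rounding.
      D[i, j] <- sqrt(max(0, js))
    }
  }
  as.dist(D)
}
```

Under these assumptions, `jsd_sketch(M, normalizeCounts = TRUE)` on a count matrix returns a `dist` object with one entry per pair of columns, which can be passed to `hclust()` in the same way as the output of `jsd()`.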
References
https://web.archive.org/web/20240131141033/https://enterotype.embl.de/enterotypes.html#dm.
Examples
set.seed(42)
M <- matrix(rpois(100, lambda=100), ncol=5)
colnames(M) <- paste0("sample", 1:5)
rownames(M) <- paste0("gene", 1:20)
Mnorm <- apply(M, 2, function(x) x/sum(x))
Mjsd <- jsd(Mnorm)
# equivalently
Mjsd <- jsd(M, normalizeCounts=TRUE)
Mjsd
#>            sample1    sample2    sample3    sample4
#> sample2 0.04351841
#> sample3 0.06114582 0.04682573
#> sample4 0.05485535 0.04587213 0.04641441
#> sample5 0.04869326 0.04697849 0.03939286 0.04894781
plot(hclust(Mjsd))