Measure string distances between sibling tips
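This function is useful for asking whether one tree groups tips with similar states together more closely than another tree does: a tree that places more similar cells next to each other as siblings will have a smaller mean sibling distance.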
Usage
compute_tip_sibling_distances(
states_df,
tree,
states_col = "state",
cell_id_col = "cell_id"
)
Details
For each set of sibling tips, the function measures the distance between their states by treating each cell's states across the genome as a string and computing a string distance between those strings.
The states for each cell ID are first converted to letters (so that double-digit states such as 10 do not count as two characters) and then joined into a single string across the genome for each cell. For example, the states 2 2 2 3 3 3 10 become C C C D D D K. See dlptools::map_states_to_letters() for details.
Then, for each tip, its sister is found and the string distance between them is measured. If the sister of a tip is a clade, the mean distance to all tips in that clade is used. For example, in the tree (A, (B, C)), the sister of A is the clade containing both B and C. See dlptools::get_dist_to_sibs() for details.
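The letter conversion and string comparison can be sketched in base R as follows. This is a minimal sketch, not the dlptools implementation: the helper names `states_to_letters()` and `genome_string()` are hypothetical, the mapping 0 -> A, 1 -> B, ... is inferred from the example above and only covers states 0 to 25, and Levenshtein (edit) distance via `utils::adist()` is an assumed choice, since the string distance actually used is not stated here.

```r
# Hypothetical helper (not dlptools::map_states_to_letters()): map integer
# states to single letters so that state 10 becomes one character ("K").
# Assumes states are integers in 0..25, with 0 -> "A", 1 -> "B", ..., 25 -> "Z".
states_to_letters <- function(states) {
  stopifnot(all(states >= 0 & states <= 25))
  LETTERS[states + 1]
}

# Hypothetical helper: collapse one cell's per-bin letters into a single string.
genome_string <- function(states) {
  paste(states_to_letters(states), collapse = "")
}

cell_a <- genome_string(c(2, 2, 2, 3, 3, 3, 10))  # "CCCDDDK"
cell_b <- genome_string(c(2, 2, 2, 3, 3, 3, 3))   # "CCCDDDD"

# Without the letter mapping, state 10 would add two characters, not one.
nchar(paste(c(2, 2, 2, 3, 3, 3, 10), collapse = ""))  # 8
nchar(cell_a)                                          # 7

# Assumed string distance: Levenshtein distance (one substitution, K -> D).
utils::adist(cell_a, cell_b)[1, 1]  # 1
```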
Finally, a mean distance across all sibling clades is computed and returned.
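The sibling lookup and final averaging can be sketched in base R as below. This is a sketch under stated assumptions, not the package's code: the tree is stored as a nested list (tips are character strings, internal nodes are lists of children) rather than an `ape` `phylo` object, the final value is assumed to be the mean of the per-tip sibling distances, the names `tips_of()`, `sister_tips()` and `mean_sibling_distance()` are hypothetical, and Levenshtein distance via `utils::adist()` again stands in for the string distance.

```r
# Hypothetical helper: all tip labels below a node of a nested-list tree.
tips_of <- function(node) {
  if (is.character(node)) node else unlist(lapply(node, tips_of))
}

# Hypothetical helper: for every tip, the tips of its sister, i.e. all tips in
# the other child (or children) of the tip's parent node.
sister_tips <- function(node) {
  out <- list()
  if (is.character(node)) return(out)
  for (i in seq_along(node)) {
    child <- node[[i]]
    if (is.character(child)) {
      out[[child]] <- unlist(lapply(node[-i], tips_of))
    } else {
      out <- c(out, sister_tips(child))
    }
  }
  out
}

# Mean over tips of the mean string distance from each tip to its sister's tips.
mean_sibling_distance <- function(tree, genome_strings) {
  sisters <- sister_tips(tree)
  per_tip <- vapply(names(sisters), function(tip) {
    mean(utils::adist(genome_strings[[tip]], genome_strings[sisters[[tip]]]))
  }, numeric(1))
  mean(per_tip)
}

tree <- list("A", list("B", "C"))  # the tree (A, (B, C))
genome_strings <- c(A = "CCCDDDK", B = "CCCDDDD", C = "CCCCCCK")
mean_sibling_distance(tree, genome_strings)  # 3.333333 (= 10/3)
```

Here the sister of A is the clade {B, C}, so A contributes mean(1, 3) = 2, while B and C are each other's sisters at distance 4; averaging the three per-tip values gives 10/3.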