Marks bins that overlap a centromere. An optional padding extends each centromere so that bins close to it are marked as well. Bins near centromeres often have unreliable state calls; in the past, we have filtered bins within 3 Mb of centromeres.

Usage

mark_bins_overlapping_centromeres(
  reads_df,
  padding = 0,
  bin_start_col = "start",
  bin_end_col = "end",
  version = c("hg19", "hg38")
)

Arguments

reads_df

Tibble of read data, with one row per bin.

padding

Integer number of base pairs (bp) to add to each side of the centromere. Default: 0.

bin_start_col

Name of the column holding the start position of each bin. Default: "start".

bin_end_col

Name of the column holding the end position of each bin. Default: "end".

version

Genome build used for the centromere locations. Default: 'hg19'; choose 'hg38' to use hg38 coordinates instead.
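
The Usage block lists the default as c("hg19", "hg38"). In R, a vector of choices like this is commonly resolved with match.arg(), which picks the first element when the argument is not supplied. The sketch below shows that idiom with a hypothetical helper, pick_version(); whether this function resolves version the same way internally is an assumption.

```r
# Hypothetical helper illustrating the common match.arg() idiom.
# It is not part of the package; it only mirrors the 'version' default.
pick_version <- function(version = c("hg19", "hg38")) {
  # With no argument supplied, match.arg() returns the first choice.
  match.arg(version)
}

default_version <- pick_version()        # falls back to the first choice
explicit_version <- pick_version("hg38") # an explicit choice is kept
```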

Value

The input table with a logical 'within_centro' column added (and potentially other centromere information columns, if needed).

Details

IMPORTANT! The padding is added to each side of the centromere. With a padding of 3 Mb, a bin is marked if it lies within 3 Mb before the centromere start or within 3 Mb after the centromere end, so the marked region is 6 Mb wider than the centromere itself.
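
To make the effect of the padding concrete, here is a minimal base R sketch of the overlap test. It is not the package's implementation: the helper overlaps_padded_centromere(), the centromere coordinates, and the bin positions are all made up for illustration, and treating the intervals as closed (inclusive at both ends) is an assumption.

```r
# Hypothetical helper, not the package's code: TRUE when a bin overlaps
# the centromere after it has been widened by 'padding' on each side.
overlaps_padded_centromere <- function(bin_start, bin_end,
                                       cen_start, cen_end, padding = 0) {
  padded_start <- cen_start - padding  # padding before the centromere start
  padded_end <- cen_end + padding      # padding after the centromere end
  # Closed-interval overlap test (an assumption about boundary handling)
  bin_start <= padded_end & bin_end >= padded_start
}

# Made-up coordinates on a single chromosome, for illustration only
bins <- data.frame(
  start = c(1e6, 8e6, 12e6, 16e6, 25e6),
  end   = c(1.5e6, 8.5e6, 12.5e6, 16.5e6, 25.5e6)
)
cen_start <- 10e6
cen_end <- 15e6

# No padding: only the bin inside the centromere is marked
bins$within_centro_pad0 <- overlaps_padded_centromere(
  bins$start, bins$end, cen_start, cen_end
)

# 3 Mb padding: the marked region grows from 10-15 Mb to 7-18 Mb
bins$within_centro_pad3mb <- overlaps_padded_centromere(
  bins$start, bins$end, cen_start, cen_end, padding = 3e6
)
```

With these made-up values, only the bin at 12 Mb is marked without padding, while a 3 Mb padding also marks the bins at 8 Mb and 16 Mb.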