filterData {doblin} | R Documentation |
Filter Lineage Data for Clustering
Description
This function filters lineage frequency data to retain only dominant and persistent barcodes suitable for clustering. A barcode is removed if its mean frequency falls below a specified minimum or if it has too few time points with non-zero frequency. The function saves two CSV files: one with all original barcodes and one with the filtered set.
Usage
filterData(
  input_df,
  freq_threshold,
  time_threshold,
  output_directory,
  input_name
)
Arguments
input_df |
A data frame containing the input data. It must have columns |
freq_threshold |
A numeric value specifying the minimum mean frequency required to retain a barcode. |
time_threshold |
An integer specifying the minimum number of time points at which a barcode's frequency must be non-zero for the barcode to be retained. |
output_directory |
A string specifying the directory where the output CSV files will be saved. |
input_name |
A string used as the base name for output files (e.g., "replicate1"). |
Value
A data frame containing the ID, relative frequency at each time point, mean frequency, and number of non-zero time points for each retained barcode.
Examples
# Load demo barcode count data (installed with the package)
demo_file <- system.file("extdata", "demo_input.csv", package = "doblin")
input_dataframe <- readr::read_csv(demo_file, show_col_types = FALSE)
# Apply filtering to retain dominant and persistent barcodes
filtered_df <- filterData(
  input_df = input_dataframe,
  freq_threshold = 0.00005,
  time_threshold = 5,
  output_directory = tempdir(),
  input_name = "demo"
)
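
The two criteria named in the Description (minimum mean frequency and minimum number of non-zero time points) can be illustrated on toy data with base R alone. This is a minimal sketch, not the package's implementation: the column names ID, Time and Reads, the toy thresholds, and the use of inclusive comparisons (>=) are all assumptions made for illustration.

```r
# Toy long-format data. The column names ID, Time and Reads are assumptions
# for illustration and may not match the columns filterData() requires.
toy <- data.frame(
  ID    = rep(c("A", "B", "C"), each = 4),
  Time  = rep(1:4, times = 3),
  Reads = c(50, 60, 70, 80,   # A: abundant at every time point
             0,  0,  1,  0,   # B: rare, non-zero at one time point
            10,  0,  0,  0)   # C: non-zero at one time point only
)

# Relative frequency of each barcode within each time point
toy$Freq <- toy$Reads / ave(toy$Reads, toy$Time, FUN = sum)

# Per-barcode summaries: mean frequency and number of non-zero time points
mean_freq <- tapply(toy$Freq, toy$ID, mean)
n_nonzero <- tapply(toy$Freq > 0, toy$ID, sum)

# Hypothetical thresholds chosen to suit the toy data
freq_threshold <- 0.01
time_threshold <- 2

# Keep barcodes that satisfy both criteria (inclusive comparison is assumed)
keep <- names(mean_freq)[mean_freq >= freq_threshold &
                           n_nonzero >= time_threshold]
keep
```

In this toy set, barcode C clears the frequency threshold but is non-zero at only one time point, so it is dropped on persistence alone. Barcode B fails both criteria, and only A is kept.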