filterData {doblin}    R Documentation

Filter Lineage Data for Clustering

Description

This function filters lineage frequency data to retain only dominant and persistent barcodes suitable for clustering. It removes barcodes whose mean frequency falls below a specified minimum, or whose frequency is non-zero at fewer than a specified number of time points. The function also saves two CSV files: one with all original barcodes and one with the filtered set.
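The two criteria can be illustrated on toy data with base R. This is only a sketch, not the package's implementation: it assumes that relative frequency means reads divided by the total reads at the same time point, and that both thresholds are inclusive. The data and the threshold values are made up for illustration.

```r
# Toy long-format input with the documented columns ID, Time and Reads
toy <- data.frame(
  ID    = rep(c("A", "B", "C"), each = 3),
  Time  = rep(1:3, times = 3),
  Reads = c(80, 90, 70,   # A: abundant at every time point
            20,  0,  0,   # B: present at one time point only
             0, 10, 30)   # C: present at two time points
)

# Assumption: relative frequency = reads / total reads at that time point
toy$Freq <- toy$Reads / ave(toy$Reads, toy$Time, FUN = sum)

# Per-barcode summaries used by the two criteria
mean_freq <- tapply(toy$Freq, toy$ID, mean)      # dominance
n_nonzero <- tapply(toy$Freq > 0, toy$ID, sum)   # persistence

# Illustrative thresholds (assumed inclusive)
freq_threshold <- 0.05
time_threshold <- 2

keep <- mean_freq >= freq_threshold & n_nonzero >= time_threshold
retained <- names(keep)[keep]
retained
# "A" "C"
```

Barcode B passes the frequency criterion (mean of about 0.067) but is dropped because it is non-zero at only one time point.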

Usage

filterData(
  input_df,
  freq_threshold,
  time_threshold,
  output_directory,
  input_name
)

Arguments

input_df

A data frame containing the input data. It must have columns ID, Time, and Reads.

freq_threshold

A numeric value specifying the minimum mean frequency required to retain a barcode.

time_threshold

An integer specifying the minimum number of time points at which a barcode's frequency must be non-zero for the barcode to be retained.

output_directory

A string specifying the directory where the output CSV files will be saved.

input_name

A string used as the base name for output files (e.g., "replicate1").

Value

A data frame containing the ID, relative frequency at each time point, mean frequency, and number of non-zero time points for each retained barcode.
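As a rough picture of that layout, the base R sketch below builds a data frame with one row per barcode, one column per time point, and two summary columns. The column names (`1`, `2`, `3`, `mean_freq`, `n_nonzero`) and the frequency values are assumptions chosen for illustration; the names actually returned by filterData may differ.

```r
# Toy relative frequencies in long format (hypothetical values)
long <- data.frame(
  ID   = rep(c("A", "C"), each = 3),
  Time = rep(1:3, times = 2),
  Freq = c(0.8, 0.9, 0.7,
           0.0, 0.1, 0.3)
)

# One row per barcode, one column per time point
freq_by_time <- tapply(long$Freq, list(long$ID, long$Time), sum)

# Column names below are illustrative, not the package's actual names
result <- data.frame(
  ID = rownames(freq_by_time),
  freq_by_time,
  mean_freq = rowMeans(freq_by_time),
  n_nonzero = rowSums(freq_by_time > 0),
  check.names = FALSE,
  row.names = NULL
)

print(result)
```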

Examples

# Load demo barcode count data (installed with the package)
demo_file <- system.file("extdata", "demo_input.csv", package = "doblin")
input_dataframe <- readr::read_csv(demo_file, show_col_types = FALSE)

# Apply filtering to retain dominant and persistent barcodes
filtered_df <- filterData(
  input_df = input_dataframe,
  freq_threshold = 0.00005,
  time_threshold = 5,
  output_directory = tempdir(),
  input_name = "demo"
)

[Package doblin version 0.1.1 Index]