filterHC {doblin} | R Documentation |
Filter Hierarchical Clusters Based on Size and Dominance
Description
This function filters the results of hierarchical clustering by retaining only clusters
that contain at least n_members unique lineages. To avoid excluding potentially dominant
but small clusters, the user may also provide a minimum average frequency threshold to
retain small clusters that include a dominant member.
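The two retention rules described above can be sketched on toy data. This is a minimal illustration in base R, not the doblin implementation: the helper filter_clusters_sketch and the columns lineage, cluster and mean_freq are hypothetical, and the sketch assumes that "dominant" means a lineage whose mean frequency reaches the threshold.

```r
# Toy data: one row per lineage, with its cluster label and mean frequency.
# Column names are hypothetical and chosen only for this illustration.
toy <- data.frame(
  lineage   = paste0("bc", 1:7),
  cluster   = c(1, 1, 1, 2, 2, 3, 4),
  mean_freq = c(0.01, 0.02, 0.01, 0.001, 0.002, 0.30, 0.0005)
)

# Hypothetical helper illustrating the idea; not part of doblin.
filter_clusters_sketch <- function(df, n_members, min_freq = NULL) {
  # Count the unique lineages in each cluster
  sizes <- tapply(df$lineage, df$cluster, function(x) length(unique(x)))
  keep <- names(sizes)[sizes >= n_members]

  if (!is.null(min_freq)) {
    # Among the small clusters, rescue those with a member at or above min_freq
    small <- df[!(as.character(df$cluster) %in% keep), , drop = FALSE]
    rescued <- unique(small$cluster[small$mean_freq >= min_freq])
    keep <- union(keep, as.character(rescued))
  }

  df[as.character(df$cluster) %in% keep, , drop = FALSE]
}

# Size rule only: cluster 1 is the only one with at least 3 lineages
kept_large <- filter_clusters_sketch(toy, n_members = 3)

# Size rule plus frequency rule: the single-member cluster 3 is also kept
kept_both <- filter_clusters_sketch(toy, n_members = 3, min_freq = 0.1)
```

In this toy case, leaving the threshold at NULL drops cluster 3 even though its only lineage is by far the most frequent one, which is the situation the optional threshold is meant to address.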
Usage
filterHC(
series_filtered,
clusters,
n_members,
min_freq_ignored_clusters = NULL
)
Arguments
series_filtered |
A data frame preprocessed using |
clusters |
A data frame containing hierarchical clustering assignments (e.g., from |
n_members |
An integer specifying the minimum number of members (lineages) required for a cluster to be retained. |
min_freq_ignored_clusters |
Optional. A numeric value specifying the minimum average frequency required to retain
small clusters (i.e., those with fewer than |
Value
A data frame containing the filtered clusters, including both large clusters and, optionally,
small clusters with at least one dominant member (based on the min_freq_ignored_clusters threshold).
Examples
# Load demo barcode count data (installed with the package)
demo_file <- system.file("extdata", "demo_input.csv", package = "doblin")
input_dataframe <- readr::read_csv(demo_file, show_col_types = FALSE)
# Filter data to retain dominant and persistent barcodes
filtered_df <- filterData(
  input_df = input_dataframe,
  freq_threshold = 0.00005,
  time_threshold = 5,
  output_directory = tempdir(),
  input_name = "demo"
)
# Perform hierarchical clustering using Pearson correlation
cluster_assignments <- performHClustering(
  filtered_data = filtered_df,
  agglomeration_method = "average",
  similarity_metric = "pearson",
  output_directory = tempdir(),
  input_name = "demo",
  missing_values = "pairwise.complete.obs",
  dtw_norm = NULL
)
# Filter clusters: keep clusters with at least 8 members, and also retain
# smaller clusters that meet the 0.0001 minimum average frequency threshold.
filtered_clusters <- filterHC(
  series_filtered = filtered_df,
  clusters = cluster_assignments,
  n_members = 8,
  min_freq_ignored_clusters = 0.0001
)
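
To see what the optional threshold contributes, the call can be repeated with min_freq_ignored_clusters left at its default of NULL and the row counts compared. This is a hedged sketch: it assumes doblin is installed and that filtered_df and cluster_assignments exist from the steps above, and it is skipped otherwise. The variable names have_inputs, size_only and with_threshold are illustrative.

```r
# Run the comparison only if the package and the example objects are available
have_inputs <- requireNamespace("doblin", quietly = TRUE) &&
  exists("filtered_df") && exists("cluster_assignments")

if (have_inputs) {
  # Size rule only (min_freq_ignored_clusters defaults to NULL)
  size_only <- doblin::filterHC(
    series_filtered = filtered_df,
    clusters = cluster_assignments,
    n_members = 8
  )

  # Size rule plus the minimum average frequency threshold
  with_threshold <- doblin::filterHC(
    series_filtered = filtered_df,
    clusters = cluster_assignments,
    n_members = 8,
    min_freq_ignored_clusters = 0.0001
  )

  # Any additional rows come from small clusters retained by the threshold
  cat("Rows with size rule only:", nrow(size_only), "\n")
  cat("Rows with threshold:     ", nrow(with_threshold), "\n")
}
```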