reinSummary {tall}    R Documentation

Summarize Reinert Clustering Results

Description

This function summarizes the results of the Reinert clustering algorithm, reporting for each cluster its most frequent document and its most significant terms. The input is the list returned by the term_per_cluster function.

Usage

reinSummary(tc, n = 10)

Arguments

tc

A list returned by the term_per_cluster function. The list includes:

  • segments: A data frame with segment information, including cluster and doc_id.

  • terms: A data frame with term information, including cluster, sign, chi_square, and term.
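The two components above can be checked before calling reinSummary. The sketch below relies only on the column names listed here; the helper check_tc() and the toy values are hypothetical and are not part of tall, and the sign labels "positive"/"negative" are an assumption.

```r
# Hypothetical helper (not part of tall): verify that a list has the
# components and columns that this page says reinSummary expects.
check_tc <- function(tc) {
  stopifnot(is.list(tc), all(c("segments", "terms") %in% names(tc)))
  stopifnot(is.data.frame(tc$segments), is.data.frame(tc$terms))
  missing_seg <- setdiff(c("cluster", "doc_id"), names(tc$segments))
  missing_trm <- setdiff(c("cluster", "sign", "chi_square", "term"),
                         names(tc$terms))
  if (length(missing_seg) > 0 || length(missing_trm) > 0) {
    stop("missing columns: ",
         paste(c(missing_seg, missing_trm), collapse = ", "))
  }
  invisible(TRUE)
}

# Toy input with made-up values, shaped like the documented list
toy_tc <- list(
  segments = data.frame(cluster = c(1, 1, 2), doc_id = c("d1", "d2", "d1")),
  terms = data.frame(
    cluster    = c(1, 2),
    sign       = c("positive", "positive"),
    chi_square = c(12.5, 8.1),
    term       = c("whale", "ship")
  )
)
check_tc(toy_tc)
```

A list missing any of these columns makes check_tc() stop with an error naming the absent columns.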

n

Integer. The number of top terms (ranked by their chi_square value) to include in the summary for each cluster and sign. Default is 10.

Details

This function performs the following steps:

  1. Extracts the most frequent document for each cluster.

  2. Summarizes the number of documents per cluster.

  3. Selects the top n terms for each cluster, separated by positive and negative signs.

  4. Combines the terms and segment information into a final summary table.
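The four steps above can be illustrated with a short base-R sketch. It is not tall's implementation: the function summarize_clusters(), its output column names, and the toy data are hypothetical, and it assumes that the "most frequent document" is the doc_id contributing the most segments to a cluster and that sign takes the values "positive" and "negative".

```r
# Hypothetical sketch of the four steps (not tall's own code).
summarize_clusters <- function(tc, n = 10) {
  seg <- tc$segments
  trm <- tc$terms
  clusters <- sort(unique(seg$cluster))

  rows <- lapply(clusters, function(k) {
    s <- seg[seg$cluster == k, ]
    # Step 1: doc_id with the most segments in cluster k
    # (ties go to the first doc_id in alphabetical order)
    counts <- table(s$doc_id)
    top_doc <- names(counts)[which.max(counts)]
    # Step 2: number of distinct documents in cluster k
    n_docs <- length(unique(s$doc_id))
    # Step 3: top n terms by chi_square, separately for each sign
    top_terms <- function(sg) {
      tk <- trm[trm$cluster == k & trm$sign == sg, ]
      tk <- tk[order(tk$chi_square, decreasing = TRUE), ]
      paste(head(tk$term, n), collapse = ", ")
    }
    # Step 4: combine segment and term information into one row
    data.frame(cluster = k, n_docs = n_docs, top_doc = top_doc,
               positive = top_terms("positive"),
               negative = top_terms("negative"))
  })
  do.call(rbind, rows)
}

# Toy input with made-up values
tc <- list(
  segments = data.frame(
    cluster = c(1, 1, 1, 2, 2),
    doc_id  = c("d1", "d1", "d2", "d2", "d3")
  ),
  terms = data.frame(
    cluster    = c(1, 1, 1, 2, 2),
    sign       = c("positive", "positive", "negative", "positive", "positive"),
    chi_square = c(15.2, 9.4, 4.1, 11.0, 7.3),
    term       = c("whale", "sea", "ship", "captain", "deck")
  )
)
S <- summarize_clusters(tc, n = 1)
```

With n = 1, each cluster keeps only its highest-scoring term per sign; a cluster with no terms of a given sign gets an empty string in that column.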

Value

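A data frame summarizing the clustering results. For each cluster, it combines the segment information (the most frequent document and the number of documents) with the top n positive and negative terms.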

See Also

term_per_cluster, reinPlot

Examples


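library(tall)

# Load the example data set
data(mobydick)

# Run Reinert clustering on the tokenized text
res <- reinert(
  x = mobydick,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 5,
  min_split_members = 10,
  cc_test = 0.3,
  tsj = 3
)

# Extract the terms associated with each cluster
tc <- term_per_cluster(res, cutree = NULL, k = 1:10, negative = FALSE)

# Summarize the clusters using the top 10 terms per cluster and sign
S <- reinSummary(tc, n = 10)

# Show the first 10 rows of the summary
head(S, 10)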



[Package tall version 0.3.0 Index]