reinSummary {tall}    R Documentation
Summarize Reinert Clustering Results
Description
This function summarizes the results of the Reinert clustering algorithm, including the most frequent documents and significant terms for each cluster.
The input is the result returned by the term_per_cluster function.
Usage
reinSummary(tc, n = 10)
Arguments
tc: A list returned by the term_per_cluster function.
n: Integer. The number of top terms (ranked by Chi-squared value) to include in the summary for each cluster and sign. Default is 10.
Details
This function performs the following steps:
- Extracts the most frequent document for each cluster.
- Summarizes the number of documents per cluster.
- Selects the top n terms for each cluster, separated by positive and negative signs.
- Combines the terms and segment information into a final summary table.
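The steps above can be illustrated with a minimal base R sketch on toy data. This is not the package's implementation: the `terms` and `segments` data frames, their column names, the `top_terms` helper, and the "; " separator are all hypothetical stand-ins chosen for illustration, and the document count is assumed to mean distinct document IDs per cluster.

```r
# Toy inputs (hypothetical structure, NOT the actual term_per_cluster output)
terms <- data.frame(
  cluster    = c(1, 1, 1, 2, 2, 2),
  term       = c("whale", "sea", "land", "ship", "captain", "whale"),
  chi_square = c(30.2, 12.5, 8.1, 25.0, 18.3, 9.4),
  sign       = c("positive", "positive", "negative",
                 "positive", "positive", "negative"),
  stringsAsFactors = FALSE
)
segments <- data.frame(
  doc_id  = c("ch1", "ch1", "ch2", "ch2", "ch3", "ch3", "ch3"),
  cluster = c(1, 1, 1, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)
n <- 2

# Hypothetical helper: top-n terms by chi-squared for one sign,
# collapsed into a single string per cluster
top_terms <- function(df, which_sign, n) {
  df <- df[df$sign == which_sign, ]
  df <- df[order(df$cluster, -df$chi_square), ]
  vapply(
    split(df$term, df$cluster),
    function(x) paste(head(x, n), collapse = "; "),
    character(1)
  )
}

docs_by_cluster <- split(segments$doc_id, segments$cluster)

# Step 1: most frequent document in each cluster
most_freq_doc <- vapply(
  docs_by_cluster,
  function(x) names(which.max(table(x))),
  character(1)
)

# Step 2: number of documents per cluster (assumed: distinct document IDs)
n_docs <- vapply(docs_by_cluster, function(x) length(unique(x)), integer(1))

# Steps 3 and 4: select top terms per sign and combine into one table
clusters <- sort(unique(terms$cluster))
key <- as.character(clusters)
summary_tbl <- data.frame(
  cluster = clusters,
  "Positive terms" = unname(top_terms(terms, "positive", n)[key]),
  "Negative terms" = unname(top_terms(terms, "negative", n)[key]),
  "Most frequent document" = unname(most_freq_doc[key]),
  "N. of Documents per Cluster" = unname(n_docs[key]),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(summary_tbl)
```

Setting `check.names = FALSE` keeps the column names with spaces intact, mirroring the column labels listed under Value.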
Value
A data frame summarizing the clustering results. The table includes:
- cluster: The cluster ID.
- Positive terms: The top n positive terms for each cluster, concatenated into a single string.
- Negative terms: The top n negative terms for each cluster, concatenated into a single string.
- Most frequent document: The document ID that appears most frequently in each cluster.
- N. of Documents per Cluster: The number of documents in each cluster.
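Because several of these column names contain spaces, they have to be accessed with `[[ ]]` or backticks. The sketch below uses a toy data frame, `S_toy`, as a stand-in for the returned table; its values are invented, and the "; " separator used to split the concatenated terms is an assumption rather than documented behavior.

```r
# Toy stand-in for the table returned by reinSummary (values are invented)
S_toy <- data.frame(
  cluster = 1:2,
  "Positive terms" = c("whale; sea", "ship; captain"),
  "Most frequent document" = c("ch1", "ch3"),
  "N. of Documents per Cluster" = c(2L, 2L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Columns with non-syntactic names: use [[ ]] or backticks
pos <- S_toy[["Positive terms"]]
doc_counts <- S_toy$`N. of Documents per Cluster`

# Split the concatenated terms back into one character vector per cluster
# (assumes a "; " separator, which may differ in the real output)
pos_list <- strsplit(pos, "; ", fixed = TRUE)
names(pos_list) <- S_toy$cluster
print(pos_list)
```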
See Also
Examples
data(mobydick)
res <- reinert(
  x = mobydick,
  k = 10,
  term = "token",
  segment_size = 40,
  min_segment_size = 5,
  min_split_members = 10,
  cc_test = 0.3,
  tsj = 3
)
tc <- term_per_cluster(res, cutree = NULL, k = 1:10, negative = FALSE)
S <- reinSummary(tc, n = 10)
head(S, 10)