nuss {NUSS} | R Documentation |
Mixed N-Grams and Unigram Sequence Segmentation (NUSS) function
Description
nuss returns a data.frame containing each hashtag, its segmented version, the ids of the dictionary words, the number of words it took to segment the hashtag, the total number of points, and the computed score.
Usage
nuss(sequences, texts)
Arguments
sequences |
character vector of sequences to be segmented (e.g., hashtags, with or without the leading #). Case-insensitive. |
texts |
character vector of texts used to create the n-grams and unigram dictionaries. Case-insensitive. |
Details
This function is an arbitrary combination of ngrams_dictionary, unigram_dictionary, ngrams_segmentation, and unigram_sequence_segmentation, created to make it easy to segment short texts based on a text corpus.
Value
The output is always a data.frame with the sequences that were
The output is not in the input order. If the input order is needed, use lapply to segment the sequences one at a time.
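One way to follow the lapply suggestion is to call the segmenter once per sequence, so the results come back as a list in input order. The sketch below is only an illustration: segment_in_order is a hypothetical helper that is not part of NUSS, and it assumes nuss accepts a single sequence together with the full texts vector, as the Usage section suggests. Each call presumably rebuilds the dictionaries from texts, so this may be slow for large corpora.

```r
# Hypothetical helper (not part of NUSS): apply a segmenter to each
# sequence separately so the results keep the order of the input.
segment_in_order <- function(sequences, texts, segmenter) {
  results <- lapply(sequences, function(s) segmenter(s, texts))
  names(results) <- sequences  # label each result with its input sequence
  results
}

# Run the example only when the NUSS package is installed.
if (requireNamespace("NUSS", quietly = TRUE)) {
  texts <- c("this is science",
             "science is everywhere",
             "the beauty of science")
  by_input <- segment_in_order(c("thisisscience", "scienceisscience"),
                               texts, NUSS::nuss)
  print(names(by_input))  # names follow the input order
}
```

Each list element holds the data.frame returned for one sequence; if a single data.frame is preferred, the elements could be combined afterwards, for example with do.call(rbind, by_input), assuming every call returns the same columns.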
Examples
texts <- c("this is science",
"science is #fascinatingthing",
"this is a scientific approach",
"science is everywhere",
"the beauty of science")
nuss(c("thisisscience", "scienceisscience"), texts)
[Package NUSS version 0.1.0 Index]