ngrams_dictionary {NUSS}    R Documentation

Create n-grams dictionary

Description

ngrams_dictionary returns a data.frame containing the n-grams dictionary used by ngrams_segmentation.

Usage

ngrams_dictionary(
  texts,
  clean = TRUE,
  ngram_min = 1,
  ngram_max = 5,
  points_filter = 1
)

Arguments

texts

character vector of texts used to create the n-grams dictionary. Case-sensitive.

clean

logical, indicating whether the texts should be cleaned before the n-grams dictionary is created.

ngram_min

numeric, the minimum number of words in an n-gram included in the dictionary.

ngram_max

numeric, the maximum number of words in an n-gram included in the dictionary.

points_filter

numeric, the minimum number of points (occurrences) an n-gram must have to be included in the dictionary.
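
To illustrate how ngram_min, ngram_max and points_filter interact, the following base R sketch counts word n-grams and drops the rare ones. It is not the NUSS implementation: count_ngrams is a hypothetical helper, it splits on whitespace only, performs no cleaning, and treats "points" as plain occurrence counts, which may differ from what ngrams_dictionary actually computes.

```r
# Hypothetical helper for illustration only; NOT part of NUSS.
# Assumes "points" are plain occurrence counts and words are split on whitespace.
count_ngrams <- function(texts, ngram_min = 1, ngram_max = 5, points_filter = 1) {
  stopifnot(ngram_min >= 1, ngram_min <= ngram_max)
  grams <- character(0)
  for (txt in texts) {
    words <- strsplit(txt, "[[:space:]]+")[[1]]
    words <- words[nzchar(words)]
    for (n in seq(ngram_min, ngram_max)) {
      # A text shorter than n words contributes no n-grams of that size
      if (length(words) < n) next
      starts <- seq_len(length(words) - n + 1)
      grams <- c(grams, vapply(
        starts,
        function(i) paste(words[i:(i + n - 1)], collapse = " "),
        character(1)
      ))
    }
  }
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), points = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- table(grams)
  ngram <- names(counts)
  points <- as.integer(counts)
  # Keep only n-grams seen at least points_filter times
  keep <- points >= points_filter
  data.frame(ngram = ngram[keep], points = points[keep],
             stringsAsFactors = FALSE)
}

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")

# Two-word n-grams that occur at least twice:
# keeps "science is" and "this is", each seen twice
count_ngrams(texts, ngram_min = 2, ngram_max = 2, points_filter = 2)
```

Raising points_filter shrinks the result to the more frequent n-grams, while widening the ngram_min to ngram_max range adds longer or shorter word sequences to the candidates.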

Value

The output is always a data.frame with 4 columns: 1) to_search, 2) to_replace, 3) id, 4) points.

Examples

texts <- c("this is science",
           "science is #fascinatingthing",
           "this is a scientific approach",
           "science is everywhere",
           "the beauty of science")
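# Build the dictionary with the default settings
ngrams_dictionary(texts)
# Skip the cleaning step
ngrams_dictionary(texts,
                  clean = FALSE)
# Keep only two-word n-grams
ngrams_dictionary(texts,
                  clean = TRUE,
                  ngram_min = 2,
                  ngram_max = 2)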


[Package NUSS version 0.1.0 Index]