preprocess_text {text2emotion} | R Documentation
Preprocess Text with Slang Handling
Description
This function performs multi-stage text preprocessing, including lowercasing, HTML cleaning, punctuation normalization, contraction expansion, internet slang replacement, emoticon replacement, and final standardization.
Usage
preprocess_text(text, use_textclean = TRUE, custom_slang = NULL)
Arguments
text | A character vector of input texts.
use_textclean | Logical. Whether to use the textclean package for internet slang and emoticon replacement. Defaults to TRUE.
custom_slang | A named character vector providing user-defined slang mappings. Optional; defaults to NULL.
Details
The preprocessing pipeline includes:
Lowercasing the text.
Replacing HTML entities and non-ASCII characters.
Expanding common English contractions (e.g., "I'm" -> "I am").
Replacing internet slang and emoticons if use_textclean is TRUE.
Handling additional slang defined by the user.
Normalizing repeated punctuation and whitespace.
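The stages listed above can be illustrated with a minimal base R sketch. This is not the text2emotion implementation: the function name preprocess_sketch, the small lookup tables, and the assumption that custom_slang names are the slang terms and its values are the replacements are all hypothetical, chosen only to show how such a pipeline might be chained. Emoticon replacement is omitted for brevity.

```r
# Hypothetical sketch of the documented stages using base R only.
# The lookup tables are illustrative and far smaller than a real dictionary.
preprocess_sketch <- function(text, custom_slang = NULL) {
  # Lowercase the text
  out <- tolower(text)

  # Replace a few HTML entities, then drop non-ASCII characters
  entities <- c("&amp;" = "and", "&lt;" = "<", "&gt;" = ">", "&nbsp;" = " ")
  for (e in names(entities)) {
    out <- gsub(e, entities[[e]], out, fixed = TRUE)
  }
  out <- iconv(out, from = "UTF-8", to = "ASCII", sub = "")

  # Expand a small set of contractions (literal matching)
  contractions <- c("i'm" = "i am", "can't" = "cannot",
                    "won't" = "will not", "it's" = "it is")
  for (k in names(contractions)) {
    out <- gsub(k, contractions[[k]], out, fixed = TRUE)
  }

  # Built-in slang, with user-defined entries added or overriding
  # (assumption: names are slang terms, values are replacements;
  # terms are assumed to contain no regex metacharacters)
  slang <- c("rn" = "right now", "lol" = "laughing out loud")
  if (!is.null(custom_slang)) {
    slang[names(custom_slang)] <- custom_slang
  }
  for (s in names(slang)) {
    out <- gsub(paste0("\\b", s, "\\b"), slang[[s]], out, perl = TRUE)
  }

  # Collapse repeated punctuation and whitespace, then trim
  out <- gsub("([[:punct:]])\\1+", "\\1", out, perl = TRUE)
  out <- gsub("\\s+", " ", out, perl = TRUE)
  trimws(out)
}
```

Calling preprocess_sketch("that is sus   rn", custom_slang = c(sus = "suspicious")) shows a user-defined entry being applied alongside the built-in ones. Slang is matched on word boundaries so that short terms such as "rn" are not replaced inside longer words.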
Value
A character vector of cleaned and normalized text.
Examples
preprocess_text("I'm feeling lit rn!!!")
preprocess_text("I can't believe it... lol :)", use_textclean = TRUE)