transformer_vocab {pangoling} | R Documentation |
Returns the vocabulary of a model
Description
Returns the (decoded) vocabulary of a model.
Usage
transformer_vocab(
model = getOption("pangoling.causal.default"),
add_special_tokens = NULL,
decode = FALSE,
config_tokenizer = NULL
)
Arguments
model |
Name of a pre-trained model or folder. Models based on "gpt2" should work; see the Hugging Face website for available models. |
add_special_tokens |
Whether to include special tokens. It has the same default as the AutoTokenizer method in Python. |
decode |
Logical. If TRUE, the vocabulary is returned decoded. Default is FALSE. |
config_tokenizer |
List with other arguments that control how the tokenizer from Hugging Face is accessed. |
Value
A vector with the vocabulary of a model.
See Also
Other token-related functions: ntokens(), tokenize_lst()
Examples
transformer_vocab(model = "gpt2") |>
  head()
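The following is a minimal sketch, not part of the package's own examples, of how the raw and decoded vocabularies might be compared. It assumes that the "gpt2" model can be downloaded, that the returned value is a character vector, and that both calls return entries in the same order; the variable names are illustrative only.

```r
library(pangoling)

# Raw vocabulary (decode = FALSE is the default shown under Usage).
vocab_raw <- transformer_vocab(model = "gpt2")

# Decoded vocabulary, for comparison.
vocab_dec <- transformer_vocab(model = "gpt2", decode = TRUE)

# Assumption: both vectors have the same length and ordering.
stopifnot(length(vocab_raw) == length(vocab_dec))

# Number of entries in the vocabulary.
length(vocab_raw)

# Entries whose raw and decoded forms differ, shown side by side.
differs <- vocab_raw != vocab_dec
head(data.frame(raw = vocab_raw[differs], decoded = vocab_dec[differs]))

# Check whether a string is a single entry of the decoded vocabulary.
"the" %in% vocab_dec
```

Comparing the two vectors side by side is one way to see what decoding changes for a given tokenizer before relying on either form.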
[Package pangoling version 1.0.3 Index]