composite_id {corella}	R Documentation
Create unique identifier columns
Description
A unique identifier is a pattern of words, letters and/or numbers that is unique to a single record within a dataset. Unique identifiers are useful because they identify individual observations, making it possible to change, amend or delete observations over time. They also prevent accidental deletion when more than one record contains the same information (and would otherwise be considered a duplicate).
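To make the duplicate-record point concrete, here is a minimal base-R sketch. It does not use corella, and the data frame, its columns and the "occ-" prefix are hypothetical. It shows two records with identical field values that become distinguishable once each row carries its own identifier, and uses anyDuplicated() to confirm that the identifier column is unique.

```r
# Two records with identical field values (hypothetical example data)
obs <- data.frame(
  site = c("A01", "A01"),
  eventDate = c("2020-01-01", "2020-01-01"),
  stringsAsFactors = FALSE
)

# Without an identifier, the second row looks like a duplicate of the first
any(duplicated(obs))  # TRUE

# Give each row its own identifier (a simple row counter, for illustration only)
obs$occurrenceID <- paste0("occ-", seq_len(nrow(obs)))

# The rows are now distinct, and the ID column contains no repeated values
any(duplicated(obs))                  # FALSE
anyDuplicated(obs$occurrenceID) == 0  # TRUE
```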
The identifier functions in corella make it easier to generate columns with unique identifiers in a dataset. These functions can be used within set_events(), set_occurrences(), or (equivalently) dplyr::mutate().
Usage
composite_id(..., sep = "-")
sequential_id(width)
random_id()
Arguments
...: Zero or more variable names from the tibble being mutated (unquoted), and/or zero or more calls to sequential_id() or random_id().
sep: Character used to separate field values. Defaults to "-".
width: (Integer) Number of characters in the resulting string. Defaults to one plus the order of magnitude of the largest number (that is, the number of digits needed to write it).
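The default for width can be illustrated with a short base-R sketch. The helper pad_sequence() below is hypothetical and is not part of corella; it only mirrors the documented rule (width = order of magnitude of the largest number, plus one) using formatC() for zero-padding. The exact output format of sequential_id() may differ.

```r
# Hypothetical helper, not part of corella: zero-padded sequential numbers 1..n
pad_sequence <- function(n, width = NULL) {
  stopifnot(n >= 1)
  if (is.null(width)) {
    # Order of magnitude of the largest number (n), plus one,
    # i.e. the number of digits in n
    width <- floor(log10(n)) + 1
  }
  # Pad each number on the left with zeros up to `width` characters
  formatC(seq_len(n), width = width, flag = "0")
}

pad_sequence(15)[c(1, 15)]             # "01" "15"
pad_sequence(15, width = 4)[c(1, 15)]  # "0001" "0015"
```

Padding to a fixed width keeps the identifiers the same length and makes them sort correctly as text ("02" before "10").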
Details
Generally speaking, it is better to use existing information from a dataset to generate identifiers. For this reason we recommend using composite_id() to aggregate existing fields, if no such composite is already present within the dataset. Composite IDs are more meaningful and stable; they are easier to check and harder to overwrite. It is possible to call sequential_id() or random_id() within composite_id() to combine existing and new columns.
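As a rough illustration of combining a new sequential column with existing fields, the base-R sketch below approximates what composite_id(sequential_id(), site, eventDate) is intended to produce, using the default separator "-". It is an assumption-based approximation built with formatC() and paste(), not corella's implementation, and the real output format may differ.

```r
# Example data with the same site and eventDate columns as in the Examples
df <- data.frame(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  site = rep(c("A01", "A02", "A03"), each = 5),
  stringsAsFactors = FALSE
)

# New sequential part: zero-padded row numbers (width = digits in nrow(df))
n <- nrow(df)
seq_part <- formatC(seq_len(n), width = floor(log10(n)) + 1, flag = "0")

# Join the sequential part with existing fields, separated by "-"
df$occurrenceID <- paste(seq_part, df$site, df$eventDate, sep = "-")

head(df$occurrenceID, 2)  # "01-A01-2020-01-01" "02-A01-2021-01-01"
```

Note that "-" also occurs inside the dates here, so if the identifier ever needs to be split back into its parts, a separator that does not appear in any field is a safer choice.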
Value
An amended tibble
containing a column with identifiers in the
requested format.
Examples
library(corella)

df <- tibble::tibble(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  basisOfRecord = "humanObservation",
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Add composite ID using a random ID, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(random_id(),
                                site,
                                eventDate)
  )

# Add composite ID using a sequential number, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(sequential_id(),
                                site,
                                eventDate)
  )