clean_lab_result {lab2clean} | R Documentation
Clean and Standardize Laboratory Result Values
Description
This function cleans and standardizes laboratory result values. It adds the new columns "clean_result", "scale_type", and "cleaning_comments" without altering the original result values. The function is part of a comprehensive R package for cleaning laboratory datasets.
Usage
clean_lab_result(
lab_data,
raw_result,
locale = "NO",
report = TRUE,
n_records = NA
)
Arguments
lab_data | A data frame containing laboratory data.
raw_result | The column in lab_data that contains the raw result values to be cleaned.
locale | A string representing the locale for the laboratory data. Defaults to "NO".
report | Logical. If TRUE, a report is written to the console. Defaults to TRUE.
n_records | If lab_data is a grouped list of distinct results, set n_records to the column that contains the frequency of each distinct result. Defaults to NA.
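The arguments above can be combined as in the following minimal sketch. The data frame, its column names (raw_result, frequency), and its values are made up for illustration, and the call assumes that column names are passed as strings. The call is guarded so the snippet still runs when lab2clean is not installed.

```r
# Hypothetical grouped input: one row per distinct raw result,
# with a made-up `frequency` column counting how often each occurs.
grouped_results <- data.frame(
  raw_result = c("5.2", "<0.1", "Positive", "1,5"),
  frequency  = c(120L, 15L, 40L, 3L),
  stringsAsFactors = FALSE
)

# Only call the package function when lab2clean is available.
if (requireNamespace("lab2clean", quietly = TRUE)) {
  cleaned <- lab2clean::clean_lab_result(
    grouped_results,
    raw_result = "raw_result",   # assumption: column name given as a string
    locale     = "NO",           # the documented default
    report     = TRUE,           # write the report to the console
    n_records  = "frequency"     # assumption: column name given as a string
  )
  print(head(cleaned))
}
```

For ungrouped data with one row per laboratory record, n_records can be left at its default of NA.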
Details
The function applies the following steps:
Clear Typos: Removes typographical errors and extraneous characters.
Handle Extra Variables: Identifies and separates extra variables from result values.
Detect and Assign Scale Types: Identifies and assigns the scale type using regular expressions.
Number Formatting: Standardizes number formats based on predefined rules and locale.
Mining Text Results: Identifies common words and patterns in text results.
Internal Datasets:
The function uses an internal dataset, common_words_languages.csv, which contains common words in various languages used for pattern identification in text result values.
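To make the number formatting and scale type steps more concrete, here is a rough base R sketch. It is not the package's implementation: the helper names normalize_number and guess_scale_type, the regular expressions, and the treatment of "Positive" as ordinal are all assumptions chosen for illustration.

```r
# Hypothetical helper (not part of lab2clean): rewrite a number string
# so that it uses "." as the decimal mark and no thousands separators.
normalize_number <- function(x, decimal_mark = ".") {
  x <- trimws(x)
  if (decimal_mark == ",") {
    x <- gsub(".", "", x, fixed = TRUE)   # drop "." thousands separators
    x <- sub(",", ".", x, fixed = TRUE)   # decimal comma -> decimal point
  } else {
    x <- gsub(",", "", x, fixed = TRUE)   # drop "," thousands separators
  }
  x
}

# Hypothetical helper (not part of lab2clean): guess a scale type
# from simple regular expressions.
guess_scale_type <- function(x) {
  x <- trimws(x)
  # Optional comparator (<, >, <=, >=), optional sign, digits, optional decimals
  is_number  <- grepl("^[<>]?=?\\s*[-+]?[0-9]+([.,][0-9]+)*$", x)
  # A few illustrative ordinal patterns: neg/pos words, "2+", "+++"
  is_ordinal <- grepl("^(neg|pos|negative|positive|trace|[0-4][+]|[+]+)$",
                      x, ignore.case = TRUE)
  ifelse(is_number, "Quantitative",
         ifelse(is_ordinal, "Ordinal", "Nominal"))
}

normalize_number("1.234,5", decimal_mark = ",")
guess_scale_type(c("5.2", "<0.1", "Positive", "A Rh+"))
```

The actual function covers many more patterns and languages than this sketch, so the helpers should be read only as an illustration of the idea.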
Value
A modified lab_data data frame with additional columns:
- clean_result: Cleaned and standardized result values.
- scale_type: The scale type of result values (Quantitative, Ordinal, Nominal).
- cleaning_comments: Comments about the cleaning process for each record.
Note
This function is part of a larger data cleaning pipeline and should be evaluated in that context. The package framework includes functions for cleaning result values and validating quantitative results for each test identifier.
The size of lab_data can affect the function's performance. For large datasets, pre-processing (for example, grouping the data into distinct result values, see n_records) may be needed.
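One possible pre-processing approach, sketched below under stated assumptions, is to clean each distinct raw value once and then join the cleaned values back to the full data. The data frame and the patient_id column are made up, the call assumes the column name is passed as a string, and whether this helps in practice depends on how many repeated values the data contains.

```r
# Made-up record-level data: several rows share the same raw result.
lab_data <- data.frame(
  patient_id = 1:6,
  raw_result = c("5.2", "Positive", "5.2", "<0.1", "Positive", "5.2"),
  stringsAsFactors = FALSE
)

# Keep one row per distinct raw value (still a data frame).
distinct_results <- unique(lab_data["raw_result"])

if (requireNamespace("lab2clean", quietly = TRUE)) {
  # Clean the distinct values only.
  cleaned_distinct <- lab2clean::clean_lab_result(
    distinct_results,
    raw_result = "raw_result",   # assumption: column name given as a string
    report     = FALSE
  )
  # Attach the cleaned columns to every original record.
  lab_data <- merge(lab_data, cleaned_distinct,
                    by = "raw_result", all.x = TRUE)
}
```

Note that merge() may reorder rows, so sort by an identifier such as patient_id afterwards if the original order matters.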
Author(s)
Ahmed Zayed ahmed.zayed@kuleuven.be
See Also
Function 2 for result validation.