process_tibble_uniprot {oglcnac}    R Documentation

Process a Tibble of UniProt Data

Description

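This function processes a tibble containing accession and accession_source columns. For each row with accession_source == "UniProt", it retrieves the corresponding data from the UniProt API and writes the parsed entry name, protein name, and gene name to the entry_name, protein_name, and gene_name columns, overwriting existing values or creating the columns as needed. A parsed value is written only if it is not NULL or NA; otherwise the existing value is left unchanged. The column names can be changed through the corresponding *_col arguments.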
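The overwrite rule described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: update_uniprot_cols() and the lookup function are hypothetical, and lookup stands in for the UniProt API request, returning a named list of parsed fields (or NULL when nothing is found). The placeholder records are invented for illustration and are not real UniProt data.

```r
# Hypothetical sketch of the "write only if not NULL/NA" rule.
# `lookup` is a stand-in for the UniProt API call (assumption).
update_uniprot_cols <- function(data, lookup,
                                accession_col = "accession",
                                accession_source_col = "accession_source",
                                fields = c(entry_name = "entry_name",
                                           protein_name = "protein_name",
                                           gene_name = "gene_name")) {
  # Only rows whose source is "UniProt" are looked up; which() drops NA sources
  rows <- which(data[[accession_source_col]] == "UniProt")
  for (i in rows) {
    parsed <- lookup(data[[accession_col]][i])  # named list, or NULL
    for (field in names(fields)) {
      col <- fields[[field]]
      value <- parsed[[field]]
      # Skip NULL, empty, or NA values so the existing cell is left untouched
      if (is.null(value) || length(value) == 0 || is.na(value[[1]])) next
      # Create the target column (all NA) the first time a value is written
      if (!col %in% names(data)) data[[col]] <- NA_character_
      data[[col]][i] <- value[[1]]
    }
  }
  data
}

# Hypothetical lookup with placeholder records (not real UniProt data)
lookup <- function(acc) {
  switch(acc,
         "A00001" = list(entry_name = "EXAMPLE1_MOUSE",
                         protein_name = NA,
                         gene_name = "Ex1"),
         NULL)
}

df <- data.frame(
  accession = c("A00001", "A00002", "A00003"),
  accession_source = c("UniProt", "UniProt", "OtherDB"),
  entry_name = c(NA, "KEEP_ME", NA),
  protein_name = c("Old name", NA, NA),
  stringsAsFactors = FALSE
)

out <- update_uniprot_cols(df, lookup)
print(out)
```

In this sketch the NA protein name for the first row does not overwrite "Old name", the second row keeps "KEEP_ME" because the lookup returns NULL, the "OtherDB" row is never looked up, and the missing gene_name column is created on the first non-NA value.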

Usage

process_tibble_uniprot(
  data,
  accession_col = "accession",
  accession_source_col = "accession_source",
  entry_name_col = "entry_name",
  protein_name_col = "protein_name",
  gene_name_col = "gene_name"
)

Arguments

data

A tibble containing at least the accession and accession source columns (named by accession_col and accession_source_col).

accession_col

The column name for accession numbers (default: "accession").

accession_source_col

The column name for accession sources (default: "accession_source").

entry_name_col

The column name for entry names (default: "entry_name").

protein_name_col

The column name for protein names (default: "protein_name").

gene_name_col

The column name for gene names (default: "gene_name").

Value

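The input tibble, with the entry_name, protein_name, and gene_name columns (or the columns named by the corresponding *_col arguments) created or updated from UniProt for rows whose accession source is "UniProt".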

Examples

# Example usage:

# Load necessary library
library(tibble)

# Reduced example data as an R tibble
test_data <- tibble::tibble(
  id = c(1, 78, 83, 87),
  species = c("mouse", "mouse", "rat", "mouse"),
  sample_type = c("brain", "brain", "brain", "brain"),
  accession = c("O88737", "O35927", "Q9R064", "P51611"),
  accession_source = c("OtherDB", "UniProt", "UniProt", "UniProt"),
  entry_name = c("BSN_MOUSE", NA, "GORS2_RAT", NA),
  protein_name = c("Protein bassoon", NA, "Golgi reassembly-stacking protein 2", NA),
  gene_name = c("Bsn", NA, "Gorasp2", NA)
)

# Process the tibble
result_data <- process_tibble_uniprot(test_data)

# Compare the original and processed tibbles
compare_tibbles_uniprot(test_data, result_data)


[Package oglcnac version 0.1.5 Index]