extract_ip_features {featForge}    R Documentation

Extract IP Address Features

Description

This function extracts a set of features from a vector of IP address strings to support feature engineering in credit-scoring datasets. It processes both IPv4 and IPv6 addresses and returns a data frame of derived features: IP version classification; an octet-level breakdown for IPv4 addresses (both string- and numeric-based octets); checks for leading zeros; a numeric conversion of the address; a basic approximation of IPv6 numeric values; pattern metrics such as a palindrome check and Shannon entropy; multicast status; and a Hilbert curve encoding for IPv4 addresses.

Usage

extract_ip_features(ip_addresses, error_on_invalid = FALSE)

Arguments

ip_addresses

A character vector of IP address strings.

error_on_invalid

Logical flag indicating how to handle invalid IP addresses. If TRUE, the function throws an error upon encountering any invalid IP address; if FALSE (the default), invalid IP addresses are replaced with NA and a warning is issued.

Details

The function follows these steps:

Value

A data frame with the following columns:

ip_version

A character vector indicating the IP version, either "IPv4" or "IPv6". Invalid addresses are set to NA.

ip_v4_octet1

The numeric conversion of the first octet of an IPv4 address as extracted from the IP string.

ip_v4_octet2

The numeric conversion of the second octet of an IPv4 address.

ip_v4_octet3

The numeric conversion of the third octet of an IPv4 address.

ip_v4_octet4

The numeric conversion of the fourth octet of an IPv4 address.

ip_v4_octet1_has_leading_zero

An integer flag indicating whether the first octet of an IPv4 address includes a leading zero.

ip_v4_octet2_has_leading_zero

An integer flag indicating whether the second octet includes a leading zero.

ip_v4_octet3_has_leading_zero

An integer flag indicating whether the third octet includes a leading zero.

ip_v4_octet4_has_leading_zero

An integer flag indicating whether the fourth octet includes a leading zero.

ip_leading_zero_count

An integer count of how many octets in an IPv4 address contain leading zeros.

ip_v4_numeric_vector

The 32-bit integer representation of an IPv4 address, computed as (A * 256^3) + (B * 256^2) + (C * 256) + D, where A, B, C, and D are the first through fourth octets.

ip_v6_numeric_approx_vector

An approximate numeric conversion of an IPv6 address. This value is computed from the eight hextets and is intended for interval comparisons only; precision may be lost for large values (above 2^53).

ip_is_palindrome

An integer flag indicating whether the entire IP address string is a palindrome (i.e., it reads the same forwards and backwards).

ip_entropy

A numeric value representing the Shannon entropy of the IP address string, computed over the distribution of its characters. Higher entropy values indicate a more varied (less repetitive) pattern.
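
To make the two pattern metrics above concrete, the following base R sketch recomputes them for a single string. The helper names is_palindrome and shannon_entropy are hypothetical and are not part of featForge. The logarithm base is an assumption (base 2, giving entropy in bits), since this page does not state which base the package uses, and the helpers return a logical and a double rather than the package's column types.

```r
# Hypothetical helpers for illustration only; not part of featForge.

# TRUE when the whole string, separators included, reads the same
# forwards and backwards.
is_palindrome <- function(x) {
  chars <- strsplit(x, "")[[1]]
  identical(chars, rev(chars))
}

# Shannon entropy over the character frequencies of a string.
# Assumption: log base 2, so the result is in bits.
shannon_entropy <- function(x) {
  chars <- strsplit(x, "")[[1]]
  p <- as.numeric(table(chars)) / length(chars)
  -sum(p * log2(p))
}

is_palindrome("1.2.2.1")        # TRUE
is_palindrome("192.168.1.1")    # FALSE
shannon_entropy("1.1.1.1")      # low: only two distinct characters
shannon_entropy("192.168.1.1")  # higher: more varied characters
```

Wrapping is_palindrome() in as.integer() would give a 0/1 flag comparable to the integer column described above.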

Examples

# Load the package's sample dataset
data(featForge_sample_data)

# Extract IP features and combine them with the original IP column
result <- cbind(
  data.frame(ip = featForge_sample_data$ip),
  extract_ip_features(featForge_sample_data$ip)
)
print(result)
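
As a cross-check on the IPv4 columns, the octet split, leading-zero flags, and numeric value can be recomputed by hand for one address. This is a minimal sketch in base R, not the package's implementation: ipv4_sketch is a hypothetical helper, and it assumes that a "leading zero" means an octet of two or more digits that starts with "0".

```r
# Hypothetical helper for illustration only; not part of featForge.
ipv4_sketch <- function(ip) {
  # Split on literal dots and keep the octets as strings first, so that
  # leading zeros are still visible.
  octet_str <- strsplit(ip, ".", fixed = TRUE)[[1]]
  stopifnot(length(octet_str) == 4)
  octet_num <- as.numeric(octet_str)

  # Assumption: a leading zero is a multi-digit octet beginning with "0".
  has_leading_zero <- as.integer(grepl("^0[0-9]+$", octet_str))

  list(
    octets = octet_num,
    has_leading_zero = has_leading_zero,
    leading_zero_count = sum(has_leading_zero),
    # (A * 256^3) + (B * 256^2) + (C * 256) + D, kept as a double because
    # values above 2^31 - 1 do not fit in R's integer type.
    numeric_value = sum(octet_num * 256^(3:0))
  )
}

res <- ipv4_sketch("192.168.001.10")
res$octets              # 192 168 1 10
res$has_leading_zero    # 0 0 1 0
res$leading_zero_count  # 1
res$numeric_value       # 3232235786
```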


[Package featForge version 0.1.2 Index]