BatchLinUCBDisjointPolicyEpsilon {cramR}    R Documentation

Batch Disjoint LinUCB Policy with Epsilon-Greedy

Description

Batch Disjoint LinUCB Policy with Epsilon-Greedy

Details

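Implements the disjoint LinUCB algorithm, which keeps a separate linear model per arm and scores arms by their upper confidence bounds, combined with epsilon-greedy exploration. Sufficient statistics are accumulated every round, while the model parameters are refreshed only at batch boundaries.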
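
For orientation, the textbook form of disjoint LinUCB with an epsilon-greedy override can be written as below. This is a sketch under assumed notation (arm a, context x_t, reward r_t, Gram matrix A_a, reward-weighted vector b_a), none of which is taken from this page; how the class itself combines these steps may differ in detail.

```latex
\hat{\theta}_a = A_a^{-1} b_a, \qquad
\mathrm{UCB}_a(x_t) = x_t^\top \hat{\theta}_a + \alpha \sqrt{x_t^\top A_a^{-1} x_t}

a_t =
\begin{cases}
\text{uniform random arm} & \text{with probability } \epsilon \\
\arg\max_a \mathrm{UCB}_a(x_t) & \text{otherwise}
\end{cases}

A_{a_t} \leftarrow A_{a_t} + x_t x_t^\top, \qquad
b_{a_t} \leftarrow b_{a_t} + r_t x_t
```

Here alpha and epsilon correspond to the fields of the same name, and the last line is the per-round accumulation that a batched variant applies to the decision model only every batch_size rounds.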

Methods

- 'initialize(alpha = 1.0, epsilon = 0.1, batch_size = 1)': Constructor.
- 'set_parameters(context_params)': Initializes sufficient statistics for each arm.
- 'get_action(t, context)': Selects an arm using UCB scores and epsilon-greedy rule.
- 'set_reward(t, context, action, reward)': Updates statistics and refreshes model at batch intervals.

Super class

cramR::NA

Public fields

alpha

Numeric, UCB exploration strength parameter.

epsilon

Numeric, probability of taking a random exploratory action.

batch_size

Integer, number of rounds per batch update.

A_cc

List of Gram matrices per arm, accumulated across batch.

b_cc

List of reward-weighted context vectors per arm.

class_name

Internal class name identifier.

Methods

Method new()

Constructor for batched LinUCB with epsilon-greedy exploration.

Usage
BatchLinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1, batch_size = 1)
Arguments
alpha

Numeric. UCB width parameter (exploration strength).

epsilon

Numeric. Probability of selecting a random arm.

batch_size

Integer. Number of rounds before updating parameters.


Method set_parameters()

Initialize arm-specific parameter containers.

Usage
BatchLinUCBDisjointPolicyEpsilon$set_parameters(context_params)
Arguments
context_params

List containing at least 'unique' (feature size) and 'k' (number of arms).
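
As a rough illustration of what per-arm parameter containers could look like, the base R sketch below builds one Gram matrix and one reward-weighted vector for each arm. The helper name init_arm_stats and the identity/zero initialization are assumptions drawn from the standard disjoint LinUCB formulation; they are not copied from the package's code.

```r
# Hypothetical helper (not part of cramR): build per-arm sufficient
# statistics for disjoint LinUCB, assuming d features and k arms.
init_arm_stats <- function(d, k) {
  list(
    A = replicate(k, diag(1, d, d), simplify = FALSE),  # identity Gram matrix per arm
    b = replicate(k, rep(0, d), simplify = FALSE)       # zero reward-weighted vector per arm
  )
}

stats <- init_arm_stats(d = 3, k = 2)
length(stats$A)     # one matrix per arm
dim(stats$A[[1]])   # d x d
```

Starting each Gram matrix at the identity keeps it invertible before any data arrives, which is the usual reason for this choice.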


Method get_action()

Chooses an arm based on UCB and epsilon-greedy sampling.

Usage
BatchLinUCBDisjointPolicyEpsilon$get_action(t, context)
Arguments
t

Integer timestep.

context

List containing the context for the decision.

Returns

A list with the selected action.
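
The selection rule described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the function name choose_arm, the returned field name choice, and the use of which.max (which takes the first maximum on ties) are all assumptions.

```r
# Hypothetical sketch (not the package's code): epsilon-greedy choice over
# disjoint LinUCB scores. A and b are per-arm lists; x is the context vector.
choose_arm <- function(A, b, x, alpha = 1, epsilon = 0.1) {
  k <- length(A)
  if (runif(1) < epsilon) {
    return(list(choice = sample.int(k, 1)))          # explore: uniform random arm
  }
  scores <- vapply(seq_len(k), function(a) {
    A_inv <- solve(A[[a]])
    theta <- A_inv %*% b[[a]]                        # ridge estimate for arm a
    mean_reward <- as.numeric(crossprod(x, theta))   # x' theta
    width <- sqrt(as.numeric(t(x) %*% A_inv %*% x))  # confidence width
    mean_reward + alpha * width
  }, numeric(1))
  list(choice = which.max(scores))                   # exploit: highest UCB score
}

A <- list(diag(2), diag(2))
b <- list(c(1, 0), c(0, 0))
x <- c(1, 0)
choose_arm(A, b, x, alpha = 1, epsilon = 0)$choice   # arm 1 scores 2, arm 2 scores 1
```

With epsilon = 0 the random branch is never taken, so the call is deterministic and useful for checking the UCB arithmetic by hand.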


Method set_reward()

Updates arm-specific sufficient statistics based on observed reward. Parameter updates occur only at the end of a batch.

Usage
BatchLinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
Arguments
t

Integer timestep.

context

Context object used for decision-making.

action

List containing the chosen action.

reward

List containing the observed reward.

Returns

Updated internal model parameters.
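
To make the batching behavior concrete, the base R sketch below accumulates statistics on every round but copies them into the parameters used for decisions only when t is a multiple of the batch size. The function name update_batched, the state layout, and the t %% batch_size rule are assumptions used for illustration; the package may trigger its refresh differently.

```r
# Hypothetical sketch (not the package's code): accumulate statistics every
# round, but copy them into the "live" model only at batch boundaries.
update_batched <- function(state, t, x, arm, reward, batch_size = 1) {
  state$A_cc[[arm]] <- state$A_cc[[arm]] + tcrossprod(x)  # A += x x'
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x     # b += r x
  if (t %% batch_size == 0) {
    state$A <- state$A_cc   # refresh the parameters used for decisions
    state$b <- state$b_cc
  }
  state
}

state <- list(
  A = list(diag(2), diag(2)), b = list(c(0, 0), c(0, 0)),
  A_cc = list(diag(2), diag(2)), b_cc = list(c(0, 0), c(0, 0))
)
mid_batch <- update_batched(state, t = 1, x = c(1, 0), arm = 1, reward = 1, batch_size = 2)
mid_batch$b[[1]]   # still c(0, 0): the batch is not complete yet
end_batch <- update_batched(mid_batch, t = 2, x = c(1, 0), arm = 1, reward = 0, batch_size = 2)
end_batch$b[[1]]   # now c(1, 0): live parameters refreshed at t = 2
```

Keeping the accumulators separate from the live parameters means actions chosen within a batch all use the same model, which is the point of batching.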


Method clone()

The objects of this class are cloneable with this method.

Usage
BatchLinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


[Package cramR version 0.1.0 Index]