BatchContextualEpsilonGreedyPolicy {cramR} | R Documentation
Batch Contextual Epsilon-Greedy Policy
Description
Batch Contextual Epsilon-Greedy Policy
Details
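Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates. With probability epsilon the policy selects a random arm; otherwise it selects the arm with the highest estimated reward for the current context. Model parameters are updated once every batch_size rounds rather than after every observation.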
Super class
cramR::NA
Public fields
epsilon
Probability of selecting a random arm (exploration rate).
batch_size
Number of rounds per batch before updating model parameters.
A_cc
List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.
b_cc
List of reward-weighted context sums (one per arm), updated batch-wise.
class_name
Internal class name identifier.
Methods
Method new()
Constructor for the Batch Epsilon-Greedy policy.
Usage
BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)
Arguments
epsilon
Numeric between 0 and 1. Probability of random arm selection.
batch_size
Integer. Number of observations between parameter updates.
Method set_parameters()
Initializes the parameter structures for each arm.
Usage
BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)
Arguments
context_params
A list with at least 'd' (number of features) and 'k' (number of arms).
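As a rough illustration of what per-arm parameter structures could look like, the base R sketch below builds one Gram matrix and one reward-weighted context sum per arm. The helper name init_arm_params and the identity-matrix starting value are assumptions made for illustration; they are not taken from the cramR source.

```r
# Hypothetical helper (not part of cramR): build per-arm statistics
# from a list holding d (number of features) and k (number of arms).
init_arm_params <- function(context_params) {
  d <- context_params$d
  k <- context_params$k
  list(
    # Assumed starting value: identity matrix, which keeps each A invertible.
    A_cc = replicate(k, diag(1, d), simplify = FALSE),
    # Reward-weighted context sums start at zero.
    b_cc = replicate(k, rep(0, d), simplify = FALSE)
  )
}

params <- init_arm_params(list(d = 3, k = 2))
```

Here params$A_cc and params$b_cc are both lists of length k, so each arm's statistics can be read and updated independently.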
Method get_action()
Chooses an arm based on epsilon-greedy logic and the current estimates.
Usage
BatchContextualEpsilonGreedyPolicy$get_action(t, context)
Arguments
t
Integer time step.
context
A list with contextual features and arm count.
Returns
A list with the selected action.
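The epsilon-greedy logic described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name choose_arm, the per-arm coefficient list theta, and the choice field in the returned list are all assumptions.

```r
# Hypothetical helper (not part of cramR): epsilon-greedy choice among k arms.
# theta is a list of per-arm coefficient vectors; x is the feature vector.
choose_arm <- function(theta, x, epsilon) {
  k <- length(theta)
  if (runif(1) < epsilon) {
    # Explore: pick an arm uniformly at random.
    choice <- sample.int(k, 1)
  } else {
    # Exploit: pick the arm with the highest predicted reward.
    expected <- vapply(theta, function(th) sum(th * x), numeric(1))
    choice <- which.max(expected)
  }
  # The field name "choice" is an assumption for this sketch.
  list(choice = choice)
}

theta <- list(c(0.1, 0.2), c(0.5, 0.4))
action <- choose_arm(theta, x = c(1, 1), epsilon = 0)
```

With epsilon = 0 the sketch always exploits. Note that which.max() returns the first maximum, so ties go to the lowest-numbered arm here; an actual policy may break ties differently.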
Method set_reward()
Updates model statistics based on observed reward. Updates occur once per batch.
Usage
BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)
Arguments
t
Integer time step.
context
List of contextual features used for the action.
action
A list with the chosen arm.
reward
A list with the observed reward.
Returns
Updated parameter estimates.
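One way to picture the batched update is the base R sketch below, which handles a single arm. It assumes a ridge-style estimate obtained by solving A theta = b, and it assumes the statistics are accumulated on every round while the estimate is refreshed only when t is a multiple of batch_size. The helper name update_arm and the state layout are hypothetical and may differ from what cramR does internally.

```r
# Hypothetical helper (not part of cramR): batched update for one arm.
# state holds A (d x d matrix), b (length d), and theta (length d).
update_arm <- function(state, t, x, reward, batch_size) {
  # Accumulate sufficient statistics on every round.
  state$A <- state$A + tcrossprod(x)  # A <- A + x x^T
  state$b <- state$b + reward * x     # b <- b + r x
  # Refresh the coefficient estimate only at batch boundaries.
  if (t %% batch_size == 0) {
    state$theta <- as.vector(solve(state$A, state$b))
  }
  state
}

state <- list(A = diag(1, 2), b = c(0, 0), theta = c(0, 0))
state <- update_arm(state, t = 1, x = c(1, 0), reward = 1, batch_size = 2)
theta_mid_batch <- state$theta  # unchanged until the batch completes
state <- update_arm(state, t = 2, x = c(1, 0), reward = 1, batch_size = 2)
```

With batch_size = 1 the sketch reduces to an update after every observation, which matches the constructor's default.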
Method clone()
The objects of this class are cloneable with this method.
Usage
BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.