BatchContextualEpsilonGreedyPolicy {cramR}    R Documentation

Batch Contextual Epsilon-Greedy Policy

Description

Batch Contextual Epsilon-Greedy Policy

Details

Implements an epsilon-greedy exploration strategy for contextual bandits with batched updates.
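The loop below is a minimal base-R sketch of how a batched contextual epsilon-greedy bandit can work. It is not cramR's implementation and does not call the package. The following are assumptions made for illustration only: a single context vector `x` shared by all arms each round, per-arm ridge estimates `solve(A, b)` with each Gram matrix initialised to the identity, a hypothetical simulated reward model `true_theta`, and a refresh of the estimates whenever `t` is a multiple of `batch_size`.

```r
# Sketch of a batched contextual epsilon-greedy bandit (not cramR code).
# Assumptions: shared context vector per round, per-arm ridge estimates
# theta_a = solve(A_a, b_a) with A_a starting at the identity, and
# estimates refreshed every `batch_size` rounds.
set.seed(1)
d <- 3; k <- 2; n <- 200
epsilon <- 0.1; batch_size <- 10
true_theta <- list(c(1, 0, 0), c(0, 1, 0))  # hypothetical simulated rewards

A_cc <- replicate(k, diag(d), simplify = FALSE)     # running Gram matrices
b_cc <- replicate(k, numeric(d), simplify = FALSE)  # running reward-weighted sums
A <- A_cc; b <- b_cc                                # frozen copies used to act

choices <- integer(n)
for (t in seq_len(n)) {
  x <- runif(d)
  if (runif(1) < epsilon) {
    arm <- sample.int(k, 1)                         # explore: uniform random arm
  } else {
    est <- vapply(seq_len(k),
                  function(a) sum(x * solve(A[[a]], b[[a]])), numeric(1))
    arm <- which.max(est)                           # exploit (ties -> lowest index)
  }
  reward <- sum(x * true_theta[[arm]]) + rnorm(1, sd = 0.1)
  A_cc[[arm]] <- A_cc[[arm]] + tcrossprod(x)        # add x x^T
  b_cc[[arm]] <- b_cc[[arm]] + reward * x           # add r x
  if (t %% batch_size == 0) {                       # end of batch: refresh
    A <- A_cc
    b <- b_cc
  }
  choices[t] <- arm
}
table(choices)
```

Between batch boundaries the policy keeps acting on stale estimates `A` and `b`, which is the trade-off batching makes in exchange for fewer parameter updates.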

Super class

cramR::NA

Public fields

epsilon

Probability of selecting a random arm (exploration rate).

batch_size

Number of rounds per batch before updating model parameters.

A_cc

List of Gram matrices (one per arm), used to accumulate sufficient statistics across batches.

b_cc

List of reward-weighted context sums (one per arm), updated batch-wise.

class_name

Internal class name identifier.
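Assuming the Gram matrices and reward-weighted sums above feed a per-arm ridge-regression estimate (consistent with the field descriptions, but not confirmed by this page), the statistics for arm a after round t, and the epsilon-greedy choice, can be written as follows. Here x_s is the context at round s, a_s the chosen arm, r_s the observed reward, and the regularisation \lambda (with \lambda = 1 meaning an identity initialisation) is an assumption. With batching, the \hat{\theta}_a used at round t would come from the statistics as of the most recent batch boundary.

```latex
A_a = \lambda I_d + \sum_{s \le t:\, a_s = a} x_s x_s^{\top}, \qquad
b_a = \sum_{s \le t:\, a_s = a} r_s \, x_s, \qquad
\hat{\theta}_a = A_a^{-1} b_a,

a_t =
\begin{cases}
  \text{an arm drawn uniformly from } \{1, \dots, k\} & \text{with probability } \epsilon, \\
  \arg\max_{a} \; x_t^{\top} \hat{\theta}_a & \text{otherwise.}
\end{cases}
```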

Methods

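Public methods

BatchContextualEpsilonGreedyPolicy$new()
BatchContextualEpsilonGreedyPolicy$set_parameters()
BatchContextualEpsilonGreedyPolicy$get_action()
BatchContextualEpsilonGreedyPolicy$set_reward()
BatchContextualEpsilonGreedyPolicy$clone()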

Method new()

Creates a new Batch Contextual Epsilon-Greedy policy object.

Usage

BatchContextualEpsilonGreedyPolicy$new(epsilon = 0.1, batch_size = 1)

Arguments
epsilon

Numeric between 0 and 1. Probability of random arm selection.

batch_size

Integer. Number of observations between parameter updates.
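The two constructor arguments can be illustrated with the hypothetical helpers below; neither `validate_policy_args()` nor `update_steps()` is part of cramR. The sketch checks that `epsilon` lies in [0, 1] and that `batch_size` is a positive whole number, then lists the rounds at which parameters would be refreshed, under the assumption that an update happens whenever the time step is a multiple of `batch_size`.

```r
# Hypothetical helpers illustrating the constructor arguments (not cramR code).
validate_policy_args <- function(epsilon = 0.1, batch_size = 1) {
  stopifnot(is.numeric(epsilon), length(epsilon) == 1,
            epsilon >= 0, epsilon <= 1)
  stopifnot(is.numeric(batch_size), length(batch_size) == 1,
            batch_size >= 1, batch_size == round(batch_size))
  list(epsilon = epsilon, batch_size = as.integer(batch_size))
}

# Rounds in 1..n at which parameters would be refreshed, assuming an
# update whenever t is a multiple of batch_size.
update_steps <- function(n, batch_size) which(seq_len(n) %% batch_size == 0)

args <- validate_policy_args(epsilon = 0.2, batch_size = 3)
update_steps(10, args$batch_size)  # 3 6 9
update_steps(4, 1)                 # 1 2 3 4: batch_size = 1 updates every round
```

With the default `batch_size = 1`, the schedule degenerates to a fully online update after every observation.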


Method set_parameters()

Initializes the parameter structures for each arm.

Usage

BatchContextualEpsilonGreedyPolicy$set_parameters(context_params)

Arguments
context_params

A list with at least 'd' (number of features) and 'k' (number of arms).
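A possible initialisation of the per-arm structures is sketched below with a hypothetical `init_arm_stats()` function. It reads `d` and `k` from `context_params` as documented, but the starting values (an identity Gram matrix and a zero reward-weighted sum for every arm) are assumptions, not documented cramR behaviour.

```r
# Hypothetical per-arm initialisation; identity Gram matrices and zero
# vectors are assumed starting values, not taken from cramR.
init_arm_stats <- function(context_params) {
  d <- context_params$d  # number of features
  k <- context_params$k  # number of arms
  list(
    A_cc = replicate(k, diag(d), simplify = FALSE),    # d x d Gram matrix per arm
    b_cc = replicate(k, numeric(d), simplify = FALSE)  # length-d sum per arm
  )
}

stats <- init_arm_stats(list(d = 4, k = 3))
length(stats$A_cc)    # 3
dim(stats$A_cc[[1]])  # 4 4
```

Starting from the identity keeps every `A_cc` matrix invertible before an arm has been pulled, which is why ridge-style bandits commonly use it.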


Method get_action()

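Chooses an arm using epsilon-greedy logic: with probability epsilon, an arm is selected at random; otherwise, the arm with the highest estimated reward under the current estimates is selected.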

Usage

BatchContextualEpsilonGreedyPolicy$get_action(t, context)

Arguments
t

Integer time step.

context

A list with contextual features and arm count.

Returns

A list with the selected action.
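The selection step can be sketched with a hypothetical `choose_arm()` function. It assumes one context vector `x` shared by all arms and per-arm ridge estimates `solve(A[[a]], b[[a]])`; the real `context` list passed to `get_action()` may be structured differently, and the `choice` field name in the returned list is an illustrative assumption.

```r
# Hypothetical epsilon-greedy arm choice (not cramR code).
# Assumes a shared context vector x and per-arm ridge estimates.
choose_arm <- function(x, A, b, epsilon) {
  k <- length(A)
  if (runif(1) < epsilon) {
    return(list(choice = sample.int(k, 1)))  # explore: uniform random arm
  }
  theta <- lapply(seq_len(k), function(a) solve(A[[a]], b[[a]]))
  scores <- vapply(theta, function(th) sum(x * th), numeric(1))
  list(choice = which.max(scores))           # exploit: highest estimated reward
}

A <- list(diag(2), diag(2))
b <- list(c(1, 0), c(0, 2))
choose_arm(c(1, 1), A, b, epsilon = 0)$choice  # 2: estimated rewards are 1 and 2
```

Setting `epsilon = 0` makes the choice purely greedy, while `epsilon = 1` makes it purely random.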


Method set_reward()

Updates the model statistics with the observed reward. The model parameters are updated only once per batch, that is, once every batch_size rounds.

Usage

BatchContextualEpsilonGreedyPolicy$set_reward(t, context, action, reward)

Arguments
t

Integer time step.

context

List of contextual features used for the action.

action

A list with the chosen arm.

reward

A list with the observed reward.

Returns

Updated parameter estimates.
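One way the batched update could work is sketched with a hypothetical `update_stats()` function: the running statistics `A_cc` and `b_cc` absorb every observation, while the copies `A` and `b` used for action selection are refreshed only when `t` is a multiple of `batch_size`. The state layout and the refresh rule are assumptions based on the field descriptions above, not cramR's code.

```r
# Hypothetical batched update (not cramR code): accumulate every round,
# refresh the acting copies only at batch boundaries.
update_stats <- function(state, t, x, arm, reward, batch_size) {
  state$A_cc[[arm]] <- state$A_cc[[arm]] + tcrossprod(x)  # add x x^T
  state$b_cc[[arm]] <- state$b_cc[[arm]] + reward * x     # add r x
  if (t %% batch_size == 0) {
    state$A <- state$A_cc
    state$b <- state$b_cc
  }
  state
}

d <- 2; k <- 2
A0 <- replicate(k, diag(d), simplify = FALSE)
b0 <- replicate(k, numeric(d), simplify = FALSE)
state <- list(A_cc = A0, b_cc = b0, A = A0, b = b0)

state <- update_stats(state, t = 1, x = c(1, 0), arm = 1, reward = 2, batch_size = 2)
state$b[[1]]  # 0 0: mid-batch, acting copy not yet refreshed
state <- update_stats(state, t = 2, x = c(0, 1), arm = 1, reward = 1, batch_size = 2)
state$b[[1]]  # 2 1: refreshed at the end of the batch
```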


Method clone()

The objects of this class are cloneable with this method.

Usage

BatchContextualEpsilonGreedyPolicy$clone(deep = FALSE)

Arguments
deep

Whether to make a deep clone.


[Package cramR version 0.1.0 Index]