BatchContextualLinTSPolicy {cramR} | R Documentation
Batch Contextual Thompson Sampling Policy
Description
Batch Contextual Thompson Sampling Policy
Details
Implements Thompson Sampling for linear contextual bandits with batch updates.
Methods
- 'initialize(v = 0.2, batch_size = 1)': Constructor, sets variance and batch size.
- 'set_parameters(context_params)': Initializes arm-level matrices.
- 'get_action(t, context)': Samples from the posterior and selects an action.
- 'set_reward(t, context, action, reward)': Updates posterior statistics using observed feedback.
Super class
cramR::NA
Public fields
sigma
Numeric, posterior variance scale parameter.
batch_size
Integer, number of rounds collected in a mini-batch before parameters are updated.
A_cc
List of accumulated Gram matrices per arm.
b_cc
List of reward-weighted context sums per arm.
class_name
Internal name of the class.
Methods
Public methods
Method new()
Constructor for the batch-based Thompson Sampling policy.
Usage
BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)
Arguments
v
Numeric. Standard deviation scaling parameter for posterior sampling.
batch_size
Integer. Number of rounds before parameters are updated.
Method set_parameters()
Initializes per-arm sufficient statistics.
Usage
BatchContextualLinTSPolicy$set_parameters(context_params)
Arguments
context_params
List with entries: 'unique' (feature vector), 'k' (number of arms).
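As a rough illustration of what "per-arm sufficient statistics" can look like for a linear Thompson Sampling policy, the base R sketch below builds one Gram matrix and one reward-weighted context sum per arm. The helper name 'init_arm_stats', the argument 'd' (context dimension), and the identity-matrix starting value are assumptions made for this example; they are not taken from the cramR source.

```r
# Hypothetical helper (not part of cramR): one (A, b) pair per arm
# for d-dimensional contexts.
init_arm_stats <- function(k, d) {
  list(
    # Assumed starting value: identity Gram matrix (ridge-style prior).
    A = replicate(k, diag(1, d), simplify = FALSE),
    # No reward has been observed yet, so the weighted sums start at zero.
    b = replicate(k, rep(0, d), simplify = FALSE)
  )
}

stats <- init_arm_stats(k = 3, d = 2)
str(stats$A[[1]])  # inspect the statistics of the first arm
```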
Method get_action()
Samples from the posterior distribution of expected rewards and selects an action.
Usage
BatchContextualLinTSPolicy$get_action(t, context)
Arguments
t
Integer. Time step.
context
List containing the current context and arm information.
Returns
A list with the chosen arm ('choice').
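The sampling step described for get_action() can be sketched in base R as follows. This is a minimal illustration of linear Thompson Sampling, not the package's code: the helper 'sample_arm', its arguments, and the choice of 'sigma = v^2' as the variance scale are assumptions for the example. For each arm it draws one parameter vector from a Gaussian centred on the ridge estimate and picks the arm with the highest sampled reward.

```r
# Hypothetical helper (not part of cramR): choose an arm by Thompson Sampling
# from per-arm Gram matrices (A_list) and reward-weighted sums (b_list).
sample_arm <- function(x, A_list, b_list, sigma) {
  scores <- vapply(seq_along(A_list), function(arm) {
    A_inv <- solve(A_list[[arm]])
    theta_hat <- A_inv %*% b_list[[arm]]        # posterior mean
    R <- chol(sigma * A_inv)                    # upper factor: t(R) %*% R = covariance
    theta_tilde <- theta_hat + t(R) %*% rnorm(length(x))  # one posterior draw
    as.numeric(crossprod(x, theta_tilde))       # sampled expected reward
  }, numeric(1))
  list(choice = which.max(scores))
}

set.seed(1)
A_list <- list(diag(2), diag(2))
b_list <- list(c(0, 0), c(5, 5))   # arm 2 has accumulated far more reward
action <- sample_arm(x = c(1, 1), A_list, b_list, sigma = 0.2^2)  # assumes sigma = v^2
action$choice
```

The return value mirrors the documented shape, a list with a 'choice' entry, so the sketch can be compared directly with the policy's output.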
Method set_reward()
Updates the Gram matrix and response vector for the chosen arm. Parameters are refreshed only every 'batch_size' rounds.
Usage
BatchContextualLinTSPolicy$set_reward(t, context, action, reward)
Arguments
t
Integer. Time step.
context
Context object containing feature info.
action
Chosen action (arm index).
reward
Observed reward for the action.
Returns
Updated internal parameters.
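One way to picture the batched refresh is the base R sketch below, shown for a single arm to keep it short. It assumes that every round is added to running accumulators (playing the role of 'A_cc' and 'b_cc') and that the statistics used for sampling are overwritten only when 't' is a multiple of 'batch_size'. The helper 'make_batch_updater' and this exact refresh rule are assumptions for illustration; the package may organise the update differently.

```r
# Hypothetical helper (not part of cramR): accumulate every round,
# publish the accumulated statistics every `batch_size` rounds.
make_batch_updater <- function(d, batch_size) {
  A_cc <- diag(1, d)   # running Gram matrix
  b_cc <- rep(0, d)    # running reward-weighted context sum
  A <- A_cc            # statistics currently used for sampling
  b <- b_cc
  list(
    update = function(t, x, reward) {
      A_cc <<- A_cc + tcrossprod(x)   # add x %*% t(x)
      b_cc <<- b_cc + reward * x      # add reward-weighted context
      if (t %% batch_size == 0) {     # end of a batch: refresh parameters
        A <<- A_cc
        b <<- b_cc
      }
      invisible(NULL)
    },
    current = function() list(A = A, b = b)
  )
}

upd <- make_batch_updater(d = 2, batch_size = 2)
upd$update(t = 1, x = c(1, 0), reward = 1)
after_one <- upd$current()$b   # batch not complete, still the initial zeros
upd$update(t = 2, x = c(0, 1), reward = 2)
after_two <- upd$current()$b   # batch complete, reflects both rounds
```

With 'batch_size = 1' the refresh happens every round, which reduces the sketch to the usual online update.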
Method clone()
The objects of this class are cloneable with this method.
Usage
BatchContextualLinTSPolicy$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.