BatchContextualLinTSPolicy {cramR}    R Documentation

Batch Contextual Thompson Sampling Policy

Description

Batch Contextual Thompson Sampling Policy

Details

Implements Thompson Sampling for linear contextual bandits with batch updates.
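
For orientation, the textbook form of linear Thompson Sampling can be written as below. This is a sketch of the standard formulation, not a statement of what cramR computes internally. The symbols A_a, b_a, x_t and r_s are notation introduced here, and treating the sampling covariance as v^2 times the inverse Gram matrix is an assumption.

```latex
A_a = I_d + \sum_{s \,:\, a_s = a} x_s x_s^\top, \qquad
b_a = \sum_{s \,:\, a_s = a} r_s \, x_s, \qquad
\hat{\theta}_a = A_a^{-1} b_a

\tilde{\theta}_a \sim \mathcal{N}\!\left(\hat{\theta}_a,\; v^2 A_a^{-1}\right), \qquad
a_t = \arg\max_a \; x_t^\top \tilde{\theta}_a
```

With batch updates, A_a and b_a would be held fixed within a batch and refreshed only once 'batch_size' rounds have been observed.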

Methods

- 'initialize(v = 0.2, batch_size = 1)': Constructor, sets variance and batch size.
- 'set_parameters(context_params)': Initializes arm-level matrices.
- 'get_action(t, context)': Samples from the posterior and selects action.
- 'set_reward(t, context, action, reward)': Updates posterior statistics using observed feedback.

Super class

cramR::NA

Public fields

sigma

Numeric, posterior variance scale parameter.

batch_size

Integer, size of mini-batches before parameter updates.

A_cc

List of accumulated Gram matrices per arm.

b_cc

List of reward-weighted context sums per arm.

class_name

Internal name of the class.

Methods

Method new()

Constructor for the batch-based Thompson Sampling policy.

Usage
BatchContextualLinTSPolicy$new(v = 0.2, batch_size = 1)
Arguments
v

Numeric. Standard deviation scaling parameter for posterior sampling.

batch_size

Integer. Number of rounds before parameters are updated.


Method set_parameters()

Initializes per-arm sufficient statistics.

Usage
BatchContextualLinTSPolicy$set_parameters(context_params)
Arguments
context_params

List with entries: 'unique' (feature vector), 'k' (number of arms).
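
As an illustration of what per-arm sufficient statistics could look like, here is a minimal base-R sketch. The helper init_arm_stats() is hypothetical and not part of cramR. Taking the context dimension from length(context_params$unique) and starting each Gram matrix at the identity are assumptions.

```r
# Hypothetical helper (not part of cramR): one set of statistics per arm
# for d-dimensional contexts.
init_arm_stats <- function(d, k) {
  lapply(seq_len(k), function(arm) {
    list(
      A    = diag(1, d),       # Gram matrix, assumed to start at the identity
      b    = rep(0, d),        # reward-weighted context sum
      A_cc = matrix(0, d, d),  # Gram contributions pending in the current batch
      b_cc = rep(0, d)         # reward-weighted sums pending in the current batch
    )
  })
}

# Assumed shape of context_params, following the argument description.
context_params <- list(unique = 1:3, k = 2)
d <- length(context_params$unique)  # assumption: one feature per entry
stats <- init_arm_stats(d, context_params$k)
```

Keeping the pending batch accumulators separate from the statistics used for sampling is one way to delay updates until a batch is complete.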


Method get_action()

Samples from the posterior distribution of expected rewards and selects an action.

Usage
BatchContextualLinTSPolicy$get_action(t, context)
Arguments
t

Integer. Time step.

context

List containing the current context and arm information.

Returns

A list with the chosen arm ('choice').
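
The sampling step can be sketched in base R as follows. This is a generic linear Thompson Sampling draw under stated assumptions, not the package's code: sample_action() and rmvnorm_chol() are hypothetical helpers, and sigma is treated as a variance scale equal to v^2.

```r
# Draw one sample from N(mu, Sigma) using a Cholesky factor (base R only).
rmvnorm_chol <- function(mu, Sigma) {
  R <- chol(Sigma)  # upper triangular factor with t(R) %*% R == Sigma
  as.vector(mu + t(R) %*% rnorm(length(mu)))
}

# Hypothetical helper: sample a coefficient vector per arm and pick the arm
# with the highest sampled expected reward for context x.
sample_action <- function(x, A_list, b_list, sigma) {
  scores <- vapply(seq_along(A_list), function(arm) {
    A_inv <- solve(A_list[[arm]])
    theta_hat <- as.vector(A_inv %*% b_list[[arm]])        # posterior mean
    theta_tilde <- rmvnorm_chol(theta_hat, sigma * A_inv)  # posterior draw
    sum(x * theta_tilde)                                   # sampled reward
  }, numeric(1))
  list(choice = which.max(scores))
}

set.seed(1)
x <- c(1, 0.5)
A_list <- list(diag(2), diag(2))
b_list <- list(c(0, 0), c(5, 5))  # arm 2 has a much larger estimated reward
action <- sample_action(x, A_list, b_list, sigma = 0.2^2)
```

Because the draw is random, arms with uncertain estimates are still chosen occasionally, which is how the policy explores.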


Method set_reward()

Updates Gram matrix and response vector for the chosen arm. Parameters are refreshed every 'batch_size' rounds.

Usage
BatchContextualLinTSPolicy$set_reward(t, context, action, reward)
Arguments
t

Integer. Time step.

context

Context object containing feature info.

action

Chosen action (arm index).

reward

Observed reward for the action.

Returns

Updated internal parameters.
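
The batching behavior described for this method can be sketched as below. The helper update_stats() and the list layout are hypothetical. Flushing when t is a multiple of batch_size, and flushing all arms at once, are assumptions about how "every 'batch_size' rounds" is implemented.

```r
# Hypothetical helper (not part of cramR): accumulate one observation for the
# chosen arm, then fold pending sums into A and b at each batch boundary.
update_stats <- function(stats, x, arm, reward, t, batch_size) {
  stats[[arm]]$A_cc <- stats[[arm]]$A_cc + outer(x, x)  # add x x^T
  stats[[arm]]$b_cc <- stats[[arm]]$b_cc + reward * x   # add r * x
  if (t %% batch_size == 0) {
    stats <- lapply(stats, function(s) {
      s$A <- s$A + s$A_cc
      s$b <- s$b + s$b_cc
      s$A_cc[] <- 0  # reset the pending accumulators
      s$b_cc[] <- 0
      s
    })
  }
  stats
}

d <- 2
new_arm <- function() {
  list(A = diag(d), b = rep(0, d), A_cc = matrix(0, d, d), b_cc = rep(0, d))
}
stats <- list(new_arm(), new_arm())
x <- c(1, 2)

stats <- update_stats(stats, x, arm = 1, reward = 1, t = 1, batch_size = 2)
pending <- stats[[1]]$b  # batch not complete, so b is unchanged

stats <- update_stats(stats, x, arm = 1, reward = 1, t = 2, batch_size = 2)
flushed <- stats[[1]]$b  # both observations are folded in at t = 2
```

With batch_size = 1 the flush happens every round, which reduces to the usual fully online update.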


Method clone()

The objects of this class are cloneable with this method.

Usage
BatchContextualLinTSPolicy$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


[Package cramR version 0.1.0 Index]