ContextualLinearBandit {cramR} | R Documentation
Contextual Linear Bandit Environment
Description
Simulates a contextual linear bandit environment with normally distributed (optionally binarized) rewards.
Details
An R6 class for simulating a contextual linear bandit environment with normally distributed rewards. At each round, every arm's latent reward is a linear function of a context vector shared across arms (one coefficient column per arm); Gaussian noise with standard deviation 'sigma' is then added, and rewards can optionally be converted to binary outcomes.
Methods
- 'initialize(k, d, list_betas, sigma = 0.1, binary_rewards = FALSE)': Constructor.
- 'post_initialization()': Loads the correct coefficients for the current simulation based on 'sim_id'.
- 'get_context(t)': Returns the context and sets the internal reward vector.
- 'get_reward(t, context_common, action)': Returns the observed reward for an action.
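A minimal end-to-end sketch, assuming the documented signatures; the dimensions and coefficient values are illustrative (see the 'get_reward()' sketch further below for observing a reward):

library(cramR)

k <- 3  # number of arms
d <- 2  # number of context features

# One d x k coefficient matrix per simulation (columns correspond to arms).
list_betas <- list(matrix(rnorm(d * k), nrow = d, ncol = k))

bandit <- ContextualLinearBandit$new(k = k, d = d,
                                     list_betas = list_betas,
                                     sigma = 0.1)
bandit$sim_id <- 1            # pick which simulation's coefficients to use
bandit$post_initialization()  # loads list_betas[[1]] into bandit$betas

step <- bandit$get_context(t = 1)  # list with context 'X' and arm count 'k'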
Super class
cramR::NA -> ContextualLinearBandit
Public fields
rewards
A vector of rewards for each arm in the current round.
betas
Coefficient matrix of the linear reward model (one column per arm).
sigma
Standard deviation of the Gaussian noise added to rewards.
binary
Logical, indicating whether to convert rewards into binary outcomes.
weights
The latent reward scores before noise and/or binarization.
list_betas
A list of coefficient matrices, one per simulation.
sim_id
Index for selecting which simulation's coefficients to use.
class_name
Name of the class for internal tracking.
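After 'post_initialization()', these fields expose the state of the active simulation. A quick inspection sketch (the 'bandit' object name is illustrative, carried over from the example above):

dim(bandit$betas)  # d x k coefficient matrix for the current sim_id
bandit$sigma       # Gaussian noise standard deviation
bandit$binary      # TRUE if rewards are converted to binary outcomes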
Methods
Public methods
Method new()
Usage
ContextualLinearBandit$new(
  k,
  d,
  list_betas,
  sigma = 0.1,
  binary_rewards = FALSE
)
Arguments
k
Number of arms.
d
Number of context features.
list_betas
A list of true beta (coefficient) matrices, one per simulation.
sigma
Standard deviation of the Gaussian noise (default 0.1).
binary_rewards
Logical; whether rewards are converted to binary outcomes (default FALSE).
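For example, a hedged construction sketch with two simulations (coefficient values illustrative):

d <- 2; k <- 3
list_betas <- list(
  matrix(rnorm(d * k), d, k),  # coefficients for simulation 1
  matrix(rnorm(d * k), d, k)   # coefficients for simulation 2
)
bandit <- ContextualLinearBandit$new(k = k, d = d, list_betas = list_betas,
                                     sigma = 0.5, binary_rewards = FALSE)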
Method post_initialization()
Sets the simulation-specific coefficients for the current simulation, selecting the entry of 'list_betas' indicated by 'sim_id'.
Usage
ContextualLinearBandit$post_initialization()
Returns
No return value; modifies the internal state of the object.
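For example, assuming 'sim_id' indexes into 'list_betas' as the field descriptions indicate:

bandit$sim_id <- 2
bandit$post_initialization()  # bandit$betas now holds list_betas[[2]]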
Method get_context()
Usage
ContextualLinearBandit$get_context(t)
Arguments
t
Current time step
Returns
A list containing context vector 'X' and arm count 'k'
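A one-step sketch; per the return description, the result carries the context and the arm count (the call also fills the internal 'rewards' field for this round):

step <- bandit$get_context(t = 1)
str(step$X)  # context vector shared across arms
step$k       # number of arms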
Method get_reward()
Usage
ContextualLinearBandit$get_reward(t, context_common, action)
Arguments
t
Current time step
context_common
Context shared across arms
action
Action taken by the policy
Returns
A list containing the observed reward along with the optimal arm and its reward for the round.
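A sketch of a single pull, continuing the examples above. The 'list(choice = ...)' action format is an assumption modeled on the contextual-package convention that this interface resembles; check the policy implementation you pair it with for the exact shape:

step <- bandit$get_context(t = 1)
action <- list(choice = 1)  # assumed format: index of the chosen arm
outcome <- bandit$get_reward(t = 1, context_common = step$X, action = action)
outcome$reward  # observed (noisy or binarized) reward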
Method clone()
The objects of this class are cloneable with this method.
Usage
ContextualLinearBandit$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
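For instance:

bandit2 <- bandit$clone(deep = TRUE)
# bandit2 is an independent copy; mutating its fields leaves bandit untouched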