LinUCBDisjointPolicyEpsilon {cramR} | R Documentation |
LinUCB Disjoint Policy with Epsilon-Greedy Exploration
Description
LinUCB Disjoint Policy with Epsilon-Greedy Exploration
Details
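Implements the disjoint LinUCB algorithm, which keeps a separate linear reward model for each arm and scores arms by an upper confidence bound (UCB), combined with epsilon-greedy exploration: with probability epsilon a random arm is selected, otherwise the arm with the highest UCB score is selected.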
Methods
- 'initialize(alpha = 1.0, epsilon = 0.1)': Create a new LinUCBDisjointPolicyEpsilon object.
- 'set_parameters(context_params)': Initialize arm-level parameters.
- 'get_action(t, context)': Select an arm using epsilon-greedy UCB.
- 'set_reward(t, context, action, reward)': Update internal statistics based on the observed reward.
Super class
cramR::NA
Public fields
alpha
Numeric, exploration parameter controlling the width of the confidence bound.
epsilon
Numeric, probability of selecting a random action (exploration).
class_name
Internal class name.
Methods
Public methods
Method new()
Initializes the policy with UCB parameter alpha and exploration rate epsilon.
Usage
LinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1)
Arguments
alpha
Numeric. Controls width of the UCB bonus.
epsilon
Numeric between 0 and 1. Probability of random action selection.
Method set_parameters()
Set arm-specific parameter structures.
Usage
LinUCBDisjointPolicyEpsilon$set_parameters(context_params)
Arguments
context_params
A list with context information, typically including the number of unique features.
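As a rough illustration of what "arm-specific parameter structures" usually means in disjoint LinUCB, the sketch below builds one ridge-regression state per arm: a d x d matrix A (starting as the identity) and a length-d vector b (starting at zero). The helper init_arm_params and its arguments k and d are hypothetical names used only for this sketch; they are not part of the cramR API, and the package's internal layout may differ.

```r
# Hypothetical helper (not part of cramR): build per-arm LinUCB state.
# k = number of arms, d = number of context features.
init_arm_params <- function(k, d) {
  lapply(seq_len(k), function(arm) {
    list(
      A = diag(1, d, d),  # identity start keeps A invertible
      b = rep(0, d)       # reward-weighted feature sum, initially zero
    )
  })
}

theta <- init_arm_params(k = 3, d = 2)
length(theta)   # one entry per arm
theta[[1]]$A    # 2 x 2 identity matrix
```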
Method get_action()
Selects an arm using epsilon-greedy Upper Confidence Bound (UCB).
Usage
LinUCBDisjointPolicyEpsilon$get_action(t, context)
Arguments
t
Integer time step.
context
A list with contextual features and number of arms.
Returns
A list containing the selected action.
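The selection rule can be sketched in base R as follows. This is a minimal standalone illustration of epsilon-greedy LinUCB, not the package's own code: the function select_arm, the d x k feature matrix X, the per-arm list theta, and the choice element of the returned list are all assumptions made for the example.

```r
# Hypothetical standalone sketch of epsilon-greedy LinUCB arm selection.
# X: d x k matrix, column a holds the feature vector for arm a.
# theta: list of per-arm lists with elements A (d x d) and b (length d).
select_arm <- function(X, theta, alpha = 1, epsilon = 0.1) {
  k <- ncol(X)
  # Explore: with probability epsilon pick an arm uniformly at random.
  if (runif(1) < epsilon) {
    return(list(choice = sample.int(k, 1)))
  }
  # Exploit: score every arm by predicted reward plus a confidence bonus.
  ucb <- vapply(seq_len(k), function(a) {
    x <- X[, a]
    A_inv <- solve(theta[[a]]$A)
    theta_hat <- A_inv %*% theta[[a]]$b                # ridge estimate
    mu <- sum(x * theta_hat)                           # predicted reward
    bonus <- alpha * sqrt(drop(t(x) %*% A_inv %*% x))  # UCB width
    mu + bonus
  }, numeric(1))
  list(choice = which.max(ucb))  # ties go to the lowest arm index
}

# Two arms with identical features; arm 2 has seen positive reward.
theta <- list(
  list(A = diag(2), b = c(0, 0)),
  list(A = diag(2), b = c(1, 0))
)
X <- cbind(c(1, 0), c(1, 0))
select_arm(X, theta, alpha = 1, epsilon = 0)$choice
```

With epsilon = 0 the random branch is never taken, so the call is purely greedy on the UCB score; larger alpha widens the bonus and favours arms whose A matrix has seen few observations.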
Method set_reward()
Updates internal statistics using the observed reward for the selected arm.
Usage
LinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
Arguments
t
Integer time step.
context
Contextual features for all arms at time t.
action
A list containing the chosen arm.
reward
A list containing the observed reward for the selected arm.
Returns
Updated internal parameters.
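For orientation, the standard disjoint LinUCB update touches only the chosen arm: its matrix A gains the outer product of the arm's feature vector, and its vector b gains the reward-weighted features. The sketch below shows that rule in base R; update_arm and its arguments are hypothetical names, and the package's own method may store or unpack these values differently.

```r
# Hypothetical standalone sketch of the per-arm LinUCB update.
# theta: list of per-arm lists with A (d x d) and b (length d).
# x: feature vector of the chosen arm, arm: its index, r: observed reward.
update_arm <- function(theta, x, arm, r) {
  theta[[arm]]$A <- theta[[arm]]$A + outer(x, x)  # A <- A + x x^T
  theta[[arm]]$b <- theta[[arm]]$b + r * x        # b <- b + r x
  theta
}

theta <- list(
  list(A = diag(2), b = c(0, 0)),
  list(A = diag(2), b = c(0, 0))
)
theta <- update_arm(theta, x = c(1, 2), arm = 2, r = 0.5)
theta[[2]]$A  # identity plus outer(c(1, 2), c(1, 2))
theta[[2]]$b  # reward-weighted features of the chosen arm
```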
Method clone()
The objects of this class are cloneable with this method.
Usage
LinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.