LinUCBDisjointPolicyEpsilon {cramR}    R Documentation

LinUCB Disjoint Policy with Epsilon-Greedy Exploration

Description

LinUCB Disjoint Policy with Epsilon-Greedy Exploration

Details

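Implements the disjoint LinUCB algorithm, in which each arm keeps its own linear reward model, combined with epsilon-greedy exploration. With probability epsilon the policy selects a random arm; otherwise it selects the arm with the highest upper confidence bound, whose width is controlled by alpha.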

Methods

- 'initialize(alpha = 1.0, epsilon = 0.1)': Creates a new LinUCBDisjointPolicyEpsilon object.
- 'set_parameters(context_params)': Initializes arm-level parameters.
- 'get_action(t, context)': Selects an arm using epsilon-greedy UCB.
- 'set_reward(t, context, action, reward)': Updates internal statistics based on the observed reward.

Super class

cramR::NA

Public fields

alpha

Numeric, exploration parameter controlling the width of the confidence bound.

epsilon

Numeric, probability of selecting a random action (exploration).

class_name

Internal class name.

Methods

Public methods

Method new()

Initializes the policy with UCB parameter alpha and exploration rate epsilon.

Usage
LinUCBDisjointPolicyEpsilon$new(alpha = 1, epsilon = 0.1)
Arguments
alpha

Numeric. Controls width of the UCB bonus.

epsilon

Numeric between 0 and 1. Probability of random action selection.


Method set_parameters()

Set arm-specific parameter structures.

Usage
LinUCBDisjointPolicyEpsilon$set_parameters(context_params)
Arguments
context_params

A list with context information, typically including the number of unique features.
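The arm-specific structures are not spelled out on this page. In the standard disjoint LinUCB formulation, each arm holds a d x d matrix A (initialized to the identity) and a length-d vector b (initialized to zero). The following is a minimal base-R sketch under that assumption; `init_arm_params`, `d`, and `k` are hypothetical names used for illustration and are not part of cramR.

```r
# Hypothetical helper (not the cramR source): one ridge-regression state per arm.
# d = number of context features, k = number of arms (both assumed inputs).
init_arm_params <- function(d, k) {
  lapply(seq_len(k), function(arm) {
    list(
      A = diag(1, d),  # d x d matrix, starts as the identity
      b = rep(0, d)    # length-d reward-weighted feature sum, starts at zero
    )
  })
}

# Example: 2 arms, 3 features per context.
arms <- init_arm_params(d = 3, k = 2)
```

Keeping a separate A and b for every arm is what makes the model "disjoint": no parameters are shared across arms.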


Method get_action()

Selects an arm using epsilon-greedy Upper Confidence Bound (UCB).

Usage
LinUCBDisjointPolicyEpsilon$get_action(t, context)
Arguments
t

Integer time step.

context

A list with contextual features and number of arms.

Returns

A list containing the selected action.
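As an illustration of the selection rule, the sketch below draws a uniform random arm with probability epsilon and otherwise picks the arm with the largest upper confidence bound. It is a simplified base-R approximation, not the package implementation: `select_arm` is a hypothetical function, the context is assumed to be a single numeric vector `x` shared by all arms, and the per-arm state is assumed to be a list with elements `A` and `b`.

```r
# Hypothetical sketch of epsilon-greedy UCB arm selection (not the cramR source).
# arms: list of per-arm states, each a list with matrix A and vector b (assumed layout).
# x:    numeric context vector, assumed to be shared by all arms.
select_arm <- function(arms, x, alpha = 1, epsilon = 0.1) {
  k <- length(arms)
  # Explore: with probability epsilon, pick an arm uniformly at random.
  if (runif(1) < epsilon) {
    return(sample.int(k, 1))
  }
  # Exploit: compute each arm's upper confidence bound.
  ucb <- vapply(arms, function(arm) {
    A_inv <- solve(arm$A)
    theta <- drop(A_inv %*% arm$b)                      # ridge estimate of arm weights
    mean_reward <- sum(theta * x)                       # predicted reward
    bonus <- alpha * sqrt(drop(t(x) %*% A_inv %*% x))   # confidence width
    mean_reward + bonus
  }, numeric(1))
  which.max(ucb)  # ties resolve to the lowest arm index
}

# Example: two arms with identity A; arm 2 has accumulated positive reward.
demo_arms <- list(
  list(A = diag(1, 2), b = c(0, 0)),
  list(A = diag(1, 2), b = c(1, 0))
)
greedy_choice <- select_arm(demo_arms, x = c(1, 0), alpha = 1, epsilon = 0)
```

Setting `epsilon = 0` in the example disables random exploration, so the choice depends only on the confidence bounds.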


Method set_reward()

Updates internal statistics using the observed reward for the selected arm.

Usage
LinUCBDisjointPolicyEpsilon$set_reward(t, context, action, reward)
Arguments
t

Integer time step.

context

Contextual features for all arms at time t.

action

A list containing the chosen arm.

reward

A list containing the observed reward for the selected arm.

Returns

Updated internal parameters.
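The update itself is not detailed here. In the standard disjoint LinUCB formulation, only the chosen arm changes: its matrix A gains the outer product of the context vector, and its vector b gains the reward-weighted context. The base-R sketch below follows that assumption; `update_arm` is a hypothetical function, and for simplicity `action` and `reward` are a plain integer and number rather than the lists this method receives.

```r
# Hypothetical sketch of the per-arm update (not the cramR source).
# arms:   list of per-arm states, each a list with matrix A and vector b (assumed layout).
# x:      numeric context vector observed at this time step.
# action: integer index of the chosen arm (simplified from a list).
# reward: numeric reward observed for that arm (simplified from a list).
update_arm <- function(arms, x, action, reward) {
  arm <- arms[[action]]
  arm$A <- arm$A + outer(x, x)   # A <- A + x x^T
  arm$b <- arm$b + reward * x    # b <- b + r x
  arms[[action]] <- arm
  arms                           # return the updated state; other arms are untouched
}

# Example: update arm 2 after observing reward 0.5 for context (1, 2).
state <- list(
  list(A = diag(1, 2), b = c(0, 0)),
  list(A = diag(1, 2), b = c(0, 0))
)
state <- update_arm(state, x = c(1, 2), action = 2, reward = 0.5)
```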


Method clone()

The objects of this class are cloneable with this method.

Usage
LinUCBDisjointPolicyEpsilon$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.


[Package cramR version 0.1.0 Index]