marlax.engines.Engine
- class marlax.engines.Engine(epsilon_start, epsilon_end, epsilon_test=0.0)
Bases: object
Core engine managing training and testing phases.
- epsilon_start
Initial exploration rate.
- Type: float
- epsilon_end
Final exploration rate after decay.
- Type: float
- epsilon_test
Exploration rate during testing.
- Type: float
- __init__(epsilon_start, epsilon_end, epsilon_test=0.0)
Initialize the engine with epsilon schedule parameters.
- Parameters:
epsilon_start (float) – Starting epsilon for exploration.
epsilon_end (float) – Ending epsilon after decay.
epsilon_test (float, optional) – Epsilon used in testing. Defaults to 0.0.
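This page does not say how epsilon moves from epsilon_start to epsilon_end during training. As a minimal sketch, assuming a linear decay over the training steps, the schedule could look like the following. The helper name `linear_epsilon` and the linear shape are illustrative assumptions, not confirmed marlax behavior.

```python
def linear_epsilon(step, num_steps, epsilon_start, epsilon_end):
    """Hypothetical helper: linearly interpolate epsilon over training.

    Assumption: the decay is linear in the step index. marlax may use a
    different schedule (for example, exponential decay).
    """
    if num_steps <= 1:
        return epsilon_end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (num_steps - 1), 0.0), 1.0)
    return epsilon_start + frac * (epsilon_end - epsilon_start)


# Epsilon at each of five steps, decaying from 1.0 to 0.0.
schedule = [linear_epsilon(t, 5, epsilon_start=1.0, epsilon_end=0.0) for t in range(5)]
```

Under this assumption, exploration is highest at the first step and reaches epsilon_end on the last one; steps past the end stay clamped at epsilon_end.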
Methods
__init__(epsilon_start, epsilon_end[, ...]) – Initialize the engine with epsilon schedule parameters.
test(env, logger[, num_steps, verbose, ...]) – Run the evaluation loop for a specified number of steps.
train(env, logger[, num_steps, alpha, ...]) – Run the training loop for a specified number of steps.
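The method summary suggests a train-then-test workflow. The sketch below shows one way to chain the two calls using only the documented keyword arguments. `RecordingEngine` is a hypothetical stand-in that mirrors the documented signatures so the example runs without marlax installed, and `run_experiment` is an illustrative helper, not part of the library. How a real Environment or Tracer is constructed is not covered on this page.

```python
class RecordingEngine:
    """Stand-in for marlax.engines.Engine (assumption, for illustration only).

    It mirrors the documented method signatures and records each call.
    """

    def __init__(self, epsilon_start, epsilon_end, epsilon_test=0.0):
        self.epsilon_start = epsilon_start
        self.epsilon_end = epsilon_end
        self.epsilon_test = epsilon_test
        self.calls = []

    def train(self, env, logger, num_steps=1_000_000, alpha=0.1, gamma=0.9,
              verbose=True, flush_every=1_000_000, regime_idx=0):
        self.calls.append(("train", num_steps, regime_idx))

    def test(self, env, logger, num_steps=100_000, verbose=True,
             flush_every=1_000_000, regime_idx=0):
        self.calls.append(("test", num_steps, regime_idx))


def run_experiment(engine, env, logger=None, train_steps=1_000, test_steps=100):
    """Hypothetical helper: train on env, then evaluate on the same env."""
    engine.train(env, logger, num_steps=train_steps, alpha=0.1, gamma=0.9, verbose=False)
    engine.test(env, logger, num_steps=test_steps, verbose=False)


engine = RecordingEngine(epsilon_start=1.0, epsilon_end=0.05)
# env=None is a placeholder here; the real Engine requires an Environment.
run_experiment(engine, env=None)
calls = engine.calls
```

With the real class, the stand-in would be replaced by `marlax.engines.Engine` and `env` by an actual Environment instance.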
- test(env, logger, num_steps=100000, verbose=True, flush_every=1000000, regime_idx=0)
Run the evaluation loop for a specified number of steps.
- Parameters:
env (Environment) – The environment instance to test on.
logger (Tracer) – Logger for recording test data; may be None.
num_steps (int, optional) – Number of evaluation iterations. Defaults to 100_000.
verbose (bool, optional) – Display progress bar if True. Defaults to True.
flush_every (int, optional) – Log flush interval. Defaults to 1_000_000.
regime_idx (int, optional) – Identifier for environment regime. Defaults to 0.
- Returns:
None
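The default epsilon_test of 0.0 implies purely greedy behavior during evaluation, if the agents use epsilon-greedy action selection. The parameter names suggest this, but the page does not state it. A minimal sketch under that assumption follows; `epsilon_greedy` is a hypothetical helper, not a marlax function.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Hypothetical helper: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest value (the first one wins ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
# With epsilon=0.0 the exploration branch is never taken.
greedy_actions = [epsilon_greedy([0.1, 0.7, 0.2], epsilon=0.0, rng=rng) for _ in range(5)]
```

Because `rng.random()` returns a value in [0, 1), the comparison with 0.0 never succeeds, so every selection is the highest-valued action.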
- train(env, logger, num_steps=1000000, alpha=0.1, gamma=0.9, verbose=True, flush_every=1000000, regime_idx=0)
Run the training loop for a specified number of steps.
- Parameters:
env (Environment) – The environment instance to train on.
logger (Tracer) – Logger for recording training data; may be None.
num_steps (int, optional) – Number of training iterations. Defaults to 1_000_000.
alpha (float, optional) – Learning rate for agent updates. Defaults to 0.1.
gamma (float, optional) – Discount factor for future rewards. Defaults to 0.9.
verbose (bool, optional) – Display progress bar if True. Defaults to True.
flush_every (int, optional) – Log flush interval. Defaults to 1_000_000.
regime_idx (int, optional) – Identifier for environment regime. Defaults to 0.
- Returns:
None
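The alpha and gamma parameters are described as a learning rate and a discount factor, but the page does not say which update rule the agents apply. As an illustration only, the standard tabular Q-learning update shows how the two values would interact. `q_update` and the dictionary layout are assumptions made for this sketch.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update (assumed, not confirmed for marlax).

    q maps each state to a list of action values and is modified in place.
    """
    # Bootstrapped target: immediate reward plus discounted best next value.
    td_target = reward + gamma * max(q[next_state])
    # Move the current estimate a fraction alpha toward the target.
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


q = {"s0": [0.0, 0.0], "s1": [1.0, 2.0]}
new_value = q_update(q, "s0", action=1, reward=1.0, next_state="s1")
```

In this rule, a larger alpha makes each step overwrite more of the old estimate, while a gamma closer to 1 gives more weight to future rewards.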