marlax.engines.Engine

class marlax.engines.Engine(epsilon_start, epsilon_end, epsilon_test=0.0)

Bases: object

Core engine managing training and testing phases.

epsilon_start

Initial exploration rate.

Type:

float

epsilon_end

Final exploration rate after decay.

Type:

float

epsilon_test

Exploration rate during testing.

Type:

float

__init__(epsilon_start, epsilon_end, epsilon_test=0.0)

Initialize the engine with epsilon schedule parameters.

Parameters:
  • epsilon_start (float) – Starting epsilon for exploration.

  • epsilon_end (float) – Ending epsilon after decay.

  • epsilon_test (float, optional) – Epsilon used in testing. Defaults to 0.0.

Methods

__init__(epsilon_start, epsilon_end[, ...])

Initialize the engine with epsilon schedule parameters.

test(env, logger[, num_steps, verbose, ...])

Run the evaluation loop for a specified number of steps.

train(env, logger[, num_steps, alpha, ...])

Run the training loop for a specified number of steps.

test(env, logger, num_steps=100000, verbose=True, flush_every=1000000, regime_idx=0)

Run the evaluation loop for a specified number of steps.

Parameters:
  • env (Environment) – The environment instance to test on.

  • logger (Tracer or None) – Logger for recording test data, or None to run without logging.

  • num_steps (int, optional) – Number of evaluation iterations. Defaults to 100_000.

  • verbose (bool, optional) – Display progress bar if True. Defaults to True.

  • flush_every (int, optional) – Interval, in steps, between log flushes. Defaults to 1_000_000.

  • regime_idx (int, optional) – Identifier for environment regime. Defaults to 0.

Returns:

None
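
To show what the default epsilon_test=0.0 implies during evaluation, here is a minimal sketch of epsilon-greedy action selection. It assumes agents choose actions epsilon-greedily over a list of action values; `epsilon_greedy` and `q_values` are hypothetical names, not part of the marlax API.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Hypothetical helper: pick an action index epsilon-greedily."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q_values = [0.1, 0.7, 0.2]

# With epsilon = 0.0 the explore branch is never taken, so the
# selection is always the greedy action (index 1 here).
greedy_actions = {epsilon_greedy(q_values, 0.0, rng) for _ in range(100)}
print(greedy_actions)
```

With a nonzero epsilon_test, the same selection rule would occasionally pick a non-greedy action during evaluation.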

train(env, logger, num_steps=1000000, alpha=0.1, gamma=0.9, verbose=True, flush_every=1000000, regime_idx=0)

Run the training loop for a specified number of steps.

Parameters:
  • env (Environment) – The environment instance to train on.

  • logger (Tracer or None) – Logger for recording training data, or None to run without logging.

  • num_steps (int, optional) – Number of training iterations. Defaults to 1_000_000.

  • alpha (float, optional) – Learning rate for agent updates. Defaults to 0.1.

  • gamma (float, optional) – Discount factor for future rewards. Defaults to 0.9.

  • verbose (bool, optional) – Display progress bar if True. Defaults to True.

  • flush_every (int, optional) – Interval, in steps, between log flushes. Defaults to 1_000_000.

  • regime_idx (int, optional) – Identifier for environment regime. Defaults to 0.

Returns:

None
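
The roles of alpha and gamma can be illustrated with a tabular Q-learning update. This is a sketch under the assumption that agents learn with a standard Q-learning rule; this page does not confirm which update the agents use, and `q_update` and the table `q` are hypothetical names.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Hypothetical helper: apply one tabular Q-learning update in place.

    alpha scales how far the estimate moves toward the target;
    gamma discounts the best value available in the next state.
    """
    best_next = max(q[next_state])
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Two states with two actions each; q[state][action] is the value estimate.
q = [[0.0, 0.0], [1.0, 2.0]]
new_value = q_update(q, state=0, action=1, reward=1.0, next_state=1)
print(new_value)  # approximately 0.28
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and the estimate moves one tenth of the way from 0.0 toward it.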