marlax.envs.GridWorld_r0

class marlax.envs.GridWorld_r0(grid, n_agents, target_rewards, together_reward, travel_reward)

Bases: GridWorld

Regime 0: Single-step reward at center regardless of target labels.

__init__(grid, n_agents, target_rewards, together_reward, travel_reward)

Initialize regime-0 grid world (center-only reward).

Methods

__init__(grid, n_agents, target_rewards, ...)

Initialize regime-0 grid world (center-only reward).

check_and_activate_rewards()

Check if any agent is at the center and no reward target is active.

check_mismatch()

Detect if agents split between two correct target zones.

check_wrong_reward_zones()

Check if any agent enters a non-target reward zone.

compute_rewards(rewards)

Reward when any agent reaches the center cell.

get_possible_states()

Compute all possible next global states from current positions.

get_state()

Get the current global state representation.

move_agents(actions)

Update agent positions based on provided actions.

reset()

Randomly reposition agents and clear active rewards.

step(actions)

Execute one time step in the environment.

compute_rewards(rewards)

Reward when any agent reaches the center cell.

Returns: (collected, rewards)

Return type: tuple
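
The center-only rule described for compute_rewards can be illustrated with a small standalone sketch. It does not use or reproduce the marlax implementation: the names center_cell, compute_center_rewards, agent_positions, and center_reward are hypothetical, and the choice to give every agent the same reward once any agent stands on the center is an assumption made only for illustration. The sketch keeps the documented return shape, a (collected, rewards) tuple.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]


def center_cell(n_rows: int, n_cols: int) -> Position:
    """Return the middle cell of a grid (hypothetical helper, odd sizes assumed)."""
    return (n_rows // 2, n_cols // 2)


def compute_center_rewards(
    agent_positions: Sequence[Position],
    center: Position,
    rewards: Sequence[float],
    center_reward: float = 1.0,
) -> Tuple[bool, List[float]]:
    """Illustrative stand-in for a regime-0 style reward rule.

    Assumption: if any agent is on the center cell, every agent receives
    `center_reward` on top of its incoming reward. Target labels play no role.
    """
    # The reward is collected as soon as one agent occupies the center.
    collected = any(pos == center for pos in agent_positions)

    # Build a new list so the caller's rewards are not mutated.
    bonus = center_reward if collected else 0.0
    updated = [r + bonus for r in rewards]
    return collected, updated


if __name__ == "__main__":
    center = center_cell(5, 5)
    collected, rewards = compute_center_rewards(
        agent_positions=[(0, 0), center],
        center=center,
        rewards=[0.0, 0.0],
    )
    print(collected, rewards)
```

Returning a fresh list rather than updating rewards in place keeps the sketch easy to test; whether the real method mutates its argument is not stated in this page.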