marlax.envs.GridWorld_r2

class marlax.envs.GridWorld_r2(grid, n_agents, target_rewards, together_reward, travel_reward)

Bases: GridWorld

Regime 2: the single target 'ud' is activated after an agent reaches the center.

__init__(grid, n_agents, target_rewards, together_reward, travel_reward)

Initialize the grid world parameters and agents.

Parameters:
  • grid (tuple[int,int]) – Grid dimensions as (width, height).

  • agents (list[Agent]) – Agent instances present in the environment.

  • target_rewards (list[float]) – Reward values per agent for the correct target.

  • together_reward (float) – Bonus reward if all agents share a cell.

  • travel_reward (float) – Cost (negative reward) per move.

  • wrong_zone_penalty (float, optional) – Penalty for entering a wrong zone. Defaults to -500.

  • mismatch_penalty (float, optional) – Penalty if agents split between two target zones. Defaults to -250.
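
A minimal construction sketch. The keyword names follow the signature above; the concrete values and the keyword-argument usage are illustrative assumptions, not taken from this page:

    from marlax.envs import GridWorld_r2

    env = GridWorld_r2(
        grid=(5, 5),                    # grid dimensions as (width, height)
        n_agents=2,                     # number of agents in the world (assumed int)
        target_rewards=[100.0, 100.0],  # per-agent reward for the correct target
        together_reward=10.0,           # bonus when all agents share a cell
        travel_reward=-1.0,             # cost (negative reward) per move
    )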

Methods

__init__(grid, n_agents, target_rewards, ...)

Initialize the grid world parameters and agents.

check_and_activate_rewards()

Check if any agent is at the center and no reward target is active.

check_mismatch()

Detect if agents split between two correct target zones.

check_wrong_reward_zones()

Check if any agent enters a non-target reward zone.

compute_rewards(rewards)

Compute rewards based on agent positions and active reward target.

get_possible_states()

Compute all possible next global states from current positions.

get_state()

Get the current global state representation.

move_agents(actions)

Update agent positions based on provided actions.

reset()

Randomly reposition agents and clear active rewards.

step(actions)

Execute one time step in the environment.
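
A hedged episode-loop sketch tying reset(), get_state(), and step() together. The action encoding and the exact return value of step() are not documented on this page, so both are assumptions:

    import random

    from marlax.envs import GridWorld_r2

    env = GridWorld_r2((5, 5), 2, [100.0, 100.0], 10.0, -1.0)  # illustrative values

    env.reset()              # randomly reposition agents, clear active rewards
    state = env.get_state()  # current global state representation

    for _ in range(50):
        # One action per agent; the string encoding below is an assumption.
        actions = [random.choice(["up", "down", "left", "right"]) for _ in range(2)]
        result = env.step(actions)  # return signature is not shown on this page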