marlax.envs.GridWorld_r2

class marlax.envs.GridWorld_r2(grid, n_agents, target_rewards, together_reward, travel_reward)

Bases: GridWorld

Regime 2: the single target 'ud' is activated after an agent reaches the center.

__init__(grid, n_agents, target_rewards, together_reward, travel_reward)

Initialize the grid world parameters and agents.

Parameters:
  • grid (tuple[int,int]) – Grid dimensions as (width, height).

  • agents (list[Agent]) – Agent instances present in the environment.

  • target_rewards (list[float]) – Reward values per agent for the correct target.

  • together_reward (float) – Bonus reward if all agents share a cell.

  • travel_reward (float) – Cost (negative reward) per move.

  • wrong_zone_penalty (float, optional) – Penalty for entering a wrong zone. Defaults to -500.

  • mismatch_penalty (float, optional) – Penalty if agents split between two target zones. Defaults to -250.
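
A minimal construction sketch. The keyword names follow the signature above; the concrete values and the keyword-argument usage are illustrative assumptions, not taken from this page:

    from marlax.envs import GridWorld_r2

    env = GridWorld_r2(
        grid=(5, 5),                    # grid dimensions as (width, height)
        n_agents=2,                     # number of agents in the world (assumed int)
        target_rewards=[100.0, 100.0],  # per-agent reward for the correct target
        together_reward=10.0,           # bonus when all agents share a cell
        travel_reward=-1.0,             # cost (negative reward) per move
    )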

Methods

__init__(grid, n_agents, target_rewards, ...)

Initialize the grid world parameters and agents.

check_and_activate_rewards()

Check if any agent is at the center and no reward target is active.

check_mismatch()

Detect if agents split between two correct target zones.

check_wrong_reward_zones()

Check if any agent enters a non-target reward zone.

compute_rewards(rewards)

Compute rewards based on agent positions and active reward target.

get_possible_states()

Compute all possible next global states from current positions.

get_state()

Get the current global state representation.

move_agents(actions)

Update agent positions based on provided actions.

reset()

Randomly reposition agents and clear active rewards.

step(actions)

Execute one time step in the environment.
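
A hedged episode-loop sketch tying reset(), get_state(), and step() together. The action encoding and the exact return value of step() are not documented on this page, so both are assumptions:

    import random

    from marlax.envs import GridWorld_r2

    env = GridWorld_r2((5, 5), 2, [100.0, 100.0], 10.0, -1.0)  # illustrative values

    env.reset()              # randomly reposition agents, clear active rewards
    state = env.get_state()  # current global state representation

    for _ in range(50):
        # One action per agent; the string encoding below is an assumption.
        actions = [random.choice(["up", "down", "left", "right"]) for _ in range(2)]
        result = env.step(actions)  # return signature is not shown on this page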