marlax.envs.GridWorld_r1
- class marlax.envs.GridWorld_r1(grid, n_agents, target_rewards, together_reward, travel_reward)
Bases: GridWorld
Regime 1: a single target, "rl", is activated once an agent reaches the center.
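The Regime 1 activation rule can be sketched as a pure function. This is a minimal, hypothetical sketch, not marlax code: it assumes agent positions are `(x, y)` tuples, that the center cell is `(width // 2, height // 2)`, and that `None` means no target is active yet.

```python
from typing import List, Optional, Tuple

Position = Tuple[int, int]


def check_and_activate_rewards(
    positions: List[Position],
    grid: Tuple[int, int],
    active_target: Optional[str],
) -> Optional[str]:
    """Return the active target after the Regime 1 activation check.

    Hypothetical sketch: the center is assumed to be (width // 2, height // 2),
    and "rl" is the only target this regime ever activates.
    """
    width, height = grid
    center = (width // 2, height // 2)
    # Activate only when no target is active yet and some agent stands on the center.
    if active_target is None and any(pos == center for pos in positions):
        return "rl"
    # Otherwise leave the current state (active or inactive) unchanged.
    return active_target
```

For example, on a 5×5 grid an agent at `(2, 2)` triggers activation, while agents at the corners do not.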
- __init__(grid, n_agents, target_rewards, together_reward, travel_reward)
Initialize the grid world parameters and agents.
- Parameters:
grid (tuple[int,int]) – Grid dimensions as (width, height).
agents (list[Agent]) – Agent instances present in the environment.
target_rewards (list[float]) – Reward values per agent for the correct target.
together_reward (float) – Bonus reward if all agents share a cell.
travel_reward (float) – Cost (negative reward) per move.
wrong_zone_penalty (float, optional) – Penalty for entering a wrong zone. Defaults to -500.
mismatch_penalty (float, optional) – Penalty if agents split between two target zones. Defaults to -250.
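To illustrate how these parameters might combine into per-agent rewards, here is a hypothetical additive sketch. It assumes the reward terms are simply summed, that the zone checks have already been reduced to boolean flags (`in_target`, `in_wrong_zone`, `moved`, `all_together`, `split_between_targets`, all invented names), and that `travel_reward` and both penalties are passed as negative numbers, as the defaults of -500 and -250 suggest. The real `compute_rewards` may gate or weight these terms differently.

```python
from typing import List


def compute_rewards(
    in_target: List[bool],
    in_wrong_zone: List[bool],
    moved: List[bool],
    all_together: bool,
    split_between_targets: bool,
    target_rewards: List[float],
    together_reward: float,
    travel_reward: float,
    wrong_zone_penalty: float = -500.0,
    mismatch_penalty: float = -250.0,
) -> List[float]:
    """Hypothetical additive reward model; not the marlax implementation."""
    rewards = []
    for i, reward_if_correct in enumerate(target_rewards):
        r = 0.0
        if moved[i]:
            r += travel_reward  # assumed negative: a cost per move
        if in_target[i]:
            r += reward_if_correct  # per-agent reward for the correct target
        if in_wrong_zone[i]:
            r += wrong_zone_penalty  # assumed negative, like the -500 default
        if all_together:
            r += together_reward  # bonus when every agent shares one cell
        if split_between_targets:
            r += mismatch_penalty  # applied to every agent in this sketch
        rewards.append(r)
    return rewards
```

With `target_rewards=[10.0, 20.0]` and `travel_reward=-1.0`, an agent that moves onto the target earns 9.0, while one that moves into a wrong zone earns -501.0.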
Methods
__init__(grid, n_agents, target_rewards, ...) – Initialize the grid world parameters and agents.
check_and_activate_rewards() – Check if any agent is at the center and no reward target is active.
check_mismatch() – Detect if agents split between two correct target zones.
check_wrong_reward_zones() – Check if any agent enters a non-target reward zone.
compute_rewards(rewards) – Compute rewards based on agent positions and the active reward target.
get_possible_states() – Compute all possible next global states from current positions.
get_state() – Get the current global state representation.
move_agents(actions) – Update agent positions based on the provided actions.
reset() – Randomly reposition agents and clear active rewards.
step(actions) – Execute one time step in the environment.
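`move_agents` and `get_possible_states` can be illustrated with a small standalone sketch. It assumes a five-action set (`stay`, `up`, `down`, `left`, `right`), that moves off the grid are clipped to the boundary, and that a global state is a tuple of agent positions. None of these details are confirmed by the documentation above, and `ACTIONS` and `move` are hypothetical helpers.

```python
import itertools
from typing import Dict, List, Tuple

Position = Tuple[int, int]
Grid = Tuple[int, int]  # (width, height), matching the constructor's `grid`

# Hypothetical action encoding: (dx, dy) per action name.
ACTIONS: Dict[str, Tuple[int, int]] = {
    "stay": (0, 0),
    "up": (0, -1),
    "down": (0, 1),
    "left": (-1, 0),
    "right": (1, 0),
}


def move(pos: Position, action: str, grid: Grid) -> Position:
    """Apply one action, clipping so the agent stays inside the grid."""
    width, height = grid
    dx, dy = ACTIONS[action]
    x = min(max(pos[0] + dx, 0), width - 1)
    y = min(max(pos[1] + dy, 0), height - 1)
    return (x, y)


def move_agents(positions: List[Position], actions: List[str], grid: Grid) -> List[Position]:
    """Move every agent simultaneously according to its own action."""
    return [move(p, a, grid) for p, a in zip(positions, actions)]


def get_possible_states(positions: List[Position], grid: Grid) -> List[Tuple[Position, ...]]:
    """Enumerate joint next states as the product of each agent's reachable cells."""
    # Deduplicate per agent (clipped moves can coincide) and sort for a stable order.
    per_agent = [sorted({move(p, a, grid) for a in ACTIONS}) for p in positions]
    return list(itertools.product(*per_agent))
```

Because each agent contributes up to five distinct cells, the enumeration returns up to 5^n joint states for n agents, so it is practical only for small teams.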