Agent Rewards

The reward is 0 at every step except the last. At the end of the episode, the reward is the negative of the sum of the distances of the routes traveled by all agents, minus the sum of the penalties for the services not performed. The penalty for an unperformed service is 2 times the distance from the depot to that service.
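As a compact restatement of the paragraph above, the terminal reward can be written as a formula. The symbols are assumptions introduced here for illustration and do not come from the source: \(d_k\) is the length of the route traveled by agent \(k\), \(U\) is the set of services left unperformed, and \(d(0, j)\) is the distance from the depot to service \(j\).

\[
R = -\sum_{k} d_k \;-\; 2 \sum_{j \in U} d(0, j)
\]

Under this reading, every intermediate step contributes 0 and \(R\) is issued once, at the final step.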

Agent reward settings are defined in the file env_agent_reward.py.

Dense Reward

At every step, the reward is the negative of the distance traveled by the agent. At the end of the episode, a negative penalty is added, equal to \(10\) times the distance from the depot to each unattended service.
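The dense scheme described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the library's implementation: the helper names `step_reward` and `final_penalty` are hypothetical, distances are assumed to be Euclidean, and the penalty is assumed to be summed over all unattended services.

```python
import math

PENALTY_FACTOR = 10  # factor stated in the description above


def step_reward(prev_pos, new_pos):
    """Hypothetical helper: per-step reward is the negative distance of the move.

    Assumes Euclidean distance between 2D coordinates.
    """
    return -math.dist(prev_pos, new_pos)


def final_penalty(depot, unattended):
    """Hypothetical helper: end-of-episode penalty.

    Assumes the penalty is summed over every unattended service.
    """
    return -PENALTY_FACTOR * sum(math.dist(depot, s) for s in unattended)


depot = (0.0, 0.0)
route = [depot, (3.0, 4.0), depot]  # one agent serves one service and returns
unattended = [(6.0, 8.0)]           # one service is never visited

# One reward per move along the route, plus a single penalty at the end.
rewards = [step_reward(a, b) for a, b in zip(route, route[1:])]
penalty = final_penalty(depot, unattended)

print(rewards)  # [-5.0, -5.0]
print(penalty)  # -100.0
```

In this toy episode the agent receives feedback on every move, which is the main practical difference from the sparse scheme.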

class maenvs4vrp.environments.gmtdvrp.env_agent_reward.DenseReward

GMTDVRP dense reward class.

__init__()

Constructor.

Parameters:

n/a.

Returns:

None.

get_reward(action)

Get reward and penalty.

Parameters:

action (torch.Tensor) – Tensor with agent moves.

Returns:

reward (torch.Tensor) – Reward.

penalty (torch.Tensor) – Penalty.

set_env(env)

Set environment.

Parameters:

env (AECEnv) – Environment.

Returns:

None.

Sparse Reward

The reward is 0 at every step except the last. At the end of the episode, the reward is the negative of the sum of the distances of the routes traveled by all agents, minus the sum of the penalties for the services not performed. The penalty for an unperformed service is \(10\) times the distance from the depot to that service.
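A small worked example may help make the sparse scheme concrete. The sketch below is plain Python and is not the library's implementation: the function names `route_length` and `sparse_rewards` are hypothetical, and Euclidean distance between 2D coordinates is an assumption.

```python
import math

PENALTY_FACTOR = 10  # factor stated in the description above


def route_length(route):
    """Hypothetical helper: total Euclidean length of a sequence of points."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def sparse_rewards(routes, depot, unperformed, n_steps):
    """Hypothetical helper: reward sequence for one episode.

    Every step yields 0 except the last, which yields the negative of
    (total route length + penalties for unperformed services).
    """
    total_distance = sum(route_length(r) for r in routes)
    penalty = PENALTY_FACTOR * sum(math.dist(depot, s) for s in unperformed)
    return [0.0] * (n_steps - 1) + [-(total_distance + penalty)]


depot = (0.0, 0.0)
routes = [
    [depot, (3.0, 4.0), depot],  # agent 1: route length 10
    [depot, (0.0, 2.0), depot],  # agent 2: route length 4
]
unperformed = [(6.0, 8.0)]       # distance 10 from the depot, penalty 100

episode = sparse_rewards(routes, depot, unperformed, n_steps=4)
print(episode)  # [0.0, 0.0, 0.0, -114.0]
```

The final value combines 14 units of travel with a penalty of 100 for the single unperformed service.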

class maenvs4vrp.environments.gmtdvrp.env_agent_reward.SparseReward

GMTDVRP sparse reward class.

__init__()

Constructor.

Parameters:

n/a.

Returns:

None.

get_reward(action)

Get reward and penalty.

Parameters:

action (torch.Tensor) – Tensor with agent moves.

Returns:

reward (torch.Tensor) – Reward.

penalty (torch.Tensor) – Penalty.

set_env(env)

Set environment.

Parameters:

env (AECEnv) – Environment.

Returns:

None.