Agent Rewards

The reward is 0 at every step except the last. At the end of the episode, the reward is the negative of the total distance of the routes traveled by all agents, minus the sum of the penalties for the services not performed. The penalty for an unperformed service is 2 times the distance from the depot to that service.
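The terminal reward described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name episode_reward, the coordinate tuples, and the use of Euclidean distance are hypothetical, and the library itself operates on torch tensors rather than Python floats.

```python
import math

def episode_reward(depot, routes, unserved):
    """Terminal reward: -(total route length) - (penalties for unserved services).

    Hypothetical helper, not part of the library. Distances are assumed Euclidean.
    """
    # Sum the length of every leg of every agent's route.
    traveled = sum(
        math.dist(a, b)
        for route in routes
        for a, b in zip(route, route[1:])
    )
    # Each unserved service costs twice its distance from the depot.
    penalty = sum(2.0 * math.dist(depot, service) for service in unserved)
    return -traveled - penalty

depot = (0.0, 0.0)
routes = [[depot, (3.0, 4.0), depot]]  # one agent: depot -> client -> depot
unserved = [(0.0, 2.0)]                # one service left unperformed
reward = episode_reward(depot, routes, unserved)
```

Here the route length is 10 and the single penalty is 2 × 2 = 4, so the terminal reward is -14.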

Agent reward settings are defined in the file env_agent_reward.py.

Dense Reward

At every step, the reward is the negative of the distance traveled in that step plus the prize collected by the agent from the client it visits.
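As a rough sketch of the per-step rule above, assuming Euclidean distances and plain Python floats (the helper dense_step_reward is hypothetical and is not the library's get_reward, which works on torch tensors):

```python
import math

def dense_step_reward(current, target, prize):
    """Reward for one move: -(distance traveled) + (prize collected at target).

    Hypothetical helper for illustration only.
    """
    return -math.dist(current, target) + prize

depot = (0.0, 0.0)
client = (3.0, 4.0)

# Moving to a client pays its prize but costs the distance traveled.
step_out = dense_step_reward(depot, client, prize=8.0)
# Returning to the depot collects no prize, so the reward is just the cost.
step_back = dense_step_reward(client, depot, prize=0.0)
```

With a distance of 5 in each direction, the outbound step yields 3 and the return step yields -5.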

class maenvs4vrp.environments.pcvrptw.env_agent_reward.DenseReward

PCVRPTW dense reward class.

__init__()

Constructor.

Parameters:

n/a.

Returns:

None.

get_reward(action)

Get reward and penalty.

Parameters:

action (torch.Tensor) – Tensor with agent moves.

Returns:

reward (torch.Tensor) – Reward.

penalty (torch.Tensor) – Penalty.

set_env(env)

Set environment.

Parameters:

env (AECEnv) – Environment.

Returns:

None.

Sparse Reward

The reward is 0 at every step except the last. At the end of the episode, the reward is the negative of the total distance of the routes traveled plus the sum of the prizes collected by all agents.
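A minimal sketch of this schedule, assuming the per-step distances and prizes are already known as plain Python floats (the helper sparse_episode_rewards is hypothetical; the library's get_reward works on torch tensors):

```python
def sparse_episode_rewards(step_distances, step_prizes):
    """Per-step rewards: 0 until the final step, which carries
    -(total distance) + (total prizes collected).

    Hypothetical helper for illustration only.
    """
    n_steps = len(step_distances)
    if n_steps == 0:
        return []
    final = -sum(step_distances) + sum(step_prizes)
    return [0.0] * (n_steps - 1) + [final]

# Two moves of length 5; a prize of 8 is collected on the first move.
rewards = sparse_episode_rewards([5.0, 5.0], [8.0, 0.0])
```

In this example the first step returns 0 and the last step returns -10 + 8 = -2, so the whole episode outcome is delivered in a single terminal signal.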

class maenvs4vrp.environments.pcvrptw.env_agent_reward.SparseReward

PCVRPTW sparse reward class.

__init__()

Constructor.

Parameters:

n/a.

Returns:

None.

get_reward(action)

Get reward and penalty.

Parameters:

action (torch.Tensor) – Tensor with agent moves.

Returns:

reward (torch.Tensor) – Reward.

penalty (torch.Tensor) – Penalty.

set_env(env)

Set environment.

Parameters:

env (Environment) – Environment.

Returns:

None.