Multi-Tasking Environments

Install

Uncomment the following cells:

[ ]:
#!git clone https://github.com/ricgama/maenvs4vrp.git
[ ]:
# When using Colab
#%cd maenvs4vrp
#%mv maenvs4vrp/ repo_temp/
#%mv repo_temp/ ..
#%cd ..
#%cp maenvs4vrp/setup.py repo_temp/
#%rm -r maenvs4vrp
#%mv repo_temp/ maenvs4vrp/
#%cd maenvs4vrp/
#!pip install .

Multi-tasking environments can simulate multiple VRP variants within the same environment structure, whereas every other environment simulates a single variant.

There are two ways to choose the variants: sample random variants across the batch, so that a single batch mixes several VRP problems, or pick one variant from the list of supported variants.

Supported variants are combinations of a set of attributes, which can be enabled or disabled.

At the moment, MAENVS4VRP offers four multi-tasking environments: one base environment and three generalizations.

The supported environments are:

  • MTVRP: Base environment.

  • GMTVRP: MTVRP generalization with support for online scenarios.

  • MTDVRP: MTVRP generalization with multiple depots.

  • GMTDVRP: MTVRP generalization with support for online scenarios and multiple depots.

The MTVRP base environment is adapted from the RouteFinder environments.

Supported VRP Variants

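The supported VRP variants are listed in the following table, where x marks an enabled attribute. The generalizations build on these base variants and add extra features.

Variant    Capacity  Open Routes  Backhaul  Mixed Problems  Duration Limits  Time Windows
CVRP       x         -            -         -               -                -
OVRP       x         x            -         -               -                -
VRPB       x         -            x         -               -                -
VRPL       x         -            -         -               x                -
VRPTW      x         -            -         -               -                x
OVRPTW     x         x            -         -               -                x
OVRPB      x         x            x         -               -                -
OVRPL      x         x            -         -               x                -
VRPBL      x         -            x         -               x                -
VRPBTW     x         -            x         -               -                x
VRPLTW     x         -            -         -               x                x
OVRPBL     x         x            x         -               x                -
OVRPBTW    x         x            x         -               -                x
OVRPLTW    x         x            -         -               x                x
VRPBLTW    x         -            x         -               x                x
OVRPBLTW   x         x            x         -               x                x
VRPMB      x         -            x         x               -                -
OVRPMB     x         x            x         x               -                -
VRPMBL     x         -            x         x               x                -
VRPMBTW    x         -            x         x               -                x
OVRPMBL    x         x            x         x               x                -
OVRPMBTW   x         x            x         x               -                x
VRPMBLTW   x         -            x         x               x                x
OVRPMBLTW  x         x            x         x               x                x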
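Because every variant is a combination of on/off attributes, the 24 names listed above can be enumerated mechanically. The sketch below is an illustration only, not the library's code: the function variant_name and its naming rule (prefix O for open routes, then B or MB, L, TW, with the attribute-free case called CVRP) are assumptions inferred from the acronyms.

```python
from itertools import product


def variant_name(open_routes, backhaul, duration_limit, time_windows):
    """Build a variant acronym; `backhaul` is None, "B" or "MB" (hypothetical encoding)."""
    suffix = (backhaul or "") + ("L" if duration_limit else "") + ("TW" if time_windows else "")
    if not open_routes and not suffix:
        return "CVRP"  # capacity-only base problem
    return ("O" if open_routes else "") + "VRP" + suffix


# Enumerate every combination of the four attribute choices.
names = [
    variant_name(o, b, l, tw)
    for o, b, l, tw in product([False, True], [None, "B", "MB"], [False, True], [False, True])
]
print(len(names))  # 2 * 3 * 2 * 2 = 24 combinations
```

Capacity is always enabled, so it does not appear as a free choice in the enumeration.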

MTVRP

Let’s explore this base environment:

[132]:
from maenvs4vrp.environments.mtvrp.env import Environment
from maenvs4vrp.environments.mtvrp.env_agent_selector import AgentSelector
from maenvs4vrp.environments.mtvrp.observations import Observations
from maenvs4vrp.environments.mtvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.mtvrp.env_agent_reward import DenseReward
[133]:
gen = InstanceGenerator()
obs = Observations()
sel = AgentSelector()
rew = DenseReward()
[134]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)

Sample Random Variants

By default, when variant_preset is not specified, env.reset() samples random variants across the batch.

If use_combinations is True, the attributes are sampled randomly and can be combined. Otherwise, each batch element has only one attribute.

[135]:
td = env.reset(batch_size=4, num_agents=2, num_nodes=6, use_combinations=True)
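The two sampling modes can be pictured with a small stand-alone sketch. It is not the library's implementation: the function sample_flags, the attribute names, and the 0.5 probability are assumptions chosen for illustration, and "one attribute" is read here as exactly one enabled flag per batch element.

```python
import random

# Hypothetical attribute names, mirroring the has_* flags in env.td_state.
ATTRIBUTES = ("open_routes", "backhauls", "distance_limits", "time_windows")


def sample_flags(batch_size, use_combinations, rng):
    """Return one dict of attribute flags per batch element."""
    batch = []
    for _ in range(batch_size):
        if use_combinations:
            # Each attribute is switched on or off independently.
            flags = {name: rng.random() < 0.5 for name in ATTRIBUTES}
        else:
            # Exactly one attribute is enabled for this element.
            chosen = rng.choice(ATTRIBUTES)
            flags = {name: name == chosen for name in ATTRIBUTES}
        batch.append(flags)
    return batch


for flags in sample_flags(batch_size=4, use_combinations=True, rng=random.Random(0)):
    print(flags)
```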

The TensorDict env.td_state holds all of the problem’s parameters.

[136]:
env.td_state
[136]:
TensorDict(
    fields={
        agents: TensorDict(
            fields={
                active_agents_mask: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.bool, is_shared=False),
                capacity: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                cum_ttime: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                cur_node: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.int64, is_shared=False),
                cur_step: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.int32, is_shared=False),
                cur_time: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                cur_ttime: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                feasible_nodes: Tensor(shape=torch.Size([4, 2, 6]), device=cpu, dtype=torch.bool, is_shared=False),
                route_length: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                used_capacity_backhaul: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                used_capacity_linehaul: Tensor(shape=torch.Size([4, 2]), device=cpu, dtype=torch.float32, is_shared=False),
                visited_nodes: Tensor(shape=torch.Size([4, 2, 6]), device=cpu, dtype=torch.bool, is_shared=False)},
            batch_size=torch.Size([4]),
            device=cpu,
            is_shared=False),
        backhaul_class: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        backhaul_demands: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False),
        capacity: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        coords: Tensor(shape=torch.Size([4, 6, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        cur_agent: TensorDict(
            fields={
                action_mask: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.bool, is_shared=False),
                cum_ttime: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                cur_node: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
                cur_route_length: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                cur_step: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.int32, is_shared=False),
                cur_time: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                cur_ttime: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                used_capacity_backhaul: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
                used_capacity_linehaul: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([4]),
            device=cpu,
            is_shared=False),
        cur_agent_idx: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        cur_node_idx: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        depot_idx: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        depot_loc: Tensor(shape=torch.Size([4, 1, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        distance_limits: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        done: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
        end_time: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False),
        has_backhauls: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        has_distance_limits: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        has_open_routes: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        has_time_windows: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        is_depot: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.bool, is_shared=False),
        is_last_step: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.bool, is_shared=False),
        linehaul_demands: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False),
        max_tour_duration: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False),
        nodes: TensorDict(
            fields={
                active_nodes_mask: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.bool, is_shared=False),
                backhaul_demands: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False),
                distance2depot: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False),
                linehaul_demands: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False),
                time2depot: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False)},
            batch_size=torch.Size([4]),
            device=cpu,
            is_shared=False),
        num_agents: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.int64, is_shared=False),
        open_routes: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.bool, is_shared=False),
        original_capacity: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        service_time: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False),
        solution: TensorDict(
            fields={
            },
            batch_size=torch.Size([4]),
            device=cpu,
            is_shared=False),
        speed: Tensor(shape=torch.Size([4, 1]), device=cpu, dtype=torch.float32, is_shared=False),
        start_time: Tensor(shape=torch.Size([4]), device=cpu, dtype=torch.float32, is_shared=False),
        time_windows: Tensor(shape=torch.Size([4, 6, 2]), device=cpu, dtype=torch.float32, is_shared=False),
        tw_high: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False),
        tw_low: Tensor(shape=torch.Size([4, 6]), device=cpu, dtype=torch.float32, is_shared=False)},
    batch_size=torch.Size([4]),
    device=cpu,
    is_shared=False)

You can check which attributes are enabled in each batch element.

Backhauls:

[137]:
env.td_state['has_backhauls']
[137]:
tensor([[ True],
        [ True],
        [False],
        [False]])
[138]:
env.td_state['linehaul_demands']
[138]:
tensor([[0., 4., 8., 0., 1., 1.],
        [0., 3., 1., 0., 0., 8.],
        [0., 4., 2., 4., 6., 9.],
        [0., 9., 8., 5., 5., 4.]])
[139]:
env.td_state['backhaul_demands']
[139]:
tensor([[0., 0., 0., 5., 0., 0.],
        [0., 0., 0., 3., 8., 0.],
        [0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0.]])

Distance Limits:

[140]:
env.td_state['has_distance_limits']
[140]:
tensor([[ True],
        [False],
        [False],
        [ True]])
[141]:
env.td_state['distance_limits']
[141]:
tensor([[2.2786],
        [   inf],
        [   inf],
        [2.4822]])

Open Routes:

[142]:
env.td_state['has_open_routes']
[142]:
tensor([[True],
        [True],
        [True],
        [True]])

Time Windows:

[143]:
env.td_state['has_time_windows']
[143]:
tensor([[ True],
        [False],
        [ True],
        [False]])
[144]:
env.td_state['time_windows']
[144]:
tensor([[[0.0000, 4.6000],
         [1.4379, 1.6248],
         [3.6609, 3.8444],
         [2.4875, 2.6786],
         [1.2890, 1.4818],
         [0.6775, 0.8762]],

        [[0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf]],

        [[0.0000, 4.6000],
         [2.5068, 2.6907],
         [1.9209, 2.1087],
         [2.4964, 2.6811],
         [3.6757, 3.8600],
         [2.4038, 2.6037]],

        [[0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf],
         [0.0000,    inf]]])
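The outputs above show that a disabled attribute is stored as a neutral value: an infinite distance limit, or a [0, inf] time window. The sketch below illustrates why this lets one feasibility check serve every variant. The helper names are hypothetical, and the rule that an early vehicle waits until the window opens is the usual VRPTW convention, assumed here rather than confirmed for the library.

```python
import math


def service_start(arrival, window):
    """Return the service start time, or None if the window is missed."""
    tw_low, tw_high = window
    if arrival > tw_high:
        return None  # arriving after the window closes is infeasible
    return max(arrival, tw_low)  # early arrivals wait until the window opens


def within_limit(route_length, distance_limit):
    """Check a route length against a (possibly infinite) distance limit."""
    return route_length <= distance_limit


# A real window from the first batch element, and a neutral one from the second.
print(service_start(1.0, (1.4379, 1.6248)))  # waits until 1.4379
print(service_start(100.0, (0.0, math.inf)))  # always feasible
print(within_limit(5.0, math.inf))  # no limit: always True
```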

Sample Variant Preset from Presets List

Let’s consider the case where variant_preset is VRPBL.

If use_combinations is True, the Backhaul and Distance Limits attributes are not taken into account for the variant.

Otherwise, every batch element represents VRPBL.

[145]:
td = env.reset(batch_size=4, num_agents=4, num_nodes=6, use_combinations=False, variant_preset='vrpbl')
[146]:
env.td_state['has_backhauls']
[146]:
tensor([[True],
        [True],
        [True],
        [True]])
[147]:
env.td_state['has_distance_limits']
[147]:
tensor([[True],
        [True],
        [True],
        [True]])
[148]:
env.td_state['has_open_routes']
[148]:
tensor([[False],
        [False],
        [False],
        [False]])
[149]:
env.td_state['has_time_windows']
[149]:
tensor([[False],
        [False],
        [False],
        [False]])

Problem Simulation Cycle

The problem simulation cycle has two parts:

  • sample_action(): Randomly samples an action for the current agent, according to the action mask in env.td_state.

  • step(): Updates the environment’s state according to the sampled action.

The simulation ends when every entry of td['done'] is True.

[150]:
td['done']
[150]:
tensor([False, False, False, False])
[151]:
while not td["done"].all():
    td = env.sample_action(td)
    td = env.step(td)
    step = env.env_nsteps
    cur_agent_idx = td['cur_agent_idx']
    print(f'env step number: {step}, active agent name: {cur_agent_idx}')
env step number: 1, active agent name: tensor([[1],
        [0],
        [0],
        [0]])
env step number: 2, active agent name: tensor([[1],
        [0],
        [0],
        [0]])
env step number: 3, active agent name: tensor([[1],
        [0],
        [0],
        [1]])
env step number: 4, active agent name: tensor([[2],
        [1],
        [1],
        [1]])
env step number: 5, active agent name: tensor([[2],
        [2],
        [1],
        [1]])
env step number: 6, active agent name: tensor([[2],
        [2],
        [2],
        [2]])
env step number: 7, active agent name: tensor([[3],
        [3],
        [3],
        [3]])
env step number: 8, active agent name: tensor([[0],
        [3],
        [0],
        [0]])
env step number: 9, active agent name: tensor([[0],
        [0],
        [0],
        [0]])
[152]:
td['done']
[152]:
tensor([True, True, True, True])
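The sample-then-step cycle can be reproduced on a toy problem without the library. The sketch below is a simplified single-agent, single-instance stand-in, not the MTVRP implementation: run_episode and sample_action are hypothetical helpers, and the only constraint encoded in the mask is that a node cannot be visited twice.

```python
import random


def sample_action(action_mask, rng):
    """Pick a random node among those the mask marks as feasible."""
    feasible = [node for node, ok in enumerate(action_mask) if ok]
    return rng.choice(feasible)


def run_episode(num_nodes, seed=0):
    """Visit every node once in random order, starting and ending at node 0."""
    rng = random.Random(seed)
    visited = [False] * num_nodes
    visited[0] = True  # node 0 plays the role of the depot
    route = [0]
    done = False
    while not done:
        mask = [not v for v in visited]  # unvisited nodes are feasible
        action = sample_action(mask, rng)
        visited[action] = True  # the "step": update the state
        route.append(action)
        done = all(visited)
    route.append(0)  # return to the depot
    return route


print(run_episode(num_nodes=6))
```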

GMTVRP

In the standard MTVRP setup, the vehicle’s linehaul and backhaul loads aren’t known until the episode ends, which isn’t practical for modeling real‑time operations. To address this, we created the GMTVRP environment, where each vehicle’s load is specified up front at the start of the episode. Let’s dive in:

[153]:
from maenvs4vrp.environments.gmtvrp.env import Environment
from maenvs4vrp.environments.gmtvrp.env_agent_selector import SmallestTimeAgentSelector
from maenvs4vrp.environments.gmtvrp.observations import Observations
from maenvs4vrp.environments.gmtvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.gmtvrp.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2

Note: To simulate an online scenario, make sure to instantiate SmallestTimeAgentSelector as the agent selector class.
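Judging only by its name, SmallestTimeAgentSelector presumably gives the next move to the agent whose clock is furthest behind, which is what an online, time-ordered simulation needs. The sketch below shows that idea in isolation. It is an assumption about the behaviour, not the library's code, and select_agent is a hypothetical helper.

```python
def select_agent(cur_time, active):
    """Return the index of the active agent with the smallest current time."""
    candidates = [i for i, is_active in enumerate(active) if is_active]
    if not candidates:
        return None  # no agent left to act
    return min(candidates, key=lambda i: cur_time[i])


# Agent 1 is the furthest behind in time, so it acts next.
print(select_agent([3.0, 1.5, 2.0], [True, True, True]))
```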

[154]:
gen = InstanceGenerator()
obs = Observations()
sel = SmallestTimeAgentSelector()
rew = DenseReward()
[155]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)

Initial Load

As previously stated, the agents’ load is implicit in MTVRP simulations. To support online scenarios, the initial load must be defined at the beginning of the route. This can be done in two ways:

  • Sampling a random initial load with the env.sample_initial_load() method.

  • Defining a custom initial load in env.reset().

In both cases, env.set_initial_load() must be called afterwards, because it applies the initial load stored under the key td['initial_load'] to the environment state.

Sample Initial Load

env.sample_initial_load() samples values between 0 and the agents’ maximum capacity.

[156]:
td = env.reset(batch_size=4, num_agents=2, num_nodes=6, use_combinations=True)
[157]:
env.td_state['capacity']
[157]:
tensor([[30.],
        [30.],
        [30.],
        [30.]])
[158]:
td = env.sample_initial_load(td)
[159]:
td['initial_load']
[159]:
tensor([[21.6385, 21.7317],
        [21.6329,  7.6351],
        [18.1970, 26.6878],
        [ 9.6124,  4.6757]])
[160]:
td = env.set_initial_load(td)
[161]:
env.td_state['agents']['cur_linehaul_load']
[161]:
tensor([[21.6385, 21.7317],
        [21.6329,  7.6351],
        [18.1970, 26.6878],
        [ 9.6124,  4.6757]])

Set Custom Initial Load

A custom initial load is passed through the initial_load argument of env.reset().

[162]:
td = env.reset(batch_size=4, num_agents=2, num_nodes=6, use_combinations=True, initial_load=20)
[163]:
td['initial_load']
[163]:
tensor([[20., 20.],
        [20., 20.],
        [20., 20.],
        [20., 20.]])
[164]:
td = env.set_initial_load(td)
[165]:
env.td_state['agents']['cur_linehaul_load']
[165]:
tensor([[20., 20.],
        [20., 20.],
        [20., 20.],
        [20., 20.]])
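Both ways of defining the initial load produce one value per agent and per batch element. The stand-alone sketch below mimics the two cases with plain lists. The function make_initial_load is hypothetical, and the uniform distribution is an assumption, since the text only states that sampled values lie between 0 and the maximum capacity.

```python
import random


def make_initial_load(batch_size, num_agents, capacity, initial_load=None, seed=None):
    """Return a batch_size x num_agents nested list of initial loads."""
    rng = random.Random(seed)
    if initial_load is not None:
        # Custom load: the same value is repeated for every agent.
        return [[float(initial_load)] * num_agents for _ in range(batch_size)]
    # Sampled load: one value in [0, capacity] per agent (uniform is assumed).
    return [
        [rng.uniform(0.0, capacity) for _ in range(num_agents)]
        for _ in range(batch_size)
    ]


print(make_initial_load(4, 2, capacity=30.0, initial_load=20))
print(make_initial_load(4, 2, capacity=30.0, seed=0))
```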

MTDVRP

Let’s now explore multi-depot environments:

[166]:
from maenvs4vrp.environments.mtdvrp.env import Environment
from maenvs4vrp.environments.mtdvrp.env_agent_selector import AgentSelector
from maenvs4vrp.environments.mtdvrp.observations import Observations
from maenvs4vrp.environments.mtdvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.mtdvrp.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2
[167]:
gen = InstanceGenerator()
obs = Observations()
sel = AgentSelector()
rew = DenseReward()
[168]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)

Multiple Depots

To include multiple depots in the simulation, they must be defined in env.reset(). The total number of agents is the product of the number of depots and the defined number of agents.

Example: if num_depots = 3 and num_agents = 5, the total number of agents is 15, so each depot has 5 agents.

Agents are assigned to depots sequentially: the first agent is assigned to depot 0, the second to depot 1, and so on. Every agent must start and end its route at its own depot.

[169]:
td = env.reset(batch_size=3, num_agents=5, num_depots=3, num_nodes=24, use_combinations=True)
[170]:
env.td_state['depot_idx']
[170]:
tensor([[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]])
[171]:
env.td_state['agents']['depot_idx']
[171]:
tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
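The pattern in the output above corresponds to a round-robin assignment, which can be written as a modulo. The sketch below reproduces one row of that output. The helper assign_depots is hypothetical and only illustrates the rule, not how the library computes it.

```python
def assign_depots(num_agents, num_depots):
    """Assign agents to depots in round-robin order."""
    total_agents = num_agents * num_depots  # agents per depot times depots
    return [agent % num_depots for agent in range(total_agents)]


depots = assign_depots(num_agents=5, num_depots=3)
print(depots)  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
```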

After the simulation has run, each agent’s sequence of actions should start and end at its own depot.

[172]:
while not td["done"].all():
    td = env.sample_action(td)
    td = env.step(td)
[173]:
env.td_state['solution']['agents']
[173]:
tensor([[ 0,  1,  1,  1,  1,  1,  1,  1,  2,  2,  2,  2,  2,  2,  2,  3,  3,  4,
          4,  4,  4,  4,  4,  4,  5,  5,  6,  6,  7,  8,  9, 10, 11, 12, 13, 14],
        [ 0,  0,  0,  0,  1,  1,  1,  2,  2,  3,  3,  3,  4,  4,  4,  5,  5,  5,
          5,  6,  6,  6,  7,  7,  8,  8,  9,  9,  9, 10, 11, 11, 12, 12, 13, 14],
        [ 0,  0,  0,  1,  1,  1,  2,  2,  2,  2,  3,  3,  3,  4,  4,  4,  5,  5,
          6,  6,  7,  7,  7,  8,  8,  8,  9, 10, 10, 11, 11, 11, 12, 13, 14,  0]])
[174]:
env.td_state['solution']['actions']
[174]:
tensor([[ 0, 15, 16, 14, 13, 20,  9,  1, 21,  4, 22,  3, 18,  6,  2, 11,  0, 10,
         17,  5,  8,  7, 19,  1, 23,  2, 12,  0,  1,  2,  0,  1,  2,  0,  1,  2],
        [18,  4, 23,  0, 10, 20,  1, 11,  2,  8, 19,  0,  7,  5,  1,  6, 22, 16,
          2, 12, 15,  0, 21,  1, 13,  2,  3,  9,  0,  1, 14,  2, 17,  0,  1,  2],
        [16, 14,  0, 12,  8,  1,  9, 10, 22,  2,  7,  5,  0, 13,  4,  1, 18,  2,
         21,  0, 20, 17,  1,  3, 15,  2,  0, 23,  1,  6, 19,  2,  0,  1,  2,  0]])
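The depot rule can be checked directly on the logged solution. The sketch below copies the first batch row of the two tensors above and verifies that the last action of every agent is its own depot. It assumes that the two tensors are aligned step by step, that depots occupy node indices 0 to 2, and that agent i belongs to depot i % 3, as the depot_idx outputs suggest.

```python
# First batch row of env.td_state['solution']['agents'] and ['actions'].
agents = [0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 4,
          4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 7, 8, 9, 10, 11, 12, 13, 14]
actions = [0, 15, 16, 14, 13, 20, 9, 1, 21, 4, 22, 3, 18, 6, 2, 11, 0, 10,
           17, 5, 8, 7, 19, 1, 23, 2, 12, 0, 1, 2, 0, 1, 2, 0, 1, 2]
NUM_DEPOTS = 3

# Keep the last logged action of each agent (later steps overwrite earlier ones).
last_action = {}
for agent, action in zip(agents, actions):
    last_action[agent] = action

ends_at_depot = all(
    action == agent % NUM_DEPOTS for agent, action in last_action.items()
)
print(ends_at_depot)  # True
```

Only the return to the depot is visible in the log; the start at the depot is implied by the initial state rather than recorded as an action.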

GMTDVRP

GMTDVRP combines online scenarios with multiple depots.

[175]:
from maenvs4vrp.environments.gmtdvrp.env import Environment
from maenvs4vrp.environments.gmtdvrp.env_agent_selector import AgentSelector
from maenvs4vrp.environments.gmtdvrp.observations import Observations
from maenvs4vrp.environments.gmtdvrp.instances_generator import InstanceGenerator
from maenvs4vrp.environments.gmtdvrp.env_agent_reward import DenseReward
%load_ext autoreload
%autoreload 2
[176]:
gen = InstanceGenerator()
obs = Observations()
sel = AgentSelector()
rew = DenseReward()
[177]:
env = Environment(
    instance_generator_object=gen,
    obs_builder_object=obs,
    agent_selector_object=sel,
    reward_evaluator=rew
)
[178]:
td = env.reset(batch_size=3, num_agents=5, num_depots=3, num_nodes=24, use_combinations=True, initial_load=15)
[179]:
td = env.set_initial_load(td)
[180]:
env.td_state['agents']['cur_linehaul_load']
[180]:
tensor([[15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15.,
         15.],
        [15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15.,
         15.],
        [15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15., 15.,
         15.]])
[181]:
env.td_state['depot_idx']
[181]:
tensor([[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]])
[182]:
env.td_state['agents']['depot_idx']
[182]:
tensor([[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])

Acknowledgements: