{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "13282c70",
   "metadata": {},
   "source": [
    "# Exploring the MAEnvs4VRP library\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ccde3a6",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "150495c9-3c9c-4d4e-8ac2-508f8136769e",
   "metadata": {},
   "source": [
    "### Install"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67b9bf6a-b916-462f-9a2d-5a5f907b7a54",
   "metadata": {},
   "source": [
    "Uncomment the cells below that match your platform (Colab or Binder):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62e47599-a094-4805-9f08-7d824da2f156",
   "metadata": {},
   "outputs": [],
   "source": [
    "# !git clone https://github.com/ricgama/maenvs4vrp_beta.git # When using Colab"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8efe0fdb-2534-4e1a-b1a8-a40a2d79451d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# When using Colab\n",
    "# %cd maenvs4vrp_beta/\n",
    "# ! pip install -e .\n",
    "#%cd maenvs4vrp/notebooks/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "36902a54-ffaf-4c13-917d-1b76e5ae7f87",
   "metadata": {},
   "outputs": [],
   "source": [
    "# When using Binder\n",
    "#%cd ../../\n",
    "#! pip install -e . "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ee52736-3bdc-4b54-bfc1-70978993887e",
   "metadata": {},
   "source": [
    "This notebook is designed to introduce you to the **MAEnvs4VRP** library through an interactive, step‑by‑step format. You’ll work through a series of concise, hands‑on coding exercises that gradually build your familiarity with the environment’s core features. By the end, you’ll have a solid foundation for leveraging MAEnvs4VRP in your research or applications."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "65b92310-3db2-422b-8ac6-fa93cd98bba1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import matplotlib.pyplot as plt\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80c2d747-7252-4a4e-8c04-185d1fbab20a",
   "metadata": {},
   "source": [
    "## Basic API usage example"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a4634a70-f779-4417-8dba-8d14f407a507",
   "metadata": {},
   "source": [
    "We’ll begin by diving into the Team Orienteering Problem with Time Windows (TOPTW) environment. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2097a205-8aeb-42a1-9fb8-7ebb4e467b8d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.toptw.env import Environment\n",
    "from maenvs4vrp.environments.toptw.env_agent_selector import AgentSelector\n",
    "from maenvs4vrp.environments.toptw.observations import Observations\n",
    "from maenvs4vrp.environments.toptw.instances_generator import InstanceGenerator\n",
    "from maenvs4vrp.environments.toptw.env_agent_reward import DenseReward"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1b07f250-0190-4b8f-95c0-39adecf6344f",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = InstanceGenerator(batch_size = 8)\n",
    "obs = Observations()\n",
    "sel = AgentSelector()\n",
    "rew = DenseReward()\n",
    "\n",
    "env = Environment(instance_generator_object=gen,  \n",
    "                  obs_builder_object=obs,\n",
    "                  agent_selector_object=sel,\n",
    "                  reward_evaluator=rew,\n",
    "                  seed=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "181ea21c-fdca-4649-bdcd-6dc39347016f",
   "metadata": {},
   "source": [
    "A key attribute of the environment is `env.td_state`, which holds the entire state of the simulation. It stays empty until you initialize the environment with `reset()`. Let’s examine the contents of `env.td_state` both before and after calling `reset()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "314da2e8-523c-4bbc-bb5e-d1f3d651ac4d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TensorDict(\n",
       "    fields={\n",
       "    },\n",
       "    batch_size=torch.Size([8]),\n",
       "    device=cpu,\n",
       "    is_shared=False)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.td_state"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "7dc3385c-8062-4eec-b914-9cbfbaf97390",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f534b48-b1d9-4a5e-b4f0-b7b391af8d8d",
   "metadata": {},
   "source": [
    "After `reset()`, `env.td_state` is populated:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2341499d-cde9-4a47-af1d-de15e46ab3bf",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TensorDict(\n",
       "    fields={\n",
       "        agents: TensorDict(\n",
       "            fields={\n",
       "                active_agents_mask: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "                cum_profit: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                cur_node: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "                cur_step: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.int32, is_shared=False),\n",
       "                cur_time: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                feasible_nodes: Tensor(shape=torch.Size([8, 4, 16]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "                step_profit: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                visited_nodes: Tensor(shape=torch.Size([8, 4, 16]), device=cpu, dtype=torch.bool, is_shared=False)},\n",
       "            batch_size=torch.Size([8]),\n",
       "            device=cpu,\n",
       "            is_shared=False),\n",
       "        coords: Tensor(shape=torch.Size([8, 16, 2]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        cur_agent: TensorDict(\n",
       "            fields={\n",
       "                action_mask: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "                cum_profit: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                cur_node: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "                cur_step: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int32, is_shared=False),\n",
       "                cur_time: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                step_profit: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.float32, is_shared=False)},\n",
       "            batch_size=torch.Size([8]),\n",
       "            device=cpu,\n",
       "            is_shared=False),\n",
       "        cur_agent_idx: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "        cur_node_idx: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "        depot_idx: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "        depot_loc: Tensor(shape=torch.Size([8, 1, 2]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        done: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "        end_time: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        is_depot: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "        is_last_step: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "        max_tour_duration: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        nodes: TensorDict(\n",
       "            fields={\n",
       "                active_nodes_mask: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "                cur_profits: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.float32, is_shared=False)},\n",
       "            batch_size=torch.Size([8]),\n",
       "            device=cpu,\n",
       "            is_shared=False),\n",
       "        profits: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        service_time: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        solution: TensorDict(\n",
       "            fields={\n",
       "            },\n",
       "            batch_size=torch.Size([8]),\n",
       "            device=cpu,\n",
       "            is_shared=False),\n",
       "        start_time: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        time2depot: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        tw_high: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        tw_low: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.float32, is_shared=False)},\n",
       "    batch_size=torch.Size([8]),\n",
       "    device=cpu,\n",
       "    is_shared=False)"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.td_state"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "535602b4",
   "metadata": {},
   "source": [
    "This top‑level `TensorDict` represents a batch of **8** parallel environments (`batch_size=[8]`). All shapes below include the leading batch dimension. Inside it:\n",
    "\n",
    "* **`agents`** (TensorDict; fields shaped `[8, 4]` or `[8, 4, 16]`):\n",
    "\n",
    "  * Tracks **all 4 agents** per environment:\n",
    "\n",
    "    * `active_agents_mask` (`[8, 4]`): which agents are still active\n",
    "    * `cum_profit`, `step_profit` (`[8, 4]`): accumulated and per‑step profit\n",
    "    * `cur_node`, `cur_step`, `cur_time` (`[8, 4]`): each agent’s current location index, step count, and timestamp\n",
    "    * `feasible_nodes`, `visited_nodes` (`[8, 4, 16]`): per‑agent masks over the 16 possible nodes\n",
    "\n",
    "* **`cur_agent`** (TensorDict; fields shaped `[8, 1]` or `[8, 16]`):\n",
    "\n",
    "  * Details for the **currently acting** agent in each environment:\n",
    "\n",
    "    * `action_mask` (`[8, 16]`): valid node choices\n",
    "    * `cum_profit`, `step_profit`, `cur_node`, `cur_step`, `cur_time` (`[8, 1]`)\n",
    "\n",
    "* **Global scene tensors**:\n",
    "\n",
    "  * `coords` (`[8, 16, 2]`): x,y locations of all 16 nodes\n",
    "  * `profits`, `service_time`, `time2depot`, `tw_low`, `tw_high` (`[8, 16]`): per‑node profit, service duration, travel time back to the depot, and time‑window bounds\n",
    "  * `depot_idx`, `depot_loc` (`[8, 1]`, `[8, 1, 2]`): index and coordinates of each environment’s depot\n",
    "  * `nodes` (TensorDict; fields shaped `[8, 16]`):\n",
    "\n",
    "    * `active_nodes_mask`: which nodes remain unvisited\n",
    "    * `cur_profits`: remaining profit at each node\n",
    "\n",
    "* **Control/status fields**:\n",
    "\n",
    "  * `cur_agent_idx`, `cur_node_idx` (`[8, 1]`): index of the acting agent and of the node it currently occupies\n",
    "  * `done`, `is_last_step` (`[8]`): episode termination indicators\n",
    "  * `is_depot` (`[8, 16]`): mask marking which node is the depot\n",
    "\n",
    "* **Timing**:\n",
    "\n",
    "  * `start_time`, `end_time`, `max_tour_duration` (`[8]`): global time settings\n",
    "\n",
    "* **`solution`** is a TensorDict, still empty right after `reset()`, reserved for the final route outputs.\n",
    "\n",
    "Together, this structure holds the **per-agent**, **per-node**, and **global** information required to step and evaluate each batch of VRP episodes.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc118da4-ea4d-4dd6-984d-425f4673bbd2",
   "metadata": {},
   "source": [
    "`reset()` also returns a `TensorDict`, stored here as `td`, containing:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "2a08a6f2-4182-4f53-a952-38832919f369",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TensorDict(\n",
       "    fields={\n",
       "        agent_step: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int32, is_shared=False),\n",
       "        cur_agent_idx: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "        cur_node_idx: Tensor(shape=torch.Size([8, 1]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "        done: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "        observations: TensorDict(\n",
       "            fields={\n",
       "                action_mask: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "                agent_obs: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                agents_mask: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "                global_obs: Tensor(shape=torch.Size([8, 3]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                node_dynamic_obs: Tensor(shape=torch.Size([8, 16, 8]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                node_static_obs: Tensor(shape=torch.Size([8, 16, 7]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "                other_agents_obs: Tensor(shape=torch.Size([8, 4, 7]), device=cpu, dtype=torch.float32, is_shared=False)},\n",
       "            batch_size=torch.Size([8]),\n",
       "            device=cpu,\n",
       "            is_shared=False),\n",
       "        penalty: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        reward: Tensor(shape=torch.Size([8]), device=cpu, dtype=torch.float32, is_shared=False)},\n",
       "    batch_size=torch.Size([8]),\n",
       "    device=cpu,\n",
       "    is_shared=False)"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "td"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d95646da-3d51-44aa-9564-810ba42d3e12",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([False, False, False, False, False, False, False, False])"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "td[\"done\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b0f98a5",
   "metadata": {},
   "source": [
    "This `TensorDict` summarizes the current step of the environment: status fields plus the observations available to the agent whose turn it is, across a batch of 8 environments:\n",
    "\n",
    "* **Top‑level fields** (shapes include the leading batch dimension):\n",
    "\n",
    "  * `agent_step` (`[8, 1]`, int32): which step each environment’s current agent is on.\n",
    "  * `cur_agent_idx` (`[8, 1]`, int64): index (0–3) of the agent whose turn it is.\n",
    "  * `cur_node_idx` (`[8, 1]`, int64): the node index where that agent currently resides.\n",
    "  * `done` (`[8]`, bool): whether each episode has terminated.\n",
    "  * `penalty`, `reward` (`[8]`, float32): scalar penalty and reward resulting from the last action.\n",
    "\n",
    "* **`observations`** (a nested `TensorDict`, batch size `[8]`):\n",
    "\n",
    "  * `action_mask` (`[8, 16]`, bool): which of the 16 possible next nodes are valid.\n",
    "  * `agent_obs` (`[8, 4]`, float32): per‑agent features (e.g., current time).\n",
    "  * `agents_mask` (`[8, 4]`, bool): which of the 4 agents are still active.\n",
    "  * `global_obs` (`[8, 3]`, float32): shared global features (e.g., elapsed time, total remaining profit).\n",
    "  * `node_static_obs` (`[8, 16, 7]`, float32): per‑node invariant features (e.g., coordinates, service times).\n",
    "  * `node_dynamic_obs` (`[8, 16, 8]`, float32): per‑node time‑varying features (e.g., remaining profit, open/closed status).\n",
    "  * `other_agents_obs` (`[8, 4, 7]`, float32): features of the other agents (e.g., their locations and statuses).\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dffbf400-b87f-4f31-9849-39aeb194b475",
   "metadata": {},
   "source": [
    "Now we can run an episode:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "05b71101-2a1e-4a15-b4db-b296cf5f9d53",
   "metadata": {},
   "outputs": [],
   "source": [
    "while not td[\"done\"].all():  \n",
    "    td = env.sample_action(td) # this is where we insert our policy\n",
    "    td = env.step(td)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "316e0e2e",
   "metadata": {},
   "source": [
    "At the end of the episode, every environment in the batch is done:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "27cd89de",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([True, True, True, True, True, True, True, True])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "td[\"done\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bea0ddb4",
   "metadata": {},
   "source": [
    "The `solution` entry of `env.td_state` is now filled in:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "8d91479b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TensorDict(\n",
       "    fields={\n",
       "        actions: Tensor(shape=torch.Size([8, 17]), device=cpu, dtype=torch.int64, is_shared=False),\n",
       "        agents: Tensor(shape=torch.Size([8, 17]), device=cpu, dtype=torch.int64, is_shared=False)},\n",
       "    batch_size=torch.Size([8]),\n",
       "    device=cpu,\n",
       "    is_shared=False)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.td_state[\"solution\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "335ad7ec",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[12,  7, 11,  0,  3, 15,  8,  5,  0, 10,  2,  0, 14,  9,  6,  1,  0],\n",
       "        [ 2,  4,  6,  0, 15, 12,  0,  9,  3,  7,  0,  5,  0,  0,  0,  0,  0],\n",
       "        [ 5, 13,  0,  4, 10,  0,  8,  1,  0, 12,  6,  7, 15,  0,  0,  0,  0],\n",
       "        [12,  0, 10,  3, 11,  0,  7,  8, 13,  0,  0,  0,  0,  0,  0,  0,  0],\n",
       "        [ 1, 15,  8,  0, 10, 13,  4,  0,  6,  3,  0, 12,  0,  0,  0,  0,  0],\n",
       "        [ 0, 11,  0, 13,  3,  0, 15,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0],\n",
       "        [15,  3,  2, 14,  0,  9,  5,  1,  0,  4,  7,  0, 12, 11,  0,  0,  0],\n",
       "        [ 0,  5,  8,  3,  0,  6, 13,  0, 11, 10,  0,  0,  0,  0,  0,  0,  0]])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.td_state[\"solution\"][\"actions\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "423532bc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],\n",
       "        [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 0, 0, 0, 0],\n",
       "        [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 0, 0, 0],\n",
       "        [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 0, 0, 0, 0, 0, 0],\n",
       "        [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 0, 0, 0, 0],\n",
       "        [0, 1, 1, 2, 2, 2, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0],\n",
       "        [0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 0, 0],\n",
       "        [0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 0, 0, 0, 0, 0, 0]])"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.td_state[\"solution\"][\"agents\"]"
   ]
  },
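  {
   "cell_type": "markdown",
   "id": "f591dca0",
   "metadata": {},
   "source": [
    "Together, these two tensors give you the full episodic routes for every agent in each environment: `actions` lists the nodes visited step by step, and `agents` gives the index of the agent that made each move."
   ]
  },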
  {
   "cell_type": "markdown",
   "id": "819dbf9e-fb04-4605-9e0c-59cbecd55d78",
   "metadata": {},
   "source": [
    "## Quick walkthrough"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bd5a4fc-5a6b-4f75-a857-ab8317a0c743",
   "metadata": {},
   "source": [
    "Let's now go through the library's building blocks, exploring their functionalities."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f017aa12-fb5b-4680-923c-45838d544b17",
   "metadata": {},
   "source": [
    "### Instance generation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "9e5210e1-4755-4971-843e-9b0d97bc103e",
   "metadata": {},
   "outputs": [],
   "source": [
    "instance = gen.sample_instance(num_agents=2, num_nodes=10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "4bef94b4-52b6-40f8-8fb6-c33862b07c99",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['name', 'num_nodes', 'num_agents', 'data'])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instance.keys()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1aaf359-79e9-4d25-a883-a0fe817f9482",
   "metadata": {},
   "source": [
    "It's also possible to load a set of pre-generated instances to use as validation or test sets. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "a518e23d-5b06-4bcd-b893-e082a6aef634",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_0',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_1',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_10',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_11',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_12',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_13',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_14',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_15',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_16',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_17',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_18',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_19',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_2',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_20',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_21',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_22',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_23',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_24',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_25',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_26',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_27',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_28',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_29',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_3',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_30',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_31',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_32',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_33',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_34',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_35',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_36',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_37',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_38',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_39',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_4',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_40',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_41',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_42',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_43',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_44',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_45',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_46',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_47',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_48',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_49',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_5',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_50',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_51',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_52',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_53',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_54',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_55',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_56',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_57',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_58',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_59',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_6',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_60',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_61',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_62',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_63',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_7',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_8',\n",
       " 'toptw/data/generated\\\\servs_100_agents_5\\\\validation/generated_val_servs_100_agents_5_9']"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gen.get_list_of_benchmark_instances()['servs_100_agents_5']['validation']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "b3a22c7d-839e-4c82-8c7c-4f3106fff4cb",
   "metadata": {},
   "outputs": [],
   "source": [
    "set_of_instances = set(gen.get_list_of_benchmark_instances()['servs_100_agents_5']['validation'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "e1df4c26-e772-4050-b72a-8df2e4d78fab",
   "metadata": {},
   "outputs": [],
   "source": [
    "generator = InstanceGenerator(instance_type='validation', set_of_instances=set_of_instances)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "def679a1-7685-4dbc-a2b4-44e8ac11de20",
   "metadata": {},
   "outputs": [],
   "source": [
    "instance = generator.sample_instance()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca3e3db3-241d-4cd9-b848-f23eda11bff0",
   "metadata": {},
   "source": [
    "Let's check the keys of the instance dict:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "cb38eda7-b069-4a13-8aa4-d7521ede3793",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['name', 'num_nodes', 'num_agents', 'data'])"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instance.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "7c734efa-22ed-4cd3-883d-1c5306f97886",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'random_instance'"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instance['name']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f1012528-daf7-4c33-bd48-eb4788038472",
   "metadata": {},
   "source": [
    "#### Benchmark instances"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "5341525b-018b-4cd9-9b7d-cc3c8c6ea23e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.toptw.benchmark_instances_generator import BenchmarkInstanceGenerator"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba305f9c-f5be-4a34-83f0-a66a44a5db49",
   "metadata": {},
   "source": [
    "In order to narrow the current gap between the test beds for algorithm benchmarking used in RL\n",
    "and OR communities, the library allows a straightforward integration of classical OR benchmark\n",
    "instances. For example, we can load a set of classical benchmark instances. Let's see what benchmark instances we have for the TOPTW:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "b54cda48-d448-48a4-9011-a064ab667ff1",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'Solomon': ['c101',\n",
       "  'c102',\n",
       "  'c103',\n",
       "  'c104',\n",
       "  'c105',\n",
       "  'c106',\n",
       "  'c107',\n",
       "  'c108',\n",
       "  'c109',\n",
       "  'c201',\n",
       "  'c202',\n",
       "  'c203',\n",
       "  'c204',\n",
       "  'c205',\n",
       "  'c206',\n",
       "  'c207',\n",
       "  'c208',\n",
       "  'r101',\n",
       "  'r102',\n",
       "  'r103',\n",
       "  'r104',\n",
       "  'r105',\n",
       "  'r106',\n",
       "  'r107',\n",
       "  'r108',\n",
       "  'r109',\n",
       "  'r110',\n",
       "  'r111',\n",
       "  'r112',\n",
       "  'r201',\n",
       "  'r202',\n",
       "  'r203',\n",
       "  'r204',\n",
       "  'r205',\n",
       "  'r206',\n",
       "  'r207',\n",
       "  'r208',\n",
       "  'r209',\n",
       "  'r210',\n",
       "  'r211',\n",
       "  'rc101',\n",
       "  'rc102',\n",
       "  'rc103',\n",
       "  'rc104',\n",
       "  'rc105',\n",
       "  'rc106',\n",
       "  'rc107',\n",
       "  'rc108',\n",
       "  'rc201',\n",
       "  'rc202',\n",
       "  'rc203',\n",
       "  'rc204',\n",
       "  'rc205',\n",
       "  'rc206',\n",
       "  'rc207',\n",
       "  'rc208'],\n",
       " 'Cordeau': ['pr01',\n",
       "  'pr02',\n",
       "  'pr03',\n",
       "  'pr04',\n",
       "  'pr05',\n",
       "  'pr06',\n",
       "  'pr07',\n",
       "  'pr08',\n",
       "  'pr09',\n",
       "  'pr10',\n",
       "  'pr11',\n",
       "  'pr12',\n",
       "  'pr13',\n",
       "  'pr14',\n",
       "  'pr15',\n",
       "  'pr16',\n",
       "  'pr17',\n",
       "  'pr18',\n",
       "  'pr19',\n",
       "  'pr20']}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "BenchmarkInstanceGenerator.get_list_of_benchmark_instances()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "e3b7e111-ff49-4ec2-9e01-c5de6cc4e2cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "generator = BenchmarkInstanceGenerator(instance_type='Solomon', set_of_instances={'c101', 'c102'})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "4a230f73-3db1-48a7-b707-8519533307e2",
   "metadata": {},
   "outputs": [],
   "source": [
    "instance_c101 = generator.get_instance('c101')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "1353dc46-c358-4768-86bd-4312d5e9627b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dict_keys(['name', 'num_agents', 'num_nodes', 'data', 'n_digits'])"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instance_c101.keys()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "2a36902a-0f30-41b3-b511-db4217357309",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'c101'"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instance_c101['name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "e9050e06-68f2-49be-a68d-76f7605abfd9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "101"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instance_c101['num_agents']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "d67ece72-9b07-4f35-86b7-fabaa3d1f431",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "101"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "instance_c101['num_nodes']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3eb87be2-614e-4215-a6de-cbb6dc00dae1",
   "metadata": {},
   "source": [
    "###  Observations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f47f0c1-82d7-4097-9d79-69a1c74bab97",
   "metadata": {},
   "source": [
    "Observation features, that will be available to the active agent while interacting with the environment, are handle by `Observations` class. \n",
    "The class has a `default_feature_list` attribute where the default configuration dictionary is defined."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "d5e3985a-2c94-4e51-8e1e-8b2c500c579f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'nodes_static': {'x_coordinate': {'feat': 'x_coordinate', 'norm': None},\n",
       "  'y_coordinate': {'feat': 'y_coordinate', 'norm': None},\n",
       "  'tw_low': {'feat': 'tw_low', 'norm': None},\n",
       "  'tw_high': {'feat': 'tw_high', 'norm': None},\n",
       "  'profits': {'feat': 'profits', 'norm': 'min_max'},\n",
       "  'service_time': {'feat': 'service_time', 'norm': 'min_max'},\n",
       "  'is_depot': {'feat': 'is_depot', 'norm': None}},\n",
       " 'nodes_dynamic': ['time2open_div_end_time',\n",
       "  'time2close_div_end_time',\n",
       "  'arrive2node_div_end_time',\n",
       "  'time2open_after_step_div_end_time',\n",
       "  'time2close_after_step_div_end_time',\n",
       "  'time2end_after_step_div_end_time',\n",
       "  'fract_time_after_step_div_end_time',\n",
       "  'reachable_frac_agents'],\n",
       " 'agent': ['x_coordinate',\n",
       "  'y_coordinate',\n",
       "  'frac_current_time',\n",
       "  'arrivedepot_div_end_time'],\n",
       " 'other_agents': ['x_coordinate',\n",
       "  'y_coordinate',\n",
       "  'frac_current_time',\n",
       "  'frac_feasible_nodes',\n",
       "  'dist2agent_div_end_time',\n",
       "  'time_delta2agent_div_max_dur',\n",
       "  'was_last'],\n",
       " 'global': ['frac_profits', 'frac_done_agents', 'frac_colect_profits']}"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obs.default_feature_list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "12c09702-e628-4098-abeb-6e886ea1e7ec",
   "metadata": {},
   "source": [
    "Also, five possible features lists exist, detailing the available features in the class: `POSSIBLE_NODES_STATIC_FEATURES`, `POSSIBLE_NODES_DYNAMIC_FEATURES`, `POSSIBLE_SELF_FEATURES`, `POSSIBLE_AGENTS_FEATURES`, `POSSIBLE_GLOBAL_FEATURES`. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "fe9af964-2206-4529-baec-15fdc0737886",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['x_coordinate',\n",
       " 'y_coordinate',\n",
       " 'tw_low',\n",
       " 'tw_high',\n",
       " 'profits',\n",
       " 'service_time',\n",
       " 'tw_high_minus_tw_low_div_max_dur',\n",
       " 'x_coordinate_min_max',\n",
       " 'y_coordinate_min_max',\n",
       " 'is_depot']"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obs.POSSIBLE_NODES_STATIC_FEATURES"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "bbf96eec-1cb7-4a4f-942e-745f146c2802",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['frac_profits', 'frac_colect_profits', 'frac_done_agents']"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obs.POSSIBLE_GLOBAL_FEATURES"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d331636e-3d17-4f02-93d8-c740fd2386c3",
   "metadata": {},
   "source": [
    "While instantiating the `Observations` class, we can pass through a feature list dictionary specifying which features will be available for the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "0940808f-647a-4d8b-b8cc-5e7f4352c5af",
   "metadata": {},
   "outputs": [],
   "source": [
    "import yaml"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "dd978b8b-7374-4a61-a60a-7ebd3c5d1225",
   "metadata": {},
   "outputs": [],
   "source": [
    "feature_list = yaml.safe_load(\"\"\"\n",
    "    nodes_static:\n",
    "        x_coordinate_min_max:\n",
    "            feat: x_coordinate_min_max\n",
    "            norm: min_max\n",
    "        x_coordinate_min_max: \n",
    "            feat: x_coordinate_min_max\n",
    "            norm: min_max\n",
    "        tw_low_mm:\n",
    "            feat: tw_low\n",
    "            norm: min_max\n",
    "        tw_high:\n",
    "            feat: tw_high\n",
    "            norm: min_max\n",
    "\n",
    "    nodes_dynamic:\n",
    "        - time2open_div_end_time\n",
    "        - time2close_div_end_time\n",
    "        - time2open_after_step_div_end_time\n",
    "        - time2close_after_step_div_end_time\n",
    "        - fract_time_after_step_div_end_time\n",
    "\n",
    "    agent:\n",
    "        - x_coordinate_min_max\n",
    "        - y_coordinate_min_max\n",
    "        - frac_current_time\n",
    "\n",
    "    other_agents:\n",
    "        - x_coordinate_min_max\n",
    "        - y_coordinate_min_max\n",
    "        - frac_current_time\n",
    "        - dist2agent_div_end_time\n",
    "    \n",
    "    global:\n",
    "        - frac_done_agents\n",
    "        - frac_colect_profits\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "dc14b2a3-8d13-496c-98ad-f4c66a8f2897",
   "metadata": {},
   "outputs": [],
   "source": [
    "obs = Observations(feature_list)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "43f756d1-f782-4d69-a044-df1035eeeb60",
   "metadata": {},
   "source": [
    "We can test these observations on the environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "id": "03f1e7b8-c540-4382-b20d-febe94708d95",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = InstanceGenerator(batch_size=8)\n",
    "sel = AgentSelector()\n",
    "rew = DenseReward()\n",
    "\n",
    "env = Environment(instance_generator_object=gen,  \n",
    "                  obs_builder_object=obs,\n",
    "                  agent_selector_object=sel,\n",
    "                  reward_evaluator=rew,\n",
    "                  seed=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "7edafb8a-c042-4079-a218-31712a8ce5a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset(batch_size = 8, num_agents=4, num_nodes=16)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "0ac4eaee-693a-4199-a43b-416b84a64123",
   "metadata": {},
   "outputs": [],
   "source": [
    "td_observation = env.observe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "3615cb37-bbb8-4ec4-bf22-7f30fdbb9941",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "TensorDict(\n",
       "    fields={\n",
       "        action_mask: Tensor(shape=torch.Size([8, 16]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "        agent_obs: Tensor(shape=torch.Size([8, 3]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        agents_mask: Tensor(shape=torch.Size([8, 4]), device=cpu, dtype=torch.bool, is_shared=False),\n",
       "        global_obs: Tensor(shape=torch.Size([8, 2]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        node_dynamic_obs: Tensor(shape=torch.Size([8, 16, 5]), device=cpu, dtype=torch.float32, is_shared=False),\n",
       "        other_agents_obs: Tensor(shape=torch.Size([8, 4, 4]), device=cpu, dtype=torch.float32, is_shared=False)},\n",
       "    batch_size=torch.Size([8]),\n",
       "    device=cpu,\n",
       "    is_shared=False)"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "td_observation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "daa63a11-eceb-4381-abcd-4d4b33fa352b",
   "metadata": {},
   "source": [
    "Let's run an episode:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "3097ca76-8fd4-4118-8cd7-2ce55760cb2f",
   "metadata": {},
   "outputs": [],
   "source": [
    "while not td[\"done\"].all():  \n",
    "    td = env.sample_action(td) # this is where we insert our policy\n",
    "    td = env.step(td)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae613724-d8fb-45d8-bfc5-26865be946d8",
   "metadata": {},
   "source": [
    "and check the collected profits:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "544fb208-3f19-4c82-a0ef-b171ce9f71ba",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([13.,  9., 10.,  7.,  9.,  5., 11.,  7.])"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "env.td_state['agents']['cum_profit'].sum(-1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62cd583e-8505-4594-8456-e15569da2196",
   "metadata": {},
   "source": [
    "An environment with agents performing random actions is not very impressive. Let's train a policy with [PPO algorithm](https://spinningup.openai.com/en/latest/algorithms/ppo.html) to get smarter agents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "bdf8fb17-a352-468d-b10e-6b46cb34f7b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# When using Binder\n",
    "#%cd maenvs4vrp/notebooks/\n",
    "# When using Colab\n",
    "#%cd maenvs4vrp_beta/maenvs4vrp/learning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b4cc8e5-f462-4b94-979c-251bab06c97d",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training with args Namespace(vrp_env='toptw', num_agents=4, num_nodes=21, device=device(type='cpu'), ent_coef=0.01, vf_coef=0.5, clip_coef=0.05, gae=True, gamma=0.99, gae_lambda=0.95, batch_size=512, eval_batch_size=512, hidden_dim=128, n_envs=128, num_steps=26, total_episodes=5001, learning_rate=0.0001, update_epochs=2, anneal_lr=False, max_grad_norm=10, norm_adv=False, torch_deterministic=True, seed=2297, exp_name='test', eval_num_episodes=1, eval_num_print=2500, eval_seed=9875)\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Episodic Return:  9.76, Not visited nodes: 10.236328125, Used agents: 4.0, Policy Loss: -7.665, Value Loss: 34.002, los"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "-------------------------------------------\n",
      "\n",
      "Running eval on validation set\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Episodic Return:  9.76, Not visited nodes: 10.236328125, Used agents: 4.0, Policy Loss: -7.665, Value Loss: 34.002, los"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "number not visited nodes: 9.37109375\n",
      "number of used agents: 4.0\n",
      "Old best model: -10000000.00\n",
      "New best model:  10.63\n",
      "done\n",
      "\n",
      "-------------------------------------------\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Episodic Return:  17.07, Not visited nodes: 2.931640625, Used agents: 4.0, Policy Loss: -0.035, Value Loss: 0.096, loss"
     ]
    }
   ],
   "source": [
    "%run ../learning/ppo/train_ma_ppo.py --vrp_env toptw --num_agents 4 --num_nodes 21"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a63bd7d4-2e75-4f5a-bf80-827821620dc2",
   "metadata": {},
   "source": [
    "## Challenges"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60ec3de7-3b80-409c-8546-b090ba4b8e24",
   "metadata": {},
   "outputs": [],
   "source": [
    "# when using Colab\n",
    "#%cd ../notebooks/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a32ad02b-9b70-4ccb-a154-29df6b22c0d0",
   "metadata": {},
   "source": [
    "### Ex0. Warm-up"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "070d71d4-e30e-4ad7-a057-70d44900ef5c",
   "metadata": {},
   "source": [
    "Ok! Let's now try some small hands-on coding challenges. To simplify solution verification, allowing a pen and paper check, let's use some small toy instances."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e350121-6c45-43aa-89a3-c5a61bfd59da",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.toptw.toy_instance_generator import ToyInstanceGenerator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40ce2c38-9da6-4ba7-a2c4-d31210ff3438",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = ToyInstanceGenerator()\n",
    "obs = Observations()\n",
    "sel = AgentSelector()\n",
    "rew = DenseReward()\n",
    "\n",
    "env = Environment(instance_generator_object=gen,  \n",
    "                  obs_builder_object=obs,\n",
    "                  agent_selector_object=sel,\n",
    "                  reward_evaluator=rew,\n",
    "                  seed=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ab154f7a-30b9-48e1-91a1-5bc0921f1622",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad8c0dbb-97ea-4baa-8eda-d5188f52d370",
   "metadata": {},
   "source": [
    "The set of service nodes and the depot’s coordinates are as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee1cdade-b583-4a01-9d33-25b7a9fa8614",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(3,3))\n",
    "plt.plot(env.td_state['coords'][0][:,0].numpy(), env.td_state['coords'][0][:,1].numpy(), 'o')\n",
    "plt.plot(env.td_state['coords'][0][0,0].numpy(), env.td_state['coords'][0][0,1].numpy(), 'o', color='red' )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2918d81-daa5-4ff4-b321-c113bd3b4d17",
   "metadata": {},
   "source": [
    "And the corresponding time‑window constraints are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a096a09c-786c-47b4-b2ed-b467a0855c0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "for k, data in enumerate(zip(env.td_state['tw_low'][0].tolist(), env.td_state['tw_high'][0].tolist())):\n",
    "    print(f'node {k} time window is: [{data[0]}; {data[1]}]')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c7a8f72-900a-45c7-888b-6808c23af51b",
   "metadata": {},
   "source": [
    "All agents begin at the depot (node 0, shown as the red dot). The travel times from the depot to each service node are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e83e7bbf-5247-473c-9558-b87d25e1f46f",
   "metadata": {},
   "outputs": [],
   "source": [
    "loc = env.td_state['coords'].gather(1, env.td_state['cur_agent']['cur_node'][:,:,None].expand(-1, -1, 2))\n",
    "time2j = torch.pairwise_distance(loc, env.td_state[\"coords\"], eps=0, keepdim = False)\n",
    "time2j[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19cbb727-2e4d-4f00-bd1d-b16eb1808b76",
   "metadata": {},
   "source": [
    "I) If the agent selects to visit node 1, what will be the collected profit? \n",
    "\n",
    "II) Checking the previous distance values, time windows and the new distances, what will be the mask of the admissible nodes after this step?\n",
    "\n",
    "(hint: check the `env.td_state` attribute.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d664e25-45e6-4226-a7f9-818644cf0aff",
   "metadata": {},
   "outputs": [],
   "source": [
    "td['action'] = torch.tensor([[1]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c17b4bec-535d-4df1-853d-6760094a2f77",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.step(td)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "07e800af-b5c2-49ac-9795-6b7be674be0d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex0.py\n",
    "# your code here!!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5589b6b-b8d7-47d5-be70-4d15c509b469",
   "metadata": {},
   "source": [
    "Next, we’ll dive into the **Observations** module:"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c70eb8cd",
   "metadata": {},
   "source": [
    "### Ex1. Team Orienteering Problem with Time Windows - Observations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8747c478-1481-4759-9b93-c1eafba220d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.toptw.env import Environment\n",
    "from maenvs4vrp.environments.toptw.env_agent_selector import AgentSelector\n",
    "from maenvs4vrp.environments.toptw.env_agent_reward import DenseReward"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96d7f29b-bcc5-44ae-91ec-bdfbdb8d529b",
   "metadata": {},
   "source": [
    "A critical part of training agents is their ability to retrieve useful information from the environment in order to act on it. In **MAEnvs4VRP**, you can tailor exactly what data your agents observe by implementing custom methods in the `Observations` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "215d4b68",
   "metadata": {
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.toptw.observations import Observations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b9b82c1",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = ToyInstanceGenerator()\n",
    "sel = AgentSelector()\n",
    "rew = DenseReward()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "43ab28dc-817b-4db2-bab3-c28c702f3cb9",
   "metadata": {},
   "outputs": [],
   "source": [
    "obs = Observations()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72d4a20e-63e3-404f-8a5c-a3524251eff5",
   "metadata": {},
   "source": [
    "The class has a `default_feature_list` attribute where the default configuration dictionary is defined."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc058da0-de83-44c4-87c4-a4f6bd425744",
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.default_feature_list"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "375f1304-60ac-4b50-8daa-6ca00c2582df",
   "metadata": {},
   "source": [
    "Also, five possible features lists exist, detailing the available features in the class: `possible_nodes_static_features`, `possible_nodes_dynamic_features`, `possible_agent_features`, `possible_agents_features`, `possible_global_features`. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b73f242-c589-4295-b17f-82c5913a7628",
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.possible_nodes_dynamic_features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a2fb44e-58c1-4eb5-8cf6-371f6e4de3a6",
   "metadata": {},
   "source": [
    "Lets see how to add another nodes dynamic observation."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f85ae7a1-820c-4e56-bda3-b4d84f756faf",
   "metadata": {},
   "source": [
    "I) Change the code below in order to implement the nodes dynamic feature `wait_time_div_end_time`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b31c3cac-4067-40f0-904f-adebed14d18a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex1.py\n",
    "class Observations(Observations):\n",
    "    \n",
    "    def __init__(self, feature_list:dict = None):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.default_feature_list['nodes_dynamic'].append('wait_time_div_end_time')\n",
    "        self.possible_nodes_dynamic_features.append('wait_time_div_end_time')\n",
    "    \n",
    "    def get_feat_wait_time_div_end_time(self):\n",
    "        \"\"\" dynamic feature\n",
    "        Args:\n",
    "\n",
    "        Returns: \n",
    "            Tensor: waiting time at nodes divided by end time.\n",
    "        \"\"\"\n",
    "        loc = self.env.td_state['coords'].gather(1, self.env.td_state['cur_agent']['cur_node'][:,:,None].expand(-1, -1, 2))\n",
    "        ptime = self.env.td_state['cur_agent']['cur_time'].clone()\n",
    "        time2j = torch.pairwise_distance(loc, self.env.td_state[\"coords\"], eps=0, keepdim = False)\n",
    "        #arrivej = !! your code here !!\n",
    "        #wait = !! your code here !!\n",
    "        return wait / self.env.td_state['end_time'].unsqueeze(dim=-1)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d65f24c5-4752-4036-81e4-3c0452e5d036",
   "metadata": {},
   "outputs": [],
   "source": [
    "obs = Observations(obs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e9a9af8-9978-4111-9819-b47de238ad87",
   "metadata": {},
   "source": [
    "We can re-check the possible nodes dynamic features available:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "210a0acc-710c-4fa6-b04d-7329d42d036c",
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.possible_nodes_dynamic_features"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2aa5cbe4-4fc7-4f03-96db-afbbc69c2e0f",
   "metadata": {},
   "source": [
    "and the ones the that are going to used by the agent:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a72236a-d632-4503-acc5-f49666aa0e37",
   "metadata": {},
   "outputs": [],
   "source": [
    "obs.default_feature_list['nodes_dynamic']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c7bc241c-8c7b-47b6-b34c-353f5d24dd2c",
   "metadata": {},
   "source": [
    "Ok! Now, let's creat the `TOPTW` environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0e0c5b2e",
   "metadata": {},
   "outputs": [],
   "source": [
    "env = Environment(instance_generator_object=gen,  \n",
    "                  obs_builder_object=obs,\n",
    "                  agent_selector_object=sel,\n",
    "                  reward_evaluator=rew,\n",
    "                  seed=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62e83d14-d139-4978-9293-d82b74b94492",
   "metadata": {},
   "source": [
    "II) Check if your answer is correct, by running a couple of environment steps."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5318961f-d90f-49c4-9056-34ce73b43c83",
   "metadata": {},
   "source": [
    "Note: the observation feature will be on the `obs.default_feature_list['nodes_dynamic'].index('wait_time_div_end_time')` position of the `node_dynamic_obs` tensor."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7e5ddb5-fdcd-4197-b190-9ea5c456558b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex2.py\n",
    "# check the new feature position here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cd851a2-163a-4e49-b810-24fe3e2077f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b227ebdb-f5b3-412b-b5b3-4b7017eabf4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "#%load snippets/ex3.py\n",
    "#check the nodes dynamic observations on the td "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f3ec8fc5-ac01-4226-9ff3-f957ccc4cb77",
   "metadata": {},
   "source": [
    "Now select a node to visit and execute one environment step (you can replace the value `3` with any other valid node index):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e19cd970-d46c-4523-8d12-4f171e154988",
   "metadata": {},
   "outputs": [],
   "source": [
    "td['action'] = torch.tensor([[3]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "972979dc-972d-447f-9c10-8581a8e9391e",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.step(td)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b63cb372-1dbc-4fdd-8008-517177d8d0f7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "00c5ffe5-e173-4458-8d67-341b4801fa9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "411bf40d-c791-45fc-805e-0e117ede9ff2",
   "metadata": {},
   "source": [
    "III) Think of another potentially useful observation feature for this environment. Implement and test it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6ecddb46-752d-4893-9bfe-fdb1aede1e39",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7501d2d9-3ecc-4fc4-bbfa-40416053ad43",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c2cf66c-394f-46a9-a687-c83ef1a0cdef",
   "metadata": {},
   "source": [
    "### Ex2. Split Delivery Vehicle Routing Problem with Time Windows (SDVRPTW)\n",
    "\n",
    "The **Split Delivery Vehicle Routing Problem with Time Windows (SDVRPTW)** extends the classic CVRPTW by allowing each customer’s demand to be split across multiple visits from different vehicles. In other words, a single customer can receive partial deliveries from more than one truck, as long as all time‐window constraints are respected.\n",
    "\n",
    "Below, we’ll outline the modifications needed to turn our existing CVRPTW environment into an SDVRPTW setup. But first, let’s take a quick look at the CVRPTW environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32de209b-8110-4e04-9242-e913fdac4233",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.cvrptw.toy_instance_generator import ToyInstanceGenerator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6b523d4-f60d-4148-a5df-32d17c000b4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = ToyInstanceGenerator()\n",
    "inst = gen.sample_instance()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "739a9227-6efa-462b-8783-4ca44402a901",
   "metadata": {},
   "source": [
    "The services and depot location is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3525aef6-11a8-48e1-bc6a-a2a1cbaa0798",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(3,3))\n",
    "plt.plot(inst['data']['coords'][0][:,0].numpy(), inst['data']['coords'][0][:,1].numpy(), 'o')\n",
    "plt.plot(inst['data']['coords'][0][0,0].numpy(), inst['data']['coords'][0][0,1].numpy(), 'o', color='red' )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27291cb3-c4d6-44d0-bad8-49dc45f1da8d",
   "metadata": {},
   "source": [
    "With time windows and demands:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01327fdd-cd4c-4949-b14b-5eef4f0a2d1c",
   "metadata": {},
   "outputs": [],
   "source": [
    "for k, data in enumerate(zip(inst['data']['tw_low'][0].tolist(), inst['data']['tw_high'][0].tolist(), inst['data']['demands'][0].tolist())):\n",
    "    print(f'node {k} time window is: [{data[0]}; {data[1]}], with demand {data[2]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65478de3-3781-423c-98ee-d9b1c48f4c64",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.cvrptw.env import Environment\n",
    "from maenvs4vrp.environments.cvrptw.env_agent_selector import AgentSelector\n",
    "from maenvs4vrp.environments.cvrptw.env_agent_reward import DenseReward\n",
    "from maenvs4vrp.environments.cvrptw.observations import Observations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4880bc9e-8c94-48d8-8e8a-cd9f59dcee4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = ToyInstanceGenerator()\n",
    "sel = AgentSelector()\n",
    "rew = DenseReward()\n",
    "obs = Observations()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c30baf2-c9ed-45ca-beaa-aee45b8b4f4a",
   "metadata": {},
   "outputs": [],
   "source": [
    "env = Environment(instance_generator_object=gen,  \n",
    "                  obs_builder_object=obs,\n",
    "                  agent_selector_object=sel,\n",
    "                  reward_evaluator=rew,\n",
    "                  seed=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "100f53d7-7266-4fc6-821e-61df6c79630a",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c2b90d3c-0352-4168-a2de-2c3c5930f335",
   "metadata": {},
   "source": [
    "I) check what agent is active and what actions are admissible for him.\n",
    "\n",
    "Note: on the `td` acess `cur_agent_idx` and `observations`/`action_mask` keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20b5079a-6820-4579-a4c6-f08132861274",
   "metadata": {},
   "outputs": [],
   "source": [
    "#%load snippets/ex4.py\n",
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00e2f28d-8246-4657-97a0-f50995487dfe",
   "metadata": {},
   "source": [
    "This information is also available by accessing the environment `td_state` attribute on the `cur_agent` key."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e1ff2b2-cdb5-44c2-b6ac-aa8f67d24968",
   "metadata": {},
   "outputs": [],
   "source": [
    "env.td_state['cur_agent']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20c1db3c-aa81-4b1e-a2ba-a193a89f84f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "env.td_state['cur_agent']['cur_load']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "118b416f-27d2-43b8-89a1-53c7e9595743",
   "metadata": {},
   "source": [
    "Let's choose to serve node `2`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "467e3f56-e90b-4182-a7db-7dd9d4b65af4",
   "metadata": {},
   "outputs": [],
   "source": [
    "td['action'] = torch.tensor([[2]])\n",
    "td = env.step(td)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad86fb5a-e0fd-4732-a2ab-a745378dea6b",
   "metadata": {},
   "outputs": [],
   "source": [
    "action = torch.tensor([[2]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "408a1877-8611-4591-a818-f14f965b62de",
   "metadata": {},
   "source": [
    "II) What should the new `cur_load` and `action_mask` be? Check your answer. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b11d1cc1-acd3-4d83-b26e-6f8c50cf2b8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex5.py\n",
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ec875274-7e67-4142-8b5b-34a7d2ccc7ea",
   "metadata": {},
   "source": [
    "III) What happens if the agents try to serve the node `1`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87c38e69-feb6-4bc6-87f1-cc6d3a2ed694",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex6.py\n",
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e11f16b-f5e0-4465-aab2-f7055ab8f8cb",
   "metadata": {},
   "source": [
    "OK. Now, lets see what changes are needed to the CVRTPW environment to obtain SDVRPT. we only need to tweak  `_update_feasibility` and `_update_state` methods on the Environment class. Everything else will remain unchanged.\n",
    "\n",
    "\n",
    "1. **`_update_feasibility`**  \n",
    "   - **CVRPTW behavior**: Once a customer’s is visited, we mark that node as infeasible for all future visits.  \n",
    "   - **SDVRPTW change**: Allow a node to remain feasible until its **remaining demand** reaches zero. That means:  \n",
    "     - Track **remaining demand** at each node (initial demand minus delivered quantity).  \n",
    "     - In `_update_feasibility`, a node is only masked out when its remaining demand == 0.  \n",
    "\n",
    "2. **`_update_state`**  \n",
    "   - **CVRPTW behavior**: When an agent visits node *i*, it:  \n",
    "     1. Deducts the node’s full demand from the vehicle load.  \n",
    "     2. Sets that node’s demand to zero.  \n",
    "   - **SDVRPTW change**: On visiting node *i*, an agent can only deliver up to its remaining vehicle capacity (and no more than the node’s remaining demand). So you must:  \n",
    "     - Compute `delivered = min(vehicle_capacity, node_remaining_demand)`  \n",
    "     - Subtract `delivered` from both the vehicle’s load **and** the node’s remaining demand.  \n",
    "     - If a node’s remaining demand drops to zero, then subsequently `_update_feasibility` will mask it out.  \n"
   ]
  },
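  {
   "cell_type": "markdown",
   "id": "sdvrptw-worked-example",
   "metadata": {},
   "source": [
    "As a quick sanity check of the two rules above, here is a small worked example. The numbers and the symbols $q$ (remaining vehicle load), $d_i$ (remaining demand of node $i$) and $\\delta$ (delivered quantity) are illustrative assumptions, not values taken from the toy instance.\n",
    "\n",
    "$$\\delta = \\min(q, d_i), \\qquad q \\leftarrow q - \\delta, \\qquad d_i \\leftarrow d_i - \\delta$$\n",
    "\n",
    "- A first vehicle with $q = 10$ visits node $i$ with $d_i = 15$: $\\delta = \\min(10, 15) = 10$, so $q = 0$ and $d_i = 5$. The node stays feasible for other vehicles because $d_i > 0$.\n",
    "- A second vehicle with $q = 20$ visits the same node: $\\delta = \\min(20, 5) = 5$, so $q = 15$ and $d_i = 0$. The node is now masked out."
   ]
  },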
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f17e1745-9860-4055-ab98-3578c3027468",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.sdvrptw.toy_instance_generator import ToyInstanceGenerator\n",
    "from maenvs4vrp.environments.sdvrptw.env import Environment\n",
    "from maenvs4vrp.environments.sdvrptw.env_agent_selector import AgentSelector\n",
    "from maenvs4vrp.environments.sdvrptw.env_agent_reward import DenseReward\n",
    "from maenvs4vrp.environments.sdvrptw.observations import Observations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5083c61b-0709-4f49-b5e2-396edfad4f59",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = ToyInstanceGenerator()\n",
    "sel = AgentSelector()\n",
    "rew = DenseReward()\n",
    "obs = Observations()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f138066c-13e5-4d07-aa4d-eee7a900f77e",
   "metadata": {},
   "source": [
    "Let's start with the `_update_feasibility` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ade1a8f5-9004-42c9-9742-a60a89bd4bb0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex7.py\n",
    "# your code here\n",
    "class Environment(Environment):\n",
    "\n",
    "    def _update_feasibility(self):\n",
    "\n",
    "        _mask = self.td_state['nodes']['active_nodes_mask'].clone() * self.td_state['cur_agent']['action_mask'].clone()\n",
    "\n",
    "        # time windows constraints\n",
    "        loc = self.td_state['coords'].gather(1, self.td_state['cur_agent']['cur_node'][:,:,None].expand(-1, -1, 2))\n",
    "        ptime = self.td_state['cur_agent']['cur_time'].clone()\n",
    "        time2j = torch.pairwise_distance(loc, self.td_state[\"coords\"], eps=0, keepdim = False)\n",
    "        if self.n_digits is not None:\n",
    "            time2j = torch.floor(self.n_digits * time2j) / self.n_digits\n",
    "        arrivej = ptime + time2j\n",
    "        waitj = torch.clip(self.td_state['tw_low']-arrivej, min=0)\n",
    "        service_startj = arrivej + waitj\n",
    "\n",
    "        c1 = service_startj <= self.td_state['tw_high']\n",
    "        c2 = service_startj + self.td_state['service_time'] + self.td_state['time2depot'] <= self.td_state['end_time'].unsqueeze(-1)\n",
    "\n",
    "        # capacity constraints (if there is no load, the agent can only return to the depot)\n",
    "        c3 = torch.ones_like(_mask, dtype=torch.bool, device=env.device)\n",
    "        #c3[self.td_state['cur_agent']['cur_load'].le(0).squeeze(-1)] = !!your code here!!\n",
    "        #c3[self.td_state['cur_agent']['cur_load'].le(0).squeeze(-1), self.td_state['depot_idx']] = !!your code here!!\n",
    "\n",
    "        _mask = _mask * c1 * c2 * c3\n",
    "        # update state\n",
    "        self.td_state['cur_agent'].update({'action_mask': _mask}) \n",
    "        self.td_state['agents']['feasible_nodes'].scatter_(1, \n",
    "                                            self.td_state['cur_agent_idx'][:,:,None].expand(-1,-1,self.num_nodes), _mask.unsqueeze(1))"
   ]
  },
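  {
   "cell_type": "markdown",
   "id": "tw-feasibility-worked-example",
   "metadata": {},
   "source": [
    "The time window part of the mask above can be summarised in formulas. This is a reading of the code rather than an official specification: the symbols $t$ (`cur_time`), $\\tau_{ij}$ (`time2j`), $o_j$ (`tw_low`), $c_j$ (`tw_high`), $s_j$ (`service_time`), $\\tau_{j0}$ (`time2depot`, assumed here to be the travel time from node $j$ back to the depot) and $T$ (`end_time`) are notation introduced only for this note.\n",
    "\n",
    "$$b_j = \\max(t + \\tau_{ij},\\; o_j), \\qquad c_1: \\; b_j \\le c_j, \\qquad c_2: \\; b_j + s_j + \\tau_{j0} \\le T$$\n",
    "\n",
    "For instance, with the illustrative values $t = 0$, $\\tau_{ij} = 4$, $o_j = 6$, $c_j = 9$, $s_j = 1$, $\\tau_{j0} = 4$ and $T = 20$, the agent arrives at time $4$, waits $2$ and starts service at $b_j = 6 \\le 9$, so $c_1$ holds. Since $6 + 1 + 4 = 11 \\le 20$, $c_2$ holds as well, and node $j$ stays in the action mask provided it is still active and the capacity condition $c_3$ is also satisfied."
   ]
  },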
  {
   "cell_type": "markdown",
   "id": "8efbffaa-0e6c-42c9-924e-a271f35a9533",
   "metadata": {},
   "source": [
    "Now the `_update_state` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d1af5ba-c2f0-4f05-9666-a148af623c6e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load snippets/ex8.py\n",
    "class Environment(Environment):\n",
    "\n",
    "    def _update_state(self, action):\n",
    "        loc = self.td_state['coords'].gather(1, self.td_state['cur_agent']['cur_node'][:,:,None].expand(-1, -1, 2))\n",
    "        next_loc = self.td_state['coords'].gather(1, action[:,:,None].expand(-1, -1, 2))\n",
    "\n",
    "        ptime = self.td_state['cur_agent']['cur_time'].clone()\n",
    "        time2j = torch.pairwise_distance(loc, next_loc, eps=0, keepdim = False)\n",
    "        if self.n_digits is not None:\n",
    "            time2j = torch.floor(self.n_digits * time2j) / self.n_digits\n",
    "        tw = self.td_state['tw_low'].gather(1, action)\n",
    "        service_time = self.td_state['service_time'].gather(1, action)\n",
    "\n",
    "        arrivej = ptime + time2j\n",
    "        waitj = torch.clip(tw-arrivej, min=0)\n",
    "\n",
    "        time_update = arrivej + waitj + service_time\n",
    "        # update agent cur node\n",
    "        self.td_state['cur_agent']['cur_node'] = action\n",
    "        self.td_state['agents']['cur_node'].scatter_(1, self.td_state['cur_agent_idx'], self.td_state['cur_agent']['cur_node'])\n",
    "        # update agent cur time\n",
    "        self.td_state['cur_agent']['cur_time'] = time_update\n",
    "\n",
    "        # is agent is done set agent time to end_time\n",
    "        agents_done = ~self.td_state['agents']['active_agents_mask'].gather(1, self.td_state['cur_agent_idx']).clone()\n",
    "        self.td_state['cur_agent']['cur_time'] = torch.where(agents_done, self.td_state['end_time'].unsqueeze(-1), \n",
    "                                                             self.td_state['cur_agent']['cur_time'])\n",
    "        self.td_state['agents']['cur_time'].scatter_(1, self.td_state['cur_agent_idx'], self.td_state['cur_agent']['cur_time'])\n",
    "\n",
    "        # update agent cum traveled time\n",
    "        self.td_state['cur_agent']['cur_ttime'] = time2j\n",
    "        self.td_state['cur_agent']['cum_ttime'] += time2j\n",
    "        self.td_state['agents']['cur_ttime'].scatter_(1, self.td_state['cur_agent_idx'], self.td_state['cur_agent']['cur_ttime'])\n",
    "        self.td_state['agents']['cum_ttime'].scatter_(1, self.td_state['cur_agent_idx'], self.td_state['cur_agent']['cum_ttime'])\n",
    "        \n",
    "        # update agent load and node demands\n",
    "        #cur_demands = !!your code here!!\n",
    "        #current_load = !!your code here!!\n",
    "        #load_transfer =  !!your code here!!\n",
    "        self.td_state['cur_agent']['cur_load'] -= load_transfer\n",
    "\n",
    "        # if agent is done set agent cur_load to 0\n",
    "        self.td_state['cur_agent']['cur_load'] = torch.where(agents_done, 0., \n",
    "                                                             self.td_state['cur_agent']['cur_load'])\n",
    "        \n",
    "        self.td_state['nodes']['cur_demands'].scatter_(1, action, cur_demands-load_transfer)\n",
    "        # update done nodes\n",
    "        self.td_state['nodes']['active_nodes_mask'] = self.td_state['nodes']['cur_demands'].gt(0)\n",
    "        self.td_state['nodes']['active_nodes_mask'].scatter_(1, self.td_state['depot_idx'], True)\n",
    "\n",
    "        self.td_state['agents']['cur_load'].scatter_(1, self.td_state['cur_agent_idx'], self.td_state['cur_agent']['cur_load'])\n",
    "        # update visited nodes\n",
    "        r = torch.arange(*self.td_state.batch_size, device=self.device)\n",
    "        self.td_state['agents']['visited_nodes'][r, self.td_state['cur_agent_idx'].squeeze(-1), action.squeeze(-1)] = True\n",
    "        # update agent step\n",
    "        self.td_state['cur_agent']['cur_step'] = torch.where(~agents_done, self.td_state['cur_agent']['cur_step']+1, \n",
    "                                                             self.td_state['cur_agent']['cur_step'])\n",
    "        self.td_state['agents']['cur_step'].scatter_(1, self.td_state['cur_agent_idx'], self.td_state['cur_agent']['cur_step'])\n",
    "\n",
    "        # if all done activate first agent to guarantee batch consistency during agent sampling\n",
    "        self.td_state['agents']['active_agents_mask'][self.td_state['agents']['active_agents_mask'].sum(1).eq(0), 0] = True\n",
    "        self._update_feasibility()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25a7dd49-e2b6-4956-922f-d5bdea93d8e3",
   "metadata": {},
   "source": [
    "Let's test the environment, and repeat the steps we have performed for de CVRPTW:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29e725e4-b8a6-4aeb-9e38-3c412b1bd84b",
   "metadata": {},
   "outputs": [],
   "source": [
    "env = Environment(instance_generator_object=gen,  \n",
    "                  obs_builder_object=obs,\n",
    "                  agent_selector_object=sel,\n",
    "                  reward_evaluator=rew,\n",
    "                  seed=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a76e7c58-a8a8-4dc3-87a3-0d9b1477d9b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e48f246-2fc2-43dd-a43a-1aba0f5f3985",
   "metadata": {},
   "outputs": [],
   "source": [
    "td['action'] = torch.tensor([[2]])\n",
    "td = env.step(td)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bed366c-0ae5-4c40-a3e8-1e5c2476c477",
   "metadata": {},
   "outputs": [],
   "source": [
    "env.td_state['cur_agent']['action_mask']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c53c62d9-b22f-4a20-82ca-7c1d4bf3e9d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "env.td_state['cur_agent']['cur_load']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85814cc2-b4a3-4725-b4a4-348830ff1229",
   "metadata": {},
   "source": [
    "IV) What happens if the agents now goes the node `1`?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b13f3916-7d29-437e-b5ce-d7c7c7d06be8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex9.py\n",
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d22d6c20-1d15-4f65-8a02-c4707bf8a949",
   "metadata": {},
   "source": [
    "V) What should the new `cur_load`, `action_mask` and nodes `cur_demands` be? Check your answer. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74bbf834-b8d5-4528-bdb3-0f18ff6272da",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex10.py\n",
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb5cbbae-2e4c-4cbd-8942-4069a9a9a8dd",
   "metadata": {},
   "source": [
    "It seems to be working!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "63ea0663-ba45-4615-aeb0-851d3d69ede8",
   "metadata": {},
   "source": [
    "### Ex3. Capacitated Vehicle Routing Problem with Soft Time Windows (CVRPSTW)\n",
    "\n",
    "In this variation of the CVRPTW, time window constraints are relaxed and can be violated at a penalty cost (usually linear proportional to the interval between opening/closing times and vehicle arrival). Although the penalty function can be defined in several ways, we consider the formulation studied in [M. A. Figliozzi](https://www.sciencedirect.com/science/article/abs/pii/S0968090X09001119)). \n",
    "Concretely, the time window violation cannot exceed $P_{max}$, and consequently, for each customer, we can enlarge its time window to $[o_i - P_{max}, c_i + P_{max}] = [o^s_i , c^s_i]$ outside which the service cannot be performed. When a vehicle arrives at a customer at time $t_i \\in [o^s_i , c^s_i]$, it can have an early arrival penalty cost of $p_e \\max (o_i-t_i,0)$ and a late arrival penalty cost of $p_l \\max (t_i-c_i, 0)$.\n",
    "\n",
    "Furthermore, the vehicle's maximum waiting time at any customer, $W_{max}$, is imposed. That is, the vehicles can only arrive at each customer after $o_i - P_{max} - W_{max}$, so that its waiting time doesn't exceed $W_{max}$."
   ]
  },
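  {
   "cell_type": "markdown",
   "id": "cvrpstw-worked-example",
   "metadata": {},
   "source": [
    "To make the notation concrete, here is a small worked example. All numbers are illustrative assumptions and are not taken from the toy instance.\n",
    "\n",
    "Take $o_i = 10$, $c_i = 20$, $P_{max} = 5$, $W_{max} = 3$, $p_e = 1$ and $p_l = 2$. Then\n",
    "\n",
    "$$[o^s_i, c^s_i] = [10 - 5,\\; 20 + 5] = [5, 25], \\qquad o_i - P_{max} - W_{max} = 10 - 5 - 3 = 2$$\n",
    "\n",
    "- Arriving at $t_i = 7$: early penalty $p_e \\max(10 - 7, 0) = 3$, late penalty $p_l \\max(7 - 20, 0) = 0$.\n",
    "- Arriving at $t_i = 23$: early penalty $0$, late penalty $p_l \\max(23 - 20, 0) = 6$.\n",
    "- Arriving at $t_i = 15$: no penalty, since $t_i \\in [o_i, c_i]$.\n",
    "- Arriving at $t_i = 1$ is not allowed, because the vehicle would have to wait $4 > W_{max}$ before service could start at $o^s_i = 5$."
   ]
  },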
  {
   "cell_type": "markdown",
   "id": "1daa6192-6337-4cf0-b600-20ea1461d1e0",
   "metadata": {},
   "source": [
    "The environment for this problem has already been almost done for us. Compared to the base CVRPTW environment, `early_penalty` and `late_penalty` attributes were added to the environment and `tw_high_limit`, `tw_high_limit`, `arrive_limit` attributes were added to `td_state`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33a96a2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "from maenvs4vrp.environments.cvrpstw.env import Environment\n",
    "from maenvs4vrp.environments.cvrpstw.env_agent_selector import AgentSelector\n",
    "from maenvs4vrp.environments.cvrpstw.observations import Observations\n",
    "from maenvs4vrp.environments.cvrpstw.toy_instance_generator import ToyInstanceGenerator\n",
    "from maenvs4vrp.environments.cvrpstw.env_agent_reward import DenseReward"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "180d9d1f-701b-43a9-892d-084e7017a6d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "gen = ToyInstanceGenerator()\n",
    "sel = AgentSelector()\n",
    "rew = DenseReward()\n",
    "obs = Observations()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "358d0616-439d-469a-bc36-5691b2639ee7",
   "metadata": {},
   "source": [
    "II) Complete the `_update_feasibility` method in order to take into account the waiting time constraint:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7bbe7ef-c51f-4190-a295-422f38cab642",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex11.py\n",
    "class Environment(Environment):\n",
    " \n",
    "    def _update_feasibility(self):\n",
    "\n",
    "        _mask = self.td_state['nodes']['active_nodes_mask'].clone() * self.td_state['cur_agent']['action_mask'].clone()\n",
    "\n",
    "        # time windows constraints\n",
    "        loc = self.td_state['coords'].gather(1, self.td_state['cur_agent']['cur_node'][:,:,None].expand(-1, -1, 2))\n",
    "        ptime = self.td_state['cur_agent']['cur_time'].clone()\n",
    "        time2j = torch.pairwise_distance(loc, self.td_state[\"coords\"], eps=0, keepdim = False)\n",
    "        if self.n_digits is not None:\n",
    "            time2j = torch.floor(self.n_digits * time2j) / self.n_digits\n",
    "\n",
    "        arrivej = ptime + time2j\n",
    "        waitj = torch.clip(self.td_state['tw_low_limit']-arrivej, min=0)\n",
    "        service_startj = arrivej + waitj\n",
    "\n",
    "        #c0 = !! your code here !! # agents can only arrive at each customer after $o_i - P_{max} - W_{max}$\n",
    "        c1 = service_startj <= self.td_state['tw_high_limit']\n",
    "        c2 = service_startj + self.td_state['service_time'] + self.td_state['time2depot'] <= self.td_state['end_time'].unsqueeze(-1)\n",
    "\n",
    "        # capacity constraints\n",
    "        c3 = self.td_state['demands'] <= self.td_state['cur_agent']['cur_load']\n",
    "\n",
    "        _mask = _mask * c0 * c1 * c2 * c3\n",
    "        # update state\n",
    "        self.td_state['cur_agent'].update({'action_mask': _mask}) \n",
    "        self.td_state['agents']['feasible_nodes'].scatter_(1, \n",
    "                                            self.td_state['cur_agent_idx'][:,:,None].expand(-1,-1,self.num_nodes), _mask.unsqueeze(1))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "91dbbf49-09d8-4923-ad9c-8887195f0293",
   "metadata": {},
   "source": [
    "II) Complete the the `DenseReward` class in order to take into acount the penalty for time windows violation:\n",
    "\n",
    "(hint: check `td_state['cur_agent']['cur_penalty']` )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aef33e6e-9c0e-449f-a44b-b69515bbeea7",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load snippets/ex12.py\n",
    "class DenseReward(DenseReward):\n",
    "    \"\"\"Reward class.\n",
    "    \"\"\"\n",
    "\n",
    "    def get_reward(self, action):\n",
    "        \"\"\"\n",
    "        \n",
    "        \"\"\"\n",
    "\n",
    "        # your code here!!\n",
    "    \n",
    "        return reward, penalty"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7024c657-811f-4619-bd01-890cb9af0501",
   "metadata": {},
   "outputs": [],
   "source": [
    "rew = DenseReward()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7691c03-de88-439e-bf50-f3204aea9f17",
   "metadata": {},
   "outputs": [],
   "source": [
    "env = Environment(instance_generator_object=gen,  \n",
    "                  obs_builder_object=obs,\n",
    "                  agent_selector_object=sel,\n",
    "                  reward_evaluator=rew,\n",
    "                  seed=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d5a5bcb-3ee6-405c-a61d-45ebb4cd0c1a",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5ff4fc7-f608-4c5f-8583-f805c573ae55",
   "metadata": {},
   "source": [
    "Let's get some information about the environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d143565-bbb0-413d-aa2a-721b132a57d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(3,3))\n",
    "plt.plot(env.td_state['coords'][0][:,0].numpy(), env.td_state['coords'][0][:,1].numpy(), 'o')\n",
    "plt.plot(env.td_state['coords'][0][0,0].numpy(), env.td_state['coords'][0][0,1].numpy(), 'o', color='red' )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ec1d5335-fe22-4cfd-90aa-3337d025afe0",
   "metadata": {},
   "outputs": [],
   "source": [
    "for k, data in enumerate(zip(env.td_state['tw_low'][0].tolist(), env.td_state['tw_high'][0].tolist(), env.td_state['demands'][0].tolist())):\n",
    "    print(f'node {k} time window is: [{data[0]}; {data[1]}], with demand {data[2]}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e8bfa33e-5871-4d1f-aae8-d2b08f6359f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "for k, data in enumerate(zip(env.td_state['tw_low_limit'][0].tolist(), env.td_state['tw_high_limit'][0].tolist(), env.td_state['arrive_limit'][0].tolist())):\n",
    "    print(f'node {k} time window limit is: [{data[0]:.2f}; {data[1]:.2f}], with arrive time limit {data[2]:.2f}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "048b8451-583d-475b-a95f-d5c6c223d2b2",
   "metadata": {},
   "source": [
    "For the active agent in the depot, the times (distances) to customers will be:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f53dd0d-d49b-4c13-a3f5-301397ea9e27",
   "metadata": {},
   "outputs": [],
   "source": [
    "loc = env.td_state['coords'].gather(1, env.td_state['cur_agent']['cur_node'][:,:,None].expand(-1, -1, 2))\n",
    "time2j = torch.pairwise_distance(loc, env.td_state[\"coords\"], eps=0, keepdim = False)\n",
    "time2j[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1939d082-d43a-43c2-98bd-a374149fed7a",
   "metadata": {},
   "source": [
    "I) Make some environment steps to check if our implementation is correct. \n",
    "\n",
    "II) What `reward` and `penalty` values are expected?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "325f332d-a932-40aa-abc6-8e6bfa89c281",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %load snippets/ex13.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21824850-d503-4629-b01b-d4bfed61f750",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "331dd3f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ad181499",
   "metadata": {},
   "outputs": [],
   "source": [
    "# your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c045934f-f823-4f5f-bfb2-0fe318ceb56b",
   "metadata": {},
   "source": [
    "Let's do an episode rollout and check the `reward` and `penalty` through every step:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "263ed3f7-920a-4bee-ae4b-e6c1ffa58ed4",
   "metadata": {},
   "outputs": [],
   "source": [
    "td = env.reset()\n",
    "while not td[\"done\"].all():  \n",
    "    td = env.sample_action(td) \n",
    "    td = env.step(td)\n",
    "    step = env.env_nsteps\n",
    "    reward = td['reward']\n",
    "    penalty = td['penalty']\n",
    "    print(f'env step number:{step}, reward: {reward}, penalty: {penalty}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4df0bf8-ee2d-4390-bb27-7f28de3f9a9c",
   "metadata": {},
   "source": [
    "\n",
    "##### Well done! That's it for today. For any comments and suggestions, please drop us an email.\n",
    "\n",
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "maenvs4vrp",
   "language": "python",
   "name": "maenvs4vrp"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}