Expanding the pioneer shipping route for vaccine distribution: a reinforcement learning-based approach

Zukhruf, Febri; Nugroho, Taufiq Suryo; Maulana, Andrean; Perkasa, Lohdaya; Gunawan, Stefanus; Harahap, Fitrah Nur Muliadin

doi:10.1108/JILT-07-2025-0067

Purpose

In archipelagic countries, sea transport plays a vital role because islands are not always reachable by air owing to the absence of airports. This study explored the use of pioneer shipping routes for delivering vaccines to these remote locations. Pioneer shipping is a service implemented by the government using “pioneer ships” to improve connectivity. Despite its potential, vaccine distribution using the pioneer shipping route is challenged by risks related to sea transport, such as delays, weather, sea conditions and handling errors, which have not been addressed in existing studies.

Design/methodology/approach

We introduce a new reinforcement learning-based model that optimises delivery while considering vaccine perishability and wave height. The model was tested in actual shipping routes by utilising historical wave-height data.

Findings

The numerical results indicate that without any optimisation effort, the route can cause 50% damage to the vaccine. Increasing the number of routes can help prevent damage during the voyage owing to the reduced shipping time. By drawing insights from numerical tests, managerial and policy implications can be derived, including the installation of vaccine refrigeration equipment in the vessel. If the equipment is unavailable, it is preferable to use direct shipping from port to port, as our analysis showed that wave variations significantly increase the average travel time and hinder the shipping of the vaccine.

Originality/value

This paper contributes to the literature by addressing the underexplored challenge of vaccine distribution in archipelagic and remote regions. This paper specifically optimises the utilisation of pioneer shipping routes by proposing a novel reinforcement learning-based vessel routing framework that optimises delivery considering goods’ perishability and wave-induced variations. The modelling framework leverages clustering and attention mechanisms to prioritise the delivery of vaccines across multiple shipping routes, enhancing both efficiency and vaccine preservation.

1. Introduction

Vaccines, as cold-chain products, are among the most successful and cost-effective health interventions, particularly in low- and middle-income countries (LMICs) (Anupindi et al., 2022). Their characteristics—such as packaging size, storage temperature, and shelf life—affect logistical handling throughout the supply chain, including storage, transportation, distribution, and administration. For instance, the Pfizer mRNA vaccine requires ultra-cold storage (around −70°C). It has a nonstandard dose size of 0.3 mL, along with a strict 21-day dosing interval, which necessitates a highly coordinated logistics system. In many LMICs—where infrastructure may be limited, transportation networks are fragmented, and healthcare access varies widely—vaccine design must consider the realities of long-distance distribution, unreliable power supplies, and delivery to remote areas to ensure fair and adequate coverage. Additional logistical challenges stem from geographic fragmentation and underdeveloped infrastructure (Berkley et al., 2013). Ensuring vaccine efficacy requires a cold-chain system that maintains a temperature range of 2–8°C (GAVI, 2022). However, distributing vaccines across remote, multi-island regions faces challenges such as unreliable electricity, long transit durations, and limited cold storage facilities, all of which threaten temperature control (Jedermann et al., 2017).

In archipelagic countries, sea transport is critical for ensuring connectivity and offers greater efficiency than air transport, particularly for reaching dispersed islands (Hernández Luis, 2002). Remote islands in LMICs often face significant transport limitations: they lack fixed transportation services, are not commercially viable, have low population densities and incomes, and are frequently inaccessible by air owing to the absence of airport infrastructure. This highlights the crucial role of sea transport, particularly pioneer shipping, in bridging the connectivity gap. Pioneer shipping is a government-subsidised maritime service that connects remote areas not served by commercial operators (Riadi and Kurniawan, 2024). The national shipping company is often mandated to operate these routes with financial support from the government. The primary objective is to facilitate the movement of goods and promote economic and social development in remote regions (Humang et al., 2019). Given these characteristics, pioneer shipping has the potential to support vaccine distribution to remote island areas that are accessible only by sea transport. However, this potential has not been explored in existing studies, although it enables the delivery of small-volume shipments (typical of vaccine demands in regions with low population densities) more reliably and cost-effectively.

Despite its notable advantages, vaccine distribution using the pioneer shipping route is challenged by the risks associated with sea transport, such as delays, weather, sea conditions, and handling errors (Martínez de Osés and Castells, 2008). This study investigated the possibility of utilising pioneer shipping for delivering vaccines (and other cold-chain-based products), which has remained unexplored in prior research. The study was conducted within the framework of the routing optimisation problem, specifically the vessel routing problem, which primarily focuses on determining cost-effective routes and schedules for vessels delivering goods to multiple destinations (Brouer et al., 2014; Christiansen et al., 2013). The vessel routing problem is more complex than land-based problems, as it requires the consideration of sea conditions, port accessibility, fuel constraints, and time windows in addition to meeting the requirements of cost efficiency and service reliability (Wang and Meng, 2012). To avoid confusion with the acronym VRP, which is the commonly used abbreviation for vehicle routing problem, we refer to our problem as the pioneer vessel routing problem (PVRP). Despite advances in the research on vessel routing problems, existing models rarely incorporate cold-chain-specific constraints, such as perishability and urgent delivery, which are crucial for vaccine distribution. These have been considered in this study.

Building on the above background, this study contributes to the literature by addressing the underexplored challenge of vaccine distribution in archipelagic and remote regions. This study explicitly optimises the utilisation of pioneer shipping routes by proposing a novel reinforcement learning (RL)-based vessel routing framework that optimises delivery by considering the perishability of goods and wave-induced variations. The modelling framework leverages clustering and attention mechanisms to prioritise the delivery of vaccines across multiple shipping routes, enhancing both efficiency and vaccine preservation. The model was tested on actual pioneer shipping routes in East Nusa Tenggara to assess its applicability. The results of the numerical experiments offer valuable insights for policy formulation and managerial decision-making in the context of pioneer shipping operations. The remainder of this paper is structured as follows: Section 2 reviews relevant literature in vaccine logistics and maritime routing. Section 3 presents the modelling framework and formulation and explains the RL solution approach. Section 4 discusses the results of the numerical experiments. Section 5 shares policy and managerial insights, and Section 6 concludes with implications and directions for future research.

2. Literature review

This section highlights past vaccine distribution studies and underscores the need to integrate the vessel routing problem and its solution algorithms to ensure efficient vaccine delivery to remote islands. Over the past decades, extensive research has been conducted to explore the challenges associated with vaccine distribution and logistics (Cheyne, 1989; Xu et al., 2021a). The risks, uncertainties, and potential disruptions associated with the vaccine supply chain contribute to its fragility and complexity (Duijzer et al., 2018; Lin et al., 2020). Recent studies on vaccine distribution have highlighted issues such as cold-chain maintenance (Saif and Elhedhli, 2016), demand uncertainty (Shittu et al., 2016), and infrastructural limitations (Al Theeb et al., 2024) as challenges facing vaccine distribution. Notably, vaccine supply chains differ from traditional ones because of stringent temperature requirements and the need for urgent delivery (Al Theeb et al., 2024; Xiao et al., 2025). Effective distribution during the final stage of the vaccine supply chain is crucial for achieving large-scale immunisation coverage. In this regard, implementation studies, such as that by Pavoncello et al. (2025), emphasise the need for vaccine implementation frameworks to adapt to local contexts, particularly in low-resource and remote settings.

The complexity of deploying vaccines to vast and varied geographies has led researchers to explore a wide array of logistics and optimisation models (Kibanga et al., 2025; Xiao et al., 2025). However, most models have assumed a robust infrastructure, overlooking the heterogeneity of accessibility across regions and often neglecting the necessity of vaccine equity across these regions. Xu et al. (2021b) emphasised the importance of conducting a comprehensive analysis of optimal vaccine allocation strategies by considering various equity and efficiency objectives. Despite the progress, distributing vaccines to remote island regions remains a significant hurdle owing to the inherent logistical complexities of maritime transportation (Cano-Marin et al., 2023) and cold-chain distribution (Xiao et al., 2025). Past studies have often neglected the specific needs of geographically isolated areas, particularly remote islands in archipelagic settings. Furthermore, the maritime logistics for vaccine distribution has often been addressed as an afterthought rather than an integral part of the planning process.

The VRP has been extensively studied in the context of logistics. Many variants of the classical VRP, including time windows, capacity constraints, and multi-echelon structures, have been applied to cold-chain supply chain and vaccine delivery problems. However, most of these cold-chain VRPs involve land transportation and its associated uncertainties, such as traffic conditions, particularly in urban areas. For example, Al Theeb et al. (2024) presented a two-echelon VRP for developing countries involving mobile satellites serviced by large vehicles and small vehicles that pick up and deliver the vaccine to the final customers. The environmental aspect of routing problems has also gained traction. Guo et al. (2018) developed and solved VRPs for cold-chain distribution by considering multiple factors such as traffic status, total delivery cost, product condition, and carbon emissions. Song et al. (2020) modelled a standard VRP that incorporated different time windows, vehicle types, and varying levels of energy consumption as the key parameters. More recently, the incorporation of drones and hybrid delivery (utilising vehicles and drones) has been considered (Gunaratne et al., 2022). Although previous research has yielded positive results and has contributed to improving vaccine delivery to remote areas, vaccine delivery applications have been generally limited to land-based or short-range logistics (Petroianu et al., 2021). Prior research has seldom addressed the challenges associated with inter-island transportation, particularly those related to small-volume shipments predominantly managed by pioneer shipping networks.

Despite advances in routing research, the specific problem of vessel routing for vaccine delivery remains underexplored. Maritime routing increases the complexity of the routing problem because of sea conditions (Timas and Mohammadi, 2025) and irregular ship schedules (Seimetz Chagas et al., 2023). A recent study (Sharif et al., 2024) reviewed the state of the art in maritime transport optimisation, emphasising the importance of coordinated scheduling and routing of vessels to improve safety. Some researchers have addressed the complexities of ship-based delivery in archipelagic nations (Accorsi et al., 2021) and recommended customised routing algorithms with strict time windows under uncertainty (Fernández Cuesta et al., 2017; Seimetz Chagas et al., 2023; Timas and Mohammadi, 2025). However, these studies have shown a marked lack of integration between health logistics and maritime transport, especially in the case of remote islands with small-volume shipments. Therefore, this study aims to fill this gap in the literature by introducing the challenges facing vaccine distribution to remote islands within the framework of pioneer shipping.

The increasing focus on health equity demands that health logistics models accomplish more than cost minimisation and improvement in equality. Addressing the distribution of vaccines to remote islands and marginalised populations calls for a multi-criteria decision-making framework (Al Theeb et al., 2024). Owing to the combinatorial nature and dynamic constraints of the VRP, traditional optimisation methods can become computationally expensive or impractical for real-time applications. RL has recently emerged as a promising alternative for solving complex routing problems, enabling agents to learn optimal routing strategies through interactions with the environment (Phiboonbanakit et al., 2021). By exploring broader operational research models, this study proposes a variant of the RL model to solve the PVRP by incorporating not only the clustering process but also vaccine considerations into the attention mechanism.

3. Modelling framework

3.1 Optimisation problem

This study aimed to model the utilisation of vessels for distributing vaccines, which are cold-chain products. Vaccines are time- and temperature-sensitive; they may be damaged if not stored properly, thus necessitating fast transportation. Vessels used in pioneer shipping carry passengers and other goods in addition to vaccines. It is thus important to balance the shipping performance between the distribution of vaccines, transport of passengers, and transport of other goods. Therefore, the model was constructed as an optimisation problem that minimises not only the number of damaged vaccines but also the monetary cost of unserved passengers and other goods. In addition, the objective includes the product of the number of vaccines and the time required to reach the destination port (see Equation (1)), which emphasises that faster delivery of vaccines is preferred. In practice, this objective function minimises the time to ship the vaccines to the destination port while concurrently reducing the unserved demand of passengers and other goods as well as the number of damaged vaccines. Equation (1) also accounts for wave heights, which affect the shipment duration and could thereby damage the vaccines.

$$\min f = \min \left( \sum_{g = 1}^{3} \alpha_{g} X_{g} + \sum_{d \in D} \lambda_{d} \, t_{d}(\varsigma) \right) \tag{1}$$

$\alpha_{1}, \alpha_{2}, \alpha_{3}$: monetary values for an unserved passenger, other goods, and damaged vaccines, respectively.
$X_{1}, X_{2}, X_{3}$: total number of unserved passengers, other goods, and damaged vaccines, respectively.
$\lambda_{d}$: number of vaccines shipped to port d.
$t_{d}(\varsigma)$: time required for shipping vaccines to port d, which is influenced by the wave height $(\varsigma)$.
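To make the two terms of Equation (1) concrete, the following minimal sketch evaluates the objective for a given plan. The `Shipment` container, the function name, and all numbers are illustrative assumptions rather than values from the study; the travel times are taken as already computed for the prevailing wave height.

```python
from dataclasses import dataclass


@dataclass
class Shipment:
    """Vaccines bound for one destination port d (illustrative container)."""
    port: str
    vaccines: float      # lambda_d: number of vaccines shipped to port d
    travel_time: float   # t_d(varsigma): shipping time under the given waves


def objective(alpha, unserved, shipments):
    """Evaluate Equation (1): penalty cost plus vaccine-weighted delivery time."""
    if len(alpha) != 3 or len(unserved) != 3:
        raise ValueError("alpha and unserved need three entries (g = 1, 2, 3)")
    penalty = sum(a * x for a, x in zip(alpha, unserved))
    weighted_time = sum(s.vaccines * s.travel_time for s in shipments)
    return penalty + weighted_time


# Hypothetical numbers, chosen only to keep the arithmetic easy to follow.
alpha = [10.0, 5.0, 100.0]   # cost per unserved passenger, goods unit, damaged vaccine
unserved = [2, 4, 1]         # X_1, X_2, X_3
shipments = [Shipment("d1", 50, 12.0), Shipment("d2", 20, 30.0)]

value = objective(alpha, unserved, shipments)
print(value)  # 140 (penalties) + 1200 (vaccine-weighted time) = 1340.0
```

Because the second term multiplies the vaccine count by the travel time, a port receiving many vaccines contributes more to the objective for the same delay, which is what pushes such ports earlier in the route.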

This study also considered the utilisation of several vessels that can serve different routes simultaneously. The routes are first grouped within a cluster, such that the vessel serves only the ports within that cluster. To improve connectivity, we also assign a “hub” to connect seaports among different clusters. We designate $H$ as a set of hubs, namely $H = \{1, 2, \dots, h, \dots, H\}$, $D$ as a set of destination ports, where $D = \{1, 2, \dots, d, \dots, D\}$, and $K$ as a set of vessels, namely $K = \{1, 2, \dots, k, \dots, K\}$, where each vessel services a specific cluster route r. The destination ports are then clustered into the route r, namely $D^{r} = \{1, 2, \dots, i, \dots, D^{r}\}$. Let us assume that $P_{ij}^{kr}$ is a binary variable, which has a value of 1 if a vessel k sails from port i to j on a route r (i.e. $\forall i, j \in H \cup D^{r}$) and 0 otherwise.

Constraints (2) and (3) construct the shipping system between the hub and destination port. We assume $\tau_{hr}$ is a binary variable with a value of 1 if a vessel moves from the hub $h$ to the destination port within the cluster route r, and value 0 otherwise. Constraints (2) and (3) state that if node $i \in D^{r}$ is selected as the destination port, then vessel k from hub h must visit $i$; if it is not selected as the destination port, no vessel from any of the hubs may visit it.

$$\sum_{k \in K} \sum_{i \in D^{r}} \sum_{j \in D^{r}} P_{ij}^{kr} \geq \tau_{hr}, \quad \forall r \in R, h \in H \tag{2}$$

$$\sum_{k \in K} \sum_{i \in D^{r}} \sum_{j \in D^{r}} P_{ij}^{kr} \leq \tau_{hr} \psi, \quad \forall r \in R, h \in H \tag{3}$$

where $\psi$ is a large number.

Constraint (4) ensures that each port is visited at most once by a single vessel and allocated to a specific route cluster. Constraint (5) ensures that each vessel that visits a port also departs from that port. Constraint (6) ensures that no destination port in one cluster may be visited by a vessel assigned to a different cluster, and Constraint (7) ensures that no vessel is allowed to travel from one seaport directly back to the same seaport.

$$\sum_{k \in K} \sum_{r \in R} \sum_{i \in D^{r}} P_{ij}^{kr} \leq 1, \quad \forall j \in D^{r} \tag{4}$$

$$\sum_{k \in K} \sum_{r \in R} \sum_{i \in D^{r}} P_{ij}^{kr} = \sum_{k \in K} \sum_{r \in R} \sum_{i \in D^{r}} P_{ji}^{kr}, \quad \forall j \in D^{r} \tag{5}$$

$$\sum_{k \in K} \sum_{r \in R} P_{ij}^{kr} = 0, \quad \forall i \in D^{r_{1}}, j \in D^{r_{2}}, r_{1} \in R, r_{2} \in R, r_{1} \neq r_{2} \tag{6}$$

$$P_{ij}^{kr} = 0, \quad \forall i \in D^{r}, j \in D, i = j \tag{7}$$
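As a rough illustration of what Constraints (4) to (7) rule out, the sketch below checks a candidate set of sailing legs against them. The data layout (a dictionary of legs per vessel and route) and the function name are assumptions made for this example; legs to and from the hub are left out, since those are governed by Constraints (2) and (3).

```python
from collections import Counter


def check_routing(arcs, clusters):
    """List the violations of Constraints (4)-(7) in a candidate plan.

    arcs:     {(vessel, route): [(i, j), ...]}, the legs with P_ij^{kr} = 1
    clusters: {route: set of destination ports D^r}
    """
    violations = []
    port_route = {p: r for r, ports in clusters.items() for p in ports}
    inbound, outbound = Counter(), Counter()

    for (_vessel, route), legs in arcs.items():
        for i, j in legs:
            if i == j:
                violations.append(f"(7) self-loop at {i}")
            if port_route.get(i) != route or port_route.get(j) != route:
                violations.append(f"(6) leg {i}->{j} leaves cluster {route}")
            outbound[i] += 1
            inbound[j] += 1

    for port in sorted(port_route):
        if inbound[port] > 1:
            violations.append(f"(4) {port} entered {inbound[port]} times")
        if inbound[port] != outbound[port]:
            violations.append(f"(5) arrivals and departures differ at {port}")
    return violations


# Hypothetical clusters and plans.
clusters = {"r1": {"A", "B", "C"}, "r2": {"D", "E"}}
good = {("k1", "r1"): [("A", "B"), ("B", "C"), ("C", "A")],
        ("k2", "r2"): [("D", "E"), ("E", "D")]}
bad = {("k1", "r1"): [("A", "B"), ("B", "D")],
       ("k2", "r2"): [("E", "E")]}

print(check_routing(good, clusters))  # no violations
print(check_routing(bad, clusters))   # cross-cluster leg, self-loop, unbalanced ports
```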

Constraint (8) guarantees that the number of passengers (g = 1), goods (g = 2), and vaccines (g = 3) transported from all destination ports, for all routes to the destination port j, must equal the total demand at port j ($S_{j}^{g}$), and is tightened with Constraints (9) to (11). Note that $y_{ij}^{kg}$ denotes the number of passengers, goods, or vaccines shipped by vessel k from port i to port j.

$$\sum_{r \in R} \sum_{k \in K} \sum_{i \in D^{r}} \sum_{j \in D^{r}} y_{ij}^{kg} = \sum_{j \in D} S_{j}^{g}, \quad \forall g \in \{1, 2, 3\} \tag{8}$$

$$y_{ij}^{kg} \geq P_{ij}^{kr}, \quad \forall g \in \{1, 2, 3\}, i \in D^{r}, j \in D^{r}, r \in R, k \in K \tag{9}$$

$$y_{ij}^{kg} \leq P_{ij}^{kr} \psi, \quad \forall g \in \{1, 2, 3\}, i \in D^{r}, j \in D^{r}, r \in R, k \in K \tag{10}$$

$$\sum_{j \in D} S_{j}^{g} = \sum_{r \in R} \sum_{k \in K} \sum_{i \in D^{r}} \sum_{j \in D^{r}} y_{ij}^{kg} P_{ij}^{kr}, \quad \forall g \in \{1, 2, 3\} \tag{11}$$

Constraint (12) states that the difference in the load carried by a vessel immediately before and after visiting a seaport must equal the demand at the seaport. Constraint (13) ensures that the load carried by a vessel is less than or equal to the capacity ($u^{k}$) of the vessel. These constraints are tightened by Constraint (14), which guarantees that the load carried by a vessel immediately before visiting a seaport does not exceed the capacity of the vessel minus the demand at the last visited seaport; Constraint (15), which ensures that the load carried by a vessel immediately before visiting a port is at least equal to the demand; and Constraint (16), which states that no cargo is allowed to be transported to and from the same seaport on a single route.

$$\sum_{i \in D^{r}} y_{ij}^{kg} - \sum_{i \in D^{r}} y_{ji}^{kg} = S_{j}^{g}, \quad \forall g \in \{1, 2, 3\}, j \in D^{r} \tag{12}$$

$$y_{ij}^{kg} \leq u^{k}, \quad \forall g \in \{1, 2, 3\}, i \in D^{r}, j \in D^{r}, r \in R, k \in K \tag{13}$$

$$y_{ij}^{kg} \leq \sum_{i \in D^{r}} \sum_{j \in D^{r}} (u^{k} - S_{i}^{g}) P_{ij}^{kr}, \quad \forall g \in \{1, 2, 3\}, r \in R, k \in K \tag{14}$$

$$y_{ij}^{kg} \geq \sum_{i \in D^{r}} \sum_{j \in D^{r}} S_{i}^{g} P_{ij}^{kr}, \quad \forall g \in \{1, 2, 3\}, r \in R, k \in K \tag{15}$$

$$y_{ij}^{kg} = 0, \quad \forall i \in D^{r}, j \in D, i = j \tag{16}$$
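The load bookkeeping behind Constraints (12) and (13) can be sketched for a single vessel, a single commodity, and a fixed port sequence. This is a simplified, delivery-only illustration with made-up demands; the function names are not from the study.

```python
def leg_loads(route, demand):
    """Load on board for each leg when demand[j] is delivered at every stop.

    route:  ordered ports, with route[0] the hub where the vessel is loaded
    demand: {j: S_j} for one commodity g
    """
    remaining = sum(demand[p] for p in route[1:])
    loads = {}
    for i, j in zip(route, route[1:]):
        loads[(i, j)] = remaining   # y_ij: load carried while sailing i -> j
        remaining -= demand[j]      # Constraint (12): inflow - outflow = S_j
    return loads


def within_capacity(loads, capacity):
    """Constraint (13): no leg may carry more than the vessel capacity u^k."""
    return all(load <= capacity for load in loads.values())


# Hypothetical demands at three ports served from one hub.
route = ["hub", "A", "B", "C"]
demand = {"A": 30, "B": 20, "C": 10}
loads = leg_loads(route, demand)

print(loads)                       # 60 on hub->A, 30 on A->B, 10 on B->C
print(within_capacity(loads, 60))  # True
print(within_capacity(loads, 50))  # False: the first leg is overloaded
```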

The intermediate variable $t_{j}$ represents a time stamp. In Constraint (17), the arrival time at port j is at least the time stamp at the preceding port i plus the voyage time between the two ports and the loading/unloading duration at port j ($Z_{j}$). Note that $e_{ij}$ represents the distance between ports i and j, and $z_{k}(\varsigma)$ denotes the velocity of vessel k, which is influenced by the wave height $(\varsigma)$. Constraint (18) imposes the condition that the arrival time of the vaccines at the destination seaport j must not exceed the maximum voyage duration of the vaccines (T). Constraints (19) and (20) set the values of the time stamps at the hub and destination ports. Constraint (21) imposes non-negativity on the shipment variables, and Constraints (22) and (23) define the binary decision variables.

$$t_{j} \geq t_{i} + \frac{e_{ij}}{z_{k}(\varsigma)} + Z_{j} - (1 - P_{ij}^{kr}) \psi, \quad \forall i \in D^{r}, j \in D^{r}, r \in R, k \in K \tag{17}$$

$$t_{j} S_{j}^{3} \leq T S_{j}^{3}, \quad \forall i \in D^{r}, j \in D^{r}, r \in R, k \in K \tag{18}$$

$$t_{h} = 0, \quad \forall h \in H \tag{19}$$

$$t_{j} \geq 0, \quad \forall j \in D^{r} \tag{20}$$

$$y_{ij}^{kg} \geq 0, \quad \forall g \in \{1, 2, 3\}, r \in R, k \in K \tag{21}$$

$$P_{ij}^{kr} \in \{0, 1\}, \quad \forall i \in D^{r}, j \in D^{r}, r \in R, k \in K \tag{22}$$

$$\tau_{hr} \in \{0, 1\}, \quad \forall h \in H, r \in R \tag{23}$$
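The timing logic of Constraints (17) to (19) can be illustrated by propagating time stamps along a fixed port sequence. The text does not specify how wave height maps to vessel speed, so `linear_speed` below is a purely hypothetical stand-in, and all distances, durations, and the limit T are invented for the example.

```python
def arrival_times(route, distance, handling, speed_fn, wave_height):
    """Propagate Constraint (17) with equality along a fixed port sequence.

    route:    ordered ports, route[0] is the hub with t_h = 0 (Constraint (19))
    distance: {(i, j): e_ij}
    handling: {j: Z_j}, loading/unloading duration at port j
    speed_fn: maps wave height to vessel speed z_k(varsigma)
    """
    speed = speed_fn(wave_height)
    if speed <= 0:
        raise ValueError("vessel speed must be positive")
    times = {route[0]: 0.0}
    for i, j in zip(route, route[1:]):
        times[j] = times[i] + distance[(i, j)] / speed + handling[j]
    return times


def late_vaccine_ports(times, vaccine_demand, max_duration):
    """Ports with vaccine demand that are reached after T (Constraint (18))."""
    return [j for j, t in times.items()
            if vaccine_demand.get(j, 0) > 0 and t > max_duration]


def linear_speed(wave_height):
    """Hypothetical model: 12 knots in calm water, 1.5 knots lost per metre of wave."""
    return max(12.0 - 1.5 * wave_height, 1.0)


route = ["hub", "A", "B"]
distance = {("hub", "A"): 60.0, ("A", "B"): 90.0}   # nautical miles (made up)
handling = {"A": 2.0, "B": 2.0}                     # hours (made up)
vaccine_demand = {"A": 100, "B": 40}

calm = arrival_times(route, distance, handling, linear_speed, wave_height=0.0)
rough = arrival_times(route, distance, handling, linear_speed, wave_height=4.0)
print(calm, late_vaccine_ports(calm, vaccine_demand, max_duration=24.0))
print(rough, late_vaccine_ports(rough, vaccine_demand, max_duration=24.0))
```

Under this assumed speed model, the same route that meets the 24-hour limit in calm water misses it at port B when waves reach 4 m, which is the mechanism by which wave height feeds into vaccine damage.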

3.2 Solution approach

The optimisation problem is solved by proposing a variant of RL, specifically by integrating the attention mechanism concept (Veličković et al., 2018). The attention mechanism provides context values (i.e. attention values) for each destination to be visited. These values can be subsequently utilised to prioritise the port order within the routing problem. The context values are determined by multiplying the attention features by the weight matrix. We specifically used the number of passengers, amount of goods, weight of vaccines, and average shipping time of the seaport as the features (i.e. feat), as shown in Figure 1. The average shipping time considers the vessel speed, which is affected by the wave height. Including this shipping time among the attention features means that the vessel routing decision accounts for an essential shipping factor, namely the wave height. To make the features comparable, we normalised the values to the range of 0–1 using a sigmoid function.
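The normalisation step can be sketched as follows, assuming the standard logistic sigmoid. The raw values are the three-feature examples listed for Seaports A, B, and C in Figure 2(b); the function names are illustrative.

```python
import math


def sigmoid(x):
    """Standard logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def normalise_features(feat):
    """Apply the sigmoid element-wise to a list of per-port feature rows."""
    return [[sigmoid(v) for v in row] for row in feat]


# Raw feature rows as listed in Figure 2(b).
feat = [
    [0.6, 1.4, 2.0],   # Seaport A
    [1.2, 2.4, 1.0],   # Seaport B
    [2.2, 0.4, 2.0],   # Seaport C
]
norm = normalise_features(feat)
for row in norm:
    print([round(v, 3) for v in row])
```

The results agree with the normalised values shown in Figure 2(b) to within 0.01; the figure appears to truncate rather than round to two decimals.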

Figure 1

A diagram illustrates the attention feature in reinforcement learning (RL).


The diagram is divided into two parts: (a) a map and (b) a feature table. Part (a): A simplified map shows several seaports labeled “Seaport A” at the top, “Seaport B” on the right, and “Seaport N” at the bottom. Ships are depicted traveling between these seaports, with curved lines indicating routes. Curved lines connect “Seaport A” to “Seaport B” and “Seaport B” to “Seaport C.” Part (b): A table titled “Feature Seaport A” shows four columns representing “Feature-1” (total passengers from and to Seaport A), “Feature-2” (total amount of goods from and to Seaport A), “Feature-3” (total amount of vaccines at Seaport A), and “Feature-4” (average shipping time to Seaport A from other seaports). Below this, sub-tables for “Feature Seaport A,” “Feature Seaport B,” and “Feature Seaport C” are arranged in a horizontal series, with three horizontal dots shown between “Feature Seaport B” and “Feature Seaport C.” Each sub-table contains four numerical values: 0.6, 1.4, 2.0, and 2.0.

Illustration of attention feature in RL. Source: Created by authors

We introduce a share matrix (W) to calculate the importance of node i to node j ($\mathrm{import}_{ij}$) by multiplying feat by its share matrix value W and the adjacency matrix (see Figure 2 (c)–(f)). A larger value of $\mathrm{import}_{ij}$ represents a greater importance of node i to node j; hence, it attracts greater attention in terms of number of visits. The attention coefficients ($\emptyset_{ij}$) normalise the importance value of node i to node j by the sum over its neighbours (see Figure 2 (h)), using the leaky ReLU-based function shown in Equation (24).
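The importance calculation can be checked against the numbers shown in Figure 2. The sketch below multiplies the normalised feature matrix of panel (c) by the share matrix of panel (d). How the adjacency matrix of panel (e) enters the product is not spelled out in the text, so it is left out here; this is an assumption of the sketch. The feature-share product on its own already matches the importance matrix of panel (f) to within 0.01.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Values as listed in Figure 2, panels (c) and (d).
norm_feat = [
    [0.64, 0.80, 0.88],
    [0.76, 0.91, 0.73],
    [0.90, 0.59, 0.88],
]
share = [
    [0.2, 0.4, 0.4],
    [0.4, 0.3, 0.3],
    [0.1, 0.2, 0.7],
]

importance = matmul(norm_feat, share)
for row in importance:
    print([round(v, 3) for v in row])
```

In the full method the share matrix is not fixed as it is here: its entries are the quantities the RL agent adjusts.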

Figure 2

A diagram illustrates self-attention estimation from features and a share matrix in reinforcement learning.


The diagram is divided into nine parts (a) through (i). The main calculation is detailed at the top right: “Multiply feature, share, and adjacency matrix.” The process then continues with the text “Calculate ‘self attention’ of each customer.”
Part (a): A simplified map with several seaports labeled “Seaport A” at the top, “Seaport B” on the right, and “Seaport N” at the bottom. Ships are depicted traveling between these seaports, with curved lines indicating routes. Curved lines connect “Seaport A” to “Seaport B” and “Seaport B” to “Seaport C.”
Part (b): Three tables for “Feature Seaport A,” “Feature Seaport B,” and “Feature Seaport C,” arranged in a horizontal sequence, with three horizontal dots shown between “Feature Seaport B” and “Feature Seaport C.” Each table contains numerical values across three columns. The values for “Feature Seaport A” are 0.6, 1.4, and 2.0. The values for “Feature Seaport B” are 1.2, 2.4, and 1.0. The values for “Feature Seaport C” are 2.2, 0.4, and 2.0. Below these tables, the “Normalised Feature” tables are shown. The values for “Normalised Feature Seaport A” are 0.64, 0.80, 0.88. The values for “Normalised Feature Seaport B” are 0.76, 0.91, 0.73. The values for “Normalised Feature Seaport N” are 0.90, 0.59, 0.88.
Part (c): The “normalised features” matrix, displayed as a 3 by 3 grid using the normalised data from part (b). Row 1: 0.64, 0.80, 0.88. Row 2: 0.76, 0.91, 0.73. Row 3: 0.90, 0.59, 0.88. Three vertical dots are shown between rows 2 and 3.
Part (d): The “share matrix,” displayed as a 3 by 3 grid. Row 1: 0.2, 0.4, 0.4. Row 2: 0.4, 0.3, 0.3. Row 3: 0.1, 0.2, 0.7.
Part (e): The “adjacency matrix,” displayed as a 3 by 3 grid. Row 1: 0, 1, 1. Row 2: 1, 0, 1. Row 3: 1, 1, 0. (Note: The second and third rows and the second and third columns are obscured by a dot.)
Part (f): The “importance matrix,” the result of the multiplication of (c), (d), and (e), displayed as a 3 by 3 grid. Row 1: 0.54, 0.67, 1.11. Row 2: 0.59, 0.72, 1.08. Row 3: 0.50, 0.71, 1.15. Three vertical dots are shown between rows 2 and 3.
Part (g): The application of a “Leaky ReLU” activation function to the importance matrix (f), displayed as a 3 by 3 grid. Row 1: 0.54, 0.67, 1.11. Row 2: 0.59, 0.72, 1.08. Row 3: 0.50, 0.71, 1.15.
Part (h): The “attention coefficient” matrix, which is derived from (g) (implied by arrows) and contains different values, displayed as a 3 by 3 grid. Row 1: 0.23, 0.29, 0.47. Row 2: 0.24, 0.30, 0.45. Row 3: 0.21, 0.30, 0.48.
Part (i): The final “self attention” feature table, where the attention coefficient values are aggregated, shown as a column vector with labels: “Seaport A”: 0.67, “Seaport B”: 0.72, and “Seaport N”: 0.71. Three vertical dots are shown between “Seaport B” and “Seaport C.” A leftward arrow is shown between part (g) and part (h), and between part (h) and part (i).

Self-attention estimation from features and share matrix. Source: Created by authors


$$\emptyset_{ij} = \frac{\exp(\mathrm{leakyReLU}(\mathrm{import}_{ij}))}{\sum_{j' \in N_{i}'} \exp(\mathrm{leakyReLU}(\mathrm{import}_{ij'}))} \tag{24}$$
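Equation (24) is a softmax over the leaky-ReLU-transformed importance values of each port's neighbours. A minimal sketch follows; the negative slope of 0.2 and the small importance matrix are assumptions for illustration, as the text does not state the slope, and the function names are not from the study.

```python
import math


def leaky_relu(x, negative_slope=0.2):
    """Leaky ReLU: identity for x >= 0, scaled by negative_slope otherwise."""
    return x if x >= 0 else negative_slope * x


def attention_coefficients(importance, adjacency, negative_slope=0.2):
    """Equation (24): softmax over each port's neighbours N_i'.

    importance: square matrix of import_ij values
    adjacency:  square 0/1 matrix; adjacency[i][j] = 1 marks j as a neighbour of i
    Non-neighbours receive a coefficient of 0.
    """
    coeffs = []
    for i, row in enumerate(importance):
        neighbours = [j for j, a in enumerate(adjacency[i]) if a]
        scores = {j: math.exp(leaky_relu(row[j], negative_slope))
                  for j in neighbours}
        total = sum(scores.values())
        coeffs.append([scores[j] / total if j in scores else 0.0
                       for j in range(len(row))])
    return coeffs


# Hypothetical importance values, including a negative entry to show the
# effect of the leaky slope.
importance = [
    [0.0, 1.0, -1.0],
    [0.5, 0.0, 0.5],
    [2.0, 1.0, 0.0],
]
adjacency = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
coeffs = attention_coefficients(importance, adjacency)
for row in coeffs:
    print([round(v, 3) for v in row])
```

Each row sums to 1 over the neighbours of that port, so the coefficients can be read as the share of attention port i gives to each neighbour.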

where $j'$ is a neighbour of port i (i.e. $j' \in N_{i}'$). The self-attention value of each port i ($l_{i}$) is then calculated using Equation (25) and is further used to determine the priority list of customers to be visited (Algorithm 1). The RL agent is employed to seek the value of the share matrix that affects the self-attention mechanism.

l_{i} = \sum_{j' \in N_{i}'} \emptyset_{ij'} l_{j'}

(25)
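Equations (24) and (25) can be traced with a minimal Python sketch. The importance matrix is copied from the Figure 2 description; the leaky ReLU slope, the neighbour lists, the per-port values fed into Equation (25), and the rule that a higher score means a higher priority are all assumptions made for illustration. Because the sketch applies the exponential form of Equation (24) literally, its output need not coincide with the rounded values shown in panel (h) of Figure 2.

```python
import math


def leaky_relu(x, slope=0.2):
    """Leaky ReLU; the negative slope of 0.2 is an assumed value."""
    return x if x > 0 else slope * x


def attention_coefficients(importance, neighbours):
    """Equation (24): softmax of leakyReLU(import_ij) over the neighbours of port i."""
    coeffs = []
    for i, row in enumerate(importance):
        scores = {j: math.exp(leaky_relu(row[j])) for j in neighbours[i]}
        total = sum(scores.values())
        coeffs.append({j: s / total for j, s in scores.items()})
    return coeffs


def self_attention(coeffs, values):
    """Equation (25): attention-weighted sum of the neighbours' values."""
    return [sum(c * values[j] for j, c in row.items()) for row in coeffs]


# Importance matrix taken from the Figure 2 description (three ports).
importance = [
    [0.54, 0.67, 1.11],
    [0.59, 0.72, 1.08],
    [0.50, 0.71, 1.15],
]
# Assumption: every port attends to all three columns.
neighbours = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
# Assumption: placeholder per-port values, used only to exercise Equation (25).
values = [1.0, 2.0, 3.0]

coeffs = attention_coefficients(importance, neighbours)
scores = self_attention(coeffs, values)
# Assumption: ports with a higher self-attention value are visited first.
priority = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(coeffs[0], scores, priority)
```

Restricting `neighbours[i]` to the non-zero entries of the adjacency matrix would limit the normalisation to directly connected ports.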

In addition to vessel routing, we trained the RL to cluster the seaports in each route. Pioneer shipping facilitates the simultaneous operation of several routes, with each route serviced by a vessel. We then trained the RL agent to determine the cluster of ports for each route to be serviced. The clustering procedure operates under the premise of port closeness, whereby a closer port possesses a greater likelihood of being assigned within a route. By using an approach similar to routing, the agent is trained to decide the weight of port closeness. We define the closeness of a port as the difference between the minimum and maximum distances to reach another seaport. A low difference indicates closeness to the other port. The agent weighs this closeness to cluster the seaports evenly along each route (see Figure 3).

Figure 3

A diagram illustrates the seaport clustering process, divided into four parts, (a) through (d).

Part (a): It shows a simplified map with several seaports labeled “Seaport A” at the top, “Seaport B” on the right, “Seaport C” at the bottom, and “Seaport D” on the left. A ship is depicted traveling between seaports A and B. Part (b): It presents the “Seaport Distance Matrix,” which provides the distances between the seaports. The columns and rows are labeled with “A,” “B,” “C,” and “D.” There is also an additional column labeled “Max to Min (Closeness).” The row entries are as follows: Row 1 (A): B: 10. C: 30. D: 15. Max to Min: 30 to 10. Row 2 (B): A: 10. C: 20. D: 40. Max to Min: 40 to 10. Row 3 (C): A: 30. B: 20. D: 10. Max to Min: 30 to 10. Row 4 (D): A: 15. B: 40. C: 10. Max to Min: 40 to 10. Part (c): It illustrates the calculation of “Weighted Closeness” by multiplying “Closeness” by an “Agent Action” factor: The “Closeness” values for the four seaports are 20, 30, 20, and 30. The corresponding “Agent Action” factors are 0.2, 0.4, 0.1, and 0.3. The resulting “Weighted Closeness” values are 8 (20 times 0.4), 6 (30 times 0.2), 2 (20 times 0.1), and 9 (30 times 0.3). Part (d): It shows the final steps of sorting the seaports and assigning them to routes: The table shows the initial “Seaport” and “Weighted Closeness” pairs: A with 8, B with 6, C with 2, and D with 9. The next step, “Sort the weighted Penalty,” reorders the seaports based on their weighted closeness: C with 2, B with 6, A with 8, and D with 9. Based on the “Number Seaports per Route: 2,” the seaports are then “Assigned the seaports evenly into a route”: The “1st Route” is “C and B.” The “2nd Route” is “A and D.”

Seaport clustered process. Source: Created by authors
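The clustering steps of Figure 3 can be reproduced with a short Python sketch. The distance matrix follows the figure description; the function names are illustrative, and the agent weights are assumed values chosen so that the weighted closeness equals the products quoted for panel (c) (8, 6, 2, and 9).

```python
def closeness(distances):
    """Closeness of each port: maximum minus minimum distance to the other ports."""
    result = {}
    for port, row in distances.items():
        others = [d for other, d in row.items() if other != port]
        result[port] = max(others) - min(others)
    return result


def cluster_ports(distances, weights, ports_per_route):
    """Weight the closeness, sort ascending, and cut the sorted list into routes."""
    close = closeness(distances)
    weighted = {port: close[port] * weights[port] for port in close}
    ordered = sorted(weighted, key=weighted.get)
    return [ordered[i:i + ports_per_route]
            for i in range(0, len(ordered), ports_per_route)]


# Distances from the Figure 3 description.
distances = {
    "A": {"B": 10, "C": 30, "D": 15},
    "B": {"A": 10, "C": 20, "D": 40},
    "C": {"A": 30, "B": 20, "D": 10},
    "D": {"A": 15, "B": 40, "C": 10},
}
# Assumed agent action per port (in the model, this is what the RL agent outputs).
weights = {"A": 0.4, "B": 0.2, "C": 0.1, "D": 0.3}

routes = cluster_ports(distances, weights, ports_per_route=2)
print(routes)  # [['C', 'B'], ['A', 'D']]
```

The result matches panel (d): the first route serves C and B, and the second serves A and D.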

We employed the deep deterministic policy gradient (DDPG) algorithm to determine the share matrix for routing and the weights of port closeness (see Figure 3(c)). The DDPG agent was trained to maximise the cumulative reward, which is the sum of the discounted rewards obtained until the end of the episode (M) (Equation (26)). Here, $γ^{m}$ denotes the discount applied to the reward at episode m; discounting implies that immediate rewards are more valuable than later rewards. Furthermore, $r_{v + m}$ denotes the reward obtained at step v of episode m.

\sum_{m = 1}^{M} γ^{m} r_{v + m}

(26)

In the context of the vessel routing problem, the reward relates to the objective function (1). However, owing to the inherent characteristic of the RL agent, which seeks to maximise the rewards, we reformulate the negative of the objective function as a reward to be maximised, as shown in Equation (27).

r_{v + m} = - f

(27)
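As a small illustration of the cumulative reward in Equation (26), with each reward taken as the negative of the objective value f as described in the text, consider the following sketch. The objective values are made-up numbers; a discount factor of 1 corresponds to the setting reported in Table 5, and 0.5 is shown only for contrast.

```python
def reward_from_objective(objective_value):
    """Reward as the negative of the objective value f, so that maximising
    the reward minimises the objective."""
    return -objective_value


def discounted_return(rewards, gamma):
    """Equation (26): sum of gamma**m * r_{v+m} for m = 1..M."""
    return sum(gamma ** m * r for m, r in enumerate(rewards, start=1))


# Hypothetical objective values observed over three steps.
objective_values = [120.0, 80.0, 40.0]
rewards = [reward_from_objective(f) for f in objective_values]

print(discounted_return(rewards, gamma=1.0))  # -240.0
print(discounted_return(rewards, gamma=0.5))  # -85.0
```

With a discount factor of 1, every step contributes equally, so the return is simply the negative of the summed objective values.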

The action $a$ ($a \in A$) taken by the agent is related to the values of the share matrix for deciding the routing (see Figure 2) and the weights of closeness for deciding the cluster route (see Figure 3). The actions subsequently affect the environmental conditions, which pertain to the shipping process, including the arrival time at the port, passenger count, cargo tonnage, quantity of vaccines shipped, and percentage of vaccine damage. The agent establishes a particular strategy, known as a policy ($π$), to achieve a greater return. A policy $π (s, a)$ establishes a probability distribution over the actions ($a \in A$) for each state ($s \in S$); that is, it maps a state to the probability of selecting each action. Let us denote the state at step v as $s_{v}$. Assume that the agent selects an action based on the policy and obtains an expected return from a given state, as defined in Equation (28).

V_{π} = E_{π} [\sum_{m = 0}^{M} γ^{m} r_{v + m} | s_{v} = s]

(28)

The DDPG approach uses an actor–critic RL agent that employs neural networks to seek an optimal policy that maximises the expected return. The actor and critic networks are randomly initialised with parameters $θ^{μ}$ and $θ^{Q}$, respectively. During each episode, the actor associates the state $s_{v}$ with the action $a_{v}$ at each step and subsequently transmits it to the environment. This state is also utilised for output estimation and loss value calculation. The environment provides the agent with the next state $s_{v + 1}$ and the estimated reward $r_{v}$ as feedback. This feedback is used to update the actor and critic networks. The replay memory holds a collection of states, actions, rewards, and next states. Once the data are gathered, a small set of experiences is randomly selected from the replay memory. The target Q-value for each state–action pair in this set is then calculated using Equation (29). This target Q-value is also utilised to update the critic network by minimising the loss, as described by Equation (30). The Q-value represents the mapping of the state–action pair with its corresponding return value.

G_{v} = r_{v} + γ Q^{'} (s_{v + 1}, a_{v + 1})

(29)

L = \frac{1}{N} \sum_{v} (G_{v} - Q (s_{v}, a_{v} | θ^{Q}))^{2}

(30)
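The target in Equation (29) and the critic loss in Equation (30) can be sketched in plain Python. The scalar functions below are toy stand-ins for the actor and critic networks, and the replay-memory contents are made-up numbers; both are assumptions for illustration rather than the networks used in the study.

```python
import random


def td_targets(batch, gamma, target_actor, target_critic):
    """Equation (29): G_v = r_v + gamma * Q'(s_{v+1}, a_{v+1})."""
    targets = []
    for _state, _action, reward, next_state in batch:
        next_action = target_actor(next_state)
        targets.append(reward + gamma * target_critic(next_state, next_action))
    return targets


def critic_loss(batch, targets, critic):
    """Equation (30): mean squared error between targets and Q(s_v, a_v)."""
    errors = [g - critic(s, a) for (s, a, _r, _s2), g in zip(batch, targets)]
    return sum(e * e for e in errors) / len(errors)


# Toy scalar stand-ins for the neural networks (illustrative assumptions).
def critic(s, a):
    return 0.5 * s + a


def target_critic(s, a):
    return 0.4 * s + a


def target_actor(s):
    return 0.1 * s


# Replay memory of (state, action, reward, next_state) tuples with made-up values.
replay_memory = [(float(v), 0.1 * v, -float(v), float(v + 1)) for v in range(10)]
rng = random.Random(0)
batch = rng.sample(replay_memory, 4)  # random minibatch of experiences

targets = td_targets(batch, 1.0, target_actor, target_critic)
loss = critic_loss(batch, targets, critic)
print(loss)
```

In a full DDPG implementation, the loss would be minimised by gradient descent on the critic parameters, and the target networks would track the main networks slowly.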

During optimisation, the actor and critic networks aim to maximise the expected return by iteratively updating their parameters. The next essential step in optimising the actor network, particularly the policy function, involves calculating the gradient of the expected Q-value with respect to the policy parameters $θ^{μ}$. This computation improves the expected cumulative rewards. The general procedure for applying the DDPG is outlined in Algorithm 1 (see Appendix).

4. Numerical experiment

The proposed model was tested by applying it to an actual pioneer shipping network in East Nusa Tenggara, which comprises 10 destination ports (see Figure 4). The distances between the seaports were calculated using the ship distance calculator (https://www.bednblue.com/sailing-distance-calculator) and are summarised in Table 1.

Figure 4

A map illustrates a test network, featuring a green outline of an archipelago with multiple islands.

Several specific locations are identified and marked with a ship anchor icon, indicating seaports or points of interest within this network. On the upper island, four locations are marked: “Waiwole,” “Maumbawa,” “P Ende,” and “Ende.” On the lower-left island, two locations are marked: “Mamboro” and “Waingapu.” Further to the southeast, a chain of smaller islands shows three marked locations: “Raijua,” “Sabu,” and “Ndao,” and on the easternmost island, “Kupang” is marked.

Test network. Source: Courtesy of OpenStreetMap contributors (2024) with authors’ addition

Table 1

Matrix distances between seaports (in miles)

Seaport	Mamboro	Waingapu	Waiwole	Maumbawa	Ende	P Ende	Raijua	Sabu	Ndao	Kupang
Mamboro	0	77.2	84.7	106.6	151.2	144.8	176.5	210.7	247.3	282.4
Waingapu	77.2	0	55.7	61.7	96.8	87.2	112.5	142.6	188.1	22.1
Waiwole	84.7	55.7	0	21.1	59.7	57	116.7	149	182.7	190.6
Maumbawa	106.6	61.7	21.1	0	40.9	37.4	107.5	148.4	164.8	176.6
Ende	151.2	96.8	59.7	40.9	0	7.9	109	129.1	141.1	153.9
P Ende	144.8	87.2	57	37.4	7.9	0	101.1	121.2	133.2	146
Raijua	176.5	112.5	116.7	107.5	109	101.1	0	78.5	17.1	123.9
Sabu	210.7	142.6	149	148.4	129.1	121.2	78.5	0	60.8	101.2
Ndao	247.3	188.1	182.7	164.8	141.1	133.2	17.1	60.8	0	60.4
Kupang	282.4	22.1	190.6	176.6	153.9	146	123.9	101.2	51.4	0

Source(s): Created by authors

Considering the safety of the vaccine product as the highest priority, we posit that the penalty for damage to the vaccine product should be the highest. Furthermore, the penalty for undelivered cargo, which often consists of primary products, is larger than that for the passenger count. We set the penalty values (i.e. $α_{g}$) as follows: $α_{1}$ = 50, $α_{2}$ = 75, $α_{3}$ = 100. Owing to insufficient data, we used assumptions regarding the demand for passengers, goods, and vaccine products. Because of the limited need for pioneer shipping, we assumed that the demand adheres to a uniform distribution, characterised by specific minimum and maximum values. Although this assumption may not be universally applicable to pioneer shipping, it can be regarded as a component of the problem solved by RL agents. Considering that RL is well suited to complex and dynamic problems, this assumption should be addressed in future studies to train the agent on a broader range of problems. The demand matrices for passengers, goods, and vaccine products are presented in Tables 2–4, which were created randomly under the premise of a uniform distribution. We established the minimum values as 2, 1, and 0.2 for passengers, commodities, and vaccines, respectively, and the maximum values as 50, 2, and 1 for the same categories. Each route is assumed to be serviced by a single vessel that can carry up to 500 passengers and 50 tonnes of freight. Additionally, we assume the vessel can sail at a maximum speed of 15 knots.
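A minimal sketch of how such uniformly distributed demand could be generated is shown below. The function names and the seed are illustrative assumptions, so the numbers produced will not reproduce Tables 2–4.

```python
import random

PORTS = ["Mamboro", "Waingapu", "Waiwole", "Maumbawa", "Ende",
         "P Ende", "Raijua", "Sabu", "Ndao", "Kupang"]


def uniform_demand_matrix(ports, low, high, rng):
    """Origin-destination demand drawn uniformly from the integers low..high,
    with zero demand from a port to itself."""
    n = len(ports)
    return [[0 if i == j else rng.randint(low, high) for j in range(n)]
            for i in range(n)]


def uniform_vaccine_demand(ports, low, high, rng):
    """Per-port vaccine demand (tonnes) drawn uniformly from [low, high]."""
    return {port: round(rng.uniform(low, high), 2) for port in ports}


rng = random.Random(2024)  # assumed seed, for reproducibility only
passenger_demand = uniform_demand_matrix(PORTS, 2, 50, rng)
vaccine_demand = uniform_vaccine_demand(PORTS, 0.2, 1.0, rng)
print(passenger_demand[0])
print(vaccine_demand)
```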

Table 2

Demand matrix for passengers (persons per week)

Port	Mamboro	Waingapu	Waiwole	Maumbawa	Ende	P Ende	Raijua	Sabu	Ndao	Kupang
Mamboro	0	49	5	8	44	46	18	30	47	6
Waingapu	8	0	4	10	41	33	4	25	37	48
Waiwole	16	36	0	2	48	21	41	27	31	45
Maumbawa	5	26	41	0	5	7	37	34	13	5
Ende	29	29	44	28	0	28	15	4	5	48
P Ende	44	38	19	35	5	0	49	35	4	12
Raijua	13	14	24	20	36	18	0	10	8	33
Sabu	34	3	21	34	33	32	13	0	23	39
Ndao	38	49	21	29	23	20	3	50	0	9
Kupang	31	20	14	37	47	3	41	9	45	0

Source(s): Created by authors

Table 3

Demand matrix for cargo (tonnes per week)

	Mamboro	Waingapu	Waiwole	Maumbawa	Ende	P Ende	Raijua	Sabu	Ndao	Kupang
Mamboro	0	1	3	2	3	3	3	3	4	4
Waingapu	3	0	3	0	1	5	4	5	5	3
Waiwole	0	0	0	5	5	1	5	0	1	2
Maumbawa	1	2	5	0	2	4	1	1	4	5
Ende	5	5	1	2	0	5	3	5	1	3
P Ende	5	4	5	5	3	0	2	5	1	5
Raijua	0	3	5	2	3	2	0	4	2	0
Sabu	0	2	4	0	5	4	4	0	2	2
Ndao	3	1	2	1	4	1	1	3	0	1
Kupang	4	4	4	4	5	2	1	1	2	0

Source(s): Created by authors

Table 4

Vaccine demand at each seaport (tonnes)

	Mamboro	Waingapu	Waiwole	Maumbawa	Ende	P Ende	Raijua	Sabu	Ndao	Kupang
Vaccine Demand	0.97	0.70	0.89	0.82	0.86	0.28	0.52	0.55	0.73	0.70

Source(s): Created by authors

We also conducted a hyperparameter tuning process based on grid search, in which we tuned the learning rates for actors and critics in the range of 0.001–0.1 and batch size in the range of 64–1024. By using the grid search process, the hyperparameter tuning required up to 125 trials. Considering the hyperparameters used in the existing literature (Osa et al., 2019), the hyperparameters used in DDPG are summarised in Table 5. It is also interesting to note that the DDPG incorporates the neural network architecture that influences the RL performance. In the case of the actor network, we set three hidden layers [256, 256, $| A |$ ], with ReLU activations for the first and second layers and a hyperbolic tangent for the last hidden layer. The last hidden layer of actor networks then provides the action to be decided by the agent (A). For the critic network, we set hidden layers [256, 512], each with ReLU activations.

Table 5

Hyperparameters used in DDPG

No.	Description	Value
1	Discount factor	1
2	Learning rate for actor	0.003162
3	Learning rate for critic	0.01
4	Batch size	64
5	Optimiser	Adam

Source(s): Created by authors
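One way to obtain a 125-trial grid over these ranges is sketched below. The specific grid points are an assumption: five log-spaced learning rates between 0.001 and 0.1 for each of the actor and critic, and five batch sizes doubling from 64 to 1024. This choice is consistent with the reported trial count and with the actor learning rate of 0.003162 in Table 5, but the grid used in the study is not stated.

```python
from itertools import product


def log_spaced(low_exp, high_exp, count):
    """Return `count` values evenly spaced in log10 between 10**low_exp and 10**high_exp."""
    step = (high_exp - low_exp) / (count - 1)
    return [10 ** (low_exp + k * step) for k in range(count)]


learning_rates = log_spaced(-3, -1, 5)   # 0.001 ... 0.1 (assumed grid points)
batch_sizes = [64, 128, 256, 512, 1024]  # assumed grid points

# Every combination of (actor learning rate, critic learning rate, batch size).
grid = list(product(learning_rates, learning_rates, batch_sizes))

print(len(grid))                               # 125
print([round(lr, 6) for lr in learning_rates])
```

Each tuple in `grid` would be passed to one training run, and the configuration with the best cumulative reward would be retained.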

Pambudi et al. (2022) performed comprehensive reviews of the shelf life of vaccines, which varies based on the type of vaccine and equipment available. We can infer that in the case of a vessel equipped with a storage facility that maintains a temperature range of 2–8 °C, many vaccines will survive for an extended period of weeks. However, in the case of a power failure, the vaccine must be moved to a vaccine carrier (e.g. an ice pack), where it may survive only for 24 h (Department of Health and Family Welfare, 2016).

In this numerical example, we assumed that the vaccine can survive a maximum of 36 h of travel, and the cold storage on the vessel is available for only 12 h after travel. Table 6 summarises the results of the RL solution, obtained by varying the number of routes. When only one route is available, with each vessel visiting each port within a single loop, there is a possibility that up to 51% of the vaccine products may be damaged. The numerical results also imply that increasing the number of available routes can reduce the percentage of damage to the vaccine products. It is also interesting to note that in the case of the four available routes, where the average travel time exceeds 36 h, the damage percentage is zero, which implies that the RL procedures can provide an effective strategy to prioritise the transportation of vaccine products.

Table 6

RL solution with variations in the number of routes

Number of routes	Average travel time between ports (h)	Maximum travel time between ports (h)	Vaccines damaged (%)
1	109.47	232.52	51
2	78.77	208.73	22
3	72.78	157.38	12
4	55.87	137.15	0

Source(s): Created by authors

Table 7 shows the network performance with different numbers of routes. Because the vessel also carries passengers and other goods in addition to vaccine products, increasing the number of routes improves the service performance for passengers and freight owing to the reduction in shipping time. The shipping plan with four routes can improve passenger and freight travel hours by up to 35% when compared with that with just one route.

Table 7

Network performance with variations in the number of routes

Number of routes	Total network travel time (h)	Total passenger travel hours (Pass.hr.)	Total freight travel hours (ton.hr)	Total Vaccine travel hours (ton.hr)
1	7877.06	275866.93	31266.17	263.73
2	7277.77	205035.23	20967.54	154.17
3	5587.02	185902.93	19749.17	149.36
4	9317.09	140957.15	15098.51	112.69

Source(s): Created by authors

Although the increase in the route options improved vaccine delivery using the vessel, the risk linked to wave-height uncertainty remained, which can extend the voyage duration. We tested the proposed RL for vessel routing using three routes with 600 different sets of randomly generated wave variations. Specifically, we generated different wave heights for each pair of seaports by using a random number and inverse cumulative distribution functions (CDFs), as shown in Figure 5. The CDF uses the data collected annually around the seaports, which includes 2,812,542 significant waves from September 2023 to September 2024 (NASA, 2023). We evaluated several distributions for fitting the wave data, and the Burr distribution was found to provide the best fit based on the Akaike information criterion value. The randomly generated waves $(ς)$ were then used to adjust the vessel speed using Equation (31), as demonstrated by Lewis (1955).

Figure 5

Two plots illustrate statistical characteristics related to wave heights.

Plot (a): Wave distribution: It is a histogram that shows the “Frequency” versus “Wave heights (meters).” The horizontal axis is labeled “Wave heights (meters)” and ranges from 0 to 15 in increments of 5 units. The vertical axis is labeled “Frequency” and ranges from 0 to 12 in increments of 2 units. This axis is scaled by 10 to the 4 power. The plot displays a highly skewed distribution, with a massive peak in frequency occurring near zero wave height. The frequency reaches approximately 10 times 10 to the 4 power for the smallest waves, and then decreases very rapidly and tails off toward the right, with wave heights extending up to 15 meters, though with very low frequency at the higher end. Part (b): Inverse C D F: The vertical axis is labeled “Wave heights (meters)” and ranges from 0 to 15 in increments of 5 units. The horizontal axis is labeled “Probability Density” and ranges from 0 to 1 in increments of 0.1 units. The curve starts near 0 meters wave height at a probability density of 0 and rises slowly until a probability density of about 0.8. After this point, the curve rises steeply, indicating that the highest wave heights (up to 15 meters) correspond to the highest probability density values. Note: All numerical values are approximated.

Wave distribution and its inverse CDF. Source: Created by authors

z_{k} = - 3.2 ς + 19

(31)
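The sampling step can be illustrated with inverse-transform sampling from a Burr (type XII) distribution, followed by the speed adjustment of Equation (31). The fitted parameters are not reported in the text, so `C`, `K`, and `SCALE` below are placeholder values. Capping the speed at the 15-knot maximum, flooring it at 1 knot to avoid non-positive speeds, and treating the tabulated distances as nautical miles are further assumptions of this sketch.

```python
import random


def burr_inverse_cdf(u, c, k, scale):
    """Inverse CDF of the Burr XII distribution, F(x) = 1 - (1 + (x/scale)**c)**(-k)."""
    return scale * ((1.0 - u) ** (-1.0 / k) - 1.0) ** (1.0 / c)


def burr_cdf(x, c, k, scale):
    """CDF of the Burr XII distribution, used here only to check the inverse."""
    return 1.0 - (1.0 + (x / scale) ** c) ** (-k)


def vessel_speed(wave_height, max_speed=15.0, min_speed=1.0):
    """Equation (31), z = -3.2 * wave + 19, clipped to [min_speed, max_speed] (assumed bounds)."""
    speed = -3.2 * wave_height + 19.0
    return max(min_speed, min(max_speed, speed))


def travel_time(distance, wave_height):
    """Travel time in hours for a distance in nautical miles (assumed unit)."""
    return distance / vessel_speed(wave_height)


C, K, SCALE = 2.0, 3.0, 1.5  # placeholder Burr parameters, not the fitted values

rng = random.Random(42)
wave = burr_inverse_cdf(rng.random(), C, K, SCALE)  # wave height in metres
hours = travel_time(77.2, wave)                     # Mamboro to Waingapu distance from Table 1
print(round(wave, 2), round(hours, 2))
```

Repeating the draw for every pair of seaports and every scenario gives the set of wave variations against which a routing plan can be evaluated.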

Figure 6 illustrates the effect of wave uncertainty on the travel time. When the wave height is not considered, the average travel time among all seaports is approximately 55.57 h (for the four-route scenario, see Table 6). The outcome is entirely different when considering wave uncertainty: in 50% of the simulated wave events, the average travel time among all seaports exceeds 57 h. This situation increases the probability of vaccine product damage to 19%; that is, of the 600 random wave events, 114 cause damage to the vaccine product during the voyage. Furthermore, the temporal limitation of vaccine viability significantly influences the number of required routes. Table 8 indicates that under conditions of very limited time (i.e. less than 36 h), four routes are insufficient to prevent vaccine damage. The uncertainty of trip time, combined with the available survival duration of the vaccine, constitutes a critical constraint for utilising pioneer shipping for vaccine distribution.

Figure 6

A histogram shows the frequency distribution of the “average travel time (hours)”.

The horizontal axis is labeled “Average travel time (in hours),” ranging from 40 to 80 in increments of 5 hours. The vertical axis is labeled “Frequency” and ranges from 0 to 90 in increments of 10 units. The distribution is bell-shaped but slightly skewed, with the highest frequency (nearly 60) occurring in the bin representing travel times between approximately 57 and 60 hours. A red dashed vertical line is prominently placed at the 56-hour mark, which, according to the legend, indicates the travel time “without wave variations.” The rest of the histogram bars, outlined in gray, represent the travel times “with wave variations.” Note: All numerical values are approximated.

Average travel time considering wave variations. Source: Created by authors

Table 8

Percentage of vaccine damage with various time constraints and routes

Time constraints (h)	Three routes	Four routes
12	68%	46%
24	30%	18%
36	12%	0%
48	0%	0%

Source(s): Created by authors

5. Managerial insight

Remote islands pose a significant challenge in terms of vaccine distribution, as the mode of transportation is unreliable. Practically, the remote islands can be reached only through air and sea transport. Air transport can deliver the vaccine the fastest and thus minimise the risk of vaccine damage. However, it requires specific infrastructure and higher shipping costs. We interviewed the staff handling vaccine distribution in the archipelagic province of Indonesia. Although the delivery time of an aeroplane is significantly faster (i.e. 12 times faster than the direct port-to-port shipment), we found that air transport is approximately 10 times more costly than sea transport. In addition, sea transport provides alternatives for delivering the vaccine, that is, directly from port to port or by following an available shipping route (e.g. pioneer shipping route). Both shipping processes require a longer time than that required for air transport, necessitating a storage system to be installed on the vessel. The direct port-to-port route provides a shorter travel time than that obtained by following an available shipping route, but it requires additional vessels for direct shipping. On the contrary, the delivery via the available shipping route requires fewer vessels but involves a longer journey duration.

Considering the possibility of delivering the vaccine via a vessel, the following key points must be addressed to ensure a safe delivery process:

Prepare and plan the vaccine transport.

Because the vessel voyage requires a longer travel time, the preparation and planning of vaccine transport are crucial. The vaccine should be handled appropriately and administered on time. Planning the vaccine schedule in conjunction with the route of the vessel is crucial. Vessel route planning should consider the quantity of vaccine and time constraint to ensure the vaccine is in good condition upon arrival. The routing algorithm proposed in this paper is one option for planning the pioneer shipping route for vaccine delivery. Additionally, trained staff should be assigned for preparation, delivery, and acceptance.

Delay consideration

Delay consideration is a crucial issue in vaccine delivery using a vessel. As can be inferred from the numerical results, delays may ruin the quality of the vaccine. Delays are possible at the departure time as well as during the voyage. In the case of a departure delay, the best option is to equip the seaport with storage equipment. However, a more straightforward solution involves strengthening the communication between the vaccine staff and seaport staff to maintain a suitable temperature during the intermodal process (i.e. changes in the mode of transport from land to vessel). Additionally, delays may occur during the voyage owing to weather uncertainty and operational issues with the vessel. Storage equipment with a battery supply helps extend the shelf life of the vaccine.

Storage equipment

Several types of portable vaccine refrigerators are currently available, which can operate on 12/24 V DC or 110/240 V AC. These portable vaccine refrigerators can maintain a temperature as low as −40 °C, making them the best option for vessels used in pioneer shipping. It is advisable to avoid equipping the vessel with commercially available soft-sided food or beverage coolers, as most are poorly insulated and are likely to be affected by the room or outdoor temperature. In addition to portable vaccine refrigerators, vaccine transport should utilise qualified containers, coolant materials, and insulating materials.

Temperature monitoring

Maintaining the vaccine temperature is vital to ensure its quality. A temperature monitoring device is required on the storage equipment to record the accurate temperature history. Although this device requires additional investment, it will be less expensive than replacing the wasted vaccines. Additionally, upon arrival at the seaport, it is crucial to ensure that trained staff are available to inspect the vaccine and store it properly.

Based on the above analysis of the mode characteristics, the schematic decision process for managing vaccine distribution in remote areas is depicted in Figure 7. We categorise vaccines into two main types: emergency and routine vaccines. Emergency vaccines must be delivered as soon as possible, as they are crucial for controlling situations such as a disease outbreak. In contrast, the delivery of routine vaccines can be scheduled in advance. Emergency vaccines must be delivered by air transport or direct port-to-port shipping, whereas routine vaccines can be delivered using the available shipping routes on vessels equipped with vaccine refrigeration equipment. If such equipment is unavailable, it is preferable to use direct port-to-port shipping.

Figure 7

A flowchart shows the logical steps to determine the delivery method for routine and emergency vaccines.

The process begins with two starting points: “Routine Vaccine” and “Emergency Vaccine.” For “Routine Vaccine,” the first decision diamond asks “Shipping Route Available?” If “yes,” the process moves to a second decision: “Vaccine Storage Equipped on Vessel?” If “yes,” the delivery method is “Vaccine Delivery via Route Shipping.” If “no,” the delivery defaults to “Vaccine Delivery via Direct Shipping.” If the initial decision “Shipping Route Available?” is “no,” the delivery also defaults to “Vaccine Delivery via Direct Shipping.” For “Emergency Vaccine,” the process moves to the decision “Air Transport Available?” If “yes,” the delivery method is “Vaccine Delivery via Air Transport.” If “no,” it defaults to “Vaccine Delivery via Direct Shipping.”

Schematic of decision process for distributing vaccines to a remote island. Source: Created by authors
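The decision process in Figure 7 can be written as a small function. This is a sketch of the flowchart as described; the function name, argument names, and returned labels are illustrative choices.

```python
def choose_delivery_mode(vaccine_type, air_available=False,
                         route_available=False, storage_on_vessel=False):
    """Return the delivery mode suggested by the Figure 7 flowchart."""
    if vaccine_type == "emergency":
        # Emergency vaccines: air transport if available, otherwise direct shipping.
        return "air transport" if air_available else "direct shipping"
    if vaccine_type == "routine":
        # Routine vaccines: route shipping only if a route exists and the
        # vessel carries vaccine storage; otherwise direct shipping.
        if route_available and storage_on_vessel:
            return "route shipping"
        return "direct shipping"
    raise ValueError("vaccine_type must be 'routine' or 'emergency'")


print(choose_delivery_mode("routine", route_available=True, storage_on_vessel=True))
print(choose_delivery_mode("routine", route_available=True, storage_on_vessel=False))
print(choose_delivery_mode("emergency", air_available=False))
```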

6. Conclusion

This paper discussed the utilisation of pioneer shipping for transporting cold-chain goods, such as vaccines. Pioneer shipping refers to the shipping service that ensures connectivity to remote islands. A variant of RL is proposed for deciding the shipping cluster of routes as well as the route for transporting the vaccine product using a vessel. Numerical testing was conducted using an actual pioneer shipping route. The numerical results indicated that a single loop (i.e. one route) can cause 51% damage to the vaccine products, whereas increasing the number of routes can help prevent damage during the voyage owing to the reduced shipping time. Additionally, increasing the routes may reduce the average travel time between all seaports by up to 46%, which benefits not only the vaccine product but also passengers and other goods being shipped.

From the numerical example, we found that wave uncertainty significantly increases the average travel time and disrupts the shipping of vaccine products. Therefore, owing to the time-sensitive nature of emergency vaccines, air transport is the most effective option for delivering them. If air transport is unavailable, the direct port-to-port route offers a shorter travel time than following an available shipping route. The pioneer shipping route is mostly suitable for routine or scheduled vaccines. However, it must be ensured that the vessel is equipped with vaccine refrigeration equipment. If such equipment is unavailable, it is preferable to use the direct port-to-port route.

Further research can focus on developing a method based on multimodal perspectives, as the end consumer of cold-chain products is usually located far from seaport locations. Additionally, extending the model to address stochastic optimisation would be intriguing, as it would enable a thorough examination of the uncertainty within the optimal framework. Furthermore, additional training of RL agents to address a broader spectrum of problems should be undertaken to ensure the applicability of RL-based methods.

The authors would like to acknowledge the financial support from the National Research and Innovation Agency (BRIN) under the Research and Innovation for Advanced Indonesia (RIIM) Batch IV program in collaboration with the Education Fund Management Agency (LPDP). We also express our gratitude to ITB for its support through the Research “Unggulan” Program.

The supplementary material for this article can be found online.

References

Accorsi, R., Ferrari, E., Manzini, R. and Tufano, A. (2021), “An application of a vessel route planning model to second the import/export of seasonal products”, pp. 47-56, doi: https://doi.org/10.1007/978-3-030-60135-5_4.

Al Theeb, N., Abu-Aleqa, M. and Diabat, A. (2024), “Multi-objective optimization of two-echelon vehicle routing problem: vaccines distribution as a case study”, Computers and Industrial Engineering, Vol. 187, 109590, doi: https://doi.org/10.1016/j.cie.2023.109590.

Anupindi, R., Yadav, P., Jefferson, K.M.P. and Ashby, E. (2022), “Globally resilient supply chains for seasonal and pandemic Influenza vaccines”, in Globally Resilient Supply Chains for Seasonal and Pandemic Influenza Vaccines, National Academies Press, doi: https://doi.org/10.17226/26285.

Berkley, S., Elias, C., Lake, A., Chan, M., Fauci, A. and Phumaphi, J. (2013), “Global vaccine action plan”, available at: www.who.int

Brouer, B.D., Desaulniers, G. and Pisinger, D. (2014), “A matheuristic for the liner shipping network design problem”, Transportation Research Part E: Logistics and Transportation Review, Vol. 72, pp. 42-59, doi: https://doi.org/10.1016/j.tre.2014.09.012.

Cano-Marin, E., Ribeiro-Soriano, D., Mardani, A. and Blanco Gonzalez-Tejero, C. (2023), “Exploring the challenges of the COVID-19 vaccine supply chain using social media analytics: a global perspective”, Sustainable Technology and Entrepreneurship, Vol. 2 No. 3, 100047, doi: https://doi.org/10.1016/j.stae.2023.100047.

Cheyne, J. (1989), “Vaccine delivery management”, Clinical Infectious Diseases, Vol. 11 No. Supplement_3, pp. S617-S622, doi: https://doi.org/10.1093/clinids/11.Supplement_3.S617.

Christiansen, M., Fagerholt, K., Nygreen, B. and Ronen, D. (2013), “Ship routing and scheduling in the new millennium”, European Journal of Operational Research, Vol. 228 No. 3, pp. 467-483, doi: https://doi.org/10.1016/j.ejor.2012.12.002.

Department of Health and Family Welfare, G. of I (2016), “Immunization handbook for medical officers”.

Duijzer, L.E., van Jaarsveld, W. and Dekker, R. (2018), “Literature review: the vaccine supply chain”, European Journal of Operational Research, Vol. 268 No. 1, pp. 174-192, doi: https://doi.org/10.1016/j.ejor.2018.01.015.

Fernández Cuesta, E., Andersson, H., Fagerholt, K. and Laporte, G. (2017), “Vessel routing with pickups and deliveries: an application to the supply of offshore oil platforms”, Computers and Operations Research, Vol. 79, pp. 140-147, doi: https://doi.org/10.1016/j.cor.2016.10.014.

GAVI (2022), “Cold chain equipment optimisation platform”, available at: www.gavi.org

Gunaratne, K., Thibbotuwawa, A., Vasegaard, A.E., Nielsen, P. and Perera, H.N. (2022), “Unmanned aerial vehicle adaptation to facilitate healthcare supply chains in low-income countries”, Drones, Vol. 6 No. 11, p. 321, doi: https://doi.org/10.3390/drones6110321.

Guo, Z., Zhang, D., Liu, H., He, Z. and Shi, L. (2018), “Green transportation scheduling with pickup time and transport mode selections using a novel multi-objective memetic optimization approach”, Transportation Research Part D: Transport and Environment, Vol. 60, pp. 137-152, doi: https://doi.org/10.1016/j.trd.2016.02.003.

Hernández Luis, J.Á. (2002), “Temporal accessibility in archipelagos: inter-island shipping in the Canary Islands”, Journal of Transport Geography, Vol. 10 No. 3, pp. 231-239, doi: https://doi.org/10.1016/S0966-6923(02)00014-5.

Humang, W.P., Hadiwardoyo, S.P. and Nahry, N. (2019), “Factors influencing the integration of freight distribution networks in the Indonesian archipelago: a structural equation modeling approach”, Advances in Science, Technology and Engineering Systems, Vol. 4 No. 3, pp. 278-286, doi: https://doi.org/10.25046/aj040335.

Jedermann, R., Praeger, U. and Lang, W. (2017), “Challenges and opportunities in remote monitoring of perishable products”, Food Packaging and Shelf Life, Vol. 14, pp. 18-25, doi: https://doi.org/10.1016/j.fpsl.2017.08.006.

Kibanga, W., Munishi, C., Ntissi, H., Ndayishimiye, P., Myemba, D.T., Mfinanga, E., Mutagonda, R.F. and Kaale, E. (2025), “Vaccine handling practices and conformity to cold chain temperature requirements in selected regions of Tanzania: a descriptive cross-sectional study”, Journal of Pharmaceutical Policy and Practice, Vol. 18 No. 1, 2479062, doi: https://doi.org/10.1080/20523211.2025.2479062.

Lewis, E.V. (1955), “The influence of sea conditions on the speed of ships”, Journal of the American Society for Naval Engineers, Vol. 67 No. 2, pp. 303-319, doi: https://doi.org/10.1111/j.1559-3584.1955.tb03100.x.

Lin, Q., Zhao, Q. and Lev, B. (2020), “Cold chain transportation decision in the vaccine supply chain”, European Journal of Operational Research, Vol. 283 No. 1, pp. 182-195, doi: https://doi.org/10.1016/j.ejor.2019.11.005.

Martínez de Osés, F.X. and Castells, M. (2008), “Heavy weather in European short sea shipping: its influence on selected routes”, Journal of Navigation, Vol. 61 No. 1, pp. 165-176, doi: https://doi.org/10.1017/S0373463307004468.

NASA (2023), “Earthdata login”, available at: https://urs.earthdata.nasa.gov/

OpenStreetMap contributors (2024), “OpenStreetMap”, [Data set], OpenStreetMap Foundation, available as open data under the Open Data Commons Open Database License (ODbL) at openstreetmap.org.

Osa, T., Tangkaratt, V. and Sugiyama, M. (2019), “Hierarchical reinforcement learning via advantage-weighted information maximization”, International Conference on Learning Representations, doi: https://doi.org/10.48550/arXiv.1901.01365.

Pambudi, N.A., Sarifudin, A., Gandidi, I.M. and Romadhon, R. (2022), “Vaccine cold chain management and cold storage technology to address the challenges of vaccination programs”, Energy Reports, Vol. 8, pp. 955-972, doi: https://doi.org/10.1016/j.egyr.2021.12.039.

Pavoncello, V., Kislaya, I., Andrianarimanana, D.K., Marchese, V., Rakotomalala, R., Rasamoelina, T., Veilleux, S., Guth, A., Zafinimampera, A.O.T., Ratefiarisoa, S., Totofotsy, O., Doumbia, C.O., Rakotonavalona, R., Ramananjanahary, H., Randriamanantany, Z.A., May, J., Rakotoarivelo, R.A., Puradiredja, D.I. and Fusco, D. (2025), “Optimizing vaccine uptake in sub-Saharan Africa: a collaborative COVID-19 vaccination campaign in Madagascar using an adaptive approach”, Implementation Science, Vol. 20 No. 1, p. 2, doi: https://doi.org/10.1186/s13012-024-01412-5.

Petroianu, P.G.L., Zabinsky, Z.B., Zameer, M., Chu, Y., Muteia, M.M., Resende, M.G.C., Coelho, A.L., Wei, J., Purty, T., Draiva, A. and Lopes, A. (2021), “A light-touch routing optimization tool (RoOT) for vaccine and medical supply distribution in Mozambique”, International Transactions in Operational Research, Vol. 28 No. 5, pp. 2334-2358, doi: https://doi.org/10.1111/itor.12867.

Phiboonbanakit, T., Horanont, T., Huynh, V.-N. and Supnithi, T. (2021), “A hybrid reinforcement learning-based model for the vehicle routing problem in transportation logistics”, IEEE Access, Vol. 9, pp. 163325-163347, doi: https://doi.org/10.1109/ACCESS.2021.3131799.

Riadi, A. and Kurniawan, A. (2024), “Optimizing pioneer vessel efficiency: empirical insights and mini-container integration in remote maritime transport”, Journal for Maritime Research, Vol. 21 No. 2, pp. 103-109, available at: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85205988072&partnerID=40&md5=0c2996c1a8c4c19f236ad3d040c5876f

Saif, A. and Elhedhli, S. (2016), “Cold supply chain design with environmental considerations: a simulation-optimization approach”, European Journal of Operational Research, Vol. 251 No. 1, pp. 274-287, doi: https://doi.org/10.1016/j.ejor.2015.10.056.

Seimetz Chagas, R.D., Soares, J.B.C.de O., Longhi, R.P., Vieira, B.F., de Arruda, E.F., Leite, L.S.B.da S. and Ferreira Filho, V.J.M. (2023), “A solution framework for the integrated periodic supply vessel planning and port scheduling in oil and gas supply logistics”, Optimization and Engineering, Vol. 24 No. 2, pp. 1115-1155, doi: https://doi.org/10.1007/s11081-022-09723-6.

Sharif, N., Rönnqvist, M., Cordeau, J.F., Audy, J.F., Warya, G. and Ngo, T. (2024), “Multi-objective vessel routing problems with safety considerations: a review”, Maritime Transport Research, Vol. 7, 100122, doi: https://doi.org/10.1016/j.martra.2024.100122.

Shittu, E., Harnly, M., Whitaker, S. and Miller, R. (2016), “Reorganizing Nigeria's vaccine supply chain reduces need for additional storage facilities, but more storage is required”, Health Affairs, Vol. 35 No. 2, pp. 293-300, doi: https://doi.org/10.1377/hlthaff.2015.1328.

Song, M., Li, J., Han, Y., Han, Y., Liu, L. and Sun, Q. (2020), “Metaheuristics for solving the vehicle routing problem with the time windows and energy consumption in cold chain logistics”, Applied Soft Computing, Vol. 95, 106561, doi: https://doi.org/10.1016/j.asoc.2020.106561.

Timas, A. and Mohammadi, M. (2025), “Integrating weather-informed routing and energy optimization for sustainable maritime transportation”, Ocean Engineering, Vol. 333, 121463, doi: https://doi.org/10.1016/j.oceaneng.2025.121463.

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Liò, P. and Bengio, Y. (2018), “Graph attention networks”, International Conference on Learning Representations, ICLR.

Wang, S. and Meng, Q. (2012), “Liner ship fleet deployment with container transshipment operations”, Transportation Research Part E: Logistics and Transportation Review, Vol. 48 No. 2, pp. 470-484, doi: https://doi.org/10.1016/j.tre.2011.10.011.

Xiao, S., Peng, P., Yu, S. and Puchinger, J. (2025), “Route optimisation for sustainable cold chain logistics: a collaborative distribution model considering on-demand reuse of returnable transport items”, International Journal of Production Research, Vol. 63, pp. 1-20, doi: https://doi.org/10.1080/00207543.2025.2501165.

Xu, J., Yao, R. and Qiu, F. (2021a), “Mitigating cascading outages in severe weather using simulation-based optimization”, IEEE Transactions on Power Systems, Vol. 36 No. 1, pp. 204-213, doi: https://doi.org/10.1109/TPWRS.2020.3008428.

Xu, X., Rodgers, M.D. and Guo, W.(G.) (2021b), “A hub-and-spoke design for ultra-cold COVID-19 vaccine distribution”, Vaccine, Vol. 39 No. 41, pp. 6127-6136, doi: https://doi.org/10.1016/j.vaccine.2021.08.069.

© 2025 Febri Zukhruf, Taufiq Suryo Nugroho, Andrean Maulana, Lohdaya Perkasa, Stefanus Gunawan and Fitrah Nur Muliadin Harahap.

Published in Journal of International Logistics and Trade. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms are set out in the CC BY 4.0 licence.
