Post-harvest losses (PHLs) pose significant challenges to food security and farmers' livelihoods. Within the complex supply chain, truck transportation emerges as a primary contributor to these losses. This study introduces a decoupled methodology that combines process mining with advanced data mining techniques to predict and mitigate PHL. By leveraging transportation logistics, environmental conditions, and other relevant operational factors, this work aims to provide actionable insights for reducing waste and enhancing the efficiency of the post-harvest supply chain.
This study introduces a novel framework for PHL prediction built on decoupled methodologies that integrate process mining and data mining. We leverage transportation data from our self-developed Raw Material Tracking (RMT) mobile application together with environmental information. First, process mining refines the noisy RMT transportation data to extract process-aware features. These refined features then feed into a stacked ensemble classification model for PHL prediction. We use self-supervised clustering to determine optimal high/low loss thresholds. Following the prediction, association rule mining identifies key patterns linked to these high/low loss outcomes. This modular design, which integrates process insights with predictive and pattern analysis, underpins our decoupled approach.
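As a rough illustration of how a clustering step can yield a high/low loss cutoff, the sketch below runs a one-dimensional two-means split over per-trip loss values and takes the midpoint between the two centroids as the threshold. The two-means procedure is a stand-in chosen for simplicity, not necessarily the clustering method used in this study, and the function names and sample loss values are hypothetical.

```python
from statistics import mean


def two_means_threshold(losses, max_iter=100):
    """Split 1-D loss values into two clusters and return the
    midpoint between the two cluster centroids as the cutoff."""
    values = sorted(losses)
    if len(set(values)) < 2:
        raise ValueError("need at least two distinct loss values")

    # Initialise the centroids at the smallest and largest observations.
    low_c, high_c = values[0], values[-1]
    for _ in range(max_iter):
        cut = (low_c + high_c) / 2
        low = [v for v in values if v <= cut]
        high = [v for v in values if v > cut]
        new_low, new_high = mean(low), mean(high)
        if new_low == low_c and new_high == high_c:
            break  # assignments are stable, so the split has converged
        low_c, high_c = new_low, new_high
    return (low_c + high_c) / 2


def label_losses(losses, threshold):
    """Label each loss value as 'high' or 'low' relative to the cutoff."""
    return ["high" if v > threshold else "low" for v in losses]


# Hypothetical loss percentages per truck trip (illustrative only).
sample_losses = [0.5, 0.8, 1.0, 1.2, 4.5, 5.0, 5.5]
threshold = two_means_threshold(sample_losses)
labels = label_losses(sample_losses, threshold)
print(threshold, labels)
```

The resulting labels could then serve as the target classes for a downstream classifier, which is the role the loss thresholds play in the pipeline described above.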
The analysis yielded several critical insights into PHL. First, feature importance analysis identified the most influential attributes for PHL prediction. Second, we validated our clustering results by experimenting with various loss thresholds to determine the optimal cutoff point for distinguishing high from low losses. Third, associative pattern analysis revealed that environmental factors such as temperature and humidity, combined with transportation-related data, are key determinants of PHL. These findings offer actionable insights into the precise conditions and process elements contributing to PHL.
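To make the associative pattern analysis concrete, the following sketch computes the standard support, confidence, and lift of a single candidate rule over a handful of discretized trip records. The item names (such as `temp=high`) and the records themselves are invented for illustration and do not come from the study's data; a full analysis would typically enumerate many candidate rules with an algorithm such as Apriori.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a = frozenset(antecedent)
    c = frozenset(consequent)

    count_a = sum(1 for t in transactions if a <= t)          # antecedent holds
    count_c = sum(1 for t in transactions if c <= t)          # consequent holds
    count_ac = sum(1 for t in transactions if (a | c) <= t)   # both hold

    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical discretized trip records (illustrative only).
trips = [
    {"temp=high", "humidity=high", "duration=long", "loss=high"},
    {"temp=high", "humidity=high", "duration=short", "loss=high"},
    {"temp=low", "humidity=high", "duration=long", "loss=low"},
    {"temp=low", "humidity=low", "duration=short", "loss=low"},
    {"temp=high", "humidity=low", "duration=short", "loss=low"},
]

support, confidence, lift = rule_metrics(
    trips, {"temp=high", "humidity=high"}, {"loss=high"}
)
print(support, confidence, lift)
```

A lift above 1 indicates that the antecedent conditions co-occur with the high-loss outcome more often than would be expected if they were independent.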
The current analysis is geographically constrained to a single city, Surabaya, so broader validation across diverse settings is needed to enhance generalizability. Furthermore, reliance on city-level weather data limits micro-level accuracy along truck routes; future work should therefore integrate more granular, route-specific meteorological data. Additionally, incorporating more relevant features, such as product characteristics and other operational aspects, is essential to understand their influence on PHL and further refine prediction accuracy.
The advantage of our decoupled methodology lies in its ability to isolate and optimize distinct analytical steps, from process data refinement to predictive modeling and pattern discovery. This modularity allows for robust insights that would be challenging to achieve with monolithic approaches. By utilizing this framework, stakeholders can make more informed decisions regarding storage conditions and delivery schedules, which can in turn reduce PHL, improve supply chain efficiency, and enhance profitability.
This study advances PHL prediction by introducing a novel data-driven framework. Unlike prior research, which predominantly relies on static tabular datasets, this study incorporates both sequential and tabular data within a robust machine learning pipeline. This dual data integration, particularly its use of process-aware sequential insights, enhances predictive accuracy and offers a better understanding of loss factors. Furthermore, our methodology equips practitioners with reliable, data-derived loss thresholds that enable more precise loss classification and targeted mitigation strategies.
