This study explores the potential of machine learning techniques to reduce administrative burdens in participatory budgeting, focusing on the case of Seoul, which received over 25,000 proposals between 2013 and 2021.
Similar to a coffee dripper, the budgeting process filters and refines numerous citizen inputs within budgetary constraints, requiring large-scale public deliberation to review, synthesise and prioritise these inputs. This process typically involves labour-intensive processing by both citizens and public officials, generating significant administrative burdens. This paper argues that machine learning techniques can automate redundant preparatory tasks, thereby enabling institutional resources to focus more on deliberation and decision-making. By combining unsupervised and supervised learning approaches, this study identifies 17 major topics within the proposal corpus and develops a proposal selection classifier that achieves an Area Under the Receiver Operating Characteristic curve (AUC) score of 0.73, indicating reasonable predictive performance.
The findings demonstrate the potential of machine learning to alleviate learning costs through automated proposal categorisation and summarisation as well as compliance costs by providing predictive feedback that supports citizens in developing more robust proposals.
This exploratory study contributes to the emerging discourse on digital administrative burden, offering practical insights for mitigating administrative challenges in large-scale participatory budgeting.
1. Introduction
Participatory budgeting (PB) enables citizens to participate in the allocation of public funds, fostering empowerment and advancing local democracy (Sinervo et al., 2024). Since its inception in Porto Alegre, Brazil, in the late 1980s, PB has spread globally over the past few decades (Sintomer et al., 2012a; Touchton et al., 2023a). Recent advancements in digital technologies have further lowered barriers, enabling citizens to voice their concerns and influence decision-making more effectively (Bartocci et al., 2023). Prominent PB programmes in cities such as Barcelona (Bua and Bussu, 2021), Brazilian cities (Touchton et al., 2019), Helsinki (Shin et al., 2022), Madrid (Secinaro et al., 2022a), New York (Shybalkina, 2022), Paris (Côme et al., 2023), Polish cities, including Wroclaw (Sroska et al., 2022), and Seoul (Shin, 2023), have adopted digital platforms to facilitate proposal submission and deliberation. These developments reflect a broader scholarly interest in the transformative potential of emerging technologies, such as big data and artificial intelligence (AI), in public service delivery and democratic processes (Alcaide–Muñoz et al., 2017; Costa et al., 2024; Rodríguez Bolívar and Alcaide Muñoz, 2022; Shin et al., 2024), particularly within the context of smart cities (Secinaro et al., 2022b).
Despite these advancements, questions remain regarding the effectiveness of digital technologies in participatory processes. While innovative tools can streamline administrative tasks, they also introduce challenges such as managing large volumes of data, ensuring effective governance, and addressing ethical concerns (Madsen et al., 2022; Peeters, 2023). Scholars have highlighted the risks of excluding marginalised groups and threats to data privacy, accountability, and transparency when using AI in democratic processes (de Bruijn et al., 2022; de Laat, 2018). Public authorities, already overwhelmed by the influx of citizen input through digital channels, face increasing administrative challenges in processing and responding to this data (Shin, 2023). Furthermore, the ecosystem of digital participatory tools reveals a systemic gap in linking citizen input with final decision-making, accountability mechanisms, or effective feedback loops (Shin et al., 2024).
These challenges can be metaphorically illustrated using a coffee dripper, symbolising the inherent structural constraints of a many-in-small-out dynamic. Although PB is designed to involve all citizens, often through digital tools, the sheer volume of citizen proposals far exceeds the available funding, significantly narrowing possibilities. Just as a dripper filters out most of the grounds, only a small fraction of proposals progress to the final decision-making stage. Original ideas are often merged or discarded during the filtering process, highlighting the need for approaches that support dialogic accounting (Grossi et al., 2021), deliberative quality (Shin et al., 2022), and collective intelligence (Rask and Shin, 2024) to address critical challenges within the throughput stage, where citizen inputs are transformed into budgetary decisions. While these approaches offer theoretical insights, recent literature reviews have noted that PB research remains predominantly qualitative, with limited application of digital technologies, such as machine learning (ML) and AI (Bartocci et al., 2023; Klimovský et al., 2024).
This exploratory study fills this gap by investigating how ML can alleviate administrative burdens in PB by automating labour-intensive processes, thereby supporting democratic decision-making. Administrative burden, characterised as the onerous experience of navigating government processes (Burden et al., 2012; Herd and Moynihan, 2019; Peeters, 2023), is evident in PB when formulating, categorising, and analysing large volumes of citizen inputs. The Seoul PB programme provides a compelling case for exploring the role of ML in enhancing public participation. As the most active PB initiative in South Korea, with over 25,000 proposals submitted between 2013 and 2021, it presents both challenges and opportunities for enhancing the processing of citizen input.
This study applies ML techniques to address two types of administrative burden: learning costs, which relate to understanding and categorising proposals, and compliance costs, which involve elaborating and evaluating them (Herd and Moynihan, 2019; Moynihan et al., 2015). Specifically, this article employs a sequential approach that integrates the combined topic model (Bianchi et al., 2021) and the random forest model (Breiman, 2001) to identify topical structures and predict which proposals are most likely to be selected. This article contributes to the growing body of research on digital innovations in public sector accounting and governance by exploring the following research questions:
How can ML techniques lower the learning costs involved in understanding and categorising citizen proposals?
How can ML streamline compliance costs in developing and evaluating citizen proposals?
The rest of this article is organised as follows. The next section reviews the literature on digital administrative burdens and participatory budgeting, elucidating the concepts of learning and compliance costs in this context. The third section presents a case study of the Seoul participatory budgeting initiative. The fourth section outlines the methodology for data collection and research design. The fifth section presents the study’s findings, which the sixth section discusses. Finally, the article concludes with a summary of key insights and implications.
2. Theoretical background
2.1 Administrative burden in the digital era
The administrative burden approach is a relatively recent field that focuses on the challenges individuals face when accessing public services and asserting their rights (Halling and Baekgaard, 2023; Herd and Moynihan, 2019; Peeters, 2020). Burden et al. (2012: 742) established the groundwork and defined it as “an individual’s experience of policy implementation as onerous”. When involved in policymaking or administrative activities, individuals may encounter burdensome situations due to complex procedures, overwhelming amounts of information, and high cognitive demands (Baekgaard and Tankink, 2022; Jakobsen et al., 2019; Moynihan et al., 2015).
Administrative burden is a common occurrence in public administration because of the very nature of its work (Burden et al., 2012). Charged with executing policies, managing public resources, and addressing community needs, public administration inevitably involves administrative tasks that come with standard rules, procedural complexities, and documentation demands. Scholars therefore share common ground that administrative burdens cannot be eliminated but can be reduced as much as possible or shifted to the state (Baekgaard and Tankink, 2022; Herd and Moynihan, 2019; Madsen et al., 2022).
In this regard, Moynihan et al. (2015) conceptualised administrative burden as costs, categorised into learning, compliance, and psychological costs. Learning costs are incurred when individuals spend time and effort understanding a policy programme, including its eligibility criteria, accessibility, and relevance. Compliance costs arise when individuals must follow administrative rules and requirements, such as filling out forms, attending meetings, or providing documents. Psychological costs stem from stress and frustration experienced during interactions or from the stigma associated with applying for unpopular programmes. The extent of these costs depends on how individuals perceive their engagement and weigh the costs against the benefits (Moynihan et al., 2015). Marginalised groups with fewer resources and lower administrative capacity tend to experience higher administrative burdens (Chudnovsky and Peeters, 2022; Nisar, 2018).
Therefore, the distribution of administrative burdens within society is far from uniform, often arising from administrative and political choices (Herd and Moynihan, 2019). These burdens, demanding time, effort, and resources, play a pivotal role in shaping citizens’ trust in their government and, consequently, their inclination to engage in civic participation (Peeters, 2020). While Burden et al. (2012) focused on the day-to-day experiences of public officials, recent studies have broadened this scope to address citizens’ experiences at the interface between citizens and the state (Halling and Baekgaard, 2023; Herd and Moynihan, 2019; Peeters, 2020). Heinrich (2016) provided additional insights by clarifying various interaction types and their associated burdens, considering the individual’s position within public administration (insider, comprising public and elected officials) or outside it (outsider, comprising citizens or NGOs), as well as their role as an initiator or a target of the interaction. The four types are:
Initiator (insider)—Target (insider): Both the initiator and the target are insiders of the public administration.
Initiator (outsider)—Target (insider): Citizens initiate interactions with targeted public officials, such as applying for welfare benefits.
Initiator (insider)—Target (outsider): Public officials initiate interactions with targeted citizens, such as for law enforcement.
Initiator (outsider)—Target (outsider): Citizens initiate interactions with other citizens, such as interactions or discussions within a programme.
To enhance readability, this article employs the terms government-to-citizen (G2C), citizen-to-government (C2G), and citizen-to-citizen (C2C) interactions, as proposed by Linders (2012), to refer to the second, third, and fourth types of interactions. Heinrich (2016) noted that administrative burdens are primarily observed in G2C and C2G interactions. For instance, previous studies have examined administrative burdens in the contexts of health services (Moynihan et al., 2015), applications for citizenship (Chudnovsky and Peeters, 2021), legal IDs (Nisar, 2018), unemployment (Mikkelsen et al., 2023), and welfare benefits (Heinrich, 2016), involving bureaucratic encounters between citizens and public officials.
Peeters (2023, p. 7) has contributed to the current debate by adding a new element, digital technology and participation, in the form of digital administrative burden: “digital bureaucratic encounters that individuals experience as onerous”. Peeters (2023) argued that advancements in digital technology and AI have altered the nature of these interactions, with an increasing trend toward automated administrative decision-making procedures, mass public data, and Big Data analytics, resulting in a complex web of interactions among citizens, public officials, and machines.
Digital tools facilitate C2G interactions by leveraging e-campaigning and e-petitioning platforms to mobilise social actions in agenda-setting processes (Medaglia, 2012). Citizens have also gained increasing opportunities to voice their concerns, submit complaints, and propose their ideas via PB (Secinaro et al., 2022a; Shin et al., 2022), citizen feedback (Minelli and Ruffini, 2018), and e-voting (Zissis and Lekkas, 2011). Such platforms enable citizens to draw attention to overlooked issues and amplify their voices, fostering a more inclusive and participatory approach to addressing societal concerns. In return, governments are increasingly establishing G2C platforms that disseminate information and report through open data schemes (Purwanto et al., 2020; Secinaro et al., 2022b) and provide digitally enhanced services to streamline traditional bureaucratic encounters in face-to-face settings (Giest and Samuels, 2023). Citizens also engage in C2C co-production through crowdsourcing platforms (Zhao et al., 2023), comments (Shin et al., 2022), debates (Grossi et al., 2021), and legal drafting (Toots, 2019).
Recent advancements in digitalisation within public sectors suggest great potential for resource efficiency and improved access to public services. Moreover, conceptualising digital administrative burden as a specific application of ML and AI offers a valuable analytical framework for defining their supportive roles in human-centred public decision-making while mitigating the risks associated with algorithmic governance. Nevertheless, recent studies indicate that individuals encounter new administrative burdens in automated or data-assisted decision-making processes and digital interactions without proper face-to-face support (Madsen et al., 2022; Peeters, 2023). The ongoing discourse highlights the importance of understanding the interplay between administrative burdens and digital advancements. Despite this, a notable lack of empirical research remains in exploring how digital solutions can effectively reduce administrative burdens (Costa et al., 2024).
2.2 Participatory budgeting and citizen proposals
As the name suggests, participatory budgeting refers to processes that involve citizens in shaping priorities or allocating public finances (Bartocci et al., 2023; Sintomer et al., 2012b). It is worth noting that citizens also participated in traditional budgeting processes through consultation (e.g., public hearings) or by monitoring how budgets were spent after decisions were made. However, PB is distinct from these practices in that citizens take part in the budgetary process itself (rather than after the fact or from the outside), iteratively at the municipal level via public deliberation, and receive feedback on how decisions are made (Sintomer et al., 2012b).
The global diffusion of PB over the past three decades has inevitably led to the emergence of heterogeneous PB-like programmes worldwide (Touchton et al., 2023a). Despite these variations, PB follows five fundamental steps (Touchton et al., 2023b):
Step 1 (Information): Citizens develop and submit proposals based on their knowledge of local needs and solutions.
Step 2 (Deliberation): Citizens advocate for their proposals to be developed into projects that align with community needs.
Step 3 (Project selection): Citizens collectively select appropriate projects (e.g., popular vote or citizen committee).
Step 4 (Oversight): Governments implement projects monitored by citizens.
Step 5 (Outcomes): PB projects impact the well-being of a community, and the citizens’ feedback influences the next iteration.
A PB proposal is a structured document that presents an idea or plan for others to review and decide upon. Proposals are critical throughout the PB process, from initial expression of community needs to policy implementation. In the early stages, proposals allow citizens to articulate their community’s needs, goals, and solutions. During the decision-making phase, proposals serve as a foundation for public deliberation and decision-making as citizens review their contents. Finally, during the implementation stage, proposals are used to monitor and evaluate how the policy is realised. In each round, citizens submit hundreds to thousands of proposals across different cities (Bua and Bussu, 2021; Côme et al., 2023; Shin et al., 2022; Shybalkina, 2022; Sroska et al., 2022; Touchton et al., 2019).
However, processing citizen proposals poses significant challenges for citizens and public officials, particularly during the throughput stage, where citizen inputs are transformed into final decisions. Table 1 summarises the learning and compliance costs associated with PB. In the digital context, learning costs arise when individuals must extract structured information from extensive data collections (Peeters, 2023). Public officials are responsible for summarising the content of submitted proposals and categorising relevant themes to support deliberation. Grossi et al. (2021) emphasise the importance of mapping citizens’ diverse perspectives to foster democratic debate, enabling individuals to consider opposing viewpoints from various groups — a fundamental principle of dialogic accounting. For citizens, learning costs involve understanding the benefits and requirements of proposal submission and effectively navigating through proposals to participate in deliberation and voting processes. This information supports the “bundling” of similar proposals into fewer, actionable plans (Hagelskamp et al., 2023). For instance, the OmaStadi PB initiative in Helsinki employed a bundling strategy to reduce approximately 1,500 proposals to a final list of a few hundred for popular voting (Shin et al., 2022).
Administrative burdens regarding citizen proposals in participatory budgeting
| | Citizens | Public officials |
|---|---|---|
| Learning costs | Identify the benefits and requirements for submitting proposals; navigate proposals to participate in deliberation and voting | Summarise the contents and themes of submitted proposals to support deliberation |
| Compliance costs | Develop ideas into proposals to participate in PB; elaborate or synthesise proposals to prepare for the voting process | Screen submitted proposals for eligibility, legality, and redundancy; consult citizens to improve the quality of proposals |
Note(s): This table was inspired by Madsen et al.’s (2022: 101653) work, which summarises administrative burdens associated with digital self-service
Compliance costs require citizens to develop their ideas into proposals within specific guidelines. Once submitted, public officials are tasked with reviewing numerous proposals and filtering them within a short timeframe before the deliberation process (Côme et al., 2023; McNulty, 2020; Shin, 2023; Shin et al., 2022). When a proposal is rejected or deemed ineligible, officials are expected to provide clear explanations for this decision. Initial drafts often require further refinement through consultation to ensure clear and effective communication with stakeholders. However, a significant challenge arises as most citizens, constrained by time, participate primarily through voting and base their choices largely on the content of the proposals. As a result, proposals that are insufficiently developed, despite their potential, may fail to be selected.
Previous studies have identified various strategies for garnering public support and votes, such as enhancing proposal content (Secinaro et al., 2022a), bundling similar proposals (Hagelskamp et al., 2023), and conducting intense campaigns (Rask et al., 2021). While these tactics can be effective, they often exacerbate inequality by favouring a few organised interests with experience in proposal production or access to resources, while disadvantaging those lacking technical skills but offering valuable community insights. Such disparities undermine political equality, as PB should not simply function as a writing or campaigning competition.
3. Case: participatory budgeting in Seoul
PB was introduced in Korea in 2001 through social campaigns led by civil society organisations, reflecting broader efforts to deepen democratic practices during the country’s post-authoritarian transition (Shin, 2025). A decade later, PB was mandated by national law in 2011, resulting in its operation across all 243 municipalities. Seoul launched its PB initiative in 2012 by establishing an ordinance, becoming the most active municipality in Korea regarding budget ceilings and project scope (Seoul Metropolitan Government, 2022).
The adoption of PB in Korea aligns with a broader regional pattern, where the spread of PB was tied to transitions to democratic rule in the late 20th century (Sintomer et al., 2012a). Asian countries, including Korea, Indonesia, and the Philippines, embraced PB as part of their democratisation efforts, influenced by pioneering cases in Brazil (Touchton et al., 2023a). Seoul’s PB model reflects these global trends while adapting them to its local governance structures, as its implementation has undergone a series of adjustments and refinements, reflected in 13 amendments to the ordinance between 2012 and 2023. Despite these revisions, the fundamental framework of the programme has remained consistent, predominantly operating at the city and district levels (No and Hsueh, 2022). At the city level, citizens propose initiatives that span multiple districts to enhance Seoul’s overall well-being. Conversely, at the district (gu in Korean) level, citizens submit proposals tailored to their specific districts or neighbourhoods.
The PB process in Seoul involves distinct stages that generate learning costs for citizens and public officials. Between January and April, citizens are required to submit proposals within a designated one-month submission period. In 2013, residents submitted 1,460 proposals, which grew significantly to a peak of 3,824 proposals in 2016 (Shin, 2023). City-wide proposals undergo a deliberation process involving committee members comprising 120 citizens and public officials (Seoul Metropolitan Government, 2023). This co-development stage, held between May and July, leads into the voting stage in August. During this phase, participants face learning costs as they categorise thousands of proposals and refine a final list of candidates for voting.
Seoul’s PB process also incurs high compliance costs. After the submission stage, public officials screen proposals in April and May to assign them to relevant departments and categorise them as city- or district-level proposals while ensuring their eligibility and legality (Seoul Metropolitan Government, 2023). Citizen proposals often suffer from low content quality or contain errors, requiring substantial processing effort. To address this, the Seoul government offers the “PB School”, a training programme designed to enhance citizens’ capacity to engage effectively in PB (Seoul Metropolitan Government, 2023). The programme includes training on the historical background of PB, public budgeting principles, practical tips for proposal preparation, and guidelines for committee participation (Seoul Metropolitan Government, 2023). In 2021, the PB School conducted 1,649 sessions nationwide, a 46.8% increase from the previous year, demonstrating its established role in addressing compliance costs (Ministry of the Interior and Safety, 2023).
The Seoul PB case illustrates how the Seoul government addresses the learning costs of proposal processing through manual and collective efforts, while offering structured training programmes to mitigate compliance costs. Notably, after a decade of operation under a liberal mayor, the Seoul government significantly revised its PB programme in 2022 under the new conservative mayor, Se-hoon Oh. This revision narrowed the programme’s focus to specific themes, including transportation, health, and the environment (Shin, 2023). Within this context, this article focuses on the period between 2013 and 2021, as 2012 was considered a pilot phase.
4. Data and methods
4.1 Machine learning approach
This article examines the potential of ML to address these learning and compliance costs by integrating unsupervised and supervised approaches. ML is a subfield of AI that focuses on building systems that improve automatically through experience, learning from data to identify patterns and make decisions under uncertainty (Jordan and Mitchell, 2015). It has demonstrated practical applications across various domains, including computer vision, speech recognition, and robot control, and is increasingly applied in public administration (Anastasopoulos and Whitford, 2019). While ML centres on data-driven learning, AI encompasses a broader field that includes simulating human intelligence through language interpretation and generation, reasoning, and problem-solving. In this sense, AI systems are often empowered by ML techniques. Nevertheless, the terms ML and AI are frequently used interchangeably in public administration to refer to digital technologies for managing information overload and supporting decision-making (Arana-Catania et al., 2021; Costa et al., 2024; Rossello et al., 2025; Shin et al., 2024).
ML methods are broadly categorised into unsupervised and supervised learning (Anastasopoulos and Whitford, 2019). Unsupervised learning identifies patterns or clusters in data without human guidance, with applications such as topic modelling, clustering analysis, and dimensionality reduction. In contrast, supervised learning requires labelled data to train algorithms that map inputs to target values, enabling accurate predictions or classifications for new data.
A notable trend in ML research is the integration of unsupervised and supervised learning approaches (Maturo and Verde, 2022; Talukdar and Biswas, 2024). For example, Rijcken et al. (2022) compared 17 topic models in predicting inpatient violence using logistic regression classifiers on clinical notes. Chiu et al. (2022) and Geletta et al. (2019) used topic model outputs (document-topic probabilities) as predictors for ICU patient mortality and clinical trial termination, respectively. Ugochi et al. (2022) employed a topic model to identify themes in Twitter data, classifying sentiments as positive, neutral, negative, and vandalism. Similarly, Putranto et al. (2021) utilised a topic model to extract themes from hotel customer reviews and subsequently employed these themes as predictors for hotel ratings. The common thread across these studies is the application of unsupervised topic models to uncover textual data structures, which are then integrated with structured data as predictors in supervised learning frameworks. This article also adopts this two-stage approach, as illustrated in Figure 1. While this section outlines the conceptual research flow, the Appendix provides the technical details.
The flowchart shows the methodological pipeline in three stages: data cleaning, unsupervised phase, and supervised phase. The process begins with proposal data (n = 21,833) drawn from an original set of 25,415 proposals collected between 2013 and 2021, after removing 3,582 duplicates. Pre-processing follows, involving tokenisation with the KoNLPy/Okt tagger and removal of numbers and stop words. In the unsupervised phase, the processed text is used to build a combined topic model with 17 topics selected based on coherence and diversity metrics. Document-level topic probabilities are then generated and combined with metadata variables, including 26 districts, proposal duration, budget, and number of words. This produces a dataset with 46 features (17 topics and 29 metadata variables) and one target variable (voted or non-voted). After excluding 9,312 district-level proposals and imputing missing values, the dataset contains 12,521 proposals. In the supervised phase, the data is split into training (80 percent) and testing (20 percent) subsets. The training set includes 758 voted and 9,258 non-voted proposals, while the testing set contains 190 voted and 2,315 non-voted proposals. Model building uses SMOTE to balance the classes and applies a Random Forest classifier with hyperparameter tuning through random search and 5-fold cross-validation. Model evaluation is carried out using ROC curves, AUC, and feature importance.

Figure 1. Research flow chart. Source: Author’s own work
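The feature-assembly and data-splitting steps described in the pipeline can be sketched in plain Python. This is a minimal, stdlib-only illustration rather than the study's implementation: the district labels and function names are assumptions, and the SMOTE resampling and random forest training stages are omitted.

```python
import random

# Illustrative district labels; the pipeline's 26 district indicators are
# assumed here to be 25 Seoul districts plus one city-wide category.
DISTRICTS = [f"district_{i:02d}" for i in range(26)]

def build_features(topic_probs, district, duration, budget, n_words):
    """Combine 17 topic probabilities with metadata into one feature row.

    17 topics + 26 one-hot district indicators + duration + budget +
    word count = 46 features, matching the pipeline described above.
    """
    if len(topic_probs) != 17:
        raise ValueError("expected 17 topic probabilities")
    onehot = [1.0 if d == district else 0.0 for d in DISTRICTS]
    return list(topic_probs) + onehot + [duration, budget, n_words]

def split(rows, test_share=0.2, seed=42):
    """Shuffle and split rows into training and testing subsets (80/20)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * (1 - test_share)))
    return rows[:cut], rows[cut:]
```

In the actual study, the resulting feature matrix would then be passed to SMOTE and a random forest classifier; standard libraries such as scikit-learn and imbalanced-learn provide those components.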
4.2 Data collection and cleaning
The primary dataset comprises all the proposals submitted by citizens to Seoul PB between 2013 and 2021. To obtain the proposal data up to 2019, an application programming interface from the Open Government Seoul Portal (https://opengov.seoul.go.kr/data/19195057) was used. Additionally, the data from 2020 to 2021 were collected using a custom scraper from the Seoul PB website (https://yesan.seoul.go.kr/). Table 2 presents a sample of the collected proposals, including proposal texts (aim, description, and effects) with associated metadata such as proposal periods (e.g. 365 days), estimated costs (e.g. 4,000 thousand Korean Won), and target area (e.g. Jung-gu, one of Seoul’s 25 districts). Each proposal was also labelled according to whether it was selected. In total, 25,415 proposals were collected.
A sample of citizen proposals (title: road name information sign replacement project)
| Proposal attribute | Content |
|---|---|
| Proposer | Anonymised |
| Proposal date | 2019–03–25 |
| Category | Transportation |
| Receipt no. | 06867 |
| Period | 2020–01–01 – 2020–12–31 |
| Cost | 50,000 thousand Korean Won |
| Area | Jung-gu |
| Aim | Enhance the convenience of residents by replacing road name information signs with more accurate ones, which will provide clear directions to road users and help them identify the correct road name address |
| Description | Replace the location information signs installed on buildings with road name information signs. As changing the entire area of Jung-gu would require an excessive budget, replace the signs on Namdaemun-ro, which many drivers use, on a trial basis |
| Effects | Promote the convenience of residents by establishing a road name address system; enable rapid response in the event of a disaster, such as a fire or security incident |
After removing duplicated data, the final dataset consisted of 21,833 proposals. The text data were pre-processed to remove punctuation, symbols, numbers, stop words, and URLs using KoNLPy (Park and Cho, 2014), a widely applied Python package for tokenising and stemming Korean texts. The model outputs in Korean were translated into English by the author. For numerical data, errors and missing values were corrected, with missing project-duration values imputed with the mean of the non-missing values.
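As an illustration, these cleaning steps can be sketched as follows. This is a minimal sketch, not the study’s code: the stop word list is illustrative, and a whitespace tokeniser stands in for KoNLPy’s Okt tagger (in practice, `Okt().morphs(text)` would produce the Korean tokens).

```python
import re

# Illustrative stop word list; the study's actual Korean stop word list is not given.
STOP_WORDS = {"the", "a", "an", "and", "of", "to"}

def preprocess(text):
    """Strip URLs, punctuation, symbols, and numbers, then tokenise and drop stop words.

    A whitespace tokeniser stands in for KoNLPy's Okt tagger so the sketch
    runs without a Java runtime.
    """
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"[^\w\s]|\d", " ", text)     # remove punctuation, symbols, numbers
    tokens = text.lower().split()               # stand-in for Okt().morphs(text)
    return [t for t in tokens if t not in STOP_WORDS]

def impute_period(periods):
    """Replace missing project durations (None) with the mean of observed values."""
    observed = [p for p in periods if p is not None]
    mean = sum(observed) / len(observed)
    return [mean if p is None else p for p in periods]
```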
4.3 Unsupervised phase: topic modelling
After data cleaning, this study addresses learning costs by employing unsupervised learning to identify latent topics within proposal texts. These texts, comprising the aim, description, and effects, were incorporated into the model to uncover key themes underlying the corpus (see Table 2). A combined topic model (Bianchi et al., 2021) was employed; it overcomes the limitations of traditional Bag-of-Words (BoW) models by integrating Sentence-BERT (SBERT) embeddings, which capture semantic meaning more effectively. Models were trained with various topic numbers, and the optimal number, 17 topics, was chosen using two metrics: NPMI-Coherence and IRBO-Diversity (Terragni and Fersini, 2021). The resulting topic proportions were combined with additional metadata, such as proposal duration, budget, district, and word count, to create a 46-feature dataset, which then served as input for the supervised learning phase.
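To make the coherence criterion concrete, the sketch below computes NPMI coherence for one topic’s top words from document co-occurrence counts. This is an illustration only: the study computes these scores with the combined-topic-model tooling, and the toy corpus and smoothing constant here are assumptions.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents, eps=1e-12):
    """Average NPMI over all pairs of a topic's top words.

    p(w) and p(wi, wj) are estimated as document frequencies; higher scores
    mean the topic's words co-occur more often than chance.
    """
    n_docs = len(documents)
    docs = [set(d) for d in documents]
    def p(*words):
        return sum(all(w in d for w in words) for d in docs) / n_docs
    scores = []
    for wi, wj in combinations(top_words, 2):
        p_ij = p(wi, wj)
        pmi = math.log((p_ij + eps) / (p(wi) * p(wj) + eps))
        scores.append(pmi / -math.log(p_ij + eps))
    return sum(scores) / len(scores)

# Toy corpus: "bus"/"stop" always co-occur, "bus"/"tree" never do.
docs = [["bus", "stop"], ["bus", "stop"], ["park", "tree"], ["park", "tree"]]
coherent = npmi_coherence(["bus", "stop"], docs)
incoherent = npmi_coherence(["bus", "tree"], docs)
```

Words that always co-occur score near the NPMI maximum of 1, while words that never co-occur score negatively, which is why higher average NPMI indicates a more interpretable topic.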
4.4 Supervised phase: random forest
The supervised learning phase aims to address compliance costs by predicting which proposals are most likely to be selected. Using the 46-feature dataset, a predictive model was trained to provide immediate feedback to citizens upon submission of their proposals. This feedback allows citizens to refine and elaborate their proposals through the PB School programme and the formal deliberation stages. The binary classification task involved training and testing sets, with an 80–20 split of the proposals. Due to the imbalanced dataset, a stratification technique and SMOTE (Synthetic Minority Over-sampling Technique) (Chawla et al., 2002) were employed to balance the class distribution. Following the approach of Chiu et al. (2022), five classification algorithms were evaluated using the AUROC metric (Probst et al., 2019): decision tree, known as Classification and Regression Tree (CART) (Lewis, 2000), logistic regression, gradient boosting (Bentéjac et al., 2021), support vector machine (Osisanwo et al., 2017), and random forest (RF) (Breiman, 2001). Among these, Random Forest demonstrated the best predictive performance for this task (see Figure 6).
The Random Forest model was implemented using Python’s scikit-learn library, leveraging an ensemble of multiple uncorrelated decision trees to improve prediction accuracy. Hyperparameter tuning was conducted through a randomised search with 5-fold cross-validation, and the model’s performance was assessed based on AUROC metrics. Additionally, feature importance and partial dependence plots provided insights into the contributions and relationships of various features in predicting proposal selection.
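The supervised pipeline can be sketched end to end as below. This is a minimal sketch, not the study’s code: the data are synthetic stand-ins for the 46-feature dataset, the parameter grid is illustrative, and a simple nearest-neighbour interpolation stands in for the full SMOTE algorithm of Chawla et al. (2002) (in practice, `imblearn.over_sampling.SMOTE` would be used).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

def smote_like(X_min, n_new, rng):
    """SMOTE-style oversampling: each synthetic point is interpolated between a
    random minority sample and its nearest minority neighbour (a simplified
    stand-in for the full SMOTE algorithm)."""
    new = []
    for i in rng.integers(0, len(X_min), n_new):
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        dist[i] = np.inf                      # exclude the point itself
        nn = int(np.argmin(dist))
        new.append(X_min[i] + rng.random() * (X_min[nn] - X_min[i]))
    return np.array(new)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                # synthetic stand-in features
y = (X[:, 0] + rng.normal(scale=2.0, size=500) > 2.8).astype(int)  # ~10% positives

# Stratified 80-20 split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# Balance the training classes by oversampling the minority (voted) class.
n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
X_bal = np.vstack([X_tr, smote_like(X_tr[y_tr == 1], n_new, rng)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

# Random Forest tuned by randomised search with 5-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 200], "max_depth": [3, 5, None]},
    n_iter=4, cv=5, scoring="roc_auc", random_state=0)
search.fit(X_bal, y_bal)

# Evaluate with AUROC on the held-out test set.
auc = roc_auc_score(y_te, search.predict_proba(X_te)[:, 1])
```

Balancing only the training set, while leaving the test set at its natural class ratio, keeps the reported AUROC representative of real submission conditions.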
5. Results
This section presents the study’s results, organised into three parts to reflect the research framework, as illustrated in Figure 1. First, exploratory data analysis provides an overview of the proposal dataset, including categorical distributions and submission patterns. Second, topic modelling addresses learning costs by identifying latent themes within the proposal texts, offering insights into the underlying topical structure.
Thirdly, predictive modelling focuses on compliance costs, utilising supervised learning to predict proposal selection outcomes before decision-making processes.
5.1 Exploratory data analysis
When submitting proposals, citizens are required to provide detailed information, including the budget, category, duration and target area, as outlined in Table 2. They must select the most appropriate category for their proposal from predefined options: environment, welfare and health, culture, sports and tourism, transportation, economy and job, women and education, safety, governance, and housing. Figure 2 presents the distribution of proposals across categories, showing that the environment category (23.8%) was the most common, followed by welfare and health (15.8%) and culture, sports, and tourism (12.7%). The preference for environmental topics in PB has also been observed in other cities, including Helsinki (Rask et al., 2021), Lisbon (Falanga et al., 2021), and Madrid (Secinaro et al., 2022a).
Horizontal bar chart of proposal counts by category, ordered from largest to smallest: Environment 23.8%, Welfare and health 15.8%, Culture, sports and tourism 12.7%, Transportation 10.8%, Economy and job 8.7%, Women and education 8.3%, Safety 8.3%, Governance 7.1%, and Housing 4.6% (values approximate). Distribution of citizens’ registered topics. Author’s own work
This classification plays a crucial role in systematically organising and bundling proposals to determine the number of votes allocated to each category. However, it often relies on human judgement, which may result in inconsistent criteria. For instance, a proposal to remove graffiti could be categorised under the environment by some but classified as a safety issue by others. Topic modelling offers an alternative by identifying latent co-occurring word patterns, or topics, within a corpus, expressed in varying proportions (e.g. Proposal A consists of 25% environment, 13% safety, and 8% welfare topics). This approach provides a systematic and data-driven overview of the topical structure, reducing reliance on subjective judgements (Schmiedel et al., 2019).
Figure 3 presents histograms illustrating the distribution of project budgets, durations, and word counts in proposals, along with a Pearson correlation matrix. A notable proportion of proposals requested budgets of 0.1 billion Korean Won (70,000 euros; 12.7% of proposals), 0.2 billion Won (140,000 euros; 8.1%), and 0.3 billion Won (209,000 euros; 7.6%), resulting in a right-skewed distribution with an average budget of 0.32 billion Won. Similarly, project durations clustered around 364 days (30% of proposals) and 365 days (12.8%), with an average duration of 301 days. These patterns suggest that citizens often rely on rough estimates for project budgets and durations, irrespective of project scale, likely due to limited access to expert guidance.
Pairs plot of three proposal-level features: requested budget, project duration, and number of words in proposals. Diagonal panels show right-skewed histograms (with rug plots) for each feature; lower panels show pairwise scatterplots with trend lines and no clear linear pattern; upper panels report Pearson correlations: budget and duration r = 0.045 (significant), budget and word count r = −0.0031 (not significant), and duration and word count r = 0.091 (significant). Histogram and correlation matrix of features. Author’s own work
The proposals contained an average of 128 pre-processed words, more than twice the 57 words of the sample proposal in Table 2. The histograms in Figure 3 show a distribution skewed towards shorter proposals, though a notable proportion extends beyond 100 words. The Pearson correlation matrix reveals a positive relationship between project duration and both the requested budget (r = 0.045) and word count (r = 0.091), suggesting that citizens tend to request larger budgets and provide more detailed information for longer projects.
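The reported coefficients follow the standard Pearson formulation, which can be computed directly; a minimal sketch (not the study’s code):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y over the product of their
    standard deviations, yielding a value in [-1, 1]."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))
```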
Only 7.6% of the city-wide proposals submitted to Seoul’s PB programme were selected during the investigation period (2013–2021). To provide further context, Figure 4 illustrates the percentage of proposals voted for across the city’s 25 districts. This percentage was calculated by dividing the number of proposals voted for by the total number submitted, as citizens could select multiple districts when submitting a city-wide proposal. A clear pattern emerges: peripheral districts show higher percentages of proposals voted for, with the exception of affluent districts in the city’s southeastern area. This trend reflects Seoul PB’s redistributive orientation, which prioritises economically disadvantaged districts (No and Hsueh, 2022).
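The per-district percentage is a simple ratio of voted to submitted proposals; a sketch with hypothetical counts (the district names are real, but the tallies are illustrative, not the study’s data):

```python
# Hypothetical tallies per district (illustrative values, not the study's data).
submitted = {"Nowon-gu": 900, "Gangnam-gu": 1200, "Jung-gu": 400}
voted = {"Nowon-gu": 81, "Gangnam-gu": 66, "Jung-gu": 28}

# Percentage of proposals voted for = voted / submitted, per district.
pct_voted = {d: 100 * voted[d] / submitted[d] for d in submitted}
```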
Choropleth map of Seoul’s 25 districts on a green scale from 5% to 9%, with lighter shades marking higher percentages of proposals voted for; peripheral districts such as Nowon-gu, Dobong-gu, Jungnang-gu, Gangseo-gu, and Mapo-gu fall in the 8–9% range, while districts such as Seongbuk-gu, Gwanak-gu, Gwangjin-gu, Jung-gu, and Geumcheon-gu fall in the 6–8% range. The percentage of proposals voted for by districts. Note: District labels are displayed if the percentage is over 8%. Author’s own work
The exploratory data analysis highlights the importance of incorporating proposal-level variables, namely the requested budget, project duration, word count, and the 26 target areas (the 25 districts plus a “Seoul” option for proposals addressing the entire city), as features for predicting proposal selection.
5.2 Topic modelling
The fundamental step in topic modelling is to set the number of topics, denoted by the hyperparameter K (see Appendix). In Figure 5, NPMI-Coherence and IRBO-Diversity scores are plotted for different topic numbers, with higher scores indicating better model quality (Bianchi et al., 2021). These metrics were calculated five times for each topic number; light blue dots represent individual scores, and blue lines represent mean values. The scores suggest that the optimal number of topics lies between 17 and 20. After manual inspection of the candidate models and consideration of their semantic interpretability, the 17-topic model was selected.
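The diversity criterion can be illustrated with a small sketch. This is an assumption-laden simplification: it uses a normalised, truncated rank-biased overlap rather than the exact IRBO formulation of Terragni and Fersini (2021), but it conveys the idea that diversity falls as topics share more top words.

```python
from itertools import combinations

def rbo(s, t, p=0.9):
    """Normalised truncated rank-biased overlap of two ranked word lists:
    1.0 for identical lists, 0.0 for fully disjoint ones."""
    depth = min(len(s), len(t))
    score = sum(p ** (d - 1) * len(set(s[:d]) & set(t[:d])) / d
                for d in range(1, depth + 1))
    return (1 - p) * score / (1 - p ** depth)

def irbo_diversity(topics, p=0.9):
    """Inverted RBO: 1 minus the mean pairwise RBO over all topic pairs."""
    pairs = list(combinations(topics, 2))
    return 1 - sum(rbo(s, t, p) for s, t in pairs) / len(pairs)
```

Two identical topics give a diversity of 0, and fully disjoint topics give 1, so models whose topics repeat the same words are penalised.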
Two line graphs with topic numbers from 0 to 100 on the horizontal axis. Top panel: NPMI-Coherence (vertical axis −0.05 to 0.10) rises steeply at first and then levels off, with 17 topics highlighted as a point where coherence is relatively high. Bottom panel: IRBO-Diversity (vertical axis 0.65 to 1.00) declines as the number of topics increases, with 17 topics marking where the score begins to drop more sharply (values approximate). Topic coherence and diversity scores by topic numbers. Note: Light blue points mark the scores of each iteration, and the blue line graph shows their mean values at different topic numbers. Author’s own work
Table 3 presents the 17-topic model results, providing an overview of significant bottom-up aspects reflected in citizen proposals. These topics can be grouped into two clusters based on their content. The first cluster comprises nine topics related to urban infrastructure and public facilities, including Traffic facility (Topic 1), Green space (3), Transportation facility (5), Park and playground (6), Sidewalk (7), Walking trail (8), Crime prevention (10), Community space (14), and Public facility (16). These topics focus on the physical components of urban development, including transportation systems, pedestrian pathways, recreational areas, and crime prevention measures. This result indicates residents’ priorities for enhancing neighbourhoods’ functionality, safety, and attractiveness, contributing to more sustainable and enjoyable living environments.
17-topic model results
| ID | Topic labels | Top five keywords | % |
|---|---|---|---|
| 1 | Traffic facility | traffic accident, vehicle, crosswalk, driver, traffic light | 6.4 |
| 2 | Tourism and culture | tourism, tourist, tradition, market, history | 6.3 |
| 3 | Green space | tree, beautiful, plant, scenery, flower road | 6.2 |
| 4 | Family programme | family, multicultural, home, school, youth | 6.1 |
| 5 | Transportation facility | bus, subway, bus stop, public transportation, station | 6.1 |
| 6 | Park and playground | park, playground, play, children, equipment | 6.1 |
| 7 | Sidewalk | sidewalk, maintenance, walk, damage, boundary | 6.1 |
| 8 | Walking trail | bicycle, trail, use, Han River, park | 6.0 |
| 9 | Eldercare programme | elderly, service, class, income, health | 6.0 |
| 10 | Crime prevention | alley, crime, relief, installation, crime prevention | 5.9 |
| 11 | Collaboration | manual, governance, channel, solidarity, draw | 5.8 |
| 12 | Job and business | job, youth, employment, support, company | 5.7 |
| 13 | Waste management | garbage, fine dust, collection, smoking, recycling | 5.7 |
| 14 | Community space | resident, village, area, community, space | 5.6 |
| 15 | Community programme | culture, area, art, youth, space | 5.5 |
| 16 | Public facility | use, library, facility, sports, replacement | 5.4 |
| 17 | Education programme | education, project, about, disabled, Seoul | 5.2 |
The second cluster consists of eight topics revolving around social and welfare programmes: Tourism and culture (Topic 2), Family programme (4), Eldercare programme (9), Collaboration (11), Job and business (12), Waste management (13), Community programme (15), and Education programme (17). These topics address social challenges and promote community well-being through diverse initiatives. This result suggests that residents value solutions to strengthen community cohesion, address social issues, and build resilient societies with accessible public support and resources.
Table 3 organises topics by their probabilities, calculated by summing each topic’s probabilities across the text corpus. Among the prominent topics, those related to urban mobility, such as Traffic facility (Topic 1), Transportation facility (5), and Sidewalk (7), were particularly notable. This focus is likely influenced by Seoul’s population of ten million and its significant traffic volume. According to the OECD (2021, p. 82), Korea recorded the longest average commuting time among 34 countries, exceeding 50 min daily. Seoul, in particular, reported the longest commute time nationwide, with an average of 1.5 h (Statistics Korea, 2020). Citizens also expressed considerable environmental concerns, as reflected in topics such as Green space (Topic 3), Park and playground (6), Walking trail (8), and Waste management (13). Interestingly, the Madrid PB case also revealed a preference for environmental and mobility-related topics (Secinaro et al., 2022a), highlighting common citizen priorities across different urban contexts.
After completing model training, extracting associated topics for each proposal becomes possible, as shown in Table 4. For instance, the proposal to improve a public facility for small business owners (first row of Table 4) was initially categorised by the proposer under the Economy and job category. While this categorisation aligns with the target group (business owners), the proposal primarily concerns renovation. This example highlights a common challenge of manual and subjective categorisation, as many proposals encompass overlapping topics in varying proportions, making it difficult to assign a single category. In contrast, the topic model identifies the predominant topics for this proposal as Topic 12 (Job and business: 40.9%) and Topic 16 (Public facility: 8%), providing a more nuanced and data-driven representation of its content.
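Once the model is trained, reading off each proposal’s dominant topics is a matter of sorting its topic-probability vector. A sketch with a hypothetical probability vector shaped to match the first row of Table 4 (the values are illustrative, not actual model output):

```python
import numpy as np

# Hypothetical 17-topic probability vector for one proposal, shaped to match
# the first row of Table 4 (Topic 12: 40.9%, Topic 16: 8%); other values illustrative.
theta = np.full(17, 0.034)
theta[11] = 0.409        # Topic 12 (Job and business), 0-indexed as 11
theta[15] = 0.080        # Topic 16 (Public facility), 0-indexed as 15
theta /= theta.sum()     # renormalise to a proper probability distribution

# The two most prominent topics, reported with 1-indexed topic IDs as in Table 4.
top2 = np.argsort(theta)[::-1][:2] + 1
```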
Topics assigned to proposal samples
| Proposal title | Category by citizens | Most common topic | Second most common topic |
|---|---|---|---|
| Please improve the small and medium business promotion centre for small business owners | Economy and job | 12. Job and business: 40.9% | 16. Public facility: 8% |
| Installation of secure delivery box at subway station | Women and education | 10. Crime prevention: 43.8% | 16. Public facility: 8.2% |
| It’s scary in the evening! Installing streetlights and CCTV in the forest eco-playground | Environment | 10. Crime prevention: 36.7% | 6. Park and playground: 10.5% |
| Removing fine dust from bus stops: smart bus stop | Transportation | 5. Transportation facility: 80.6% | 13. Waste management: 3.2% |
Additionally, the model enables navigation of related proposals within specific topics. Table 5, for example, presents the most representative proposals for Topics 1 (Traffic facility) and 5 (Transportation facility), as well as Topics 14 (Community space) and 15 (Community programme), highlighting their distinct characteristics. While Topics 1 and 5 address urban mobility, and Topics 14 and 15 focus on community cohesion, the representative samples illustrate their unique emphases. Moreover, Table 5 shows that citizens often employed catchy phrases and titles in their proposals to attract public attention and votes, as suggested by Secinaro et al. (2022a).
Representative proposals’ titles of selected topics
| Proposal titles | Topic probability (%) |
|---|---|
| #1. Traffic facility (traffic accident, vehicle, crosswalk, driver, traffic light) | |
| “Installation of crosswalk floor signal lights” for pedestrian safety in front of the elementary school | 87.7 |
| No more school zone safety accidents!! (Please secure a safe school route.) | 83.2 |
| This is a children’s shelter. You can recognise it from afar | 82.6 |
| #5. Transportation facility (bus, subway, bus stop, public transportation, station) | |
| I want to wait for the bus without worrying about fine dust! | 91.9 |
| Please install the bus stop and the bus information terminal! | 90.7 |
| Please install the bus information terminal (BIT) | 85.5 |
| #14. Community space (resident, village, area, community, space) | |
| Rooftop garden for sharing culture and leisure | 75.8 |
| We need a stable “croak, croak” community space for various people | 72.6 |
| Bridging space project: a joint management space for village welfare | 71.9 |
| #15. Community programme (culture, area, art, youth, space) | |
| Young artist embraces local jobs in idle space | 80.9 |
| Creating a youth culture and arts playground | 80.4 |
| “Art sympathy in life” between citizens and young artists | 77.3 |
5.3 Proposal selection prediction
While the previous section addressed learning costs through topic modelling, this section focuses on compliance costs by predicting which proposals are likely to be selected. As delineated in Table 3, the distribution of topic probabilities provides a foundational representation of the topical structure within the text corpus, and this structure serves as a key feature for forecasting proposal selection. Additionally, insights from Figures 3 and 4 suggest that various proposal-level attributes, including the requested budget (the financial magnitude of the proposal), word count (the extent of information provided), project duration (the temporal scope of the proposal), and the geographical distribution across 26 districts (the locations of proposed initiatives, including the entire city of Seoul), may also influence citizens’ voting decisions.
A Random Forest algorithm was employed to model these predictive dynamics comprehensively, incorporating all 46 features. Following a 5-fold cross-validation process, Figure 6 presents the Receiver Operating Characteristic (ROC) curve, which assesses the performance of the Random Forest classifier on unseen test data comprising 20% of the original dataset. The ROC curve illustrates the trade-off between the True Positive Rate (TPR), the proportion of actual positive cases classified correctly, and the False Positive Rate (FPR), the proportion of actual negative cases classified incorrectly. Each point on the curve corresponds to a specific decision threshold, with a higher TPR and lower FPR indicating superior performance. For instance, in Figure 6, increasing the TPR from 0.4 to 0.8 requires the FPR to rise from 0.2 to 0.4. The diagonal dotted line represents a model with no predictive power, that is, random guessing; the closer a model’s ROC curve lies to this line, the poorer its predictive performance. The Area Under the Curve (AUC) summarises overall performance on a scale from zero to one, with higher AUC values indicating a better model.
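To make the threshold mechanics concrete, the TPR and FPR at a single decision threshold can be computed directly from predicted scores. The following minimal sketch uses illustrative scores and labels, not the study’s data:

```python
def roc_point(scores, labels, threshold):
    """Compute (FPR, TPR) for one decision threshold.

    scores: predicted probabilities of the positive class (illustrative)
    labels: true binary outcomes (1 = selected, 0 = not selected)
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    tpr = tp / (tp + fn)  # proportion of actual positives caught
    fpr = fp / (fp + tn)  # proportion of actual negatives mislabelled
    return fpr, tpr

# Sweeping the threshold traces out the ROC curve, one point at a time:
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_point(scores, labels, 0.5))   # lenient threshold: (FPR=1/3, TPR=2/3)
print(roc_point(scores, labels, 0.85))  # strict threshold: (FPR=0, TPR=1/3)
```

A stricter threshold lowers both rates, which is exactly the trade-off the curve visualises.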
Figure 6: The receiver operating characteristic curve (author’s own work). The graph plots the True Positive Rate against the False Positive Rate for five machine learning models: Logistic Regression (AUC 0.70), Random Forest (AUC 0.73), Gradient Boosting (AUC 0.72), Support Vector Machine (AUC 0.69), and Decision Tree (AUC 0.57). A dashed diagonal line represents random classification. The Random Forest model performs best, with its curve closest to the top-left corner; the Decision Tree performs worst, with its curve closest to the random-classifier line. Note: all numerical values are approximated.
Compared to the Decision Tree classifier (AUC = 0.57) and the other models, the Random Forest classifier achieves the highest AUC of 0.73, implying a 73% probability that the model ranks a randomly chosen selected proposal above a randomly chosen non-selected one. The AUC is widely regarded as a key performance indicator in predictive modelling, particularly in medical research, where it is often used to predict cancer outcomes or patient survival, and an AUC between 0.7 and 0.8 is generally considered reasonable for such tasks. Once trained, the model enables citizens to receive immediate prediction results upon submitting their proposals, allowing for timely feedback and potential refinement.
Figure 7 illustrates the relative importance of features in predicting proposal selection through voting. The analysis highlights that citizens prioritised a few variables when making voting decisions, including proposal duration, budget, total words, and geographical scope (specifically, the entire Seoul area). Additionally, thematic elements, such as Sidewalks (Topic 7), Community Programmes (Topic 15), Waste Management (Topic 13), and Crime Prevention (Topic 10), were identified as influential factors shaping voting outcomes.
Figure 7: Feature importance (author’s own work). The chart, titled Random Forest Classifier – Feature Importance (threshold greater than 0.02), plots feature importance (horizontal axis, 0.00 to 0.12) for the features contributing to the binary classification of proposals as selected or not selected. The most important feature is Area Seoul, representing city-wide proposals, with an importance score above 0.12, followed by duration, budget, and sidewalk. Topic-related features such as community programmes, waste management, crime prevention, community space, and public facilities also play a significant role, while education programmes, walking trails, traffic facilities, family programmes, parks and playgrounds, collaboration, and tourism and culture show lower importance, with tourism and culture the least influential. Darker green bars represent higher feature importance values. Note: all numerical values are approximated.
The directionality and non-linear effects of the variables on voting decisions were analysed using partial dependence plots, as shown in Figure 8. The plot for the Seoul variable, which indicates whether a proposal targeted the entire city, shows a marginal negative effect on voting outcomes. This finding is noteworthy given the city government’s active encouragement for citizens to submit city-wide proposals spanning multiple districts. It also highlights the inherently spatial orientation of PB, where proposals linked to specific localities may resonate more strongly with participants than those addressing the city as a whole. The result suggests that proposers may benefit from specifying target districts rather than submitting proposals for all areas. However, the analysis did not reveal any particular district that was significantly more favoured in influencing voting decisions.
Figure 8: Partial dependence plots (author’s own work). The figure presents nine partial dependence plots, each showing the relationship between one feature and the likelihood of proposal selection. Area Seoul (proposals that target the whole city) displays a strong negative linear relationship, where higher values reduce the probability of selection. Duration shows a non-linear pattern, remaining stable before sharply increasing and then dropping slightly. Budget has a generally negative trend, with higher budgets reducing the probability of selection. Sidewalk shows a positive, curved relationship, indicating that higher values increase the likelihood of selection. Total words have a primarily positive influence, with more words linked to a higher probability of selection. Community programme shows a slight positive upward trend. Waste management exhibits a slight downward slope, indicating a minor adverse effect. Crime prevention exhibits a slight upward trend, indicating a minor positive impact. Community space shows a similar slight upward trend, representing a weak positive effect. Note: all numerical values are approximated.
The result for the duration variable in Figure 8 shows a non-significant marginal effect on voting for proposals with durations under 300 days. However, approval rises notably for proposals lasting up to one year, followed by a significant decline for those requiring more than one year. For the budget variable, the analysis indicates a diminishing marginal effect on proposal selection, suggesting a general preference among citizens for lower-cost proposals. Regarding word counts, citizens appear to favour proposals with more detailed information, up to roughly 100 words. Given that the average proposal in the sample contained 57 words (Table 2), providing detail can enhance the likelihood of selection; however, proposals exceeding 100 words do not significantly improve their chances, as shown in Figure 8. This result highlights the importance of well-structured proposal content within a reasonable length.
The marginal effects of specific topics also reveal important trends. Topics such as Sidewalk Development (Topic 7), Community Programmes (Topic 15), Crime Prevention Initiatives (Topic 10), and Community Spaces (Topic 14) demonstrate positive effects, highlighting citizen preferences for proposals in these thematic areas. In contrast, Waste Management (Topic 13) shows a negative impact, suggesting that citizens may perceive waste management as a responsibility better suited to the city government than to PB. This perception could explain why waste management proposals receive less support in city-wide initiatives.
6. Discussion
Recent advancements in digital technologies have spurred the proliferation of diverse participatory tools, creating a complex ecosystem that enhances citizen participation in policymaking (Shin et al., 2024). While these tools hold considerable promise, they also introduce new challenges in processing the growing volume of citizen input within democratic processes (Arana-Catania et al., 2021; Rossello et al., 2025). Like a coffee dripper, the throughput stage in PB functions as a filtering mechanism, distilling diverse citizen voices into a limited number of projects under strict resource constraints. This many-in, small-out dynamic places pressure on public institutions to review, synthesise, and prioritise inputs effectively. During the study period, only 7.6% of all submitted proposals were ultimately selected for implementation, meaning more than nine out of ten were rejected. Crucially, without meaningful deliberation and practices of dialogic accounting, PB risks being reduced to little more than a competition of good ideas for local neighbourhoods rather than a forum for collective debate and democratic learning. Despite growing scholarship on digital transformation in public sector accounting, the implications for PB remain an open empirical question (Bartocci et al., 2023; Klimovský et al., 2024).
This exploratory article contributes to the literature by examining the potential of ML techniques to reduce administrative burdens in PB. The rationale for applying the administrative burden framework is twofold. First, it highlights how ML can alleviate burdensome experiences for participants in PB by addressing learning and compliance costs. Second, it demonstrates how ML can automate repetitive preparatory tasks required during deliberation and decision-making, while ensuring that its application remains confined to the preparatory stages of the democratic process, thereby preserving its human-centred nature. To ensure practical and ethical application, this study relied solely on publicly available data, avoiding private or sensitive information and protecting privacy.
The research findings, summarised in Table 6, highlight that ML should be understood as a set of tools rather than a replacement for deliberation. This prompts a critical inquiry into when and how such tools should be designed and integrated into democratic processes. Learning costs represent a prominent administrative burden in PB, especially when thousands of proposals must be screened, summarised, and categorised to identify major themes for deliberation. Topic modelling enables the automated assignment of proposals to thematically relevant categories, thereby reducing reliance on manual classification. This finding aligns with prior research emphasising the advantages of AI in processing large volumes of public sector text through natural language processing (Costa et al., 2024; Rossello et al., 2025). However, the role of human participants remains indispensable as inspectors and decision-makers who assess model quality and triangulate results with citizen-submitted categories (see Figure 2). Furthermore, labelled proposals can generate automatic summaries and visual aids to improve communication, thereby reducing the time and effort required for proposal screening and interpretation.
Current practices and ML approaches for reducing administrative burdens
| | Current practices of the Seoul PB | Proposed ML approach |
|---|---|---|
| Learning costs | Public officials manually read and assign proposals to relevant departments or classify them as city- or district-level proposals; committees of citizens and public officials screen, reduce and synthesise proposals based on content | Automate proposal categorisation by assigning proposals to thematic topics, allowing comparison with citizen-submitted categories; generate summaries, assist in categorisation, and create visual materials to enhance communication and data literacy |
| Compliance costs | Operate PB School programmes, delivered online and offline, to build citizens’ capacity for narrowing knowledge gaps and writing effective proposals | Provide instant predictive feedback on likely voting outcomes, including popular topics, project duration, budgets, areas, and information levels |
Compliance costs in PB arise primarily from the need for citizens to create informative and persuasive proposals. Seoul’s PB School programme addresses this challenge by providing training in proposal writing and debate, but participants with limited experience in budgeting may remain disadvantaged. The random forest model developed in this study offers a complementary strategy, reducing compliance costs by providing real-time feedback during proposal drafting. Trained on past voting outcomes, the model predicts the likelihood of proposal selection based on topic, project duration, target location, budget, and word count. Such predictive feedback can empower citizens to refine their proposals, reduce barriers to participation, and contribute to a more equitable PB process.
While ML techniques hold promise in reducing administrative burdens, their application also introduces potential risks, particularly concerning learning costs. Automating proposal categorisation through topic modelling may inadvertently oversimplify the richness and complexity of citizen inputs, potentially disregarding nuanced or context-specific details that do not fit neatly into predefined topics (Shin, 2023). This risk is exacerbated when human oversight is limited, increasing the likelihood of misclassification or misalignment between algorithmic outputs and the original intent of the proposals (Rossello et al., 2025). Additionally, an over-reliance on digital platforms without face-to-face interactions risks widening social inequality, as evidence suggests that in-person PB settings can be more effectively tailored to reach marginalised groups, whereas online-only processes may unintentionally privilege more affluent groups (Touchton et al., 2019).
Applying ML to reduce compliance costs also raises ethical and administrative concerns. Predictive models may unintentionally reinforce existing biases present in historical data (de Laat, 2018). For instance, topics or areas that have historically received higher approval rates may influence the model’s predictions, potentially discouraging innovative or unconventional proposals. Although real-time feedback on a proposal’s likelihood of selection can enhance transparency and empower citizens, it may also produce unintended consequences. Certain groups might perceive the process as overly data-driven, which could discourage participation in the system (Chokki et al., 2022). Moreover, predictive feedback can shape citizen behaviour, prompting proposers to focus on superficial features that increase the chances of selection rather than addressing deeper community needs. Even when digital tools reduce administrative costs by offering self-service functionalities, they may introduce new burdens. For example, citizens may struggle to navigate unfamiliar systems or lack clear guidance on where and how to obtain support, as highlighted by Madsen et al. (2022).
Building on these concerns, this article echoes Peeters (2023) in cautioning against overreliance on, and overconfidence in, automated decision-making. While ML offers valuable support by processing large volumes of information and reducing the administrative burdens associated with PB, it must not be treated as a substitute for human-centred deliberation. As Lafont (2019) reminds us, there are no “shortcuts” to ensuring equal opportunity for all citizens to express their voices and shape decisions that affect their lives. This article contends that ML should not be used to bypass deliberation, but rather to support its preparation by structuring inputs, identifying thematic patterns, and offering accessible information that enables more informed and evidence-based democratic engagement.
7. Conclusion
This study examined how machine learning (ML) can alleviate administrative burdens in participatory budgeting (PB), with a focus on reducing learning and compliance costs. By automating tasks such as proposal categorisation, summarisation, and feedback provision, ML can enhance efficiency and accessibility in PB processes, allowing public officials and citizens to concentrate on deliberation and decision-making. However, the study also highlights risks, including the marginalisation of nuanced proposals, reinforcement of biases in historical data, and potential barriers for citizens unfamiliar with technical systems. To maximise its potential, ML must be thoughtfully designed and implemented to complement human oversight, ensuring it enhances participatory governance without compromising inclusivity, transparency, or fairness.
This research has several limitations. Firstly, although the article discusses how ML can reduce administrative burdens within the exploratory design, it does not quantitatively measure the extent of that reduction. Future research could explore this by conducting surveys or in-depth interviews to assess stakeholders’ perceptions of administrative burdens before and after implementing ML techniques. Secondly, the study is based on the specific case of Seoul, and the findings may not be generalisable to other contexts. Future studies could replicate this analysis in different cities or countries to assess whether the results are consistent across various settings. Finally, while this article focuses on ML methods for proposal processing, it does not directly examine their influence on decision-making in PB processes. There is a risk that algorithmic models may prioritise more popular or ‘majority’ voices, potentially overlooking minority perspectives that could provide valuable societal insights.
Digital technologies, including ML and AI, are becoming robust public governance and accounting tools. Comparable to technologies such as video conferencing, virtual reality, and digital voting systems, their effective deployment requires careful consideration of when and at which stages they are integrated, as well as who develops them, how they are blended with other technologies, and for what purposes (Shin et al., 2024). It is, therefore, timely to demystify how these technical innovations are applied to democratic innovations. This article explored how ML can effectively mitigate administrative burdens in PB by processing diverse citizen inputs. Although some administrative complexities remain necessary to ensure transparency and democratic integrity, the findings suggest that ML can alleviate many labour-intensive tasks when integrated into an appropriate institutional framework. By addressing both the opportunities and risks of using ML in PB, this article encourages future research on technological innovation to enhance governance while upholding the fundamental principles of democracy.
Appendix Research flow
This article sequentially combined unsupervised and supervised methods to identify topics from citizen-submitted proposals and ultimately predict whether they are selected. Figure 1 presents the flowchart of the proposed approach.
1. Pre-processing
After collecting the data, duplicate entries were removed, resulting in 21,833 proposals. For the text data, this article treated the aim, description, and effects together as the full proposal text. The text is unstructured, containing punctuation, symbols, numbers, stopwords (such as “I” and “you”), and URLs, all of which were removed because they carry no useful information. KoNLPy, a Python library for Korean natural language processing (Park and Cho, 2014), was used to tokenise the text (extracting nouns, adjectives, and verbs) and perform stemming with a part-of-speech tagger (the Okt tagger). For the numerical data, entry errors and missing values were corrected. Specifically, 1,566 proposals had missing or unidentifiable period values, which were imputed with the mean of the non-missing values (Emmanuel et al., 2021).
2. Topic modelling (combined TM)
Topic modelling is a family of statistical algorithms that aim to identify latent topics within an extensive corpus (Blei, 2012). The most popular model, used across academic and business fields, is Latent Dirichlet Allocation (LDA), which has also been employed in previous studies that combined topic models with supervised algorithms (Chiu et al., 2022; Geletta et al., 2019; Putranto et al., 2021; Ugochi et al., 2022). Developed by Blei et al. (2003), LDA has likewise been applied in public administration research (Anastasopoulos and Whitford, 2019; Hollibaugh, 2019; Walker et al., 2019).
The “Bag-of-Words” (BoW) model is the foundation of LDA. It represents a document as an unordered vector of word counts or a document-term matrix, meaning it focuses on word frequency within a corpus. The BoW assumption is widely used for representing text data due to its simplicity and effectiveness in handling large text corpora. However, it has a notable drawback: it does not consider the sequential nature of text and disregards the contextual relationships between words conveyed through their order. For example, BoW models consider the sentences “Jane loves Tom” and “Tom loves Jane” identical because they only count word frequency. Additionally, BoW models treat words like “citizens” and “residents” as independent entities, even though they are often used interchangeably in everyday language, signifying a loss of contextual information.
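A minimal sketch illustrates both limitations of the BoW representation described above:

```python
from collections import Counter

def bag_of_words(sentence):
    """A minimal Bag-of-Words representation: word -> count, order discarded."""
    return Counter(sentence.lower().split())

# Word order is lost: both sentences map to the same count vector ...
print(bag_of_words("Jane loves Tom") == bag_of_words("Tom loves Jane"))  # True

# ... and near-synonyms remain unrelated dimensions, so contextual
# similarity between "citizens" and "residents" is invisible to the model:
print("citizens" in bag_of_words("residents attended"))  # False
```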
The combined topic model (Combined TM) was proposed to overcome this limitation by combining conventional BoW representations with contextualised sentence representations, called Sentence-BERT (SBERT), as model inputs (Bianchi et al., 2021). BERT stands for Bidirectional Encoder Representations from Transformers, a language model pre-trained on a large corpus (such as Wikipedia or news) that can be fine-tuned for specific NLP applications. It takes a bidirectional training approach, allowing the model to consider each word’s context by looking at both the left and right sides of the sentence. While BERT is mainly designed to predict words in a sentence bidirectionally, Sentence-BERT is an extension of BERT that captures the semantic meaning of whole sentences at significantly lower computational cost (Reimers and Gurevych, 2019).
The concatenated BoW representations and SBERT embeddings are trained using a Variational AutoEncoder, a neural network architecture consisting of an encoder and a decoder (Srivastava and Sutton, 2017). The encoder uses the concatenated BoW and SBERT embeddings to learn a probabilistic mapping to a latent space representation; the decoder then takes samples from the latent space to reconstruct the BoW representation. The training objective is to minimise the difference between the input and reconstructed BoW representations. Previous comparative studies showed that the Combined TM outperformed other models on coherence metrics across several datasets (Bianchi et al., 2021) and in prediction with fewer than 20 topics (Rijcken et al., 2022).
The first step in topic modelling is to set the hyperparameter K, the number of topics. A standard procedure uses coherence and diversity metrics to identify a plausible range of topic numbers, after which domain experts decide K upon manual inspection. This article follows the same metrics used by the authors of the Combined TM (Bianchi et al., 2021): Normalised Pointwise Mutual Information Coherence (NPMI-Coherence) (Bouma, 2009) and Inverted Ranked-Biased Overlap Diversity (IRBO-Diversity) (Terragni and Fersini, 2021). Both metrics aim to measure the quality of topic model results in terms of semantic interpretability, albeit with different approaches. NPMI-Coherence computes how far a set of the most common words in each topic (e.g., 10 words) co-occur in documents more often than expected by random chance, on a scale from −1 (never co-occurring) to 1 (always co-occurring), based on the assumption that highly co-occurring words are semantically more interpretable (Bouma, 2009). In contrast, IRBO-Diversity measures the diversity of topics on a scale from 0 (identical topics) to 1 (completely different topics) (Bianchi et al., 2021; Terragni and Fersini, 2021).
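The pairwise NPMI score that NPMI-Coherence averages over a topic’s top words can be sketched as follows; the probabilities are illustrative, not estimated from the study’s corpus:

```python
from math import log

def npmi(p_ij, p_i, p_j):
    """Normalised Pointwise Mutual Information for one word pair.

    p_ij: probability that the two words co-occur in a document
    p_i, p_j: marginal probabilities of each word
    Returns a value in (-1, 1]: 1 = always together, 0 = independent,
    values below 0 = co-occurring less often than chance.
    """
    pmi = log(p_ij / (p_i * p_j))  # how far co-occurrence exceeds chance
    return pmi / -log(p_ij)        # normalise into (-1, 1]

# Words that co-occur far more often than chance score close to 1:
print(round(npmi(0.09, 0.1, 0.1), 3))
# Independent words (p_ij = p_i * p_j) score exactly 0:
print(npmi(0.01, 0.1, 0.1))  # 0.0
```

Coherence for a topic is then the average of this score over all pairs of its top words.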
In this article, the CTM Python package developed by Bianchi et al. (2021) was used to build Combined TM models with the ko-sroberta-multitask sentence transformer (https://huggingface.co/jhgan/ko-sroberta-multitask) for processing Korean (Ham et al., 2020). The models were trained with various topic numbers (5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 50, 70, 100), each being trained 5 times. The NPMI-Coherence and IRBO-Diversity scores were then calculated for each model. The article reports the results of the Combined TM model with the optimal topic number.
3. Supervised learning (random forest for binary classification)
The supervised learning component uses the topic model outputs (document-topic probabilities), along with proposal periods, costs, target areas, and word counts, to predict which proposals will be selected. This is a binary classification task, analogous to classifying emails as spam or non-spam, sentiments as positive or negative, or clinical outcomes as survival or non-survival.
In ML, the input variables that determine the output are referred to as features or predictors, while the output variables are known as targets or labels (Anastasopoulos and Whitford, 2019). This article’s target variable is the voting result, which could be 1 (voted for) or 0 (not voted for). The dataset was first split into training and testing sets to develop models, with 80% of the 12,521 city-wide proposals assigned to the former and the remaining 20% to the latter. Since only 7.6% of proposals were voted for, the dataset is highly imbalanced, which poses a challenge for accurate model prediction. A standard technique for addressing this issue is to over-sample or down-sample the data to achieve a balanced result (Chawla et al., 2002). In this article, I used a stratification technique to ensure that the random data split reflects the imbalanced target values, then applied the SMOTE (synthetic-minority oversampling technique) using Python’s imbalanced-learn library to balance the class distribution of the target value by creating synthetic minority class samples, rather than simply over-sampling with replacement (Chawla et al., 2002).
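The core of SMOTE, creating a synthetic minority sample by interpolating between a minority instance and one of its minority-class neighbours, can be sketched as follows; the feature values (e.g., budget, duration, word count) are illustrative, and the neighbour is assumed to be given rather than found by the nearest-neighbour search the full algorithm performs:

```python
import random

def smote_sample(x, neighbor, rng=random):
    """Create one synthetic minority sample, as in SMOTE (Chawla et al., 2002).

    The new point lies at a random position on the line segment between
    a minority instance and one of its minority-class neighbours, so the
    minority class is enriched with plausible new samples rather than
    exact duplicates.
    """
    lam = rng.random()  # random interpolation weight in [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

random.seed(0)
minority_a = [100.0, 365.0, 57.0]   # illustrative: budget, duration, words
minority_b = [120.0, 300.0, 80.0]
synthetic = smote_sample(minority_a, minority_b)

# Each synthetic feature lies between the two real instances:
print(all(min(a, b) <= s <= max(a, b)
          for a, b, s in zip(minority_a, minority_b, synthetic)))  # True
```

In practice, imbalanced-learn’s `SMOTE` class wraps this logic together with the nearest-neighbour search and is applied only to the training split, as in this study.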
After addressing the imbalanced dataset, the random forest (RF) classifier (Breiman, 2001) was trained to predict proposal selection using Python’s scikit-learn library. The idea behind RF is the wisdom of crowds (Teoh and Rong, 2017). Rather than relying on the decision of a single model (tree), the algorithm constructs multiple, uncorrelated decision trees, forming a forest (or committee), and aggregates their predictions by majority vote to improve overall accuracy and resilience. Each tree is built on a bootstrapped subset of the dataset (bagging), and at each node only a random subset of features is considered for splitting decisions (feature randomness). RF therefore mitigates overfitting by averaging over the individual trees’ predictions while reducing their variance. It is one of the most versatile and popular algorithms for handling high-dimensional and complex datasets (Biau and Scornet, 2016).
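The two ensemble mechanics described above, bootstrapped sampling and majority voting, can be sketched in a few lines; the training of each individual tree is omitted, and the data are illustrative:

```python
import random
from collections import Counter

def bootstrap(data, rng):
    """Draw a bootstrap sample of the same size, with replacement (bagging).
    Each tree in the forest is trained on its own such sample."""
    return [rng.choice(data) for _ in data]

def majority_vote(predictions):
    """Aggregate the trees' individual votes into the forest's prediction."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
proposals = ["p1", "p2", "p3", "p4", "p5"]  # illustrative training rows
# Each tree sees a different bootstrapped view of the data ...
tree_samples = [bootstrap(proposals, rng) for _ in range(3)]
# ... and the forest's final call is the majority of the trees' votes:
print(majority_vote([1, 0, 1, 1, 0]))  # -> 1 (selected)
```

scikit-learn’s `RandomForestClassifier` adds the per-node feature randomness on top of this bagging-plus-voting core.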
However, RF demands a significant level of computational resources and is less interpretable than a single decision tree. In addition, it involves several hyperparameters, including the number of trees (the n_estimators parameter in the scikit-learn library), the number of random features sampled at each split (max_features), the maximum depth of each tree (max_depth), the minimum number of samples required at a leaf node (min_samples_leaf), and the minimum number of samples required to split a node (min_samples_split), as reviewed by Probst et al. (2019).
This article addressed these limitations through hyperparameter tuning (Probst et al., 2019). The approach used was a randomised search with 5-fold cross-validation, which trains various combinations of the five parameters and evaluates their performance. Cross-validation permits multiple iterations over different model settings so that the best-performing configuration can be compared and chosen. The metric used to evaluate the binary classification models was the AUROC (Area Under the Receiver Operating Characteristic curve), calculated from the True Positive Rate (TPR) and False Positive Rate (FPR), as shown in Table A1. Once the best configuration was found, the classifier was retrained with those parameters to enhance its performance and generalisation capability (Probst et al., 2019).
Table A1. Confusion matrix and performance metrics

| | | Predicted positive | Predicted negative |
|---|---|---|---|
| Actual | Positive | True positive (TP) | False negative (FN) |
| | Negative | False positive (FP) | True negative (TN) |

| Metric | Definition |
|---|---|
| TPR | True Positive Rate: the proportion of actual positive cases correctly predicted as positive. An ideal result is a TPR of 1 |
| FPR | False Positive Rate: the proportion of actual negative cases wrongly predicted as positive. An ideal result is an FPR of 0 |
| AUROC | The ROC curve plots the trade-off between TPR and FPR; AUROC measures the area under this curve, ranging between 0 and 1, with values closer to 1 indicating better model performance |
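The metrics in Table A1 can be computed directly from model outputs. The sketch below uses toy labels, predictions, and scores (illustrative values only, not the study's data):

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = [1, 1, 1, 0, 0, 0, 1, 0]   # actual classes
y_pred  = [1, 1, 0, 0, 1, 0, 1, 0]   # predicted classes
y_score = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3]  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)  # proportion of actual positives predicted positive
fpr = fp / (fp + tn)  # proportion of actual negatives predicted positive

print(tpr, fpr)                        # 0.75 0.25
print(roc_auc_score(y_true, y_score))  # 0.9375
```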
After confirming the best-performing model, I conducted further analysis of the model outcomes by examining feature importance and partial dependence. Feature importance quantifies the relative contribution of each feature to predicting proposal selection, whether measured as the mean decrease in Gini impurity across the trees or as the performance drop observed after shuffling a feature’s values and re-predicting on the altered dataset (Archer and Kimes, 2008; Breiman, 2001). A larger decrease indicates that the feature is more important than others. However, a limitation of this approach lies in its inability to illustrate the directionality (e.g. positive or negative) and non-linear relationships between features and the target variable. To address this gap, partial dependence plots offer a valuable means of depicting the marginal effect of a chosen feature on the target variable (Hastie et al., 2008).

