Purpose

The study aims to design, develop, and demonstrate a Python-based application that enhances the accessibility, usability, and analytical potential of big open data. It addresses the growing challenge of transforming large-scale, publicly available datasets into actionable insights that can support research activities and evidence-based policy decision-making.

Design/methodology/approach

The study adopts a design-oriented research approach. A systematic literature search was conducted to identify conceptual gaps, methodological limitations, and suitable open-source technologies for big open data analytics. The Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology guided the structured design and implementation of the application, integrating Apache Spark and PySpark to support scalable data processing. The proposed framework was validated through a case study analysing the City of Chicago’s open crime dataset. The case study demonstrates the complete data science pipeline, from data collection and preprocessing to visualisation and predictive modelling.
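
To illustrate how CRISP-DM stages can be chained into one workflow, the following is a minimal sketch in plain Python. It is not the application described in the study, which is built on Apache Spark and PySpark. The function names, the column names `date` and `primary_type`, and the sample rows are all hypothetical and invented for illustration; no values are taken from the Chicago dataset.

```python
from collections import Counter
from datetime import datetime

# Synthetic rows invented for illustration only (NOT real Chicago crime data).
# The column names "date" and "primary_type" are assumptions.
RAW_ROWS = [
    {"date": "2021-01-15 13:05", "primary_type": "THEFT"},
    {"date": "2021-03-02 22:40", "primary_type": "BATTERY"},
    {"date": "2022-07-19 08:15", "primary_type": "THEFT"},
    {"date": "", "primary_type": "THEFT"},            # missing date
    {"date": "2022-11-30 17:55", "primary_type": " theft "},
]


def collect():
    """Data collection: stand-in for loading an open-data export."""
    return list(RAW_ROWS)


def prepare(rows):
    """Data preparation: drop rows without a date, parse the
    timestamp, and normalise the category label."""
    cleaned = []
    for row in rows:
        if not row["date"]:
            continue
        parsed = datetime.strptime(row["date"], "%Y-%m-%d %H:%M")
        cleaned.append({
            "year": parsed.year,
            "primary_type": row["primary_type"].strip().upper(),
        })
    return cleaned


def summarise(rows):
    """Analysis stand-in: count incidents per (year, type) pair.
    A predictive model would replace this step in a fuller pipeline."""
    return Counter((r["year"], r["primary_type"]) for r in rows)


def run_pipeline():
    """Chain the stages so the whole workflow runs from one call."""
    return summarise(prepare(collect()))


counts = run_pipeline()
for (year, crime_type), n in sorted(counts.items()):
    print(year, crime_type, n)
```

Keeping each stage as a separate function means any single step can be swapped for a distributed equivalent, for example a Spark DataFrame transformation, without changing how the stages are chained.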

Findings

The developed application streamlines big open data analysis by operationalising the CRISP-DM stages within a single workflow and automating key data science tasks. The case study results show that the application effectively identifies crime trends and patterns, illustrating its capability to support data-driven urban management and informed decision-making by users with varying levels of technical expertise.

Practical implications

The proposed framework enables public authorities, researchers, and policymakers to analyse large open datasets more efficiently, supporting evidence-based planning and enhancing the operational value and transparency of open data initiatives.

Social implications

By lowering technical barriers to advanced data analytics, the application promotes broader engagement with open data and encourages data-driven innovation, contributing to improved societal outcomes in domains such as public safety, governance, and urban development.

Originality/value

This research contributes one of the few open-source, Python-based analytical frameworks explicitly designed for big open data analysis using CRISP-DM principles. The study advances the literature by delivering a reusable, replicable, and scalable software artifact that bridges the gap between conceptual data analytics frameworks and practical implementation for non-specialist users.

Licensed re-use rights only