Evaluating cultural knowledge processing in large language models: a cognitive benchmarking framework integrating retrieval-augmented generation

Lee, Hung-Shin; Chang, Chen-Chi; Chen, Ching-Yuan; Hsu, Yun-Hsiang

doi:10.1108/EL-04-2025-0136

Research Article | November 26 2025

Hung-Shin Lee, United Link Co., Ltd, Taipei, Taiwan

Chen-Chi Chang, Department of Culture Creativity and Digital Marketing, National United University, Miao-li, Taiwan

Corresponding author: Chen-Chi Chang, kiwi@gm.nuu.edu.tw

Ching-Yuan Chen, Department of Culture Creativity and Digital Marketing, National United University, Miao-li, Taiwan

Yun-Hsiang Hsu, Department of Culture Creativity and Digital Marketing, National United University, Miao-li, Taiwan

Author & Article Information

Publisher: Emerald Publishing

Received: April 16 2025

Revision Received: July 13 2025

Revision Received: August 18 2025

Revision Received: October 12 2025

Accepted: October 30 2025

Online ISSN: 1758-616X

Print ISSN: 0264-0473

Funding

This study was supported by the National Science and Technology Council, NSTC 114–2420-H-239–002.

2025 Emerald Publishing Limited. Licensed re-use rights only.

The Electronic Library 1–22.

https://doi.org/10.1108/EL-04-2025-0136

Purpose

This paper evaluates how effectively large language models (LLMs) represent and generate minority cultural knowledge, specifically Taiwanese Hakka culture. To that end, it proposes a structured and replicable evaluation framework integrating Bloom’s taxonomy and retrieval-augmented generation (RAG). The research is guided by three questions: (1) How do LLMs perform across different cognitive domains when processing Hakka cultural content? (2) To what extent does the integration of RAG enhance the accuracy and contextual appropriateness of LLM outputs? (3) How do different LLM architectures compare in their ability to recall, analyse and creatively synthesize culturally grounded information?

Design/methodology/approach

This study proposes a cognitive benchmarking framework to evaluate how LLMs process and apply culturally specific knowledge. The framework integrates Bloom’s taxonomy with RAG to assess model performance across six hierarchical cognitive domains: remembering, understanding, applying, analysing, evaluating and creating. Using a curated Taiwanese Hakka digital cultural archive as the primary testbed, the evaluation measures the semantic accuracy and cultural relevance of LLM-generated responses.
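
As a rough illustration of how a benchmark of this shape might be organized, the sketch below groups test items by the six Bloom domains and averages a score per domain for each system under comparison (for example, a baseline model and a RAG-augmented one). It is a minimal sketch under stated assumptions, not the authors' implementation: the names `Item`, `token_f1` and `evaluate` are hypothetical, the systems are passed in as plain callables, and token-overlap F1 is only a simple stand-in for whatever semantic accuracy and cultural relevance measures the study actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# The six cognitive domains named in the abstract, lowest to highest.
BLOOM_LEVELS = ["remembering", "understanding", "applying",
                "analysing", "evaluating", "creating"]


@dataclass
class Item:
    """One benchmark question (hypothetical structure)."""
    level: str      # one of BLOOM_LEVELS
    question: str
    reference: str  # reference answer, e.g. drawn from the archive


def token_f1(answer: str, reference: str) -> float:
    """Token-overlap F1 over unique lowercase tokens.

    Assumption: a placeholder metric, not the paper's scoring method.
    """
    a = set(answer.lower().split())
    r = set(reference.lower().split())
    overlap = len(a & r)
    if overlap == 0:
        return 0.0
    precision = overlap / len(a)
    recall = overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    items: List[Item],
    systems: Dict[str, Callable[[str], str]],
    score: Callable[[str, str], float] = token_f1,
) -> Dict[str, Dict[str, float]]:
    """Return {system name: {Bloom level: mean score}}."""
    buckets = defaultdict(lambda: defaultdict(list))
    for item in items:
        if item.level not in BLOOM_LEVELS:
            raise ValueError(f"unknown Bloom level: {item.level}")
        for name, answer_fn in systems.items():
            answer = answer_fn(item.question)
            buckets[name][item.level].append(score(answer, item.reference))
    return {
        name: {level: mean(scores) for level, scores in by_level.items()}
        for name, by_level in buckets.items()
    }
```

Calling `evaluate(items, {"baseline": baseline_fn, "rag": rag_fn})` with two hypothetical answer functions would give one mean score per domain for each system, which is the kind of side-by-side, per-domain comparison the framework describes.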

Findings

The evaluation results indicate that LLMs augmented with RAG exhibit marked improvements over baseline models in the cognitive domains of remembering, understanding and analysing. These enhancements are particularly evident in tasks requiring factual accuracy, contextual relevance and semantic precision, underscoring RAG’s effectiveness in addressing the knowledge sparsity typically observed in underrepresented cultural data sets. However, a notable limitation persists in the domain of creating across all models, including those equipped with RAG. This suggests that while retrieval mechanisms bolster the reproduction and comprehension of cultural knowledge, they do not yet sufficiently support culturally nuanced generative synthesis.

Originality/value

This study introduces a novel evaluation framework integrating cognitive domain benchmarks with RAG-enhanced LLMs to assess cultural knowledge processing. The research advances culturally grounded artificial intelligence (AI) systems and digital archival quality by empirically demonstrating RAG’s impact on factual accuracy in lower- and mid-level tasks. The findings affirm the strategic value of retrieval integration for enhancing representational fidelity in cultural AI applications, while also highlighting the need for future research into hybrid architectures that combine external grounding with culturally adaptive generation strategies.
