Visualization algorithm based on FDR control testing for dimension reduction of textual data

Pyo, Sung-Inn; Ahn, Soohyun; Kwon, Soon-Sun

doi:10.1108/DTA-04-2024-0373


Research Article | February 26 2025


Sung-Inn Pyo (ORCID: 0009-0001-0970-0092)

Department of Mathematics, Ajou University, Suwon, South Korea

Soohyun Ahn (ORCID: 0000-0001-5016-5469)

Department of Mathematics, Ajou University, Suwon, South Korea

Soon-Sun Kwon (ORCID: 0000-0002-3344-1609)

Department of Mathematics, Ajou University, Suwon, South Korea

Soon-Sun Kwon can be contacted at: qrio1010@ajou.ac.kr

Author & Article Information


Publisher: Emerald Publishing

Received: April 01 2024

Revision Received: July 17 2024

Revision Received: July 23 2024

Revision Received: August 17 2024

Revision Received: October 16 2024

Accepted: November 22 2024

Online ISSN: 2514-9318

Print ISSN: 2514-9288

© 2025 Emerald Publishing Limited. Licensed re-use rights only.

Data Technologies and Applications (2025) 59 (2): 338–361.

https://doi.org/10.1108/DTA-04-2024-0373

Purpose

Visualizing relations in textual data requires dimension reduction to make the output interpretable. However, traditional dimension reduction methods have limitations, such as the loss of feature information during extraction or projection and uncertain results caused by the mixing of word labels. In this study, we develop a textual data visualization algorithm that uses statistical methods to support statistical inference on the data. We also design the algorithm so that users can analyze textual data easily.

Design/methodology/approach

The analysis of unstructured data, such as textual data, is sensitive to the choice of analysis method. In addition, textual data are generally large and sparse. Considering these characteristics, we applied latent Dirichlet allocation (LDA) to separate the data so as to minimize the loss of information, and false discovery rate (FDR) control to reduce the dimension in a statistical way.
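As an illustration of how FDR control can act as a dimension reduction step, the following minimal sketch applies the Benjamini-Hochberg step-up procedure to a set of p-values and keeps only the terms whose tests are rejected. This is an assumption made for illustration: the abstract does not state which FDR procedure the authors use, and the function name, the example words, and the p-values below are all hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns a list of booleans, True where the corresponding
    hypothesis is rejected at FDR level alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per candidate word (not taken from the paper).
word_p_values = {
    "hospital": 0.001,
    "patient": 0.008,
    "weather": 0.039,
    "today": 0.205,
}
words = list(word_p_values)
flags = benjamini_hochberg([word_p_values[w] for w in words], alpha=0.05)
# Only the words whose tests survive FDR control are kept for visualization.
kept_words = [w for w, keep in zip(words, flags) if keep]
```

In this sketch the dimension is reduced by discarding every word whose test is not rejected, so the number of retained features is determined by the chosen FDR level rather than by a fixed target dimension.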

Findings

The relations in textual data can be derived with a single click, and the output, presented as separated topics, can be interpreted without background information.

Originality/value

The algorithm was built for the Korean language. However, it can be applied to any language because it requires no linguistic information. This study can serve as an example of the usage and workflow by which less well-known dimension reduction methods can replace traditional ones.
