Editorial for Special Issue on Pre-trained Large Language Models for Information Processing

Wang, Bin; Kawahara, Tatsuya; Li, Haizhou; Meng, Helen; Wu, Chung-Hsien

doi:10.1561/116.00004100

The concept of pre-training originates from transfer learning. With the development of self-supervised training objectives, a large language model can be trained on massive unlabeled data and then applied to various downstream tasks, either through fine-tuning or by using its representations as pre-trained features. Since 2018, pre-trained language models (PLMs) and large language models (LLMs) such as BERT, the GPT series, and ChatGPT have vastly transformed the field of natural language processing (NLP). They have revolutionized a range of tasks, including machine translation, sentiment analysis, summarization, and conversational systems. Meanwhile, this success has also spread to other fields, including computer vision and speech processing, where several works have reshaped the mainstream approach to specific tasks, such as BEiT for image representation learning and wav2vec 2.0 for speech signals.

The field of information processing encompasses a wide range of research areas, from traditional modalities to emerging fields. Among the most transformative recent developments in this domain are pre-trained language models, which have significantly impacted language processing, speech processing, and computer vision. As these models continue to evolve rapidly, it is essential to reflect on the reasons for their success, on how they are reshaping existing paradigms in information processing, and on their potential to shape the future of various applications. This special issue focuses on recent advances and novel approaches in large-scale pre-trained language models and their applications. It also provides a forum for researchers to share the latest developments in pre-trained language models and in pre-trained models for other modalities. While pre-trained large language models have shown remarkable success in many applications, many challenges remain. For example, researchers are exploring ways to improve these models' efficiency, interpretability, and explainability, and to address ethical concerns and biases in their training data. There is also a need to explore the potential of these models beyond language processing, in areas such as multimodal information processing and cross-domain transfer learning. This special issue includes four accepted articles, all highly recommended by the editors and reviewers.

The first paper, titled "Informative and Long-Term Response Generation using Multiple Suggestions and User Persona Retrieval in a Dialogue System", is authored by Jia-Hao Hsu, Tsai-Yi Chen, and Chung-Hsien Wu. It introduces a novel dialogue system that enhances user satisfaction by generating more relevant and informative responses. The system incorporates a Multi-Suggestions Transformer (MST), which uses empathy, system persona, and knowledge suggestions to create comprehensive responses. It also employs a persona detection and extraction model to identify user persona information in the dialogue history and use it to sustain long-term conversations. The system outperforms the baselines on various metrics, indicating its effectiveness both in generating informative responses and in using user personas for extended dialogues. Future work includes making better use of multiple suggestions and exploiting user personas more effectively for personalized responses.

The second paper, titled "An Overview of Language Models: Recent Developments and Outlook", is authored by Chengwei Wei, Yun-Cheng Wang, Bin Wang, and C.-C. Jay Kuo. It provides a comprehensive examination of language models (LMs) in natural language processing, covering LM types (conventional and pre-trained models), linguistic units, model architectures, training methods, evaluation, and applications. The paper traces the evolution from conventional LMs to modern pre-trained LMs, highlights their training paradigms, and discusses their use in tasks such as text generation and machine translation. It also explores future research directions, focusing on the integration of LMs with knowledge graphs, incremental and lightweight models, domain-specific versus universal models, and the interpretability of these models. The paper aims to give both newcomers and experienced researchers a thorough understanding of LMs.

The third paper, titled "Bias and Fairness in Chatbots: An Overview", is authored by Jintang Xue, Yun-Cheng Wang, Chengwei Wei, Xiaofeng Liu, Jonghye Woo, and C.-C. Jay Kuo. It examines biases in chatbot design and operation, categorizing them into biases arising from chatbot design, from user interactions, and from social deployment. It discusses the potential harms of negative biases, such as allocation and representation harms, and presents methods for mitigating bias in chatbots. The paper also explores future research directions, including open-domain versus domain-specific chatbots, bias control in multimodal chatbots, and the creation of green and interpretable chatbots. It highlights the challenges of achieving fairness in chatbot applications and argues that bias and fairness in chatbots remain an ongoing concern requiring further research.

The fourth paper, titled "Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text", is authored by Lingyi Yang, Feng Jiang, and Haizhou Li. It first introduces a new dataset, HPPT (ChatGPT Polished Paired abstracts), for detecting ChatGPT-polished academic texts. It then proposes a novel metric, the "Polish Ratio", which quantifies the extent of ChatGPT's modifications to the original human-written text. The study shows that this method offers better robustness and explanatory power than existing methods in identifying texts influenced by ChatGPT. These findings represent a significant advance in distinguishing human-written texts from those polished or generated by ChatGPT, addressing a key challenge in AI-generated content detection.

This special issue presents a diverse set of research papers on Large Language Models (LLMs) for information processing. The collection not only highlights the challenges and constraints inherent in this rapidly evolving field but also introduces novel approaches for a variety of applications, spanning text processing, dialogue systems, chatbot fairness, and AI-generated content detection through polish ratio analysis. We hope that this special issue will motivate researchers to pursue fresh avenues and inspire newcomers to engage in LLM-related research. Finally, we sincerely thank all the reviewers for their committed partnership and insightful input.

Guest Editors

Bin Wang

Tatsuya Kawahara

Haizhou Li

Helen Meng

Chung-Hsien Wu

2024

B. Wang et al.

Published in APSIPA Transactions on Signal and Information Processing. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution-NonCommercial (CC BY-NC 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for non-commercial purposes only), subject to full attribution to the original publication and authors. The full terms of this licence may be seen in the CC BY-NC 4.0 licence terms published by Creative Commons.
