This paper presents a modular pipeline approach that combines extractive and abstractive summarisation methods to improve the accuracy of automatic text summarisation (ATS).
In our framework, we use LangChain [20], a large language model integration platform, to orchestrate an extractive algorithm (LexRank) and an abstractive algorithm (Bidirectional and Auto-Regressive Transformers (BART)). LexRank identifies key sentences from the input text, ensuring core information is retained without redundancy, while BART refines these extracted sentences into human-like, concise summaries.
We evaluated the performance of our framework using the CNN/Daily Mail, Newsroom and XSum datasets, which are widely used benchmarks for summarisation tasks. Evaluation results demonstrate that this modular approach improves the accuracy of generated summaries in two ways: (1) word-level similarity between the reference summary and the generated summary and (2) semantic similarity between the reference summary and the generated summary.
In our experiments, we used datasets with short documents (within 7,000 characters); more experiments with long documents are needed to evaluate the robustness of our pipeline. We have also not investigated error propagation from the extractive phase to the abstractive phase, which may have an impact on the performance of our framework.
The primary contribution of this study is the development and evaluation of a modular pipeline combining LexRank, an extractive summarisation technique, and BART, an advanced abstractive model, using LangChain. The results indicate that the hybrid approach outperforms traditional single-model methods and other existing hybrid models, as evidenced by higher Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and BERTScores across various examples.
1. Introduction
In the age of digital information, the rapid expansion of textual data poses an enormous challenge for users who need to efficiently access and comprehend relevant information. With the Internet and various textual sources generating massive amounts of data daily, finding concise and accurate summaries is becoming increasingly essential. Automatic text summarisation (ATS) addresses this challenge by distilling essential information from large datasets into brief, manageable summaries [1, 2]. ATS is significant in domains like news aggregation, scientific research and business intelligence, where summarising vast amounts of data quickly and accurately is vital.
As the field of text summarisation evolves, different techniques have emerged, each with its own advantages and limitations [3]. These ATS techniques are often classified based on input and output characteristics [1–3]. Input-based classification is concerned with the nature, quantity and structure of the input data. It primarily includes two categories: single-document summarisation and multi-document summarisation. Single-document summarisation involves summarising a single source of text, such as a news article or research paper [1, 4, 5]. In contrast, multi-document summarisation processes multiple documents, often on the same topic; this summarisation process identifies and merges overlapping information while maintaining coherence and diversity [1, 4, 5]. Output-based classification, meanwhile, is based on the form and nature of the summary generated. The two main types are extractive and abstractive summarisation. Both approaches involve various algorithms and techniques to accomplish ATS [1, 4, 5].
Traditional extractive methods include frequency-based approaches, where words that occur frequently are assumed to be more important, and sentences containing those words are selected [4]. More advanced extractive techniques involve graph-based algorithms like TextRank [6], which constructs a graph of sentences with edges representing semantic similarity and ranks them to identify the most central ones [7, 8]. In the recent past, deep learning models and pre-trained transformers (e.g. variants of Bidirectional Encoder Representations from Transformers (BERT) [9, 10]) have been utilised for extractive summarisation by exploring richer contextual relationships [4, 5].
On the other hand, abstractive methods aim to generate new sentences that paraphrase and condense the original text, resembling human-like summaries [9, 11]. Traditional abstractive methods relied on rule-based or statistical approaches, which were limited in flexibility and fluency [4, 5]. The advent of sequence-to-sequence (Seq2Seq) neural networks allowed models to focus on relevant parts of the input while generating each word in the output [4, 5, 12]. Further improvements came with the introduction of transformer-based architectures, such as Bidirectional and Auto-Regressive Transformers (BART) [7, 13, 14], T5 (Text-To-Text Transfer Transformer) [13, 15, 16] and PEGASUS [13]. Typically, these models are pre-trained on large corpora and fine-tuned on summarisation datasets.
Automatic text summarisation (ATS) has shown tremendous growth but still faces obstacles; one of the key limitations of current models is the trade-off between speed and quality. Extractive summarisation selects and stitches together sentences directly from the original text, often resulting in summaries that lack coherence, context and flow [2]. These summaries may include redundant or less relevant information and may miss the document's overall meaning if key ideas are spread across multiple sentences. Extractive methods also struggle with paraphrasing and do not simplify or rephrase complex content, which makes them unsuitable for applications requiring concise or reader-friendly summaries. On the other hand, abstractive summarisation models are prone to factual inaccuracies, known as “hallucinations”, where the generated content includes information not present in the source [17]. This limits the application of abstractive summarisation in critical areas such as medical or legal summaries. Recent transformer models, such as BART and PEGASUS [13], have improved performance but continue to struggle with maintaining factual accuracy across varied domains [9]. In the recent past, hybrid methods of text summarisation have evolved, combining extractive and abstractive summarisation to improve the performance of ATS. However, these methods have their own limitations, including disjointed summaries that lack fluency or misrepresent the original text, resulting in inaccuracy [5]. Overall, as found in the literature, accuracy is a big obstacle in ATS, and the typical average accuracy achieved by various approaches is not very satisfactory [4, 7, 9, 12, 18]; this is the main motivation behind this work.
In this work, we aim to address the accuracy limitation of ATS identified in the literature by exploring a modular pipeline approach that combines extractive and abstractive methods to leverage their strengths. Recent research highlights that modular systems, which incorporate multiple specialised models, can improve factual consistency and adaptability across domains [19]. Following this, we have used LangChain [20], a large language model (LLM) integration platform, to orchestrate pre-trained models for extractive and abstractive summarisation of texts. More specifically, in our framework we use LexRank [7, 21], an extractive algorithm, to identify key sentences from the input text, ensuring core information is retained without redundancy. We have used Hugging Face's BART [7, 13, 14] model for the abstractive stage, refining these extracted sentences into human-like, concise summaries. We selected LexRank and BART for the pipeline after evaluating a number of popular extractive and abstractive algorithms, as discussed in Section 3. We evaluated the performance of our framework using the CNN/Daily Mail [22], Newsroom [23] and XSum [24] datasets, which are widely used benchmarks for summarisation tasks. Evaluation results demonstrate that this modular approach improves the accuracy of generated summaries in two ways:
Word-level similarities between the reference summary and the generated summary, ensuring coverage of key information, as indicated by the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) [25] score and
Semantic similarity between the reference summary and the generated summary, as indicated by BERTScore [26].
The rest of the paper is structured as follows: in Section 2, we provide an overview of recent developments in the area of ATS including categories of ATS and paradigms applied for ATS; in Section 3, we discuss the design and development of our modular framework for text summarisation; in Section 4, we present and discuss the experimental results that were carried out to evaluate our framework and, finally, in Section 5, we provide concluding remarks and recommendations for future enhancements of our modular framework.
2. Related works
The goal of ATS is to distil vast amounts of information while keeping the essential elements. As shown in Figure 1, ATS has been classified in different ways in the literature, e.g. based on the input to the ATS, the output of the ATS and the algorithms or paradigms applied in the ATS.
Based on the format of the input to the ATS process, it has been classified as single-document, multi-document and query-focused summarisation [1, 4, 9]. Single-document summarisation summarises the content of a single document, while in multi-document summarisation, multiple documents, typically on the same topic, are selected as input to generate the summary. Query-focused summarisation, on the other hand, tries to generate a summary that specifically addresses an input query, such as a topic or keywords.
Based on how the output summary is generated, ATS is classified into extractive, abstractive and hybrid methods [1, 4, 9, 10, 18]; extractive ATS, which selects important sentences or phrases directly from the source, has been widely used due to their simplicity and speed [1, 4, 9]. Conversely, abstractive summarisation methods aim to generate new sentences based on the original content, resembling human-like summaries that are more fluid and natural but come with higher computational complexity [1, 4, 9, 11]. Various paradigms have been applied for the realization of output-based ATS over the years [1, 4, 9], e.g. statistical methods, deep learning methods, pre-trained language model and, more recently, LLMs. A key limitation of both approaches lies in balancing speed and quality. Extractive ATS often leads to summaries that lack coherence and fluidity, as they rely on exact sentences from the original text [2, 3]; while abstractive summarisation, though more accurate in terms of meaning, is computationally intensive and prone to generating factual errors [2, 9]. Hybrid summarisation methods have evolved to combine the strengths of both extractive and abstractive methods to achieve more balanced and accurate results [4, 12].
Among the different types of ATS, output-based ATS, i.e. abstractive, extractive and hybrid ATS, has attracted the most attention in the literature [1, 4, 9, 10, 18]. In the rest of this section, we discuss recent developments in extractive, abstractive and hybrid summarisation. We also review the data pre-processing mechanisms typically followed for the realisation of ATS.
2.1 Extractive summarisation
To increase the efficacy of extractive summarisation, a number of approaches have been developed over time. Graph-based, machine learning and deep learning-based methods are some of the most popular methodologies. Each approach has advantages and disadvantages based on the use and level of summarisation.
Graph-based techniques depict text units, typically sentences, as nodes in a graph, with relationships – often based on similarity – reflected by the edges connecting the nodes [10]. This method is based on the intuitive understanding that sentences significant for summarisation are closely related to other significant sentences. Because of their effectiveness, simplicity and interpretability, these techniques were among the first to be developed for summarisation and are still in use today. TextRank [6–8] and LexRank [7, 21, 27], the two most popular graph-based algorithms, rank the significance of sentences using the centrality principle in graphs. TextRank was one of the first efforts to adapt a graph-based algorithm for extracting the most important sentences from a document [7]. The algorithm builds an undirected graph, where the nodes represent sentences and the edges are weighted based on the similarity between sentences. Typically, the similarity between sentences is computed using methods like cosine similarity or Jaccard similarity. Graph-based methods have several advantages, including their simplicity and computational efficiency [10, 27]. For example, they do not require training data, making them less dependent on annotated corpora, which can be difficult and expensive to obtain. However, these methods are often limited in their ability to capture the full semantic meaning of sentences. This is because they primarily focus on sentence similarity, which may not fully reflect the complexity of natural language and the context in which the sentences are used. Additionally, graph-based methods may struggle with documents that involve ambiguous or nuanced language. Fuzzy logic-based approaches are also employed to identify the most important sentences from the source document or context; however, to achieve improved results, they require the application of a redundancy removal technique [5, 12, 28].
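To make the centrality principle concrete, the toy sketch below builds a sentence graph from pairwise TF-IDF cosine similarities and ranks the nodes with PageRank. The choice of vectoriser and libraries (scikit-learn and NetworkX) is ours for illustration; neither TextRank nor LexRank is tied to this exact formulation.

```python
# Toy illustration of graph-based sentence ranking: sentences are nodes,
# TF-IDF cosine similarities are edge weights, PageRank gives centrality.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_sentences(sentences: list[str]) -> list[str]:
    tfidf = TfidfVectorizer().fit_transform(sentences)
    similarity = cosine_similarity(tfidf)     # pairwise sentence similarity matrix
    graph = nx.from_numpy_array(similarity)   # weighted undirected sentence graph
    scores = nx.pagerank(graph)               # centrality score per sentence node
    order = sorted(range(len(sentences)), key=scores.get, reverse=True)
    return [sentences[i] for i in order]

# Example: the top-ranked sentence would head an extractive summary.
ranked = rank_sentences([
    "Graph methods rank sentences by centrality.",
    "Centrality rewards sentences similar to many others.",
    "Unrelated filler text scores lower.",
])
```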
Supervised learning techniques for extractive summarisation require labelled datasets with human-generated summaries. Features such as sentence length, term frequency, sentence position and semantic similarity are extracted and used to train models like support vector machines, logistic regression or random forests. These models learn to classify sentences as either important or unimportant based on their features. Once trained, the model can select key sentences for summarisation [15, 29, 30]. Despite their widespread popularity, supervised machine learning techniques have several drawbacks. The largest obstacle is the requirement for extensively annotated datasets, which are not always available, particularly in specialised fields. Furthermore, these techniques may have trouble generalising, especially when applied to texts whose writing styles or domains differ greatly from those used for training. Notwithstanding these difficulties, supervised learning approaches have demonstrated considerable promise, especially in fields requiring domain-specific summarisation tasks or where labelled data are accessible [4].
By using neural networks to automatically learn features from data, deep learning-based techniques have transformed extractive summarisation by eliminating the need for manually created features. These methods, specifically recurrent neural networks (RNNs) [4, 9] and, more recently, transformer-based architectures, are able to understand the complex syntactic and semantic relationships between words and reflect long-range dependencies in the text. Extractive summarisation has been accomplished with recurrent neural networks (e.g. long short-term memory [4, 9]), which model sentence dependencies and select sentences according to their significance to the entire document. Because the meaning of a sentence frequently depends on the sentences that come before or after it, RNNs are especially well suited to sequential data, such as text. Transformer models are another innovation in deep learning-based extractive summarisation. These models calculate the relative importance of each word or sentence in the document using attention mechanisms. One particularly significant contribution in this area has been Bidirectional Encoder Representations from Transformers (BERT) [9, 10], a general pre-trained model that can be used for various applications in natural language processing (NLP). By fine-tuning BERT for summarisation tasks, it can better comprehend sentence relationships and context, which makes it very successful at selecting the most important sentences in a document. Among the many benefits of deep learning-based approaches is their capacity to comprehend intricate linkages and adjust to a broad range of data. Because models like BERT can be pre-trained on big datasets and fine-tuned for certain tasks or domains, they are also incredibly versatile. However, deep learning models have drawbacks, including the requirement for large amounts of labelled data, the processing power needed for training and limited interpretability.
2.2 Abstractive summarisation
There are various approaches to abstractive summarisation implementation. Among the most well-known methods are Seq2Seq models and transformer-based pre-trained language models.
Seq2Seq (also known as encoder-decoder) models are the fundamental architecture for abstractive text summarisation [31]. These models comprise an encoder and a decoder. In a typical scenario, the encoder identifies and parses the input sequence and then uses a high-dimensional dense feature vector to characterise it. The decoder then uses the feature vectors of the input items to generate the output items. Seq2Seq models do, however, have certain drawbacks: they have trouble sustaining long-range dependencies, which are essential for comprehending a document's whole context. Furthermore, the encoder's fixed-length vector might not adequately represent the document's complexity, which makes it challenging for these models to produce grammatically correct and fluent summaries [32]. Despite these difficulties, Seq2Seq models helped to promote abstractive summarisation and are still a basis for more sophisticated methods. Rule-based methods are utilised when input documents must be represented as classes and lists of aspects. However, this method requires the creation of rules, which is a time-consuming task; the reliance on manually crafted rules makes it less efficient than the other methods discussed earlier in this subsection [5, 12]. The semantic graph-based model primarily extracts semantic information by assigning weights to the nodes and edges within sentences. As a result, it performs effectively in most cases but depends on having a semantic representation of the text [12].
Abstractive summarisation has greatly benefited from the use of transformer-based pre-trained language models like T5 (text-to-text transfer transformer) [13, 15] and BART [7, 8, 13]. These models have developed rich linguistic and contextual representations by extensive pre-training on massive text data corpora, which enables them to produce fluent, coherent and contextually appropriate summaries with little task-specific fine-tuning. These models can be trained for a variety of tasks, such as summarising, by pre-training them to learn a broad range of language patterns, syntax and semantic understanding. While transformer models like T5 and BART have set new benchmarks in abstractive summarisation, they are computationally expensive and require significant resources for training. Despite this, the ability to process long sequences in parallel and capture complex relationships between words and phrases in a document has made transformer-based models the leading architecture for modern NLP applications, including abstractive summarisation.
2.3 Hybrid model for summarisation
A modular pipeline approach is one in which multiple models or components are chained together, each responsible for a specific task within the overall process. This concept is gaining attention in NLP due to the increasing availability of specialised models that perform well on specific subtasks. For example, LangChain [20] enables the integration of multiple models and techniques within a pipeline, facilitating both extractive and abstractive summarisation workflows. According to recent research, combining multiple models – such as using an extractive summariser to select important sentences followed by an abstractive model to rephrase them – can reduce hallucination and improve factual consistency [19]. LangChain's flexibility in chaining different models and allowing custom steps offers opportunities to experiment with various model architectures. For instance, an extractive summariser can serve as a filter to identify key parts of a text, after which a transformer-based abstractive model generates a more refined summary. Furthermore, pipelines are more adaptive to context-specific requirements since they may be customised to domains [33]. Despite this promise, hybrid models have not yet achieved their full potential in text summarisation. For example, Reference [34] introduced pointer-generator networks, which integrate abstractive and extractive summarisation methods. With the help of an RNN-based encoder-decoder architecture, the model can generate new words (abstractive) and copy words or phrases from the source (extractive) to create more fluent summaries. The pointer mechanism enables the model to “point” to words in the input text so that it can directly copy significant words or phrases into the summary, enhancing its quality. In Reference [35], extractive and abstractive summarisation are combined by applying clustering and NLP to reduce redundancy in the generated summary. In the first step, Term Frequency-Inverse Document Frequency is used for feature extraction; K-means clustering is then used to group similar sentences. In the final step, the selected sentences are fed into the BART model to generate the final abstractive summary. However, the summary generated by this approach may still contain redundant or less relevant information if the clustering or similarity measures are not correctly aligned with the summary objectives. Similarly, in Reference [18], extractive and abstractive summarisation are combined to generate the final summary. In the extractive phase, a sentence similarity matrix and sentence differentiation are used to identify the most important sentences from the input, while in the abstractive phase, a beam search algorithm is used to rewrite the extracted sentences. In Reference [36], a hybrid ATS approach is developed by combining a recurrent neural network and a sequence-to-sequence attentional model for extraction and abstraction. This approach is restricted to only two layers of the RNN model. In Reference [37], a hybrid ATS method is proposed based on deep learning and a rapid self-attention mechanism. This approach is focused only on Chinese text summarisation. Similarly, in Reference [38], an ensemble-based framework tailored for Turkish legal document summarisation is presented. This framework integrates four transformer models, LED, Long-T5, BART-Large and GPT-3.5 Turbo, leveraging both extractive precision and abstractive fluency to generate the final summary.
In Reference [39], a hybrid model is developed only for summarising Amharic texts. This approach combines statistical and semantic features for extractive ATS and then combines the extractive method with standard abstractive ATS methods like BART.
Although hybrid ATS is not new, it should be noted that, as reported in the literature, most approaches have achieved only mediocre accuracy.
2.4 Data pre-processing for automatic text summarisation
Although the overall process of ATS depends on various factors, such as the context and domain of the summarisation problem, two steps are inevitable for any summarisation process: data pre-processing and summarisation methods [1, 2, 5, 40]. We discussed summarisation methods in Sections 2.1–2.3; in this section, we briefly focus on typical methods applied for data pre-processing in ATS. Pre-processing is the initial step in text summarisation, where unstructured data are transformed into structured data based on the requirements for summarisation. Several techniques are used to extract the most significant information from the source text and provide a summary: (1) text cleaning involves the removal of extra characters, symbols and punctuation to lessen noise and raise the precision of later processing methods [2, 5]; (2) stopword removal aims to eliminate irrelevant words (e.g. conjunctions, substitute words and pointers) that do not carry distinctive information [5, 40]; (3) stemming is used to eliminate prefixes and suffixes from words [2, 5, 12] and (4) tokenising is used to divide sentences, paragraphs or documents into tokens/parts [2, 40]. In our approach, we have applied most of these pre-processing techniques, as discussed in Sections 3.1 and 4.
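As an illustration only, the sketch below applies these four techniques with NLTK; the library choice and the exact cleaning rule are our assumptions, since pre-processing details vary by system and domain.

```python
# Illustrative pre-processing pass: cleaning, tokenisation, stopword
# removal and stemming with NLTK (an assumed library choice).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def preprocess(text: str) -> list[str]:
    text = re.sub(r"[^\w\s]", " ", text.lower())         # (1) text cleaning
    tokens = word_tokenize(text)                          # (4) tokenisation
    stop_words = set(stopwords.words("english"))
    tokens = [t for t in tokens if t not in stop_words]   # (2) stopword removal
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]              # (3) stemming
```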
3. Design and implementation of the framework
In this section, we discuss the design and implementation of our ATS framework. We follow a structured process to implement our framework. Figure 2 shows a very high-level view of this structured process, which integrates multiple stages, from data input to model evaluation and deployment. It begins with the dataset input, where the raw data, such as text documents, are ingested for processing. The data flow into the data pre-processing step, where they are cleaned, tokenised and split into manageable chunks to ensure compatibility with downstream models. Following this, the process enters the modular pipeline step, which connects the pre-processing step with the model inference process, ensuring a smooth flow of data. Once the summarisation model generates an output summary, the pipeline assesses the output through an evaluation mechanism. At this stage, a decision point asks whether a reference summary exists for comparison. If a reference summary is available, ROUGE and BERTScore evaluations are performed. ROUGE [25] is a common automated metric for assessing summarisation models, which compares the generated summary with the ground truth. BERTScore, on the other hand, measures the semantic similarity of the generated summary and the reference summary [26]. In the absence of a reference summary, the pipeline offers human evaluation, in which professionals manually assess the summary's relevance, coherence and quality. Depending on the evaluation results, the summary or model might require fine-tuning or further adjustment.
Figure 2. Process followed to implement modular pipeline for automatic text summarisation (high-level view)
Once the generated summaries pass the evaluation criteria, the process concludes with deployment, where the summarisation model is integrated into production environments such as web applications. Overall, the figure outlines an end-to-end structured workflow for text summarisation, combining data pre-processing, model selection and evaluation for robust and efficient deployment in real-world applications.
In the rest of this section, we discuss the implementation of the modular pipeline and deployment of the modular pipeline with the help of detailed process diagrams.
3.1 Implementation of modular pipeline
Figure 3 shows the process that is followed to implement the modular pipeline of our framework. We have used several tools and libraries for the implementation. LangChain is used for modular pre-processing, as it guarantees the efficient handling of large text data by breaking it up into digestible chunks. LangChain also supports the integration of the other LLMs used in our implementation. For extractive summarisation, we have used the LexRank algorithm, and for abstractive summarisation, the BART model from Hugging Face is used. The reason behind the selection of LexRank and BART for the modular pipeline is explained in Section 4. We have also used ROUGE and BERTScore to assess the quality of the summaries, as these metrics provide a numerical assessment of summary correctness in terms of the standard metrics (e.g. precision, recall and F1 score) by comparing the generated and reference summaries. Furthermore, WordCloud and Matplotlib are used for evaluating and displaying the generated summaries.
Figure 3. Process followed to implement modular pipeline for automatic text summarisation
Step 1: Dataset Setup:
We used the pre-existing CNN/Daily Mail dataset in our implementation as this is the most widely used dataset for summarisation [4, 7, 12]. Approximately 300,000 news stories are included in this dataset, which is frequently used to assess ATS methods. Each story is accompanied by a human-written summary known as “highlights.” Although the articles cover a wide range of subjects, they mostly concentrate on current affairs, and the highlights provide succinct, citation-based summaries that emphasise the articles' key ideas.
Step 2: Pre-processing:
The pre-processing entails the use of LangChain's built-in text splitters to divide long texts into digestible sections. This guarantees context continuity while keeping the input text within the model's token bounds. We used LangChain's functions to allow smooth transitions between sections by defining a chunk size of 7,000 characters with a 250-character overlap between consecutive chunks. Newline characters (“\n”) and spaces are used as separators to support a variety of text formats. A large chunk size and overlap were chosen to ensure preservation of context [41, 42]. Pre-processing improves performance in both extractive and abstractive summarisation tasks, simplifies text handling and optimises input for summarisation pipelines, all of which lead to improved results in subsequent tasks.
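A minimal sketch of this chunking step is shown below. The splitter class (RecursiveCharacterTextSplitter) is our assumption about a suitable LangChain component; the chunk size, overlap and separators follow the settings stated above.

```python
# Hedged sketch of the LangChain chunking step described in Step 2.
from langchain.text_splitter import RecursiveCharacterTextSplitter

document_text = "..."  # placeholder for the raw input article

splitter = RecursiveCharacterTextSplitter(
    chunk_size=7000,         # large chunks preserve context
    chunk_overlap=250,       # overlap keeps continuity between chunks
    separators=["\n", " "],  # newline and space separators for varied formats
)
chunks = splitter.split_text(document_text)
```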
Step 3: Tokenisation:
By employing an application programming interface (API) token for authentication, safe access to external resources such as the Hugging Face Hub is guaranteed. This procedure verifies credentials in order to integrate pre-trained models, such as BART, for summarisation tasks. Through authentication, users can download models and manage settings efficiently, protect their privacy and prevent unwanted access. Tokens should be kept private to safeguard sensitive operations.
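A minimal example of this authentication step might look as follows, assuming the huggingface_hub client library and a token stored in an environment variable named HF_TOKEN (the variable name is our choice).

```python
# Hedged sketch: authenticate to the Hugging Face Hub with an API token.
import os
from huggingface_hub import login

# Reading the token from the environment keeps it out of the source code.
login(token=os.environ["HF_TOKEN"])  # HF_TOKEN is an assumed variable name
```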
Step 4: Modular pipeline implementation:
As mentioned before, the modular pipeline combines extractive and abstractive summarisation. Using graph-based ranking, LexRank first selects important sentences for extractive summarisation. The extracted sentences are subsequently fed, via LangChain, into the BART-based abstractive model for a more polished, coherent synopsis. The BART model, accessed through a Hugging Face API, generates summaries using parameters such as beam search, maximum length and length penalty for balance. By combining both techniques, the pipeline guarantees high-quality summaries by capturing important elements (LexRank) and rephrasing them for readability (BART). Both shallow and deep summarisation techniques are used in this two-step hybrid approach to increase flexibility and performance.
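The sketch below illustrates this two-stage design under stated assumptions: the sumy implementation of LexRank and the facebook/bart-large-cnn checkpoint, accessed through the transformers summarisation pipeline, stand in for components the text does not name explicitly; the parameter values follow Tables 2 and 5.

```python
# Hedged sketch of the hybrid pipeline: LexRank extraction, then BART rewriting.
from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer
from transformers import pipeline

bart = pipeline("summarization", model="facebook/bart-large-cnn")

def hybrid_summary(chunk: str) -> str:
    # Extractive stage: LexRank selects the top-ranked sentences (15, per Table 5).
    parser = PlaintextParser.from_string(chunk, Tokenizer("english"))
    key_sentences = LexRankSummarizer()(parser.document, sentences_count=15)
    extract = " ".join(str(s) for s in key_sentences)

    # Abstractive stage: BART rephrases the extract using beam search
    # (beam size 5, length penalty 1, per Table 2; lengths per Table 5).
    result = bart(extract, min_length=20, max_length=300,
                  num_beams=5, length_penalty=1.0, truncation=True)
    return result[0]["summary_text"]
```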
Step 5: Evaluation using ROUGE and BERTScore metrics:
The key evaluation metrics used for this task are ROUGE and BERTScore. ROUGE compares the overlap between model-generated summaries and human-written summaries. BERTScore computes the similarity of generated and reference summaries as a sum of cosine similarities between their tokens' embeddings. We have selected ROUGE and BERTScore as evaluation metrics, as these are reported to be the most widely used evaluation metrics for text summarisation [1, 2, 4, 10].
The ROUGE metric relies on n-gram matching: the more overlapping words or phrases, the better the alignment. We have used three variants of the ROUGE metric, namely ROUGE-1, ROUGE-2 and ROUGE-L, as defined below:
ROUGE-1: Overlapping unigrams (individual words);
ROUGE-2: Overlapping bigrams (pairs of words) and
ROUGE-L: Based on the length of the longest common subsequence.
For each of these ROUGE metrics, we have calculated recall, precision and F1 score. For a given reference summary $S_{ref}$ and a generated summary $S_{gen}$, let $\langle S_{ref}, S_{gen} \rangle$ denote the inner product of their n-gram count vectors, i.e. the number of overlapping n-grams. Recall, precision and F1 score are then calculated as follows:

$$\mathrm{Recall} = \frac{\langle S_{ref}, S_{gen} \rangle}{|S_{ref}|}, \qquad \mathrm{Precision} = \frac{\langle S_{ref}, S_{gen} \rangle}{|S_{gen}|}, \qquad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where $|S_{ref}|$ and $|S_{gen}|$ denote the number of n-grams in the reference and generated summaries, respectively.
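As an illustration, these scores can be computed with off-the-shelf packages; the rouge-score and bert-score libraries used in this sketch are assumed choices, since the text does not name its evaluation tooling.

```python
# Hedged evaluation sketch using the rouge-score and bert-score packages.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

def evaluate(generated: str, reference: str) -> dict:
    scorer = rouge_scorer.RougeScorer(
        ["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, generated)
    results = {name: (s.precision, s.recall, s.fmeasure)
               for name, s in rouge.items()}
    # BERTScore: cosine similarity between token embeddings of the summaries.
    P, R, F1 = bert_score([generated], [reference], lang="en")
    results["bertscore"] = (P.item(), R.item(), F1.item())
    return results
```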
3.2 Deployment of modular pipeline using Streamlit
We have used the Streamlit framework to create a dashboard that allows users to interact with the modular pipeline. Figure 4 presents the extended process diagram that shows the integration of the modular pipeline with the Streamlit framework.
Figure 4. Process followed to deploy modular pipeline for automatic text summarisation as application
As shown in Figure 4, the user is first prompted to select whether to enter text or a DOCX file. Text is extracted if a DOCX file is selected; if not, text is entered directly into the textbox. The application asks if a reference summary is available after identifying the input type. If the user already has a reference summary, then they can paste it into the box supplied. Once input text and reference summary are provided, the text summarisation is carried out as described in Section 3.1.
After summarisation, the system consolidates the results and checks for errors through post-processing. If a reference summary is given, the process calculates precision, recall and F1 scores for the evaluation metrics ROUGE-1, ROUGE-2 and ROUGE-L. The process concludes with the display of the final report, with or without scores as appropriate.
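An illustrative sketch of such a dashboard is shown below. The widget flow follows Figure 4; hybrid_summary and evaluate refer to the hypothetical helpers sketched in Section 3.1, and python-docx is an assumed choice for DOCX text extraction.

```python
# Illustrative Streamlit front end following the flow in Figure 4.
import streamlit as st
import docx  # python-docx, an assumed choice for DOCX extraction

# Hypothetical module collecting the helpers sketched in Section 3.1.
from summariser import hybrid_summary, evaluate

st.title("Modular Text Summarisation")

mode = st.radio("Input type", ["Enter text", "Upload DOCX"])
if mode == "Upload DOCX":
    upload = st.file_uploader("Choose a DOCX file", type="docx")
    text = "\n".join(p.text for p in docx.Document(upload).paragraphs) if upload else ""
else:
    text = st.text_area("Paste the text to summarise")

reference = st.text_area("Reference summary (optional)")

if st.button("Summarise") and text:
    summary = hybrid_summary(text)  # modular pipeline from Section 3.1
    st.subheader("Generated summary")
    st.write(summary)
    if reference:  # ROUGE and BERTScore only when a reference exists
        st.subheader("Evaluation")
        st.json(evaluate(summary, reference))
```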
4. Testing and evaluation of our framework
In order to test and evaluate our framework, we have examined the performance of various abstractive and extractive algorithms separately on a given dataset (CNN/DM). With ROUGE and BERTScore serving as the assessment metrics, the emphasis is on contrasting models such as TextRank and LexRank for extractive summarisation and BART and T5 for abstractive summarisation. We have selected these four models, as they are very widely studied in the literature for text summarisation [7, 8, 15, 27, 43].
Table 1 shows the configuration for the extractive models that we applied in our experiments. As shown in the table, we have chosen en_core_web_sm as our tokeniser, because this lightweight English language processing model is optimised for CPU usage. We have not set any threshold for sentence ranking, so that all the ranked sentences are included in the summary. The rest of the settings are self-explanatory.
Hyper parameter settings for extractive models
| Model | Text cleaning | Stopwords | Tokeniser | Sentence rank threshold |
|---|---|---|---|---|
| LexRank | Yes | English | en_core_web_sm | None |
| TextRank | Yes | English | en_core_web_sm | None |
Table 2 shows the configuration for the abstractive models that we applied in our experiments. As shown in the table, apart from the output token size, for the other parameters (i.e. max input token, beam size and length penalty) we used the commonly used default values as reported in References [44, 45]. Max output token is set to 200 to allow for richer summaries, as suggested in Reference [46].
Hyper parameter settings for abstractive models
| Model | Max input token | Max output token | Beam size | Length penalty |
|---|---|---|---|---|
| BART | 1,024 | 200 | 5 | 1 |
| T5 | 1,024 | 200 | 5 | 1 |
Table 3 shows the precision, recall and F1 score for each ROUGE metric for each model. Table 4 shows the precision, recall and F1 for BERTScore. Figure 5 shows the F1 score for each ROUGE metric for each model as bar charts. It should be noted that our implementation also used word clouds to show the terms that appear most frequently in the generated summaries, giving information on the main ideas that each model captures. We have not included any images of word clouds in this paper for brevity.
ROUGE metric scores for individual abstractive and extractive text summarisation models
| Category | Model | ROUGE score → | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|---|---|
| Abstractive | BART [14] | Precision | 0.5588 | 0.3939 | 0.5000 |
| | | Recall | 0.5135 | 0.3611 | 0.4595 |
| | | F1 | 0.5352 | 0.3768 | 0.4789 |
| | T5 [16] | Precision | 0.2353 | 0.0303 | 0.1471 |
| | | Recall | 0.1951 | 0.0250 | 0.1220 |
| | | F1 | 0.2133 | 0.0274 | 0.1333 |
| Extractive | TextRank [6] | Precision | 0.1619 | 0.0673 | 0.0952 |
| | | Recall | 0.5000 | 0.2121 | 0.2941 |
| | | F1 | 0.2446 | 0.1022 | 0.1439 |
| | LexRank [21] | Precision | 0.1022 | 0.1125 | 0.1605 |
| | | Recall | 0.5000 | 0.2727 | 0.3824 |
| | | F1 | 0.2957 | 0.1593 | 0.2261 |
BERTScores for individual abstractive and extractive text summarisation models
| Category | Model | Precision | Recall | F1 |
|---|---|---|---|---|
| Abstractive | BART [14] | 0.7170 | 0.7189 | 0.7179 |
| | T5 [16] | 0.5383 | 0.5307 | 0.5344 |
| Extractive | TextRank [6] | 0.6394 | 0.4799 | 0.5483 |
| | LexRank [21] | 0.6586 | 0.5629 | 0.6070 |
Figure 5. Comparison of ROUGE metric scores for individual abstractive and extractive text summarisation models
As shown in Tables 3 and 4 and Figure 5, BART performs better than the other models on every metric. It achieves the highest F1 score for ROUGE-1 (0.5352) and for BERTScore (0.7179), demonstrating good overall performance in identifying important words and phrases from reference summaries. Its F1 score for ROUGE-2 (0.3768) demonstrates efficient modelling of bi-gram sequences, a difficult component of summarisation. BART's F1 score for ROUGE-L (0.4789) also demonstrates that it can generate summaries that agree well with the sequence structure of the references. These results validate the excellent abstractive summarisation ability of BART.
Tables 3 and 4 and Figure 5 also show that the graph-based extractive summarisation algorithm LexRank does well, particularly when contrasted with other extractive techniques such as TextRank. It routinely outperforms TextRank in terms of F1 score for ROUGE-1 (0.2957), ROUGE-L (0.2261) and BERTScore (0.6070). Additionally, LexRank appears to capture meaningful bi-gram relations more effectively than TextRank, as indicated by its F1 score for ROUGE-2 (0.1593). Although it performs worse than BART in abstractive tasks, LexRank is a powerful extractive summariser that is useful in situations where a purely extractive method is needed.
After evaluating the performance of each model, they are combined into a modular pipeline. More specifically, we have combined the best performing abstractive model (BART) and the best performing extractive model (LexRank). This setup guarantees that only the best methods are used to enhance summarisation results. Table 5 shows the parameter settings (in addition to the settings shown in Tables 1 and 2 for LexRank and BART, respectively) for the pipeline that we applied in our experiments. As shown in the table, LangChain splits the initial input into chunks of 7,000 characters with a 250-character overlap between consecutive chunks; lexrank_sentences specifies the preferred number of sentences in the summary generated by LexRank, and min_length and max_length signify the preferred minimum and maximum length of the summary (in terms of characters) generated by BART.
Parameter settings for pipeline
| LangChain | LexRank | BART |
|---|---|---|
| chunk_size = 7,000 | lexrank_sentences = 15 | min_length = 20 |
| chunk_overlap = 250 | | max_length = 300 |
In addition to the CNN/DM dataset, for the evaluation of our framework we have used the Newsroom [23] and XSum [24] datasets. Both Newsroom and XSum are large datasets, widely used for training and evaluating summarisation systems [4, 9, 10]. Table 6 shows the precision, recall and F1 score for each ROUGE metric for the modular pipeline. Table 7 shows the precision, recall and F1 score for BERTScore. Figure 6 shows the F1 score for each ROUGE metric for BART, LexRank and the hybrid (BART + LexRank) model as bar charts for the CNN/Daily Mail dataset (as we evaluated the individual models using the CNN/Daily Mail dataset).
ROUGE metric scores for our modular pipeline (LexRank + BART)
| Data set | ROUGE score | Precision | Recall | F1 |
|---|---|---|---|---|
| CNN/DM | ROUGE-1 | 0.7188 | 0.6571 | 0.6866 |
| | ROUGE-2 | 0.4839 | 0.4412 | 0.4615 |
| | ROUGE-L | 0.5000 | 0.4571 | 0.4776 |
| Newsroom | ROUGE-1 | 0.7083 | 0.3820 | 0.4964 |
| | ROUGE-2 | 0.5745 | 0.3068 | 0.4000 |
| | ROUGE-L | 0.6667 | 0.3596 | 0.4672 |
| XSum | ROUGE-1 | 0.4250 | 0.6071 | 0.5000 |
| | ROUGE-2 | 0.2658 | 0.3818 | 0.3134 |
| | ROUGE-L | 0.2500 | 0.3571 | 0.2941 |
BERTScores for our modular pipeline (LexRank + BART)
| Data set | Precision | Recall | F1 |
|---|---|---|---|
| CNN/DM | 0.7541 | 0.7747 | 0.7643 |
| Newsroom | 0.6723 | 0.8116 | 0.7354 |
| XSum | 0.6616 | 0.6222 | 0.6413 |
Figure 6. Comparison of ROUGE metric scores of BART, LexRank and our modular pipeline (LexRank + BART) for CNN/Daily Mail dataset
Tables 6 and 7 and Figure 6 demonstrate that, compared to the performance of the independent models (Tables 3 and 4 and Figure 5), text summarisation performance for the CNN/DM dataset significantly improves when LexRank and BART are combined using LangChain. The hybrid model obtains superior ROUGE scores for most of the metrics. For example, the F1 score for ROUGE-1 reaches 68.66%, surpassing the previous peak of 53.52% achieved by BART alone. BART's previous best F1 score for ROUGE-2 of 37.68% has been surpassed by the pipeline's F1 score, which shows the retention of meaningful bi-grams in the extractions. Additionally, the F1 score for ROUGE-L for the hybrid model is very close to BART's prior maximum of 47.89%. Similarly, the BERTScore for the pipeline (for the CNN/DM dataset) has improved compared to the BERTScores for the individual models. These improvements demonstrate the hybrid model's capacity to extract important information and generate logical, well-organised summaries.
We have also compared the ROUGE scores of our hybrid model with other existing models that have used the CNN/Daily Mail dataset for text summarisation. As reported in the exhaustive survey on text summarisation in Reference [4], several other models have been evaluated on the CNN/Daily Mail dataset. Among these, we have selected the three best performing models: (1) DiffuSum [47], an extractive model; (2) SliSum [48], an abstractive model and (3) GSum [49], a hybrid model. The F1 score for each ROUGE metric for these models, as reported in Reference [4], together with the F1 score of our hybrid model, is shown in Table 8. Figure 7 shows the F1 score for each ROUGE metric for these models and our hybrid model as bar charts.
ROUGE metric scores for existing models (DiffuSum, SliSum and GSum) and our modular pipeline (LexRank + BART) for CNN/Daily Mail dataset
| ROUGE score | DiffuSum [47] | SliSum [48] | GSum [49] | LexRank + BART (our hybrid model) |
|---|---|---|---|---|
| ROUGE-1 | 0.4483 | 0.4775 | 0.4594 | 0.6866 |
| ROUGE-2 | 0.2256 | 0.2316 | 0.2232 | 0.4615 |
| ROUGE-L | 0.4056 | 0.4426 | 0.4248 | 0.4776 |
Figure 7. Comparison of ROUGE metric scores for existing models (DiffuSum, SliSum and GSum) and our modular pipeline (LexRank + BART)
As shown in Table 8 and Figure 7, our hybrid model has outperformed the other models for each ROUGE metric. The use of LangChain to integrate LexRank with BART and the complementary advantages of the extractive and abstractive approaches account for the pipeline's outstanding performance. Through our experiments, we made two observations that enabled our pipeline to achieve the performance reported in this section. Firstly, the selection of the chunk size for LangChain: LangChain splits the input text into smaller chunks, and the selection of the chunk size (as stated in Table 5) played a crucial role in the overall performance. In our initial experiments, a smaller chunk size (e.g. less than 5,000 characters) did not produce good results, and better results were only achieved with a larger chunk size (e.g. 7,000 characters). Secondly, the selection of the size of the summary generated by LexRank (stated as lexrank_sentences in Table 5): a smaller value (e.g. fewer than 12 sentences) or a larger value (e.g. more than 15 sentences) did not produce good results, and better results were only achieved with a summary size of 12 to 15 sentences. This ensured that, in the extractive phase, crucial sentences are accurately detected by LexRank, and BART rewords these extracts into coherent summaries that are rich in context. It should be noted that error propagation from the extractive phase to the abstractive phase is a typical issue, as reported in the literature, e.g. in References [44, 50]. Though this is not investigated in our approach, our results, as compared with other approaches, indicate that this error propagation could be minimal in our approach. Nevertheless, we endeavour to investigate this in future work to improve the accuracy of our approach.
5. Conclusions and future directions
The primary contribution of this study is the development and evaluation of a modular pipeline combining LexRank, an extractive summarisation technique, and BART, an advanced abstractive model, orchestrated via LangChain. The results indicate that our hybrid approach outperforms traditional single-model methods and other existing hybrid models, as evidenced by significantly higher ROUGE scores across various experiments. Although our work demonstrated very good results, it has certain shortcomings. In our experiments, we used datasets with short documents (within 7,000 characters); more experiments with long documents are needed to evaluate the robustness of our pipeline. As mentioned in Section 4, we have not investigated error propagation from the extractive phase to the abstractive phase, which may have an impact on the performance of our framework. Finally, the focus of our study was restricted to general-purpose summarisation; domain-specific modifications were noted as a possible topic for further research but were not thoroughly examined in this study.
Several suggestions, listed below, can direct future research and real-world applications in automatic text summarisation, building on the findings of our work.
Firstly, applying the modular pipeline to domain-specific datasets, such as scientific, legal or medical texts, would offer important insights about how adaptable and effective it is in particular settings. Further improving the quality and relevance of generated summaries may involve tailoring pre-processing and summarisation methods for domain-specific terminology and data structures.
Secondly, adding other sophisticated evaluation metrics, such as MoverScore or LLM-based metrics (e.g. LLM-as-a-judge), could complement ROUGE and BERTScore and offer a more nuanced review of semantic coherence and factual consistency. These measures may also detect small enhancements in the quality of abstractive summarisation that are missed by conventional evaluation techniques. Larger-scale human inspection, especially for domain-specific summaries, could confirm the pipeline's functionality and point out areas that need improvement.
Thirdly, further research into multilingual capabilities is crucial. Adding multilingual functionality to the pipeline would greatly boost its usefulness in international applications. This would require using pre-trained multilingual models and modifying pre- and post-processing procedures to accommodate various linguistic structures.