Scholarly blogs serve as a medium for sharing scholarly output both inside and outside of academia. However, questions arise concerning the assurance of the long-term accessibility of scholarly blogs and their contents. The aim of this paper is to provide an overview of the scholarly blogging landscape in Germany and examine how scholarly blogs are integrated into digital research and information infrastructures.
Following a review of the literature, we map the landscape of scholarly blogs in Germany. To do so, we (1) collected a sample of 866 German scholarly blogs, (2) developed analysis criteria and (3) collected data on each blog according to these criteria. We present and discuss our data analysis in turn.
Our findings confirm the lack of integration of scholarly blogs into existing digital research and information infrastructure but also uncover efforts that facilitate and build research and information infrastructure around scholarly blogging. We conclude by identifying stakeholders with the potential to further facilitate the long-term accessibility and preservation of scholarly blogs.
Our study presents novel, original and large-scale findings on German scholarly blogs and their integration into digital research and information infrastructures that are relevant to information professionals and blogging scholars.
Introduction
Web-based content may seem persistent, but its long-term accessibility is not ensured. One-quarter of all web pages that existed at any point between 2013 and 2023 had disappeared by the end of that period (Chapekis et al., 2024). The availability of content was found to decline with age: 38% of web pages that were accessible in 2013 could no longer be reached in 2023, while 8% of websites that were active at some point in 2023 were already gone by October 2023 (Chapekis et al., 2024). This data loss has practical consequences. A 2014 study revealed that approximately 50% of all hyperlinks cited in decisions of the United States Supreme Court were no longer reliable because (1) the originally referenced content was missing, (2) the content had significantly changed, or (3) the links were completely broken (Zittrain et al., 2014). In 2019, the platform Myspace lost all music uploaded before 2016 during a server migration. This resulted in the permanent deletion of over 50 million songs from 14 million artists (Hern, 2019). These incidents highlight the importance of addressing the long-term digital accessibility and preservation of web content and of finding infrastructural solutions to safeguard it. The long-term accessibility of web content is also particularly important with regard to good scientific practice (ALLEA, 2023).
Data loss and a lack of long-term accessibility also raise concerns about scholarly blogs, that is, blogs that are written by scholars or are concerned with scholarly topics, which are the focus of this article (Littek, 2012; Puschmann and Mahrt, 2012; Wenninger, 2019). Scholarly blogs enable the distribution of scholarly output both inside the academic environment and outside of it (Luzón, 2013). Compared to traditional academic publishing venues, scholarly blogs are an accessible, cheap, fast, open, and more informal way of communicating research for both authors and recipients (Burton, 2015). Since scholarly blogs are particularly vulnerable to data loss and disappearance, the need to develop strategies to preserve them, for example through digital preservation efforts, has become increasingly urgent. Digital preservation ensures long-term access to digitally stored information, for example through web archiving (Lee et al., 2002). Web archiving contributes to the preservation of cultural heritage and knowledge (Kalb et al., 2013). A common way to practice web archiving is through the Internet Archive, an initiative aimed at developing a digital library of web pages (Internet Archive, n.d.; Kalb et al., 2013). Efforts to support the web archiving of blogs have been made in the BlogForever project, a project aimed at developing archiving strategies for weblogs. However, the repository developed during the project is no longer available (Pampel and Rothfritz, 2024).
Furthermore, within the scholarly literature on blogs, the topic of their long-term accessibility has been little addressed (Hank, 2011). While digital information infrastructures have been organised around other types of scholarly output (e.g. textual publication types such as books and journals) to ensure their permanent accessibility, these processes and infrastructures have not been ensured when it comes to scholarly blogs, which bears the risk of information loss (Burton, 2015; Hank, 2011; Kalb et al., 2013; Kasioumis et al., 2014). Scholarly blogs pose legal and technical challenges for preservation, since they embed dynamic content such as pictures, videos, links and comments (Burton, 2015; Hank, 2011). Even though authors of traditional scholarly publications rarely view the long-term accessibility of their content to be their responsibility (Altenhöner and Schrimpf, 2014), the long-term accessibility of blogs has predominantly been discussed by bloggers themselves (Fenner, 2022; Hank, 2011) or projects concerned with building infrastructure to preserve scholarly blogs (Fenner, 2022; Guilleux, 2024; Kalb et al., 2013; Lazaridou et al., 2013). A study concerned with the digital preservation of scholarly blogs found that bloggers (1) view their blogs to be part of the scholarly record and (2) are interested in their blogs being preserved (Hank, 2011).
Within the scholarly literature on the topic, there is no overview of existing scholarly blogs or data on how scholarly blogs are integrated into already existing digital research and information infrastructure, which we define as a shared distribution of social, organisational, and technical systems and activities that enable and support research practices (Bowker et al., 2010). Accordingly, we understand information infrastructure facilities to include libraries, but also technical and organisational measures that facilitate the reuse of scholarly blogs. In light of these challenges, this study aims to investigate how scholarly blogs are currently maintained, preserved, and integrated into scholarly infrastructure. We pursue the following research questions:
How do scholarly blogs vary across disciplines in terms of activity, institutional affiliation and language?
How are scholarly blogs and their content already being integrated into existing digital research and information infrastructures to support long-term accessibility and preservation?
What efforts do scholarly bloggers apply to facilitate the accessibility and reusability of their blog contents?
To answer these questions, we collected 866 German scholarly blogs and developed three sets of analysis criteria that (1) categorise the blogs into research fields, monitor their activity and determine their institutional affiliation and language, (2) uncover ways that scholarly blogs are already being integrated into digital research and information infrastructure and (3) discover other efforts scholarly bloggers apply to make their content accessible and reusable. We discuss our findings in turn. The paper concludes by identifying stakeholders with the potential to further facilitate the long-term accessibility of scholarly blogs.
Literature review
To gain an overview of the current state of research we conducted a scoping review, which is especially suitable for topics where key concepts, trends, and gaps have not yet been reviewed (Arksey and O’Malley, 2005). We searched Google Scholar and Scopus using the keywords “scholarly blog”, “science blogs”, as well as the German terms “Wissenschaftsblog” (Eng. science blog) and “Wissenschaftlicher Blog” (Eng. scientific blog), and then followed citations forwards and backwards. From the literature we then extracted keywords, method, terminology (e.g. “academic blogs”, “science blogs”, “scholarly blogs”), definition of scholarly blogs, objective and research question, results, limitations, and remaining knowledge gaps, and recorded the data into a data charting form. Based on the extracted information, we developed topic categorisations to sort the literature thematically, which we present in the following. We found that the literature on scholarly blogs is diverse and discusses a variety of subtopics and questions, using a wide range of methods. We first provide a brief overview on these topics and methods and then discuss gaps and limitations within the literature on scholarly blogs.
Background, definitions and terminology
Blogs (short for Weblogs) are frequently updated Web pages that store content or posts in reverse chronological order (Herring et al., 2005; Wilkins, 2008) and are publicly available. They can be operated by individuals, institutions or companies (Fenner, 2008). The first blogs emerged in the late 1990s (Rettberg, 2008). The blogging format then gained more popularity in the 2000s as part of the Web 2.0. The term Web 2.0 refers to a change in the use of the Internet in which content is increasingly created by the users themselves (Fenner, 2008). Similar to other types of textual content, blogs often contain various bibliographic information, such as title information, a description of the blog and information about the blogger, as well as standard features such as posts that have their own titles, blog archives (an overview of links to older blog posts), blogrolls (recommendations for other blogs) and feeds (Bader et al., 2012; Rettberg, 2008). Feeds are machine-readable files in XML format that can be integrated into other applications to share data. The most common feed formats are Rich Site Summary, also called Really Simple Syndication (RSS), and Atom (Bhatt, 2005; Yee, 2008). As the popularity of blogs increased strongly during the early 2000s (Rettberg, 2008), blogs on the subject of science and scholarship emerged. The involvement of publishers in the 2000s, such as Burda (ScienceBlogs), Spektrum der Wissenschaft (Eng. Spectrum of Science; SciLogs) and the Nature Publishing Group, which began hosting scholarly blogs, increased the importance of scholarly blogs during that time. Furthermore, in 2008, the conference “ScienceBlogging” took place in London as one of the first specific conferences on the topic (Fenner, 2008).
A universally accepted definition of the term “scholarly blog” does not yet exist within the scholarly literature on the topic (Colson, 2011; Wenninger, 2019). However, at a broad level, scholarly blogs can be understood as those blogs that are written by scholars or are concerned with scholarly topics (Littek, 2012; Luzón, 2013; Puschmann and Mahrt, 2012; Wenninger, 2019). Among the scholarly community, there have been discussions about what content and which bloggers are scholarly enough to be considered scholarly blogs or scholarly bloggers (Wenninger, 2019). Some scholars argue that scholarly blogs must be written by active academic researchers (Luzón, 2013; Mauranen, 2013) with an academic affiliation (Puschmann and Mahrt, 2012). Others also include non-academic authors (Blanchard, 2011; Littek, 2012), since the mere fact that a blog is concerned with scholarly topics or written by an active academic does not guarantee their quality or that blog contents are dealt with in a scholarly manner (Wenninger, 2019). Scholars have therefore also included professions such as university or college students (Blanchard, 2011), teachers (Goldstein, 2009), and science journalists, since blogging about science has been considered a form of science journalism (Bonetta, 2007; Kjellberg, 2015). Unlike traditional academic publications, however, most scholarly blogs are not edited and peer reviewed prior to publication and also not preserved (Burton, 2015). Therefore, the level of recognition of scholarly blogs as academic output still varies, especially among different disciplines (Puschmann and Mahrt, 2012).
Alternative terms have been used for scholarly blogs, such as academic blogs (Davies and Merchant, 2007; Zou and Hyland, 2020), research blogs (Ferguson et al., 2010), science blogs, which are sometimes described as blogs on the topic of natural sciences (Blanchard, 2011; Kouper, 2010; Puschmann and Mahrt, 2012), and scientific blogs (Bondi, 2022; Hanauska and Leßmöllmann, 2021). However, to our knowledge there is little literature that addresses these differences in terminology, other than the further categorisations of scholarly blogs mentioned above. Köhler (2008) has differentiated between the “Wissenschaftlerblog” (Eng. scientist blog), a blog that is connected to a scientist and functions as their online journal without being limited to publishing research results, and the “Wissenschaftsblog” (Eng. science blog), which he describes as communicating science in a more classical scholarly manner. However, this distinction is not broadly recognised or used by other scholars, who continue to use diverse terminology, often without providing a definition. Some scholars and bloggers further differentiate between different types of contents or purposes (Walker, 2006), such as research program blogs, event blogs, laboratory blogs, thesis blogs, seminar blogs (Köhler, 2008) and research group blogs (Luzón, 2006).
Prior research
A common topic within the research on scholarly blogs is scholarly communication and how scholarly blogs fit into and have expanded existing scholarly communication structures (Metcalfe, 2020; Zoukas, 2019). The term scholarly communication describes the interactions of scholars and the ways they socialise and communicate scholarly knowledge (Borgman and Furner, 2002; Burton, 2015), among each other but also other stakeholders such as publishers, libraries and society (Burton, 2015). Scholarly blogs enable internal science communication as well as the external scholarly communication to the broader public. Furthermore, scholarly blogs enable the dissemination of research according to Open Science principles (Shema et al., 2012). Publishing research according to Open Science and the Open Access principles has been emphasised to be good scientific practice (ALLEA, 2023). While blogs play a valuable role in fostering open scholarly dialogue, particularly within community-driven settings, they often do not fully correspond to the formal, procedural, and infrastructural frameworks that characterize Diamond Open Access journals (Dalkilic, 2025). Additionally, scholarly blogs are an alternative place for academic discourse and also an additional or even alternative way to publish academic output, which is viewed to be a positive development (Hank, 2011).
Further studies have been conducted on how blogging enhances science communication in specific disciplines, the most common being climate science (e.g. Metcalfe, 2020), history (e.g. Gebert, 2015; Haber and Pfanzelter, 2013), law (e.g. Birkenkötter and Steinbeis, 2015), and medicine (e.g. Sokół, 2021). The topic of science communication also includes scholarly microblogging. Microblogs are usually situated on centralised platforms such as Twitter/X, Bluesky or Mastodon and, unlike scholarly blogs, publish shorter content at more frequent intervals (Puschmann, 2014). Another much discussed topic is the positioning of scholarly blogs in the academic landscape. This entails the question of the validity of scholarly blogs as academic output (e.g. Hank, 2013; Hendricks, 2010; Lindgren, 2006), which involves concerns about quality control (e.g. Fausto et al., 2011). Scholarly blogs are often not recognised as formal academic output (Kirkup, 2010) and are consequently not typically considered in the evaluation of research performance. Connected to this is the literature on blogging practices, which examines the use of links and references in scholarly blogs (Luzón, 2008), e.g. by examining alt-metrics (Jamali and Sangari, 2015; Shema et al., 2014), and blog network communication patterns (Wang et al., 2010). Furthermore, the literature on scholarly blogs is concerned with stakeholders, such as scholarly bloggers and the readers of those blogs. This includes studies that have researched bloggers’ motivation to blog (e.g. Jarreau, 2015), the construction of their identity (e.g. Luzón, 2018) and the interaction between bloggers and readers (e.g. Metcalfe, 2020).
Within the literature on scholarly blogs we found a wide variety of methods, such as autoethnography (Davies and Merchant, 2007), and ethnography (Dennen, 2009), case studies (Vestergaard, 2017), content analysis (Luzón, 2017, 2018; Zou and Hyland, 2020), topic modelling (Burton, 2015), quantitative segment analysis (Yeo et al., 2017), qualitative interviews (Yuan and Besley, 2021) and surveys (Mahrt and Puschmann, 2014). We also discovered that many publications are essays without a clear method (Bell, 2012; Blood, 2000; Hodel, 2013), especially in legal papers (Berman, 2006; Birkenkötter and Steinbeis, 2015).
Studies that used blogs as a research object have examined the language and interactions between bloggers and audiences (Bondi, 2018, 2022; Diani, 2022; Shema et al., 2012; Zou and Hyland, 2019, 2020) as well as the use of links (Luzón, 2008) and references (Jamali and Sangari, 2015; Wang et al., 2010). Furthermore, scholars have examined the use of blogs in specific subjects, such as blogs in German jurisprudence (Birkenkötter and Steinbeis, 2015; Diani, 2023), climate blogs (Zoukas, 2019), medical blogs (Zou and Hyland, 2024) and the use of urban planning blogs in research (Elshater and Abusaada, 2023). Existing research on the preservation of blogs has been conducted in the “BlogForever” project, which focused on possible web archiving solutions for scholarly blogs. However, the blog repository that was developed during the project is no longer available (Pampel and Rothfritz, 2024).
The scholarly literature on the topic of scholarly blogs is diverse in topics and methods. While scholars have primarily focused on the new possibilities blogs offer for science communication (Burton, 2015; Craddock, 2008; Fraumann and Colavizza, 2022), no work explicitly and primarily engages with their long-term integration into research and information infrastructures, like archives or libraries. Considering the discussion about blogs since the mid-2000s and the relevance that scholarly blogs have undoubtedly developed within scholarly communication (Fenner, 2008; Puschmann and Mahrt, 2012), it becomes clear that the long-term preservation of blogs is an important, yet largely unexplored topic of research. Preserving scholarly blogs is crucial for comprehensively understanding discussions and outcomes of scholarly knowledge production. If blogs that are, for example, referenced in traditional academic publications are not preserved for future generations of researchers, a gap in the scholarly record emerges. This topic intersects with both practices and procedures in the field of electronic publishing as well as the domain of digital long-term archiving. Furthermore, it also connects with discussions surrounding open science and the evolution of research evaluation, particularly when considering whether research information systems at academic institutions should incorporate blog posts as recognised research contributions.
Method
Data collection
Inclusion and exclusion criteria
Since the scholarly literature on the topic does not provide a clear definition of scholarly blogs, we determined content-related, formal and technical inclusion and exclusion criteria. Table 1 summarises the criteria that we applied to all blogs that were included in our sample.
Table 1: Inclusion and exclusion criteria
| Inclusion criteria | Exclusion criteria |
|---|---|
| Websites/Blogs with reversed chronological feeds | Microblogs, newsletters, news portals, podcasts, static websites |
| German (based on address, content, or author affiliation) | Non-German blogs or those with no clear country affiliation |
| Blogs by active scholars, science journalists, doctoral students and supervised students | Blogs by news journalists or practitioners that are not communicating research or research-related content |
| Blogs from archives, libraries, or museums communicating research | Blogs from academic or infrastructure institutions that are not communicating research or research-related content |
| Content about research, academia, and science journalism | Journalistic news or practical content that is not communicating research or research-related content |
Blog collection
To collect the sample of blogs, we started out by including blogs listed by the annual award “Wissenschaftsblog des Jahres” (Eng. Scholarly Blog of the Year) hosted by the blog “Wissenschaft kommuniziert” (Eng. Science communicated/Science communicating) (Wissenschaft kommuniziert, 2022). From 2011 to 2022, the blog published a list of varying numbers of scholarly blogs as part of the award. Using this process, we identified 34 German scholarly blogs across various disciplines that met our inclusion criteria. As a second step, we included seven scholarly blogs that were mentioned in the project proposal for the project Infra Wiss Blogs (Pampel and Rothfritz, 2024).
We then expanded the sample by including blogs from the research centres Helmholtz (15) and Jülich (16), as well as 10 relevant blogs from the Staatsbibliothek zu Berlin (Eng. State Library Berlin). Additionally, we included blogs from several blogging portals, such as de.hypotheses (446), ScienceBlogs (88) and SciLogs (24). To collect blogs from de.hypotheses, we used the OpenEdition blog catalogue and limited the results to blogs where the country of publication was Germany and included all the results that fit our inclusion criteria. We started this process on the 11th of July 2024 and updated the list until the 14th of October 2024. Furthermore, we researched scholarly blogs that are hosted by the German blog portal ScienceBlogs. This included their active and archived blogs, which resulted in 88 blogs from ScienceBlogs. Lastly, we included 24 blogs hosted by SciLogs, a German blogging platform that hosts interdisciplinary scholarly blogs. Even though we were provided with a list of blog names from the SciLogs editorial office, finding inactive blogs was difficult, since these were not listed on the SciLogs blog portal.
Additionally, we were provided with a complete list (up until the 30th of July 2024) of catalogued blogs in the German National Library (DNB). This list contained 707 blogs. However, after removing duplicates (especially from de.hypotheses), inaccessible blogs and blogs that did not meet our inclusion criteria, we included 76 of them in our sample. We then added 14 blogs that were listed in the blog section of Rogue Scholar, a scholarly blog archive (Fenner, 2023b). We also researched blogs hosted by German universities and included a further 41 blogs. Lastly, we expanded the sample using the snowball method to include blogs that were linked to by the blogs in the initial sample. We only considered blogs that were linked on the starting pages of the blogs of our initial sample, for example in blogrolls or recommendations. We found another 95 blogs using this method. Since many blogs were mentioned in multiple sources, we avoided duplicate entries by only counting each blog once. Our final sample therefore consists of 866 blogs. Table 2 summarises the steps we took to collect the blogs for our sample.
Table 2: Blog collection process
| Step | Outcome |
|---|---|
| Nominees of the “Scholarly Blog of the Year” (2011–2022) | 34 blogs |
| Project proposal | 7 blogs |
| Helmholtz blogs | 15 blogs |
| Jülich blogs | 16 blogs |
| SBB blogs | 10 blogs |
| Open Edition | 446 blogs |
| ScienceBlogs | 88 blogs |
| SciLogs | 24 blogs |
| DNB entries | 76 blogs |
| Desk research | 41 blogs |
| Rogue Scholar | 14 blogs |
| Blogroll | 95 blogs |
| Final sample size | 866 blogs |
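The collection steps can be cross-checked with a few lines of code. The following is a minimal sketch, not part of the original analysis: the dictionary name and keys are illustrative, and the counts are copied from Table 2. It assumes that each per-source count already excludes blogs found in an earlier step, which is consistent with the statement that each blog was counted only once.

```python
# Per-source counts as reported in Table 2 (keys are illustrative labels).
collection_steps = {
    "Scholarly Blog of the Year nominees (2011-2022)": 34,
    "Project proposal": 7,
    "Helmholtz blogs": 15,
    "Jülich blogs": 16,
    "SBB blogs": 10,
    "Open Edition": 446,
    "ScienceBlogs": 88,
    "SciLogs": 24,
    "DNB entries": 76,
    "Desk research": 41,
    "Rogue Scholar": 14,
    "Blogroll": 95,
}

# Assumption: counts are net of duplicates, so they should add up
# to the reported final sample size.
REPORTED_SAMPLE_SIZE = 866
total = sum(collection_steps.values())

print(f"Sum of collection steps: {total}")
print(f"Matches reported sample size: {total == REPORTED_SAMPLE_SIZE}")
```

A check of this kind is mainly useful when the sample is updated over several months, as it flags a blog that was accidentally counted under two sources.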
Data analysis
After collecting the blog sample, we developed three sets of analysis criteria in line with our research questions. We then examined each blog against these pre-determined criteria, which we list and define in the following. We have further summarised our analysis criteria, answer options and their definitions in Table 3. The categories feed, license, citation proposal and persistent identifiers (ISSN and DOI) were based on the guidelines for scholarly blogs, a set of recommendations for scholarly bloggers to support the long-term archiving, discoverability, and citation of blog contents (Fenner, 2023a).
Table 3: Data analysis criteria
| Criteria | Options | Definition |
|---|---|---|
| Discipline | Humanities and Social Sciences, Life Sciences, Natural Sciences, Engineering Sciences | Assigned using the German Research Foundation’s classification |
| Institutional affiliation | Yes/No | Yes: Blogs affiliated with research performing organisations (e.g. universities, colleges, research institutions), archives, libraries or museums. No: Independent blogs by scholars |
| Blog archive | Yes/No | Integrated archive of blog posts |
| Blogroll | Yes/No | List of recommended blogs on starting page |
| Comments | Yes/No | Applies to blog posts |
| ISSN | Yes/No | Provision of an ISSN |
| DOI | Yes/No | Applies to blog posts |
| Citation proposal | Yes/No | Indicates if blog posts provide a citation proposal |
| German National Library entry | Yes/No | Indicates entry in the German National Library catalogue |
| Feed | Yes/No (or only via reader) | Indicates availability of an RSS/Atom feed |
| Activity status | Active/Inactive | Active: Last post after June 30, 2023. Inactive: Last post before this date |
| Creative Commons | License/NA | Applies to full-text content |
| Platform | Open field | Blogging platform used |
| Software | Open field | Underlying software |
| Language | Open field | Languages of the blog |
| First post date | Open field | Date of the first blog entry |
| Last post date | Open field | Date of the most recent blog entry |
Overview of scholarly blogs: To examine how scholarly blogs vary across disciplines in terms of activity, authorship, and language, we categorised blogs by discipline and analysed their activity status, institutional affiliation, and the languages they used. To categorise blogs into subjects, we used the first layer of the subject classification of the German Research Foundation (humanities and social sciences, life sciences, natural sciences and engineering sciences; German Research Foundation, n.d.). Multiple answers were possible if blogs could not be sorted into only one discipline. Since the platform de.hypotheses only hosts scholarly blogs from the humanities and social sciences, we automatically sorted the de.hypotheses blogs into the humanities and social sciences. In order to determine the blogs’ activity status, we researched the dates of the first and last posts of each blog. We considered a blog active if it had published at least one post within the year before we started our research. As we started our research on the first of July 2024, all blogs that posted at least once between the first of July 2023 and the 14th of October 2024 (when we concluded data analysis) were considered active. All blogs whose last post was dated before the first of July 2023 were considered inactive. We defined blogs with an institutional affiliation to include blogs that are explicitly affiliated with research performing organisations, e.g. through research programs or researchers that explicitly blog about their work. We also included blogs by archives, libraries and museums. Blogs with no institutional affiliation are independent blogs by scholars (who may work at a research performing organisation but whose blog is not explicitly connected to the research they perform at their institution), associations, clubs, or publishers. Lastly, we noted the language used in blogs. When recording multilingual blogs, we only included blogs that consistently posted in multiple languages and did not include blogs that only sparsely published posts in another language.
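The activity rule described above can be expressed as a simple date comparison. The sketch below is illustrative only: the function and constant names are assumptions, and it is not the authors' analysis code. It encodes the cutoff of 1 July 2023 and the end of data analysis on 14 October 2024.

```python
from datetime import date

# Dates taken from the text; the names are hypothetical.
ACTIVE_CUTOFF = date(2023, 7, 1)    # one year before the start of the research
ANALYSIS_END = date(2024, 10, 14)   # end of data analysis


def activity_status(last_post: date) -> str:
    """Classify a blog as 'active' or 'inactive' by the date of its last post."""
    if last_post > ANALYSIS_END:
        # A later date cannot have been observed during the study period.
        raise ValueError("last post is dated after the end of data analysis")
    return "active" if last_post >= ACTIVE_CUTOFF else "inactive"


if __name__ == "__main__":
    for example in (date(2023, 6, 30), date(2023, 7, 1), date(2024, 10, 14)):
        print(example.isoformat(), activity_status(example))
```

Treating the cutoff date itself as active follows the wording "from the first of July 2023"; a blog whose last post is dated 30 June 2023 is classified as inactive.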
Integration into digital research and information infrastructure: To assess how scholarly blogs and their content are integrated into existing digital research and information infrastructures for long-term accessibility and preservation, we evaluated platform and software choices and checked whether the blogs are indexed by the German National Library with an International Standard Serial Number (ISSN), an identification system for periodical publications (Deutsche Nationalbibliothek, n.d.). Additionally, we checked if bloggers are using Digital Object Identifiers (DOIs) and citation proposals for their posts. We determined blogging platforms by checking URLs and the presentation layer of blogs and its integration into other websites or infrastructure. We knew prior to the data collection that the platforms de.hypotheses, ScienceBlogs and SciLogs used the WordPress Software. For other blogs, we checked for software information on the blogs or used the browser add-on Wappalyzer [1]. We noted entries in the German National Library using the list of catalogued blogs the library provided for us. Checking other libraries or collective library catalogues for entries was unfortunately beyond the scope of this study. Once again, we relied on ISSNs provided by the German National Library through the blog list they provided for us and added entries if we found ISSNs on the blogs themselves. Furthermore, we checked the blog posts for the provision of DOIs and citation proposals.
Efforts by bloggers: To explore the strategies bloggers use to enhance the accessibility and reusability of their content, we searched for a blog archive, a blogroll, comment functions, a Creative Commons (CC) license, and an RSS/Atom feed. We only recorded archives that were integrated into the blogs themselves, not whether the blogs had an entry in an institutional archive. Furthermore, we only checked the landing pages of blogs for blogrolls and recorded a blogroll only if one was provided there. We checked starting pages, posts and imprints for license information and only included blogs that licensed their whole blog content, as opposed to individual posts. We then noted whether blogs provided feeds directly on their website and, where they did not, whether we could access feeds through feed readers. Lastly, we checked whether the comment function was enabled for blog posts.
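A feed that is not visibly linked on a page can often still be found by feed readers through feed autodiscovery, i.e. `<link rel="alternate">` elements in the HTML head that point to an RSS or Atom document. The Python sketch below shows how such a check could be automated with the standard library. It is not the procedure used in this study, and the names `FeedLinkParser` and `find_feeds` as well as the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# MIME types conventionally used for feed autodiscovery.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect the href of every <link rel="alternate"> that advertises a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = {name: (value or "") for name, value in attrs}
        rel_values = attributes.get("rel", "").lower().split()
        mime_type = attributes.get("type", "").lower()
        href = attributes.get("href", "")
        if "alternate" in rel_values and mime_type in FEED_TYPES and href:
            self.feeds.append(href)


def find_feeds(html: str) -> list:
    """Return all feed URLs advertised in the given HTML document."""
    parser = FeedLinkParser()
    parser.feed(html)
    return parser.feeds


# Hypothetical landing page of a blog, for illustration only.
sample_html = """
<html><head>
  <link rel="stylesheet" href="/style.css">
  <link rel="alternate" type="application/rss+xml" href="https://blog.example.org/feed/">
</head><body>No visible feed link.</body></html>
"""
print(find_feeds(sample_html))  # ['https://blog.example.org/feed/']
```

A blog for which `find_feeds` returns a URL although no feed link is visible on the page would correspond to the category of feeds that are only accessible with a feed reader.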
Finally, we collected data on the criteria above for our entire blog sample and analysed the data quantitatively using R. The complete data set is available on Zenodo (Ochsner et al., 2025). The findings of this analysis are presented in the following section.
Findings
In this section, we present the findings of our analysis of the German scholarly blog landscape. We begin by presenting an overview of the scholarly blogs in our sample. We then present findings on the integration of German scholarly blogs into existing digital research and information infrastructures, followed by strategies employed by bloggers to enhance the accessibility and reusability of their blogs.
Overview of scholarly blogs
To provide an overview of our sample, we examined the disciplines that blogs belong to, their activity, their institutional affiliation and the languages that were used.
As can be seen in Figure 1, 671 (77.48%) of the 866 blogs in our sample are situated exclusively in the humanities and social sciences, 40 (4.62%) exclusively in the natural sciences, 26 (3.00%) exclusively in the engineering sciences, and 38 (4.39%) exclusively in the life sciences. The remaining 91 (10.51%) blogs are interdisciplinary and cannot be classified into only one category.
The Venn diagram is titled “Overlap of Disciplines.” It has four overlapping ovals labelled “Humanities and Social Sciences (739)” on the left, “Natural Sciences (107)” at the top left, “Engineering Sciences (85)” at the top right, and “Life Sciences (103)” on the right. The regions belonging to only one discipline contain 671 (Humanities and Social Sciences), 40 (Natural Sciences), 26 (Engineering Sciences) and 38 (Life Sciences). Two-way overlaps: Humanities and Social Sciences ∩ Natural Sciences: 6; Humanities and Social Sciences ∩ Engineering Sciences: 8; Humanities and Social Sciences ∩ Life Sciences: 13; Natural Sciences ∩ Engineering Sciences: 10; Natural Sciences ∩ Life Sciences: 8; Engineering Sciences ∩ Life Sciences: 3. Three-way overlaps: Humanities and Social Sciences ∩ Natural Sciences ∩ Engineering Sciences: 2; Humanities and Social Sciences ∩ Natural Sciences ∩ Life Sciences: 5; Humanities and Social Sciences ∩ Engineering Sciences ∩ Life Sciences: 0; Natural Sciences ∩ Engineering Sciences ∩ Life Sciences: 2. Four-way overlap (all disciplines): 34.
Figure 1. Overlap of disciplines. Source: Authors’ own work
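The counts in Figure 1 can be cross-checked arithmetically: the 15 regions of the Venn diagram should add up to the sample size, and the regions containing a given discipline should add up to the total printed in its label. The following Python sketch performs this check. The abbreviations `HSS`, `NAT`, `ENG` and `LIFE` and the helper names are chosen for this example only.

```python
# Abbreviations for the four disciplines (labels chosen for this example).
HSS, NAT, ENG, LIFE = "HSS", "NAT", "ENG", "LIFE"

# Region counts as described for Figure 1; each key lists the
# disciplines that the blogs in that region belong to.
REGIONS = {
    frozenset({HSS}): 671,
    frozenset({NAT}): 40,
    frozenset({ENG}): 26,
    frozenset({LIFE}): 38,
    frozenset({HSS, NAT}): 6,
    frozenset({HSS, ENG}): 8,
    frozenset({HSS, LIFE}): 13,
    frozenset({NAT, ENG}): 10,
    frozenset({NAT, LIFE}): 8,
    frozenset({ENG, LIFE}): 3,
    frozenset({HSS, NAT, ENG}): 2,
    frozenset({HSS, NAT, LIFE}): 5,
    frozenset({HSS, ENG, LIFE}): 0,
    frozenset({NAT, ENG, LIFE}): 2,
    frozenset({HSS, NAT, ENG, LIFE}): 34,
}


def total_for(discipline: str) -> int:
    """Number of blogs assigned to a discipline, alone or together with others."""
    return sum(count for region, count in REGIONS.items() if discipline in region)


def exclusive_share(discipline: str) -> float:
    """Percentage of the whole sample situated exclusively in one discipline."""
    return 100 * REGIONS[frozenset({discipline})] / sum(REGIONS.values())


sample_size = sum(REGIONS.values())
interdisciplinary = sum(count for region, count in REGIONS.items() if len(region) > 1)

print(sample_size, interdisciplinary)               # 866 91
print({d: total_for(d) for d in (HSS, NAT, ENG, LIFE)})
print(round(exclusive_share(HSS), 2))               # 77.48
```

The same check applies to Figure 2 if the exclusive humanities and social sciences count is replaced by 199, which gives a sample size of 394.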
Since the blogs hosted by de.hypotheses make up a large share of our sample and are exclusively blogs from the humanities and social sciences, Figure 2 shows the distribution of our sample without de.hypotheses blogs (n = 394). Even without the de.hypotheses blogs, the humanities and social sciences remain the most strongly represented field, with 199 (50.51%) blogs situated exclusively in the humanities and social sciences. As before, 40 (10.15%) blogs are situated exclusively in the natural sciences, 26 (6.60%) exclusively in the engineering sciences, and 38 (9.64%) exclusively in the life sciences. The remaining 91 (23.10%) blogs are interdisciplinary and cannot be classified into only one category.
The Venn diagram is titled “Overlap of Disciplines (excluding de.hypotheses).” It has four overlapping ovals labelled “Humanities and Social Sciences (267)” on the left, “Natural Sciences (107)” at the top left, “Engineering Sciences (85)” at the top right, and “Life Sciences (103)” on the right. The regions belonging to only one discipline contain 199 (Humanities and Social Sciences), 40 (Natural Sciences), 26 (Engineering Sciences) and 38 (Life Sciences). Two-way overlaps: Humanities and Social Sciences ∩ Natural Sciences: 6; Humanities and Social Sciences ∩ Engineering Sciences: 8; Humanities and Social Sciences ∩ Life Sciences: 13; Natural Sciences ∩ Engineering Sciences: 10; Natural Sciences ∩ Life Sciences: 8; Engineering Sciences ∩ Life Sciences: 3. Three-way overlaps: Humanities and Social Sciences ∩ Natural Sciences ∩ Engineering Sciences: 2; Humanities and Social Sciences ∩ Natural Sciences ∩ Life Sciences: 5; Humanities and Social Sciences ∩ Engineering Sciences ∩ Life Sciences: 0; Natural Sciences ∩ Engineering Sciences ∩ Life Sciences: 2. Four-way overlap (all disciplines): 34.
Figure 2. Overlap of disciplines (excluding de.hypotheses). Source: Authors’ own work
Out of 866 blogs, 462 (53.35%) are still actively posting, while 398 (45.96%) are inactive. Six blogs did not provide information on the dates of their posts. Figure 3 shows the distribution of blog lifespans in years. The majority of blogs have short lifespans, with a peak at zero to two years. As lifespan increases, the number of blogs gradually declines, meaning that fewer blogs remain active over longer periods. However, a number of blogs persist for five to ten years. The long tail of the distribution shows that a few blogs have lasted 20 years or more, though these cases are rare. The high number of blogs with a lifespan of zero years is due to blogs whose first and last posts fall in the same year. Since not all blogs provided information on their dates and some dates could not be determined, the number of blogs in Figure 3 is 851.
The vertical bar graph is titled “Blog Lifespan Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 105 in increments of 15. The horizontal axis is labelled “Lifespan (in years)” and shows the values 0 to 19 in increments of 1 year, followed by “20 plus.” The data from the graph is as follows: 0: 132; 1: 103; 2: 89; 3: 88; 4: 67; 5: 51; 6: 48; 7: 39; 8: 42; 9: 41; 10: 25; 11: 29; 12: 27; 13: 7; 14: 15; 15: 17; 16: 13; 17: 5; 18: 5; 19: 1; 20 plus: 7.
Figure 3. Blog lifespan distribution. Source: Authors’ own work
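The lifespan distribution can be derived from the years of a blog's first and last post. The Python sketch below assumes that lifespan is the difference between these two calendar years, which is consistent with blogs whose first and last posts fall in the same year having a lifespan of zero, and that lifespans of 20 years or more are pooled into one bin. The function names and the example year pairs are hypothetical. The block also sums the counts described for Figure 3.

```python
from collections import Counter


def lifespan_years(first_post_year: int, last_post_year: int) -> int:
    """Lifespan as the difference between the years of the last and first post (assumed definition)."""
    return last_post_year - first_post_year


def lifespan_histogram(year_pairs, cap: int = 20) -> dict:
    """Count blogs per lifespan; lifespans of `cap` years or more share the bin `cap`."""
    bins = Counter()
    for first_year, last_year in year_pairs:
        bins[min(lifespan_years(first_year, last_year), cap)] += 1
    return dict(sorted(bins.items()))


# Hypothetical (first post year, last post year) pairs, for illustration only.
example_blogs = [(2012, 2012), (2015, 2016), (2001, 2024), (2020, 2024)]
histogram = lifespan_histogram(example_blogs)
print(histogram)  # {0: 1, 1: 1, 4: 1, 20: 1}

# Counts per lifespan as described for Figure 3 (0 to 19 years, then "20 plus").
figure_3_counts = [132, 103, 89, 88, 67, 51, 48, 39, 42, 41,
                   25, 29, 27, 7, 15, 17, 13, 5, 5, 1, 7]
blogs_with_dates = sum(figure_3_counts)
print(blogs_with_dates)  # 851
```

The sum of the bar heights equals the 851 blogs for which both dates could be determined.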
In addition to the blogs’ overall activity status, we also investigated activity status in relation to the year of the first post, as can be seen in Figure 4. Figure 4 shows nine blogs with a first post between 1998 and 2003, of which six are still active. The number of newly started blogs increased in the mid and late 2000s and then doubled in 2012. While there was a slight decrease in new blogs in the years 2016–2019, numbers initially rose again in 2020 but have been decreasing since. The majority of the blogs that were started between 2007 and 2019 are now inactive, while newer blogs started from 2020 onwards remain mostly active. A chi-squared test revealed a significant relationship (χ²(23) = 126.52, p < 0.001), indicating that activity status varies depending on the year of the first post. The effect was moderate (Cramér’s V = 0.383).
The stacked vertical bar chart is titled “Blogs Started by Year and Activity Status.” The vertical axis is labelled “Number of Blogs” and ranges from 0 to 80 in increments of 20. The horizontal axis is labelled “Year” and shows the years 1998, 2000, 2001, 2003, 2004 and 2006 to 2024. Each bar is stacked from two segments, which a legend at the bottom identifies as “Active” and “Inactive.” The data from the graph is as follows (active; inactive): 1998: 1; none. 2000: 1; none. 2001: 2; 1. 2003: 2; 2. 2004: 2; none. 2006: 4; 5. 2007: 11; 17. 2008: 6; 24. 2009: 17; 18. 2010: 12; 18. 2011: 12; 11. 2012: 31; 27. 2013: 16; 42. 2014: 26; 51. 2015: 25; 35. 2016: 25; 27. 2017: 22; 20. 2018: 25; 26. 2019: 30; 22. 2020: 47; 21. 2021: 60; 15. 2022: 36; 14. 2023: 31; 2. 2024: 18; none. In total, the chart shows 462 active and 398 inactive blogs.
Figure 4. Blogs started by year and activity status. Source: Authors’ own work
Figure 5 shows that 555 (64.09%) blogs indicate an institutional affiliation, 291 (33.60%) blogs have no institutional affiliation, and 20 (2.31%) blogs did not provide information on authors or institutions. While Figure 5 shows the distribution of blogs that disclosed an institutional affiliation, Figure 6 shows the distribution of activity by institutional affiliation. Because of missing activity values, the number of blogs with institutional affiliation in Figure 6 is 551. Of these 551 blogs, 327 (59.35%) remain active, while 224 (40.65%) are inactive. For the same reason, the number of blogs without institutional affiliation in Figure 6 is 289. Of these 289 blogs, 134 (46.37%) remain active, while 155 (53.63%) are inactive. The share of active blogs is therefore higher among blogs with institutional affiliation than among blogs without (59.35% vs. 46.37%).
The vertical bar graph is titled “Institutional Affiliation Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 600 in increments of 200. The horizontal axis lists three categories: “Institutional Affiliation,” “No Institutional Affiliation,” and “Not available.” The data from the graph is as follows: Institutional Affiliation: 555; No Institutional Affiliation: 291; Not available: 20.
Figure 5. Institutional affiliation distribution. Source: Authors’ own work
The stacked vertical bar chart is titled “Institutional Affiliation and Activity Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 600 in increments of 200. The horizontal axis has two categories: “Institutional Affiliation” and “No Institutional Affiliation.” Each bar is stacked from two segments. A legend at the bottom indicates that the light blue segments represent “Inactive” and the dark blue segments represent “Active.” The data from the graph is as follows: Institutional Affiliation: Active: 327; Inactive: 224. No Institutional Affiliation: Active: 134; Inactive: 155.
Figure 6. Institutional affiliation and activity distribution. Source: Authors’ own work
Because the distributions of active and inactive blogs look broadly similar for blogs with and without institutional affiliation, institutional affiliation might appear not to reduce the probability of inactivity. To test this, we used a chi-squared test with the null hypothesis that activity status is independent of institutional affiliation. The test revealed a significant relationship between the two variables at the 95% confidence level (χ²(1) = 12.05, p < 0.001), indicating that the null hypothesis can be rejected. The effect of institutional affiliation on blog activity is weak (Cramér’s V = 0.12).
Figure 7 shows the distribution of languages used in the blogs in our sample. Since multiple answers were possible, the number of language entries (n = 1,008) is larger than the total number of blogs. In our main sample (N = 866), we counted 716 (82.68%) monolingual blogs and 150 (17.32%) multilingual blogs. Of the 1,008 language entries, 584 (57.94%) belong to monolingual German blogs. When multilingual blogs are included, German accounts for 713 (70.73%) entries. A further 124 (12.30%) entries belong to monolingual English blogs. When multilingual blogs are included, English accounts for 250 (24.80%) entries. The most used languages were thus German and English, followed by French with 30 (2.98%) entries. The remaining 15 (1.49%) entries are, in descending and then alphabetical order, Spanish, Italian, Dutch, Arabic, Catalan, Chinese, Russian and Serbian.
The vertical bar chart is titled “Language Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 800 in increments of 200. The horizontal axis is labelled “Language” and shows abbreviations for 11 languages: “ger,” “eng,” “fre,” “spa,” “ita,” “dut,” “ara,” “cat,” “chi,” “rus,” and “srp.” The data from the chart is as follows: ger: 713; eng: 250; fre: 30; spa: 5; ita: 3; dut: 2; ara: 1; cat: 1; chi: 1; rus: 1; srp: 1.
Figure 7. Language distribution. Source: Authors’ own work
Integration into digital research and information infrastructure
In this section, we present findings on the integration of German scholarly blogs into existing digital research and information infrastructures. To this end, we examined the use of blogging platforms and software, the number of blogs that have entries in the German National Library, the distribution of ISSNs, and the provision of DOIs and citation proposals.
Figure 8 presents the distribution of platforms. It shows that 472 (54.50%) blogs are hosted on the blogging platform de.hypotheses, 91 (10.51%) on ScienceBlogs, 36 (4.16%) on SciLogs, and 87 (10.05%) on various other platforms, which are shown in descending order in Figure 8. The platforms of 180 (20.79%) blogs could not be identified.
The vertical bar chart is titled “Platform Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 500 in increments of 100. The horizontal axis is labelled “Platform” and shows the categories “Hypotheses,” “Not available,” “ScienceBlogs,” “SciLogs,” “WordPress.com,” “Helmholtz Blogs,” “Jülich Blogs,” “SBB Blogs,” “Blogger,” “Medium,” “Tumblr,” and “Twoday.” The data from the chart is as follows: Hypotheses: 472; Not available: 180; ScienceBlogs: 91; SciLogs: 36; WordPress.com: 32; Helmholtz Blogs: 18; Jülich Blogs: 16; SBB Blogs: 10; Blogger: 8; Medium: 1; Tumblr: 1; Twoday: 1.
Figure 8. Platform distribution. Source: Authors’ own work
While Figure 8 shows which platforms are used most often, Figure 9 shows the activity status of blogs on the platforms de.hypotheses, ScienceBlogs and SciLogs. Of the 472 blogs on de.hypotheses, 252 (53.39%) are still active, while 220 (46.61%) are inactive. Of the 91 blogs on ScienceBlogs, eight (8.79%) are still active and 83 (91.21%) are inactive. Of the 36 blogs on SciLogs, 31 (86.11%) are still active and five (13.89%) are inactive. The large share of active blogs on SciLogs could be caused by inactive and older SciLogs blogs being hard to find and access, since they are not listed on the SciLogs website. A chi-squared test revealed a significant relationship between activity status and the platforms de.hypotheses, SciLogs, and ScienceBlogs (χ²(2) = 82.34, p < 0.001). Platforms have a moderate effect on activity (Cramér’s V = 0.371).
The grouped vertical bar chart is titled “Activity Status Across Platforms.” The vertical axis is labelled “Count” and ranges from 0 to 250 in increments of 50. The horizontal axis lists the platforms “Hypotheses,” “ScienceBlogs,” and “SciLogs.” Two bars are shown for each platform. The legend on the right indicates that dark blue bars represent “Active” and light blue bars represent “Inactive.” The data from the graph is as follows: Hypotheses: Active: 252; Inactive: 220. ScienceBlogs: Active: 8; Inactive: 83. SciLogs: Active: 31; Inactive: 5.
Figure 9. Activity status across platforms. Source: Authors’ own work
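The test statistic and effect size reported for the relationship between platform and activity status can be recomputed from the counts in Figure 9. The following Python sketch implements Pearson's chi-squared statistic and Cramér's V for a contingency table using only the standard library. The function names are chosen for this example, the study itself used R, and the p-value is omitted because it would require a statistics library.

```python
from math import sqrt


def chi_squared(table):
    """Pearson's chi-squared statistic for a contingency table given as a list of rows.

    Returns the statistic, the degrees of freedom and the number of observations.
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * column_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom, n


def cramers_v(statistic, n, n_rows, n_columns):
    """Cramér's V effect size derived from a chi-squared statistic."""
    return sqrt(statistic / (n * (min(n_rows, n_columns) - 1)))


# Rows: de.hypotheses, ScienceBlogs, SciLogs; columns: active, inactive (Figure 9).
platform_activity = [
    [252, 220],
    [8, 83],
    [31, 5],
]

statistic, dof, n = chi_squared(platform_activity)
effect = cramers_v(statistic, n, n_rows=3, n_columns=2)
print(round(statistic, 2), dof, round(effect, 3))  # 82.34 2 0.371
```

With 599 blogs across the three platforms, the sketch yields the reported values of χ²(2) = 82.34 and Cramér's V = 0.371. For 2 × 2 tables, results can differ from those of statistical software that applies a continuity correction by default.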
Figure 10 shows the distribution of software used by the blogs in our sample. Of these, 738 (85.22%) blogs used WordPress, the software of 89 (10.28%) blogs could not be determined, and 13 (1.50%) blogs used various other software. Since all major blogging platforms (de.hypotheses, ScienceBlogs and SciLogs) require their bloggers to use WordPress, we also investigated software use excluding these platforms, which leaves 267 blogs. WordPress remained the dominant software, since the majority of blogs (162; 60.67%) that are not hosted by de.hypotheses, ScienceBlogs or SciLogs still use WordPress. The software of 92 (now 34.46%) blogs remains unknown. Again, 13 blogs (4.87%) use various other software.
The vertical bar chart is titled “Software Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 800 in increments of 200. The horizontal axis is labelled “Software” and lists the categories “WordPress,” “Not available,” “Blogger,” “Jimdo,” “Quarto,” and “Ghost.” The data from the chart is as follows: WordPress: 738; Not available: 89; Blogger: 8; Jimdo: 2; Quarto: 2; Ghost: 1.
Figure 10. Software distribution. Source: Authors’ own work
Figure 11 shows the prevalence of blogs with an entry in the German National Library and an ISSN. 289 (33.37%) blogs were catalogued in the German National Library, while 577 (66.63%) were not. While 310 (35.8%) blogs had an ISSN, 556 (64.2%) blogs did not.
The grouped vertical bar chart is titled “German National Library Entry and ISSN Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 600 in increments of 200. The horizontal axis shows two categories: “DNB-Eintrag” (entry in the German National Library) and “ISSN.” The legend on the right indicates that the medium blue bars represent “Available” and the light blue bars represent “Not available.” The data from the chart is as follows: DNB-Eintrag: Available: 289; Not available: 577. ISSN: Available: 310; Not available: 556.
Figure 11. German National Library entry and ISSN distribution. Source: Authors’ own work
Figure 12 shows the presence of citation proposals and DOIs for blog posts for (1) the whole sample and (2) the sample excluding de.hypotheses blogs. We found that 472 (54.50%) blogs provide citation proposals for their blog posts and 394 (45.50%) do not. Similarly, 454 (52.42%) blogs provide a DOI, while 412 (47.58%) do not. However, once blogs from de.hypotheses are excluded from the sample, only 23 blogs remain that provide citation proposals and only eleven that provide DOIs for their blog posts. A chi-squared test revealed a significant relationship (χ²(2) = 82.34, p < 0.001), and Cramér’s V = 0.371 indicates a moderate effect between the two variables.
The grouped vertical bar chart is titled “DOI and Citation Proposal Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 500 in increments of 100. The horizontal axis has two sections: “All Blogs” and “Without Hypotheses.” Each section shows the categories “Zitationsvorschlag” (citation proposal) and “DOI.” The legend on the right indicates that medium blue represents “Available” and light blue represents “Not available.” The data from the chart is as follows. All Blogs: Zitationsvorschlag: Available: 472; Not available: 394. DOI: Available: 454; Not available: 412. Without Hypotheses: Zitationsvorschlag: Available: 23; Not available: 371. DOI: Available: 11; Not available: 383.
Figure 12. DOI and citation proposal distribution. Source: Authors’ own work
Efforts by bloggers
In this section, we examine efforts by German scholarly bloggers that increase the accessibility and reusability of blogs and their contents. We report findings on the use of blog archives, blogrolls and CC licenses, as well as the availability of blog feeds such as RSS and Atom and of comment functions.
Concerning blog archives, we found that 556 (64.2%) blogs provided a blog archive, while 310 (35.8%) blogs did not. Additionally, we found that 190 (21.94%) blogs included a blogroll, while 676 (78.06%) blogs did not.
Figure 13 shows the availability of CC licenses in the blogs in our sample. Of these, 645 (74.48%) blogs do not provide CC information, 217 (25.06%) blogs use one of a wide variety of CC BY licenses, and four (0.46%) blogs use CC0. Overall, CC licenses are not provided consistently, and most blogs provide none.
The vertical bar chart is titled “Creative Commons Distribution.” The vertical axis is labelled “Count” and ranges from 0 to 600 in increments of 200. The horizontal axis is labelled “License” and lists eight categories: “Not available,” “CC BY,” “CC BY-SA,” “CC BY-NC-ND,” “CC BY-NC-SA,” “CC BY-NC,” “CC0,” and “CC BY-ND.” The data from the chart is as follows: Not available: 645; CC BY: 110; CC BY-SA: 59; CC BY-NC-ND: 16; CC BY-NC-SA: 16; CC BY-NC: 14; CC0: 4; CC BY-ND: 2.
Figure 13. Creative Commons distribution. Source: Authors’ own work
Furthermore, we found that 722 (83.37%) blogs had a feed such as RSS or Atom, which could be directly accessed, 118 (13.63%) blog feeds were only accessible with a feed reader and 26 (3%) blog feeds could not be accessed or found. Lastly, we found that 768 (88.68%) blogs allow comments, while 98 (11.32%) blogs do not.
Discussion
In this section, we discuss the findings presented above and answer our research questions. Furthermore, we discuss the implications of our findings and propose directions for future research.
RQ1: Overview of scholarly blogs
Discipline: Our findings show that 77.48% of the blogs in our sample are thematically located in the humanities and social sciences. Even though the over-representation of de.hypotheses blogs influences our data, similar outcomes still apply once the de.hypotheses blogs are removed from the sample. Without the de.hypotheses blogs, the social sciences and humanities make up 50.51% of blogs and therefore remain the most strongly represented discipline. These findings match the findings of a study by Voigt and Kuhle (2024), who examined the influence of digitisation on knowledge transfer using the example of digital media. For the study, they collected scholarly blogs from research performing organisations based in Germany and found that 64% of scholarly blogs in their sample are situated in the humanities and social sciences, 14% in the life sciences, 26% in natural sciences and 19% in engineering sciences, while 13% could not be classified (GESIS Leibniz Institut für Sozialwissenschaften, n.d.). In contrast, a genre analysis of 30 scholarly blogs from the year 2012 could not recognise a stronger representation from soft sciences over hard sciences (Tiainen, 2012). The over-representation of the de.hypotheses blogs limits this study. We discuss these limitations in the following subsection. Since there is no comparable platform to de.hypotheses in the STEM fields, the question arises for further research whether the existence of such a platform—one that provides an easily accessible blogging opportunity for researchers—would lead to an increase in STEM blogs.
Activity: When discussing the activity of our blog sample, it is important to emphasize that our findings only reflect the blogs included in our sample, since we could not include inactive blogs that are now inaccessible. The large number of inaccessible blogs became particularly evident when searching for blogs through blogrolls, as many links led to unavailable content. Furthermore, we noticed a considerable number of blogs that were catalogued by the German National Library but could no longer be reached through the link provided in the catalogue. The fact that a large number of blogs were no longer available is an alarming sign, because it means that information is being lost due to a lack of long-term preservation initiatives for scholarly blogs. This information loss emphasizes the need for (1) solutions that enable long-term accessibility of blogs and (2) long-term data and studies. Long-term studies would provide more meaningful data, especially considering that research data is an important part of the scholarly documentation process (Strecker et al., 2023). Additionally, six blogs did not provide information on the date of their posts, contrary to many definitions within the scholarly literature, which state that dates are a mandatory feature of scholarly blogs (Herring et al., 2005).
Institutional affiliation: The percentage of active blogs is slightly higher among blogs with institutional affiliation (59.35%) than among blogs without institutional affiliation (46.37%). These results match a 2012 study that analysed blogs and bloggers and found that 59% of bloggers from their sample held an institutional affiliation (Shema et al., 2012). Conversely, the share of inactive blogs is slightly higher among blogs without an institutional affiliation (53.63%) than among blogs with institutional affiliation (40.65%). 20 (2.31%) blogs did not provide information on authors or institutions. Even though the Chi-squared test found that there is a statistically significant relationship between institutional affiliation and activity, the effect size is small. Still, we see greater challenges for blogs without institutional affiliation in ensuring their long-term operation, as they cannot rely on central support to resolve technical issues or implement measures for the long-term preservation of blogs. Furthermore, it can be assumed that in institutions that provide specific blogging platforms for their employees, blogging is more widely recognised as a legitimate academic activity compared to institutions that do not offer such infrastructure and support. This highlights that research institutions can actively support blogging, thereby promoting the availability and long-term accessibility of content published in blogs. These results become especially relevant considering that the activity, reach and social media engagement of scholarly bloggers and podcasters depend on whether they have sufficient resources (Kuhle et al., 2025).
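To illustrate how a relationship can be statistically significant while the effect size remains small, the following Python sketch computes a Pearson Chi-squared statistic and the phi coefficient for a 2x2 table. The counts are purely hypothetical round numbers chosen for illustration and are not the counts from our sample; the function name, the absence of a continuity correction and the choice of phi as the effect size measure are likewise assumptions of this sketch.

```python
import math

# Critical value of the Chi-squared distribution for 1 degree of freedom
# at a significance level of 0.05.
CRITICAL_VALUE_DF1_ALPHA_005 = 3.841


def chi_squared_2x2(table):
    """Pearson Chi-squared (no continuity correction) and phi for a 2x2 table.

    table = [[a, b], [c, d]], e.g. rows = affiliation yes/no,
    columns = active/inactive.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shortcut formula for 2x2 tables, equivalent to summing (O - E)^2 / E.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Phi coefficient: effect size for 2x2 tables.
    phi = math.sqrt(chi2 / n)
    return chi2, phi


# HYPOTHETICAL counts, not taken from the study:
# 100 affiliated blogs (60 active, 40 inactive),
# 100 unaffiliated blogs (45 active, 55 inactive).
hypothetical_table = [[60, 40], [45, 55]]

chi2, phi = chi_squared_2x2(hypothetical_table)
significant = chi2 > CRITICAL_VALUE_DF1_ALPHA_005

print(f"chi2 = {chi2:.3f}, phi = {phi:.3f}, significant at 0.05: {significant}")
```

With these hypothetical counts, the statistic exceeds the critical value while phi stays well below 0.3, the conventional threshold for a medium effect, which mirrors the pattern of a significant but small association.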
Language: Even though 70.73% of blogs (including multilingual blogs) in our sample are written in German, 24.8% of the blogs (including multilingual blogs) in our sample are also published in English. Even within the German scholarly blog community, English is a relevant scholarly language. The importance of English is also reflected in the findings of the 2012 study mentioned above, in which 86% of bloggers in the sample blogged in English (Shema et al., 2012). With reference to the Helsinki Initiative on Multilingualism in Scholarly Communication (Federation of Finnish Learned Societies et al., 2019), blogs offer great potential, as they can be used to publish summaries of journal articles in additional languages. Blogging in English also appeals to a larger audience (Luzón, 2017). Automatic translations, such as those provided by web browsers, offer additional potential and possibilities for bloggers.
RQ2: Integration into information infrastructure
Platform and Software: The majority (599; 69.17%) of blogs in our sample are integrated into platforms that explicitly host scholarly blogs (de.hypotheses, ScienceBlogs, SciLogs). However, even for blogs that are hosted by platforms for scholarly blogs, long-term accessibility is not always ensured. This is evident from the supposed discontinuation of the scholarly blog platform ScienceBlogs. In October 2022, it was announced that the publisher of ScienceBlogs would no longer provide funding for the platform (Schönstein, 2022). Subsequently, only eight of the blogs in our sample that were hosted by ScienceBlogs are still active, while 83 blogs are inactive, without measures to ensure the accessibility of their content. We found that the majority (85.22%) of bloggers rely on the WordPress software. This de facto monopoly position harbours dangers, since it fosters dependence on a single software product, especially considering recent developments within the company Automattic, which announced that it would significantly reduce its contribution to WordPress (Kunz, 2025).
German National Library: Only 33.37% (289) of the scholarly blogs in our sample are catalogued in the German National Library. It is important to note here that these also make up the entirety of scholarly blog entries in the German National Library, since the library provided us with a complete list of blog entries in August 2024. Even though there was probably some development in numbers since that point in time, the number of scholarly blogs that have an entry in the German National Library is small in our sample and therefore also compared to the whole German scholarly blogging landscape. Therefore, there is still a need for libraries and archives to increase their engagement with the topic of long-term accessibility of German scholarly blogs. However, it is not only the German National Library that can be tasked with preserving scholarly blogs, but also academic libraries that are responsible for preserving the blogs of their institutions’ scholarly bloggers. Concerning the specifics of scholarly blogs and their content as part of the scholarly record, unique challenges for archiving arise, particularly due to interactions through references. We see the need for specialised solutions to ensure the long-term preservation of academic blogs and their distinctive characteristics within dedicated academic blog infrastructures.
ISSN: In Germany, ISSNs are distributed by the German National Library (Deutsche Nationalbibliothek, n.d.), which explains the similarity between the distribution of blogs in our sample that have an ISSN and that of the German National Library entries. Similar to blog entries in the German National Library, ISSNs are still underrepresented in the blogs in our sample and in German scholarly blogs overall. Since ISSNs are distributed through the German National Library, there is likely only a small number of additional blogs with an ISSN that we did not include in our sample. Registering a blog with an ISSN helps the German National Library implement digital long-term preservation measures for these blogs. It therefore seems beneficial to promote ISSN registration for scholarly blogs more actively. Additionally, assigning an ISSN enhances the visibility of blogs, as it allows them to be better aggregated in library catalogues, discovery systems, and academic search engines.
DOI and Citation proposal: DOIs and citation proposals are provided almost exclusively by de.hypotheses; elsewhere, their use is minimal. Without DOIs, blog content is less visible and harder to integrate into other infrastructures, such as databases. DOIs have become the central PID system for scholarly content in recent years. They are assigned not only through Crossref for textual publications but also through DataCite for a wide range of research-related scholarly objects. In version 4.6 of the DataCite metadata schema (DataCite Metadata Working Group, 2024), the publication type blog is not yet recognised as a resource type. Considering the specific characteristics of blogs, which are distinct from the currently used resource types such as book, book chapter, journal, journal article, or report in the DataCite schema, it would be beneficial to include blog posts as a specific resource type. Similar to how an ISSN supports the visibility of an entire blog, DOIs enhance the visibility of individual blog posts, as they are easily aggregated by search engines and databases. Additionally, DOIs facilitate the citation of blog posts and make it easier to integrate them into references (Fenner, 2022).
RQ3: Efforts by Bloggers
Blog Archive: The common use of blog archives (64.2%) shows that bloggers are interested in making their content searchable. When given the opportunity, many bloggers provide the means to easily search through and access blog content. In standard software solutions like WordPress, the blog archive is easily enabled (WordPress, 2019). In custom-built solutions, specific adjustments are often required. The presence of a blog archive can also support content aggregation, as posts become easier for search engines to crawl, especially when no feed is available.
Blogroll: Even though blogrolls were not common in our sample (only 21.94% of blogs featured a blogroll), we still see value in providing a blogroll, since it makes blog content findable and enhances the community aspect of blogging by promoting other bloggers and content. The presence of a blogroll can be understood as part of a blog’s social embedding within a community. This referencing between blogs also constitutes a scholarly “blogosphere” (Wenninger, 2017). The blogosphere is particularly relevant for network analyses, as it allows studying the interplay of content and positions. A study on scholarly blogs and podcasts found that linking blogs (e.g. through social media) plays a key role in extending a blog’s reach (Kuhle et al., 2025). However, the blogosphere also poses technical challenges for the archiving of scholarly blogs, since blogs link to external content and archives are not prepared to consider the social context of blogrolls.
License: CC is a non-profit organization that provides licenses which can be used to define the legal conditions of distributed content (Creative Commons, n.d.). However, 74.48% of the blogs in our sample do not provide license information, although using a CC license would enhance the reusability of blogs and their content. Without the designation of content with open licenses, the distribution of blog content is limited. This is especially true for copyright regulations in Europe, which, unlike in the United States, do not have a Fair Use doctrine that allows archives and libraries to provide easy access to copyrighted content for research. As a result, many digitised materials in European libraries are currently only available in a very limited capacity. Bloggers should be further sensitised to this issue and encouraged to promote the open availability of their blogs in the spirit of Open Access by using licenses such as those provided by Creative Commons.
Feed: Blog feeds, such as RSS or Atom, enable the integration of blog contents into other sites or infrastructures and the exchange of data with them by using standardised, machine-readable formats (like XML or JSON). Since feeds are already being used by 97% of blogs in our sample (either directly on the blog or through a feed reader), there is potential to reuse and archive feeds and standardised metadata of blogs. Blog feeds are not only central to the distribution of blog posts, but they also enable the aggregation of content through archives. For example, the technical architecture of the Rogue Scholar archive for academic blogs is built on the functionality of feeds. Blog posts stored in the archive are retrieved along with their metadata via feeds. The presence of such a feed can therefore be considered an important prerequisite for archiving (Fenner, 2022, 2023b).
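As a minimal sketch of why feeds lend themselves to aggregation, the following Python example reads post metadata from an RSS 2.0 document using only the standard library. The feed content, the example.org URLs and the function name `extract_posts` are hypothetical and serve illustration only; the sketch does not represent the implementation of Rogue Scholar or any other archive.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# HYPOTHETICAL RSS 2.0 feed; a real aggregator would fetch this from a blog.
SAMPLE_RSS = """
<rss version="2.0">
  <channel>
    <title>Example Scholarly Blog</title>
    <link>https://blog.example.org/</link>
    <item>
      <title>First post</title>
      <link>https://blog.example.org/first-post</link>
      <pubDate>Mon, 06 Jan 2025 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>https://blog.example.org/second-post</link>
      <pubDate>Tue, 07 Jan 2025 10:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>
"""


def extract_posts(rss_text):
    """Return title, link and ISO 8601 publication date for each feed item."""
    root = ET.fromstring(rss_text.strip())
    posts = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        posts.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 dates follow the RFC 822 style; normalise to ISO 8601.
            "published": (
                parsedate_to_datetime(pub_date).isoformat() if pub_date else None
            ),
        })
    return posts


posts = extract_posts(SAMPLE_RSS)
for post in posts:
    print(post["published"], post["title"], post["link"])
```

Because every item carries the same machine-readable fields, an archive can harvest titles, links and dates without parsing each blog's individual HTML layout.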
Comments function: Even though the majority (88.68%) of blogs enable comments on their blog posts, there are still blogs that do not. Comments promote interactions between bloggers and readers and offer a place for dialogue, discussion and socialising. However, archiving comments poses technical and legal questions for information infrastructure facilities.
Limitations and future research
Several limitations of our study should be acknowledged. Even though the sample for this study was enriched by the de.hypotheses blogs, these blogs were also over-represented. To counteract the over-representation of de.hypotheses blogs, we analysed findings as a whole and without the de.hypotheses blogs when necessary. However, the over-representation of de.hypotheses blogs also gave us insights into practices that de.hypotheses provides for its bloggers and that would be a valuable addition for other blogs, such as the assignment of DOIs and citation proposals. Since our data set is limited to German-language blogs, our findings may not fully apply to other linguistic and cultural contexts. Future research should take an international perspective to explore how scholarly blogging practices differ across regions and disciplines. Additionally, our study primarily relied on quantitative methods to assess blog characteristics, but a qualitative approach, such as interviews with bloggers, could provide deeper insights into the motivations behind scholarly blogging and the barriers to its preservation.
Further studies could also investigate how blogs contribute to academic discourse by examining their citation impact. Also, while previous research has examined specific aspects like the use of references in blogs as an alt-metric (Jamali and Sangari, 2015) and blog network communication patterns (Wang et al., 2010), there is a need for more fundamental studies of referencing and linking practices of scholarly bloggers, as links are often considered fundamental to the blog format (Herring et al., 2005; Puschmann, 2013; Rettberg, 2008) and cross-linkage between blogs creates the blogosphere (Wenninger, 2019). Long-term research tracking the lifecycle of scholarly blogs over time could furthermore provide insights into their sustainability and the effectiveness of different preservation efforts. Future research could design long-term studies on scholarly blogs that monitor their activity over a longer period of time and take note of the preservation status and strategies of inactive blogs. Furthermore, since the discourse on scholarly blogs is large in scope and diverse in content, there would be value in conducting a further literature review on the topic of scholarly blogs. This would help provide an overview of the current state of research or answer in-depth questions on more specific topics.
Conclusion
Scholarly blogs are an important means of scholarly communication, yet their long-term accessibility and preservation remain uncertain (Burton, 2015; Hank, 2011). To address this research gap, our study examined the characteristics of scholarly blogs in Germany and their integration into existing digital research and information infrastructures. We analysed how German scholarly blogs vary in terms of activity, authorship, discipline and language. Additionally, we investigated how German scholarly blogs are integrated into platforms, library catalogues and other measures that facilitate their accessibility. Lastly, we examined what efforts bloggers themselves apply to facilitate the accessibility and reusability of their blogs. Our study contributes to Library and Information Science (LIS) research by providing the first analysis of the German scholarly blogging landscape across disciplines. Moreover, our findings are relevant for experts from information infrastructure facilities and blogging platforms as well as scholarly bloggers. We have shown that scholarly blogs are diverse, but their integration into existing research and information infrastructures is not consistent. We also identified efforts that can improve the accessibility of blogs.
We identified several stakeholders that could improve the visibility and preservation of scholarly blogs. Our findings highlight the need for greater institutional support for scholarly blogging, particularly from universities and other research performing organisations that should recognize blogs as legitimate scholarly outputs and provide resources for their affiliated bloggers. Academic libraries could furthermore support the scholarly bloggers of their parent institutions by cataloguing their institution’s scholarly blogs and ensuring that their contents are archived and made permanently accessible. Blogging platforms can support their bloggers in taking proactive steps to ensure the long-term accessibility of their work by providing standardised citation proposals and DOIs and by encouraging bloggers to provide detailed metadata and licensing information that facilitates reuse and preservation. Assigning persistent identifiers such as DOIs or ISSNs also has the potential to further strengthen the status of scholarly blogs within academic publishing. While platforms such as de.hypotheses are making progress in the humanities and social sciences, comparable efforts for other disciplines are lacking. Furthermore, existing workflows, such as blog feeds, can be reused by research and information infrastructure facilities. Additionally, we found that while bloggers express an interest in making their content more accessible (Hank, 2011), the actions they take remain inconsistent. However, measures like blogrolls, blog archives and standardised licenses could improve the preservation of scholarly content with minimal effort for bloggers.
In conclusion, our study underscores the need for increased integration of scholarly blogs into research and information infrastructure and for institutional support to ensure the long-term accessibility of scholarly blogs, which is essential for enhancing the diversity of scholarly communication. While individual bloggers and some platforms have taken steps toward integrating blogs into existing digital research and information infrastructures, increased institutional and technical support is necessary to secure their accessibility in the long term. Furthermore, incorporating blogs into existing information infrastructures recognises them as valid components of the scholarly discourse and as a medium for disseminating research.
We thank Christopher Onzie Khamis, Pia Kretschmar and Max Liebel for their help during the data collection process. We thank Johanna Schielke for proofreading and Dorothea Strecker for proofreading and her advice during the data analysis.

