Purpose

This study examined how influencer power and tier relate to social media engagement and personal information sharing, noting fluctuations of both over time.

Design/methodology/approach

A combination of content analysis, change point detection methods (CUSUM Drift, Change Point Detection) and time series modeling (ARIMA) was used to analyze social media conversations and identify temporal trends in engagement and personal information disclosures.

Findings

The study demonstrates a strong correlation between influencer reach, increased engagement and disclosures, with a declining trend in both engagements (79.78% of threads) and disclosures (66.12% of threads) across 183 conversational threads. An observed 29.41% of posts contained personally identifiable information (PII), with conservative sensitivity analysis suggesting an adjusted prevalence of 15.00% to 20.27% after accounting for automated detection error.

Research limitations/implications

The cross-platform comparison revealed architectural differences between Twitter/X's follower hierarchy and Reddit's community voting structures that influence influencer effects on PII disclosure. Rethinking influencer identification on community-based platforms requires tailored models that consider community norms instead of just follower counts. Manual validation of automated PII detection was infeasible due to data access constraints.

Originality/value

This project provides exploratory insights into how platform architecture may moderate influencer dynamics, with implications for privacy-conscious platform design and future comparative studies.

Social media has reshaped industries, crisis communication and socio-cultural research, transforming our understanding of human behavior (Valle-Cruz, López-Chau, & Sandoval-Almazán, 2020) while blurring lines between private and public sharing. Social media influencers play a crucial role in these dynamics, persuading large audiences and impacting marketing campaigns and elections (Okuah, Scholtz, & Snow, 2019). Followers often adopt similar hashtags and language, increasing the risk of identifiable information being shared. Third-party consumers can access users' posts through APIs, relying on consent not to disclose identities. Once public, posts can be archived and used by marketers or adversaries, raising concerns over personal data control (Tene and Polonetsky, 2013; Trepte, 2020), particularly as personal information in viral content can be exploited (Beigi, Shu, Zhang, & Liu, 2018).

Users must be cautious about sharing information online, while organizations collecting this data are responsible for protecting privacy. Poor data handling can result in identifying individuals from ostensibly anonymous posts (Beigi et al., 2018), making it essential to understand extractable data types and potential attack methods (Orooji, Rabbanian, & Knapp, 2023). Despite privacy concerns, incentives motivate users to share personal information (Bélanger & Crossler, 2011; Yang & Huang, 2019), driven by factors such as FOMO (Alutaybi, Al-Thani, McAlaney, & Ali, 2020), guilt (Yang & Huang, 2019) and the desire to manage public personas (Zhang et al., 2020a, b). While research highlights these behavioral factors, it often overlooks how they relate specifically to influencer networks (Farivar, Wang, & Turel, 2022), where engagement-driven frequent posting can diminish self-control and risk assessment as users emulate influencers as role models (Gross & von Wangenheim, 2018; Ki & Kim, 2019). A limited understanding remains of the actual behaviors associated with follower engagement in these networks (Farivar et al., 2022).

Social media networks consist of individuals with varying degrees of relationships, often featuring influencers who significantly impact followers' engagement behaviors (Okuah et al., 2019). This influence can produce similar posting patterns, exposing identifiable data that third parties can exploit. Engaging with trends that encourage personal disclosure can lead to privacy-invasive behaviors (Valle-Cruz et al., 2020), and users may not realize their shared data can be aggregated to identify them or be misused by malicious actors (Keküllüoğlu et al., 2020; Moura and Serrão, 2016). This study critically reviews the literature on influencer identification, network analysis methods, privacy risks, personally identifiable information (PII) and social media disclosure.

Trust and behavioral mimicry

Zhang and Choi (2022) observed that influencers often share entertaining content to capture audience attention before building trust to achieve their objectives. However, social networking apps can create a deceptive sense of security, leading users to believe they are sharing information safely when third-party access may be extensive (Waldman, 2016). Even privacy-concerned users may still participate in social media to gain social capital or fulfill psychological needs, as obtaining social capital will often require compromising privacy (Westin & Chiasson, 2019).

Using social identity theory (Tajfel & Turner, 1979), users may disclose PII to align with perceived community norms or impress influencers they follow. If these norms evolve to discourage PII sharing, followers may alter their behavior accordingly. Through social learning theory (Bandura, 1977), PII disclosures may increase when individuals observe influencers or peers engaging in this behavior and mimic it. Conversely, declines in PII sharing might result from decreased influencer relevance or shifting audience norms around disclosure.

Social media as a “technology of self”

Information shared on social media varies in potential risk. Yang and Huang (2019) explored types of self-disclosures and their motivations, introducing Guilty Information Disclosure. Their findings indicate that contradictory behaviors such as seeking confirmation and imitation are common, with social media serving as a means of self-expression through which individuals convey guilt and other sensitive information.

When users reveal past actions or details to engage with others, they risk divulging PII such as names or location, sensitive health information (Liao, 2019) or relationship details (Yang & Huang, 2019). Such disclosures can lead to discrimination by entities like insurers or employers (Weston & Wells, 2020), and analysis of comment interactions can compound this risk through discrimination based on network connections (Beigi & Liu, 2020).

The influencer marketing industry was projected to exceed $16 billion in 2022 (Shopify, 2023). Influencer marketing remains one of the most sought-after forms of marketing because these social media users can reach other users and assuage their hesitations in purchasing decisions. In a survey by Ki and Kim (2019), respondents admitted they would commit to a purchase if an influencer endorsed the product or service. Influencers are not confined to marketing activities, however: several types of influencers exist, each with specific goals and compatibility considerations for the audience being targeted.

Identifying influencers

Okuah et al. (2019) define an influencer as someone with credibility and a significant following, capable of impacting individuals' decisions. Rahman (2022) categorizes influencers by follower count into Nano (under 10,000), Micro (10,000–100,000), Macro (100,000–1 million) and Mega (1 million+) tiers, a classification crucial for marketing campaigns and consumer behavior research.
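The tier boundaries above can be expressed as a small lookup. The following is a minimal sketch only: the function name is hypothetical, and the treatment of exact boundary values (for example, whether exactly 10,000 followers is Nano or Micro) is an assumption, since the ranges as quoted overlap at their endpoints.

```python
def influencer_tier(follower_count: int) -> str:
    """Map a follower count to a tier label.

    Assumption: each boundary value belongs to the higher tier,
    e.g. exactly 10,000 followers is treated as Micro.
    """
    if follower_count < 0:
        raise ValueError("follower_count must be non-negative")
    if follower_count < 10_000:
        return "Nano"
    if follower_count < 100_000:
        return "Micro"
    if follower_count < 1_000_000:
        return "Macro"
    return "Mega"


# Hypothetical follower counts, one per tier.
tiers = [influencer_tier(n) for n in (850, 25_000, 400_000, 3_200_000)]
```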

Influencer power varies in calculation methods across studies, from tools like Klout (Rao, Spasojevic, Li, & Dsouza, 2015) to custom calculations based on network analysis metrics (Kumar, Choudhury, Rawat, & Jayaraman, 2016; Stieglitz & Dang-Xuan, 2013) or combinations of retweets, followers, mentions and favorites (Essaidi, Zaidouni, & Bellafkih, 2020; Sharma, Agarwal, & Sardana, 2018). Alternative approaches include influence scores using user contacts and followers (Lahuerta-Otero & Cordero-Gutiérrez, 2016). Essaidi et al. (2020) highlighted the follower-following ratio, finding that higher values indicate greater influence power. Due to limitations in third-party software, centrality measures emerged as the most feasible method for this study, particularly eigenvector centrality, which accounts for the influence of interconnected nodes (Gunaratne, Coomes, & Haghbayan, 2019).

Information disclosure types and associated attacks

Even with private account features, users can inadvertently reveal information through photos or text exposure (Keküllüoğlu et al., 2020; Powale & Bhutkar, 2013). Beigi and Liu (2020) identify two disclosure types: identity disclosure, which maps a dataset instance to an individual, and attribute disclosure, in which an adversary infers information from released data. Two corresponding threats were modeled in this study: the Identity Disclosure Attack, using social network data to map users to known identities, and the Attribute Disclosure Attack, using social network data to infer attributes for users within a network group.

PII and sensitive data types

Data elements like birthdates, real names, addresses, phone numbers, emails and financial details are considered personal data, increasing privacy risks in social interactions (Milne, Pettinico, Hajjat, & Markos, 2017). Milne et al. (2017) introduced the Information Sensitivity Typology, categorizing information into Basic Demographics, Secure Identifiers, Contact Information, Financial Information, Community Interaction and Personal Preferences, aligned with NIST and Homeland Security standards for classifying direct PII or potentially linkable data. Their study identified four risk categories (monetary, social, physical and psychological), with consumers perceiving higher risks for Secure Identifiers and lower risks for Basic Demographics.

Rosado (2023) expanded this work with PII-Codex, a Python package for detecting and assessing PII tokens using Microsoft Presidio (Microsoft, 2018), which employs rule-based and named entity recognition (NER) models to assign categories and severities based on Milne et al.'s (2017) typology. NER models are effective for entity extraction but tend toward false positives, with one study reporting a precision of 0.82 and a recall of 0.81 (Macri et al., 2023).

Purpose of the study

This quantitative study examined how social media influencers affect followers' sharing of PII across platforms with different architectural designs. This impact is relevant for businesses choosing influencers and forming social media policies, as well as within private corporate networks where excessive sharing can expose sensitive information (Turban, Bolloju, & Liang, 2011). The research investigates Reddit's community-based architecture after an initial Twitter/X pilot, offering exploratory insights into how platform design influences influencer-driven disclosure patterns. No IRB approval was necessary since the study used publicly available posts, and no raw data were retained after analysis.

RQ1.

How do influencer tier and influence power relate to follower engagement and PII disclosure?

RQ2.

Do these relationships vary between follower hierarchy platforms (Twitter/X) and community-based platforms (Reddit)?

RQ3.

Do engagement and PII disclosure rates exhibit temporal decay patterns in social media conversations?

Past studies show that influence power, which incorporates follower count, tends to boost involvement in trending activities (Arora, Bansal, Kandpal, Aswani, & Dwivedi, 2019; Lahuerta-Otero & Cordero-Gutiérrez, 2016) and engagement likelihood increases with influencer prominence (Rahman, 2022). Therefore, influencers with higher influence power and tier likely encourage more followers to share personal information through increased interaction, even when the original author does not share their own details.

H1.

Influence Power positively correlates with follower engagements.

H2.

Influence Power positively correlates with PII disclosure detections.

H3.

Influencer Tier positively correlates with follower engagements.

H4.

Influencer Tier positively correlates with PII disclosure detections.

Engagement has been observed to decline over time due to trend or influencer irrelevance (Zhang, Zhao, Yang, Paris, & Nepal, 2019), and analyzing this temporal pattern aids understanding of how attitudes and behavior change with social media usage (Saha et al., 2019). Given this decline, the same was hypothesized for PII disclosure rates.

H5.

As time elapses, the engagements in a cluster will decrease.

H6.

As time elapses, the PII disclosure detections in a cluster will decrease.

The research model is presented in Figure 1.

Figure 1
Research model of influencer impact on engagements and PII disclosures. The diagram connects three components (Influence Power, Influencer Tier and Time Elapsed) to two outcomes (Engagement and PII Disclosures), with directional arrows labeled by hypotheses H1 to H6.

The next subsections show steps from data collection to final analysis and hypothesis testing.

Throughout the pilot study, X (Twitter) served as the data source. Due to unprecedented changes in X's API offering, the data source was switched to Reddit for the main study. With both sources, some conversations may still have been developing when polling for trending topics; therefore, full conversational threads cannot be guaranteed, as conversations may evolve over time.

In the pilot study, posts from X were collected every 15 minutes on various days during the transition from legacy to new tier limits. The dataset used top conversations matching keywords, yielding four collections: Zelda, Jedi, Liverpool and Ferrari. By the time the final collection was retrieved, API limits and terms had changed, prompting a pivot to Reddit. In the main study, Reddit posts were gathered at 30-minute intervals over 11 days (September 12–22, 2023), following the X study methodology of Kim, Jang, Kim, and Wan (2018). Collections stayed within the free Reddit API's limit of 100 requests per minute (Reddit API, n.d.; Stoddard, 2021) and ran from 7 AM to 10 PM Mountain Standard Time to permit human oversight of data collection. Posts and subsequent thread interactions were recorded by examining comments on initial posts and the replies that followed.

PII identification and risk calculation

The PII identification and risk calculation allow the evaluation of the risk severity of information disclosed across the network graph. The PII-Codex (2023) was used for PII detections and categorizations in combination with the Not Identified, Identifiable and Identified categories by Schwartz and Solove (2011). The library uses a severity scale of 1, 2 and 3 for the categories of Not-Identified, Identifiable and Identified, respectively, to determine the risk score rs of a token. The isolated set of PII types and their associated categories and severities provided by Milne et al. (2017) and the PII-Codex Risk Values (2023) are presented in Table 1.

Table 1

Data typology for risk assessments with risk Enum coding from PII codex

| Type | Cluster membership | NIST category | Homeland security category | PII-codex id | PII-codex risk value |
| --- | --- | --- | --- | --- | --- |
| Country of Citizenship | Basic Demographics | Linkable | Linkable | COUNTRY_OF_CITIZENSHIP | 2 |
| Zip code +4 | Basic Demographics | Linkable | Not Mentioned | ZIPCODE | 2 |
| Gender | Basic Demographics | Linkable | Not Mentioned | GENDER | 2 |
| Birth Date | Basic Demographics | Linkable | Linkable | DATE | 2 |
| Online Screen Name | Personal Preferences | Directly PII | Not Mentioned | SCREEN_NAME | 3 |
| Religion | Personal Preferences | Linkable | Not Mentioned | NRP | 2 |
| Political Affiliation | Personal Preferences | Linkable | Not Mentioned | NRP | 2 |
| Email Address | Personal Preferences | Directly PII | Stand Alone PII | EMAIL_ADDRESS | 3 |
| IP Address | Contact Information | Directly PII | Not Mentioned | IP_ADDRESS | 3 |
| Phone Number | Contact Information | Directly PII | Stand Alone PII | PHONE_NUMBER | 3 |
| Address | Contact Information | Linkable | Not Mentioned | LOCATION | 2 |
| Social Network Profile | Community Interaction | Linkable | Not Mentioned | SCREEN_NAME | 2 |
| Credit Card Number | Financial Information | Directly PII | Stand Alone PII | CREDIT_CARD_NUMBER | 3 |
| Financial Account Numbers | Financial Information | Directly PII | Stand Alone PII | …Various | 3 |
| Home Address | Secure Identifiers | Directly PII | Stand Alone PII | LOCATION | 3 |
| Location | Secure Identifiers | Linkable | Not Mentioned | LOCATION | 2 |

The risk score mean provided by the library was calculated using the mean severity score of each token detected in a text.

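Each post's risk score mean value is then added to the collection's final calculation of the risk score using the mean of means formula:

$$\overline{rs}_{\text{collection}} = \frac{1}{N}\sum_{i=1}^{N}\overline{rs}_{i}, \qquad \overline{rs}_{i} = \frac{1}{T_i}\sum_{j=1}^{T_i} rs_{ij} \tag{1}$$

where N is the number of posts in the collection, T_i is the number of tokens detected in post i and rs_{ij} is the risk score of token j in post i.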

The min, median and max calculations of this mean risk score, alongside what types of PII were detected with the input, are provided per node and per cluster within the final dataset for future evaluation.
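The risk score aggregation can be illustrated with the standard library alone. This is a sketch under stated assumptions rather than the PII-Codex implementation: the function names are hypothetical, and a post with no detected tokens is assumed to score 1 (Not-Identified), which is consistent with disclosures being counted as nodes with a mean risk score above 1.

```python
from statistics import mean, median

# Severity scale from the text: 1 = Not-Identified, 2 = Identifiable, 3 = Identified.
NOT_IDENTIFIED = 1


def post_risk_mean(token_severities):
    """Mean severity of the tokens detected in one post.

    Assumption: a post with no detected tokens scores 1 (Not-Identified).
    """
    return mean(token_severities) if token_severities else float(NOT_IDENTIFIED)


def collection_risk_summary(posts):
    """Mean of the per-post means, plus min, median and max of those means."""
    post_means = [post_risk_mean(tokens) for tokens in posts]
    if not post_means:
        raise ValueError("collection has no posts")
    return {
        "mean": mean(post_means),
        "min": min(post_means),
        "median": median(post_means),
        "max": max(post_means),
    }


# Hypothetical collection of three posts, each a list of token severities.
summary = collection_risk_summary([[3, 2], [2], []])
```

Averaging per-post means weights every post equally, so a single post with many detected tokens does not dominate the collection score.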

Metrics collected

Influencer power, influencer tier, disclosure detections and the associated cluster details were collected per node, as shown in Table 2.

Table 2

Node metrics collected

| Column name | Type | Description |
| --- | --- | --- |
| Node ID | UUID | Unique identifier for post (replaces original platform identifier) |
| User ID | UUID | Unique identifier assigned for user (replaces original platform identifier) |
| Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
| Influence Power | Float | Eigenvector centrality |
| Influencer Tier | Str | Categorical label calculated by follower count |
| Collection Name | Str | Trend collection assigned based on search query |
| Hashtags | Set(str) | The set of hashtags included in the node |
| PII Disclosed | Bool | Whether or not PII was disclosed |
| PII Detected | Set(str) | The detected token types in post |
| PII Risk Score | Float | The PII score for all tokens in a post |
| Is Comment | Bool | Whether or not the post is a comment or reply |
| Is Text Starter | Bool | Whether or not the post has text content |
| Community | Str | The group, community, channel, etc. associated with |
| Timestamp | Timestamp | Creation timestamp (provided by social media API) |
| Time Elapsed | Int | Time elapsed (seconds) from original influencer's post |

Metrics summarizing the cluster details were collected, including influencer summaries (for all influencers within the cluster), risk score statistics, disclosure and engagement counts and ratios, the periods of each cluster and the average time elapsed between responses within. These metrics and details are enumerated with their respective types in Table 3.

Table 3

Cluster metrics and summarizations

| Column name | Type | Description |
| --- | --- | --- |
| Cluster Name | Str | Composite ID for subgraph using collection name and subgraph index |
| Influencer Tiers Frequencies | List[dict] | Frequency of influencer tiers of all users in the cluster |
| Top Influence Power Score | Float | Eigenvector centrality of top influencer |
| Top Influencer Tier | Str | Size tier of top influencer |
| Collection Name | Str | Trend collection assigned based on search query |
| Hashtags | Set(str) | The set of hashtags included in the cluster |
| PII Detection Frequencies | List[dict] | The detected token types in post with frequencies |
| Node Count | Int | Count of all nodes in the influencer cluster |
| Node Disclosures | Int | Count of all nodes with mean_risk_score >1* |
| Disclosure Ratio | Float | Sum of nodes with confirmed disclosed PII divided by cluster size |
| Mean Risk Score | Float | The mean risk score for an entire network cluster |
| Median Risk Score | Float | The median risk score for an entire network cluster |
| Min Risk Score | Float | The min risk score for an entire network cluster |
| Max Risk Score | Float | The max risk score for an entire network cluster |
| Time Span | Float | Total Time Elapsed |
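As an illustration of how the Node Disclosures and Disclosure Ratio columns relate, the sketch below derives both from a list of per-node mean risk scores. The function name and the input shape are hypothetical; only the threshold (mean risk score above 1) and the ratio definition come from the table.

```python
def cluster_disclosure_metrics(node_risk_scores):
    """Count disclosing nodes (mean risk score > 1) and their share of the cluster."""
    node_count = len(node_risk_scores)
    disclosures = sum(1 for score in node_risk_scores if score > 1)
    ratio = disclosures / node_count if node_count else 0.0
    return {
        "node_count": node_count,
        "node_disclosures": disclosures,
        "disclosure_ratio": ratio,
    }


# Hypothetical cluster of four nodes, two of which disclose PII.
metrics = cluster_disclosure_metrics([1.0, 2.0, 1.0, 3.0])
```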

Constructing social graphs

The submission_name and parent_id attributes from Reddit posts drove the cluster construction in the study. Since identifiers like the conversation_id, submission_name, id and parent_id attribute types can be used to track down pieces of a conversation thread, an internal unique identifier labeled post_uuid was used instead to track relationships between the nodes, as shown in the pseudocode in Algorithm 1.

ALGORITHM 1.

Reddit Social Graph and Summary Construction Overview.

  • Function build_social_graph_summaries Object posts

  •  G = nx.Graph()

  • forall child c in posts do

  •   if c's post_id is not in G then

  •    G.add_node(c)

  •   end

  •   forall replies r in c.comments do

  •    if r's post_id is not in G then

  •     G.add_node(r)

  •     G.add_edge(c, r)

  •    end

  •   end

  •  end

  •  influence_ratings = calculate_influence_ranks(G, posts)

  •  return build_graph_summaries(G, posts, influence_ratings)

  • end
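The graph construction step of Algorithm 1 can be sketched in runnable form. To stay dependency-free, this version uses a plain adjacency dictionary as a stand-in for `nx.Graph()`, and it assumes a hypothetical input shape in which each post is a dict with a `post_id` and a list of `comments`. Unlike the pseudocode, it adds the parent-reply edge even when the reply node already exists, and it includes a breadth-first helper that splits the graph into the connected subgraphs treated as clusters.

```python
from collections import deque


def build_social_graph(posts):
    """Build an undirected adjacency map: post id -> set of neighboring post ids."""
    graph = {}
    for post in posts:
        parent = post["post_id"]
        graph.setdefault(parent, set())
        for reply in post.get("comments", []):
            child = reply["post_id"]
            graph.setdefault(child, set())
            graph[parent].add(child)
            graph[child].add(parent)
    return graph


def connected_clusters(graph):
    """Return the connected components (conversation clusters) as sorted id lists."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(sorted(members))
    return clusters


# Hypothetical input: one post with two replies and one post without replies.
posts = [
    {"post_id": "a", "comments": [{"post_id": "b"}, {"post_id": "c"}]},
    {"post_id": "d", "comments": []},
]
graph = build_social_graph(posts)
clusters = connected_clusters(graph)
```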

Influencer identification and power score calculation

The two most referenced methods of ranking influencers are centrality values from graph analysis and custom score calculations using favorites, mentions and retweets. This study employs eigenvector centrality as the Influence Power score, calculated using the NetworkX library (Hagberg, Schult, & Swart, 2008). For a cluster graph G = (V, E), NetworkX calculates eigenvector centrality for every node n. Since each trend contains multiple clusters, centrality values and top-influencing node selection are evaluated per sub-graph, with each social graph independently determining its top-influencing node.
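To make the Influence Power score concrete, the sketch below computes eigenvector centrality by power iteration on a small hypothetical reply graph. It is not the NetworkX code used in the study (NetworkX exposes this measure as `eigenvector_centrality`); it is a dependency-free illustration. Iterating with the adjacency matrix plus the identity is a choice made here so that bipartite graphs, such as reply trees, converge instead of oscillating.

```python
import math


def eigenvector_centrality(graph, max_iter=1000, tol=1e-10):
    """Power iteration on (A + I) for an undirected adjacency map.

    graph: dict mapping each node to the set of its neighbors.
    Returns centralities normalized to unit Euclidean length.
    """
    nodes = list(graph)
    if not nodes:
        return {}
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        prev = x
        # One multiplication by (A + I): own score plus neighbors' scores.
        x = {n: prev[n] + sum(prev[m] for m in graph[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in x.values())) or 1.0
        x = {n: v / norm for n, v in x.items()}
        if sum(abs(x[n] - prev[n]) for n in nodes) < tol * len(nodes):
            return x
    raise RuntimeError("power iteration did not converge")


# Hypothetical cluster: an original post "op" with three direct replies.
cluster = {"op": {"r1", "r2", "r3"}, "r1": {"op"}, "r2": {"op"}, "r3": {"op"}}
scores = eigenvector_centrality(cluster)
top_node = max(scores, key=scores.get)
```

Running the function once per sub-graph, as the text describes, yields one top-influencing node per cluster.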

Calculating engagements

Engagements in this study included shares and replies on X (Twitter) and comments on Reddit, which enhance post visibility based on platform algorithms. Only response-type engagements may reveal new PII disclosures, as favorites/likes on X or upvotes/downvotes on Reddit do not allow for new text sharing. Two metrics were used: Response Type Engagements, which count only responses, and Total Engagements, which include all engagement types. Limitations in X's API may restrict access to full conversation archives, whereas Reddit allows polling for submissions, though issues may arise from post or user deletions and late participants in discussions.

Data groupings and isolating variable effect

Time series analyses were performed on engagements and PII disclosures per cluster to test whether both trend downward over time (H5, H6). Cluster-wide metrics included total disclosures, total engagements, rates of both, mean risk score, total time span, primary influence power score and influencer tier. These metrics informed descriptive statistics and some analyses, but time series testing required individual node-level data points extracted per cluster.

Data were resampled into 5-, 10- or 15-min time bins based on data density: sparser conversations required larger bins (15-min) to ensure sufficient observations for stationarity testing, while denser conversations used smaller bins (5-min) to capture finer temporal dynamics without over-smoothing. Stationarity was assessed using the Augmented Dickey-Fuller test at α = 0.05, with differencing applied as needed. The differencing order, combined with Autocorrelation and Partial Autocorrelation results, informed the ARIMA model parameters, and outcomes were plotted using Plotly.
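The binning and differencing steps can be sketched as follows. The density thresholds in `choose_bin_minutes` are hypothetical, since the cut-offs used to pick a bin size are not reported, and the stationarity test itself is omitted here (statsmodels provides an Augmented Dickey-Fuller implementation as `adfuller`).

```python
from collections import Counter


def choose_bin_minutes(n_observations, span_seconds):
    """Pick a 5-, 10- or 15-minute bin from posting density.

    Assumption: the per-hour thresholds below are illustrative only.
    """
    per_hour = n_observations / max(span_seconds / 3600, 1e-9)
    if per_hour >= 60:
        return 5
    if per_hour >= 20:
        return 10
    return 15


def bin_counts(elapsed_seconds, bin_minutes):
    """Count observations per fixed-width bin, keeping empty bins as zeros."""
    width = bin_minutes * 60
    counts = Counter(int(t // width) for t in elapsed_seconds)
    n_bins = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_bins)]


def difference(series, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# Hypothetical Time Elapsed values (seconds since the original post).
elapsed = [0, 30, 100, 310, 620, 1500]
binned = bin_counts(elapsed, 5)
differenced = difference(binned)
```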

To assess robustness, we incorporated CUSUM (Cumulative Sum Control Chart), Change Point Detection (CPD) and the Mann–Kendall test as comparative baselines. While ARIMA is parametric and sensitive to local fluctuations, the Mann–Kendall test provides a model-free evaluation of monotonic trends. CUSUM and CPD complement both approaches by detecting gradual shifts and distinct breakpoints in time-binned social media data that traditional methods may miss.
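For reference, the Mann–Kendall test can be written in a few lines. This is a generic textbook version with tie correction and a normal approximation, offered as a sketch; it is not necessarily the implementation used in the study, and the example series are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(series, alpha=0.05):
    """Two-sided Mann-Kendall trend test; returns (S, Z, p-value, trend label)."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    # S sums the signs of all pairwise later-minus-earlier differences.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S, corrected for groups of tied values.
    tie_sizes = Counter(series).values()
    var_s = (
        n * (n - 1) * (2 * n + 5)
        - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)
    ) / 18
    if s == 0 or var_s == 0:
        z = 0.0
    else:
        z = (s - math.copysign(1, s)) / math.sqrt(var_s)  # continuity correction
    p_value = math.erfc(abs(z) / math.sqrt(2))
    if p_value >= alpha:
        trend = "no trend"
    else:
        trend = "decreasing" if z < 0 else "increasing"
    return s, z, p_value, trend


# A strictly decaying series versus an oscillating one (hypothetical bin counts).
decaying = mann_kendall([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
oscillating = mann_kendall([5, 1, 5, 1, 5, 1])
```

The oscillating series shows why bursty, non-monotonic bin counts can leave the test without a significant result even when activity ends lower than it began.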

Analyzing within and between clusters

After data collection and processing, each daily collection contained multiple conversation groups, with each collection c having k clusters and every cluster containing n nodes (observations). Each trend collection represents a sample from the broader platform population, with individual groupings within. Figure 2 presents the data grouping breakdown.

Figure 2
Data groupings using clusters within trend collections. Social media data (the population) is sampled into collections, each collection is divided into clusters, and each cluster yields a series of observations over time; the diagram marks the relationships between clusters and within clusters.

With clustered data, correlation within the hierarchical structure derived from aggregated user-level interactions may violate independence assumptions (Nielsen, Smink, & Fox, 2021). This approach requires sufficient sample sizes to estimate random effects, both in cluster count and nodes per cluster. Between-cluster analysis used Spearman's Rank correlation, given non-normal distributions, confirmed via the Anderson-Darling test. Both tests used α = 0.05 and were conducted using SciPy.

Hypothesis testing

Spearman's correlation tests assessed the relationship between the primary influencer's power index and the dependent variables (Engagement and PII Disclosures), given non-normal distributions. Hypothesis pairs H1/H2, H3/H4 and H5/H6 were tested at the collection level using α = 0.05. Time series analysis was performed per cluster to visualize trends in engagements and disclosures.
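The rank correlation behind these tests can be sketched without external libraries. The study reports using SciPy, whose `scipy.stats.spearmanr` also returns a p-value; the version below computes only the coefficient, using average ranks for ties, and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical per-cluster values: top influence power versus engagement count.
influence_power = [0.1, 0.4, 0.2, 0.9]
engagements = [12, 40, 15, 300]
rho = spearman_rho(influence_power, engagements)
```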

The pilot analysis used X (Twitter) as the data source, pulling 10,259 posts, many of which were one-off posts unrelated to conversational exchanges. Following X's API tier restructuring, which moved key endpoints to Pro and Enterprise tiers, the data source pivoted to Reddit. The main study collected 122,904 posts and subscriber/follower data for 93,982 users across 285 conversation clusters from Reddit (September 12–22, 2023). Table 4 presents the full dataset composition.

Table 4

Main study: Reddit dataset composition

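| Collection | Clusters | Posts | Users | String tokens | PII tokens | % nodes disclosed |
| --- | --- | --- | --- | --- | --- | --- |
| 2023-09-12 | 15 | 7,538 | 5,664 | 178,054 | 3,630 | 30.8835 |
| 2023-09-13 | 43 | 9,102 | 6,934 | 241,482 | 3,851 | 27.3237 |
| 2023-09-14 | 26 | 12,324 | 9,266 | 329,866 | 6,014 | 29.3817 |
| 2023-09-15 | 26 | 11,930 | 8,785 | 294,454 | 6,568 | 31.8022 |
| 2023-09-16 | 30 | 14,816 | 12,260 | 282,021 | 6,355 | 27.6795 |
| 2023-09-17 | 20 | 12,237 | 8,899 | 288,046 | 4,998 | 26.2074 |
| 2023-09-18 | 24 | 10,752 | 8,226 | 265,549 | 6,550 | 34.0402 |
| 2023-09-19 | 16 | 12,529 | 10,840 | 244,281 | 3,585 | 29.4668 |
| 2023-09-20 | 46 | 11,619 | 8,485 | 339,404 | 6,971 | 33.6512 |
| 2023-09-21 | 10 | 6,852 | 4,449 | 187,557 | 5,497 | 33.4501 |
| 2023-09-22 | 29 | 13,205 | 10,174 | 321,730 | 6,689 | 29.5798 |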

Across daily collections, Person types (strings identified as potential individual names) constituted the vast majority of PII detections, followed by Datetime, NRP (Nationality, Religious or Political mentions), Location and one Medical License identification determined to be a false positive.

The following sections cover the individual analysis attempts of the main study and its results.

There were 183 total clusters when using the 30-node minimum threshold for clusters. Due to the volume of graphs, summaries for every collection are provided in lieu of providing every graph associated with the clusters. While some collections held a small set of clusters, others held significantly more, as presented by the September 15th collection in Figure 3.

Figure 3
Engagements and disclosures across Reddit conversations on September 15th. Two line graphs (engagements above, disclosures below) plot counts against time in 15-minute intervals, with one colored line per data set and a dashed black line for the average; both averages peak around 10:00 (values approximated).

Of the 183 clusters analyzed, ARIMA identified descending engagement trends in 146 (79.78%), with 73 showing statistically significant AR and MA coefficients. For disclosure-active clusters, 121 (66.12% of the 183) showed descending trends, with 68 statistically significant. CUSUM Drift detected declining trends in 73 clusters for engagements and 68 for disclosures; Change Point Decline detected 182 and 153, respectively; Mann–Kendall detected none. When requiring agreement among at least three methods, 61 clusters showed declining engagement trends, and 50 showed declining disclosure trends.
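The agreement rule can be illustrated with a short sketch. The method keys, cluster names and per-cluster flags below are hypothetical; only the four methods, the three-method threshold and the 146 and 121 of 183 counts come from the text.

```python
METHODS = ("arima", "cusum_drift", "change_point", "mann_kendall")


def consensus_declines(flags_by_cluster, min_agreement=3):
    """Return clusters where at least `min_agreement` methods flag a decline."""
    return [
        name
        for name, flags in flags_by_cluster.items()
        if sum(bool(flags.get(method)) for method in METHODS) >= min_agreement
    ]


def percent(part, whole):
    """Share of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Hypothetical decline flags for three clusters.
flags = {
    "2023-09-15_0": {"arima": True, "cusum_drift": True, "change_point": True, "mann_kendall": False},
    "2023-09-15_1": {"arima": True, "cusum_drift": False, "change_point": True, "mann_kendall": False},
    "2023-09-15_2": {"arima": False, "cusum_drift": False, "change_point": True, "mann_kendall": False},
}
agreed = consensus_declines(flags)
engagement_share = percent(146, 183)
disclosure_share = percent(121, 183)
```

Because Mann–Kendall flagged no declines, a three-method consensus in this dataset amounts to ARIMA, CUSUM Drift and Change Point Decline all agreeing.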

For each collection (Figure 4), consensus across decline detection methods was assessed, with maximum average agreement just under 3. As shown in Figure 5, Mann–Kendall contributed zero decline detections across all collections for both engagement and disclosure data, likely due to the noisy, non-monotonic nature of time-binned social media activity associated with cross-sectional sampling. Change Point Decline and ARIMA consistently identified declines, followed by CUSUM Drift.

Figure 4
A bar graph compares average agreement scores for engagement and disclosure over different collection dates. The horizontal axis represents the collection dates from 2023-09-12 to 2023-09-22, and the vertical axis represents the average agreement score ranging from 0 to 2.5. The graph features two sets of bars for each date: one in purple representing engagement average agreement and one in green representing disclosure average agreement. Notable trends include a peak in disclosure average agreement on 2023-09-16 and relatively consistent engagement average agreement scores across most dates. The highest engagement average agreement is observed on 2023-09-20, while the lowest disclosure average agreement is on 2023-09-18.

Agreement on declining trend detections across detection methods

Figure 5
A radar chart with four axes labeled CUSUM Drift, Mann-Kendall, ARIMA, and Change Point Decline, marked with values ranging from 0 to 200. The chart features two data series: Engagements in purple and Disclosures in green.

Aggregate method sensitivity across conversations

Reddit data clusters were more comprehensive than Twitter's, with the largest containing 7,470 nodes (September 16th) and the second largest 6,672 (September 19th). Of the 285 clusters, 210 contained more than one node, 183 had at least 30 nodes, 171 had at least 100 and only 30 (10.53%) exceeded 1,000. A baseline of 30 nodes was established to address sample size issues after time-of-post adjustments. Table 5 summarizes correlation results across varying size thresholds.

Table 5

Main study correlation results using dataset segmented by cluster size

Variable pairing                | Run 1 (N = 183) | Run 2 (N = 182) | Run 3 (N = 30)
Influencer Tier & Engagements   | 0.1952**        | 0.1763*         | −0.3025
Influencer Tier & Disclosures   | 0.1425          | 0.1223          | −0.0837
Influence Power & Engagements   | 0.5728***       | 0.5658***       | 0.4972**
Influence Power & Disclosures   | 0.4646***       | 0.4559***       | 0.0527
Time Elapsed & Engagements      | 0.1608*         | 0.1677*         | −0.1880
Time Elapsed & Disclosures      | 0.1645*         | 0.1711*         | −0.1782

Note(s): Asterisks denote p-value. *p < 0.05, **p < 0.01 and ***p < 0.001

Anderson-Darling tests confirmed non-normal distributions, necessitating Spearman's Rank Correlation across three robustness analyses. Run 1 (n = 183, 30-node minimum) ensured adequate sample size for time series resampling. Run 2 (n = 182, 50-node threshold) increased statistical power while maintaining breadth, showing consistently positive coefficients that were significant at p < 0.05 for every pairing except influencer tier and disclosures. Run 3 (n = 30, 1,000+ nodes) isolated the largest conversations to test whether fuller thread capture strengthened effects. Influence power maintained a significant correlation with engagements (p < 0.01), but its correlation with disclosures lost significance, and the influencer tier and time elapsed pairings showed negative, non-significant coefficients, suggesting measurement challenges at this threshold.
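For readers unfamiliar with the rank-based statistic, Spearman's coefficient is the Pearson correlation computed on ranks, which is why it tolerates the non-normal distributions noted above. The following is a minimal pure-Python sketch; the helper names and the per-cluster values are made up for illustration, and a statistics library would normally be used to obtain p-values as well.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Made-up per-cluster values: centrality of the thread starter and engagements.
influence_power = [0.02, 0.10, 0.35, 0.35, 0.80]
engagements = [40, 55, 300, 120, 900]

print(round(spearman_rho(influence_power, engagements), 4))
```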

The unplanned platform transition from Twitter/X to Reddit due to API access changes created non-equivalent comparison groups with different sampling methods and user populations. However, this pivot revealed how platform architecture may shape influencer disclosure relationships. Twitter/X's follower hierarchy model showed clear correlations between influencer tier and both engagement (r = 0.26, p < 0.01) and disclosures (r = 0.20, p < 0.01), while Reddit's community-first architecture, where users follow subreddits rather than individuals, showed weaker tier correlations.

This architectural difference created measurement challenges: Reddit's “active user count” (15-min interaction window) proved unstable for tier classification compared to Twitter/X's persistent follower counts. Despite this, influence power (eigenvector centrality) maintained significant positive correlations with both engagement and disclosures across platforms, suggesting network position matters regardless of architecture. These exploratory observations suggest community-based platforms may diffuse influencer power differently than follower hierarchy models, but designed studies with equivalent sampling are needed to confirm this. The Mann–Kendall test contributed no decline detections, consistent with research showing monotonic trend tests are poorly suited to bursty social media patterns (Lehmann, Gonçalves, Ramasco, & Cattuto, 2012; Mathioudakis & Koudas, 2010), supporting the need for adaptive methods like CPD and ARIMA in social media analysis. Reddit's fuller conversation threads also made it more straightforward to observe engagement fluctuations and eventual decline, with more complete and numerous clusters facilitating the time series analyses. Using active user counts for Reddit tier classification introduces an additional reliability concern beyond architectural non-equivalence: these counts capture a 15-min interaction window and fluctuate substantially with time of day and algorithmic promotion, making tier assignments transient rather than stable proxies for audience size; subscriber counts or longitudinally averaged activity metrics would provide more consistent classifications in future studies.
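One way to see why a monotonic trend test can miss a burst that rises and then fades is to compute the Mann–Kendall statistic directly. The sketch below is a simplified version that omits the tie correction, and the two count series are made-up examples rather than study data.

```python
from math import sqrt


def mann_kendall_z(series):
    """Mann-Kendall Z statistic (no tie correction, for simplicity)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / sqrt(variance)
    if s < 0:
        return (s + 1) / sqrt(variance)
    return 0.0


# Made-up counts: a burst that rises and then fades vs. a steady decline.
burst = [2, 9, 30, 55, 41, 24, 12, 5]
steady_decline = [55, 48, 41, 33, 24, 17, 9, 2]

print(mann_kendall_z(burst))           # |Z| < 1.96: no monotonic trend at the 5% level
print(mann_kendall_z(steady_decline))  # Z < -1.96: significant downward trend
```

In the burst series the rising and falling pairs cancel out, so the test reports no trend even though the second half of the series is clearly declining.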

The dataset showed diverse correlations, with filtering revealing predominantly positive relationships among variable pairings. Some results may have limited reliability due to the mixed-media nature of threads (text, images, videos) and limited understanding of community context.

On Reddit, community groups rather than individual users are emphasized. Individual subscriber data were limited, so the active member count (accounts interacting within a 15-min window) was used instead. However, this count fluctuates over time, making influencer tier classification transient and less reliable than Twitter/X's persistent follower counts.

In the Twitter/X corpus, significant correlations (p < 0.01) were found between influencer tier and both engagements (r = 0.2569) and disclosures (r = 0.2001). Reddit's correlations were positive (r = 0.3887, r = 0.3133) but both p-values exceeded 0.05. Given these measurement challenges, influencer tier results (H3, H4) should be interpreted cautiously, and the cross-platform comparison suggests tier effects may be platform-dependent, with stronger effects in follower hierarchy architectures. Researchers studying influencer tiers on community-based platforms may not benefit from the tier assignment method used here.

For H1 and H2, influence power showed positive correlations with engagements and PII disclosures in the Reddit dataset, though variations may exist between image/video-initiated and text-initiated clusters. In the Twitter/X dataset, this was confirmed only after filtering smaller clusters, as eigenvector centrality values were significantly affected by cluster size.
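As an illustration of the influence power measure, eigenvector centrality can be approximated by power iteration on a cluster's reply graph. The sketch below is a self-contained stand-in, not the study's code (which cites NetworkX); the function name, the identity shift, and the five-node graph are assumptions chosen for illustration.

```python
from math import sqrt


def eigenvector_centrality(adjacency, iterations=200, tol=1e-9):
    """Power iteration on (A + I); the identity shift avoids oscillation
    on bipartite graphs such as a pure reply star."""
    nodes = list(adjacency)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        updated = {n: score[n] + sum(score[m] for m in adjacency[n])
                   for n in nodes}
        norm = sqrt(sum(v * v for v in updated.values()))
        updated = {n: v / norm for n, v in updated.items()}
        delta = sum(abs(updated[n] - score[n]) for n in nodes)
        score = updated
        if delta < tol:
            break
    return score


# Made-up undirected reply graph: "op" starts the thread, others reply.
graph = {
    "op": ["a", "b", "c", "d"],
    "a": ["op", "b"],
    "b": ["op", "a"],
    "c": ["op"],
    "d": ["op"],
}

centrality = eigenvector_centrality(graph)
print(max(centrality, key=centrality.get))
```

Because scores are normalized within each graph, adding or removing a handful of nodes in a small cluster can shift every value noticeably, which is consistent with the sensitivity to cluster size reported above.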

For H5 and H6, time series analysis indicated post-peak declines for both engagements and disclosures, with 79.78% of Reddit clusters (146 of 183) showing engagement declines and 66.12% (121 of 183) showing disclosure declines. Among declining clusters, roughly two in five achieved consensus across at least three trend detection methods (61 of 146 for engagements and 50 of 121 for disclosures). These hypotheses are supported, though caution is advised given that many conversations were still in progress when data collection ended.

Across all analyses, Twitter/X showed significant relationships for influence power with engagements and disclosures after filtering smaller clusters, while Reddit showed predominantly positive correlations at p < 0.05 or p < 0.01 across variable pairings. ARIMA analysis confirmed descending trends in most Reddit clusters and approximately half of the pilot dataset. Table 6 summarizes hypotheses and associated results per phase.

Table 6

Hypothesis testing and results summary

Hypothesis                                                                      | Result
H1: Influence Power positively correlates with follower engagements             | Supported
H2: Influence Power positively correlates with PII disclosure detections        | Supported
H3: Influencer Tier positively correlates with follower engagements             | Supported*
H4: Influencer Tier positively correlates with PII disclosure detections        | Supported*
H5: As time elapses, the engagements in a cluster will decrease                 | Supported
H6: As time elapses, the PII disclosure detections in a cluster will decrease   | Supported

Note(s): Asterisks (*) denote reliability concerns from platform architectural differences. Caution advised.

This study faced several limitations. First, it initially focused exclusively on a single social media platform, X (Twitter), which underwent various changes during data collection, leading to issues such as the unavailability of full conversation histories and reduced post availability due to updated API limits. The analysis was then expanded to Reddit's community-based architecture, providing exploratory cross-platform insights while introducing measurement challenges for influencer tier classification.

Additionally, the study relied solely on eigenvector centrality to measure influence, thereby overlooking factors such as community involvement and cross-platform activity. Manual validation of automated PII detection was not feasible due to the dataset size. As with any NER-based approach, automated detection introduces false positives and false negatives (Macri et al., 2023). Published evaluations of Microsoft Presidio report variable performance depending on configuration and domain, with precision estimates ranging from approximately 0.51 in baseline clinical evaluations to about 0.89 in customized implementations (Alrazihi, Biswas, & George, 2025; Kotevski et al., 2022).

To provide conservative uncertainty bounds using parameters reported within a single empirical evaluation, precision and recall estimates from the baseline study (precision = 0.51, recall = 0.74) were applied to the observed disclosure rate of 29.41%. This yields a precision-adjusted lower estimate of approximately 15.00%, representing confirmed true detections, and a recall-corrected prevalence estimate of approximately 20.27% that accounts for likely missed cases. Although the observed rate may overestimate absolute prevalence under conservative assumptions, the dominant PII categories identified in this study, including person names, datetime expressions and professional identifiers, are commonly reported in NER research to achieve comparatively higher precision due to stronger contextual cues (Macri et al., 2023). Since the same detection pipeline was consistently used throughout the dataset, it would be expected to introduce predominantly non-differential measurement error, which may reduce effect sizes but is unlikely to invalidate the correlational findings.
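The two bounds follow from simple arithmetic on the figures quoted above: multiplying the observed rate by precision gives the share of posts with confirmed detections, and dividing that share by recall adds back the cases the detector is expected to have missed. A minimal sketch, with a hypothetical function name:

```python
def adjusted_prevalence(observed_rate, precision, recall):
    """Return (precision-adjusted lower estimate, recall-corrected estimate)."""
    true_detected = observed_rate * precision  # detections that are real PII
    corrected = true_detected / recall         # add back likely missed cases
    return true_detected, corrected


lower, corrected = adjusted_prevalence(
    observed_rate=0.2941, precision=0.51, recall=0.74
)
print(f"{lower:.2%}, {corrected:.2%}")  # 15.00%, 20.27%
```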

Data collection was intentionally limited to 7 AM through 10 PM Mountain Standard Time to allow for human oversight during data pulls, excluding overnight trending submissions. Reddit's “hot submissions” feature was selected as the closest analog to Twitter/X's trending conversations to maintain methodological consistency across platforms despite architectural differences. A submission's hot score is calculated based on how many upvotes vs downvotes it has received (Kuutila, Rantala, Li, Hosio, & Mäntylä, 2024). This approach resulted in some clusters being captured before peak activity, potentially obscuring full trend patterns. The study also did not account for varying content types in trending posts or the impact of post length differences, as Reddit allows much longer posts compared to X (Twitter).

Finally, the study was completed entirely without obtaining input from the platform users as to why they were interacting with the post or user. An individual's motivations to engage will differ throughout their time on the platform. Relying on observed behavior alone, without users' own accounts of why they engage, limited the study by not accounting for individual differences, motivators and psychological factors.

This study shows a connection between influencer reach, measured by eigenvector centrality, and the sharing of PII on social media. As reach grows, so does the chance for engagement and information disclosure, with an average of 29.41% of 122,904 posts containing PII (adjusted estimate: 15.00–20.27% after accounting for automated detection errors) in higher-interaction threads. Though viral discussions can start from non-influencers and platform algorithms may also play a role, influencers and marketers should be aware that some exchanges might put their followers' privacy at risk. Importantly, it was not determined whether conversation starters actively encouraged sharing or whether sensitive data were exchanged through media. These disclosures were made publicly by users; while regulations like GDPR and CCPA control how platforms handle data, user-initiated public disclosures create unique privacy challenges that go beyond current laws.

The observed behavioral shifts can be understood through Social Learning and Social Identity theories. Social Learning Theory suggests influencers can be targeted to model desirable privacy behaviors, while also accounting for undesirable disclosure patterns. Social Identity Theory suggests that group identity and shared norms-based campaigns can drive privacy-aware behaviors as part of a community's core values. Platform algorithms that prioritize high-engagement content may inadvertently amplify PII-containing threads, compounding individual privacy risks through increased visibility. Users must adopt privacy-conscious practices to reduce exposure to fraud or discrimination, and targeted campaigns (influencer-led or otherwise) can address these challenges when traditional educational efforts fall short.

A between-subjects experiment could assign participants to simulated social media threads where an influencer either discloses or withholds PII (independent variable 1: influencer disclosure condition) across hierarchical and community-based platform designs (independent variable 2: platform architecture). This design directly operationalizes the two key variables that emerged from the observational findings, providing the controlled, comparable sampling test needed to establish causal inference where the unplanned platform transition could not. Participant disclosure behavior, measured by the number and sensitivity of PII items shared in a mock posting task, would serve as the primary dependent variable, with a secondary self-report measure of perceived disclosure norms to capture attitudinal shifts. Based on the present findings, we predict that influencer disclosure increases follower PII sharing and that this effect is stronger under hierarchical designs, consistent with the observed attenuation on Reddit relative to Twitter/X.

This study examined how social media influencers impact followers' PII disclosures, revealing that influence power (eigenvector centrality) significantly and positively correlates with both engagement and disclosure rates across platforms (RQ1). Influencers with greater network reach drove higher engagement and disclosure rates, with 29.41% of posts in the Reddit dataset containing PII (adjusted estimate: 15.00% to 20.27% after accounting for automated detection error). Temporal analysis showed declining trends in both engagement (79.78% of threads) and disclosures (66.12% of threads), supporting H5 and H6.

The unplanned platform transition from Twitter/X to Reddit provided exploratory insights into how platform architecture moderates influencer effects (RQ2). Twitter/X's follower hierarchy model showed stronger correlations between influencer tier and disclosures (r = 0.20, p < 0.01) compared to Reddit's community-based structure, suggesting community-driven platforms may inherently diffuse influencer power. However, influence power maintained significant positive correlations across both architectures, indicating network position matters regardless of platform design. Temporal patterns were examined using a triangulated approach combining ARIMA, CUSUM Drift and CPD, each capturing distinct aspects of nonlinear, bursty engagement data. Convergent findings across methods strengthen confidence in the observed trends and offer a replicable framework for short-window social media analysis. The Reddit tier findings (H3, H4) warrant caution, as active user counts used for tier classification fluctuate within short windows, producing transient assignments less stable than Twitter/X's persistent follower counts. Likewise, the 29.41% PII disclosure rate reflects automated NER output, and the precision-adjusted estimate of 15.00% to 20.27% is the more defensible figure for downstream policy or design recommendations.

Future research should use designed platform comparisons with equivalent sampling to test whether community-based architectures offer privacy-protective effects. Longitudinal studies with complete conversation threads would clarify patterns of temporal decay, and enhanced PII detection using machine learning classifiers beyond NER could lower false positive rates while maintaining PII-Codex's categorization framework. Including user motivations through survey data would complement behavioral observations, addressing why users disclose PII despite privacy concerns. Standardizing influencer tier classification on community-based platforms through subscriber counts or longitudinally averaged activity metrics, rather than instantaneous active user counts, would also resolve the measurement instability identified here and enable more reliable cross-platform comparisons.

This research used publicly accessible social media posts from Twitter/X and Reddit. Raw post content was not retained following analysis. All datasets (Rosado, 2024) were sanitized by replacing original platform identifiers such as user IDs, post IDs, usernames and similar identifiers with unique internal identifiers to prevent re-identification. No IRB approval was required as the study involved no direct interaction with human participants.

Alrazihi, L. A., Biswas, S., & George, J. (2025). Evaluating the accuracy of automated and semi-automated anonymization tools for unstructured health records. Surgical Neurology International, 16, 313.
Alutaybi, A., Al-Thani, D., McAlaney, J., & Ali, R. (2020). Combating fear of missing out (FOMO) on social media: The fomo-r method. International Journal of Environmental Research and Public Health, 17(17), 6128.
Arora, A., Bansal, S., Kandpal, C., Aswani, R., & Dwivedi, Y. (2019). Measuring social media influencer index- insights from Facebook, Twitter and Instagram. Journal of Retailing and Consumer Services, 49, 86–101.
Bandura, A. (1977). Social learning theory. Englewood Cliffs, NJ: Prentice Hall.
Beigi, G., & Liu, H. (2020). A survey on privacy in social media. ACM/IMS Transactions on Data Science, 1(1), 7–38.
Beigi, G., Shu, K., Zhang, Y., & Liu, H. (2018). Securing social media user data. In Proceedings of the 29th ACM Conference on Hypertext and Social Media (pp. 165–173).
Bélanger, F., & Crossler, R. E. (2011). Privacy in the digital age: A review of information privacy research in information systems. MIS Quarterly, 35(4), 1017–1042.
Essaidi, A., Zaidouni, D., & Bellafkih, M. (2020). New method to measure the influence of Twitter users. In Proceedings of the 4th International Conference on Intelligent Computing in Data Sciences (ICDS) (pp. 1–5).
Farivar, S., Wang, F., & Turel, O. (2022). Followers’ problematic engagement with influencers on social media: An attachment theory perspective. Computers in Human Behavior, 133, 107288.
Gross, J., & Wangenheim, F. V. (2018). The big four of influencer marketing. A typology of influencers. Marketing Review St. Gallen, 35(2), 30–38.
Gunaratne, K., Coomes, E. A., & Haghbayan, H. (2019). Temporal trends in anti-vaccine discourse on Twitter. Vaccine, 37(35), 4867–4871.
Hagberg, A. A., Schult, D. A., & Swart, P. J. (2008). Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference (SciPy2008), 11–15.
Keküllüoglu, D., Magdy, W., & Vaniea, K. (2020). Analysing privacy leakage of life events on Twitter. In Proceedings of the 12th ACM Conference on Web Science (pp. 287–294).
Ki, C. W. ‘C., & Kim, Y. K. (2019). The mechanism by which social media influencers persuade consumers: The role of consumers’ desire to mimic. Psychology and Marketing, 36(10), 905–922.
Kim, H., Jang, S. M., Kim, S.-H., & Wan, A. (2018). Evaluating sampling methods for content analysis of Twitter data. Social Media + Society, 4(2), 1–10.
Kotevski, D. P., Smee, R. I., Field, M., Nemes, Y. N., Broadley, K., & Vajdic, C. M. (2022). Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. International Journal of Medical Informatics, 168, 104880.
Kumar, P., Choudhury, T., Rawat, S., & Jayaraman, S. (2016). Analysis of various machine learning algorithms for enhanced opinion mining using Twitter data streams. In Proceedings of 2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE) (pp. 265–270).
Kuutila, M., Rantala, L., Li, J., Hosio, S., & Mäntylä, M. (2024). What makes programmers laugh? Exploring the submissions of the subreddit r/ProgrammerHumor. In Proceedings of the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM ’24) (pp. 371–381). Association for Computing Machinery.
Lahuerta-Otero, E., & Cordero-Gutiérrez, R. (2016). Looking for the perfect tweet. The use of data mining techniques to find influencers on Twitter. Computers in Human Behavior, 64, 575–583.
Lehmann, J., Gonçalves, B., Ramasco, J. J., & Cattuto, C. (2012). Dynamical classes of collective attention in Twitter. In Proceedings of the 21st International Conference on World Wide Web (pp. 251–260). Association for Computing Machinery.
Liao, Y. (2019). Sharing personal health information on social media. In Proceedings of the 10th International Conference on Social Media + Society (pp. 194–204).
Macri, C. Z., Teoh, S. C., Bacchi, S., Tan, I., Casson, R., Sun, M. T., … Chan, W. (2023). A case study in applying artificial intelligence-based named entity recognition to develop an automated ophthalmic disease registry. Graefe's Archive for Clinical and Experimental Ophthalmology = Albrecht von Graefes Archiv für klinische und experimentelle Ophthalmologie, 261(11), 3335–3344.
Mathioudakis, M., & Koudas, N. (2010). TwitterMonitor: Trend detection over the Twitter stream. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (pp. 1155–1158). Association for Computing Machinery.
Microsoft (2018). Microsoft Presidio: Context aware, pluggable and customizable PII anonymization service for text and images. Available from: https://microsoft.github.io/presidio/
Milne, G. R., Pettinico, G., Hajjat, F. M., & Markos, E. (2017). Information sensitivity typology: Mapping the degree and type of risk consumers perceive in personal data sharing. Journal of Consumer Affairs, 51(1), 133–161.
Moura, J., & Serrão, C. (2016). Security and privacy issues of big data. In Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence (pp. 20–52). IGI Global.
Nielsen, N. M., Smink, W. A., & Fox, J.-P. (2021). Small and negative correlations among clustered observations: Limitations of the linear mixed effects model. Behaviormetrika, 48(1), 51–77.
Okuah, O., Scholtz, B. M., & Snow, B. (2019). A grounded theory analysis of the techniques used by social media influencers and their potential for influencing the public regarding environmental awareness. Proceedings of the South African Institute of Computer Scientists and Information Technologists 2019, 36, 1–10.
Orooji, M., Rabbanian, S. S., & Knapp, G. M. (2023). Flexible adversary disclosure risk measure for identity and attribute disclosure attacks. International Journal of Information Security, 10(3), 631–645.
Powale, P. I., & D Bhutkar, G. (2013). Overview of privacy in social networking sites (SNS). International Journal of Computer Applications, 74(19), 39–46.
Rahman, K. T. (2022). Influencer marketing and behavioral outcomes: How types of influencers affect consumer mimicry? SEISENSE Business Review, 2(1), 43–54.
Rao, A., Spasojevic, N., Li, Z., & Dsouza, T. (2015). Klout score: Measuring influence across multiple social networks. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data) (pp. 2282–2289).
Reddit.com (n.d.-b). Reddit. Available from: https://www.reddit.com/wiki/api/
Rosado, E. J. (2023). PII-codex: A Python library for PII detection, categorization, and severity assessment. Journal of Open Source Software, 8(86), 5402.
Rosado, E. J. (2024). Privacy vs. Social Capital: Social Media PII Disclosure Analyses (0.0.1). Zenodo. Available from: https://doi.org/10.5281/zenodo.13133302
Saha, K., Bayraktaroglu, A. E., Campbell, A. T., Chawla, N. V., De Choudhury, M., D’Mello, S. K., Dey, A. K., Gao, G., Gregg, J. M., Jagannath, K., Mark, G., Martinez, G. J., Mattingly, S. M., Moskal, E., Sirigiri, A., Striegel, A., & Yoo, D. W. (2019). Social media as a passive sensor in longitudinal studies of human behavior and wellbeing. Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, 1–8.
Schwartz, P. M., & Solove, D. J. (2011). The PII problem: Privacy and a new concept of personally identifiable information. New York University Law Review, 86, 1814-2011.
Sharma, P., Agarwal, A., & Sardana, N. (2018). Extraction of influencers across Twitter using credibility and trend analysis. In Proceedings of the 11th International Conference on Contemporary Computing (Vol. IC3, pp. 1–3).
Shopify. (2023). The ROI of influencer marketing: How to measure and get the most out of your influencer efforts. Available from: https://www.shopify.com/enterprise/roi-influencer-marketing
Stieglitz, S., & Dang-Xuan, L. (2013). Social media and political communication: A social media analytics framework. Social Network Analysis and Mining, 3(4), 1277–1291.
Stoddard, G. (2021). Popularity dynamics and intrinsic quality in Reddit and Hacker News. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 9, pp. 416–425).
Tajfel, H., & Turner, J. C. (1979). An integrative theory of intergroup conflict. In W. G. Austin & S. Worchel (Eds.), The Social Psychology of Intergroup Relations (pp. 33–47). Monterey, CA: Brooks/Cole.
Tene, O., & Polonetsky, J. (2013). Big data for all: Privacy and user control in the age of analytics. Northwestern Journal of Technology and Intellectual Property, 11(5), 1.
Trepte, S. (2020). The social media privacy model: Privacy and communication in the light of social media affordances. Communication Theory, 31(4), 549–570.
Turban, E., Bolloju, N., & Liang, T.-P. (2011). Enterprise social networking: Opportunities, adoption, and risk mitigation. Journal of Organizational Computing and Electronic Commerce, 21(3), 202–220.
Valle-Cruz, D., López-Chau, A., & Sandoval-Almazán, R. (2020). Impression analysis of trending topics in Twitter with classification algorithms. In Proceedings of the 13th International Conference on Theory and Practice of Electronic Governance (pp. 430–441).
Waldman, A. E. (2016). Privacy, sharing, and trust: The Facebook study. Case Western Reserve Law Review, 67(1).
Westin, F., & Chiasson, S. (2019). Opt out of privacy or go home. In Proceedings of the New Security Paradigms Workshop (pp. 57–67).
Weston, H., & Wells, B. P. (2020). Social media as a factor in personal injury underwriting: Risk, rate and regulation. Journal of Insurance Regulation, 39(1).
Yang, Y., & Huang, Y. (2019). Dumping the closet skeletons online: Exploring the guilty information disclosure behavior on social media. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 13, pp. 663–666).
Zhang, Z., Zhao, W., Yang, J., Paris, C., & Nepal, S. (2019). Learning influence probabilities and modelling influence diffusion in Twitter. In Companion Proceedings of the 2019 World Wide Web Conference (pp. 1087–1094).
Zhang, M., Beltran, F., & Liu, J. (2020a). Incentive mechanism for social network data pricing under privacy preservation. In Proceedings of the 2nd ACM International Symposium on Blockchain and Secure Critical Infrastructure (pp. 85–95).
Zhang, X., & Choi, J. (2022). The importance of social influencer-generated contents for user cognition and emotional attachment: An information relevance perspective. Sustainability, 14(11), 6676.
Zhang, Z., Jiménez, F. R., & Cicala, J. E. (2020b). Fear of missing out scale: A self-concept perspective. Psychology and Marketing, 37(11), 1619–1634.
Gadek, G. (2019). From community detection to topical, interactive group detection in online social networks. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence – Companion Volume (pp. 176–183).
Keeping safe on social media – U.S. Department of Defense (2018). Available from: https://media.defense.gov/2021/Sep/16/2002855950/-1/-1/0/CSI_KEEPING_SAFE_ON_SOCIAL_MEDIA_20210806.PDF (accessed 12 April 2023).
Leonardi, S., Monti, D., Rizzo, G., & Morisio, M. (2020). Mining micro-influencers from social media posts. In Proceedings of the 35th Annual ACM Symposium on Applied Computing (pp. 867–874).
Liu, Z., Zhu, Z., Gao, J., & Xu, C. (2021). Forecast methods for time series data: A survey. IEEE Access, 9, 91896–91912.
Lou, C., & Kim, H. K. (2019). Fancying the new rich and famous? Explicating the roles of influencer content, credibility, and parental mediation in adolescents’ parasocial relationship, materialism, and purchase intentions. Frontiers in Psychology, 10, 2567.
Pirina, I., & Çöltekin, Ç. (2018). Identifying depression on Reddit: The effect of training data. In Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task (pp. 9–12).
Robards, B. (2018). Belonging and neo-tribalism on social media site Reddit. Neo-Tribes, 187–206.
Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the Python in Science Conference (pp. 92–96).
Turel, O., & Bechara, A. (2021). A triple-system neural model of maladaptive consumption. Journal of the Association for Consumer Research, 6(3), 324–333.
Published in Digital Transformation and Society. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at Link to the terms of the CC BY 4.0 licence.
