The growing popularity of social media raises the risk of user privacy leakage. This study introduces a method called BERT-based Privacy Risk Assessment and Probability Calculation (B-PRAPC) to help users identify and quantify the risk of privacy leakage on social networks.
From the user’s perspective, this study focuses on the generation and storage stages of the data life cycle. It builds a privacy lexicon and a model for identifying privacy leakage risk, and it calculates the probability of privacy leakage by combining the frequency of risk-related words with the amount of privacy contained in the user’s text.
Compared with the baseline models, B-PRAPC achieves the highest accuracy (0.9264) and F1 score (0.9253) in identifying privacy leakage risk in users’ text. The results show that personal location, medical, identity, and work and education information are more prone to disclosure, while users demonstrate a strong awareness of protecting their personal property, network identity and health-related privacy.
These findings highlight the effectiveness of B-PRAPC in enhancing user privacy protection on social media, and provide insights for social media platforms and users on how to protect the privacy of personal data.
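As an illustration of the probability calculation summarized above, the following minimal sketch combines the frequency of risk-related words with a proxy for the amount of privacy in a text. It is not the B-PRAPC implementation: the lexicon entries, the function name `leakage_score`, the use of distinct privacy categories as the "amount of privacy", and the equal-weight linear combination are all assumptions made for illustration, and the BERT-based risk identification step is omitted.

```python
import re

# Hypothetical toy lexicon mapping words to privacy categories.
# These entries are illustrative assumptions, not the study's lexicon.
PRIVACY_LEXICON = {
    "hospital": "medical",
    "diagnosis": "medical",
    "address": "location",
    "street": "location",
    "passport": "identity",
    "salary": "property",
}


def leakage_score(text, lexicon=PRIVACY_LEXICON, weight=0.5):
    """Return a score in [0, 1] for one piece of user text.

    Assumed combination rule (not taken from the study):
    weight * word frequency + (1 - weight) * category coverage.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0

    # Frequency of risk-related words among all tokens.
    hits = [t for t in tokens if t in lexicon]
    frequency = len(hits) / len(tokens)

    # Proxy for the amount of privacy: the share of lexicon
    # categories that appear in the text.
    found_categories = {lexicon[t] for t in hits}
    all_categories = set(lexicon.values())
    amount = len(found_categories) / len(all_categories)

    return weight * frequency + (1 - weight) * amount


if __name__ == "__main__":
    print(leakage_score("My passport is at the hospital"))
```

A text with no lexicon words scores 0.0, and the score rises both when risk-related words make up a larger share of the text and when they span more privacy categories. A real system would replace the linear rule and the toy lexicon with the study's own definitions.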
