This study examines how data bias and algorithmic discrimination affect individual credit accessibility in China’s financial system. It aims to align financial inclusion and equity goals with statistical fairness conditions by constructing fairness metrics along multiple dimensions. The paper evaluates the fairness of commonly used credit evaluation models and proposes a novel approach to eliminating data bias in historical datasets.
We model credit evaluation using Logistic Regression, Random Forest, and XGBoost algorithms, focusing on education level and work experience as sensitive attributes. To mitigate data bias in historical datasets, we employ the Metropolis-Hastings (M-H) algorithm for data preprocessing.
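The abstract does not say which target distribution or proposal the M-H preprocessing step uses, so the following is only a generic illustration of the Metropolis-Hastings accept/reject rule, not the paper's procedure. It is a minimal random-walk sampler; the function names, the Gaussian proposal, the step size, and the standard-normal target are all assumptions made for the sketch.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    log_target: function returning the log of an unnormalized target density.
    All parameter names here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # The proposal is symmetric, so the acceptance probability
        # reduces to min(1, target(proposal) / target(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected move repeats the current state
    return samples


def log_standard_normal(x):
    # Assumed toy target: standard normal, up to an additive constant.
    return -0.5 * x * x


samples = metropolis_hastings(log_standard_normal, x0=0.0, n_samples=20000)
kept = samples[2000:]  # discard an initial burn-in segment
sample_mean = sum(kept) / len(kept)
sample_var = sum((s - sample_mean) ** 2 for s in kept) / len(kept)
```

In a debiasing setting, the target would presumably be a distribution under which the sensitive attributes no longer skew the historical records; what that distribution looks like is specific to the paper and is not reproduced here.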
(1) Machine learning models like Random Forest and XGBoost outperform traditional methods in addressing unfairness arising from multiple sensitive attributes. (2) Sensitive attributes, while excluded from credit scoring models, may indirectly influence outcomes through other indicators. Limiting the gap in credit accessibility between the general population and protected groups is essential for fairness of opportunity. (3) Data bias significantly affects credit ratings, increasing the false positive rate for certain demographic subgroups and reducing their credit accessibility.
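The two quantities named in these findings, the gap in credit accessibility between groups and the group-wise false positive rate, can be made concrete with a small sketch. The abstract does not define its three fairness metrics, so the code below is an assumption-laden illustration rather than the paper's definitions: it treats a prediction of 1 as "predicted default, application denied", takes the approval rate as a proxy for credit accessibility, and uses hypothetical toy data and function names.

```python
def group_rates(y_true, y_pred, groups):
    """Per-group approval rate and false positive rate.

    Assumed convention: label 1 = default, prediction 1 = denied.
    A false positive is a non-defaulter who is denied credit.
    """
    stats = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        non_defaulters = [i for i in idx if y_true[i] == 0]
        false_pos = sum(1 for i in non_defaulters if y_pred[i] == 1)
        approved = sum(1 for i in idx if y_pred[i] == 0)
        stats[g] = {
            "approval_rate": approved / len(idx),
            "fpr": (false_pos / len(non_defaulters)
                    if non_defaulters else float("nan")),
        }
    return stats


# Hypothetical toy data: two groups of four applicants each.
y_true = [0, 0, 0, 1, 0, 0, 0, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

stats = group_rates(y_true, y_pred, groups)
approval_gap = stats["A"]["approval_rate"] - stats["B"]["approval_rate"]
fpr_gap = stats["B"]["fpr"] - stats["A"]["fpr"]
```

In this toy example both groups have the same share of actual defaulters, yet group B is approved less often and its non-defaulters are denied more often, which is the pattern the third finding describes.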
The study provides a micro-level examination of individual credit accessibility and fairness in China. It analyzes how fairly the credit evaluation models used by Chinese financial institutions treat different population groups, and it proposes an M-H algorithm-based method to eliminate data bias in historical datasets.
This paper advances research on fairness in individual credit accessibility in China by introducing three fairness metrics for assessing credit evaluation models. It offers a micro-level perspective for scholars studying related issues.
