This study aims to address the problems of data provenance, trust management and reliable knowledge inference in Web-based biomedical information systems by developing a blockchain-enabled heterogeneous graph learning framework for prostate cancer driver gene prediction.
A framework named BE-HGAT is proposed to integrate blockchain-based provenance verification with trust-aware heterogeneous graph attention learning. Processed gene expression profiles and protein-protein interaction metadata are hashed, organized through Merkle trees and anchored to an Ethereum smart contract, while raw biomedical data remain off-chain. The resulting verification signals are combined with empirical interaction confidence scores to construct trust-weighted edges in a heterogeneous information network. Node-level and semantic-level attention mechanisms are then used to learn representations across genes, samples and disease-related semantic paths.
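As an illustration of the provenance step described above, the following minimal sketch hashes a batch of records into a Merkle root and turns a verification outcome into an edge weight. It is not the BE-HGAT implementation: the use of SHA-256, the duplication of the last node on odd-sized levels, the function names `merkle_root` and `trust_weight`, and the `penalty` factor applied to unverified edges are all assumptions made for this example. Only the root would be anchored on-chain, with the records themselves kept off-chain.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data (hash choice is an assumption)."""
    return hashlib.sha256(data).digest()


def merkle_root(records: list[bytes]) -> bytes:
    """Compute a Merkle root over serialized records.

    Leaves are the hashes of the records. On a level with an odd number
    of nodes, the last node is duplicated (one common convention, assumed here).
    """
    if not records:
        raise ValueError("at least one record is required")
    level = [sha256(record) for record in records]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def trust_weight(verified: bool, confidence: float, penalty: float = 0.5) -> float:
    """Hypothetical trust-weighted edge: keep the empirical confidence score
    for verified edges and scale it down by `penalty` for unverified ones."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return confidence if verified else penalty * confidence


# Hypothetical serialized interaction records (illustrative values only).
records = [b"GENE_A|GENE_B|0.91", b"GENE_A|GENE_C|0.72", b"GENE_B|GENE_D|0.64"]
anchored_root = merkle_root(records)

# A record is treated as verified when recomputing the root reproduces the anchored value.
is_verified = merkle_root(records) == anchored_root
edge_weight = trust_weight(is_verified, confidence=0.91)
```

Changing any byte of any record changes the recomputed root, so a mismatch against the anchored value flags the batch as unverified and lowers the weight of the affected edges under this scheme.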
Experiments on publicly available prostate cancer data sets show that BE-HGAT outperforms representative graph learning baselines in accuracy, precision, recall, F1-score and ROC-AUC. The model achieves an average F1-score of 0.777 and an ROC-AUC of 0.938, exceeding the strongest baseline by approximately 0.075 in F1-score and 0.04 in ROC-AUC. Robustness and ablation analyses further indicate that heterogeneous semantic attention provides the main performance gain, while blockchain-derived trust signals improve stability under noisy and perturbed graph conditions.
This study links decentralized data provenance with heterogeneous graph-based biomedical inference. Rather than using blockchain only as a storage or audit mechanism, BE-HGAT converts cryptographic verification outcomes into trust-aware graph signals, enabling more traceable, robust and reliability-aware driver gene prediction in distributed biomedical information systems.
