Table 2

A summary of semi-supervised learning methods for action recognition. The “Performance” column reports the top-1 accuracy (%) of the best model of each method. The percentage after each dataset denotes the proportion of labeled data used for training. * denotes methods that were re-implemented for the video domain by [46].

| Method | Description | Network | Performance (top-1, %) | Code |
|---|---|---|---|---|
| VideoSSL [46] | Uses a network pre-trained on ImageNet to guide the training of the 3D CNN. | 3D ResNet-18 | 47.6 (Kinetics100 - 5%); 32.4 (UCF101 - 5%); 32.7 (HMDB51 - 40%) | None |
| TCL [95] | Proposes two losses: Maximize Instance Agreement and Maximize Group Agreement. | TSM ResNet-18 | 29.81 (SS-V2 - 5%); 30.28 (Kinetics400 - 5%); 93.29 (Jester - 5%) | Link |
| FixMatch* [97] | Pseudo-labels predicted on weakly augmented data supervise training on a strongly augmented version of the same data. | 3D ResNet-18 | 40.5 (Kinetics100 - 5%); 27.1 (UCF101 - 5%); 32.9 (HMDB51 - 40%) | None |
| S4L* [121] | Combines self-supervised and semi-supervised learning. | 3D ResNet-18 | 33.0 (Kinetics100 - 5%); 22.7 (UCF101 - 5%); 29.8 (HMDB51 - 40%) | None |
| MT* [6] | Keeps a teacher model whose weights are an exponential moving average of the student's weights over training steps, which yields a more robust model than using the final weights alone. | 3D ResNet-18 | 27.8 (Kinetics100 - 5%); 17.5 (UCF101 - 5%); 27.2 (HMDB51 - 40%) | None |
| PL* [62] | The model's own predictions on unlabeled samples are reused as labels (pseudo-labels) to train on those same samples. | 3D ResNet-18 | 27.8 (Kinetics100 - 5%); 17.6 (UCF101 - 5%); 27.3 (HMDB51 - 40%) | None |
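The MT entry relies on a teacher whose weights average the student's weights over training. The sketch below illustrates that idea under simplifying assumptions: parameters are stored as a plain dictionary of NumPy arrays (a hypothetical stand-in for a 3D ResNet-18's weights), the decay `alpha` is an illustrative value, and the consistency term is a mean squared error between class probabilities. It is not the re-implementation used by [46].

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Overwrite each teacher parameter with an exponential moving average:
    theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student.
    `teacher` and `student` are hypothetical dicts of name -> np.ndarray."""
    for name, s_param in student.items():
        teacher[name] = alpha * teacher[name] + (1.0 - alpha) * s_param
    return teacher

def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher class probabilities,
    used to pull the student's predictions toward the averaged teacher."""
    return float(np.mean((student_probs - teacher_probs) ** 2))

if __name__ == "__main__":
    student = {"w": np.array([1.0, 2.0])}
    teacher = {"w": np.zeros(2)}
    ema_update(teacher, student, alpha=0.9)
    print(teacher["w"])  # approximately [0.1, 0.2]
```

Because the teacher changes only slowly, its predictions are smoother than those of any single student checkpoint, which is the intuition behind the robustness claim in the table.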

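The PL and FixMatch entries differ mainly in where the pseudo-label comes from. The minimal sketch below contrasts them on clip-level logits, assuming the video augmentations have already been applied elsewhere (the augmentation functions are not shown) and using a 0.95 confidence threshold as an illustrative default. In PL, a clip's own argmax prediction becomes its target; in FixMatch, the prediction on a weakly augmented clip supervises the strongly augmented version of that clip, and only confident predictions contribute.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def pseudo_label_loss(logits):
    """Simplified Pseudo-Label: the model's argmax prediction on each
    unlabeled clip is used as the hard target for that same clip."""
    targets = logits.argmax(axis=1)
    ce = -log_softmax(logits)[np.arange(len(targets)), targets]
    return float(ce.mean())

def fixmatch_loss(weak_logits, strong_logits, threshold=0.95):
    """FixMatch-style unlabeled loss: hard pseudo-labels from the weak view
    supervise the strong view, masked by weak-view confidence. The loss is
    averaged over the whole unlabeled batch, masked clips contributing 0."""
    weak_probs = np.exp(log_softmax(weak_logits))  # treated as constants
    targets = weak_probs.argmax(axis=1)
    mask = (weak_probs.max(axis=1) >= threshold).astype(float)
    ce = -log_softmax(strong_logits)[np.arange(len(targets)), targets]
    return float((ce * mask).mean())

if __name__ == "__main__":
    weak = np.array([[10.0, 0.0], [0.0, 0.1]])    # 2nd clip is not confident
    strong = np.array([[0.0, 0.0], [5.0, -5.0]])
    print(fixmatch_loss(weak, strong))             # only the 1st clip counts
    print(pseudo_label_loss(strong))
```

The confidence mask is what separates FixMatch from plain pseudo-labeling here: low-confidence clips are ignored instead of being trained on possibly wrong labels.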