A summary of common small-scale datasets used for action recognition, from 2011 to the present.
| Dataset | Description | #classes | Samples | Download |
|---|---|---|---|---|
| HMDB51 [59] | - At least 1s / video. - Single activity / video. | 51 | 6,849 | Link |
| UCF50 [88] | - Realistic videos from YouTube. - Single activity / video. | 50 | 6,676 | Link |
| UCF101 [98] | - At least 1.06s/video. - Single activity / video. | 101 | 13,320 | Link |
| ActivityNet [9] | - Large-scale video. - 1.41 activity instances / video. | 203 | 27,811 | Link |
| Hollywood2 [72] | - 19.7s/video on average. - Action videos and scene videos. | 22 | 3,669 | Link |
| MSR-Action3D [64] | - An action dataset of depth sequences captured by a depth camera. | 20 | — | Link |
| MSR-Daily Activity 3D [112] | - A daily activity dataset captured by a Kinect device camera. - An activity is performed in either “sitting on sofa” or “standing” pose. | 12 | 320 | Link |
| ASLAN [55] | - Focus on action similarity. | 432 | 3,697 | Link |
| RGBD-HuDaAct [80] | - Synchronized color-depth video streams. - 30s-150s/video. | 16 | 1,189 | Link |
| Charades [93] | - Video action classification. - 6.8 actions/video. | 157 | 9,848 | Link |