An overview of human-machine collaborative video compression methods in the literature. MBID, MBHD, and SBMD denote multi-bitstream independent decoding, multi-bitstream hierarchical decoding, and single-bitstream multi-head decoding, respectively.
| Category | Author | Presented Task | Core Method |
|---|---|---|---|
| MBID | [50] | Video Retrieval | Feature extraction + CDVS + CNN |
| | [211] | Video Retrieval | Rate-accuracy optimization + affine motion compensation |
| | [10] | Class Identification, Object Recognition | Multiple autoencoders |
| MBHD | [197] | Action Recognition | Conditional deep generation network |
| | [82] | Action Recognition | Semantic information + feature laddering framework |
| | [114] | Object Detection | Conditional semantic compression + interlayer frame prediction |
| | [64] | Object Detection | End-to-end learnable video codec + conditional coding |
| | [39] | Object Detection | Conventional + DNN video compression |
| | [85] | Action Recognition | Learned semantic representation + end-to-end optimization |
| | [93] | Object Detection, Pose Estimation, Action Recognition, Object Segmentation | Static object characteristics + dynamic motion clues |
| | [170] | Action Recognition, Multiple Object Tracking, Object Segmentation | Traditional codec + DNN |
| | [171] | Action Recognition, Multiple Object Tracking, Object Segmentation | Semantic-Mining-then-Compensation + masked image modeling |
| | [4] | Object Detection | Cuboidal feature descriptor |
| SBMD | [207] | Action Recognition | Task-driven optimization |
| | [160] | Action Recognition, Object Detection, Object Tracking, Object Segmentation | Temporal context + cross-domain motion |