Human-Machine Collaborative Image and Video Compression: A Survey

[2]

Agustsson

Mentzer

Tschannen

Cavigelli

Timofte

Benini

, and

L. V.

Gool

, “

Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

”, in

Proceedings of Advances in Neural Information Processing Systems

2017

1141

–

1151

[3]

Agustsson

Tschannen

Mentzer

Timofte

, and

L. V.

Gool

, “

Generative adversarial networks for extreme learned image compression

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2019

221

–

231

[4]

Ahmmed

Paul

Murshed

, and

Taubman

, “

Human-machine collaborative video coding through cuboidal partitioning

”, in

2021 IEEE International Conference on Image Processing

2021

2074

–

2078

[5]

Akbari

Liang

, and

Han

, “

DSSLIC: Deep semantic segmentation-based layered image compression

”, in

ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing

2019

2042

–

2046

[6]

Akyazi

and

Ebrahimi

, “

Learning-based image compression using convolutional autoencoder and wavelet decomposition

”, in

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops

2019

[7]

Alexandre

C.-P.

Chang

W.-H.

Peng

, and

H.-M.

Hang

, “

An autoencoder-based learned image compressor: Description of challenge proposal by nctu

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

2018

2539

–

2542

[8]

S. R.

Alvar

and

I. V.

Bajić

, “

Multi-task learning with compressible features for collaborative intelligence

”, in

Proceedings of the IEEE International Conference on Image Processing

2019

1705

–

1709

[9]

S. R.

Alvar

Choi

, and

I. V.

Bajic

, “

Can you tell a face from a HEVC bitstream?

”, in

2018 IEEE Conference on Multimedia Information Processing and Retrieval

2018

257

–

261

[10]

Antonio

Faria

L. M.

Tavora

Navarro

, and

Assuncao

, “

Learning-based compression of visual objects for smart surveillance

”, in

2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)

2022

–

[11]

Ascenso

Alshina

, and

Ebrahimi

, “

The jpeg ai standard: Providing efficient human and machine visual data consumption

”, in

IEEE Multimedia

, Vol.

, No.

2023

100

–

111

[12]

Ayzik

and

Avidan

, “

Deep image compression using decoder side information

”, in

Proceedings of Computer Vision-ECCV 2020: 16th European Conference

2020

699

–

714

[13]

Bai

and

Urtasun

, “

Deep watershed transform for instance segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5221

–

5229

[14]

Bai

Yang

Liu

Jiang

Wang

, and

Gao

, “

Towards end-to-end image compression and analysis with transformers

”, in

Proceedings of the AAAI conference on artificial intelligence

, Vol.

, No.

2022

104

–

112

[15]

Ballé

Laparra

, and

E. P.

Simoncelli

, “

End-to-end optimized image compression

”, in

arXiv preprint arXiv:1611.01704

2016

[16]

Bansal

Kumar

Sachdeva

, and

Mittal

, “

Transfer learning for image classification using VGG19: Caltech-101 image data set

”, in

Journal ofambient intelligence and humanized computing

2023

–

[17]

Báscones

González

, and

Mozos

, “

Hyperspectral image compression using vector quantization, PCA and JPEG2000

”, in

Remote Sensing

, Vol.

, No.

2018

907

[18]

Bay

Tuytelaars

, and

L. V.

Gool

, “

SURF: Speeded up robust features

”, in

Computer Vision—ECCV

2006

404

–

417

[19]

Bolya

Zhou

Xiao

, and

Y. J.

Lee

, “

Yolact: Real-time instance segmentation

”, in

Proceedings of the IEEE international conference on computer vision

2019

9157

–

9166

[20]

Bossen

Bross

Suhring

, and

Flynn

, “

HEVC complexity and implementation analysis

”, in

IEEE Transactions on circuits and Systems for Video Technology

, Vol.

, No.

2012

1685

–

1696

[21]

Bross

Y.-K.

Wang

Liu

Chen

G. J.

Sullivan

, and

J.-R.

Ohm

, “

Overview of the versatile video coding (VVC) standard and its applications

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2021

3736

–

3764

[22]

Buades

Coll

, and

J.-M.

Morel

, “

A non-local algorithm for image denoising

”, in

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition

, Vol.

2005

–

[23]

Cai

Cao

, and

Zhang

, “

Learning a single tucker decomposition network for lossy image compression with multiple bits-per-pixel rates

”, in

IEEE Transactions on Image Processing

, Vol.

2020

3612

–

3625

[24]

Cao

Yao

Zhang

Jin

Zhang

, and

B. W.-K.

Ling

, “

Slimmable multi-task image compression for human and machine vision

”, in

IEEE Access

, Vol.

2023

29946

–

[25]

H.-S.

Chang

Learned-Miller

, and

McCallum

, “

Active bias: Training more accurate neural networks by emphasizing high variance samples

”, in

Advances in Neural Information Processing Systems

2017

1002

–

1012

[26]

Y.-H.

Chen

Y.-C.

Weng

C.-H.

Kao

Chien

W.-C.

Chiu

, and

W.-H.

Peng

, “

Transtic: Transferring transformer-based image compression from human perception to machine perception

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2023

23297307

[27]

Chen

Yao

Liu

, and

Zhao

, “

An End-to-End Mutual Enhancement Network Toward Image Compression and Semantic Segmentation

”, in

Chinese Conference on Pattern Recognition and Computer Vision

2021

623

–

635

[28]

L.-C.

Chen

Papandreou

Kokkinos

Murphy

, and

A. L.

Yuille

, “

Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

, Vol.

, No.

2017

834

–

848

[29]

L.-C.

Chen

Zhu

Papandreou

Schroff

, and

Adam

, “

Encoderdecoder with atrous separable convolution for semantic image segmentation

”, in

Proceedings of the European Conference on Computer Vision

2018

801

–

818

[30]

Chen

Jin

Meng

Lin

Chen

T.-S.

Chang

, and

Zhang

, “

A new image codec paradigm for human and machine uses

”, in

arXiv preprint arXiv:2112.10071

2021

[31]

Chen

Liu

Shen

Cao

, and

Wang

, “

End-to-End Learnt Image Compression via Non-Local Attention Optimization and Improved Context Modeling

”, in

IEEE Transactions on Image Processing

, Vol.

2021

3179

–

3191

[32]

Chen

Murherjee

Han

Grange

Liu

Parker

Chen

Joshi

, et al., “

An overview of core coding tools in the AV1 video codec

”, in

2018 picture coding symposium

2018

–

[33]

Chen

L.-Y.

Duan

Wang

Lin

, and

A. C.

Kot

, “

Data representation in hybrid coding framework for feature maps compression

”, in

Proceedings of the IEEE International Conference on Image Processing

2020

3094

–

3098

[34]

Chen

Fan

Wang

L.-Y.

Duan

Lin

, and

Kot

, “

Lossy intermediate deep learning feature compression and evaluation

”, in

ACM Trans. Multimedia

2019

2414

–

2422

[35]

Cheng

Sun

Takeuchi

, and

Katto

, “

Deep Convolutional AutoEncoder-based Lossy Image Compression

”, in

Proceedings of 2018 Picture Coding Symposium

2018

253

–

257

[36]

Cheng

Sun

Takeuchi

, and

Katto

, “

Learned image compression with discretized gaussian mixture likelihoods and attention modules

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2020

7939

–

7948

[37]

Choi

and

I. V.

Bajić

, “

Latent-space scalability for multi-task collaborative intelligence

”, in

2021 IEEE International Conference on Image Processing

2021

3562

–

3566

[38]

Choi

and

I. V.

Bajić

, “

Scalable image coding for humans and machines

”, in

IEEE Transactions on Image Processing

, Vol.

2022

273954

[39]

Choi

and

I. V.

Bajić

, “

Scalable video coding for humans and machines

”, in

2022 IEEE 2Ąth International Workshop on Multimedia Signal Processing

2022

–

[40]

Choi

El-Khamy

, and

Lee

, “

Variable rate deep image compression with a conditional autoencoder

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2019

3146

–

3154

[41]

Cordts

Omran

Ramos

Rehfeld

Enzweiler

Benenson

Franke

Roth

, and

Schiele

, “

The cityscapes dataset for semantic urban scene understanding

”, in

Proceedings of the IEEE conference on computer vision and pattern recognition

2016

3213

–

3223

[42]

Covell

Johnston

Minnen

S. J.

Hwang

Shor

Singh

Vincent

, and

Toderici

, “

Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

”, in

arXiv preprint arXiv:1705.06687

2017

[43]

Cui

Wang

Gao

Guo

Feng

, and

Bai

, “

Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2021

10532

–

[44]

Dai

, and

Sun

, “

R-FCN: Object detection via region-based fully convolutional networks

”, in

Proceedings of the Advances in Neural Information Processing Systems

2016

379

–

387

[45]

Dendorfer

Rezatofighi

Milan

Shi

Cremers

Reid

Roth

Schindler

, and

Leal-Taixé

, “

Mot20: A benchmark for multi object tracking in crowded scenes

”, in

arXiv preprint arXiv:2003.09003

2020

[46]

Deng

Dong

Socher

L.-J.

, and

Fei-Fei

, “

Imagenet: A large-scale hierarchical image database

”, in

2009 IEEE conference on computer vision and pattern recognition

2009

248

–

255

[47]

Ding

Wang

Fan

, and

Gong

, “

A semi-supervised two-stage approach to learning from noisy labels

”, in

2018 IEEE Winter Conference on Applications of Computer Vision

2018

1215

–

1224

[48]

Dong

and

W. D.

Pan

, “

A survey on compression domain image and video data processing and analysis techniques

”, in

Information

, Vol.

, No.

2023

184

[49]

L.-Y.

Duan

Chandrasekhar

Chen

Lin

Wang

Huang

Girod

, and

Gao

, “

Overview of the MPEG-CDVS standard

”, in

IEEE Transactions on Image Processing

, Vol.

, No.

2015

179

–

194

[50]

L.-Y.

Duan

Lou

Bai

Huang

Gao

Chandrasekhar

Lin

Wang

, and

A. C.

Kot

, “

Compact descriptors for video analysis: The emerging MPEG standard

”, in

IEEE MultiMedia

, Vol.

, No.

2018

–

[51]

Duan

Liu

Yang

Huang

, and

Gao

, “

Video coding for machines: A paradigm of collaborative compression and intelligent analytics

”, in

IEEE Transactions on Image Processing

, Vol.

2020

8680

–

8695

[52]

Dumas

Roumy

, and

Guillemot

, “

Autoencoder Based Image Compression: Can the Learning be Quantization Independent?

”, in

Proceedings of2018 IEEE International Conference on Acoustics, Speech and Signal Processing

2018

1188

–

1192

[53]

Dumas

Roumy

, and

Guillemot

, “

Image compression with Stochastic Winner-Take-All Auto-Encoder

”, in

Proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing

2017

1512

–

1516

[54]

Fang

Duan

Tao

, and

, “

Sketch assisted face image coding for human and machine vision: a joint training approach

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2023

[55]

Fang

Shen

Wang

, and

Jin

, “

Priors guided extreme underwater image compression for machine vision and human vision

”, in

IEEE Journal of Oceanic Engineering

2023

[56]

Feng

Gao

Jin

Feng

, and

Chen

, “

Semantically structured image compression via irregular group-based decoupling

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2023

17237

–

[57]

C.-Y.

Liu

Ranga

Tyagi

, and

A. C.

Berg

, “

DSSD: Decon-volutional Single Shot Detector

”, in

arXiv preprint arXiv:1701.06659

2017

[58]

Galteri

Seidenari

Bertini

, and

A. D.

Bimbo

, “

Deep Generative Adversarial Compression Artifact Removal

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

4826

–

4835

[59]

Gao

Liu

, and

, “

Towards task-generic image compression: A study of semantics-oriented metrics

”, in

IEEE Transactions on Multimedia

, Vol.

2021

721

–

735

[60]

Gregor

Besse

D. J.

Rezende

Danihelka

, and

Wierstra

, “

Towards Conceptual Compression

”, in

Proceedings of Advances In Neural Information Processing Systems

2016

3549

–

3557

[61]

Guo

Huang

Zhang

Zhuang

Dong

M. R.

Scott

, and

Huang

, “

Curriculumnet: Weakly supervised learning from large-scale web images

”, in

Proceedings of the European Conference on Computer Vision

2018

135

–

150

[62]

Guo

Chen

Zhao

Zhang

, and

Duan

, “

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

1431

–

1442

[63]

Guo

Zhang

Feng

, and

Chen

, “

Causal Contextual Prediction for Learned Image Compression

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2021

[64]

Hadizadeh

and

I. V.

Bajić

, “

Learned Scalable Video Coding For Humans and Machines

”, in

arXiv preprint arXiv:2307.08978

2023

[65]

Han

I. W.

Tsang

Chen

P. Y.

Celina

, and

S.-F.

Fung

, “

Progressive stochastic learning for noisy labels

”, in

IEEE transactions on neural networks and learning systems

, No.

2018

–

[66]

Han

Yao

Niu

Tsang

, and

Sugiyama

, “

Co-teaching: Robust training of deep neural networks with extremely noisy labels

”, in

Advances in Neural Information Processing Systems

2018

8527

–

8537

[67]

Han

Zhang

Cheng

Liu

, and

, “

Advanced deep-learning techniques for salient and category-specific object detection: A survey

”, in

IEEE Signal Processing Magazine

, Vol.

, No.

2018

–

100

[68]

Hayder

, and

Salzmann

, “

Boundary-aware instance segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5696

–

5704

[69]

Gkioxari

Dollar

, and

Girshick

, “

Mask R-CNN

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

2961

–

2969

[70]

Zhang

Ren

, and

Sun

, “

Deep residual learning for image recognition

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2016

770

–

778

[71]

Zhang

Lin

, and

Han

, “

Enhancing HEVC Compressed Videos with a Partition-Masked Convolutional Neural Network

”, in

Proceedings of 2018 25th IEEE International Conference on Image Processing

2018

216

–

220

[72]

T. M.

Hoang

Zhou

, and

Fan

, “

Image compression with encoder-decoder matched semantic segmentation

”, in

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops

2020

160

–

161

[73]

Hong

Zhang

, and

Huang

, “

Fine-grained feature generation for generalized zero-shot video classification

”, in

IEEE Transactions on Image Processing

, Vol.

2023

1599

–

1612

[74]

Hong

Zhang

, and

Huang

, “

Multi-modal multi-grained embedding learning for generalized zero-shot video classification

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2023

[75]

Hore

and

Ziou

, “

Image quality metrics: PSNR vs. SSIM

”, in

2010 20th international conference on pattern recognition

2010

2366

–

2369

[76]

Hou

Zheng

, and

Gould

, “

Learning to structure an image with few colors

”, in

Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition

2020

10113

–

[77]

Yang

, and

Liu

, “

Progressive Spatial Recurrent Neural Network for Intra Prediction

”, in

IEEE Transactions on Multimedia

, Vol.

, No.

2019

3024

–

3037

[78]

Yang

, and

Liu

, “

Coarse-to-fine hyper-prior modeling for learned image compression

”, in

Proceedings of the AAAI Conference on Artificial Intelligence

, Vol.

, No. 0

2020

11013

–

[79]

Yang

, and

Liu

, “

Learning end-to-end lossy image compression: A benchmark

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

2021

[80]

Yang

L.-Y.

Duan

, and

Liu

, “

Towards coding for human and machine vision: A scalable image coding approach

”, in

2020 IEEE International Conference on Multimedia and Expo

2020

–

[81]

Huang

Liu

L. V. D.

Maaten

, and

K. Q.

Weinberger

, “

Densely connected convolutional networks

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

4700

–

4708

[82]

Huang

Yang

Xiang

Liu

, and

L.-Y.

Duan

, “

Collaborative scalable visual compression for human-centered videos

”, in

2022 IEEE International Symposium on Circuits and Systems

2022

2988

–

2992

[83]

Huang

et al., “

Speed/accuracy trade-offs for modern convolutional object detectors

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

7310

–

7311

[84]

Huang

Jia

, and

Zhao

, “

O2u-net: A simple noisy label detection approach for deep neural networks

”, in

Proceedings of the IEEE International Conference on Computer Vision

2019

3326

–

3334

[85]

Huang

Jia

Wang

, and

, “

Hmfvc: A human-machine friendly video compression scheme

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2022

[86]

Ikusan

and

Dai

, “

Deep Feature Compression with Rate-Distortion Optimization for Networked Camera Systems

”, in

Proceedings of the 14th Conference on ACM Multimedia Systems

2023

–

[87]

D. J.

C. D.

Kim

Jiang

, and

Memisevic

, “

Generating images with recurrent adversarial networks

”, in

CoRR, vol. abs/1602.05110

2016

[88]

Islam

L. M.

Dang

Lee

, and

Moon

, “

Image Compression With Recurrent Neural Network and Generalized Divisive Normalization

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2021

1875

–

1879

[89]

Jamil

M. J.

Piran

Rahman

, and

O.-J.

Kwon

, “

Learning-driven lossy image compression: A comprehensive survey

”, in

Engineering Applications of Artificial Intelligence

, Vol.

123

2023

106361

[90]

Jégou

Douze

Schmid

, and

Pérez

, “

Aggregating local descriptors into a compact image representation

”, in

Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition

2010

3304

–

3311

[91]

Jiang

Wang

Chen

, and

Liu

, “

Hyperspectral image classification in the presence of noisy labels

”, in

IEEE Transactions on Geoscience and Remote Sensing

, Vol.

, No.

2019

851

–

865

[92]

Jiang

Zhou

Leung

L.-J.

, and

Fei-Fei

, “

Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels

”, in

International Conference on Machine Learning

2018

2304

–

2313

[93]

Jin

Feng

Sun

Feng

, and

Chen

, “

Semantical video coding: Instill static-dynamic clues into structured bitstream for ai tasks

”, in

Journal of Visual Communication and Image Representation

, Vol.

2023

103816

[94]

Johnston

Vincent

Minnen

Covell

Singh

Chinen

S. J.

Hwang

Shor

, and

Toderici

, “

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2018

4385

–

4393

[95]

Kang

Tripathi

, and

T. Q.

Nguyen

, “

Toward Joint Image Generation and Compression using Generative Adversarial Networks

”, in

arXiv preprint arXiv:1901.07838

2019

[96]

A. C.

Karaca

and

M. K.

Güllü

, “

Target preserving hyperspectral image compression using weighted PCA and JPEG2000

”, in

Proceedings of International Conference on Image and Signal Processing

2018

508

–

516

[97]

Khalaf

Zaghar

, and

Hashim

, “

Enhancement of Curve-Fitting Image Compression Using Hyperbolic Function

”, in

Symmetry

, Vol.

2019

291

[98]

Kodak

, “

Kodak Lossless True Color Image Suite (PhotoCD PCD0992)

”,

1992

[99]

Kong

, and

Zhao

, “

Spectral-Spatial Feature Partitioned Extraction Based on CNN for Multispectral Image Compression

”, in

Remote Sensing

, Vol.

, No.

2020

[100]

Kong

Sun

Yao

Liu

, and

Chen

, “

RON: Reverse connection with objectness prior networks for object detection

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5936

–

5944

[101]

A. B. L.

Larsen

S. K.

Sønderby

Larochelle

, and

Winther

, “

Autoen-coding beyond pixels using a learned similarity metric

”, in

Proceedings of The 33rd International Conference on Machine Learning

, Vol.

2016

1558

–

1566

[102]

Lee

Cho

, and

Kim

, “

An end-to-end joint learning scheme of image compression and quality enhancement with improved entropy minimization

”, in

arXiv preprint arXiv:1912.12817

2020

[103]

Lee

and

Park

, “

Centermask: Real-time anchor-free instance segmentation

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2020

13906

–

[104]

Lei

Duan

Hong

J. F.

Mota

Shi

, and

C.-X.

Wang

, “

Progressive deep Image compression for hybrid contexts of image classification and reconstruction

”, in

IEEE Journal on Selected Areas in Communications

, Vol.

, No.

2022

–

[105]

Lei

Hong

Shi

Lin

, and

Xia

, “

Quantization-Based Adaptive Deep Image Compression Using Semantic Information

”, in

IEEE Access

2023

[106]

“

SSD: Single shot multibox detector

”, in

Computer Vision-ECCV

, ed.

Leibe

Matas

Sebe

, and

Welling

2016

–

[107]

and

Liu

, “

Multispectral transforms using convolution neural networks for remote sensing multispectral image compression

”, in

Remote Sensing

, Vol.

, No.

2019

759

[108]

Zhang

Zuo

Timofte

, and

Zhang

, “

Learning Context-Based Nonlocal Entropy Modeling for Image Compression

”, in

IEEE Transactions on Neural Networks and Learning Systems

2021

[109]

Sun

Zhao

Yuan

, and

Liu

, “

Deep Image Compression with Residual Learning

”, in

Applied Sciences

, Vol.

, No.

2020

4023

[110]

Chen

Dou

C.-W.

, and

P.-A.

Heng

, “

HDenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes

”, in

IEEE Transactions on Medical Imaging

, Vol.

, No.

2018

2663

–

2674

[111]

Jia

Wang

Zhang

Wang

, and

Gao

, “

Joint rate-distortion optimization for simultaneous texture and deep feature compression of facial images

”, in

2018 IEEE fourth international conference on multimedia big data (BigMM)

2018

–

[112]

Peng

Zhang

Deng

, and

Sun

, “

DetNet: A backbone network for object detection

”, in

arXiv preprint arXiv:1804.06215

2018

[113]

Lin

Milan

Shen

, and

Reid

, “

RefineNet: Multi-path refinement networks for high-resolution semantic segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

1925

–

1934

[114]

Lin

Chen

Zhang

Lin

Wang

, and

Zhao

, “

DeepSVC: Deep scalable video coding for both machine and human vision

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

9205

–

9214

[115]

T.-Y.

Lin

Dollar

Girshick

Hariharan

, and

Belongie

, “

Feature pyramid networks for object detection

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

2117

–

2125

[116]

T.-Y.

Lin

Goyal

Girshick

, and

Dollar

, “

Focal loss for dense object detection

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

2980

–

2988

[117]

T.-Y.

Lin

Maire

Belongie

Hays

Perona

Ramanan

Dollar

, and

C. L.

Zitnick

, “

Microsoft coco: Common objects in context

”, in

Computer Vision-ECCV 2014·’ 13th European Conference

Zurich, Switzerland

September 6-12, 2014

Proceedings, Part V 13, 2014

74055

[118]

Liu

Wen

Fan

C. C.

Loy

, and

T. S.

Huang

, “

Non-local recurrent network for image restoration

”, in

Neural Information Processing Systems

2018

1673

–

1682

[119]

Liu

Chen

Shen

Yue

, and

, “

Deep Image Compression via End-to-End Learning

”, in

Proceedings of Computer Vision Pattern Recognition

, Vol.

2018

[120]

Liu

Sun

, and

Katto

, “

Improving multiple machine vision tasks in the compressed domain

”, in

2022 26th International Conference on Pattern Recognition

2022

331

–

337

[121]

Liu

Yan

, and

, “

Semantics-to-signal scalable image compression with learned revertible representations

”, in

International Journal of Computer Vision

, Vol.

129

, No.

2021

260521

[122]

Liu

Chen

, and

, “

Icmh-net: Neural image compression towards both machine vision and human vision

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

8047

–

8056

[123]

Liu

Chen

Liu

Wang

, and

Shen

, “

2C-Net: integrate image compression and classification via deep neural network

”, in

Multimedia Systems

, Vol.

, No.

2023

945

–

959

[124]

Liu

et al., “

Receptive field block net for accurate and fast object detection

”, in

Proceedings of the European Conference on Computer Vision

2018

385

–

400

[125]

D. G.

Lowe

, “

Distinctive image features from scale-invariant keypoints

”, in

Int. J. Comput. Vis.

Vol.

, No.

2004

–

110

[126]

Luengo

S.-O.

Shim

Alshomrani

Altalhi

, and

Herrera

, “

Cnc-nos: Class noise cleaning by ensemble filtering and noise scoring

”, in

Knowledge-Based Systems

, Vol.

140

2018

–

[127]

Lyu

and

I. W.

Tsang

, “

Curriculum Loss: Robust Learning and Generalization against Label Corruption

”, in

arXiv preprint arXiv:1905.10045

2019

[128]

Zhang

Wang

Zhang

Jia

, and

Wang

, “

Joint feature and texture coding: Toward smart video representation via front-end intelligence

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2018

3095

–

3105

[129]

Zhang

Jia

Zhao

Wang

, and

Wang

, “

Image and video compression with neural networks: A review

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2019

1683

–

1698

[130]

Malach

and

Shalev-Shwartz

, “

Decoupling” when to update” from” how to update”

”, in

Advances in Neural Information Processing Systems

2017

960

–

970

[131]

Mao

Wang

Chen

Jin

, and

, “

Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

”, in

IEEE Transactions on Image Processing

2023

[132]

Mao

Chen

Wang

, and

, “

Peering into the sketch: Ultra-low bitrate face compression for joint human and machine perception

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

2564

–

2572

[133]

M. W.

Marcellin

M. J.

Gormish

Bilgin

, and

M. P.

Boliek

, “

An overview of JPEG-2000

”, in

Proceedings DCC 2000. Data compression conference

2000

523

–

541

[134]

F. D.

Martino

Perfilieva

, and

Sessa

, “

A Fast Multilevel Fuzzy Transform Image Compression Method

”, in

Axioms

, Vol.

, No.

2019

135

[135]

F. D.

Martino

and

Sessa

, “

Multi-level fuzzy transforms image compression

”, in

Journal of Ambient Intelligence and Humanized Computing

, Vol.

, No.

2019

2745

–

2756

[136]

Mercat

Viitanen

, and

Vanne

, “

UVG dataset: 50/120fps 4K sequences for video codec analysis and development

”, in

Proceedings of the 11th ACM Multimedia Systems Conference

2020

297

–

302

[137]

Minnen

Balle

, and

G. D.

Toderici

, “

Joint autoregressive and hierarchical priors for learned image compression

”, in

Proceedings of Advances in Neural Information Processing Systems

2018

10771

–

[138]

Minnen

and

Singh

, “

Channel-Wise Autoregressive Entropy Models for Learned Image Compression

”, in

Proceedings of 2020 IEEE International Conference on Image Processing

2020

3339

–

3343

[139]

D. T.

Nguyen

C. K.

Mummadi

T. P. N.

Ngo

T. H. P.

Nguyen

Beggel

, and

Brox

, “

SELF: Learning to Filter Noisy Labels with Self-Ensembling

”, in

arXiv preprint arXiv:1910.01842

2019

[140]

D. T.

Nguyen

T.-P.-N.

Ngo

Lou

Klar

Beggel

, and

Brox

, “

Robust Learning Under Label Noise With Iterative Noise-Filtering

”, in

arXiv preprint arXiv:1906.00216

2019

[141]

C. G.

Northcutt

, and

I. L.

Chuang

, “

Learning with confident examples: Rank pruning for robust classification with noisy labels

”, in

Uncertainty in Artificial Intelligence - Proceedings of the 33rd Conference, UAI 2017

2017

[142]

J.-R.

Ohm

G. J.

Sullivan

Schwarz

T. K.

Tan

, and

Wiegand

, “

Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC)

”, in

IEEE Transactions on circuits and systems for video technology

, Vol.

, No.

2012

1669

–

1684

[143]

V. A.

de Oliveira

Chabert

Oberlin

Poulliat

Bruno

Latry

Carlavan

Henrot

Falzon

, and

Camarero

, “

Reduced-Complexity End-to-End Variational Autoencoder for on Board Satellite Image Compression

”, in

Remote Sensing

, Vol.

, No.

2021

447

[144]

Ollivier

, “

Auto-encoders: reconstruction versus compression

”,

2014

[145]

A. G.

Ororbia

Mali

O’Connell

Dreese

Miller

, and

C. L.

Giles

, “

Learned Neural Iterative Decoding for Lossy Image Compression Systems

”, in

Proceedings of 2019 Data Compression Conference

2019

–

[146]

Ouyang

Wang

Zhu

, and

Wang

, “

Chained cascade network for object detection

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

1956

–

1964

[147]

Paszke

Chaurasia

Kim

, and

Culurciello

, “

Enet: A deep neural network architecture for real-time semantic segmentation

”, in

arXiv preprint arXiv:1606.02147

2016

[148]

Perronnin

Liu

Sánchez

, and

Poirier

, “

Large-scale image retrieval with compressed fisher vectors

”, in

Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition

2010

338491

[149]

P. O.

Pinheiro

Collobert

, and

Dollár

, “

Learning to segment object candidates

”, in

Advances in Neural Information Processing Systems

2015

1990

–

1998

[150]

Prakash

Moran

Garber

DiLillo

, and

Storer

, “

Semantic Perceptual Image Compression using Deep Convolution Networks

”, in

Proceedings of2017 Data Compression Conference

2017

250

–

259

[151]

Punnappurath

and

M. S.

Brown

, “

Learning raw image reconstruction-aware deep image compressors

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

, Vol.

, No.

2020

1013

–

1019

[152]

Purkait

Zhao

, and

Zach

, “

SPP-net: Deep absolute pose regression with synthetic views

”, in

arXiv preprint arXiv:1712.03Ą52

2017

[153]

S. K.

Raman

Ramesh

Naganoor

Dash

Kumaravelu

, and

Lee

, “

CompressNet: Generative Compression at Extremely Low Bitrates

”, in

Proceedings of The IEEE Winter Conference on Applications of Computer Vision

2020

2325

–

2333

[154]

Reed

Lee

Anguelov

Szegedy

Erhan

, and

Rabinovich

, “

Training deep neural networks on noisy labels with bootstrapping

”, in

arXiv preprint arXiv:1Ą12.6596

2014

[155]

Rezatofighi

Tsoi

Gwak

Sadeghian

Reid

, and

Savarese

, “

Generalized intersection over union: A metric and a loss for bounding box regression

”, in

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

2019

658

–

666

[156]

Ronneberger

Fischer

, and

Brox

, “

U-net: Convolutional networks for biomedical image segmentation

”, in

International Conference on Medical Image Computing and Computer-assisted Intervention

2015

234

–

241

[157]

Sento

, “

Image Compression with Auto-encoder Algorithm using Deep Neural Network (DNN)

”, in

Proceedings of2016 Management and Innovation Technology International Conference

2016

[158]

R. R.

Shamir

Duchin

Kim

Sapiro

, and

Harel

, “

Continuous dice coefficient: a method for evaluating probabilistic segmentations

”, in

arXiv preprint arXiv:1906.11031

2019

[159]

Shen

Liu

Y.-G.

Jiang

Chen

, and

Xue

, “

DSOD: Learning deeply supervised object detectors from scratch

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

1919

–

1927

[160]

Sheng

Liu

, and

, “

VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

2024

[161]

Singh

Abu-El-Haija

Johnston

Ballé

Shrivastava

, and

Toderici

, “

End-to-end learning of compressible features

”, in

Proceedings of the IEEE International Conference on Image Processing

2020

3349

–

3353

[162]

Sivic

and

Zisserman

, “

Video Google: A text retrieval approach to object matching in videos

”, in

Proceedings of the IEEE International Conference on Computer Vision

2003

1470

–

1477

[163]

Song

Gao

Hanjalic

, and

T. H.

Shen

, “

Unified Binary Generative Adversarial Network for Image Retrieval and Compression

”, in

International Journal of Computer Vision

, Vol.

128

, No.

2020

2243

–

2264

[164]

Soomro

A. R.

Zamir

, and

Shah

, “

UCF101: A dataset of 101 human actions classes from videos in the wild

”, in

arXiv preprint arXiv:1212.0402

2012

[165]

Sun

, and

Chen

, “

Semantic structured image coding framework for multiple intelligent applications

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2020

3631

–

3642

[166]

Suzuki

Takagi

Hayase

Onishi

, and

Shimizu

, “

Image pre-transformation for recognition-aware image compression

”, in

Proceedings of the IEEE International Conference on Image Processing

2019

2686

–

2690

[167]

Suzuki

Takagi

Takeda

Tanida

, and

Kimata

, “

Deep feature compression with spatio-temporal arranging for collaborative intelligence

”, in

Proceedings of the IEEE International Conference on Image Processing

2020

3099

–

3103

[168]

Tellez

Litjens

van der Laak

, and

Ciompi

, “

Neural Image Compression for Gigapixel Histopathology Image Analysis

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

, Vol.

, No.

2021

567

–

578

[169]

Theis

Shi

Cunningham

, and

Huszar

, “

Lossy image compression with compressive autoencoders

”, in

Proceedings of International Conference on Learning Representations

2017

[170]

Tian

Yan

Zhai

Chen

, and

Gao

, “

A Coding Framework and Benchmark towards Low-Bitrate Video Understanding

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

2024

[171]

Tian

Zhai

, and

Gao

, “

Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2023

13610

–

[172]

Toderici

S. M.

O’Malley

S. J.

Hwang

Vincent

Minnen

Baluja

Covell

, and

Sukthankar

, “

Variable Rate Image Compression with Recurrent Neural Networks

”, in

CoRR

, vol.

abs/1511.06085

2016

[173]

Toderici

Vincent

Johnston

S. J.

Hwang

Minnen

Shor

, and

Covell

, “

Full Resolution Image Compression with Recurrent Neural Networks

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5306

–

5314

[174]

Toderici

Theis

Johnston

Agustsson

Mentzer

Ballé

Shi

, and

Timofte

, “

CLIC 2020: Challenge on Learned Image Compression

”,

Sections 2

2020

[175]

Torfason

Mentzer

Agustsson

Tschannen

Timofte

, and

L. V.

Gool

, “

Towards Image Understanding from Deep Compression Without Decoding

”, in

Proceedings of International Conference on Learning Representations

2018

[176]

Torfason

Mentzer

Agustsson

Tschannen

Timofte

, and

Van Gool

, “

Towards image understanding from deep compression without decoding

”, in

arXiv preprint arXiv:1803.06131

2018

[177]

Zhou

, and

, “

Semantic scalable image compression with cross-layer priors

”, in

Proceedings of the 29th ACM International Conference on Multimedia

2021

4044

–

4052

[178]

Vicente

Carreira

Agapito

, and

Batista

, “

Reconstructing PASCAL VOC

”, in

Proceedings of the IEEE conference on computer vision and pattern recognition

2014

–

[179]

Visin

Ciccone

Romero

Kastner

Cho

Bengio

Matteucci

, and

Courville

, “

Reseg: A recurrent neural network-based model for semantic segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

2016

–

[180]

G. K.

Wallace

, “

The JPEG still picture compression standard

”, in

Communications of the ACM

, Vol.

, No.

1991

–

[181]

Wang

Han

, and

Wang

, “

An End-to-End Deep Learning Image Compression Framework Based on Semantic Analysis

”, in

Applied Sciences

, Vol.

, No.

2019

3580

[182]

Wang

Chen

Yuan

Liu

Huang

Hou

, and

Cottrell

, “

Understanding convolution for semantic segmentation

”, in

Winter Conference on Applications of Computer Vision

2018

1451

–

1460

[183]

R. J.

Wang

, and

C. X.

Ling

, “

Pelee: A real-time object detection system on mobile devices

”, in

Proceedings of the Advances in Neural Information Processing Systems

2018

1963

–

1972

[184]

Wang

Yang

Zhang

Wang

, and

Gao

, “

Towards analysis-friendly face representation with scalable feature and texture compression

”, in

IEEE Transactions on Multimedia

, Vol.

2021

3169

–

3181

[185]

Wang

Zhang

Wang

, and

Gao

, “

Scalable facial image compression with deep feature reconstruction

”, in

2019 IEEE International Conference on Image Processing

2019

2691

–

2695

[186]

Wang

, and

, “

End-to-end compression towards machine vision: Network architecture design and optimization

”, in

IEEE Open Journal of Circuits and Systems

, Vol.

2021

675

–

685

[187]

Wang

Girshick

Gupta

, and

, “

Non-local neural networks

”, in

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition

2018

7794

–

7803

[188]

Wang

and

A. C.

Bovik

, “

A universal image quality index

”, in

IEEE Signal Processing Letters

, Vol.

, No.

2002

–

[189]

Wang

E. P.

Simoncelli

, and

A. C.

Bovik

, “

Multiscale structural similarity for image quality assessment

”, in

The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003

, Vol.

2003

1398

–

1402

[190]

Wang

, and

P. C.

Cosman

, “

Human-machine interaction-oriented image coding for resource-constrained visual monitoring in IoT

”, in

IEEE Internet of Things Journal

, Vol.

, No.

2022

16181

–

16195

[191]

Wiegand

G. J.

Sullivan

Bjontegaard

, and

Luthra

, “

Overview of the H.264/AVC video coding standard

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2003

560

–

576

[192]

Huang

, and

Shen

, “

A GAN-based tunable image compression system

”, in

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

2020

2334

–

2342

[193]

Sun

, and

Tan

, “

A light CNN for deep face representation with noisy labels

”, in

IEEE Transactions on Information Forensics and Security

, Vol.

, No.

2018

2884

–

2896

[194]

Zhang

Jin

, and

Chen

, “

Learned Block-based Hybrid Image Compression

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2021

[195]

Yang

, and

Huang

, “

Scalable image coding with enhancement features for human and machine

”, in

Multimedia Systems

, Vol.

, No.

2024

[196]

Wang

, and

Zhang

, “

End to End Scalable Image Coding for Machines

”, in

2023 3rd International Conference on Intelligent Communications and Computing

2023

217

–

221

[197]

Xia

Liang

Yang

L.-Y.

Duan

, and

Liu

, “

An emerging coding paradigm VCM: A scalable coding approach beyond feature and signal

”, in

2020 IEEE International Conference on Multimedia and Expo

2020

–

[198]

Xie

K. L.

Cheng

, and

Chen

, “

Enhanced invertible encoding for learned image compression

”, in

Proceedings of the 29th ACM International Conference on Multimedia

2021

162

–

170

[199]

Xue

and

, “

Attention Based Image Compression Post-Processing Convolutional Neural Network

”, in

Proceedings of CVPR Workshops

2019

[200]

Yacouby

and

Axman

, “

Probabilistic extension of precision, recall, and f1 score for more thorough evaluation of classification models

”, in

Proceedings of the first workshop on evaluation and comparison of NLP systems

2020

–

[201]

R. J.

Yadav

and

M. S.

Nagmode

, “

Compression of hyperspectral image using PCA-DCT technology

”, in

Proceedings of Innovations in Electronics and Communication Engineering

2018

269

–

277

[202]

Yan

Gao

Liu

, and

, “

SSSIC: semantics-to-signal scalable image coding with learned structural representations

”, in

IEEE Transactions on Image Processing

, Vol.

2021

8939

–

8954

[203]

Yan

I. W.

Tsang

Long

, and

Yang

, “

Robust semi-supervised learning through label aggregation

”, in

30th AAAI Conference on Artificial Intelligence, AAAI 2016

2016

2244

–

2250

[204]

Yang

Zhang

, and

Yang

, “

DenseASPP for semantic segmentation in street scenes

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2018

3684

–

3692

[205]

Yang

Huang

Liu

, and

A. C.

Kot

, “

Facial image compression via neural image manifold compression

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2023

[206]

Yang

Wang

, and

Tian

, “

Discernible image compression

”, in

ACM Trans. Multimedia

2020

1561

–

1569

[207]

Wang

Kwong

, and

C.-C. J.

Kuo

, “

Task-driven video compression for humans and machines: Framework design and optimization

”, in

IEEE Transactions on Multimedia

2022

[208]

T. M.

Zeegers

D. M.

Pelt

van Leeuwen

van Liere

, and

K. J.

Batenburg

, “

Task-Driven Learned Hyperspectral Data Reduction Using End-to-End Supervised Deep Learning

”, in

Journal of Imaging

, Vol.

, No.

2020

132

[209]

Zhang

Jia

Lei

Wang

, and

Gao

, “

Recent development of AVS video coding standard: AVS3

”, in

2019 Picture Coding Symposium

2019

–

[210]

Zhang

and

, “

Near-lossless L-infinity constrained Multi-rate Image Decompression via Deep Neural Network

”,

2018

[211]

Zhang

Wang

Zhang

Sun

, and

Gao

, “

A joint compression scheme of video feature descriptors and visual content

”, in

IEEE Transactions on Image Processing

, Vol.

, No.

2016

633

–

647

[212]

Zhang

Zhong

, and

, “

Residual non-local attention networks for image restoration

”, in

International Conference on Learning Representations

2019

[213]

Zhang

Jia

Chang

, and

, “

Machine perception-driven image compression: A layered generative approach

”, in

arXiv preprint arXiv:2304.06896

2023

[214]

Zhang

Zhu

Jiang

Kwong

, and

C.-C. J.

Kuo

, “

A survey on perceptually optimized video coding

”, in

ACM Computing Surveys

, Vol.

, No.

2023

–

[215]

Zhao

Shi

Wang

, and

Jia

, “

Pyramid scene parsing network

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

2881

–

2890

[216]

Z.-Q.

Zhao

Zheng

S.-T.

, and

, “

Object detection with deep learning: A review

”, in

IEEE Transactions on Neural Networks and Learning Systems

, Vol.

, No.

2019

3212

–

3232

[217]

Zheng

Deng

, and

, “

Cross-age LFW: A database for studying cross-age face recognition in unconstrained environments

”, in

arXiv preprint arXiv:1708.08197

2017

[218]

Zhou

Cai

Gao

, and

, “

Variational Autoencoder for Low Bit-rate Image Compression

”, in

Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition Workshops

2018

2024

H. Li, X. Zhang, S. Wang, S. Wang and J. Pan

Figure 1

Different human-machine collaborative image compression frameworks. E represents the encoder, D represents the decoder, E_h and D_h represent the human visual compression codec, E_m and D_m represent the machine vision task codec, b represents the bitstream, x represents the original image, $\hat{x}$ represents the reconstructed image, and ê represents the features used for machine vision tasks.

Figure 2

Block diagram: the original image is passed to interest point detection, followed by local feature selection and then local feature description.

CDVS standard normative blocks. Global and local features are extracted from the image and compressed.

Figure 3

Line graph of PSNR (dB, vertical axis, 28 to 38) against bit-rate (bpp, horizontal axis, 0.25 to 1.50).

Rate-distortion performance evaluations for the latest compression methods on Kodak. Methods belonging to the multi-bitstream independent decoding category are represented with dashed lines, while multi-bitstream hierarchical decoding methods are represented with solid lines.
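
The two axes of such a rate-distortion plot can be computed as in the following minimal sketch. It assumes 8-bit pixel values (peak 255) passed as flat sequences and a bitstream size known in bytes; the function names `psnr` and `bits_per_pixel` are illustrative and are not taken from any of the surveyed codebases.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the original and the reconstruction.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: distortion is zero
    return 10.0 * math.log10(peak ** 2 / mse)


def bits_per_pixel(num_bytes, width, height):
    """Bit-rate in bpp for a bitstream of num_bytes covering width x height pixels."""
    return 8.0 * num_bytes / (width * height)
```

Each point on a curve then corresponds to one (bits_per_pixel, psnr) pair averaged over the test set; how the surveyed methods average across images or colour channels is not specified here and may differ between papers.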

Figure 4

Line graph of PSNR (dB, vertical axis, 30 to 36) against bit-rate (bpp, horizontal axis, 0.00 to 0.20).

Rate-distortion performance evaluations for the latest human-machine collaborative video compression methods on HEVC class B. Methods belonging to the single-bitstream multi-head decoding category are represented with dashed lines, while multi-bitstream hierarchical decoding methods are represented with solid lines.

Table 1

An overview of human-machine collaborative image compression methods in the literature. MBID, MBHD, SBMD, and SBAR denote multi-bitstream independent decoding, multi-bitstream hierarchical decoding, single-bitstream multi-head decoding, and single-bitstream analysis after reconstruction, respectively. A ✓ indicates that the method aims to reconstruct and analyze facial images.

Category	Author	Presented Task	Core Method	Facial Image Specific
MBID	[111]	recognition	TFQI-based joint bit allocation
	[177]	classification	cross-layer context model + ROI
	[27]	segmentation	semantic feature enhancement
	[24]	detection	slimmable compressive encoder
	[120]	detection, segmentation	gate module + knowledge distillation
MBHD	[9]	face detection	feature map + CNN	✓
	[185]	face recognition	feature & texture representation	✓
	[131]	facial identity recognition, facial attribute prediction	StyleGAN prior + layer-wise scalable entropy transformer	✓
	[184]	face verification	feature & texture + residual	✓
	[80]	facial landmark detection	edge map + GAN	✓
	[205]	segmentation	GAN + hyperprior model	✓
	[30]	detection	instance segmentation map + signal feature
	[5]	image search	semantic segmentation map + residual
	[72]	semantic enhancement	semantic segmentation + enhancement
	[190]	classification	task feature + residual
	[104]	classification	residual enhance + GAN
	[165]	detection	object separation + parameter share
	[56]	segmentation, pose estimation	customized group mask + group-independent transform
	[121]	classification	pyramid of multiple subbands
	[54]	face recognition	Canny edge color sketch	✓
	[202]	detection, segmentation	structural representation + VGG
	[196]	detection	depth-constrained encoder
	[122]	classification, detection, segmentation	hyperprior network + predictor module
	[37]	detection	latent space transform
	[213]	classification, segmentation	reconstruction semantic feature fusion
	[55]	detection, segmentation	structural edges + feature + prior
	[105]	classification	semantics-based ROI mask + generation module
	[38]	detection, segmentation	task-dependent latent space transform
	[195]	detection	mask multilayer fusion
	[14]	classification	lightweight image encoder + ViT
SBMD	[123]	classification	general feature extraction + feature-analytic classifier
	[26]	classification, detection, segmentation	prompt generator + Transformer
	[176]	classification, segmentation	feature-maps
SBAR	[132]	face recognition	sketches thumbnails + retrieved guidance
	[186]	detection	inverted bottleneck structure encoder
	[62]	detection, segmentation, facial landmark detection	content-adaptive diffusion model
	[59]	image caption, detection	feature distance + importance-weighted pixel distance

Table 2

An overview of human-machine collaborative video compression methods in the literature. MBID, MBHD, and SBMD denote multi-bitstream independent decoding, multi-bitstream hierarchical decoding, and single-bitstream multi-head decoding, respectively.

Category	Author	Presented Task	Core Method
MBID	[50]	Video Retrieval	Feature extraction + CDVS + CNN
	[211]	Video Retrieval	Rate-accuracy optimization + affine motion compensation
	[10]	Class Identification, Object Recognition	Multiple autoencoders
MBHD	[197]	Action Recognition	Conditional deep generation network
	[82]	Action Recognition	Semantic information + feature laddering framework
	[114]	Object Detection	Conditional semantic compression + interlayer frame prediction
	[64]	Object Detection	End-to-end learnable video codec + conditional coding
	[39]	Object Detection	Conventional + DNN video compression
	[85]	Action Recognition	Learned semantic representation + end-to-end optimization
	[93]	Object Detection, Pose Estimation, Action Recognition, Object Segmentation	Static object characteristics + dynamic motion clues
	[170]	Action Recognition, Multiple Object Tracking, Object Segmentation	Traditional codec + DNN
	[171]	Action Recognition, Multiple Object Tracking, Object Segmentation	Semantic-Mining-then-Compensation + masked image modeling
	[4]	Object Detection	Cuboidal feature descriptor
SBMD	[207]	Action Recognition	Task-driven optimization
	[160]	Action Recognition, Object Detection, Object Tracking, Object Segmentation	Temporal context + cross-domain motion

Table 3

The open-sourced code links of human-machine collaborative image and video compression methods.

Author	Code Link
Hu et al. [80]	https://williamyang1991.github.io/projects/VCM-Face/
Akbari et al. [5]	https://github.com/makbari7/DSSLIC
Fang et al. [55]	https://global.iflytek.com/
Torfason et al. [176]	https://github.com/DrSleep/tensorflow-deeplab-resnet
Gao et al. [59]	https://github.com/chansongoal/semantic_image_compression
Xia et al. [197]	https://lists.aau.at/mailman/listinfo/mpe-vcm
Lin et al. [114]	https://github.com/LHB116/DeepSVC
Huang et al. [85]	https://github.com/ZhihaoHu/PyTorchVideoCompression
Yi et al. [207]	https://mic.tongji.edu.cn

Table 4

Performance of image machine vision methods on facial analysis tasks. The symbol “—” means that the bitrate is not given in the paper or that the task performance is not related to bitrate.

Author	bitrate	face recognition Acc.	face detection recall	NME	face seg Acc.
Li et al. [111]	0.07	0.99 (LFW)
Alvar et al. [9]	—		0.98 (LFW)
Wang et al. [185]	0.1	0.98 (LFW)
Mao et al. [131]	0.01	0.75 (CelebA-HQ)
Wang et al. [184]	0.003	0.99 (LFW)
Hu et al. [80]	0.225			3.5 (VGGFace2)
Yang et al. [205]	0.004				0.83 (FFHQ-Aging)
Fang et al. [54]	0.05	0.95 (LFW)

Table 5

Performance of image machine vision methods on classification, detection, and segmentation tasks. The symbol “—” means that the bitrate is not given in the paper or that the task performance is not related to bitrate.

Author	bitrate	Classification Acc.	Detection mAP	Segmentation mAP	Seg IoU
Tu et al. [177]	0.002	0.32 (CUB-200-2011)
Chen et al. [27]	—				0.728 (Cityscapes)
Cao et al. [24]	0.2		0.54 (COCO2014)
Liu et al. [120]	0.125		0.48 (PASCAL VOC 07)		0.70 (Cityscapes)
Chen et al. [30]	—			0.428 (COCO2017)
Akbari et al. [5]	—
Hoang et al. [72]	—
Wang et al. [190]	0.5	0.31 (ImageNet)
Lei et al. [104]	0.2	0.8 (ImageNet)
Sun et al. [165]		0.75 (COCO2014)
Feng et al. [56]	0.45			0.362 (COCO 2017)
Liu et al. [121]	0.15	0.38 (CUB200-2011)
Yan et al. [202]	—	0.3085 (CUB)
Wu et al. [136]	0.1		0.745 (COCO)
Liu et al. [122]	0.4	0.71 (ILSVRC2012)	0.73 (VOC2012)	0.365 (COCO)
Choi and Bajić [37]	0.24		0.55 (COCO 2014)
Zhang et al. [213]	0.175		0.8841 (CelebAMask-HQ)		0.58 (CelebAMask-HQ)
Fang et al. [55]	0.035		0.512 (SUIM)		0.545 (SUIM)
Lei et al. [105]	0.2	0.83 (ImageNet)
Choi and Bajić [38]	0.15		0.33 (COCO 2014)	0.362 (COCO 2014)
Wu et al. [135]	0.2		0.74 (VOC2007)
Bai et al. [14]	0.2	0.73 (ImageNet)
Liu et al. [123]	0.18	0.77 (Caltech)
Chen et al. [26]	0.2	0.75 (ImageNet)	0.33 (COCO2017)	0.361 (COCO2017)
Torfason et al. [176]	0.038	0.5582 (ILSVRC2012)			0.5578 (ILSVRC2012)
Mao et al. [132]	0.0281	0.5538 (CelebAHQ)
Wang et al. [186]	0.4		0.48 (COCO2017)
Guo et al. [62]	0.153		0.376 (COCO 2017)	0.335 (COCO 2017)
Gao et al. [53]	0.2		0.48 (COCO 2014)
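
The Seg IoU column above reports intersection over union. The following minimal sketch shows the quantity for a single class, assuming binary masks given as flat sequences of 0/1 values; the function name `iou` is illustrative, and the surveyed papers generally report a mean over classes computed with their own evaluation code, which may differ in detail.

```python
def iou(pred, target):
    """Intersection over union of two binary masks (flat sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels set in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Pixels set in at least one mask.
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Convention assumed here: two empty masks count as a perfect match.
    return intersection / union if union else 1.0
```

A mean IoU over several classes can be obtained by calling `iou` once per class mask and averaging the results.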

Table 6

Performance of video machine vision methods on classification, detection, and segmentation tasks. The symbol “—” means that the bitrate is not given in the paper or that the task performance is not related to bitrate.

Author	bitrate (bpp)	Detection mAP	Action Recognition Acc.	MOTA	J&F
Xia et al. [197]	0.0052		0.746 (PKU-MMD)
Huang et al. [82]	0.013		0.751 (PKU-MMD)
Lin et al. [114]	0.05	0.738 (ImageNet VID)
Hadizadeh and Bajić [64]	0.04	0.617 (HEVC Class B)
Huang et al. [85]	—		0.9905 (UCF-101)
Jin et al. [93]	0.11	0.39 (COCO2014)	0.85 (UCF-101)
Tian et al. [170]	0.03		0.8939 (UCF-101)	0.74 (MOT17)	0.83 (DAVIS2017)
Tian et al. [171]	0.02		0.75 (UCF-101)	0.75 (MOT17)	0.74 (DAVIS2017)
Yi et al. [207]	0.1		0.853 (UCF101)
Sheng et al. [160]	0.1	0.723 (ImageNet VID)	0.504 (UCF101)	0.534 (MOT17)	0.551 (DAVIS2017)

[1]

A. H.

Abbas

Arab

, and

Harbi

, “

Image compression using principal component analysis

”, in

Mustansiriyah Journal of Science

, Vol.

, No.

2018

[2]

Agustsson

Mentzer

Tschannen

Cavigelli

Timofte

Benini

, and

L. V.

Gool

, “

Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations

”, in

Proceedings of Advances in Neural Information Processing Systems

2017

1141

–

1151

[3]

Agustsson

Tschannen

Mentzer

Timofte

, and

L. V.

Gool

, “

Generative adversarial networks for extreme learned image compression

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2019

221

–

231

[4]

Ahmmed

Paul

Murshed

, and

Taubman

, “

Human-machine collaborative video coding through cuboidal partitioning

”, in

2021 IEEE International Conference on Image Processing

2021

2074

–

2078

[5]

Akbari

Liang

, and

Han

, “

DSSLIC: Deep semantic segmentation-based layered image compression

”, in

ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing

2019

2042

–

2046

[6]

Akyazi

and

Ebrahimi

, “

Learning-based image compression using convolutional autoencoder and wavelet decomposition

”, in

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops

2019

[7]

Alexandre

C.-P.

Chang

W.-H.

Peng

, and

H.-M.

Hang

, “

An autoencoder-based learned image compressor: Description of challenge proposal by nctu

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

2018

2539

–

2542

[8]

S. R.

Alvar

and

I. V.

Bajić

, “

Multi-task learning with compressible features for collaborative intelligence

”, in

Proceedings of the IEEE International Conference on Image Processing

2019

1705

–

1709

[9]

S. R.

Alvar

Choi

, and

I. V.

Bajic

, “

Can you tell a face from a HEVC bitstream?

”, in

2018 IEEE Conference on Multimedia Information Processing and Retrieval

2018

257

–

261

[10]

Antonio

Faria

L. M.

Tavora

Navarro

, and

Assuncao

, “

Learning-based compression of visual objects for smart surveillance

”, in

2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA)

2022

–

[11]

Ascenso

Alshina

, and

Ebrahimi

, “

The jpeg ai standard: Providing efficient human and machine visual data consumption

”, in

IEEE Multimedia

, Vol.

, No.

2023

100

–

111

[12]

Ayzik

and

Avidan

, “

Deep image compression using decoder side information

”, in

Proceedings of Computer Vision-ECCV 2020: 16th European Conference

2020

699

–

714

[13]

Bai

and

Urtasun

, “

Deep watershed transform for instance segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5221

–

5229

[14]

Bai

Yang

Liu

Jiang

Wang

, and

Gao

, “

Towards end-to-end image compression and analysis with transformers

”, in

Proceedings of the AAAI conference on artificial intelligence

, Vol.

, No.

2022

104

–

112

[15]

Ballé

Laparra

, and

E. P.

Simoncelli

, “

End-to-end optimized image compression

”, in

arXiv preprint arXiv:1611.01704

2016

[16]

Bansal

Kumar

Sachdeva

, and

Mittal

, “

Transfer learning for image classification using VGG19: Caltech-101 image data set

”, in

Journal ofambient intelligence and humanized computing

2023

–

[17]

Báscones

González

, and

Mozos

, “

Hyperspectral image compression using vector quantization, PCA and JPEG2000

”, in

Remote Sensing

, Vol.

, No.

2018

907

[18]

Bay

Tuytelaars

, and

L. V.

Gool

, “

SURF: Speeded up robust features

”, in

Computer Vision—ECCV

2006

404

–

417

[19]

Bolya

Zhou

Xiao

, and

Y. J.

Lee

, “

Yolact: Real-time instance segmentation

”, in

Proceedings of the IEEE international conference on computer vision

2019

9157

–

9166

[20]

Bossen

Bross

Suhring

, and

Flynn

, “

HEVC complexity and implementation analysis

”, in

IEEE Transactions on circuits and Systems for Video Technology

, Vol.

, No.

2012

1685

–

1696

[21]

Bross

Y.-K.

Wang

Liu

Chen

G. J.

Sullivan

, and

J.-R.

Ohm

, “

Overview of the versatile video coding (VVC) standard and its applications

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2021

3736

–

3764

[22]

Buades

Coll

, and

J.-M.

Morel

, “

A non-local algorithm for image denoising

”, in

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition

, Vol.

2005

–

[23]

Cai

Cao

, and

Zhang

, “

Learning a single tucker decomposition network for lossy image compression with multiple bits-per-pixel rates

”, in

IEEE Transactions on Image Processing

, Vol.

2020

3612

–

3625

[24]

Cao

Yao

Zhang

Jin

Zhang

, and

B. W.-K.

Ling

, “

Slimmable multi-task image compression for human and machine vision

”, in

IEEE Access

, Vol.

2023

29946

–

[25]

H.-S.

Chang

Learned-Miller

, and

McCallum

, “

Active bias: Training more accurate neural networks by emphasizing high variance samples

”, in

Advances in Neural Information Processing Systems

2017

1002

–

1012

[26]

Y.-H.

Chen

Y.-C.

Weng

C.-H.

Kao

Chien

W.-C.

Chiu

, and

W.-H.

Peng

, “

Transtic: Transferring transformer-based image compression from human perception to machine perception

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2023

23297

–

23307

[27]

Chen

Yao

Liu

, and

Zhao

, “

An End-to-End Mutual Enhancement Network Toward Image Compression and Semantic Segmentation

”, in

Chinese Conference on Pattern Recognition and Computer Vision

2021

623

–

635

[28]

L.-C.

Chen

Papandreou

Kokkinos

Murphy

, and

A. L.

Yuille

, “

Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

, Vol.

, No.

2017

834

–

848

[29]

L.-C.

Chen

Zhu

Papandreou

Schroff

, and

Adam

, “

Encoder-decoder with atrous separable convolution for semantic image segmentation

”, in

Proceedings of the European Conference on Computer Vision

2018

801

–

818

[30]

Chen

Jin

Meng

Lin

Chen

T.-S.

Chang

, and

Zhang

, “

A new image codec paradigm for human and machine uses

”, in

arXiv preprint arXiv:2112.10071

2021

[31]

Chen

Liu

Shen

Cao

, and

Wang

, “

End-to-End Learnt Image Compression via Non-Local Attention Optimization and Improved Context Modeling

”, in

IEEE Transactions on Image Processing

, Vol.

2021

3179

–

3191

[32]

Chen

Mukherjee

Han

Grange

Liu

Parker

Chen

Joshi

, et al., “

An overview of core coding tools in the AV1 video codec

”, in

2018 picture coding symposium

2018

–

[33]

Chen

L.-Y.

Duan

Wang

Lin

, and

A. C.

Kot

, “

Data representation in hybrid coding framework for feature maps compression

”, in

Proceedings of the IEEE International Conference on Image Processing

2020

3094

–

3098

[34]

Chen

Fan

Wang

L.-Y.

Duan

Lin

, and

Kot

, “

Lossy intermediate deep learning feature compression and evaluation

”, in

ACM Trans. Multimedia

2019

2414

–

2422

[35]

Cheng

Sun

Takeuchi

, and

Katto

, “

Deep Convolutional AutoEncoder-based Lossy Image Compression

”, in

Proceedings of 2018 Picture Coding Symposium

2018

253

–

257

[36]

Cheng

Sun

Takeuchi

, and

Katto

, “

Learned image compression with discretized gaussian mixture likelihoods and attention modules

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2020

7939

–

7948

[37]

Choi

and

I. V.

Bajić

, “

Latent-space scalability for multi-task collaborative intelligence

”, in

2021 IEEE International Conference on Image Processing

2021

3562

–

3566

[38]

Choi

and

I. V.

Bajić

, “

Scalable image coding for humans and machines

”, in

IEEE Transactions on Image Processing

, Vol.

2022

2739

–

2754

[39]

Choi

and

I. V.

Bajić

, “

Scalable video coding for humans and machines

”, in

2022 IEEE 24th International Workshop on Multimedia Signal Processing

2022

–

[40]

Choi

El-Khamy

, and

Lee

, “

Variable rate deep image compression with a conditional autoencoder

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2019

3146

–

3154

[41]

Cordts

Omran

Ramos

Rehfeld

Enzweiler

Benenson

Franke

Roth

, and

Schiele

, “

The cityscapes dataset for semantic urban scene understanding

”, in

Proceedings of the IEEE conference on computer vision and pattern recognition

2016

3213

–

3223

[42]

Covell

Johnston

Minnen

S. J.

Hwang

Shor

Singh

Vincent

, and

Toderici

, “

Target-Quality Image Compression with Recurrent, Convolutional Neural Networks

”, in

arXiv preprint arXiv:1705.06687

2017

[43]

Cui

Wang

Gao

Guo

Feng

, and

Bai

, “

Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2021

10532

–

[44]

Dai

, and

Sun

, “

R-FCN: Object detection via region-based fully convolutional networks

”, in

Proceedings of the Advances in Neural Information Processing Systems

2016

379

–

387

[45]

Dendorfer

Rezatofighi

Milan

Shi

Cremers

Reid

Roth

Schindler

, and

Leal-Taixé

, “

Mot20: A benchmark for multi object tracking in crowded scenes

”, in

arXiv preprint arXiv:2003.09003

2020

[46]

Deng

Dong

Socher

L.-J.

, and

Fei-Fei

, “

Imagenet: A large-scale hierarchical image database

”, in

2009 IEEE conference on computer vision and pattern recognition

2009

248

–

255

[47]

Ding

Wang

Fan

, and

Gong

, “

A semi-supervised two-stage approach to learning from noisy labels

”, in

2018 IEEE Winter Conference on Applications of Computer Vision

2018

1215

–

1224

[48]

Dong

and

W. D.

Pan

, “

A survey on compression domain image and video data processing and analysis techniques

”, in

Information

, Vol.

, No.

2023

184

[49]

L.-Y.

Duan

Chandrasekhar

Chen

Lin

Wang

Huang

Girod

, and

Gao

, “

Overview of the MPEG-CDVS standard

”, in

IEEE Transactions on Image Processing

, Vol.

, No.

2015

179

–

194

[50]

L.-Y.

Duan

Lou

Bai

Huang

Gao

Chandrasekhar

Lin

Wang

, and

A. C.

Kot

, “

Compact descriptors for video analysis: The emerging MPEG standard

”, in

IEEE MultiMedia

, Vol.

, No.

2018

–

[51]

Duan

Liu

Yang

Huang

, and

Gao

, “

Video coding for machines: A paradigm of collaborative compression and intelligent analytics

”, in

IEEE Transactions on Image Processing

, Vol.

2020

8680

–

8695

[52]

Dumas

Roumy

, and

Guillemot

, “

Autoencoder Based Image Compression: Can the Learning be Quantization Independent?

”, in

Proceedings of 2018 IEEE International Conference on Acoustics, Speech and Signal Processing

2018

1188

–

1192

[53]

Dumas

Roumy

, and

Guillemot

, “

Image compression with Stochastic Winner-Take-All Auto-Encoder

”, in

Proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing

2017

1512

–

1516

[54]

Fang

Duan

Tao

, and

, “

Sketch assisted face image coding for human and machine vision: a joint training approach

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2023

[55]

Fang

Shen

Wang

, and

Jin

, “

Priors guided extreme underwater image compression for machine vision and human vision

”, in

IEEE Journal of Oceanic Engineering

2023

[56]

Feng

Gao

Jin

Feng

, and

Chen

, “

Semantically structured image compression via irregular group-based decoupling

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2023

17237

–

[57]

C.-Y.

Liu

Ranga

Tyagi

, and

A. C.

Berg

, “

DSSD: Deconvolutional Single Shot Detector

”, in

arXiv preprint arXiv:1701.06659

2017

[58]

Galteri

Seidenari

Bertini

, and

A. D.

Bimbo

, “

Deep Generative Adversarial Compression Artifact Removal

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

4826

–

4835

[59]

Gao

Liu

, and

, “

Towards task-generic image compression: A study of semantics-oriented metrics

”, in

IEEE Transactions on Multimedia

, Vol.

2021

721

–

735

[60]

Gregor

Besse

D. J.

Rezende

Danihelka

, and

Wierstra

, “

Towards Conceptual Compression

”, in

Proceedings of Advances In Neural Information Processing Systems

2016

3549

–

3557

[61]

Guo

Huang

Zhang

Zhuang

Dong

M. R.

Scott

, and

Huang

, “

Curriculumnet: Weakly supervised learning from large-scale web images

”, in

Proceedings of the European Conference on Computer Vision

2018

135

–

150

[62]

Guo

Chen

Zhao

Zhang

, and

Duan

, “

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

1431

–

1442

[63]

Guo

Zhang

Feng

, and

Chen

, “

Causal Contextual Prediction for Learned Image Compression

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2021

[64]

Hadizadeh

and

I. V.

Bajić

, “

Learned Scalable Video Coding For Humans and Machines

”, in

arXiv preprint arXiv:2307.08978

2023

[65]

Han

I. W.

Tsang

Chen

P. Y.

Celina

, and

S.-F.

Fung

, “

Progressive stochastic learning for noisy labels

”, in

IEEE transactions on neural networks and learning systems

, No.

2018

–

[66]

Han

Yao

Niu

Tsang

, and

Sugiyama

, “

Co-teaching: Robust training of deep neural networks with extremely noisy labels

”, in

Advances in Neural Information Processing Systems

2018

8527

–

8537

[67]

Han

Zhang

Cheng

Liu

, and

, “

Advanced deep-learning techniques for salient and category-specific object detection: A survey

”, in

IEEE Signal Processing Magazine

, Vol.

, No.

2018

–

100

[68]

Hayder

, and

Salzmann

, “

Boundary-aware instance segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5696

–

5704

[69]

Gkioxari

Dollar

, and

Girshick

, “

Mask R-CNN

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

2961

–

2969

[70]

Zhang

Ren

, and

Sun

, “

Deep residual learning for image recognition

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2016

770

–

778

[71]

Zhang

Lin

, and

Han

, “

Enhancing HEVC Compressed Videos with a Partition-Masked Convolutional Neural Network

”, in

Proceedings of 2018 25th IEEE International Conference on Image Processing

2018

216

–

220

[72]

T. M.

Hoang

Zhou

, and

Fan

, “

Image compression with encoder-decoder matched semantic segmentation

”, in

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops

2020

160

–

161

[73]

Hong

Zhang

, and

Huang

, “

Fine-grained feature generation for generalized zero-shot video classification

”, in

IEEE Transactions on Image Processing

, Vol.

2023

1599

–

1612

[74]

Hong

Zhang

, and

Huang

, “

Multi-modal multi-grained embedding learning for generalized zero-shot video classification

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2023

[75]

Hore

and

Ziou

, “

Image quality metrics: PSNR vs. SSIM

”, in

2010 20th international conference on pattern recognition

2010

2366

–

2369

[76]

Hou

Zheng

, and

Gould

, “

Learning to structure an image with few colors

”, in

Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition

2020

10113

–

[77]

Yang

, and

Liu

, “

Progressive Spatial Recurrent Neural Network for Intra Prediction

”, in

IEEE Transactions on Multimedia

, Vol.

, No.

2019

3024

–

3037

[78]

Yang

, and

Liu

, “

Coarse-to-fine hyper-prior modeling for learned image compression

”, in

Proceedings of the AAAI Conference on Artificial Intelligence

, Vol.

, No. 0

2020

11013

–

[79]

Yang

, and

Liu

, “

Learning end-to-end lossy image compression: A benchmark

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

2021

[80]

Yang

L.-Y.

Duan

, and

Liu

, “

Towards coding for human and machine vision: A scalable image coding approach

”, in

2020 IEEE International Conference on Multimedia and Expo

2020

–

[81]

Huang

Liu

L. V. D.

Maaten

, and

K. Q.

Weinberger

, “

Densely connected convolutional networks

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

4700

–

4708

[82]

Huang

Yang

Xiang

Liu

, and

L.-Y.

Duan

, “

Collaborative scalable visual compression for human-centered videos

”, in

2022 IEEE International Symposium on Circuits and Systems

2022

2988

–

2992

[83]

Huang

et al., “

Speed/accuracy trade-offs for modern convolutional object detectors

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

7310

–

7311

[84]

Huang

Jia

, and

Zhao

, “

O2u-net: A simple noisy label detection approach for deep neural networks

”, in

Proceedings of the IEEE International Conference on Computer Vision

2019

3326

–

3334

[85]

Huang

Jia

Wang

, and

, “

Hmfvc: A human-machine friendly video compression scheme

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2022

[86]

Ikusan

and

Dai

, “

Deep Feature Compression with Rate-Distortion Optimization for Networked Camera Systems

”, in

Proceedings of the 14th Conference on ACM Multimedia Systems

2023

–

[87]

D. J.

C. D.

Kim

Jiang

, and

Memisevic

, “

Generating images with recurrent adversarial networks

”, in

CoRR, vol. abs/1602.05110

2016

[88]

Islam

L. M.

Dang

Lee

, and

Moon

, “

Image Compression With Recurrent Neural Network and Generalized Divisive Normalization

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2021

1875

–

1879

[89]

Jamil

M. J.

Piran

Rahman

, and

O.-J.

Kwon

, “

Learning-driven lossy image compression: A comprehensive survey

”, in

Engineering Applications of Artificial Intelligence

, Vol.

123

2023

106361

[90]

Jégou

Douze

Schmid

, and

Pérez

, “

Aggregating local descriptors into a compact image representation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2010

3304

–

3311

[91]

Jiang

Wang

Chen

, and

Liu

, “

Hyperspectral image classification in the presence of noisy labels

”, in

IEEE Transactions on Geoscience and Remote Sensing

, Vol.

, No.

2019

851

–

865

[92]

Jiang

Zhou

Leung

L.-J.

, and

Fei-Fei

, “

Mentornet: Learning data-driven curriculum for very deep neural networks on corrupted labels

”, in

International Conference on Machine Learning

2018

2304

–

2313

[93]

Jin

Feng

Sun

Feng

, and

Chen

, “

Semantical video coding: Instill static-dynamic clues into structured bitstream for ai tasks

”, in

Journal of Visual Communication and Image Representation

, Vol.

2023

103816

[94]

Johnston

Vincent

Minnen

Covell

Singh

Chinen

S. J.

Hwang

Shor

, and

Toderici

, “

Improved Lossy Image Compression with Priming and Spatially Adaptive Bit Rates for Recurrent Networks

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2018

4385

–

4393

[95]

Kang

Tripathi

, and

T. Q.

Nguyen

, “

Toward Joint Image Generation and Compression using Generative Adversarial Networks

”, in

arXiv preprint arXiv:1901.07838

2019

[96]

A. C.

Karaca

and

M. K.

Güllü

, “

Target preserving hyperspectral image compression using weighted PCA and JPEG2000

”, in

Proceedings of International Conference on Image and Signal Processing

2018

508

–

516

[97]

Khalaf

Zaghar

, and

Hashim

, “

Enhancement of Curve-Fitting Image Compression Using Hyperbolic Function

”, in

Symmetry

, Vol.

2019

291

[98]

Kodak

, “

Kodak Lossless True Color Image Suite (PhotoCD PCD0992)

”,

1992

[99]

Kong

, and

Zhao

, “

Spectral-Spatial Feature Partitioned Extraction Based on CNN for Multispectral Image Compression

”, in

Remote Sensing

, Vol.

, No.

2020

[100]

Kong

Sun

Yao

Liu

, and

Chen

, “

RON: Reverse connection with objectness prior networks for object detection

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5936

–

5944

[101]

A. B. L.

Larsen

S. K.

Sønderby

Larochelle

, and

Winther

, “

Autoencoding beyond pixels using a learned similarity metric

”, in

Proceedings of The 33rd International Conference on Machine Learning

, Vol.

2016

1558

–

1566

[102]

Lee

Cho

, and

Kim

, “

An end-to-end joint learning scheme of image compression and quality enhancement with improved entropy minimization

”, in

arXiv preprint arXiv:1912.12817

2020

[103]

Lee

and

Park

, “

Centermask: Real-time anchor-free instance segmentation

”, in

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

2020

13906

–

[104]

Lei

Duan

Hong

J. F.

Mota

Shi

, and

C.-X.

Wang

, “

Progressive deep Image compression for hybrid contexts of image classification and reconstruction

”, in

IEEE Journal on Selected Areas in Communications

, Vol.

, No.

2022

–

[105]

Lei

Hong

Shi

Lin

, and

Xia

, “

Quantization-Based Adaptive Deep Image Compression Using Semantic Information

”, in

IEEE Access

2023

[106]

“

SSD: Single shot multibox detector

”, in

Computer Vision-ECCV

, ed.

Leibe

Matas

Sebe

, and

Welling

2016

–

[107]

and

Liu

, “

Multispectral transforms using convolution neural networks for remote sensing multispectral image compression

”, in

Remote Sensing

, Vol.

, No.

2019

759

[108]

Zhang

Zuo

Timofte

, and

Zhang

, “

Learning Context-Based Nonlocal Entropy Modeling for Image Compression

”, in

IEEE Transactions on Neural Networks and Learning Systems

2021

[109]

Sun

Zhao

Yuan

, and

Liu

, “

Deep Image Compression with Residual Learning

”, in

Applied Sciences

, Vol.

, No.

2020

4023

[110]

Chen

Dou

C.-W.

, and

P.-A.

Heng

, “

H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes

”, in

IEEE Transactions on Medical Imaging

, Vol.

, No.

2018

2663

–

2674

[111]

Jia

Wang

Zhang

Wang

, and

Gao

, “

Joint rate-distortion optimization for simultaneous texture and deep feature compression of facial images

”, in

2018 IEEE fourth international conference on multimedia big data (BigMM)

2018

–

[112]

Peng

Zhang

Deng

, and

Sun

, “

DetNet: A backbone network for object detection

”, in

arXiv preprint arXiv:1804.06215

2018

[113]

Lin

Milan

Shen

, and

Reid

, “

RefineNet: Multi-path refinement networks for high-resolution semantic segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

1925

–

1934

[114]

Lin

Chen

Zhang

Lin

Wang

, and

Zhao

, “

DeepSVC: Deep scalable video coding for both machine and human vision

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

9205

–

9214

[115]

T.-Y.

Lin

Dollar

Girshick

Hariharan

, and

Belongie

, “

Feature pyramid networks for object detection

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

2117

–

2125

[116]

T.-Y.

Lin

Goyal

Girshick

, and

Dollar

, “

Focal loss for dense object detection

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

2980

–

2988

[117]

T.-Y.

Lin

Maire

Belongie

Hays

Perona

Ramanan

Dollar

, and

C. L.

Zitnick

, “

Microsoft coco: Common objects in context

”, in

Computer Vision-ECCV 2014: 13th European Conference

Zurich, Switzerland

September 6-12, 2014

Proceedings, Part V 13, 2014

740

–

755

[118]

Liu

Wen

Fan

C. C.

Loy

, and

T. S.

Huang

, “

Non-local recurrent network for image restoration

”, in

Neural Information Processing Systems

2018

1673

–

1682

[119]

Liu

Chen

Shen

Yue

, and

, “

Deep Image Compression via End-to-End Learning

”, in

Proceedings of Computer Vision Pattern Recognition

, Vol.

2018

[120]

Liu

Sun

, and

Katto

, “

Improving multiple machine vision tasks in the compressed domain

”, in

2022 26th International Conference on Pattern Recognition

2022

331

–

337

[121]

Liu

Yan

, and

, “

Semantics-to-signal scalable image compression with learned revertible representations

”, in

International Journal of Computer Vision

, Vol.

129

, No.

2021

2605

–

2621

[122]

Liu

Chen

, and

, “

Icmh-net: Neural image compression towards both machine vision and human vision

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

8047

–

8056

[123]

Liu

Chen

Liu

Wang

, and

Shen

, “

2C-Net: integrate image compression and classification via deep neural network

”, in

Multimedia Systems

, Vol.

, No.

2023

945

–

959

[124]

Liu

et al., “

Receptive field block net for accurate and fast object detection

”, in

Proceedings of the European Conference on Computer Vision

2018

385

–

400

[125]

D. G.

Lowe

, “

Distinctive image features from scale-invariant keypoints

”, in

Int. J. Comput. Vis.

Vol.

, No.

2004

–

110

[126]

Luengo

S.-O.

Shim

Alshomrani

Altalhi

, and

Herrera

, “

Cnc-nos: Class noise cleaning by ensemble filtering and noise scoring

”, in

Knowledge-Based Systems

, Vol.

140

2018

–

[127]

Lyu

and

I. W.

Tsang

, “

Curriculum Loss: Robust Learning and Generalization against Label Corruption

”, in

arXiv preprint arXiv:1905.10045

2019

[128]

Zhang

Wang

Zhang

Jia

, and

Wang

, “

Joint feature and texture coding: Toward smart video representation via front-end intelligence

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2018

3095

–

3105

[129]

Zhang

Jia

Zhao

Wang

, and

Wang

, “

Image and video compression with neural networks: A review

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2019

1683

–

1698

[130]

Malach

and

Shalev-Shwartz

, “

Decoupling “when to update” from “how to update”

”, in

Advances in Neural Information Processing Systems

2017

960

–

970

[131]

Mao

Wang

Chen

Jin

, and

, “

Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

”, in

IEEE Transactions on Image Processing

2023

[132]

Mao

Chen

Wang

, and

, “

Peering into the sketch: Ultra-low bitrate face compression for joint human and machine perception

”, in

Proceedings of the 31st ACM International Conference on Multimedia

2023

2564

–

2572

[133]

M. W.

Marcellin

M. J.

Gormish

Bilgin

, and

M. P.

Boliek

, “

An overview of JPEG-2000

”, in

Proceedings DCC 2000. Data compression conference

2000

523

–

541

[134]

F. D.

Martino

Perfilieva

, and

Sessa

, “

A Fast Multilevel Fuzzy Transform Image Compression Method

”, in

Axioms

, Vol.

, No.

2019

135

[135]

F. D.

Martino

and

Sessa

, “

Multi-level fuzzy transforms image compression

”, in

Journal of Ambient Intelligence and Humanized Computing

, Vol.

, No.

2019

2745

–

2756

[136]

Mercat

Viitanen

, and

Vanne

, “

UVG dataset: 50/120fps 4K sequences for video codec analysis and development

”, in

Proceedings of the 11th ACM Multimedia Systems Conference

2020

297

–

302

[137]

Minnen

Balle

, and

G. D.

Toderici

, “

Joint autoregressive and hierarchical priors for learned image compression

”, in

Proceedings of Advances in Neural Information Processing Systems

2018

10771

–

[138]

Minnen

and

Singh

, “

Channel-Wise Autoregressive Entropy Models for Learned Image Compression

”, in

Proceedings of 2020 IEEE International Conference on Image Processing

2020

3339

–

3343

[139]

D. T.

Nguyen

C. K.

Mummadi

T. P. N.

Ngo

T. H. P.

Nguyen

Beggel

, and

Brox

, “

SELF: Learning to Filter Noisy Labels with Self-Ensembling

”, in

arXiv preprint arXiv:1910.01842

2019

[140]

D. T.

Nguyen

T.-P.-N.

Ngo

Lou

Klar

Beggel

, and

Brox

, “

Robust Learning Under Label Noise With Iterative Noise-Filtering

”, in

arXiv preprint arXiv:1906.00216

2019

[141]

C. G.

Northcutt

, and

I. L.

Chuang

, “

Learning with confident examples: Rank pruning for robust classification with noisy labels

”, in

Uncertainty in Artificial Intelligence - Proceedings of the 33rd Conference, UAI 2017

2017

[142]

J.-R.

Ohm

G. J.

Sullivan

Schwarz

T. K.

Tan

, and

Wiegand

, “

Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC)

”, in

IEEE Transactions on circuits and systems for video technology

, Vol.

, No.

2012

1669

–

1684

[143]

V. A.

de Oliveira

Chabert

Oberlin

Poulliat

Bruno

Latry

Carlavan

Henrot

Falzon

, and

Camarero

, “

Reduced-Complexity End-to-End Variational Autoencoder for on Board Satellite Image Compression

”, in

Remote Sensing

, Vol.

, No.

2021

447

[144]

Ollivier

, “

Auto-encoders: reconstruction versus compression

”,

2014

[145]

A. G.

Ororbia

Mali

O’Connell

Dreese

Miller

, and

C. L.

Giles

, “

Learned Neural Iterative Decoding for Lossy Image Compression Systems

”, in

Proceedings of 2019 Data Compression Conference

2019

–

[146]

Ouyang

Wang

Zhu

, and

Wang

, “

Chained cascade network for object detection

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

1956

–

1964

[147]

Paszke

Chaurasia

Kim

, and

Culurciello

, “

Enet: A deep neural network architecture for real-time semantic segmentation

”, in

arXiv preprint arXiv:1606.02147

2016

[148]

Perronnin

Liu

Sánchez

, and

Poirier

, “

Large-scale image retrieval with compressed fisher vectors

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2010

3384

–

3391

[149]

P. O.

Pinheiro

Collobert

, and

Dollár

, “

Learning to segment object candidates

”, in

Advances in Neural Information Processing Systems

2015

1990

–

1998

[150]

Prakash

Moran

Garber

DiLillo

, and

Storer

, “

Semantic Perceptual Image Compression using Deep Convolution Networks

”, in

Proceedings of 2017 Data Compression Conference

2017

250

–

259

[151]

Punnappurath

and

M. S.

Brown

, “

Learning raw image reconstruction-aware deep image compressors

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

, Vol.

, No.

2020

1013

–

1019

[152]

Purkait

Zhao

, and

Zach

, “

SPP-net: Deep absolute pose regression with synthetic views

”, in

arXiv preprint arXiv:1712.03452

2017

[153]

S. K.

Raman

Ramesh

Naganoor

Dash

Kumaravelu

, and

Lee

, “

CompressNet: Generative Compression at Extremely Low Bitrates

”, in

Proceedings of The IEEE Winter Conference on Applications of Computer Vision

2020

2325

–

2333

[154]

Reed

Lee

Anguelov

Szegedy

Erhan

, and

Rabinovich

, “

Training deep neural networks on noisy labels with bootstrapping

”, in

arXiv preprint arXiv:1412.6596

2014

[155]

Rezatofighi

Tsoi

Gwak

Sadeghian

Reid

, and

Savarese

, “

Generalized intersection over union: A metric and a loss for bounding box regression

”, in

Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

2019

658

–

666

[156]

Ronneberger

Fischer

, and

Brox

, “

U-net: Convolutional networks for biomedical image segmentation

”, in

International Conference on Medical Image Computing and Computer-assisted Intervention

2015

234

–

241

[157]

Sento

, “

Image Compression with Auto-encoder Algorithm using Deep Neural Network (DNN)

”, in

Proceedings of 2016 Management and Innovation Technology International Conference

2016

[158]

R. R.

Shamir

Duchin

Kim

Sapiro

, and

Harel

, “

Continuous dice coefficient: a method for evaluating probabilistic segmentations

”, in

arXiv preprint arXiv:1906.11031

2019

[159]

Shen

Liu

Y.-G.

Jiang

Chen

, and

Xue

, “

DSOD: Learning deeply supervised object detectors from scratch

”, in

Proceedings of the IEEE International Conference on Computer Vision

2017

1919

–

1927

[160]

Sheng

Liu

, and

, “

VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

2024

[161]

Singh

Abu-El-Haija

Johnston

Ballé

Shrivastava

, and

Toderici

, “

End-to-end learning of compressible features

”, in

Proceedings of the IEEE International Conference on Image Processing

2020

3349

–

3353

[162]

Sivic

and

Zisserman

, “

Video Google: A text retrieval approach to object matching in videos

”, in

Proceedings of the IEEE International Conference on Computer Vision

2003

1470

–

1477

[163]

Song

Gao

Hanjalic

, and

T. H.

Shen

, “

Unified Binary Generative Adversarial Network for Image Retrieval and Compression

”, in

International Journal of Computer Vision

, Vol.

128

, No.

2020

2243

–

2264

[164]

Soomro

A. R.

Zamir

, and

Shah

, “

UCF101: A dataset of 101 human actions classes from videos in the wild

”, in

arXiv preprint arXiv:1212.0402

2012

[165]

Sun

, and

Chen

, “

Semantic structured image coding framework for multiple intelligent applications

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2020

3631

–

3642

[166]

Suzuki

Takagi

Hayase

Onishi

, and

Shimizu

, “

Image pre-transformation for recognition-aware image compression

”, in

Proceedings of the IEEE International Conference on Image Processing

2019

2686

–

2690

[167]

Suzuki

Takagi

Takeda

Tanida

, and

Kimata

, “

Deep feature compression with spatio-temporal arranging for collaborative intelligence

”, in

Proceedings of the IEEE International Conference on Image Processing

2020

3099

–

3103

[168]

Tellez

Litjens

van der Laak

, and

Ciompi

, “

Neural Image Compression for Gigapixel Histopathology Image Analysis

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

, Vol.

, No.

2021

567

–

578

[169]

Theis

Shi

Cunningham

, and

Huszar

, “

Lossy image compression with compressive autoencoders

”, in

Proceedings of International Conference on Learning Representations

2017

[170]

Tian

Yan

Zhai

Chen

, and

Gao

, “

A Coding Framework and Benchmark towards Low-Bitrate Video Understanding

”, in

IEEE Transactions on Pattern Analysis and Machine Intelligence

2024

[171]

Tian

Zhai

, and

Gao

, “

Non-Semantics Suppressed Mask Learning for Unsupervised Video Semantic Compression

”, in

Proceedings of the IEEE/CVF International Conference on Computer Vision

2023

13610

–

[172]

Toderici

S. M.

O’Malley

S. J.

Hwang

Vincent

Minnen

Baluja

Covell

, and

Sukthankar

, “

Variable Rate Image Compression with Recurrent Neural Networks

”, in

CoRR

, vol.

abs/1511.06085

2016

[173]

Toderici

Vincent

Johnston

S. J.

Hwang

Minnen

Shor

, and

Covell

, “

Full Resolution Image Compression with Recurrent Neural Networks

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

5306

–

5314

[174]

Toderici

Theis

Johnston

Agustsson

Mentzer

Ballé

Shi

, and

Timofte

, “

CLIC 2020: Challenge on Learned Image Compression

”,

2020

[175]

Torfason

Mentzer

Agustsson

Tschannen

Timofte

, and

L. V.

Gool

, “

Towards Image Understanding from Deep Compression Without Decoding

”, in

Proceedings of International Conference on Learning Representations

2018

[176]

Torfason

Mentzer

Agustsson

Tschannen

Timofte

, and

Van Gool

, “

Towards image understanding from deep compression without decoding

”, in

arXiv preprint arXiv:1803.06131

2018

[177]

Zhou

, and

, “

Semantic scalable image compression with cross-layer priors

”, in

Proceedings of the 29th ACM International Conference on Multimedia

2021

4044

–

4052

[178]

Vicente

Carreira

Agapito

, and

Batista

, “

Reconstructing pascal voc

”, in

Proceedings of the IEEE conference on computer vision and pattern recognition

2014

–

[179]

Visin

Ciccone

Romero

Kastner

Cho

Bengio

Matteucci

, and

Courville

, “

Reseg: A recurrent neural network-based model for semantic segmentation

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops

2016

–

[180]

G. K.

Wallace

, “

The JPEG still picture compression standard

”, in

Communications of the ACM

, Vol.

, No.

1991

–

[181]

Wang

Han

, and

Wang

, “

An End-to-End Deep Learning Image Compression Framework Based on Semantic Analysis

”, in

Applied Sciences

, Vol.

, No.

2019

3580

[182]

Wang

Chen

Yuan

Liu

Huang

Hou

, and

Cottrell

, “

Understanding convolution for semantic segmentation

”, in

Winter Conference on Applications of Computer Vision

2018

1451

–

1460

[183]

R. J.

Wang

, and

C. X.

Ling

, “

Pelee: A real-time object detection system on mobile devices

”, in

Proceedings of the Advances in Neural Information Processing Systems

2018

1963

–

1972

[184]

Wang

Yang

Zhang

Wang

, and

Gao

, “

Towards analysis-friendly face representation with scalable feature and texture compression

”, in

IEEE Transactions on Multimedia

, Vol.

2021

3169

–

3181

[185]

Wang

Zhang

Wang

, and

Gao

, “

Scalable facial image compression with deep feature reconstruction

”, in

2019 IEEE International Conference on Image Processing

2019

2691

–

2695

[186]

Wang

, and

, “

End-to-end compression towards machine vision: Network architecture design and optimization

”, in

IEEE Open Journal of Circuits and Systems

, Vol.

2021

675

–

685

[187]

Wang

Girshick

Gupta

, and

, “

Non-local neural networks

”, in

Proceedings of IEEE Conference on Computer Vision and Pattern Recognition

2018

7794

–

7803

[188]

Wang

and

A. C.

Bovik

, “

A universal image quality index

”, in

IEEE Signal Processing Letters

, Vol.

, No.

2002

–

[189]

Wang

E. P.

Simoncelli

, and

A. C.

Bovik

, “

Multiscale structural similarity for image quality assessment

”, in

The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003

, Vol.

2003

1398

–

1402

[190]

Wang

, and

P. C.

Cosman

, “

Human-machine interaction-oriented image coding for resource-constrained visual monitoring in IoT

”, in

IEEE Internet of Things Journal

, Vol.

, No.

2022

16181

–

16195

[191]

Wiegand

G. J.

Sullivan

Bjontegaard

, and

Luthra

, “

Overview of the H.264/AVC video coding standard

”, in

IEEE Transactions on Circuits and Systems for Video Technology

, Vol.

, No.

2003

560

–

576

[192]

Huang

, and

Shen

, “

A GAN-based tunable image compression system

”, in

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision

2020

2334

–

2342

[193]

Sun

, and

Tan

, “

A light CNN for deep face representation with noisy labels

”, in

IEEE Transactions on Information Forensics and Security

, Vol.

, No.

2018

2884

–

2896

[194]

Zhang

Jin

, and

Chen

, “

Learned Block-based Hybrid Image Compression

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2021

[195]

Yang

, and

Huang

, “

Scalable image coding with enhancement features for human and machine

”, in

Multimedia Systems

, Vol.

, No.

2024

[196]

Wang

, and

Zhang

, “

End to End Scalable Image Coding for Machines

”, in

2023 3rd International Conference on Intelligent Communications and Computing

2023

217

–

221

[197]

Xia

Liang

Yang

L.-Y.

Duan

, and

Liu

, “

An emerging coding paradigm VCM: A scalable coding approach beyond feature and signal

”, in

2020 IEEE International Conference on Multimedia and Expo

2020

–

[198]

Xie

K. L.

Cheng

, and

Chen

, “

Enhanced invertible encoding for learned image compression

”, in

Proceedings of the 29th ACM International Conference on Multimedia

2021

162

–

170

[199]

Xue

and

, “

Attention Based Image Compression Post-Processing Convolutional Neural Network

”, in

Proceedings of CVPR Workshops

2019

[200]

Yacouby

and

Axman

, “

Probabilistic extension of precision, recall, and F1 score for more thorough evaluation of classification models

”, in

Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems

2020

–

[201]

R. J.

Yadav

and

M. S.

Nagmode

, “

Compression of hyperspectral image using PCA-DCT technology

”, in

Proceedings of Innovations in Electronics and Communication Engineering

2018

269

–

277

[202]

Yan

Gao

Liu

, and

, “

SSSIC: semantics-to-signal scalable image coding with learned structural representations

”, in

IEEE Transactions on Image Processing

, Vol.

2021

8939

–

8954

[203]

Yan

I. W.

Tsang

Long

, and

Yang

, “

Robust semi-supervised learning through label aggregation

”, in

30th AAAI Conference on Artificial Intelligence, AAAI 2016

2016

2244

–

2250

[204]

Yang

Zhang

, and

Yang

, “

DenseASPP for semantic segmentation in street scenes

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2018

3684

–

3692

[205]

Yang

Huang

Liu

, and

A. C.

Kot

, “

Facial image compression via neural image manifold compression

”, in

IEEE Transactions on Circuits and Systems for Video Technology

2023

[206]

Yang

Wang

, and

Tian

, “

Discernible image compression

”, in

ACM Trans. Multimedia

2020

1561

–

1569

[207]

Wang

Kwong

, and

C.-C. J.

Kuo

, “

Task-driven video compression for humans and machines: Framework design and optimization

”, in

IEEE Transactions on Multimedia

2022

[208]

T. M.

Zeegers

D. M.

Pelt

van Leeuwen

van Liere

, and

K. J.

Batenburg

, “

Task-Driven Learned Hyperspectral Data Reduction Using End-to-End Supervised Deep Learning

”, in

Journal of Imaging

, Vol.

, No.

2020

132

[209]

Zhang

Jia

Lei

Wang

, and

Gao

, “

Recent development of AVS video coding standard: AVS3

”, in

2019 Picture Coding Symposium

2019

–

[210]

Zhang

and

, “

Near-lossless L-infinity constrained Multi-rate Image Decompression via Deep Neural Network

”,

2018

[211]

Zhang

Wang

Zhang

Sun

, and

Gao

, “

A joint compression scheme of video feature descriptors and visual content

”, in

IEEE Transactions on Image Processing

, Vol.

, No.

2016

633

–

647

[212]

Zhang

Zhong

, and

, “

Residual non-local attention networks for image restoration

”, in

International Conference on Learning Representations

2019

[213]

Zhang

Jia

Chang

, and

, “

Machine perception-driven image compression: A layered generative approach

”, in

arXiv preprint arXiv:2304.06896

2023

[214]

Zhang

Zhu

Jiang

Kwong

, and

C.-C. J.

Kuo

, “

A survey on perceptually optimized video coding

”, in

ACM Computing Surveys

, Vol.

, No.

2023

–

[215]

Zhao

Shi

Wang

, and

Jia

, “

Pyramid scene parsing network

”, in

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

2017

2881

–

2890

[216]

Z.-Q.

Zhao

Zheng

S.-T.

, and

, “

Object detection with deep learning: A review

”, in

IEEE Transactions on Neural Networks and Learning Systems

, Vol.

, No.

2019

3212

–

3232