Image generation based on image description using artificial intelligence
