Meta Soft Prompting and Learning

[3] H.-Y. Chen and J.-T. Chien, “Deep semi-supervised learning for domain adaptation”, in Proc. of International Workshop on Machine Learning for Signal Processing, 2015.

[4] Chen and Cardie, “Multinomial adversarial networks for multi-domain text classification”, in Proc. of Conference of North American Chapter of Association for Computational Linguistics: Human Language Technologies, 2018, 1226–1240.

[5] J.-T. Chien, “Deep Bayesian natural language processing”, in Proc. of Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, 2019.

[6] J.-T. Chien and C.-W. Huang, “Stochastic adversarial learning for domain adaptation”, in Proc. of International Joint Conference on Neural Networks, 2020.

[7] J.-T. Chien and Y.-H. Huang, “Bayesian transformer using disentangled mask attention”, in Proc. of Annual Conference of International Speech Communication Association, 2022, 1761–1765.

[8] J.-T. Chien and J.-C. Junqua, “Unsupervised hierarchical adaptation using reliable selection of cluster-dependent parameters”, Speech Communication, 2000, 235–253.

[9] J.-T. Chien and Y.-C., “Bayesian recurrent neural network for language modeling”, IEEE Transactions on Neural Networks and Learning Systems, 2015, 361–374.

[10] J.-T. Chien and Lai, “Variational skill embeddings for meta reinforcement learning”, in Proc. of International Joint Conference on Neural Networks, 2023.

[11] J.-T. Chien and W. X. Lieow, “Meta learning for hyperparameter optimization in dialogue system”, in Proc. of Annual Conference of International Speech Communication Association, 2019, 839–843.

[12] J.-T. Chien and Y.-Y. Lyu, “Partially adversarial learning and adaptation”, in Proc. of European Signal Processing Conference, 2019.

[13] J.-T. Chien, M.-Y. Chen, and J.-H. Xue, “Learning meta soft prompt for few-shot language models”, in Proc. of Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023.

[14] J.-T. Chien, H.-T. Wang, and C.-H. Lee, “Contrastive meta learning for soft prompts using dynamic mixup”, in Proc. of International Joint Conference on Neural Networks, 2024.

[15] Devlin, M.-W. Chang, Lee, and Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”, in Proc. of Conference of North American Chapter of Association for Computational Linguistics, 2019, 4171–4186.

[16] Sun, Wang, and Liao, “Adversarial and domain-aware BERT for cross-domain sentiment analysis”, in Proc. of Annual Meeting of Association for Computational Linguistics, 2020, 4019–4028.

[17] Finn, Abbeel, and Levine, “Model-agnostic meta-learning for fast adaptation of deep networks”, in Proc. of International Conference on Machine Learning, 2017, 1126–1135.

[18] Gao, Fisch, and Chen, “Making pre-trained language models better few-shot learners”, in Proc. of International Joint Conference on Natural Language Processing, 2021, 3816–3830.

[19] Gururangan, Marasovic, Swayamdipta, Beltagy, Downey, and N. A. Smith, “Don’t stop pretraining: adapt language models to domains and tasks”, in Proc. of Annual Meeting of Association for Computational Linguistics, 2020, 8342–8360.

[20] Han and Eisenstein, “Unsupervised domain adaptation of contextualized embeddings for sequence labeling”, in Proc. of Conference on Empirical Methods in Natural Language Processing, 2019, 4238–4248.

[21] Hou, Dong, Wang, and Che, “MetaPrompting: learning to learn better prompts”, in Proc. of International Conference on Computational Linguistics, 2022, 3251–3262.

[22] Houlsby, Giurgiu, Jastrzebski, Morrone, De Laroussilhe, Gesmundo, Attariyan, and Gelly, “Parameter-efficient transfer learning for NLP”, in Proc. of International Conference on Machine Learning, 2019, 2790–2799.

[23] Huang, Qian, and , “Learning a Better Initialization for Soft Prompts via Meta-Learning”, in Proc. of International Joint Conference on Natural Language Processing, 2023.

[24] Jiang, Zhang, and Kwok, “Effective structured prompting by meta-learning and representative verbalizer”, in Proc. of International Conference on Machine Learning, 2023, 15186–

[25] Karouzos, Paraskevopoulos, and Potamianos, “UDALM: unsupervised domain adaptation through language modeling”, in Proc. of Conference of North American Chapter of Association for Computational Linguistics: Human Language Technologies, 2021, 2579–2590.

[26] Lee, Yoon, Kim, C. H., and Kang, “BioBERT: a pre-trained biomedical language representation model for biomedical text mining”, Bioinformatics, 2020, 1234–1240.

[27] Lester, Al-Rfou, and Constant, “The power of scale for parameter-efficient prompt tuning”, in Proc. of Conference on Empirical Methods in Natural Language Processing, 2021, 3045–3059.

[28] M.-W. Mak, and J.-T. Chien, “Contrastive adversarial domain adaptation networks for speaker recognition”, IEEE Transactions on Neural Networks and Learning Systems, 2022, 2236–2245.

[29] Lin, M.-W. Mak, and J.-T. Chien, “Multisource i-vectors domain adaptation using maximum mean discrepancy based autoencoders”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2018, 2412–2422.

[30] Lio, S.-E., and J.-T. Chien, “Adversarial mask transformer for sequential learning”, in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, 2022.

[31] Liu, Qiu, and Huang, “Adversarial multi-task learning for text classification”, in Proc. of Annual Meeting of Association for Computational Linguistics, 2017.

[32] Liu, Gao, Deng, Duh, and Wang, “Representation learning using multi-task deep neural networks for semantic classification and information retrieval”, in Proc. of Conference of North American Chapter of Association for Computational Linguistics, 2015, 912–921.

[33] Liu, Zheng, Ding, Qian, Yang, and Tang, “GPT understands, too”, arXiv preprint arXiv:2103.10385, 2021.

[34] Loshchilov and Hutter, “Decoupled weight decay regularization”, in Proc. of International Conference on Learning Representations, 2019.

[35] van der Maaten and Hinton, “Visualizing data using t-SNE”, Journal of Machine Learning Research, 2008, 2579–2605.

[36] Ouyang, Jiang, Almeida, Wainwright, Mishkin, Zhang, Agarwal, Slama, Gray, Schulman, Hilton, Kelton, Miller, Simens, Askell, Welinder, Christiano, Leike, and Lowe, “Training language models to follow instructions with human feedback”, in Advances in Neural Information Processing Systems, 2022.

[37] Qin and Eisner, “Learning how to ask: querying LMs with mixtures of soft prompts”, in Proc. of Conference of North American Chapter of Association for Computational Linguistics: Human Language Technologies, 2021, 5203–5212.

[38] Schick and Schütze, “Exploiting cloze-questions for few-shot text classification and natural language inference”, in Proc. of Conference of European Chapter of the Association for Computational Linguistics, 2021, 255–269.

[39] Schick and Schütze, “It’s not just size that matters: small language models are also few-shot learners”, in Proc. of Conference of North American Chapter of Association for Computational Linguistics: Human Language Technologies, 2021, 2339–2352.

[40] J.-C. Tsai and J.-T. Chien, “Adversarial domain separation and adaptation”, in Proc. of International Workshop on Machine Learning for Signal Processing, 2017.

[41] Vaswani, Shazeer, Parmar, Uszkoreit, Jones, A. N. Gomez, Kaiser, and Polosukhin, “Attention is all you need”, in Advances in Neural Information Processing Systems, 2017, 5998–6008.

[42] Wright and Augenstein, “Transformer based multi-source domain adaptation”, in Proc. of Conference on Empirical Methods in Natural Language Processing, 2020, 7963–7974.

[43] C.-H. Yang, Y.-Y. Tsai, and P.-Y. Chen, “Voice2series: reprogramming acoustic models for time series classification”, in Proc. of International Conference on Machine Learning, 2021, 11808–

[44] L.-J. Yang, I.-P. Yeh, and J.-T. Chien, “Low-resource speech synthesis with speaker-aware embedding”, in Proc. of International Symposium on Chinese Spoken Language Processing, 2022, 235–239.

[45] Zhang, Huang, and , “Prompt-based meta-learning for few-shot text classification”, in Proc. of Conference on Empirical Methods in Natural Language Processing, 2022, 1342–1357.

[46] Zhang, Chen, Deng, Tan, Huang, and Chen, “Differentiable prompt makes pre-trained language models better few-shot learners”, in Proc. of International Conference on Learning Representations, 2022.

2024

J.-T. Chien, M.-Y. Chen, C.-h. Lee and J.-H. Xue

Figure 1

Illustration of (a) fine-tuning, (b) hard prompt and (c) soft prompt for sentiment classification using a pre-trained language model. Tunable parameters are shown in yellow and frozen parameters in blue. The outputs of these methods can be either class labels or word tokens (shown in green) via a verbalizer.

Figure 2

Illustration of optimization-based meta learning, which optimizes for a meta representation $\theta_{\text{meta}}$ (shown in black) that can be quickly adapted to new tasks $\{\theta_{1}^{*}, \theta_{2}^{*}, \theta_{3}^{*}\}$. The blue arrows indicate the gradients $\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}$, or simply $\nabla \mathcal{L}_{i}$, of the individual tasks $\mathcal{T}_{i}$ ($i = 1, 2, 3$), and the red arrows indicate the updates of the corresponding models $\theta_{i}^{*}$.
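
To make the inner and outer updates in Figure 2 concrete, the following is a minimal sketch of optimization-based meta learning on toy one-dimensional tasks. It is not the procedure used in this paper: the quadratic task losses, the task centers, the learning rates and the first-order approximation of the meta-gradient are all assumptions chosen for illustration.

```python
def grad(theta, center):
    """Gradient of the assumed toy task loss (theta - center) ** 2."""
    return 2.0 * (theta - center)


def adapt(theta, center, inner_lr=0.1):
    """Inner update: one gradient step from theta toward the task optimum."""
    return theta - inner_lr * grad(theta, center)


def meta_train(centers, theta=0.0, inner_lr=0.1, outer_lr=0.05, steps=200):
    """Outer loop: update the shared initialization over all tasks."""
    for _ in range(steps):
        meta_grad = 0.0
        for center in centers:
            theta_i = adapt(theta, center, inner_lr)
            # First-order approximation: use the gradient evaluated at the
            # adapted parameter and ignore second-order terms.
            meta_grad += grad(theta_i, center)
        theta -= outer_lr * meta_grad
    return theta


centers = [-1.0, 0.0, 4.0]           # hypothetical task optima
theta_meta = meta_train(centers)     # shared meta representation
adapted = [adapt(theta_meta, c) for c in centers]
```

In this toy setting the meta parameter settles at the mean of the task centers, and a single inner step moves it toward each task optimum.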

Figure 3

Soft prompt language models for two implementations of unsupervised domain adaptation (UDA) under low-resource settings. (a) Zero-shot UDA is performed by using the learned soft prompt combined with the test data from $\mathcal{D}_T$. (b) Few-shot semi-supervised domain adaptation is performed by using few-shot labeled data from $\mathcal{D}_S$ and unlabeled data from $\mathcal{D}_T$, where the learned soft prompt (shown in yellow) is applied. The soft prompt is then adapted (shown in red) and employed for UDA using test data from $\mathcal{D}_T$. The pre-trained language model is frozen in both realizations of UDA.

Figure 4

System overview for meta soft prompting and learning, which are developed for multi-domain adaptation by using the labeled data from source domain $\mathcal{D}_S$, where the learning objectives $\mathcal{L}_{\text{support}}$ and $\mathcal{L}_{\text{query}}$ are calculated from class labels by using the masked tokens [MASK] (shown in white). The learning objectives are also accumulated from unlabeled data in target domain $\mathcal{D}_T$ for prediction of the randomly masked tokens [MASK] in input sentences. The objectives $\mathcal{L}_{\text{support}}$ and $\mathcal{L}_{\text{query}}$ are calculated and minimized to clone the soft prompt tokens through error backpropagation.
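
The class-label objective described in the caption of Figure 4 can be illustrated with a small sketch. It assumes a verbalizer that maps each class to a single label word and a language model that returns one logit per vocabulary word at the [MASK] position; the label words, the logit values and the function names below are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical verbalizer: one label word per sentiment class.
VERBALIZER = {"positive": "good", "negative": "bad"}


def class_probs(mask_logits, verbalizer=VERBALIZER):
    """Softmax restricted to the label words predicted at [MASK]."""
    scores = {label: mask_logits[word] for label, word in verbalizer.items()}
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def cross_entropy(mask_logits, gold_label):
    """Negative log-probability of the gold class label."""
    return -math.log(class_probs(mask_logits)[gold_label])


# Made-up logits at the [MASK] position for a negative review.
mask_logits = {"good": 0.5, "bad": 2.5, "the": 1.0}
probs = class_probs(mask_logits)
loss = cross_entropy(mask_logits, "negative")
```

Restricting the softmax to the label words is one common design choice; a loss computed this way could serve as either the support or the query objective, depending on which data split the example comes from.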

Figure 5

Meta training over different support sets (shown in purple) and query sets (shown in orange). There are m individual mappings of domain adaptation across domain pairs for sentiment classification.

Algorithm 1:

Training meta soft prompt over various pairs of source domains $\mathcal{D}_S$

Figure 6

Unsupervised domain adaptation for multi-domain language modeling. A domain-agnostic soft prompt is learned by using support data $\mathcal{D}^{\text{sup}}$ (shown in purple) and query data $\mathcal{D}^{\text{qry}}$ (shown in orange) from various source domains $\mathcal{D}_S$ and target domains $\mathcal{D}_T$, where few-shot unlabeled data in target domain $\mathcal{D}_T$ are enrolled. The calculation flow of the UDA and MLM losses is shown. Error backpropagation over losses $\mathcal{L}_{\text{uda}}$ and $\mathcal{L}_{\text{mlm}}$ in inner and outer optimizations is shown by red dashed lines, respectively.

Algorithm 2:

Unsupervised domain adaptation over various source domains $\mathcal{D}_S$ and target domains $\mathcal{D}_T$

Figure 7

Two-dimensional latent visualization of the positive and negative reviews in the Amazon review dataset, shown in blue and red, respectively, for four individual domains, where the results (a) without meta soft prompt and (b) with meta soft prompt are compared.

Figure 8

Two-dimensional latent visualization of the product reviews of ‘Books’ as the target domain in the Amazon review dataset, obtained by using the learned meta soft prompt. The results of (a) positive reviews (blue) versus negative reviews (red), and (b) reviews in the source domain (orange) versus the target domain (green) are compared.

Figure 9

Comparison of classification accuracy for four individual domains in the Amazon review dataset, where the length of the meta soft prompt was varied for evaluation.

Table 1

Statistics of individual domains in Amazon review dataset.

Domain	labeled data size	unlabeled data size	avg. len.
Books	2000	2000	159
DVD	2000	2000	173
Electronics	1998	2000	101
Kitchen	2000	2000	89

Table 2

Statistics of individual domains in FDU-MTL dataset.

Domain	labeled data size	unlabeled data size	avg. len.
Books	2000	2000	159
DVD	2000	2000	173
Electronics	1998	2000	101
Kitchen	2000	2000	89
Apparel	2000	2000	57
Camera	2000	2000	130
Health	1998	2000	81
Music	2000	2000	136
Toys	2000	2000	90
Video	2000	2000	156
Baby	1998	2000	104
Magazine	2000	2000	117
Software	2000	2000	129
Sports	2000	2000	94
IMDB	1998	2000	269
MR	2000	2000	21

Table 3

Comparison of prompt templates based on hard prompt and soft prompt where input text and masked token are included to construct the prompt sentences. A standard hard prompt for sentiment classification is designed by applying different domains listed in Tables 1 and 2.

	prompt sentence
hard prompt	input text This {domain} is [MASK].
soft prompt	input text v₁,v₂,v₃ [MASK].
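
The two templates in Table 3 can be sketched as simple string builders. This is only an illustration under stated assumptions: the function names are hypothetical, and the placeholders `<v1>`, `<v2>`, ... merely stand in for the trainable soft prompt vectors, which in practice are continuous embeddings rather than text.

```python
def hard_prompt(text, domain, mask="[MASK]"):
    """Hard prompt of Table 3: a fixed natural-language template."""
    return f"{text} This {domain} is {mask}."


def soft_prompt(text, length, mask="[MASK]"):
    """Soft prompt of Table 3: `length` placeholder tokens before the mask.

    Each placeholder marks the position of one trainable vector v_i.
    """
    placeholders = " ".join(f"<v{i}>" for i in range(1, length + 1))
    return f"{text} {placeholders} {mask}."


review = "I still have not received this magazine, what is taking so long!"
hard = hard_prompt(review, "magazine")
soft = soft_prompt(review, 3)
```

The hard prompt changes with the domain name, whereas the soft prompt keeps the same placeholder positions for every domain and leaves their content to be learned.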

Table 4

Top five predicted words based on the probability for the masked token where text representations using hard prompt and meta soft prompt are compared. The examples of product reviews of the domains (a) Music and (b) Magazine in FDU-MTL dataset are shown. Red shows that the predicted word is seen as an irrelevant or wrong sentiment for the given product review. Blue shows that the predicted word is viewed as a correct sentiment.

input text: This album contains only rap and no rock songs. This was very disappointing to say the least. → negative review
hard prompt	meta soft prompt
1. positive	1. bad
2. worth	2. unacceptable
3. negative	3. disappointing
4. lacking	4. terrible
5. good	5. wrong

input text: I still have not received this magazine, what is taking so long! → negative review
hard prompt	meta soft prompt
1. interesting	1. terrible
2. good	2. difficult
3. great	3. unacceptable
4. excellent	4. frightening
5. boring	5. complicated

Table 5

Comparison of classification accuracy (%) on various domains in Amazon review dataset by using fine-tuning (FT), soft prompt (SP) [27] and the proposed meta soft prompt (MSP) where the length of soft prompt is set to be 2 (denoted by SP^† and MSP^†) and 5 (denoted by SP^‡ and MSP^‡). The highest number among different methods is shown by bold.

Domain	FT	SP^†	MSP^†	SP^‡	MSP^‡
Books	80.8	85.2	83.6	86.8	88.0
DVD	79.3	83.2	83.4	84.4	85.6
Electronics	79.4	82.1	82.6	84.8	85.1
Kitchen	79.5	84.5	86.2	86.1	87.6

Table 6

Comparison of classification accuracy (%) on 16 domains in FDU-MTL dataset where the previous methods (MT-DNN, ASP-MTL, MAN-L2, MAN-NLL, FT) and the proposed MSP are evaluated. Length of soft prompt is set as 5.

Domain	MT-DNN	ASP-MTL	MAN-L2	MAN-NLL	FT	MSP
Books	82.2	84.0	87.6	86.8	87.0	89.0
DVD	84.2	85.5	88.1	88.6	85.6	88.1
Electronics	81.7	86.8	87.4	88.8	88.3	90.3
Kitchen	80.7	86.2	89.8	89.9	91.0	90.7
Apparel	85.0	87.0	87.6	87.6	90.0	92.0
Camera	86.2	89.2	91.4	90.7	90.0	90.8
Health	85.7	88.2	89.8	89.4	88.3	91.3
Music	84.7	82.5	85.9	85.5	86.8	87.8
Toys	87.7	88.0	90.0	90.4	90.3	90.8
Video	85.0	84.5	89.5	89.6	88.0	88.4
Baby	88.0	88.2	90.0	90.2	91.5	91.3
Magazine	89.5	92.2	92.5	92.9	89.8	90.2
Software	85.7	87.2	90.4	90.9	89.3	90.9
Sports	83.2	85.7	89.0	89.0	90.8	91.8
IMDB	83.2	85.5	86.6	87.0	85.8	88.3
MR	75.5	76.7	76.1	76.7	74.0	80.4
AVG	84.3	86.1	88.2	88.4	87.9	89.5

Table 7

Comparison of classification accuracy (%) and number of trainable parameters (N) for the methods without and with additional few-shot unsupervised domain adaptation (UDA), where the number of shots is set to 4 and 8. The length of the soft prompt is fixed at 10. MoE-Tr [42] is included in the comparison. Notably, MoE-Tr requires a large-scale trainable model and multiple PLMs, whereas MSP is parameter-efficient and involves only one PLM. Various domains in the Amazon review dataset are evaluated.

Domain	MoE-Tr	SP	MSP	SP (4)	MSP (4)	MSP (8)
Books	90.0	87.5	88.6	87.9	88.7	89.0
DVD	89.3	86.2	86.9	87.0	88.5	88.1
Electronics	90.6	87.4	88.4	87.9	89.2	90.3
Kitchen	90.8	88.5	89.8	89.2	90.5	90.7
UDA	yes	no	no	yes	yes	yes
N	264M	7.68K	7.68K	7.68K	7.68K	7.68K
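
The value N = 7.68K reported for SP and MSP in Table 7 is consistent with a soft prompt of length 10 whose tokens are embedding vectors of dimension 768. The dimension 768 is an assumption here (it is the hidden size of BERT-base [15]; the backbone size is not stated in this table), and the function name is hypothetical.

```python
def soft_prompt_params(prompt_length, hidden_size):
    """Trainable parameters when only the soft prompt embeddings are tuned."""
    return prompt_length * hidden_size


n_msp = soft_prompt_params(prompt_length=10, hidden_size=768)  # assumed hidden size
n_moe_tr = 264_000_000  # N reported for MoE-Tr in Table 7
ratio = n_moe_tr / n_msp

print(f"soft prompt parameters: {n_msp}")
print(f"MoE-Tr uses about {ratio:,.0f} times more trainable parameters")
```

Under this assumption the product is 7,680 parameters, which matches the 7.68K listed in the table.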

Table 8

Comparison of classification accuracy (%) for the methods without and with additional few-shot unsupervised domain adaptation (UDA) where number of shots is set as 4 and 8. Length of soft prompt is fixed as 10. The randomly-selected domains in FDU-MTL dataset are evaluated.

Domain	FT	SP	MSP	SP (4)	MSP (4)	SP (8)	MSP (8)
Health	88.3	90.5	91.3	90.8	91.8	90.7	92.2
Music	86.8	86.2	87.8	86.0	88.4	86.4	89.2
Toys	90.3	90.0	90.8	90.7	91.9	90.5	91.8
Magazine	89.8	88.9	90.2	88.7	91.0	88.9	92.1
UDA	no	no	no	yes	yes	yes	yes

[1] Blitzer, Dredze, and Pereira, “Biographies, bollywood, boom-boxes and blenders: domain adaptation for sentiment classification”, in Proc. of Annual Meeting of Association of Computational Linguistics, 2007, 440–447.

[2] Brown, Mann, Ryder, Subbiah, J. D. Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell, et al., “Language models are few-shot learners”, in Advances in Neural Information Processing Systems, 2020, 1877–1901.
