Article

Transformer-Based Deep Neural Language Modeling for Construct-Specific Automatic Item Generation

Journal

PSYCHOMETRIKA
Volume 87, Issue 2, Pages 749-772

Publisher

SPRINGER
DOI: 10.1007/s11336-021-09823-9

Keywords

automatic item generation; natural language processing; deep learning; neural networks; language modeling

Funding

  1. Projekt DEAL

Algorithmic automatic item generation has been used to obtain cognitive items in knowledge and aptitude testing, with recent progress made in creating items for non-cognitive constructs using recurrent neural networks. The study proposes fine-tuning pre-trained causal transformer models for construct-specific item generation and compares the validity of human- and machine-authored items using empirical data. Results show that deep neural networks can effectively generate non-cognitive items with good psychometric properties.
Algorithmic automatic item generation can be used to obtain large quantities of cognitive items in the domains of knowledge and aptitude testing. However, conventional item models used by template-based automatic item generation techniques are not ideal for the creation of items for non-cognitive constructs. Progress in this area has been made recently by employing long short-term memory recurrent neural networks to produce word sequences that syntactically resemble items typically found in personality questionnaires. To date, such items have been produced unconditionally, without the possibility of selectively targeting personality domains. In this article, we offer a brief synopsis on past developments in natural language processing and explain why the automatic generation of construct-specific items has become attainable only due to recent technological progress. We propose that pre-trained causal transformer models can be fine-tuned to achieve this task using implicit parameterization in conjunction with conditional generation. We demonstrate this method in a tutorial-like fashion and finally compare aspects of validity in human- and machine-authored items using empirical data. Our study finds that approximately two-thirds of the automatically generated items show good psychometric properties (factor loadings above .40) and that one-third even have properties equivalent to established and highly curated human-authored items. Our work thus demonstrates the practical use of deep neural networks for non-cognitive automatic item generation.
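The conditional generation idea described in the abstract can be illustrated with a minimal sketch of how construct labels might be embedded directly in the training text, so that a fine-tuned causal language model learns to continue a label prefix with a matching item. The delimiter characters, the function names, and the end-of-text marker below are assumptions made for illustration; the abstract does not specify the encoding the authors used.

```python
# Minimal sketch of label-prefixed text encoding for conditional item generation.
# All names and delimiters here are hypothetical, chosen only for illustration.

LABEL_SEP = "#"          # assumed marker placed before each construct label
ITEM_SEP = "@"           # assumed marker separating labels from the item text
END = "<|endoftext|>"    # assumed end-of-sequence marker (as used by GPT-2-style models)


def build_prompt(constructs):
    """Return the label prefix that conditions generation on the given constructs."""
    return "".join(f"{LABEL_SEP}{c}" for c in constructs) + ITEM_SEP


def encode_example(constructs, item):
    """Return one fine-tuning string: label prefix, item text, end marker."""
    return f"{build_prompt(constructs)}{item}{END}"


def decode_item(prompt, generated):
    """Strip the prompt and anything after the end marker from model output."""
    text = generated[len(prompt):] if generated.startswith(prompt) else generated
    return text.split(END, 1)[0].strip()


if __name__ == "__main__":
    # A training string for a single-construct item.
    print(encode_example(["Extraversion"], "I am the life of the party."))
    # A prompt that would be passed to the fine-tuned model at generation time.
    print(build_prompt(["Anxiety", "Neuroticism"]))
```

At generation time, the string returned by `build_prompt` would be given to the fine-tuned model as the text to continue, and `decode_item` would recover the item from the raw continuation. Because the construct is expressed in the text itself rather than through extra model inputs, no architectural change to the pre-trained model is needed under this scheme.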
