4.5 Article

Automated Scoring of Chinese Grades 7-9 Students' Competence in Interpreting and Arguing from Evidence

Journal

JOURNAL OF SCIENCE EDUCATION AND TECHNOLOGY
Volume 30, Issue 2, Pages 269-282

Publisher

SPRINGER
DOI: 10.1007/s10956-020-09859-z

Keywords

Automated scoring; Scientific argumentation; Chinese writing; LightSIDE

Funding

  1. International Joint Research Project of Faculty of Education, Beijing Normal University
  2. China Scholarship Council (CSC) [201806040088]

Abstract

The study found that at least 800 human-scored student responses were needed as the training sample for accurately building automated scoring models for Chinese written responses. Agreement between human scoring and computer-automated scoring was nearly perfect for both holistic and analytic scores.
Assessing scientific argumentation is one of the main challenges in science education. Constructed-response (CR) items can be used to measure the coherence of student ideas and to inform science instruction on argumentation. Published research on automated scoring of CR items has dealt mostly with English writing and rarely with other languages. The objective of this study is to investigate issues related to the automated scoring of Chinese written responses. LightSIDE was used to score students' written responses in Chinese. The sample consisted of 4,000 students in grades 7-9 from Beijing. Items developed by the Stanford NGSS Assessment Project for assessing competence in interpreting data and making claims on an ecological topic were translated into Chinese. The results show that: (1) at least 800 human-scored student responses were needed as the training sample to accurately build scoring models; doubling the training sample size increased kappa only slightly, by 0.03-0.04; (2) there was nearly perfect agreement between human scoring and computer-automated scoring for both holistic and analytic scores, with analytic scores yielding slightly higher accuracy than holistic scores; (3) automated scoring accuracy did not differ substantially by response length, although shorter texts produced slightly higher human-machine agreement. These findings suggest that automated scoring of Chinese written responses achieves a level of accuracy similar to that reported in the literature for English writing, although specific considerations, such as training set size, scoring rubric, and text length, apply when scoring student written responses in Chinese automatically.
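LightSIDE itself is a standalone text-mining workbench, but the workflow the abstract describes, training a model on human-scored responses and then checking human-machine agreement with kappa, can be sketched with ordinary text-classification tools. The sketch below is a rough stand-in under assumptions, not the authors' pipeline: the function name, the `responses`/`human_scores` inputs, and the choice of character n-gram features with logistic regression are illustrative, not taken from the paper.

```python
# Minimal sketch (assumed setup, not the authors' LightSIDE configuration) of
# training a scoring model on human-scored Chinese responses and measuring
# human-machine agreement with Cohen's kappa.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def build_scoring_model(responses, human_scores, train_size=800):
    """Train on `train_size` human-scored responses; report agreement on the rest.

    `responses` is a list of Chinese written responses (strings) and
    `human_scores` the matching rubric scores -- both hypothetical inputs.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        responses, human_scores,
        train_size=train_size, random_state=0, stratify=human_scores)

    model = make_pipeline(
        # Character 2-4 grams sidestep the need for an explicit Chinese word segmenter.
        TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)

    machine_scores = model.predict(X_test)
    # Quadratic-weighted kappa is a common agreement statistic for ordinal rubric scores.
    kappa = cohen_kappa_score(y_test, machine_scores, weights="quadratic")
    return model, kappa
```

With data of roughly the size reported in the study, varying `train_size` (e.g., 400, 800, 1,600) would reproduce the kind of training-sample-size comparison the abstract describes.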

