Question Generator System of Sentence Completion in TOEFL Using NLP and K-Nearest Neighbor

Authors

  • Lala Septem Riza, Universitas Pendidikan Indonesia
  • Anita Dyah Pertiwi, Universitas Pendidikan Indonesia
  • Eka Fitrajaya Rahman, Universitas Pendidikan Indonesia
  • Munir Munir, Universitas Pendidikan Indonesia
  • Cep Ubad Abdullah, Universitas Pendidikan Indonesia

Keywords:

TOEFL, Automatic question generation, Natural Language Processing, Machine Learning, K-Nearest Neighbor, Education, Learning

Abstract

The Test of English as a Foreign Language (TOEFL) is a form of learning evaluation that requires questions of excellent quality. Preparing TOEFL questions in the conventional way takes a great deal of time, and computer technology can be used to address this. This research was therefore conducted to automate the creation of sentence-completion TOEFL questions. The system consists of the following stages: (1) collection of input data from foreign news media sites with excellent English grammar; (2) preprocessing with Natural Language Processing (NLP); (3) Part of Speech (POS) tagging; (4) question feature extraction; (5) separation and selection of news sentences; (6) determination and value collection of seven features; (7) conversion of categorical data values; (8) classification of the target blank-position word with K-Nearest Neighbor (KNN); (9) determination of heuristic rules from human experts; and (10) selection of options, or distractors, based on the heuristic rules. In an experiment on 10 news articles, 20 questions were generated. The evaluation showed that the generated questions were of very good quality, scoring 81.93% in an assessment by a human expert, and that 70% had the same blank position as the historical TOEFL questions. It can be concluded that the generated questions have the following characteristics: the quality of the results follows the training data drawn from historical TOEFL questions, and the quality of the distractors is very good because they are derived from the heuristics of human experts.
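
Stages (7) and (8) can be illustrated with a minimal sketch: categorical feature values are converted to one-hot vectors, and a K-Nearest Neighbor vote decides whether a word is a candidate blank position. This is an assumption-laden illustration, not the authors' implementation: the abstract does not name the seven features, so the three POS-context features, the toy training rows, the labels "blank" and "keep", and the function names below are all hypothetical.

```python
from collections import Counter

# Hypothetical training rows (NOT the paper's data or features):
# (POS of previous word, POS of word, POS of next word) -> label
TRAIN = [
    (("DT", "NN", "VBZ"), "blank"),
    (("NN", "VBZ", "DT"), "blank"),
    (("PRP", "VBD", "IN"), "blank"),
    (("IN", "DT", "NN"), "keep"),
    (("VBZ", "DT", "JJ"), "keep"),
    (("NN", "IN", "DT"), "keep"),
]


def build_encoder(rows):
    """Map each (feature position, category) pair to a one-hot column index."""
    columns = sorted({(i, v) for row in rows for i, v in enumerate(row)})
    return {col: idx for idx, col in enumerate(columns)}


def encode(row, encoder):
    """Convert a tuple of categorical values into a one-hot vector."""
    vec = [0] * len(encoder)
    for i, value in enumerate(row):
        idx = encoder.get((i, value))
        if idx is not None:  # categories unseen in training stay all-zero
            vec[idx] = 1
    return vec


def knn_predict(query, train, encoder, k=3):
    """Majority vote among the k training rows nearest to the query."""
    q = encode(query, encoder)
    scored = []
    for row, label in train:
        x = encode(row, encoder)
        # Squared Euclidean distance between the one-hot vectors.
        dist = sum((a - b) ** 2 for a, b in zip(q, x))
        scored.append((dist, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


ENCODER = build_encoder([row for row, _ in TRAIN])
```

Calling `knn_predict(("DT", "NN", "VBZ"), TRAIN, ENCODER, k=1)` returns the label of the single closest training row. In practice the value of `k`, the distance measure, and the feature set would need to be chosen against the historical TOEFL questions that the abstract mentions as training data.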

Author Biographies

  • Lala Septem Riza, Universitas Pendidikan Indonesia

    Department of Computer Science Education

  • Anita Dyah Pertiwi, Universitas Pendidikan Indonesia

    Department of Computer Science Education

  • Eka Fitrajaya Rahman, Universitas Pendidikan Indonesia

    Department of Computer Science Education

  • Munir Munir, Universitas Pendidikan Indonesia

    Department of Computer Science Education

  • Cep Ubad Abdullah, Universitas Pendidikan Indonesia

    Faculty of Social Sciences Education (Fakultas Pendidikan Ilmu Pengetahuan Sosial)

Published

2024-01-23

How to Cite

Question Generator System of Sentence Completion in TOEFL Using NLP and K-Nearest Neighbor. (2024). Indonesian Journal of Science and Technology, 4(2), 294-311. https://ejournal.kjpupi.id/index.php/ijost/article/view/179