Short Answer Detection for Open Questions: A Sequence Labeling Approach with Deep Learning Models
Samuel González-López,
Zeltzyn Guadalupe Montes-Rosales,
Adrián Pastor López-Monroy,
Aurelio López-López and
Jesús Miguel García-Gorrostieta
Additional contact information
Samuel González-López: Department of Computer Science, Universidad Tecnológica de Nogales, Nogales 84094, Mexico
Zeltzyn Guadalupe Montes-Rosales: Department of Computer Science, Mathematics Research Center (CIMAT), Jalisco s/n, Valenciana, Guanajuato 36023, Mexico
Adrián Pastor López-Monroy: Department of Computer Science, Mathematics Research Center (CIMAT), Jalisco s/n, Valenciana, Guanajuato 36023, Mexico
Aurelio López-López: Computational Sciences Department, Instituto Nacional de Astrofísica, Óptica y Electrónica, Sta. María Tonantzintla, Puebla 72840, México
Jesús Miguel García-Gorrostieta: Department of Computer Science, Universidad de la Sierra, Moctezuma 84560, Mexico
Mathematics, 2022, vol. 10, issue 13, 1-13
Abstract:
Evaluating responses to open questions is a complex process, since it requires prior knowledge of a specific topic and language. The computational challenge is to analyze the text by learning from a set of correct examples to train a model and then predict unseen cases, capturing the patterns that characterize answers to open questions. In this work, we used a sequence labeling and deep learning approach to detect whether a text segment corresponds to the answer to an open question. We focused our efforts on analyzing the general objective of a thesis according to three methodological questions: Q1: What will be done? Q2: Why is it going to be done? Q3: How is it going to be done? First, we used the Beginning-Inside-Outside (BIO) format to label a corpus of objectives with the help of two annotators. Subsequently, we adapted four state-of-the-art architectures to analyze the objective: Bidirectional Encoder Representations from Transformers (BERT-BETO) for Spanish, Code Switching Embeddings from Language Model (CS-ELMo), Multitask Neural Network (MTNN), and Bidirectional Long Short-Term Memory (Bi-LSTM). The F-measure results for detecting the answers to the three questions indicate that BERT-BETO and CS-ELMo were the most effective architectures, with BERT-BETO achieving the best overall results. We also report a detection analysis for Q1, Q2 and Q3 on a non-annotated corpus at the graduate and undergraduate levels. We found that only the doctoral academic level reached 100% detection of the three questions; that is, only the doctoral objectives consistently contained the answers to all three questions.
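To make the sequence labeling setup concrete, the following minimal Python sketch (illustrative only, not taken from the paper) shows how a translated thesis objective could be tagged in the BIO format for the three methodological questions, and how the labeled spans can be recovered afterwards. The example sentence, the tag names (B-Q1/I-Q1, B-Q2/I-Q2, B-Q3/I-Q3, O) and the helper function spans_from_bio are assumptions introduced for illustration; the paper's exact label set and tooling may differ.

    # Illustrative BIO labeling of a thesis objective (translated to English).
    # Q1 = what will be done, Q2 = why, Q3 = how.
    tokens = ["Develop", "a", "classifier", "to", "improve", "feedback",
              "for", "students", "using", "deep", "neural", "networks"]
    labels = ["B-Q1", "I-Q1", "I-Q1",
              "B-Q2", "I-Q2", "I-Q2", "I-Q2", "I-Q2",
              "B-Q3", "I-Q3", "I-Q3", "I-Q3"]

    def spans_from_bio(tokens, labels):
        """Collect (question, text) spans from a BIO-labeled token sequence."""
        spans, tag, buf = [], None, []
        for tok, lab in zip(tokens, labels):
            if lab.startswith("B-"):
                if tag:
                    spans.append((tag, " ".join(buf)))
                tag, buf = lab[2:], [tok]
            elif lab.startswith("I-") and tag == lab[2:]:
                buf.append(tok)
            else:  # "O" or an inconsistent continuation closes the open span
                if tag:
                    spans.append((tag, " ".join(buf)))
                tag, buf = None, []
        if tag:
            spans.append((tag, " ".join(buf)))
        return spans

    print(spans_from_bio(tokens, labels))
    # [('Q1', 'Develop a classifier'),
    #  ('Q2', 'to improve feedback for students'),
    #  ('Q3', 'using deep neural networks')]

In the approach described in the abstract, a sequence model such as BERT-BETO or CS-ELMo predicts one such label per token, so checking whether an objective answers Q1, Q2 or Q3 amounts to checking whether the corresponding labeled spans appear in the text.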
Keywords: question answering; open questions; academic document analysis; sequence labeling; deep learning
JEL-codes: C
Date: 2022
Downloads:
https://www.mdpi.com/2227-7390/10/13/2259/pdf (application/pdf)
https://www.mdpi.com/2227-7390/10/13/2259/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:10:y:2022:i:13:p:2259-:d:849522
Mathematics is currently edited by Ms. Emma He
More articles in Mathematics from MDPI
Bibliographic data for series maintained by MDPI Indexing Manager.