Math-word embedding in math search and semantic extraction

Greiner-Petter, André; Youssef, Abdou; Ruas, Terry; Miller, Bruce R.; Schubotz, Moritz; Aizawa, Akiko; Gipp, Bela

André Greiner-Petter, Abdou Youssef, Terry Ruas, Bruce R. Miller, Moritz Schubotz, Akiko Aizawa and Bela Gipp
Additional contact information
André Greiner-Petter: University of Wuppertal
Abdou Youssef: The George Washington University
Terry Ruas: University of Wuppertal
Bruce R. Miller: NIST
Moritz Schubotz: University of Wuppertal
Akiko Aizawa: National Institute of Informatics
Bela Gipp: University of Wuppertal

Scientometrics, 2020, vol. 125, issue 3, No 51, 3017-3046

Abstract: Word embedding, which represents individual words with semantically fixed-length vectors, has made it possible to successfully apply deep learning to natural language processing tasks such as semantic role-modeling, question answering, and machine translation. As math text consists of natural text, as well as math expressions that similarly exhibit linear correlation and contextual characteristics, word embedding techniques can also be applied to math documents. However, while mathematics is a precise and accurate science, it is usually expressed through imprecise and less accurate descriptions, contributing to the relative dearth of machine learning applications for information retrieval in this domain. Generally, mathematical documents communicate their knowledge with an ambiguous, context-dependent, and non-formal language. Given recent advances in word embedding, it is worthwhile to explore their use and effectiveness in math information retrieval tasks, such as math language processing and semantic knowledge extraction. In this paper, we explore math embedding by testing it on several different scenarios, namely, (1) math-term similarity, (2) analogy, (3) numerical concept-modeling based on the centroid of the keywords that characterize a concept, (4) math search using query expansions, and (5) semantic extraction, i.e., extracting descriptive phrases for math expressions. Due to the lack of benchmarks, our investigations were performed using the arXiv collection of STEM documents and carefully selected illustrations on the Digital Library of Mathematical Functions (DLMF: NIST digital library of mathematical functions. Release 1.0.20 of 2018-09-1, 2018). Our results show that math embedding holds much promise for similarity, analogy, and search tasks. However, we also observed the need for more robust math embedding approaches. Moreover, we explore and discuss fundamental issues that we believe thwart the progress in mathematical information retrieval in the direction of machine learning.

Keywords: Mathematical information retrieval; Math search; Semantic extraction; Machine learning; Word embedding; Math embedding
Date: 2020

Downloads: (external link)
http://link.springer.com/10.1007/s11192-020-03502-9 Abstract (text/html)
Access to the full text of the articles in this series is restricted.

Persistent link: https://EconPapers.repec.org/RePEc:spr:scient:v:125:y:2020:i:3:d:10.1007_s11192-020-03502-9

Ordering information: This journal article can be ordered from
http://www.springer.com/economics/journal/11192

DOI: 10.1007/s11192-020-03502-9

Scientometrics is currently edited by Wolfgang Glänzel

More articles in Scientometrics from Springer, Akadémiai Kiadó
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.