Estimation of Cross-Lingual News Similarities Using Text-Mining Methods

Wang, Zhouhao; Liu, Enda; Sakaji, Hiroki; Ito, Tomoki; Izumi, Kiyoshi; Tsubouchi, Kota; Yamashita, Tatsuo

Zhouhao Wang, Enda Liu, Hiroki Sakaji, Tomoki Ito, Kiyoshi Izumi, Kota Tsubouchi and Tatsuo Yamashita
Additional contact information
Zhouhao Wang: Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
Enda Liu: Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
Hiroki Sakaji: Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
Tomoki Ito: Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
Kiyoshi Izumi: Izumi lab, Department of System Innovation, Graduate School of Engineering, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
Kota Tsubouchi: Yahoo! Japan Research, Kioicho 1-3, Chiyoda-ku, Tokyo 102-8282, Japan
Tatsuo Yamashita: Yahoo! Japan Research, Kioicho 1-3, Chiyoda-ku, Tokyo 102-8282, Japan

JRFM, 2018, vol. 11, issue 1, 1-13

Abstract: In this research, we propose two machine-learning-based estimation algorithms for extracting cross-lingual news pairs from financial news articles. Every second, vast amounts of text data, including all kinds of news, reports, messages, reviews, comments, and tweets, are generated on the Internet, and these are written not only in English but also in other languages such as Chinese, Japanese, and French. By taking advantage of the multilingual text resources provided by Thomson Reuters News, we developed two estimation algorithms for extracting cross-lingual news pairs from multilingual text resources. In our first method, we propose a novel structure that uses word information and a machine learning method effectively in this task. In parallel, we developed a bidirectional Long Short-Term Memory (LSTM) based method to calculate cross-lingual semantic text similarity for long text and short text, respectively. Thus, when an important news article is published, users can read similar news articles written in their native language using our method.

Keywords: text similarity; text mining; machine learning; SVM; neural network; LSTM
JEL-codes: C E F2 F3 G
Date: 2018
Citations: 2 (per EconPapers)

Downloads:
https://www.mdpi.com/1911-8074/11/1/8/pdf (application/pdf)
https://www.mdpi.com/1911-8074/11/1/8/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jjrfmx:v:11:y:2018:i:1:p:8-:d:129624

JRFM is currently edited by Ms. Chelthy Cheng
