Looking for Razors and Needles in a Haystack: Multifaceted Analysis of Suicidal Declarations on Social Media—A Pragmalinguistic Approach
Michal Ptaszynski,
Monika Zasko-Zielinska,
Michal Marcinczuk,
Gniewosz Leliwa,
Marcin Fortuna,
Kamil Soliwoda,
Ida Dziublewska,
Olimpia Hubert,
Pawel Skrzek,
Jan Piesiewicz,
Paula Karbowska,
Maria Dowgiallo,
Juuso Eronen,
Patrycja Tempska,
Maciej Brochocki,
Marek Godny and
Michal Wroczynski
Additional contact information
Michal Ptaszynski: Department of Computer Science, Kitami Institute of Technology, Kitami 090-8507, Japan
Monika Zasko-Zielinska: Department of Contemporary Polish Language, Faculty of Philology, University of Wrocław, 50-140 Wrocław, Poland
Michal Marcinczuk: Samurai Labs, 81-824 Sopot, Poland
Gniewosz Leliwa: Samurai Labs, 81-824 Sopot, Poland
Marcin Fortuna: Samurai Labs, 81-824 Sopot, Poland
Kamil Soliwoda: Samurai Labs, 81-824 Sopot, Poland
Ida Dziublewska: Samurai Labs, 81-824 Sopot, Poland
Olimpia Hubert: Samurai Labs, 81-824 Sopot, Poland
Pawel Skrzek: Samurai Labs, 81-824 Sopot, Poland
Jan Piesiewicz: Samurai Labs, 81-824 Sopot, Poland
Paula Karbowska: Samurai Labs, 81-824 Sopot, Poland
Maria Dowgiallo: Samurai Labs, 81-824 Sopot, Poland
Juuso Eronen: Department of Computer Science, Kitami Institute of Technology, Kitami 090-8507, Japan
Patrycja Tempska: Samurai Labs, 81-824 Sopot, Poland
Maciej Brochocki: Samurai Labs, 81-824 Sopot, Poland
Marek Godny: Samurai Labs, 81-824 Sopot, Poland
Michal Wroczynski: Samurai Labs, 81-824 Sopot, Poland
IJERPH, 2021, vol. 18, issue 22, 1-49
Abstract:
In this paper, we study the language used by suicidal users on the Reddit social media platform. To do so, we first collect a large-scale dataset of Reddit posts and have it annotated by highly trained expert annotators under a rigorous annotation scheme. Next, we perform a multifaceted analysis of the dataset, including (1) an analysis of user activity before and after posting a suicidal message, and (2) a pragmalinguistic study of the vocabulary used by suicidal users. In the second part of the analysis, we apply LIWC, a dictionary-based toolset widely used in psychology and linguistic research, which provides a wide range of linguistic category annotations on text. However, since raw LIWC scores are not sufficiently reliable or informative, we propose a procedure that reduces the risk of unreliable LIWC scores leading to misleading conclusions by analyzing each category not separately, but in pairs with other categories. The results supported the validity of the proposed approach: they revealed valuable information about the vocabulary used by suicidal users and helped to pinpoint false predictors. For example, we were able to specify that death-related words, typically associated with suicidal posts in the majority of the literature, become false predictors when they co-occur with apostrophes, even in high-risk subreddits. On the other hand, the category-pair-based disambiguation helped to specify that death becomes a predictor only when co-occurring with future-focused language, informal language, discrepancy, or 1st person pronouns. We additionally analyzed the limitations of the approach and found that, although LIWC is a useful and easily applicable tool, its lack of any contextual processing makes it unsuitable for application in psychological and linguistic studies. We conclude that the disadvantages of LIWC can be easily overcome by creating a number of high-performance AI-based classifiers trained to annotate categories similar to those of LIWC, which we plan to pursue in future work.
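The category-pair idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' procedure or the LIWC software: the function name `pair_positive_rate`, the category labels, and the input format (one set of detected categories per post plus a boolean annotation) are all assumptions made for illustration. The sketch only shows how a single category such as death can look different once it is examined together with a second co-occurring category.

```python
from collections import defaultdict
from itertools import combinations


def pair_positive_rate(posts):
    """Share of positively labeled posts for every co-occurring category pair.

    posts: iterable of (categories, label), where `categories` is the set of
    category names detected in one post (hypothetical names, not real LIWC
    output) and `label` is True if annotators marked the post as at-risk.
    """
    totals = defaultdict(int)  # posts in which both categories of a pair occur
    hits = defaultdict(int)    # how many of those posts are labeled True
    for categories, label in posts:
        # Sort so each pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(categories), 2):
            totals[pair] += 1
            hits[pair] += int(label)
    return {pair: hits[pair] / totals[pair] for pair in totals}


if __name__ == "__main__":
    # Invented toy annotations, used only to exercise the function.
    toy_posts = [
        ({"death", "focusfuture", "i"}, True),
        ({"death", "apostro"}, False),
        ({"death", "i"}, True),
        ({"death", "apostro", "i"}, False),
    ]
    for pair, rate in sorted(pair_positive_rate(toy_posts).items()):
        print(pair, round(rate, 2))
```

On real data, a pair with a low rate would suggest a context in which the first category is a weak or false predictor, while a pair with a high rate would suggest a context in which it is informative. Any such reading would still need the significance checks and manual inspection that a full study would apply.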
Keywords: suicidal declarations; LIWC; social media
JEL-codes: I I1 I3 Q Q5
Date: 2021
Downloads: (external link)
https://www.mdpi.com/1660-4601/18/22/11759/pdf (application/pdf)
https://www.mdpi.com/1660-4601/18/22/11759/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:gam:jijerp:v:18:y:2021:i:22:p:11759-:d:675355
IJERPH is currently edited by Ms. Jenna Liu
Bibliographic data for series maintained by MDPI Indexing Manager.