Reasoning about unstructured data de-identification
Patricia Thaine and Gerald Penn
Patricia Thaine: PhD Candidate, University of Toronto; Co-Founder & CEO, Private AI, Canada
Gerald Penn: Professor of Computer Science, University of Toronto; Co-Founder & Chief Science Officer, Private AI, Canada
Journal of Data Protection & Privacy, 2020, vol. 3, issue 3, 299-309
Abstract:
We frame the problem of de-identifying unstructured text within the broader landscape of privacy-enhancing technologies. We then discuss what background knowledge can be gained from the stylistic information of a written document alone, and how research on authorship attribution and author profiling can improve our understanding of the inferences that can be drawn from an otherwise de-identified text. Finally, we provide a risk score for determining the likelihood that a message will be attributed to a particular author within a dataset using only author profiling tools.
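The abstract does not give the formula for the paper's risk score, so what follows is only an illustration of the general idea under a deliberately simple assumption, in the spirit of k-anonymity. It assumes an author profiler infers a few attributes from a message's style (the attribute names age_group, gender and l1 are hypothetical), treats every author in the dataset whose known attributes agree with all inferred values as a candidate, and assigns each of the k candidates an attribution probability of 1/k. The function attribution_risk and the toy dataset are hypothetical, and the sketch ignores profiler error entirely.

```python
from typing import Dict

# Hypothetical representation: attribute name -> value, e.g. {"gender": "F"}.
Profile = Dict[str, str]


def attribution_risk(inferred: Profile, authors: Dict[str, Profile]) -> Dict[str, float]:
    """Illustrative (not the paper's) risk score: each author whose known
    profile matches every attribute inferred from the message is a candidate,
    and the k candidates share the attribution probability equally (1/k)."""
    candidates = {
        name
        for name, profile in authors.items()
        if all(profile.get(attr) == value for attr, value in inferred.items())
    }
    if not candidates:
        # No author matches the inferred profile, so no one is singled out.
        return {name: 0.0 for name in authors}
    p = 1.0 / len(candidates)
    return {name: (p if name in candidates else 0.0) for name in authors}


# Toy dataset of known author profiles (hypothetical).
authors = {
    "a1": {"age_group": "18-24", "gender": "F", "l1": "en"},
    "a2": {"age_group": "18-24", "gender": "M", "l1": "en"},
    "a3": {"age_group": "18-24", "gender": "F", "l1": "en"},
    "a4": {"age_group": "35-49", "gender": "F", "l1": "fr"},
}

# Attributes a profiler might infer from a de-identified message's style.
inferred = {"age_group": "18-24", "gender": "F"}

risk = attribution_risk(inferred, authors)
print(risk)  # {'a1': 0.5, 'a2': 0.0, 'a3': 0.5, 'a4': 0.0}
```

Treating all matching candidates as equally likely is the simplest possible choice; a more realistic score would weight each author by the profiler's confidence in each inferred attribute. Note how a single rare attribute (here, l1 = "fr") would shrink the candidate set to one author and drive the risk to 1.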
Keywords: anonymisation; de-identification; authorship attribution; author profiling; unstructured data; risk
JEL-codes: K2
Date: 2020
Downloads (a paid subscription is required for full access):
https://hstalks.com/article/5711/download/ (application/pdf)
https://hstalks.com/article/5711/ (text/html)
Persistent link: https://EconPapers.repec.org/RePEc:aza:jdpp00:y:2020:v:3:i:3:p:299-309
More articles in Journal of Data Protection & Privacy from Henry Stewart Publications
Bibliographic data for series maintained by Henry Stewart Talks.