Extracting Proceedings Data from Court Cases with Machine Learning

Bruno Mathis
Additional contact information
Bruno Mathis: CHROME Laboratory, Nimes University, 5 Rue du Docteur Georges Salan CS 13019, 30021 Nîmes, France

Stats, 2022, vol. 5, issue 4, 1-16

Abstract: France is rolling out an open data program for all court cases, but with little metadata attached. Reusers will have to apply named-entity recognition (NER) to the text body of each case to extract any value from it. Any court case may include up to 26 variables, or labels, that relate to the proceeding, regardless of the substance of the case. These labels are of different syntactic types; some are rare, others ubiquitous. This experiment compares different algorithms, namely CRF, SpaCy, Flair and DeLFT, for extracting proceedings data, and uses the model assessment capabilities of Kairntech, an NLP platform. It shows that an NER model can be applied to this large and diverse set of labels and extract data of high quality. We achieved an 87.5% F1 measure with Flair trained on more than 27,000 manual annotations. Quality may yet be improved by combining NER models by data type.
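
For readers unfamiliar with the F1 measure quoted in the abstract, the following is a minimal sketch of how an entity-level F1 score is commonly computed for NER output. It is not the evaluation code used in the article or by Kairntech; the function name `entity_prf1`, the exact-match criterion, and the label names in the example are assumptions made for illustration only.

```python
def entity_prf1(gold, predicted):
    """Entity-level precision, recall and F1 with exact span-and-label matching.

    Each entity is a (start, end, label) tuple. Exact matching is an
    assumption here; the article may use a different matching criterion.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one court case (labels are illustrative).
gold = [(0, 12, "court"), (20, 30, "decision_date"), (40, 55, "judge")]
predicted = [(0, 12, "court"), (20, 30, "decision_date"), (60, 70, "lawyer")]

precision, recall, f1 = entity_prf1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so a model scores highly only when it both finds most annotated entities and produces few spurious ones.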

Keywords: machine learning; named-entity recognition; information extraction; judicial data; civil procedure
JEL-codes: C1 C10 C11 C14 C15 C16
Date: 2022

Downloads: (external link)
https://www.mdpi.com/2571-905X/5/4/79/pdf (application/pdf)
https://www.mdpi.com/2571-905X/5/4/79/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jstats:v:5:y:2022:i:4:p:79-1320:d:1001711

Stats is currently edited by Mrs. Minnie Li

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jstats:v:5:y:2022:i:4:p:79-1320:d:1001711