NER-IPL: Indian Legal Prediction Dataset for Named Entity Recognition
Sarika Jain and Pooja Harde
Additional contact information
Sarika Jain: National Institute of Technology Kurukshetra
Pooja Harde: National Institute of Technology Kurukshetra
Chapter 4 in Business Analytics and Decision Making in Practice, 2024, pp 41-50 from Springer
Abstract:
Identifying named entities in unstructured text is difficult, especially for domain-specific data. Legal documents are usually very lengthy and highly unstructured. Two approaches can be used to extract named legal entities from legal documents: the rule-based approach and the machine-learning (ML) approach. This paper introduces NER-IPL, an Indian Legal Prediction Dataset for Named Entity Recognition. The dataset consists of 213,481 sentences with 123,193 annotated entities and 6,198,700 tokens. Different ML models expect different encoding schemes; therefore, we tag the named entities using three encoding schemes (BILOU, BOI, IOEBS), allowing the corpus to be used to train any machine learning model for automatic extraction of named entities. To validate our dataset, we have created a battery of baseline models to test its suitability for NER tasks using language models. The experiments, their scores, a detailed analysis, and the scope for improvement are presented. Among the baseline models tested, InLegalBERT gives the best F1 score of 0.67 on our dataset.
Keywords: Indian legal dataset; Corpus creation; NER; Legal domain; Semantic web; NLP; Entity extraction
Date: 2024
Persistent link: https://EconPapers.repec.org/RePEc:spr:lnopch:978-3-031-61589-4_4
Ordering information: This item can be ordered from
http://www.springer.com/9783031615894
DOI: 10.1007/978-3-031-61589-4_4
Series: Lecture Notes in Operations Research, Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.