NER-IPL: Indian Legal Prediction Dataset for Named Entity Recognition

Sarika Jain and Pooja Harde
Additional contact information
Sarika Jain: National Institute of Technology Kurukshetra
Pooja Harde: National Institute of Technology Kurukshetra

Chapter 4 in Business Analytics and Decision Making in Practice, 2024, pp 41-50 from Springer

Abstract: Identifying named entities in unstructured text is difficult, especially for domain-specific data. Legal documents are usually very lengthy and highly unstructured. Named legal entities can be extracted from legal documents using two approaches: the rule-based approach and the machine-learning (ML) approach. This paper introduces NER-IPL, an Indian Legal Prediction Dataset for Named Entity Recognition. The dataset consists of 213,481 sentences with 123,193 annotated entities and 6,198,700 tokens. Different ML models expect different encoding schemes; therefore, we tag the named entities with three encoding schemes (BILOU, BOI, IOEBS), allowing the corpus to be used to train any machine learning model for automatic extraction of named entities. To validate our dataset, we have created a battery of baseline models to test its suitability for NER tasks using language models. The experiments, their scores, a detailed analysis, and the scope for improvement are elaborated. Amongst the baseline models tested, the InLegalBERT model gives the best F1 score of 0.67 on our dataset.
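
The encoding schemes named in the abstract differ only in how they mark entity boundaries. As a minimal sketch, assuming the standard begin/inside/outside (BIO) convention for what the abstract lists as "BOI" and the standard BILOU convention (Begin, Inside, Last, Outside, Unit), the conversion between the two could look like the following. The function name `bio_to_bilou`, the example sentence, and the labels `JUDGE` and `COURT` are hypothetical illustrations and are not taken from the NER-IPL dataset.

```python
from typing import List


def bio_to_bilou(tags: List[str]) -> List[str]:
    """Convert a BIO-tagged sequence to BILOU (illustrative helper, not from the chapter)."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # Look ahead: the entity continues only if the next tag is I-<same label>.
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{label}"
        if prefix == "B":
            # A single-token entity becomes U (unit); otherwise it stays B.
            bilou.append(f"B-{label}" if continues else f"U-{label}")
        else:
            # An inside token that ends the entity becomes L (last).
            bilou.append(f"I-{label}" if continues else f"L-{label}")
    return bilou


# Made-up sentence and labels, for illustration only.
tokens = ["Justice", "A.", "Kumar", "of", "Delhi", "dismissed", "the", "appeal"]
bio = ["B-JUDGE", "I-JUDGE", "I-JUDGE", "O", "B-COURT", "O", "O", "O"]

for token, old, new in zip(tokens, bio, bio_to_bilou(bio)):
    print(f"{token:10s} {old:10s} {new}")
```

The extra L and U prefixes in BILOU make entity ends and single-token entities explicit, which some sequence-labelling models exploit, while BIO keeps the tag set smaller.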

Keywords: Indian legal dataset; Corpus creation; NER; Legal domain; Semantic web; NLP; Entity extraction
Date: 2024

Persistent link: https://EconPapers.repec.org/RePEc:spr:lnopch:978-3-031-61589-4_4

Ordering information: This item can be ordered from
http://www.springer.com/9783031615894

DOI: 10.1007/978-3-031-61589-4_4

More chapters in Lecture Notes in Operations Research from Springer
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-04-01
Handle: RePEc:spr:lnopch:978-3-031-61589-4_4