Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws

Bella Martinez-Seis, Obdulia Pichardo-Lagunas, Harlan Koff, Miguel Equihua, Octavio Perez-Maqueo and Arturo Hernández-Huerta
Additional contact information
Bella Martinez-Seis: Engineering Department, UPIITA-IPN, Instituto Politécnico Nacional, Mexico City 07360, Mexico
Obdulia Pichardo-Lagunas: Engineering Department, UPIITA-IPN, Instituto Politécnico Nacional, Mexico City 07360, Mexico
Harlan Koff: Department of Geography and Spatial Planning, University of Luxembourg, Maison des Sciences Humaines, 11, Porte des Sciences, L-4366 Luxembourg, Luxembourg
Miguel Equihua: Red de Ambiente y Sustentabilidad, Instituto de Ecología, A.C. (INECOL), Xalapa 91073, Mexico
Octavio Perez-Maqueo: Red de Ambiente y Sustentabilidad, Instituto de Ecología, A.C. (INECOL), Xalapa 91073, Mexico
Arturo Hernández-Huerta: Red de Ambiente y Sustentabilidad, Instituto de Ecología, A.C. (INECOL), Xalapa 91073, Mexico

Data, 2022, vol. 7, issue 7, 1-13

Abstract: This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology, with its selected algorithms, used to build the semi-structured corpus. Law documents in PDF format were transformed into plain text, unified by deconstructing the law-document structure, and labeled with natural language processing techniques for part of speech (PoS); entity extraction was also performed. The corpus includes the Mexican constitution and the Mexican laws that were collected from the official site in PDF format repealed before 14 October 2021. The collection has 305 documents: the Mexican constitution, 289 laws, 8 federal codes, 3 regulations, 2 statutes, 1 decree, and 1 ordinance. The semi-structured database transforms the set of laws from PDF format into a digital representation that facilitates computational analysis. The documents were migrated to JSON files to represent their internal hierarchical relations. In addition, basic natural language processing techniques were applied to the laws to identify parts of speech and named entities. The presented data set is mainly useful for text analysis and data science. It could be used for various legislative analysis tasks, including comprehension, interpretation, translation, classification, accessibility, coherence, and search. Finally, we present some statistics on the identified entities and an example of the usefulness of the corpus for environmental laws.
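
To make the idea of a hierarchical JSON representation concrete, here is a minimal sketch in Python. It is not the schema of the published corpus: every key name ("level", "heading", "children", "tokens", "entities"), the PoS tags, the entity labels, and the sample text are assumptions chosen only for illustration. It shows how a law stored as nested JSON could be walked to reach its articles and to tally the labeled entities.

```python
import json
from collections import Counter

# Hypothetical document: all key names, tags, and text below are illustrative
# assumptions, not the field names or content of the published corpus.
law = {
    "title": "Example Law",
    "type": "law",
    "children": [
        {
            "level": "chapter",
            "heading": "Chapter I",
            "children": [
                {
                    "level": "article",
                    "heading": "Article 1",
                    "text": "The Ministry protects forests in Mexico.",
                    "tokens": [["The", "DET"], ["Ministry", "PROPN"],
                               ["protects", "VERB"], ["forests", "NOUN"],
                               ["in", "ADP"], ["Mexico", "PROPN"]],
                    "entities": [{"text": "Ministry", "label": "ORG"},
                                 {"text": "Mexico", "label": "LOC"}],
                },
                {
                    "level": "article",
                    "heading": "Article 2",
                    "text": "Congress reviews this law.",
                    "tokens": [["Congress", "PROPN"], ["reviews", "VERB"],
                               ["this", "DET"], ["law", "NOUN"]],
                    "entities": [{"text": "Congress", "label": "ORG"}],
                },
            ],
        }
    ],
}


def iter_articles(node, path=()):
    """Yield (path, node) for every article-level node, depth first."""
    here = path + (node.get("heading", node.get("title", "")),)
    if node.get("level") == "article":
        yield here, node
    for child in node.get("children", []):
        yield from iter_articles(child, here)


def entity_counts(doc):
    """Count entity labels over all articles of one document."""
    counts = Counter()
    for _, article in iter_articles(doc):
        counts.update(e["label"] for e in article.get("entities", []))
    return counts


def pos_counts(doc):
    """Count part-of-speech tags over all articles of one document."""
    counts = Counter()
    for _, article in iter_articles(doc):
        counts.update(tag for _, tag in article.get("tokens", []))
    return counts


# Round-trip through JSON text to show the structure serializes cleanly.
restored = json.loads(json.dumps(law, ensure_ascii=False))
for path, article in iter_articles(restored):
    print(" > ".join(path), "|", article["text"])
print(entity_counts(restored))
```

Keeping the hierarchy as nested "children" lists means the same recursive walk works whatever the depth of a given law (titles, chapters, sections), which is one plausible reason to prefer JSON over flat text for this kind of document.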

Keywords: Mexican legislation; laws; natural language processing; legislative documents
JEL-codes: C8 C80 C81 C82 C83
Date: 2022
Downloads: (external link)
https://www.mdpi.com/2306-5729/7/7/91/pdf (application/pdf)
https://www.mdpi.com/2306-5729/7/7/91/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jdataj:v:7:y:2022:i:7:p:91-:d:856500

Data is currently edited by Ms. Cecilia Yang

Bibliographic data for series maintained by MDPI Indexing Manager.

 
Page updated 2025-03-19
Handle: RePEc:gam:jdataj:v:7:y:2022:i:7:p:91-:d:856500