Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data

Stephen Meisenbacher, Svetlozar Nestorov and Peter Norlander

MPRA Paper from University Library of Munich, Germany

Abstract: Data from online job postings are difficult to access and are not built in a standard or transparent manner. Data included in the standard taxonomy and occupational information database (O*NET) are updated infrequently and based on small survey samples. We adopt O*NET as a framework for building natural language processing tools that extract structured information from job postings. We publish the Job Ad Analysis Toolkit (JAAT), a collection of open-source tools built for this purpose, and demonstrate its reliability and accuracy in out-of-sample and LLM-as-judge testing. We extract more than 10 billion data points from more than 155 million online job ads provided by the National Labor Exchange (NLx) Research Hub, including O*NET tasks, occupation codes, tools, and technologies, as well as wages, skills, industry, and more features. We describe the construction of a dataset of occupation, state, and industry level features aggregated by monthly active jobs from 2015 – 2025. We illustrate the potential for research and future uses in education and workforce development.

Keywords: Labor Market Information; Online Job Vacancies; NLP methods; ML; data transparency
JEL-codes: J23 J24 J63
Date: 2025-10-01
New Economics Papers: this item is included in nep-inv and nep-lma

Downloads:
https://mpra.ub.uni-muenchen.de/126336/1/MPRA_paper_126336.pdf original version (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:pra:mprapa:126336

Series: MPRA Paper, University Library of Munich, Ludwigstraße 33, D-80539 Munich, Germany.
Bibliographic data for series maintained by Joachim Winter.

 
Page updated 2025-11-01
Handle: RePEc:pra:mprapa:126336