A Method of Automated Nonparametric Content Analysis for Social Science

Daniel J. Hopkins and Gary King

American Journal of Political Science, 2010, vol. 54, issue 1, 229-247

Abstract: The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
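The abstract's point that high per-document accuracy can coexist with a badly biased proportion estimate can be illustrated with simple arithmetic. The following is a minimal sketch, not the authors' method or software: it assumes a hypothetical binary classifier whose sensitivity and specificity are known, and the correction shown is the generic inversion of the misclassification rates, used here only for illustration. All function names and numbers are assumptions.

```python
def classify_and_count(true_share, sensitivity, specificity):
    """Expected share of documents labeled positive when individual
    predictions are simply tallied (hypothetical helper)."""
    false_positive_rate = 1.0 - specificity
    return true_share * sensitivity + (1.0 - true_share) * false_positive_rate


def corrected_share(observed_share, sensitivity, specificity):
    """Recover the category share by inverting the misclassification
    relation (generic correction, assumed for illustration only)."""
    false_positive_rate = 1.0 - specificity
    return (observed_share - false_positive_rate) / (sensitivity + specificity - 1.0)


# Illustrative numbers (assumptions, not taken from the paper).
true_share = 0.10   # true proportion of documents in the category
sensitivity = 0.90  # P(labeled positive | truly positive)
specificity = 0.90  # P(labeled negative | truly negative)

# Share of individual documents classified correctly.
accuracy = true_share * sensitivity + (1.0 - true_share) * specificity

naive = classify_and_count(true_share, sensitivity, specificity)
fixed = corrected_share(naive, sensitivity, specificity)

print(f"accuracy:           {accuracy:.2f}")
print(f"classify-and-count: {naive:.2f}")
print(f"corrected estimate: {fixed:.2f}")
```

Under these assumed rates, 90% of documents are classified correctly, yet tallying the predictions gives an expected share of 0.18 against a true share of 0.10; estimating the proportion directly removes that gap.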

Date: 2010

Download: https://doi.org/10.1111/j.1540-5907.2009.00428.x

Persistent link: https://EconPapers.repec.org/RePEc:wly:amposc:v:54:y:2010:i:1:p:229-247

More articles in American Journal of Political Science from John Wiley & Sons
Handle: RePEc:wly:amposc:v:54:y:2010:i:1:p:229-247