Iterative Denoising for Cross-Corpus Discovery
Carey E. Priebe,
David J. Marchette,
Youngser Park,
Edward J. Wegman,
Jeffrey L. Solka,
Diego A. Socolinsky,
Damianos Karakos,
Ken W. Church,
Roland Guglielmi,
Ronald R. Coifman,
Dekang Lin,
Dennis M. Healy,
Marc Q. Jacobs and
Anna Tsao
Additional contact information
Carey E. Priebe: AlgoTek, Inc.
Youngser Park: Johns Hopkins U.
Edward J. Wegman: AlgoTek, Inc.
Diego A. Socolinsky: AlgoTek, Inc.
Damianos Karakos: Johns Hopkins U.
Ken W. Church: AlgoTek, Inc.
Roland Guglielmi: AlgoTek, Inc.
Ronald R. Coifman: AlgoTek, Inc.
Dekang Lin: AlgoTek, Inc.
Dennis M. Healy: DARPA
Marc Q. Jacobs: AlgoTek, Inc.
Anna Tsao: AlgoTek, Inc.
A chapter in COMPSTAT 2004 — Proceedings in Computational Statistics, 2004, pp 381-392 from Springer
Abstract:
We consider the problem of statistical pattern recognition in a heterogeneous, high-dimensional setting. In particular, we consider the search for meaningful cross-category associations in a heterogeneous text document corpus. Our approach involves “iterative denoising”, that is, iteratively extracting (corpus-dependent) features and partitioning the document collection into sub-corpora. We present an anecdote wherein this methodology discovers a meaningful cross-category association in a heterogeneous collection of scientific documents.
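The abstract does not say which feature extractor or partitioning rule the chapter uses, so the following is only a minimal sketch of the general loop it describes (extract features from the current sub-corpus, partition, repeat on each part). TF-IDF weighting, the 2-means split, and every name and parameter here (extract_features, two_means, iterative_denoising, k, min_size, max_depth) are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def extract_features(docs, k=10):
    """Return (vocab, vectors) of TF-IDF weights computed on THIS sub-corpus only.

    Assumption: TF-IDF stands in for the unspecified feature extractor.
    Terms present in every document get IDF 0 and are dropped.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    totals = Counter()
    for w in weights:
        for t, x in w.items():
            totals[t] += x
    ranked = sorted((t for t in totals if totals[t] > 0),
                    key=lambda t: (-totals[t], t))
    vocab = ranked[:k]
    vectors = [[w.get(t, 0.0) for t in vocab] for w in weights]
    return vocab, vectors


def two_means(vectors, iters=10):
    """Assumed partitioning rule: 2-means with a deterministic start."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    first = vectors[0]
    centroids = [first, max(vectors, key=lambda v: dist(v, first))]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [0 if dist(v, centroids[0]) <= dist(v, centroids[1]) else 1
                  for v in vectors]
        for j in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def iterative_denoising(docs, ids=None, min_size=2, max_depth=3, depth=0):
    """Build a tree of sub-corpora, re-extracting features at every node."""
    ids = list(range(len(docs))) if ids is None else ids
    node = {"ids": ids, "features": [], "children": []}
    if len(docs) <= min_size or depth >= max_depth:
        return node
    vocab, vectors = extract_features(docs)
    node["features"] = vocab
    labels = two_means(vectors)
    if len(set(labels)) < 2:  # no useful split found; stop here
        return node
    for j in (0, 1):
        part = [(d, i) for d, i, lab in zip(docs, ids, labels) if lab == j]
        node["children"].append(iterative_denoising(
            [d for d, _ in part], [i for _, i in part],
            min_size, max_depth, depth + 1))
    return node


# Toy corpus (made up for illustration): two biology and two astronomy documents.
docs = [
    "study gene protein cell gene",
    "study protein cell dna gene",
    "study galaxy star orbit star",
    "study orbit galaxy telescope star",
]
tree = iterative_denoising(docs)
```

On this toy corpus the root split separates the two biology documents from the two astronomy documents, and the word "study", which occurs in every document, carries no weight at the root. Because features are recomputed inside each sub-corpus, a term that is uninformative for the whole collection can still become informative further down the tree.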
Keywords: Text document processing; statistical pattern recognition; dimensionality reduction
Date: 2004
Persistent link: https://EconPapers.repec.org/RePEc:spr:sprchp:978-3-7908-2656-2_31
Ordering information: This item can be ordered from
http://www.springer.com/9783790826562
DOI: 10.1007/978-3-7908-2656-2_31
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.