Beyond n-grams, tf-idf, and word indicators for text: Leveraging the Python API for vector embeddings

Buchanan, William

William Buchanan: SAG Corporation

2021 Stata Conference from Stata Users Group

Abstract: This talk shares strategies that Stata users can use to obtain more informative word, sentence, and document vector embeddings of the text in their data. Indicator and bag-of-words strategies can be useful for some types of text analytics, but they do not capture the semantic relationships between words that give language its meaning and structure. Vector space embeddings attempt to preserve these relationships and, in doing so, can provide more robust numerical representations of text data for use in subsequent analysis. I will share strategies for using existing tools from the Python ecosystem with Stata so that you can bring advances in NLP into your Stata workflow.
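
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not taken from the talk: the three-dimensional word vectors in `TOY_VECTORS` are invented for illustration, and the helper names (`cosine`, `bag_of_words`, `mean_embedding`) are hypothetical. In practice the vectors would come from a pretrained model, and code of this kind could be run from Stata inside a `python:` block.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bag_of_words(tokens, vocab):
    """Word-indicator vector: 1.0 if the vocabulary word occurs in the document."""
    present = set(tokens)
    return [1.0 if word in present else 0.0 for word in vocab]


def mean_embedding(tokens, vectors):
    """Document vector formed by averaging the vectors of the known tokens."""
    dims = len(next(iter(vectors.values())))
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dims
    return [sum(vec[i] for vec in known) / len(known) for i in range(dims)]


# ASSUMPTION: made-up toy vectors, chosen so that synonyms sit close together.
# Real embeddings would be loaded from a pretrained model instead.
TOY_VECTORS = {
    "car": [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.00],
    "fast": [0.20, 0.80, 0.10],
    "quick": [0.30, 0.70, 0.10],
}

doc_a = ["fast", "car"]
doc_b = ["quick", "automobile"]
vocab = sorted(set(doc_a) | set(doc_b))

# The documents share no words, so their indicator vectors are orthogonal.
bow_sim = cosine(bag_of_words(doc_a, vocab), bag_of_words(doc_b, vocab))

# The averaged embeddings are nearly parallel because the words are synonyms.
emb_sim = cosine(mean_embedding(doc_a, TOY_VECTORS),
                 mean_embedding(doc_b, TOY_VECTORS))

print(f"bag-of-words similarity: {bow_sim:.3f}")
print(f"embedding similarity:    {emb_sim:.3f}")
```

Under these assumed vectors, the two documents look unrelated to the indicator representation but nearly identical to the embedding representation, which is the kind of semantic relationship the abstract says bag-of-words strategies miss.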

Date: 2021-08-07

Persistent link: https://EconPapers.repec.org/RePEc:boc:scon21:26
