Typologies in Sequence Analysis: Practical Guidelines for Identifying Robust Cluster Solutions

Andrade, Stefan B.; Fasang, Anette Eva; Helske, Satu; Karhula, Aleksi

No kj8d5, SocArXiv from Center for Open Science

Abstract: Sequence analysis in the social sciences relies heavily on clustering techniques to identify typologies. Both the clustering techniques themselves and the statistical cut-off criteria for selecting the optimal number of clusters have improved greatly. In contrast, we lack a systematic assessment of how data features, such as the number of sequences in the sample, the number of time points per sequence, and the number of distinct states in the sequence alphabet, affect the identification of sequence typologies. Drawing on both simulated data from mixture Markov models and real data from the German Family Panel survey, we provide best-practice guidelines that help applied researchers gauge whether their data are sufficient for extracting robust sequence typologies, where such typologies exist empirically. Sequence typologies are most robust for samples with at least 500 sequences, sequence lengths greater than 10 time points, and state alphabets that have at least as many states as the “true” number of clusters.
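
To make the three data features in the abstract concrete, the following is a minimal sketch of how sequences could be drawn from a mixture of first-order Markov chains, together with a check of the three thresholds the abstract reports. It is not the authors' simulation design: the function names, the random transition matrices, and the equal mixing weights are all assumptions chosen for illustration, and the "true" number of clusters is only known here because the data are simulated.

```python
import random


def simulate_mixture_markov(n_sequences, length, n_clusters, n_states, seed=0):
    """Draw sequences from a mixture of first-order Markov chains.

    Illustrative assumptions (not taken from the paper): transition
    matrices are random, and every latent cluster is equally likely.
    """
    rng = random.Random(seed)

    def random_distribution(k):
        # A random probability vector of length k.
        weights = [rng.random() for _ in range(k)]
        total = sum(weights)
        return [w / total for w in weights]

    # One initial-state distribution and one transition matrix per cluster.
    initial = [random_distribution(n_states) for _ in range(n_clusters)]
    transition = [
        [random_distribution(n_states) for _ in range(n_states)]
        for _ in range(n_clusters)
    ]

    states = list(range(n_states))
    labels, sequences = [], []
    for _ in range(n_sequences):
        cluster = rng.randrange(n_clusters)  # equal mixing weights (assumed)
        seq = rng.choices(states, weights=initial[cluster])
        while len(seq) < length:
            row = transition[cluster][seq[-1]]
            seq.append(rng.choices(states, weights=row)[0])
        labels.append(cluster)
        sequences.append(seq)
    return labels, sequences


def meets_guidelines(n_sequences, length, n_states, n_clusters):
    """Check the three thresholds stated in the abstract.

    n_clusters is the hypothesized number of clusters; in real data the
    "true" number is unknown.
    """
    return {
        "sample_size": n_sequences >= 500,
        "sequence_length": length > 10,
        "alphabet_size": n_states >= n_clusters,
    }
```

For example, `meets_guidelines(300, 24, 5, 4)` would flag only the sample size as falling short of the reported thresholds, which are rules of thumb from the abstract rather than hard limits.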

Date: 2023-10-10

Downloads: (external link)
https://osf.io/download/65252d329b0cf30107786f89/

Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:kj8d5

DOI: 10.31219/osf.io/kj8d5
