Clustering Longitudinal Life-Course Sequences using Mixtures of Exponential-Distance Models
Keefe Murphy, Brendan Murphy, Raffaella Piccarreta and Isobel Claire Gormley
Additional contact information
Keefe Murphy: University College Dublin
No f5n8k, SocArXiv from Center for Open Science
Abstract:
Sequence analysis is an increasingly popular approach for the analysis of life courses represented by an ordered collection of activities experienced by subjects over a given time period. Several criteria exist for measuring pairwise dissimilarities among sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms, with the aim of identifying the most relevant patterns in the data. Here, we propose a model-based clustering approach for categorical sequence data. The technique is applied to a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. Specifically, we develop a family of methods for clustering sequences directly, based on mixtures of exponential-distance models, which we call MEDseq. Using the Hamming distance, or weighted variants thereof, as the distance metric permits closed-form expressions for the normalising constant, thereby facilitating the development of an expectation conditional maximisation (ECM) algorithm for model fitting. Additionally, MEDseq models allow the probability of component membership to depend on fixed covariates. Sampling weights, which are often associated with life-course data arising from surveys, are also accommodated. Simultaneously including weights and covariates in the clustering process yields new insights into the Northern Irish data.
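The closed-form normalising constant mentioned in the abstract can be illustrated for the simplest case. The following is a minimal sketch, not the authors' implementation: it assumes a single exponential-distance component with density proportional to exp(-λ · d(s, θ)), an unweighted Hamming distance d, one precision parameter λ, and sequences of length T over v states. Under those assumptions, counting sequences by their distance from the central sequence θ gives the constant (1 + (v - 1) e^{-λ})^T. All function names, the state labels, and the value of λ below are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import exp, isclose


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def normalising_constant(length, n_states, lam):
    """Closed form of sum_s exp(-lam * hamming(s, centre)) over all sequences s.

    There are C(length, k) * (n_states - 1)**k sequences at distance k from
    any centre, so the sum collapses via the binomial theorem.
    """
    return (1.0 + (n_states - 1) * exp(-lam)) ** length


def density(seq, centre, lam, n_states):
    """Probability of seq under one exponential-distance component (assumed form)."""
    return exp(-lam * hamming(seq, centre)) / normalising_constant(
        len(centre), n_states, lam
    )


def brute_force_constant(centre, states, lam):
    """Same constant by explicit enumeration; feasible only for tiny examples."""
    return sum(
        exp(-lam * hamming(s, centre)) for s in product(states, repeat=len(centre))
    )


# Hypothetical toy set-up: three states, four time points.
states = ("school", "training", "employment")
centre = ("school", "school", "employment", "employment")
lam = 0.7

closed = normalising_constant(len(centre), len(states), lam)
brute = brute_force_constant(centre, states, lam)
print(isclose(closed, brute, rel_tol=1e-9))
```

The brute-force sum enumerates all v^T sequences, so it serves only as a check on small inputs; the closed form is what keeps likelihood evaluation cheap inside an iterative fitting algorithm.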
Date: 2019-12-05
Downloads: https://osf.io/download/5de7c691e1e62f000a334c46/
Persistent link: https://EconPapers.repec.org/RePEc:osf:socarx:f5n8k
DOI: 10.31219/osf.io/f5n8k