AI-Assisted Triage of Flaky Test Failures from System Logs: A Practical Pipeline for CI at Scale

Baranetska, Yuliia

Journal of Artificial Intelligence General science (JAIGS) ISSN:3006-4023, 2025, vol. 8, issue 02, 209-218

Abstract: This study presents a practical and reproducible pipeline for managing flaky test failures directly from CI logs. We parse raw log streams online using Drain/Drain3 to create stable templates, aggregate them within per-failure windows, and vectorize the data using TF-IDF over template n-grams and basic statistics. Next, we apply HDBSCAN to group recurrent failure families into human-readable cluster cards, which include representative examples, dominant templates, and default routing rules. For systems with a strong sequential structure, we can optionally train an LSTM model in the style of DeepLog on template sequences to detect off-pattern executions and identify likely next events. The pipeline is designed for low-label settings, prioritizes explainability, and integrates governance through versioned rules that control actions like quarantine, environment health probes, and owner assignment. We outline a reproducible evaluation plan for teams to use in production contexts, focusing on clustering coverage and purity, the CI dashboard's signal-to-noise ratio (SNR), reductions in median time-to-recover/repair (MTTR), and the rate of duplicate investigations. We provide illustrative (neutral) metrics to demonstrate how to report improvements without revealing proprietary data and discuss a negative case that documents a rule which initially reduced reruns but increased false quarantines; this was later corrected by adding a confirmation step. Overall, by combining robust online parsing, density-based clustering, and optional sequence modeling, we transform noisy failures into explainable, routable families that stabilize delivery at scale while remaining compatible with standard CI/CD tooling.
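The abstract names clustering coverage and purity as evaluation metrics but does not define them here. The following is a minimal sketch under assumed definitions, not the author's implementation: coverage is taken as the share of failures assigned to any cluster (HDBSCAN labels unassigned points -1), and purity as the share of clustered failures whose hand-labeled family matches the majority family of their cluster. The function name, the sample labels, and the family names are hypothetical.

```python
from collections import Counter, defaultdict

NOISE = -1  # HDBSCAN convention: points left unclustered receive label -1


def coverage_and_purity(cluster_labels, true_families):
    """Return (coverage, purity) for one clustering run.

    Assumed definitions (not taken from the paper):
      coverage = clustered failures / all failures
      purity   = failures matching their cluster's majority family
                 / clustered failures
    """
    if len(cluster_labels) != len(true_families):
        raise ValueError("label lists must have the same length")
    if not cluster_labels:
        return 0.0, 0.0

    # Count hand-labeled families inside each non-noise cluster.
    by_cluster = defaultdict(Counter)
    for cluster, family in zip(cluster_labels, true_families):
        if cluster != NOISE:
            by_cluster[cluster][family] += 1

    clustered = sum(sum(counts.values()) for counts in by_cluster.values())
    coverage = clustered / len(cluster_labels)
    if clustered == 0:
        return coverage, 0.0

    # Each cluster contributes the size of its majority family.
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return coverage, majority / clustered


# Hypothetical sample: eight failures, two clusters, two noise points.
labels = [0, 0, 0, 1, 1, NOISE, NOISE, 1]
families = ["timeout", "timeout", "dns", "oom", "oom", "timeout", "dns", "oom"]
coverage, purity = coverage_and_purity(labels, families)
print(f"coverage={coverage:.2f} purity={purity:.2f}")
```

Purity needs a hand-labeled sample of failures, so in a low-label setting it would likely be computed on a small audited subset, while coverage can be computed on every run.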

Keywords: Quality Engineering; Software Testing; Playwright; AI triage; observability; CI/CD
Date: 2025
Downloads:
https://newjaigs.com/index.php/JAIGS/article/view/416 (application/pdf)

Persistent link: https://EconPapers.repec.org/RePEc:das:njaigs:v:8:y:2025:i:02:p:209-218:id:416

Journal of Artificial Intelligence General science (JAIGS) ISSN:3006-4023 is currently edited by Justyna Żywiołek

Bibliographic data for series maintained by Open Knowledge.