Detecting System Anomalies without Labels using Workflow Patterns in Logs

Arun Kumar Bandlamudi; Sunitha Pachala

doi:https://doi.org/10.14445/22315381/IJETT-V74I6P103

Research Article | Open Access

Volume 74 | Issue 6 | Year 2026 | Article Id. IJETT-V74I6P103 | DOI : https://doi.org/10.14445/22315381/IJETT-V74I6P103

Received: 13 Aug 2025 | Revised: 02 Apr 2026 | Accepted: 20 Apr 2026 | Published: 27 Jun 2026

Citation:

Arun Kumar Bandlamudi, Sunitha Pachala, "Detecting System Anomalies without Labels using Workflow Patterns in Logs," International Journal of Engineering Trends and Technology (IJETT), vol. 74, no. 6, pp. 33-46, 2026. Crossref, https://doi.org/10.14445/22315381/IJETT-V74I6P103

Abstract

Large software systems generate many logs, which record what happens inside the system and help developers find and fix problems. Logs usually appear as semi-structured text, and reading all of them by hand is hard in large systems. This paper proposes a method named ADR (Anomaly Detection by workflow Relations). ADR mines mathematical patterns, referred to as numerical relations, from logs. These relations describe how events in the system relate to one another, and logs that do not follow them indicate that something is wrong. The process starts by converting raw logs into event sequences. The events are then placed in a matrix that records the number of times each event occurs, and ADR searches this matrix for hidden relations. ADR has two versions. The first, sADR, is semi-supervised and needs a few labeled logs to learn. The second, uADR, is fully unsupervised and works without any labeled logs, which saves time and reduces effort. Both versions were tested on four public datasets, where ADR found useful relations and detected many problems in the logs, with or without labels. The results indicate that numerical relations are an effective way to find system problems even when logs are not labeled.
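The pipeline described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that a numerical relation is a linear equality among event counts, that relations are mined as the null space of a count matrix built from sessions known to be normal, and that a session is flagged when it breaks any mined relation. The function names, the toy event names, and the tolerances are hypothetical and are not taken from the paper.

```python
import numpy as np

def count_matrix(sessions, events):
    """Rows are sessions, columns are event types, cells are occurrence counts."""
    index = {e: j for j, e in enumerate(events)}
    X = np.zeros((len(sessions), len(events)))
    for i, seq in enumerate(sessions):
        for e in seq:
            X[i, index[e]] += 1
    return X

def mine_relations(X_normal, tol=1e-8):
    """Return vectors w with X_normal @ w close to 0 (a null-space basis).

    Assumption: each such w encodes one linear relation among event counts.
    """
    _, s, vt = np.linalg.svd(X_normal, full_matrices=True)
    rank = int(np.sum(s > tol))
    return vt[rank:].T  # shape: (n_events, n_relations)

def is_anomalous(X, relations, tol=1e-6):
    """Flag every row of X that violates at least one mined relation."""
    if relations.shape[1] == 0:
        return np.zeros(X.shape[0], dtype=bool)
    return np.any(np.abs(X @ relations) > tol, axis=1)

# Toy data (hypothetical): every normal session opens and closes exactly once.
events = ["open", "write", "close"]
normal_sessions = [
    ["open", "write", "close"],
    ["open", "write", "write", "close"],
    ["open", "close"],
]
relations = mine_relations(count_matrix(normal_sessions, events))

new_sessions = [
    ["open", "write"],                             # never closed
    ["open", "write", "write", "write", "close"],  # consistent with normal runs
]
flags = is_anomalous(count_matrix(new_sessions, events), relations)
print(flags)
```

On this toy data the only mined relation is that the count of open equals the count of close, so the first new session is flagged and the second is not. Mining from normal sessions only is a simplification chosen for the sketch; it is not a description of how sADR or uADR select their training data.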

Keywords

Logs, ADR, Anomalies, Detection.
