Chess Anomaly Detection

Machine learning system for behavioral anomaly detection in online chess, comparing six unsupervised detectors and an ensemble review workflow.

Role

ML Project Contributor

Period

2026

Signal

100/100 ML project

Why it matters

LOF reached 0.971 test AUC on subtle synthetic injection; ensemble flagged 312 of 17,909 players for review.


Chess Anomaly Detection is a machine learning project for detecting unusual behavioral patterns in online chess. Lichess game data is reshaped into player-level records, behavioral and engine-accuracy features are engineered, and six unsupervised detectors are compared against synthetic anomaly injections.

The goal is deliberately conservative: surface statistically unusual behavior for human review rather than label any player a cheater. The strongest single detector was LOF, with 0.971 test AUC on subtle synthetic anomaly injections, and the ensemble flagged 312 of 17,909 players for human review.

Tech Stack

Python, scikit-learn, PyTorch, pandas, UMAP, SHAP, pytest, Jupyter

Key Features

Player-level behavioral feature engineering from chess game data
Six unsupervised anomaly detectors compared with shared validation
Synthetic anomaly injection for measurable evaluation
Majority-vote ensemble for review-oriented flagging
Precomputed results, charts, report, poster, and presentation deliverables
Final machine learning project graded 100/100

Technical Highlights

LOF achieved 0.971 test AUC on subtle synthetic injection
Ensemble flagged 312 of 17,909 players for review
Models compared: Local Outlier Factor (LOF), Isolation Forest, One-Class SVM, Autoencoder, ACPLSubAutoencoder, and a Z-score baseline
Feature importance, UMAP, ROC curves, model agreement, and learning curves documented
Unit tests verify feature engineering and validation logic
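
The evaluation idea behind the LOF result can be illustrated with a minimal sketch. It is not the project's code: the random stand-in features, the 400/200 split, the number of injected rows, and the +3.0 shift on two features are all assumptions chosen to keep the example small. It fits LOF on training players only, injects shifted copies of test rows as synthetic anomalies, and scores the test set with ROC AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for real player features (assumption: 4 numeric features per player).
features = rng.normal(size=(600, 4))
train, test_normal = features[:400], features[400:]

# Synthetic anomalies: copies of test rows with two features shifted.
# The shift size and the affected columns are illustrative assumptions.
injected = test_normal[:30].copy()
injected[:, :2] += 3.0

X_test = np.vstack([test_normal, injected])
y_test = np.r_[np.zeros(len(test_normal)), np.ones(len(injected))]

# Fit the scaler and the detector on training data only, so nothing
# about the injected rows leaks into the model.
scaler = StandardScaler().fit(train)
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(scaler.transform(train))

# score_samples is higher for inliers, so negate it to get an anomaly score.
scores = -lof.score_samples(scaler.transform(X_test))
auc = roc_auc_score(y_test, scores)
print(f"LOF test AUC on injected anomalies: {auc:.3f}")
```

A smaller shift makes the injected rows subtler and typically lowers the AUC, which is what makes the injection strength a useful evaluation knob.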

Architecture

Data Pipeline

Load raw chess game data
Aggregate games into player-level records
Engineer behavioral and engine-accuracy features
Split and validate without leaking labels across phases
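
The aggregation step above can be sketched with pandas. The column names (`player`, `acpl`, `won`, `move_time_s`) and the chosen summary statistics are hypothetical; the project's actual schema and feature set are not shown here.

```python
import pandas as pd

# Hypothetical game-level rows, one per (game, player).
games = pd.DataFrame({
    "player": ["alice", "alice", "bob", "bob", "bob"],
    "acpl": [35.0, 45.0, 12.0, 10.0, 14.0],      # average centipawn loss per game
    "won": [1, 0, 1, 1, 1],                      # 1 if the player won the game
    "move_time_s": [8.0, 12.0, 3.0, 3.5, 2.5],   # mean seconds per move
})


def to_player_records(games: pd.DataFrame) -> pd.DataFrame:
    """Collapse game-level rows into one feature row per player."""
    return games.groupby("player").agg(
        n_games=("acpl", "count"),
        mean_acpl=("acpl", "mean"),
        win_rate=("won", "mean"),
        mean_move_time_s=("move_time_s", "mean"),
    )


players = to_player_records(games)
print(players)
```

Each row of `players` is then a single point for the detectors, so a player with few games gets noisier features than one with many; filtering on `n_games` is one possible guard.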

Modeling Layer

Compare anomaly detectors under common evaluation
Use synthetic injection to measure recall on subtle behaviors
Combine model outputs through ensemble review logic
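
A majority-vote combination of detector outputs might look like the sketch below. The strict "more than half" threshold and the boolean flag matrix are assumptions; the project's exact voting rule and tie handling are not specified here.

```python
import numpy as np


def majority_vote(flags: np.ndarray) -> np.ndarray:
    """Return True for players flagged by more than half of the models.

    flags has shape (n_models, n_players); each row is one detector's
    boolean decision per player.
    """
    flags = np.asarray(flags, dtype=bool)
    n_models = flags.shape[0]
    return flags.sum(axis=0) > n_models / 2


# Six hypothetical detectors voting on four players.
flags = np.array([
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=bool)

review = majority_vote(flags)
print(review)  # [ True False False  True]
```

With six detectors, a strict majority requires at least four votes, so a player flagged by exactly three models is not sent to review under this rule.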

Interpretability

Permutation importance for feature signal
UMAP views for cluster inspection
Model agreement analysis for review confidence
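
One way to quantify model agreement is the pairwise Jaccard overlap between the sets of players each detector flags. This is an illustrative choice, not necessarily the measure used in the project, and the flag matrix below is made up.

```python
import numpy as np


def jaccard_agreement(flags: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard similarity between the flagged sets of each model.

    flags has shape (n_models, n_players). Entry (i, j) of the result is
    |flagged_i AND flagged_j| / |flagged_i OR flagged_j|.
    """
    flags = np.asarray(flags, dtype=bool)
    inter = (flags[:, None, :] & flags[None, :, :]).sum(axis=2)
    union = (flags[:, None, :] | flags[None, :, :]).sum(axis=2)
    # Two models that both flag nobody are treated as fully agreeing.
    return np.where(union > 0, inter / np.maximum(union, 1), 1.0)


# Three hypothetical detectors, four players.
flags = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
], dtype=bool)

agreement = jaccard_agreement(flags)
print(agreement)
```

High off-diagonal values indicate detectors that flag largely the same players, while low values suggest they respond to different kinds of unusual behavior.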

Challenges & Solutions

Avoiding overclaiming in a sensitive cheating-detection domain

Designing meaningful synthetic anomalies for evaluation

Comparing unsupervised models fairly

Making results understandable through charts and deliverables
