TraceKit Docs

Alert Rules

Configure alert rules in TraceKit to get notified when your services need attention. Set up intelligent alerts based on error rates, latency, throughput, and health scores to detect errors, latency spikes, and anomalies in your distributed systems.

Quick Start

  1. Set up notification channels (Slack, Telegram, Discord, Teams, PagerDuty, or OpsGenie)
  2. Create an alert rule with conditions
  3. Get notified when thresholds are breached

Alert Types

Error Rate

Monitor the percentage of failed requests. Ideal for detecting when your service starts experiencing issues.

Example Use Case: Alert when authentication service error rate exceeds 5% over 5 minutes

Setting      Value
Alert Type   Error Rate
Scope        Service -- auth-service
Condition    error_rate > 5%
Time Window  5 minutes
Severity     Critical

Best Practice: Set thresholds based on your baseline. 5-10% is typical for warning, 15%+ for critical.
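As a rough illustration of the condition above, here is a minimal Python sketch of how an error-rate check over a window of requests could work. The `Request` record and function names are hypothetical for this example, not TraceKit SDK types:

```python
from dataclasses import dataclass

@dataclass
class Request:
    """Minimal request record for illustration (not a TraceKit type)."""
    failed: bool

def error_rate(requests):
    """Percentage of failed requests in a time window."""
    if not requests:
        return 0.0
    failures = sum(1 for r in requests if r.failed)
    return 100.0 * failures / len(requests)

def breaches_threshold(requests, threshold_pct=5.0):
    """True when the window's error rate exceeds the alert condition."""
    return error_rate(requests) > threshold_pct

# 3 failures out of 40 requests -> 7.5% error rate, above the 5% threshold
window = [Request(failed=i < 3) for i in range(40)]
print(error_rate(window))          # 7.5
print(breaches_threshold(window))  # True
```

In the real rule, TraceKit evaluates this condition continuously over the configured time window; the sketch only shows the arithmetic behind the threshold.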

Latency

Track response times and get alerted when requests are too slow. Choose from average, P50, P95, or P99 metrics.

Example Use Case: Alert when P95 latency exceeds 1000ms (1 second) for API endpoints

Setting      Value
Alert Type   Latency
Metric       P95
Scope        Service -- api-gateway
Condition    p95 > 1000ms
Time Window  5 minutes

Metric Guide:

  • Average: Good for overall trends
  • P95: Recommended for user experience (95% of requests)
  • P99: Catch worst-case scenarios
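To make the metric guide concrete, the sketch below computes percentiles from a list of latency samples using the nearest-rank method. This is an illustration of what P50/P95/P99 mean, not TraceKit's internal implementation, which may use a different interpolation:

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# 100 samples with latencies 1ms..100ms
samples = list(range(1, 101))
print(percentile(samples, 50))  # 50 -> median request
print(percentile(samples, 95))  # 95 -> 95% of requests are at or below this
print(percentile(samples, 99))  # 99 -> worst-case tail
```

The P95 value is what the example rule above compares against 1000ms: 95% of requests finish at or below it, so it tracks typical user experience while ignoring the extreme tail.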

Throughput

Monitor requests per minute. Perfect for detecting when services stop processing traffic or get overwhelmed.

Service Down Detection:

Setting      Value
Condition    req_per_min < 1
Time Window  10 minutes
Severity     Critical

Traffic Spike Detection:

Setting      Value
Condition    req_per_min > 1000
Time Window  5 minutes
Severity     Warning

Throughput alerts are perfect for detecting service outages (low threshold) or DDoS attacks (high threshold).
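The two example rules above can be sketched as a single check against both thresholds. The function name and return strings are illustrative only, not part of any TraceKit SDK:

```python
def throughput_alert(request_count, window_minutes, low=1, high=1000):
    """Classify a window's requests-per-minute against both example thresholds."""
    rpm = request_count / window_minutes
    if rpm < low:
        return "critical: service may be down"
    if rpm > high:
        return "warning: traffic spike"
    return "ok"

print(throughput_alert(3, 10))     # 0.3 rpm  -> "critical: service may be down"
print(throughput_alert(12000, 5))  # 2400 rpm -> "warning: traffic spike"
print(throughput_alert(500, 5))    # 100 rpm  -> "ok"
```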

Health Score

Composite metric combining error rate and latency into a single health score (0-100). Higher is better.

Example Use Case: Alert when overall service health drops below 70

Setting      Value
Alert Type   Health Score
Scope        Global (All Services)
Condition    health_score < 70
Time Window  15 minutes

Formula: Health Score = (Error Rate Score x 50) + (Latency Score x 50). Score 100 = perfect, 0 = complete failure.
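A worked example of the formula, assuming each sub-score is normalized to the range 0.0 (worst) to 1.0 (best); how TraceKit derives the error-rate and latency sub-scores is not specified here:

```python
def health_score(error_rate_score, latency_score):
    """Composite 0-100 health score per the documented formula.

    Assumes each sub-score is already normalized to 0.0 (worst) .. 1.0 (best).
    """
    return error_rate_score * 50 + latency_score * 50

print(health_score(1.0, 1.0))  # 100.0 -> perfect health
print(health_score(0.9, 0.5))  # 70.0  -> exactly at the example alert threshold
print(health_score(0.0, 0.0))  # 0.0   -> complete failure
```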

Scope Types

  • Global -- Monitor all services together. Good for overall system health.
  • Service -- Monitor a specific service. Most common use case.
  • Endpoint -- Monitor a specific endpoint like "POST /api/users".

Best Practices

Set Appropriate Time Windows

Short windows (1-5 min) detect issues quickly but may cause false positives. Longer windows (15-30 min) are more stable but slower to alert.
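The trade-off can be seen numerically: the same one-minute error burst looks very different depending on window length. A small sketch over per-minute counts (illustrative arithmetic, not TraceKit's evaluator):

```python
def rate_over_window(errors_per_min, totals_per_min, window):
    """Error rate (%) over the trailing `window` minutes of per-minute counts."""
    errors = sum(errors_per_min[-window:])
    total = sum(totals_per_min[-window:])
    return 100.0 * errors / total

# One bursty minute (30 errors) in otherwise clean traffic of 100 req/min
errors = [0] * 29 + [30]
totals = [100] * 30
print(rate_over_window(errors, totals, 5))   # 6.0 -> a 5-minute window breaches a 5% threshold
print(rate_over_window(errors, totals, 30))  # 1.0 -> a 30-minute window absorbs the spike
```

The short window fires on a transient blip; the long window only fires on sustained problems, at the cost of slower detection.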

Use Cooldowns to Prevent Spam

Set cooldown periods (15-60 min) to avoid getting flooded with notifications for the same issue. You'll be notified periodically until the issue is resolved.
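The cooldown behavior described above amounts to suppressing repeat notifications until enough time has passed. A minimal sketch (times in minutes; the function is hypothetical, not a TraceKit API):

```python
def should_notify(last_sent_min, now_min, cooldown_min=30):
    """Send a notification only when the cooldown since the last one has elapsed.

    `last_sent_min` is None when no notification has been sent yet.
    """
    if last_sent_min is None:
        return True
    return now_min - last_sent_min >= cooldown_min

print(should_notify(None, 0))  # True  -> first alert fires immediately
print(should_notify(0, 10))    # False -> suppressed during the cooldown
print(should_notify(0, 30))    # True  -> re-notified once the cooldown elapses
```

This is why an unresolved issue still produces periodic reminders: each time the cooldown elapses while the condition holds, a fresh notification goes out.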

Layer Your Alerts

Combine multiple alert types: Error rate alerts catch failures, latency alerts catch slowdowns, and throughput alerts catch outages.

Start with Baselines

Monitor your services for a few days to understand normal behavior before setting alert thresholds. Use your P95 latency as a starting point.

SDK Setup Guides

Alerts work with trace data sent by any TraceKit SDK. Set up your SDK to start sending traces, then create alert rules.

Next Steps

Ready to set up your first alert?

  1. Set Up Channels
  2. Create Alert Rules
