TraceKit Docs

Code Monitoring

Monitor your production code with TraceKit Code Monitoring. Set live breakpoints, capture variable state, and debug without redeploying.

Debug production code without stopping your application. Set non-breaking breakpoints and capture variable state in real-time.

Production Debugging Without Downtime -- Set breakpoints in production and capture variables, stack traces, and context without redeploying. Built-in PII scrubbing, crash isolation, circuit breakers, and real-time SSE updates. Less than 5ms overhead. Supports Go, Java, PHP, Laravel, Node.js, Python, .NET and Ruby.

What is Code Monitoring?

  • Non-Breaking Breakpoints -- Create breakpoints that capture data without stopping your application.
  • Capture Variable State -- See all variable values at the exact moment the breakpoint was hit.
  • Full Stack Traces -- Complete call stack showing how code reached that breakpoint.
  • Request Context -- HTTP headers, trace IDs, and more to understand what triggered execution.

How It Works

Automatic Code Discovery

TraceKit automatically indexes your code from traces you're already sending. When traces contain stack traces (from errors or instrumentation), we extract file paths, functions, and line numbers. No extra instrumentation needed.

Browse your discovered code in the Code Monitoring page, "Browse Code" tab.
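
To illustrate the kind of information a stack trace carries, here is a minimal sketch using only Python's standard library. It is not TraceKit's indexer; the function names discover_locations and process_order are hypothetical and exist only for this example.

```python
import traceback


def discover_locations():
    """Return (file path, function name, line number) for each frame on the current stack."""
    return [(frame.filename, frame.name, frame.lineno)
            for frame in traceback.extract_stack()]


def process_order():
    # Any instrumented function that produces a stack trace exposes its location.
    return discover_locations()


locations = process_order()
for path, function, line in locations:
    print(f"{path}:{line} in {function}")
```

Each tuple holds the same three facts the text above describes: a file path, a function, and a line number.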

Step-by-Step

  1. Send Traces -- Your existing traces automatically index code. Stack traces reveal file paths and functions.
  2. Browse and Set Breakpoints -- Click "Browse Code" to see discovered files/functions, then click "Set Breakpoint" on any location.
  3. Capture and Debug -- When that code runs, we capture variables, stack trace, and context automatically. View snapshots in the UI.

Recommended: Use CheckAndCaptureWithContext() for automatic breakpoint registration. The SDK handles file detection, line tracking, and breakpoint creation for you.

Quick Start

Step 1: Install and Enable Code Monitoring

Choose your language:

Go:

go get github.com/Tracekit-Dev/go-sdk/tracekit

// Initialize the SDK with code monitoring enabled.
sdk, err := tracekit.NewSDK(&tracekit.Config{
    APIKey:               os.Getenv("TRACEKIT_API_KEY"),
    ServiceName:          "order-service",
    EnableCodeMonitoring: true,
})
if err != nil {
    log.Fatal(err)
}
defer sdk.Shutdown(context.Background())

// Capture a snapshot at this checkpoint.
sdk.CheckAndCaptureWithContext(ctx, "order-processing", map[string]interface{}{
    "orderID": orderID,
    "total":   total,
    "status":  "validated",
})

Full Go Code Monitoring Docs

Python:

pip install tracekit-apm

import os

import tracekit

# Initialize the client with code monitoring enabled.
client = tracekit.init(
    api_key=os.getenv("TRACEKIT_API_KEY"),
    service_name="my-flask-app",
    enable_code_monitoring=True,  # default: False
)

# Capture a snapshot at this checkpoint.
client.capture_snapshot("order-processing", {
    "order_id": order["id"],
    "total": order["total"],
    "user_id": user.id,
})

Full Python Code Monitoring Docs

Node.js:

npm install @tracekit/node-apm

import * as tracekit from '@tracekit/node-apm';

const client = tracekit.init({
    apiKey: process.env.TRACEKIT_API_KEY,
    serviceName: 'my-app',
    enableCodeMonitoring: true,
});

await client.captureSnapshot('checkout-validation', {
    userId,
    amount,
    timestamp: new Date().toISOString(),
});

Full Node.js Code Monitoring Docs

PHP:

composer require tracekit/php-apm

$tracekit = new TracekitClient([
    'api_key' => getenv('TRACEKIT_API_KEY'),
    'service_name' => 'my-php-app',
    'endpoint' => 'https://your-app.com/v1/traces',
    'code_monitoring_enabled' => true,
]);

$tracekit->captureSnapshot('checkout-validation', [
    'user_id' => $userId,
    'cart_items' => count($cart['items']),
    'total_amount' => $cart['total'],
]);

Full PHP Code Monitoring Docs

Laravel:

composer require tracekit/laravel-apm

TRACEKIT_CODE_MONITORING_ENABLED=true
TRACEKIT_CODE_MONITORING_POLL_INTERVAL=30

tracekit_snapshot('checkout-start', [
    'user_id' => $userId,
    'cart_total' => $cartTotal,
    'items_count' => count($items),
]);

Full Laravel Code Monitoring Docs

Java:

<dependency>
    <groupId>dev.tracekit</groupId>
    <artifactId>tracekit-core</artifactId>
</dependency>

tracekit:
    enable-code-monitoring: true

tracekit.captureSnapshot("order_processing",
    Map.of(
        "orderId", order.getId(),
        "customerId", order.getCustomerId(),
        "total", order.getTotal()
    )
);

Full Java Code Monitoring Docs

Step 3: Add Checkpoints (Automatic)

Recommended: Automatic Breakpoint Registration -- Breakpoints are automatically created and updated when you call CheckAndCaptureWithContext. No manual UI setup required.

// Automatic file/line detection + auto-creates breakpoint!
sdk.CheckAndCaptureWithContext(ctx, "payment-processing", map[string]interface{}{
    "userID": userID,
    "amount": amount,
})

// The SDK will:
// 1. Detect file path and line number automatically
// 2. Auto-create/update the breakpoint in TraceKit
// 3. Capture snapshot when breakpoint is active

Step 4: View and Manage (Optional)

Breakpoints are automatically created and enabled. You can optionally:

  • View captured snapshots in the UI at /snapshots
  • Adjust conditions or sampling rates
  • Browse auto-discovered code
  • Disable/enable breakpoints as needed

Advanced: Manual Breakpoint Creation

For advanced users who want full control, you can manually create breakpoints in the UI first. Go to Code Monitoring and create a breakpoint for payment.go:42.

Production Safety

Code monitoring is built for production from day one. Every SDK includes multiple layers of protection to keep the impact on your application minimal, even under failure conditions.

PII Scrubbing (Default On)

13 built-in patterns automatically redact sensitive data before it leaves your application. Emails, SSNs, credit cards, API keys, JWTs, and more are replaced with typed markers like [REDACTED:email]. Enabled by default. Add custom patterns or disable per-service.
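
As a rough illustration of typed redaction markers, the sketch below scrubs two kinds of values with regular expressions. The patterns and the scrub function are assumptions made for this example; they are not the SDK's 13 built-in patterns and are deliberately simplified.

```python
import re

# Illustrative patterns only; real-world detection needs stricter rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scrub(value):
    """Recursively replace sensitive substrings with [REDACTED:<type>] markers."""
    if isinstance(value, str):
        for kind, pattern in PATTERNS.items():
            value = pattern.sub(f"[REDACTED:{kind}]", value)
        return value
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


snapshot = {"note": "mail jane@example.com", "ssn": "123-45-6789", "count": 5}
print(scrub(snapshot))
```

Scrubbing runs on the captured variables before they are serialized, so the original values never leave the process in this sketch.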

Crash Isolation

Every SDK entry point is wrapped in language-idiomatic recovery handlers. A bug in TraceKit's snapshot code will never crash your application -- the SDK recovers silently and continues. Go: defer/recover. Node: try/catch. Java: catch(Throwable). And more.
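
The recovery-handler idea can be sketched in Python with a decorator that catches any exception raised inside capture code. This is an assumed illustration of the pattern, not the SDK's source; isolated, capture_snapshot, and the logger name are hypothetical.

```python
import functools
import logging

logger = logging.getLogger("tracekit.sketch")  # hypothetical logger name


def isolated(func):
    """Wrap an entry point so its failures never propagate to the caller."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Record the failure quietly and let the application continue.
            logger.debug("snapshot capture failed", exc_info=True)
            return None
    return wrapper


@isolated
def capture_snapshot(label, variables):
    raise RuntimeError("simulated bug inside snapshot code")


result = capture_snapshot("order-processing", {"total": 42})
print(result)
```

The caller sees None instead of an exception, so a bug in the capture path cannot interrupt request handling.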

Circuit Breaker

If the TraceKit backend is unreachable, the SDK automatically stops sending snapshots after 3 failures in 60 seconds. It re-enables after a 5-minute cooldown. No manual intervention needed. Thresholds are configurable per SDK instance.
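
The behavior described above (3 failures within 60 seconds, then a 5-minute cooldown) can be sketched as a small state machine. The class below is an assumption written for illustration, with an injectable clock so it can be exercised without waiting; it is not the SDK's implementation.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, max_failures=3, window_s=60.0, cooldown_s=300.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = deque()   # timestamps of recent failures
        self._opened_at = None     # None means the circuit is closed

    def allow(self):
        """Return True if a snapshot may be sent right now."""
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.cooldown_s:
            # Cooldown elapsed: close the circuit and start fresh.
            self._opened_at = None
            self._failures.clear()
            return True
        return False

    def record_failure(self):
        """Record a failed send and open the circuit if the threshold is hit."""
        now = self._clock()
        self._failures.append(now)
        # Drop failures that fell out of the sliding window.
        while self._failures and now - self._failures[0] > self.window_s:
            self._failures.popleft()
        if len(self._failures) >= self.max_failures:
            self._opened_at = now


fake_now = [0.0]
breaker = CircuitBreaker(clock=lambda: fake_now[0])
for second in (0.0, 10.0, 20.0):
    fake_now[0] = second
    breaker.record_failure()
print(breaker.allow())
```

A sliding window is used here so that three failures spread over several minutes do not trip the breaker.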

Remote Kill Switch

Instantly disable all code monitoring for a service from the dashboard. The kill switch propagates to all connected SDKs in real-time via SSE, or within 60 seconds via polling. One click in the dashboard to stop all captures immediately.

All safety features are enabled by default across all 8 SDKs. No configuration required -- just enable code monitoring and you're protected.

Real-Time Updates

Breakpoint changes propagate to your SDKs in under 1 second using Server-Sent Events (SSE). No more waiting for the next 30-second poll cycle.

  1. Auto-Discovery -- When your SDK polls for breakpoints, the server returns an sse_endpoint URL. The SDK automatically connects.
  2. Real-Time Streaming -- Breakpoint creates, updates, deletes, and kill switch commands stream instantly to connected SDKs. Polling pauses while SSE is active.
  3. Automatic Fallback -- If the SSE connection drops, the SDK seamlessly falls back to polling and reconnects SSE on the next successful poll.

Dashboard Live Updates

The Code Monitoring dashboard also uses SSE for live capture counters, breakpoint status changes, and connected SDK count -- all without page refresh.

Server-Side Conditions

Breakpoint conditions (e.g., user.id == 42) are evaluated server-side in a sandboxed engine. SDKs send metadata via a check-in endpoint, and the server decides whether to capture.

Available SDKs

Code Monitoring is available in all TraceKit SDKs: Go, Java, PHP, Laravel, Node.js, Python, .NET, and Ruby.

Log-Point Mode

Log-points are lightweight breakpoints that capture only specific expressions without the overhead of full variable snapshots. Think of them as production console.log statements you can add and remove without redeploying.

When a breakpoint is set to log-point mode:

  • Only the specified capture expressions are evaluated and recorded
  • Local variables, stack traces, and request context are skipped
  • Overhead is significantly lower than full snapshots

Creating a Log-Point

Set mode: logpoint and specify capture_expressions when creating a breakpoint via the dashboard or API. Expressions are evaluated against the variables passed to CheckAndCaptureWithContext.

Example: Capture only order_total and user_id from a checkout handler:

// Go
sdk.CheckAndCaptureWithContext(ctx, "checkout-total", map[string]interface{}{
    "order_total": order.Total,
    "user_id":     user.ID,
    "items":       order.Items, // This won't be captured in logpoint mode
})

In the dashboard, the log-point breakpoint on checkout-total with capture_expressions: ["order_total", "user_id"] will only record those two values.
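
A minimal sketch of that filtering step, assuming each capture expression is a plain variable name (real expressions may be richer). The function logpoint_capture is hypothetical and only illustrates the selection logic.

```python
def logpoint_capture(variables, capture_expressions):
    """Keep only the variables named in capture_expressions."""
    return {name: variables[name]
            for name in capture_expressions
            if name in variables}


variables = {"order_total": 129.5, "user_id": 7, "items": ["a", "b", "c"]}
recorded = logpoint_capture(variables, ["order_total", "user_id"])
print(recorded)
```

Everything not listed, such as items in this example, is dropped before any payload is built.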

SDK-Side Condition Evaluation

Breakpoint conditions can now be evaluated locally in the SDK without a server round-trip. When you set a condition like status > 200 on a breakpoint, the server classifies it as either:

  • sdk-evaluable -- Simple expressions (comparisons, boolean logic, property access) that all SDKs can evaluate locally
  • server-only -- Complex expressions (regex, function calls) that require server-side evaluation

SDK-evaluable conditions are evaluated in-process before any network call, eliminating the HTTP round-trip to the /sdk/snapshots/check-in endpoint. This makes breakpoints viable on hot code paths.

Supported Expression Syntax

Operator          Example                             Description
Comparison        status > 200, amount != 0           ==, !=, <, >, <=, >=
Logical           status > 200 && method == "POST"    &&, ||, !
Arithmetic        price * quantity > 1000             +, -, *, /
Property access   user.role == "admin"                Dot notation for nested objects
Membership        "error" in tags                     Check if value exists in a collection
Literals          true, false, null, 42, "text"       Boolean, null, numeric, string

Conditions using regex, function calls (len(), contains()), or assignment are classified as server-only and fall back to the check-in endpoint.
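
The classification rule can be approximated with a few textual checks. The sketch below is an assumption for illustration: the classify function, the "=~" spelling for a regex-match operator, and the heuristics themselves are not taken from TraceKit, and the check errs on the side of server-only.

```python
import re

_STRING = re.compile(r'"[^"]*"')                 # string literals, ignored by the checks
_CALL = re.compile(r"[A-Za-z_]\w*\s*\(")          # identifier followed by "(" looks like a call
_ASSIGN = re.compile(r"(?<![=!<>])=(?![=~])")     # a lone "=" that is not part of ==, !=, <=, >=
_REGEX_OP = re.compile(r"=~")                     # assumed regex-match operator


def classify(condition: str) -> str:
    """Return 'sdk-evaluable' or 'server-only' for a condition string."""
    stripped = _STRING.sub('""', condition)  # avoid matching inside string literals
    if (_CALL.search(stripped) or _ASSIGN.search(stripped)
            or _REGEX_OP.search(stripped)):
        return "server-only"
    return "sdk-evaluable"


for condition in ('status > 200 && method == "POST"', "len(items) > 3"):
    print(condition, "->", classify(condition))
```

A production classifier would parse the expression rather than pattern-match it; the textual approach is used here only to keep the example short.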

Auto-Expire

Breakpoints now auto-expire after 48 hours of inactivity by default. This prevents forgotten breakpoints from accumulating production overhead.

  • Idle timer -- based on last_captured_at. Each capture resets the timer.
  • Zero captures -- breakpoints that never fire expire 48h after creation.
  • Pinned breakpoints -- pin a breakpoint to prevent auto-expiry. Useful for long-running debugging sessions.
  • Configurable -- set idle_timeout_hours per breakpoint (default: 48). Set to null for no expiry.

Expired breakpoints show a distinct "auto-expired" status in the dashboard. Expiry events are pushed via SSE in real-time.

Existing breakpoints created before v25.0 have no auto-expire (idle_timeout_hours = null). Only new breakpoints get the 48h default.
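
The expiry rules above can be summarized in one function. This is a sketch under stated assumptions: is_expired and its parameter names are hypothetical (only idle_timeout_hours and last_captured_at come from the text), and expiry is assumed to trigger once the idle period reaches the timeout.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at, last_captured_at, now,
               idle_timeout_hours=48, pinned=False):
    """Decide whether a breakpoint has been idle long enough to expire."""
    if pinned or idle_timeout_hours is None:
        return False  # pinned or no-expiry breakpoints never expire
    # Breakpoints that never fired are measured from their creation time.
    reference = last_captured_at or created_at
    return now - reference >= timedelta(hours=idle_timeout_hours)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(created, None, created + timedelta(hours=49)))
```

Each capture would update last_captured_at, which is what resets the idle timer in this model.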

Per-Breakpoint Capture Limits

New breakpoints default to sensible capture limits:

Setting             Default            Description
max_depth           5                  Maximum variable nesting depth. Deeper objects are truncated with a _truncated indicator.
max_payload_bytes   131072 (128 KB)    Maximum serialized snapshot size. Larger payloads are truncated.

These limits can be overridden per breakpoint via the dashboard or API. Set to null for unlimited (legacy behavior).

Existing breakpoints retain unlimited behavior -- the new defaults only apply to breakpoints created after v25.0.
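
To make the max_depth setting concrete, here is a minimal sketch of depth-limited capture. The exact shape of the _truncated indicator and the way depth is counted are assumptions for this example; truncate is a hypothetical helper, not an SDK function.

```python
def truncate(value, max_depth, _depth=1):
    """Replace containers nested deeper than max_depth with a _truncated marker."""
    if isinstance(value, (dict, list, tuple)):
        if max_depth is not None and _depth > max_depth:
            return {"_truncated": True}
        if isinstance(value, dict):
            return {key: truncate(item, max_depth, _depth + 1)
                    for key, item in value.items()}
        return [truncate(item, max_depth, _depth + 1) for item in value]
    return value  # scalars are always kept


order = {"customer": {"address": {"city": "Lisbon"}}, "total": 10}
print(truncate(order, max_depth=2))
```

Passing max_depth=None leaves the value untouched, mirroring the unlimited behavior described above.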

Overhead Visibility

The breakpoint detail page now shows a capture overhead trend chart displaying average capture_overhead_ms over time. Breakpoint cards include a color-coded overhead indicator:

Color    Threshold   Meaning
Green    < 5ms       Healthy overhead
Yellow   5-20ms      Moderate overhead
Red      > 20ms      High overhead -- consider adjusting capture limits or sampling

A warning banner appears when average overhead exceeds 20ms.
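
The thresholds map to colors as in the following sketch. The function name overhead_color is hypothetical; the boundaries follow the table, with exactly 5ms and exactly 20ms treated as yellow.

```python
def overhead_color(avg_overhead_ms):
    """Map an average capture overhead in milliseconds to an indicator color."""
    if avg_overhead_ms < 5:
        return "green"
    if avg_overhead_ms <= 20:
        return "yellow"
    return "red"


for sample in (1.2, 12.0, 35.0):
    print(sample, overhead_color(sample))
```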

Snapshot Aggregation

When a breakpoint has multiple captures, the aggregation panel on the breakpoint detail page shows:

  • Numeric distributions -- min, max, avg, p50, p95, p99 for numeric variables
  • String frequencies -- top-N value counts for string/enum variables
  • Anomaly detection -- values outside 2 standard deviations from the mean are flagged

Select which variables to aggregate and set a time range. Aggregation results are cached and refresh incrementally.
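
A simplified sketch of the numeric part of aggregation, using Python's statistics module. It covers min, max, average, and the 2-standard-deviation anomaly rule only; percentiles are omitted, and the choice of population standard deviation is an assumption, since the text does not say which variant is used.

```python
import statistics


def aggregate(values):
    """Summarize one numeric variable across snapshots and flag outliers."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    return {
        "min": min(values),
        "max": max(values),
        "avg": mean,
        # Values more than 2 standard deviations from the mean are flagged.
        "anomalies": [v for v in values if abs(v - mean) > 2 * std],
    }


order_totals = [10] * 9 + [100]
summary = aggregate(order_totals)
print(summary)
```

With nine captures of 10 and one of 100, only the 100 falls outside the 2-standard-deviation band.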

Search across captured variables using JSONPath expressions. Find patterns across all snapshots for a service.

Examples:

  • $.order.items[*].price -- find all item prices across snapshots
  • $.user.role == "admin" -- find snapshots where user was admin
  • $.response.status >= 500 -- find error responses

Search uses a two-phase approach (GIN-indexed narrowing then JSONPath filtering) with mandatory time range and statement timeout guard for performance safety.

Advanced Configuration

All safety features work with zero configuration. For advanced use cases, you can tune capture limits, PII patterns, and circuit breaker thresholds.

Go:

// boolPtr is a small helper (not shown here) that returns a pointer to its bool argument.
sdk, _ := tracekit.NewSDK(&tracekit.Config{
    APIKey:               os.Getenv("TRACEKIT_API_KEY"),
    ServiceName:          "order-service",
    EnableCodeMonitoring: true,
    CaptureConfig: &tracekit.CaptureConfig{
        CaptureDepth:   10,              // Max nesting depth (0 = unlimited)
        MaxPayload:     65536,           // Max payload bytes (0 = unlimited)
        CaptureTimeout: 5 * time.Second, // Capture timeout (0 = none)
        PIIScrubbing:   boolPtr(true),   // Default: enabled
        CircuitBreaker: &tracekit.CircuitBreakerConfig{
            MaxFailures: 3,      // Failures before tripping (default: 3)
            WindowMs:    60000,  // Failure window in ms (default: 60s)
            CooldownMs:  300000, // Auto-recovery after (default: 5min)
        },
    },
})

Node.js:

const client = tracekit.init({
    apiKey: process.env.TRACEKIT_API_KEY,
    serviceName: 'order-service',
    enableCodeMonitoring: true,
    captureConfig: {
        captureDepth: 10,         // Max nesting depth (undefined = unlimited)
        maxPayload: 65536,        // Max payload bytes (undefined = unlimited)
        captureTimeout: 5000,     // Capture timeout in ms (undefined = none)
        piiScrubbing: true,       // Default: true
        circuitBreaker: {
            maxFailures: 3,       // Failures before tripping (default: 3)
            windowMs: 60000,      // Failure window in ms (default: 60s)
            cooldownMs: 300000,   // Auto-recovery after (default: 5min)
        },
    },
});

Python:

client = tracekit.init(
    api_key=os.getenv("TRACEKIT_API_KEY"),
    service_name="order-service",
    enable_code_monitoring=True,
    capture_config={
        "capture_depth": 10,         # Max nesting depth (None = unlimited)
        "max_payload": 65536,        # Max payload bytes (None = unlimited)
        "capture_timeout": 5.0,      # Capture timeout in seconds (None = none)
        "pii_scrubbing": True,       # Default: True
        "circuit_breaker": {
            "max_failures": 3,       # Failures before tripping (default: 3)
            "window_ms": 60000,      # Failure window in ms (default: 60s)
            "cooldown_ms": 300000,   # Auto-recovery after (default: 5min)
        },
    },
)

Java:

TracekitConfig config = TracekitConfig.builder()
    .apiKey(System.getenv("TRACEKIT_API_KEY"))
    .serviceName("order-service")
    .enableCodeMonitoring(true)
    .captureDepth(10)              // Max nesting depth (0 = unlimited)
    .maxPayload(65536)             // Max payload bytes (0 = unlimited)
    .captureTimeoutMs(5000)        // Capture timeout in ms (0 = none)
    .piiScrubbing(true)            // Default: true
    .circuitBreakerMaxFailures(3)  // Default: 3
    .circuitBreakerWindowMs(60000) // Default: 60s
    .circuitBreakerCooldownMs(300000) // Default: 5min
    .build();

.NET:

var sdk = TracekitSDK.CreateBuilder()
    .WithApiKey(Environment.GetEnvironmentVariable("TRACEKIT_API_KEY"))
    .WithServiceName("order-service")
    .WithEnableCodeMonitoring(true)
    .WithCaptureDepth(10)              // Max nesting depth (0 = unlimited)
    .WithMaxPayload(65536)             // Max payload bytes (0 = unlimited)
    .WithCaptureTimeoutMs(5000)        // Capture timeout in ms (0 = none)
    .WithPiiScrubbing(true)            // Default: true
    .WithCircuitBreakerMaxFailures(3)  // Default: 3
    .WithCircuitBreakerWindowMs(60000) // Default: 60s
    .WithCircuitBreakerCooldownMs(300000) // Default: 5min
    .Build();

Ruby:

Tracekit::SDK.configure do |c|
    c.api_key                = ENV['TRACEKIT_API_KEY']
    c.service_name           = "order-service"
    c.enable_code_monitoring = true
    c.capture_depth          = 10      # Max nesting depth (nil = unlimited)
    c.max_payload            = 65536   # Max payload bytes (nil = unlimited)
    c.capture_timeout        = 5.0     # Capture timeout in seconds (nil = none)
    c.pii_scrubbing          = true    # Default: true
    c.circuit_breaker_max_failures = 3       # Default: 3
    c.circuit_breaker_window_ms    = 60000   # Default: 60s
    c.circuit_breaker_cooldown_ms  = 300000  # Default: 5min
end

Use Cases

Debug Production Issues

When a customer reports an error, set a breakpoint to see the exact state the next time it happens.

Performance Investigation

Capture input size and timing to find what causes slowdowns.

Verify Calculations

Track money flows through complex pipelines to ensure correctness.

Troubleshooting

No snapshots captured?

  • Check that the breakpoint is enabled and has not expired
  • Verify that the file path and line number match the breakpoint location
  • Ensure the service name matches between the SDK and the breakpoint
  • Check that the kill switch is not active for the service
  • Verify the circuit breaker hasn't tripped (check SDK logs for "circuit breaker open")
  • If using conditions, verify the condition expression is valid

Performance concerns?

  • Use max_captures to limit total captures per breakpoint
  • Set capture_frequency for sampling
  • Set short expiration times on breakpoints
  • Use opt-in capture limits (captureDepth, maxPayload, captureTimeout)
  • The circuit breaker auto-disables after 3 failures -- no manual action needed

Variables showing [REDACTED:type]?

  • PII scrubbing is enabled by default and redacts sensitive data before transmission
  • 13 built-in patterns detect emails, SSNs, credit cards, API keys, JWTs, and more
  • To disable for a specific service, set piiScrubbing: false in your capture config
  • Custom patterns can be added via the piiPatterns config option

Circuit breaker tripped?

  • The circuit breaker opens after 3 HTTP 5xx/network failures within 60 seconds
  • It auto-recovers after 5 minutes -- no restart required
  • Check your network connectivity and TraceKit server status
  • Thresholds can be tuned via circuitBreaker config

SSE not connecting?

  • SSE auto-discovers via polling -- ensure at least one poll has completed
  • SSE only activates when breakpoints exist and kill switch is off
  • The SDK falls back to polling if SSE is unavailable
  • PHP and Laravel use polling only in web request mode; SSE is used for CLI and worker processes
