Chapter 37: Monitoring and Observability

Overview

Production AI applications require comprehensive monitoring to ensure reliability, performance, cost efficiency, and rapid incident response. Unlike traditional applications, Claude integrations have unique monitoring requirements: token usage tracking, latency optimization, model performance analysis, cost attribution, and quality assurance for AI outputs.

This chapter teaches you to build robust observability into your Claude applications. You'll implement structured logging, collect meaningful metrics, set up distributed tracing, create actionable dashboards, configure intelligent alerts, and integrate with popular monitoring platforms like Sentry, Datadog, and Prometheus.

Prerequisites

Before starting, ensure you have:

✓ PHP 8.4+ with JSON and cURL extensions
✓ Monolog or similar logging library
✓ Redis or similar for metrics storage
✓ Access to monitoring platforms (optional but recommended)
✓ Completion of Chapter 36 or equivalent understanding of security best practices

Estimated Time: ~60-75 minutes

Verify your setup:

bash

# Check PHP version
php --version

# Verify Redis is running
redis-cli ping

# Check if Monolog is available
composer show monolog/monolog

What You'll Build

By the end of this chapter, you will have created:

A complete structured logging system with JSON formatting and context enrichment
A metrics collection infrastructure using Redis for time-series data
A distributed tracing system to track requests across services
Real-time monitoring dashboards with performance, cost, and quality metrics
Intelligent alerting system with configurable rules and cooldowns
Integration examples for Sentry, Datadog, and Prometheus
A comprehensive monitoring solution ready for production deployment

Objectives

By completing this chapter, you will:

Understand how to implement structured logging with Monolog for Claude applications
Learn to collect and analyze key metrics: latency, tokens, costs, errors, and quality
Master distributed tracing to understand request flow across multiple services
Build real-time dashboards that provide actionable insights
Configure intelligent alerting systems with proper thresholds and cooldowns
Integrate with popular monitoring platforms (Sentry, Datadog, Prometheus)
Monitor AI-specific metrics like token usage, cost attribution, and output quality

Structured Logging

Structured logging provides searchable, analyzable log data essential for debugging and monitoring AI applications.
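
To see why structured records are easier to query than free-form text, here is a minimal sketch in plain PHP, independent of Monolog. The event names and fields mirror those used in this chapter; the `filterByField` helper is hypothetical and exists only for illustration, since real log search would normally happen in your log platform.

```php
<?php
declare(strict_types=1);

/**
 * Return the decoded records whose $field equals $expected.
 * Hypothetical helper for illustration only.
 *
 * @param list<string> $jsonLines One JSON document per line
 * @return list<array<string, mixed>>
 */
function filterByField(array $jsonLines, string $field, mixed $expected): array
{
    $matches = [];
    foreach ($jsonLines as $line) {
        $record = json_decode($line, true);
        if (is_array($record) && ($record[$field] ?? null) === $expected) {
            $matches[] = $record;
        }
    }
    return $matches;
}

// Two log records, each encoded as one JSON object per line
$lines = [
    json_encode(['message' => 'claude.request.completed', 'user_id' => 'user-123', 'duration_ms' => 2500.0]),
    json_encode(['message' => 'claude.error', 'user_id' => 'user-456', 'duration_ms' => 120.5]),
];

// Because every record has named fields, filtering needs no regex parsing
$errors = filterByField($lines, 'message', 'claude.error');
echo count($errors), ' error record(s) for ', $errors[0]['user_id'], PHP_EOL;
```

The same field-based filtering is what a log platform does at scale once records arrive as JSON.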

Logging Infrastructure

php

<?php
# filename: src/Logging/ClaudeLogger.php
declare(strict_types=1);

namespace App\Logging;

use Monolog\Logger;
use Monolog\Handler\StreamHandler;
use Monolog\Handler\RotatingFileHandler;
use Monolog\Formatter\JsonFormatter;
use Monolog\Processor\WebProcessor;
use Monolog\Processor\IntrospectionProcessor;

class ClaudeLogger
{
    private Logger $logger;

    public function __construct(string $name = 'claude-app')
    {
        $this->logger = new Logger($name);

        // File handler with JSON formatting
        $fileHandler = new RotatingFileHandler(
            filename: '/var/log/app/claude.log',
            maxFiles: 30,
            level: Logger::INFO
        );
        $fileHandler->setFormatter(new JsonFormatter());

        // Error handler for critical issues
        $errorHandler = new StreamHandler(
            stream: '/var/log/app/claude-errors.log',
            level: Logger::ERROR
        );
        $errorHandler->setFormatter(new JsonFormatter());

        $this->logger->pushHandler($fileHandler);
        $this->logger->pushHandler($errorHandler);

        // Add contextual processors
        $this->logger->pushProcessor(new WebProcessor());
        $this->logger->pushProcessor(new IntrospectionProcessor());
        $this->logger->pushProcessor([$this, 'addGlobalContext']);
    }

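    /**
     * Processor that adds deployment context to every record.
     * Monolog 3 passes processors a LogRecord object (Monolog 2 passed an array).
     */
    public function addGlobalContext(\Monolog\LogRecord $record): \Monolog\LogRecord
    {
        $record->extra['environment'] = getenv('APP_ENV') ?: 'production';
        $record->extra['server'] = gethostname();
        $record->extra['app_version'] = getenv('APP_VERSION') ?: 'unknown';

        return $record;
    }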

    public function getLogger(): Logger
    {
        return $this->logger;
    }

    /**
     * Log Claude API request
     */
    public function logRequest(
        string $model,
        int $inputTokens,
        string $userId,
        array $metadata = []
    ): void {
        $this->logger->info('claude.request.started', [
            'model' => $model,
            'input_tokens' => $inputTokens,
            'user_id' => $userId,
            'metadata' => $metadata,
            'timestamp' => microtime(true),
        ]);
    }

    /**
     * Log Claude API response
     */
    public function logResponse(
        string $messageId,
        string $model,
        int $inputTokens,
        int $outputTokens,
        float $duration,
        string $stopReason,
        ?string $userId = null
    ): void {
        $cost = $this->calculateCost($model, $inputTokens, $outputTokens);

        $this->logger->info('claude.request.completed', [
            'message_id' => $messageId,
            'model' => $model,
            'tokens' => [
                'input' => $inputTokens,
                'output' => $outputTokens,
                'total' => $inputTokens + $outputTokens,
            ],
            'cost' => $cost,
            'duration_ms' => round($duration * 1000, 2),
            'tokens_per_second' => $duration > 0 ? round($outputTokens / $duration, 2) : 0.0, // guard against zero duration
            'stop_reason' => $stopReason,
            'user_id' => $userId,
            'timestamp' => microtime(true),
        ]);
    }

    /**
     * Log errors with full context
     */
    public function logError(
        \Throwable $error,
        string $context,
        array $additionalData = []
    ): void {
        $this->logger->error('claude.error', [
            'error_type' => get_class($error),
            'message' => $error->getMessage(),
            'code' => $error->getCode(),
            'file' => $error->getFile(),
            'line' => $error->getLine(),
            'trace' => $error->getTraceAsString(),
            'context' => $context,
            'additional_data' => $additionalData,
        ]);
    }

    /**
     * Log quality metrics
     */
    public function logQuality(
        string $messageId,
        float $relevanceScore,
        float $coherenceScore,
        bool $userSatisfied,
        ?string $feedback = null
    ): void {
        $this->logger->info('claude.quality', [
            'message_id' => $messageId,
            'scores' => [
                'relevance' => $relevanceScore,
                'coherence' => $coherenceScore,
                'average' => ($relevanceScore + $coherenceScore) / 2,
            ],
            'user_satisfied' => $userSatisfied,
            'feedback' => $feedback,
        ]);
    }

    private function calculateCost(string $model, int $inputTokens, int $outputTokens): array
    {
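        // USD per million tokens. Verify against current published pricing before relying on these.
        $pricing = match($model) {
            'claude-opus-4-1' => ['input' => 15.00, 'output' => 75.00],
            'claude-sonnet-4-5-20250929' => ['input' => 3.00, 'output' => 15.00],
            'claude-haiku-4-5-20251001' => ['input' => 1.00, 'output' => 5.00],
            default => ['input' => 0, 'output' => 0], // unknown models are reported as zero cost
        };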

        $inputCost = ($inputTokens / 1_000_000) * $pricing['input'];
        $outputCost = ($outputTokens / 1_000_000) * $pricing['output'];

        return [
            'input' => $inputCost,
            'output' => $outputCost,
            'total' => $inputCost + $outputCost,
        ];
    }
}

// Usage
$logger = new ClaudeLogger();

// Log request
$logger->logRequest(
    'claude-sonnet-4-5-20250929',
    150,
    'user-123',
    ['feature' => 'chatbot', 'session_id' => 'sess-456']
);

// Log response
$logger->logResponse(
    'msg_abc123',
    'claude-sonnet-4-5-20250929',
    150,
    300,
    2.5,
    'end_turn',
    'user-123'
);

Request Logging Middleware

php

<?php
# filename: src/Logging/RequestLoggingMiddleware.php
declare(strict_types=1);

namespace App\Logging;

class RequestLoggingMiddleware
{
    public function __construct(
        private readonly ClaudeLogger $logger
    ) {}

    /**
     * Wrap Claude requests with automatic logging
     */
    public function loggedRequest(
        callable $claudeRequest,
        string $userId,
        array $context = []
    ): mixed {
        $requestId = $this->generateRequestId();
        $startTime = microtime(true);

        // Log request start
        $this->logger->getLogger()->info('claude.request.initiated', [
            'request_id' => $requestId,
            'user_id' => $userId,
            'context' => $context,
        ]);

        try {
            $response = $claudeRequest();

            $duration = microtime(true) - $startTime;

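            // Log successful response
            $this->logger->logResponse(
                $response->id,
                $response->model,
                $response->usage->inputTokens,
                $response->usage->outputTokens,
                $duration,
                $response->stopReason,
                $userId
            );

            // Tie the request ID to the API message ID so both log entries can be correlated
            $this->logger->getLogger()->info('claude.request.correlated', [
                'request_id' => $requestId,
                'message_id' => $response->id,
            ]);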

            return $response;

        } catch (\Throwable $e) {
            $duration = microtime(true) - $startTime;

            // Log error
            $this->logger->logError($e, 'claude_request_failed', [
                'request_id' => $requestId,
                'user_id' => $userId,
                'duration_ms' => round($duration * 1000, 2),
            ]);

            throw $e;
        }
    }

    private function generateRequestId(): string
    {
        return bin2hex(random_bytes(16));
    }
}

// Usage
use ClaudePhp\ClaudePhp;

$client = new ClaudePhp(
    apiKey: $_ENV['ANTHROPIC_API_KEY']
);

$loggingMiddleware = new RequestLoggingMiddleware($logger);

$response = $loggingMiddleware->loggedRequest(
    claudeRequest: fn() => $client->messages()->create([
        'model' => 'claude-sonnet-4-5-20250929',
        'max_tokens' => 1024,
        'messages' => [['role' => 'user', 'content' => $prompt]]
    ]),
    userId: 'user-123',
    context: ['feature' => 'support_bot', 'priority' => 'high']
);

Why It Works: The middleware pattern wraps Claude API calls with automatic logging, ensuring every request is tracked without modifying the core business logic. By generating a unique request ID and capturing timing information, you can trace requests through your system and correlate logs with metrics. The try-catch ensures errors are logged with full context before re-throwing, maintaining error propagation while preserving observability.

Metrics Collection

Track key performance indicators to understand your Claude application's health and performance.

Metrics Collector

php

<?php
# filename: src/Metrics/MetricsCollector.php
declare(strict_types=1);

namespace App\Metrics;

class MetricsCollector
{
    public function __construct(
        private readonly \Redis $redis
    ) {}

    /**
     * Record a metric value
     */
    public function record(string $metric, float $value, array $tags = []): void
    {
        $timestamp = time();
        $key = $this->buildKey($metric, $tags);

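        // Store time-series data, scored by timestamp. The random ID keeps members
        // unique, so equal values recorded in the same second are not collapsed
        // into a single entry.
        $this->redis->zAdd(
            $key,
            ['NX'],
            $timestamp,
            json_encode([
                'value' => $value,
                'timestamp' => $timestamp,
                'id' => bin2hex(random_bytes(4)),
            ])
        );

        // Keep only last 24 hours
        $oneDayAgo = $timestamp - 86400;
        $this->redis->zRemRangeByScore($key, '-inf', (string) $oneDayAgo);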

        // Update aggregates
        $this->updateAggregates($metric, $value, $tags);
    }

    /**
     * Increment a counter
     */
    public function increment(string $metric, int $amount = 1, array $tags = []): void
    {
        $key = $this->buildKey($metric, $tags) . ':counter';
        $this->redis->incrBy($key, $amount);
    }

    /**
     * Record a histogram value (for latency, token counts, etc.)
     */
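    public function histogram(string $metric, float $value, array $tags = []): void
    {
        $key = $this->buildKey($metric, $tags) . ':histogram';
        $orderKey = $key . ':order';

        // Random ID keeps members unique when equal values arrive in the same second
        $member = json_encode([
            'value' => $value,
            'timestamp' => time(),
            'id' => bin2hex(random_bytes(4)),
        ]);

        // Store in sorted set (scored by value) for percentile calculations
        $this->redis->zAdd($key, ['NX'], $value, $member);

        // Track insertion order separately: the sorted set is ordered by value,
        // so trimming it by rank would evict the smallest samples, not the oldest
        $this->redis->rPush($orderKey, $member);

        // Keep last 10,000 values
        while ($this->redis->lLen($orderKey) > 10000) {
            $oldest = $this->redis->lPop($orderKey);
            if ($oldest === false) {
                break;
            }
            $this->redis->zRem($key, $oldest);
        }
    }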

    /**
     * Get metric statistics
     */
    public function getStats(string $metric, array $tags = []): array
    {
        $key = $this->buildKey($metric, $tags);

        $values = $this->redis->zRange($key, 0, -1);
        $parsedValues = array_map(fn($v) => json_decode($v, true)['value'], $values);

        if (empty($parsedValues)) {
            return [
                'count' => 0,
                'sum' => 0,
                'avg' => 0,
                'min' => 0,
                'max' => 0,
            ];
        }

        return [
            'count' => count($parsedValues),
            'sum' => array_sum($parsedValues),
            'avg' => array_sum($parsedValues) / count($parsedValues),
            'min' => min($parsedValues),
            'max' => max($parsedValues),
        ];
    }

    /**
     * Get percentiles for histogram
     */
    public function getPercentiles(string $metric, array $percentiles = [50, 95, 99], array $tags = []): array
    {
        $key = $this->buildKey($metric, $tags) . ':histogram';
        $count = $this->redis->zCard($key);

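        if ($count === 0) {
            // Use the same "pNN" keys as the non-empty case so callers can index consistently
            $empty = [];
            foreach ($percentiles as $percentile) {
                $empty["p$percentile"] = 0;
            }
            return $empty;
        }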

        $results = [];

        foreach ($percentiles as $percentile) {
            $rank = (int) ceil(($percentile / 100) * $count) - 1;
            $value = $this->redis->zRange($key, $rank, $rank);

            $results["p$percentile"] = !empty($value)
                ? json_decode($value[0], true)['value']
                : 0;
        }

        return $results;
    }

    private function buildKey(string $metric, array $tags): string
    {
        $tagString = empty($tags) ? '' : ':' . implode(':', array_map(
            fn($k, $v) => "$k=$v",
            array_keys($tags),
            $tags
        ));

        return "metrics:$metric$tagString";
    }

    private function updateAggregates(string $metric, float $value, array $tags): void
    {
        $hourKey = $this->buildKey($metric, $tags) . ':hour:' . date('Y-m-d-H');
        $dayKey = $this->buildKey($metric, $tags) . ':day:' . date('Y-m-d');

        // Update hourly aggregate
        $this->redis->hIncrByFloat($hourKey, 'sum', $value);
        $this->redis->hIncrBy($hourKey, 'count', 1);
        $this->redis->expire($hourKey, 172800); // 2 days

        // Update daily aggregate
        $this->redis->hIncrByFloat($dayKey, 'sum', $value);
        $this->redis->hIncrBy($dayKey, 'count', 1);
        $this->redis->expire($dayKey, 2592000); // 30 days
    }
}

// Usage
$redis = new \Redis();
$redis->connect('127.0.0.1', 6379); // assumes a local Redis instance
$metrics = new MetricsCollector($redis);

// Record request duration
$metrics->histogram('claude.request.duration', 2.5, ['model' => 'sonnet']);

// Record token usage
$metrics->record('claude.tokens.input', 150, ['model' => 'sonnet', 'user' => 'user-123']);
$metrics->record('claude.tokens.output', 300, ['model' => 'sonnet', 'user' => 'user-123']);

// Increment error counter
$metrics->increment('claude.errors', 1, ['type' => 'rate_limit']);

// Get statistics
$stats = $metrics->getStats('claude.request.duration', ['model' => 'sonnet']);
$percentiles = $metrics->getPercentiles('claude.request.duration', [50, 95, 99], ['model' => 'sonnet']);

Why It Works: Redis sorted sets (zAdd) provide efficient time-series storage where timestamps serve as scores, enabling fast range queries and automatic sorting. The histogram implementation uses sorted sets with values as scores, allowing percentile calculations by rank position. Aggregates are stored in Redis hashes for fast O(1) lookups, while TTL ensures old data is automatically cleaned up. This design balances query performance with storage efficiency, making it suitable for high-volume metric collection.
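
The rank-based percentile lookup can be checked without Redis. The following is a minimal sketch of the nearest-rank method over an in-memory array, using the same rank formula as `getPercentiles()`. The function name `nearestRankPercentile` and the sample durations are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Nearest-rank percentile over an in-memory list of samples.
 * Uses the same rank formula as getPercentiles(): ceil(p / 100 * n) - 1.
 *
 * @param list<float> $samples
 */
function nearestRankPercentile(array $samples, float $percentile): float
{
    if ($samples === []) {
        return 0.0;
    }

    sort($samples); // ascending, like a sorted set scored by value
    $count = count($samples);

    $rank = (int) ceil(($percentile / 100) * $count) - 1;
    $rank = max(0, min($count - 1, $rank)); // clamp for p = 0 or p > 100

    return $samples[$rank];
}

// Ten request durations in seconds (illustrative values)
$durations = [0.8, 1.1, 1.2, 1.4, 1.5, 1.9, 2.2, 2.5, 3.1, 6.0];

$p50 = nearestRankPercentile($durations, 50); // zero-based rank 4
$p95 = nearestRankPercentile($durations, 95); // zero-based rank 9

printf("p50=%.1fs p95=%.1fs\n", $p50, $p95);
```

With only ten samples, p95 lands on the single slowest request, which is why percentile alerts are more meaningful once a histogram holds a few hundred values or more.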

Metrics Categorization

When deciding what to monitor, categorize metrics into three tiers:

Tier 1: Critical Metrics (Always track)

Request success/failure rate
API latency (p50, p95, p99)
Total cost per day
Error rate by type

Tier 2: Important Metrics (Track for optimization)

Token usage by model and user
Quality scores (relevance, coherence)
Model selection distribution
Cache hit rates

Tier 3: Optional Metrics (Track for deep analysis)

Processing time by component
Queue depth and age
Specific user behavior patterns
Detailed model performance metrics

Why This Matters: Tracking every possible metric creates noise and increases storage costs. Focus on Tier 1 metrics for real-time alerts, Tier 2 for optimization, and Tier 3 only when investigating specific issues. This prevents "alert fatigue" and keeps your observability system performant.

php

<?php
// Example: Categorizing metrics
$criticalMetrics = [
    'claude.requests.success_rate',    // % of successful requests
    'claude.request.duration_p95',     // 95th percentile latency
    'claude.cost.daily_total',         // Daily spending
    'claude.errors.rate_limit',        // Rate limit errors
];

$importantMetrics = [
    'claude.tokens.input_total',       // Input tokens for optimization
    'claude.tokens.output_total',      // Output tokens for analysis
    'claude.quality.relevance_avg',    // Quality measurement
    'claude.cache.hit_rate',           // Cache effectiveness
];

$optionalMetrics = [
    'claude.processing.db_time_ms',    // Component-level timing
    'claude.queue.depth',              // Queue monitoring
    'claude.user.intent_distribution', // User behavior analysis
];
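
One way to act on these tiers is to let only Tier 1 metrics trigger real-time alerts. The sketch below assumes a tier map shaped like the arrays above; `tierFor` and `shouldAlert` are hypothetical helpers, and treating unlisted metrics as Tier 3 is an assumption rather than a rule from this chapter.

```php
<?php
declare(strict_types=1);

/**
 * Map a metric name to its tier.
 * Hypothetical helper; names not listed fall back to tier 3.
 *
 * @param array<int, list<string>> $tiers Tier number => metric names
 */
function tierFor(string $metric, array $tiers): int
{
    foreach ($tiers as $tier => $names) {
        if (in_array($metric, $names, true)) {
            return $tier;
        }
    }
    return 3;
}

/**
 * Only Tier 1 metrics trigger real-time alerts.
 *
 * @param array<int, list<string>> $tiers
 */
function shouldAlert(string $metric, array $tiers): bool
{
    return tierFor($metric, $tiers) === 1;
}

$tiers = [
    1 => ['claude.requests.success_rate', 'claude.request.duration_p95'],
    2 => ['claude.tokens.input_total', 'claude.cache.hit_rate'],
    3 => ['claude.queue.depth'],
];

var_dump(shouldAlert('claude.request.duration_p95', $tiers)); // Tier 1 metric
var_dump(shouldAlert('claude.cache.hit_rate', $tiers));       // Tier 2 metric
```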

Key Metrics to Track

php

<?php
# filename: src/Metrics/ClaudeMetrics.php
declare(strict_types=1);

namespace App\Metrics;

class ClaudeMetrics
{
    public function __construct(
        private readonly MetricsCollector $metrics
    ) {}

    /**
     * Track request metrics
     */
    public function trackRequest(
        string $model,
        int $inputTokens,
        int $outputTokens,
        float $duration,
        string $stopReason,
        ?string $userId = null
    ): void {
        $tags = ['model' => $this->simplifyModelName($model)];

        if ($userId) {
            $tags['user'] = $userId;
        }

        // Request count
        $this->metrics->increment('claude.requests.total', 1, $tags);

        // Duration
        $this->metrics->histogram('claude.request.duration', $duration, $tags);

        // Token usage
        $this->metrics->record('claude.tokens.input', $inputTokens, $tags);
        $this->metrics->record('claude.tokens.output', $outputTokens, $tags);
        $this->metrics->record('claude.tokens.total', $inputTokens + $outputTokens, $tags);

        // Throughput
        $tokensPerSecond = $duration > 0 ? $outputTokens / $duration : 0;
        $this->metrics->record('claude.throughput.tokens_per_second', $tokensPerSecond, $tags);

        // Cost
        $cost = $this->calculateCost($model, $inputTokens, $outputTokens);
        $this->metrics->record('claude.cost.total', $cost, $tags);

        // Stop reason distribution
        $this->metrics->increment("claude.stop_reason.$stopReason", 1, $tags);
    }

    /**
     * Track errors
     */
    public function trackError(
        string $errorType,
        string $model,
        ?string $userId = null
    ): void {
        $tags = [
            'model' => $this->simplifyModelName($model),
            'error_type' => $errorType,
        ];

        if ($userId) {
            $tags['user'] = $userId;
        }

        $this->metrics->increment('claude.errors.total', 1, $tags);
    }

    /**
     * Track quality metrics
     */
    public function trackQuality(
        float $relevanceScore,
        float $coherenceScore,
        bool $userSatisfied,
        string $model
    ): void {
        $tags = ['model' => $this->simplifyModelName($model)];

        $this->metrics->record('claude.quality.relevance', $relevanceScore, $tags);
        $this->metrics->record('claude.quality.coherence', $coherenceScore, $tags);
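        // Record 1 or 0 as a series rather than a counter: getStats() reads series,
        // and a satisfaction rate needs both the sum and the count
        $this->metrics->record(
            'claude.quality.satisfaction',
            $userSatisfied ? 1.0 : 0.0,
            $tags
        );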
    }

    /**
     * Get dashboard data
     */
    public function getDashboardData(string $timeRange = '1h'): array
    {
        return [
            'requests' => [
                'total' => $this->getMetricSum('claude.requests.total'),
                'by_model' => $this->getMetricsByTag('claude.requests.total', 'model'),
            ],
            'latency' => [
                'p50' => $this->metrics->getPercentiles('claude.request.duration', [50])['p50'],
                'p95' => $this->metrics->getPercentiles('claude.request.duration', [95])['p95'],
                'p99' => $this->metrics->getPercentiles('claude.request.duration', [99])['p99'],
            ],
            'tokens' => [
                'input' => $this->getMetricSum('claude.tokens.input'),
                'output' => $this->getMetricSum('claude.tokens.output'),
                'total' => $this->getMetricSum('claude.tokens.total'),
            ],
            'cost' => [
                'total' => $this->getMetricSum('claude.cost.total'),
                'by_model' => $this->getMetricsByTag('claude.cost.total', 'model'),
            ],
            'errors' => [
                'total' => $this->getMetricSum('claude.errors.total'),
                'by_type' => $this->getMetricsByTag('claude.errors.total', 'error_type'),
            ],
            'quality' => [
                'relevance_avg' => $this->metrics->getStats('claude.quality.relevance')['avg'],
                'coherence_avg' => $this->metrics->getStats('claude.quality.coherence')['avg'],
                'satisfaction_rate' => $this->calculateSatisfactionRate(),
            ],
        ];
    }

    private function simplifyModelName(string $model): string
    {
        return match(true) {
            str_contains($model, 'opus') => 'opus',
            str_contains($model, 'sonnet') => 'sonnet',
            str_contains($model, 'haiku') => 'haiku',
            default => 'unknown'
        };
    }

    private function calculateCost(string $model, int $inputTokens, int $outputTokens): float
    {
        $pricing = match($this->simplifyModelName($model)) {
            'opus' => ['input' => 15.00, 'output' => 75.00],
            'sonnet' => ['input' => 3.00, 'output' => 15.00],
            'haiku' => ['input' => 1.00, 'output' => 5.00], // Haiku 4.5 rates; older Haiku models are priced lower
            default => ['input' => 0, 'output' => 0],
        };

        return ($inputTokens / 1_000_000 * $pricing['input']) +
               ($outputTokens / 1_000_000 * $pricing['output']);
    }

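    private function getMetricSum(string $metric): float
    {
        // getStats() reads the series stored under the untagged key. Values written
        // with tags (as trackRequest() does) live under separate keys, and counters
        // written by increment() are not series, so both need a roll-up before
        // they appear in this sum.
        $stats = $this->metrics->getStats($metric);
        return (float) $stats['sum'];
    }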

    private function getMetricsByTag(string $metric, string $tag): array
    {
        // Implementation would query Redis for different tag values
        return []; // Placeholder
    }

    private function calculateSatisfactionRate(): float
    {
        $stats = $this->metrics->getStats('claude.quality.satisfaction');
        return $stats['count'] > 0 ? ($stats['sum'] / $stats['count']) * 100 : 0;
    }
}

// Usage
$claudeMetrics = new ClaudeMetrics($metrics);

// Track request
$claudeMetrics->trackRequest(
    'claude-sonnet-4-5-20250929',
    150,
    300,
    2.5,
    'end_turn',
    'user-123'
);

// Get dashboard data
$dashboard = $claudeMetrics->getDashboardData('1h');
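
The cost arithmetic in `calculateCost()` is easy to verify by hand. The sketch below works through the example request above (150 input and 300 output tokens) at the Sonnet rates used in this chapter, $3 and $15 per million tokens. The `requestCost` function and the 10,000 requests per day figure are illustrative assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Cost in USD for one request, given prices per million tokens.
 */
function requestCost(int $inputTokens, int $outputTokens, float $inputPrice, float $outputPrice): float
{
    return ($inputTokens / 1_000_000) * $inputPrice
         + ($outputTokens / 1_000_000) * $outputPrice;
}

// 150 input tokens  -> 150 / 1,000,000 * 3.00  = 0.00045 USD
// 300 output tokens -> 300 / 1,000,000 * 15.00 = 0.00450 USD
$cost = requestCost(150, 300, 3.00, 15.00);
printf("%.5f USD per request\n", $cost);

// Scale to a hypothetical 10,000 identical requests per day
$dailyCost = $cost * 10_000;
printf("%.2f USD per day\n", $dailyCost);
```

Output tokens dominate here: they are a third of the traffic by price-weighted volume in, but about 90% of the cost, which is why `max_tokens` and response length are usually the first levers for cost control.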

Distributed Tracing

Track requests across multiple services and understand the complete flow.

Tracing Implementation

php

<?php
# filename: src/Tracing/RequestTracer.php
declare(strict_types=1);

namespace App\Tracing;

class RequestTracer
{
    private array $spans = [];
    private ?string $traceId = null;
    private ?string $parentSpanId = null;

    public function startTrace(string $operationName, array $tags = []): string
    {
        $this->traceId = $this->generateId();
        return $this->startSpan($operationName, $tags);
    }

    public function startSpan(string $operationName, array $tags = []): string
    {
        $spanId = $this->generateId();

        $this->spans[$spanId] = [
            'trace_id' => $this->traceId,
            'span_id' => $spanId,
            'parent_span_id' => $this->parentSpanId,
            'operation_name' => $operationName,
            'start_time' => microtime(true),
            'tags' => $tags,
            'logs' => [],
        ];

        $this->parentSpanId = $spanId;

        return $spanId;
    }

    public function finishSpan(string $spanId, array $tags = []): void
    {
        if (!isset($this->spans[$spanId])) {
            return;
        }

        $this->spans[$spanId]['finish_time'] = microtime(true);
        $this->spans[$spanId]['duration'] = $this->spans[$spanId]['finish_time'] -
                                            $this->spans[$spanId]['start_time'];
        $this->spans[$spanId]['tags'] = array_merge(
            $this->spans[$spanId]['tags'],
            $tags
        );

        // Reset parent span ID
        $this->parentSpanId = $this->spans[$spanId]['parent_span_id'];
    }

    public function addLog(string $spanId, string $event, array $data = []): void
    {
        if (!isset($this->spans[$spanId])) {
            return;
        }

        $this->spans[$spanId]['logs'][] = [
            'timestamp' => microtime(true),
            'event' => $event,
            'data' => $data,
        ];
    }

    public function getTrace(): array
    {
        return [
            'trace_id' => $this->traceId,
            'spans' => array_values($this->spans),
            'total_duration' => $this->calculateTotalDuration(),
        ];
    }

    public function exportToJaeger(): void
    {
        // Export to Jaeger format
        $jaegerTrace = $this->convertToJaegerFormat();

        // Send to Jaeger collector
        // Implementation depends on Jaeger client library
    }

    private function generateId(): string
    {
        return bin2hex(random_bytes(8));
    }

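    private function calculateTotalDuration(): float
    {
        if (empty($this->spans)) {
            return 0.0;
        }

        $now = microtime(true);

        // Earliest start to latest finish; spans still open count as running until now.
        // (The last span started is not necessarily the last one to finish.)
        $start = min(array_column($this->spans, 'start_time'));
        $end = max(array_map(
            fn(array $span): float => $span['finish_time'] ?? $now,
            $this->spans
        ));

        return $end - $start;
    }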

    private function convertToJaegerFormat(): array
    {
        // Convert internal format to Jaeger format
        return []; // Placeholder
    }
}

// Usage
use ClaudePhp\ClaudePhp;

$client = new ClaudePhp(
    apiKey: $_ENV['ANTHROPIC_API_KEY']
);

$tracer = new RequestTracer();

// Start trace
$traceId = $tracer->startTrace('claude_chatbot_request', [
    'user_id' => 'user-123',
    'session_id' => 'sess-456',
]);

// Database span
$dbSpan = $tracer->startSpan('fetch_conversation_history', ['database' => 'postgres']);
// ... fetch history ...
$tracer->finishSpan($dbSpan, ['rows_fetched' => 10]);

// Claude API span
$claudeSpan = $tracer->startSpan('claude_api_request', [
    'model' => 'claude-sonnet-4-5-20250929',
    'max_tokens' => 1024,
]);

try {
    $response = $client->messages()->create([...]);

    $tracer->addLog($claudeSpan, 'response_received', [
        'message_id' => $response->id,
        'tokens' => $response->usage->inputTokens + $response->usage->outputTokens,
    ]);

    $tracer->finishSpan($claudeSpan, [
        'status' => 'success',
        'tokens' => $response->usage->inputTokens + $response->usage->outputTokens,
    ]);

} catch (\Exception $e) {
    $tracer->addLog($claudeSpan, 'error', [
        'error_type' => get_class($e),
        'message' => $e->getMessage(),
    ]);

    $tracer->finishSpan($claudeSpan, [
        'status' => 'error',
        'error' => true,
    ]);

    throw $e;
}

// Cache span
$cacheSpan = $tracer->startSpan('cache_response', ['cache' => 'redis']);
// ... cache response ...
$tracer->finishSpan($cacheSpan);

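// Finish the root span (startTrace() returns the root span's ID, stored in $traceId above)
$tracer->finishSpan($traceId);

// Get complete trace
$trace = $tracer->getTrace();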
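
`RequestTracer` keeps its spans in memory for a single PHP process. To continue a trace in another service, the trace and span IDs have to travel with the outgoing request. One common convention is the W3C Trace Context `traceparent` HTTP header. The sketch below builds and parses that header; the function names are hypothetical. Note that the header expects a 32-character trace ID, whereas `generateId()` above produces 16 characters, so the trace ID would need 16 random bytes to fit this format.

```php
<?php
declare(strict_types=1);

/**
 * Build a W3C Trace Context `traceparent` header value.
 * Format: version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
 */
function buildTraceparent(string $traceId, string $spanId, bool $sampled = true): string
{
    return sprintf('00-%s-%s-%s', $traceId, $spanId, $sampled ? '01' : '00');
}

/**
 * Parse a `traceparent` value; returns null when it is malformed.
 *
 * @return array{trace_id: string, parent_span_id: string, sampled: bool}|null
 */
function parseTraceparent(string $header): ?array
{
    if (!preg_match('/^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/', $header, $m)) {
        return null;
    }

    // All-zero IDs are invalid per the specification
    if ($m[1] === str_repeat('0', 32) || $m[2] === str_repeat('0', 16)) {
        return null;
    }

    return [
        'trace_id' => $m[1],
        'parent_span_id' => $m[2],
        'sampled' => (hexdec($m[3]) & 1) === 1,
    ];
}

// The calling service generates IDs and sends the header with its HTTP request
$traceId = bin2hex(random_bytes(16)); // 32 hex characters
$spanId = bin2hex(random_bytes(8));   // 16 hex characters
$header = buildTraceparent($traceId, $spanId);

// The receiving service parses the header and continues the same trace
$incoming = parseTraceparent($header);
```

On the receiving side, `$incoming['trace_id']` would replace a freshly generated trace ID, and `$incoming['parent_span_id']` would become the parent of the first local span.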

Real-Time Dashboards

Create actionable dashboards to monitor your Claude application.

Dashboard Data Provider

php

<?php
# filename: src/Dashboard/DashboardProvider.php
declare(strict_types=1);

namespace App\Dashboard;

use App\Metrics\MetricsCollector;

class DashboardProvider
{
    public function __construct(
        // getPercentiles() and getStats() are defined on MetricsCollector, not ClaudeMetrics
        private readonly MetricsCollector $metrics,
        private readonly \Redis $redis
    ) {}

    /**
     * Get real-time dashboard data
     */
    public function getRealTimeDashboard(): array
    {
        return [
            'overview' => $this->getOverview(),
            'performance' => $this->getPerformanceMetrics(),
            'costs' => $this->getCostMetrics(),
            'quality' => $this->getQualityMetrics(),
            'errors' => $this->getErrorMetrics(),
            'alerts' => $this->getActiveAlerts(),
        ];
    }

    private function getOverview(): array
    {
        return [
            'requests_last_hour' => $this->getHourlyRequestCount(),
            'requests_last_24h' => $this->getDailyRequestCount(),
            'active_users' => $this->getActiveUserCount(),
            'avg_response_time' => $this->getAverageResponseTime(),
            'error_rate' => $this->getErrorRate(),
            'total_cost_today' => $this->getTotalCostToday(),
        ];
    }

    private function getPerformanceMetrics(): array
    {
        $percentiles = $this->metrics->getPercentiles(
            'claude.request.duration',
            [50, 75, 95, 99]
        );

        return [
            'latency' => [
                'p50' => round($percentiles['p50'] * 1000, 2),  // Convert to ms
                'p75' => round($percentiles['p75'] * 1000, 2),
                'p95' => round($percentiles['p95'] * 1000, 2),
                'p99' => round($percentiles['p99'] * 1000, 2),
            ],
            'throughput' => [
                'requests_per_minute' => $this->getRequestsPerMinute(),
                'tokens_per_second' => $this->getTokensPerSecond(),
            ],
            'by_model' => $this->getPerformanceByModel(),
        ];
    }

    private function getCostMetrics(): array
    {
        return [
            'today' => [
                'total' => $this->getTotalCostToday(),
                'by_model' => $this->getCostByModel('today'),
                'by_user' => $this->getTopCostUsers('today', 10),
            ],
            'this_month' => [
                'total' => $this->getTotalCostThisMonth(),
                'projection' => $this->getMonthlyProjection(),
            ],
            'budget' => [
                'daily_limit' => 500.00,
                'daily_spent' => $this->getTotalCostToday(),
                'daily_remaining' => 500.00 - $this->getTotalCostToday(),
                'monthly_limit' => 15000.00,
                'monthly_spent' => $this->getTotalCostThisMonth(),
            ],
        ];
    }

    private function getQualityMetrics(): array
    {
        $relevanceStats = $this->metrics->getStats('claude.quality.relevance');
        $coherenceStats = $this->metrics->getStats('claude.quality.coherence');

        return [
            'relevance' => [
                'average' => round($relevanceStats['avg'], 2),
                'min' => round($relevanceStats['min'], 2),
                'max' => round($relevanceStats['max'], 2),
            ],
            'coherence' => [
                'average' => round($coherenceStats['avg'], 2),
                'min' => round($coherenceStats['min'], 2),
                'max' => round($coherenceStats['max'], 2),
            ],
            'satisfaction_rate' => $this->getSatisfactionRate(),
        ];
    }

    private function getErrorMetrics(): array
    {
        return [
            'total_errors' => $this->getTotalErrors(),
            'error_rate' => $this->getErrorRate(),
            'by_type' => $this->getErrorsByType(),
            'recent_errors' => $this->getRecentErrors(10),
        ];
    }

    private function getActiveAlerts(): array
    {
        $alerts = [];

        // Check error rate
        $errorRate = $this->getErrorRate();
        if ($errorRate > 5.0) {
            $alerts[] = [
                'severity' => 'high',
                'type' => 'error_rate',
                'message' => "Error rate is $errorRate% (threshold: 5%)",
                'timestamp' => time(),
            ];
        }

        // Check latency
        $p95 = $this->metrics->getPercentiles('claude.request.duration', [95])['p95'];
        if ($p95 > 5.0) {
            $alerts[] = [
                'severity' => 'medium',
                'type' => 'high_latency',
                'message' => "P95 latency is " . round($p95, 2) . "s (threshold: 5s)",
                'timestamp' => time(),
            ];
        }

        // Check daily cost
        $dailyCost = $this->getTotalCostToday();
        if ($dailyCost > 450) {
            $alerts[] = [
                'severity' => 'high',
                'type' => 'budget',
                'message' => "Daily cost is $" . round($dailyCost, 2) . " (limit: $500)",
                'timestamp' => time(),
            ];
        }

        return $alerts;
    }

    // Helper methods (implementations would query actual data)
    private function getHourlyRequestCount(): int { return 0; }
    private function getDailyRequestCount(): int { return 0; }
    private function getActiveUserCount(): int { return 0; }
    private function getAverageResponseTime(): float { return 0.0; }
    private function getErrorRate(): float { return 0.0; }
    private function getTotalCostToday(): float { return 0.0; }
    private function getRequestsPerMinute(): float { return 0.0; }
    private function getTokensPerSecond(): float { return 0.0; }
    private function getPerformanceByModel(): array { return []; }
    private function getCostByModel(string $period): array { return []; }
    private function getTopCostUsers(string $period, int $limit): array { return []; }
    private function getTotalCostThisMonth(): float { return 0.0; }
    private function getMonthlyProjection(): float { return 0.0; }
    private function getSatisfactionRate(): float { return 0.0; }
    private function getTotalErrors(): int { return 0; }
    private function getErrorsByType(): array { return []; }
    private function getRecentErrors(int $limit): array { return []; }
}
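
The helper methods above are stubs that return zero. As one illustration of how a stub such as `getMonthlyProjection()` might be filled in, the sketch below extrapolates month-to-date spend linearly to the end of the month. The function name `projectMonthlyCost` and its parameters are hypothetical, and linear extrapolation is an assumption: traffic with strong weekday or seasonal patterns would need a different model.

```php
<?php
declare(strict_types=1);

/**
 * Linearly extrapolate month-to-date spend to a full-month estimate.
 * Hypothetical helper; not part of the chapter's DashboardProvider.
 */
function projectMonthlyCost(float $spentSoFar, \DateTimeImmutable $now): float
{
    $dayOfMonth  = (int) $now->format('j');  // 1..31, current day counted as a full day
    $daysInMonth = (int) $now->format('t');  // 28..31

    // Average daily spend so far, scaled to the whole month
    return round($spentSoFar / $dayOfMonth * $daysInMonth, 2);
}

// $3,000 spent by January 10th
echo projectMonthlyCost(3000.00, new \DateTimeImmutable('2025-01-10')), PHP_EOL;
```

Passing the clock in as a parameter rather than calling `time()` inside the function keeps the calculation deterministic and easy to test.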

HTML Dashboard

php

<?php
# filename: public/dashboard.php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

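// $metrics and $redis are assumed to be the metrics collector and Redis client
// built earlier in this chapter (for example in a shared bootstrap file).
$dashboardProvider = new App\Dashboard\DashboardProvider($metrics, $redis);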
$data = $dashboardProvider->getRealTimeDashboard();
?>
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Claude Monitoring Dashboard</title>
    <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
               background: #f5f5f5; padding: 20px; }
        .container { max-width: 1400px; margin: 0 auto; }
        h1 { margin-bottom: 30px; color: #333; }
        .grid { display: grid; grid-template-columns: repeat(auto-fit, minmax(300px, 1fr));
                gap: 20px; margin-bottom: 20px; }
        .card { background: white; padding: 20px; border-radius: 8px;
                box-shadow: 0 2px 4px rgba(0,0,0,0.1); }
        .card h2 { font-size: 18px; margin-bottom: 15px; color: #666; }
        .metric { display: flex; justify-content: space-between; align-items: center;
                  padding: 10px 0; border-bottom: 1px solid #eee; }
        .metric:last-child { border-bottom: none; }
        .metric-label { color: #666; }
        .metric-value { font-size: 24px; font-weight: bold; color: #333; }
        .alert { padding: 15px; margin-bottom: 15px; border-radius: 4px; }
        .alert-high { background: #fee; border-left: 4px solid #d00; }
        .alert-medium { background: #ffe; border-left: 4px solid #f90; }
        .chart-container { height: 300px; }
    </style>
</head>
<body>
    <div class="container">
        <h1>Claude Monitoring Dashboard</h1>

        <!-- Alerts -->
        <?php if (!empty($data['alerts'])): ?>
            <div class="card">
                <h2>Active Alerts</h2>
                <?php foreach ($data['alerts'] as $alert): ?>
                    <div class="alert alert-<?= htmlspecialchars($alert['severity']) ?>">
                        <strong><?= htmlspecialchars(ucfirst($alert['type'])) ?>:</strong>
                        <?= htmlspecialchars($alert['message']) ?>
                    </div>
                <?php endforeach; ?>
            </div>
        <?php endif; ?>

        <!-- Overview -->
        <div class="grid">
            <div class="card">
                <h2>Requests</h2>
                <div class="metric">
                    <span class="metric-label">Last Hour</span>
                    <span class="metric-value"><?= number_format($data['overview']['requests_last_hour']) ?></span>
                </div>
                <div class="metric">
                    <span class="metric-label">Last 24h</span>
                    <span class="metric-value"><?= number_format($data['overview']['requests_last_24h']) ?></span>
                </div>
                <div class="metric">
                    <span class="metric-label">Active Users</span>
                    <span class="metric-value"><?= number_format($data['overview']['active_users']) ?></span>
                </div>
            </div>

            <div class="card">
                <h2>Performance</h2>
                <div class="metric">
                    <span class="metric-label">Avg Response</span>
                    <span class="metric-value"><?= number_format($data['overview']['avg_response_time'], 2) ?>s</span>
                </div>
                <div class="metric">
                    <span class="metric-label">Error Rate</span>
                    <span class="metric-value"><?= number_format($data['overview']['error_rate'], 2) ?>%</span>
                </div>
            </div>

            <div class="card">
                <h2>Costs</h2>
                <div class="metric">
                    <span class="metric-label">Today</span>
                    <span class="metric-value">$<?= number_format($data['overview']['total_cost_today'], 2) ?></span>
                </div>
                <div class="metric">
                    <span class="metric-label">Monthly Budget</span>
                    <span class="metric-value">
                        $<?= number_format($data['costs']['this_month']['total'], 2) ?> /
                        $<?= number_format($data['costs']['budget']['monthly_limit'], 2) ?>
                    </span>
                </div>
            </div>
        </div>

        <!-- Latency Chart -->
        <div class="card">
            <h2>Latency Distribution</h2>
            <div class="chart-container">
                <canvas id="latencyChart"></canvas>
            </div>
        </div>
    </div>

    <script>
        <?= "
        // Latency chart
        new Chart(document.getElementById('latencyChart'), {
            type: 'bar',
            data: {
                labels: ['P50', 'P75', 'P95', 'P99'],
                datasets: [{
                    label: 'Latency (ms)',
                    data: [
                        {$data['performance']['latency']['p50']},
                        {$data['performance']['latency']['p75']},
                        {$data['performance']['latency']['p95']},
                        {$data['performance']['latency']['p99']}
                    ],
                    backgroundColor: ['#4CAF50', '#8BC34A', '#FFC107', '#FF5722']
                }]
            },
            options: {
                responsive: true,
                maintainAspectRatio: false,
                scales: {
                    y: { beginAtZero: true }
                }
            }
        });

        // Auto-refresh every 30 seconds
        setTimeout(() => location.reload(), 30000);
        " ?>
    </script>
</body>
</html>
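
The chart script above interpolates PHP values directly into JavaScript, which is fine for numbers but fragile for strings. A more defensive pattern, sketched here under the assumption that the values may one day include user-influenced text, is to pass data through `json_encode()` with the `JSON_HEX_*` flags. The helper name `toJsLiteral` is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Encode a PHP value as a JavaScript literal that is safe inside a <script> block.
 * The JSON_HEX_* flags escape <, >, &, ' and " so the payload cannot close the tag.
 */
function toJsLiteral(mixed $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// Sample latency values in milliseconds (made up for illustration)
$latency = ['p50' => 820.5, 'p75' => 1200.0, 'p95' => 2400.25, 'p99' => 4100.0];

// Emits one JavaScript statement that a Chart.js dataset could consume
echo 'const latency = ' . toJsLiteral(array_values($latency)) . ';', PHP_EOL;
```

`JSON_THROW_ON_ERROR` makes encoding failures raise an exception instead of silently emitting `false` into the page.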

Platform Integrations

Sentry Integration

php

<?php
# filename: src/Monitoring/SentryIntegration.php
declare(strict_types=1);

namespace App\Monitoring;

class SentryIntegration
{
    public function __construct(
        private readonly string $dsn
    ) {
        \Sentry\init([
            'dsn' => $this->dsn,
            'traces_sample_rate' => 0.1,  // 10% of transactions
            'environment' => getenv('APP_ENV') ?: 'production',
        ]);
    }

    /**
     * Capture Claude request as Sentry transaction
     */
    public function traceClaudeRequest(callable $request, array $context = []): mixed
    {
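        // startTransaction() expects a TransactionContext object, not an array
        $transactionContext = new \Sentry\Tracing\TransactionContext();
        $transactionContext->setName($context['operation'] ?? 'claude_api_call');
        $transactionContext->setOp('claude.request');

        $transaction = \Sentry\startTransaction($transactionContext);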

        \Sentry\SentrySdk::getCurrentHub()->setSpan($transaction);

        try {
            $result = $request();

            $transaction->setStatus(\Sentry\Tracing\SpanStatus::ok());
            $transaction->setData($context);

            return $result;

        } catch (\Throwable $e) {
            $transaction->setStatus(\Sentry\Tracing\SpanStatus::internalError());

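            // captureException() does not accept a tags/extra array;
            // attach them through a temporary scope instead
            \Sentry\withScope(function (\Sentry\State\Scope $scope) use ($e, $context): void {
                $scope->setTag('component', 'claude_api');
                $scope->setTag('model', (string) ($context['model'] ?? 'unknown'));
                $scope->setExtras($context);

                \Sentry\captureException($e);
            });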

            throw $e;

        } finally {
            $transaction->finish();
        }
    }
}

// Usage
use ClaudePhp\ClaudePhp;

$client = new ClaudePhp(
    apiKey: $_ENV['ANTHROPIC_API_KEY']
);

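// getenv() returns false when the variable is unset; fall back to an empty
// string so the string-typed constructor does not fail under strict_types
$sentry = new SentryIntegration(getenv('SENTRY_DSN') ?: '');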

$response = $sentry->traceClaudeRequest(
    fn() => $client->messages()->create([...]),
    context: [
        'operation' => 'chatbot_response',
        'model' => 'claude-sonnet-4-5-20250929',
        'user_id' => 'user-123',
    ]
);

Datadog Integration

php

<?php
# filename: src/Monitoring/DatadogIntegration.php
declare(strict_types=1);

namespace App\Monitoring;

use DataDog\DogStatsd;

class DatadogIntegration
{
    private DogStatsd $statsd;

    public function __construct(string $host = 'localhost', int $port = 8125)
    {
        $this->statsd = new DogStatsd([
            'host' => $host,
            'port' => $port,
            // Associative tags are serialized by the client as "key:value"
            'global_tags' => [
                'env' => getenv('APP_ENV') ?: 'production',
                'service' => 'claude-app',
            ],
        ]);
    }

    /**
     * Send Claude metrics to Datadog
     */
    public function trackClaudeRequest(
        string $model,
        int $inputTokens,
        int $outputTokens,
        float $duration,
        string $status = 'success'
    ): void {
        // Associative tags are serialized by the client as "key:value"
        $tags = [
            'model' => $model,
            'status' => $status,
        ];

        // Request count (sample rate 1.0 = send every data point)
        $this->statsd->increment('claude.requests', 1.0, $tags);

        // Duration. The third argument is the sample rate; tags come fourth.
        $this->statsd->timing('claude.duration', $duration * 1000, 1.0, $tags);  // Convert to ms

        // Tokens
        $this->statsd->histogram('claude.tokens.input', $inputTokens, 1.0, $tags);
        $this->statsd->histogram('claude.tokens.output', $outputTokens, 1.0, $tags);

        // Cost
        $cost = $this->calculateCost($model, $inputTokens, $outputTokens);
        $this->statsd->histogram('claude.cost', $cost, 1.0, $tags);
    }

    private function calculateCost(string $model, int $inputTokens, int $outputTokens): float
    {
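        // USD per million tokens. Illustrative rates: pricing differs between
        // model generations, so check the current price list before relying on them.
        $pricing = match(true) {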
            str_contains($model, 'opus') => ['input' => 15.00, 'output' => 75.00],
            str_contains($model, 'sonnet') => ['input' => 3.00, 'output' => 15.00],
            str_contains($model, 'haiku') => ['input' => 0.25, 'output' => 1.25],
            default => ['input' => 0, 'output' => 0],
        };

        return ($inputTokens / 1_000_000 * $pricing['input']) +
               ($outputTokens / 1_000_000 * $pricing['output']);
    }
}

// Usage
use ClaudePhp\ClaudePhp;

$client = new ClaudePhp(
    apiKey: $_ENV['ANTHROPIC_API_KEY']
);

$datadog = new DatadogIntegration();

$startTime = microtime(true);

try {
    $response = $client->messages()->create([...]);

    $duration = microtime(true) - $startTime;

    $datadog->trackClaudeRequest(
        $response->model,
        $response->usage->inputTokens,
        $response->usage->outputTokens,
        $duration,
        'success'
    );

} catch (\Exception $e) {
    $duration = microtime(true) - $startTime;

    $datadog->trackClaudeRequest(
        'unknown',
        0,
        0,
        $duration,
        'error'
    );

    throw $e;
}
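
To make the role of metric types and tags more concrete, the following sketch builds a DogStatsD datagram by hand in the `name:value|type|#tag:value,...` wire format. It is for illustration only: the function `dogstatsdDatagram` is hypothetical, it ignores sample rates, and in practice the client library constructs and sends these lines for you.

```php
<?php
declare(strict_types=1);

/**
 * Build a single DogStatsD datagram line (illustrative, not a client replacement).
 *
 * @param string               $type Metric type: "c" counter, "ms" timing, "h" histogram
 * @param array<string,string> $tags Tag name => tag value
 */
function dogstatsdDatagram(string $metric, int|float $value, string $type, array $tags = []): string
{
    $line = "{$metric}:{$value}|{$type}";

    if ($tags !== []) {
        $pairs = [];
        foreach ($tags as $name => $tagValue) {
            $pairs[] = "{$name}:{$tagValue}";
        }
        // Tags follow a "|#" marker as a comma-separated list
        $line .= '|#' . implode(',', $pairs);
    }

    return $line;
}

echo dogstatsdDatagram('claude.requests', 1, 'c', ['model' => 'sonnet', 'status' => 'success']), PHP_EOL;
echo dogstatsdDatagram('claude.duration', 1523.4, 'ms', ['model' => 'sonnet']), PHP_EOL;
```

Because every distinct tag combination becomes its own series, tags such as `model` and `status` work well, while unbounded values such as user IDs do not.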

Prometheus Integration

php

<?php
# filename: src/Monitoring/PrometheusIntegration.php
declare(strict_types=1);

namespace App\Monitoring;

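/**
 * Minimal in-memory exporter for illustration. Values live only for the
 * current PHP process, so under PHP-FPM each request starts from zero;
 * production setups need shared storage (e.g. Redis or APCu) behind this API.
 */
class PrometheusIntegration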
{
    private array $counters = [];
    private array $histograms = [];
    private string $namespace = 'claude_app';

    /**
     * Increment a counter metric
     */
    public function incrementCounter(
        string $name,
        array $labels = [],
        float $value = 1.0
    ): void {
        $key = $this->buildKey($name, $labels);
        $this->counters[$key] = ($this->counters[$key] ?? 0) + $value;
    }

    /**
     * Observe a histogram value
     */
    public function observeHistogram(
        string $name,
        float $value,
        array $labels = []
    ): void {
        $key = $this->buildKey($name, $labels);
        if (!isset($this->histograms[$key])) {
            $this->histograms[$key] = [];
        }
        $this->histograms[$key][] = $value;
    }

    /**
     * Track Claude request metrics
     */
    public function trackClaudeRequest(
        string $model,
        int $inputTokens,
        int $outputTokens,
        float $duration,
        string $status = 'success'
    ): void {
        $labels = [
            'model' => $this->simplifyModelName($model),
            'status' => $status,
        ];

        // Request counter
        $this->incrementCounter('claude_requests_total', $labels);

        // Duration histogram
        $this->observeHistogram('claude_request_duration_seconds', $duration, $labels);

        // Token histograms
        $this->observeHistogram('claude_tokens_input', $inputTokens, $labels);
        $this->observeHistogram('claude_tokens_output', $outputTokens, $labels);

        // Cost
        $cost = $this->calculateCost($model, $inputTokens, $outputTokens);
        $this->observeHistogram('claude_cost_usd', $cost, $labels);
    }

    /**
     * Export metrics in Prometheus format
     */
    public function exportMetrics(): string
    {
        $output = [];
        $declared = [];  // Metric names that already have a "# TYPE" line

        // Sort so every series of one metric is contiguous in the output
        ksort($this->counters);
        ksort($this->histograms);

        // Export counters
        foreach ($this->counters as $key => $value) {
            [$name, $labels] = $this->parseKey($key);
            $metric = "{$this->namespace}_{$name}";
            $labelString = $this->formatLabels($labels);

            // The exposition format allows only one TYPE line per metric name
            if (!isset($declared[$metric])) {
                $output[] = "# TYPE {$metric} counter";
                $declared[$metric] = true;
            }
            $output[] = "{$metric}{$labelString} {$value}";
        }

        // Export histograms
        foreach ($this->histograms as $key => $values) {
            [$name, $labels] = $this->parseKey($key);
            $metric = "{$this->namespace}_{$name}";
            $labelString = $this->formatLabels($labels);

            $count = count($values);
            $sum = array_sum($values);

            if (!isset($declared[$metric])) {
                $output[] = "# TYPE {$metric} histogram";
                $declared[$metric] = true;
            }

            // Cumulative buckets. These bounds suit durations in seconds;
            // token counts and costs need their own bounds in production.
            $buckets = [0.1, 0.5, 1.0, 2.5, 5.0, 10.0];
            foreach ($buckets as $bucket) {
                $bucketCount = count(array_filter($values, fn($v) => $v <= $bucket));
                $bucketLabels = $this->formatLabels(array_merge($labels, ['le' => (string)$bucket]));
                $output[] = "{$metric}_bucket{$bucketLabels} {$bucketCount}";
            }
            $infLabels = $this->formatLabels(array_merge($labels, ['le' => '+Inf']));
            $output[] = "{$metric}_bucket{$infLabels} {$count}";
            $output[] = "{$metric}_sum{$labelString} {$sum}";
            $output[] = "{$metric}_count{$labelString} {$count}";
        }

        // The exposition format requires the last line to end with a line feed
        return $output === [] ? '' : implode("\n", $output) . "\n";
    }

    private function buildKey(string $name, array $labels): string
    {
        ksort($labels);
        $labelString = json_encode($labels);
        return "{$name}:{$labelString}";
    }

    private function parseKey(string $key): array
    {
        [$name, $labelJson] = explode(':', $key, 2);
        $labels = json_decode($labelJson, true);
        return [$name, $labels];
    }

    private function formatLabels(array $labels): string
    {
        if (empty($labels)) {
            return '';
        }

        $parts = [];
        foreach ($labels as $key => $value) {
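            // Prometheus label values escape only backslash, double quote, and newline
            $escaped = strtr((string)$value, ['\\' => '\\\\', '"' => '\\"', "\n" => '\\n']);
            $parts[] = "{$key}=\"{$escaped}\"";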
        }

        return '{' . implode(',', $parts) . '}';
    }

    private function simplifyModelName(string $model): string
    {
        return match(true) {
            str_contains($model, 'opus') => 'opus',
            str_contains($model, 'sonnet') => 'sonnet',
            str_contains($model, 'haiku') => 'haiku',
            default => 'unknown'
        };
    }

    private function calculateCost(string $model, int $inputTokens, int $outputTokens): float
    {
        // USD per million tokens (illustrative; verify against current pricing)
        $pricing = match(true) {
            str_contains($model, 'opus') => ['input' => 15.00, 'output' => 75.00],
            str_contains($model, 'sonnet') => ['input' => 3.00, 'output' => 15.00],
            str_contains($model, 'haiku') => ['input' => 0.25, 'output' => 1.25],
            default => ['input' => 0, 'output' => 0],
        };

        return ($inputTokens / 1_000_000 * $pricing['input']) +
               ($outputTokens / 1_000_000 * $pricing['output']);
    }
}

// Usage
use ClaudePhp\ClaudePhp;

$client = new ClaudePhp(
    apiKey: $_ENV['ANTHROPIC_API_KEY']
);

$prometheus = new PrometheusIntegration();

$startTime = microtime(true);

try {
    $response = $client->messages()->create([...]);
    $duration = microtime(true) - $startTime;

    $prometheus->trackClaudeRequest(
        $response->model,
        $response->usage->inputTokens,
        $response->usage->outputTokens,
        $duration,
        'success'
    );

} catch (\Exception $e) {
    $duration = microtime(true) - $startTime;

    $prometheus->trackClaudeRequest(
        'unknown',
        0,
        0,
        $duration,
        'error'
    );

    throw $e;
}

// Export metrics endpoint (e.g., /metrics)
// echo $prometheus->exportMetrics();
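
Prometheus histogram buckets are cumulative: each `le` bucket counts every observation less than or equal to its bound, and `+Inf` always equals the total count. The standalone sketch below isolates that rule so it can be checked by hand. The function name `cumulativeBuckets` and the sample durations are made up for illustration, and the bounds are passed in because durations, token counts, and costs each need different ranges.

```php
<?php
declare(strict_types=1);

/**
 * Count observations into cumulative Prometheus-style buckets.
 *
 * @param float[] $values      Observed values (e.g. request durations in seconds)
 * @param float[] $upperBounds Bucket upper bounds, ascending
 * @return list<array{le: string, count: int}>
 */
function cumulativeBuckets(array $values, array $upperBounds): array
{
    $buckets = [];
    foreach ($upperBounds as $bound) {
        // Count everything at or below the bound, not just the slice since the last bound
        $count = count(array_filter($values, fn(float $v): bool => $v <= $bound));
        $buckets[] = ['le' => (string) $bound, 'count' => $count];
    }

    // The +Inf bucket always equals the total number of observations
    $buckets[] = ['le' => '+Inf', 'count' => count($values)];

    return $buckets;
}

$durations = [0.3, 0.8, 1.2, 4.0, 12.0];
foreach (cumulativeBuckets($durations, [0.5, 1.0, 5.0]) as $bucket) {
    echo "le={$bucket['le']} count={$bucket['count']}", PHP_EOL;
}
```

Here the 12-second request falls outside every finite bound and shows up only in `+Inf`, which is a hint that the bounds should extend further for this workload.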

ELK Stack Integration

php

<?php
# filename: src/Monitoring/ElkStackIntegration.php
declare(strict_types=1);

namespace App\Monitoring;

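// Namespace of elasticsearch-php 7.x, which returns plain arrays from search().
// Version 8.x uses Elastic\Elasticsearch\ClientBuilder and returns response
// objects, so call ->asArray() on results there.
use Elasticsearch\ClientBuilder;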

class ElkStackIntegration
{
    private $elasticsearchClient;

    public function __construct(string $host = 'localhost', int $port = 9200)
    {
        $this->elasticsearchClient = ClientBuilder::create()
            ->setHosts(["{$host}:{$port}"])
            ->build();
    }

    /**
     * Send log to Elasticsearch
     */
    public function logEvent(
        string $index,
        array $document,
        ?string $documentId = null
    ): void {
        try {
            $params = [
                'index' => $index,
                'body' => $document,
            ];

            if ($documentId) {
                $params['id'] = $documentId;
            }

            $this->elasticsearchClient->index($params);
        } catch (\Exception $e) {
            error_log("Failed to send log to ELK: " . $e->getMessage());
        }
    }

    /**
     * Log Claude request to ELK
     */
    public function logClaudeRequest(
        string $model,
        int $inputTokens,
        int $outputTokens,
        float $duration,
        string $status = 'success',
        ?string $userId = null,
        ?string $requestId = null
    ): void {
        $document = [
            'timestamp' => date('c'),
            'service' => 'claude-app',
            'event_type' => 'claude_request',
            'model' => $model,
            'tokens' => [
                'input' => $inputTokens,
                'output' => $outputTokens,
                'total' => $inputTokens + $outputTokens,
            ],
            'duration_ms' => round($duration * 1000, 2),
            'status' => $status,
            'cost' => $this->calculateCost($model, $inputTokens, $outputTokens),
        ];

        if ($userId) {
            $document['user_id'] = $userId;
        }

        $this->logEvent(
            "claude-requests-" . date('Y.m.d'),
            $document,
            $requestId
        );
    }

    /**
     * Query logs from Elasticsearch
     */
    public function queryLogs(
        string $index,
        array $query,
        int $limit = 50
    ): array {
        try {
            $params = [
                'index' => $index,
                'body' => [
                    'query' => $query,
                    'size' => $limit,
                    'sort' => ['timestamp' => ['order' => 'desc']],
                ],
            ];

            $results = $this->elasticsearchClient->search($params);

            return array_map(
                fn($hit) => array_merge(['id' => $hit['_id']], $hit['_source']),
                $results['hits']['hits']
            );
        } catch (\Exception $e) {
            error_log("Failed to query ELK: " . $e->getMessage());
            return [];
        }
    }

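    /**
     * Get error logs for today.
     *
     * Assumes application logs with a `level` field are shipped to
     * claude-logs-* by a separate pipeline (for example Monolog via Logstash);
     * logClaudeRequest() above writes only to claude-requests-*.
     */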
    public function getErrorLogsForToday(string $level = 'error'): array
    {
        $today = date('Y-m-d');

        return $this->queryLogs(
            "claude-logs-" . date('Y.m.d'),
            [
                'bool' => [
                    'must' => [
                        ['term' => ['level' => $level]],
                        ['range' => ['timestamp' => ['gte' => "{$today}T00:00:00Z"]]],
                    ],
                ],
            ]
        );
    }

    /**
     * Get aggregated metrics
     */
    public function getMetricsAggregation(
        string $startDate,
        string $endDate
    ): array {
        $params = [
            'index' => "claude-requests-*",
            'body' => [
                'query' => [
                    'range' => [
                        'timestamp' => [
                            'gte' => "{$startDate}T00:00:00Z",
                            'lte' => "{$endDate}T23:59:59Z",
                        ],
                    ],
                ],
                'aggs' => [
                    'avg_duration' => ['avg' => ['field' => 'duration_ms']],
                    'total_tokens' => ['sum' => ['field' => 'tokens.total']],
                    'total_cost' => ['sum' => ['field' => 'cost']],
                    'by_model' => [
                        'terms' => ['field' => 'model.keyword'],
                        // Each bucket's doc_count already gives the request count per
                        // model; aggregating on _id is discouraged and may be rejected.
                        'aggs' => [
                            'avg_duration' => ['avg' => ['field' => 'duration_ms']],
                        ],
                    ],
                    'by_status' => ['terms' => ['field' => 'status.keyword']],
                ],
            ],
        ];

        try {
            return $this->elasticsearchClient->search($params);
        } catch (\Exception $e) {
            error_log("Failed to aggregate metrics from ELK: " . $e->getMessage());
            return [];
        }
    }

    private function calculateCost(string $model, int $inputTokens, int $outputTokens): float
    {
        // USD per million tokens (illustrative; verify against current pricing)
        $pricing = match(true) {
            str_contains($model, 'opus') => ['input' => 15.00, 'output' => 75.00],
            str_contains($model, 'sonnet') => ['input' => 3.00, 'output' => 15.00],
            str_contains($model, 'haiku') => ['input' => 0.25, 'output' => 1.25],
            default => ['input' => 0, 'output' => 0],
        };

        return ($inputTokens / 1_000_000 * $pricing['input']) +
               ($outputTokens / 1_000_000 * $pricing['output']);
    }
}

// Usage
$elk = new ElkStackIntegration('localhost', 9200);

// Log a request
$elk->logClaudeRequest(
    'claude-sonnet-4-5-20250929',
    200,
    400,
    2.5,
    'success',
    'user-123',
    'req_abc123'
);

// Query error logs
$errors = $elk->getErrorLogsForToday('error');
echo "Found " . count($errors) . " errors today\n";

// Get metrics
$metrics = $elk->getMetricsAggregation('2025-01-01', '2025-01-15');

Why It Works: By default, Elasticsearch indexes the fields of each log document, so you can combine full-text search with aggregations over the same data. In the ELK Stack, Logstash ingests and transforms log events, Elasticsearch stores and queries them, and Kibana builds dashboards and alerts on top of those queries. Time-series databases are optimized for numeric metrics; Elasticsearch is the better fit when you need to search and correlate individual log events, such as finding every failed request for one user.
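
The aggregation response is deeply nested, so a small flattening step usually sits between `getMetricsAggregation()` and the dashboard. The sketch below assumes the response is available as a decoded array and reads the `by_model` terms buckets, where each bucket carries its `key`, a `doc_count`, and any sub-aggregation values. The function name `summarizeByModel` and the sample payload are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Flatten the "by_model" terms aggregation into one row per model.
 * $response is assumed to be the decoded array form of a search response.
 *
 * @return list<array{model: string, requests: int, avg_duration_ms: float}>
 */
function summarizeByModel(array $response): array
{
    $rows = [];
    foreach ($response['aggregations']['by_model']['buckets'] ?? [] as $bucket) {
        $rows[] = [
            'model' => (string) $bucket['key'],
            'requests' => (int) $bucket['doc_count'],  // Documents in this bucket
            'avg_duration_ms' => round((float) ($bucket['avg_duration']['value'] ?? 0.0), 2),
        ];
    }

    return $rows;
}

// Hand-written sample shaped like an aggregation response (not real data)
$sample = [
    'aggregations' => [
        'by_model' => [
            'buckets' => [
                ['key' => 'claude-sonnet-4-5-20250929', 'doc_count' => 120, 'avg_duration' => ['value' => 2150.456]],
                ['key' => 'claude-haiku', 'doc_count' => 40, 'avg_duration' => ['value' => 640.0]],
            ],
        ],
    ],
];

print_r(summarizeByModel($sample));
```

The `?? []` fallback means an empty or failed query (which `getMetricsAggregation()` reports as an empty array) yields an empty summary rather than a warning.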

Intelligent Alerting

php

<?php
# filename: src/Alerting/AlertManager.php
declare(strict_types=1);

namespace App\Alerting;

class AlertManager
{
    private array $alertRules = [];

    public function addRule(string $name, callable $condition, callable $action, int $cooldownSeconds = 300): void
    {
        $this->alertRules[$name] = [
            'condition' => $condition,
            'action' => $action,
            'cooldown' => $cooldownSeconds,
            'last_triggered' => 0,
        ];
    }

    public function checkAlerts(array $metrics): void
    {
        $now = time();

        foreach ($this->alertRules as $name => $rule) {
            // Check cooldown
            if ($now - $rule['last_triggered'] < $rule['cooldown']) {
                continue;
            }

            // Check condition
            if ($rule['condition']($metrics)) {
                // Trigger action
                $rule['action']($name, $metrics);

                // Update last triggered time
                $this->alertRules[$name]['last_triggered'] = $now;
            }
        }
    }
}

// Usage
$alertManager = new AlertManager();

// High error rate alert
$alertManager->addRule(
    name: 'high_error_rate',
    condition: fn($m) => $m['error_rate'] > 5.0,
    action: function($name, $metrics) use ($logger) {
        // Log alert
        $logger->getLogger()->warning('alert.triggered', [
            'alert_name' => $name,
            'error_rate' => $metrics['error_rate'],
            'threshold' => 5.0,
        ]);

        // Send to Slack (implement your own Slack integration)
        // $this->sendSlackAlert([...]);
    },
    cooldownSeconds: 600  // Don't spam - wait 10 minutes
);

// High cost alert
$alertManager->addRule(
    name: 'daily_cost_limit',
    condition: fn($m) => $m['daily_cost'] > 450,
    action: function($name, $metrics) {
        mail(
            'ops@example.com',
            'Claude API Cost Alert',
            "Daily cost has reached $" . $metrics['daily_cost'] . " (limit: $500)"
        );
    }
);

// Check alerts periodically
$dashboardData = $dashboardProvider->getRealTimeDashboard();
$alertManager->checkAlerts([
    'error_rate' => $dashboardData['overview']['error_rate'],
    'daily_cost' => $dashboardData['overview']['total_cost_today'],
]);

Exercises

Exercise 1: Custom Metrics Dashboard

Goal: Build a custom dashboard showing business-critical metrics for your Claude application.

Create a CustomDashboard class that implements:

Customer satisfaction scores (average, trend over time)
Response quality trends (relevance and coherence over last 7 days)
Cost per customer interaction (total cost / total interactions)
Most common user intents (top 10 intents by frequency)
Peak usage hours (requests per hour of day)

Validation: Test your implementation:

php

$dashboard = new CustomDashboard();
$metrics = $dashboard->getBusinessMetrics();

// Verify all required metrics are present
assert(isset($metrics['satisfaction']));
assert(isset($metrics['quality_trends']));
assert(isset($metrics['cost_per_interaction']));
assert(isset($metrics['top_intents']));
assert(isset($metrics['peak_hours']));

// Verify data types
assert(is_float($metrics['cost_per_interaction']));
assert(is_array($metrics['top_intents']));
assert(count($metrics['top_intents']) <= 10);

Expected output structure:

php

[
    'satisfaction' => ['average' => 4.2, 'trend' => 'increasing'],
    'quality_trends' => ['relevance' => [...], 'coherence' => [...]],
    'cost_per_interaction' => 0.15,
    'top_intents' => [['intent' => 'support', 'count' => 150], ...],
    'peak_hours' => [['hour' => 14, 'requests' => 250], ...]
]

Exercise 2: Anomaly Detection

Goal: Implement statistical anomaly detection to identify unusual patterns in metrics.

Create an AnomalyDetector class that detects anomalies using:

Z-score calculation for statistical outliers (threshold: |z| > 2.5)
Sudden spikes or drops (change > 50% from previous period)
Unusual patterns (values outside 3 standard deviations)
Return anomalies with severity levels: 'low', 'medium', 'high'

Validation: Test with sample data:

php

$detector = new AnomalyDetector();

// Test with normal data
$normalHistory = [10, 11, 9, 12, 10, 11, 10];
$anomalies = $detector->detectAnomalies('test_metric', $normalHistory);
assert(empty($anomalies)); // Should find no anomalies

// Test with outlier
$outlierHistory = [10, 11, 9, 12, 10, 11, 100]; // 100 is an outlier
$anomalies = $detector->detectAnomalies('test_metric', $outlierHistory);
assert(!empty($anomalies));
assert($anomalies[0]['severity'] === 'high');

Expected output format:

php

[
    [
        'metric' => 'test_metric',
        'value' => 100,
        'expected_range' => [8.5, 12.5],
        'z_score' => 8.2,
        'severity' => 'high',
        'timestamp' => 1234567890
    ]
]
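
One detail worth working through before you start: if the mean and standard deviation include the point being tested, a single outlier in a seven-value history can never score above roughly 2.45 (the square root of 6), so the 2.5 threshold would never fire. One possible starting point, sketched below, is to score the latest value against a baseline built only from the earlier values. The function name `latestZScore` is hypothetical, and using the population standard deviation is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Z-score of the latest value against a baseline built from the earlier values.
 * Returns null when the history is too short or the baseline has zero variance.
 *
 * @param array<int|float> $history Oldest first; the last element is the value under test
 */
function latestZScore(array $history): ?float
{
    if (count($history) < 3) {
        return null;
    }

    $latest = (float) array_pop($history);  // $history now holds only the baseline
    $mean = array_sum($history) / count($history);

    // Population variance of the baseline
    $squaredDiffs = array_map(fn($v) => ($v - $mean) ** 2, $history);
    $stdDev = sqrt(array_sum($squaredDiffs) / count($history));

    return $stdDev > 0.0 ? ($latest - $mean) / $stdDev : null;
}

var_dump(latestZScore([10, 11, 9, 12, 10, 11, 10]));   // Small magnitude: within normal range
var_dump(latestZScore([10, 11, 9, 12, 10, 11, 100]));  // Far above the 2.5 threshold
```

A full `AnomalyDetector` would still need the spike check, severity mapping, and a policy for the zero-variance case, which this sketch simply reports as `null`.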

Exercise 3: Performance Profiler

Goal: Create a detailed performance profiler to identify bottlenecks in Claude requests.

Implement a PerformanceProfiler that tracks:

Time spent in each component (database, cache, Claude API, processing)
Database query times (individual queries and total)
Claude API latency (request time, token generation time)
Caching effectiveness (hit rate, time saved)
Bottleneck identification (component taking > 30% of total time)

Validation: Profile a sample request:

php

$profiler = new PerformanceProfiler();
$requestId = 'req_123';

// Start profiling
$profiler->startRequest($requestId);

// Simulate components
$profiler->startComponent('database');
usleep(100000); // 100ms
$profiler->endComponent('database');

$profiler->startComponent('claude_api');
usleep(500000); // 500ms
$profiler->endComponent('claude_api');

$profiler->endRequest($requestId);

// Get profile
$profile = $profiler->getProfile($requestId);

assert($profile['total_duration'] > 0);
assert(isset($profile['components']['database']));
assert(isset($profile['components']['claude_api']));
assert(isset($profile['bottlenecks']));
assert($profile['components']['claude_api']['duration'] > 
       $profile['components']['database']['duration']);

Expected output (timings are approximate; cache values are illustrative):

php

[
    'request_id' => 'req_123',
    'total_duration' => 0.6,
    'components' => [
        'database' => ['duration' => 0.1, 'percentage' => 16.7],
        'claude_api' => ['duration' => 0.5, 'percentage' => 83.3],
    ],
    'bottlenecks' => ['claude_api'],
    'cache' => ['hit_rate' => 0.75, 'time_saved' => 0.2]
]
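
The percentage and bottleneck fields can be derived from the recorded durations alone. A minimal sketch, assuming a hypothetical helper `summarizeComponents` and using the sum of component durations as the total (a real profiler would more likely use the wall-clock time between `startRequest` and `endRequest`):

```php
<?php
declare(strict_types=1);

/**
 * Given per-component durations in seconds, compute each component's share
 * of the total and flag any component above the bottleneck threshold.
 */
function summarizeComponents(array $durations, float $bottleneckPercent = 30.0): array
{
    $total = array_sum($durations);
    $components = [];
    $bottlenecks = [];

    foreach ($durations as $name => $duration) {
        $percentage = $total > 0 ? round($duration / $total * 100, 1) : 0.0;
        $components[$name] = ['duration' => $duration, 'percentage' => $percentage];

        if ($percentage > $bottleneckPercent) {
            $bottlenecks[] = $name;
        }
    }

    return [
        'total_duration' => $total,
        'components' => $components,
        'bottlenecks' => $bottlenecks,
    ];
}

$summary = summarizeComponents(['database' => 0.1, 'claude_api' => 0.5]);
print_r($summary['bottlenecks']); // contains only 'claude_api'
```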

Troubleshooting

Metrics Not Appearing

Symptom: Metrics are being recorded but don't appear in dashboards or queries.

Possible Causes:

Redis connection issues
Metric names inconsistent between recording and querying
TTL too short for long-running queries
Clock skew across servers causing time-based queries to fail

Solutions:

php

// Verify Redis connection
$redis = new \Redis();
$redis->connect('127.0.0.1', 6379);
if (!$redis->ping()) {
    throw new \RuntimeException('Redis connection failed');
}

// Check metric exists
$key = 'metrics:claude.request.duration';
$exists = $redis->exists($key);
if (!$exists) {
    // Metric was never recorded or expired
}

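// Verify TTL settings (ttl() returns -2 if the key is missing, -1 if it has no expiry)
$ttl = $redis->ttl($key);
if ($ttl === -1) {
    // Key exists but has no expiration - may need cleanup
} elseif ($ttl >= 0 && $ttl < 3600) {
    // Key expires within the hour - longer query windows will miss data
}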
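
The name-mismatch cause is easiest to rule out by building keys in one place, so the recording and querying code cannot drift apart. A minimal sketch, assuming the `metrics:` prefix seen above; the `MetricKey` class and its tag-suffix format are hypothetical and not part of the chapter's collector:

```php
<?php
declare(strict_types=1);

final class MetricKey
{
    public const REQUEST_DURATION = 'claude.request.duration';

    /**
     * Build the Redis key for a metric. Tags are sorted so that the same
     * tag set always yields the same key regardless of argument order.
     */
    public static function build(string $metric, array $tags = []): string
    {
        $key = 'metrics:' . $metric;
        if ($tags === []) {
            return $key;
        }

        ksort($tags);
        $pairs = [];
        foreach ($tags as $name => $value) {
            $pairs[] = $name . '=' . $value;
        }

        return $key . ':' . implode(',', $pairs);
    }
}

echo MetricKey::build(MetricKey::REQUEST_DURATION), "\n";
// metrics:claude.request.duration
echo MetricKey::build(MetricKey::REQUEST_DURATION, ['model' => 'sonnet', 'env' => 'prod']), "\n";
// metrics:claude.request.duration:env=prod,model=sonnet
```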

High Cardinality Issues

Symptom: Redis memory usage growing rapidly, queries slow down, or Redis crashes.

Possible Causes:

Using unbounded tag values (like user IDs) creating unique metric keys
Too many unique tag combinations
Not aggregating metrics before storage

Solutions:

php

// ❌ BAD - Creates unique key per user
$metrics->record('claude.requests', 1, ['user' => $userId]);

// ✅ GOOD - Aggregate by user type or remove user tag
$metrics->record('claude.requests', 1, ['user_type' => 'premium']);
// Or aggregate separately for top users only
if ($isTopUser) {
    $metrics->record('claude.requests', 1, ['user' => $userId]);
}

// Use sampling for high-volume metrics
if (mt_rand(1, 100) <= 10) { // Sample 10% of requests
    // Record a weight of 10 so the total still estimates the true request count
    $metrics->record('claude.requests', 10, $tags);
}
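
One way to keep cardinality bounded is to normalize tags before they reach the collector. The sketch below rests on assumptions: `boundTags`, the allowlist contents, and the `other` fallback value are hypothetical and not part of the chapter's `$metrics` API.

```php
<?php
declare(strict_types=1);

/**
 * Keep only allowlisted tags, and collapse any value outside the permitted
 * set into "other" so the number of distinct keys stays fixed.
 *
 * @param array<string, string>   $tags
 * @param array<string, string[]> $allowed tag name => permitted values
 */
function boundTags(array $tags, array $allowed): array
{
    $bounded = [];
    foreach ($allowed as $name => $values) {
        if (!array_key_exists($name, $tags)) {
            continue;
        }
        $bounded[$name] = in_array($tags[$name], $values, true)
            ? $tags[$name]
            : 'other';
    }

    return $bounded;
}

// Example allowlist; the values are placeholders
$allowed = [
    'user_type' => ['free', 'premium'],
    'model' => ['haiku', 'sonnet', 'opus'],
];

print_r(boundTags(
    ['user_type' => 'premium', 'model' => 'experimental', 'user' => 'u_18342'],
    $allowed
));
// user_type => premium, model => other (the unbounded "user" tag is dropped)
```

With this allowlist the tag space is capped at 2 × 3 known combinations plus the `other` buckets, no matter how many users or model names appear in traffic.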

Dashboard Performance Slow

Symptom: Dashboard takes several seconds to load or times out.

Possible Causes:

Querying raw metrics instead of aggregates
No caching for dashboard queries
Complex calculations on large datasets

Solutions:

php

// ❌ BAD - Querying all raw metrics
$allMetrics = $redis->zRange('metrics:claude.request.duration', 0, -1);
$stats = calculateStats($allMetrics); // Slow!

// ✅ GOOD - Use pre-aggregated data
$hourlyStats = $redis->hGetAll('metrics:claude.request.duration:hour:2025-01-15-14');
// Already aggregated, fast!

// Add caching layer
$cacheKey = 'dashboard:overview:' . date('Y-m-d-H');
$cached = $redis->get($cacheKey);
if ($cached !== false) {
    $data = json_decode($cached, true); // Cache hit: decode the stored JSON
} else {
    $data = $dashboardProvider->getRealTimeDashboard();
    $redis->setex($cacheKey, 60, json_encode($data)); // Cache for 1 minute
}
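
Pre-aggregation means updating a small per-hour summary at write time so the dashboard never scans raw points. The sketch below models the hourly hash as a plain PHP array to stay self-contained; in Redis, the count and sum fields could be maintained with `HINCRBY` and `HINCRBYFLOAT`. The function names and the `count`/`sum`/`max` field names are assumptions; the key format follows the example key above.

```php
<?php
declare(strict_types=1);

/** Key of the hourly bucket that a timestamp falls into (UTC). */
function hourlyBucketKey(string $metric, int $timestamp): string
{
    return sprintf('metrics:%s:hour:%s', $metric, gmdate('Y-m-d-H', $timestamp));
}

/** Fold one observation into the in-memory store of hourly summaries. */
function recordAggregated(array &$store, string $metric, float $value, int $timestamp): void
{
    $key = hourlyBucketKey($metric, $timestamp);
    $bucket = $store[$key] ?? ['count' => 0, 'sum' => 0.0, 'max' => $value];

    $bucket['count']++;
    $bucket['sum'] += $value;
    $bucket['max'] = max($bucket['max'], $value);

    $store[$key] = $bucket;
}

$store = [];
$t = gmmktime(14, 5, 0, 1, 15, 2025); // 2025-01-15 14:05:00 UTC
recordAggregated($store, 'claude.request.duration', 1.2, $t);
recordAggregated($store, 'claude.request.duration', 0.8, $t + 600);

$bucket = $store['metrics:claude.request.duration:hour:2025-01-15-14'];
printf("avg=%.2f max=%.2f n=%d\n", $bucket['sum'] / $bucket['count'], $bucket['max'], $bucket['count']);
// avg=1.00 max=1.20 n=2
```

Storing count and sum (rather than a precomputed average) keeps buckets mergeable: daily figures can be built by adding hourly ones.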

Logs Not Being Written

Symptom: Application runs but no log files are created.

Possible Causes:

File permissions issues
Disk space full
Log directory doesn't exist
Handler configuration incorrect

Solutions:

bash

# Check directory exists and is writable
mkdir -p /var/log/app
chmod 755 /var/log/app
chown www-data:www-data /var/log/app

# Verify disk space
df -h /var/log

# Test logging manually
php -r "
require 'vendor/autoload.php';
\$logger = new Monolog\Logger('test');
\$logger->pushHandler(new Monolog\Handler\StreamHandler('/var/log/app/test.log'));
\$logger->info('Test message');
"

Distributed Tracing Not Working

Symptom: Traces are created but spans are missing or incomplete.

Possible Causes:

Spans not being finished properly
Parent span ID tracking incorrect
Trace context not propagated across services

Solutions:

php

// Use try/finally so the span is finished on every code path
$spanId = $tracer->startSpan('operation');
$result = ['status' => 'success'];
try {
    // ... operation ...
} catch (\Throwable $e) {
    $result = ['status' => 'error', 'error' => true];
    throw $e;
} finally {
    $tracer->finishSpan($spanId, $result);
}

// Propagate trace context across services
$traceContext = [
    'trace_id' => $tracer->getTraceId(),
    'span_id' => $currentSpanId,
];
// Include in HTTP headers or message queue metadata
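
For HTTP hops, one widely used carrier for this context is the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags` with a 32-hex-character trace ID and a 16-hex-character span ID. The sketch below builds and parses that header. The helper names are hypothetical, it validates only the basic shape, and it assumes your tracer's IDs are (or can be mapped to) hex strings of those lengths.

```php
<?php
declare(strict_types=1);

/** Build a W3C traceparent header value (version 00, sampled flag set). */
function buildTraceparent(string $traceId, string $spanId): string
{
    return sprintf('00-%s-%s-01', strtolower($traceId), strtolower($spanId));
}

/** Parse a traceparent value; returns null if it is malformed. */
function parseTraceparent(string $header): ?array
{
    $pattern = '/^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/';
    if (preg_match($pattern, trim($header), $m) !== 1) {
        return null;
    }

    return [
        'trace_id' => $m[2],
        'parent_span_id' => $m[3],
        'sampled' => (hexdec($m[4]) & 1) === 1,
    ];
}

// Caller side: generate IDs and attach the header to the outgoing request
$traceId = bin2hex(random_bytes(16)); // 32 hex chars
$spanId = bin2hex(random_bytes(8));   // 16 hex chars
$headers = ['traceparent: ' . buildTraceparent($traceId, $spanId)];

// Receiving side: continue the same trace under the caller's span
$context = parseTraceparent(buildTraceparent($traceId, $spanId));
var_dump($context['trace_id'] === $traceId); // bool(true)
```

The receiving service starts its own spans with `trace_id` unchanged and `parent_span_id` as the parent, which is what lets the two halves of the trace be stitched together.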

Alert Fatigue

Symptom: Too many alerts being triggered, causing important ones to be ignored.

Possible Causes:

Cooldown periods too short
Thresholds too sensitive
Alerts not properly categorized by severity

Solutions:

php

// Increase cooldown for non-critical alerts
$alertManager->addRule(
    name: 'minor_latency_increase',
    condition: fn($m) => $m['p95_latency'] > 3.0,
    action: $logAction,
    cooldownSeconds: 3600 // 1 hour cooldown
);

// Use different thresholds for different times
$hour = (int)date('H');
$threshold = ($hour >= 9 && $hour <= 17) ? 5.0 : 10.0; // Higher threshold off-hours

// Group related alerts
if ($errorRate > 5.0 && $latency > 5.0) {
    // Single alert for correlated issues
    triggerAlert('system_degradation', ['error_rate' => $errorRate, 'latency' => $latency]);
}

Monitoring Overhead

Symptom: Application performance degrades when monitoring is enabled; high CPU/memory usage from monitoring systems.

Possible Causes:

Sampling rate too high (logging every request)
Aggregations computed too frequently
Network latency sending metrics to remote systems
Unoptimized metric cardinality

Solutions:

php

// ❌ BAD - Record every request (100%)
foreach ($requests as $request) {
    $metrics->histogram('request.duration', $request->duration);
}

// ✅ GOOD - Sample a fixed fraction of requests
$sampleRate = 0.1; // Sample 10% of requests
foreach ($requests as $request) {
    if (mt_rand() / mt_getrandmax() < $sampleRate) {
        $metrics->histogram('request.duration', $request->duration);
        // Scale the counter by the inverse of the sample rate so totals stay accurate
        $metrics->increment('requests.total', (int) round(1 / $sampleRate));
    }
}

// ✅ GOOD - Use a lower rate on high-traffic paths
if (mt_rand(0, 99) < 5) { // Keeps roughly 5% of requests, chosen at random
    $metrics->record('request.duration', $duration);
}

// ✅ GOOD - Batch metrics before sending
$batch = [];
foreach ($requests as $request) {
    $batch[] = [
        'metric' => 'claude.request.duration',
        'value' => $request->duration,
        'tags' => ['model' => $request->model],
    ];
}

// Send in batch (reduces network overhead)
$metricsCollector->recordBatch($batch);

// ✅ GOOD - Disable expensive metrics in production
if (getenv('APP_ENV') === 'production') {
    // Skip detailed query timing
    $profiler->disable('database.query_timing');
} else {
    // Enable all metrics for local development
    $profiler->enable('database.query_timing');
}

Performance Impact Guide (rough figures; measure in your own environment):

Activity                      CPU Overhead    Memory    Network
JSON logging                  ~1-2%           +5MB      Low
Redis metrics                 ~2-3%           +10MB     Medium
Distributed tracing           ~3-5%           +20MB     High
All combined (no sampling)    ~8-12%          +50MB     High
All combined, 10% sampling    ~1-2%           +10MB     Low
All combined, adaptive        ~0.5-1%         +5MB      Very Low
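
Adaptive sampling lowers the sample rate as traffic rises, rather than using one constant rate, so the number of recorded samples stays roughly constant. A minimal sketch under assumed names (`adaptiveSampleRate`, `shouldSample`, a target of 50 samples per second); how you measure the current requests per second is left to your collector:

```php
<?php
declare(strict_types=1);

/**
 * Choose a sample rate that yields about $targetPerSecond recorded samples,
 * never exceeding 1.0 (record everything) or dropping below $minRate.
 */
function adaptiveSampleRate(
    float $requestsPerSecond,
    float $targetPerSecond = 50.0,
    float $minRate = 0.001
): float {
    if ($requestsPerSecond <= $targetPerSecond) {
        return 1.0; // low traffic: keep every request
    }

    return max($minRate, $targetPerSecond / $requestsPerSecond);
}

/** Decide whether to record this request, given the current rate. */
function shouldSample(float $rate): bool
{
    if ($rate >= 1.0) {
        return true;
    }

    return mt_rand() / mt_getrandmax() < $rate;
}

foreach ([10.0, 500.0, 1_000_000.0] as $rps) {
    $rate = adaptiveSampleRate($rps);
    // Each recorded sample stands in for 1 / $rate requests
    printf("%.0f req/s -> rate %.4f, weight %.0f\n", $rps, $rate, 1 / $rate);
}
```

The floor (`$minRate`) keeps rare events visible during traffic spikes; without it the rate could shrink until errors on low-volume paths are never sampled.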

Monitoring Checklist:

[ ] Sampling enabled for high-volume systems
[ ] Batch metrics before sending to remote systems
[ ] Async logging (non-blocking)
[ ] Metrics aggregated at collection, not query time
[ ] High-cardinality metrics sampled or disabled
[ ] Monitoring overhead < 5% CPU in production
[ ] Alert thresholds configured to reduce noise
[ ] Old data archived/deleted to manage storage

Data Retention and Archival

Production monitoring systems generate enormous amounts of data. Without a retention strategy, storage costs grow unbounded and queries become slower over time.

Retention Strategy

php

<?php
# filename: src/Monitoring/RetentionPolicy.php
declare(strict_types=1);

namespace App\Monitoring;

class RetentionPolicy
{
    /**
     * Define retention tiers
     */
    public function getRetentionTiers(): array
    {
        return [
            'raw_metrics' => [
                'duration' => 7 * 24 * 3600,  // 7 days
                'resolution' => '1 second',
                'storage' => 'Redis',
                'cost_factor' => 1.0,
            ],
            'hourly_aggregates' => [
                'duration' => 90 * 24 * 3600, // 90 days
                'resolution' => '1 hour',
                'storage' => 'TimescaleDB',
                'cost_factor' => 0.1,
            ],
            'daily_aggregates' => [
                'duration' => 2 * 365 * 24 * 3600, // 2 years
                'resolution' => '1 day',
                'storage' => 'Parquet (S3)',
                'cost_factor' => 0.01,
            ],
            'archived_logs' => [
                'duration' => 7 * 365 * 24 * 3600, // 7 years (compliance)
                'resolution' => 'raw',
                'storage' => 'Glacier',
                'cost_factor' => 0.001,
            ],
        ];
    }

    /**
     * Aggregate raw data points into fixed-size time buckets before archival
     */
    public function aggregateForArchival(
        array $rawMetrics,
        int $periodSeconds = 3600
    ): array {
        $aggregated = [];

        foreach ($rawMetrics as $metric) {
            // Round the timestamp down to the start of its bucket
            $key = intdiv((int) $metric['timestamp'], $periodSeconds) * $periodSeconds;
            if (!isset($aggregated[$key])) {
                $aggregated[$key] = [
                    'timestamp' => $key,
                    'count' => 0,
                    'sum' => 0,
                    'min' => PHP_FLOAT_MAX,
                    'max' => -PHP_FLOAT_MAX, // PHP_FLOAT_MIN is the smallest positive float, not the most negative
                    'p50' => 0,
                    'p95' => 0,
                    'p99' => 0,
                    'values' => [],
                ];
            }

            $aggregated[$key]['values'][] = $metric['value'];
            $aggregated[$key]['count']++;
            $aggregated[$key]['sum'] += $metric['value'];
            $aggregated[$key]['min'] = min($aggregated[$key]['min'], $metric['value']);
            $aggregated[$key]['max'] = max($aggregated[$key]['max'], $metric['value']);
        }

        // Calculate percentiles from the sorted values in each bucket
        foreach ($aggregated as &$agg) {
            sort($agg['values']);
            $count = count($agg['values']);
            $agg['p50'] = $agg['values'][(int)($count * 0.50)];
            $agg['p95'] = $agg['values'][(int)($count * 0.95)];
            $agg['p99'] = $agg['values'][(int)($count * 0.99)];
            unset($agg['values']);
        }
        unset($agg); // Break the reference left behind by the by-reference loop

        return $aggregated;
    }

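    /**
     * Compress data for long-term storage (requires the zstd PHP extension)
     */
    public function compressForArchival(array $data): string
    {
        $json = json_encode($data, JSON_THROW_ON_ERROR);
        // Level 3 is zstd's default; the achieved ratio depends on the data
        $compressed = zstd_compress($json, 3);
        if ($compressed === false) {
            throw new \RuntimeException('zstd compression failed');
        }

        return $compressed;
    }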

    /**
     * Estimate the steady-state monthly storage cost for the retention policy.
     * cost_factor is each tier's price relative to the base rate.
     */
    public function calculateStorageCost(
        float $dailyGbGenerated,
        float $costPerGbMonth = 25.0
    ): array {
        $tiers = $this->getRetentionTiers();
        $costs = [];
        $totalCost = 0;

        foreach ($tiers as $tier => $policy) {
            $durationDays = $policy['duration'] / 86400;
            // Data held in this tier once its retention window is full
            $totalGb = $durationDays * $dailyGbGenerated;
            // GB held x price per GB-month x tier price factor
            $cost = $totalGb * $costPerGbMonth * $policy['cost_factor'];

            $costs[$tier] = [
                'total_gb' => $totalGb,
                'duration_days' => $durationDays,
                'monthly_cost' => $cost,
                'storage' => $policy['storage'],
            ];

            $totalCost += $cost;
        }

        $costs['total_monthly'] = $totalCost;

        return $costs;
    }
}

// Usage
$retention = new RetentionPolicy();

// Example: Calculate cost for system generating 1GB/day
$costs = $retention->calculateStorageCost(
    dailyGbGenerated: 1.0,
    costPerGbMonth: 25
);

echo "Monthly storage costs:\n";
foreach ($costs as $tier => $data) {
    if (is_string($tier) && $tier !== 'total_monthly') {
        echo sprintf(
            "  %s: $%.2f (%d GB, %d days)\n",
            $tier,
            $data['monthly_cost'],
            (int)$data['total_gb'],
            (int)$data['duration_days']
        );
    }
}

echo sprintf("\nTotal monthly: $%.2f\n", $costs['total_monthly']);

Downsampling Strategies

php

<?php
// Downsampling trades some detail for a smaller storage footprint

// Strategy 1: Uniform sampling (keep every Nth point)
function uniformDownsample(array $metrics, int $sampleRate = 10): array
{
    // Assumes a list with sequential integer keys; re-index the result
    return array_values(array_filter(
        $metrics,
        fn($i) => $i % $sampleRate === 0,
        ARRAY_FILTER_USE_KEY
    ));
}

// Strategy 2: Extrema preservation (keep min/max)
function extremaDownsample(array $metrics, int $bucketSize = 60): array
{
    $downsampled = [];

    for ($i = 0; $i < count($metrics); $i += $bucketSize) {
        $bucket = array_slice($metrics, $i, $bucketSize);
        $values = array_column($bucket, 'value');

        $downsampled[] = [
            'timestamp' => $bucket[0]['timestamp'],
            'min' => min($values),
            'max' => max($values),
            'avg' => array_sum($values) / count($values),
        ];
    }

    return $downsampled;
}

// Strategy 3: Anomaly-preserving (keep only points far from the mean)
// Returns just the outliers; combine with strategy 1 or 2 to retain a baseline
function anomalyDownsample(array $metrics, float $deviation = 2.0): array
{
    if ($metrics === []) {
        return []; // Avoid division by zero on empty input
    }

    $mean = array_sum(array_column($metrics, 'value')) / count($metrics);
    $stdDev = sqrt(
        array_reduce(
            $metrics,
            fn($sum, $m) => $sum + pow($m['value'] - $mean, 2),
            0
        ) / count($metrics)
    );

    return array_filter(
        $metrics,
        fn($m) => abs($m['value'] - $mean) > $deviation * $stdDev
    );
}

Why This Matters:

A production system generating just 1GB/day of metrics accumulates 365GB in a year. At an illustrative $25/GB/month, holding that much raw data costs about $9,125 per month. Tiered storage with downsampling avoids this:

Keep raw metrics for 7 days (quick debugging)
Store hourly aggregates for 90 days (trend analysis)
Archive daily summaries for 2 years (compliance + business analysis)
Archive audit logs for 7 years (compliance requirements)

Because each tier holds less data at a lower price, the total bill falls well below the cost of keeping everything raw, while still meeting operational and compliance requirements.

Wrap-up

Congratulations! You've built a comprehensive monitoring and observability system for your Claude applications. Here's what you've accomplished:

✓ Structured Logging: Implemented JSON-formatted logging with Monolog, including context enrichment and automatic request/response tracking
✓ Metrics Collection: Created a Redis-based metrics system tracking latency, tokens, costs, errors, and quality metrics
✓ Distributed Tracing: Built a tracing system to understand request flow across multiple services and identify bottlenecks
✓ Real-Time Dashboards: Developed actionable dashboards showing performance, costs, quality, and error metrics
✓ Intelligent Alerting: Configured alert rules with cooldowns to detect issues proactively without alert fatigue
✓ Platform Integration: Integrated with Sentry, Datadog, and Prometheus for enterprise-grade monitoring
✓ Cost Monitoring: Implemented real-time cost tracking to prevent budget overruns
✓ Quality Metrics: Added monitoring for AI-specific metrics like output quality and user satisfaction

Key Concepts Learned:

Structured logging with JSON formatting enables searchable, analyzable log data
Time-series metrics storage in Redis allows for efficient aggregation and querying
ELK Stack provides deep log analysis with full-text search and complex aggregations
Metrics categorization (Tier 1/2/3) reduces noise and focuses monitoring efforts
Distributed tracing provides visibility into complex request flows across services
Real-time dashboards transform raw metrics into actionable business insights
Intelligent alerting balances proactive issue detection with manageable notification volume
Platform integrations extend your monitoring capabilities with enterprise features
Monitoring overhead can be controlled through sampling, batching, and selective metrics
Tiered storage and downsampling cut storage costs substantially while maintaining compliance

Next Steps:

Monitoring is essential for production AI applications. With this foundation, you can:

Scale your monitoring as your application grows
Add custom metrics specific to your use case
Integrate with additional monitoring platforms
Build automated responses to common alerts

In the next chapter, you'll learn about scaling Claude applications horizontally to handle increased load while maintaining performance and reliability.

💻 Code Samples

All code examples from this chapter are available in the GitHub repository:

View Chapter 37 Code Samples

Clone and run locally:

bash

git clone https://github.com/dalehurley/codewithphp.git
cd codewithphp/code/claude-php/chapter-37
composer install
php examples/monitoring-demo.php
