
Chapter 24: Deploying and Scaling AI-Powered PHP Services
Overview
You’ve built AI-powered features, integrated ML models into your PHP applications, and created intelligent web services. Now comes the crucial step that separates proof-of-concept from production: deploying your AI application to handle real users, real traffic, and real-world challenges. This chapter transforms your development environment into a robust, scalable, production-ready system.
Deploying AI-powered applications presents unique challenges compared to traditional web apps. A standard PHP website might respond to requests in 50-200 milliseconds, but an ML model making predictions could take anywhere from 100ms to several seconds. Your web server shouldn’t freeze while waiting for predictions. Database queries can be optimized with indexes and caching, but ML inference often requires significant CPU or GPU resources—resources that need careful management in production. A crashed web server affects users immediately, but a failed ML worker might silently stop processing predictions while your web interface continues accepting requests. These are the real-world issues this chapter addresses.
This chapter teaches production deployment strategies used by companies running ML at scale. You’ll containerize your application with Docker to ensure consistency across environments, implement async job queues so ML inference doesn’t block web requests, deploy to cloud infrastructure with load balancing for high availability, set up comprehensive monitoring to track both application and ML-specific metrics, and create CI/CD pipelines for automated testing and deployment including model updates. These aren’t theoretical concepts—they’re battle-tested patterns handling millions of requests daily.
By the end of this chapter, you’ll have a production-ready AI-powered PHP service running in the cloud, processing predictions asynchronously through worker processes, handling traffic spikes with load balancing, and monitored for both traditional web metrics and ML-specific concerns like model accuracy and inference latency. You’ll understand the trade-offs between different deployment strategies, know how to optimize costs while maintaining performance, and have the confidence to deploy and maintain AI services in production environments.
Prerequisites
Before starting this chapter, you should have:
- Completed Chapter 23 or have experience integrating ML models into PHP web applications
- PHP 8.4+ installed and working knowledge of PHP web development
- Docker installed locally for containerization testing
- Basic understanding of command-line operations and shell scripts
- Familiarity with Git and GitHub for CI/CD integration
- Access to a cloud provider account (DigitalOcean, AWS, or similar) with free tier available
- Redis understanding (covered in Chapter 23 or equivalent experience)
- Composer for PHP dependency management
- Basic understanding of HTTP, APIs, and web server concepts
- Text editor or IDE with Docker and YAML support
Estimated Time: ~90-120 minutes (including cloud deployment and testing)
Verify your setup:
```bash
# Check PHP version
php --version

# Verify Docker installation
docker --version
docker compose version

# Check Git
git --version

# Verify Composer
composer --version

# Test Docker is running
docker run hello-world
```

Expected output confirms Docker 20+, PHP 8.4+, Composer 2.x, and successful Docker test.
::: warning PHP 8.4 Compatibility PHP 8.4 was released in November 2024. If your hosting provider or CI/CD environment doesn’t support it yet, you can modify the Dockerfile to use PHP 8.3:
```diff
# Change this line in Dockerfile:
- FROM php:8.4-cli-alpine AS builder
+ FROM php:8.3-cli-alpine AS builder
```

All code examples in this chapter are compatible with PHP 8.3+. The main benefits of 8.4 (property hooks, asymmetric visibility) aren’t critical for the deployment patterns shown here. :::
::: tip Cloud Provider Choice This chapter uses DigitalOcean for examples ($5 credit for new users), but the Docker-based approach works on any cloud provider (AWS EC2, Google Cloud, Azure, Linode). The concepts are platform-agnostic. :::
What You’ll Build
By the end of this chapter, you will have created:
- A multi-stage Dockerfile building optimized PHP containers with ML dependencies for both development and production
- A Docker Compose configuration orchestrating app containers, Redis, worker processes, and Nginx reverse proxy
- A Redis-based job queue system for offloading ML inference to background workers without blocking web requests
- A PredictionJob class encapsulating ML prediction requests with payload validation, retry logic, and error handling
- An ML worker daemon continuously processing prediction jobs from the queue with graceful shutdown and resource limits
- A result caching layer storing prediction results in Redis to avoid redundant ML computations for identical inputs
- A REST API endpoint accepting prediction requests, queueing them, and returning results with appropriate status codes
- A health check endpoint monitoring system status, queue depth, worker availability, and model health for load balancers
- A metrics collection system tracking requests per second, average latency, queue depth, cache hit rates, and error rates
- An Nginx load balancer configuration distributing traffic across multiple application containers with health checks
- A cloud deployment setup running your AI service on DigitalOcean/AWS with proper environment variable management
- A CI/CD pipeline using GitHub Actions to automatically test, build Docker images, and deploy on git push
- A monitoring dashboard displaying real-time metrics, system health, and ML-specific performance indicators
- Production-ready logging infrastructure with structured logs, log levels, and error alerting capabilities
All code follows PHP 8.4 standards with strict typing, readonly properties, comprehensive error handling, and security best practices for production deployments.
::: info Code Examples Complete, runnable examples for this chapter are available in:
Configuration Files:
- `Dockerfile` — Multi-stage PHP container with ML dependencies
- `docker-compose.yml` — Development stack configuration
- `docker-compose.prod.yml` — Production overrides
- `.dockerignore` — Files to exclude from Docker context
- `env.example` — Environment variables template
Application Code:
- `01-simple-docker-test.php` — Verify Docker setup works
- `02-job-queue-system.php` — Job and Queue implementation
- `03-ml-worker.php` — Worker daemon process
- `04-api-endpoint.php` — REST API for predictions
- `05-caching-layer.php` — Redis caching
- `06-health-check.php` — Health monitoring endpoint
- `07-metrics-collector.php` — Performance metrics
Server Configuration:
- `nginx/default.conf` — Nginx reverse proxy
- `nginx/load-balancer.conf` — Load balancer config
- `supervisor/worker.conf` — Supervisor for workers
Deployment:
- `.github/workflows/deploy.yml` — CI/CD pipeline
- `scripts/deploy.sh` — Deployment script
- `scripts/scale-workers.sh` — Worker scaling
Monitoring:
- `monitoring/dashboard.php` — Metrics dashboard
- `monitoring/logger.php` — Centralized logging
Exercise Solutions:
- `solutions/exercise1-autoscale.php` — Auto-scaling implementation
- `solutions/exercise2-blue-green.sh` — Blue-green deployment
- `solutions/exercise3-health-check.php` — Advanced health checks
- `solutions/exercise4-optimized.Dockerfile` — Optimized Docker image
All files are in `docs/series/ai-ml-php-developers/code/chapter-24/`
:::
Quick Start
Want to see a containerized AI service in action right now? Here’s a 5-minute example using Docker Compose:
```yaml
# filename: quick-start-compose.yml
version: "3.8"

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

  app:
    # Note: the stock php image doesn't ship the redis extension;
    # for a real run, use the chapter's Dockerfile image instead.
    image: php:8.4-cli-alpine
    volumes:
      - ./:/app
    working_dir: /app
    environment:
      REDIS_HOST: redis
      REDIS_PORT: 6379
    depends_on:
      redis:
        condition: service_healthy
    command: php quick-ml-service.php
```

```php
<?php
// filename: quick-ml-service.php

declare(strict_types=1);

// Quick ML service demo with proper error handling
try {
    $redis = new Redis();
    $redis->connect(getenv('REDIS_HOST') ?: 'redis', 6379);

    // Verify Redis connectivity
    if (!$redis->ping()) {
        throw new RuntimeException('Redis ping failed - connection issue');
    }

    echo "🚀 AI Service Running!\n";
    echo "📊 Queue depth: " . $redis->lLen('ml:jobs') . "\n";
    echo "⏳ Ready to process predictions...\n\n";

    // Process jobs continuously
    while (true) {
        $job = $redis->brPop(['ml:jobs'], 5); // 5-second timeout

        if ($job) {
            $jobId = $job[1];
            echo "[" . date('Y-m-d H:i:s') . "] Processing: {$jobId}\n";

            // Simulate ML inference (in production, call actual model)
            sleep(1);

            // Store result in Redis
            $result = json_encode([
                'prediction' => 0.95,
                'confidence' => 0.87,
                'processed_at' => date('c'),
            ], JSON_THROW_ON_ERROR);

            $redis->set("result:{$jobId}", $result);
            echo "[" . date('Y-m-d H:i:s') . "] ✓ Job {$jobId} completed\n";
        }
    }
} catch (Exception $e) {
    echo "❌ Error: " . $e->getMessage() . "\n";
    exit(1);
}
```

Run it:
```bash
# Start the entire stack
docker compose -f quick-start-compose.yml up -d

# Wait for Redis to be ready
sleep 3

# In another terminal, queue a test job
docker compose -f quick-start-compose.yml exec redis redis-cli LPUSH ml:jobs test-prediction-1

# Check the result (will be available after ~2 seconds)
sleep 2
docker compose -f quick-start-compose.yml exec redis redis-cli GET "result:test-prediction-1"

# View logs in real-time
docker compose -f quick-start-compose.yml logs -f app

# Clean up
docker compose -f quick-start-compose.yml down
```

Expected output shows the service receiving jobs, processing them through Redis, and storing results. Queue a few jobs and watch them process sequentially! Now let’s build the production-ready version with multiple workers and load balancing.
Objectives
By completing this chapter, you will:
- Containerize PHP ML applications using Docker with multi-stage builds for optimized production images
- Implement async job queues with Redis to offload ML inference from web request cycles
- Deploy to cloud infrastructure with proper environment configuration, SSL/TLS, and public accessibility
- Configure load balancing with Nginx to distribute traffic across multiple application instances
- Monitor ML services tracking both traditional web metrics and ML-specific performance indicators
- Set up CI/CD pipelines for automated testing, Docker image building, and deployment on git push
- Optimize for production balancing cost, performance, reliability, and maintainability in real-world deployments
::: tip Complete Code Examples
This chapter includes extensive code examples. All files are available in code/chapter-24/ with detailed inline documentation. Each step below references the specific files you’ll need.
:::
Typical Request Flow
To understand how everything works together, here’s a complete workflow from user request to result:
```
1. User Request
   └─> POST /api/ml/sentiment
       Content: { "text": "Great product!" }

2. API Endpoint (04-api-endpoint.php)
   ├─> Validates input
   ├─> Generates unique job ID
   └─> Returns HTTP 202 "Accepted"
       Response: { "job_id": "pred_abc123", "status": "queued" }

3. Job Queued to Redis
   └─> LPUSH ml:jobs "pred_abc123"

4. Worker Process (03-ml-worker.php)
   ├─> Continuous loop: BRPOP ml:jobs (blocks until job available)
   ├─> Retrieves job from queue
   └─> Processes prediction

5. Caching Layer (05-caching-layer.php)
   ├─> Checks if input features already cached
   ├─> Cache hit? Return cached result (2ms)
   └─> Cache miss? Run inference (~250ms)

6. ML Inference
   └─> Load model → Extract features → Get prediction

7. Result Stored
   └─> SET result:pred_abc123 "{ prediction: 0.95 }"
       SET cache:v1:sentiment:hash123 "{ ... }" EX 3600

8. User Polls Result
   └─> GET /api/results/pred_abc123
       Response: { "prediction": 0.95, "cached": false }
       or (on cache hit):
       Response: { "prediction": 0.78, "cached": true }

9. Monitoring & Metrics
   └─> INCR metrics:requests:total
       LPUSH metrics:latency 245
       INCR metrics:cache:misses
```

This asynchronous pattern prevents the user’s HTTP request from blocking while waiting for potentially slow ML inference. The user gets an immediate response with a job ID, can check status later, and the system handles predictions in the background with multiple workers processing jobs in parallel.
Implementation Roadmap
This chapter guides you through 8 progressive steps to deploy your AI service. Each step builds on the previous one, and all code is production-tested.
The implementation follows this sequence:
- Step 1: Containerize with Docker — Package your application into portable containers
- Step 2: Implement Job Queues — Set up Redis-based async job processing
- Step 3: Build Worker Processes — Create daemons to process ML predictions
- Step 4: Add Result Caching — Implement intelligent caching to reduce redundant computations
- Step 5: Deploy to Cloud — Move your service to production infrastructure
- Step 6: Configure Load Balancing — Distribute traffic across multiple instances
- Step 7: Set Up Monitoring — Track system health and ML-specific metrics
- Step 8: Create CI/CD Pipeline — Automate testing and deployment
Each step is essential and depends on the previous ones. By the end, you’ll have a complete production deployment pipeline.
Step 1: Containerizing Your AI Application (~15 min)
Package your PHP application with all ML dependencies into a Docker container that runs consistently across development, testing, and production environments.
Implementation
Create a multi-stage Dockerfile that separates build dependencies from runtime:
📄 Primary Files:
- `Dockerfile` - Multi-stage build with PHP 8.4
- `docker-compose.yml` - Development stack
- `.dockerignore` - Exclude unnecessary files
Key Actions:
- Build a multi-stage Docker image with separate builder and production stages
- Install PHP extensions (`redis`, `pcntl`, `sockets`) for ML operations
- Create `docker-compose.yml` orchestrating app, Redis, and worker services
- Configure Nginx reverse proxy for routing (added in Step 6 for production)
- Test the container locally before cloud deployment
Note: While the final production setup includes Nginx load balancing (Step 6), the basic development docker-compose.yml exposes the app directly on port 8000 for simplicity during development.
Build and Test:
```bash
# Clone the code examples
cd docs/series/ai-ml-php-developers/code/chapter-24

# Build the Docker image
docker build -t ai-ml-service:latest .

# Verify image size (should be <150MB)
docker images ai-ml-service

# Test the container with the test script
docker run --rm ai-ml-service php 01-simple-docker-test.php

# Start full development stack
docker compose up -d
```

Expected Result:
```
✓ PHP 8.4+ detected
✓ redis extension loaded
✓ pcntl extension loaded
✓ Connected to Redis

All tests passed! Docker setup is working correctly.
```

Why It Works: Multi-stage builds keep your production image small by excluding build tools. Alpine Linux base (~5MB) plus PHP and extensions results in a lean ~80-150MB final image versus 400+MB with full Debian. This efficiency becomes critical at scale when deploying hundreds of container instances.
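To make the structure concrete, the multi-stage idea looks roughly like this. This is an illustrative sketch, not the chapter's shipped Dockerfile; the stage layout and copied paths are assumptions:

```dockerfile
# Builder stage: compile extensions with build tools present
FROM php:8.4-cli-alpine AS builder
RUN apk add --no-cache $PHPIZE_DEPS \
    && pecl install redis \
    && docker-php-ext-enable redis \
    && docker-php-ext-install pcntl sockets

# Production stage: copy only the compiled extensions, no build tools
FROM php:8.4-cli-alpine AS production
COPY --from=builder /usr/local/lib/php/extensions/ /usr/local/lib/php/extensions/
COPY --from=builder /usr/local/etc/php/conf.d/ /usr/local/etc/php/conf.d/
WORKDIR /app
COPY . .
USER www-data
CMD ["php", "-S", "0.0.0.0:8000", "-t", "public"]
```

The `AS builder` / `AS production` names are what the `target: production` build option later in the chapter selects between.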
Common Issues:
- Image size >500MB: Ensure you’re using Alpine base (`php:8.4-cli-alpine`), not Debian (`php:8.4-cli`)
- Redis connection fails: Check `docker network ls` - services must be on same Docker Compose network. Use service name (`redis`) not `localhost`
- Permission errors: Container runs as `www-data` user - ensure PHP files are readable. Use `docker compose exec app ls -la` to debug
- Port already in use: If port 8000 is taken, modify `ports: ["8001:8000"]` in docker-compose.yml
Step 2: Setting Up Redis and Job Queues (~15 min)
Implement async job processing so ML predictions don’t block web requests. Users get immediate HTTP 202 responses while workers process predictions in the background.
Implementation
📄 Primary Files:
- `02-job-queue-system.php` - PredictionJob and JobQueue classes
Core Classes:
```php
// Job representation
final readonly class PredictionJob
{
    public function __construct(
        public string $id,
        public string $type,      // 'classification', 'regression', etc.
        public array $data,       // Input features
        public int $priority = 0, // Higher priority = processed first
        public int $attempts = 0  // Retry tracking
    ) {}
}

// Queue manager (method signatures, bodies in 02-job-queue-system.php)
final class JobQueue
{
    public function push(PredictionJob $job): bool {}
    public function pop(int $timeout = 5): ?PredictionJob {}
    public function retry(PredictionJob $job): bool {}
    public function getQueueDepth(): int {}
}
```

Test the Queue:
```bash
# Start Redis
docker compose up -d redis

# Queue a prediction job
php 02-job-queue-system.php

# Verify job was queued
docker compose exec redis redis-cli LLEN ml:jobs
# Output: (integer) 1
```

Why It Works: Redis lists provide atomic FIFO operations. LPUSH/BRPOP pattern ensures reliable job processing with blocking waits (no busy-looping). Priority queues use sorted sets (ZSET) where score = priority level.
Key Features:
- ✅ Priority queue for urgent predictions
- ✅ Automatic retry with exponential backoff
- ✅ Failed job tracking
- ✅ Queue depth monitoring
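The retry-with-exponential-backoff behavior can be reduced to a small pure function. This is a sketch; the delay constants here are assumptions, not necessarily the values used in `02-job-queue-system.php`:

```php
<?php

declare(strict_types=1);

/**
 * Delay (in seconds) before re-queueing a failed job.
 * Doubles each attempt, capped so a stuck job never waits forever.
 */
function retryDelay(int $attempts, int $baseSeconds = 2, int $maxSeconds = 60): int
{
    return min($baseSeconds * (2 ** $attempts), $maxSeconds);
}

/** A job moves to the failed list after too many attempts. */
function shouldRetry(int $attempts, int $maxAttempts = 3): bool
{
    return $attempts < $maxAttempts;
}
```

A worker that catches an exception would call `shouldRetry()`; if true, it re-queues the job after `retryDelay()` seconds, otherwise it records it as failed.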
Step 3: Building ML Worker Processes (~20 min)
Create daemon processes that continuously pull jobs from the queue, run ML inference, and handle errors gracefully with automatic retries.
Implementation
📄 Primary Files:
- `03-ml-worker.php` - Worker daemon with signal handling
Worker Features:
- Graceful shutdown via SIGTERM/SIGINT handlers
- Automatic retry for failed jobs (max 3 attempts)
- Metrics publishing to track worker health
- Memory management with periodic garbage collection
Start Workers:
```bash
# Start worker (development)
docker compose up worker

# Or start multiple workers (production)
docker compose up -d --scale worker=4

# View worker logs
docker compose logs -f worker
```

Expected Output:
```
[worker-1] Worker started
[worker-1] Waiting for jobs...
[worker-1] Processing job pred_12345 (attempt 0)
[worker-1] ✓ Job pred_12345 completed (0.234s)
```

Why It Works: PHP’s `pcntl` extension enables signal handling for graceful shutdown. When Docker stops the container, SIGTERM triggers cleanup - the worker finishes its current job before exiting rather than abruptly terminating.
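Stripped to its essentials, the shutdown handling looks like this. This is a sketch assuming the `pcntl` extension (guarded so it also runs where the extension is absent); the real `03-ml-worker.php` adds retries, metrics, and memory management:

```php
<?php

declare(strict_types=1);

$shutdown = false;

if (function_exists('pcntl_async_signals')) {
    // Deliver signals immediately, without declare(ticks=1)
    pcntl_async_signals(true);

    foreach ([SIGTERM, SIGINT] as $signal) {
        pcntl_signal($signal, function () use (&$shutdown): void {
            $shutdown = true; // flag only - finish the current job first
        });
    }
}

while (!$shutdown) {
    // $job = $queue->pop(5);          // blocks via BRPOP (Step 2)
    // if ($job !== null) { /* run inference, store result */ }
    break; // placeholder so this sketch terminates when run standalone
}

echo "Worker exiting cleanly\n";
```

Because the handler only sets a flag, the loop always completes its current iteration (and thus its current job) before the process exits.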
Resource Management:
```php
// Force garbage collection every 100 jobs
if ($processedJobs % 100 === 0) {
    gc_collect_cycles();
}

// Restart worker after 1000 jobs (fresh memory)
if ($processedJobs >= 1000) {
    exit(0); // Docker restart=always will restart it
}
```

Step 4: Implementing Result Caching (~10 min)

Avoid redundant ML computations by caching prediction results. Identical inputs return cached results instantly instead of re-running inference.
Implementation
📄 Primary Files:
- `05-caching-layer.php` - PredictionCache class
Key Features:
- Content-based keys: Hash of input features ensures identical inputs share cache
- Cache versioning: `CACHE_VERSION` allows invalidating all predictions when deploying new models
- TTL management: Results expire after 1 hour (configurable)
- Hit rate tracking: Monitor cache effectiveness
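Content-based keys can be derived like this. This is a sketch: `cacheKey()` is a hypothetical helper name, and the class in `05-caching-layer.php` may structure things differently:

```php
<?php

declare(strict_types=1);

/**
 * Deterministic cache key from model name + input features.
 * ksort() makes the key independent of feature order, and the
 * version prefix lets a deploy invalidate every old entry at once.
 */
function cacheKey(string $model, array $features, string $version = 'v1'): string
{
    ksort($features);
    $hash = hash('sha256', json_encode($features, JSON_THROW_ON_ERROR));

    return "cache:{$version}:{$model}:{$hash}";
}
```

Identical inputs in any order map to the same Redis key, so the worker can `SET` the result with `EX 3600` and the API can `GET` it before queueing a new job.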
Test Caching:
```bash
php 05-caching-layer.php
# Output:
# Cache miss - running inference...
# Result cached
#
# Cache stats: {
#   "hits": 0,
#   "misses": 1,
#   "hit_rate": 0
# }

# Run again - should hit cache
php 05-caching-layer.php
# Output: Cache hit!
```

Performance Impact:
- First request: 250ms (ML inference)
- Cached request: 2ms (Redis lookup)
- Typical hit rate: 60-80% in production
Cache Invalidation:
```php
$cache = new PredictionCache($redis);

// Invalidate specific prediction
$cache->invalidate($features, 'classifier-v1');

// Invalidate all predictions for a model (when retraining)
$cache->invalidate(modelName: 'classifier-v1');

// Invalidate all cache
$cache->invalidate();
```

Step 5: Cloud Deployment Setup (~20 min)
Deploy your containerized service to a cloud server, making it accessible via public URL with proper SSL/TLS.
Implementation
📄 Primary Files:
- `docker-compose.prod.yml` - Production overrides
- `scripts/deploy.sh` - Automated deployment
Deployment Steps:
1. Provision Server (DigitalOcean $12/month droplet or AWS t2.small)
2. Install Docker and Docker Compose
3. Clone Repository to `/opt/ai-ml-service`
4. Configure Environment (`.env.production` with secrets)
5. Run Deployment Script
Quick Deploy:
```bash
# SSH to your server
ssh root@your_server_ip

# Install Docker
curl -fsSL https://get.docker.com | sh

# Clone your repo (adjust path to your actual repo)
git clone https://github.com/you/ai-ml-service.git /opt/ai-ml-service
cd /opt/ai-ml-service

# Copy environment template and configure
cp env.example .env.production
nano .env.production  # Edit: Set REDIS_PASSWORD, APP_ENV, domain, etc.

# Make deploy script executable
chmod +x scripts/deploy.sh

# Run deployment
./scripts/deploy.sh
```

Environment Configuration Best Practices:
```bash
# filename: .env.production (NEVER commit to git!)
# Keep these secrets safe:

# Strong random password (generate with: openssl rand -base64 32)
REDIS_PASSWORD=your_strong_random_password_here

# Application configuration
APP_ENV=production
APP_URL=https://yourdomain.com
APP_DEBUG=false  # Never use debug mode in production

# Worker configuration
WORKER_COUNT=4  # Adjust based on server CPU cores

# Monitoring
METRICS_ENABLED=true
HEALTH_CHECK_INTERVAL=30
```

Secrets Management:
- Never commit `.env.production` to Git - use `.gitignore` to exclude it
- Use strong passwords - generate with `openssl rand -base64 32`
- Rotate credentials periodically - especially `REDIS_PASSWORD`
- Use environment variables - Docker Compose can load from `.env.production`
- Restrict file permissions - `chmod 600 .env.production` on the server
- For Kubernetes/cloud platforms - use secrets management (AWS Secrets Manager, Azure Key Vault, etc.)
Production Overrides:
The `docker-compose.prod.yml` includes production-specific settings:

- Remove volume mounts (immutable, reproducible deployments)
- Set `restart: always` for automatic recovery from crashes
- Bind services to `127.0.0.1` (only Nginx is public-facing)
- Add resource limits (CPU/memory) to prevent one process consuming all resources
- Enable health checks for orchestration platforms
- Use build `target: production` to exclude dev dependencies
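As an illustration, a production override along these lines would express those settings. This is a sketch; the shipped `docker-compose.prod.yml` may differ in detail, and `deploy.resources` limits require a recent Compose (or Swarm):

```yaml
# docker-compose.prod.yml (illustrative sketch)
services:
  app:
    build:
      target: production        # exclude dev dependencies
    restart: always             # recover automatically from crashes
    ports:
      - "127.0.0.1:8000:8000"   # not public; only Nginx is exposed
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M

  worker:
    restart: always
    deploy:
      resources:
        limits:
          memory: 512M
```

Because overrides merge onto the base file, development-only volume mounts simply stay out of this file and production containers run from the baked image.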
SSL/TLS Setup (Let’s Encrypt):
```bash
# Install Certbot
apt install certbot

# Get free SSL certificate
certbot certonly --standalone -d yourdomain.com

# Update nginx/default.conf to use HTTPS (uncomment HTTPS section)
# then rebuild containers:
docker compose -f docker-compose.yml -f docker-compose.prod.yml build

# Restart
docker compose -f docker-compose.yml -f docker-compose.prod.yml restart nginx
```

Deployment Verification:
After running the deploy script, verify your service is running:
```bash
# Check if containers are running
docker compose -f docker-compose.yml -f docker-compose.prod.yml ps

# Check health endpoint
curl https://yourdomain.com/health | jq

# View logs
docker compose -f docker-compose.yml -f docker-compose.prod.yml logs -f

# Test API
curl -X POST https://yourdomain.com/api/ml/sentiment \
  -H "Content-Type: application/json" \
  -d '{"text": "Great product!"}'
```

Step 6: Load Balancing and Scaling (~15 min)
Distribute traffic across multiple application instances for high availability and increased capacity. Handle traffic spikes gracefully by scaling workers and app containers together.
Implementation
📄 Primary Files:
- `nginx/default.conf` - Reverse proxy config
- `nginx/load-balancer.conf` - Multi-instance config
- `scripts/scale-workers.sh` - Scaling automation
Nginx Load Balancer:
Nginx sits in front of your application containers and distributes requests:
```nginx
upstream php_backend {
    least_conn;  # Route to server with fewest active connections
    server app:8000 max_fails=3 fail_timeout=30s;
    # Add more for horizontal scaling (or use Docker service discovery):
    # server app2:8000 max_fails=3 fail_timeout=30s;
    # server app3:8000 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://php_backend;
        proxy_http_version 1.1;

        # Important headers for passing request context
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Timeouts for slow ML inference
        proxy_read_timeout 60s;
    }
}
```

Scaling Strategy:
For development/small production setups, scale containers directly:
```bash
# Scale to 3 application instances (you'll need to update Nginx upstream)
docker compose up -d --scale app=3

# Scale workers independently based on queue depth
docker compose up -d --scale worker=6
```

For larger production deployments, use orchestration:
- Docker Swarm: Built-in Docker feature for orchestrating containers across multiple hosts
- Kubernetes: Industry-standard container orchestration with advanced scaling policies
- Managed services: AWS ECS, Google Cloud Run, Azure Container Instances handle scaling automatically
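Whichever route you take, the core autoscaling decision reduces to mapping queue depth to a worker count. This is a sketch whose thresholds mirror Exercise 1's requirements; it is not the shipped `scripts/scale-workers.sh`:

```shell
#!/bin/sh
# Decide how many workers to run for a given queue depth,
# clamped between MIN_WORKERS and MAX_WORKERS.
MIN_WORKERS=2
MAX_WORKERS=10

desired_workers() {
    depth=$1
    current=$2
    if [ "$depth" -gt 50 ]; then
        target=$((current + 2))   # queue building up: add workers
    elif [ "$depth" -lt 10 ]; then
        target=$((current - 2))   # queue drained: remove workers
    else
        target=$current           # steady state
    fi
    if [ "$target" -lt "$MIN_WORKERS" ]; then target=$MIN_WORKERS; fi
    if [ "$target" -gt "$MAX_WORKERS" ]; then target=$MAX_WORKERS; fi
    echo "$target"
}

# In a real loop this result would feed:
#   docker compose up -d --scale worker="$(desired_workers "$depth" "$current")"
```

A production version would also add a cooldown timer so the count does not flap between scale-up and scale-down on every poll.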
Manual Scaling Script:
```bash
# Scale to specific number of workers
./scripts/scale-workers.sh 8

# This monitors queue depth and adjusts worker count automatically
```

Load Balancer Features:
- ✅ Least connections algorithm: Routes new requests to least-loaded backend
- ✅ Automatic failover: Retries on failed backends; removes failing servers temporarily
- ✅ Health checks: `/health` endpoint monitors each backend
- ✅ Connection pooling: Keepalive connections reduce latency
- ✅ Buffering: Prevents slow clients from tying up backend processes
Scaling Considerations:
- Vertical Scaling (increase server size): Simple but limited by single machine capacity
- Horizontal Scaling (add more instances): Better for handling traffic spikes
- Auto-Scaling Policies: Scale based on:
- Queue depth (when jobs accumulate, add workers)
- CPU/Memory usage (when constrained, add app instances)
- Time of day (predictable traffic patterns)
- Response time (when exceeds threshold, scale up)
Monitoring During Scaling:
```bash
# Watch Nginx upstream status in real-time
watch 'docker compose exec nginx nginx -T 2>/dev/null | grep upstream'

# Monitor queue depth
watch 'docker compose exec redis redis-cli LLEN ml:jobs'

# View worker count
docker compose ps | grep worker | wc -l
```

Step 7: Monitoring and Logging (~15 min)
Track system health, ML performance, and operational metrics in real-time to detect issues before they impact users.
Implementation
📄 Primary Files:
- `06-health-check.php` - Health endpoint
- `07-metrics-collector.php` - Metrics system
- `monitoring/dashboard.php` - Visual dashboard
- `monitoring/logger.php` - Structured logging
Health Check Endpoint:
The `/health` endpoint returns detailed JSON with your system status:
```bash
curl http://localhost/health | jq
```

```json
{
  "status": "healthy",
  "timestamp": "2025-01-31T12:34:56Z",
  "system": {
    "active_workers": 4,
    "total_processed": 1543,
    "uptime_seconds": 3600,
    "memory_usage_mb": 256
  },
  "queue": {
    "depth": 3,
    "retry_count": 0,
    "failed_count": 1,
    "avg_processing_time_ms": 245
  },
  "cache": {
    "hit_rate": 67.5,
    "total_hits": 2156,
    "total_misses": 1044
  }
}
```

Understanding Health Status:
- healthy: All systems operational. Safe for load balancer to route traffic.
- degraded: Service operational but experiencing issues. Active workers < expected, or error rate elevated. Load balancers may reduce traffic.
- unhealthy: Service not available. Load balancers should stop routing traffic and alert operators.
- warming_up: Service started <60 seconds ago. Still initializing. Don’t mark as healthy yet.
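These states can be computed from raw metrics with a small decision function. This is a sketch using the "Thresholds for Status" rules that follow (memory and cache-hit checks omitted for brevity); `healthStatus()` is a hypothetical name, and `06-health-check.php` wires the same idea to live Redis data:

```php
<?php

declare(strict_types=1);

/** Map raw metrics to a health status string. */
function healthStatus(
    bool $redisOk,
    int $activeWorkers,
    int $queueDepth,
    float $errorRate,     // fraction, 0.0 - 1.0
    int $uptimeSeconds
): string {
    // Hard failures first: Redis down or very high error rate
    if (!$redisOk || $errorRate > 0.20) {
        return 'unhealthy';
    }
    // Recently started: still initializing, don't route traffic yet
    if ($uptimeSeconds < 60) {
        return 'warming_up';
    }
    // Operational but impaired
    if ($activeWorkers === 0 || $queueDepth > 1000 || $errorRate > 0.05) {
        return 'degraded';
    }

    return 'healthy';
}
```

Load balancers then need only the string: route on `healthy`, throttle on `degraded`, eject on `unhealthy`.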
Thresholds for Status:
```
degraded if:
  - Active workers = 0
  - Queue depth > 1000
  - Error rate > 5%
  - Cache hit rate < 20%

unhealthy if:
  - Redis connection fails
  - Error rate > 20%
  - All workers have crashed
  - Memory usage > 90% of limit
```

Monitoring Dashboard:
Access http://your-server/monitoring/dashboard.php for real-time metrics:
- System Health: Current worker count, queue depth, cache statistics, memory usage
- Request Metrics: Requests per minute, average latency, total requests, error count
- Model Performance: Inference time distribution, prediction accuracy, error types
- Historical Data: Metrics over last hour with trend visualization
- Auto-refresh: Updates every 5 seconds with WebSocket for near real-time updates
Key Metrics to Monitor:
| Metric | Target | Alert if | Notes |
|---|---|---|---|
| Queue Depth | < 50 jobs | > 200 | Indicates workers can’t keep up |
| Response Time | < 500ms | > 2s | Slow responses hurt user experience |
| Error Rate | < 1% | > 5% | Indicates system reliability issues |
| Cache Hit Rate | > 60% | < 40% | Optimize cache strategy if low |
| Worker Health | All active | Any down | Check logs for why workers crashed |
| Memory Usage | 60-70% | > 85% | Risk of OOM kills; scale up or optimize |
| Inference Time | < 250ms | > 1000ms | Model slow; check GPU/CPU availability |
Setting Up Alerts:
For production, integrate with monitoring services:
```bash
# Example: Datadog integration
# Send metrics to Datadog API every 60 seconds
curl -X POST https://api.datadoghq.com/api/v1/series \
  -H "DD-API-KEY: $DATADOG_API_KEY" \
  -d @- << EOF
{
  "series": [
    {
      "metric": "ai_service.queue_depth",
      "points": [[$(date +%s), $(redis-cli LLEN ml:jobs)]],
      "type": "gauge"
    }
  ]
}
EOF
```

Structured Logging:
The service logs in JSON format for easy parsing:
```json
{
  "timestamp": "2025-01-31T12:34:56.789Z",
  "level": "info",
  "worker": "worker-1",
  "job_id": "pred_12345",
  "event": "job_completed",
  "duration_ms": 234,
  "status": "success"
}
```

Use `docker compose logs --tail=100 app | jq 'select(.level=="error")'` to filter errors.
Step 8: CI/CD Pipeline (~15 min)
Automate testing, Docker image building, and deployment so git push triggers a complete deployment cycle.
Implementation
📄 Primary Files:
- `.github/workflows/deploy.yml` - GitHub Actions workflow
Pipeline Stages:
- Test - PHP syntax check, Redis connectivity, unit tests
- Build - Docker image build and push to Docker Hub
- Deploy - SSH to server, pull images, restart containers, verify health
Setup GitHub Actions:
1. Add repository secrets (Settings → Secrets):
   - `DOCKER_USERNAME`, `DOCKER_PASSWORD`
   - `DEPLOY_HOST`, `DEPLOY_USER`, `DEPLOY_SSH_KEY`

2. Push to trigger deployment:

   ```bash
   git add .
   git commit -m "Deploy new ML model"
   git push origin main
   ```

3. Monitor at: `https://github.com/you/repo/actions`
Deployment Verification:
- Runs health check after deployment
- Fails deployment if health check fails
- Automatic rollback on errors
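The shape of such a workflow, reduced to its three stages, looks roughly like this. This is a sketch: the secret names match the setup above, but job details in the shipped `deploy.yml` will differ:

```yaml
# .github/workflows/deploy.yml (illustrative sketch)
name: Deploy
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: PHP syntax check
        run: find . -name '*.php' -print0 | xargs -0 -n1 php -l

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker build -t "${{ secrets.DOCKER_USERNAME }}/ai-ml-service:latest" .
          docker push "${{ secrets.DOCKER_USERNAME }}/ai-ml-service:latest"

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: SSH deploy and health check
        uses: appleboy/ssh-action@v1.0.0
        with:
          host: ${{ secrets.DEPLOY_HOST }}
          username: ${{ secrets.DEPLOY_USER }}
          key: ${{ secrets.DEPLOY_SSH_KEY }}
          script: |
            cd /opt/ai-ml-service
            docker compose pull && docker compose up -d
            curl --fail http://localhost/health
```

The `needs:` chain enforces the test → build → deploy order, and the final `curl --fail` is what makes a failed health check fail the whole run.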
Exercises
Test your understanding with these practical challenges:
Exercise 1: Auto-Scaling Based on Queue Depth
Goal: Automatically scale worker count when queue depth exceeds thresholds.
Requirements:
- Monitor queue depth every 30 seconds
- Scale up when depth > 50 (add 2 workers, max 10)
- Scale down when depth < 10 for 5+ minutes (remove 2, min 2)
- Prevent rapid scaling (cooldown period)
Solution: solutions/exercise1-autoscale.php
Exercise 2: Blue-Green Deployment
Goal: Implement zero-downtime deployments.
Requirements:
- Start new containers alongside old
- Verify new containers are healthy
- Switch traffic atomically
- Keep old containers for quick rollback
Solution: solutions/exercise2-blue-green.sh
Exercise 3: Advanced Health Check with Circuit Breaker
Goal: Detect degraded states before complete failure.
Requirements:
- Track error rate over last 100 requests
- Status “degraded” if error rate > 5%
- Status “unhealthy” if error rate > 20%
- Include “warming up” state for first 60 seconds
Solution: solutions/exercise3-health-check.php
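The thresholds above map onto a small classifier, sketched here in shell; a real endpoint would compute the error rate from a rolling window of the last 100 requests (for example, stored in Redis) rather than take it as arguments:

```shell
#!/bin/sh
# health_status ERRORS TOTAL UPTIME_SECONDS -> prints one of:
#   warming_up | healthy | degraded | unhealthy
health_status() {
  errors="$1"; total="$2"; uptime="$3"
  if [ "$uptime" -lt 60 ]; then echo "warming_up"; return 0; fi
  if [ "$total" -eq 0 ]; then echo "healthy"; return 0; fi
  rate=$((errors * 100 / total))   # integer percent is enough for these thresholds
  if [ "$rate" -gt 20 ]; then
    echo "unhealthy"
  elif [ "$rate" -gt 5 ]; then
    echo "degraded"
  else
    echo "healthy"
  fi
}
```

Reporting "degraded" lets the load balancer or an operator react while the service is still answering requests, which is the point of the exercise.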
Exercise 4: Optimize Docker Image Size
Goal: Reduce the image from ~150 MB to under 80 MB.
Requirements:
- Remove build dependencies after compilation
- Use Alpine base image effectively
- Combine RUN commands to reduce layers
- Leverage .dockerignore
Solution: solutions/exercise4-optimized.Dockerfile
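A starting-point sketch of the techniques listed — the PHP version, extension set, and paths are illustrative assumptions, and the exercise's actual solution Dockerfile may differ:

```dockerfile
# Build stage: compile extensions, then discard the toolchain in one layer.
FROM php:8.3-cli-alpine AS build
RUN apk add --no-cache --virtual .build-deps $PHPIZE_DEPS \
    && pecl install redis \
    && docker-php-ext-enable redis \
    && apk del .build-deps

# Runtime stage: copy only the compiled extensions and the app itself.
FROM php:8.3-cli-alpine
COPY --from=build /usr/local/lib/php/extensions /usr/local/lib/php/extensions
COPY --from=build /usr/local/etc/php/conf.d /usr/local/etc/php/conf.d
COPY src/ /app/src/
WORKDIR /app
CMD ["php", "src/worker.php"]
```

A `.dockerignore` excluding `.git`, `tests/`, and local artifacts keeps the build context small, which shrinks the `COPY` layers as well as speeding up builds.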
Troubleshooting
Common production issues and solutions:
Container exits immediately
Symptom: docker compose ps shows “Exited (1)”
Solutions:
```bash
# Check logs for errors
docker compose logs worker

# Common fixes:
# - Missing PHP extension: Add to Dockerfile
# - Missing environment variable: Check docker-compose.yml
# - Code syntax error: Run php -l on files
```
Redis connection fails
Symptom: RedisException: Connection refused
Solutions:
```bash
# Verify Redis is running
docker compose ps redis

# Test connectivity
docker compose exec worker ping redis

# Check authentication
docker compose exec redis redis-cli -a $REDIS_PASSWORD PING
```
Queue depth growing indefinitely
Symptom: Jobs accumulating, not being processed
Solutions:
```bash
# Check worker logs
docker compose logs worker

# Scale workers
docker compose up -d --scale worker=6

# Check for stuck jobs
docker compose exec redis redis-cli LRANGE ml:jobs 0 10
```
High memory usage / OOM kills
Symptom: Workers showing “Killed” in logs
Solutions:
```php
// Force garbage collection
if ($processedJobs % 100 === 0) {
    gc_collect_cycles();
}

// Restart after N jobs
if ($processedJobs >= 1000) {
    exit(0); // Docker will restart
}
```
Deployment health check fails
Symptom: CI/CD hangs on “Waiting for health check”
Solutions:
```bash
# SSH to server and check manually
curl http://localhost/health

# Common issues:
# - App not binding to correct port
# - Redis credentials wrong
# - Missing environment variables
# - Previous deployment still running
```
Wrap-up
Congratulations! You’ve built a complete, production-ready AI-powered PHP service. Let’s review what you’ve accomplished:
✅ Containerization: Multi-stage Docker images optimized for production (<150MB)
✅ Async Processing: Redis job queues prevent ML inference from blocking web requests
✅ Worker Management: Resilient daemons with graceful shutdown and automatic retries
✅ Intelligent Caching: 60-80% cache hit rates significantly reduce computational costs
✅ Cloud Deployment: Live service on public infrastructure with SSL/TLS and secrets management
✅ Load Balancing: Nginx distributing traffic with automatic failover and health checks
✅ Comprehensive Monitoring: Real-time dashboards tracking system and ML-specific metrics
✅ CI/CD Automation: Push-to-deploy workflow with automated testing and health verification
Real-World Impact
Your infrastructure can handle:
- 1000+ predictions/minute with 2-4 workers (varies by model complexity)
- Automatic scaling based on queue depth and system load
- Zero-downtime deployments via blue-green strategy (Exercise 2)
- Graceful failure handling with retries, circuit breakers, and fallback strategies
- Cost optimization through intelligent caching reducing redundant computations
- High availability with multiple instances and automatic failover
Production Deployment Checklist
Before going live with your AI service:
- Set a strong, unique REDIS_PASSWORD in production
- Enable APP_DEBUG=false
- Configure SSL/TLS with a valid certificate
- Set up monitoring and alerting (health check failures)
- Test database backups and recovery procedures
- Configure log aggregation (centralize logs from all containers)
- Set resource limits to prevent runaway processes
- Test horizontal scaling (add/remove containers, verify traffic distribution)
- Document your deployment process and runbook for operators
- Set up automated security scanning of Docker images
- Configure rate limiting on API endpoints to prevent abuse
Common Production Pitfalls
- Not monitoring workers - Workers can fail silently; always monitor health
- Cache invalidation issues - Old predictions served when models update; version your cache keys
- Resource exhaustion - Set memory limits on containers and workers
- No graceful shutdown - Use SIGTERM handlers to finish current job before exiting
- Hardcoded secrets - Never commit .env.production to git
- Insufficient logging - Include context (job_id, worker, timing) in all logs
- No fallback - If ML service fails, have degraded mode for your application
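For the cache invalidation pitfall, one common fix is to bake the model version into every cache key, so a model update naturally misses all old entries and stale predictions simply expire. A sketch, with the key prefix and hash length as illustrative choices:

```shell
#!/bin/sh
# cache_key MODEL_VERSION INPUT_JSON -> versioned Redis key for a prediction.
# Bumping the version makes old keys unreachable; they expire on their own TTL.
cache_key() {
  model_version="$1"; input="$2"
  hash=$(printf '%s' "$input" | sha256sum | cut -c1-16)
  echo "ml:predict:v${model_version}:${hash}"
}
```

For example, a worker could store a result with `redis-cli SETEX "$(cache_key 3 "$payload")" 3600 "$prediction"`, where the TTL bounds how long any entry lives.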
Next Steps
Continue to Chapter 25 where you’ll:
- Build a comprehensive capstone project
- Explore emerging trends (ONNX Runtime, generative AI)
- Learn about AI ethics and responsible ML
- Discover resources for continued learning
The deployment skills you’ve gained apply to any PHP application requiring high availability and professional operations. You’re now equipped to deploy and maintain production systems with confidence!
Further Reading
Docker and Containerization
- Docker Documentation — Complete containerization guide
- Multi-Stage Builds — Optimize image size
- Docker Best Practices — Security and optimization
Job Queues
- Laravel Queues — If using the Laravel framework
- Redis Queue Patterns — Official Redis documentation
- Reliable Queue Processing — Advanced patterns
Load Balancing and Scaling
- Nginx Documentation — Reverse proxy and load balancing
- Load Balancing Algorithms — Algorithm comparison
- Horizontal vs Vertical Scaling — When to use each
Monitoring
- Prometheus Documentation — Industry-standard monitoring
- The Twelve-Factor App — Cloud-native best practices
- Monolog Documentation — PHP logging library
CI/CD
- GitHub Actions Documentation — Complete CI/CD guide
- Deployment Strategies — Blue-green, canary, rolling
Cloud Providers
- DigitalOcean Tutorials — Practical deployment guides
- AWS EC2 Documentation — Amazon’s compute platform
Security
- OWASP Top Ten — Critical security risks
- Docker Security — Container security
- Let’s Encrypt — Free SSL certificates