
Chapter 29: Content Moderation System
Overview
User-generated content requires careful moderation to maintain community standards and legal compliance. In this chapter, you'll build an intelligent moderation system that automatically detects toxic language, spam, personally identifiable information (PII), policy violations, and inappropriate content.
Claude excels at understanding context and nuance in content moderation—distinguishing between legitimate discussion and harmful content, identifying subtle violations, and explaining moderation decisions. Your system will handle real-time moderation, queue management, and human review workflows.
What You'll Build: A production-ready content moderation platform that analyzes text, images, and user behavior, enforces community guidelines, manages moderation queues, and provides detailed violation reports.
Objectives
By the end of this chapter, you will:
- Understand how to build a comprehensive content moderation system using Claude's context-aware analysis
- Implement multi-layered content analysis that detects toxic language, spam, PII, and policy violations
- Create a policy engine that enforces customizable community guidelines with severity scoring
- Build PII detection and redaction capabilities to protect user privacy and ensure legal compliance
- Design a moderation queue system with priority-based processing and human review workflows
- Develop spam detection that combines content analysis with behavioral pattern recognition
- Integrate audit logging and analytics for accountability and continuous improvement
Prerequisites
Before starting, ensure you have:
- ✓ Completed Chapters 11-15 (Tool use and structured outputs)
- ✓ Familiarity with content policies and moderation standards
- ✓ Experience with queue systems for async processing
- ✓ Database skills for building audit trails and appeals
Estimated Time: 90-120 minutes
Architecture Overview
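The orchestrator below follows a fixed pipeline: analyze content, check policies, decide an action, optionally queue for human review, and audit-log the decision. The severity tiers it applies can be distilled into a small standalone function — a simplified sketch of the thresholds used throughout this chapter, not the full `ModerationAction` flow (the function name is illustrative):

```php
<?php
declare(strict_types=1);

// Map an overall severity score (0.0-1.0) to the action tiers used
// in this chapter. Returns [action, requiresHumanReview].
function severityToAction(float $maxSeverity): array
{
    return match (true) {
        $maxSeverity >= 0.9 => ['block', false],  // critical: auto-block
        $maxSeverity >= 0.7 => ['block', true],   // high: block + human review
        $maxSeverity >= 0.4 => ['flag', true],    // medium: publish but flag
        default             => ['warn', false],   // low: approve with warning
    };
}
```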
<?php
# filename: src/Moderation/ModerationSystem.php
declare(strict_types=1);
namespace App\Moderation;
use Anthropic\Anthropic;
class ModerationSystem
{
public function __construct(
private Anthropic $claude,
private ContentAnalyzer $analyzer,
private PolicyEngine $policyEngine,
private ModerationQueue $queue,
private AuditLogger $auditLogger
) {}
/**
* Moderate content in real-time
*/
public function moderateContent(
string $content,
string $contentType = 'text',
array $context = []
): ModerationResult {
// Analyze content
$analysis = $this->analyzer->analyze($content, $contentType);
// Check against policies
$violations = $this->policyEngine->checkViolations($analysis, $context);
// Determine action
$action = $this->determineAction($violations, $analysis);
// Create result
$result = new ModerationResult(
approved: $action->approved,
violations: $violations,
severity: $action->severity,
action: $action->type,
explanation: $action->explanation,
confidence: $analysis->confidence
);
// Queue for human review if needed
if ($action->requiresHumanReview) {
$this->queue->add($content, $result, $context);
}
// Log moderation decision
$this->auditLogger->log($content, $result, $action);
return $result;
}
/**
* Batch moderate multiple items
*/
public function moderateBatch(array $items): array
{
$results = [];
foreach ($items as $item) {
$results[$item['id']] = $this->moderateContent(
content: $item['content'],
contentType: $item['type'] ?? 'text',
context: $item['context'] ?? []
);
}
return $results;
}
private function determineAction(
array $violations,
ContentAnalysis $analysis
): ModerationAction {
if (empty($violations)) {
return new ModerationAction(
approved: true,
type: 'approve',
severity: 'none',
requiresHumanReview: false,
explanation: 'Content meets all guidelines'
);
}
// Calculate overall severity
$maxSeverity = max(array_map(fn($v) => $v->severityScore, $violations));
// Critical violations = immediate block
if ($maxSeverity >= 0.9) {
return new ModerationAction(
approved: false,
type: 'block',
severity: 'critical',
requiresHumanReview: false,
explanation: $this->buildExplanation($violations)
);
}
// High severity = block + human review
if ($maxSeverity >= 0.7) {
return new ModerationAction(
approved: false,
type: 'block',
severity: 'high',
requiresHumanReview: true,
explanation: $this->buildExplanation($violations)
);
}
// Medium severity = flag for review
if ($maxSeverity >= 0.4) {
return new ModerationAction(
approved: true,
type: 'flag',
severity: 'medium',
requiresHumanReview: true,
explanation: $this->buildExplanation($violations)
);
}
// Low severity = approve with warning
return new ModerationAction(
approved: true,
type: 'warn',
severity: 'low',
requiresHumanReview: false,
explanation: $this->buildExplanation($violations)
);
}
private function buildExplanation(array $violations): string
{
$reasons = array_map(
fn($v) => "{$v->category}: {$v->reason}",
$violations
);
return implode('; ', $reasons);
}
}

Content Analyzer
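The analyzer in this section instructs Claude to return only JSON, but model responses occasionally arrive wrapped in prose or markdown fences. Its `parseAnalysis` method uses a bare greedy regex; a slightly more defensive extraction — a hypothetical helper, not one of the chapter's classes — strips fences first and validates the decode:

```php
<?php
declare(strict_types=1);

// Extract the first JSON object from a model response, tolerating
// markdown fences and surrounding prose. Returns null on failure.
function extractJsonObject(string $text): ?array
{
    // Drop ```json ... ``` fences if present
    $text = preg_replace('/```(?:json)?/', '', $text);
    if (!preg_match('/\{.*\}/s', $text, $m)) {
        return null;
    }
    $data = json_decode($m[0], true);
    return is_array($data) ? $data : null;
}
```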
<?php
# filename: src/Moderation/ContentAnalyzer.php
declare(strict_types=1);
namespace App\Moderation;
use Anthropic\Anthropic;
class ContentAnalyzer
{
public function __construct(
private Anthropic $claude
) {}
/**
* Analyze content for policy violations
*/
public function analyze(string $content, string $contentType = 'text'): ContentAnalysis
{
$prompt = $this->buildAnalysisPrompt($content, $contentType);
$response = $this->claude->messages()->create([
'model' => 'claude-sonnet-4-20250514',
'max_tokens' => 4096,
'temperature' => 0.2,
'system' => $this->getAnalysisSystemPrompt(),
'messages' => [[
'role' => 'user',
'content' => $prompt
]]
]);
return $this->parseAnalysis($response->content[0]->text);
}
private function buildAnalysisPrompt(string $content, string $contentType): string
{
return <<<PROMPT
Analyze this content for moderation purposes.
Content Type: {$contentType}
Content:
{$content}
Analyze for:
1. **Toxic Language**
- Hate speech
- Harassment or bullying
- Threats or violence
- Discriminatory language
2. **Spam & Manipulation**
- Spam or repetitive content
- Phishing or scams
- Malicious links
- Vote manipulation
3. **Personal Information**
- PII (names, addresses, SSN, etc.)
- Phone numbers
- Email addresses
- Credit card information
4. **Inappropriate Content**
- Sexual content
- Graphic violence
- Self-harm content
- Illegal activities
5. **Misinformation**
- False medical claims
- Conspiracy theories
- Misleading information
6. **Copyright & IP**
- Copyrighted material
- Trademark violations
Return JSON:
{
"violations": [
{
"category": "category name",
"type": "specific violation type",
"severity": 0.0 to 1.0,
"confidence": 0.0 to 1.0,
"reason": "explanation",
"evidence": "specific text excerpt",
"recommendation": "approve|flag|block|review"
}
],
"pii_detected": [
{
"type": "email|phone|ssn|address|etc",
"value": "detected value (redacted)",
"location": "position in text"
}
],
"overall_safety_score": 0.0 to 1.0,
"requires_human_review": boolean,
"suggested_action": "approve|flag|block",
"context_notes": "relevant context or nuances"
}
Return ONLY valid JSON.
PROMPT;
}
private function getAnalysisSystemPrompt(): string
{
return <<<SYSTEM
You are a content moderation expert analyzing user-generated content.
Your analysis must be:
- Objective and unbiased
- Context-aware (distinguish satire, quotes, educational content)
- Culturally sensitive
- Consistent with platform policies
- Detailed with specific examples
Severity Scoring:
- 0.0-0.3: Minor issues, likely acceptable
- 0.4-0.6: Moderate concerns, flag for review
- 0.7-0.8: Serious violations, likely block
- 0.9-1.0: Severe violations, immediate block
Consider:
1. Intent and context
2. Target audience
3. Potential harm
4. Legal implications
5. Community standards
Be especially careful with:
- False positives on legitimate discussion
- Cultural and linguistic nuances
- Satire and sarcasm
- Educational or news content
- Quotes or references
Always err on the side of caution for:
- Child safety
- Violence or threats
- Illegal activity
- PII exposure
SYSTEM;
}
private function parseAnalysis(string $jsonText): ContentAnalysis
{
// Extract JSON from response
if (preg_match('/\{.*\}/s', $jsonText, $matches)) {
$data = json_decode($matches[0], true);
return new ContentAnalysis($data);
}
throw new \RuntimeException('Failed to parse analysis');
}
}

Toxic Language Detection
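The detector in this section runs a cheap lexical pre-filter before the Claude call. Plain `str_contains` matching is prone to embedded-word false positives (the classic "Scunthorpe problem"); a word-boundary variant — a hypothetical alternative to `patternCheck`, with an illustrative function name — avoids some of them:

```php
<?php
declare(strict_types=1);

// Match a pattern only as a whole word, so a flagged term embedded
// inside an innocent word (e.g. a place name) does not trigger.
function matchesWholeWord(string $text, string $pattern): bool
{
    $quoted = preg_quote($pattern, '/');
    return (bool) preg_match('/\b' . $quoted . '\b/iu', $text);
}
```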
<?php
# filename: src/Moderation/ToxicityDetector.php
declare(strict_types=1);
namespace App\Moderation;
use Anthropic\Anthropic;
class ToxicityDetector
{
private array $toxicPatterns = [
'profanity' => [
'severity' => 0.4,
'patterns' => ['fuck', 'shit', 'damn'] // Simplified example
],
'slurs' => [
'severity' => 0.9,
'patterns' => ['<racial_slur>', '<homophobic_slur>'] // Use actual detection
],
'threats' => [
'severity' => 0.95,
'patterns' => ['kill you', 'hurt you', 'find you']
]
];
public function __construct(
private Anthropic $claude
) {}
/**
* Detect toxic language with context awareness
*/
public function detect(string $text): ToxicityReport
{
// Quick pattern check first
$patternMatches = $this->patternCheck($text);
// Deep analysis with Claude for context
$aiAnalysis = $this->analyzeContext($text, $patternMatches);
return new ToxicityReport(
isToxic: $aiAnalysis['is_toxic'] ?? false,
toxicityScore: $aiAnalysis['toxicity_score'] ?? 0.0,
categories: $aiAnalysis['categories'] ?? [],
targetedGroups: $aiAnalysis['targeted_groups'] ?? [],
contextualFactors: $aiAnalysis['context'] ?? [],
recommendation: $aiAnalysis['recommendation'] ?? 'approve'
);
}
private function patternCheck(string $text): array
{
$matches = [];
$lowerText = strtolower($text);
foreach ($this->toxicPatterns as $category => $data) {
foreach ($data['patterns'] as $pattern) {
if (str_contains($lowerText, strtolower($pattern))) {
$matches[] = [
'category' => $category,
'pattern' => $pattern,
'severity' => $data['severity']
];
}
}
}
return $matches;
}
private function analyzeContext(string $text, array $patternMatches): array
{
$patternsText = empty($patternMatches)
? 'None detected'
: json_encode($patternMatches);
$prompt = <<<PROMPT
Analyze this text for toxic language with full context awareness.
Text: {$text}
Pattern matches found: {$patternsText}
Consider:
1. Is this actually toxic or is it:
- A quote or reference?
- Educational discussion?
- Reclaimed language by the target group?
- Satire or criticism of toxicity?
- Song lyrics or artistic expression?
2. If toxic, identify:
- Type (harassment, hate speech, threats, etc.)
- Targeted groups
- Severity (0.0 to 1.0)
- Intent
Return JSON:
{
"is_toxic": boolean,
"toxicity_score": 0.0 to 1.0,
"categories": ["harassment", "hate_speech", "threats", etc.],
"targeted_groups": ["group1", "group2"],
"context": "explanation of context",
"is_false_positive": boolean,
"false_positive_reason": "if applicable",
"recommendation": "approve|flag|block"
}
Return ONLY valid JSON.
PROMPT;
$response = $this->claude->messages()->create([
'model' => 'claude-sonnet-4-20250514',
'max_tokens' => 1024,
'temperature' => 0.2,
'messages' => [[
'role' => 'user',
'content' => $prompt
]]
]);
$jsonText = $response->content[0]->text;
if (preg_match('/\{.*\}/s', $jsonText, $matches)) {
return json_decode($matches[0], true) ?? [];
}
return [];
}
}

PII Detection and Redaction
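The regex patterns in this section catch formats, not validity — any sixteen-digit run matches the credit-card pattern. The standard Luhn checksum, shown here as an optional post-filter you could run on pattern hits, cuts those false positives:

```php
<?php
declare(strict_types=1);

// Luhn checksum: true only for digit strings that could be real
// card numbers. Non-digits (spaces, dashes) are stripped first.
function passesLuhn(string $number): bool
{
    $digits = preg_replace('/\D/', '', $number);
    if ($digits === '') {
        return false;
    }
    $sum = 0;
    $double = false;
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9;
            }
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 === 0;
}
```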
<?php
# filename: src/Moderation/PIIDetector.php
declare(strict_types=1);
namespace App\Moderation;
use Anthropic\Anthropic;
class PIIDetector
{
private array $patterns = [
'email' => '/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/',
'phone' => '/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/',
'ssn' => '/\b\d{3}-\d{2}-\d{4}\b/',
'credit_card' => '/\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b/',
'ip_address' => '/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/'
];
public function __construct(
private Anthropic $claude
) {}
/**
* Detect and categorize PII
*/
public function detect(string $text): PIIReport
{
// Pattern-based detection
$patternMatches = $this->patternDetect($text);
// AI-enhanced detection for complex cases
$aiMatches = $this->aiDetect($text);
// Merge results
$allMatches = array_merge($patternMatches, $aiMatches);
return new PIIReport(
hasPII: !empty($allMatches),
items: $allMatches,
riskLevel: $this->calculateRiskLevel($allMatches)
);
}
/**
* Redact PII from text
*/
public function redact(string $text, PIIReport $report): string
{
$redacted = $text;
foreach ($report->items as $item) {
$replacement = match($item['type']) {
'email' => '[EMAIL REDACTED]',
'phone' => '[PHONE REDACTED]',
'ssn' => '[SSN REDACTED]',
'credit_card' => '[CARD REDACTED]',
'address' => '[ADDRESS REDACTED]',
default => '[PII REDACTED]'
};
$redacted = str_replace($item['value'], $replacement, $redacted);
}
return $redacted;
}
private function patternDetect(string $text): array
{
$matches = [];
foreach ($this->patterns as $type => $pattern) {
if (preg_match_all($pattern, $text, $found)) {
foreach ($found[0] as $value) {
$matches[] = [
'type' => $type,
'value' => $value,
'detection_method' => 'pattern',
'confidence' => 0.9
];
}
}
}
return $matches;
}
private function aiDetect(string $text): array
{
$prompt = <<<PROMPT
Detect personally identifiable information (PII) in this text.
Text: {$text}
Identify:
- Full names (with context that makes them identifiable)
- Home addresses
- Government ID numbers
- Financial information
- Medical information
- Login credentials
- Any other PII
Return JSON array:
[
{
"type": "name|address|ssn|medical|financial|etc",
"value": "the PII value (can be partial for demonstration)",
"confidence": 0.0 to 1.0,
"context": "why this is PII"
}
]
Return ONLY valid JSON array.
PROMPT;
$response = $this->claude->messages()->create([
'model' => 'claude-haiku-4-20250514',
'max_tokens' => 1024,
'temperature' => 0.1,
'messages' => [[
'role' => 'user',
'content' => $prompt
]]
]);
$jsonText = $response->content[0]->text;
if (preg_match('/\[.*\]/s', $jsonText, $matches)) {
$items = json_decode($matches[0], true) ?? [];
foreach ($items as &$item) {
$item['detection_method'] = 'ai';
}
return $items;
}
return [];
}
private function calculateRiskLevel(array $matches): string
{
if (empty($matches)) {
return 'none';
}
$highRiskTypes = ['ssn', 'credit_card', 'medical', 'financial'];
$hasHighRisk = false;
foreach ($matches as $match) {
if (in_array($match['type'], $highRiskTypes)) {
$hasHighRisk = true;
break;
}
}
if ($hasHighRisk) {
return 'critical';
}
if (count($matches) > 3) {
return 'high';
}
if (count($matches) > 1) {
return 'medium';
}
return 'low';
}
}

Spam Detection
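Model-based scoring in this section is complemented by cheap local heuristics. One common signal is link density — URLs as a fraction of tokens. A hypothetical helper (the name and any threshold you apply to its output are assumptions to be tuned per platform):

```php
<?php
declare(strict_types=1);

// Fraction of whitespace-separated tokens that look like URLs.
function linkDensity(string $content): float
{
    $tokens = preg_split('/\s+/', trim($content), -1, PREG_SPLIT_NO_EMPTY);
    if (count($tokens) === 0) {
        return 0.0;
    }
    $links = 0;
    foreach ($tokens as $t) {
        if (preg_match('#^https?://#i', $t)) {
            $links++;
        }
    }
    return $links / count($tokens);
}
```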
<?php
# filename: src/Moderation/SpamDetector.php
declare(strict_types=1);
namespace App\Moderation;
use Anthropic\Anthropic;
class SpamDetector
{
public function __construct(
private Anthropic $claude,
private \PDO $db
) {}
/**
* Detect spam with behavioral analysis
*/
public function detect(
string $content,
string $userId,
array $context = []
): SpamReport {
// Check user behavior
$userBehavior = $this->analyzeUserBehavior($userId);
// Analyze content
$contentAnalysis = $this->analyzeContent($content);
// Calculate spam score
$spamScore = $this->calculateSpamScore($contentAnalysis, $userBehavior, $context);
return new SpamReport(
isSpam: $spamScore > 0.7,
spamScore: $spamScore,
indicators: array_merge($contentAnalysis['indicators'] ?? [], $userBehavior['indicators']),
type: $contentAnalysis['spam_type'] ?? 'unknown',
recommendation: $spamScore > 0.7 ? 'block' : ($spamScore > 0.4 ? 'flag' : 'approve')
);
}
private function analyzeContent(string $content): array
{
$prompt = <<<PROMPT
Analyze this content for spam characteristics.
Content: {$content}
Check for:
1. Excessive links or URLs
2. Repetitive text
3. Keywords associated with spam (crypto, pills, "click here", etc.)
4. Suspicious formatting (all caps, excessive punctuation)
5. Promotional language
6. Phishing attempts
7. Malicious links
Return JSON:
{
"is_likely_spam": boolean,
"spam_score": 0.0 to 1.0,
"spam_type": "promotional|phishing|link_spam|repetitive|malicious|legitimate",
"indicators": ["indicator1", "indicator2"],
"reasoning": "explanation"
}
Return ONLY valid JSON.
PROMPT;
$response = $this->claude->messages()->create([
'model' => 'claude-haiku-4-20250514',
'max_tokens' => 512,
'temperature' => 0.2,
'messages' => [[
'role' => 'user',
'content' => $prompt
]]
]);
$jsonText = $response->content[0]->text;
if (preg_match('/\{.*\}/s', $jsonText, $matches)) {
return json_decode($matches[0], true) ?? [];
}
return ['is_likely_spam' => false, 'spam_score' => 0.0, 'indicators' => []];
}
private function analyzeUserBehavior(string $userId): array
{
$indicators = [];
// Check posting frequency
$stmt = $this->db->prepare(
"SELECT COUNT(*) as count
FROM user_content
WHERE user_id = :user_id
AND created_at > DATE_SUB(NOW(), INTERVAL 1 HOUR)"
);
$stmt->execute([':user_id' => $userId]);
$recentPosts = (int)$stmt->fetch(\PDO::FETCH_ASSOC)['count'];
if ($recentPosts > 10) {
$indicators[] = 'high_posting_frequency';
}
// Check account age
$stmt = $this->db->prepare(
"SELECT DATEDIFF(NOW(), created_at) as age_days
FROM users
WHERE id = :user_id"
);
$stmt->execute([':user_id' => $userId]);
$accountAge = (int)$stmt->fetch(\PDO::FETCH_ASSOC)['age_days'];
if ($accountAge < 1) {
$indicators[] = 'new_account';
}
// Check for similar content
$stmt = $this->db->prepare(
"SELECT content
FROM user_content
WHERE user_id = :user_id
ORDER BY created_at DESC
LIMIT 5"
);
$stmt->execute([':user_id' => $userId]);
$recentContent = $stmt->fetchAll(\PDO::FETCH_COLUMN);
if ($this->hasRepetitiveContent($recentContent)) {
$indicators[] = 'repetitive_content';
}
return [
'indicators' => $indicators,
'recent_posts' => $recentPosts,
'account_age' => $accountAge
];
}
private function calculateSpamScore(
array $contentAnalysis,
array $userBehavior,
array $context
): float {
$score = $contentAnalysis['spam_score'] ?? 0.0;
// Boost score based on user behavior
if (count($userBehavior['indicators']) > 2) {
$score += 0.3;
} elseif (count($userBehavior['indicators']) > 0) {
$score += 0.15;
}
// Consider context
if (isset($context['reported_by_users']) && $context['reported_by_users'] > 2) {
$score += 0.2;
}
return min($score, 1.0);
}
private function hasRepetitiveContent(array $contents): bool
{
if (count($contents) < 2) {
return false;
}
for ($i = 0; $i < count($contents) - 1; $i++) {
for ($j = $i + 1; $j < count($contents); $j++) {
similar_text($contents[$i], $contents[$j], $percent);
if ($percent > 80) {
return true;
}
}
}
return false;
}
}

Moderation Queue System
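The queue in this section prioritizes purely by severity, so low-severity items can starve behind a steady stream of higher tiers. A common refinement — hypothetical, not part of the chapter's schema — folds waiting time into the Redis score so old items eventually surface:

```php
<?php
declare(strict_types=1);

// Priority = severity base score plus a small aging bonus, capped
// below one tier (250) so aging never outranks a higher severity.
function queuePriority(string $severity, int $waitingMinutes): float
{
    $base = match ($severity) {
        'critical' => 1000.0,
        'high'     => 750.0,
        'medium'   => 500.0,
        'low'      => 250.0,
        default    => 0.0,
    };
    // +1 point per 10 minutes waited
    return $base + min($waitingMinutes / 10, 249.0);
}
```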
<?php
# filename: src/Moderation/ModerationQueue.php
declare(strict_types=1);
namespace App\Moderation;
class ModerationQueue
{
public function __construct(
private \PDO $db,
private \Redis $redis
) {}
/**
* Add item to moderation queue
*/
public function add(
string $content,
ModerationResult $result,
array $context = []
): int {
$stmt = $this->db->prepare(
"INSERT INTO moderation_queue
(content, content_type, user_id, violations, severity, context, created_at, status)
VALUES (:content, :type, :user_id, :violations, :severity, :context, NOW(), 'pending')"
);
$stmt->execute([
':content' => $content,
':type' => $context['content_type'] ?? 'text',
':user_id' => $context['user_id'] ?? null,
':violations' => json_encode($result->violations),
':severity' => $result->severity,
':context' => json_encode($context)
]);
$queueId = (int)$this->db->lastInsertId();
// Add to Redis for real-time processing
$this->redis->zadd(
'moderation:queue',
$this->getSeverityScore($result->severity),
(string)$queueId
);
return $queueId;
}
/**
* Get next item from queue
*/
public function getNext(string $moderatorId): ?array
{
// Get highest priority item
$items = $this->redis->zrevrange('moderation:queue', 0, 0);
if (empty($items)) {
return null;
}
$queueId = (int)$items[0];
// Claim the item
if ($this->claimItem($queueId, $moderatorId)) {
return $this->getItem($queueId);
}
return null;
}
/**
* Resolve moderation item
*/
public function resolve(
int $queueId,
string $moderatorId,
string $decision,
string $notes = ''
): void {
$stmt = $this->db->prepare(
"UPDATE moderation_queue
SET status = :status,
moderator_id = :moderator_id,
decision = :decision,
moderator_notes = :notes,
resolved_at = NOW()
WHERE id = :id"
);
$stmt->execute([
':id' => $queueId,
':status' => 'resolved',
':moderator_id' => $moderatorId,
':decision' => $decision,
':notes' => $notes
]);
// Remove from Redis queue
$this->redis->zrem('moderation:queue', (string)$queueId);
}
/**
* Get queue statistics
*/
public function getStats(): array
{
$stmt = $this->db->query(
"SELECT
COUNT(*) as total,
SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) as pending,
SUM(CASE WHEN status = 'in_review' THEN 1 ELSE 0 END) as in_review,
SUM(CASE WHEN status = 'resolved' THEN 1 ELSE 0 END) as resolved,
AVG(TIMESTAMPDIFF(MINUTE, created_at, resolved_at)) as avg_resolution_minutes
FROM moderation_queue
WHERE created_at > DATE_SUB(NOW(), INTERVAL 24 HOUR)"
);
return $stmt->fetch(\PDO::FETCH_ASSOC);
}
private function claimItem(int $queueId, string $moderatorId): bool
{
$stmt = $this->db->prepare(
"UPDATE moderation_queue
SET status = 'in_review',
moderator_id = :moderator_id,
claimed_at = NOW()
WHERE id = :id
AND status = 'pending'"
);
$stmt->execute([
':id' => $queueId,
':moderator_id' => $moderatorId
]);
return $stmt->rowCount() > 0;
}
private function getItem(int $queueId): ?array
{
$stmt = $this->db->prepare(
"SELECT * FROM moderation_queue WHERE id = :id"
);
$stmt->execute([':id' => $queueId]);
return $stmt->fetch(\PDO::FETCH_ASSOC) ?: null;
}
private function getSeverityScore(string $severity): float
{
return match($severity) {
'critical' => 1000,
'high' => 750,
'medium' => 500,
'low' => 250,
default => 0
};
}
}

Policy Engine
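The engine in this section matches violations to policies by category and a minimum severity floor. That core test, distilled from `matchesPolicy` into a free function for clarity (the function name is illustrative):

```php
<?php
declare(strict_types=1);

// A violation triggers a policy when its category is covered and
// its severity clears the policy's floor. Missing fields default open.
function policyApplies(array $violation, array $policy): bool
{
    if (isset($policy['categories'])
        && !in_array($violation['category'], $policy['categories'], true)) {
        return false;
    }
    return ($violation['severity'] ?? 0.0) >= ($policy['min_severity'] ?? 0.0);
}
```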
<?php
# filename: src/Moderation/PolicyEngine.php
declare(strict_types=1);
namespace App\Moderation;
class PolicyEngine
{
private array $policies;
public function __construct(array $policies = [])
{
$this->policies = $policies ?: $this->getDefaultPolicies();
}
/**
* Check content against policies
*/
public function checkViolations(
ContentAnalysis $analysis,
array $context = []
): array {
$violations = [];
foreach ($analysis->violations as $violation) {
$policy = $this->findMatchingPolicy($violation);
if ($policy && $this->violatesPolicy($violation, $policy, $context)) {
$violations[] = new PolicyViolation(
policy: $policy['name'],
category: $violation['category'],
type: $violation['type'],
severity: $violation['severity'],
severityScore: $this->calculateSeverityScore($violation, $policy),
reason: $violation['reason'],
evidence: $violation['evidence'] ?? null,
action: $policy['action']
);
}
}
return $violations;
}
private function findMatchingPolicy(array $violation): ?array
{
foreach ($this->policies as $policy) {
if ($this->matchesPolicy($violation, $policy)) {
return $policy;
}
}
return null;
}
private function matchesPolicy(array $violation, array $policy): bool
{
// Check if violation category matches policy
if (isset($policy['categories']) &&
!in_array($violation['category'], $policy['categories'])) {
return false;
}
// Check minimum severity threshold
if (isset($policy['min_severity']) &&
$violation['severity'] < $policy['min_severity']) {
return false;
}
return true;
}
private function violatesPolicy(
array $violation,
array $policy,
array $context
): bool {
// Apply policy-specific rules
if (isset($policy['rules'])) {
foreach ($policy['rules'] as $rule) {
if (!$this->checkRule($rule, $violation, $context)) {
return false;
}
}
}
return true;
}
private function checkRule(string $rule, array $violation, array $context): bool
{
// Implement custom rule logic
return match($rule) {
'requires_high_confidence' => ($violation['confidence'] ?? 0) > 0.8,
'context_aware' => $this->isContextAppropriate($violation, $context),
default => true
};
}
private function isContextAppropriate(array $violation, array $context): bool
{
// Check if content is appropriate for context
// E.g., medical discussion in health forum, etc.
return true; // Simplified
}
private function calculateSeverityScore(array $violation, array $policy): float
{
$baseScore = $violation['severity'];
// Apply policy modifiers
if (isset($policy['severity_multiplier'])) {
$baseScore *= $policy['severity_multiplier'];
}
return min($baseScore, 1.0);
}
private function getDefaultPolicies(): array
{
return [
[
'name' => 'hate_speech',
'categories' => ['hate_speech', 'discrimination'],
'min_severity' => 0.7,
'action' => 'block',
'severity_multiplier' => 1.2
],
[
'name' => 'threats',
'categories' => ['threats', 'violence'],
'min_severity' => 0.6,
'action' => 'block',
'severity_multiplier' => 1.5
],
[
'name' => 'spam',
'categories' => ['spam'],
'min_severity' => 0.7,
'action' => 'block'
],
[
'name' => 'pii',
'categories' => ['pii'],
'min_severity' => 0.5,
'action' => 'flag'
]
];
}
}

Audit Logger
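The logger in this section stores a SHA-256 hash plus a 200-character preview rather than full content, which supports exact-match lookup without retaining everything. The same pattern in isolation (a hypothetical helper mirroring the logger's fields):

```php
<?php
declare(strict_types=1);

// Privacy-conscious log fields: a stable hash for exact lookup and
// a short preview for human triage.
function auditFields(string $content, int $previewLen = 200): array
{
    return [
        'hash'    => hash('sha256', $content),
        'preview' => mb_substr($content, 0, $previewLen),
    ];
}
```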
<?php
# filename: src/Moderation/AuditLogger.php
declare(strict_types=1);
namespace App\Moderation;
class AuditLogger
{
public function __construct(
private \PDO $db
) {}
/**
* Log moderation decision for audit trail
*/
public function log(
string $content,
ModerationResult $result,
ModerationAction $action,
array $context = []
): int {
$stmt = $this->db->prepare(
"INSERT INTO moderation_audit_log
(content_hash, content_preview, approved, action, severity, violations,
explanation, confidence, context, user_id, ip_address, created_at)
VALUES (:hash, :preview, :approved, :action, :severity, :violations,
:explanation, :confidence, :context, :user_id, :ip, NOW())"
);
$contentHash = hash('sha256', $content);
$contentPreview = mb_substr($content, 0, 200);
$stmt->execute([
':hash' => $contentHash,
':preview' => $contentPreview,
':approved' => $result->approved ? 1 : 0,
':action' => $action->type,
':severity' => $result->severity,
':violations' => json_encode($result->violations),
':explanation' => $result->explanation,
':confidence' => $result->confidence,
':context' => json_encode($context),
':user_id' => $context['user_id'] ?? null,
':ip' => $context['ip_address'] ?? null
]);
return (int)$this->db->lastInsertId();
}
/**
* Get audit log entries for a user
*/
public function getUserLog(string $userId, int $limit = 100): array
{
$stmt = $this->db->prepare(
"SELECT * FROM moderation_audit_log
WHERE user_id = :user_id
ORDER BY created_at DESC
LIMIT :limit"
);
$stmt->bindValue(':user_id', $userId);
$stmt->bindValue(':limit', $limit, \PDO::PARAM_INT);
$stmt->execute();
return $stmt->fetchAll(\PDO::FETCH_ASSOC);
}
/**
* Get audit statistics
*/
public function getStats(array $filters = []): array
{
$where = [];
$params = [];
if (isset($filters['user_id'])) {
$where[] = "user_id = :user_id";
$params[':user_id'] = $filters['user_id'];
}
if (isset($filters['date_from'])) {
$where[] = "created_at >= :date_from";
$params[':date_from'] = $filters['date_from'];
}
if (isset($filters['date_to'])) {
$where[] = "created_at <= :date_to";
$params[':date_to'] = $filters['date_to'];
}
$whereClause = !empty($where) ? 'WHERE ' . implode(' AND ', $where) : '';
$stmt = $this->db->prepare(
"SELECT
COUNT(*) as total_decisions,
SUM(CASE WHEN approved = 1 THEN 1 ELSE 0 END) as approved_count,
SUM(CASE WHEN approved = 0 THEN 1 ELSE 0 END) as blocked_count,
AVG(confidence) as avg_confidence,
COUNT(DISTINCT user_id) as unique_users
FROM moderation_audit_log
{$whereClause}"
);
$stmt->execute($params);
return $stmt->fetch(\PDO::FETCH_ASSOC);
}
/**
* Search audit log by content hash or preview
*/
public function search(string $query, int $limit = 50): array
{
$stmt = $this->db->prepare(
"SELECT * FROM moderation_audit_log
WHERE content_preview LIKE :query
OR content_hash = :hash
ORDER BY created_at DESC
LIMIT :limit"
);
$stmt->bindValue(':query', '%' . $query . '%');
$stmt->bindValue(':hash', hash('sha256', $query));
$stmt->bindValue(':limit', $limit, \PDO::PARAM_INT);
$stmt->execute();
return $stmt->fetchAll(\PDO::FETCH_ASSOC);
}
}

Moderator Workflow System
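Dashboard statistics in this section come straight from SQL; derived metrics are then computed in PHP. For example, an approval rate over the statistics row — a hypothetical helper that guards against division by zero:

```php
<?php
declare(strict_types=1);

// Approval rate from a statistics row shaped like getStatistics()
// output: total_decisions and approved counts.
function approvalRate(array $stats): float
{
    $total = (int) ($stats['total_decisions'] ?? 0);
    if ($total === 0) {
        return 0.0;
    }
    return ((int) ($stats['approved'] ?? 0)) / $total;
}
```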
<?php
# filename: src/Moderation/ModeratorWorkflow.php
declare(strict_types=1);
namespace App\Moderation;
class ModeratorWorkflow
{
public function __construct(
private \PDO $db,
private ModerationQueue $queue,
private AuditLogger $auditLogger
) {}
/**
* Get moderator dashboard with queued items
*/
public function getDashboard(string $moderatorId, array $filters = []): array
{
return [
'statistics' => $this->getStatistics($moderatorId),
'pending_queue' => $this->getPendingQueue($filters),
'recent_decisions' => $this->getRecentDecisions($moderatorId),
'performance_metrics' => $this->getPerformanceMetrics($moderatorId)
];
}
/**
* Get moderator statistics
*/
private function getStatistics(string $moderatorId): array
{
$stmt = $this->db->prepare(
"SELECT
COUNT(*) as total_decisions,
AVG(TIMESTAMPDIFF(MINUTE, claimed_at, resolved_at)) as avg_resolution_minutes,
SUM(CASE WHEN decision = 'approve' THEN 1 ELSE 0 END) as approved,
SUM(CASE WHEN decision = 'reject' THEN 1 ELSE 0 END) as rejected,
SUM(CASE WHEN decision = 'flag' THEN 1 ELSE 0 END) as flagged
FROM moderation_queue
WHERE moderator_id = :moderator_id
AND status = 'resolved'
AND resolved_at > DATE_SUB(NOW(), INTERVAL 7 DAY)"
);
$stmt->execute([':moderator_id' => $moderatorId]);
return $stmt->fetch(\PDO::FETCH_ASSOC) ?: [];
}
/**
* Get pending items for moderator review
*/
private function getPendingQueue(array $filters = []): array
{
$where = ['status = ?'];
$params = ['in_review'];
if (isset($filters['severity'])) {
$where[] = 'severity = ?';
$params[] = $filters['severity'];
}
if (isset($filters['user_id'])) {
$where[] = 'user_id = ?';
$params[] = $filters['user_id'];
}
$whereClause = implode(' AND ', $where);
$stmt = $this->db->prepare(
"SELECT * FROM moderation_queue
WHERE {$whereClause}
ORDER BY severity DESC, created_at ASC
LIMIT 50"
);
$stmt->execute($params);
return $stmt->fetchAll(\PDO::FETCH_ASSOC);
}
/**
* Get moderator's recent decisions
*/
private function getRecentDecisions(string $moderatorId, int $limit = 20): array
{
$stmt = $this->db->prepare(
"SELECT * FROM moderation_queue
WHERE moderator_id = :moderator_id
AND status = 'resolved'
ORDER BY resolved_at DESC
LIMIT :limit"
);
$stmt->bindValue(':moderator_id', $moderatorId);
$stmt->bindValue(':limit', $limit, \PDO::PARAM_INT);
$stmt->execute();
return $stmt->fetchAll(\PDO::FETCH_ASSOC);
}
/**
* Get moderator performance metrics
*/
private function getPerformanceMetrics(string $moderatorId): array
{
$stmt = $this->db->prepare(
"SELECT
COUNT(*) as total_items_reviewed,
COUNT(DISTINCT user_id) as unique_users_reviewed,
COUNT(DISTINCT DATE(resolved_at)) as days_active,
MIN(TIMESTAMPDIFF(MINUTE, claimed_at, resolved_at)) as min_review_time,
MAX(TIMESTAMPDIFF(MINUTE, claimed_at, resolved_at)) as max_review_time
FROM moderation_queue
WHERE moderator_id = :moderator_id
AND status = 'resolved'"
);
$stmt->execute([':moderator_id' => $moderatorId]);
return $stmt->fetch(\PDO::FETCH_ASSOC) ?: [];
}
/**
* Submit moderator decision
*/
public function submitDecision(
int $queueId,
string $moderatorId,
string $decision,
string $notes = '',
array $reasoning = []
): bool {
try {
$this->db->beginTransaction();
// Update queue item
$stmt = $this->db->prepare(
"UPDATE moderation_queue
SET decision = :decision,
status = 'resolved',
moderator_id = :moderator_id,
moderator_notes = :notes,
resolved_at = NOW()
WHERE id = :id"
);
$stmt->execute([
':decision' => $decision,
':moderator_id' => $moderatorId,
':notes' => $notes,
':id' => $queueId
]);
// Log in audit trail
$this->auditLogger->log(
content: $this->getContentPreview($queueId),
result: new ModerationResult(
approved: $decision === 'approve',
violations: [],
severity: 'none',
action: $decision,
explanation: "Moderator decision: {$notes}",
confidence: 1.0
),
action: new ModerationAction(
approved: $decision === 'approve',
type: $decision,
severity: 'none',
requiresHumanReview: false,
explanation: "Moderated by {$moderatorId}"
),
context: ['moderator_reasoning' => $reasoning]
);
$this->db->commit();
return true;
} catch (\Exception $e) {
$this->db->rollBack();
error_log("Decision submission error: " . $e->getMessage());
return false;
}
}
/**
* Appeal moderation decision
*/
public function createAppeal(
int $queueId,
string $userId,
string $reason,
string $evidence = ''
): int {
$stmt = $this->db->prepare(
"INSERT INTO moderation_appeals
(queue_id, user_id, reason, evidence, status, created_at)
VALUES (:queue_id, :user_id, :reason, :evidence, 'pending', NOW())"
);
$stmt->execute([
':queue_id' => $queueId,
':user_id' => $userId,
':reason' => $reason,
':evidence' => $evidence
]);
return (int)$this->db->lastInsertId();
}
/**
* Get appeal details
*/
public function getAppeal(int $appealId): ?array
{
$stmt = $this->db->prepare(
"SELECT mq.*, ma.reason, ma.evidence, ma.status as appeal_status
FROM moderation_appeals ma
JOIN moderation_queue mq ON ma.queue_id = mq.id
WHERE ma.id = :id"
);
$stmt->execute([':id' => $appealId]);
return $stmt->fetch(\PDO::FETCH_ASSOC) ?: null;
}
private function getContentPreview(int $queueId): string
{
$stmt = $this->db->prepare(
"SELECT content FROM moderation_queue WHERE id = :id"
);
$stmt->execute([':id' => $queueId]);
$result = $stmt->fetch(\PDO::FETCH_ASSOC);
return is_array($result) ? (string)($result['content'] ?? '') : '';
}
}
Complete Moderation API
<?php
# filename: api/moderate.php
declare(strict_types=1);
require __DIR__ . '/../vendor/autoload.php';
use Anthropic\Anthropic;
use App\Moderation\ModerationSystem;
use App\Moderation\ContentAnalyzer;
use App\Moderation\PolicyEngine;
use App\Moderation\ModerationQueue;
use App\Moderation\AuditLogger;
header('Content-Type: application/json');
// Initialize
$db = new PDO(getenv('DATABASE_DSN'));
$redis = new Redis();
$redis->connect('localhost', 6379);
$claude = Anthropic::factory()
->withApiKey(getenv('ANTHROPIC_API_KEY'))
->make();
$analyzer = new ContentAnalyzer($claude);
$policyEngine = new PolicyEngine();
$queue = new ModerationQueue($db, $redis);
$auditLogger = new AuditLogger($db);
$moderationSystem = new ModerationSystem(
claude: $claude,
analyzer: $analyzer,
policyEngine: $policyEngine,
queue: $queue,
auditLogger: $auditLogger
);
// Handle request
$input = json_decode(file_get_contents('php://input'), true);
if (!isset($input['content'])) {
http_response_code(400);
echo json_encode(['error' => 'content required']);
exit;
}
try {
$result = $moderationSystem->moderateContent(
content: $input['content'],
contentType: $input['type'] ?? 'text',
context: $input['context'] ?? []
);
echo json_encode([
'approved' => $result->approved,
'action' => $result->action,
'severity' => $result->severity,
'violations' => array_map(fn($v) => [
'category' => $v->category,
'type' => $v->type,
'reason' => $v->reason
], $result->violations),
'explanation' => $result->explanation,
'confidence' => $result->confidence
]);
} catch (\Exception $e) {
http_response_code(500);
echo json_encode(['error' => $e->getMessage()]);
}Database Schema
-- Moderation queue table
CREATE TABLE moderation_queue (
id INT AUTO_INCREMENT PRIMARY KEY,
content TEXT NOT NULL,
content_type VARCHAR(50) DEFAULT 'text',
user_id VARCHAR(255) NULL,
violations JSON NULL,
severity ENUM('none', 'low', 'medium', 'high', 'critical') DEFAULT 'none',
context JSON NULL,
status ENUM('pending', 'in_review', 'resolved') DEFAULT 'pending',
moderator_id VARCHAR(255) NULL,
decision ENUM('approve', 'reject', 'flag') NULL,
moderator_notes TEXT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
claimed_at TIMESTAMP NULL,
resolved_at TIMESTAMP NULL,
INDEX idx_status (status),
INDEX idx_severity (severity),
INDEX idx_user_id (user_id),
INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- Audit log table
CREATE TABLE moderation_audit_log (
id INT AUTO_INCREMENT PRIMARY KEY,
content_hash VARCHAR(64) NOT NULL,
content_preview VARCHAR(200) NOT NULL,
approved BOOLEAN NOT NULL,
action VARCHAR(50) NOT NULL,
severity ENUM('none', 'low', 'medium', 'high', 'critical') NOT NULL,
violations JSON NULL,
explanation TEXT NULL,
confidence DECIMAL(3,2) DEFAULT 0.00,
context JSON NULL,
user_id VARCHAR(255) NULL,
ip_address VARCHAR(45) NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_content_hash (content_hash),
INDEX idx_user_id (user_id),
INDEX idx_approved (approved),
INDEX idx_action (action),
INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- User content table (for spam detection)
CREATE TABLE user_content (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id VARCHAR(255) NOT NULL,
content TEXT NOT NULL,
content_type VARCHAR(50) DEFAULT 'text',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_user_id (user_id),
INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- Users table (for account age tracking)
CREATE TABLE users (
id VARCHAR(255) PRIMARY KEY,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- Moderation appeals table
CREATE TABLE moderation_appeals (
id INT AUTO_INCREMENT PRIMARY KEY,
queue_id INT NOT NULL,
user_id VARCHAR(255) NOT NULL,
reason TEXT NOT NULL,
evidence TEXT NULL,
status ENUM('pending', 'approved', 'rejected') DEFAULT 'pending',
reviewed_by VARCHAR(255) NULL,
reviewed_at TIMESTAMP NULL,
reviewer_notes TEXT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (queue_id) REFERENCES moderation_queue(id),
INDEX idx_status (status),
INDEX idx_user_id (user_id),
INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Analytics and Metrics
<?php
# filename: src/Moderation/ModerationAnalytics.php
declare(strict_types=1);
namespace App\Moderation;
class ModerationAnalytics
{
public function __construct(
private \PDO $db
) {}
/**
* Get comprehensive moderation metrics
*/
public function getMetrics(\DateTime $startDate, \DateTime $endDate): array
{
return [
'total_moderated' => $this->getTotalModerated($startDate, $endDate),
'approval_rate' => $this->getApprovalRate($startDate, $endDate),
'block_rate' => $this->getBlockRate($startDate, $endDate),
'average_confidence' => $this->getAverageConfidence($startDate, $endDate),
'violation_distribution' => $this->getViolationDistribution($startDate, $endDate),
'severity_distribution' => $this->getSeverityDistribution($startDate, $endDate),
'queue_stats' => $this->getQueueStats($startDate, $endDate),
'top_violations' => $this->getTopViolations($startDate, $endDate),
'pii_detections' => $this->getPIIDetections($startDate, $endDate),
'spam_detections' => $this->getSpamDetections($startDate, $endDate)
];
}
private function getTotalModerated(\DateTime $start, \DateTime $end): int
{
$stmt = $this->db->prepare(
"SELECT COUNT(*) as count FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
return (int)$stmt->fetch(\PDO::FETCH_ASSOC)['count'];
}
private function getApprovalRate(\DateTime $start, \DateTime $end): float
{
$total = $this->getTotalModerated($start, $end);
if ($total === 0) {
return 0.0;
}
$stmt = $this->db->prepare(
"SELECT COUNT(*) as count FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end
AND approved = 1"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
$approved = (int)$stmt->fetch(\PDO::FETCH_ASSOC)['count'];
return ($approved / $total) * 100;
}
private function getBlockRate(\DateTime $start, \DateTime $end): float
{
$total = $this->getTotalModerated($start, $end);
if ($total === 0) {
return 0.0;
}
$stmt = $this->db->prepare(
"SELECT COUNT(*) as count FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end
AND approved = 0"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
$blocked = (int)$stmt->fetch(\PDO::FETCH_ASSOC)['count'];
return ($blocked / $total) * 100;
}
private function getAverageConfidence(\DateTime $start, \DateTime $end): float
{
$stmt = $this->db->prepare(
"SELECT AVG(confidence) as avg_confidence FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
$result = $stmt->fetch(\PDO::FETCH_ASSOC);
return (float)($result['avg_confidence'] ?? 0);
}
private function getViolationDistribution(\DateTime $start, \DateTime $end): array
{
$stmt = $this->db->prepare(
"SELECT violations FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end
AND violations IS NOT NULL"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
$distribution = [];
while ($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
$violations = json_decode($row['violations'], true) ?? [];
foreach ($violations as $violation) {
$category = $violation['category'] ?? 'unknown';
$distribution[$category] = ($distribution[$category] ?? 0) + 1;
}
}
arsort($distribution);
return $distribution;
}
private function getSeverityDistribution(\DateTime $start, \DateTime $end): array
{
$stmt = $this->db->prepare(
"SELECT severity, COUNT(*) as count
FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end
GROUP BY severity"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
$distribution = [];
while ($row = $stmt->fetch(\PDO::FETCH_ASSOC)) {
$distribution[$row['severity']] = (int)$row['count'];
}
return $distribution;
}
private function getQueueStats(\DateTime $start, \DateTime $end): array
{
$stmt = $this->db->prepare(
"SELECT
COUNT(*) as total,
SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) as pending,
SUM(CASE WHEN status = 'in_review' THEN 1 ELSE 0 END) as in_review,
SUM(CASE WHEN status = 'resolved' THEN 1 ELSE 0 END) as resolved,
AVG(TIMESTAMPDIFF(MINUTE, created_at, resolved_at)) as avg_resolution_minutes
FROM moderation_queue
WHERE created_at BETWEEN :start AND :end"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
return $stmt->fetch(\PDO::FETCH_ASSOC) ?: [];
}
private function getTopViolations(\DateTime $start, \DateTime $end, int $limit = 10): array
{
$distribution = $this->getViolationDistribution($start, $end);
return array_slice($distribution, 0, $limit, true);
}
private function getPIIDetections(\DateTime $start, \DateTime $end): int
{
$stmt = $this->db->prepare(
"SELECT COUNT(*) as count FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end
AND violations LIKE '%pii%'"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
return (int)$stmt->fetch(\PDO::FETCH_ASSOC)['count'];
}
private function getSpamDetections(\DateTime $start, \DateTime $end): int
{
$stmt = $this->db->prepare(
"SELECT COUNT(*) as count FROM moderation_audit_log
WHERE created_at BETWEEN :start AND :end
AND violations LIKE '%spam%'"
);
$stmt->execute([
':start' => $start->format('Y-m-d H:i:s'),
':end' => $end->format('Y-m-d H:i:s')
]);
return (int)$stmt->fetch(\PDO::FETCH_ASSOC)['count'];
}
}
Usage Examples
<?php
# filename: examples/moderate-content.php
declare(strict_types=1);
require __DIR__ . '/../vendor/autoload.php';
use App\Moderation\ModerationSystem;
use App\Moderation\ContentAnalyzer;
use App\Moderation\PolicyEngine;
use App\Moderation\ModerationQueue;
use App\Moderation\AuditLogger;
use App\Moderation\ToxicityDetector;
use App\Moderation\PIIDetector;
use App\Moderation\SpamDetector;
use Anthropic\Anthropic;
// Initialize components
$db = new PDO(getenv('DATABASE_DSN'));
$redis = new Redis();
$redis->connect('localhost', 6379);
$claude = Anthropic::factory()
->withApiKey(getenv('ANTHROPIC_API_KEY'))
->make();
$analyzer = new ContentAnalyzer($claude);
$policyEngine = new PolicyEngine();
$queue = new ModerationQueue($db, $redis);
$auditLogger = new AuditLogger($db);
$moderationSystem = new ModerationSystem(
claude: $claude,
analyzer: $analyzer,
policyEngine: $policyEngine,
queue: $queue,
auditLogger: $auditLogger
);
// Example 1: Moderate text content
echo "Example 1: Moderate Text Content\n";
echo str_repeat('=', 50) . "\n";
$result = $moderationSystem->moderateContent(
content: "This is a great product! I love it.",
contentType: 'text',
context: ['user_id' => 'user123', 'ip_address' => '192.168.1.1']
);
echo "Approved: " . ($result->approved ? 'Yes' : 'No') . "\n";
echo "Action: {$result->action}\n";
echo "Severity: {$result->severity}\n";
echo "Confidence: " . round($result->confidence * 100, 2) . "%\n";
echo "Explanation: {$result->explanation}\n";
echo "\n";
// Example 2: Detect toxic language
echo "Example 2: Detect Toxic Language\n";
echo str_repeat('=', 50) . "\n";
$toxicityDetector = new ToxicityDetector($claude);
$toxicityReport = $toxicityDetector->detect(
"This is educational content discussing hate speech patterns."
);
echo "Is Toxic: " . ($toxicityReport->isToxic ? 'Yes' : 'No') . "\n";
echo "Toxicity Score: " . round($toxicityReport->toxicityScore * 100, 2) . "%\n";
echo "Categories: " . implode(', ', $toxicityReport->categories) . "\n";
echo "Recommendation: {$toxicityReport->recommendation}\n";
echo "\n";
// Example 3: Detect and redact PII
echo "Example 3: Detect and Redact PII\n";
echo str_repeat('=', 50) . "\n";
$piiDetector = new PIIDetector($claude);
$textWithPII = "Contact me at john.doe@example.com or call 555-123-4567";
$piiReport = $piiDetector->detect($textWithPII);
echo "Has PII: " . ($piiReport->hasPII ? 'Yes' : 'No') . "\n";
echo "Risk Level: {$piiReport->riskLevel}\n";
echo "Original: {$textWithPII}\n";
if ($piiReport->hasPII) {
$redacted = $piiDetector->redact($textWithPII, $piiReport);
echo "Redacted: {$redacted}\n";
}
echo "\n";
// Example 4: Detect spam
echo "Example 4: Detect Spam\n";
echo str_repeat('=', 50) . "\n";
$spamDetector = new SpamDetector($claude, $db);
$spamReport = $spamDetector->detect(
content: "Click here now! Amazing deals! Buy now!",
userId: 'user456',
context: []
);
echo "Is Spam: " . ($spamReport->isSpam ? 'Yes' : 'No') . "\n";
echo "Spam Score: " . round($spamReport->spamScore * 100, 2) . "%\n";
echo "Type: {$spamReport->type}\n";
echo "Indicators: " . implode(', ', $spamReport->indicators) . "\n";
echo "\n";
// Example 5: Batch moderation
echo "Example 5: Batch Moderation\n";
echo str_repeat('=', 50) . "\n";
$items = [
['id' => '1', 'content' => 'Great post!', 'type' => 'text'],
['id' => '2', 'content' => 'Spam content here', 'type' => 'text'],
['id' => '3', 'content' => 'Normal discussion', 'type' => 'text']
];
$batchResults = $moderationSystem->moderateBatch($items);
foreach ($batchResults as $itemId => $result) {
echo "Item {$itemId}: " . ($result->approved ? 'Approved' : 'Blocked') . "\n";
echo " Action: {$result->action}, Severity: {$result->severity}\n";
}
echo "\n";
// Example 6: Get queue statistics
echo "Example 6: Queue Statistics\n";
echo str_repeat('=', 50) . "\n";
$stats = $queue->getStats();
echo "Total Items: {$stats['total']}\n";
echo "Pending: {$stats['pending']}\n";
echo "In Review: {$stats['in_review']}\n";
echo "Resolved: {$stats['resolved']}\n";
echo "Avg Resolution Time: " . round($stats['avg_resolution_minutes'] ?? 0, 2) . " minutes\n";
Data Structures
<?php
# filename: src/Moderation/DataStructures.php
declare(strict_types=1);
namespace App\Moderation;
readonly class ModerationResult
{
public function __construct(
public bool $approved,
public array $violations,
public string $severity,
public string $action,
public string $explanation,
public float $confidence
) {}
}
readonly class ModerationAction
{
public function __construct(
public bool $approved,
public string $type,
public string $severity,
public bool $requiresHumanReview,
public string $explanation
) {}
}
class ContentAnalysis
{
public array $violations;
public array $piiDetected;
public float $safetyScore;
public bool $requiresReview;
public string $suggestedAction;
public float $confidence;
public function __construct(array $data)
{
$this->violations = $data['violations'] ?? [];
$this->piiDetected = $data['pii_detected'] ?? [];
$this->safetyScore = $data['overall_safety_score'] ?? 1.0;
$this->requiresReview = $data['requires_human_review'] ?? false;
$this->suggestedAction = $data['suggested_action'] ?? 'approve';
$this->confidence = $this->calculateConfidence($data);
}
private function calculateConfidence(array $data): float
{
if (empty($data['violations'])) {
return 0.95;
}
$confidences = array_column($data['violations'], 'confidence');
return !empty($confidences) ? array_sum($confidences) / count($confidences) : 0.5;
}
}
readonly class PolicyViolation
{
public function __construct(
public string $policy,
public string $category,
public string $type,
public string $severity,
public float $severityScore,
public string $reason,
public ?string $evidence,
public string $action
) {}
}
readonly class ToxicityReport
{
public function __construct(
public bool $isToxic,
public float $toxicityScore,
public array $categories,
public array $targetedGroups,
public array $contextualFactors,
public string $recommendation
) {}
}
readonly class PIIReport
{
public function __construct(
public bool $hasPII,
public array $items,
public string $riskLevel
) {}
}
readonly class SpamReport
{
public function __construct(
public bool $isSpam,
public float $spamScore,
public array $indicators,
public string $type,
public string $recommendation
) {}
}
Wrap-up
Congratulations! You've built a comprehensive content moderation system. Here's what you've accomplished:
- ✓ Moderation System Core: Created an intelligent moderation platform that analyzes content with context awareness
- ✓ Content Analyzer: Implemented multi-category violation detection (toxic language, spam, PII, inappropriate content, misinformation, copyright)
- ✓ Toxicity Detection: Built context-aware toxic language detection that distinguishes legitimate discussion from harmful content
- ✓ PII Detection: Developed pattern-based and AI-enhanced PII detection with automatic redaction capabilities
- ✓ Spam Detection: Created behavioral analysis that combines content patterns with user behavior indicators
- ✓ Moderation Queue: Implemented priority-based queue system with Redis for real-time processing
- ✓ Policy Engine: Designed flexible policy enforcement with customizable rules and severity scoring
- ✓ Audit Logging: Built comprehensive audit trails for accountability and appeal processes
- ✓ Database Schema: Designed normalized database tables for queue management, audit logging, and appeals
- ✓ Analytics Dashboard: Created metrics tracking for moderation performance, violation distribution, and queue statistics
- ✓ Moderator Workflow: Implemented moderator dashboards, decision submission, and performance metrics
- ✓ Appeal System: Developed user appeal workflows with moderator review process
- ✓ Usage Examples: Provided practical examples demonstrating all moderation features
Key Concepts Learned
- Context-Aware Analysis: Claude's understanding of nuance reduces false positives by distinguishing satire, quotes, and educational content from actual violations
- Multi-Layered Detection: Combining pattern matching (fast) with AI analysis (accurate) provides both speed and precision
- Severity Scoring: Quantitative severity scores enable automated decision-making while maintaining human oversight for edge cases
- Priority Queues: Redis sorted sets enable efficient priority-based processing of moderation items
- Policy Flexibility: Configurable policy engine allows different rules for different content types and contexts
- Privacy Protection: PII detection and redaction protect user privacy and ensure GDPR/CCPA compliance
- Behavioral Analysis: User behavior patterns (posting frequency, account age, repetitive content) enhance spam detection accuracy
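The severity-scoring concept above can be sketched as a small mapping function. This is an illustrative helper, not one of the chapter's classes; the thresholds mirror the scoring guide in the analysis system prompt shown later in Troubleshooting and should be tuned to your own policies:

```php
<?php
declare(strict_types=1);

// Illustrative helper: map a quantitative severity score (0.0-1.0) to a
// moderation action. Thresholds follow the analysis system prompt's guide:
// 0.0-0.3 acceptable, 0.4-0.6 review, 0.7-0.8 block, 0.9-1.0 immediate block.
function severityToAction(float $score): string
{
    return match (true) {
        $score >= 0.9 => 'block_immediate',
        $score >= 0.7 => 'block',
        $score >= 0.4 => 'flag_for_review',
        default => 'approve',
    };
}

echo severityToAction(0.55); // flag_for_review
```

Keeping the mapping in one place makes threshold changes auditable: when policy shifts, you change a single function rather than scattered comparisons.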
Next Steps
Your moderation system is production-ready, but consider these enhancements:
- Add image moderation using vision models for inappropriate visual content
- Implement machine learning to improve detection accuracy from moderator feedback
- Build a moderator dashboard UI for reviewing flagged content
- Extend the appeal workflows built in this chapter so users can track the status of contested decisions
- Integrate with user reputation systems to adjust moderation thresholds
- Create automated reporting for compliance and legal requirements
- Add multi-language support with language-specific policy rules
- Implement rate limiting and throttling to prevent abuse
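The rate-limiting enhancement in the last bullet can be prototyped with a fixed-window counter. The class and method names below are hypothetical, and the counters live in memory only to keep the sketch self-contained; a real deployment would store them in Redis (INCR plus EXPIRE) so limits hold across PHP workers:

```php
<?php
declare(strict_types=1);

// Fixed-window rate limiter sketch (hypothetical class, in-memory only).
// Production code would keep the counters in Redis so every worker
// enforces the same limits.
final class FixedWindowLimiter
{
    /** @var array<string, array{count: int, window: int}> */
    private array $counters = [];

    public function __construct(
        private int $maxPerWindow = 10,
        private int $windowSeconds = 60
    ) {}

    public function allow(string $userId, ?int $now = null): bool
    {
        $now ??= time();
        $window = intdiv($now, $this->windowSeconds);
        $entry = $this->counters[$userId] ?? ['count' => 0, 'window' => $window];
        if ($entry['window'] !== $window) {
            // New time window: reset the counter
            $entry = ['count' => 0, 'window' => $window];
        }
        $entry['count']++;
        $this->counters[$userId] = $entry;
        return $entry['count'] <= $this->maxPerWindow;
    }
}

$limiter = new FixedWindowLimiter(maxPerWindow: 3);
var_dump($limiter->allow('user123')); // first call in the window: bool(true)
```

A fixed window is simple but allows bursts at window boundaries; a sliding-window or token-bucket variant smooths that out at the cost of more bookkeeping.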
Unique Features of Content Moderation
Unlike customer support (Chapter 28) or data extraction (Chapter 30), content moderation includes:
- Policy Enforcement: Customizable policy engines with severity scoring
- PII Detection & Redaction: Privacy-first approach protecting user data
- Behavioral Analysis: User behavior pattern detection for spam (account age, posting frequency)
- Multi-Layered Detection: Combining pattern matching + AI for accuracy and speed
- Human Review Workflows: Queue prioritization for moderator review
- Audit Trails: Complete logging for compliance, appeals, and accountability
- Severity Scoring: Quantitative metrics for automated decision-making
- False Positive Reduction: Context-aware analysis reducing legitimate content flagging
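To make the pattern-matching layer concrete, here is a minimal regex-based redactor of the kind a fast first pass might use. It is a simplified stand-in for the chapter's PIIDetector, covering only obvious email and US-style phone formats; the AI layer handles everything the patterns miss:

```php
<?php
declare(strict_types=1);

// Simplified first-pass PII redaction with regular expressions.
// Catches only obvious emails and US-style phone numbers; subtler
// cases are left for the AI detection layer.
function patternRedact(string $text): string
{
    $patterns = [
        '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/' => '[EMAIL]',
        '/\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/'            => '[PHONE]',
    ];
    return preg_replace(array_keys($patterns), array_values($patterns), $text);
}

echo patternRedact('Contact me at john.doe@example.com or call 555-123-4567');
// Contact me at [EMAIL] or call [PHONE]
```

Running the regex pass before calling Claude keeps latency and token costs down: content that is fully handled by patterns never needs an API round trip.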
Troubleshooting
Issue: Content Analysis Returns No Violations When Violations Exist
Symptom: the analyze() method returns an empty violations array even when content clearly violates policies
Cause: JSON parsing failing or Claude response format changed
Solution: Add better error handling and logging:
private function parseAnalysis(string $jsonText): ContentAnalysis
{
// Extract JSON from response
if (preg_match('/\{.*\}/s', $jsonText, $matches)) {
$data = json_decode($matches[0], true);
if (json_last_error() !== JSON_ERROR_NONE) {
error_log("JSON decode error: " . json_last_error_msg());
error_log("Response text: " . $jsonText);
throw new \RuntimeException('Failed to parse analysis: ' . json_last_error_msg());
}
return new ContentAnalysis($data);
}
error_log("No JSON found in response: " . $jsonText);
throw new \RuntimeException('Failed to parse analysis: No JSON found');
}
Issue: PII Detection Misses Complex Cases
Symptom: Pattern-based detection works but AI detection returns empty results
Cause: AI detection prompt not specific enough or model selection incorrect
Solution: Use more specific prompts and verify model selection:
private function aiDetect(string $text): array
{
$prompt = <<<PROMPT
Detect personally identifiable information (PII) in this text.
Text: {$text}
Identify ALL instances of:
- Full names (first + last name together, especially with context like "my name is" or "contact me at")
- Home addresses (street addresses with city/state)
- Government ID numbers (SSN, passport, driver's license)
- Financial information (credit card numbers, bank account numbers)
- Medical information (patient IDs, medical record numbers)
- Login credentials (usernames with passwords)
- Phone numbers (in any format)
- Email addresses (if not already detected by patterns)
Return JSON array with ALL matches found. Be thorough.
PROMPT;
$response = $this->claude->messages()->create([
'model' => 'claude-sonnet-4-20250514', // Use Sonnet for better detection
'max_tokens' => 2048, // Increase for more matches
'temperature' => 0.1, // Lower for consistency
'messages' => [[
'role' => 'user',
'content' => $prompt
]]
]);
// ... rest of parsing
}
Issue: Moderation Queue Items Not Processing
Symptom: Items added to queue but getNext() returns null
Cause: Redis connection failing or queue key mismatch
Solution: Add connection checks and verify Redis operations:
public function getNext(string $moderatorId): ?array
{
try {
// Verify Redis connection
if (!$this->redis->ping()) {
error_log("Redis connection failed");
// Fallback to database-only queue
return $this->getNextFromDatabase($moderatorId);
}
// Get highest priority item
$items = $this->redis->zrevrange('moderation:queue', 0, 0);
if (empty($items)) {
return null;
}
$queueId = (int)$items[0];
// Verify item exists in database
$item = $this->getItem($queueId);
if (!$item) {
// Clean up orphaned Redis entry
$this->redis->zrem('moderation:queue', (string)$queueId);
return null;
}
// Claim the item
if ($this->claimItem($queueId, $moderatorId)) {
return $item;
}
return null;
} catch (\Exception $e) {
error_log("Queue error: " . $e->getMessage());
return $this->getNextFromDatabase($moderatorId);
}
}
Issue: False Positives on Legitimate Content
Symptom: Educational content, quotes, or satire being flagged as violations
Cause: System prompt not emphasizing context awareness enough
Solution: Strengthen system prompt with more examples:
private function getAnalysisSystemPrompt(): string
{
return <<<SYSTEM
You are a content moderation expert analyzing user-generated content.
CRITICAL: Context is everything. Always consider:
1. **Quotes and References**: Content quoting violations is NOT a violation itself
- "He said 'kill you'" → NOT a threat (it's a quote)
- Educational discussion of hate speech → NOT hate speech
2. **Satire and Criticism**: Criticizing toxicity is NOT toxic
- "This is what hate speech looks like: [example]" → Educational, NOT violation
- Satirical content mocking harmful ideas → NOT a violation
3. **Cultural Context**: Understand cultural and linguistic nuances
- Reclaimed language by marginalized groups → Context-dependent
- Medical/health discussions → Appropriate in health forums
4. **Intent Matters**: Distinguish between:
- Actual threats vs. hypothetical discussion
- Real harassment vs. friendly banter (in appropriate contexts)
- Spam vs. legitimate promotion (in appropriate channels)
When in doubt, flag for human review rather than blocking.
Your analysis must be:
- Objective and unbiased
- Context-aware (distinguish satire, quotes, educational content)
- Culturally sensitive
- Consistent with platform policies
- Detailed with specific examples
Severity Scoring:
- 0.0-0.3: Minor issues, likely acceptable
- 0.4-0.6: Moderate concerns, flag for review
- 0.7-0.8: Serious violations, likely block
- 0.9-1.0: Severe violations, immediate block
Always err on the side of caution for:
- Child safety
- Violence or threats
- Illegal activity
- PII exposure
SYSTEM;
}
Further Reading
- Anthropic Claude API Documentation — Official Claude API reference and moderation best practices
- Content Moderation Best Practices — Industry standards for moderation workflows
- GDPR Compliance Guide — Understanding PII protection requirements
- Redis Sorted Sets — Priority queue implementation patterns
- False Positive Reduction Strategies — Techniques for improving moderation accuracy
- Chapter 19: Queue Processing with Laravel — Related chapter on async processing
- Chapter 18: Caching Strategies — Performance optimization for moderation systems
Key Takeaways
- ✓ AI-powered moderation handles scale and context better than pure pattern matching
- ✓ Multi-layered approach (patterns + AI) provides accuracy and speed
- ✓ PII detection and redaction protect user privacy and support legal compliance
- ✓ Sentiment and context analysis reduce false positives
- ✓ Human review for edge cases ensures fairness and accuracy
- ✓ Audit trails provide accountability and appeal processes
- ✓ Priority queues ensure critical content gets immediate attention
- ✓ Policy engines allow flexible, customizable rules
- ✓ Real-time and batch processing support different use cases
- ✓ Continuous learning from moderator decisions improves accuracy
Continue to Chapter 30: Data Extraction and Analysis to build intelligent data processing pipelines.
💻 Code Samples
All code examples from this chapter are available in the GitHub repository:
Clone and run locally:
git clone https://github.com/dalehurley/codewithphp.git
cd codewithphp/code/claude-php/chapter-29
composer install
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
php -S localhost:8000 api/moderate.php