Chapter 15: Structured Outputs with JSON

Overview

Getting reliable, structured data from AI is critical for production applications. In this chapter, you'll master techniques for extracting consistent JSON responses from Claude, validating outputs, handling edge cases, and building robust data extraction pipelines.

You'll learn schema definition strategies, validation patterns, error recovery techniques, and how to build data extraction systems that work reliably at scale. We'll cover everything from basic structured output extraction to advanced batch processing and streaming implementations.

By the end of this chapter, you'll have built a complete data extraction system that can reliably extract structured information from unstructured text, validate it against schemas, handle errors gracefully, and process multiple items efficiently.

Prerequisites

Before starting this chapter, you should have:

✓ JSON and schema knowledge (JSON Schema standard)
✓ Data validation experience in PHP
✓ Completed Chapter 00 and Chapter 05
✓ Understanding of type systems and validation
✓ PHP 8.4+ installed and working
✓ Composer for dependency management

Estimated Time: 45-60 minutes

What You'll Build

By the end of this chapter, you will have created:

A SchemaExtractor class with retry logic and schema validation
Pre-built schema definitions for common data types (person, product, event, article, transaction)
A CustomValidator class for business logic validation beyond JSON Schema
A BatchExtractor class for processing multiple items efficiently
A StreamingExtractor class for handling large outputs
Complete extraction pipeline examples for real-world use cases
Production-ready error handling and validation layers

You'll have a complete understanding of how to extract structured data from Claude responses, validate outputs, handle edge cases, and build robust data extraction systems.

Objectives

By completing this chapter, you will:

Understand how to define JSON schemas for structured output extraction
Create reusable extraction classes with retry logic and validation
Implement schema validation using JSON Schema and custom validators
Build pre-built schemas for common data extraction use cases
Master batch processing and streaming for large-scale extraction
Apply error recovery techniques and fallback strategies
Design production-ready extraction pipelines with multiple validation layers

Install Validation Library

bash

composer require justinrainbow/json-schema
composer require symfony/validator

Step 1: Basic Structured Output (~10 min)

Goal

Create a simple function that extracts structured contact information from unstructured text using Claude. We'll show both the native API feature (recommended) and prompt-based extraction (fallback for older models).

Actions

Use Claude's native response_format parameter for guaranteed JSON structure (Sonnet 4.5+)
Create a fallback function using prompt-based extraction for older models
Parse the response to extract JSON from markdown code blocks or plain text
Validate the JSON and handle parsing errors gracefully
Test with sample data to verify the extraction works correctly

php

<?php
# filename: examples/01-basic-structured-output.php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

use Anthropic\Anthropic;

$client = Anthropic::factory()
    ->withApiKey(getenv('ANTHROPIC_API_KEY'))
    ->make();

function extractContactInfo(string $text, Anthropic $client, bool $useNativeFormat = true): array
{
    $schema = [
        'type' => 'object',
        'properties' => [
            'name' => [
                'type' => 'string',
                'description' => 'Full name of the contact'
            ],
            'email' => [
                'type' => 'string',
                'format' => 'email',
                'description' => 'Email address'
            ],
            'phone' => [
                'type' => 'string',
                'description' => 'Phone number'
            ],
            'company' => [
                'type' => 'string',
                'description' => 'Company name'
            ],
            'title' => [
                'type' => 'string',
                'description' => 'Job title'
            ]
        ],
        'required' => ['name'],
        'additionalProperties' => false
    ];

    // Method 1: Native structured output (requires Sonnet 4.5+ or Opus 4.1+)
    if ($useNativeFormat) {
        try {
            $response = $client->messages()->create([
                'model' => 'claude-sonnet-4-5-20250929',
                'max_tokens' => 1024,
                'messages' => [
                    [
                        'role' => 'user',
                        'content' => "Extract contact information from this text:\n\n{$text}"
                    ]
                ],
                'response_format' => [
                    'type' => 'json_schema',
                    'json_schema' => [
                        'name' => 'contact_extraction',
                        'strict' => true,
                        'schema' => $schema
                    ]
                ]
            ]);

            if (empty($response->content) || !isset($response->content[0]->text)) {
                throw new \RuntimeException('Empty response from Claude API');
            }

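            // Native format returns valid JSON directly
            $data = json_decode($response->content[0]->text, true);

            if (json_last_error() !== JSON_ERROR_NONE) {
                throw new \RuntimeException('Invalid JSON response: ' . json_last_error_msg());
            }

            // A bare string or number is valid JSON but would violate the array return type
            if (!is_array($data)) {
                throw new \RuntimeException('Expected a JSON object, got: ' . gettype($data));
            }

            return $data;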
        } catch (\Exception $e) {
            // Fallback to prompt-based if native format fails
            if (str_contains($e->getMessage(), 'response_format') || 
                str_contains($e->getMessage(), 'not supported')) {
                // Model doesn't support native structured output, use fallback
                return extractContactInfoPromptBased($text, $client, $schema);
            }
            throw $e;
        }
    }

    // Method 2: Prompt-based extraction (fallback for older models)
    return extractContactInfoPromptBased($text, $client, $schema);
}

function extractContactInfoPromptBased(string $text, Anthropic $client, array $schema): array
{
    $schema_json = json_encode($schema, JSON_PRETTY_PRINT);

    $prompt = <<<PROMPT
Extract contact information from this text and return as JSON matching this schema:

Schema: {$schema_json}

Text: {$text}

Return ONLY valid JSON matching the schema. Use null for missing fields.
PROMPT;

    $response = $client->messages()->create([
        'model' => 'claude-sonnet-4-20250514',
        'max_tokens' => 1024,
        'temperature' => 0.3, // Lower temperature for more consistent output
        'messages' => [
            [
                'role' => 'user',
                'content' => $prompt
            ]
        ]
    ]);

    if (empty($response->content) || !isset($response->content[0]->text)) {
        throw new \RuntimeException('Empty response from Claude API');
    }

    $responseText = $response->content[0]->text;

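    // Extract JSON from response: prefer a fenced block, otherwise the outermost braces.
    // The match is greedy so nested objects are not cut off at the first "}".
    if (preg_match('/```json\s*(\{.*\})\s*```/s', $responseText, $matches)) {
        $responseText = $matches[1];
    } elseif (preg_match('/(\{.*\})/s', $responseText, $matches)) {
        $responseText = $matches[1];
    }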

    $data = json_decode($responseText, true);

    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new \RuntimeException('Invalid JSON response: ' . json_last_error_msg());
    }

    return $data;
}

// Example usage
$businessCard = <<<TEXT
Dr. Sarah Johnson
Chief Technology Officer
TechCorp Industries
sarah.johnson@techcorp.com
+1 (555) 123-4567
TEXT;

$contact = extractContactInfo($businessCard, $client);
echo json_encode($contact, JSON_PRETTY_PRINT) . "\n";

Expected Result

json

{
    "name": "Dr. Sarah Johnson",
    "email": "sarah.johnson@techcorp.com",
    "phone": "+1 (555) 123-4567",
    "company": "TechCorp Industries",
    "title": "Chief Technology Officer"
}

Why It Works

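Native Structured Output (Recommended): Claude's response_format parameter with json_schema type enforces the JSON structure at the API level. When strict: true is set, the response is constrained to the schema, so you no longer need to strip markdown or repair malformed JSON. You should still handle refusals and responses cut off by max_tokens, since either can leave the output incomplete. This is the most reliable method for structured extraction and is available on Sonnet 4.5+ and Opus 4.1+ models. Parameter names differ between SDK and API versions (Anthropic's API reference has documented this feature under output_format rather than response_format), so confirm the exact request shape for the version you have installed.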

Prompt-Based Extraction (Fallback): For older models or when native format isn't available, we use prompt-based extraction. The function defines a JSON schema in the prompt, and Claude extracts information to match it. The regex patterns extract JSON from markdown code blocks (if Claude wraps it) or directly from the response text. Lower temperature (0.3) increases consistency. The json_decode() function parses the JSON string into a PHP array, and we validate it using json_last_error().
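To make the regex step concrete, here is a minimal sketch of the same idea as a standalone helper. The function name extractJsonPayload is hypothetical and not part of any SDK. It assumes the reply contains at most one JSON payload, and it builds the fence marker with str_repeat() purely to keep the listing readable.

```php
<?php
declare(strict_types=1);

/**
 * Pull the first JSON object or array out of a model reply.
 * Hypothetical helper for illustration; not part of any SDK.
 */
function extractJsonPayload(string $reply): array
{
    $fence = str_repeat('`', 3); // the three-backtick markdown fence marker

    // Prefer the contents of a fenced block (with or without a "json" label).
    if (preg_match('/' . $fence . '(?:json)?\s*(.+?)\s*' . $fence . '/s', $reply, $m)) {
        $reply = $m[1];
    }

    // Greedy match from the first opening bracket to the last closing one,
    // so nested objects are kept whole.
    if (preg_match('/\{.*\}|\[.*\]/s', $reply, $m)) {
        $reply = $m[0];
    }

    // JSON_THROW_ON_ERROR raises JsonException instead of silently returning null.
    $data = json_decode($reply, true, 512, JSON_THROW_ON_ERROR);

    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON object or array');
    }

    return $data;
}
```

Using JSON_THROW_ON_ERROR removes the need for a separate json_last_error() check. The sketch can still be fooled by prose that contains brackets before the JSON, so treat it as a starting point rather than a complete parser.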

When to Use Each Method:

Native response_format: Use for Sonnet 4.5+ and Opus 4.1+ models when you need guaranteed schema compliance
Prompt-based: Use for older models or when you need more flexibility in the extraction process

Troubleshooting

Error: "Invalid JSON response" — Claude may have returned explanatory text along with JSON. The regex patterns should handle this, but if it persists, check the raw response text to see what Claude actually returned. You can add echo $responseText; before parsing to debug.
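Error: "Empty response from Claude API" — The API call succeeded, but the first content block contains no text. An invalid API key or model name normally raises an API error rather than an empty response, so dump the full response object to see what came back, and confirm you're using a valid model like claude-sonnet-4-20250514.
Missing fields — The prompt-based path tells Claude to use null for fields it can't find. The schema declares those fields as string, so a strict validator would reject null; either allow null in the schema or ask Claude to omit missing fields. Ensure required fields are marked in the schema.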
Malformed JSON — If Claude returns JSON with syntax errors, the extraction will fail. Consider adding retry logic (shown in Step 2) to handle this.
Variable scope issues — The function accepts $client as a parameter rather than relying on a global variable, so pass the client explicitly each time you call it.
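The prompt asks for null when a field is missing, while the schema types those fields as string. One way to reconcile the two is to widen every optional field to accept null before validating. The following is a minimal sketch that assumes a flat object schema. allowNullForOptional is a hypothetical helper, not a library function.

```php
<?php
declare(strict_types=1);

/**
 * Return a copy of an object schema in which every non-required property
 * also accepts null. Hypothetical helper for illustration only.
 */
function allowNullForOptional(array $schema): array
{
    $required = $schema['required'] ?? [];

    foreach ($schema['properties'] ?? [] as $name => $property) {
        // Required fields keep their original type; untyped fields are left alone.
        if (in_array($name, $required, true) || !isset($property['type'])) {
            continue;
        }

        // JSON Schema allows "type" to be a list of alternatives.
        $types = (array) $property['type'];
        if (!in_array('null', $types, true)) {
            $types[] = 'null';
        }
        $schema['properties'][$name]['type'] = $types;
    }

    return $schema;
}
```

Because PHP arrays are copied on assignment, the schema passed in is left untouched and the widened copy can be used only where null values are expected.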

Step 2: Schema-Based Extraction Class (~15 min)

Goal

Build a reusable SchemaExtractor class that handles extraction, validation, retry logic, and error recovery automatically.

Actions

Create the SchemaExtractor class with retry logic and schema validation
Implement JSON parsing that handles markdown code blocks and plain JSON
Add schema validation using the JSON Schema validator library
Create an extractList method for extracting arrays of items

php

<?php
# filename: src/Extraction/SchemaExtractor.php
declare(strict_types=1);

namespace App\Extraction;

use Anthropic\Anthropic;
use JsonSchema\Validator;
use JsonSchema\Constraints\Constraint;

class SchemaExtractor
{
    public function __construct(
        private Anthropic $client,
        private int $maxRetries = 3
    ) {}

    /**
     * Extract structured data matching a JSON schema
     */
    public function extract(string $input, array $schema, string $context = ''): array
    {
        $attempt = 0;
        $lastError = null;

        while ($attempt < $this->maxRetries) {
            $attempt++;

            try {
                $data = $this->attemptExtraction($input, $schema, $context, $lastError);

                // Validate against schema
                $validation = $this->validateSchema($data, $schema);

                if ($validation['valid']) {
                    return $data;
                }

                // Store validation errors for next attempt
                $lastError = implode(', ', $validation['errors']);

                if ($attempt >= $this->maxRetries) {
                    throw new \RuntimeException(
                        "Schema validation failed after {$this->maxRetries} attempts: {$lastError}"
                    );
                }

            } catch (\Exception $e) {
                if ($attempt >= $this->maxRetries) {
                    throw $e;
                }
                $lastError = $e->getMessage();
            }
        }

        throw new \RuntimeException('Extraction failed after maximum retries');
    }

    private function attemptExtraction(
        string $input,
        array $schema,
        string $context,
        ?string $previousError
    ): array {
        $schemaJson = json_encode($schema, JSON_PRETTY_PRINT);

        $prompt = "Extract structured data from the input and return as JSON.\n\n";

        if ($context) {
            $prompt .= "Context: {$context}\n\n";
        }

        $prompt .= "Required JSON Schema:\n{$schemaJson}\n\n";

        if ($previousError) {
            $prompt .= "Previous attempt had errors: {$previousError}\n";
            $prompt .= "Please fix these issues and try again.\n\n";
        }

        $prompt .= "Input:\n{$input}\n\n";
        $prompt .= "Return ONLY valid JSON matching the schema exactly. No explanation or markdown.";

        $response = $this->client->messages()->create([
            'model' => 'claude-sonnet-4-20250514',
            'max_tokens' => 4096,
            'temperature' => 0.3, // Lower temperature for more consistent output
            'messages' => [
                [
                    'role' => 'user',
                    'content' => $prompt
                ]
            ]
        ]);

        if (empty($response->content) || !isset($response->content[0]->text)) {
            throw new \RuntimeException('Empty response from Claude API');
        }

        return $this->parseJSON($response->content[0]->text);
    }

    private function parseJSON(string $text): array
    {
        // Try to extract JSON from markdown code blocks first
        if (preg_match('/```json\s*(\{.*\}|\[.*\])\s*```/s', $text, $matches)) {
            $text = $matches[1];
        } elseif (preg_match('/(\{.*\}|\[.*\])/s', $text, $matches)) {
            // Match the outermost JSON object or array (greedy match)
            $text = $matches[1];
        }

        $data = json_decode($text, true);

        if (json_last_error() !== JSON_ERROR_NONE) {
            throw new \RuntimeException('Invalid JSON: ' . json_last_error_msg());
        }

        if (!is_array($data)) {
            throw new \RuntimeException('Expected array or object, got: ' . gettype($data));
        }

        return $data;
    }

    private function validateSchema(array $data, array $schema): array
    {
        $validator = new Validator();
        $dataObj = json_decode(json_encode($data));
        $schemaObj = json_decode(json_encode($schema));

        $validator->validate($dataObj, $schemaObj, Constraint::CHECK_MODE_APPLY_DEFAULTS);

        $errors = [];
        if (!$validator->isValid()) {
            foreach ($validator->getErrors() as $error) {
                $errors[] = sprintf("[%s] %s", $error['property'], $error['message']);
            }
        }

        return [
            'valid' => $validator->isValid(),
            'errors' => $errors
        ];
    }

    /**
     * Extract array of items
     */
    public function extractList(string $input, array $itemSchema, string $context = ''): array
    {
        $schema = [
            'type' => 'object',
            'properties' => [
                'items' => [
                    'type' => 'array',
                    'items' => $itemSchema
                ]
            ],
            'required' => ['items']
        ];

        $result = $this->extract($input, $schema, $context);
        return $result['items'] ?? [];
    }
}

Expected Result

The class provides a robust extraction system that:

Makes up to 3 attempts per extraction (the default maxRetries), feeding the errors from a failed attempt into the next one
Validates outputs against the provided schema
Provides detailed error messages for validation failures
Handles JSON parsing from various response formats
Supports both single object and array extraction

Why It Works

The extract() method implements a retry loop that attempts extraction up to maxRetries times. Each attempt validates the result against the JSON schema. If validation fails, the errors are fed back into the next prompt, allowing Claude to correct its output. The parseJSON() method uses regex to extract JSON from markdown code blocks (common when Claude formats responses) or directly from the text. The JSON Schema validator ensures the extracted data matches the expected structure and types.
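The feedback loop is easier to see without the API call in the way. The sketch below reproduces the same control flow with two stand-in callables: $generate plays the role of the Claude request and $validate the role of the schema check. Both callables, and the function name extractWithFeedback, are hypothetical and exist only for this illustration.

```php
<?php
declare(strict_types=1);

/**
 * Generic retry-with-feedback loop.
 *
 * @param callable(?string): array   $generate receives the previous error text, or null
 * @param callable(array): string[]  $validate returns a list of error messages
 */
function extractWithFeedback(callable $generate, callable $validate, int $maxAttempts = 3): array
{
    $lastError = null;

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $data = $generate($lastError);
        $errors = $validate($data);

        if ($errors === []) {
            return $data;
        }

        // Carry the validation errors into the next attempt's prompt.
        $lastError = implode(', ', $errors);
    }

    throw new RuntimeException("Validation failed after {$maxAttempts} attempts: {$lastError}");
}

// Stub "model": it omits the price until it is told about the error.
$calls = 0;
$generate = function (?string $previousError) use (&$calls): array {
    $calls++;
    return $previousError === null
        ? ['name' => 'Widget']
        : ['name' => 'Widget', 'price' => 9.99];
};
$validate = fn (array $data): array => isset($data['price']) ? [] : ['[price] is required'];

$result = extractWithFeedback($generate, $validate);
```

In this run the first attempt fails validation, the error text is handed to the second attempt, and the second result is returned. A for loop with a single exit point also keeps the attempt limit easy to verify.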

Troubleshooting

Error: "Schema validation failed after 3 attempts" — The schema may be too strict or the input text doesn't contain the required information. Review the validation errors to see which fields are failing.
JSON parsing errors — If Claude consistently returns malformed JSON, try lowering the temperature (already set to 0.3) or simplifying the schema.
Infinite retry loop — Ensure the maxRetries limit is properly checked before retrying. The current implementation correctly checks $attempt >= $this->maxRetries before throwing exceptions.

Step 3: Common Data Extraction Schemas (~10 min)

Goal

Create a Schemas class with pre-built JSON schemas for common data extraction use cases to accelerate development.

Actions

Define schemas for person, product, event, article, and transaction data types
Include proper validation with required fields, types, formats, and constraints
Use nested objects for complex data structures like addresses and locations

php

<?php
# filename: src/Extraction/Schemas.php
declare(strict_types=1);

namespace App\Extraction;

class Schemas
{
    public static function person(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'first_name' => ['type' => 'string'],
                'last_name' => ['type' => 'string'],
                'email' => [
                    'type' => 'string',
                    'format' => 'email'
                ],
                'phone' => ['type' => 'string'],
                'company' => ['type' => 'string'],
                'title' => ['type' => 'string'],
                'address' => [
                    'type' => 'object',
                    'properties' => [
                        'street' => ['type' => 'string'],
                        'city' => ['type' => 'string'],
                        'state' => ['type' => 'string'],
                        'zip' => ['type' => 'string'],
                        'country' => ['type' => 'string']
                    ]
                ]
            ],
            'required' => ['first_name', 'last_name']
        ];
    }

    public static function product(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'name' => ['type' => 'string'],
                'sku' => ['type' => 'string'],
                'description' => ['type' => 'string'],
                'price' => [
                    'type' => 'number',
                    'minimum' => 0
                ],
                'currency' => [
                    'type' => 'string',
                    'default' => 'USD'
                ],
                'category' => ['type' => 'string'],
                'brand' => ['type' => 'string'],
                'in_stock' => ['type' => 'boolean'],
                'specifications' => [
                    'type' => 'object',
                    'additionalProperties' => ['type' => 'string']
                ],
                'tags' => [
                    'type' => 'array',
                    'items' => ['type' => 'string']
                ]
            ],
            'required' => ['name', 'price']
        ];
    }

    public static function event(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'title' => ['type' => 'string'],
                'description' => ['type' => 'string'],
                'start_date' => [
                    'type' => 'string',
                    'format' => 'date-time'
                ],
                'end_date' => [
                    'type' => 'string',
                    'format' => 'date-time'
                ],
                'location' => [
                    'type' => 'object',
                    'properties' => [
                        'name' => ['type' => 'string'],
                        'address' => ['type' => 'string'],
                        'city' => ['type' => 'string'],
                        'virtual' => ['type' => 'boolean']
                    ]
                ],
                'attendees' => [
                    'type' => 'array',
                    'items' => ['type' => 'string']
                ],
                'organizer' => ['type' => 'string']
            ],
            'required' => ['title', 'start_date']
        ];
    }

    public static function article(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'title' => ['type' => 'string'],
                'author' => ['type' => 'string'],
                'published_date' => [
                    'type' => 'string',
                    'format' => 'date'
                ],
                'summary' => [
                    'type' => 'string',
                    'maxLength' => 500
                ],
                'content' => ['type' => 'string'],
                'category' => ['type' => 'string'],
                'tags' => [
                    'type' => 'array',
                    'items' => ['type' => 'string']
                ],
                'reading_time_minutes' => [
                    'type' => 'integer',
                    'minimum' => 1
                ],
                'url' => [
                    'type' => 'string',
                    'format' => 'uri'
                ]
            ],
            'required' => ['title', 'content']
        ];
    }

    public static function transaction(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'transaction_id' => ['type' => 'string'],
                'date' => [
                    'type' => 'string',
                    'format' => 'date'
                ],
                'amount' => [
                    'type' => 'number',
                    'minimum' => 0
                ],
                'currency' => ['type' => 'string'],
                'type' => [
                    'type' => 'string',
                    'enum' => ['debit', 'credit', 'transfer']
                ],
                'description' => ['type' => 'string'],
                'merchant' => ['type' => 'string'],
                'category' => ['type' => 'string'],
                'status' => [
                    'type' => 'string',
                    'enum' => ['pending', 'completed', 'failed', 'cancelled']
                ]
            ],
            'required' => ['transaction_id', 'date', 'amount', 'type']
        ];
    }
}

Expected Result

You'll have a Schemas class with five ready-to-use schema methods:

person() — Contact information with address
product() — Product details with pricing and specifications
event() — Event information with dates and location
article() — Article metadata with content and tags
transaction() — Financial transaction data with status

Why It Works

Pre-built schemas follow JSON Schema standards with proper type definitions, format constraints (like email, date-time, uri), and required field specifications. Nested objects allow complex structures like addresses within person records. The schemas also declare validation constraints (like minimum: 0 for prices) to ensure data quality, and defaults (like currency: 'USD'). Note that SchemaExtractor applies defaults only to the temporary copy it validates, so the array returned by extract() contains currency only if Claude supplied it.
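If you want schema defaults such as currency to appear in the extracted array itself, you can fill them in after extraction. This is a minimal sketch that handles only the top-level properties of an object schema. applySchemaDefaults is a hypothetical helper and not part of the validator library.

```php
<?php
declare(strict_types=1);

/**
 * Fill in top-level schema defaults that are missing from extracted data.
 * Hypothetical helper; nested objects are not traversed.
 */
function applySchemaDefaults(array $data, array $schema): array
{
    foreach ($schema['properties'] ?? [] as $name => $property) {
        // Only add a default when the key is absent, never overwrite extracted values.
        if (!array_key_exists($name, $data) && array_key_exists('default', $property)) {
            $data[$name] = $property['default'];
        }
    }

    return $data;
}
```

A typical call would be applySchemaDefaults($product, Schemas::product()) on each extracted item. array_key_exists() is used instead of isset() so that an explicit null from Claude is kept rather than replaced.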

Troubleshooting

Schema too restrictive — If extraction frequently fails, consider making more fields optional or removing strict format validations temporarily.
Missing fields — Ensure required fields match what's actually available in your source text. Claude can't extract information that isn't present.
Type mismatches — If Claude returns strings for numbers, add explicit type coercion or adjust the schema to accept strings and convert later.
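For the type-mismatch case, explicit coercion can run between parsing and validation. The sketch below converts numeric strings for top-level properties typed as number or integer. It assumes US-style formatting, where commas are thousands separators, and coerceNumericFields is a hypothetical helper name.

```php
<?php
declare(strict_types=1);

/**
 * Coerce numeric strings into numbers for top-level properties that the
 * schema declares as "number" or "integer". Hypothetical helper.
 */
function coerceNumericFields(array $data, array $schema): array
{
    foreach ($schema['properties'] ?? [] as $name => $property) {
        $type = $property['type'] ?? null;
        $value = $data[$name] ?? null;

        if (!is_string($value) || !in_array($type, ['number', 'integer'], true)) {
            continue;
        }

        // Assumption: US-style input, so "$3,500.00" becomes "3500.00".
        $clean = preg_replace('/[\s,$]/', '', $value);

        if (!is_numeric($clean)) {
            continue; // leave the value alone and let schema validation report it
        }

        if ($type === 'number') {
            $data[$name] = (float) $clean;
        } elseif ((float) $clean === floor((float) $clean)) {
            // Only whole values are converted for integer fields.
            $data[$name] = (int) $clean;
        }
    }

    return $data;
}
```

Values that cannot be converted safely are passed through unchanged, so the schema validator still reports them instead of the coercion step hiding the problem.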

Step 4: Advanced Extraction Pipeline (~10 min)

Goal

Build a complete example demonstrating the extraction pipeline with multiple real-world use cases using the pre-built schemas.

Actions

Extract products from a product listing text
Extract events from an email with meeting information
Extract transactions from a bank statement text

php

<?php
# filename: examples/02-extraction-pipeline.php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

use Anthropic\Anthropic;
use App\Extraction\SchemaExtractor;
use App\Extraction\Schemas;

$client = Anthropic::factory()
    ->withApiKey(getenv('ANTHROPIC_API_KEY'))
    ->make();

$extractor = new SchemaExtractor($client);

// Example 1: Extract products from description
echo "=== Extracting Products ===\n";

$productText = <<<TEXT
We have several great items in stock:

1. MacBook Pro 16" with M3 chip - \$2499
   Professional laptop with incredible performance
   In stock, free shipping

2. iPhone 15 Pro (256GB) - \$1099
   Latest flagship phone, available in all colors
   Category: Smartphones

3. AirPods Pro (2nd gen) - \$249
   Premium wireless earbuds with ANC
   Brand: Apple, Currently in stock
TEXT;

$products = $extractor->extractList(
    $productText,
    Schemas::product(),
    'Extract product information from this product listing'
);

echo json_encode($products, JSON_PRETTY_PRINT) . "\n\n";

// Example 2: Extract events from email
echo "=== Extracting Events ===\n";

$emailText = <<<TEXT
Hi team,

Quick reminder about our upcoming meetings:

Tech Review Meeting
March 15, 2025 at 2:00 PM - 3:30 PM
Conference Room A
Organizer: Sarah Johnson

Client Presentation
March 18, 2025 at 10:00 AM - 11:00 AM
Virtual (Zoom link to follow)
Organizer: Mike Chen
TEXT;

$events = $extractor->extractList(
    $emailText,
    Schemas::event(),
    'Extract all meeting/event information from this email'
);

echo json_encode($events, JSON_PRETTY_PRINT) . "\n\n";

// Example 3: Extract transactions from bank statement
echo "=== Extracting Transactions ===\n";

$statementText = <<<TEXT
Recent Transactions:

03/10/2025 - Amazon.com - \$87.43 - Online Shopping - Completed
03/11/2025 - Starbucks - \$5.67 - Food & Dining - Completed
03/12/2025 - Salary Deposit - \$3,500.00 - Income - Completed
03/13/2025 - Rent Payment - \$1,800.00 - Housing - Pending
TEXT;

$transactions = $extractor->extractList(
    $statementText,
    Schemas::transaction(),
    'Extract transaction information from this bank statement'
);

echo json_encode($transactions, JSON_PRETTY_PRINT) . "\n\n";

Expected Result

The script will extract structured data from three different text sources:

Products with names, prices, descriptions, and stock status
Events with dates, times, locations, and organizers
Transactions with amounts, dates, merchants, and status

Why It Works

The extractList() method wraps the item schema in a container schema that expects an items array. Claude processes the entire input text and extracts all matching items in a single API call, which is more efficient than processing each item separately. The context parameter helps Claude understand what type of information to look for, improving extraction accuracy.

Troubleshooting

Empty results — If extractList() returns an empty array, check that the input text actually contains the information matching the schema. Claude may not find any matching items.
Partial extraction — If only some items are extracted, the text may be ambiguous. Try providing more context or splitting the extraction into smaller chunks.
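Schema mismatches — If extracted data doesn't match expected fields, review the schema and adjust field names or types to match what Claude is extracting. Watch for required fields the source can't supply: Schemas::transaction() requires transaction_id and type, yet the sample statement lists neither explicitly, so Claude has to infer or invent them or the extraction fails validation.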
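Splitting a long input into smaller chunks, as suggested for partial extraction, works best when the split points fall between records rather than inside them. The following sketch breaks text only at blank lines. chunkByParagraph is a hypothetical helper, and the size limit is measured in bytes.

```php
<?php
declare(strict_types=1);

/**
 * Split text into chunks of at most $maxBytes bytes, breaking only at blank
 * lines so that one listing or record is never cut in half.
 * Hypothetical helper; a single block longer than $maxBytes is kept whole.
 *
 * @return string[]
 */
function chunkByParagraph(string $text, int $maxBytes = 2000): array
{
    // \R matches any newline sequence; two of them with optional whitespace
    // in between mark a blank line.
    $blocks = preg_split('/\R\s*\R/', trim($text)) ?: [];
    $chunks = [];
    $current = '';

    foreach ($blocks as $block) {
        $candidate = $current === '' ? $block : $current . "\n\n" . $block;

        if ($current !== '' && strlen($candidate) > $maxBytes) {
            // Adding this block would exceed the limit: close the current chunk.
            $chunks[] = $current;
            $current = $block;
        } else {
            $current = $candidate;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}
```

Each chunk can then be passed to extractList() separately and the resulting arrays combined with array_merge().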

Step 5: Custom Validators (~12 min)

Goal

Create a CustomValidator class that provides business logic validation beyond what JSON Schema can handle, and integrate it with the extraction pipeline.

Actions

Implement validation methods for required, email, phone, URL, date, range, enum, pattern, and length checks
Create a flexible validation system that accepts rules as arrays
Provide detailed error messages for each validation failure
Integrate with SchemaExtractor to add custom validation after schema validation

php

<?php
# filename: src/Extraction/CustomValidator.php
declare(strict_types=1);

namespace App\Extraction;

class CustomValidator
{
    private array $errors = [];

    public function validate(array $data, array $rules): bool
    {
        $this->errors = [];

        foreach ($rules as $field => $fieldRules) {
            $value = $data[$field] ?? null;

            foreach ($fieldRules as $rule => $params) {
                $method = 'validate' . ucfirst($rule);
                if (method_exists($this, $method)) {
                    $this->$method($field, $value, $params);
                }
            }
        }

        return empty($this->errors);
    }

    public function getErrors(): array
    {
        return $this->errors;
    }

    private function validateEmail(string $field, $value, $params): void
    {
        if ($value && !filter_var($value, FILTER_VALIDATE_EMAIL)) {
            $this->errors[$field][] = "Invalid email format";
        }
    }

    private function validatePhone(string $field, $value, $params): void
    {
        if ($value && !preg_match('/^\+?[\d\s\-\(\)]+$/', $value)) {
            $this->errors[$field][] = "Invalid phone format";
        }
    }

    private function validateUrl(string $field, $value, $params): void
    {
        if ($value && !filter_var($value, FILTER_VALIDATE_URL)) {
            $this->errors[$field][] = "Invalid URL format";
        }
    }

    private function validateDate(string $field, $value, $params): void
    {
        if ($value) {
            $date = \DateTime::createFromFormat($params['format'] ?? 'Y-m-d', $value);
            if (!$date || $date->format($params['format'] ?? 'Y-m-d') !== $value) {
                $this->errors[$field][] = "Invalid date format";
            }
        }
    }

    private function validateRange(string $field, $value, $params): void
    {
        if ($value !== null) {
            if (isset($params['min']) && $value < $params['min']) {
                $this->errors[$field][] = "Value below minimum {$params['min']}";
            }
            if (isset($params['max']) && $value > $params['max']) {
                $this->errors[$field][] = "Value above maximum {$params['max']}";
            }
        }
    }

    private function validateEnum(string $field, $value, $params): void
    {
        // Strict comparison so that, for example, 0 does not match the string "0"
        if ($value && !in_array($value, $params['values'], true)) {
            $this->errors[$field][] = "Value not in allowed list: " . implode(', ', $params['values']);
        }
    }

    private function validatePattern(string $field, $value, $params): void
    {
        if ($value && !preg_match($params['regex'], $value)) {
            $this->errors[$field][] = $params['message'] ?? "Value doesn't match required pattern";
        }
    }

    private function validateLength(string $field, $value, $params): void
    {
        if ($value) {
            // mb_strlen counts characters rather than bytes (requires the mbstring extension)
            $length = mb_strlen((string) $value);
            if (isset($params['min']) && $length < $params['min']) {
                $this->errors[$field][] = "Length below minimum {$params['min']}";
            }
            if (isset($params['max']) && $length > $params['max']) {
                $this->errors[$field][] = "Length above maximum {$params['max']}";
            }
        }
    }

    private function validateRequired(string $field, $value, $params): void
    {
        if ($params && ($value === null || $value === '')) {
            $this->errors[$field][] = "Field is required";
        }
    }
}

Integration Example

Here's how to use CustomValidator with SchemaExtractor:

php

<?php
# filename: examples/05-custom-validation-integration.php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

use Anthropic\Anthropic;
use App\Extraction\SchemaExtractor;
use App\Extraction\CustomValidator;
use App\Extraction\Schemas;

$client = Anthropic::factory()
    ->withApiKey(getenv('ANTHROPIC_API_KEY'))
    ->make();

$extractor = new SchemaExtractor($client);
$validator = new CustomValidator();

// Extract person data
$personText = "John Doe\njohn.doe@example.com\n+1-555-123-4567";
$person = $extractor->extract($personText, Schemas::person());

// Apply custom validation rules
$rules = [
    'email' => ['email' => true],
    'phone' => ['phone' => true],
    'first_name' => ['required' => true, 'length' => ['min' => 2, 'max' => 50]]
];

if (!$validator->validate($person, $rules)) {
    echo "Validation errors:\n";
    print_r($validator->getErrors());
} else {
    echo "Data is valid!\n";
    echo json_encode($person, JSON_PRETTY_PRINT) . "\n";
}

Expected Result

You'll have a CustomValidator class that can validate:

Email addresses using PHP's FILTER_VALIDATE_EMAIL
Phone numbers with regex pattern matching
URLs using PHP's FILTER_VALIDATE_URL
Dates in custom formats using DateTime::createFromFormat
Numeric ranges with min/max constraints
Enum values against allowed lists
Regex patterns for custom validation rules
String length constraints

Why It Works

The validator uses a rule-based system in which each field can have multiple validation rules. The validate() method iterates through the rules, builds each method name from the rule name, checks that the method exists with method_exists(), and then calls it dynamically. Each validation method checks the value and appends a message to the $errors array if validation fails. This lets you combine JSON Schema validation (structural) with custom validation (business logic) for comprehensive data quality checks.
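The dispatch step described above can be sketched in isolation. This is a minimal illustration, not the chapter's CustomValidator: the class name RuleDispatchSketch is hypothetical, only the required rule is implemented, and reporting unknown rules as errors is an assumption added here so that typos in rule names are visible.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for CustomValidator, reduced to the dispatch logic.
final class RuleDispatchSketch
{
    /** @var array<string, string[]> */
    private array $errors = [];

    public function validate(array $data, array $rules): bool
    {
        $this->errors = [];

        foreach ($rules as $field => $fieldRules) {
            foreach ($fieldRules as $rule => $params) {
                // 'required' becomes 'validateRequired', 'length' becomes 'validateLength'
                $method = 'validate' . ucfirst($rule);

                if (method_exists($this, $method)) {
                    $this->$method($field, $data[$field] ?? null, $params);
                } else {
                    // Assumption: surface unknown rules instead of skipping them silently
                    $this->errors[$field][] = "Unknown rule: {$rule}";
                }
            }
        }

        return $this->errors === [];
    }

    public function getErrors(): array
    {
        return $this->errors;
    }

    private function validateRequired(string $field, mixed $value, mixed $params): void
    {
        if ($params && ($value === null || $value === '')) {
            $this->errors[$field][] = 'Field is required';
        }
    }
}

$sketch = new RuleDispatchSketch();
$valid = $sketch->validate(
    ['first_name' => ''],
    ['first_name' => ['required' => true, 'typo' => true]]
);
print_r($sketch->getErrors());
```

Because the method name is derived from the rule key, adding a new rule only requires adding a matching validateXxx() method.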

When to use CustomValidator vs JSON Schema:

JSON Schema: Use for structural validation (types, formats, required fields, nested structures)
CustomValidator: Use for business logic validation (complex rules, cross-field validation, domain-specific constraints)
Best Practice: Use JSON Schema first for structure, then CustomValidator for business rules

Troubleshooting

Validation not running — Ensure the rule name matches the method name (e.g., 'email' rule calls validateEmail() method). The ucfirst() function capitalizes the rule name to match the method naming convention.
False positives — Some validators (like phone) use regex that may be too strict. Adjust the regex pattern to match your data format requirements.
Date validation failures — Ensure the date format in $params['format'] matches the actual date format in your data. PHP's DateTime::createFromFormat is strict about format matching.
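To make the date behavior concrete, here is a small sketch of the round-trip check the date validator relies on. The function name isStrictDate is hypothetical. The point it illustrates is that DateTime::createFromFormat() rejects mismatched separators but accepts overflowing values such as February 30 and rolls them forward, which is why the parsed date is re-formatted and compared with the input.

```php
<?php
declare(strict_types=1);

// Hypothetical helper mirroring the round-trip check used by the date validator.
function isStrictDate(string $value, string $format = 'Y-m-d'): bool
{
    $date = \DateTime::createFromFormat($format, $value);

    // Overflowing values are rolled forward by createFromFormat(),
    // so compare the re-formatted date with the original input.
    return $date !== false && $date->format($format) === $value;
}

var_dump(isStrictDate('2024-02-29'));          // leap day
var_dump(isStrictDate('2024-02-30'));          // overflows into March
var_dump(isStrictDate('29/02/2024'));          // wrong separators for Y-m-d
var_dump(isStrictDate('29/02/2024', 'd/m/Y')); // matches the supplied format
```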

Step 6: Batch Extraction (~8 min)

Goal

Create a BatchExtractor class that processes multiple inputs efficiently, handling successes and errors gracefully.

Actions

Implement batch processing that extracts data from multiple inputs
Track successes and errors separately for each item
Provide summary statistics about the batch operation

php

<?php
# filename: examples/03-batch-extraction.php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

use Anthropic\Anthropic;
use App\Extraction\SchemaExtractor;
use App\Extraction\Schemas;

class BatchExtractor
{
    public function __construct(
        private SchemaExtractor $extractor
    ) {}

    public function extractBatch(array $inputs, array $schema, string $context = ''): array
    {
        $results = [];
        $errors = [];

        foreach ($inputs as $index => $input) {
            try {
                $results[$index] = $this->extractor->extract($input, $schema, $context);
            } catch (\Exception $e) {
                $errors[$index] = [
                    'error' => $e->getMessage(),
                    'input' => $input
                ];
            }
        }

        return [
            'results' => $results,
            'errors' => $errors,
            'success_count' => count($results),
            'error_count' => count($errors),
            'total' => count($inputs)
        ];
    }

    public function extractFromMultipleTexts(string $combinedText, array $itemSchema): array
    {
        // Let Claude extract all items in one go
        return $this->extractor->extractList($combinedText, $itemSchema);
    }
}

$client = Anthropic::factory()
    ->withApiKey(getenv('ANTHROPIC_API_KEY'))
    ->make();

$extractor = new SchemaExtractor($client);
$batchExtractor = new BatchExtractor($extractor);

// Example: Extract contacts from multiple business cards
$businessCards = [
    "John Smith\nCEO, TechCorp\njohn@techcorp.com\n555-1234",
    "Jane Doe\nCTO, StartupXYZ\njane.doe@startupxyz.com\n555-5678",
    "Bob Johnson\nVP Engineering, MegaSoft\nbob.j@megasoft.com\n555-9999"
];

$result = $batchExtractor->extractBatch(
    $businessCards,
    Schemas::person(),
    'Extract contact information from business card'
);

echo "Batch Extraction Results:\n";
echo "Success: {$result['success_count']}/{$result['total']}\n";
echo "Errors: {$result['error_count']}\n\n";

echo "Extracted Data:\n";
echo json_encode($result['results'], JSON_PRETTY_PRINT) . "\n";

Expected Result

The batch extractor will process all business cards and return:

A results array with successfully extracted contact information
An errors array with any failures (indexed by input position)
Summary statistics: success_count, error_count, and total

Why It Works

The extractBatch() method processes each input independently, catching exceptions for individual items so one failure doesn't stop the entire batch. This allows partial success scenarios where some items extract successfully while others fail. The method returns both results and errors, giving you complete visibility into the batch operation. The extractFromMultipleTexts() method processes all items in a single API call, which is more efficient when you have many items in one text source.

Troubleshooting

All items failing — If every item in a batch fails, check that the schema matches the input format. A single schema mismatch will cause all items to fail.
Memory issues — Processing very large batches may consume significant memory. Consider processing in chunks or using streaming (Step 7) for large datasets.
Rate limiting — Batch processing makes multiple API calls quickly. Implement rate limiting or delays between batches if you hit API rate limits.
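One way to act on the chunking and rate-limiting advice is sketched below. The function extractInChunks and its parameters are hypothetical, and the callable stands in for a call such as $extractor->extract($input, $schema), so the sketch runs without an API key. Suitable chunk sizes and pauses depend on your account's limits.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $extract over $inputs in chunks, pausing between chunks.
 * $extract stands in for a call such as $extractor->extract($input, $schema).
 */
function extractInChunks(array $inputs, callable $extract, int $chunkSize = 5, int $pauseMs = 1000): array
{
    $results = [];
    $errors = [];

    // preserve_keys = true keeps the original input positions
    $chunks = array_chunk($inputs, max(1, $chunkSize), true);
    $lastChunk = array_key_last($chunks);

    foreach ($chunks as $chunkIndex => $chunk) {
        foreach ($chunk as $index => $input) {
            try {
                $results[$index] = $extract($input);
            } catch (\Exception $e) {
                $errors[$index] = ['error' => $e->getMessage(), 'input' => $input];
            }
        }

        // Pause between chunks, but not after the final one
        if ($pauseMs > 0 && $chunkIndex !== $lastChunk) {
            usleep($pauseMs * 1000);
        }
    }

    return ['results' => $results, 'errors' => $errors];
}

// Stub extractor that fails for one input to show partial success
$batch = extractInChunks(
    ['a', 'b', 'c'],
    function (string $input): array {
        if ($input === 'b') {
            throw new \RuntimeException('simulated failure');
        }
        return ['value' => strtoupper($input)];
    },
    chunkSize: 2,
    pauseMs: 0
);
print_r($batch);
```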

Step 7: Streaming Structured Output (~8 min)

Goal

Create a StreamingExtractor class that processes structured output as it streams from Claude, useful for large responses.

Actions

Use dependency injection to pass the client to the constructor
Use the streaming API to receive data incrementally
Buffer the streamed content until complete
Parse JSON from the complete buffer with proper error handling

php

<?php
# filename: examples/04-streaming-structured.php
declare(strict_types=1);

require __DIR__ . '/../vendor/autoload.php';

use Anthropic\Anthropic;
use App\Extraction\Schemas;

$client = Anthropic::factory()
    ->withApiKey(getenv('ANTHROPIC_API_KEY'))
    ->make();

// Example usage. PHP hoists top-level class declarations, so the class defined below is already available here.
$inputText = "John Doe\njohn.doe@example.com\n+1-555-123-4567";
$extractor = new StreamingExtractor($client);
$schema = Schemas::person();
$result = $extractor->extractWithStreaming($inputText, $schema);
echo json_encode($result, JSON_PRETTY_PRINT) . "\n";

class StreamingExtractor
{
    private string $buffer = '';

    public function __construct(
        private Anthropic $client
    ) {}

    public function extractWithStreaming(string $input, array $schema): array
    {
        $schemaJson = json_encode($schema, JSON_PRETTY_PRINT);

        $prompt = <<<PROMPT
Extract data from this input as JSON matching this schema:

{$schemaJson}

Input:
{$input}

Return only valid JSON.
PROMPT;

        $stream = $this->client->messages()->createStreamed([
            'model' => 'claude-sonnet-4-20250514',
            'max_tokens' => 4096,
            'messages' => [
                ['role' => 'user', 'content' => $prompt]
            ]
        ]);

        $this->buffer = '';

        foreach ($stream as $event) {
            if ($event->type === 'content_block_delta') {
                if (isset($event->delta->text)) {
                    $this->buffer .= $event->delta->text;
                    echo "."; // Progress indicator
                }
            }
        }

        echo "\n";

        return $this->parseJSON($this->buffer);
    }

    private function parseJSON(string $text): array
    {
        // Try to extract JSON from markdown code blocks
        if (preg_match('/```json\s*(\{.*\})\s*```/s', $text, $matches)) {
            $text = $matches[1];
        } elseif (preg_match('/(\{.*\})/s', $text, $matches)) {
            $text = $matches[1];
        }

        $data = json_decode($text, true);

        if (json_last_error() !== JSON_ERROR_NONE) {
            throw new \RuntimeException('Invalid JSON in stream: ' . json_last_error_msg());
        }

        if (!is_array($data)) {
            throw new \RuntimeException('Expected array or object, got: ' . gettype($data));
        }

        return $data;
    }
}

Expected Result

The streaming extractor will process the response as it arrives, showing progress dots (.) for each chunk received, then parse and return the complete JSON structure.

Why It Works

The createStreamed() method returns an iterable stream of events. We iterate through the events looking for content_block_delta events, which carry text chunks, and append each chunk to a buffer. Once the stream completes, we parse the whole buffer as JSON. Because parsing waits for the full response, streaming here does not make the parsed data available sooner. Its benefit for large responses is that text arrives incrementally, so you can show progress to users instead of blocking silently until the response is complete.

Troubleshooting

Incomplete JSON — If streaming stops before JSON is complete, the buffer may contain partial JSON that fails to parse. Ensure the stream completes fully before parsing. Check that max_tokens is sufficient for your response size.
No progress indicators — The dots (.) should appear as data streams. If they don't, check that the stream is actually streaming and not buffered. Verify the event type is content_block_delta.
Memory usage — For very large streams, consider processing JSON incrementally if possible, though this is complex with nested structures.
Constructor error — The StreamingExtractor now requires the client in the constructor. Make sure to instantiate it with new StreamingExtractor($client).
Error: "Expected array or object" — The parsed JSON must be an object or array. If Claude returns a primitive value, adjust your schema or prompt to request an object wrapper.
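For the incomplete-JSON case, a cheap guard is to check the buffer before decoding it. The sketch below is an assumption layered on the chapter's code, not part of StreamingExtractor: the function name extractCompleteJson is hypothetical, and it uses json_validate(), which is available from PHP 8.3 and therefore within this chapter's PHP 8.4+ requirement.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: return the JSON object in a streamed buffer, or null if it is incomplete.
function extractCompleteJson(string $buffer): ?string
{
    $start = strpos($buffer, '{');
    $end = strrpos($buffer, '}');

    if ($start === false || $end === false || $end < $start) {
        return null; // no object boundaries yet
    }

    $candidate = substr($buffer, $start, $end - $start + 1);

    // json_validate() checks syntax without building the decoded structure
    return json_validate($candidate) ? $candidate : null;
}

var_dump(extractCompleteJson('{"first_name": "John", "email": "jo'));
var_dump(extractCompleteJson("```json\n{\"first_name\": \"John\"}\n```"));
```

A null result after the stream has ended suggests a truncated response, which is a cue to raise max_tokens or retry rather than attempt to decode.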

When to Use Structured Outputs vs Tool Use

Understanding when to use structured outputs versus tool use (from Chapter 11) is important:

Use Structured Outputs When:

Extracting data from text (documents, emails, user input)
Parsing unstructured information into structured formats
Converting text to JSON/arrays for storage or processing
One-way data extraction (no action needed)

Use Tool Use When:

Claude needs to perform actions (database queries, API calls, file operations)
Multi-step workflows requiring external system interaction
Dynamic decision-making based on tool results
Two-way communication (Claude requests → you execute → Claude responds)

Example Comparison:

php

// Structured Output: Extract data from text
$contact = $extractor->extract($businessCard, Schemas::person());
// Result: Array with contact information

// Tool Use: Query database based on extracted data
$tools = [
    [
        'name' => 'search_customer',
        'description' => 'Search for customer in database',
        'input_schema' => [
            'type' => 'object',
            'properties' => [
                'email' => ['type' => 'string']
            ]
        ]
    ]
];
// Claude can call this tool to look up the customer

Best Practice: Use structured outputs to extract data, then use tool use to act on that data.

Best Practices

1. Dependency Injection

Always pass dependencies (like the Anthropic client) through constructors or method parameters rather than using global variables:

php

// ✓ Good: Dependency injection
class SchemaExtractor
{
    public function __construct(private Anthropic $client) {}
}

// ❌ Bad: Global variable
function extract() {
    global $client; // Avoid this
}

2. Schema Design

php

// ✓ Good: Clear, specific schema
$schema = [
    'type' => 'object',
    'properties' => [
        'price' => [
            'type' => 'number',
            'minimum' => 0,
            'description' => 'Price in USD'
        ],
        'date' => [
            'type' => 'string',
            'pattern' => '^\d{4}-\d{2}-\d{2}$',
            'description' => 'Date in YYYY-MM-DD format'
        ]
    ],
    'required' => ['price', 'date']
];

// ❌ Bad: Vague, loose schema
$schema = [
    'type' => 'object',
    'properties' => [
        'data' => ['type' => 'string']
    ]
];

3. Error Recovery

php

// Catch extraction failures, log them, and fall back to another strategy
try {
    $data = $extractor->extract($input, $schema);
} catch (\Exception $e) {
    // Log error
    error_log("Extraction failed: " . $e->getMessage());

    // Fallback strategy
    $data = $this->manualExtraction($input);
}
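The snippet above shows the fallback half of error recovery. A retry step in front of it could look like the following sketch. The helper withRetry is hypothetical, and the closures stand in for $extractor->extract() and a manual fallback so that the example runs on its own.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: calls $operation up to $maxAttempts times,
 * then passes the last exception to $fallback.
 */
function withRetry(callable $operation, callable $fallback, int $maxAttempts = 3): mixed
{
    $lastError = null;

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            return $operation($attempt);
        } catch (\Exception $e) {
            $lastError = $e;
            error_log("Attempt {$attempt} failed: " . $e->getMessage());
        }
    }

    return $fallback($lastError);
}

// Stub operation that fails twice before succeeding
$calls = 0;
$data = withRetry(
    function (int $attempt) use (&$calls): array {
        $calls++;
        if ($attempt < 3) {
            throw new \RuntimeException('Invalid JSON');
        }
        return ['first_name' => 'John'];
    },
    fn (?\Exception $e): array => []
);
print_r($data);
```

Passing the attempt number to the operation leaves room to change the prompt on later attempts, for example by including the previous error message.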

4. Validation Layers

php

// Layer 1: JSON Schema validation (structural)
$validation = $extractor->validateSchema($data, $schema);
if (!$validation['valid']) {
    // Handle schema validation errors
}

// Layer 2: Custom business logic validation
$customValidator = new CustomValidator();
$rules = [
    'email' => ['email' => true],
    'price' => ['range' => ['min' => 0, 'max' => 10000]]
];
if (!$customValidator->validate($data, $rules)) {
    // Handle custom validation errors
    $errors = $customValidator->getErrors();
}

// Layer 3: Data sanitization
$sanitized = $this->sanitizeData($data);
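The sanitizeData() call in layer 3 is not defined in this section. As a rough sketch of what such a step might do, the standalone function below trims strings recursively and converts empty strings to null. Both behaviors are assumptions chosen for illustration, and real sanitization rules depend on where the data is going.

```php
<?php
declare(strict_types=1);

// Hypothetical sanitizeData(): trims strings and normalizes empty values.
function sanitizeData(array $data): array
{
    $clean = [];

    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // Recurse into nested objects and lists
            $clean[$key] = sanitizeData($value);
        } elseif (is_string($value)) {
            $value = trim($value);
            // Assumption: treat empty strings as missing values
            $clean[$key] = $value === '' ? null : $value;
        } else {
            // Numbers, booleans, and null pass through unchanged
            $clean[$key] = $value;
        }
    }

    return $clean;
}

$sanitized = sanitizeData([
    'name' => '  John ',
    'tags' => [' a '],
    'phone' => '  ',
    'age' => 30,
]);
print_r($sanitized);
```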

5. JSON Parsing Robustness

Always handle edge cases when parsing JSON from Claude responses:

php

// ✓ Good: Multiple extraction strategies with error handling
private function parseJSON(string $text): array
{
    // Try markdown code blocks first
    if (preg_match('/```json\s*(\{.*\})\s*```/s', $text, $matches)) {
        $text = $matches[1];
    } elseif (preg_match('/(\{.*\})/s', $text, $matches)) {
        $text = $matches[1];
    }

    $data = json_decode($text, true);
    
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new \RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    
    if (!is_array($data)) {
        throw new \RuntimeException('Expected array or object');
    }
    
    return $data;
}

// ❌ Bad: No error handling
private function parseJSON(string $text): array
{
    return json_decode($text, true) ?? []; // Silent failure
}

6. Integration with Error Handling

For production systems, integrate with error handling patterns from Chapter 10:

php

use App\Claude\ResilientClaudeClient;
use App\Claude\ExponentialBackoff;

class ProductionExtractor extends SchemaExtractor
{
    public function __construct(
        private ResilientClaudeClient $resilientClient,
        private ExponentialBackoff $backoff
    ) {
        parent::__construct($resilientClient->getClient());
    }

    public function extract(string $input, array $schema, string $context = ''): array
    {
        return $this->backoff->execute(function () use ($input, $schema, $context) {
            return parent::extract($input, $schema, $context);
        });
    }
}

7. Caching Strategies

Cache extraction results for repeated inputs to reduce API costs:

php

class CachedExtractor extends SchemaExtractor
{
    public function __construct(
        Anthropic $client,
        private \Psr\SimpleCache\CacheInterface $cache,
        private int $ttl = 3600
    ) {
        parent::__construct($client);
    }

    public function extract(string $input, array $schema, string $context = ''): array
    {
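        // PSR-16 reserves the characters {}()/\@: in keys, so use '.' as the separator.
        // Serializing the parts together keeps their boundaries unambiguous.
        $cacheKey = 'extract.' . md5(serialize([$input, $schema, $context]));

        // A single get() avoids the gap between has() and get()
        $cached = $this->cache->get($cacheKey);
        if (is_array($cached)) {
            return $cached;
        }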

        $result = parent::extract($input, $schema, $context);
        $this->cache->set($cacheKey, $result, $this->ttl);
        
        return $result;
    }
}

8. Schema Versioning

Version your schemas to handle changes over time:

php

class VersionedSchema
{
    public static function person(int $version = 2): array
    {
        return match($version) {
            1 => self::personV1(),
            2 => self::personV2(),
            default => throw new \InvalidArgumentException("Unknown schema version: {$version}")
        };
    }

    private static function personV1(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'name' => ['type' => 'string'],
                'email' => ['type' => 'string']
            ]
        ];
    }

    private static function personV2(): array
    {
        return [
            'type' => 'object',
            'properties' => [
                'first_name' => ['type' => 'string'],
                'last_name' => ['type' => 'string'],
                'email' => ['type' => 'string', 'format' => 'email']
            ],
            'required' => ['first_name', 'last_name']
        ];
    }
}

Exercises

Exercise 1: Extract Invoice Data

Goal: Create a schema and extraction function for invoice data.

Create a new schema method invoice() in the Schemas class that includes:

Invoice number (required string)
Date (required date)
Due date (date)
Customer information (object with name, email, address)
Line items (array of objects with description, quantity, unit_price, total)
Subtotal, tax, and total amounts (numbers)
Status (enum: 'draft', 'sent', 'paid', 'overdue')

Then create an extraction example that processes invoice text and extracts this structured data.

Validation: Test with sample invoice text and verify all required fields are extracted correctly.

Exercise 2: Add Custom Validation Rules

Goal: Extend the CustomValidator class with new validation rules.

Add validation methods for:

validateCreditCard() — Validates credit card numbers using Luhn algorithm
validatePostalCode() — Validates US ZIP codes (5 digits or 5+4 format)
validateAge() — Validates age is between 0 and 150

Create a test that validates person data using these new rules.

Validation: Ensure invalid data is caught and valid data passes all checks.

Wrap-up

Congratulations! You've completed Chapter 15: Structured Outputs with JSON. Here's what you've accomplished:

✓ Built a complete structured data extraction system with retry logic
✓ Created reusable schema definitions for common data types
✓ Implemented multi-layer validation (JSON Schema + custom validators)
✓ Built batch processing capabilities for efficient multi-item extraction
✓ Added streaming support for large responses
✓ Learned error recovery techniques and fallback strategies
✓ Understood how to design production-ready extraction pipelines

You now have the knowledge and tools to extract reliable, structured data from Claude responses in your PHP applications. These techniques are essential for building production applications that depend on AI-generated structured data.

Key Takeaways

✓ Use native response_format for Sonnet 4.5+ models to guarantee schema compliance
✓ Fallback to prompt-based extraction for older models or when native format isn't supported
✓ Use explicit JSON schemas to define expected output structure
✓ Implement retry logic with error feedback for robust extraction
✓ Validate outputs using both JSON Schema and custom validators
✓ Extract JSON from markdown code blocks automatically when using prompt-based method
✓ Lower temperature (0.3-0.5) produces more consistent structured outputs
✓ Pre-built schemas accelerate common extraction tasks
✓ Batch processing improves efficiency for multiple items
✓ Always handle JSON parsing errors gracefully
✓ Include descriptions in schemas to improve extraction accuracy
✓ Use structured outputs for data extraction, tool use for actions
✓ Cache extraction results to reduce API costs for repeated inputs
✓ Version schemas to handle changes over time
✓ Test extraction with edge cases and malformed inputs

You've mastered structured outputs with Claude!

💻 Code Samples

All code examples from this chapter are available in the GitHub repository:

View Chapter 15 Code Samples

Clone and run locally:

bash

git clone https://github.com/dalehurley/codewithphp.git
cd codewithphp/code/claude-php/chapter-15
composer install
export ANTHROPIC_API_KEY="sk-ant-your-key-here"
php examples/01-basic-structured-output.php
