This document was generated using AI and has yet to be human reviewed

Data Schemas

Integration Layer — Formal definitions of data structures


Definition

Data schemas are formal, machine-readable definitions of data structures used throughout a system. They specify the shape, types, constraints, and relationships of data objects—enabling validation, code generation, and interoperability.

Relationship to the Specification

The specification defines what data a system processes and what constraints that data must satisfy. Data schemas encode those definitions in a format that tools and implementations can consume directly.

Specification Defines                              | Schema Provides
“User ID shall be a 128-bit UUID”                  | userId: { type: "string", format: "uuid" }
“Message content limited to 4096 characters”       | content: { type: "string", maxLength: 4096 }
“Timestamp in Unix milliseconds”                   | timestamp: { type: "integer", minimum: 0 }
“Status must be one of: pending, active, closed”   | status: { enum: ["pending", "active", "closed"] }
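
The pairings above can be exercised directly. The following is a minimal sketch in Python, standard library only, that hand-codes the four constraints. `check_record` is a hypothetical helper introduced for illustration, and it assumes all four fields are required, which the table does not state. A real project would more likely pass the schema to a JSON Schema validator than hand-code the checks.

```python
import uuid

ALLOWED_STATUS = {"pending", "active", "closed"}


def check_record(record: dict) -> list[str]:
    """Return the violated constraints; an empty list means the record passes.

    Assumption: every field is treated as required.
    """
    errors = []

    # userId: must parse as a UUID. Python's parser also accepts braces and
    # un-hyphenated hex, so this check is slightly more lenient than a strict one.
    try:
        uuid.UUID(record.get("userId"))
    except (ValueError, TypeError, AttributeError):
        errors.append("userId is not a UUID")

    # content: len() counts code points, which is also what maxLength counts.
    content = record.get("content")
    if not isinstance(content, str) or len(content) > 4096:
        errors.append("content is not a string of at most 4096 characters")

    # timestamp: bool is a subclass of int in Python, so exclude it explicitly.
    timestamp = record.get("timestamp")
    if not isinstance(timestamp, int) or isinstance(timestamp, bool) or timestamp < 0:
        errors.append("timestamp is not a non-negative integer")

    status = record.get("status")
    if not isinstance(status, str) or status not in ALLOWED_STATUS:
        errors.append("status is not one of pending, active, closed")

    return errors


record = {
    "userId": "123e4567-e89b-42d3-a456-426614174000",
    "content": "hello",
    "timestamp": 1700000000000,
    "status": "active",
}
print(check_record(record))
```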

Core Schema Formats

  • JSON Schema: For JSON data validation and documentation
  • Protocol Buffers: For efficient binary serialization
  • OpenAPI/AsyncAPI: For API request/response definitions
  • GraphQL SDL: For GraphQL type systems
  • Avro/Thrift: For cross-language data serialization
  • SQL DDL: For relational database structures

Dependency Chain

Specification
    ↓
Data Schemas ← formal encoding of data requirements
    ↓
API Documentation ← uses schemas for request/response
    ↓
Protocol Docs ← references schemas for message payloads
    ↓
Code & Comments ← generates types from schemas
    ↓
Test Suites ← validates data against schemas

Why Spec-Grounding Matters

Schemas that drift from specifications cause systemic problems:

  • Validation gaps: Constraints in the spec aren’t enforced because they’re missing from schemas
  • Incompatible implementations: Different services interpret data differently
  • Silent failures: Data passes validation but violates specification intent
  • Documentation mismatch: Human-readable docs describe different structures than schemas define
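
One way to catch the first failure mode is to keep the specification's constraints in a machine-readable form and compare them with the schema. The sketch below is illustrative only: `SPEC_CONSTRAINTS` is a hypothetical, hand-transcribed table, `find_validation_gaps` is a hypothetical helper, and the comparison covers only flat, top-level properties.

```python
# Hypothetical, hand-transcribed view of what the specification requires per field.
SPEC_CONSTRAINTS = {
    "content": {"maxLength": 4096},
    "timestamp": {"minimum": 0},
    "status": {"enum": ["pending", "active", "closed"]},
}


def find_validation_gaps(spec: dict, schema: dict) -> list[str]:
    """List spec constraints that the schema omits or encodes differently."""
    gaps = []
    properties = schema.get("properties", {})
    for field, constraints in spec.items():
        if field not in properties:
            gaps.append(f"{field}: missing from schema")
            continue
        for keyword, expected in constraints.items():
            actual = properties[field].get(keyword)
            if actual != expected:
                gaps.append(f"{field}.{keyword}: spec={expected!r}, schema={actual!r}")
    return gaps


# A schema that has drifted from the spec in two places.
drifted_schema = {
    "type": "object",
    "properties": {
        "content": {"type": "string"},                   # maxLength was dropped
        "timestamp": {"type": "integer", "minimum": 0},
        "status": {"enum": ["pending", "active"]},       # "closed" is missing
    },
}

for gap in find_validation_gaps(SPEC_CONSTRAINTS, drifted_schema):
    print(gap)
```

A check like this only finds constraints that are missing or different. It cannot tell whether the transcribed table itself matches the specification text.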

Specification-Grounded Schema Example

Consider a specification excerpt:

SPEC-3.2.1: A Message object shall contain:

  • id: A unique identifier (UUIDv4, required)
  • sender: The author’s public key (32 bytes, required)
  • content: The message body (string, max 4096 characters, required)
  • timestamp: Creation time (Unix milliseconds, required)
  • signature: Ed25519 signature over id||content||timestamp (64 bytes, required)

The corresponding JSON Schema:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/schemas/message.json",
  "title": "Message",
  "description": "Implements SPEC-3.2.1 (Message Object)",
  "type": "object",
  "required": ["id", "sender", "content", "timestamp", "signature"],
  "properties": {
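    "id": {
      "type": "string",
      "format": "uuid",
      "pattern": "^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$",
      "description": "SPEC-3.2.1: Unique identifier (UUIDv4)"
    },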
    "sender": {
      "type": "string",
      "pattern": "^[a-fA-F0-9]{64}$",
      "description": "SPEC-3.2.1: Author's public key (32 bytes, hex-encoded)"
    },
    "content": {
      "type": "string",
      "maxLength": 4096,
      "description": "SPEC-3.2.1: Message body (max 4096 characters)"
    },
    "timestamp": {
      "type": "integer",
      "minimum": 0,
      "description": "SPEC-3.2.1: Creation time (Unix milliseconds)"
    },
    "signature": {
      "type": "string",
      "pattern": "^[a-fA-F0-9]{128}$",
      "description": "SPEC-3.2.1: Ed25519 signature (64 bytes, hex-encoded)"
    }
  },
  "additionalProperties": false
}
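
To make the mapping from SPEC-3.2.1 to concrete checks visible, here is a minimal sketch that applies the same rules to a decoded message using only the Python standard library. `validate_message` and `is_uuid4` are hypothetical helpers, not part of any library. Two caveats motivate it: `format: "uuid"` alone does not pin the UUID version, and draft 2020-12 treats `format` as an annotation unless the validator opts in to asserting it, so the sketch checks the version explicitly. Whether the Ed25519 signature actually verifies is outside what the schema, or this sketch, can express.

```python
import re
import uuid

REQUIRED = ("id", "sender", "content", "timestamp", "signature")
HEX_32_BYTES = re.compile(r"[a-fA-F0-9]{64}")   # 32 bytes, hex-encoded
HEX_64_BYTES = re.compile(r"[a-fA-F0-9]{128}")  # 64 bytes, hex-encoded


def is_uuid4(value: str) -> bool:
    """True only for a canonical, hyphenated UUID whose version field is 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == value.lower()


def validate_message(msg: dict) -> list[str]:
    """Return schema violations for a decoded Message; empty means valid."""
    # Mirrors "required" and "additionalProperties": false.
    errors = [f"missing required field: {name}" for name in REQUIRED if name not in msg]
    errors += [f"unexpected field: {name}" for name in msg if name not in REQUIRED]
    if errors:
        return errors  # the field-level checks below assume the exact field set

    if not (isinstance(msg["id"], str) and is_uuid4(msg["id"])):
        errors.append("id is not a UUIDv4 string")
    # fullmatch mirrors the anchored ^...$ patterns in the schema.
    if not (isinstance(msg["sender"], str) and HEX_32_BYTES.fullmatch(msg["sender"])):
        errors.append("sender is not 64 hex characters")
    if not (isinstance(msg["content"], str) and len(msg["content"]) <= 4096):
        errors.append("content is not a string of at most 4096 characters")
    ts = msg["timestamp"]
    if not (isinstance(ts, int) and not isinstance(ts, bool) and ts >= 0):
        errors.append("timestamp is not a non-negative integer")
    if not (isinstance(msg["signature"], str) and HEX_64_BYTES.fullmatch(msg["signature"])):
        errors.append("signature is not 128 hex characters")
    return errors


message = {
    "id": str(uuid.uuid4()),
    "sender": "ab" * 32,
    "content": "hello",
    "timestamp": 1700000000000,
    "signature": "cd" * 64,
}
print(validate_message(message))
```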

Schema Evolution

When specifications change, schemas must evolve. Good practices:

  • Version schemas explicitly (semantic versioning preferred)
  • Document breaking vs. non-breaking changes
  • Provide migration guides when schemas change
  • Maintain backwards compatibility where the spec allows
  • Include specification version in schema metadata
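
The breaking versus non-breaking distinction can be partly automated. The sketch below assumes that "breaking" means data valid under the old schema would be rejected by the new one, and it recognises only three kinds of change. `breaking_changes` and `suggest_bump` are hypothetical helpers, and the major/minor rule is a rule of thumb, not something the specification defines.

```python
def breaking_changes(old: dict, new: dict) -> list[str]:
    """Return reasons the new schema would reject data the old one accepted."""
    reasons = []
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})

    # A newly required field rejects documents that omitted it.
    for name in sorted(set(new.get("required", [])) - set(old.get("required", []))):
        reasons.append(f"new required field: {name}")

    # With additionalProperties false, a removed property rejects documents
    # that still send it.
    if new.get("additionalProperties") is False:
        for name in sorted(set(old_props) - set(new_props)):
            reasons.append(f"removed property: {name}")

    # A lower (or newly introduced) maxLength rejects longer strings.
    for name in sorted(set(old_props) & set(new_props)):
        old_max = old_props[name].get("maxLength")
        new_max = new_props[name].get("maxLength")
        if new_max is not None and (old_max is None or new_max < old_max):
            reasons.append(f"tightened maxLength on {name}")

    return reasons


def suggest_bump(old: dict, new: dict) -> str:
    """Suggest a semantic-version bump; anything non-breaking counts as minor here."""
    return "major" if breaking_changes(old, new) else "minor"


v1 = {
    "properties": {"content": {"type": "string", "maxLength": 4096}},
    "required": ["content"],
    "additionalProperties": False,
}
v2 = {
    "properties": {
        "content": {"type": "string", "maxLength": 2048},
        "timestamp": {"type": "integer", "minimum": 0},
    },
    "required": ["content", "timestamp"],
    "additionalProperties": False,
}
print(suggest_bump(v1, v2), breaking_changes(v1, v2))
```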

Best Practices

  • Generate schemas from specification definitions where possible
  • Include specification references in schema descriptions
  • Use schema validation in CI/CD pipelines
  • Generate code (types, classes) from schemas to ensure consistency
  • Test schemas against specification-derived test vectors
  • Publish schemas alongside API documentation
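
As an illustration of the code-generation item, the sketch below renders Python dataclass source from a flat object schema. `render_dataclass` and the `JSON_TO_PY` mapping are hypothetical names. The sketch handles only scalar property types and ignores constraints such as `pattern` or `maxLength`, so a dedicated generator would be the likelier choice in practice.

```python
JSON_TO_PY = {"string": "str", "integer": "int", "number": "float", "boolean": "bool"}


def render_dataclass(schema: dict) -> str:
    """Render dataclass source for a flat object schema with scalar properties."""
    required = schema.get("required", [])
    props = schema["properties"]
    lines = [
        "from dataclasses import dataclass",
        "from typing import Optional",
        "",
        "",
        "@dataclass(frozen=True)",
        f"class {schema['title']}:",
    ]
    # Required fields go first: dataclass fields without defaults must
    # precede fields that have defaults.
    for name, spec in props.items():
        if name in required:
            lines.append(f"    {name}: {JSON_TO_PY[spec['type']]}")
    for name, spec in props.items():
        if name not in required:
            lines.append(f"    {name}: Optional[{JSON_TO_PY[spec['type']]}] = None")
    return "\n".join(lines) + "\n"


# Illustrative schema; "editedAt" is a made-up optional field.
example_schema = {
    "title": "Message",
    "type": "object",
    "required": ["id", "content", "timestamp"],
    "properties": {
        "id": {"type": "string"},
        "content": {"type": "string"},
        "editedAt": {"type": "integer"},
        "timestamp": {"type": "integer"},
    },
}
print(render_dataclass(example_schema))
```

Generating the types from the schema, instead of writing them by hand, means a schema change shows up as a change in the generated code that reviewers and type checkers can see.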