Architectural Description – Ea-2-Sa Local + Vectorized AI Implementation

1️⃣ Purpose and Scope

This architecture describes how the Ea-2-Sa Platform operationalizes local AI inference with vectorized memory using a hybrid deployment pattern. It translates the conceptual framework from Part 1 into a working reference implementation unifying local inference (Ollama), vectorized memory (Pinecone), and declarative gateway infrastructure (Kong).

The solution demonstrates how architectural intent becomes executable code — a cornerstone of Ea-2-Sa’s Architecture-as-Code philosophy.

2️⃣ Architectural Context

| Actor / Role | Description |
| --- | --- |
| Developer / Architect | Configures and deploys the local AI environment and consumes API endpoints. |
| Ollama Runtime | Local inference engine hosting open models such as Mistral 7B or Llama 3. |
| Pinecone Service | Managed vector database storing contextual embeddings for RAG. |
| Vector Proxy Service | Node.js microservice bridging Ollama and Pinecone for embedding and context fusion. |
| Kong Gateway | Provides the ingress layer for service discovery, auth, and observability. |
| Redis Cache | Caches embedding and retrieval results for low-latency responses. |

3️⃣ Architectural Viewpoints

A. Logical View

User → Kong Gateway → Vector Proxy → Ollama Runtime → Pinecone Service
                              ↘︎ Redis Cache (optional)

The Vector Proxy orchestrates requests between the local model and vector store, creating a lightweight RAG workflow. Kong enforces access control and telemetry across API interactions.
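To make the logical view concrete, a client call through the gateway might look like the sketch below. The host, port, endpoint path, and payload shape are illustrative assumptions, not a documented API contract.

```typescript
// Build the HTTP request a client would send through Kong to reach the
// Vector Proxy. Hypothetical endpoint and payload, shown for illustration.
function buildInferRequest(prompt: string, apiKey: string) {
  return {
    url: "http://localhost:8000/api/ai/infer", // Kong's default proxy port
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        apikey: apiKey, // API-key header validated at the gateway
      },
      body: JSON.stringify({ prompt }),
    },
  };
}

// Usage: pass the result to fetch(r.url, r.init) from any Node 18+ client.
const r = buildInferRequest("Summarize the Q3 report", "dev-key");
```

Keeping request construction in one function makes it easy to swap the API-key header for a JWT `Authorization` header without touching call sites.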

B. Deployment View
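A Docker Compose sketch of the deployment is shown below. Service names, images, ports, and environment variables are assumptions for illustration, not the platform's actual manifest.

```yaml
# Hypothetical docker-compose.yml for the local AI stack.
services:
  ollama:
    image: ollama/ollama
    ports: ["11434:11434"]        # Ollama's default API port
  vector-proxy:
    build: ./vector-proxy         # the Node.js bridging microservice
    environment:
      OLLAMA_URL: http://ollama:11434
      PINECONE_API_KEY: ${PINECONE_API_KEY}   # Pinecone remains a managed service
      REDIS_URL: redis://redis:6379
    depends_on: [ollama, redis]
  redis:
    image: redis:7
  kong:
    image: kong:3
    ports: ["8000:8000"]          # Kong's default proxy listener
```

All services share the default Compose network, so only Kong's proxy port needs to be exposed to clients.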

C. Data Flow

  1. User submits a prompt via /api/ai/infer.
  2. Kong validates API Key / JWT and routes to Vector Proxy.
  3. Proxy embeds the input and queries Pinecone for context vectors.
  4. Combined context + prompt sent to Ollama for local inference.
  5. Result cached in Redis and returned to the client.
  6. Metrics emitted via OpenTelemetry → Prometheus.
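The data flow above can be sketched as a small pipeline. The interfaces below are stand-ins for the real Ollama, Pinecone, and Redis clients (their actual SDK signatures are not part of this article), which also makes the flow testable with stubs.

```typescript
// Minimal client interfaces standing in for Ollama, Pinecone, and Redis.
interface Embedder { embed(text: string): Promise<number[]>; }
interface VectorStore { query(vector: number[], topK: number): Promise<string[]>; }
interface LLM { generate(prompt: string): Promise<string>; }
interface Cache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// Fuse retrieved context chunks with the user prompt (step 4).
function buildPrompt(context: string[], question: string): string {
  return `Context:\n${context.join("\n")}\n\nQuestion: ${question}`;
}

// Steps 2-5 of the data flow: cache check, embed, retrieve, infer, cache.
async function infer(
  prompt: string,
  deps: { embedder: Embedder; store: VectorStore; llm: LLM; cache: Cache }
): Promise<string> {
  const cached = await deps.cache.get(prompt);
  if (cached !== null) return cached;                    // Redis hit: skip inference

  const vector = await deps.embedder.embed(prompt);      // step 3: embed input
  const context = await deps.store.query(vector, 3);     // step 3: query Pinecone
  const answer = await deps.llm.generate(buildPrompt(context, prompt)); // step 4
  await deps.cache.set(prompt, answer);                  // step 5: cache result
  return answer;
}
```

Injecting the clients as a `deps` object keeps the orchestration logic independent of any particular SDK, which is what lets the proxy swap models or vector stores without rewriting the flow.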

4️⃣ Technology View

| Layer | Technology | Function |
| --- | --- | --- |
| Compute / Model | Ollama Runtime | Executes open-source LLMs locally. |
| Memory / Knowledge | Pinecone Index | Stores embeddings and retrieves context vectors. |
| Integration / Control | Kong + Vector Proxy | Orchestrates traffic, policy, and embedding logic. |
| Observability | OpenTelemetry + Prometheus | Collects metrics and exposes health. |
| Storage / Cache | Redis | Provides low-latency recall of frequent queries. |

5️⃣ Security & Governance
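The JWT validation and TLS termination described under Quality Attributes could be expressed declaratively at the gateway. The snippet below is an illustrative Kong declarative config; service names, upstream URL, and route paths are assumptions.

```yaml
# Hypothetical kong.yml fragment enforcing auth on the inference route.
_format_version: "3.0"
services:
  - name: vector-proxy
    url: http://vector-proxy:3000      # assumed upstream address
    routes:
      - name: ai-infer
        paths: ["/api/ai/infer"]
plugins:
  - name: jwt                          # validates tokens before routing (step 2)
```

Declaring the policy in the gateway config, rather than in service code, keeps governance auditable alongside the rest of the infrastructure definitions.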

6️⃣ Quality Attributes

| Attribute | Design Mechanism |
| --- | --- |
| Scalability | Horizontal container scaling or additional Pinecone pods. |
| Resilience | Health checks + Redis cache handle temporary Pinecone outages. |
| Portability | Consistent runtime across dev/staging/prod with Docker Compose. |
| Observability | OpenTelemetry metrics exposed to Prometheus. |
| Security | JWT tokens + TLS termination at Kong. |
| Maintainability | Modular microservices + declarative YAML configurations. |

7️⃣ Alignment with TOGAF & SAFe

| Framework | Corresponding Element |
| --- | --- |
| TOGAF – Application Layer | Ollama Runtime & Vector Proxy |
| TOGAF – Data Layer | Pinecone Indexes & Redis Cache |
| TOGAF – Technology Layer | Docker Network + Kong Gateway |
| SAFe – Portfolio Level | Strategic Theme: AI Enablement |
| SAFe – Program Level | Features: AI Pipeline Deployment, AI Governance Integration |
| SAFe – Team Level | User stories for container config, API testing, metric setup |

8️⃣ Architectural Summary

This architecture embodies Ea-2-Sa’s principle of architectural traceability — every container, route, and configuration maps to a business capability. Local AI inference ensures sovereignty and cost efficiency; vectorized memory enables contextual reasoning; and declarative infrastructure delivers repeatable, governable deployment.

Outcome: A reproducible, secure, and observable Local + Vectorized AI environment aligned with enterprise architecture standards.

Next in this series → Part 3 – AI Governance & Compliance Automation.
