Safkat Nirjash
Writing/MLOps & Agent Evaluation
Use Case5 min read

The data is there. Getting to it shouldn't require an engineer: a framework for AI-powered operational self-service

When operational data sits behind engineering-only query interfaces, business users build up a hidden backlog of data requests. Here is how to add natural language query to a platform without touching production.

Safkat Nirjash·

The hidden cost of data request backlogs

Every operational platform generates data. Transaction records, balance summaries, usage analytics, reconciliation figures. The data exists, and it's genuinely valuable to the people running the business.

The problem is getting at it.

On most platforms, the data is reachable by engineers who can write SQL, run queries, and read raw output. For everyone else, the operators, partners, and clients who need the data to make a decision, the only path is a ticket to engineering. "Can you pull last month's redemption totals by partner?" "What's my current balance?" "Which transactions are pending settlement?"

Each request is small. Together they add up to a recurring engineering tax: hours spent fetching data that have nothing to do with building or improving the platform. And it compounds for as long as the system runs.

The AI sidecar pattern fixes this directly.


The pattern: a read-only AI layer over operational data

At its simplest, the pattern is a separate read-only system that sits next to the production platform and answers natural language questions about operational data.

graph TD
    subgraph PRODUCTION["Production Platform"]
        SOURCE_DB[Source Database]
        BACKEND[Application Backend]
    end

    subgraph SIDECAR["AI Query Sidecar (read-only)"]
        ETL[ETL Sync - scheduled]
        REPLICA[Read Replica Database]
        ROUTER[Intent Classifier - LLM]
        SQL_AGENT[SQL Agent]
        RAG_AGENT[RAG Agent - knowledge base]
        GUARDRAILS[Security Layer]
    end

    subgraph USERS["Business Users"]
        PROVIDER[Partner / Provider]
        MERCHANT[Operator / Merchant]
        ADMIN[Platform Admin]
    end

    SOURCE_DB -->|sync| ETL
    ETL --> REPLICA
    PROVIDER -->|natural language query| ROUTER
    MERCHANT -->|natural language query| ROUTER
    ADMIN -->|natural language query| ROUTER
    ROUTER --> SQL_AGENT
    ROUTER --> RAG_AGENT
    SQL_AGENT --> REPLICA
    SQL_AGENT --> GUARDRAILS
    RAG_AGENT --> GUARDRAILS
    GUARDRAILS --> PROVIDER
    GUARDRAILS --> MERCHANT
    GUARDRAILS --> ADMIN

The sidecar has no write path to the production system. It can't modify data. It can't create side effects. The worst a bug in the sidecar can do is return a wrong answer, not corrupt data or break a transaction. That's not an accident of the design. It's the whole point: it's what makes it safe to iterate quickly on the AI layer without putting production at risk.


The five components

1. ETL sync

The sidecar maintains its own read replica, synchronized from the production database on a defined schedule. The sync is incremental - only changed records are transferred. The replica is the only database the AI layer touches.

The sync interval defines the data freshness the system can offer. For most operational queries, a sync interval of minutes is sufficient. For real-time balance queries, the sync interval needs to match the latency tolerance.

2. Intent classification

Before running any query, the system classifies the user's natural language input into an intent. The intent decides which query template applies and which data scope the requesting user's role is allowed to see.

An LLM does the classification, mapping free-form language to a structured intent taxonomy. That handles phrasing variation a rule-based classifier would choke on. "What did I earn this week?" and "Show me my settlement for the last seven days" land on the same intent.

3. Role-based data isolation

The intent classification result is evaluated against the requesting user's role before any query is executed. Each role can access only a defined set of intents, and the queries generated for that intent are automatically scoped to the user's own data.

Role         Accessible Data
----------------------------------
Provider     Own gift cards, redemptions, balances
Merchant     Own outlet transactions, settlements, dues
Admin        Full platform - all tenants

A Provider asking "what's the platform GMV this month" gets a sensible answer for their role, not an error, but an answer scoped to their own volume. An Admin asking the same question gets platform-wide figures.

That's not only authorization. It's a UX call. Each role gets answers that mean something for their position, not a filtered slice of one universal query.

4. Query execution with guardrails

The SQL agent turns the classified intent into a database query, runs it against the read replica, and hands the result to the response generator.

The guardrails layer wraps that whole process. Input validation blocks prompt injection and SQL injection patterns before the intent classifier ever sees them. Output scrubbing masks PII, meaning email addresses, phone numbers, and personal identifiers, before a response leaves the sidecar. Rate limiting caps requests per user so nobody can turn the AI layer into an uncontrolled query engine.

5. Response generation

The raw query result goes to an LLM with instructions to write a natural language response. The user gets an answer in the same shape as the question, a plain-language summary of what the data shows, not a table of database rows.

For large result sets, server-sent events let the response start rendering before the full query finishes.


What this enables

A few things become possible once the sidecar is in place.

Non-technical users can serve themselves. A partner can ask "which of my cards are expiring this week?" at 9pm and get an answer right away, no engineering ticket.

The engineering queue shrinks. The "can you pull the data for X" category of work leaves the backlog, and engineering capacity moves to work that genuinely needs engineering.

Each role gets operational visibility that fits it. Partners, outlet operators, and admins all have different questions, and the sidecar answers all of them from one interface.

Decisions get faster. When the data is available on demand, a decision that used to wait on a retrieval cycle can be made on the spot.


What this does not do

The pattern has real limits, and naming them keeps the scope honest.

It doesn't replace analytics. The sidecar answers operational questions about specific entities, transactions, and balances. It is not a data warehouse. Trend analysis, cross-cohort segmentation, and forecasting belong in a dedicated analytics layer.

It doesn't handle writes. That's by design. The read-only constraint is the whole reason the sidecar is safe.

Data freshness is capped by the sync interval. Real-time accuracy needs a sync interval that matches the use case. Some questions, like "is this specific card active right now?", may mean accepting a small delay or routing to a live API call.

LLM-generated SQL needs validation. The SQL agent writes queries, and generated queries have to be checked before they run. Templates scoped to known intent patterns are safer than fully open-ended generation.


When to build this

The sidecar pattern is the right call when business users depend on engineering for routine data retrieval, when you can't modify the production system to add a query layer directly, when multi-tenant isolation makes open query access unsafe without a controlled interface, and when the operational questions are bounded enough to classify into a defined intent taxonomy.

It's the wrong call when the questions are too open-ended to classify, when every query needs sub-second real-time data, or when the data model is too tangled for LLM-generated SQL to be reliable without heavy validation.

The pattern pays off most on platforms with several user roles, each with its own bounded data needs and its own set of recurring questions.

Work with me

Ready to discuss your architecture?

I work with founders and engineering leaders as a Fractional CTO to translate business goals into technical strategy - and execute on them. Free 30-minute Technical Health Check to start.

Book a call