The data is there. Getting to it shouldn't require an engineer: a framework for AI-powered operational self-service
When operational data sits behind engineering-only query interfaces, business users build up a hidden backlog of data requests. Here is how to add natural language query to a platform without touching production.
The hidden cost of data request backlogs
Every operational platform generates data. Transaction records, balance summaries, usage analytics, reconciliation figures. The data exists, and it's genuinely valuable to the people running the business.
The problem is getting at it.
On most platforms, the data is reachable by engineers who can write SQL, run queries, and read raw output. For everyone else, the operators, partners, and clients who need the data to make a decision, the only path is a ticket to engineering. "Can you pull last month's redemption totals by partner?" "What's my current balance?" "Which transactions are pending settlement?"
Each request is small. Together they add up to a recurring engineering tax: hours spent fetching data that have nothing to do with building or improving the platform. And it compounds for as long as the system runs.
The AI sidecar pattern fixes this directly.
The pattern: a read-only AI layer over operational data
At its simplest, the pattern is a separate read-only system that sits next to the production platform and answers natural language questions about operational data.
graph TD
subgraph PRODUCTION["Production Platform"]
SOURCE_DB[Source Database]
BACKEND[Application Backend]
end
subgraph SIDECAR["AI Query Sidecar (read-only)"]
ETL[ETL Sync - scheduled]
REPLICA[Read Replica Database]
ROUTER[Intent Classifier - LLM]
SQL_AGENT[SQL Agent]
RAG_AGENT[RAG Agent - knowledge base]
GUARDRAILS[Security Layer]
end
subgraph USERS["Business Users"]
PROVIDER[Partner / Provider]
MERCHANT[Operator / Merchant]
ADMIN[Platform Admin]
end
SOURCE_DB -->|sync| ETL
ETL --> REPLICA
PROVIDER -->|natural language query| ROUTER
MERCHANT -->|natural language query| ROUTER
ADMIN -->|natural language query| ROUTER
ROUTER --> SQL_AGENT
ROUTER --> RAG_AGENT
SQL_AGENT --> REPLICA
SQL_AGENT --> GUARDRAILS
RAG_AGENT --> GUARDRAILS
GUARDRAILS --> PROVIDER
GUARDRAILS --> MERCHANT
GUARDRAILS --> ADMIN
The sidecar has no write path to the production system. It can't modify data. It can't create side effects. The worst a bug in the sidecar can do is return a wrong answer, not corrupt data or break a transaction. That's not an accident of the design. It's the whole point: it's what makes it safe to iterate quickly on the AI layer without putting production at risk.
The five components
1. ETL sync
The sidecar maintains its own read replica, synchronized from the production database on a defined schedule. The sync is incremental - only changed records are transferred. The replica is the only database the AI layer touches.
The sync interval defines the data freshness the system can offer. For most operational queries, a sync interval of minutes is sufficient. For real-time balance queries, the sync interval needs to match the latency tolerance.
2. Intent classification
Before running any query, the system classifies the user's natural language input into an intent. The intent decides which query template applies and which data scope the requesting user's role is allowed to see.
An LLM does the classification, mapping free-form language to a structured intent taxonomy. That handles phrasing variation a rule-based classifier would choke on. "What did I earn this week?" and "Show me my settlement for the last seven days" land on the same intent.
3. Role-based data isolation
The intent classification result is evaluated against the requesting user's role before any query is executed. Each role can access only a defined set of intents, and the queries generated for that intent are automatically scoped to the user's own data.
Role Accessible Data
----------------------------------
Provider Own gift cards, redemptions, balances
Merchant Own outlet transactions, settlements, dues
Admin Full platform - all tenants
A Provider asking "what's the platform GMV this month" gets a sensible answer for their role, not an error, but an answer scoped to their own volume. An Admin asking the same question gets platform-wide figures.
That's not only authorization. It's a UX call. Each role gets answers that mean something for their position, not a filtered slice of one universal query.
4. Query execution with guardrails
The SQL agent turns the classified intent into a database query, runs it against the read replica, and hands the result to the response generator.
The guardrails layer wraps that whole process. Input validation blocks prompt injection and SQL injection patterns before the intent classifier ever sees them. Output scrubbing masks PII, meaning email addresses, phone numbers, and personal identifiers, before a response leaves the sidecar. Rate limiting caps requests per user so nobody can turn the AI layer into an uncontrolled query engine.
5. Response generation
The raw query result goes to an LLM with instructions to write a natural language response. The user gets an answer in the same shape as the question, a plain-language summary of what the data shows, not a table of database rows.
For large result sets, server-sent events let the response start rendering before the full query finishes.
What this enables
A few things become possible once the sidecar is in place.
Non-technical users can serve themselves. A partner can ask "which of my cards are expiring this week?" at 9pm and get an answer right away, no engineering ticket.
The engineering queue shrinks. The "can you pull the data for X" category of work leaves the backlog, and engineering capacity moves to work that genuinely needs engineering.
Each role gets operational visibility that fits it. Partners, outlet operators, and admins all have different questions, and the sidecar answers all of them from one interface.
Decisions get faster. When the data is available on demand, a decision that used to wait on a retrieval cycle can be made on the spot.
What this does not do
The pattern has real limits, and naming them keeps the scope honest.
It doesn't replace analytics. The sidecar answers operational questions about specific entities, transactions, and balances. It is not a data warehouse. Trend analysis, cross-cohort segmentation, and forecasting belong in a dedicated analytics layer.
It doesn't handle writes. That's by design. The read-only constraint is the whole reason the sidecar is safe.
Data freshness is capped by the sync interval. Real-time accuracy needs a sync interval that matches the use case. Some questions, like "is this specific card active right now?", may mean accepting a small delay or routing to a live API call.
LLM-generated SQL needs validation. The SQL agent writes queries, and generated queries have to be checked before they run. Templates scoped to known intent patterns are safer than fully open-ended generation.
When to build this
The sidecar pattern is the right call when business users depend on engineering for routine data retrieval, when you can't modify the production system to add a query layer directly, when multi-tenant isolation makes open query access unsafe without a controlled interface, and when the operational questions are bounded enough to classify into a defined intent taxonomy.
It's the wrong call when the questions are too open-ended to classify, when every query needs sub-second real-time data, or when the data model is too tangled for LLM-generated SQL to be reliable without heavy validation.
The pattern pays off most on platforms with several user roles, each with its own bounded data needs and its own set of recurring questions.
Work with me
Ready to discuss your architecture?
I work with founders and engineering leaders as a Fractional CTO to translate business goals into technical strategy - and execute on them. Free 30-minute Technical Health Check to start.
Book a call