From days to seconds: how an AI query layer cleared the data request backlog on a gift card platform

The problem nobody had named

The gift card platform had two kinds of users who needed data every day to run their operations. Providers issued gift cards and needed to track issuance, redemption, and balances. Merchants accepted those cards and needed to track their own redemptions, settlements, and dues.

Both groups kept asking the same kind of question, over and over, in slightly different forms:

"How many of my cards were redeemed this week?"
"What's my current balance with the platform?"
"Which cards are expiring in the next 30 days?"
"What's my settlement amount for this month?"
"Why is this card showing a different balance than I expect?"

None of these are hard questions. The data sat right there in the platform's database. The answers were simple to produce.

But the only path ran through engineering. Every question became a ticket. Every ticket joined a queue. Every answer took days.

Users weren't complaining about it. They'd accepted it as how things worked. The teams had normalized it too. But the pile of requests had two consequences that were hard to see from inside the system.

The first: engineering time was going to data retrieval. The ticket volume wasn't enormous, but it was steady, and it added up to weeks of capacity per quarter spent producing answers that needed engineering access to the database but no engineering skill to generate.

The second: business decisions were stalling. A provider weighing a promotional campaign needed redemption data. If that data took three days to arrive, the decision got made on stale numbers or no numbers at all. The cost was real, and nobody was counting it.

The architecture decision that made this safe

The first decision, and the most important one, was a hard constraint: the AI query layer would have no write path to the production database.

That's not the obvious move when you're scoping an AI feature. The tempting thing is to wire the AI straight into production so it can answer with live data. But an AI layer with direct production access adds risk to production. Bugs in query generation, unexpected load from heavy queries, prompt injection attempts: all of them now have production consequences.

The sidecar removes that risk completely. The AI layer works off a read replica synced from production on a schedule. The replica is the only database it ever touches. A bug in the AI layer produces a wrong answer and nothing worse. It can't corrupt production data, lock production tables, or interfere with transaction processing.

graph TD
    subgraph PRODUCTION["Production Platform (.NET + SQL Server)"]
        APP[Application Backend]
        SOURCE[SQL Server Database]
    end

    subgraph ETL["ETL Layer"]
        SYNC[Sync Orchestrator - scheduled]
    end

    subgraph SIDECAR["AI Sidecar (Python + FastAPI)"]
        REPLICA[PostgreSQL Read Replica]
        ROUTER[LLM Router - intent classification]
        SQL_AGENT[SQL Agent]
        RAG_AGENT[RAG Agent]
        GUARDRAILS[Input Validator + Output Scrubber]
        RATE[Rate Limiter]
    end

    SOURCE -->|incremental sync| SYNC
    SYNC --> REPLICA
    APP -->|JWT auth| SIDECAR
    GUARDRAILS --> ROUTER
    ROUTER -->|structured intent| SQL_AGENT
    ROUTER -->|knowledge query| RAG_AGENT
    SQL_AGENT --> REPLICA

The ETL sync runs on a short interval, keeping the replica current enough for the questions the system is built to answer. Redemption counts, balance summaries, and settlement figures don't need millisecond freshness. They need minute-level freshness, and that's what the sidecar delivers.

How the query system works

Intent classification

When a user submits a natural language query, an LLM classifier maps the input to a structured intent before any data is touched. The intent taxonomy covers the question categories that show up in real operational usage:

Provider intents: gift card status, redemption analytics, balance summary, expiring cards.

Merchant intents: outlet redemptions, settlement figures, outstanding dues, denomination performance.

Admin intents: platform-wide volume, provider rankings, overall redemption rates.

The classification step matters for two reasons. It gates data access, because a classified intent gets checked against the user's role before any query runs. And it pins query generation to known templates for known intents, which shrinks the surface for SQL injection or unpredictable generated queries.

"Which of my cards were redeemed yesterday?" and "Show me redemptions from the last 24 hours" classify to the same intent, the same query template, the same data scope. The phrasing variation that would break a rule-based system gets absorbed by the LLM classifier.

Role-based data isolation

Each user authenticates with the same JWT they use on the production platform. The sidecar validates the token and pulls out the role and entity identity.

Every query is automatically scoped to the authenticated user's data:

Role	Query Scope
Provider	Own gift cards and redemptions only
Merchant	Own outlet transactions only
Admin	Full platform data

A Provider can't ask about another Provider's cards. A Merchant can't see platform-wide figures. Scoping happens at query generation, not at the response level. The query itself is written to return only authorized data.

So the system never has to post-filter results and hope the filter is right. The query is scoped before it ever runs.

The guardrails

Building an AI query layer over operational data means taking the security surface seriously.

Input validation blocks SQL injection patterns and prompt injection attempts before they reach the LLM. Queries with SQL keywords in odd positions, attempts to override system instructions, or anything matching a known injection signature get rejected before classification.

Output scrubbing masks PII. The platform data holds email addresses, phone numbers, and other personal identifiers. Responses are scanned and masked before they leave the sidecar. A provider asking about cardholder redemption patterns gets aggregate data, not individual cardholder records.

Rate limiting applies per-user caps at burst, minute, and hour granularity, so nobody can turn the query layer into an unrestricted database engine.

The business value that surfaced

The immediate win was the one we expected: providers and merchants could ask questions and get answers in seconds instead of days. The value that showed up over the following weeks and months was bigger than that.

Providers started running operations they used to skip. When a provider can see which cards expire this week in seconds, they can act on it: run a reminder campaign, offer an extension, contact affected cardholders. When that answer took three days, it arrived too late to use. The sidecar made a whole category of business action practical that had been theoretical.

Merchants found discrepancies they'd stopped looking for. Once several merchants could query their own settlement figures directly, they spotted gaps between expected and actual settlement amounts that had quietly built up. The data had always shown it. The friction of getting to the data was what had kept it hidden.

The platform got stickier. Providers and merchants who use the query layer come back to the platform more often than those who don't. Daily operational questions that used to mean an external email now get answered inside the platform. Engagement went up with no change to the core product.

Engineering got capacity back. The data request queue wasn't large before the sidecar. After it, for the question categories the sidecar covers, it dropped close to zero. The engineers who had been the path to this data could put that time into product work.

What it took to build, and what it cost

It took roughly six weeks from the first ETL design to production deployment. The ETL sync layer and the security guardrails ate the most time. The LLM routing and SQL agent work went faster than expected, because the intent taxonomy was nailed down before implementation started.

The ongoing cost is LLM inference per query. At production volumes that's a small fraction of the value created. Every query that doesn't become an engineering ticket has a cost nobody tracks but everyone has felt.

Maintenance is modest but real. The ETL sync needs attention when the source schema changes. The intent taxonomy needs to grow as new query patterns emerge from how people actually use it. Neither is heavy, but both need an owner.

What the sidecar did not change

The sidecar didn't change the production platform. No production schema changes, no changes to the application backend, no changes to auth in the main system. The JWT validation is shared, but the JWT itself still comes from the existing authentication system.

That was a deliberate constraint, and it's why the project landed in six weeks instead of six months. An additive layer with no production dependencies keeps the scope bounded and the risk contained.

The operational data was always there. Its value to the business was always real. The only thing that changed was the path to it.