Guide

How to Build and Deploy an AI API Wrapper

Build a thin HTTP service that wraps the LLM API. Your clients call your endpoint. Your service adds the API key, applies rate limits, caches frequent requests, and returns the response. Deploy it as a standard API and manage it from the CLI.

The problem

You want to use OpenAI, Anthropic, or another LLM API in your product, but you do not want to expose your API key to clients. You need a proxy that adds authentication, rate limiting, caching, and custom prompts on top of the raw LLM API. Building this as a standalone service is the standard pattern.

Why wrap an LLM API instead of calling it directly

Calling OpenAI or Anthropic directly from your frontend exposes your API key. Even from a backend, every service that needs LLM access needs its own API key management. A wrapper centralizes access: one service holds the key, enforces rate limits, caches responses, adds custom system prompts, and logs usage. All other services call your wrapper.

What an API wrapper needs

At minimum: an HTTP endpoint that proxies requests to the LLM, environment variables for API keys, and HTTPS. Better: response caching (save money on repeated queries), rate limiting (prevent abuse), custom system prompts (add context without client-side changes), usage logging (track costs per user), and multi-model routing (try GPT-4o, fall back to Claude).

Example architecture

Client sends POST /api/chat with a prompt. Your wrapper adds the system prompt, checks the cache, calls OpenAI on a cache miss, stores the response, logs the usage, and returns the result. Total code: under 100 lines in Express or FastAPI.

Approaches compared

Custom wrapper on a platform (CreateOS, Railway)

Pros

  • Full control over logic
  • Add caching, rate limits, custom prompts
  • Deploy in one command
  • Environment variables for API keys

Cons

  • You build and maintain it

Best for: Teams that need custom logic on top of LLM APIs

LLM Gateway (Portkey, LiteLLM, AI Gateway)

Pros

  • Pre-built proxy with caching and fallbacks
  • Multi-provider routing
  • Usage analytics

Cons

  • Another dependency
  • Less customizable
  • Potential vendor lock-in

Best for: Teams that need a gateway without building one

Edge function (Cloudflare AI Gateway, Vercel AI)

Pros

  • Low latency
  • Built-in caching
  • Managed infrastructure

Cons

  • Limited customization
  • Platform-specific APIs
  • Timeout constraints

Best for: Simple proxying without custom business logic

Deploy an AI API wrapper with CreateOS CLI

Here is how to do it, step by step, with the CreateOS CLI.

1

Build the wrapper

// Express example
const express = require('express');
const OpenAI = require('openai');

const app = express();
app.use(express.json());
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const SYSTEM_PROMPT = 'You are a helpful assistant.'; // your custom prompt

app.post('/api/chat', async (req, res) => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'system', content: SYSTEM_PROMPT }, ...req.body.messages],
  });
  res.json(response);
});

app.listen(3000);

A thin Express server that proxies OpenAI with a custom system prompt. Under 50 lines of code.

2

Deploy

$ createos login && createos init && createos deploy

Deploy like any Node.js app. The CLI auto-detects Express and handles the rest.

3

Set API keys

$ createos env set OPENAI_API_KEY=sk-...

Your API key lives on the server. Clients never see it.

4

Scale for traffic

$ createos scale --replicas 2 --cpu 300

Scale up when traffic increases. Scale down when it does not.

Frequently asked questions

Is it safe to proxy an LLM API?
Yes, this is the recommended pattern. Your API key stays on your server. Clients authenticate with your wrapper using your own auth (API key, JWT, etc.), not the LLM provider's key. This is how every production AI product works.
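One way to sketch that client-side auth: the wrapper checks a bearer token it issued against its own key list. The key values and storage here are hypothetical; in production you would load keys from a database or use JWTs.

```javascript
// Map of client API keys to client ids; hypothetical demo values.
const CLIENT_KEYS = new Map([['ck_demo_123', 'client-a']]);

// Returns the client id for a valid "Authorization: Bearer <key>" header,
// or null if the header is missing or the key is unknown.
function authenticate(authHeader) {
  if (!authHeader || !authHeader.startsWith('Bearer ')) return null;
  const key = authHeader.slice('Bearer '.length);
  return CLIENT_KEYS.get(key) ?? null;
}

// Express-style middleware built on authenticate(): reject unknown clients
// before any LLM call is made.
function requireAuth(req, res, next) {
  const client = authenticate(req.headers['authorization']);
  if (!client) return res.status(401).json({ error: 'invalid API key' });
  req.clientId = client;
  next();
}
```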
How do I add caching to reduce costs?
Hash the request (model + messages) and check a cache (Redis, in-memory, or database) before calling the LLM. If the same request was made recently, return the cached response. This can reduce API costs by 30-60% for applications with repeated queries.
Can I use multiple LLM providers?
Yes. Your wrapper can route to different providers based on the request: use GPT-4o for complex tasks, Claude for long context, and a cheaper model for simple queries. If one provider is down, fall back to another. This is multi-model routing.
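A sketch of that routing table with fallback. The model names and task labels are illustrative, and `callModel` again stands in for the real provider SDK calls.

```javascript
// Provider preference per task type; model names are illustrative.
const ROUTES = {
  complex: ['gpt-4o', 'claude-sonnet'],      // try GPT-4o, fall back to Claude
  long_context: ['claude-sonnet', 'gpt-4o'], // Claude first for long context
  simple: ['gpt-4o-mini', 'claude-haiku'],   // cheaper models for simple queries
};

// Tries each model in order; callModel(model, messages) may throw when a
// provider is down or rate-limited, in which case the next model is tried.
async function routeChat(task, messages, callModel) {
  const models = ROUTES[task] ?? ROUTES.simple;
  let lastError;
  for (const model of models) {
    try {
      return { model, response: await callModel(model, messages) };
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw lastError; // every provider failed
}
```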
How do I charge users for API calls?
Track usage per user in your wrapper (count tokens or requests). Bill monthly or use a credit system. Platforms like CreateOS support a Skills marketplace where you can publish your wrapper as a monetizable API that charges per call.
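The per-user tracking can start as simply as a counter keyed by client id. The flat per-1K-token rate here is a hypothetical example; real billing would persist usage to a database rather than memory.

```javascript
// Per-client usage counters; in production, persist these to a database.
const usage = new Map(); // clientId → { requests, tokens }

// Call once per completed LLM request with the token count from the response.
function recordUsage(clientId, tokens) {
  const entry = usage.get(clientId) ?? { requests: 0, tokens: 0 };
  entry.requests += 1;
  entry.tokens += tokens;
  usage.set(clientId, entry);
}

// Hypothetical pricing: a flat rate per 1K tokens.
function monthlyBill(clientId, ratePer1kTokens = 0.01) {
  const entry = usage.get(clientId);
  if (!entry) return 0;
  return (entry.tokens / 1000) * ratePer1kTokens;
}
```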

Try it yourself

$ brew install createos
