The AI Engineering Stack I Actually Use

A pragmatic look at what tools, models, and patterns are worth the hype in 2026 — from someone building things with them daily.

Tags: AI engineering, tools, LLMs, stack

Every week there's a new model or framework claiming to change everything. Most of it is noise. Here's what's actually in my stack after building AI-driven projects for the past two years.

Models

Claude for most generation tasks — coding assistance, content, structured output. The reasoning capabilities on complex, multi-step problems are consistently better than alternatives I've tried.

Gemini Flash for high-volume, latency-sensitive tasks where cost matters. The speed-to-quality ratio is hard to beat for things like classification or summarization at scale.

Local models via Ollama when I need offline capability or privacy guarantees. Mistral 7B and Llama 3 cover most of my local needs. The gap with frontier models is real but narrowing fast.
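For quick local experiments, Ollama serves a simple HTTP API on localhost once `ollama serve` is running. A minimal sketch using only the standard library (the model name and prompt are placeholders):

```python
import json
import urllib.request

# Default local endpoint for an `ollama serve` instance.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama API."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the model's full response text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# generate("mistral", "One-line summary of this article:")  # needs Ollama running
```

The official clients are nicer for real use; the point is that a local model is one POST away, with no API key involved.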

Frameworks

I've tried most of the orchestration frameworks. My current opinions:

LangChain — I used it heavily in 2024. I don't use it anymore. The abstractions created more problems than they solved once I moved past the tutorials. The DX has improved, but I still prefer building closer to the metal.

Anthropic SDK / OpenAI SDK directly — Just use these. They're good, they're maintained, and you won't spend hours debugging which layer of abstraction swallowed your tool call.

Structured output — I do almost all structured extraction with JSON mode or tool use rather than parsing unstructured text. Much more reliable.
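As a sketch of the tool-use pattern: define a JSON schema as a tool, force the model to call it, and read the structured input back instead of parsing prose. The schema and helper below are illustrative, not from a specific project:

```python
# Illustrative tool schema; the tool and field names are made up for the example.
CONTACT_TOOL = {
    "name": "record_contact",
    "description": "Record one contact extracted from the input text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name"],
    },
}

def extract_tool_input(content_blocks: list[dict], tool_name: str) -> dict:
    """Return the input of the first matching tool_use block.

    Assumes the list-of-dicts shape of a serialized Messages API
    response; adapt if you work with the SDK's typed objects.
    """
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"no tool_use block named {tool_name!r}")
```

With the Anthropic SDK you'd pass `tools=[CONTACT_TOOL]` and `tool_choice={"type": "tool", "name": "record_contact"}`, which guarantees a schema-conforming tool call rather than free text you have to regex apart.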

Dev environment

Claude Code — AI-assisted coding inside the terminal. I use it for most tasks where I'd previously have reached for a search engine. It's most useful when the context is clear: refactoring a specific function, generating boilerplate, explaining unfamiliar code.

The key thing I've learned: keep your context tight. A small, focused conversation gets better results than a long one dragging lots of accumulated context.

Infrastructure

Vercel for front-end deployments. Zero-config Next.js deploys remain one of the best developer experiences in the ecosystem.

Railway for backend services. Cheap, fast to set up, sensible defaults.

Cloudflare R2 for object storage. S3-compatible, no egress fees — straightforward choice for most use cases.

What I'd change

If I were starting fresh today:

  • I'd spend more time on evals earlier. The hardest part of AI engineering isn't building the first version — it's knowing when it's actually working well enough to ship.
  • I'd be more skeptical of RAG as a default answer. Embedding-based vector search is powerful, but it's also easy to build something that seems to work in demos and fails in production.
  • I'd design for model upgrades from the start. Models improve fast. If your system is tightly coupled to a specific model's quirks, every upgrade is a migration project.
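On that last point, one way to keep the coupling loose, as a sketch: route every model reference through a single config table keyed by capability tier, so application code never names a model directly. The tier names, IDs, and settings below are placeholders:

```python
# All model choices live in one table keyed by capability tier; app code asks
# for a tier, never a concrete model ID. IDs and settings are placeholders.
MODELS: dict[str, dict] = {
    "fast": {"id": "example-flash-model", "max_tokens": 1024},
    "smart": {"id": "example-frontier-model", "max_tokens": 4096},
}

def model_config(tier: str) -> dict:
    """Resolve a capability tier to concrete model settings."""
    try:
        return MODELS[tier]
    except KeyError:
        raise ValueError(f"unknown model tier: {tier!r}") from None

# Upgrading a model is now a one-line change to MODELS, and trying a second
# model side by side is a new entry, not a migration project.
```

Prompts that lean on one model's quirks deserve the same treatment: keep them in one place per tier so you know exactly what to re-evaluate after an upgrade.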

The field moves fast. What I wrote here will be at least partially wrong in six months.