The AI Engineering Stack I Actually Use

A pragmatic look at what tools, models, and patterns are worth the hype in 2026 — from someone building things with them daily.

Tags: AI engineering, tools, LLMs, stack

Every week there's a new model or framework claiming to change everything. Most of it is noise. Here's what's actually in my stack after building AI-driven projects for the past two years.

Models

Claude for most generation tasks — coding assistance, content, structured output. The reasoning capabilities on complex, multi-step problems are consistently better than alternatives I've tried.

Gemini Flash for high-volume, latency-sensitive tasks where cost matters. The speed-to-quality ratio is hard to beat for things like classification or summarization at scale.

Local models via Ollama when I need offline capability or privacy guarantees. Mistral 7B and Llama 3 cover most of my local needs. The gap with frontier models is real but narrowing fast.
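For quick local experiments, Ollama serves a simple HTTP API on localhost once `ollama serve` is running. A minimal sketch using only the standard library (the model name and prompt are placeholders):

```python
import json
import urllib.request

# Default local endpoint for an `ollama serve` instance.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama API."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )

def generate(model: str, prompt: str) -> str:
    """Send the prompt and return the model's full response text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# generate("mistral", "One-line summary of this article:")  # needs Ollama running
```

The official clients are nicer for real use; the point is that a local model is one POST away, with no API key involved.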

Frameworks

I've tried most of the orchestration frameworks. My current opinions:

LangChain — I used it heavily in 2024. I don't use it anymore. The abstractions created more problems than they solved once I moved past the tutorials. The DX has improved, but I still prefer building closer to the metal.

Anthropic SDK / OpenAI SDK directly — Just use these. They're good, they're maintained, and you won't spend hours debugging which layer of abstraction swallowed your tool call.

Structured output — I do almost all structured extraction with JSON mode or tool use rather than parsing unstructured text. Much more reliable.
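As a sketch of the tool-use pattern: define a JSON schema as a tool, force the model to call it, and read the structured input back instead of parsing prose. The schema and helper below are illustrative, not from a specific project:

```python
# Illustrative tool schema; the tool and field names are made up for the example.
CONTACT_TOOL = {
    "name": "record_contact",
    "description": "Record one contact extracted from the input text.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name"],
    },
}

def extract_tool_input(content_blocks: list[dict], tool_name: str) -> dict:
    """Return the input of the first matching tool_use block.

    Assumes the list-of-dicts shape of a serialized Messages API
    response; adapt if you work with the SDK's typed objects.
    """
    for block in content_blocks:
        if block.get("type") == "tool_use" and block.get("name") == tool_name:
            return block["input"]
    raise ValueError(f"no tool_use block named {tool_name!r}")
```

With the Anthropic SDK you'd pass `tools=[CONTACT_TOOL]` and `tool_choice={"type": "tool", "name": "record_contact"}`, which guarantees a schema-conforming tool call rather than free text you have to regex apart.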

Dev environment

Claude Code — AI-assisted coding inside the terminal. I use it for most tasks where I'd previously have reached for a search engine. It's most useful when the context is clear: refactoring a specific function, generating boilerplate, explaining unfamiliar code.

The key thing I've learned: keep your context tight. A small, focused conversation gets better results than a long one dragging lots of accumulated context.

Infrastructure

Vercel for front-end deployments. Zero-config Next.js deploys remain one of the best developer experiences in the ecosystem.

Railway for backend services. Cheap, fast to set up, sensible defaults.

Cloudflare R2 for object storage. S3-compatible, no egress fees — straightforward choice for most use cases.

What I'd change

If I were starting fresh today:

  • I'd spend more time on evals earlier. The hardest part of AI engineering isn't building the first version — it's knowing when it's actually working well enough to ship.
  • I'd be more skeptical of RAG as a default answer. Embedding-based vector search is powerful, but it's also easy to build something that seems to work in demos and fails in production.
  • I'd design for model upgrades from the start. Models improve fast. If your system is tightly coupled to a specific model's quirks, every upgrade is a migration project.
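On that last point, one way to keep the coupling loose, as a sketch: route every model reference through a single config table keyed by capability tier, so application code never names a model directly. The tier names, IDs, and settings below are placeholders:

```python
# All model choices live in one table keyed by capability tier; app code asks
# for a tier, never a concrete model ID. IDs and settings are placeholders.
MODELS: dict[str, dict] = {
    "fast": {"id": "example-flash-model", "max_tokens": 1024},
    "smart": {"id": "example-frontier-model", "max_tokens": 4096},
}

def model_config(tier: str) -> dict:
    """Resolve a capability tier to concrete model settings."""
    try:
        return MODELS[tier]
    except KeyError:
        raise ValueError(f"unknown model tier: {tier!r}") from None

# Upgrading a model is now a one-line change to MODELS, and trying a second
# model side by side is a new entry, not a migration project.
```

Prompts that lean on one model's quirks deserve the same treatment: keep them in one place per tier so you know exactly what to re-evaluate after an upgrade.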

The field moves fast. What I wrote here will be at least partially wrong in six months.