The AI Engineering Stack I Actually Use
A pragmatic look at what tools, models, and patterns are worth the hype in 2026 — from someone building things with them daily.
Every week there's a new model or framework claiming to change everything. Most of it is noise. Here's what's actually in my stack after building AI-driven projects for the past two years.
Models
Claude for most generation tasks — coding assistance, content, structured output. The reasoning capabilities on complex, multi-step problems are consistently better than alternatives I've tried.
Gemini Flash for high-volume, latency-sensitive tasks where cost matters. The speed-to-quality ratio is hard to beat for things like classification or summarization at scale.
Local models via Ollama when I need offline capability or privacy guarantees. Mistral 7B and Llama 3 cover most of my local needs. The gap with frontier models is real but narrowing fast.
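For reference, talking to a local model through Ollama needs nothing beyond its REST API. This is a minimal sketch assuming an Ollama server running on its default port with a model like `mistral` already pulled; only the standard library is used.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the response text."""
    body = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False, Ollama returns a single JSON object whose
        # "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Swapping `"mistral"` for `"llama3"` is the whole migration story here, which is part of the appeal.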
Frameworks
I've tried most of the orchestration frameworks. My current opinions:
LangChain — I used it heavily in 2024. I don't use it anymore. The abstractions created more problems than they solved once you moved past tutorials. The DX has improved but I still prefer building closer to the metal.
Anthropic SDK / OpenAI SDK directly — Just use these. They're good, they're maintained, and you won't spend hours debugging which layer of abstraction swallowed your tool call.
Structured output — I do almost all structured extraction with JSON mode or tool use rather than parsing unstructured text. Much more reliable.
Dev environment
Claude Code — AI-assisted coding inside the terminal. I use it for most things I'd previously have turned to a search engine for. It's most useful for tasks with clear context: refactoring a specific function, generating boilerplate, explaining unfamiliar code.
The key thing I've learned: keep your context tight. A small, focused conversation gets better results than a long one with lots of accumulated context.
Infrastructure
Vercel for front-end deployments. Zero-config Next.js deploys remain one of the best developer experiences in the ecosystem.
Railway for backend services. Cheap, fast to set up, sensible defaults.
Cloudflare R2 for object storage. S3-compatible, no egress fees — straightforward choice for most use cases.
What I'd change
If I were starting fresh today:
- I'd spend more time on evals earlier. The hardest part of AI engineering isn't building the first version — it's knowing when it's actually working well enough to ship.
- I'd be more skeptical of RAG as a default answer. Embeddings plus vector search are powerful, but it's also easy to build something that seems to work in demos and fails in production.
- I'd design for model upgrades from the start. Models improve fast. If your system is tightly coupled to a specific model's quirks, every upgrade is a migration project.
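On the evals point: the harness doesn't have to be elaborate to be useful. This is a minimal sketch of the shape I mean, where the case names and checks are invented; any `str -> str` function can stand in for the model, which also makes the harness itself unit-testable.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable

def run_evals(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the model and return the pass rate."""
    passed = sum(1 for case in cases if case.check(generate(case.prompt)))
    return passed / len(cases)
```

Even a dozen cases with simple programmatic checks gives you a number to watch when you change prompts or models, which is the whole point of "knowing when it's working."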
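And on decoupling from model quirks: the cheapest version is a single routing table, so a model upgrade is a one-line change instead of a hunt through the codebase. The task names and model IDs below are placeholders, not real model identifiers.

```python
# One place where task -> model decisions live. Upgrading a model means
# editing this table, not every call site.
MODEL_FOR_TASK = {
    "drafting": "frontier-model-v1",      # placeholder ID
    "classification": "fast-model-v1",    # placeholder ID
}

def model_for(task: str) -> str:
    """Resolve which model handles a task; raises KeyError for unknown tasks."""
    return MODEL_FOR_TASK[task]
```

The same idea extends to prompt assembly and output parsing: keep anything model-specific behind one function per task, and the quirks stay contained when the model underneath changes.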
The field moves fast. What I wrote here will be at least partially wrong in six months.