Dungeon Duo
Full-stack AI Dungeon Master for two cooperative players

A full-stack web application that brings tabletop role-playing to life through AI. It acts as a fully automated Dungeon Master for two players, providing dynamic narrative, scene illustrations, a world map, and real-time text-to-speech. No preparation, game master, or physical materials are required.

- AI Dungeon Master embeds hidden structured data blocks in narrative to sync game state invisibly after every turn
- Two-player character system with AI-generated portraits and personalised backstories
- Dynamic 100×100 world map with automatic movement tracking after each AI response
- AI-generated scene illustrations displayed inline in the game log as the adventure unfolds
- Streaming TTS narration: PCM audio decoded from base64 and scheduled via Web Audio API as chunks arrive
- Full session persistence via IndexedDB with JSON export/restore and adventure log rewind
- Admin panel with live-switchable game profiles: D&D, Lovecraftian horror, Conan 2d20, Das Schwarze Auge 5, fairy-tale
Dungeon Duo AI is a full-stack web application that brings tabletop role-playing to life through artificial intelligence. The application serves as a fully automated Dungeon Master for two cooperative players, generating dynamic narrative, tracking game state, producing scene illustrations, and narrating stories aloud — all in real time. Players can jump into a D&D-style adventure without preparation, a game master, or physical materials.
The project emerged from a desire to make collaborative storytelling accessible: two people, a browser, and an API key are all that is needed to run a complete role-playing session.
Technical Stack
The frontend is built with React 19 and TypeScript, bundled with Vite, and styled using Tailwind CSS 4. Routing is handled by React Router 7, and local persistence uses IndexedDB via idb-keyval. The application is fully client-side — there is no dedicated backend server. All AI capabilities are consumed directly through Google's GenAI SDK, with Gemini 2.5 Flash as the primary model for narration, image generation, and text-to-speech synthesis.
The project is deployed continuously via a GitLab CI/CD pipeline that runs automated tests, bumps the version, generates a changelog, and deploys to a production server over SFTP.
Core Features
AI Dungeon Master — Every player action is sent to Gemini 2.5 Flash, which responds with narrative prose. Embedded within each response is a hidden structured data block (||DATA|{...}||) that carries game state updates: hit points, inventory changes, current map coordinates, active effects, and more. This design keeps the narrative clean while allowing seamless, invisible state synchronization after every turn.
Two-Player Character System — Both players create characters with names, classes, races, and D&D 5e ability scores. The AI generates personalized backstories on setup, and character portraits are produced by the image generation model. Each character has independently tracked HP, spells, and inventory throughout the session.
Dynamic World Map — Player movement is tracked automatically on a 100×100 grid. The map visualizes explored locations, current position, and visited paths, updating after every AI response without any manual input from the players.
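The movement tracking described above could look roughly like the following sketch. It assumes zero-based coordinates from 0 to 99 and that each AI response may report a new position. The names MapState, createMap, and applyMove are hypothetical and not taken from the project.

```typescript
// Sketch only: grid bounds follow the 100×100 map; everything else is assumed.
const GRID_SIZE = 100;

interface Point { x: number; y: number; }

interface MapState {
  current: Point;                 // where the party is now
  path: Point[];                  // ordered list of visited positions
  visited: Record<string, true>;  // "x,y" keys of explored cells
}

// Keep model-reported coordinates inside the grid and on whole cells.
const clampToGrid = (v: number): number =>
  Math.min(GRID_SIZE - 1, Math.max(0, Math.round(v)));

const cellKey = (p: Point): string => `${p.x},${p.y}`;

function createMap(start: Point): MapState {
  const current = { x: clampToGrid(start.x), y: clampToGrid(start.y) };
  return { current, path: [current], visited: { [cellKey(current)]: true } };
}

// Apply the position reported in one AI response; returns the old state
// unchanged when no position was reported or the party did not move.
function applyMove(map: MapState, reported: Point | undefined): MapState {
  if (!reported) return map;
  const next = { x: clampToGrid(reported.x), y: clampToGrid(reported.y) };
  if (next.x === map.current.x && next.y === map.current.y) return map;
  return {
    current: next,
    path: [...map.path, next],
    visited: { ...map.visited, [cellKey(next)]: true },
  };
}
```

Returning a new object on every move keeps the function easy to use with React state, where updates are detected by reference.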
AI-Generated Scene Illustrations — After each narrative response, the application requests a scene illustration from Gemini's image model. Images are displayed inline in the game log, creating a visual record of the adventure as it unfolds.
Streaming Text-to-Speech — The Dungeon Master's narration is read aloud using Gemini's TTS API. Audio chunks stream in real time and are decoded from base64 PCM, buffered, and played back as they arrive. Each player and the DM have distinct voice settings, chosen from over 30 available voices.
Game Persistence — Sessions auto-save to IndexedDB between turns. Full game states — including embedded images — can be exported to JSON and restored later. An adventure log view allows admins to browse and rewind to any prior game state.
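As an illustration of export, restore, and rewind, here is a minimal sketch that treats the adventure log as a list of per-turn snapshots. The GameState shape, the version field, and all function names are assumptions. In the application the snapshots would presumably be stored with idb-keyval's get and set, which are left out so the example stays self-contained.

```typescript
// Hypothetical snapshot shape; the real state surely carries more fields.
interface GameState {
  turn: number;
  log: string[];     // narrative entries so far
  images: string[];  // assumed: embedded images as base64 data URLs
}

interface SaveFile {
  version: number;
  snapshots: GameState[];
}

const SAVE_VERSION = 1; // assumed format version

// Serialise the full snapshot history, images included, to a JSON string.
function exportSave(snapshots: GameState[]): string {
  const file: SaveFile = { version: SAVE_VERSION, snapshots };
  return JSON.stringify(file);
}

// Parse an exported file and reject anything that does not look like one.
function importSave(json: string): GameState[] {
  const file = JSON.parse(json) as Partial<SaveFile> | null;
  if (!file || file.version !== SAVE_VERSION || !Array.isArray(file.snapshots)) {
    throw new Error("Not a recognised save file");
  }
  return file.snapshots;
}

// Rewind: keep every snapshot up to and including the chosen turn.
function rewindTo(snapshots: GameState[], turn: number): GameState[] {
  const kept = snapshots.filter((s) => s.turn <= turn);
  if (kept.length === 0) throw new RangeError(`No snapshot at or before turn ${turn}`);
  return kept;
}
```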
Admin Interface — A password-protected admin panel exposes system prompt templates, model selection, API key management, and token usage monitoring. Multiple game profiles are available out of the box: classic D&D, Lovecraftian horror, Conan 2d20, Das Schwarze Auge 5, and a fairy-tale setting. Profiles can be edited and switched live without reloading.
Engineering Highlights
One of the more interesting technical challenges was making the AI responses carry structured data without breaking narrative immersion. Rather than splitting the model call into a "story call" and a "data call," the system prompt instructs the model to embed a JSON block at the end of every response using a custom delimiter. The parser strips this block silently, updates state, and displays only the story text to the players.
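A parser for this format might look like the sketch below. It uses the ||DATA|{...}|| delimiter mentioned earlier. The StateUpdate fields and the function name parseTurn are illustrative assumptions, since the actual schema is not shown here.

```typescript
// Hypothetical update shape; field names are assumptions, not the real schema.
interface StateUpdate {
  hp?: Record<string, number>;
  inventory?: Record<string, string[]>;
  position?: { x: number; y: number };
  effects?: string[];
}

interface ParsedTurn {
  story: string;               // text shown to the players
  update: StateUpdate | null;  // null when no valid data block was found
}

const OPEN_MARKER = "||DATA|";
const CLOSE_MARKER = "||";

function parseTurn(raw: string): ParsedTurn {
  const start = raw.lastIndexOf(OPEN_MARKER);
  if (start === -1) return { story: raw.trim(), update: null };

  const jsonStart = start + OPEN_MARKER.length;
  const end = raw.lastIndexOf(CLOSE_MARKER);
  if (end < jsonStart) {
    // Opening marker without a closing one: hide the fragment, skip the update.
    return { story: raw.slice(0, start).trim(), update: null };
  }

  // Remove the block from the visible text whether or not the JSON is valid.
  const story = (raw.slice(0, start) + raw.slice(end + CLOSE_MARKER.length)).trim();
  try {
    const update = JSON.parse(raw.slice(jsonStart, end)) as StateUpdate;
    return { story, update };
  } catch {
    // Malformed JSON from the model: keep the story, leave state untouched.
    return { story, update: null };
  }
}
```

Stripping the block even when it fails to parse is a deliberate choice in this sketch: a dropped state update is easier to recover from than raw JSON leaking into the narrative.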
Another challenge was audio streaming. The TTS API returns audio incrementally, and the application needed to begin playback before the full response arrived. This was solved by maintaining an ordered chunk buffer, decoding PCM audio from base64 in a Web Audio API context, and scheduling chunks for playback as they completed — while still correctly handling cancellation when the user interrupts narration.
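The approach can be sketched as follows. This version assumes 16-bit signed little-endian mono PCM at 24 kHz and chunks tagged with a sequence index; neither detail is stated above. ContextLike is a hypothetical structural subset of the Web Audio API, shaped so that a browser AudioContext should fit it and a test double can stand in for it.

```typescript
interface BufferLike { readonly duration: number; getChannelData(channel: number): Float32Array; }
interface SourceLike {
  buffer: BufferLike | null;
  connect(destination: unknown): unknown;
  start(when?: number): void;
  stop(): void;
}
interface ContextLike {
  readonly currentTime: number;
  readonly destination: unknown;
  createBuffer(channels: number, length: number, sampleRate: number): BufferLike;
  createBufferSource(): SourceLike;
}

// Decode base64 PCM (assumed 16-bit signed little-endian) to floats in [-1, 1).
function decodePcm16(base64: string): Float32Array {
  const binary = atob(base64);
  const samples = new Float32Array(Math.floor(binary.length / 2));
  for (let i = 0; i < samples.length; i++) {
    const value = (binary.charCodeAt(2 * i + 1) << 8) | binary.charCodeAt(2 * i);
    samples[i] = (value >= 0x8000 ? value - 0x10000 : value) / 32768;
  }
  return samples;
}

class NarrationPlayer {
  private ctx: ContextLike;
  private sampleRate: number;
  private pending: Record<number, Float32Array> = {}; // chunks that arrived early
  private nextIndex = 0;   // next chunk index to schedule
  private nextStart = 0;   // context time at which the next chunk should begin
  private active: SourceLike[] = [];
  private cancelled = false;

  constructor(ctx: ContextLike, sampleRate = 24000) {
    this.ctx = ctx;
    this.sampleRate = sampleRate;
  }

  // Accept a chunk in any order; schedule every chunk that is now contiguous.
  push(index: number, base64: string): void {
    if (this.cancelled) return;
    this.pending[index] = decodePcm16(base64);
    while (this.nextIndex in this.pending) {
      const samples = this.pending[this.nextIndex];
      delete this.pending[this.nextIndex];
      this.nextIndex++;
      if (samples.length > 0) this.schedule(samples);
    }
  }

  private schedule(samples: Float32Array): void {
    const buffer = this.ctx.createBuffer(1, samples.length, this.sampleRate);
    buffer.getChannelData(0).set(samples);
    const source = this.ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(this.ctx.destination);
    // Start right after the previous chunk, or now if playback has fallen behind.
    const when = Math.max(this.nextStart, this.ctx.currentTime);
    source.start(when);
    this.nextStart = when + buffer.duration;
    this.active.push(source);
  }

  // Stop everything already scheduled and ignore chunks that arrive later.
  cancel(): void {
    this.cancelled = true;
    this.pending = {};
    for (const source of this.active) source.stop();
    this.active = [];
  }
}
```

Scheduling each chunk at an explicit start time, rather than waiting for the previous one to end, is what keeps playback free of gaps between chunks. A longer-lived version would also drop finished sources from the active list.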
Retry logic with exponential backoff (2 s, 4 s, 8 s) handles Google API rate limits gracefully, and a fallback model strategy automatically switches to a lighter Gemini model when the primary model quota is exhausted.
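A minimal sketch of this retry-and-fallback behaviour is shown below. It assumes rate-limit errors carry a status of 429 and that a model whose retries are exhausted counts as out of quota. The generate callback, the model names in the usage note, and the injectable sleep function are all hypothetical.

```typescript
// Delays from the text: 2 s, 4 s, 8 s.
const BACKOFF_MS = [2000, 4000, 8000];

type Generate = (model: string, prompt: string) => Promise<string>;
type Sleep = (ms: number) => Promise<void>;

const defaultSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Assumption: rate-limit errors expose an HTTP-like `status` of 429.
function isRateLimit(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { status?: number }).status === 429;
}

// Try each model in order. Within one model, retry rate-limited calls with
// the backoff schedule; once retries run out, move on to the next model.
async function generateWithRetry(
  generate: Generate,
  prompt: string,
  models: string[],
  sleep: Sleep = defaultSleep,
): Promise<string> {
  let lastError: unknown = new Error("No models configured");
  for (const model of models) {
    for (let attempt = 0; attempt <= BACKOFF_MS.length; attempt++) {
      try {
        return await generate(model, prompt);
      } catch (err) {
        if (!isRateLimit(err)) throw err; // only rate limits are retried
        lastError = err;
        if (attempt < BACKOFF_MS.length) await sleep(BACKOFF_MS[attempt]);
      }
    }
  }
  throw lastError;
}
```

A call such as generateWithRetry(callModel, prompt, ["primary-model", "lighter-model"]) would then make up to four attempts on the first model before switching to the second. Passing sleep as a parameter lets tests run without real delays.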
Version 1.21.0, actively maintained. Localized in German and designed for a German-speaking audience.