Fastman - AI Chat Interface
Interactive AI-powered chat application built with SvelteKit and Cerebras AI, featuring real-time streaming responses and a polished UX
Try It
An AI-powered chat app that provides an interactive "Ask Prakhar" experience — built with SvelteKit and powered by Cerebras AI's ultra-fast inference engine.
Try asking Prakhar something
The Problem
Traditional chat interfaces often feel disconnected from the person they represent. I wanted a conversational AI that could authentically represent my engineering philosophy, project decisions, and technical thinking — with minimal latency.
What It Does
Real-Time Streaming
- Responses stream token-by-token for immediate feedback
- Visual loading indicators during generation
- Smooth scroll-to-bottom with auto-detection
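The token-by-token behavior above can be sketched with a small client-side decoder, assuming NDJSON framing like the API uses; the event shape and names here are illustrative, not the app's actual code:

```typescript
// Incremental NDJSON parser: chunks from a streaming fetch can split a
// JSON line mid-way, so the partial trailing line is buffered until the
// next chunk arrives. Event shape is a hypothetical simplification.
interface StreamEvent {
  type: string;   // e.g. "token" | "done"
  text?: string;  // present on token events
}

function createNdjsonParser(): (chunk: string) => StreamEvent[] {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the incomplete trailing line
    return lines
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as StreamEvent);
  };
}
```

Each parsed token event can then be appended to the in-flight assistant message, letting reactive state drive the re-render and scroll behavior.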
Message Management
- Edit any user message and regenerate responses from that point
- Regenerate the last assistant response with one click
- Clear chat history with confirmation
- Export conversations as JSON
Rate Limit Handling
- Real-time rate limit display with tokens remaining
- Graceful error overlay when limits are hit
- Countdown timer until limits reset
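The countdown above can be derived from a reset timestamp forwarded by the API; a minimal sketch, where the function name and timestamp semantics are assumptions rather than the app's actual code:

```typescript
// Format the time until the rate limit resets as m:ss for the overlay.
// Clamps at zero so a stale timestamp never shows a negative countdown.
function formatResetCountdown(resetAtMs: number, nowMs: number): string {
  const totalSeconds = Math.max(0, Math.ceil((resetAtMs - nowMs) / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${String(seconds).padStart(2, "0")}`;
}
```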
Developer Experience
- Syntax-highlighted code blocks using highlight.js
- Markdown rendering with marked.js
- Responsive design optimized for mobile and desktop
Technical Architecture
Frontend Stack
- SvelteKit — Chosen for its reactive state management and minimal boilerplate
- Svelte 5 — Leveraging $state and $effect runes for cleaner reactivity
- Tailwind CSS v4 — Utility-first styling with Vite plugin integration

- TypeScript — Full type safety across components
AI Integration
- Cerebras AI — Using the gpt-oss-120b model for fast inference
- Vercel AI SDK — Handles streaming text generation with automatic retries
- Custom System Prompt — 60+ line prompt encoding my engineering philosophy and communication style
Component Architecture
Chat.svelte → Main orchestrator (state + API calls)
Message.svelte → Individual message bubbles with actions
CodeBlock.svelte → Syntax-highlighted code rendering
EmptyState.svelte → Starter prompts for new conversations
RateLimitOverlay.svelte → Full-screen error state for rate limits
API Design
Single SvelteKit server endpoint (/api/chat/+server.ts) that:
- Validates request origin against allowlist
- Streams responses using NDJSON format
- Captures and forwards rate limit headers
- Handles errors gracefully with typed responses
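The validation and framing steps above can be sketched as two small helpers; the event shape and allowlist handling are illustrative, and the real endpoint also wires in the AI SDK:

```typescript
// Frame one event as a single NDJSON line for the response stream.
interface ChatEvent {
  type: string;       // e.g. "token" | "rate-limit" | "error"
  text?: string;
  remaining?: number; // forwarded rate-limit tokens, when present
}

function toNdjson(event: ChatEvent): string {
  return JSON.stringify(event) + "\n";
}

// Reject requests whose Origin header is missing or not on the allowlist.
function isAllowedOrigin(origin: string | null, allowlist: string[]): boolean {
  if (origin === null) return false;
  try {
    return allowlist.includes(new URL(origin).origin);
  } catch {
    return false; // malformed Origin header
  }
}
```

Normalizing through `new URL(...).origin` before comparing avoids trivial bypasses like trailing slashes or mixed casing in the scheme.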
Key Decisions
Why Cerebras? — Speed was the priority. Cerebras delivers sub-second first-token latency, making the chat feel instant. The free tier is generous enough for a portfolio project.
Why SvelteKit over Next.js? — For a chat-heavy app, Svelte's reactivity model felt more natural than React hooks. Managing streaming state, message arrays, and UI updates was cleaner with $state runes.
Message Editing Strategy — When a user edits a message, all subsequent messages are discarded. This prevents confusing conversation branches and keeps history linear.
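A sketch of that truncation, assuming messages live in a flat array; the types and names here are illustrative:

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Editing message `index` replaces its content and discards everything
// after it, so the next generation continues from a linear history.
function editMessage(
  messages: ChatMessage[],
  index: number,
  newContent: string
): ChatMessage[] {
  return [
    ...messages.slice(0, index),
    { ...messages[index], content: newContent },
  ];
}
```

Returning a new array rather than mutating in place keeps the update compatible with reactive state tracking.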
Rate Limit UX — Instead of failing silently, the UI proactively shows remaining tokens and displays a full-screen overlay when limits are hit.
Challenges Solved
Streaming State Management — Coordinating streaming responses with message updates, scroll behavior, and loading states required careful state sequencing. Svelte's reactive primitives made this significantly easier than it would have been with React hooks.
Origin Validation — The API validates both origin and referer headers to prevent abuse and embedded iframe attacks.
Export Format — Chat exports include ISO timestamps and structured JSON for easy parsing and import elsewhere.
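A minimal sketch of such an export function, with field names as assumptions rather than the app's actual schema:

```typescript
// Serialize the conversation with an ISO 8601 export timestamp.
// Accepting `now` as a parameter keeps the function deterministic
// and testable; callers default to the current time.
function exportChat(
  messages: { role: string; content: string }[],
  now: Date = new Date()
): string {
  return JSON.stringify(
    { exportedAt: now.toISOString(), messages },
    null,
    2
  );
}
```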
Tech Stack
| Layer | Tech |
|---|---|
| Frontend | SvelteKit 2.x + Svelte 5 |
| Styling | Tailwind CSS v4 |
| Language | TypeScript 5 |
| AI | Cerebras AI SDK + Vercel AI SDK |
| Streaming | NDJSON protocol |
| Syntax | highlight.js |
| Markdown | marked.js |
| Deploy | Vercel (Edge Runtime) |
| Dev | Bun + Prettier + ESLint |
Results
- Sub-second response latency on most queries
- Zero downtime since deployment
- Handles edge cases (rate limits, network errors) gracefully
Lessons Learned
Streaming UX requires rethinking traditional request/response patterns. You can't just show a spinner — every millisecond of perceived latency matters.
Good system prompts are 80% of AI application quality. The engineering behind the prompt matters more than the model choice.
Rate limits should be a first-class UX consideration. Not an afterthought — users need to know before they hit a wall.
Svelte's reactivity model shines for real-time interfaces. Less ceremony than React for the same result.