Fastman - AI Chat Interface

Interactive AI-powered chat application built with SvelteKit and Cerebras AI, featuring real-time streaming responses and a polished UX

An AI-powered chat app that provides an interactive "Ask Prakhar" experience — built with SvelteKit and powered by Cerebras AI's ultra-fast inference engine.

The Problem

Traditional chat interfaces often feel disconnected from the person they represent. I wanted a conversational AI that could authentically represent my engineering philosophy, project decisions, and technical thinking — with minimal latency.

What It Does

Real-Time Streaming

  • Responses stream token-by-token for immediate feedback
  • Visual loading indicators during generation
  • Smooth scroll-to-bottom with auto-detection
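A minimal sketch of the token-append pattern behind the streaming UI, assuming an immutable message array (the names `ChatMessage` and `appendToken` are illustrative, not the app's actual code):

```typescript
// Sketch: appending streamed tokens to the trailing assistant message.
// ChatMessage and appendToken are hypothetical names for illustration.
type ChatMessage = { role: "user" | "assistant"; content: string };

// Returns a new array with the token appended to the final assistant
// message, creating one if the conversation currently ends with a user turn.
function appendToken(messages: ChatMessage[], token: string): ChatMessage[] {
  const last = messages[messages.length - 1];
  if (last && last.role === "assistant") {
    return [
      ...messages.slice(0, -1),
      { ...last, content: last.content + token },
    ];
  }
  return [...messages, { role: "assistant", content: token }];
}
```

Because each token produces a fresh array, a reactive store (such as a Svelte `$state` value) can pick up every update and re-render the bubble as text arrives.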

Message Management

  • Edit any user message and regenerate responses from that point
  • Regenerate the last assistant response with one click
  • Clear chat history with confirmation
  • Export conversations as JSON

Rate Limit Handling

  • Real-time rate limit display with tokens remaining
  • Graceful error overlay when limits are hit
  • Countdown timer until limits reset
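The countdown can be derived from the rate-limit headers forwarded by the server. A sketch, assuming the common `x-ratelimit-*` header convention (the headers Cerebras actually returns may be named differently):

```typescript
// Sketch: deriving remaining tokens and a reset countdown from
// response headers. Header names are an assumption, not confirmed.
interface RateLimitInfo {
  remainingTokens: number;
  resetSeconds: number; // seconds until the limit resets
}

function parseRateLimit(
  headers: Record<string, string>,
  nowMs: number
): RateLimitInfo | null {
  const remaining = Number(headers["x-ratelimit-remaining-tokens"]);
  const resetAt = Number(headers["x-ratelimit-reset"]); // unix seconds
  if (Number.isNaN(remaining) || Number.isNaN(resetAt)) return null;
  return {
    remainingTokens: remaining,
    resetSeconds: Math.max(0, resetAt - Math.floor(nowMs / 1000)),
  };
}
```

The `null` return covers responses without rate-limit headers, so the UI can simply hide the indicator rather than show stale numbers.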

Developer Experience

  • Syntax-highlighted code blocks using highlight.js
  • Markdown rendering with marked.js
  • Responsive design optimized for mobile and desktop

Technical Architecture

Frontend Stack

  • SvelteKit — Chosen for its reactive state management and minimal boilerplate
  • Svelte 5 — Leveraging $state and $effect runes for cleaner reactivity
  • Tailwind CSS v4 — Utility-first styling with Vite plugin integration
  • TypeScript — Full type safety across components

AI Integration

  • Cerebras AI — Using the gpt-oss-120b model for fast inference
  • Vercel AI SDK — Handles streaming text generation with automatic retries
  • Custom System Prompt — 60+ line prompt encoding my engineering philosophy and communication style

Component Architecture

Chat.svelte             → Main orchestrator (state + API calls)
Message.svelte          → Individual message bubbles with actions
CodeBlock.svelte        → Syntax-highlighted code rendering
EmptyState.svelte       → Starter prompts for new conversations
RateLimitOverlay.svelte → Full-screen error state for rate limits

API Design

Single SvelteKit server endpoint (/api/chat/+server.ts) that:

  1. Validates request origin against allowlist
  2. Streams responses using NDJSON format
  3. Captures and forwards rate limit headers
  4. Handles errors gracefully with typed responses
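On the client, NDJSON arrives as arbitrary chunks that can end mid-line, so the parser has to buffer the unfinished tail. A minimal sketch of that step (the function name and event shape are assumptions):

```typescript
// Sketch: incremental NDJSON parsing. A network chunk may cut a JSON
// line in half, so the incomplete tail is carried over in `buffer`.
function parseNdjsonChunk(
  buffer: string,
  chunk: string
): { events: unknown[]; buffer: string } {
  const lines = (buffer + chunk).split("\n");
  const rest = lines.pop() ?? ""; // possibly incomplete final line
  const events = lines
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { events, buffer: rest };
}
```

Each complete line becomes one event; the leftover fragment is prepended to the next chunk until its closing newline arrives.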

Key Decisions

Why Cerebras? — Speed was the priority. Cerebras delivers sub-second first-token latency, making the chat feel instant. The free tier is generous enough for a portfolio project.

Why SvelteKit over Next.js? — For a chat-heavy app, Svelte's reactivity model felt more natural than React hooks. Managing streaming state, message arrays, and UI updates was cleaner with $state runes.

Message Editing Strategy — When a user edits a message, all subsequent messages are discarded. This prevents confusing conversation branches and keeps history linear.
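The truncate-on-edit rule reduces to a single array operation. A sketch, assuming an index-based edit (helper name is hypothetical):

```typescript
// Sketch: editing message `index` replaces its content and discards
// every later message, keeping the conversation history linear.
type Msg = { role: "user" | "assistant"; content: string };

function editAndTruncate(messages: Msg[], index: number, newContent: string): Msg[] {
  return [
    ...messages.slice(0, index),
    { ...messages[index], content: newContent },
  ];
}
```

After truncation the app can re-run generation from the edited message, so there is never more than one live branch of the conversation.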

Rate Limit UX — Instead of failing silently, the UI proactively shows remaining tokens and displays a full-screen overlay when limits are hit.

Challenges Solved

Streaming State Management — Coordinating streaming responses with message updates, scroll behavior, and loading states required careful state sequencing. Svelte's reactive primitives made this significantly easier than it would have been in React.

Origin Validation — The API validates both origin and referer headers to prevent abuse and embedded iframe attacks.
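A sketch of the dual header check, assuming a static allowlist (the list contents and helper name are illustrative, not the deployed configuration):

```typescript
// Sketch: allowlist checks on the Origin and Referer headers.
// ALLOWED_ORIGINS is a placeholder, not the real deployment list.
const ALLOWED_ORIGINS = ["https://example.com"];

function isAllowedRequest(origin: string | null, referer: string | null): boolean {
  // Origin must be present and exactly on the allowlist.
  if (!origin || !ALLOWED_ORIGINS.includes(origin)) return false;
  // If a Referer is sent, it must come from an allowed origin too,
  // which blocks requests relayed from an embedded iframe elsewhere.
  if (referer && !ALLOWED_ORIGINS.some((o) => referer.startsWith(o + "/"))) {
    return false;
  }
  return true;
}
```

Checking both headers matters because some browsers or proxies strip one of them, and an attacker controls neither from a cross-origin page.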

Export Format — Chat exports include ISO timestamps and structured JSON for easy parsing and import elsewhere.
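The export payload can be sketched as follows (field names are assumptions; the real schema may differ):

```typescript
// Sketch: serializing a conversation to structured JSON with an
// ISO 8601 timestamp. The field names here are hypothetical.
interface ExportedMessage {
  role: string;
  content: string;
}

interface ChatExport {
  exportedAt: string; // ISO 8601
  messages: ExportedMessage[];
}

function buildExport(messages: ExportedMessage[], now: Date): string {
  const payload: ChatExport = {
    exportedAt: now.toISOString(),
    messages,
  };
  return JSON.stringify(payload, null, 2);
}
```

Pretty-printing with a two-space indent keeps the file diff-friendly while remaining trivially parseable for re-import.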

Tech Stack

Layer       Tech
Frontend    SvelteKit 2.x + Svelte 5
Styling     Tailwind CSS v4
Language    TypeScript 5
AI          Cerebras AI SDK + Vercel AI SDK
Streaming   NDJSON protocol
Syntax      highlight.js
Markdown    marked.js
Deploy      Vercel (Edge Runtime)
Dev         Bun + Prettier + ESLint

Results

  • Sub-second response latency on most queries
  • Zero downtime since deployment
  • Handles edge cases (rate limits, network errors) gracefully

Lessons Learned

Streaming UX requires rethinking traditional request/response patterns. You can't just show a spinner — every millisecond of perceived latency matters.

Good system prompts are 80% of AI application quality. The engineering behind the prompt matters more than the model choice.

Rate limits should be a first-class UX consideration. Not an afterthought — users need to know before they hit a wall.

Svelte's reactivity model shines for real-time interfaces. Less ceremony than React for the same result.