Live Demo: https://www.zltest.online/
## TL;DR

- **User Flow:** Upload one selfie, pick a style pack, and get 4 unique cartoon avatars.
- **The Secret Sauce:** A two-stage pipeline (Visual Description → Style Generation) that balances character likeness with artistic consistency.
- **MVP Architecture:** JSON-based storage for task and subscription management, enabling rapid validation with minimal overhead.
## Background: Why Build This?

I noticed a common pain point among users:

- People want avatars that are recognizable but still protect their privacy.
- People want consistent styles, not a "lottery" of random AI outputs.
- Most people don't want to write complex prompts.

My mission was simple: "One selfie, many styles."
The user journey is compressed into three steps:

1. Upload a selfie.
2. Select a Style Pack.
3. Generate and download.
## Functional Design: Style-First & "Prompt-less" UI
The core concept is the Style Pack. I bundled Prompts, Negative Prompts, and recommended aspect ratios into single configuration objects.


This keeps the artistic output consistent while providing a "Prompt-less" experience for the user.
Simplified configuration:

```typescript
const stylePacks = [
  {
    id: "anime-lineart",
    name: "Clean Anime",
    promptTemplate: "clean lineart, soft pastel colors, high-quality digital art",
    negativePrompt: "low quality, text, watermark, blurry, realistic photo",
    aspectRatio: "1:1", // recommended ratio bundled with the pack (illustrative field)
  },
];
```
Currently, the app features 10 built-in styles, including Anime, Cel-shaded, Chibi, 3D, Pixel Art, and Claymation.
## The Workflow: Balancing Likeness vs. Artistry

To avoid the instability of direct img2img (which often produces "uncanny valley" results), I implemented a two-stage process:

1. **Visual Description (VLM):** A vision model (such as Gemini) extracts key features from the selfie (hairstyle, glasses, facial expression, etc.).
2. **Stylized Generation:** The Style Pack prompt is combined with the visual description to generate 4 distinct avatars.
This ensures the "Style" dominates the aesthetic while the "Visual Description" maintains the user's identity. I also added a slider for Likeness vs. Style priority to give users more control.
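The two stages above can be sketched roughly like this. The interfaces, field names, and the weighting heuristic are illustrative, not the app's actual code:

```typescript
// Sketch of the two-stage prompt assembly (all names are illustrative).

interface StylePack {
  id: string;
  promptTemplate: string;
  negativePrompt: string;
}

// Stage 1 output: features a vision model extracts from the selfie.
interface VisualDescription {
  hairStyle: string;
  glasses: boolean;
  expression: string;
}

// Stage 2: merge the style prompt with the identity description.
// `likeness` (0..1) decides which half leads the prompt, mirroring the
// Likeness-vs-Style slider.
function buildGenerationPrompt(
  pack: StylePack,
  desc: VisualDescription,
  likeness: number
): string {
  const identity = [
    `${desc.hairStyle} hair`,
    desc.glasses ? "wearing glasses" : "no glasses",
    `${desc.expression} expression`,
  ].join(", ");
  // Higher likeness puts identity features first so the model weights them more.
  return likeness >= 0.5
    ? `${identity}, ${pack.promptTemplate}`
    : `${pack.promptTemplate}, ${identity}`;
}
```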
## System Architecture: The MVP Stack

My goal was speed-to-market. Here is the stack I chose:

- **Next.js App Router:** Full-stack integration; API routes handle the task orchestration.
- **OpenRouter:** A single unified API to call both Gemini (for vision) and various image generation models.
- **Supabase Auth:** Quick implementation of Google social login.
- **Creem / PayPal:** Handling subscriptions and international payments.
- **Tailwind CSS + shadcn/ui:** For a clean, responsive UI.
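As a rough sketch, the Stage-1 vision call goes through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug, prompt wording, and function names here are my illustration, not the app's actual code; check OpenRouter's model list for currently available vision models:

```typescript
// Build the OpenAI-style multimodal request body (text + image parts).
function buildVisionRequest(imageDataUrl: string) {
  return {
    model: "google/gemini-2.0-flash-001", // assumed slug; any vision-capable model works
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Describe this person's hairstyle, glasses, and expression in one sentence.",
          },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}

// Send the selfie (as a data URL) and return the model's description.
async function describeSelfie(imageDataUrl: string, apiKey: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildVisionRequest(imageDataUrl)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```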
### Lightweight Task Queue

For the MVP, I skipped complex message brokers like RabbitMQ. Instead, I used an in-memory queue backed by JSON storage:

- States: `queued` → `running` → `succeeded` / `failed` / `canceled`
- Automatic timeouts and retry logic.

This setup is more than enough for initial traffic and is incredibly easy to maintain.
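A minimal sketch of such a queue's timeout-and-retry pass, with the state names from the list above; the file path, field names, and limits are illustrative:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

type TaskState = "queued" | "running" | "succeeded" | "failed" | "canceled";

interface Task {
  id: string;
  userId: string;
  state: TaskState;
  startedAt?: number; // epoch ms, set when the task moves to "running"
  attempts: number;
}

const TASKS_FILE = "tasks.json"; // assumed location
const TIMEOUT_MS = 120_000; // assumed per-task budget
const MAX_ATTEMPTS = 2;

function loadTasks(): Task[] {
  return existsSync(TASKS_FILE)
    ? JSON.parse(readFileSync(TASKS_FILE, "utf8"))
    : [];
}

function saveTasks(tasks: Task[]): void {
  writeFileSync(TASKS_FILE, JSON.stringify(tasks, null, 2));
}

// Requeue tasks that have been "running" past the timeout,
// failing them once they exhaust their retry budget.
function reapTimedOut(tasks: Task[], now = Date.now()): Task[] {
  return tasks.map((t): Task => {
    if (t.state !== "running" || now - (t.startedAt ?? now) < TIMEOUT_MS) return t;
    return t.attempts >= MAX_ATTEMPTS
      ? { ...t, state: "failed" }
      : { ...t, state: "queued", attempts: t.attempts + 1 };
  });
}
```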
### Subscriptions & Rate Limiting

To keep GPU costs under control, I enforced strict rules at the API layer:

- **Authorization:** Only subscribed users can trigger generation tasks.
- **Quota:** 1 credit per generation (yielding 4 images).
- **Concurrency:** Maximum of 1 active task per user.
- **Retention:** Images are stored for 7 days by default, with an option for users to delete them manually.
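The first three rules can be enforced in one small gate function before a task is ever enqueued; the field names and return shape here are illustrative:

```typescript
// Minimal per-request gate for the generation endpoint (names are illustrative).
interface UserAccount {
  subscribed: boolean;
  credits: number;
  activeTasks: number;
}

function canStartGeneration(user: UserAccount): { ok: boolean; reason?: string } {
  if (!user.subscribed) return { ok: false, reason: "subscription required" };
  if (user.credits < 1) return { ok: false, reason: "out of credits" };
  if (user.activeTasks >= 1) return { ok: false, reason: "a task is already running" };
  return { ok: true };
}
```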
## Lessons Learned & Pitfalls

- **Model Compatibility:** Not all models through OpenRouter support image output natively. I had to build a robust configuration handler with mock fallbacks.
- **Identity Drift:** Pure prompting often loses the person's likeness. Introducing the "Visual Description" stage stabilized the results significantly.
- **JSON for Storage:** While simple, you must be careful with concurrent writes. I implemented basic file locking to prevent data corruption during the MVP stage.
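One way to sketch that basic file locking is an exclusive sidecar lock file: the `wx` flag makes creation atomic and throws if the lock already exists, so a second writer cannot slip in mid-write. Paths and helper names below are illustrative:

```typescript
import { closeSync, openSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";

// Run `fn` while holding an exclusive lock on `<path>.lock`.
function withFileLock<T>(path: string, fn: () => T): T {
  const lockPath = `${path}.lock`;
  const fd = openSync(lockPath, "wx"); // throws if another writer holds the lock
  try {
    return fn();
  } finally {
    closeSync(fd);
    unlinkSync(lockPath); // release the lock
  }
}

// Read-modify-write under the lock so concurrent requests cannot interleave.
function updateJson(path: string, update: (data: any) => any): void {
  withFileLock(path, () => {
    const data = JSON.parse(readFileSync(path, "utf8"));
    writeFileSync(path, JSON.stringify(update(data), null, 2));
  });
}
```

A production setup would also retry (or queue) when the lock is held rather than throwing, but for MVP traffic the failure is rare enough to surface as an error.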
## Roadmap

- Integrate stronger identity-preserving solutions (like LoRA or DreamBooth).
- Add a dashboard for Style Pack management.
- Migrate to a persistent task queue (e.g., Upstash or BullMQ) and S3-compatible object storage.
If you're interested in AI-driven UX or want to discuss the technical implementation of avatar generators, let’s connect in the comments!