Live Demo: https://www.zltest.online/
## TL;DR

- **User Flow:** Upload one selfie, pick a style pack, and get 4 unique cartoon avatars.
- **The Secret Sauce:** A two-stage pipeline (Visual Description → Style Generation) that balances character likeness with artistic consistency.
- **MVP Architecture:** JSON-based storage for task and subscription management, enabling rapid validation with minimal overhead.
## Background: Why Build This?

I noticed a common pain point among users:

- People want avatars that are recognizable but still protect their privacy.
- People want consistent styles, not a "lottery" of random AI outputs.
- Most people don't want to write complex prompts.

My mission was simple: "One selfie, many styles."
The user journey is compressed into three steps:

1. Upload a selfie.
2. Select a Style Pack.
3. Generate and download.
## Functional Design: Style-First & "Prompt-less" UI
The core concept is the Style Pack. I bundled Prompts, Negative Prompts, and recommended aspect ratios into single configuration objects.


This keeps the artistic output consistent while providing a "Prompt-less" experience for the user.
Simplified configuration:

```typescript
const stylePacks = [
  {
    id: "anime-lineart",
    name: "Clean Anime",
    promptTemplate: "clean lineart, soft pastel colors, high-quality digital art",
    negativePrompt: "low quality, text, watermark, blurry, realistic photo",
    aspectRatio: "1:1", // recommended ratio bundled with the pack (illustrative field)
  },
];
```
Currently, the app features 10 built-in styles, including Anime, Cel-shaded, Chibi, 3D, Pixel Art, and Claymation.
## The Workflow: Balancing Likeness vs. Artistry

To avoid the instability of direct img2img (which often produces "uncanny valley" results), I implemented a two-stage process:

1. **Visual Description (VLM):** A vision model (such as Gemini) extracts key features from the selfie (hairstyle, glasses, facial expression, etc.).
2. **Stylized Generation:** The Style Pack prompt is combined with the visual description to generate 4 distinct avatars.
This ensures the "Style" dominates the aesthetic while the "Visual Description" maintains the user's identity. I also added a slider for Likeness vs. Style priority to give users more control.
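The two stages above can be sketched roughly like this. The interfaces, field names, and the weighting heuristic are illustrative, not the app's actual code:

```typescript
// Sketch of the two-stage prompt assembly (all names are illustrative).

interface StylePack {
  id: string;
  promptTemplate: string;
  negativePrompt: string;
}

// Stage 1 output: features a vision model extracts from the selfie.
interface VisualDescription {
  hairStyle: string;
  glasses: boolean;
  expression: string;
}

// Stage 2: merge the style prompt with the identity description.
// `likeness` (0..1) decides which half leads the prompt, mirroring the
// Likeness-vs-Style slider.
function buildGenerationPrompt(
  pack: StylePack,
  desc: VisualDescription,
  likeness: number
): string {
  const identity = [
    `${desc.hairStyle} hair`,
    desc.glasses ? "wearing glasses" : "no glasses",
    `${desc.expression} expression`,
  ].join(", ");
  // Higher likeness puts identity features first so the model weights them more.
  return likeness >= 0.5
    ? `${identity}, ${pack.promptTemplate}`
    : `${pack.promptTemplate}, ${identity}`;
}
```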
## System Architecture: The MVP Stack

My goal was speed-to-market. Here is the stack I chose:

- **Next.js App Router:** Full-stack integration; API routes handle the task orchestration.
- **OpenRouter:** A single unified API to call both Gemini (for vision) and various image generation models.
- **Supabase Auth:** Quick implementation of Google social login.
- **Creem / PayPal:** Handling subscriptions and international payments.
- **Tailwind CSS + shadcn/ui:** For a clean, responsive UI.
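As a rough sketch, the Stage-1 vision call goes through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug, prompt wording, and function names here are my illustration, not the app's actual code; check OpenRouter's model list for currently available vision models:

```typescript
// Build the OpenAI-style multimodal request body (text + image parts).
function buildVisionRequest(imageDataUrl: string) {
  return {
    model: "google/gemini-2.0-flash-001", // assumed slug; any vision-capable model works
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "Describe this person's hairstyle, glasses, and expression in one sentence.",
          },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}

// Send the selfie (as a data URL) and return the model's description.
async function describeSelfie(imageDataUrl: string, apiKey: string): Promise<string> {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildVisionRequest(imageDataUrl)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```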
### Lightweight Task Queue

For the MVP, I skipped complex message brokers like RabbitMQ. Instead, I used an in-memory queue backed by JSON storage:

- States: `queued` → `running` → `succeeded` / `failed` / `canceled`
- Automatic timeouts and retry logic.

This setup is more than enough for initial traffic and is incredibly easy to maintain.
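A minimal sketch of such a queue's timeout-and-retry pass, with the state names from the list above; the file path, field names, and limits are illustrative:

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

type TaskState = "queued" | "running" | "succeeded" | "failed" | "canceled";

interface Task {
  id: string;
  userId: string;
  state: TaskState;
  startedAt?: number; // epoch ms, set when the task moves to "running"
  attempts: number;
}

const TASKS_FILE = "tasks.json"; // assumed location
const TIMEOUT_MS = 120_000; // assumed per-task budget
const MAX_ATTEMPTS = 2;

function loadTasks(): Task[] {
  return existsSync(TASKS_FILE)
    ? JSON.parse(readFileSync(TASKS_FILE, "utf8"))
    : [];
}

function saveTasks(tasks: Task[]): void {
  writeFileSync(TASKS_FILE, JSON.stringify(tasks, null, 2));
}

// Requeue tasks that have been "running" past the timeout,
// failing them once they exhaust their retry budget.
function reapTimedOut(tasks: Task[], now = Date.now()): Task[] {
  return tasks.map((t): Task => {
    if (t.state !== "running" || now - (t.startedAt ?? now) < TIMEOUT_MS) return t;
    return t.attempts >= MAX_ATTEMPTS
      ? { ...t, state: "failed" }
      : { ...t, state: "queued", attempts: t.attempts + 1 };
  });
}
```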
### Subscriptions & Rate Limiting

To keep GPU costs under control, I enforced strict rules at the API layer:

- **Authorization:** Only subscribed users can trigger generation tasks.
- **Quota:** 1 credit per generation (yielding 4 images).
- **Concurrency:** Maximum of 1 active task per user.
- **Retention:** Images are stored for 7 days by default, with an option for users to delete them manually.
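The first three rules can be enforced in one small gate function before a task is ever enqueued; the field names and return shape here are illustrative:

```typescript
// Minimal per-request gate for the generation endpoint (names are illustrative).
interface UserAccount {
  subscribed: boolean;
  credits: number;
  activeTasks: number;
}

function canStartGeneration(user: UserAccount): { ok: boolean; reason?: string } {
  if (!user.subscribed) return { ok: false, reason: "subscription required" };
  if (user.credits < 1) return { ok: false, reason: "out of credits" };
  if (user.activeTasks >= 1) return { ok: false, reason: "a task is already running" };
  return { ok: true };
}
```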
## Lessons Learned & Pitfalls

- **Model Compatibility:** Not all models through OpenRouter support image output natively. I had to build a robust configuration handler with mock fallbacks.
- **Identity Drift:** Pure prompting often loses the person's likeness. Introducing the "Visual Description" stage stabilized the results significantly.
- **JSON for Storage:** While simple, you must be careful with concurrent writes. I implemented basic file locking to prevent data corruption during the MVP stage.
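One way to sketch that basic file locking is an exclusive sidecar lock file: the `wx` flag makes creation atomic and throws if the lock already exists, so a second writer cannot slip in mid-write. Paths and helper names below are illustrative:

```typescript
import { closeSync, openSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";

// Run `fn` while holding an exclusive lock on `<path>.lock`.
function withFileLock<T>(path: string, fn: () => T): T {
  const lockPath = `${path}.lock`;
  const fd = openSync(lockPath, "wx"); // throws if another writer holds the lock
  try {
    return fn();
  } finally {
    closeSync(fd);
    unlinkSync(lockPath); // release the lock
  }
}

// Read-modify-write under the lock so concurrent requests cannot interleave.
function updateJson(path: string, update: (data: any) => any): void {
  withFileLock(path, () => {
    const data = JSON.parse(readFileSync(path, "utf8"));
    writeFileSync(path, JSON.stringify(update(data), null, 2));
  });
}
```

A production setup would also retry (or queue) when the lock is held rather than throwing, but for MVP traffic the failure is rare enough to surface as an error.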
## Roadmap

- Integrate stronger identity-preserving solutions (like LoRA or DreamBooth).
- Add a dashboard for Style Pack management.
- Migrate to a persistent task queue (e.g., Upstash or BullMQ) and S3-compatible object storage.
If you're interested in AI-driven UX or want to discuss the technical implementation of avatar generators, let’s connect in the comments!