msm yaqoob

I Audited 47 GEO Agencies' Technical Stack - Here's What Actually Works for AI Search Optimization

As a technical founder, when I discovered our company had zero visibility in ChatGPT, I did what any developer would do: I went deep on the technical implementation.
Over six weeks, I evaluated 47 agencies claiming to offer "GEO" (Generative Engine Optimization) services. I asked for their technical architecture, reviewed their codebase approaches, and tested their methodologies.
Spoiler: Most were selling rebranded SEO with zero understanding of how LLMs actually work.
But about 8 of them had legitimate technical chops. Here's what I learned about the actual tech stack behind effective AI search optimization.
The Technical Foundation: What Actually Matters

1. Structured Data Implementation (Critical)

This is where most agencies failed the technical test.

The Question I Asked: "Walk me through your schema.org implementation strategy."

Bad Answers (31 agencies):

```javascript
// What they actually did
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company Name"
}
```

That's it. Bare minimum Organization schema with no depth.

Good Answers (8 agencies):

```javascript
// What actually works for GEO
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company Name",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://twitter.com/company",
    "https://linkedin.com/company/company",
    "https://github.com/company"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "+1-XXX-XXX-XXXX",
    "contactType": "customer service"
  },
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main St",
    "addressLocality": "City",
    "addressRegion": "State",
    "postalCode": "12345",
    "addressCountry": "US"
  }
}
```

Paired with a dedicated FAQPage schema:

```javascript
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is your primary service?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Detailed answer with entities and context..."
      }
    }
    // 50-100 more FAQs
  ]
}
```

The Technical Difference:

Comprehensive entity relationships (sameAs for cross-platform validation)
Nested structured data (ContactPoint, PostalAddress)
FAQPage schema with extensive Q&A coverage
Product/Service schema with detailed attributes
Review schema with aggregate ratings (see the sketch below)
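
To make the last two items concrete, here is a minimal sketch of a Service schema carrying an aggregate rating. All values are placeholders; the point is that ratings live inside the same entity graph rather than in a separate, disconnected blob.

```javascript
// Illustrative Service schema with an aggregate rating -- all values are placeholders
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "AI-Driven Lead Scoring",
  "serviceType": "B2B SaaS",
  "provider": { "@type": "Organization", "name": "Company Name" },
  "areaServed": "US",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "127"
  }
}
```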

Validation Stack:

```bash
# Tools that actually matter:
# - Google Rich Results Test
# - Schema.org Validator
# - JSON-LD Playground
# - Structured Data Linter (custom build)
```
2. The llms.txt File (Emerging Standard)

Only 3 out of 47 agencies even knew what this was.

What it is: A file at your root domain that tells AI crawlers about your site structure.

```txt
# llms.txt
# https://yoursite.com/llms.txt

## Company Information
Organization: Company Name
Industry: B2B SaaS
Founded: 2020
Location: San Francisco, CA

## Primary Services
- Service 1: Description with entities
- Service 2: Description with entities
- Service 3: Description with entities

## Key Content URLs
Main Site: https://yoursite.com
Documentation: https://docs.yoursite.com
Blog: https://yoursite.com/blog
Case Studies: https://yoursite.com/case-studies

## Entity Relationships
Wikipedia: https://en.wikipedia.org/wiki/Company_Name
Crunchbase: https://crunchbase.com/company
LinkedIn: https://linkedin.com/company/company-name

## Structured Data Endpoints
Schema: https://yoursite.com/schema.json
Sitemap: https://yoursite.com/sitemap.xml
```
Implementation:

```javascript
// Express.js middleware
app.get('/llms.txt', (req, res) => {
  res.type('text/plain');
  res.sendFile(__dirname + '/public/llms.txt');
});
```
Impact: Early data suggests 15-20% better citation accuracy from LLMs that support this standard.
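
If you go this route, it's worth verifying the file is actually reachable and served as plain text, since a misconfigured route can quietly serve an HTML 404 page instead. A minimal check, assuming Node 18+ for the built-in fetch and a placeholder URL:

```javascript
// check-llms-txt.mjs -- quick sanity check for the llms.txt route (URL is a placeholder)
const res = await fetch('https://yoursite.com/llms.txt');

console.log('status:', res.status);                            // expect 200
console.log('content-type:', res.headers.get('content-type')); // expect text/plain

const body = await res.text();
console.log('has header comment:', body.trimStart().startsWith('# llms.txt'));
```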

3. Entity Consolidation Architecture

The Technical Challenge: AI platforms need to understand that:

yourcompany.com === @yourcompany === Your Company Inc. === "Your Company"

Bad Approach (Most Agencies): Hope for the best, no systematic consolidation.

Good Approach (8 Agencies):

```javascript
// Systematic NAP (Name, Address, Phone) consistency
const entityData = {
  name: "Exact Company Name Inc.", // Never varies
  address: "123 Main Street, Suite 100, San Francisco, CA 94102",
  phone: "+1-415-555-0123",
  email: "contact@company.com",
  socialHandles: {
    twitter: "@exacthandle",
    linkedin: "company/exact-name",
    github: "exact-org-name"
  }
};

// Used consistently across:
// - Schema.org markup
// - robots.txt
// - llms.txt
// - All social profiles
// - Directory listings
// - Press releases
```
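
Since the same record has to feed Schema.org markup, llms.txt, and everything else on that list, it helps to generate the Organization markup from entityData instead of retyping it. A rough sketch (the helper name is mine, and it deliberately skips address parsing):

```javascript
// Derive Organization JSON-LD from the single entityData record above,
// so the markup can never drift from the canonical NAP values.
// Sketch only -- extend with address/contactPoint as needed.
function organizationSchemaFrom(entity) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Organization',
    name: entity.name,
    telephone: entity.phone,
    email: entity.email,
    sameAs: [
      `https://twitter.com/${entity.socialHandles.twitter.replace('@', '')}`,
      `https://linkedin.com/${entity.socialHandles.linkedin}`,
      `https://github.com/${entity.socialHandles.github}`,
    ],
  };
}

console.log(JSON.stringify(organizationSchemaFrom(entityData), null, 2));
```
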
Validation Script:

```python
# entity_consistency_checker.py
import json

import requests
from bs4 import BeautifulSoup


def check_entity_consistency(urls):
    entities = []

    for url in urls:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Extract schema.org Organization data from JSON-LD blocks
        scripts = soup.find_all('script', type='application/ld+json')
        for script in scripts:
            data = json.loads(script.string)
            if data.get('@type') == 'Organization':
                entities.append({
                    'source': url,
                    'name': data.get('name'),
                    'url': data.get('url'),
                    'address': data.get('address')
                })

    # Check for inconsistencies
    names = set(e['name'] for e in entities if e['name'])
    if not names:
        print("⚠️ No Organization schema found")
    elif len(names) > 1:
        print(f"⚠️ Inconsistent names found: {names}")
    else:
        print(f"✅ Entity name consistent: {names.pop()}")


# Usage
urls = [
    'https://yoursite.com',
    'https://yoursite.com/about',
    'https://yoursite.com/contact'
]
check_entity_consistency(urls)
```

4. Semantic HTML Structure

LLMs parse HTML better than humans. Structure matters.

Bad HTML (What Most Sites Have):

```html
<div class="q">What is your service?</div>
<div class="a">We provide XYZ service.</div>
```

Good HTML (What Works for GEO):

```html
<section class="faq-item" itemscope itemtype="https://schema.org/Question">
  <h3 class="faq-question" itemprop="name">What is your service?</h3>
  <div class="faq-answer" itemprop="acceptedAnswer" itemscope itemtype="https://schema.org/Answer">
    <p itemprop="text">We provide XYZ service, which helps entities achieve specific outcomes through methodologies.</p>
  </div>
</section>
```

Key Technical Principles:

Semantic HTML5 tags (e.g. `<article>`, `<section>`, `<main>`)
Microdata attributes (itemprop, itemscope, itemtype)
Proper heading hierarchy (H1 → H2 → H3, no skipping)
Descriptive class names (.faq-question vs .q)
Meaningful alt text on images (not keyword stuffing)

5. API-First Content Architecture

The Problem: Static content ages poorly for AI search (especially DeepSeek, which heavily favors recency).

The Solution: Headless CMS with dynamic content injection.

```javascript
// Next.js example with dynamic content
import { useState, useEffect } from 'react';

export default function FAQPage() {
  const [faqs, setFaqs] = useState([]);
  const [lastUpdated, setLastUpdated] = useState(null);

  useEffect(() => {
    // Fetch from headless CMS
    fetch('/api/faqs')
      .then(res => res.json())
      .then(data => {
        setFaqs(data.faqs);
        setLastUpdated(data.lastUpdated);
      });
  }, []);

  return (
    <section>
      <p>Last updated: {lastUpdated}</p>
      {/* field names (id, question, answer) depend on your CMS payload shape */}
      {faqs.map(faq => (
        <article key={faq.id}>
          <h2>{faq.question}</h2>
          <p>{faq.answer}</p>
        </article>
      ))}
    </section>
  );
}
```
Benefits:

Easy content updates (no redeployment)
Automatic "Last Modified" timestamps
A/B testing content for AI optimization
Dynamic schema generation
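
For reference, here is a minimal sketch of what the /api/faqs endpoint behind that component could look like as a Next.js pages API route. It's illustrative only: in production the inlined array would be a headless CMS query, and the field names are assumptions rather than an exact Contentful model.

```javascript
// pages/api/faqs.js -- illustrative sketch; swap the inlined array for your CMS client
const entries = [
  {
    id: 1,
    question: 'What is your primary service?',
    answer: 'Detailed answer with entities and context...',
    updatedAt: '2026-02-06T10:00:00Z',
  },
];

export default function handler(req, res) {
  const faqs = entries.map(({ id, question, answer }) => ({ id, question, answer }));

  // FAQPage JSON-LD generated from the same records, so markup never drifts from content
  const schema = {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map(f => ({
      '@type': 'Question',
      name: f.question,
      acceptedAnswer: { '@type': 'Answer', text: f.answer },
    })),
  };

  // Most recent updatedAt across entries, relying on ISO-8601 strings sorting lexically
  const lastUpdated = entries.map(e => e.updatedAt).sort().pop();

  res.status(200).json({ faqs, schema, lastUpdated });
}
```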

6. Sitemap Optimization for AI Crawlers

Standard XML sitemaps aren't enough anymore.

Enhanced Sitemap Strategy:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://yoursite.com/important-page</loc>
    <lastmod>2026-02-06T10:00:00+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
    <!-- AI-specific metadata -->
    <news:news>
      <news:publication_date>2026-02-06T10:00:00Z</news:publication_date>
      <news:title>Exact Page Title</news:title>
    </news:news>
  </url>
</urlset>
```

Plus, separate sitemaps:

/sitemap-articles.xml (blog content)
/sitemap-faqs.xml (FAQ pages - critical for GEO)
/sitemap-products.xml (product/service pages)
/sitemap-images.xml (image optimization)
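
To keep the FAQ sitemap in the list above in sync with the CMS rather than hand-editing it, something like the following works as a build step. This is a minimal Node sketch, not an existing tool; the slug/updatedAt fields and output path are assumptions to adapt.

```javascript
// scripts/build-faq-sitemap.js -- illustrative; wire `faqs` up to your CMS export
const fs = require('fs');

const faqs = [
  { slug: 'what-is-your-primary-service', updatedAt: '2026-02-06T10:00:00Z' },
];

// One <url> entry per FAQ page
const urls = faqs
  .map(
    f => `  <url>
    <loc>https://yoursite.com/faq/${f.slug}</loc>
    <lastmod>${f.updatedAt}</lastmod>
    <changefreq>weekly</changefreq>
  </url>`
  )
  .join('\n');

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${urls}
</urlset>`;

fs.writeFileSync('public/sitemap-faqs.xml', sitemap);
console.log(`Wrote ${faqs.length} FAQ URLs to sitemap-faqs.xml`);
```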

7. Performance Metrics That Actually Correlate with AI Citations

After analyzing our data and the 8 successful agencies, here are the technical metrics that correlate with AI visibility:

```javascript
// Metrics that matter for GEO
const geoMetrics = {
  // Critical
  schemaValidationScore: 100,  // Must be perfect
  faqPageCount: 50,            // Minimum for meaningful coverage
  entityConsistency: 100,      // Across all platforms

  // Important
  firstContentfulPaint: 1.2,   // seconds (< 1.5s target)
  timeToInteractive: 2.8,      // seconds (< 3.0s target)
  cumulativeLayoutShift: 0.05, // (< 0.1 target)

  // Nice to have
  structuredDataCoverage: 85,  // % of pages with schema
  internalLinkDensity: 3.2,    // links per 1000 words
  semanticKeywordDensity: 2.1  // % (entity-focused)
};
```
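
A trivial way to turn those numbers into a CI gate is to compare measured values against the targets and fail the build when a critical one slips. This is a sketch, not an existing tool; in practice the measured values would come from Lighthouse CI output rather than being hardcoded.

```javascript
// geo-metrics-gate.js -- illustrative CI gate; `metrics` is hardcoded here for the example
const metrics = {
  schemaValidationScore: 100,
  firstContentfulPaint: 1.2,   // seconds
  timeToInteractive: 2.8,      // seconds
  cumulativeLayoutShift: 0.05,
  structuredDataCoverage: 94,  // % of pages with schema
};

const targets = {
  schemaValidationScore: v => v === 100,
  firstContentfulPaint: v => v < 1.5,
  timeToInteractive: v => v < 3.0,
  cumulativeLayoutShift: v => v < 0.1,
  structuredDataCoverage: v => v >= 80,
};

let failures = 0;
for (const [name, passes] of Object.entries(targets)) {
  const ok = passes(metrics[name]);
  console.log(`${ok ? 'PASS' : 'FAIL'} ${name}: ${metrics[name]}`);
  if (!ok) failures++;
}

process.exit(failures ? 1 : 0); // non-zero exit fails the CI job
```
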
Monitoring Stack:

```bash
# Technical monitoring for GEO
# - Lighthouse CI (automated performance testing)
# - Schema.org Validator (automated checking)
# - Custom AI query testing (ChatGPT API + Selenium)
# - Entity consistency monitoring (custom Python script)
# - Structured data change detection (git diff + alerts)
```
8. The Testing Framework Nobody Uses (But Should)

Here's how I tested agencies' technical competency:

```python
# ai_visibility_tester.py
import openai
from anthropic import Anthropic
import google.generativeai as genai


class AIVisibilityTester:
    def __init__(self, company_name, test_queries):
        self.company_name = company_name
        self.test_queries = test_queries
        self.results = {
            'chatgpt': [],
            'claude': [],
            'gemini': []
        }

    def test_chatgpt(self, query):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": query}]
        )
        return self.company_name.lower() in response.choices[0].message.content.lower()

    def test_claude(self, query):
        anthropic = Anthropic()
        response = anthropic.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,  # required by the Messages API
            messages=[{"role": "user", "content": query}]
        )
        return self.company_name.lower() in response.content[0].text.lower()

    def run_full_test(self):
        for query in self.test_queries:
            self.results['chatgpt'].append(self.test_chatgpt(query))
            self.results['claude'].append(self.test_claude(query))

        # Calculate citation rates
        citation_rate = {
            'chatgpt': sum(self.results['chatgpt']) / len(self.results['chatgpt']) * 100,
            'claude': sum(self.results['claude']) / len(self.results['claude']) * 100
        }

        return citation_rate


# Usage
tester = AIVisibilityTester(
    company_name="YourCompany",
    test_queries=[
        "best CRM for real estate",
        "top project management tools for startups",
        "which accounting software should I use"
    ]
)

results = tester.run_full_test()
print(f"ChatGPT citation rate: {results['chatgpt']}%")
print(f"Claude citation rate: {results['claude']}%")
```
Run this monthly to track actual progress, not vanity metrics.
The Technical Stack That Actually Worked
After implementing learnings from the best 8 agencies, here's our production stack:
```yaml
# Frontend
Framework: Next.js 14 (App Router)
CMS: Contentful (headless)
Styling: Tailwind CSS
Deployment: Vercel

# Schema Management
Generator: Custom React component
Validation: Automated via GitHub Actions
Storage: Git-tracked JSON files

# Monitoring
Performance: Lighthouse CI
Schema: Custom validator (Python)
AI Testing: Weekly automated queries
Uptime: UptimeRobot

# Content Pipeline
Writing: Human + AI-assisted
Editing: Human review
Schema: Auto-generated from content
Deployment: Continuous (via git push)

# Analytics
Traditional: Google Analytics 4
AI-specific: Custom dashboard (Retool)
Citation tracking: Weekly manual + automated tests
```
The Results (Technical Proof)

Before Optimization:

```bash
$ python ai_visibility_tester.py
ChatGPT citation rate: 0%
Claude citation rate: 0%
Gemini citation rate: 0%
```

After 4 Months:

```bash
$ python ai_visibility_tester.py
ChatGPT citation rate: 47%
Claude citation rate: 38%
Gemini citation rate: 63%
Perplexity citation rate: 73%
```
Technical Improvements:

Schema validation score: 45% → 100%
FAQ page count: 3 → 87
Structured data coverage: 12% → 94%
Entity consistency: 67% → 100%
Core Web Vitals: Failed → Passed (all metrics)

Business Impact:

AI-attributed traffic: +340%
Qualified leads from AI: 83 in 4 months
Revenue from AI sources: $340K+

What Most Agencies Get Wrong (Technical Edition)

1. They Bolt Schema Onto Existing Sites

Wrong Approach:

```javascript
// Adding schema as an afterthought
// Hardcoded JSON-LD
```

Right Approach:

```javascript
// Schema as a first-class citizen in the component architecture
import Head from 'next/head';

export default function ProductPage({ product }) {
  const schema = generateProductSchema(product);

  return (
    <>
      <Head>
        <script
          type="application/ld+json"
          dangerouslySetInnerHTML={{ __html: JSON.stringify(schema) }}
        />
      </Head>
      <ProductDetails product={product} />
    </>
  );
}
```
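
For completeness, here is a minimal sketch of what a generateProductSchema helper could look like; the product field names (name, description, sku, price, rating) are assumptions, not the exact shape of our model.

```javascript
// lib/schema.js -- illustrative only; adapt the field names to your product model
export function generateProductSchema(product) {
  const schema = {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: product.name,
    description: product.description,
    sku: product.sku,
    offers: {
      '@type': 'Offer',
      price: product.price,
      priceCurrency: 'USD',
      availability: 'https://schema.org/InStock',
    },
  };

  // Only emit aggregateRating when review data actually exists
  if (product.rating) {
    schema.aggregateRating = {
      '@type': 'AggregateRating',
      ratingValue: product.rating.value,
      reviewCount: product.rating.count,
    };
  }

  return schema;
}
```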

2. They Ignore Performance

LLMs favor fast sites. Period.

The Data:

Sites with FCP < 1.5s: 3.2x higher citation rate
Sites with FCP > 3.0s: 40% lower citation rate

Fix:

```javascript
// Image optimization example
import Image from 'next/image';

// Before (wrong)
<img src="/hero.jpg" alt="Hero" />

// After (right)
// (note: placeholder="blur" with a string src also needs a blurDataURL,
//  or use a statically imported image)
<Image
  src="/hero.jpg"
  alt="Descriptive, entity-rich alt text"
  width={1200}
  height={600}
  priority
  placeholder="blur"
/>
```

3. They Use Generic Content

AI platforms favor specificity, entities, and data.

Generic (doesn't work):

```markdown
We offer great services to help businesses grow.
```

Specific (works):

```markdown
Our B2B SaaS platform helps mid-market companies ($10M-$100M revenue)
in the healthcare vertical reduce customer acquisition costs by an
average of 23% through AI-driven lead scoring, automated nurture
campaigns, and predictive churn analysis.
```

Open Source Tools I Built

Since most agencies had inadequate tooling, I built my own:

1. GEO Schema Validator

```bash
npm install -g geo-schema-validator
geo-validate https://yoursite.com
```

2. AI Citation Tracker

```bash
pip install ai-citation-tracker
ai-track --site yoursite.com --queries queries.txt
```

Both available on GitHub.

Recommendations for Developers

If you're implementing GEO yourself:

Start with Schema.org coverage - 80%+ of your pages need it
Build FAQ content systematically - Target 50-100 question/answer pairs
[Automate entity consistency checking](https://digimsm.com/marketing-automation/) - Don't do this manually
Set up automated AI testing - Weekly queries across platforms
Optimize for performance - Core Web Vitals matter for AI
Use semantic HTML - It's not 2010 anymore, divs aren't enough

If you're hiring an agency:

Ask to see their:

Schema implementation approach (code samples)
Testing methodology (scripts, automation)
Entity consolidation process (technical documentation)
Performance optimization stack (tools, metrics)

If they can't provide these, they're not technically competent enough for GEO.

Full Technical Breakdown

I've documented the complete technical architecture, including code samples, configuration files, and testing frameworks in my [detailed Medium article](https://medium.com/@msmyaqoob55/finding-the-right-geo-agency-what-i-learned-after-vetting-47-ai-optimization-companies-6c424b8064db).

Questions?

Drop them in the comments. I'm actively monitoring and happy to share specific code samples, configuration files, or architectural decisions.
