msm yaqoob

I Audited 47 GEO Agencies' Technical Stack - Here's What Actually Works for AI Search Optimization

As a technical founder, when I discovered our company had zero visibility in ChatGPT, I did what any developer would do: I went deep on the technical implementation.
Over six weeks, I evaluated 47 agencies claiming to offer "GEO" (Generative Engine Optimization) services. I asked for their technical architecture, reviewed their codebase approaches, and tested their methodologies.
Spoiler: Most were selling rebranded SEO with zero understanding of how LLMs actually work.
But about 8 of them had legitimate technical chops. Here's what I learned about the actual tech stack behind effective AI search optimization.
The Technical Foundation: What Actually Matters

1. Structured Data Implementation (Critical)

This is where most agencies failed the technical test.

The Question I Asked: "Walk me through your schema.org implementation strategy."

Bad Answers (31 agencies):

```javascript
// What they actually did
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company Name"
}
```

That's it. Bare minimum Organization schema with no depth.

Good Answers (8 agencies):

```javascript
// What actually works for GEO
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company Name",
  "url": "https://example.com",
  "logo": "https://example.com/logo.png",
  "sameAs": [
    "https://twitter.com/company",
    "https://linkedin.com/company/company",
    "https://github.com/company"
  ],
  "contactPoint": {
    "@type": "ContactPoint",
    "telephone": "+1-XXX-XXX-XXXX",
    "contactType": "customer service"
  },
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main St",
    "addressLocality": "City",
    "addressRegion": "State",
    "postalCode": "12345",
    "addressCountry": "US"
  }
}
```

Paired with a dedicated FAQPage schema:

```javascript
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is your primary service?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Detailed answer with entities and context..."
      }
    }
    // 50-100 more FAQs
  ]
}
```

The Technical Difference:

Comprehensive entity relationships (sameAs for cross-platform validation)
Nested structured data (ContactPoint, PostalAddress)
FAQPage schema with extensive Q&A coverage
Product/Service schema with detailed attributes
Review schema with aggregate ratings (see the sketch below)
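
To make the last two items concrete, here is a minimal sketch of a Service schema carrying an aggregate rating. All values are placeholders; the point is that ratings live inside the same entity graph rather than in a separate, disconnected blob.

```javascript
// Illustrative Service schema with an aggregate rating -- all values are placeholders
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "AI-Driven Lead Scoring",
  "serviceType": "B2B SaaS",
  "provider": { "@type": "Organization", "name": "Company Name" },
  "areaServed": "US",
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.8",
    "reviewCount": "127"
  }
}
```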

Validation Stack:

```bash
# Tools that actually matter:
# - Google Rich Results Test
# - Schema.org Validator
# - JSON-LD Playground
# - Structured Data Linter (custom build)
```
2. The llms.txt File (Emerging Standard)

Only 3 out of 47 agencies even knew what this was.

What it is: A file at your root domain that tells AI crawlers about your site structure.

```txt
# llms.txt
# https://yoursite.com/llms.txt

## Company Information
Organization: Company Name
Industry: B2B SaaS
Founded: 2020
Location: San Francisco, CA

## Primary Services
- Service 1: Description with entities
- Service 2: Description with entities
- Service 3: Description with entities

## Key Content URLs
Main Site: https://yoursite.com
Documentation: https://docs.yoursite.com
Blog: https://yoursite.com/blog
Case Studies: https://yoursite.com/case-studies

## Entity Relationships
Wikipedia: https://en.wikipedia.org/wiki/Company_Name
Crunchbase: https://crunchbase.com/company
LinkedIn: https://linkedin.com/company/company-name

## Structured Data Endpoints
Schema: https://yoursite.com/schema.json
Sitemap: https://yoursite.com/sitemap.xml
```
Implementation:

```javascript
// Express.js middleware
app.get('/llms.txt', (req, res) => {
  res.type('text/plain');
  res.sendFile(__dirname + '/public/llms.txt');
});
```
Impact: Early data suggests 15-20% better citation accuracy from LLMs that support this standard.
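
If you go this route, it's worth verifying the file is actually reachable and served as plain text, since a misconfigured route can quietly serve an HTML 404 page instead. A minimal check, assuming Node 18+ for the built-in fetch and a placeholder URL:

```javascript
// check-llms-txt.mjs -- quick sanity check for the llms.txt route (URL is a placeholder)
const res = await fetch('https://yoursite.com/llms.txt');

console.log('status:', res.status);                            // expect 200
console.log('content-type:', res.headers.get('content-type')); // expect text/plain

const body = await res.text();
console.log('has header comment:', body.trimStart().startsWith('# llms.txt'));
```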

3. Entity Consolidation Architecture

The Technical Challenge: AI platforms need to understand that:

yourcompany.com === @yourcompany === Your Company Inc. === "Your Company"

Bad Approach (Most Agencies): Hope for the best, no systematic consolidation.

Good Approach (8 Agencies):

```javascript
// Systematic NAP (Name, Address, Phone) consistency
const entityData = {
  name: "Exact Company Name Inc.", // Never varies
  address: "123 Main Street, Suite 100, San Francisco, CA 94102",
  phone: "+1-415-555-0123",
  email: "contact@company.com",
  socialHandles: {
    twitter: "@exacthandle",
    linkedin: "company/exact-name",
    github: "exact-org-name"
  }
};

// Used consistently across:
// - Schema.org markup
// - robots.txt
// - llms.txt
// - All social profiles
// - Directory listings
// - Press releases
```
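
Since the same record has to feed Schema.org markup, llms.txt, and everything else on that list, it helps to generate the Organization markup from entityData instead of retyping it. A rough sketch (the helper name is mine, and it deliberately skips address parsing):

```javascript
// Derive Organization JSON-LD from the single entityData record above,
// so the markup can never drift from the canonical NAP values.
// Sketch only -- extend with address/contactPoint as needed.
function organizationSchemaFrom(entity) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Organization',
    name: entity.name,
    telephone: entity.phone,
    email: entity.email,
    sameAs: [
      `https://twitter.com/${entity.socialHandles.twitter.replace('@', '')}`,
      `https://linkedin.com/${entity.socialHandles.linkedin}`,
      `https://github.com/${entity.socialHandles.github}`,
    ],
  };
}

console.log(JSON.stringify(organizationSchemaFrom(entityData), null, 2));
```
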
Validation Script:

```python
# entity_consistency_checker.py
import json

import requests
from bs4 import BeautifulSoup


def check_entity_consistency(urls):
    entities = []

    for url in urls:
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Extract schema.org Organization data from JSON-LD blocks
        scripts = soup.find_all('script', type='application/ld+json')
        for script in scripts:
            data = json.loads(script.string)
            if data.get('@type') == 'Organization':
                entities.append({
                    'source': url,
                    'name': data.get('name'),
                    'url': data.get('url'),
                    'address': data.get('address')
                })

    # Check for inconsistencies
    names = set(e['name'] for e in entities if e['name'])
    if not names:
        print("⚠️ No Organization schema found")
    elif len(names) > 1:
        print(f"⚠️ Inconsistent names found: {names}")
    else:
        print(f"✅ Entity name consistent: {names.pop()}")


# Usage
urls = [
    'https://yoursite.com',
    'https://yoursite.com/about',
    'https://yoursite.com/contact'
]
check_entity_consistency(urls)
```

4. Semantic HTML Structure

LLMs parse HTML better than humans. Structure matters.

Bad HTML (What Most Sites Have):

```html
<div class="q">What is your service?</div>
<div class="a">We provide XYZ service.</div>
```

Good HTML (What Works for GEO):

```html
<section class="faq-item" itemscope itemtype="https://schema.org/Question">
  <h3 class="faq-question" itemprop="name">What is your service?</h3>
  <div class="faq-answer" itemprop="acceptedAnswer" itemscope itemtype="https://schema.org/Answer">
    <p itemprop="text">We provide XYZ service, which helps entities achieve specific outcomes through methodologies.</p>
  </div>
</section>
```

Key Technical Principles:

Semantic HTML5 tags (e.g. `<article>`, `<section>`, `<main>`)
Microdata attributes (itemprop, itemscope, itemtype)
Proper heading hierarchy (H1 → H2 → H3, no skipping)
Descriptive class names (.faq-question vs .q)
Meaningful alt text on images (not keyword stuffing)

5. API-First Content Architecture

The Problem: Static content ages poorly for AI search (especially DeepSeek, which heavily favors recency).

The Solution: Headless CMS with dynamic content injection.

```javascript
// Next.js example with dynamic content
import { useState, useEffect } from 'react';

export default function FAQPage() {
  const [faqs, setFaqs] = useState([]);
  const [lastUpdated, setLastUpdated] = useState(null);

  useEffect(() => {
    // Fetch from headless CMS
    fetch('/api/faqs')
      .then(res => res.json())
      .then(data => {
        setFaqs(data.faqs);
        setLastUpdated(data.lastUpdated);
      });
  }, []);

  return (
    <section>
      <p>Last updated: {lastUpdated}</p>
      {/* field names (id, question, answer) depend on your CMS payload shape */}
      {faqs.map(faq => (
        <article key={faq.id}>
          <h2>{faq.question}</h2>
          <p>{faq.answer}</p>
        </article>
      ))}
    </section>
  );
}
```
Benefits:

Easy content updates (no redeployment)
Automatic "Last Modified" timestamps
A/B testing content for AI optimization
Dynamic schema generation
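
For reference, here is a minimal sketch of what the /api/faqs endpoint behind that component could look like as a Next.js pages API route. It's illustrative only: in production the inlined array would be a headless CMS query, and the field names are assumptions rather than an exact Contentful model.

```javascript
// pages/api/faqs.js -- illustrative sketch; swap the inlined array for your CMS client
const entries = [
  {
    id: 1,
    question: 'What is your primary service?',
    answer: 'Detailed answer with entities and context...',
    updatedAt: '2026-02-06T10:00:00Z',
  },
];

export default function handler(req, res) {
  const faqs = entries.map(({ id, question, answer }) => ({ id, question, answer }));

  // FAQPage JSON-LD generated from the same records, so markup never drifts from content
  const schema = {
    '@context': 'https://schema.org',
    '@type': 'FAQPage',
    mainEntity: faqs.map(f => ({
      '@type': 'Question',
      name: f.question,
      acceptedAnswer: { '@type': 'Answer', text: f.answer },
    })),
  };

  // Most recent updatedAt across entries, relying on ISO-8601 strings sorting lexically
  const lastUpdated = entries.map(e => e.updatedAt).sort().pop();

  res.status(200).json({ faqs, schema, lastUpdated });
}
```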

6. Sitemap Optimization for AI Crawlers

Standard XML sitemaps aren't enough anymore.

Enhanced Sitemap Strategy:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://yoursite.com/important-page</loc>
    <lastmod>2026-02-06T10:00:00+00:00</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
    <!-- AI-specific metadata -->
    <news:news>
      <news:publication_date>2026-02-06T10:00:00Z</news:publication_date>
      <news:title>Exact Page Title</news:title>
    </news:news>
  </url>
</urlset>
```

Plus, separate sitemaps:

/sitemap-articles.xml (blog content)
/sitemap-faqs.xml (FAQ pages - critical for GEO)
/sitemap-products.xml (product/service pages)
/sitemap-images.xml (image optimization)
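
To keep the FAQ sitemap in the list above in sync with the CMS rather than hand-editing it, something like the following works as a build step. This is a minimal Node sketch, not an existing tool; the slug/updatedAt fields and output path are assumptions to adapt.

```javascript
// scripts/build-faq-sitemap.js -- illustrative; wire `faqs` up to your CMS export
const fs = require('fs');

const faqs = [
  { slug: 'what-is-your-primary-service', updatedAt: '2026-02-06T10:00:00Z' },
];

// One <url> entry per FAQ page
const urls = faqs
  .map(
    f => `  <url>
    <loc>https://yoursite.com/faq/${f.slug}</loc>
    <lastmod>${f.updatedAt}</lastmod>
    <changefreq>weekly</changefreq>
  </url>`
  )
  .join('\n');

const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${urls}
</urlset>`;

fs.writeFileSync('public/sitemap-faqs.xml', sitemap);
console.log(`Wrote ${faqs.length} FAQ URLs to sitemap-faqs.xml`);
```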

7. Performance Metrics That Actually Correlate with AI Citations

After analyzing our data and the 8 successful agencies, here are the technical metrics that correlate with AI visibility:

```javascript
// Metrics that matter for GEO
const geoMetrics = {
  // Critical
  schemaValidationScore: 100,  // Must be perfect
  faqPageCount: 50,            // Minimum for meaningful coverage
  entityConsistency: 100,      // Across all platforms

  // Important
  firstContentfulPaint: 1.2,   // seconds (< 1.5s target)
  timeToInteractive: 2.8,      // seconds (< 3.0s target)
  cumulativeLayoutShift: 0.05, // (< 0.1 target)

  // Nice to have
  structuredDataCoverage: 85,  // % of pages with schema
  internalLinkDensity: 3.2,    // links per 1000 words
  semanticKeywordDensity: 2.1  // % (entity-focused)
};
```
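
A trivial way to turn those numbers into a CI gate is to compare measured values against the targets and fail the build when a critical one slips. This is a sketch, not an existing tool; in practice the measured values would come from Lighthouse CI output rather than being hardcoded.

```javascript
// geo-metrics-gate.js -- illustrative CI gate; `metrics` is hardcoded here for the example
const metrics = {
  schemaValidationScore: 100,
  firstContentfulPaint: 1.2,   // seconds
  timeToInteractive: 2.8,      // seconds
  cumulativeLayoutShift: 0.05,
  structuredDataCoverage: 94,  // % of pages with schema
};

const targets = {
  schemaValidationScore: v => v === 100,
  firstContentfulPaint: v => v < 1.5,
  timeToInteractive: v => v < 3.0,
  cumulativeLayoutShift: v => v < 0.1,
  structuredDataCoverage: v => v >= 80,
};

let failures = 0;
for (const [name, passes] of Object.entries(targets)) {
  const ok = passes(metrics[name]);
  console.log(`${ok ? 'PASS' : 'FAIL'} ${name}: ${metrics[name]}`);
  if (!ok) failures++;
}

process.exit(failures ? 1 : 0); // non-zero exit fails the CI job
```
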
Monitoring Stack:

```bash
# Technical monitoring for GEO
# - Lighthouse CI (automated performance testing)
# - Schema.org Validator (automated checking)
# - Custom AI query testing (ChatGPT API + Selenium)
# - Entity consistency monitoring (custom Python script)
# - Structured data change detection (git diff + alerts)
```
8. The Testing Framework Nobody Uses (But Should)

Here's how I tested agencies' technical competency:

```python
# ai_visibility_tester.py
import openai
from anthropic import Anthropic
import google.generativeai as genai


class AIVisibilityTester:
    def __init__(self, company_name, test_queries):
        self.company_name = company_name
        self.test_queries = test_queries
        self.results = {
            'chatgpt': [],
            'claude': [],
            'gemini': []
        }

    def test_chatgpt(self, query):
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": query}]
        )
        return self.company_name.lower() in response.choices[0].message.content.lower()

    def test_claude(self, query):
        anthropic = Anthropic()
        response = anthropic.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,  # required by the Messages API
            messages=[{"role": "user", "content": query}]
        )
        return self.company_name.lower() in response.content[0].text.lower()

    def run_full_test(self):
        for query in self.test_queries:
            self.results['chatgpt'].append(self.test_chatgpt(query))
            self.results['claude'].append(self.test_claude(query))

        # Calculate citation rates
        citation_rate = {
            'chatgpt': sum(self.results['chatgpt']) / len(self.results['chatgpt']) * 100,
            'claude': sum(self.results['claude']) / len(self.results['claude']) * 100
        }

        return citation_rate


# Usage
tester = AIVisibilityTester(
    company_name="YourCompany",
    test_queries=[
        "best CRM for real estate",
        "top project management tools for startups",
        "which accounting software should I use"
    ]
)

results = tester.run_full_test()
print(f"ChatGPT citation rate: {results['chatgpt']}%")
print(f"Claude citation rate: {results['claude']}%")
```
Run this monthly to track actual progress, not vanity metrics.
The Technical Stack That Actually Worked
After implementing learnings from the best 8 agencies, here's our production stack:
```yaml
# Frontend
Framework: Next.js 14 (App Router)
CMS: Contentful (headless)
Styling: Tailwind CSS
Deployment: Vercel

# Schema Management
Generator: Custom React component
Validation: Automated via GitHub Actions
Storage: Git-tracked JSON files

# Monitoring
Performance: Lighthouse CI
Schema: Custom validator (Python)
AI Testing: Weekly automated queries
Uptime: UptimeRobot

# Content Pipeline
Writing: Human + AI-assisted
Editing: Human review
Schema: Auto-generated from content
Deployment: Continuous (via git push)

# Analytics
Traditional: Google Analytics 4
AI-specific: Custom dashboard (Retool)
Citation tracking: Weekly manual + automated tests
```
The Results (Technical Proof)

Before Optimization:

```bash
$ python ai_visibility_tester.py
ChatGPT citation rate: 0%
Claude citation rate: 0%
Gemini citation rate: 0%
```

After 4 Months:

```bash
$ python ai_visibility_tester.py
ChatGPT citation rate: 47%
Claude citation rate: 38%
Gemini citation rate: 63%
Perplexity citation rate: 73%
```
Technical Improvements:

Schema validation score: 45% → 100%
FAQ page count: 3 → 87
Structured data coverage: 12% → 94%
Entity consistency: 67% → 100%
Core Web Vitals: Failed → Passed (all metrics)

Business Impact:

AI-attributed traffic: +340%
Qualified leads from AI: 83 in 4 months
Revenue from AI sources: $340K+

What Most Agencies Get Wrong (Technical Edition)

1. They Bolt Schema Onto Existing Sites

Wrong Approach:

```javascript
// Adding schema as an afterthought
// Hardcoded JSON-LD
```

Right Approach:

```javascript
// Schema as a first-class citizen in the component architecture
import Head from 'next/head';

export default function ProductPage({ product }) {
  const schema = generateProductSchema(product);

  return (
    <>
      <Head>
        <script
          type="application/ld+json"
          dangerouslySetInnerHTML={{ __html: JSON.stringify(schema) }}
        />
      </Head>
      <ProductDetails product={product} />
    </>
  );
}
```
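
For completeness, here is a minimal sketch of what a generateProductSchema helper could look like; the product field names (name, description, sku, price, rating) are assumptions, not the exact shape of our model.

```javascript
// lib/schema.js -- illustrative only; adapt the field names to your product model
export function generateProductSchema(product) {
  const schema = {
    '@context': 'https://schema.org',
    '@type': 'Product',
    name: product.name,
    description: product.description,
    sku: product.sku,
    offers: {
      '@type': 'Offer',
      price: product.price,
      priceCurrency: 'USD',
      availability: 'https://schema.org/InStock',
    },
  };

  // Only emit aggregateRating when review data actually exists
  if (product.rating) {
    schema.aggregateRating = {
      '@type': 'AggregateRating',
      ratingValue: product.rating.value,
      reviewCount: product.rating.count,
    };
  }

  return schema;
}
```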

2. They Ignore Performance

LLMs favor fast sites. Period.

The Data:

Sites with FCP < 1.5s: 3.2x higher citation rate
Sites with FCP > 3.0s: 40% lower citation rate

Fix:

```javascript
// Image optimization example
import Image from 'next/image';

// Before (wrong)
<img src="/hero.jpg" alt="Hero" />

// After (right)
// (note: placeholder="blur" with a string src also needs a blurDataURL,
//  or use a statically imported image)
<Image
  src="/hero.jpg"
  alt="Descriptive, entity-rich alt text"
  width={1200}
  height={600}
  priority
  placeholder="blur"
/>
```

3. They Use Generic Content

AI platforms favor specificity, entities, and data.

Generic (doesn't work):

```markdown
We offer great services to help businesses grow.
```

Specific (works):

```markdown
Our B2B SaaS platform helps mid-market companies ($10M-$100M revenue)
in the healthcare vertical reduce customer acquisition costs by an
average of 23% through AI-driven lead scoring, automated nurture
campaigns, and predictive churn analysis.
```

Open Source Tools I Built

Since most agencies had inadequate tooling, I built my own:

1. GEO Schema Validator

```bash
npm install -g geo-schema-validator
geo-validate https://yoursite.com
```

2. AI Citation Tracker

```bash
pip install ai-citation-tracker
ai-track --site yoursite.com --queries queries.txt
```

Both available on GitHub.

Recommendations for Developers

If you're implementing GEO yourself:

Start with Schema.org coverage - 80%+ of your pages need it
Build FAQ content systematically - Target 50-100 question/answer pairs
[Automate entity consistency checking](https://digimsm.com/marketing-automation/) - Don't do this manually
Set up automated AI testing - Weekly queries across platforms
Optimize for performance - Core Web Vitals matter for AI
Use semantic HTML - It's not 2010 anymore, divs aren't enough

If you're hiring an agency:

Ask to see their:

Schema implementation approach (code samples)
Testing methodology (scripts, automation)
Entity consolidation process (technical documentation)
Performance optimization stack (tools, metrics)

If they can't provide these, they're not technically competent enough for GEO.

Full Technical Breakdown

I've documented the complete technical architecture, including code samples, configuration files, and testing frameworks in my [detailed Medium article](https://medium.com/@msmyaqoob55/finding-the-right-geo-agency-what-i-learned-after-vetting-47-ai-optimization-companies-6c424b8064db).

Questions?

Drop them in the comments. I'm actively monitoring and happy to share specific code samples, configuration files, or architectural decisions.
