FREE GUIDE / 06 / NO ACCOUNT

The 27 things AI looks at on your site.

A long-form checklist organized by category. Each item explains why it matters, how to ship the fix, and how much it moves your readiness score. Bookmark this page — it stays current with model behavior.

Checks

Crawlability

01 / 04 · 7 ITEMS

Before any model can quote you, it needs to reach you. The default web stack blocks AI crawlers more often than people realize — anti-bot tools, aggressive CDNs, and copy-pasted disallow rules silently exclude every model that matters.

Allow AI crawler user-agents in robots.txt

CRITICAL

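Every major AI crawler identifies itself with a named user-agent. If your robots.txt blocks those agents, or worse, allows only Googlebot, you are invisible to ChatGPT, Claude, Perplexity, and Gemini. Framework and host defaults vary, including on Next.js and Vercel; check what your site actually serves at /robots.txt.

HOW TO FIX: In /robots.txt, explicitly allow GPTBot, Google-Extended, PerplexityBot, ClaudeBot, CCBot, and anthropic-ai. If you use a global disallow under User-agent: *, give these agents their own User-agent group with Allow: /. A compliant crawler follows the most specific group that matches its name, so the wildcard rule no longer applies to it.

IMPACT: +12 pts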
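A minimal sketch of this check, in Python. It builds a robots.txt with one group per agent named in this item, then tests it with the standard library's urllib.robotparser. The function names, the example.com URLs, and the blanket disallow for all other crawlers are assumptions for illustration, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the checklist item above.
AI_AGENTS = ["GPTBot", "Google-Extended", "PerplexityBot",
             "ClaudeBot", "CCBot", "anthropic-ai"]


def build_robots_txt(ai_agents, sitemap_url, disallow_others=True):
    """Return robots.txt text with an explicit Allow group per agent."""
    lines = []
    for agent in ai_agents:
        lines += [f"User-agent: {agent}", "Allow: /", ""]
    # Wildcard group for every other crawler (assumption: a global disallow).
    lines += ["User-agent: *",
              "Disallow: /" if disallow_others else "Allow: /",
              ""]
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check a URL against robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Calling is_allowed with the text your site actually serves, once per agent, shows quickly whether a copied disallow rule is excluding a crawler you meant to admit.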

Publish a clean XML sitemap

STANDARD

Models follow sitemaps to discover canonical URLs. A sitemap that 404s, contains stale URLs, or omits half your content is worse than no sitemap — it teaches the model your site is broken.

HOW TO FIX: Generate at /sitemap.xml. Validate every URL returns 200. Include lastmod on every entry. Reference from robots.txt: Sitemap: https://yourdomain.com/sitemap.xml.

IMPACT: +5 pts
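One way to generate such a file, sketched in Python with the standard library. The build_sitemap name and the sample URLs are hypothetical; the element names (urlset, url, loc, lastmod) and the namespace come from the sitemaps.org protocol. Checking that each URL returns 200 is left out because it needs network access.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: iterable of (absolute_url, lastmod) pairs, lastmod as YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    # Prepend the XML declaration by hand so the declared encoding is fixed.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```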

Server-render the content models need to quote

CRITICAL

Some crawlers render JavaScript; most do not. If your pricing, product detail, or About content only appears after hydration, expect to be misquoted. The page in view-source is what models see.

HOW TO FIX: Run curl https://yourdomain.com/pricing. If the pricing is missing from the response, you need SSR or static rendering for that route. React Router v7, Next.js, and Astro all support this; verify it's on.

IMPACT: +11 pts
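The curl check can be automated roughly as follows. This Python sketch takes the raw HTML as a string and asks whether a phrase appears outside script and style tags, which approximates what a crawler that does not run JavaScript can read. The class and function names are hypothetical, and fetching the HTML (for example with urllib.request) is left to the caller.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script> and <style>: what exists without running JS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def appears_in_raw_html(raw_html, phrase):
    """True if phrase is present in the server response's readable text."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    # Collapse whitespace so line breaks in the markup do not hide a match.
    text = " ".join(" ".join(parser.chunks).split())
    return phrase in text
```

A phrase that only shows up inside a script payload counts as missing here, which matches the failure mode this item describes.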

Don't rate-limit AI user-agents on Cloudflare / Vercel

STANDARD

Cloudflare's "Bot Fight Mode" can challenge AI crawlers along with other automated traffic, and Vercel's firewall rules can do the same. A crawler that hits a CAPTCHA cannot solve it and may not return.

HOW TO FIX: In Cloudflare: Security → Bots → set AI crawlers to "Allow." In Vercel: Firewall → exempt verified AI user-agents. Verify with curl using the exact user-agent string.

IMPACT: +7 pts

Set canonical URLs on every page

QUICK WIN

Models cite canonicals. If a page lacks one, or worse, declares one that points to a different URL, models may cite the wrong URL or none at all. This is common with marketing landing pages and UTM-laden links.

HOW TO FIX: Add <link rel="canonical" href="https://yourdomain.com/page"> in <head>. Self-reference is fine for default pages. Strip query strings.

IMPACT: +4 pts
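A small Python sketch of the "strip query strings" step. The helper names are hypothetical, and it assumes no page on your site depends on a query parameter for its content; if one does, that parameter would need to be kept.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Drop the query string and fragment; lowercase the scheme and host."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def canonical_link_tag(url):
    """Render the <link> element for the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```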

Return 200 for HEAD requests

STANDARD

Several crawlers send a HEAD request before a GET. If your CDN returns 405 on HEAD, which is common with static-only hosts, those crawlers never fetch the page.

HOW TO FIX: Run curl -I https://yourdomain.com and expect a 200 status line, such as HTTP/2 200. If you get 405, configure your host to accept HEAD on all routes.

IMPACT: +3 pts

Keep TTFB under 800ms for AI crawlers

STANDARD

Crawlers time out faster than browsers. If your time-to-first-byte exceeds 1 second on a cold cache, getting crawled is a coin flip. Crawlers that time out twice often stop trying.

HOW TO FIX: Use an edge CDN. Cache HTML at the edge with appropriate TTLs. Profile with curl -o /dev/null -s -w "%{time_starttransfer}\n" https://yourdomain.com.

IMPACT: +3 pts

Structured Data

02 / 04 · 8 ITEMS

Structured data is how you tell models what your page is without writing prose. Models prefer schema to HTML body text — it's faster to parse, harder to misinterpret, and unambiguous about types.

Publish `llms.txt` at root

CRITICAL

llms.txt is a brand-level manifest that models can read before parsing your HTML. Sites with a clean llms.txt get cited more accurately; sites without one get reconstructed from scratch on every query.

HOW TO FIX: Use the llms.txt Generator. Drop the file at /llms.txt. Verify with curl. Update it when facts change.

IMPACT: +14 pts
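For teams that would rather build the file by hand, here is a rough Python sketch. It assumes the layout described in the public llms.txt proposal: an H1 with the name, a blockquote summary, then H2 sections holding Markdown link lists. The function name and sample content are hypothetical, and the generator mentioned above may produce a different shape.

```python
def build_llms_txt(name, summary, sections):
    """sections maps a heading to a list of (title, url, note) tuples."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # End with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"
```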

Organization JSON-LD on homepage and About

CRITICAL

The atomic identity card. Name, URL, logo, founding date, contact, social profiles. Read by every model. Without it, models reconstruct your identity from page titles and footer copy.

HOW TO FIX: Use the JSON-LD Generator. Drop the script in <head>. Validate with the Google Rich Results Test.

IMPACT: +10 pts
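As a rough picture of what the script should contain, this Python sketch assembles an Organization block from the fields listed in this item. The property names (foundingDate, contactPoint, sameAs) are schema.org vocabulary; the function name and all sample values are placeholders.

```python
import json

SCRIPT_OPEN = '<script type="application/ld+json">'
SCRIPT_CLOSE = "</script>"


def organization_script(name, url, logo, founding_date, email, same_as):
    """Return a JSON-LD <script> element describing the organization."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "foundingDate": founding_date,
        "contactPoint": {
            "@type": "ContactPoint",
            "contactType": "customer support",
            "email": email,
        },
        "sameAs": list(same_as),
    }
    # Escape "<" so a value can never close the script element early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f"{SCRIPT_OPEN}\n{payload}\n{SCRIPT_CLOSE}"
```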

FAQPage schema on support & pricing pages

STANDARD

Q&A pairs in machine-readable form. They pull directly into Google AI Overviews and Perplexity, and they are the fastest way to seed answers to known prospect questions.

HOW TO FIX: Add FAQPage schema with at least 5 question/answer pairs. Keep answers under 80 words. Use plain language.

IMPACT: +8 pts
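The two limits in this item are easy to enforce in code. A minimal Python sketch follows: the function name is hypothetical, the thresholds are taken from the text above, and the FAQPage, Question, and Answer types are schema.org vocabulary.

```python
MIN_PAIRS = 5          # "at least 5 question/answer pairs"
MAX_ANSWER_WORDS = 80  # "answers under 80 words"


def faq_jsonld(pairs):
    """pairs: list of (question, answer) strings. Returns a FAQPage dict."""
    if len(pairs) < MIN_PAIRS:
        raise ValueError(f"need at least {MIN_PAIRS} pairs, got {len(pairs)}")
    main_entity = []
    for question, answer in pairs:
        if len(answer.split()) >= MAX_ANSWER_WORDS:
            raise ValueError(f"answer too long for: {question!r}")
        main_entity.append({
            "@type": "Question",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        })
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": main_entity,
    }
```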

Product schema on every product page

STANDARD

Critical for AI shopping agents and pricing queries. Without it, models answer "no pricing information available."

HOW TO FIX: Include name, description, brand, offers.price, offers.priceCurrency, offers.availability, and aggregateRating if applicable.

IMPACT: +7 pts
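The field list above maps onto a nested structure, sketched here in Python. The function name and the sample product are hypothetical; the property names and the InStock / OutOfStock availability URLs are schema.org vocabulary. The aggregateRating block is only added when a rating is supplied.

```python
def product_jsonld(name, description, brand, price, currency,
                   in_stock=True, rating=None):
    """Return a Product dict; rating is an optional (value, review_count) pair."""
    availability = "InStock" if in_stock else "OutOfStock"
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": f"https://schema.org/{availability}",
        },
    }
    if rating is not None:
        value, review_count = rating
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": value,
            "reviewCount": review_count,
        }
    return data
```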

OG and Twitter card metadata

QUICK WIN

OG tags travel with your links wherever they go. Models read them as a secondary identity source — and use the OG title and description when summarizing.

HOW TO FIX: Set og:title, og:description, og:image, og:type, and twitter:card. The image is the one most teams forget; point it at a real image, not a placeholder.

IMPACT: +4 pts
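A short Python sketch that renders the five tags named in this item. The function name and the default values (website, summary_large_image) are assumptions; content values are HTML-escaped so an ampersand or quote in a title cannot break the markup.

```python
from html import escape


def og_meta_tags(title, description, image_url,
                 og_type="website", card="summary_large_image"):
    """Return the Open Graph and Twitter card tags as one string for <head>."""
    properties = [
        ("og:title", title),
        ("og:description", description),
        ("og:image", image_url),
        ("og:type", og_type),
    ]
    tags = [
        f'<meta property="{prop}" content="{escape(value, quote=True)}">'
        for prop, value in properties
    ]
    # Twitter cards use the name attribute rather than property.
    tags.append(f'<meta name="twitter:card" content="{escape(card, quote=True)}">')
    return "\n".join(tags)
```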

BreadcrumbList for navigation hierarchy

STANDARD

Tells models where this page sits in your site. Used to disambiguate similar pages and improve citation accuracy. Bonus: improves Google sitelinks.

HOW TO FIX: On any nested page, add BreadcrumbList with an itemListElement array. Each entry has a position, name, and item URL.

IMPACT: +3 pts
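The structure described above can be sketched in a few lines of Python. The function name and the sample trail are hypothetical; positions start at 1 and follow the order of the trail.

```python
def breadcrumb_jsonld(trail):
    """trail: ordered list of (name, url) pairs from the site root to this page."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }
```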

Validate every schema block

QUICK WIN

Schema-stuffing or broken JSON-LD is worse than no schema at all. Models that hit a parse error on one block stop trusting the rest.

HOW TO FIX: Use the Schema Validator and the Google Rich Results Test. Both should report no errors.

IMPACT: +3 pts
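Before reaching for an external validator, a local syntax check catches the most common breakage. This Python sketch pulls every application/ld+json script out of a page and tries to parse it. The names are hypothetical, and it only checks that each block is valid JSON; it does not check schema.org vocabulary, so it complements the validators above rather than replacing them.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every JSON-LD <script> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html):
    """Return (parsed_blocks, error_messages) for the JSON-LD in html."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    parsed, errors = [], []
    for index, raw in enumerate(extractor.blocks):
        try:
            parsed.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            errors.append(f"block {index}: {exc.msg} at line {exc.lineno}")
    return parsed, errors
```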

Avoid schema duplication across blocks

STANDARD

Two Organization blocks on the same page, common when a CMS injects one and your template injects another, confuse models. Pick one source of truth.

HOW TO FIX: Audit pages with the validator. Remove duplicate @type blocks. Use @graph to combine multiple types in one JSON-LD script.

IMPACT: +2 pts
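A rough Python sketch of the @graph step. It merges several JSON-LD dicts into one document and keeps only the first node of any type listed in singleton_types. The function name, the keep-the-first policy, and the default of treating only Organization as a singleton are all assumptions; types that legitimately repeat, such as Person, pass through untouched.

```python
def merge_into_graph(blocks, singleton_types=("Organization",)):
    """Combine JSON-LD dicts under one @graph; return (document, dropped_types)."""
    seen, graph, dropped = set(), [], []
    for block in blocks:
        # The shared @context moves to the top level of the merged document.
        node = {key: value for key, value in block.items() if key != "@context"}
        node_type = node.get("@type")
        if node_type in singleton_types:
            if node_type in seen:
                dropped.append(node_type)
                continue
            seen.add(node_type)
        graph.append(node)
    return {"@context": "https://schema.org", "@graph": graph}, dropped
```

The dropped list is worth logging: each entry points at a second injector (CMS plugin, template, tag manager) that should be switched off at the source.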

Content

03 / 04 · 7 ITEMS

Models read your prose. They prefer plain claims, atomic facts, and consistent terminology. Marketing copy is the enemy of accuracy — adjectives confuse, hedge words signal weakness, and superlatives get ignored.

State pricing in plain text on a /pricing page

CRITICAL

"Starting at $X" buried in marketing prose. "Contact sales for pricing." Both produce "no pricing information available" in model responses. Plain dollar amounts at the top of a /pricing route work.

HOW TO FIX: The first H1 or H2 of /pricing should contain the actual price. Use schema.org/Offer markup. Update priceValidUntil annually.

IMPACT: +9 pts

Use atomic, declarative facts in About

STANDARD

"Founded in 2019" beats "established for over half a decade." Models extract facts, not vibes. Vague language gets paraphrased into hallucination.

HOW TO FIX: Open your About page. For each adjective, ask: "What's the fact behind this?" Replace the adjective with the fact.

IMPACT: +5 pts

Consistent terminology for your category

STANDARD

"Platform" on the homepage, "tool" in the docs, "service" in the FAQ — models pick one and stick with it. The one they pick is rarely yours.

HOW TO FIX: Pick one noun. Use it everywhere, especially in <title>, the meta description, the H1, and the JSON-LD description.

IMPACT: +4 pts
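To see which noun your site currently favors, a simple count across page text is often enough. The Python sketch below is a crude whole-word tally, not a linguistic analysis; the function name and the sample pages are hypothetical, and it matches a plain trailing "s" as the only plural form.

```python
import re
from collections import Counter


def category_noun_counts(pages, candidates):
    """pages maps a label to its text; candidates are the nouns to compare."""
    counts = Counter()
    for text in pages.values():
        for noun in candidates:
            pattern = rf"\b{re.escape(noun)}s?\b"
            counts[noun] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts
```

Calling counts.most_common() on the result ranks the candidates, which makes it clear which terms need replacing and where most of the work sits.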

Name founders and key people

STANDARD

Brand identity questions often include "who founded" or "who runs." If your About page hides leadership, the model invents an answer or refuses to engage.

HOW TO FIX: List founder names with role and founding date on the About page. Add Person JSON-LD with jobTitle for senior leaders.

IMPACT: +3 pts

Publish a clear /contact route

QUICK WIN

A working email or contact form, indexable. Hidden behind login? Models say "no contact information available." That answer travels.

HOW TO FIX: Publish a public /contact route with at least one email address in plain text. Add ContactPoint JSON-LD to the Organization block.

IMPACT: +3 pts

Use H1 / H2 / H3 hierarchically

STANDARD

Models lean on heading structure to segment context. Skipped levels (H1 → H4) and multiple H1s on one page reduce parsing accuracy.

HOW TO FIX: One H1 per page. H2s for major sections. H3s under H2s. No design-driven heading levels.

IMPACT: +2 pts
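Both rules in this item (exactly one H1, no skipped levels) can be checked mechanically. A minimal Python sketch using the standard library's HTML parser follows; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1..h6 start tag in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of human-readable problems; empty means the outline is clean."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    problems = []
    h1_count = levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # Moving deeper by more than one level at a time is a skipped level.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} is followed by h{current} (skipped level)")
    return problems
```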

Date-stamp time-sensitive content

STANDARD

Models prefer recent sources. Undated pages lose to dated competitors on the same query. Even a "last updated" footer helps.

HOW TO FIX: Add datePublished and dateModified to articles and product pages. Display "Last updated" in human-readable form.

IMPACT: +2 pts
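A small Python sketch that produces both outputs from the same pair of dates, so the markup and the visible label cannot drift apart. It uses the Article type, since datePublished and dateModified are defined on schema.org's CreativeWork family. The function name, the label wording, and the sample dates are assumptions.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def dated_article(headline, published, modified):
    """Return (jsonld_dict, visible_label) for an article's date stamps."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    jsonld = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    # Month names are hard-coded so the label does not depend on system locale.
    label = f"Last updated {MONTHS[modified.month - 1]} {modified.day}, {modified.year}"
    return jsonld, label
```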

Authority

04 / 04 · 5 ITEMS

Trust signals are the slowest to move but most durable. Models cross-reference what you say with what other sources say. The more sites that confirm your facts — and the more authoritative those sites — the more accurately you get cited.

Wikipedia presence (or at minimum, mention)

STANDARD

Models weight Wikipedia heavily for entity disambiguation and founding facts. You don't need a full article — a mention on a related page can be enough.

HOW TO FIX: Check Wikipedia's notability guidelines before attempting an article. Otherwise, get your brand referenced on industry, category, or founder pages with citations to third-party coverage.

IMPACT: +6 pts

Link consistently to social profiles

STANDARD

Models cross-reference entity identity across profiles. Use sameAs in your Organization JSON-LD to point to LinkedIn, X, GitHub, YouTube, or whichever profiles you actually use.

HOW TO FIX: Add a sameAs array to your Organization schema. Use canonical profile URLs (no UTM, no shortened links).

IMPACT: +4 pts

Earn citations from publications models trust

STANDARD

Models lean on a small set of high-authority publications for trust. A TechCrunch or Wired mention is worth ten product directories. Specialty trades count if they're indexed.

HOW TO FIX: Pitch the publications in your category. Get cited with a working link. Avoid private newsletters; models can't index gated content.

IMPACT: +5 pts

Consistent NAP (Name, Address, Phone) across the web

STANDARD

Spelling variations between your site, Crunchbase, LinkedIn, and your Twitter bio create entity-resolution noise. Models hedge when they can't reconcile.

HOW TO FIX: Audit your name, address, and phone number on every public profile. Pick one canonical form of each. Update directories.

IMPACT: +2 pts

Maintain a /press or /media page

QUICK WIN

Models cite press coverage. A consolidated page that links to coverage with quotes and dates makes you easier to cite — and signals that you're a real entity worth citing.

HOW TO FIX: Publish a /press or /media page. List each piece of coverage with outlet, date, and a representative quote. Link out to the original.

IMPACT: +3 pts
