FREE GUIDE / 06 / NO ACCOUNT

The 27 things AI looks at on your site.

A long-form checklist organized by category. Each item explains why it matters, how to ship the fix, and how much it moves your readiness score. Bookmark this page — it stays current with model behavior.

27 checks · 4 categories · ~45m to work through · Last updated Q2 '26

Crawlability

Before any model can quote you, it needs to reach you. The default web stack blocks AI crawlers more often than people realize — anti-bot tools, aggressive CDNs, and copy-pasted disallow rules silently exclude every model that matters.

01

Allow AI crawler user-agents in robots.txt

CRITICAL

Every major AI crawler identifies itself with a named user-agent token. If your robots.txt blocks those tokens, or worse, allows only Googlebot, you are invisible to ChatGPT, Claude, Perplexity, and Gemini. Starter templates and copy-pasted configs often lean cautious; check yours.

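HOW TO FIX In /robots.txt, explicitly allow: GPTBot, Google-Extended, PerplexityBot, ClaudeBot, CCBot, anthropic-ai. If you also keep a global disallow under User-agent: *, give these agents their own group. A crawler obeys the most specific group that matches it, wherever that group sits in the file.
IMPACT +12 pts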
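One way to check the result is to parse your robots.txt with Python's standard-library urllib.robotparser and ask it about each agent. This is a sketch, not an audit: SAMPLE_ROBOTS is a hypothetical file, blocked_agents is an illustrative helper, and the standard-library parser may not match every crawler's own matching rules exactly.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the fix above.
AI_AGENTS = ["GPTBot", "Google-Extended", "PerplexityBot",
             "ClaudeBot", "CCBot", "anthropic-ai"]

# Hypothetical robots.txt: named AI crawlers allowed, everyone else blocked.
SAMPLE_ROBOTS = """\
User-agent: GPTBot
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: anthropic-ai
Allow: /

User-agent: *
Disallow: /
"""


def blocked_agents(robots_txt: str, url: str = "https://yourdomain.com/") -> list[str]:
    """Return the AI agents that this robots.txt would keep away from `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]
```

An empty list from blocked_agents(SAMPLE_ROBOTS) means every listed agent may fetch the URL. Paste in your own file's text to test it.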
02

Publish a clean XML sitemap

STANDARD

Models follow sitemaps to discover canonical URLs. A sitemap that 404s, contains stale URLs, or omits half your content is worse than no sitemap — it teaches the model your site is broken.

HOW TO FIX Generate at /sitemap.xml. Validate every URL returns 200. Include lastmod on every entry. Reference from robots.txt: Sitemap: https://yourdomain.com/sitemap.xml.
IMPACT +5 pts
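If your framework does not generate a sitemap for you, a small script can. The sketch below uses only the standard library and the sitemaps.org namespace; build_sitemap and the (URL, lastmod) input shape are assumptions for illustration, and it does not check that each URL returns 200.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build sitemap XML from (absolute URL, lastmod as YYYY-MM-DD) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod  # lastmod on every entry
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Write the returned string to the file served at /sitemap.xml.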
03

Server-render the content models need to quote

CRITICAL

Some crawlers render JavaScript; most do not. If your pricing, product detail, or About content only appears after hydration, expect to be misquoted. The page in view-source is what models see.

HOW TO FIX Run curl https://yourdomain.com/pricing. If pricing is missing from the response, you need SSR or static rendering for that route. React Router v7, Next.js, and Astro all support this — verify it's on.
IMPACT +11 pts
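The curl check can be scripted so it runs on every deploy. A minimal sketch, assuming you know a few phrases (a price, a plan name) that must appear in the raw response; fetch_raw_html and missing_phrases are illustrative names, and nothing here executes JavaScript.

```python
import urllib.request


def fetch_raw_html(url: str) -> str:
    """Fetch the server response as-is, the way a non-rendering crawler would."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def missing_phrases(raw_html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the raw HTML (case-insensitive)."""
    lowered = raw_html.lower()
    return [phrase for phrase in phrases if phrase.lower() not in lowered]
```

For example, missing_phrases(fetch_raw_html("https://yourdomain.com/pricing"), ["$29"]) returning a non-empty list suggests that route is not server-rendered.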
04

Don't rate-limit AI user-agents on Cloudflare / Vercel

STANDARD

Cloudflare's "Bot Fight Mode", when enabled, challenges most AI crawlers. Vercel's firewall rules can be just as aggressive. A crawler that hits a CAPTCHA cannot solve it, and many do not come back.

HOW TO FIX In Cloudflare: Security → Bots → set AI crawlers to "Allow." In Vercel: Firewall → exempt verified AI user-agents. Verify with curl -A, passing the exact user-agent string the crawler sends.
IMPACT +7 pts
05

Set canonical URLs on every page

QUICK WIN

Models cite canonicals. If a page lacks one — or worse, points to a different URL — models may cite the wrong URL or none at all. Common with marketing landing pages and UTM-laden links.

HOW TO FIX Add <link rel="canonical" href="https://yourdomain.com/page"> in <head>. Self-reference is fine for default pages. Strip query strings.
IMPACT +4 pts
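Stripping query strings is easy to get wrong by hand. The sketch below builds the tag from any incoming URL using the standard library; canonical_link_tag is an illustrative helper, and dropping every query parameter is an assumption that only holds if none of them select distinct content.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url: str) -> str:
    """Return a canonical <link> tag with the query string and fragment removed."""
    parts = urlsplit(url)
    clean = urlunsplit((parts.scheme, parts.netloc.lower(), parts.path or "/", "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}">'
```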
06

Return 200 for HEAD requests

STANDARD

Several crawlers HEAD before GET. If your CDN returns 405 on HEAD — common with static-only hosts — you lose those crawlers.

HOW TO FIX Run curl -I https://yourdomain.com and expect HTTP/2 200. If you get 405, configure your host to accept HEAD on all routes.
IMPACT +3 pts
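The same probe can be scripted across routes and user-agents. This is a sketch with illustrative helper names (build_probe, probe_status); passing a bare token such as "GPTBot" is a simplification, since real crawlers send longer user-agent strings and firewalls may also verify source IPs.

```python
import urllib.error
import urllib.request


def build_probe(url: str, user_agent: str, method: str = "HEAD") -> urllib.request.Request:
    """Build a request that uses the given HTTP method and User-Agent header."""
    return urllib.request.Request(url, method=method, headers={"User-Agent": user_agent})


def probe_status(url: str, user_agent: str, method: str = "HEAD") -> int:
    """Return the HTTP status code the server gives this method and user-agent."""
    try:
        with urllib.request.urlopen(build_probe(url, user_agent, method), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions, e.g. 405 when HEAD is rejected.
        return err.code
```

Comparing probe_status(url, agent, "HEAD") with probe_status(url, agent, "GET") for each route shows whether HEAD alone is being refused.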
07

Keep TTFB under 800ms for AI crawlers

STANDARD

Crawlers time out faster than browsers. If your time-to-first-byte exceeds 1 second on a cold cache, whether you get crawled is a coin flip. Crawlers that time out twice often stop trying.

HOW TO FIX Use an edge CDN. Cache HTML at the edge with appropriate TTLs. Profile with curl -o /dev/null -s -w "%{time_starttransfer}\n" https://yourdomain.com, which prints time-to-first-byte in seconds.
IMPACT +3 pts

Structured Data

Structured data is how you tell models what your page is without writing prose. Models prefer schema to HTML body text — it's faster to parse, harder to misinterpret, and unambiguous about types.

08

Publish llms.txt at root

CRITICAL

A brand-level manifest file, based on the proposed llms.txt convention, that AI tools can read before parsing your HTML. Sites with a clean llms.txt get cited more accurately; sites without one get reconstructed from scratch every query.

HOW TO FIX Use the llms.txt Generator. Drop at /llms.txt. Verify with curl. Update when facts change.
IMPACT +14 pts
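If you would rather script the file than use a generator, the sketch below assembles one in the layout the llms.txt proposal describes: an H1 name, a blockquote summary, then H2 sections of annotated links. build_llms_txt, the section name, and the sample values in the usage line are assumptions for illustration.

```python
def build_llms_txt(name: str, summary: str,
                   sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Assemble llms.txt text; each section maps to (title, url, note) links."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines)
```

Calling build_llms_txt("Example Co", "Example Co sells invoicing software.", {"Key pages": [("Pricing", "https://yourdomain.com/pricing", "Plans and prices")]}) returns text ready to serve at /llms.txt.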
09

Organization JSON-LD on homepage and About

CRITICAL

The atomic identity card. Name, URL, logo, founding date, contact, social profiles. Read by every model. Without it, models reconstruct your identity from page titles and footer copy.

HOW TO FIX Use the JSON-LD Generator. Drop the script in <head>. Validate with Google's Rich Results Test.
IMPACT +10 pts
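For teams that template their own markup, here is a rough sketch of the Organization block built from the fields listed above. The property names are schema.org's; organization_jsonld, as_script_tag, and the "customer support" contact type are illustrative choices, not requirements.

```python
import json


def organization_jsonld(name: str, url: str, logo: str, founding_date: str,
                        email: str, same_as: list[str]) -> dict:
    """Build an Organization node with identity, contact, and social fields."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo,
        "foundingDate": founding_date,  # ISO 8601 date, e.g. "2019-03-01"
        "contactPoint": {
            "@type": "ContactPoint",
            "contactType": "customer support",
            "email": email,
        },
        "sameAs": same_as,  # social profile URLs
    }


def as_script_tag(data: dict) -> str:
    """Serialize a JSON-LD node into a script tag that is safe to inline in <head>."""
    # Escape "</" so a value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">{body}</script>'
```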
10

FAQPage schema on support & pricing pages

STANDARD

Q&A pairs in machine-readable form. Google AI Overviews and Perplexity can pull them in directly. The fastest way to seed answers to known prospect questions.

HOW TO FIX Add FAQPage schema with at least 5 question/answer pairs. Keep answers under 80 words. Use plain language.
IMPACT +8 pts
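The two limits in the fix are easy to enforce in code. A sketch, assuming question/answer pairs come from your CMS as plain strings; faq_jsonld is an illustrative helper, and the word count is a simple whitespace split.

```python
MIN_PAIRS = 5           # "at least 5 question/answer pairs"
MAX_ANSWER_WORDS = 80   # "answers under 80 words"


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a FAQPage node, rejecting input that breaks the limits above."""
    if len(pairs) < MIN_PAIRS:
        raise ValueError(f"need at least {MIN_PAIRS} pairs, got {len(pairs)}")
    for question, answer in pairs:
        if len(answer.split()) >= MAX_ANSWER_WORDS:
            raise ValueError(f"answer too long for question: {question!r}")
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
```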
11

Product schema on every product page

STANDARD

Critical for AI shopping agents and pricing queries. Without it, models cite "no pricing information available."

HOW TO FIX Include name, description, brand, offers.price, offers.priceCurrency, offers.availability, and aggregateRating if applicable.
IMPACT +7 pts
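Those fields map onto schema.org's Product, Offer, and AggregateRating types. The sketch below is one plausible shape; product_jsonld and its parameters are illustrative, and it includes the rating only when you actually have one.

```python
from typing import Optional


def product_jsonld(name: str, description: str, brand: str, price: str,
                   currency: str, in_stock: bool,
                   rating: Optional[float] = None,
                   review_count: Optional[int] = None) -> dict:
    """Build a Product node with a single Offer and an optional rating."""
    availability = "InStock" if in_stock else "OutOfStock"
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": price,             # plain number as text, e.g. "29.00"
            "priceCurrency": currency,  # ISO 4217 code, e.g. "USD"
            "availability": f"https://schema.org/{availability}",
        },
    }
    if rating is not None and review_count:
        data["aggregateRating"] = {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        }
    return data
```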
12

Open Graph and Twitter card metadata

QUICK WIN

OG tags travel with your links wherever they go. Models read them as a secondary identity source — and use the OG title and description when summarizing.

HOW TO FIX Set og:title, og:description, og:image, og:type, and twitter:card. The image is the one most teams forget; point og:image at a real, publicly reachable image URL.
IMPACT +4 pts
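A small helper keeps the five tags together and escapes their values. This is a sketch: social_meta_tags is an illustrative name, and the "website" and "summary_large_image" defaults are common choices rather than requirements. Open Graph tags use the property attribute; the Twitter card tag uses name.

```python
from html import escape


def social_meta_tags(title: str, description: str, image_url: str,
                     og_type: str = "website",
                     card: str = "summary_large_image") -> str:
    """Return the Open Graph and Twitter card meta tags, one per line."""
    og = {
        "og:title": title,
        "og:description": description,
        "og:image": image_url,
        "og:type": og_type,
    }
    lines = [
        f'<meta property="{key}" content="{escape(value, quote=True)}">'
        for key, value in og.items()
    ]
    lines.append(f'<meta name="twitter:card" content="{escape(card, quote=True)}">')
    return "\n".join(lines)
```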
13

BreadcrumbList for navigation hierarchy

STANDARD

Tells models where this page sits in your site. Used to disambiguate similar pages and improve citation accuracy. Bonus: improves Google sitelinks.

HOW TO FIX On any nested page, add BreadcrumbList with an itemListElement array. Each entry has a position, name, and item URL.
IMPACT +3 pts
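The itemListElement array can be generated from the page's trail. A minimal sketch, assuming the trail is an ordered list of (name, URL) pairs from the site root down to the current page; breadcrumb_jsonld is an illustrative helper.

```python
def breadcrumb_jsonld(trail: list[tuple[str, str]]) -> dict:
    """Build a BreadcrumbList node; positions start at 1 for the root."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }
```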
14

Validate every schema block

QUICK WIN

Schema-stuffing or broken JSON-LD is worse than no schema at all. Models that hit a parse error on one block stop trusting the rest.

HOW TO FIX Use the Schema Validator and Google's Rich Results Test. Both should come back clean.
IMPACT +3 pts
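Before reaching for a validator, a quick script can catch outright parse errors. The sketch below pulls every JSON-LD script out of a page and tries to load it; it checks JSON syntax only, not schema.org vocabulary, and JsonLdCollector and jsonld_parse_errors are illustrative names.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def jsonld_parse_errors(html: str) -> list[str]:
    """Return one message per JSON-LD block that fails to parse as JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    errors = []
    for index, block in enumerate(collector.blocks, start=1):
        try:
            json.loads(block)
        except json.JSONDecodeError as exc:
            errors.append(f"block {index}: {exc.msg}")
    return errors
```

An empty list means every block at least parses; run the validators afterwards for type and property checks.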
15

Avoid schema duplication across blocks

STANDARD

Two Organization blocks on the same page confuse models. This is common when a CMS injects one and your template injects another. Pick one source of truth.

HOW TO FIX Audit pages with the validator. Remove duplicate @type blocks. Use @graph to combine multiple types in one JSON-LD script.
IMPACT +2 pts
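Here is a rough sketch of the audit-and-combine step, assuming each block has already been parsed into a dict with a single string @type. Keeping the first block of each type is a blunt, assumed policy; review the duplicates by hand before deciding which one is your source of truth. Both function names are illustrative.

```python
from collections import Counter


def duplicate_types(blocks: list[dict]) -> list[str]:
    """Return the @type values that appear in more than one block."""
    counts = Counter(block.get("@type") for block in blocks)
    return sorted(t for t, n in counts.items() if t and n > 1)


def merge_into_graph(blocks: list[dict]) -> dict:
    """Combine blocks under one @graph, keeping the first block of each @type."""
    seen = set()
    graph = []
    for block in blocks:
        node_type = block.get("@type")
        if node_type in seen:
            continue  # assumed policy: first occurrence wins
        seen.add(node_type)
        # @context moves to the top level, so drop it from each node.
        graph.append({k: v for k, v in block.items() if k != "@context"})
    return {"@context": "https://schema.org", "@graph": graph}
```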

Content

Models read your prose. They prefer plain claims, atomic facts, and consistent terminology. Marketing copy is the enemy of accuracy — adjectives confuse, hedge words signal weakness, and superlatives get ignored.

16

State pricing in plain text on a /pricing page

CRITICAL

"Starting at $X" buried in marketing prose. "Contact sales for pricing." Both produce "no pricing information available" in model responses. Plain dollar amounts at the top of a /pricing route work.

HOW TO FIX First H1 or H2 of /pricing should contain the actual price. Use schema.org/Offer markup. Update priceValidUntil annually.
IMPACT +9 pts
17

Use atomic, declarative facts in About

STANDARD

"Founded in 2019" beats "established for over half a decade." Models extract facts, not vibes. Vague language gets paraphrased into hallucination.

HOW TO FIX Open your About page. For each adjective, ask: "What's the fact behind this?" Replace the adjective with the fact.
IMPACT +5 pts
18

Consistent terminology for your category

STANDARD

"Platform" on the homepage, "tool" in the docs, "service" in the FAQ — models pick one and stick with it. The one they pick is rarely yours.

HOW TO FIX Pick one noun. Use it everywhere. Especially in <title>, meta description, H1, and JSON-LD description.
IMPACT +4 pts
19

Name founders and key people

STANDARD

Brand identity questions often include "who founded" or "who runs." If your About page hides leadership, the model invents an answer or refuses to engage.

HOW TO FIX Founder names with role and founding date on About. Add Person JSON-LD with jobTitle for senior leaders.
IMPACT +3 pts
20

Publish a clear /contact route

QUICK WIN

A working email or contact form, indexable. Hidden behind login? Models say "no contact information available." That answer travels.

HOW TO FIX /contact route, public, with at least one email address in plain text. Add ContactPoint JSON-LD on the Organization block.
IMPACT +3 pts
21

Use H1 / H2 / H3 hierarchically

STANDARD

Models lean on heading structure to segment context. Skipped levels (H1 → H4) and multiple H1s on one page reduce parsing accuracy.

HOW TO FIX One H1 per page. H2s for major sections. H3s under H2s. No design-driven heading levels.
IMPACT +2 pts
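Both rules, one H1 and no skipped levels, can be checked mechanically. A sketch using the standard-library HTML parser; HeadingCollector and heading_problems are illustrative names, and the check looks only at heading tags in document order.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Record the level of every h1..h6 tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.levels: list[int] = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html: str) -> list[str]:
    """Report a wrong H1 count and any jump of more than one level down."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    levels = collector.levels
    problems = []
    if levels.count(1) != 1:
        problems.append(f"expected one h1, found {levels.count(1)}")
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append(f"h{previous} followed by h{current}")
    return problems
```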
22

Date-stamp time-sensitive content

STANDARD

Models prefer recent sources. Undated pages lose to dated competitors on the same query. Even a "last updated" footer helps.

HOW TO FIX Add datePublished and dateModified to articles and product pages. Display "Last updated" in human-readable form.
IMPACT +2 pts

Authority

Trust signals are the slowest to move but most durable. Models cross-reference what you say with what other sources say. The more sites that confirm your facts — and the more authoritative those sites — the more accurately you get cited.

23

Wikipedia presence (or at minimum, mention)

STANDARD

Models weight Wikipedia heavily for entity disambiguation and founding facts. You don't need a full article — a mention on a related page can be enough.

HOW TO FIX Verify notability standards before attempting. Reference your brand in industry, category, or founder pages with citations to third-party coverage.
IMPACT +6 pts
24

Link consistently to social profiles

STANDARD

Cross-referencing entity identity. sameAs in your Organization JSON-LD pointing to LinkedIn, X, GitHub, YouTube — whichever you actually use.

HOW TO FIX Add a sameAs array to your Organization schema. Use canonical profile URLs (no UTM, no shortened links).
IMPACT +4 pts
25

Earn citations from publications models trust

STANDARD

Models lean on a small set of high-authority publications for trust. A TechCrunch or Wired mention is worth ten product directories. Specialty trades count if they're indexed.

HOW TO FIX Pitch the publications in your category. Get cited with a working link. Avoid private newsletters — models can't index gated content.
IMPACT +5 pts
26

Consistent NAP (Name, Address, Phone) across the web

STANDARD

Spelling variations between your site, Crunchbase, LinkedIn, and your X bio create entity-resolution noise. Models hedge when they can't reconcile them.

HOW TO FIX Audit your name, address, and phone number on every public profile. Pick one canonical form of each. Update directories.
IMPACT +2 pts
27

Maintain a /press or /media page

QUICK WIN

Models cite press coverage. A consolidated page that links to coverage with quotes and dates makes you easier to cite — and signals that you're a real entity worth citing.

HOW TO FIX /press or /media. List each piece of coverage with outlet, date, and a representative quote. Link out.
IMPACT +3 pts