The Honest Version of a DIY Audit Guide
Most “how to do your own SEO audit” articles fall into one of two traps. Either they are thinly disguised sales pitches — “here are the first three steps, now hire us for the rest” — or they are enormous checklists that assume you have enterprise tool subscriptions and a decade of experience interpreting crawl data. This guide aims for something more useful: a genuinely practical process that tells you what you can assess yourself with free tools, what each finding actually means, and where the limits of self-assessment honestly sit.
The guide follows the five-stage search visibility maturity curve — the same framework we use in our professional audits and explain in our SEO vs GEO comparison. Each stage builds on the previous one: there is no point optimising for AI citability (Stage 4) if search engines cannot crawl your site (Stage 1). Work through them in order, and do not skip ahead.
The honest summary upfront: you can do Stages 1–3 yourself with free tools and reasonable technical confidence. Stages 4–5 require expertise, competitive intelligence and manual AI testing that is difficult to replicate without specialist knowledge. Knowing where that boundary sits is itself valuable — it tells you exactly where a professional audit would add the most value for your specific situation.
What You Need Before Starting
Before running through the stages, ensure you have access to the following. All are free.
Google Search Console. If your site is not verified in Search Console, set this up first — it takes minutes and provides the most reliable data about how Google sees your site. You need at least a few weeks of data for meaningful analysis, so if you are starting from scratch, set it up now and return to this guide once data has accumulated.
Google Analytics (GA4). Your primary source of traffic and engagement data. If GA4 is not installed or was only recently configured, you will have limited historical data — but even a few weeks of data is useful for identifying patterns.
Screaming Frog SEO Spider (free version). The free version crawls up to 500 URLs, which is sufficient for most small-to-medium business sites. If your site has more than 500 indexable URLs, the free tier will give you a sample but not the complete picture — our SEO Practitioner’s Toolkit covers the paid alternatives.
PageSpeed Insights. Google’s free tool for assessing Core Web Vitals and page performance. No account needed.
Google’s Rich Results Test. Free tool for validating structured data markup. Essential for Stages 3 and 4.
AI platform access. Free accounts on ChatGPT, Perplexity and Google Gemini. You will use these for Stages 4 and 5 testing.
A spreadsheet for recording findings is essential. We recommend a simple structure: one tab per stage, columns for finding, severity (critical / high / medium / low), page or URL affected, recommended action, and status. This becomes your implementation tracker once the audit is complete.
Stage 1: Crawlable — Can Search Engines Access Your Content?
This is the absolute baseline. If search engines cannot discover and access your pages, nothing else in the audit matters.
Run a Technical Crawl
Open Screaming Frog, enter your homepage URL, and run a crawl. Once complete, you are looking for several things. First, compare the number of URLs found against what you expect. If your site has 200 pages but the crawler only finds 80, you have orphaned content — pages with no internal links pointing to them that search engines may not discover. If the crawler finds 2,000 URLs but your site only has 200 pages, you likely have index bloat from parameter URLs, tag archives, internal search results or CMS-generated duplicate paths.
Check the response codes tab. Every page that should be accessible to users and search engines should return a 200 status. Catalogue any 3xx redirects — look for redirect chains where one redirect points to another redirect rather than the final destination. Flag all 4xx errors, particularly on pages that should exist. Check for 5xx server errors that indicate intermittent availability problems.
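If you export the redirecting URLs and their targets from your crawl, a short script can surface chains automatically. This is a minimal sketch, not tied to any particular crawler's export format; the sample URLs are hypothetical.

```python
def find_redirect_chains(redirects):
    """Given a dict mapping each redirecting URL to its target,
    return chains of more than one hop (URL -> intermediate -> final)."""
    chains = []
    for start in redirects:
        path = [start]
        current = start
        # Follow redirects until we reach a non-redirecting URL or a loop
        while current in redirects and redirects[current] not in path:
            current = redirects[current]
            path.append(current)
        if len(path) > 2:  # more than one hop: a chain worth flattening
            chains.append(path)
    return chains

# Hypothetical sample: /old redirects to /interim, which redirects to /new
sample = {"/old": "/interim", "/interim": "/new"}
print(find_redirect_chains(sample))  # [['/old', '/interim', '/new']]
```

Each chain it reports should be flattened so the first URL redirects straight to the final destination.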
Check Robots.txt
Navigate to yourdomain.com/robots.txt in your browser. Read the disallow rules carefully. Are any important sections of your site blocked? Common accidental blocks include staging directories that were never removed, entire subdirectories containing important content, and overly broad wildcard rules that catch more than intended. While you are here, check whether AI crawlers are blocked — GPTBot, ClaudeBot, PerplexityBot, Google-Extended. If they are disallowed, AI platforms cannot access your content at all.
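You can also check the AI crawler rules programmatically with Python's standard-library robots.txt parser. The robots.txt body below is a hypothetical example; paste in your own file's contents.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def blocked_ai_crawlers(robots_txt, path="/"):
    """Return the AI user agents that a robots.txt body disallows for a path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [ua for ua in AI_CRAWLERS if not parser.can_fetch(ua, path)]

# Hypothetical robots.txt that blocks GPTBot site-wide
sample = """User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /staging/
"""
print(blocked_ai_crawlers(sample))  # ['GPTBot']
```

Run it against your homepage path first, then against a few deep content paths, since a rule may block a section without blocking the root.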
Verify Your XML Sitemap
Check yourdomain.com/sitemap.xml. Does it exist? Does it load without errors? Does it include all your important pages and exclude pages you do not want indexed (thank-you pages, admin pages, parameter URLs)? Cross-reference a sample of URLs from the sitemap against your Screaming Frog crawl to ensure consistency. Submit the sitemap through Google Search Console if you have not already — check the Sitemaps report for any submission errors.
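The cross-reference between sitemap and crawl can be automated with a few lines of standard-library Python. This sketch assumes a standard single-file XML sitemap; the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_urls(xml_text):
    """Extract <loc> values from a standard XML sitemap."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", NS)}

def compare_with_crawl(xml_text, crawled):
    """URLs in the sitemap that the crawl never found, and vice versa."""
    in_sitemap = sitemap_urls(xml_text)
    return sorted(in_sitemap - crawled), sorted(crawled - in_sitemap)

# Hypothetical two-URL sitemap vs. a crawl that missed one page
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/services</loc></url>
</urlset>"""
missing, extra = compare_with_crawl(sample, {"https://example.com/"})
print(missing)  # ['https://example.com/services']
```

URLs in the sitemap but absent from the crawl are likely orphaned; URLs in the crawl but absent from the sitemap may be index bloat or simply missing sitemap entries.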
Check HTTPS and Security
Navigate to the HTTP version of your site (http://yourdomain.com). Does it redirect to HTTPS with a 301? Check for mixed content warnings in your browser’s developer console — HTTP resources loaded on HTTPS pages can trigger security warnings and undermine trust signals. Verify your SSL certificate is valid and not expiring soon.
Stage 2: Indexable — Is Your Content in Google’s Index?
A page can be crawlable but not indexed. Google discovers millions of pages it chooses not to include in its index because it considers them low-value, duplicate, or otherwise unworthy of serving to users. This stage identifies indexation gaps.
Review Index Coverage
In Google Search Console, navigate to the Pages report (previously Index Coverage). This shows every URL Google knows about, categorised by status: indexed, not indexed (with a reason), and excluded. Focus on the “not indexed” URLs. Common reasons include “Crawled – currently not indexed” (Google found the page but did not judge it valuable enough to index), “Duplicate without user-selected canonical” (Google identified near-duplicate content), and “Excluded by noindex tag” (check whether this is intentional).
For each category, review a sample of URLs. Are legitimate, important pages showing up as not indexed? If so, the reason code tells you the likely cause — thin content, duplication, technical blocking. These are your highest-priority indexation fixes.
Check for Index Bloat
Run a site:yourdomain.com search in Google. Compare the result count against the number of pages you actually want indexed — the count is a rough estimate, but large discrepancies are still meaningful. If Google shows significantly more results than you have meaningful pages, you have index bloat — low-value URLs consuming crawl budget and diluting your site’s quality signals. Common culprits include parameter URLs from filters and sorting, paginated archives, tag and category pages with minimal unique content, and internal search results pages.
Identify Content Cannibalisation
In Google Search Console’s Performance report, search for your most important target keywords. For each keyword, check how many different pages from your site appear in search results. If multiple pages are competing for the same keyword, they are cannibalising each other — splitting authority and confusing Google about which page to rank. This is one of the most commercially damaging issues we find in audits, and it is often invisible without deliberate investigation. Resolution typically involves consolidating the competing pages, differentiating their targeting, or using canonical tags to signal the preferred version. Our work with Pro2col involved resolving exactly this kind of cannibalisation across 146 blog posts — the impact on rankings was immediate and substantial.
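If you export the query-and-page rows from the Performance report, a short script can flag every keyword with more than one competing page. The sample rows below are hypothetical; a real export would come from Search Console's CSV download.

```python
from collections import defaultdict

def find_cannibalisation(rows):
    """rows: (query, page) pairs from a Search Console Performance export.
    Returns queries for which more than one page appears in results."""
    pages_by_query = defaultdict(set)
    for query, page in rows:
        pages_by_query[query].add(page)
    return {q: sorted(p) for q, p in pages_by_query.items() if len(p) > 1}

# Hypothetical export: two pages compete for "seo audit"
rows = [
    ("seo audit", "/blog/seo-audit-guide"),
    ("seo audit", "/services/seo-audit"),
    ("geo audit", "/services/geo-audit"),
]
print(find_cannibalisation(rows))
```

Every query the script returns is a candidate for consolidation, re-targeting or canonicalisation.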
Stage 3: Rankable — Is Your Content Competitive?
With crawlability and indexation addressed, this stage assesses whether your content is good enough to rank competitively — the quality, structure and authority signals that determine your position in search results.
Assess Core Web Vitals
Run your top 10–15 pages through PageSpeed Insights. For each page, check the three Core Web Vitals metrics: Largest Contentful Paint (LCP — should be under 2.5 seconds), Interaction to Next Paint (INP — should be under 200 milliseconds), and Cumulative Layout Shift (CLS — should be under 0.1). Pages failing these thresholds are at a ranking disadvantage and likely have user experience issues that hurt conversion rates too. Note which pages fail and which metrics are the culprit — the fix differs depending on whether the problem is server response time, render-blocking resources, image sizes, or layout instability from dynamic content loading.
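The pass/fail logic can be captured in a few lines so you can apply the same thresholds consistently to every page you test. The metric values below are hypothetical readings, not real PageSpeed Insights output.

```python
# Google's published "good" thresholds for the three Core Web Vitals
THRESHOLDS = {"lcp_s": 2.5, "inp_ms": 200, "cls": 0.1}

def failing_vitals(metrics):
    """Return the Core Web Vitals a page fails, given measured values."""
    fails = []
    if metrics.get("lcp_s", 0) > THRESHOLDS["lcp_s"]:
        fails.append("LCP")
    if metrics.get("inp_ms", 0) > THRESHOLDS["inp_ms"]:
        fails.append("INP")
    if metrics.get("cls", 0) > THRESHOLDS["cls"]:
        fails.append("CLS")
    return fails

# Hypothetical reading: slow LCP and layout shift over the limit
print(failing_vitals({"lcp_s": 3.8, "inp_ms": 150, "cls": 0.24}))  # ['LCP', 'CLS']
```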
Review On-Page Fundamentals
Using your Screaming Frog crawl data, check title tags for uniqueness, appropriate length (50–60 characters) and keyword relevance. Flag duplicates, missing titles and titles that are too generic. Check that every page has a single H1 and that heading hierarchy follows a logical structure (H1 → H2 → H3, no skipped levels). Review meta descriptions for uniqueness and compelling copy — while they do not directly affect rankings, they significantly influence click-through rates from search results.
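On larger sites the title checks are easier to run in bulk. This sketch assumes you have exported a URL-to-title mapping from your crawl; the sample data is hypothetical.

```python
from collections import Counter

def title_issues(titles):
    """titles: dict of URL -> title from a crawl export.
    Flags missing titles, lengths outside 50-60 characters, and duplicates."""
    issues = {}
    counts = Counter(t for t in titles.values() if t)
    for url, title in titles.items():
        problems = []
        if not title:
            problems.append("missing")
        else:
            if not 50 <= len(title) <= 60:
                problems.append(f"length {len(title)}")
            if counts[title] > 1:
                problems.append("duplicate")
        if problems:
            issues[url] = problems
    return issues

# Hypothetical crawl sample: one far-too-short title, one missing
titles = {
    "/a": "Home",
    "/b": "SEO Audit Services for UK Businesses | Example Agency Ltd",
    "/c": "",
}
print(title_issues(titles))
```

Treat the length flag as a prompt to review rather than a hard rule — a 48-character title that reads well is not a problem.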
Evaluate Content Quality
For your 10–15 most important pages, read them critically as if you were a potential customer who found them via search. Does the content genuinely answer the question or need that brought the person to the page? Does it demonstrate real expertise — specific examples, data, frameworks, insights that a non-expert could not produce? Is it comprehensive enough that the visitor does not need to go elsewhere for the answer? Or is it thin, vague, padded with generic statements that any business could make?
Compare each page to the top three ranking competitors for its target keyword. Open the competitor pages in adjacent tabs and honestly assess: is your content better, equivalent, or worse? What do they cover that you do not? What do they provide — data, examples, tools, depth — that your page lacks? This competitive content comparison is one of the most revealing exercises in the entire audit. It often shows that pages you thought were “good enough” are significantly outclassed by what competitors have published.
Audit Internal Linking
In Screaming Frog, check the inlinks count for your most important pages. These pages should receive more internal links than secondary pages — internal links distribute authority and signal importance. Identify orphaned pages (zero internal links) and important pages with too few links, and assess anchor text quality. Are your internal link anchor texts descriptive and relevant (“our SEO audit checklist” rather than “click here”)? If you use Screaming Frog’s free tier, the crawl visualisation gives you a basic picture of your site’s internal linking architecture.
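Given a list of your pages and the internal link pairs from a crawl export, counting inlinks and spotting orphans is straightforward. The three-page site below is a hypothetical example.

```python
from collections import Counter

def inlink_report(pages, links):
    """pages: all known URLs; links: (source, target) internal link pairs.
    Returns inlink counts per page, with orphans at zero."""
    counts = Counter(target for _, target in links)
    return {page: counts.get(page, 0) for page in pages}

# Hypothetical three-page site where /contact receives no internal links
pages = ["/", "/services", "/contact"]
links = [("/", "/services"), ("/services", "/")]
report = inlink_report(pages, links)
orphans = [p for p, n in report.items() if n == 0]
print(orphans)  # ['/contact']
```

Sort the report by count and check it against your commercial priorities: if a money page sits near the bottom, it is under-linked.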
Check Backlinks (Free Tier)
Ahrefs offers a free backlink checker (ahrefs.com/backlink-checker) and Google Search Console’s Links report shows your top linked pages and top linking sites. These give you a directional picture of your backlink profile without a paid subscription. Check how many referring domains link to your site, which pages attract the most links, and whether the linking sites are relevant and authoritative. For a more complete picture, paid tools like Ahrefs or Semrush are necessary — but the free data is sufficient to identify whether your authority baseline is strong, moderate or weak relative to what you would expect for your market. If your most linked page is your homepage and nothing else has attracted external links, that tells you something important about your content’s link-worthiness.
Stage 4: Referenceable — Can AI Systems Extract and Cite Your Content?
This is where the audit transitions from traditional SEO into AI visibility territory. The checks here assess whether your content is structured in a way that AI systems can parse, evaluate and cite — and this is also where self-assessment starts to reach its limits.
Test Structured Data Coverage
Run your homepage, your most important service page and your best content page through Google’s Rich Results Test. What structured data is present? At minimum, you should have Organisation schema on your homepage (with name, description, URL, logo, sameAs links to your social profiles and directories). Service or content pages benefit from FAQPage schema (if they include Q&A content), HowTo schema (if they describe a process), and Article schema with author attribution.
The key measure is not just presence but completeness. An Organisation schema that only includes your name and URL is far less useful than one that includes description, foundingDate, areaServed, knowsAbout, sameAs, and employee references. The more complete your structured data, the more confidently AI systems can identify and categorise your entity. If your structured data is minimal or absent, this is a significant gap for both traditional SEO rich results and AI visibility.
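As an illustration of what "complete" looks like, the sketch below builds an Organisation schema with the properties listed above and emits it as a JSON-LD block. Every value is a hypothetical placeholder — substitute your real organisation details before using anything like this.

```python
import json

# Hypothetical values throughout; replace with your real organisation details
organisation_schema = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Agency Ltd",
    "url": "https://example.com",
    "logo": "https://example.com/logo.png",
    "description": "An SEO and GEO consultancy serving UK businesses.",
    "foundingDate": "2015",
    "areaServed": "GB",
    "knowsAbout": ["SEO", "Generative Engine Optimisation", "Technical audits"],
    "sameAs": [
        "https://www.linkedin.com/company/example-agency",
        "https://twitter.com/exampleagency",
    ],
}

# Emit as a JSON-LD script block ready to place in the page <head>
print('<script type="application/ld+json">')
print(json.dumps(organisation_schema, indent=2))
print("</script>")
```

Validate the output with the Rich Results Test before deploying it.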
Assess Content Citability
Read your most important pages with a specific question in mind: if an AI system were answering a question about your topic, could it extract a specific, attributable fact from this page? Content that makes vague claims — “we provide excellent service”, “our team has decades of experience” — is not citable because there is nothing specific for an AI to reference. Content that includes concrete data (“our audit identified 1,100 duplicate URLs across 146 blog posts”), named frameworks, specific processes and evidence-based claims is citable because the AI can attribute discrete facts to your source.
This assessment is subjective, which is one reason it benefits from expert evaluation. But even a rough self-assessment reveals whether your content leans towards marketing language or expert substance. Pages heavy on the former and light on the latter need restructuring for AI visibility.
Check Entity Signal Consistency
Search for your brand name and check how it is described across your website, Google Business Profile, LinkedIn company page, relevant industry directories and any other platforms where you have a presence. Is the description of your expertise consistent? Does your business name appear in the same format everywhere? Inconsistencies — describing yourself as “digital marketing agency” on one platform and “SEO consultancy” on another — confuse AI models that are trying to build a coherent entity profile. The goal is a consistent entity signal across every touchpoint.
Stage 5: Recommendable — Does AI Actually Cite You?
The final stage tests what actually matters: when potential customers ask AI platforms questions in your space, does your brand appear in the answers?
Run AI Platform Queries
Open ChatGPT, Perplexity and Gemini. For each platform, ask 10–15 questions that your potential customers would ask. Include your brand name queries (“What is [your brand]?”, “What does [your brand] do?”), category queries (“Who are the best [your service] providers in [your area]?”, “What should I look for in a [your service]?”), and comparison queries (“Compare [your brand] with [competitor]”, “What are the leading [your product] solutions?”). Record the results in your spreadsheet: which platform, what query, whether you were cited, which competitors were cited, and any inaccuracies in how your brand was described.
Identify Patterns
Look for patterns in your results. Are you cited on some platforms but not others? For some query types but not others? Are specific competitors consistently cited where you are not? These patterns point to the root cause. If you appear on Perplexity but not ChatGPT, the issue may be entity authority rather than content quality. If you are cited for brand queries but not category queries, your topical authority needs strengthening. If a specific competitor dominates, examine what they are doing differently — their content structure, schema, entity signals.
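If you record each test query as a simple row, the per-platform pattern falls out of a few lines of code. The log below is a hypothetical example of the kind of spreadsheet data this stage produces.

```python
from collections import defaultdict

def citation_rates(results):
    """results: (platform, query_type, cited) rows from your test log.
    Returns the share of queries where your brand was cited, per platform."""
    totals, hits = defaultdict(int), defaultdict(int)
    for platform, _query_type, cited in results:
        totals[platform] += 1
        hits[platform] += int(cited)
    return {p: hits[p] / totals[p] for p in totals}

# Hypothetical log of six test queries
log = [
    ("Perplexity", "brand", True),
    ("Perplexity", "category", True),
    ("ChatGPT", "brand", True),
    ("ChatGPT", "category", False),
    ("Gemini", "brand", False),
    ("Gemini", "category", False),
]
print(citation_rates(log))  # {'Perplexity': 1.0, 'ChatGPT': 0.5, 'Gemini': 0.0}
```

Grouping by query type instead of platform (swap the keys) shows whether the gap is in brand authority or category authority.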
The Limits of Self-Assessment
This is where honesty matters. You can run the queries yourself and record the results, but the strategic interpretation — understanding why AI platforms cite competitor A over competitor B, identifying the specific content and entity interventions that would change the outcome, quantifying the commercial impact of the gaps — requires expertise and competitive intelligence that a self-audit cannot fully provide. Testing 10–15 queries gives you a directional signal. A professional GEO audit tests 50+ queries systematically, analyses citation patterns across platforms, builds a competitive citation matrix, and maps findings to a prioritised remediation plan with estimated impact.
Think of it this way: Stages 1–3 are like checking your own blood pressure and weight. Stage 4 is like running basic blood tests. Stage 5 is like interpreting those results in the context of your family history, lifestyle and risk factors — that is where professional judgment adds the most value. The self-assessment tells you whether something needs attention. The professional assessment tells you exactly what, how urgently, and what the fix is worth.
Turning Findings into a Prioritised Plan
Once you have worked through all five stages, your spreadsheet should contain a list of findings across multiple categories. The temptation is to try to fix everything at once — resist it. Instead, categorise each finding by the standard priority framework: critical (actively harming performance, fix this week), high impact (measurable improvement expected within 30 days), strategic (longer-term improvements that compound over 60–90 days), and monitoring (not broken, but track over time).
Within each priority category, order by effort. Quick wins — high-impact changes that require minimal effort — should always come first. Fixing a robots.txt block that is hiding an entire section of your site from AI crawlers takes five minutes and could transform your AI visibility. Consolidating cannibalised content across three competing pages requires more effort but typically delivers significant ranking improvements. Building comprehensive structured data across your entire site is a larger project but compounds over time. The sequencing matters: a robots.txt fix this afternoon delivers more value than a structured data project that sits on the backlog for three months.
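The ordering rule — severity first, then lowest effort within each band — can be expressed directly if your spreadsheet findings carry a severity label and a rough effort estimate. The findings and effort figures below are hypothetical.

```python
# Hypothetical severity labels matching the priority framework above
SEVERITY_RANK = {"critical": 0, "high": 1, "strategic": 2, "monitoring": 3}

def prioritise(findings):
    """findings: (description, severity, effort_hours) tuples.
    Orders by severity first, then lowest effort, so quick wins float up."""
    return sorted(findings, key=lambda f: (SEVERITY_RANK[f[1]], f[2]))

# Hypothetical findings list
findings = [
    ("Build sitewide structured data", "strategic", 40),
    ("Fix robots.txt block on /services/", "critical", 0.1),
    ("Consolidate three cannibalised posts", "high", 8),
]
for description, severity, effort in prioritise(findings):
    print(f"{severity}: {description} ({effort}h)")
```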
If the self-audit reveals that your Stages 1–3 health is solid but Stages 4–5 show significant gaps, you know exactly where professional expertise would add the most value. If Stages 1–2 are problematic, fix those first — they are the foundation everything else depends on. Our what to expect guide explains the professional audit tiers and what each delivers, and the cost guide provides transparent UK pricing.
Whether you implement the findings yourself or bring in support, the self-audit gives you something invaluable: clarity about where you actually stand. Get in touch if you want to discuss your findings with us — the initial consultation is free. Or try our free search visibility score tool for an automated baseline to complement your manual assessment.