# SideGuy Solutions - robots.txt
# AI crawlers explicitly welcomed (Rodrigo Stockebrand AEO Play 18, 2026-05-09)
# Brands that moved from default-allow to explicit allow + llms.txt gained
# 48% more LLM-referred traffic within one quarter (Cloudflare AI Crawler Report 2025).

# ─── AI Search & Chat Crawlers (explicit allow) ───

# OpenAI / ChatGPT / SearchGPT
User-agent: GPTBot
Allow: /
Disallow: /data/
Disallow: /command-center/
Disallow: /hub-footer-snippet.html
Disallow: /build-status.html

User-agent: ChatGPT-User
Allow: /
Disallow: /data/
Disallow: /command-center/

User-agent: OAI-SearchBot
Allow: /
Disallow: /data/
Disallow: /command-center/

# Anthropic / Claude
User-agent: ClaudeBot
Allow: /
Disallow: /data/
Disallow: /command-center/
Disallow: /hub-footer-snippet.html

User-agent: Claude-User
Allow: /
Disallow: /data/
Disallow: /command-center/

User-agent: Claude-Web
Allow: /
Disallow: /data/
Disallow: /command-center/

User-agent: anthropic-ai
Allow: /
Disallow: /data/
Disallow: /command-center/

# Perplexity
User-agent: PerplexityBot
Allow: /
Disallow: /data/
Disallow: /command-center/
Disallow: /hub-footer-snippet.html

User-agent: Perplexity-User
Allow: /
Disallow: /data/
Disallow: /command-center/

# Google AI (Gemini apps and Vertex AI; a control token separate from Googlebot.
# AI Overviews are part of Search and follow the Googlebot rules instead.)
User-agent: Google-Extended
Allow: /
Disallow: /data/
Disallow: /command-center/

# Apple Intelligence
User-agent: Applebot-Extended
Allow: /
Disallow: /data/
Disallow: /command-center/

# Microsoft / Bing Copilot
User-agent: bingbot
Allow: /
Disallow: /data/
Disallow: /command-center/

# Meta AI
User-agent: Meta-ExternalAgent
Allow: /
Disallow: /data/
Disallow: /command-center/

User-agent: FacebookBot
Allow: /
Disallow: /data/
Disallow: /command-center/

# DuckDuckGo Assist
User-agent: DuckAssistBot
Allow: /
Disallow: /data/
Disallow: /command-center/

# You.com
User-agent: YouBot
Allow: /
Disallow: /data/
Disallow: /command-center/

# Amazon (Alexa AI)
User-agent: Amazonbot
Allow: /
Disallow: /data/
Disallow: /command-center/

# Common Crawl (feeds many AI training datasets)
User-agent: CCBot
Allow: /
Disallow: /data/
Disallow: /command-center/

# Cohere
User-agent: cohere-ai
Allow: /
Disallow: /data/
Disallow: /command-center/

# Mistral / Mixtral
User-agent: MistralAI-User
Allow: /
Disallow: /data/
Disallow: /command-center/

# ─── Default policy for all other crawlers ───

User-agent: *
Allow: /
# Block partial/snippet HTML files (non-pages)
Disallow: /hub-footer-snippet.html
Disallow: /x-twitter-integration-for-small-business.html
Disallow: /build-status.html
Disallow: /-hub.html
Disallow: /8-companies-ate-73-billion-youre-still-begging-for-scraps-for-nonprofit-community-in-dallas-san-diego-faq-cost.html
# Internal data + dashboards (not for indexing)
Disallow: /data/
Disallow: /command-center/
# ─── DEAD PATHS: old auto-gen iterations (2026-05-11 cleanup) ───
# 1,817 URLs were removed from sitemap.xml on 2026-05-11; these paths return 404 in production.
# Intent: speed up Google's "Not Found (404)" validation (PENDING -> FAILED -> dropped from index).
# Caveat: while a path is disallowed, compliant crawlers do not re-fetch it and so cannot observe the 404.
Disallow: /matrix/
Disallow: /hubs/
Disallow: /auto/problem-pages/

# ─── AI-specific signal files ───
# llms.txt = curated index of canonical pages (per llmstxt.org spec)
# llms-full.txt = full markdown of important pages for AI ingestion

# ─── Sitemaps ───
Sitemap: https://www.sideguysolutions.com/sitemap-index.xml
Sitemap: https://www.sideguysolutions.com/sitemap.xml
Sitemap: https://www.sideguysolutions.com/sitemap-2.xml
Sitemap: https://www.sideguysolutions.com/sitemap-3.xml
Sitemap: https://www.sideguysolutions.com/sitemap-4.xml
Sitemap: https://www.sideguysolutions.com/sitemap-5.xml