webtomcp

2026-05-17 · KY · ~5 min read

How to Turn Your Website Into an MCP Server (Step-by-Step)

Two paths reach the same outcome — a website you can query as an MCP server from Claude, ChatGPT, Cursor, or any other MCP client. Path A: self-host the crawler, vector index, and MCP server yourself. Path B: use a managed service. This post walks through both honestly, including the architectural pieces you can't skip on the self-host route, so you can pick with eyes open.

What "turning a website into an MCP server" actually requires

An MCP server that lets an AI answer questions about a website needs four pieces, regardless of who builds it:

  1. A crawler that fetches every URL on the site (respecting robots.txt), handles JavaScript-rendered pages, dedups duplicates, and re-crawls on a schedule.
  2. An extractor + chunker that converts each HTML page into clean Markdown, splits it into retrieval-sized chunks (typically 500–1500 chars, with overlap), and discards boilerplate.
  3. A hybrid search index — both vector (semantic similarity via embeddings) and keyword (BM25 / FTS5) — so the AI can find relevant chunks across the whole site for any query.
  4. An MCP server endpoint that speaks the Model Context Protocol over streamable HTTP, accepts search / fetch tool calls from the client, runs them against the index, and returns chunks with source URLs.

The MCP layer itself is the easy part — it's a small wrapper. The hard parts are crawling reliably, keeping the index fresh, and the search quality.

Path A — Self-host the whole stack

Honest assessment: a working MVP is 2–4 days for an experienced developer. A production-grade version that handles JS rendering, bot challenges, content updates, and search-quality edge cases is closer to 2–4 weeks of work, plus ongoing maintenance.

What you'll wire together

Real-world pitfalls you'll hit

Cost shape

Per 1,000-page site, expect roughly: $1–$5 for the initial embedding pass, $0.50–$2 for weekly refreshes, plus your vector store and compute costs. The cost scales with total chunks (page count × ~10 chunks/page on average), not query volume.

Path B — Use a managed service (5 minutes)

If you'd rather not be the one debugging SPA detection at 2 AM, the same end state is available as a managed service. WebToMCP does all four pieces above on Cloudflare's edge: submit a URL, get an MCP endpoint URL, paste it into your AI client.

  1. Sign in at webtomcp.net with Google (free tier, no credit card).
  2. Add your website URL — your docs, store, wiki, or any public site.
  3. Wait 1–10 minutes for crawling + indexing. JS rendering is metered per plan.
  4. Copy the MCP endpoint URL from the dashboard and paste it into Claude Desktop, ChatGPT, Cursor, Gemini CLI, or Codex.

Free tier: 1 KB, 1 website, ~500 questions/month. Hobby ($9/mo): 5 KBs, 10 websites, 5,000 questions, 100 JS-rendered pages. See pricing →

Side-by-side

Aspect Self-host WebToMCP (managed)
Time to first endpoint2–4 days MVP, 2–4 weeks production5 minutes
Cost per 1k-page site$1–$5 embed + infra$0 (free) or $9/mo
JS rendering, canonical, refresh, dedupYou build all of itBuilt in
CustomizationFull — your stack, your callLimited to dashboard settings
OperationsYour on-call rotationOurs

When to pick each

Pick self-host when: you have unusual indexing rules (private auth-gated content, custom chunking by document type, embedded retrieval inside your existing data pipeline), you want to own the stack end-to-end, or you already have RAG infra in production for other use cases and adding MCP is incremental.

Pick managed when: you want an MCP endpoint live today, your site is a typical content-heavy public site (docs, store, wiki, blog), you don't have an existing RAG pipeline to plug into, or your team's time is better spent on product than on crawler maintenance.

A pragmatic hybrid

Many teams ship the managed version first (5 minutes to evaluate whether MCP-querying your site moves your AI experience), then decide if the workflow is valuable enough to bring in-house. If WebToMCP's defaults work for you, there's no need to migrate. If you need something WebToMCP doesn't expose, the self-host architecture above is the same shape — you can swap us out when you outgrow us.


Questions? developer@webtomcp.net. Or sign in with Google to index your own site (free).