What is StripFeed?

3 min read · StripFeed Team

The Problem: AI Agents Can't Read the Web

AI agents, RAG pipelines, and LLM workflows need to read web pages. But the web serves HTML filled with navigation, ads, scripts, tracking pixels, and formatting noise. Feeding raw HTML to an AI model wastes 60-80% of your token budget on content that adds zero value.

Cloudflare's "Markdown for Agents" solves this, but only for sites behind Cloudflare on paid plans. That's roughly 20% of the web. The other 80% still serves raw HTML.

StripFeed bridges that gap. One API call. Any URL. Clean Markdown back.

How It Works

Send a URL to StripFeed and get back clean, token-efficient Markdown. No LLM is involved. The pipeline is deterministic and fast:

  1. Fetch the URL (with smart caching)
  2. Extract the main content using Mozilla's Readability algorithm
  3. Convert to clean Markdown
  4. Count tokens and track cost per AI model
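To make steps 2 and 3 concrete, here is a toy sketch in Python. It is not StripFeed's implementation and it is not Readability: it assumes a fixed blocklist of noisy tags and only understands headings, paragraphs, and list items. The class name, tag sets, and helper are all made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: a fixed blocklist stands in for real main-content extraction.
SKIPPED = {"script", "style", "nav", "header", "footer", "aside"}
# Block-level tags this sketch knows how to render, with their Markdown prefix.
PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownSketch(HTMLParser):
    """Collect text from a few block tags and ignore known noise tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.skip_depth = 0   # > 0 while inside a skipped tag
        self.prefix = ""      # Markdown prefix of the open block
        self.parts = None     # text pieces of the open block, or None

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self.skip_depth += 1
        elif tag in PREFIXES and self.skip_depth == 0:
            self.prefix, self.parts = PREFIXES[tag], []

    def handle_data(self, data):
        if self.parts is not None and self.skip_depth == 0:
            self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in PREFIXES and self.parts is not None:
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self.parts).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.parts = None


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


page = (
    "<html><head><script>track()</script></head><body>"
    "<nav><p>Home</p></nav><h1>Title</h1>"
    "<p>Hello <b>world</b>.</p><footer>(c)</footer></body></html>"
)
print(html_to_markdown(page))
```

Even on this tiny page, the script, navigation, and footer disappear and only the heading and paragraph survive, which is the effect the pipeline is after at a much larger scale.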

Here's what a request looks like:

curl "https://www.stripfeed.dev/api/v1/fetch?url=https://example.com/article" \
  -H "Authorization: Bearer sf_live_your_key_here"
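If the target URL carries its own query string, it is safer to percent-encode it before placing it in the url parameter. The sketch below does that with Python's standard library. The endpoint comes from the curl example; the helper name is made up, and it assumes the API decodes standard percent-encoding, which the article does not state explicitly.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example above.
FETCH_ENDPOINT = "https://www.stripfeed.dev/api/v1/fetch"


def build_fetch_url(target_url: str) -> str:
    """Return the fetch endpoint with the target URL percent-encoded."""
    # urlencode escapes ':', '/', '?', '&' and '=' inside the target, so the
    # target's own query string cannot be mistaken for StripFeed parameters.
    return f"{FETCH_ENDPOINT}?{urlencode({'url': target_url})}"


print(build_fetch_url("https://example.com/article?id=7&lang=en"))
```

Without the encoding step, the `&lang=en` part of the target would be read as a separate parameter of the fetch request rather than as part of the page address.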

You get back clean Markdown with useful headers:

X-StripFeed-Tokens: 1,247
X-StripFeed-Savings: 74.2%
X-StripFeed-Cache: HIT
X-StripFeed-Fetch-Ms: 89
Content-Type: text/markdown; charset=utf-8
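These headers arrive as strings, so a client will usually want to convert them to typed values. Here is a minimal sketch that assumes the values look exactly like the sample above (a thousands separator in the token count, a trailing percent sign on the savings). The function name and the returned keys are illustrative only.

```python
def parse_stripfeed_headers(headers: dict) -> dict:
    """Convert the X-StripFeed-* header strings into typed values."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        # "1,247" -> 1247 (the sample shows a thousands separator)
        "tokens": int(lowered["x-stripfeed-tokens"].replace(",", "")),
        # "74.2%" -> 74.2
        "savings_percent": float(lowered["x-stripfeed-savings"].rstrip("%")),
        # "HIT" -> True; any other value is treated as a miss (assumption)
        "cache_hit": lowered["x-stripfeed-cache"].upper() == "HIT",
        # "89" -> 89 (milliseconds, going by the header name)
        "fetch_ms": int(lowered["x-stripfeed-fetch-ms"]),
    }


sample = {
    "X-StripFeed-Tokens": "1,247",
    "X-StripFeed-Savings": "74.2%",
    "X-StripFeed-Cache": "HIT",
    "X-StripFeed-Fetch-Ms": "89",
}
print(parse_stripfeed_headers(sample))
```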

The savings header reports the percentage of tokens saved compared to the raw HTML. In most cases, that's 60-80%.
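To see what that percentage means in absolute terms, you can work backwards from the two header values. The sketch assumes savings is defined as 1 minus (Markdown tokens / raw HTML tokens), which matches the description here but is not spelled out as a formula. The function name is made up.

```python
def estimate_raw_tokens(markdown_tokens: int, savings_percent: float) -> int:
    """Estimate the raw-HTML token count implied by a savings percentage."""
    if not 0 <= savings_percent < 100:
        raise ValueError("savings_percent must be in [0, 100)")
    # Assumed definition: savings = 1 - markdown / raw
    # Rearranged:         raw = markdown / (1 - savings)
    return round(markdown_tokens / (1 - savings_percent / 100))


raw = estimate_raw_tokens(1247, 74.2)
print(raw, raw - 1247)  # estimated raw tokens, estimated tokens saved
```

For the sample headers, 1,247 Markdown tokens at 74.2% savings implies roughly 4,833 tokens of raw HTML, so about 3,586 tokens never reach the model.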

Key Features

CSS Selector Extraction - Don't need the whole page? Target specific content with CSS selectors:

?url=https://example.com&selector=article.main-content

Multiple Output Formats - Get content as Markdown, JSON, plain text, or cleaned HTML:

?url=https://example.com&format=json

Built-in Caching - Results are cached for 1 hour by default. Customize with ?ttl= (up to 24 hours) or bypass with ?cache=false. Cache status is returned in the response headers.
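The selector, format, ttl, and cache parameters can be assembled in one place. The sketch below is a hypothetical helper, not part of any SDK. It makes two loud assumptions: ttl is expressed in seconds, and the format values are "markdown", "json", "text", and "html" (only "json" appears in this post).

```python
from typing import Optional
from urllib.parse import urlencode

# "up to 24 hours"; using seconds as the unit is an assumption.
MAX_TTL_SECONDS = 24 * 60 * 60
# Only "json" is confirmed by the post; the other names are guesses.
FORMATS = {"markdown", "json", "text", "html"}


def fetch_query(
    url: str,
    selector: Optional[str] = None,
    fmt: Optional[str] = None,
    ttl: Optional[int] = None,
    cache: bool = True,
) -> str:
    """Build the query string for a fetch request, omitting unset options."""
    params = {"url": url}
    if selector is not None:
        params["selector"] = selector
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unknown format: {fmt!r}")
        params["format"] = fmt
    if ttl is not None:
        if not 0 < ttl <= MAX_TTL_SECONDS:
            raise ValueError("ttl must be between 1 second and 24 hours")
        params["ttl"] = str(ttl)
    if not cache:
        params["cache"] = "false"
    return urlencode(params)


print(fetch_query("https://example.com", selector="article.main-content", fmt="json"))
```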

Batch Processing - Process up to 10 URLs in a single request:

curl -X POST "https://www.stripfeed.dev/api/v1/batch" \
  -H "Authorization: Bearer sf_live_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com/page1", "https://example.com/page2"]}'
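If you have more than 10 URLs, you will need to split them across several batch requests. A minimal sketch of that chunking step follows. The payload shape mirrors the curl example; the helper name is made up, and sending the bodies is left out.

```python
import json
from typing import Iterator, List

BATCH_LIMIT = 10  # "up to 10 URLs in a single request"


def batch_payloads(urls: List[str], limit: int = BATCH_LIMIT) -> Iterator[str]:
    """Yield JSON request bodies, each holding at most `limit` URLs."""
    for start in range(0, len(urls), limit):
        yield json.dumps({"urls": urls[start:start + limit]})


urls = [f"https://example.com/page{i}" for i in range(1, 24)]  # 23 URLs
bodies = list(batch_payloads(urls))
print([len(json.loads(body)["urls"]) for body in bodies])  # [10, 10, 3]
```

Each yielded string can be used as the request body for one POST to the batch endpoint.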

Token Counting with Cost Tracking - Every response includes accurate token counts (GPT-4o tokenizer). Set a model parameter to track costs per AI model across your dashboard.
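The cost arithmetic behind this is simple enough to reproduce yourself. The price in the sketch below is a placeholder, not a real rate for any model; substitute the input price your provider currently lists.

```python
def estimate_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Return the input cost of sending `tokens` tokens to a model."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical price, for illustration only.
HYPOTHETICAL_USD_PER_MILLION = 2.50

cost = estimate_cost_usd(1247, HYPOTHETICAL_USD_PER_MILLION)
print(f"${cost:.4f}")
```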

Truncation Control - Limit output with ?max_tokens=500. StripFeed truncates at paragraph boundaries so you never get a cut-off sentence.
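The idea of truncating at paragraph boundaries can be sketched locally. This is an illustration of the technique, not StripFeed's code: it assumes paragraphs are separated by blank lines and uses a whitespace word count as a crude stand-in for a real tokenizer.

```python
def truncate_at_paragraphs(markdown: str, max_tokens: int) -> str:
    """Keep whole paragraphs while the running token estimate fits the budget."""
    kept = []
    used = 0
    for paragraph in markdown.split("\n\n"):
        # Assumption: one whitespace-separated word counts as one token.
        cost = len(paragraph.split())
        if used + cost > max_tokens:
            break  # stop before the paragraph that would overflow the budget
        kept.append(paragraph)
        used += cost
    return "\n\n".join(kept)


doc = (
    "# Title\n\n"
    "First paragraph has five words.\n\n"
    "Second paragraph is a bit longer than the first."
)
print(truncate_at_paragraphs(doc, max_tokens=8))
```

With a budget of 8, the heading and first paragraph fit (7 words) and the second paragraph is dropped whole rather than cut mid-sentence.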

Who It's For

AI Agent Builders - Your agent needs to read web pages as part of its workflow. StripFeed gives it clean Markdown without the infrastructure overhead of running headless browsers or maintaining scraping code.

RAG Pipelines - Ingest web content into your vector database. Clean Markdown means better embeddings and fewer wasted tokens on noise.

LLM Workflows - Any pipeline that processes web content: summarization, analysis, data extraction, content monitoring.

MCP Integrations - StripFeed ships an MCP server that works with Claude Code, Cursor, Windsurf, and any MCP-compatible client:

npx -y @stripfeed/mcp-server

Getting Started

Option 1: curl

curl "https://www.stripfeed.dev/api/v1/fetch?url=https://example.com" \
  -H "Authorization: Bearer sf_live_your_key_here"

Option 2: TypeScript SDK

npm install stripfeed

import { StripFeed } from "stripfeed";

const sf = new StripFeed("sf_live_your_key_here");
const result = await sf.fetch("https://example.com");
console.log(result.markdown);

Option 3: Python SDK

pip install stripfeed

from stripfeed import StripFeed

sf = StripFeed("sf_live_your_key_here")
result = sf.fetch("https://example.com")
print(result.markdown)

Pricing

  • Free: 200 requests/month, 1 API key, Markdown output
  • Pro: $19/month (or $149/year) for 100K requests, unlimited API keys, all formats, batch processing, CSS selectors, and full analytics
  • Enterprise: Custom pricing for unlimited usage, dedicated infrastructure, and SLA

Ready to try it? Sign up for free and get your API key in under a minute. Or try the live demo to see it in action.