How to Build an AI Agent That Reads the Web
What We're Building
A simple AI agent that can read any web page and answer questions about it. The agent:
- Takes a URL and a question
- Fetches the page as clean Markdown (via StripFeed)
- Sends the Markdown + question to an LLM
- Returns the answer
No headless browsers. No HTML parsing. No scraping infrastructure.
Prerequisites
- A StripFeed API key (free, takes 30 seconds)
- An OpenAI or Anthropic API key
- Node.js 18+ or Python 3.9+
TypeScript Implementation
Install the dependencies:
npm install stripfeed openai
Build the agent:
import StripFeed from "stripfeed";
import OpenAI from "openai";

const sf = new StripFeed("sf_live_your_key");
const openai = new OpenAI();

async function askAboutUrl(url: string, question: string) {
  // Step 1: Fetch the page as clean Markdown
  const page = await sf.fetch(url);

  // Step 2: Send to LLM with the question
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "Answer the user's question based on the provided web page content. Be concise and accurate.",
      },
      {
        role: "user",
        content: `## Web Page Content\n\n${page.markdown}\n\n## Question\n\n${question}`,
      },
    ],
  });

  return response.choices[0].message.content;
}

// Usage
const answer = await askAboutUrl(
  "https://docs.anthropic.com/en/docs/about-claude/models",
  "What is the context window size for Claude Sonnet 4?"
);
console.log(answer);
That's it. The sf.fetch() call handles content extraction, HTML-to-Markdown conversion, and token counting. You get clean content ready for your LLM.
Python Implementation
Install the dependencies:

pip install stripfeed openai

Build the agent:
from stripfeed import StripFeed
from openai import OpenAI

sf = StripFeed("sf_live_your_key")
client = OpenAI()


def ask_about_url(url: str, question: str) -> str:
    # Step 1: Fetch the page as clean Markdown
    page = sf.fetch(url)

    # Step 2: Send to LLM with the question
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer the user's question based on the provided web page content. Be concise and accurate.",
            },
            {
                "role": "user",
                "content": f"## Web Page Content\n\n{page.markdown}\n\n## Question\n\n{question}",
            },
        ],
    )
    return response.choices[0].message.content


# Usage
answer = ask_about_url(
    "https://docs.anthropic.com/en/docs/about-claude/models",
    "What is the context window size for Claude Sonnet 4?",
)
print(answer)
Adding Batch Research
Your agent often needs to read multiple pages. StripFeed's batch endpoint processes up to 10 URLs in parallel:
async function researchTopic(urls: string[], question: string) {
  // Fetch all pages in one request
  const results = await sf.batch(urls);

  // Combine all content
  const combinedContent = results
    .filter((r) => r.ok)
    .map((r, i) => `## Source ${i + 1}: ${r.url}\n\n${r.markdown}`)
    .join("\n\n---\n\n");

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          "Answer the question using the provided sources. Cite which source you got each piece of information from.",
      },
      {
        role: "user",
        content: `${combinedContent}\n\n## Question\n\n${question}`,
      },
    ],
  });

  return response.choices[0].message.content;
}
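Because the batch endpoint accepts at most 10 URLs per request, a research run over a longer list needs to be split into chunks first. A minimal sketch of that splitting step (pure Python, no StripFeed calls; each resulting chunk would then be passed to a batch request in turn):

```python
def chunk_urls(urls: list[str], batch_size: int = 10) -> list[list[str]]:
    """Split a URL list into batches no larger than the per-request cap."""
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Processing 23 URLs, for example, yields three batches of 10, 10, and 3.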
Targeting Specific Content
Some pages have a lot of sidebar content, ads, or related articles that aren't relevant. Use CSS selectors to extract only what you need:
// Only get the main article content
const page = await sf.fetch("https://example.com/blog/post", {
  selector: "article.post-content",
});

// Only get the pricing table
const pricing = await sf.fetch("https://example.com/pricing", {
  selector: ".pricing-table",
});
This reduces tokens even further and gives your agent exactly the content it needs.
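Selectors are site-specific, and a selector that matches nothing is a common failure mode. One defensive pattern is to try a list of selectors from most to least specific and fall back to the full page. A sketch, with the fetch call abstracted behind a plain callable (`fetch_fn` is a stand-in for whatever fetches markdown, not a StripFeed API) so the fallback logic stays testable:

```python
def fetch_with_selector_fallback(fetch_fn, url, selectors):
    """Try each CSS selector in order; fall back to a full-page fetch.

    `fetch_fn(url, selector=...)` is assumed to return the page markdown,
    or None/empty when the selector matched nothing.
    """
    for selector in selectors:
        content = fetch_fn(url, selector=selector)
        if content:
            return content
    # Nothing matched: fetch the whole page instead
    return fetch_fn(url, selector=None)
```

This keeps the agent working when a site redesign breaks your most specific selector.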
Controlling Token Budget
If you're working with a limited context window or want to cap costs, use maxTokens:
const page = await sf.fetch("https://example.com/long-article", {
  maxTokens: 3000,
});

if (page.truncated) {
  console.log("Content was truncated to fit token limit");
}
StripFeed truncates at paragraph boundaries, so the content always ends cleanly.
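If you ever need the same behavior client-side (say, to trim content you have already fetched and cached), paragraph-boundary truncation is easy to approximate. A rough sketch using a ~4-characters-per-token heuristic (StripFeed's actual tokenizer isn't specified here, so treat the estimate as approximate):

```python
def truncate_at_paragraph(markdown: str, max_tokens: int, chars_per_token: int = 4) -> str:
    """Keep whole paragraphs until the estimated token budget is exhausted."""
    budget = max_tokens * chars_per_token
    kept, used = [], 0
    for para in markdown.split("\n\n"):
        if used + len(para) > budget:
            break
        kept.append(para)
        used += len(para) + 2  # account for the paragraph separator
    return "\n\n".join(kept)
```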
Caching for Repeated Access
If your agent reads the same documentation pages frequently, caching saves both time and API calls:
// Cache for 12 hours (Pro plan)
const page = await sf.fetch("https://docs.example.com/api-reference", {
  ttl: 43200,
});
The first request fetches and caches. Subsequent requests within the TTL return instantly from cache.
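The same idea works as a small in-process layer if you want to skip even the round trip to StripFeed for hot pages. A sketch of a tiny TTL cache (the clock is injected so the expiry logic can be tested without waiting):

```python
import time


class TTLCache:
    """Tiny in-process cache: returns a stored value until its TTL expires."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)
```

Wrap your fetch in a `get`-then-`set` and repeated reads of the same URL become dictionary lookups.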
What's Next
This basic pattern (fetch as Markdown, send to LLM) is the foundation for more complex agent workflows:
- RAG pipelines: Ingest web content into your vector database with clean Markdown instead of noisy HTML
- Content monitoring: Periodically fetch pages and detect changes
- Competitive analysis: Batch-fetch competitor pages and compare features
- Documentation Q&A: Build a chatbot that answers questions from any documentation site
Get your free API key and start building.