# rehype-harden

A rehype plugin that ensures that untrusted markdown does not contain images from and links to unexpected origins.

This is particularly important for markdown returned from [LLMs in AI agents which might have been subject to prompt
injection](https://vercel.com/blog/building-secure-ai-agents).

## Secure prefixes

This package validates URL prefixes and URL origins. Prefix allow-lists can be circumvented
with open redirects, so make sure to make the prefixes are specific enough to avoid such attacks.

E.g. it is more secure to allow `https://example.com/images/` than it is to allow all of
`https://example.com/` which may contain open redirects.

Additionally, URLs may contain path traversal like `/../`. This package does not resolve these.
It is your responsibility that your web server does not allow such traversal.

## Features

- 🔒 **URL Filtering**: Blocks links and images that don't match allowed URL prefixes
- 🔧 **Drop-in**: Works with any rehype-compatible pipeline

## Installation

```bash
npm install rehype-harden
# or
yarn add rehype-harden
# or
pnpm add rehype-harden
```

## Quick Start

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remarkRehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://github.com/", "https://docs."],
    allowedImagePrefixes: ["https://via.placeholder.com", "/"],
  })
  .use(/* whatever compiler you want, eg hast-to-jsx-runtime or hast-to-svelte */);
```

## API

### Args

#### `defaultOrigin?: string`

- The origin to resolve relative URLs against
- Required when `allowedLinkPrefixes` or `allowedImagePrefixes` are provided (except when using wildcard `["*"]`)
- When using wildcard `["*"]` without `defaultOrigin`, relative URLs (e.g., `/path`, `./page`) are allowed and preserved as-is
- Example: `"https://mysite.com"`

#### `allowedLinkPrefixes?: string[]`

- Array of URL prefixes that are allowed for links
- Links not matching these prefixes will be blocked and shown as `[blocked]`
- Use `"*"` to allow all URLs (disables filtering. However, `javascript:` and `data:` URLs are always disallowed)
- Default: `[]` (blocks all links)
- Example: `['https://github.com/', 'https://docs.example.com/']` or `['*']`

#### `allowedImagePrefixes?: string[]`

- Array of URL prefixes that are allowed for images
- Images not matching these prefixes will be blocked and shown as placeholders
- Use `"*"` to allow all URLs (disables filtering. However, `javascript:` and `data:` URLs are always disallowed unless `allowDataImages` is enabled)
- Default: `[]` (blocks all images)
- Example: `['https://via.placeholder.com/', '/']` or `['*']`

#### `allowDataImages?: boolean`

- When set to `true`, allows `data:image/*` URLs (base64-encoded images) in image sources
- This is useful for scenarios where images are embedded directly in markdown (e.g., documents converted from PDF or .docx)
- Only `data:image/*` URLs are allowed; other `data:` URLs (like `data:text/html`) remain blocked for security
- `data:` URLs are never allowed in links, regardless of this setting
- Default: `false` (blocks all data: URLs)
- Example: `true`

#### `allowedProtocols?: string[]`

- Array of custom URL protocols that are allowed in links
- Useful for deep links to applications (e.g., `tel:`, `mailto:`, `postman:`, `vscode:`, `slack:`)
- Use `"*"` to allow all protocols that can be parsed as valid URLs
- Dangerous protocols (`javascript:`, `data:`, `file:`, `vbscript:`) are **always blocked** regardless of this setting
- Default: `[]` (only allows built-in safe protocols: `https:`, `http:`, `mailto:`, `irc:`, `ircs:`, `xmpp:`, `blob:`)
- Example: `['tel:', 'postman:', 'vscode:']` or `['*']`

#### `linkBlockPolicy?: BlockPolicyType`

- Controls how blocked links are handled
- `"indicator"` (default): Renders as plain text with `[blocked]` suffix and the blocked URL in a title attribute
- `"text-only"`: Renders just the link text without any indicator or URL
- `"remove"`: Removes the blocked link entirely from the output

#### `imageBlockPolicy?: BlockPolicyType`

- Controls how blocked images are handled
- `"indicator"` (default): Renders as a placeholder span with `[Image blocked: {alt text}]`
- `"text-only"`: Renders just the alt text (images with no alt text are removed)
- `"remove"`: Removes the blocked image entirely from the output

#### `blockedImageClass?: string`

- When an image is blocked with the `"indicator"` policy, the replacement span includes this class for styling.

#### `blockedLinkClass?: string`

- Same as above, but for blocked links using the `"indicator"` policy.

## Examples

### Basic Usage with Default Blocking

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

// Blocks all external links and images by default
const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden) // No options = blocks everything
  .use(/* your compiler */);

const result = processor.processSync(markdownContent);
```

### Allow Specific Domains

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: [
      "https://github.com/",
      "https://docs.github.com/",
      "https://www.npmjs.com/",
    ],
    allowedImagePrefixes: [
      "https://via.placeholder.com/",
      "https://images.unsplash.com/",
      "/", // Allow relative images
    ],
  })
  .use(/* your compiler */);

const result = processor.processSync(markdownContent);
```

### Relative URL Handling

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://mysite.com/"],
    allowedImagePrefixes: ["https://mysite.com/"],
  })
  .use(/* your compiler */);

const markdownWithRelativeUrls = `
[Relative Link](/internal-page)
![Relative Image](/images/logo.png)
`;

const result = processor.processSync(markdownWithRelativeUrls);
```

### Allow All URLs (Wildcard)

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    allowedLinkPrefixes: ["*"],
    allowedImagePrefixes: ["*"],
  })
  .use(/* your compiler */);

const markdownWithExternalUrls = `
[Any Link](https://anywhere.com/link)
![Any Image](https://untrusted-site.com/image.jpg)
[Relative Link](/internal-page)
`;

const result = processor.processSync(markdownWithExternalUrls);
// All URLs are allowed, including relative URLs like /internal-page
```

**Note**: Using `"*"` disables URL filtering entirely. Only use this when you trust the markdown source. When using wildcard without `defaultOrigin`, relative URLs are preserved as-is in the output.

### Allow Base64 Images

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedImagePrefixes: ["https://mysite.com/"],
    allowDataImages: true, // Enable base64 images
  })
  .use(/* your compiler */);

const markdownWithBase64Images = `
![Base64 Image](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==)
![Regular Image](https://mysite.com/image.png)
`;

const result = processor.processSync(markdownWithBase64Images);
```

**Note**: This is particularly useful when converting documents from formats like PDF or .docx where images are embedded as base64. Only `data:image/*` URLs are allowed; other data: URLs remain blocked for security.

### Blob URLs

Blob URLs (`blob:`) are automatically allowed by default for both links and images. These are browser-generated URLs that reference in-memory objects and are commonly used for:
- Previewing user-uploaded files before upload
- Client-side image manipulation
- Displaying generated content

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedImagePrefixes: ["https://mysite.com/"],
  })
  .use(/* your compiler */);

const markdownWithBlobUrl = `
![Preview](blob:https://example.com/40a5fb5a-d56d-4a33-b4e2-0acf6a8e5f64)
`;

const result = processor.processSync(markdownWithBlobUrl);
// The blob: URL will be allowed even without being in allowedImagePrefixes
```

**Note**: Blob URLs are safe because they can only reference content already loaded in the browser's memory. They cannot be used to exfiltrate data or load external resources.

### Custom Protocol Support

Enable custom protocols for deep linking to applications and services:

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    allowedProtocols: ['tel:', 'mailto:', 'postman:', 'vscode:', 'slack:'],
  })
  .use(/* your compiler */);

const markdownWithCustomProtocols = `
[Call us](tel:+1234567890)
[Email support](mailto:support@example.com)
[Open in Postman](postman://open/collection)
[View in VS Code](vscode://file/path/to/file.ts)
[Join Slack](slack://channel?id=C123456)
`;

const result = processor.processSync(markdownWithCustomProtocols);
// All these custom protocol links will be allowed
```

**Common use cases:**
- **`tel:`** - Phone number links that open the dialer on mobile devices
- **`mailto:`** - Email links (allowed by default, but shown here for completeness)
- **`sms:`** - SMS/text message links
- **`postman:`**, **`vscode:`**, **`slack:`** - Deep links to desktop applications
- **Custom app protocols** - Links to your own Electron or native applications

You can also use the wildcard to allow any custom protocol:

```ts
const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    allowedProtocols: ['*'], // Allow all protocols
  })
  .use(/* your compiler */);
```

**Security Note**: Even with `allowedProtocols: ['*']`, dangerous protocols like `javascript:`, `data:`, `file:`, and `vbscript:` are **always blocked** for security. Custom protocols are safe because they trigger OS-level protocol handlers and don't execute in the browser context.

### Block Policies

Control how blocked content is handled instead of the default `[blocked]` indicator:

```ts
import { harden, BlockPolicy } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remarkRehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://trusted.com/"],
    allowedImagePrefixes: ["https://trusted.com/"],
    linkBlockPolicy: "text-only", // Show link text only, no [blocked] indicator
    imageBlockPolicy: "remove", // Remove blocked images entirely
  })
  .use(/* your compiler */);
```

Available policies: `"indicator"` (default), `"text-only"`, `"remove"`.

### Custom Styling for Blocked Content

```ts
import { harden } from "rehype-harden";
import remarkParse from "remark-parse";
import remarkRehype from "remark-rehype";
import { unified } from "unified";

const processor = unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(harden, {
    defaultOrigin: "https://mysite.com",
    allowedLinkPrefixes: ["https://trusted.com/"],
    allowedImagePrefixes: ["https://trusted.com/"],
    blockedLinkClass: "blocked-link",
    blockedImageClass: "blocked-image",
  })
  .use(/* your compiler */);

const result = processor.processSync(markdownContent);
```

## Security Features

### URL Filtering

- **Links**: Filters `href` attributes in `<a>` elements
- **Images**: Filters `src` attributes in `<img>` elements
- **Relative URLs**: Properly resolves and validates relative URLs against `defaultOrigin`
- **Path Traversal Protection**: Normalizes URLs to prevent `../` attacks
- **Wildcard Support**: Use `"*"` prefix to disable filtering (only when markdown is trusted)
- **Prefix Matching**: Validates that URLs start with allowed prefixes and have matching origins

### Blocked Content Handling

Behavior is configurable per element type via `linkBlockPolicy` and `imageBlockPolicy`:

- **`"indicator"`** (default): Blocked links show a `[blocked]` suffix; blocked images show `[Image blocked: {alt}]`
- **`"text-only"`**: Outputs just the link text or image alt text with no indicator
- **`"remove"`**: Removes blocked elements entirely from the output

### Attack Prevention

- **XSS Prevention**: Blocks `javascript:`, `data:`, `vbscript:`, `file:` and other dangerous protocols (always, regardless of configuration)
- **Redirect Protection**: Prevents unauthorized redirects to malicious sites
- **Tracking Prevention**: Blocks unauthorized image tracking pixels
- **Domain Spoofing**: Validates full URLs, not just domains
- **Safe Protocols**: Allows safe protocols including `https:`, `http:`, `mailto:`, `blob:`, and others while blocking dangerous ones
- **Custom Protocols**: Optional support for custom protocols (e.g., `tel:`, `postman:`, `vscode:`) with explicit opt-in via `allowedProtocols`

## Testing

The package includes comprehensive tests covering:

- Basic markdown rendering
- URL filtering for links and images
- Relative URL handling
- Security bypass prevention
- Edge cases and malformed URLs
- TypeScript type safety

Run tests:

```bash
pnpm test
```

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

MIT License - see the [LICENSE](LICENSE) file for details.

## Security

If you discover a security vulnerability, please send an e-mail to <security@vercel.com>.