Agent package

Capture useful UI screenshots from Node.js.

@domshot/agent gives agents and automation tools a camera for web pages. It can capture a known selector, find a target such as "pricing card", inspect candidates, plan shots, and create polished PNG packs.

The mental model

DOMShot should not be the brain of your product or writing workflow. The calling agent decides the story, the section, and which visual supports that story. DOMShot provides the eyes and camera: it can inspect visible UI, find likely elements, capture them, style them, and return confidence, warnings, and file paths.

Eyes

Inspect the page, score visible candidates, and explain why an element looks useful.

Camera

Capture one element, several similar elements, or a planned shot pack as PNG files.

Evidence

Return selectors, dimensions, confidence, warnings, contact sheets, and reports for review.

Install

Install the npm package in the project where the agent or script will run. Playwright Chromium is the local browser DOMShot uses for normal public page captures. If the package is not visible on npm yet, the public release is still pending approval.

npm install @domshot/agent
npx playwright install chromium

Expected result: Node can import @domshot/agent, and CLI/MCP commands can start a local browser.

Selector mode vs target mode

Use `selector` when you know the exact CSS selector; this is the precise path. Use `target` when the agent only knows the meaning of the thing it wants, such as `pricing card`, `download button`, or `balance card`. If both are provided, `selector` wins.

Mode	Use when	Tradeoff
`selector`	You know exactly what element should be captured.	Most exact, but the caller must know the page structure.
`target`	The agent knows what it wants but not the CSS selector.	More intuitive, but dense pages may need inspect, kinds, steps, or retry.
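
The precedence rule above can be sketched as a small pre-flight check that an agent runs before calling DOMShot. This is an illustrative helper, not part of @domshot/agent: the name `resolveCaptureMode` is hypothetical, and throwing when neither option is set is this sketch's choice, not documented library behavior.

```javascript
// Illustrative helper (not part of @domshot/agent): report which mode a set of
// capture options would use, following the rule that selector wins over target.
function resolveCaptureMode(options) {
  const selector = typeof options.selector === "string" ? options.selector.trim() : "";
  const target = typeof options.target === "string" ? options.target.trim() : "";

  if (selector) return { mode: "selector", value: selector };
  if (target) return { mode: "target", value: target };

  // Assumption of this sketch: treat "neither provided" as a caller error.
  throw new Error("Provide either a selector or a target");
}

console.log(resolveCaptureMode({ selector: "h1", target: "pricing card" }).mode); // prints: selector
console.log(resolveCaptureMode({ target: "pricing card" }).mode); // prints: target
```

Logging the resolved mode next to each capture makes it easier to see later whether a screenshot came from the exact path or the discovery path.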

Capture one exact element

Use this when you already know the selector. DOMShot opens the URL, waits for that element, captures it, applies the selected preset, and writes the PNG.

import { captureDomshot } from "@domshot/agent";

const result = await captureDomshot({
  url: "https://example.com",
  selector: "h1",
  output: "artifacts/example-heading.png",
  preset: "floating"
});

console.log(result.path, result.width, result.height);

Expected output: artifacts/example-heading.png and a result object with path, width, height, selector, and fallback status, plus warnings if anything needs review.

Capture by target

Use this when an agent can describe the element in normal language. DOMShot scans visible elements, scores likely matches using DOM and visual signals, chooses the best candidate, and then captures that resolved selector.

import { captureDomshot } from "@domshot/agent";

const result = await captureDomshot({
  url: "https://example.com",
  target: "pricing card",
  output: "artifacts/pricing-card.png",
  background: "transparent",
  shadow: "soft",
  aspectRatio: "1:1"
});

console.log(result.selector, result.target?.score);

Expected output: a styled square PNG, plus the selector DOMShot chose and sanitized target metadata. If the selected element is not right, inspect first or retry with a more specific target.
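
The "retry with a more specific target" advice can be sketched as a loop over progressively narrower targets. This is a minimal sketch under stated assumptions: `captureWithRetry` and `minScore` are hypothetical names, the capture function is passed in (in real use that would be `captureDomshot`), and it assumes `result.target.score` is a number where higher is better. The score scale is not documented on this page, so the threshold is yours to choose.

```javascript
// Hypothetical helper: try targets in order until one scores at or above
// minScore. `capture` is injected so that captureDomshot, or a stub in tests,
// can be passed in. Assumes result.target.score is numeric, higher = better.
async function captureWithRetry(capture, baseOptions, targets, minScore) {
  const attempts = [];
  for (const target of targets) {
    const result = await capture({ ...baseOptions, target });
    const score = result.target?.score ?? 0; // a missing score counts as 0
    attempts.push({ target, score, selector: result.selector });
    if (score >= minScore) {
      // Stop at the first acceptable match, so the file on disk is this one.
      return { accepted: result, attempts };
    }
  }
  // Nothing passed: return the attempt log so the agent can decide what to do.
  return { accepted: null, attempts };
}

// Usage sketch (threshold and targets are illustrative):
// const { accepted, attempts } = await captureWithRetry(
//   captureDomshot,
//   { url: "https://example.com", output: "artifacts/pricing-card.png" },
//   ["pricing card", "pro plan pricing card"],
//   0.7
// );
```

Because every attempt writes to the same output path, the sketch stops at the first acceptable score instead of picking the best attempt afterwards. When `accepted` is null, the attempt log is a good input for an inspect step.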

Use an existing Playwright page

Use this when your automation already controls a Playwright page. DOMShot reuses that page instead of opening a new browser. This is useful when your script has already logged in, clicked a tab, or prepared page state.

import { chromium } from "playwright";
import { captureElement } from "@domshot/agent";

const browser = await chromium.launch();
const page = await browser.newPage();

try {
  await page.goto("https://example.com");
  await captureElement(page, {
    selector: "h1",
    output: "artifacts/heading.png",
    background: "transparent"
  });
} finally {
  await browser.close();
}

Next step: if the page is private, or you are logged in outside Playwright, use the CDP or profile mode described in the CLI/MCP troubleshooting docs.

Plan before capture

A plan is a dry run. It helps the agent decide what should be captured before writing final PNG files. Use it for a feature section, blog post, docs page, pricing page, or homepage section when the agent needs to compare candidates and avoid weak visuals.

import { createDomshotPlan } from "@domshot/agent";

const plan = await createDomshotPlan({
  url: "https://example.com",
  intent: "homepage feature section",
  target: "feature cards",
  count: 4,
  output: ".domshot/plan.json",
  style: "auto"
});

console.log(plan.shots.map((shot) => ({
  selector: shot.selector,
  suggestedOutput: shot.suggestedOutput,
  decision: shot.decision.decision,
  why: shot.decision.why,
  risks: shot.decision.risks
})));

Expected output: a JSON plan with planned shots, selectors, suggested file names, style suggestions, decisions like use or review, and retry advice. It does not contain screenshot payloads.
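
Once a plan exists, the agent usually needs to separate shots it can capture right away from shots that need a second look. The sketch below does that with plain data. The field names follow the plan example above, but `triagePlan` is a hypothetical helper, the literal decision strings `"use"` and `"review"` are assumed from the description, and the sample plan values are made up for illustration.

```javascript
// Split planned shots into those marked "use" and everything else.
// Assumption: decision values are the strings "use" and "review"; any other
// value is routed to review to stay on the safe side.
function triagePlan(plan) {
  const ready = [];
  const needsReview = [];
  for (const shot of plan.shots ?? []) {
    if (shot.decision?.decision === "use") {
      ready.push({ selector: shot.selector, output: shot.suggestedOutput });
    } else {
      needsReview.push({
        selector: shot.selector,
        why: shot.decision?.why,
        risks: shot.decision?.risks ?? []
      });
    }
  }
  return { ready, needsReview };
}

// Illustrative plan data, not real DOMShot output.
const samplePlan = {
  shots: [
    { selector: ".feature-card:nth-of-type(1)", suggestedOutput: "feature-1.png",
      decision: { decision: "use", why: "clear card", risks: [] } },
    { selector: ".feature-card:nth-of-type(2)", suggestedOutput: "feature-2.png",
      decision: { decision: "review", why: "partly covered", risks: ["overlap"] } }
  ]
};

const triaged = triagePlan(samplePlan);
console.log(triaged.ready.length, triaged.needsReview.length); // prints: 1 1
```

The `ready` entries can be passed to selector-mode captures one by one, while `needsReview` keeps the reasons and risks together for the calling agent.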

Create a shot pack

A shot pack is the higher-level workflow. Use it when the agent needs final image assets for a homepage section, blog post, docs block, pricing page, or social post. It inspects the page, chooses useful candidates, captures PNGs, writes a contact sheet, and produces a sanitized report.

import { createDomshotShotPack } from "@domshot/agent";

const pack = await createDomshotShotPack({
  url: "https://example.com",
  intent: "homepage feature section",
  target: "feature cards",
  count: 4,
  outputDir: "artifacts/homepage-shot-pack",
  style: "auto"
});

console.log(pack.contactSheet, pack.report);

Expected output: multiple PNG files, a contact sheet for visual comparison, and a report.json with selected reasons, rejected candidates, confidence, warnings, and style choices.

Plan vs shot pack

Use a plan when the agent should think before writing images. Use a shot pack when you want DOMShot to inspect, plan, capture, and write the full set in one workflow.

Plan

No final PNGs are written. Good for review, debugging, and deciding what should be captured.

Shot pack

Writes polished PNGs, a contact sheet, and a report. Good for content and landing-page assets.

Capture set

Captures several similar elements, like pricing cards, when you already know the group you want.

Useful options

These are the options most agents need first. Styling options change the finished PNG; discovery options help DOMShot find or prepare the right element.

Option	Use
`selector`	Exact CSS selector to capture.
`target`	Plain-language target such as `pricing card` or `download button`.
`intent`	Purpose for recommendations, plans, and shot packs.
`background`	`transparent`, `aurora`, `ocean`, `sunset`, `graphite`, `paper`, or `onyx`.
`shadow`	`none`, `soft`, `deep`, or `crisp`.
`aspectRatio`	`auto`, `1:1`, `4:3`, `16:9`, and other supported ratios.
`outputWidth`	Final PNG width for polished marketing or docs assets.
`steps`	Click, wait, scroll, or delay before inspecting or capturing.

If a page has a cookie banner, hidden tab, lazy content, or modal, use steps so the page is in the right state before DOMShot inspects it.
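
When choosing `outputWidth` and `aspectRatio` together, it helps to know roughly how tall the finished image will be. The sketch below is plain arithmetic, assuming the ratio reads as width:height and `outputWidth` is in pixels. `heightForRatio` is a hypothetical helper, and DOMShot's own rounding or padding may differ.

```javascript
// Estimate the final height for a given output width and ratio string.
// Returns null for "auto", where the height depends on the captured element.
function heightForRatio(outputWidth, aspectRatio) {
  if (aspectRatio === "auto") return null;

  const match = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(aspectRatio);
  if (!match) throw new Error(`Unrecognized ratio: ${aspectRatio}`);

  const ratioWidth = Number(match[1]);
  const ratioHeight = Number(match[2]);
  if (ratioWidth === 0 || ratioHeight === 0) {
    throw new Error(`Ratio parts must be non-zero: ${aspectRatio}`);
  }

  // height = width * (ratioHeight / ratioWidth), rounded to whole pixels
  return Math.round((outputWidth * ratioHeight) / ratioWidth);
}

console.log(heightForRatio(1600, "16:9")); // prints: 900
console.log(heightForRatio(1200, "4:3"));  // prints: 900
console.log(heightForRatio(1080, "1:1"));  // prints: 1080
```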

Result metadata

Results include file paths, dimensions, selector, fallback status, warnings, and sanitized target/candidate metadata. DOMShot does not return cookies, tokens, raw page text, or base64 image payloads in planning metadata.

Use metadata as evidence, not as a final creative decision. Confidence, warnings, rejected candidates, and contact sheets help the agent decide whether the screenshot supports the story or whether it should retry with another target, kind, step, or selector.
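
One way to treat metadata as evidence is a small review gate that lists reasons for a second look rather than making the final call. This is a sketch under loud assumptions: `reviewCapture` is a hypothetical helper, and `warnings` (an array) and `fallback` (a boolean) are assumed property names. This page only says that results carry warnings and a fallback status, so adjust the names to the actual result shape.

```javascript
// Hypothetical review gate. `warnings` and `fallback` are ASSUMED property
// names; `width` and `height` appear in the capture examples above.
function reviewCapture(result) {
  const reasons = [];
  if (Array.isArray(result.warnings) && result.warnings.length > 0) {
    reasons.push(`${result.warnings.length} warning(s)`);
  }
  if (result.fallback === true) {
    reasons.push("fallback was used");
  }
  if (!result.width || !result.height) {
    reasons.push("missing dimensions");
  }
  // ok only means "no flags raised"; the agent still judges the story fit.
  return { ok: reasons.length === 0, reasons };
}

console.log(
  reviewCapture({ path: "a.png", width: 800, height: 600, warnings: [], fallback: false }).ok
); // prints: true
```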
