API reference (Web)

Use this doc when you need to customize Midscene's browser automation agents or review browser-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).

Action Space

PuppeteerAgent, PlaywrightAgent, and Chrome Bridge share one action space; the Midscene Agent can use these actions while planning tasks:

  • Tap — Left-click an element.
  • RightClick — Right-click an element.
  • DoubleClick — Double-click an element.
  • Hover — Hover over an element.
  • Input — Enter text with replace/append/clear modes.
  • KeyboardPress — Press a specified key (optionally focusing a target element first).
  • Scroll — Scroll from an element or screen center; supports scroll-to-top/bottom/left/right helpers.
  • DragAndDrop — Drag from one element to another.
  • LongPress — Long-press a target element with optional duration.
  • Swipe — Touch-style swipe gesture (available when enableTouchEventsInActionSpace is true).
  • ClearInput — Clear the contents of an input field.
  • Navigate — Open a URL in the current tab.
  • Reload — Reload the page.
  • GoBack — Navigate back in history.
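
As a rough illustration of how the touch option gates the list above, here is a minimal sketch. The `WebActionName` type, the `BASE_ACTIONS` array, and the `availableActions` helper are hypothetical local names for this example only, not Midscene exports. The only rule modeled is the one stated above: Swipe is offered when enableTouchEventsInActionSpace is true.

```typescript
// Hypothetical local model of the action names listed above; not imported from Midscene.
type WebActionName =
  | 'Tap' | 'RightClick' | 'DoubleClick' | 'Hover' | 'Input'
  | 'KeyboardPress' | 'Scroll' | 'DragAndDrop' | 'LongPress'
  | 'Swipe' | 'ClearInput' | 'Navigate' | 'Reload' | 'GoBack';

// Every action from the list except Swipe, which is gated by the touch option.
const BASE_ACTIONS: WebActionName[] = [
  'Tap', 'RightClick', 'DoubleClick', 'Hover', 'Input',
  'KeyboardPress', 'Scroll', 'DragAndDrop', 'LongPress',
  'ClearInput', 'Navigate', 'Reload', 'GoBack',
];

// Returns a fresh array of the action names available for the given setting.
function availableActions(
  enableTouchEventsInActionSpace: boolean = false,
): WebActionName[] {
  const touchActions: WebActionName[] = ['Swipe'];
  return enableTouchEventsInActionSpace
    ? BASE_ACTIONS.concat(touchActions)
    : BASE_ACTIONS.slice();
}
```

Calling `availableActions()` yields the thirteen non-touch actions; `availableActions(true)` appends Swipe.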

PuppeteerAgent

Use Midscene against a Puppeteer-controlled browser when you need AI actions in your own Puppeteer workflows.

Import

import { PuppeteerAgent } from '@midscene/web/puppeteer';

Constructor

const agent = new PuppeteerAgent(page, {
  // browser-specific options...
});

Browser-specific options

In addition to the base agent options, Puppeteer exposes:

  • forceSameTabNavigation: boolean — Restrict navigation to the current tab. Default true.
  • waitForNavigationTimeout: number — Maximum wait when a step causes navigation. Default 5000 (set 0 to skip waiting).
  • waitForNetworkIdleTimeout: number — Wait for network idle between actions to reduce flakiness. Default 2000 (set 0 to skip waiting).
  • enableTouchEventsInActionSpace: boolean — Add touch gestures (like swipe) to the action space so the agent can handle touch-only interactions. Default false.
  • forceChromeSelectRendering: boolean — Force select elements to render with Chrome's base-select styling so they're visible in screenshots/element extraction; requires Puppeteer > 24.6.0.
  • customActions: DeviceAction[] — Register bespoke actions defined via defineAction so planning can call domain-specific steps.
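
The documented defaults can be summarized in a small sketch. `BrowserSpecificOptions`, `DOCUMENTED_DEFAULTS`, and `resolveOptions` are hypothetical names that mirror the list above; the real option types ship with @midscene/web, and how the library merges defaults internally is not shown in this reference. forceChromeSelectRendering and customActions are left out because no default is documented for them.

```typescript
// Hypothetical local mirror of the options listed above; not the library's own type.
interface BrowserSpecificOptions {
  forceSameTabNavigation?: boolean;
  waitForNavigationTimeout?: number;
  waitForNetworkIdleTimeout?: number;
  enableTouchEventsInActionSpace?: boolean;
}

// Defaults exactly as stated in the option list.
const DOCUMENTED_DEFAULTS = {
  forceSameTabNavigation: true,
  waitForNavigationTimeout: 5000,
  waitForNetworkIdleTimeout: 2000,
  enableTouchEventsInActionSpace: false,
};

// Fill in documented defaults for any option the caller left undefined.
// `??` is used instead of `||` so an explicit 0 (skip waiting) is preserved.
function resolveOptions(overrides: BrowserSpecificOptions = {}) {
  return {
    forceSameTabNavigation:
      overrides.forceSameTabNavigation ?? DOCUMENTED_DEFAULTS.forceSameTabNavigation,
    waitForNavigationTimeout:
      overrides.waitForNavigationTimeout ?? DOCUMENTED_DEFAULTS.waitForNavigationTimeout,
    waitForNetworkIdleTimeout:
      overrides.waitForNetworkIdleTimeout ?? DOCUMENTED_DEFAULTS.waitForNetworkIdleTimeout,
    enableTouchEventsInActionSpace:
      overrides.enableTouchEventsInActionSpace ??
      DOCUMENTED_DEFAULTS.enableTouchEventsInActionSpace,
  };
}
```

The choice of `??` matters here: both timeout options treat 0 as a meaningful value, so a falsy check would silently restore the default.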

Usage notes

  • One agent per page: by default (forceSameTabNavigation: true), Midscene opens new links in the current tab for easier debugging. Set it to false if you want new tabs, and create a new agent per tab.
  • For the full list of interaction methods, see API reference (Common).

Examples

Quick start

import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.ebay.com');

const agent = new PuppeteerAgent(page, {
  actionContext: 'When a cookie dialog appears, accept it.',
});

await agent.aiAct('search "Noise cancelling headphones" and open first result');
const items = await agent.aiQuery(
  '{itemTitle: string, price: number}[], list two products with price',
);
console.log(items);

await agent.aiAssert('there is a category filter on the left sidebar');
await browser.close();

Connect to a remote Puppeteer browser

import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

const browser = await puppeteer.connect({
  browserWSEndpoint: process.env.REMOTE_CDP_URL!,
});

// Reuse the first open page, or create one if the browser has none.
const pages = await browser.pages();
const page = pages[0] ?? (await browser.newPage());
const agent = new PuppeteerAgent(page, {
  waitForNetworkIdleTimeout: 0,
});

await agent.aiAct('open https://example.com and click the login button');
await agent.destroy();
await browser.disconnect();

PlaywrightAgent

Use Midscene inside a Playwright browser for AI-driven testing or automation alongside your Playwright flows.

Import

import { PlaywrightAgent } from '@midscene/web/playwright';

Constructor

const agent = new PlaywrightAgent(page, {
  // browser-specific options...
});

Browser-specific options

  • forceSameTabNavigation: boolean — Keep automation inside the active tab. Default true.
  • waitForNavigationTimeout: number — Wait time for navigation completion. Default 5000 (set 0 to disable).
  • waitForNetworkIdleTimeout: number — Wait between actions for network idle. Default 2000 (set 0 to disable).
  • enableTouchEventsInActionSpace: boolean — Add touch gestures (like swipe) to the action space so the agent can handle touch-only interactions. Default false.
  • forceChromeSelectRendering: boolean — Force select elements to render with Chrome's base-select styling so they're visible in screenshots/element extraction; requires Playwright ≥ 1.52.0.
  • customActions: DeviceAction[] — Extend planning with project-specific actions.

Usage notes

  • One agent per page: with forceSameTabNavigation (default true), Midscene intercepts new tabs for stability. Set it to false to allow new tabs and create a separate agent for each.
  • For the full list of interaction methods, see API reference (Common).

Examples

Quick start

import { chromium } from 'playwright';
import { PlaywrightAgent } from '@midscene/web/playwright';

const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
await page.goto('https://www.ebay.com');

const agent = new PlaywrightAgent(page);
await agent.aiAct('search "Noise cancelling headphones" and wait for results');
await agent.aiWaitFor('the results grid becomes visible');

const price = await agent.aiNumber('price of the first headphone');
console.log('first price', price);

await agent.aiTap('the first result card');
await browser.close();

Extend Playwright tests with Midscene fixtures

// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  reporter: [['list'], ['@midscene/web/playwright-reporter']],
});

// e2e/fixture.ts
import { test as base } from '@playwright/test';
import { PlaywrightAiFixture } from '@midscene/web/playwright';

export const test = base.extend(
  PlaywrightAiFixture({ waitForNetworkIdleTimeout: 1000 }),
);

// e2e/examples.spec.ts
import { test } from './fixture';

test('search flow', async ({ agentForPage, page }) => {
  await page.goto('https://www.ebay.com');
  const agent = await agentForPage(page);
  await agent.aiAct('search "keyboard" and open first listing');
  await agent.aiAssert('a product detail page is opened');
});

Chrome Bridge Agent

Bridge Mode lets Midscene operate your currently active desktop Chrome tab through the extension instead of launching a dedicated automation browser.

Import

import { AgentOverChromeBridge } from '@midscene/web/bridge-mode';

Constructor

const agent = new AgentOverChromeBridge({
  allowRemoteAccess: false,
  // other bridge options...
});

Bridge options

  • closeNewTabsAfterDisconnect?: boolean — Close any bridge-created tabs when the agent is destroyed. Default false.
  • allowRemoteAccess?: boolean — Allow remote machines to attach. Defaults to false (binds to 127.0.0.1).
  • host?: string — Override the interface for the bridge server. Takes precedence over allowRemoteAccess.
  • port?: number — TCP port for the bridge server. Default 3766.

See Bridge Mode by Chrome extension for full installation and capability details.
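
The precedence between host and allowRemoteAccess in the bridge options can be sketched as follows. `BridgeServerOptions` and `resolveBridgeAddress` are hypothetical names for this example. The `0.0.0.0` bind address used when allowRemoteAccess is true is an assumption; this reference only documents 127.0.0.1 for the default case.

```typescript
// Hypothetical local mirror of the server-related bridge options.
interface BridgeServerOptions {
  allowRemoteAccess?: boolean;
  host?: string;
  port?: number;
}

// Work out where the bridge server would listen under the documented rules.
function resolveBridgeAddress(
  options: BridgeServerOptions = {},
): { host: string; port: number } {
  // ASSUMPTION: '0.0.0.0' (all interfaces) stands in for the remote-access case.
  const fallbackHost = options.allowRemoteAccess ? '0.0.0.0' : '127.0.0.1';
  return {
    // An explicit host takes precedence over allowRemoteAccess.
    host: options.host ?? fallbackHost,
    // 3766 is the documented default port.
    port: options.port ?? 3766,
  };
}
```

With no options the result is `127.0.0.1:3766`; passing `host` wins regardless of the allowRemoteAccess value.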

Usage notes

Call connectCurrentTab or connectNewTabWithUrl before issuing any other action. Each AgentOverChromeBridge instance can attach to only one tab; after calling destroy(), create a new instance to connect again.
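
The ordering rule above can be pictured as a three-state lifecycle. The sketch below is a standalone model, not the library's implementation: `BridgeLifecycleModel` and its methods are hypothetical, and whether the real agent throws in these situations is not stated in this reference.

```typescript
// Hypothetical model of the connect -> act -> destroy ordering described above.
type BridgeState = 'created' | 'connected' | 'destroyed';

class BridgeLifecycleModel {
  private state: BridgeState = 'created';

  // Stands in for connectCurrentTab / connectNewTabWithUrl.
  connect(): void {
    if (this.state !== 'created') {
      throw new Error('This instance already attached to a tab; create a new instance.');
    }
    this.state = 'connected';
  }

  // Stands in for any action issued after connecting.
  act(): string {
    if (this.state !== 'connected') {
      throw new Error('Call connectCurrentTab or connectNewTabWithUrl first.');
    }
    return 'action allowed';
  }

  // Stands in for destroy(); the instance cannot be reused afterwards.
  destroy(): void {
    this.state = 'destroyed';
  }
}
```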

Bridge methods

connectCurrentTab()

function connectCurrentTab(options?: {
  forceSameTabNavigation?: boolean;
}): Promise<void>;
  • options.forceSameTabNavigation (default true) intercepts new tabs and opens them in the current tab to simplify debugging. Set it to false for normal new-tab behavior, and create a separate agent per tab.
  • Resolves on a successful handshake with the active tab; rejects if the extension is not allowed to connect.

connectNewTabWithUrl()

function connectNewTabWithUrl(
  url: string,
  options?: { forceSameTabNavigation?: boolean },
): Promise<void>;
  • url — Address to open in a new desktop tab before attaching.
  • options — Same as connectCurrentTab.
  • Resolves when the new tab is opened and the bridge is connected.

destroy()

function destroy(closeNewTabsAfterDisconnect?: boolean): Promise<void>;
  • closeNewTabsAfterDisconnect — Optional runtime override for the constructor setting; true closes bridge-created tabs on teardown.
  • Resolves after the bridge connection and local server are fully cleaned up.
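
One way to read the runtime override is sketched below. `shouldCloseTabs` is a hypothetical helper, and the assumption that an explicit false passed to destroy() also overrides a constructor value of true is an interpretation of "runtime override", not something this reference confirms.

```typescript
// Hypothetical helper: decide whether bridge-created tabs get closed on teardown.
// ASSUMPTION: any explicit runtime value (true or false) wins over the constructor setting.
function shouldCloseTabs(
  constructorSetting?: boolean,
  runtimeOverride?: boolean,
): boolean {
  // Fall back to the documented default (false) when neither value is given.
  return runtimeOverride ?? constructorSetting ?? false;
}
```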

Examples

Open a new desktop tab

import { AgentOverChromeBridge } from '@midscene/web/bridge-mode';

const agent = new AgentOverChromeBridge();
await agent.connectNewTabWithUrl('https://www.bing.com');

await agent.ai('search "AI automation" and summarise first result');
await agent.aiAssert('some search results show up');
await agent.destroy();

Attach to current tab

import { AgentOverChromeBridge } from '@midscene/web/bridge-mode';

const agent = new AgentOverChromeBridge({
  allowRemoteAccess: false,
  closeNewTabsAfterDisconnect: true,
});

await agent.connectCurrentTab({ forceSameTabNavigation: true });
await agent.aiAct('open Gmail and report how many unread emails are visible');
await agent.destroy();
