Integrate with Puppeteer

Puppeteer is a Node.js library that provides a high-level API to control Chrome or Firefox over the DevTools Protocol or WebDriver BiDi. Puppeteer runs in headless mode (no visible UI) by default, but it can be configured to run a visible ("headful") browser.

Demo Projects

Set up API keys for model

Set your model configuration as environment variables:

export MIDSCENE_MODEL_BASE_URL="https://replace-with-your-model-service-url/v1"
export MIDSCENE_MODEL_API_KEY="replace-with-your-api-key"
export MIDSCENE_MODEL_NAME="replace-with-your-model-name"
export MIDSCENE_MODEL_FAMILY="replace-with-your-model-family"

For more configuration details, please refer to Model strategy and Model configuration.
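If a variable is missing, the script fails only when the first AI call runs. A minimal sketch of a fail-fast check you could call at the top of your script, assuming all four variables above are required for your setup; missingModelVars and assertModelEnv are hypothetical helpers, not part of Midscene:

```typescript
// Hypothetical helpers, not part of Midscene: assumes all four variables
// listed above are required for your model setup.
const REQUIRED_MODEL_VARS = [
  'MIDSCENE_MODEL_BASE_URL',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_NAME',
  'MIDSCENE_MODEL_FAMILY',
] as const;

export function missingModelVars(
  env: Record<string, string | undefined> = process.env,
): string[] {
  // A variable counts as missing if it is unset or only whitespace
  return REQUIRED_MODEL_VARS.filter((name) => !env[name]?.trim());
}

export function assertModelEnv(
  env: Record<string, string | undefined> = process.env,
): void {
  const missing = missingModelVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing model configuration: ${missing.join(', ')}`);
  }
}
```

Calling assertModelEnv() before puppeteer.launch() surfaces a configuration mistake before any browser work starts.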

Integration with Midscene Agent

Step 1. Install dependencies

npm install @midscene/web puppeteer tsx --save-dev

Step 2. Write scripts

Write and save the following code as ./demo.ts.

import puppeteer from "puppeteer";
import { PuppeteerAgent } from "@midscene/web/puppeteer";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    const browser = await puppeteer.launch({
      headless: false, // here we use headed mode to help debug
    });

    const page = await browser.newPage();
    await page.setViewport({
      width: 1280,
      height: 800,
      deviceScaleFactor: 1,
    });

    await page.goto("https://www.ebay.com");
    await sleep(5000);

    // 👀 init Midscene agent
    const agent = new PuppeteerAgent(page);

    // 👀 type keywords, perform a search
    await agent.aiAct('type "Headphones" in search box, hit Enter');
    await sleep(5000);

    // 👀 understand the page content, find the items
    const items = await agent.aiQuery(
      "{itemTitle: string, price: number}[], find the items in the list and their prices"
    );
    console.log("headphones in stock", items);

    // 👀 assert by AI
    await agent.aiAssert("There is a category filter on the left");

    await browser.close();
  })()
);
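The result of aiQuery comes from a model, so it is not guaranteed to match the shape requested in the prompt. A minimal sketch of a runtime check you could apply to items before using them; the Item interface mirrors the prompt above, and isItemList is a hypothetical helper, not a Midscene API:

```typescript
// Mirrors the shape requested in the aiQuery prompt
interface Item {
  itemTitle: string;
  price: number;
}

// Hypothetical type guard, not part of Midscene: verify the model output
// is an array of { itemTitle: string, price: number } objects.
export function isItemList(value: unknown): value is Item[] {
  return (
    Array.isArray(value) &&
    value.every(
      (v) =>
        typeof v === 'object' &&
        v !== null &&
        typeof (v as Item).itemTitle === 'string' &&
        typeof (v as Item).price === 'number' &&
        Number.isFinite((v as Item).price),
    )
  );
}
```

In demo.ts you could write if (!isItemList(items)) throw new Error("unexpected aiQuery result"); after the query, and TypeScript then treats items as Item[].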

Step 3. Run

Run the script with tsx. It prints the headphone listings found on eBay:

# run
npx tsx demo.ts

# it should print something like this (live results vary):
#  [
#   {
#     itemTitle: 'Beats by Dr. Dre Studio Buds Totally Wireless Noise Cancelling In Ear + OPEN BOX',
#     price: 505.15
#   },
#   {
#     itemTitle: 'Skullcandy Indy Truly Wireless Earbuds-Headphones Green Mint',
#     price: 186.69
#   }
# ]

For the complete catalog of agent methods, see the API reference.

Step 4. View the report

When the command finishes successfully, the console prints: Midscene - report file updated: /path/to/report/some_id.html. Open this file in a browser to view the report.

Advanced

About opening in a new tab

Each Agent instance is bound to a single page. For easier debugging, Midscene intercepts new tabs by default (for example, links with target="_blank") and opens them in the current page.

To allow new tabs again, set forceSameTabNavigation to false. In that case, you must create a new Agent instance for each new tab.

const mid = new PuppeteerAgent(page, {
  forceSameTabNavigation: false,
});
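Because each agent is bound to one page, a small lookup that creates at most one agent per page can keep multi-tab scripts tidy. A minimal sketch, assuming you build agents with a factory such as (p) => new PuppeteerAgent(p, { forceSameTabNavigation: false }); createAgentRegistry is a hypothetical helper, not part of Midscene:

```typescript
// Hypothetical helper, not part of Midscene: return the same agent for the
// same page and create a new one the first time a page is seen.
export function createAgentRegistry<P extends object, A>(
  factory: (page: P) => A,
): (page: P) => A {
  // WeakMap lets closed pages and their agents be garbage-collected
  const agents = new WeakMap<P, A>();
  return (page: P): A => {
    let agent = agents.get(page);
    if (agent === undefined) {
      agent = factory(page);
      agents.set(page, agent);
    }
    return agent;
  };
}
```

Puppeteer pages emit a "popup" event when they open a new tab or window, so one way to use this is to listen for that event on the original page and call the registry with the new page to get its own agent.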

Connect Midscene Agent to a Remote Puppeteer Browser

Example Project

You can find an example project of remote Puppeteer integration here: https://github.com/web-infra-dev/midscene-example/tree/main/remote-puppeteer-demo

Use this approach when you want to reuse a browser that already runs in your own infrastructure, such as a persistent cloud worker, a third-party browser grid, or an on-prem desktop. Connecting Midscene to that remote Puppeteer instance keeps the browser close to the target environment, avoids repeated browser startup costs, and lets you manage browsers centrally while using the same AI automation APIs.

In practice, you do the following yourself:

  1. Obtain a CDP WebSocket URL from the remote browser service
  2. Use Puppeteer to connect to the remote browser
  3. Create a Midscene agent for AI-driven automation

Prerequisites

npm install puppeteer @midscene/web --save-dev

Getting a CDP WebSocket URL

You can get a CDP WebSocket URL from various sources, for example:

  • Browserbase: Sign up at https://browserbase.com and get your CDP URL
  • Browserless: Use https://browserless.io or run your own instance
  • Local Chrome: Run Chrome with --remote-debugging-port=9222 and use ws://localhost:9222/devtools/browser/...
  • Docker: Run Chrome in a Docker container with the debugging port exposed
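For the local Chrome case, Chrome publishes the full browser WebSocket URL as webSocketDebuggerUrl in the JSON it serves at http://localhost:9222/json/version. A minimal sketch that reads it, assuming Node.js 18+ (global fetch); extractWsEndpoint and fetchWsEndpoint are hypothetical names. Alternatively, puppeteer.connect accepts browserURL: "http://localhost:9222" in place of browserWSEndpoint and performs this lookup itself.

```typescript
// Subset of the JSON Chrome serves at http://<host>:<port>/json/version
interface ChromeVersionInfo {
  Browser?: string;
  webSocketDebuggerUrl?: string;
}

// Hypothetical helper: pull the browser-level WebSocket URL out of that JSON
export function extractWsEndpoint(info: ChromeVersionInfo): string {
  const url = info.webSocketDebuggerUrl;
  if (!url || !/^wss?:\/\//.test(url)) {
    throw new Error('No webSocketDebuggerUrl in /json/version response');
  }
  return url;
}

// Hypothetical helper: query a Chrome started with --remote-debugging-port
export async function fetchWsEndpoint(
  httpBase = 'http://localhost:9222',
): Promise<string> {
  const res = await fetch(`${httpBase}/json/version`);
  if (!res.ok) {
    throw new Error(`GET /json/version failed with HTTP ${res.status}`);
  }
  return extractWsEndpoint((await res.json()) as ChromeVersionInfo);
}
```

The returned string can be passed as browserWSEndpoint in the example below.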

Basic Example

import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

// Top-level await requires running this file as an ES module.
// Assuming you already have a CDP WebSocket URL
const cdpWsUrl = 'ws://your-remote-browser.com/devtools/browser/your-session-id';

// Connect to the remote browser
const browser = await puppeteer.connect({
  browserWSEndpoint: cdpWsUrl,
});

// Reuse the first open page, or create one
const pages = await browser.pages();
const page = pages[0] ?? (await browser.newPage());

// Create the Midscene agent
const agent = new PuppeteerAgent(page);

try {
  // Navigate directly with Puppeteer, then use AI methods
  await page.goto('https://example.com');
  await agent.aiAct('click the login button');
  const result = await agent.aiQuery('get page title: {title: string}');
  console.log(result);
} finally {
  // Cleanup: disconnect() leaves the remote browser running
  await agent.destroy();
  await browser.disconnect();
}

Provide custom actions

Use the customActions option to extend the agent's action space with your own actions defined via defineAction. When provided, these actions will be appended to the built-in ones so the agent can call them during planning.

import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z
      .number()
      .int()
      .positive()
      .describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
    // carry out your clicking logic using locate + count
  },
});

// `page` is a Puppeteer Page created as in the examples above
const agent = new PuppeteerAgent(page, {
  customActions: [ContinuousClick],
});

await agent.aiAct('click the red button five times');
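The call body above leaves the clicking logic as a placeholder. A minimal sketch of that logic, assuming (as the console.log suggests) that locate.center is an [x, y] pair of page coordinates; clickRepeatedly and the Clicker interface are hypothetical, and Puppeteer's page.mouse satisfies Clicker because Mouse.click(x, y) returns a Promise:

```typescript
// Structural type covering the part of Puppeteer's page.mouse used here
interface Clicker {
  click(x: number, y: number): Promise<void>;
}

// Hypothetical helper: click one point `count` times, optionally pausing
// between clicks. Returns the number of clicks performed.
export async function clickRepeatedly(
  mouse: Clicker,
  center: [number, number],
  count: number,
  delayMs = 0,
): Promise<number> {
  if (!Number.isInteger(count) || count <= 0) {
    throw new RangeError('count must be a positive integer');
  }
  const [x, y] = center;
  for (let i = 0; i < count; i++) {
    await mouse.click(x, y);
    // Pause between clicks, but not after the last one
    if (delayMs > 0 && i < count - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return count;
}
```

Inside call, this would become await clickRepeatedly(page.mouse, locate.center, count);. Taking the mouse as a parameter keeps the action testable without a browser.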

Check Integrate with any interface for more details about defining custom actions.

More