Integrate with Puppeteer
Puppeteer is a Node.js library that provides a high-level API to control Chrome or Firefox over the DevTools Protocol or WebDriver BiDi. Puppeteer runs in headless mode (no visible UI) by default, but it can be configured to run a visible ("headful") browser.
You can find a Puppeteer demo project here: https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo
There is also a demo of Puppeteer with Vitest: https://github.com/web-infra-dev/midscene-example/tree/main/puppeteer-with-vitest-demo
Set up API keys for model
Set your model configuration in environment variables. For details, refer to Model strategy and Model configuration.
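Because a missing variable usually surfaces only when the first AI call fails, it can help to check the environment up front. The following is a minimal sketch, not part of Midscene: the helper missingModelVars is hypothetical, and the variable names in REQUIRED_MODEL_VARS are assumptions for illustration, so confirm the exact names for your model in Model configuration.

```typescript
// Assumed variable names, for illustration only; the authoritative list
// is in the Model configuration page.
const REQUIRED_MODEL_VARS: readonly string[] = [
  "MIDSCENE_MODEL_BASE_URL",
  "MIDSCENE_MODEL_API_KEY",
  "MIDSCENE_MODEL_NAME",
];

type Env = Record<string, string | undefined>;

/** Returns the names from `required` that are unset or blank in `env`. */
function missingModelVars(
  env: Env,
  required: readonly string[] = REQUIRED_MODEL_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Usage at the top of a Node.js script:
//   const missing = missingModelVars(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing model settings: ${missing.join(", ")}`);
//   }
```

Passing the environment in as an argument, rather than reading it inside the helper, keeps the check easy to exercise without touching real credentials.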
Integration with Midscene Agent
Step 1. Install dependencies
Install @midscene/web, puppeteer, and tsx (used in Step 3 to run the TypeScript script) as dev dependencies, for example with: npm install @midscene/web puppeteer tsx --save-dev
Step 2. Write scripts
Write and save the following code as ./demo.ts.
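The script itself is not reproduced on this page, so what follows is a minimal sketch of what demo.ts might contain rather than the official listing. It assumes that PuppeteerAgent is exported from @midscene/web/puppeteer and exposes aiAct, aiQuery, and aiAssert (some versions name the first method aiAction; check the API reference). The prompts, the ListedItem type, the SearchAgent interface, and the searchHeadphones helper are illustrative names. The flow is written against a small structural interface so that it can be tried with a stub before a browser is involved.

```typescript
/** Item shape requested from the page; the field names are illustrative. */
interface ListedItem {
  itemTitle: string;
  price: number;
}

/** The subset of agent methods this sketch assumes. */
interface SearchAgent {
  aiAct(prompt: string): Promise<unknown>;
  aiQuery<T>(prompt: string): Promise<T>;
  aiAssert(prompt: string): Promise<unknown>;
}

/** Searches for headphones, extracts the listings, then asserts on the page. */
async function searchHeadphones(agent: SearchAgent): Promise<ListedItem[]> {
  // Describe the interaction in natural language.
  await agent.aiAct('type "Headphones" in search box, hit Enter');

  // Ask for structured data by describing the expected shape.
  const items = await agent.aiQuery<ListedItem[]>(
    "{itemTitle: string, price: number}[], find items in the list and their prices",
  );

  // Let the model check a condition on the current page.
  await agent.aiAssert("There is a category filter on the left");
  return items;
}

// Assumed wiring in ./demo.ts (requires the packages from Step 1):
//
//   import puppeteer from "puppeteer";
//   import { PuppeteerAgent } from "@midscene/web/puppeteer";
//
//   const browser = await puppeteer.launch({ headless: false }); // visible, for debugging
//   const page = await browser.newPage();
//   await page.setViewport({ width: 1280, height: 800 });
//   await page.goto("https://www.ebay.com");
//
//   const agent = new PuppeteerAgent(page);
//   console.log("headphones in stock", await searchHeadphones(agent));
//   await browser.close();
```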
Step 3. Run
Run the script with tsx, for example npx tsx demo.ts. The console then prints the headphone listings extracted from eBay.
For the complete catalog of agent methods, see the API reference.
Step 4. View the report
After the above command executes successfully, the console will output: Midscene - report file updated: /path/to/report/some_id.html. You can open this file in a browser to view the report.
Advanced
About opening in a new tab
Each Agent instance is bound to a single page. For easier debugging, Midscene intercepts new tabs by default (for example, links with target="_blank") and opens them in the current page.
To allow new tabs again, set forceSameTabNavigation to false. In that case, you must create a new Agent instance for each new tab.
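One way to honor the one-agent-per-page rule is to keep a small registry that creates an agent the first time a page is seen. The sketch below is an illustration under assumptions, not Midscene code: AgentPerPage and agentFor are hypothetical names, and the commented wiring assumes that forceSameTabNavigation is passed as an option in the second constructor argument of PuppeteerAgent.

```typescript
/** Keeps exactly one agent per page object, creating it on first use. */
class AgentPerPage<P extends object, A> {
  // WeakMap lets a closed page and its agent be garbage-collected together.
  private readonly agents = new WeakMap<P, A>();
  private readonly createAgent: (page: P) => A;

  constructor(createAgent: (page: P) => A) {
    this.createAgent = createAgent;
  }

  /** Returns the agent bound to `page`, creating it if none exists yet. */
  agentFor(page: P): A {
    const existing = this.agents.get(page);
    if (existing !== undefined) return existing;
    const created = this.createAgent(page);
    this.agents.set(page, created);
    return created;
  }
}

// Assumed wiring with Puppeteer (Page comes from "puppeteer"):
//
//   const agents = new AgentPerPage(
//     (page: Page) => new PuppeteerAgent(page, { forceSameTabNavigation: false }),
//   );
//   browser.on("targetcreated", async (target) => {
//     const newPage = await target.page(); // null for non-page targets
//     if (newPage) {
//       await agents.agentFor(newPage).aiAssert("The new tab has finished loading");
//     }
//   });
```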
Connect Midscene Agent to a Remote Puppeteer Browser
You can find an example project of remote Puppeteer integration here: https://github.com/web-infra-dev/midscene-example/tree/main/remote-puppeteer-demo
Use this approach when you want to reuse a browser that already runs inside your own infrastructure—such as a persistent cloud worker, a third-party browser grid, or an on-prem desktop. Wiring Midscene into that remote Puppeteer instance keeps the browser close to the target environment, cuts repeated startup costs, and lets you centralize management while keeping the same AI automation APIs.
In practice, you perform three steps manually:
- Obtain a CDP WebSocket URL from the remote browser service
- Use Puppeteer to connect to the remote browser
- Create a Midscene agent for AI-driven automation
Prerequisites
Getting a CDP WebSocket URL
You can get a CDP WebSocket URL from various sources, for example:
- BrowserBase: Sign up at https://browserbase.com and get your CDP URL
- Browserless: Use https://browserless.io or run your own instance
- Local Chrome: Run Chrome with --remote-debugging-port=9222 and use ws://localhost:9222/devtools/browser/...
- Docker: Run Chrome in a Docker container with debugging port exposed
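For the local Chrome case, the browser-level WebSocket URL ends in an ID that changes on every launch, so it is usually read at runtime instead of being hard-coded. Chrome serves it in the webSocketDebuggerUrl field of its /json/version HTTP endpoint on the debugging port. The helper below, parseBrowserWsUrl, is a hypothetical name for a minimal sketch of that lookup; hosted services such as BrowserBase or Browserless document their own URLs and do not need it.

```typescript
/** Extracts the browser-level WebSocket URL from a /json/version response body. */
function parseBrowserWsUrl(body: string): string {
  const data: unknown = JSON.parse(body);
  const url =
    typeof data === "object" && data !== null
      ? (data as Record<string, unknown>)["webSocketDebuggerUrl"]
      : undefined;
  if (typeof url !== "string" || !url.startsWith("ws")) {
    throw new Error("No webSocketDebuggerUrl found in /json/version response");
  }
  return url;
}

// Usage against a local Chrome started with --remote-debugging-port=9222
// (global fetch is available in Node.js 18 and later):
//
//   const res = await fetch("http://localhost:9222/json/version");
//   const cdpUrl = parseBrowserWsUrl(await res.text());
```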
Basic Example
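The three steps (obtain a CDP URL, connect with Puppeteer, create an agent) can be sketched as follows. This is an illustration under assumptions, not the official example: withRemoteAgent, RemoteBrowser, and RemoteDeps are hypothetical names, and the commented wiring assumes PuppeteerAgent is exported from @midscene/web/puppeteer. The Puppeteer calls it relies on are puppeteer.connect with browserWSEndpoint, browser.newPage, and browser.disconnect.

```typescript
/** Minimal browser shape this sketch relies on; Puppeteer's real type is richer. */
interface RemoteBrowser<P> {
  newPage(): Promise<P>;
  disconnect(): Promise<void>;
}

/** Injected dependencies, so the flow can be exercised without a real browser. */
interface RemoteDeps<P, A> {
  connect(options: { browserWSEndpoint: string }): Promise<RemoteBrowser<P>>;
  createAgent(page: P): A;
}

/** Connects, opens a page, runs `task` with an agent, then always disconnects. */
async function withRemoteAgent<P, A, R>(
  cdpUrl: string,
  deps: RemoteDeps<P, A>,
  task: (agent: A, page: P) => Promise<R>,
): Promise<R> {
  const browser = await deps.connect({ browserWSEndpoint: cdpUrl });
  try {
    const page = await browser.newPage();
    return await task(deps.createAgent(page), page);
  } finally {
    // disconnect() detaches without closing, so the remote browser keeps running.
    await browser.disconnect();
  }
}

// Assumed wiring (cdpUrl is whatever your remote browser service gave you):
//
//   import puppeteer, { type Page } from "puppeteer";
//   import { PuppeteerAgent } from "@midscene/web/puppeteer";
//
//   await withRemoteAgent(
//     cdpUrl,
//     {
//       connect: (options) => puppeteer.connect(options),
//       createAgent: (page: Page) => new PuppeteerAgent(page),
//     },
//     async (agent, page) => {
//       await page.goto("https://example.com");
//       await agent.aiAssert("The page shows a heading");
//     },
//   );
```

Disconnecting instead of closing matches the goal of reusing a long-lived browser; call browser.close() instead if the remote session should end with the script.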
Provide custom actions
Use the customActions option to extend the agent's action space with your own actions defined via defineAction. When provided, these actions will be appended to the built-in ones so the agent can call them during planning.
Check Integrate with any interface for more details about defining custom actions.
More
- For every Agent method, check the API Reference.
- For the Puppeteer API reference, see Puppeteer Agent API.
- Demo projects:
  - Puppeteer demo: https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo
  - Puppeteer + Vitest demo: https://github.com/web-infra-dev/midscene-example/tree/main/puppeteer-with-vitest-demo

