Integrate with Playwright
Playwright is an open-source browser automation library developed by Microsoft, used mainly for end-to-end testing of web applications and for web scraping.
There are two ways to integrate with Playwright:
- Directly integrate and call the Midscene Agent via script, suitable for quick prototyping, data scraping, and automation scripts.
- Integrate Midscene into Playwright test cases, suitable for UI testing scenarios.
Set up API keys for model
Set your model configuration in environment variables. See Model strategy for more details.
export MIDSCENE_MODEL_BASE_URL="https://replace-with-your-model-service-url/v1"
export MIDSCENE_MODEL_API_KEY="replace-with-your-api-key"
export MIDSCENE_MODEL_NAME="replace-with-your-model-name"
export MIDSCENE_MODEL_FAMILY="replace-with-your-model-family"
For more configuration details, please refer to Model strategy and Model configuration.
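A missing variable usually only shows up when the first AI call fails, so it can help to check the configuration up front. The following is a minimal sketch, not part of Midscene: the helper names are hypothetical, and treating all four variables as required is an assumption based on the export lines above.

```typescript
// Variable names are copied from the export lines above. Treating all four
// as required is an assumption of this sketch, not a documented rule.
const REQUIRED_MODEL_VARS = [
  'MIDSCENE_MODEL_BASE_URL',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_NAME',
  'MIDSCENE_MODEL_FAMILY',
] as const;

type EnvLike = Record<string, string | undefined>;

// Returns the names of variables that are unset or blank.
function missingModelVars(env: EnvLike): string[] {
  return REQUIRED_MODEL_VARS.filter((key) => !env[key]?.trim());
}

// Throws a readable error listing every missing variable at once.
function assertModelEnv(env: EnvLike): void {
  const missing = missingModelVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing model configuration: ${missing.join(', ')}`);
  }
}
```

In a script you would typically call assertModelEnv(process.env) before creating the agent.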
Direct integration with Midscene agent
Step 1: Install dependencies
npm install @midscene/web playwright @playwright/test tsx --save-dev
yarn add @midscene/web playwright @playwright/test tsx --save-dev
pnpm add @midscene/web playwright @playwright/test tsx --save-dev
bun add @midscene/web playwright @playwright/test tsx --save-dev
deno add npm:@midscene/web npm:playwright npm:@playwright/test npm:tsx --save-dev
Step 2: Write the script
Save the following code as ./demo.ts:
import { chromium } from 'playwright';
import { PlaywrightAgent } from '@midscene/web/playwright';
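// Reads environment variables from a .env file. This needs the dotenv package;
// omit the line if you export the variables in your shell instead.
import 'dotenv/config';
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));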
Promise.resolve(
(async () => {
const browser = await chromium.launch({
headless: true, // 'true' means we can't see the browser window
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
await page.setViewportSize({
width: 1280,
height: 768,
});
await page.goto('https://www.ebay.com');
await sleep(5000);
// 👀 init Midscene agent
const agent = new PlaywrightAgent(page);
// 👀 type keywords, perform a search
await agent.aiAct('type "Headphones" in search box, hit Enter');
// 👀 wait for the loading
await agent.aiWaitFor('there is at least one headphone item on page');
// or you may use a plain sleep:
// await sleep(5000);
// 👀 understand the page content, find the items
const items = await agent.aiQuery(
'{itemTitle: string, price: Number}[], find item in list and corresponding price',
);
console.log('headphones in stock', items);
const isMoreThan1000 = await agent.aiBoolean(
'Is the price of the headphones more than 1000?',
);
console.log('isMoreThan1000', isMoreThan1000);
const price = await agent.aiNumber(
'What is the price of the first headphone?',
);
console.log('price', price);
const name = await agent.aiString(
'What is the name of the first headphone?',
);
console.log('name', name);
const location = await agent.aiLocate(
'What is the location of the first headphone?',
);
console.log('location', location);
// 👀 assert by AI
await agent.aiAssert('There is a category filter on the left');
// 👀 click on the first item
await agent.aiTap('the first item in the list');
await browser.close();
})(),
);
For more Agent API details, please refer to API Reference.
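Because aiQuery returns data extracted by a model, the result may not always match the shape requested in the prompt. A minimal sketch of a runtime check follows, assuming the `{itemTitle: string, price: Number}[]` shape used in the script above. The HeadphoneItem type and both helper names are hypothetical and not part of the Midscene API.

```typescript
// Shape requested in the aiQuery prompt of the demo script.
interface HeadphoneItem {
  itemTitle: string;
  price: number;
}

// Type guard: checks one entry against the requested shape.
function isHeadphoneItem(value: unknown): value is HeadphoneItem {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  const price = record['price'];
  return (
    typeof record['itemTitle'] === 'string' &&
    typeof price === 'number' &&
    Number.isFinite(price)
  );
}

// Keeps only well-formed entries; returns an empty array for non-array input.
function validItems(raw: unknown): HeadphoneItem[] {
  return Array.isArray(raw) ? raw.filter(isHeadphoneItem) : [];
}
```

In the script, this could wrap the query result, for example validItems(items), before the values are logged or compared.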
Step 3: Run the script
Use tsx to run, and you will see the product information printed in the terminal:
# run
npx tsx demo.ts
# The terminal should output something like:
# [
# {
# itemTitle: 'JBL Tour Pro 2 - True wireless Noise Cancelling earbuds with Smart Charging Case',
# price: 551.21
# },
# {
# itemTitle: 'Soundcore Space One Wireless Headphones 40H ANC Playtime 2XStronger Voice',
# price: 543.94
# }
# ]
Step 4: View the run report
After the above command executes successfully, it will output: Midscene - report file updated: /path/to/report/some_id.html. Open this file in your browser to view the report.
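If you run the script from another tool, such as a CI job, you may want to pick the report path out of the captured output. This is a small sketch under the assumption that the log line has exactly the format quoted above. The format may differ between Midscene versions, and extractReportPath is a hypothetical helper.

```typescript
// The prefix is copied from the log line quoted above; treat the exact
// format as an assumption rather than a stable contract.
const REPORT_LINE = /Midscene - report file updated:\s*(\S.*?\.html)\s*$/m;

// Returns the report path from captured output, or null if no line matches.
function extractReportPath(output: string): string | null {
  const match = REPORT_LINE.exec(output);
  return match?.[1] ?? null;
}
```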
Integration in Playwright test cases
This section assumes you already have a repository with Playwright set up.
Step 1: Add dependencies and update configuration
Add dependencies
npm install @midscene/web --save-dev
yarn add @midscene/web --save-dev
pnpm add @midscene/web --save-dev
bun add @midscene/web --save-dev
deno add npm:@midscene/web --save-dev
Update playwright.config.ts
export default defineConfig({
testDir: './e2e',
+ timeout: 90 * 1000,
+ reporter: [["list"], ["@midscene/web/playwright-reporter", { type: "merged" }]], // type is optional: "merged" (default) writes one report for all test cases, "separate" writes one report per test case
});
The type option of the reporter accepts merged or separate. With merged (the default), all test cases share one report; with separate, each test case gets its own report.
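If you switch between the two modes often, the reporter entry can be built by a small helper. The sketch below is illustrative only: both function names are hypothetical, and only the reporter path and the two type values come from the configuration above.

```typescript
type MidsceneReportType = 'merged' | 'separate';

// Builds the reporter entry used in playwright.config.ts above.
function midsceneReporter(
  type: MidsceneReportType = 'merged',
): [string, { type: MidsceneReportType }] {
  return ['@midscene/web/playwright-reporter', { type }];
}

// Maps an arbitrary string (for example from an environment variable of
// your own choosing) to a valid type, falling back to the default.
function reportTypeFrom(value: string | undefined): MidsceneReportType {
  return value === 'separate' ? 'separate' : 'merged';
}
```

In the config this could be used as reporter: [['list'], midsceneReporter(reportTypeFrom(process.env.REPORT_TYPE))], where REPORT_TYPE is a made-up variable name and not something Midscene reads itself.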
Step 2: Extend the test instance
Save the following code as ./e2e/fixture.ts:
import { test as base } from '@playwright/test';
import type { PlayWrightAiFixtureType } from '@midscene/web/playwright';
import { PlaywrightAiFixture } from '@midscene/web/playwright';
export const test = base.extend<PlayWrightAiFixtureType>(
PlaywrightAiFixture({
waitForNetworkIdleTimeout: 2000, // optional, the timeout for waiting for network idle between each action, default is 2000ms
}),
);
Step 3: Write test cases
Review the full catalog of action, query, and utility methods in the Agent API reference. When you need lower-level control, you can use agentForPage to obtain the underlying PageAgent instance and call any API directly:
test('case demo', async ({ agentForPage, page }) => {
const agent = await agentForPage(page);
await agent.recordToReport();
const logContent = agent._unstableLogContent();
console.log(logContent);
});
Example code
./e2e/ebay-search.spec.ts
import { expect } from '@playwright/test';
import { test } from './fixture';
test.beforeEach(async ({ page }) => {
await page.setViewportSize({ width: 400, height: 905 });
await page.goto('https://www.ebay.com');
await page.waitForLoadState('networkidle');
});
test('search headphone on ebay', async ({
ai,
aiQuery,
aiAssert,
aiInput,
aiTap,
aiScroll,
aiWaitFor,
aiRightClick,
recordToReport,
}) => {
// Use aiInput to enter search keyword
await aiInput('Headphones', 'search box');
// Use aiTap to click search button
await aiTap('search button');
// Wait for search results to load
await aiWaitFor('search results list loaded', { timeoutMs: 5000 });
// Use aiScroll to scroll to bottom
await aiScroll(
{
direction: 'down',
scrollType: 'untilBottom',
},
'search results list',
);
// Use aiQuery to get product information
const items = await aiQuery<Array<{ title: string; price: number }>>(
'get product titles and prices from search results',
);
console.log('headphones in stock', items);
expect(items?.length).toBeGreaterThan(0);
// Use aiAssert to verify filter functionality
await aiAssert('category filter exists on the left side');
// Use recordToReport to capture the current state
await recordToReport('Search Results', {
content: 'Final search results for headphones',
});
});
For more Agent API details, please refer to API Reference.
Step 4: Run test cases
npx playwright test ./e2e/ebay-search.spec.ts
Step 5: View test report
After the command executes successfully, it will output: Midscene - report file updated: ./current_cwd/midscene_run/report/some_id.html. Open this file in your browser to view the report.
Advanced
About opening in a new tab
Each Agent instance is bound to a single page. To make debugging easier, Midscene intercepts new tabs by default (for example, links with target="_blank") and opens them in the current page.
To let links open in a new tab again, set forceSameTabNavigation to false. You will then need to create a new Agent instance for each new tab.
const mid = new PlaywrightAgent(page, {
forceSameTabNavigation: false,
});
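Since each agent is bound to one page, one way to manage several tabs is to keep one agent per page object and create it on first use. The following is a minimal sketch of that idea. createAgentRegistry is a hypothetical helper, written generically so that it has no imports; createAgent stands in for a factory such as (page) => new PlaywrightAgent(page, { forceSameTabNavigation: false }).

```typescript
// Caches one agent per page object. The WeakMap lets an agent be garbage
// collected once its page object is no longer referenced.
function createAgentRegistry<P extends object, A>(
  createAgent: (page: P) => A,
): (page: P) => A {
  const agents = new WeakMap<P, A>();
  return (page: P): A => {
    let agent = agents.get(page);
    if (agent === undefined) {
      // First time this page is seen: build and remember its agent.
      agent = createAgent(page);
      agents.set(page, agent);
    }
    return agent;
  };
}
```

With Playwright, the page for a newly opened tab can be obtained with context.waitForEvent('page') and then passed to the function returned by createAgentRegistry.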
Connect Midscene Agent to a Remote Playwright Browser
Connect to a remote Playwright browser when you already run browsers on your own infrastructure or on a vendor's grid. This keeps the browser close to the target environment, avoids repeated launches, and still lets Midscene drive it with the same AI APIs.
Prerequisites
npm install playwright @playwright/test @midscene/web --save-dev
yarn add playwright @playwright/test @midscene/web --save-dev
pnpm add playwright @playwright/test @midscene/web --save-dev
bun add playwright @playwright/test @midscene/web --save-dev
deno add npm:playwright npm:@playwright/test npm:@midscene/web --save-dev
Getting a CDP WebSocket URL
You can get a CDP WebSocket URL from various sources, for example:
- BrowserBase: Sign up at https://browserbase.com and get your CDP URL
- Browserless: Use https://browserless.io or run your own instance
- Local Chrome: Run Chrome with --remote-debugging-port=9222 and use ws://localhost:9222/devtools/browser/...
- Docker: Run Chrome in a Docker container with debugging port exposed
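For the local Chrome case, the full WebSocket URL does not need to be copied by hand: a Chrome instance started with a remote debugging port serves a JSON document at /json/version that contains a webSocketDebuggerUrl field. The sketch below only covers reading that field. webSocketUrlFromVersionInfo is a hypothetical helper, and the commented usage assumes Node 18 or later for the global fetch.

```typescript
// Reads the browser-level WebSocket URL from the parsed /json/version body.
// Throws if the field is missing or is not a ws:// or wss:// URL.
function webSocketUrlFromVersionInfo(info: unknown): string {
  const url = (info as { webSocketDebuggerUrl?: unknown } | null)
    ?.webSocketDebuggerUrl;
  if (typeof url !== 'string' || !/^wss?:\/\//.test(url)) {
    throw new Error('No webSocketDebuggerUrl found in /json/version response');
  }
  return url;
}

// Usage with a local Chrome started with --remote-debugging-port=9222:
//   const res = await fetch('http://localhost:9222/json/version');
//   const cdpWsUrl = webSocketUrlFromVersionInfo(await res.json());
```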
Code example
import { chromium } from 'playwright';
import { PlaywrightAgent } from '@midscene/web/playwright';
// CDP WebSocket URL from your remote browser service
const cdpWsUrl = 'ws://your-remote-browser.com/devtools/browser/your-session-id';
// Connect and pick a page
const browser = await chromium.connectOverCDP(cdpWsUrl);
const context = browser.contexts()[0];
const page = context.pages()[0] || await context.newPage();
// Create Midscene agent (usage matches any Playwright agent)
const agent = new PlaywrightAgent(page);
// Use AI methods as usual
await agent.aiAct('navigate to https://example.com');
await agent.aiAct('click the login button');
const result = await agent.aiQuery('get page title: {title: string}');
// Cleanup
await agent.destroy();
await browser.close();
Once connected, keep using PlaywrightAgent the same way you would with a locally launched browser.
Provide custom actions
Use the customActions option to extend the agent's action space with your own actions defined via defineAction. When provided, these actions will be appended to the built-in ones so the agent can call them during planning.
import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { PlaywrightAgent } from '@midscene/web/playwright';
const ContinuousClick = defineAction({
name: 'continuousClick',
description: 'Click the same target repeatedly',
paramSchema: z.object({
locate: getMidsceneLocationSchema(),
count: z
.number()
.int()
.positive()
.describe('How many times to click'),
}),
async call(param) {
const { locate, count } = param;
console.log('click target center', locate.center);
console.log('click count', count);
// carry out your clicking logic using locate + count
},
});
const agent = new PlaywrightAgent(page, {
customActions: [ContinuousClick],
});
await agent.aiAct('click the red button five times');
Check Integrate with any interface for more details about defining custom actions.
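The call handler in the example above leaves the clicking logic as a comment. One possible way to fill it in is sketched here, assuming that locate.center is an [x, y] pair in page pixels. That shape is an assumption and not confirmed by the text above. MouseLike is a hypothetical structural stand-in for Playwright's page.mouse, so that the sketch has no imports.

```typescript
// Structural stand-in for Playwright's `page.mouse`; only `click` is needed.
interface MouseLike {
  click(x: number, y: number): Promise<void>;
}

// Clicks the same point `count` times, one click after another.
// Assumes `center` is an [x, y] pair in page pixels (an assumption of this sketch).
async function clickRepeatedly(
  mouse: MouseLike,
  center: readonly [number, number],
  count: number,
): Promise<void> {
  const [x, y] = center;
  for (let i = 0; i < count; i++) {
    await mouse.click(x, y);
  }
}
```

Inside the call handler this could be invoked as await clickRepeatedly(page.mouse, locate.center, count), where page is the Playwright page the agent was created with.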
More