API reference (Web)
Use this doc when you need to customize Midscene's browser automation agents or review browser-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
Action Space
PuppeteerAgent, PlaywrightAgent, and Chrome Bridge share one action space; the Midscene Agent can use these actions while planning tasks:
- Tap: Left-click an element.
- RightClick: Right-click an element.
- DoubleClick: Double-click an element.
- Hover: Hover over an element.
- Input: Enter text with replace, append, or clear modes.
- KeyboardPress: Press a specified key (optionally focusing a target element first).
- Scroll: Scroll from an element or the screen center; supports scroll-to-top/bottom/left/right helpers.
- DragAndDrop: Drag from one element to another.
- LongPress: Long-press a target element with an optional duration.
- Swipe: Touch-style swipe gesture (available when enableTouchEventsInActionSpace is true).
- ClearInput: Clear the contents of an input field.
- Navigate: Open a URL in the current tab.
- Reload: Reload the page.
- GoBack: Navigate back in history.
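To make the membership rule concrete, the action list can be modeled as a string-literal union. This is a hypothetical sketch, not Midscene's own types: BASE_ACTIONS, WebActionName, and webActionSpace are invented names that only encode what this page states, namely that Swipe joins the action space when enableTouchEventsInActionSpace is true (default false).

```typescript
// Hypothetical model of the shared web action space described above.
// The names mirror the documented actions; nothing here is imported from Midscene.
const BASE_ACTIONS = [
  "Tap",
  "RightClick",
  "DoubleClick",
  "Hover",
  "Input",
  "KeyboardPress",
  "Scroll",
  "DragAndDrop",
  "LongPress",
  "ClearInput",
  "Navigate",
  "Reload",
  "GoBack",
] as const;

type WebActionName = (typeof BASE_ACTIONS)[number] | "Swipe";

// Actions a planner could choose from; Swipe is only added when touch
// events are enabled (documented default: false).
function webActionSpace(enableTouchEventsInActionSpace = false): WebActionName[] {
  const actions: WebActionName[] = [...BASE_ACTIONS];
  if (enableTouchEventsInActionSpace) {
    actions.push("Swipe");
  }
  return actions;
}

console.log(webActionSpace().length); // 13
console.log(webActionSpace(true).includes("Swipe")); // true
```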
PuppeteerAgent
Use Midscene against a Puppeteer-controlled browser when you need AI actions in your own Puppeteer workflows.
Import
Constructor
Browser-specific options
In addition to the base agent options, Puppeteer exposes:
- forceSameTabNavigation (boolean): Restrict navigation to the current tab. Default true.
- waitForNavigationTimeout (number): Maximum wait when a step causes navigation. Default 5000 (set 0 to skip waiting).
- waitForNetworkIdleTimeout (number): Wait for network idle between actions to reduce flakiness. Default 2000 (set 0 to skip waiting).
- enableTouchEventsInActionSpace (boolean): Add touch gestures (like swipe) to the action space so the agent can handle touch-only interactions. Default false.
- forceChromeSelectRendering (boolean): Force select elements to render with Chrome's base-select styling so they are visible in screenshots and element extraction; requires Puppeteer >24.6.0.
- customActions (DeviceAction[]): Register bespoke actions defined via defineAction so planning can call domain-specific steps.
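The defaults above, and the "0 means skip waiting" convention, can be summarized in a small resolver. This is a minimal sketch under stated assumptions: the BrowserAgentOptions interface, resolveOptions, and shouldWait are hypothetical names, not Midscene exports, and the timeouts are assumed to be in milliseconds. forceChromeSelectRendering and customActions are left out because this page gives no default for them.

```typescript
// Hypothetical shape of the wait and navigation options documented above.
interface BrowserAgentOptions {
  forceSameTabNavigation?: boolean;
  waitForNavigationTimeout?: number; // assumed milliseconds; 0 skips waiting
  waitForNetworkIdleTimeout?: number; // assumed milliseconds; 0 skips waiting
  enableTouchEventsInActionSpace?: boolean;
}

// Defaults as listed on this page.
const DOCUMENTED_DEFAULTS: Required<BrowserAgentOptions> = {
  forceSameTabNavigation: true,
  waitForNavigationTimeout: 5000,
  waitForNetworkIdleTimeout: 2000,
  enableTouchEventsInActionSpace: false,
};

// Fill every unset (or undefined) field with its documented default.
function resolveOptions(opts: BrowserAgentOptions = {}): Required<BrowserAgentOptions> {
  return {
    forceSameTabNavigation: opts.forceSameTabNavigation ?? DOCUMENTED_DEFAULTS.forceSameTabNavigation,
    waitForNavigationTimeout: opts.waitForNavigationTimeout ?? DOCUMENTED_DEFAULTS.waitForNavigationTimeout,
    waitForNetworkIdleTimeout: opts.waitForNetworkIdleTimeout ?? DOCUMENTED_DEFAULTS.waitForNetworkIdleTimeout,
    enableTouchEventsInActionSpace:
      opts.enableTouchEventsInActionSpace ?? DOCUMENTED_DEFAULTS.enableTouchEventsInActionSpace,
  };
}

// A timeout of 0 disables the corresponding wait.
function shouldWait(timeoutMs: number): boolean {
  return timeoutMs > 0;
}

const resolved = resolveOptions({ waitForNetworkIdleTimeout: 0 });
console.log(resolved.waitForNavigationTimeout); // 5000
console.log(shouldWait(resolved.waitForNetworkIdleTimeout)); // false
```

Using ?? per field (rather than an object spread) keeps an explicit undefined from wiping out a default.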
Usage notes
- One agent per page: by default (forceSameTabNavigation: true), Midscene opens new links in the current tab for easier debugging. Set it to false if you want new tabs, and create a new agent per tab.
- For the full list of interaction methods, see API reference (Common).
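With forceSameTabNavigation set to false, you need to keep track of which agent belongs to which tab. The sketch below keeps at most one agent per tab in a map. TabAgentRegistry, the string tab ids, and the factory callback are all hypothetical; in real code the factory would presumably construct a PuppeteerAgent for that tab's page.

```typescript
// Hypothetical helper that keeps at most one agent per browser tab.
// The agent type is generic so the sketch runs without Midscene installed.
class TabAgentRegistry<A> {
  private readonly agents = new Map<string, A>();
  private readonly createAgent: (tabId: string) => A;

  constructor(createAgent: (tabId: string) => A) {
    this.createAgent = createAgent;
  }

  // Reuse the tab's existing agent, or create exactly one for it.
  get(tabId: string): A {
    let agent = this.agents.get(tabId);
    if (agent === undefined) {
      agent = this.createAgent(tabId);
      this.agents.set(tabId, agent);
    }
    return agent;
  }

  // Drop the agent for a tab (for example, after the tab closes).
  release(tabId: string): boolean {
    return this.agents.delete(tabId);
  }

  get size(): number {
    return this.agents.size;
  }
}

let created = 0;
const registry = new TabAgentRegistry((tabId: string) => ({ tabId, serial: ++created }));
registry.get("tab-1");
registry.get("tab-1"); // same tab: no new agent
registry.get("tab-2");
console.log(registry.size, created); // 2 2
```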
Examples
Quick start
Connect to a remote Puppeteer browser
See also
- Integrate with Puppeteer for installation, fixtures, and remote-CDP guidance.
PlaywrightAgent
Use Midscene inside a Playwright browser for AI-driven testing or automation alongside your Playwright flows.
Import
Constructor
Browser-specific options
- forceSameTabNavigation (boolean): Keep automation inside the active tab. Default true.
- waitForNavigationTimeout (number): Wait time for navigation completion. Default 5000 (set 0 to disable).
- waitForNetworkIdleTimeout (number): Wait between actions for network idle. Default 2000 (set 0 to disable).
- enableTouchEventsInActionSpace (boolean): Add touch gestures (like swipe) to the action space so the agent can handle touch-only interactions. Default false.
- forceChromeSelectRendering (boolean): Force select elements to render with Chrome's base-select styling so they are visible in screenshots and element extraction; requires Playwright ≥1.52.0.
- customActions (DeviceAction[]): Extend planning with project-specific actions.
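forceChromeSelectRendering depends on the driver version. As written on this page, Puppeteer must be strictly above 24.6.0 and Playwright at or above 1.52.0. A hypothetical pre-flight check might compare the installed version before enabling the option. The helper names are illustrative, and the sketch accepts only plain major.minor.patch strings (no prerelease tags).

```typescript
// Hypothetical pre-flight check; accepts only "major.minor.patch" strings.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) {
    throw new Error(`Unsupported version string: ${v}`);
  }
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Negative if a < b, zero if equal, positive if a > b (numeric per component).
function compareVersions(a: string, b: string): number {
  const va = parseVersion(a);
  const vb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (va[i] !== vb[i]) return va[i] - vb[i];
  }
  return 0;
}

// Requirements as stated on this page: Puppeteer strictly above 24.6.0,
// Playwright at or above 1.52.0.
function supportsChromeSelectRendering(driver: "puppeteer" | "playwright", version: string): boolean {
  return driver === "puppeteer"
    ? compareVersions(version, "24.6.0") > 0
    : compareVersions(version, "1.52.0") >= 0;
}

console.log(supportsChromeSelectRendering("playwright", "1.52.0")); // true
console.log(supportsChromeSelectRendering("puppeteer", "24.6.0")); // false
```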
Usage notes
- One agent per page: with forceSameTabNavigation (default true), Midscene intercepts new tabs for stability. Set it to false to allow new tabs, and create a separate agent for each.
- For the full list of interaction methods, see API reference (Common).
Examples
Quick start
Extend Playwright tests with Midscene fixtures
See also
- Integrate with Playwright for setup, fixtures, and advanced configuration.
Chrome Bridge Agent
Bridge Mode lets Midscene operate your currently active desktop Chrome tab through the extension instead of launching a dedicated automation browser.
Import
Constructor
Bridge options
- closeNewTabsAfterDisconnect (boolean, optional): Close any bridge-created tabs when the agent is destroyed. Default false.
- allowRemoteAccess (boolean, optional): Allow remote machines to attach. Default false (binds to 127.0.0.1).
- host (string, optional): Override the interface for the bridge server. Takes precedence over allowRemoteAccess.
- port (number, optional): TCP port for the bridge server. Default 3766.
See Bridge Mode by Chrome extension for full installation and capability details.
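The precedence among the bridge server options fits in a few lines. This is a hypothetical sketch of the rules stated above, not Midscene's implementation: an explicit host wins, otherwise the server binds to 127.0.0.1 unless allowRemoteAccess is true. The 0.0.0.0 address used for remote access is an assumption, because this page does not say which interface is chosen.

```typescript
// Hypothetical model of the bridge server options; not Midscene's own code.
interface BridgeOptions {
  closeNewTabsAfterDisconnect?: boolean;
  allowRemoteAccess?: boolean;
  host?: string;
  port?: number;
}

interface ListenAddress {
  host: string;
  port: number;
}

function resolveBridgeListenAddress(opts: BridgeOptions = {}): ListenAddress {
  // An explicit host takes precedence over allowRemoteAccess.
  // ASSUMPTION: remote access binds to all interfaces ("0.0.0.0").
  const host = opts.host ?? (opts.allowRemoteAccess ? "0.0.0.0" : "127.0.0.1");
  // Documented default port.
  return { host, port: opts.port ?? 3766 };
}

console.log(resolveBridgeListenAddress().host); // 127.0.0.1
console.log(resolveBridgeListenAddress({ allowRemoteAccess: true, host: "192.168.1.20" }).host); // 192.168.1.20
```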
Usage notes
Call connectCurrentTab or connectNewTabWithUrl before issuing any other actions. Each AgentOverChromeBridge instance can attach to only one tab, and an instance cannot be reused after destroy(); create a new instance to attach again.
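The lifecycle rules above (connect first, one tab per instance, no reuse after destroy) behave like a small state machine. The BridgeSession class and its methods are hypothetical and only mirror those documented constraints; they do not talk to the Chrome extension, and connect() stands in for either connectCurrentTab or connectNewTabWithUrl.

```typescript
// Hypothetical state machine mirroring the documented bridge lifecycle.
type SessionState = "idle" | "connected" | "destroyed";

class BridgeSession {
  private state: SessionState = "idle";
  private tab: string | undefined;

  // Stands in for connectCurrentTab / connectNewTabWithUrl; tab is just a label.
  connect(tab: string): void {
    if (this.state === "destroyed") {
      throw new Error("Session destroyed; create a new instance.");
    }
    if (this.state === "connected") {
      throw new Error("Already attached to a tab; one tab per instance.");
    }
    this.tab = tab;
    this.state = "connected";
  }

  // Any action must come after a successful connect.
  act(description: string): string {
    if (this.state !== "connected") {
      throw new Error(`Cannot run "${description}" while ${this.state}.`);
    }
    return `${description} @ ${this.tab}`;
  }

  destroy(): void {
    this.state = "destroyed";
    this.tab = undefined;
  }

  get current(): SessionState {
    return this.state;
  }
}

const session = new BridgeSession();
session.connect("https://example.com");
console.log(session.act("click the login button")); // click the login button @ https://example.com
session.destroy();
```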
Bridge methods
connectCurrentTab()
- options.forceSameTabNavigation (default true): Intercepts new tabs and opens them in the current tab to simplify debugging; set it to false if you want normal new-tab behavior (and create a separate agent per tab).
- Resolves on a successful handshake with the active tab; rejects if the extension is not allowed to connect.
connectNewTabWithUrl()
- url: Address to open in a new desktop tab before attaching.
- options: Same as connectCurrentTab.
- Resolves when the new tab is opened and the bridge is connected.
destroy()
- closeNewTabsAfterDisconnect: Optional runtime override for the constructor setting; true closes bridge-created tabs on teardown.
- Resolves after the bridge connection and local server are fully cleaned up.
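The argument passed to destroy() overrides the constructor's closeNewTabsAfterDisconnect setting, and only bridge-created tabs are affected. A minimal sketch of that precedence follows. The function name and tab ids are hypothetical; the fallback to false reflects the documented constructor default.

```typescript
// Hypothetical teardown model: only tabs the bridge itself opened are candidates.
function tabsToCloseOnDestroy(
  bridgeCreatedTabIds: readonly string[],
  constructorSetting: boolean | undefined,
  runtimeOverride?: boolean,
): string[] {
  // The value passed to destroy() wins, then the constructor option, then false.
  const shouldClose = runtimeOverride ?? constructorSetting ?? false;
  return shouldClose ? [...bridgeCreatedTabIds] : [];
}

const opened = ["tab-a", "tab-b"];
console.log(tabsToCloseOnDestroy(opened, undefined).length); // 0
console.log(tabsToCloseOnDestroy(opened, true).length); // 2
console.log(tabsToCloseOnDestroy(opened, true, false).length); // 0
```

Tabs the user already had open never appear in the input list, so they are never closed regardless of the settings.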
Examples
Open a new desktop tab
Attach to current tab
See also
- API reference (Common) for shared agent methods.
- Bridge mode for extension setup, command sequence, and YAML usage.

