MCP Server

Midscene provides an MCP server that allows AI assistants to control browsers, automate web tasks, and write automation scripts for Midscene.

MCP Introduction

MCP (Model Context Protocol) is a standardized way for AI models to interact with external tools and capabilities. MCP servers expose a set of tools that AI models can invoke to perform various tasks. In Midscene's case, these tools allow AI models to control browsers, navigate web pages, interact with UI elements, and more.
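
To make this concrete, the sketch below builds the kind of JSON-RPC 2.0 request an MCP client sends when a model invokes a tool. The `tools/call` method and the `name`/`arguments` fields follow the MCP specification; the helper name `buildToolCall`, the request id, and the example URL are illustrative assumptions. In practice your MCP client constructs these messages for you.

```typescript
// Shape of an MCP "tools/call" request (JSON-RPC 2.0 envelope).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Hypothetical helper: wraps a tool name and its arguments in a request envelope.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example: ask the Midscene MCP server to navigate the current tab.
// The URL is a placeholder.
const navigateRequest = buildToolCall(1, "midscene_navigate", {
  url: "https://example.com",
});

console.log(JSON.stringify(navigateRequest, null, 2));
```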

Use Cases

Control browsers to execute automation tasks
Automatically generate Midscene automation scripts

Examples

Generate Midscene test cases for the Sauce Demo site

Setting Up Midscene MCP

Prerequisites

An OpenAI API key or another supported AI model provider. For more information, see Choosing an AI Model.
For Chrome browser integration (Bridge Mode):
- Install the Midscene Chrome extension (download from Chrome Web Extension)
- Switch to "Bridge Mode" in the extension and click "Allow Connection"

Configuration

Add the Midscene MCP server to your MCP configuration:

{
  "mcpServers": {
    "mcp-midscene": {
      "command": "npx",
      "args": ["-y", "@midscene/mcp"],
      "env": {
        "MIDSCENE_MODEL_NAME": "REPLACE_WITH_YOUR_MODEL_NAME",
        "OPENAI_API_KEY": "REPLACE_WITH_YOUR_OPENAI_API_KEY",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}

For more information about configuring AI models, see Choosing an AI Model.
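
If you generate this file from a script, a small builder can catch placeholders that were left in by mistake. The sketch below is not part of Midscene: `buildMidsceneMcpConfig` and `requireValue` are hypothetical helpers, the example model name and key are placeholders, and the builder only reproduces the keys shown in the configuration above.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical check: rejects empty values and unreplaced REPLACE_WITH_ placeholders.
function requireValue(name: string, value: string): string {
  if (value.trim() === "" || /^REPLACE_WITH_/.test(value)) {
    throw new Error(`${name} is not set`);
  }
  return value;
}

// Hypothetical builder for the "mcp-midscene" entry shown above.
function buildMidsceneMcpConfig(
  modelName: string,
  apiKey: string,
  requestTimeout: string = "800000", // same value as the sample configuration
): McpConfig {
  return {
    mcpServers: {
      "mcp-midscene": {
        command: "npx",
        args: ["-y", "@midscene/mcp"],
        env: {
          MIDSCENE_MODEL_NAME: requireValue("MIDSCENE_MODEL_NAME", modelName),
          OPENAI_API_KEY: requireValue("OPENAI_API_KEY", apiKey),
          MCP_SERVER_REQUEST_TIMEOUT: requestTimeout,
        },
      },
    },
  };
}

// Example values are placeholders for illustration only.
const exampleConfig = buildMidsceneMcpConfig("my-model-name", "sk-example-key");
console.log(JSON.stringify(exampleConfig, null, 2));
```

Paste the printed JSON into your MCP client's configuration file, or merge the `mcp-midscene` entry into an existing `mcpServers` object.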

Available Tools

Midscene MCP provides the following browser automation tools:

Category	Tool Name	Description
Navigation	midscene_navigate	Navigate to a specified URL in the current tab
Tab Management	midscene_get_tabs	Get a list of all open browser tabs
	midscene_set_active_tab	Switch to a specific tab by ID
Page Interaction	midscene_aiTap	Click on an element described in natural language
	midscene_aiInput	Input text into a form field or element
	midscene_aiHover	Hover over an element
	midscene_aiKeyboardPress	Press a specific keyboard key
	midscene_aiScroll	Scroll the page or a specific element
Verification and Observation	midscene_aiWaitFor	Wait for a condition to be true on the page
	midscene_aiAssert	Assert that a condition is true on the page
	midscene_screenshot	Take a screenshot of the current page
Playwright Code Example	midscene_playwright_example	Provides Playwright code examples for Midscene

Navigation

midscene_navigate: Navigate to a specified URL in the current tab

Parameters:
- url: The URL to navigate to

Tab Management

midscene_get_tabs: Get a list of all open browser tabs, including their IDs, titles, and URLs

Parameters: None

midscene_set_active_tab: Switch to a specific tab by ID

Parameters:
- tabId: The ID of the tab to activate

Page Interaction

midscene_aiTap: Click on an element described in natural language

Parameters:
- locate: Natural language description of the element to click

midscene_aiInput: Input text into a form field or element

Parameters:
- value: The text to input
- locate: Natural language description of the element to input text into

midscene_aiHover: Hover over an element

Parameters:
- locate: Natural language description of the element to hover over

midscene_aiKeyboardPress: Press a specific keyboard key

Parameters:
- key: The key to press (e.g., 'Enter', 'Tab', 'Escape')
- locate: (Optional) Description of the element to focus before pressing the key
- deepThink: (Optional) If true, uses more precise element location

midscene_aiScroll: Scroll the page or a specific element

Parameters:
- direction: 'up', 'down', 'left', or 'right'
- scrollType: 'once', 'untilBottom', 'untilTop', 'untilLeft', or 'untilRight'
- distance: (Optional) Distance to scroll in pixels
- locate: (Optional) Description of the element to scroll
- deepThink: (Optional) If true, uses more precise element location
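
Because midscene_aiScroll takes the most parameters, it can help to write them down as a type. The sketch below encodes the parameter list above in TypeScript and checks a candidate argument object before it is sent. `ScrollArgs`, `UncheckedScrollArgs`, and `validateScrollArgs` are hypothetical names, and the server's own validation may differ.

```typescript
const SCROLL_DIRECTIONS = ["up", "down", "left", "right"] as const;
const SCROLL_TYPES = ["once", "untilBottom", "untilTop", "untilLeft", "untilRight"] as const;

// Arguments for midscene_aiScroll, as listed in the parameter description.
interface ScrollArgs {
  direction: (typeof SCROLL_DIRECTIONS)[number];
  scrollType: (typeof SCROLL_TYPES)[number];
  distance?: number; // pixels
  locate?: string;
  deepThink?: boolean;
}

// Same keys as ScrollArgs, but with values that have not been checked yet.
type UncheckedScrollArgs = Partial<Record<keyof ScrollArgs, unknown>>;

// Returns a list of problems; an empty list means the arguments match the documented shape.
function validateScrollArgs(args: UncheckedScrollArgs): string[] {
  const problems: string[] = [];
  const directions: readonly unknown[] = SCROLL_DIRECTIONS;
  const scrollTypes: readonly unknown[] = SCROLL_TYPES;

  if (directions.indexOf(args.direction) === -1) {
    problems.push("direction must be one of: " + SCROLL_DIRECTIONS.join(", "));
  }
  if (scrollTypes.indexOf(args.scrollType) === -1) {
    problems.push("scrollType must be one of: " + SCROLL_TYPES.join(", "));
  }
  if (args.distance !== undefined && typeof args.distance !== "number") {
    problems.push("distance must be a number of pixels");
  }
  if (args.locate !== undefined && typeof args.locate !== "string") {
    problems.push("locate must be a string");
  }
  if (args.deepThink !== undefined && typeof args.deepThink !== "boolean") {
    problems.push("deepThink must be a boolean");
  }
  return problems;
}

const scrollToBottom: ScrollArgs = { direction: "down", scrollType: "untilBottom" };
console.log(validateScrollArgs(scrollToBottom));
console.log(validateScrollArgs({ direction: "sideways", scrollType: "once" }));
```

The first call reports no problems; the second reports the invalid direction.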

Verification and Observation

midscene_aiWaitFor: Wait for a condition to be true on the page

Parameters:
- assertion: Natural language description of the condition to wait for
- timeoutMs: (Optional) Maximum time to wait in milliseconds
- checkIntervalMs: (Optional) How often to check the condition

midscene_aiAssert: Assert that a condition is true on the page

Parameters:
- assertion: Natural language description of the condition to check

midscene_screenshot: Take a screenshot of the current page

Parameters:
- name: Name for the screenshot
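
Taken together, the tools above can express a complete check as an ordered list of calls. The sketch below writes a login flow for the Sauce Demo site as plain data, in the order an AI assistant might invoke the tools. The tool and parameter names come from the lists above; the URL, the natural-language descriptions, the credential placeholders, the timeout value, and the `ToolStep` type are illustrative assumptions. Nothing here talks to a real server.

```typescript
interface ToolStep {
  tool: string;
  args: Record<string, string | number | boolean>;
}

// Tool names taken from the table of available tools.
const KNOWN_TOOLS: readonly string[] = [
  "midscene_navigate", "midscene_get_tabs", "midscene_set_active_tab",
  "midscene_aiTap", "midscene_aiInput", "midscene_aiHover",
  "midscene_aiKeyboardPress", "midscene_aiScroll", "midscene_aiWaitFor",
  "midscene_aiAssert", "midscene_screenshot", "midscene_playwright_example",
];

// Hypothetical plan: each entry is one tool call with its arguments.
const loginFlow: ToolStep[] = [
  { tool: "midscene_navigate", args: { url: "https://www.saucedemo.com" } },
  { tool: "midscene_aiInput", args: { value: "<username>", locate: "the username field" } },
  { tool: "midscene_aiInput", args: { value: "<password>", locate: "the password field" } },
  { tool: "midscene_aiTap", args: { locate: "the Login button" } },
  { tool: "midscene_aiWaitFor", args: { assertion: "the product list is visible", timeoutMs: 15000 } },
  { tool: "midscene_aiAssert", args: { assertion: "at least one product is shown" } },
  { tool: "midscene_screenshot", args: { name: "after-login" } },
];

// Returns the names of any steps that do not match a documented tool.
function unknownTools(flow: ToolStep[]): string[] {
  return flow
    .map((step) => step.tool)
    .filter((tool) => KNOWN_TOOLS.indexOf(tool) === -1);
}

// Print the plan as numbered steps.
loginFlow.forEach((step, index) => {
  console.log(`${index + 1}. ${step.tool} ${JSON.stringify(step.args)}`);
});
console.log("Unknown tools:", unknownTools(loginFlow));
```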

Common Issues

What advantages does Midscene MCP have over other browser MCPs?

Midscene MCP supports Bridge Mode by default, so you can control your current browser directly without logging in again or downloading a separate browser.
Midscene MCP includes built-in prompt templates and execution practices tuned for browser page control, providing a more stable and reliable browser automation experience than other MCP implementations.
Midscene MCP automatically generates an execution report after completing each task, so you can review the execution process at any time.

Local port conflicts when multiple clients are used

Problem description

When Midscene MCP runs in multiple local clients at the same time (Claude Desktop, Cursor MCP, etc.), port conflicts may occur and cause server errors.

Solution

Temporarily stop the MCP server in the extra clients.
Then run the following command to terminate any process still using port 3766:

# For macOS/Linux:
lsof -i:3766 | awk 'NR>1 {print $2}' | xargs -r kill -9

# For Windows:
FOR /F "tokens=5" %i IN ('netstat -ano ^| findstr :3766') DO taskkill /F /PID %i
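
Before killing anything, it can help to see which processes the macOS/Linux pipeline would target. The sketch below mirrors the `awk 'NR>1 {print $2}'` step: it skips the header row of the `lsof` output and reads the second column as the PID. The sample output and the function name `pidsFromLsof` are made up for illustration.

```typescript
// Extracts PIDs from `lsof -i:<port>` output, like awk 'NR>1 {print $2}'.
// Unlike the awk step, duplicate PIDs are dropped.
function pidsFromLsof(output: string): number[] {
  const pids: number[] = [];
  const rows = output.split("\n").slice(1); // NR>1: skip the header row
  for (const row of rows) {
    const pidText = row.trim().split(/\s+/)[1]; // $2: the PID column
    if (pidText !== undefined && /^\d+$/.test(pidText)) {
      const pid = Number(pidText);
      if (pids.indexOf(pid) === -1) {
        pids.push(pid);
      }
    }
  }
  return pids;
}

// Fabricated sample output: one process listening on both IPv4 and IPv6.
const sampleLsofOutput = [
  "COMMAND   PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME",
  "node    41234 alice   23u  IPv6 0x1234      0t0  TCP *:3766 (LISTEN)",
  "node    41234 alice   24u  IPv4 0x5678      0t0  TCP localhost:3766 (LISTEN)",
].join("\n");

console.log(pidsFromLsof(sampleLsofOutput));
```

For the sample above, the function returns the single PID 41234, which is the process the kill command would terminate.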

How to Access Midscene Execution Reports

After each task execution, Midscene generates a task report. You can open this HTML report directly from the command line (the example uses the macOS open command):

# Replace report_file_name.html with the path to your report file
open report_file_name.html
