MCP Server

Midscene provides a MCP server that allows AI assistants to control browsers, automate web tasks and write automation scripts for Midscene.

MCP Introduction

MCP (Model Context Protocol) is a standardized way for AI models to interact with external tools and capabilities. MCP servers expose a set of tools that AI models can invoke to perform various tasks. In Midscene's case, these tools allow AI models to control browsers, navigate web pages, interact with UI elements, and more.

Use Cases

  • Control browsers to execute automation tasks
  • Automatically generate Midscene automation scripts

Examples

Generate Midscene test cases for the Sauce Demo site

Setting Up Midscene MCP

Prerequisites

  1. An OpenAI API key or another supported AI model provider. For more information, see Choosing an AI Model.
  2. For Chrome browser integration (Bridge Mode):
    • Install the Midscene Chrome extension (download from Chrome Web Extension)
    • Switch to "Bridge Mode" in the extension and click "Allow Connection"

Configuration

Add the Midscene MCP server to your MCP configuration:

{
  "mcpServers": {
    "mcp-midscene": {
      "command": "npx",
      "args": ["-y", "@midscene/mcp"],
      "env": {
        "MIDSCENE_MODEL_NAME": "REPLACE_WITH_YOUR_MODEL_NAME",
        "OPENAI_API_KEY": "REPLACE_WITH_YOUR_OPENAI_API_KEY",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}

For more information about configuring AI models, see Choosing an AI Model.

Available Tools

Midscene MCP provides the following browser automation tools:

Category Tool Name Description
Navigation midscene_navigate Navigate to a specified URL in the current tab
Tab Management midscene_get_tabs Get a list of all open browser tabs
midscene_set_active_tab Switch to a specific tab by ID
Page Interaction midscene_aiTap Click on an element described in natural language
midscene_aiInput Input text into a form field or element
midscene_aiHover Hover over an element
midscene_aiKeyboardPress Press a specific keyboard key
midscene_aiScroll Scroll the page or a specific element
Verification and Observation midscene_aiWaitFor Wait for a condition to be true on the page
midscene_aiAssert Assert that a condition is true on the page
midscene_screenshot Take a screenshot of the current page
Playwright Code Example midscene_playwright_example Provides Playwright code examples for Midscene
  • midscene_navigate: Navigate to a specified URL in the current tab
    Parameters: - url: The URL to navigate to

Tab Management

  • midscene_get_tabs: Get a list of all open browser tabs, including their IDs, titles, and URLs

    Parameters: None
  • midscene_set_active_tab: Switch to a specific tab by ID

    Parameters: - tabId: The ID of the tab to activate

Page Interaction

  • midscene_aiTap: Click on an element described in natural language

    Parameters: - locate: Natural language description of the element to click
  • midscene_aiInput: Input text into a form field or element

    Parameters: - value: The text to input - locate: Natural language description of the element to input text into
  • midscene_aiHover: Hover over an element

    Parameters: - locate: Natural language description of the element to hover over
  • midscene_aiKeyboardPress: Press a specific keyboard key

    Parameters: - key: The key to press (e.g., 'Enter', 'Tab', 'Escape') - locate: (Optional) Description of element to focus before pressing the key - deepThink: (Optional) If true, uses more precise element location
  • midscene_aiScroll: Scroll the page or a specific element

    Parameters: - direction: 'up', 'down', 'left', or 'right' - scrollType: 'once', 'untilBottom', 'untilTop', 'untilLeft', or 'untilRight' - distance: (Optional) Distance to scroll in pixels - locate: (Optional) Description of the element to scroll - deepThink: (Optional) If true, uses more precise element location

Verification and Observation

  • midscene_aiWaitFor: Wait for a condition to be true on the page

    Parameters: - assertion: Natural language description of the condition to wait for - timeoutMs: (Optional) Maximum time to wait in milliseconds - checkIntervalMs: (Optional) How often to check the condition
  • midscene_aiAssert: Assert that a condition is true on the page

    Parameters: - assertion: Natural language description of the condition to check
  • midscene_screenshot: Take a screenshot of the current page

    Parameters: - name: Name for the screenshot

Common Issues

What advantages does Midscene MCP have over other browser MCPs?

  • Midscene MCP supports Bridge mode by default, allowing you to directly control your current browser without needing to log in again or download a browser
  • Midscene MCP includes built-in optimal prompt templates and operation execution practices for browser page control, providing more stable and reliable browser automation experiences compared to other MCP implementations
  • Midscene MCP automatically generates execution case reports after completing tasks, allowing you to view the execution process at any time

Local port conflicts when multiple Clients are used

Problem description

When users simultaneously use Midscene MCP in multiple local clients (Claude Desktop, Cursor MCP, etc.), port conflicts may occur causing server errors

Solution

  • Temporarily close the MCP server in the extra clients
  • Execute the command:
# For macOS/Linux:
lsof -i:3766 | awk 'NR>1 {print $2}' | xargs -r kill -9

# For Windows:
FOR /F "tokens=5" %i IN ('netstat -ano ^| findstr :3766') DO taskkill /F /PID %i

How to Access Midscene Execution Reports

After each task execution, a Midscene task report is generated. You can open this HTML report directly from the command line:

# Replace the opened address with your report filename
open report_file_name.html

image