MCP Server

Midscene provides a MCP server that allows AI assistants to control Android devices, automate mobile app testing tasks.

MCP Introduction

MCP (Model Context Protocol) is a standardized way for AI models to interact with external tools and capabilities. MCP servers expose a set of tools that AI models can invoke to perform various tasks. For Midscene, these tools allow AI models to connect to Android devices, launch apps, interact with UI elements, and more.

Use Cases

  • Execute automated testing on Android devices
  • Control Android apps for UI interaction

Setting Up Midscene MCP

Prerequisites

  1. An OpenAI API key or another supported AI model provider. For more information, see Choosing an AI Model.
  2. Android adb tool installed and configured
  3. Android device with USB debugging enabled and connected to your computer

Configuration

Add the Midscene MCP server to your MCP configuration, note that the MIDSCENE_MCP_ANDROID_MODE environment variable is required:

{
  "mcpServers": {
    "mcp-midscene": {
      "command": "npx",
      "args": ["-y", "@midscene/mcp"],
      "env": {
        "MIDSCENE_MODEL_NAME": "REPLACE_WITH_YOUR_MODEL_NAME",
        "OPENAI_API_KEY": "REPLACE_WITH_YOUR_OPENAI_API_KEY",
        "MIDSCENE_MCP_ANDROID_MODE": "true",
        "MCP_SERVER_REQUEST_TIMEOUT": "800000"
      }
    }
  }
}

For more information about configuring AI models, see Choosing an AI Model.

Available Tools

Midscene MCP provides the following Android device automation tools:

CategoryTool NameDescription
Device Managementmidscene_android_list_devicesList all connected Android devices
midscene_android_connectConnect to a specific Android device
App Controlmidscene_android_launchLaunch an app or open a webpage on Android device
System Operationsmidscene_android_backPress the back button on Android device
midscene_android_homePress the home button on Android device
Page Interactionmidscene_aiTapClick on an element described in natural language
midscene_aiInputInput text into a form field or element
midscene_aiKeyboardPressPress a specific keyboard key
midscene_aiScrollScroll the page or a specific element
Verification and Observationmidscene_aiWaitForWait for a condition to be true on the page
midscene_aiAssertAssert that a condition is true on the page
midscene_screenshotTake a screenshot of the current page

Device Management

  • midscene_android_list_devices: List all connected Android devices available for automation

    Parameters: None
  • midscene_android_connect: Connect to an Android device via ADB

    Parameters: - deviceId: (Optional) Device ID to connect to. If not provided, uses the first available device.

App Control

  • midscene_android_launch: Launch an app or navigate to a URL on Android device
    Parameters: - uri: Package name, activity name, or URL to launch

System Operations

  • midscene_android_back: Press the back button on Android device

    Parameters: None
  • midscene_android_home: Press the home button on Android device

    Parameters: None

Page Interaction

  • midscene_aiTap: Click on an element described in natural language

    Parameters: - locate: Natural language description of the element to click
  • midscene_aiInput: Input text into a form field or element

    Parameters: - value: The text to input - locate: Natural language description of the element to input text into
  • midscene_aiKeyboardPress: Press a specific keyboard key

    Parameters: - key: The key to press (e.g., 'Enter', 'Tab', 'Escape') - locate: (Optional) Description of element to focus before pressing the key - deepThink: (Optional) If true, uses more precise element location
  • midscene_aiScroll: Scroll the page or a specific element

    Parameters: - direction: 'up', 'down', 'left', or 'right' - scrollType: 'once', 'untilBottom', 'untilTop', 'untilLeft', or 'untilRight' - distance: (Optional) Distance to scroll in pixels - locate: (Optional) Description of the element to scroll - deepThink: (Optional) If true, uses more precise element location

Verification and Observation

  • midscene_aiWaitFor: Wait for a condition to be true on the page

    Parameters: - assertion: Natural language description of the condition to wait for - timeoutMs: (Optional) Maximum time to wait in milliseconds - checkIntervalMs: (Optional) How often to check the condition
  • midscene_aiAssert: Assert that a condition is true on the page

    Parameters: - assertion: Natural language description of the condition to check
  • midscene_screenshot: Take a screenshot of the current page

    Parameters: - name: Name for the screenshot

Common Issues

How to Connect an Android Device?

  1. Ensure Android SDK is installed and ADB is configured
  2. Enable Developer Options and USB debugging on your Android device
  3. Connect the device to your computer via USB cable
  4. Run adb devices to confirm the device is connected
  5. Use midscene_android_list_devices in MCP to view available devices

How to Launch an Android App?

Use the midscene_android_launch tool with parameters that can be:

  • App package name: e.g., com.android.chrome
  • Activity name: e.g., com.android.chrome/.MainActivity
  • Web URL: e.g., https://www.example.com

Local port conflicts when multiple Clients are used

Problem description

When users simultaneously use Midscene MCP in multiple local clients (Claude Desktop, Cursor MCP, etc.), port conflicts may occur causing server errors

Solution

  • Temporarily close the MCP server in the extra clients
  • Execute the command:
# For macOS/Linux:
lsof -i:3766 | awk 'NR>1 {print $2}' | xargs -r kill -9

# For Windows:
FOR /F "tokens=5" %i IN ('netstat -ano ^| findstr :3766') DO taskkill /F /PID %i

How to Access Midscene Execution Reports

After each task execution, a Midscene task report is generated. You can open this HTML report directly from the command line:

# Replace the opened address with your report filename
open report_file_name.html

image