
Integrate with iOS (WebDriverAgent)

After connecting an iOS device through WebDriverAgent, you can control it with the Midscene JavaScript SDK.

Demo Projects

Control iOS devices with the JavaScript SDK: https://github.com/web-infra-dev/midscene-example/blob/main/ios/javascript-sdk-demo

Integrate with Vitest for testing: https://github.com/web-infra-dev/midscene-example/tree/main/ios/vitest-demo


About WebDriver and Midscene's Relationship

WebDriver is a standard protocol established by W3C for browser automation, providing a unified API to control different browsers and applications. The WebDriver protocol defines the communication method between client and server, enabling automation tools to control various user interfaces across platforms.
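To make the client/server split concrete, here is a minimal sketch of two W3C WebDriver commands expressed as plain HTTP requests: New Session (`POST /session`) and Take Screenshot (`GET /session/{session id}/screenshot`). The helper names `newSessionRequest`, `screenshotRequest`, and `send` are hypothetical; this is not Midscene's API or its internal implementation, only an illustration of the protocol that a WebDriver server such as WebDriverAgent (port 8100 by default in this guide) serves.

```typescript
// Illustrative only: W3C WebDriver commands are JSON over HTTP.
interface WebDriverRequest {
  method: 'GET' | 'POST' | 'DELETE';
  url: string;
  body?: string;
}

// New Session: POST /session with a "capabilities" object (empty alwaysMatch = no requirements).
function newSessionRequest(baseUrl: string): WebDriverRequest {
  return {
    method: 'POST',
    url: `${baseUrl}/session`,
    body: JSON.stringify({ capabilities: { alwaysMatch: {} } }),
  };
}

// Take Screenshot: GET /session/{session id}/screenshot; the response "value" is a base64 PNG.
function screenshotRequest(baseUrl: string, sessionId: string): WebDriverRequest {
  return { method: 'GET', url: `${baseUrl}/session/${sessionId}/screenshot` };
}

// Sends a request with the global fetch available in Node.js 18+ and returns the parsed JSON.
async function send(req: WebDriverRequest): Promise<unknown> {
  const res = await fetch(req.url, {
    method: req.method,
    headers: { 'Content-Type': 'application/json' },
    body: req.body,
  });
  return res.json();
}
```

Against a running server, `send(newSessionRequest('http://localhost:8100'))` would return a session ID that later commands embed in their URLs. Libraries such as Appium and Midscene hide this plumbing behind higher-level calls.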

Thanks to the Appium team and other open-source communities, there are now many mature libraries that expose desktop and mobile automation through the WebDriver protocol. They include:

Appium - Cross-platform mobile automation framework
WebDriverAgent - Service dedicated to iOS device automation
Selenium - Web browser automation tool
WinAppDriver - Windows application automation tool

Midscene supports the WebDriver protocol, so developers can use AI models to automate any device that exposes a WebDriver interface. Beyond basic operations such as clicking and typing, Midscene can also:

Understand interface content and context
Execute complex multi-step operations
Perform intelligent assertions and validations
Extract and analyze interface data

On iOS, Midscene connects to devices through WebDriverAgent, so you can control iOS apps and the system itself with natural-language descriptions.

Preparation

Install Node.js

Install Node.js 18 or higher.

Prepare API Key

Prepare an API key for a vision-language (VL) model.

You can find supported models and configurations for Midscene.js in the Choose a Model documentation.

Prepare WebDriver Server

Before getting started, you need to set up the iOS development environment:

macOS (required for iOS development)
Xcode and Xcode command line tools
iOS Simulator or real device

Environment Configuration

Before using Midscene iOS, you need to prepare the WebDriverAgent service. Please refer to the official documentation for setup:

Simulator Configuration: Run Prebuilt WDA
Real Device Configuration: Real Device Configuration

Verify Environment Configuration

After completing the configuration, you can verify whether the service is working properly by accessing WebDriverAgent's status endpoint:

Access URL: http://localhost:8100/status

Example of a successful response:

{
  "value": {
    "build": {
      "version": "10.1.1",
      "time": "Sep 24 2025 18:56:41",
      "productBundleIdentifier": "com.facebook.WebDriverAgentRunner"
    },
    "os": {
      "testmanagerdVersion": 65535,
      "name": "iOS",
      "sdkVersion": "26.0",
      "version": "26.0"
    },
    "device": "iphone",
    "ios": {
      "ip": "10.91.115.63"
    },
    "message": "WebDriverAgent is ready to accept commands",
    "state": "success",
    "ready": true
  },
  "sessionId": "BCAD9603-F714-447C-A9E6-07D58267966B"
}

If you can reach this endpoint and receive a JSON response similar to the one above, WebDriverAgent is configured correctly and running.
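The same check can be scripted, for example before a test run. This is a minimal sketch that assumes the `value.ready` field shown in the example response is the signal to trust; the names `isWdaReady` and `checkWda` are hypothetical and not part of `@midscene/ios`.

```typescript
// Only the fields this check reads from GET /status (see the example response above).
interface WdaStatus {
  value?: {
    ready?: boolean;
    state?: string;
    message?: string;
  };
}

// True when WebDriverAgent reports that it is ready to accept commands.
function isWdaReady(status: WdaStatus): boolean {
  return status.value?.ready === true;
}

// Fetches /status (global fetch, Node.js 18+); network or parse errors count as "not ready".
async function checkWda(host = 'localhost', port = 8100): Promise<boolean> {
  try {
    const res = await fetch(`http://${host}:${port}/status`);
    if (!res.ok) return false;
    return isWdaReady((await res.json()) as WdaStatus);
  } catch {
    return false;
  }
}
```

Usage: `checkWda().then((ok) => console.log(ok ? 'WDA ready' : 'WDA not reachable'));`. Returning `false` instead of throwing keeps the caller's logic to a single branch.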

Set up AI model service

Set your model configuration in environment variables. See Choose a Model for more details.

# replace with your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"

# You may need more settings, such as the model name and endpoint; see the Choose a Model documentation
export OPENAI_BASE_URL="..."
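To fail fast when a variable is missing, a script can check the environment before creating an agent. This sketch assumes the two variables above are the ones your provider needs; other providers may require more (see Choose a Model). `missingModelEnv` is a hypothetical helper, not a Midscene API.

```typescript
// Variables set in this guide; extend the list if your model provider needs more.
const REQUIRED_MODEL_ENV = ['OPENAI_API_KEY', 'OPENAI_BASE_URL'] as const;

// Returns the names of required variables that are unset or empty.
function missingModelEnv(
  env: Record<string, string | undefined> = process.env,
): string[] {
  return REQUIRED_MODEL_ENV.filter((name) => !env[name]);
}
```

At the top of a script: `const missing = missingModelEnv(); if (missing.length > 0) throw new Error('Missing: ' + missing.join(', '));`.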

Integrate Midscene

Step 1: Install dependencies


npm install @midscene/ios --save-dev

Step 2: Write scripts

The following example opens eBay in Safari on iOS and searches for headphones.

Write the following code and save it as ./demo.ts


import {
  IOSAgent,
  IOSDevice,
  agentFromWebDriverAgent,
} from '@midscene/ios';

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    // Method 1: Create device and agent directly
    const page = new IOSDevice({
      wdaPort: 8100,
      wdaHost: 'localhost',
    });

    // 👀 Initialize Midscene agent
    const agent = new IOSAgent(page, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    });
    await page.connect();

    // Method 2: Or use convenience function (recommended)
    // const agent = await agentFromWebDriverAgent({
    //   wdaPort: 8100,
    //   wdaHost: 'localhost',
    //   aiActionContext: 'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    // });

    // 👀 Directly open ebay.com webpage (recommended approach)
    await page.launch('https://ebay.com');
    await sleep(3000);

    // 👀 Enter keywords and perform search
    await agent.aiAction('Search for "Headphones"');

    // 👀 Wait for loading to complete
    await agent.aiWaitFor('At least one headphone product is displayed on the page');
    // Or you can use a simple sleep:
    // await sleep(5000);

    // 👀 Understand page content and extract data
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find product titles and prices in the list',
    );
    console.log('Headphone product information', items);

    // 👀 Use AI assertion
    await agent.aiAssert('Multiple headphone products are displayed on the interface');

    await page.destroy();
  })(),
);

Step 3: Run

Use tsx to run the script

# run
npx tsx demo.ts

Shortly after, you will see output like this:

[
 {
   itemTitle: 'AirPods Pro (2nd generation) with MagSafe Charging Case (USB-C)',
   price: 249
 },
 {
   itemTitle: 'Sony WH-1000XM4 Wireless Premium Noise Canceling Overhead Headphones',
   price: 278
 }
]
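Because `aiQuery` returns data extracted by a model, entries may not always match the requested shape. A small runtime guard, sketched below under the assumption that the result should follow the `{itemTitle: string, price: number}[]` shape from the prompt, keeps malformed entries out of downstream code. `HeadphoneItem` and `parseHeadphoneItems` are hypothetical names.

```typescript
// The shape requested in the aiQuery prompt.
interface HeadphoneItem {
  itemTitle: string;
  price: number;
}

// Runtime type guard: checks one entry against the requested shape.
function isHeadphoneItem(x: unknown): x is HeadphoneItem {
  if (typeof x !== 'object' || x === null) return false;
  const item = x as Record<string, unknown>;
  return (
    typeof item.itemTitle === 'string' &&
    typeof item.price === 'number' &&
    Number.isFinite(item.price)
  );
}

// Keeps only well-formed entries; a non-array result yields an empty list.
function parseHeadphoneItems(raw: unknown): HeadphoneItem[] {
  return Array.isArray(raw) ? raw.filter(isHeadphoneItem) : [];
}
```

In the demo script this would wrap the query: `const items = parseHeadphoneItems(await agent.aiQuery('...'));`.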

Step 4: View execution report

When the command completes successfully, the console prints: Midscene - report file updated: /path/to/report/some_id.html. Open this file in a browser to view the report.

Constructor and Interface

`IOSDevice` Constructor

The IOSDevice constructor supports the following parameters:

opts?: IOSDeviceOpt - Optional parameters for IOSDevice configuration
- wdaPort?: number - Optional, WebDriverAgent port. Default is 8100.
- wdaHost?: string - Optional, WebDriverAgent host. Default is 'localhost'.
- autoDismissKeyboard?: boolean - Optional, whether to automatically dismiss keyboard after text input. Default is true.
- customActions?: DeviceAction<any>[] - Optional, list of custom device actions.
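The documented defaults can be summarized in code. This sketch only mirrors the defaults listed above; it is not the library's implementation, and the assumption that requests go to `http://{wdaHost}:{wdaPort}` is inferred from the `/status` URL used earlier. `DeviceOptSketch`, `resolveDeviceOpts`, and `wdaBaseUrl` are hypothetical.

```typescript
// Mirrors the documented IOSDeviceOpt fields (customActions omitted for brevity).
interface DeviceOptSketch {
  wdaPort?: number;
  wdaHost?: string;
  autoDismissKeyboard?: boolean;
}

// Applies the documented defaults: port 8100, host 'localhost', autoDismissKeyboard true.
function resolveDeviceOpts(opts: DeviceOptSketch = {}): Required<DeviceOptSketch> {
  return {
    wdaPort: opts.wdaPort ?? 8100,
    wdaHost: opts.wdaHost ?? 'localhost',
    autoDismissKeyboard: opts.autoDismissKeyboard ?? true,
  };
}

// Base URL of the WebDriverAgent service, e.g. http://localhost:8100.
function wdaBaseUrl(opts: DeviceOptSketch = {}): string {
  const { wdaHost, wdaPort } = resolveDeviceOpts(opts);
  return `http://${wdaHost}:${wdaPort}`;
}
```

Using `??` rather than `||` matters for `autoDismissKeyboard`: an explicit `false` is kept instead of being replaced by the default.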

Additional iOS Agent Interfaces

In addition to the common Agent interfaces described in the API Reference, IOSAgent provides the following:

`agent.launch()`

Launch a web page or native iOS application.

Type

function launch(uri: string): Promise<void>;

Parameters:
- uri: string - URI to open: a web URL, a native app bundle identifier, or a custom URL scheme
Return Value:
- Promise<void>
Example:

import { IOSAgent, IOSDevice, agentFromWebDriverAgent } from '@midscene/ios';

// Method 1: Create device and agent manually
// const page = new IOSDevice();
// const agent = new IOSAgent(page);
// await page.connect();

// Method 2: Use convenience function (recommended)
const agent = await agentFromWebDriverAgent();

await agent.launch('https://www.apple.com'); // Open web page
await agent.launch('com.apple.mobilesafari'); // Launch Safari
await agent.launch('com.apple.Preferences'); // Launch Settings app
await agent.launch('myapp://profile/user/123'); // Open app deep link
await agent.launch('tel:+1234567890'); // Make a phone call
await agent.launch('mailto:example@email.com'); // Send an email
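The examples above pass three kinds of strings to `launch()`. The hypothetical `classifyLaunchUri` below sketches one heuristic for telling them apart; it is an illustration of the three categories, not how Midscene actually dispatches the call.

```typescript
type LaunchTarget = 'web' | 'urlScheme' | 'bundleId';

// Heuristic classification of the inputs accepted by agent.launch().
function classifyLaunchUri(uri: string): LaunchTarget {
  // http:// or https:// URLs open as web pages.
  if (/^https?:\/\//i.test(uri)) return 'web';
  // Any other RFC 3986 scheme ("letters, digits, +, -, ." then ':'), e.g. myapp:, tel:, mailto:.
  if (/^[a-z][a-z0-9+.-]*:/i.test(uri)) return 'urlScheme';
  // No scheme, e.g. com.apple.mobilesafari: treat as a bundle identifier.
  return 'bundleId';
}
```

Bundle identifiers never contain a colon, which is why the scheme test can safely run before the bundle-ID fallback.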

`agentFromWebDriverAgent()` (Recommended)

Create an IOSAgent by connecting to the WebDriverAgent service. This is the most convenient way.

Type

function agentFromWebDriverAgent(
  opts?: PageAgentOpt & IOSDeviceOpt,
): Promise<IOSAgent>;

Parameters:
- opts?: PageAgentOpt & IOSDeviceOpt - Optional, configuration for initializing IOSAgent. For PageAgentOpt fields, see the Agent constructor in the API Reference; for IOSDeviceOpt fields, see the IOSDevice Constructor section above.
Return Value:
- Promise<IOSAgent> Returns an IOSAgent instance
Example:

import { agentFromWebDriverAgent } from '@midscene/ios';

// Use default WebDriverAgent address (localhost:8100)
const agent = await agentFromWebDriverAgent();

// Or use a custom WebDriverAgent address
const customAgent = await agentFromWebDriverAgent({
  wdaHost: 'localhost',
  wdaPort: 8100,
  aiActionContext: 'If popups appear, click agree',
});

Extending Custom Interaction Actions

Using the customActions option combined with custom interaction actions defined by defineAction, you can extend the Agent's action space. These actions are appended after built-in actions, making them available for the Agent to call during planning.

import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { agentFromWebDriverAgent } from '@midscene/ios';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z
      .number()
      .int()
      .positive()
      .describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
    // Implement custom click logic combining locate + count
  },
});

const agent = await agentFromWebDriverAgent({
  customActions: [ContinuousClick],
});

await agent.aiAction('Click the red button five times');
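The `call` body above leaves the click logic open. One way to fill it is a loop over a tap primitive, sketched here assuming `locate.center` is an `[x, y]` pair. The `TapFn` type and the `clickRepeatedly` helper are hypothetical; this guide does not show which device method performs a tap, so the tap function is injected by the caller.

```typescript
// Hypothetical tap primitive supplied by the caller (not a documented Midscene API).
type TapFn = (x: number, y: number) => Promise<void>;

// Taps `center` `count` times, pausing `delayMs` between consecutive taps.
async function clickRepeatedly(
  center: [number, number],
  count: number,
  tap: TapFn,
  delayMs = 100,
): Promise<void> {
  for (let i = 0; i < count; i++) {
    await tap(center[0], center[1]);
    // No pause after the final tap.
    if (i < count - 1) await new Promise<void>((r) => setTimeout(r, delayMs));
  }
}
```

Injecting `tap` keeps the loop testable without a device and leaves the choice of the real tap call to your integration.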

For more details about custom actions, refer to Integrate with any interface.

For more Agent APIs, refer to the API Reference.
For more prompting tips, refer to Prompting Tips.

FAQ

Why can't I control my device through WebDriverAgent even though it's connected?

Please check the following:

Developer Mode: Ensure it's enabled in Settings > Privacy & Security > Developer Mode
UI Automation: Ensure it's enabled in Settings > Developer > UI Automation
Device Trust: Ensure the device trusts the current Mac

What are the differences between simulators and real devices?

Feature	Real Device	Simulator
Port Forwarding	Requires iproxy	Not required
Developer Mode	Must be enabled	Auto-enabled
UI Automation Settings	Must be enabled manually	Auto-enabled
Performance	Real device performance	Depends on Mac performance
Sensors	Real hardware	Simulated data

How to use custom WebDriverAgent port and host?

You can specify WebDriverAgent port and host through the IOSDevice constructor or agentFromWebDriverAgent:

import { IOSDevice, agentFromWebDriverAgent } from '@midscene/ios';

// Method 1: Using IOSDevice
const device = new IOSDevice({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});

// Method 2: Using convenience function (recommended)
const agent = await agentFromWebDriverAgent({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});

For a real device connected over USB, also forward the local port to WebDriverAgent on the device with iproxy:

iproxy 8100 8100 YOUR_DEVICE_ID

iOS-Specific Actions

The iOS package includes iOS-specific actions that can be used in automation:

// Press home button
await agent.callAction('IOSHomeButton');

// Open app switcher
await agent.callAction('IOSAppSwitcher');

// Long press with custom duration
await agent.callAction('IOSLongPress', {
  locate: 'menu item',
  duration: 2000, // 2 seconds
});

Best Practices

1. Device Management

Always connect the device before use and destroy it afterwards, even if the script throws:

try {
  await device.connect();
  // Your automation code here
} finally {
  await device.destroy();
}
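The try/finally pattern can be wrapped in a reusable helper so each script cannot forget the cleanup. This sketch assumes only that the device exposes async `connect()` and `destroy()`, as `IOSDevice` does in this guide; `ConnectableDevice` and `withDevice` are hypothetical names.

```typescript
// The two lifecycle methods this pattern relies on.
interface ConnectableDevice {
  connect(): Promise<void>;
  destroy(): Promise<void>;
}

// Connects, runs `task`, and always destroys the device, even if connect() or task throws.
async function withDevice<T>(
  device: ConnectableDevice,
  task: () => Promise<T>,
): Promise<T> {
  try {
    await device.connect();
    return await task();
  } finally {
    await device.destroy();
  }
}
```

Usage: `await withDevice(device, async () => { await agent.aiAction('Open Settings'); });`. Errors from `task` still propagate to the caller after cleanup.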

2. Wait for UI Updates

iOS animations and transitions may need time to complete:

await agent.aiTap('button');
await sleep(1000); // Wait for the animation (sleep is the helper defined in demo.ts)
await agent.aiAssert('new screen loaded');
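A fixed sleep either wastes time or is too short. For natural-language conditions, the demo's `agent.aiWaitFor` handles waiting; when a condition can be checked cheaply in code, a polling loop with a timeout is a common alternative. `waitUntil` below is a hypothetical helper, not part of `@midscene/ios`.

```typescript
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Polls `check` every `intervalMs` until it returns true or `timeoutMs` elapses.
// Resolves true on success and false on timeout.
async function waitUntil(
  check: () => Promise<boolean>,
  timeoutMs = 10_000,
  intervalMs = 500,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return true;
    await sleep(intervalMs);
  }
  return false;
}
```

Returning a boolean rather than throwing lets the caller decide whether a timeout is fatal.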

3. Handle Keyboard Input

For better text input handling:

await agent.aiInput('text', 'input field', {
  autoDismissKeyboard: true, // Automatically dismiss keyboard
});

4. Bundle Identifiers

Common iOS app bundle identifiers:

Safari: com.apple.mobilesafari
Settings: com.apple.Preferences
Messages: com.apple.MobileSMS
Camera: com.apple.camera
Photos: com.apple.mobileslideshow
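Collecting these identifiers in one typed constant avoids typos in `launch()` calls. The names `BUNDLE_IDS` and `bundleIdFor` are hypothetical; the values are exactly the ones listed above.

```typescript
// Bundle identifiers from the list above.
const BUNDLE_IDS = {
  safari: 'com.apple.mobilesafari',
  settings: 'com.apple.Preferences',
  messages: 'com.apple.MobileSMS',
  camera: 'com.apple.camera',
  photos: 'com.apple.mobileslideshow',
} as const;

type AppName = keyof typeof BUNDLE_IDS;

// The AppName union rejects unknown app names at compile time.
function bundleIdFor(app: AppName): string {
  return BUNDLE_IDS[app];
}
```

Usage: `await agent.launch(bundleIdFor('settings'));`.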

Testing Integration

Vitest Integration

test/ios.test.ts

import { describe, it, beforeAll, afterAll } from 'vitest';
import { IOSDevice, IOSAgent } from '@midscene/ios';

describe('iOS App Tests', () => {
  let device: IOSDevice;
  let agent: IOSAgent;

  beforeAll(async () => {
    device = new IOSDevice();
    agent = new IOSAgent(device);
    await device.connect();

    // Or use the convenience function (recommended):
    // agent = await agentFromWebDriverAgent();
  });

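  afterAll(async () => {
    // Optional chaining guards against a failed setup
    await device?.destroy();
  });

  // AI calls often exceed Vitest's default 5-second test timeout
  it('should launch Safari and navigate', async () => {
    await device.launch('com.apple.mobilesafari');
    await agent.aiAssert('Safari is open');
  }, 60_000);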
});

Troubleshooting

WebDriverAgent Connection Issues

If you encounter WebDriverAgent connection issues:

Check port forwarding:

lsof -i:8100  # Should show iproxy process

Rebuild WebDriverAgent:

# The iOS package will automatically rebuild when needed

Check device trust:
- Ensure your Mac is trusted on the iOS device
- Check Developer Mode is enabled

Common Errors

"Device not found":

Verify device is connected via USB
Check the device UDID with idevice_id -l
Ensure port forwarding is active

"WebDriverAgent session failed":

Restart port forwarding
Check if WebDriverAgent is running on device
Verify development team configuration

"Element not found":

Use more descriptive element descriptions
Wait for UI animations to complete
Check if element is visible on screen

Next Steps

Explore API Reference for complete method documentation
Check out Prompting Tips for better AI interactions
Learn about Model Configuration for optimal performance
