Integrate with iOS (WebDriverAgent)

About WebDriver and Midscene's Relationship

WebDriver is a standard protocol established by W3C for browser automation, providing a unified API to control different browsers and applications. The WebDriver protocol defines the communication method between client and server, enabling automation tools to control various user interfaces across platforms.
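
To make the client/server split concrete, here is a minimal sketch of how a few W3C WebDriver commands map onto HTTP requests. The `describeCommand` helper and the `WebDriverCommand` type are hypothetical names used only for illustration. They are not part of Midscene, which issues such requests on your behalf.

```typescript
// Illustrative only: maps a few W3C WebDriver commands to HTTP requests.
type WebDriverCommand =
  | { name: 'status' }
  | { name: 'newSession' }
  | { name: 'deleteSession'; sessionId: string }
  | { name: 'takeScreenshot'; sessionId: string };

interface HttpRequest {
  method: 'GET' | 'POST' | 'DELETE';
  url: string;
}

function describeCommand(baseUrl: string, command: WebDriverCommand): HttpRequest {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  switch (command.name) {
    case 'status':
      return { method: 'GET', url: `${root}/status` };
    case 'newSession':
      return { method: 'POST', url: `${root}/session` };
    case 'deleteSession':
      return { method: 'DELETE', url: `${root}/session/${command.sessionId}` };
    case 'takeScreenshot':
      return { method: 'GET', url: `${root}/session/${command.sessionId}/screenshot` };
  }
}

console.log(describeCommand('http://localhost:8100', { name: 'status' }));
```

Any server that answers these endpoints, such as WebDriverAgent on port 8100, can be driven by the same client code.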

Through the efforts of the Appium team and other open source communities, the industry now has many excellent libraries that expose desktop and mobile device automation through the WebDriver protocol. These tools include:

  • Appium - Cross-platform mobile automation framework
  • WebDriverAgent - Service dedicated to iOS device automation
  • Selenium - Web browser automation tool
  • WinAppDriver - Windows application automation tool

Midscene adapts to the WebDriver protocol, which means developers can use AI models to perform intelligent automated operations on any device that supports WebDriver. With this design, Midscene can not only perform traditional operations such as clicking and typing, but also:

  • Understand interface content and context
  • Execute complex multi-step operations
  • Perform intelligent assertions and validations
  • Extract and analyze interface data

On the iOS platform, Midscene connects to iOS devices through WebDriverAgent, allowing you to control iOS apps and the system using natural language descriptions.

After connecting an iOS device through WebDriverAgent, you can use the Midscene JavaScript SDK to control it.

Prepare WebDriver Service

Before getting started, you need to set up the iOS development environment:

  • macOS (required for iOS development)
  • Xcode and Xcode command line tools
  • iOS Simulator or real device

Environment Configuration

Before using Midscene iOS, you need to prepare the WebDriverAgent server. Please refer to the official WebDriverAgent documentation for setup.

Verify Environment Configuration

After completing the configuration, you can verify whether the service is working properly by accessing WebDriverAgent's status endpoint:

Access URL: http://localhost:8100/status

Correct Response Example:

{
  "value": {
    "build": {
      "version": "10.1.1",
      "time": "Sep 24 2025 18:56:41",
      "productBundleIdentifier": "com.facebook.WebDriverAgentRunner"
    },
    "os": {
      "testmanagerdVersion": 65535,
      "name": "iOS",
      "sdkVersion": "26.0",
      "version": "26.0"
    },
    "device": "iphone",
    "ios": {
      "ip": "10.91.115.63"
    },
    "message": "WebDriverAgent is ready to accept commands",
    "state": "success",
    "ready": true
  },
  "sessionId": "BCAD9603-F714-447C-A9E6-07D58267966B"
}

If you can successfully access this endpoint and receive a similar JSON response as shown above, it indicates that WebDriverAgent is properly configured and running.
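
The same check can be scripted. The following is a minimal sketch, assuming Node.js 18 or later (for the global `fetch`) and assuming that the `ready` field shown in the response above is the signal to test. `WdaStatus`, `isWdaReady`, and `checkWda` are hypothetical names, not part of `@midscene/ios`.

```typescript
// Sketch: decide whether a WebDriverAgent /status payload reports readiness.
interface WdaStatus {
  value?: {
    ready?: boolean;
    message?: string;
  };
  sessionId?: string | null;
}

function isWdaReady(status: WdaStatus): boolean {
  return status.value?.ready === true;
}

// Fetches the status endpoint and returns false on any failure.
async function checkWda(baseUrl = 'http://localhost:8100'): Promise<boolean> {
  try {
    const response = await fetch(`${baseUrl}/status`);
    if (!response.ok) return false;
    return isWdaReady((await response.json()) as WdaStatus);
  } catch {
    return false; // connection refused, invalid JSON, etc.
  }
}
```

Calling `checkWda()` before creating an agent lets a script fail early with a clear message when WebDriverAgent is not reachable.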

Set up AI model service

Set your model configuration in environment variables. See the "Choose a model" documentation for more details.

# replace with your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"

# You may need more configs, such as model name and endpoint, please refer to [choose a model](../choose-a-model)
export OPENAI_BASE_URL="..."
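
A script can verify this configuration before it starts an automation run. The sketch below is illustrative: `missingEnvVars` is a hypothetical helper, and which variables are actually required depends on the model you choose.

```typescript
// Sketch: report which required environment variables are unset or empty.
type Env = Record<string, string | undefined>;

function missingEnvVars(env: Env, required: string[]): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}

// Example with an in-memory object; in a real script pass `process.env`.
const missing = missingEnvVars(
  { OPENAI_API_KEY: 'sk-placeholder', OPENAI_BASE_URL: '' },
  ['OPENAI_API_KEY', 'OPENAI_BASE_URL'],
);
console.log(missing); // prints [ 'OPENAI_BASE_URL' ]
```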

Integrate Midscene

Step 1: Install dependencies

npm install @midscene/ios --save-dev

Step 2: Write scripts

Here's an example using iOS Safari browser to search for headphones.

Write the following code and save it as ./demo.ts

import {
  IOSAgent,
  IOSDevice,
  agentFromWebDriverAgent,
} from '@midscene/ios';

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    // Method 1: Create device and agent directly
    const page = new IOSDevice({
      wdaPort: 8100,
      wdaHost: 'localhost',
    });

    // 👀 Initialize Midscene agent
    const agent = new IOSAgent(page, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    });
    await page.connect();

    // Method 2: Or use convenience function (recommended)
    // const agent = await agentFromWebDriverAgent({
    //   wdaPort: 8100,
    //   wdaHost: 'localhost',
    //   aiActionContext: 'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    // });

    // 👀 Directly open ebay.com webpage (recommended approach)
    await page.launch('https://ebay.com');
    await sleep(3000);

    // 👀 Enter keywords and perform search
    await agent.aiAction('Search for "Headphones"');

    // 👀 Wait for loading to complete
    await agent.aiWaitFor('At least one headphone product is displayed on the page');
    // Or you can use a simple sleep:
    // await sleep(5000);

    // 👀 Understand page content and extract data
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find product titles and prices in the list',
    );
    console.log('Headphone product information', items);

    // 👀 Use AI assertion
    await agent.aiAssert('Multiple headphone products are displayed on the interface');

    await page.destroy();
  })(),
);

Step 3: Run

Use tsx to run the script

# run
npx tsx demo.ts

Shortly after, you will see output like this:

[
 {
   itemTitle: 'AirPods Pro (2nd generation) with MagSafe Charging Case (USB-C)',
   price: 249
 },
 {
   itemTitle: 'Sony WH-1000XM4 Wireless Premium Noise Canceling Overhead Headphones',
   price: 278
 }
]
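
Because the result comes from a model, it may not always match the shape requested in the prompt. The sketch below shows one way to validate it before use. `ProductItem`, `isProductItem`, and `toProductItems` are hypothetical names introduced for this example.

```typescript
// Shape requested in the aiQuery prompt.
interface ProductItem {
  itemTitle: string;
  price: number;
}

// Runtime check, since model output is not guaranteed to match the prompt.
function isProductItem(value: unknown): value is ProductItem {
  if (typeof value !== 'object' || value === null) return false;
  const item = value as Record<string, unknown>;
  return (
    typeof item.itemTitle === 'string' &&
    typeof item.price === 'number' &&
    Number.isFinite(item.price)
  );
}

// Keep only well-formed items from an untyped query result.
function toProductItems(result: unknown): ProductItem[] {
  return Array.isArray(result) ? result.filter(isProductItem) : [];
}

const items = toProductItems([
  { itemTitle: 'AirPods Pro (2nd generation) with MagSafe Charging Case (USB-C)', price: 249 },
  { itemTitle: 'Missing price' },
]);
console.log(items.length); // prints 1
```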

Step 4: View execution report

When the above command executes successfully, it will output in the console: Midscene - report file updated: /path/to/report/some_id.html. Open this file in a browser to view the report.

Constructor and Interface

IOSDevice Constructor

The IOSDevice constructor supports the following parameters:

  • opts?: IOSDeviceOpt - Optional parameters for IOSDevice configuration
    • wdaPort?: number - Optional, WebDriverAgent port. Default is 8100.
    • wdaHost?: string - Optional, WebDriverAgent host. Default is 'localhost'.
    • autoDismissKeyboard?: boolean - Optional, whether to automatically dismiss keyboard after text input. Default is true.
    • customActions?: DeviceAction<any>[] - Optional, list of custom device actions.
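
To illustrate how the documented defaults combine with partial options, here is a small sketch. `WdaConnectionOptions` is a local illustration type that mirrors the fields listed above. It is not imported from `@midscene/ios`, and the `http` scheme in the base URL is an assumption based on the status URL shown earlier.

```typescript
// Local illustration type mirroring the documented options.
interface WdaConnectionOptions {
  wdaPort?: number;
  wdaHost?: string;
  autoDismissKeyboard?: boolean;
}

// Apply the documented defaults to a partial options object.
function withDocumentedDefaults(
  opts: WdaConnectionOptions = {},
): Required<WdaConnectionOptions> {
  return {
    wdaPort: opts.wdaPort ?? 8100,
    wdaHost: opts.wdaHost ?? 'localhost',
    autoDismissKeyboard: opts.autoDismissKeyboard ?? true,
  };
}

// Base URL that these options would point at (scheme assumed to be http).
function wdaBaseUrl(opts: WdaConnectionOptions = {}): string {
  const { wdaHost, wdaPort } = withDocumentedDefaults(opts);
  return `http://${wdaHost}:${wdaPort}`;
}

console.log(wdaBaseUrl()); // prints http://localhost:8100
console.log(wdaBaseUrl({ wdaHost: '192.168.1.100' })); // prints http://192.168.1.100:8100
```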

Additional iOS Agent Interfaces

In addition to the common Agent interfaces described in the API Reference, IOSAgent provides some additional interfaces:

agent.launch()

Launch a web page or native iOS application.

  • Type
function launch(uri: string): Promise<void>;
  • Parameters:

    • uri: string - URI to open, can be a web url, native app bundle identifier, or custom URL scheme
  • Return Value:

    • Promise<void>
  • Example:

import { IOSAgent, IOSDevice, agentFromWebDriverAgent } from '@midscene/ios';

// Method 1: Create device and agent manually
const page = new IOSDevice();
const agent = new IOSAgent(page);
await page.connect();

// Method 2: Or use the convenience function instead (recommended)
// const agent = await agentFromWebDriverAgent();

await agent.launch('https://www.apple.com'); // Open web page
await agent.launch('com.apple.mobilesafari'); // Launch Safari
await agent.launch('com.apple.Preferences'); // Launch Settings app
await agent.launch('myapp://profile/user/123'); // Open app deep link
await agent.launch('tel:+1234567890'); // Make a phone call
await agent.launch('mailto:example@email.com'); // Send an email
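
The three kinds of URI accepted by launch can be told apart by their shape. The following heuristic is only a sketch for validating inputs in your own scripts. `classifyLaunchTarget` is a hypothetical helper and does not describe how Midscene itself interprets the argument.

```typescript
type LaunchTargetKind = 'web-url' | 'url-scheme' | 'bundle-id' | 'unknown';

function classifyLaunchTarget(uri: string): LaunchTargetKind {
  // http:// or https:// addresses
  if (/^https?:\/\//i.test(uri)) return 'web-url';
  // Any other "scheme:rest" form, e.g. myapp://..., tel:..., mailto:...
  if (/^[a-z][a-z0-9+.-]*:/i.test(uri)) return 'url-scheme';
  // Reverse-DNS style name with at least two dot-separated segments
  if (/^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/.test(uri)) return 'bundle-id';
  return 'unknown';
}

console.log(classifyLaunchTarget('https://www.apple.com')); // prints web-url
console.log(classifyLaunchTarget('com.apple.mobilesafari')); // prints bundle-id
console.log(classifyLaunchTarget('tel:+1234567890')); // prints url-scheme
```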

agentFromWebDriverAgent()

Create an IOSAgent by connecting to the WebDriverAgent service. This is the most convenient way.

  • Type
function agentFromWebDriverAgent(
  opts?: PageAgentOpt & IOSDeviceOpt,
): Promise<IOSAgent>;
  • Parameters:

    • opts?: PageAgentOpt & IOSDeviceOpt - Optional, configuration for initializing IOSAgent. For PageAgentOpt, see the Agent constructor in the API Reference; for IOSDeviceOpt values, see IOSDevice Constructor above.
  • Return Value:

    • Promise<IOSAgent> Returns an IOSAgent instance
  • Example:

import { agentFromWebDriverAgent } from '@midscene/ios';

// Use default WebDriverAgent address (localhost:8100)
const agent = await agentFromWebDriverAgent();

// Or pass the WebDriverAgent address and agent options explicitly
const customAgent = await agentFromWebDriverAgent({
  wdaHost: 'localhost',
  wdaPort: 8100,
  aiActionContext: 'If popups appear, click agree',
});

Extending Custom Interaction Actions

Using the customActions option combined with custom interaction actions defined by defineAction, you can extend the Agent's action space. These actions are appended after built-in actions, making them available for the Agent to call during planning.

import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { agentFromWebDriverAgent } from '@midscene/ios';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z
      .number()
      .int()
      .positive()
      .describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
    // Implement custom click logic combining locate + count
  },
});

const agent = await agentFromWebDriverAgent({
  customActions: [ContinuousClick],
});

await agent.aiAction('Click the red button five times');
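
The body of `call` is left open in the example above. One possible shape for the repeated-click logic is sketched below. It assumes that `locate.center` is an `[x, y]` pair and that a tap function is available; both `TapFn` and `clickRepeatedly` are hypothetical and not part of Midscene.

```typescript
type Point = [number, number];
// Hypothetical callback that performs a single tap at the given point.
type TapFn = (point: Point) => Promise<void>;

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Tap the same point `count` times, pausing between consecutive taps.
async function clickRepeatedly(
  tap: TapFn,
  center: Point,
  count: number,
  delayMs = 100,
): Promise<void> {
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error(`count must be a positive integer, got ${count}`);
  }
  for (let i = 0; i < count; i++) {
    await tap(center);
    if (i < count - 1) await pause(delayMs);
  }
}
```

Keeping the loop in a plain function like this makes it testable without a device, since the tap function can be replaced with a stub.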

For more details about custom actions, refer to Integrate with any interface.

More

FAQ

Why can't I control my device through WebDriverAgent even though it's connected?

Please check the following:

  1. Developer Mode: Ensure it's enabled in Settings > Privacy & Security > Developer Mode
  2. UI Automation: Ensure it's enabled in Settings > Developer > UI Automation
  3. Device Trust: Ensure the device trusts the current Mac

What are the differences between simulators and real devices?

Feature                 Real Device              Simulator
Port Forwarding         Requires iproxy          Not required
Developer Mode          Must enable              Auto-enabled
UI Automation Settings  Must enable manually     Auto-enabled
Performance             Real device performance  Depends on Mac performance
Sensors                 Real hardware            Simulated data

How to use custom WebDriverAgent port and host?

You can specify WebDriverAgent port and host through the IOSDevice constructor or agentFromWebDriverAgent:

// Method 1: Using IOSDevice
const device = new IOSDevice({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});

// Method 2: Using convenience function (recommended)
const agent = await agentFromWebDriverAgent({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});

For real devices, you also need to set up matching port forwarding:

iproxy 8100 8100 YOUR_DEVICE_ID

iOS-Specific Actions

The iOS package includes iOS-specific actions that can be used in automation:

// Press home button
await agent.callAction('IOSHomeButton');

// Open app switcher
await agent.callAction('IOSAppSwitcher');

// Long press with custom duration
await agent.callAction('IOSLongPress', {
  locate: 'menu item',
  duration: 2000, // 2 seconds
});

Best Practices

1. Device Management

Always properly connect and destroy devices:

try {
  await device.connect();
  // Your automation code here
} finally {
  await device.destroy();
}
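
The connect/destroy pattern can be wrapped in a reusable helper so that cleanup is never forgotten. This is a minimal sketch. `ManagedDevice` and `withDevice` are hypothetical names, and the interface only assumes the `connect()` and `destroy()` methods used above.

```typescript
// Minimal shape needed by the helper; IOSDevice is used this way above.
interface ManagedDevice {
  connect(): Promise<void>;
  destroy(): Promise<void>;
}

// Connect, run the callback, and always destroy the device afterwards.
async function withDevice<D extends ManagedDevice, T>(
  device: D,
  run: (device: D) => Promise<T>,
): Promise<T> {
  try {
    await device.connect();
    return await run(device);
  } finally {
    await device.destroy();
  }
}
```

With this helper, a script body becomes a single call such as `withDevice(device, async (d) => { /* automation code */ })`, and the device is destroyed even when the callback throws.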

2. Wait for UI Updates

iOS animations and transitions may need time to complete:

await agent.aiTap('button');
await sleep(1000); // Wait for animation
await agent.aiAssert('new screen loaded');
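
A fixed sleep is either too short or wastes time. When the condition can be checked in code, polling with a timeout is a common alternative. The sketch below is generic: `waitUntil` is a hypothetical helper, not a Midscene API.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `check` until it resolves to true or the timeout elapses.
async function waitUntil(
  check: () => Promise<boolean>,
  { timeoutMs = 10000, intervalMs = 500 } = {},
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```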

3. Handle Keyboard Input

For better text input handling:

await agent.aiInput('text', 'input field', {
  autoDismissKeyboard: true, // Automatically dismiss keyboard
});

4. Bundle Identifiers

Common iOS app bundle identifiers:

  • Safari: com.apple.mobilesafari
  • Settings: com.apple.Preferences
  • Messages: com.apple.MobileSMS
  • Camera: com.apple.camera
  • Photos: com.apple.mobileslideshow

Testing Integration

Vitest Integration

test/ios.test.ts
import { describe, it, beforeAll, afterAll } from 'vitest';
import { IOSDevice, IOSAgent } from '@midscene/ios';

describe('iOS App Tests', () => {
  let device: IOSDevice;
  let agent: IOSAgent;

  beforeAll(async () => {
    device = new IOSDevice();
    agent = new IOSAgent(device);
    await device.connect();

    // Or use the convenience function (recommended):
    // agent = await agentFromWebDriverAgent();
  });

  afterAll(async () => {
    await device.destroy();
  });

  it('should launch Safari and navigate', async () => {
    await device.launch('com.apple.mobilesafari');
    await agent.aiAssert('Safari is open');
  });
});

Troubleshooting

WebDriverAgent Connection Issues

If you encounter WebDriverAgent connection issues:

  1. Check port forwarding:

    lsof -i:8100  # Should show iproxy process
  2. Rebuild WebDriverAgent:

    # The iOS package will automatically rebuild when needed
  3. Check device trust:

    • Ensure your Mac is trusted on the iOS device
    • Check Developer Mode is enabled

Common Errors

"Device not found":

  • Verify device is connected via USB
  • Check the device ID with idevice_id -l
  • Ensure port forwarding is active

"WebDriverAgent session failed":

  • Restart port forwarding
  • Check if WebDriverAgent is running on device
  • Verify development team configuration

"Element not found":

  • Use more descriptive element descriptions
  • Wait for UI animations to complete
  • Check if element is visible on screen

Next Steps