iOS getting started

This guide walks you through everything required to automate an iOS device with Midscene: connect a real phone through WebDriverAgent, configure model credentials, try the no-code Playground, and run your first JavaScript script.

Set up API keys for model

Set your model configuration in environment variables. See Model strategy for more details.

export MIDSCENE_MODEL_BASE_URL="https://replace-with-your-model-service-url/v1"
export MIDSCENE_MODEL_API_KEY="replace-with-your-api-key"
export MIDSCENE_MODEL_NAME="replace-with-your-model-name"
export MIDSCENE_MODEL_FAMILY="replace-with-your-model-family"

For more configuration details, please refer to Model strategy and Model configuration.
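A script can fail early with a clear message when one of these variables is missing. The following is a minimal sketch: missingModelEnv is a hypothetical helper, not part of Midscene, and treating all four variables as required is an assumption based on the export lines above.

```typescript
// Variable names taken from the export lines above.
const REQUIRED_MODEL_ENV = [
  'MIDSCENE_MODEL_BASE_URL',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_NAME',
  'MIDSCENE_MODEL_FAMILY',
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper (not a Midscene API): returns the names of
// variables that are unset or contain only whitespace.
function missingModelEnv(env: Env): string[] {
  return REQUIRED_MODEL_ENV.filter((name) => (env[name] ?? '').trim() === '');
}

// Possible use at the top of a script:
// const missing = missingModelEnv(process.env);
// if (missing.length > 0) throw new Error(`Missing: ${missing.join(', ')}`);
```

Passing the environment in as an argument keeps the check easy to test without touching the real process environment.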

Preparation

Install Node.js

Install Node.js 18 or higher.

Prepare API Key

Prepare an API Key for a visual language (VL) model.

You can find supported models and configurations for Midscene.js in the Model strategy documentation.

Prepare WebDriver Server

Before getting started, you need to set up the iOS development environment:

  • macOS (required for iOS development)
  • Xcode and Xcode command line tools
  • iOS Simulator or real device

Environment Configuration

Before using Midscene iOS, you need to prepare the WebDriverAgent service.

Version Requirement

WebDriverAgent version must be >= 7.0.0

Please refer to the official documentation for setup.

Verify Environment Configuration

After completing the configuration, you can verify whether the service is working properly by accessing WebDriverAgent's status endpoint:

Access URL: http://localhost:8100/status

Correct Response Example:

{
  "value": {
    "build": {
      "version": "10.1.1",
      "time": "Sep 24 2025 18:56:41",
      "productBundleIdentifier": "com.facebook.WebDriverAgentRunner"
    },
    "os": {
      "testmanagerdVersion": 65535,
      "name": "iOS",
      "sdkVersion": "26.0",
      "version": "26.0"
    },
    "device": "iphone",
    "ios": {
      "ip": "10.91.115.63"
    },
    "message": "WebDriverAgent is ready to accept commands",
    "state": "success",
    "ready": true
  },
  "sessionId": "BCAD9603-F714-447C-A9E6-07D58267966B"
}

If you can access this endpoint and receive a JSON response similar to the one shown above, WebDriverAgent is properly configured and running.
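The same check can be automated before a test run. The sketch below reads only fields that appear in the sample response above. Both helpers are hypothetical, and treating value.build.version as the WebDriverAgent version (for the >= 7.0.0 requirement) is an assumption.

```typescript
// Only the fields this check reads; mirrors the sample response above.
interface WdaStatus {
  value?: {
    ready?: boolean;
    build?: { version?: string };
  };
}

// Hypothetical helper: compares dotted numeric versions such as "10.1.1" and "7.0.0".
function versionAtLeast(actual: string, minimum: string): boolean {
  const a = actual.split('.').map(Number);
  const m = minimum.split('.').map(Number);
  for (let i = 0; i < Math.max(a.length, m.length); i++) {
    const x = a[i] ?? 0; // missing segments count as 0
    const y = m[i] ?? 0;
    if (Number.isNaN(x) || Number.isNaN(y)) return false;
    if (x !== y) return x > y;
  }
  return true;
}

// Hypothetical helper: true when the service reports ready and the
// reported build version is at least 7.0.0 (assumed to be the WDA version).
function isWdaUsable(status: WdaStatus): boolean {
  const value = status.value;
  if (!value || value.ready !== true) return false;
  return versionAtLeast(value.build?.version ?? '0', '7.0.0');
}

// With Node.js 18+, the body could come from the status endpoint:
// const status = await (await fetch('http://localhost:8100/status')).json();
// console.log(isWdaUsable(status));
```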

Try Playground

Playground is the fastest way to validate the connection and observe AI-driven steps without writing code. It shares the same core as @midscene/ios, so anything that works here will behave the same once scripted.

  1. Launch the Playground CLI:
npx --yes @midscene/ios-playground
  2. Click the gear button to enter the configuration page and paste your API key config. Refer back to Model configuration if you still need credentials.

Start experiencing

After configuration, you can start using Midscene right away. It provides several key operation tabs:

  • Act: interact with the page. This is Auto Planning, corresponding to aiAct. For example:
    Type “Midscene” in the search box, run the search, and open the first result
    Fill out the registration form and make sure every field passes validation
  • Query: extract JSON data from the interface, corresponding to aiQuery. Similar methods include aiBoolean(), aiNumber(), and aiString() for directly extracting booleans, numbers, and strings. For example:
    Extract the user ID from the page and return JSON data in the { id: string } structure
  • Assert: understand the page and check a condition, corresponding to aiAssert. If the condition is not met, an error is thrown. For example:
    There is a login button on the page, with a user agreement link below it
  • Tap: click on an element. This is Instant Action, corresponding to aiTap. For example:
    Click the login button

For the difference between Auto Planning and Instant Action, see the API document.
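To make the tab-to-method mapping concrete, here is a small sketch of a dispatcher. AgentLike is a local interface that lists only the four method names mentioned above, and runTab is a hypothetical helper. The real agent exported by @midscene/ios has more methods and options, and its exact signatures are not assumed here.

```typescript
// Local stand-in for the agent: only the methods named in the tab list.
interface AgentLike {
  aiAct(prompt: string): Promise<unknown>;
  aiQuery(prompt: string): Promise<unknown>;
  aiAssert(assertion: string): Promise<unknown>;
  aiTap(target: string): Promise<unknown>;
}

type Tab = 'Act' | 'Query' | 'Assert' | 'Tap';

// Hypothetical helper: forwards a Playground-style prompt to the
// agent method that corresponds to the selected tab.
async function runTab(agent: AgentLike, tab: Tab, prompt: string): Promise<unknown> {
  switch (tab) {
    case 'Act':
      return agent.aiAct(prompt); // Auto Planning
    case 'Query':
      return agent.aiQuery(prompt); // data extraction
    case 'Assert':
      return agent.aiAssert(prompt); // throws when the condition fails
    case 'Tap':
      return agent.aiTap(prompt); // Instant Action
  }
}

// Possible use with a connected agent:
// await runTab(agent, 'Tap', 'Click the login button');
```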

Integration with Midscene Agent

Once Playground works, move to a repeatable script with the JavaScript SDK.

Step 1. Install dependencies

npm install @midscene/ios --save-dev

Step 2. Write scripts

Save the following code as ./demo.ts. It opens Safari on the device, searches eBay, and asserts the result list.

./demo.ts
import {
  IOSAgent,
  IOSDevice,
  agentFromWebDriverAgent,
} from '@midscene/ios';

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    // Method 1: Create device and agent directly
    const page = new IOSDevice({
      wdaPort: 8100,
      wdaHost: 'localhost',
    });

    // 👀 Initialize Midscene agent
    const agent = new IOSAgent(page, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    });
    await page.connect();

    // Method 2: Or use convenience function (recommended)
    // const agent = await agentFromWebDriverAgent({
    //   wdaPort: 8100,
    //   wdaHost: 'localhost',
    //   aiActionContext: 'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    // });

    // 👀 Directly open ebay.com webpage (recommended approach)
    await page.launch('https://ebay.com');
    await sleep(3000);

    // 👀 Enter keywords and perform search
    await agent.aiAct('Search for "Headphones"');

    // 👀 Wait for loading to complete
    await agent.aiWaitFor('At least one headphone product is displayed on the page');
    // Or you can use a simple sleep:
    // await sleep(5000);

    // 👀 Understand page content and extract data
    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find product titles and prices in the list',
    );
    console.log('Headphone product information', items);

    // 👀 Use AI assertion
    await agent.aiAssert('Multiple headphone products are displayed on the interface');

    await page.destroy();
  })(),
);
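The script above calls page.destroy() only when every step succeeds. If aiWaitFor or aiAssert throws, the cleanup line is skipped. One way to handle this is a try/finally wrapper. The sketch below uses a hypothetical withCleanup helper and a local Destroyable interface. It assumes only that the device exposes an async destroy() method, as used in the script.

```typescript
// Local stand-in for the device: anything with an async destroy().
interface Destroyable {
  destroy(): Promise<unknown>;
}

// Hypothetical helper: runs the body and always calls destroy(),
// whether the body resolves or throws.
async function withCleanup<D extends Destroyable, T>(
  device: D,
  body: (device: D) => Promise<T>,
): Promise<T> {
  try {
    return await body(device);
  } finally {
    await device.destroy();
  }
}

// Possible use in demo.ts, replacing the trailing page.destroy():
// await withCleanup(page, async () => {
//   await agent.aiAct('Search for "Headphones"');
//   await agent.aiAssert('Multiple headphone products are displayed on the interface');
// });
```

Any error from the body still propagates to the caller, so a failed assertion continues to fail the run.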

Step 3. Run

npx tsx demo.ts

Step 4. View the report

Successful runs print Midscene - report file updated: /path/to/report/some_id.html. Open the generated HTML file in a browser to replay every interaction, query, and assertion.
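In CI it can be handy to pick the report path out of captured output. This is a minimal sketch with a hypothetical extractReportPath helper. It assumes the log line has exactly the prefix quoted above, which may differ between Midscene versions.

```typescript
// Prefix copied from the message quoted above (assumed to be stable).
const REPORT_PREFIX = 'Midscene - report file updated: ';

// Hypothetical helper: returns the path from the last matching line,
// or null when no such line is present in the captured output.
function extractReportPath(output: string): string | null {
  let found: string | null = null;
  for (const line of output.split(/\r?\n/)) {
    const index = line.indexOf(REPORT_PREFIX);
    if (index !== -1) {
      found = line.slice(index + REPORT_PREFIX.length).trim();
    }
  }
  return found;
}
```

Taking the last match rather than the first is a design choice for the case where the message is printed more than once during a run.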

API reference and more resources

Looking for constructors, helper methods, and platform-only device APIs? See the dedicated iOS API reference for detailed parameter lists plus advanced topics like custom actions. For API surfaces shared across platforms, head to the common API reference.

FAQ

Why can't I control my device through WebDriverAgent even though it's connected?

Please check the following:

  1. Developer Mode: Ensure it's enabled in Settings > Privacy & Security > Developer Mode
  2. UI Automation: Ensure it's enabled in Settings > Developer > UI Automation
  3. Device Trust: Ensure the device trusts the current Mac

What are the differences between simulators and real devices?

Feature                | Real Device             | Simulator
Port Forwarding        | Requires iproxy         | Not required
Developer Mode         | Must enable             | Auto-enabled
UI Automation Settings | Must enable manually    | Auto-enabled
Performance            | Real device performance | Depends on Mac performance
Sensors                | Real hardware           | Simulated data

How do I use a custom WebDriverAgent port and host?

You can specify the WebDriverAgent port and host through the IOSDevice constructor or agentFromWebDriverAgent:

// Method 1: Using IOSDevice
const device = new IOSDevice({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});

// Method 2: Using convenience function (recommended)
const agent = await agentFromWebDriverAgent({
  wdaPort: 8100,        // Custom port
  wdaHost: '192.168.1.100', // Custom host
});

For real devices, you also need to set up port forwarding with iproxy:

iproxy 8100 8100 YOUR_DEVICE_ID
