API reference (iOS)

Use this doc when you need to customize iOS device behavior, wire Midscene into WebDriverAgent-driven workflows, or troubleshoot WDA requests. For shared constructor options (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).

Action Space

IOSDevice uses the following action space; the Midscene Agent can use these actions while planning tasks:

  • Tap — Tap an element.
  • DoubleClick — Double-tap an element.
  • Input — Enter text with replace/append/clear modes and optional autoDismissKeyboard.
  • Scroll — Scroll from an element or screen center in any direction, including scroll-to-top/bottom/left/right helpers.
  • DragAndDrop — Drag from one element to another.
  • KeyboardPress — Press a specified key.
  • IOSLongPress — Long-press a target element with optional duration.
  • ClearInput — Clear the contents of an input field.
  • Launch — Open a URL, bundle identifier, or URL scheme.
  • RunWdaRequest — Call WebDriverAgent REST endpoints directly.
  • IOSHomeButton — Trigger the iOS system Home action.
  • IOSAppSwitcher — Open the iOS multitasking view.

IOSDevice

Create a WebDriverAgent-backed instance that an IOSAgent can drive.

Import

import { IOSDevice } from '@midscene/ios';

Constructor

const device = new IOSDevice({
  // device options...
});

Device options

  • wdaPort?: number — WebDriverAgent port. Default 8100.
  • wdaHost?: string — WebDriverAgent host. Default 'localhost'.
  • autoDismissKeyboard?: boolean — Hide the keyboard after text input. Default true.
  • customActions?: DeviceAction<any>[] — Additional device actions exposed to the agent.

Usage notes

  • Ensure Developer Mode is enabled and WDA can reach the device; use iproxy when forwarding ports from a real device.
  • Use wdaHost/wdaPort to target remote devices or custom WDA deployments.
  • For shared interaction methods, see API reference (Common).
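
Before constructing the device, it can help to confirm that WDA actually answers on the host and port you intend to pass. The sketch below applies the documented defaults (localhost, 8100) and probes WDA's /status endpoint. The names wdaBaseUrl and isWdaReachable are hypothetical helpers, not part of @midscene/ios, and the fetch function is injected so the check can be exercised without a device.

```typescript
// Hypothetical helpers for illustration; not part of @midscene/ios.
interface WdaEndpoint {
  wdaHost?: string;
  wdaPort?: number;
}

// Minimal shape of the fetch function this sketch relies on.
type FetchLike = (url: string) => Promise<{ ok: boolean }>;

// Apply the documented defaults: host 'localhost', port 8100.
function wdaBaseUrl(opts: WdaEndpoint = {}): string {
  const host = opts.wdaHost ?? 'localhost';
  const port = opts.wdaPort ?? 8100;
  return `http://${host}:${port}`;
}

// Resolve to true when GET /status answers with a 2xx response.
async function isWdaReachable(
  opts: WdaEndpoint = {},
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<boolean> {
  try {
    const res = await fetchImpl(`${wdaBaseUrl(opts)}/status`);
    return res.ok;
  } catch {
    // Connection refused, DNS failure, and similar errors.
    return false;
  }
}
```

With a real device whose port is forwarded through iproxy, calling isWdaReachable({ wdaPort: 8100 }) before device.connect() gives an early, readable failure instead of a timeout later in the run.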

Examples

Quick start

import { IOSAgent, IOSDevice } from '@midscene/ios';

const device = new IOSDevice({ wdaHost: 'localhost', wdaPort: 8100 });
await device.connect();

const agent = new IOSAgent(device, {
  aiActionContext: 'If any permission dialog appears, accept it.',
});

await agent.launch('https://ebay.com');
await agent.aiAct('Search for "Headphones"');
const items = await agent.aiQuery(
  '{itemTitle: string, price: number}[], list headphone products',
);
console.log(items);
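
The value returned by aiQuery is whatever the model extracted, so a runtime check before reading its fields can be worthwhile. This is a minimal sketch, assuming the result is expected to match the shape written in the prompt. HeadphoneItem and both guard functions are hypothetical names used for illustration only.

```typescript
// Hypothetical shape mirroring the aiQuery prompt in the quick start.
interface HeadphoneItem {
  itemTitle: string;
  price: number;
}

// Check a single entry for the two expected fields.
function isHeadphoneItem(value: unknown): value is HeadphoneItem {
  if (typeof value !== 'object' || value === null) return false;
  const item = value as Record<string, unknown>;
  return typeof item.itemTitle === 'string' && typeof item.price === 'number';
}

// Check that the whole result is an array of valid entries.
function isHeadphoneItemList(value: unknown): value is HeadphoneItem[] {
  return Array.isArray(value) && value.every(isHeadphoneItem);
}
```

Guarding with isHeadphoneItemList(items) before using the result narrows its type and catches cases where, for example, a price comes back as a string such as "$19.99".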

Custom host and port

const device = new IOSDevice({
  wdaHost: '192.168.1.100',
  wdaPort: 8300,
});
await device.connect();

IOSAgent

Wire Midscene's AI planner to an IOSDevice for UI automation over WebDriverAgent.

Import

import { IOSAgent } from '@midscene/ios';

Constructor

const agent = new IOSAgent(device, {
  // common agent options...
});

iOS-specific options

  • customActions?: DeviceAction<any>[] — Extend planning with actions defined via defineAction.
  • All other fields match the shared constructor options described in API reference (Common): generateReport, reportFileName, aiActionContext, modelConfig, cacheId, createOpenAIClient, onTaskStartTip, and more.

iOS-specific methods

agent.launch()

Launch a web URL, native application bundle, or custom scheme.

function launch(uri: string): Promise<void>;

  • uri: string — Destination to open (web URL, bundle identifier, URL scheme, tel/mailto, etc.).

await agent.launch('https://www.apple.com');
await agent.launch('com.apple.Preferences');
await agent.launch('myapp://profile/user/123');
await agent.launch('tel:+1234567890');
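
The launch calls above cover several kinds of target. The sketch below shows one way to tell them apart in your own code before calling launch, for example for logging or input validation. classifyLaunchTarget is a hypothetical helper, and its rules (a scheme prefix means a URL, a dotted reverse-DNS string means a bundle identifier) are assumptions, not a description of how Midscene routes the uri internally.

```typescript
// Hypothetical classifier for illustration; not part of @midscene/ios.
type LaunchTargetKind = 'web-url' | 'url-scheme' | 'bundle-id' | 'unknown';

function classifyLaunchTarget(uri: string): LaunchTargetKind {
  // http:// or https:// targets.
  if (/^https?:\/\//i.test(uri)) return 'web-url';
  // Any other "scheme:" prefix, such as myapp://, tel: or mailto:.
  if (/^[a-z][a-z0-9+.-]*:/i.test(uri)) return 'url-scheme';
  // Reverse-DNS style identifier such as com.apple.Preferences.
  if (/^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$/.test(uri)) return 'bundle-id';
  return 'unknown';
}
```

Rejecting 'unknown' targets up front keeps a typo from reaching the device, where the failure is harder to read.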

agent.runWdaRequest()

Execute raw WebDriverAgent REST calls when you need low-level control.

function runWdaRequest(
  method: string,
  endpoint: string,
  data?: Record<string, any>,
): Promise<any>;

  • method: string — HTTP verb (GET, POST, DELETE, etc.).
  • endpoint: string — WebDriverAgent endpoint path.
  • data?: Record<string, any> — Optional JSON body.

const screen = await agent.runWdaRequest('GET', '/wda/screen');
await agent.runWdaRequest('POST', '/session/test/wda/pressButton', { name: 'home' });

The agent also exposes two navigation shortcuts:

  • agent.home(): Promise<void> — Return to the Home screen.
  • agent.appSwitcher(): Promise<void> — Reveal the multitasking view.
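
Because runWdaRequest takes free-form strings, a thin wrapper can keep endpoint paths in one place. The sketch below assumes only the runWdaRequest signature documented in this section. WdaRequester, getScreen, pressButton, and createRecordingStub are hypothetical names, and the stub stands in for a real IOSAgent so the snippet runs without a device.

```typescript
// Structural subset of IOSAgent used here, taken from the documented signature.
interface WdaRequester {
  runWdaRequest(
    method: string,
    endpoint: string,
    data?: Record<string, any>,
  ): Promise<any>;
}

// Hypothetical wrapper: the same GET /wda/screen call as in the example.
function getScreen(agent: WdaRequester): Promise<any> {
  return agent.runWdaRequest('GET', '/wda/screen');
}

// Hypothetical wrapper around the pressButton call from the example.
function pressButton(agent: WdaRequester, name: string): Promise<any> {
  return agent.runWdaRequest('POST', '/session/test/wda/pressButton', { name });
}

// Stub that records calls instead of talking to WebDriverAgent.
function createRecordingStub() {
  const calls: Array<{
    method: string;
    endpoint: string;
    data?: Record<string, any>;
  }> = [];
  const agent: WdaRequester = {
    async runWdaRequest(method, endpoint, data) {
      calls.push({ method, endpoint, data });
      return {};
    },
  };
  return { agent, calls };
}
```

Typing the wrappers against WdaRequester rather than IOSAgent keeps them testable with a stub, while a real agent satisfies the same interface structurally.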

Helper utilities

agentFromWebDriverAgent()

Connect to WebDriverAgent and return a ready-to-use IOSAgent.

function agentFromWebDriverAgent(
  opts?: PageAgentOpt & IOSDeviceOpt,
): Promise<IOSAgent>;

  • opts?: PageAgentOpt & IOSDeviceOpt — Combine common agent options with IOSDevice settings.

import { agentFromWebDriverAgent } from '@midscene/ios';

const agent = await agentFromWebDriverAgent({
  wdaHost: 'localhost',
  wdaPort: 8100,
  aiActionContext: 'Accept permission dialogs automatically.',
});
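
In CI setups the WDA address often comes from the environment rather than from source code. The sketch below builds the wdaHost/wdaPort part of the options from environment variables and falls back to the documented defaults. The variable names WDA_HOST and WDA_PORT and the helper wdaOptsFromEnv are hypothetical and are not read by Midscene itself.

```typescript
// Hypothetical helper; WDA_HOST and WDA_PORT are assumed variable names.
interface WdaConnectionOpts {
  wdaHost: string;
  wdaPort: number;
}

function wdaOptsFromEnv(
  env: Record<string, string | undefined>,
): WdaConnectionOpts {
  // Fall back to the documented defaults when a variable is unset or blank.
  const wdaHost = env.WDA_HOST?.trim() || 'localhost';
  const rawPort = env.WDA_PORT?.trim();
  const wdaPort = rawPort ? Number(rawPort) : 8100;
  if (!Number.isInteger(wdaPort) || wdaPort < 1 || wdaPort > 65535) {
    throw new Error(`Invalid WDA_PORT: ${rawPort}`);
  }
  return { wdaHost, wdaPort };
}
```

The result can be spread into the helper's options, as in agentFromWebDriverAgent({ ...wdaOptsFromEnv(process.env), aiActionContext: '...' }).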

Extending custom interaction actions

Extend the Agent's action space by supplying customActions with handlers created via defineAction. These actions appear after the built-in ones and can be called during planning.

import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { agentFromWebDriverAgent } from '@midscene/ios';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z.number().int().positive().describe('How many times to click'),
  }),
  async call({ locate, count }) {
    console.log('click target center', locate.center);
    console.log('click count', count);
  },
});

const agent = await agentFromWebDriverAgent({
  customActions: [ContinuousClick],
});

await agent.aiAct('Click the red button five times');
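
The call handler in the example above only logs its arguments. The sketch below shows what a real body might do: tap the located center count times with a short pause between taps. This reference does not specify how a custom action reaches the device, so the tap function is injected. TapFn, repeatTap, the pause length, and the [x, y] shape of locate.center are all assumptions made for illustration.

```typescript
// Assumed shape of locate.center: an [x, y] pair in screen coordinates.
type Point = [number, number];

// Hypothetical tap primitive supplied by the caller.
type TapFn = (x: number, y: number) => Promise<void>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Tap the same point `count` times, pausing between consecutive taps.
async function repeatTap(
  tap: TapFn,
  center: Point,
  count: number,
  pauseMs = 100,
): Promise<void> {
  if (!Number.isInteger(count) || count < 1) {
    throw new RangeError('count must be a positive integer');
  }
  const [x, y] = center;
  for (let i = 0; i < count; i++) {
    await tap(x, y);
    if (i < count - 1) await sleep(pauseMs);
  }
}
```

Inside call({ locate, count }), such a helper would be invoked as repeatTap(tap, locate.center, count), where tap is whatever device-level tap function your setup provides.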

See also