API reference (iOS)
Use this doc when you need to customize iOS device behavior, wire Midscene into WebDriverAgent-driven workflows, or troubleshoot WDA requests. For shared constructor options (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
Action Space
IOSDevice uses the following action space; the Midscene Agent can use these actions while planning tasks:
- `Tap`: Tap an element.
- `DoubleClick`: Double-tap an element.
- `Input`: Enter text with `replace`/`append`/`clear` modes and optional `autoDismissKeyboard`.
- `Scroll`: Scroll from an element or the screen center in any direction, including scroll-to-top/bottom/left/right helpers.
- `DragAndDrop`: Drag from one element to another.
- `KeyboardPress`: Press a specified key.
- `IOSLongPress`: Long-press a target element with optional duration.
- `ClearInput`: Clear the contents of an input field.
- `Launch`: Open a URL, bundle identifier, or URL scheme.
- `RunWdaRequest`: Call WebDriverAgent REST endpoints directly.
- `IOSHomeButton`: Trigger the iOS system Home action.
- `IOSAppSwitcher`: Open the iOS multitasking view.
IOSDevice
Create a WebDriverAgent-backed instance that an IOSAgent can drive.
Import
Constructor
Device options
- `wdaPort?: number`: WebDriverAgent port. Default `8100`.
- `wdaHost?: string`: WebDriverAgent host. Default `'localhost'`.
- `autoDismissKeyboard?: boolean`: Hide the keyboard after text input. Default `true`.
- `customActions?: DeviceAction<any>[]`: Additional device actions exposed to the agent.
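The defaults listed above can be illustrated with a minimal sketch. The `DeviceOptionsSketch` interface and both helper functions are hypothetical names written for this example, not Midscene exports, and the plain `http://` scheme is an assumption.

```typescript
// Local mirror of the documented option fields; not imported from Midscene.
interface DeviceOptionsSketch {
  wdaPort?: number;
  wdaHost?: string;
  autoDismissKeyboard?: boolean;
}

// Hypothetical helper: fills in the defaults documented above.
function withDocumentedDefaults(
  opts: DeviceOptionsSketch = {},
): Required<DeviceOptionsSketch> {
  return {
    wdaPort: opts.wdaPort ?? 8100,
    wdaHost: opts.wdaHost ?? 'localhost',
    autoDismissKeyboard: opts.autoDismissKeyboard ?? true,
  };
}

// Hypothetical helper: the base URL such options would address.
// Assumes WebDriverAgent is reached over plain HTTP.
function wdaBaseUrl(opts: DeviceOptionsSketch = {}): string {
  const { wdaHost, wdaPort } = withDocumentedDefaults(opts);
  return `http://${wdaHost}:${wdaPort}`;
}
```

Calling `wdaBaseUrl()` with no arguments addresses the local default, while passing `wdaHost` and `wdaPort` points the same logic at a remote device or a forwarded port.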
Usage notes
- Ensure Developer Mode is enabled and WDA can reach the device; use `iproxy` when forwarding ports from a real device.
- Use `wdaHost`/`wdaPort` to target remote devices or custom WDA deployments.
- For shared interaction methods, see API reference (Common).
Examples
Quick start
Custom host and port
IOSAgent
Wire Midscene's AI planner to an IOSDevice for UI automation over WebDriverAgent.
Import
Constructor
iOS-specific options
- `customActions?: DeviceAction<any>[]`: Extend planning with actions defined via `defineAction`.
- All other fields match API constructors: `generateReport`, `reportFileName`, `aiActionContext`, `modelConfig`, `cacheId`, `createOpenAIClient`, `onTaskStartTip`, and more.
Usage notes
- Use one agent per device connection.
- iOS-only helpers such as `launch` and `runWdaRequest` are also exposed in YAML scripts. See iOS platform-specific actions.
- For shared interaction methods, see API reference (Common).
iOS-specific methods
agent.launch()
Launch a web URL, native application bundle, or custom scheme.
- `uri: string`: Destination to open (web URL, bundle identifier, URL scheme, tel/mailto, etc.).
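Because a single `uri` string covers several kinds of destination, it can help to see one way of telling them apart. The sketch below is an illustrative heuristic only. `classifyLaunchTarget` and `LaunchTargetKind` are hypothetical names, and this page does not state how Midscene itself dispatches the value.

```typescript
// Hypothetical categories for the destinations the `uri` parameter accepts.
type LaunchTargetKind = 'web-url' | 'url-scheme' | 'bundle-id';

// Illustrative heuristic, not Midscene's implementation:
// - http(s) prefixes are treated as web URLs
// - any other "scheme:" prefix (myapp://, tel:, mailto:) is a URL scheme
// - everything else is assumed to be a bundle identifier
function classifyLaunchTarget(uri: string): LaunchTargetKind {
  if (/^https?:\/\//i.test(uri)) {
    return 'web-url';
  }
  // Scheme grammar: a letter, then letters, digits, "+", "." or "-", then ":".
  if (/^[a-z][a-z0-9+.-]*:/i.test(uri)) {
    return 'url-scheme';
  }
  return 'bundle-id';
}
```

Under this heuristic, a reverse-DNS string without a colon, such as `com.apple.Preferences`, falls through to the bundle identifier case.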
agent.runWdaRequest()
Execute raw WebDriverAgent REST calls when you need low-level control.
- `method: string`: HTTP verb (`GET`, `POST`, `DELETE`, etc.).
- `endpoint: string`: WebDriverAgent endpoint path.
- `data?: Record<string, any>`: Optional JSON body.
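To make the three parameters concrete, here is a minimal sketch of how they could map onto an HTTP request against a WDA base URL. `buildWdaRequest` and `WdaRequestSketch` are hypothetical names for this example. The path normalization and the JSON header are assumptions, not a description of what `agent.runWdaRequest()` does internally.

```typescript
// Hypothetical description of the HTTP request that would be sent.
interface WdaRequestSketch {
  method: string;
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Illustrative mapping from (method, endpoint, data) to a request descriptor.
// Nothing is sent over the network here.
function buildWdaRequest(
  baseUrl: string,
  method: string,
  endpoint: string,
  data?: Record<string, unknown>,
): WdaRequestSketch {
  // Normalize slashes so base URL and endpoint join with exactly one "/".
  const path = endpoint.startsWith('/') ? endpoint : `/${endpoint}`;
  const request: WdaRequestSketch = {
    method: method.toUpperCase(),
    url: `${baseUrl.replace(/\/+$/, '')}${path}`,
    headers: {},
  };
  // Only attach a JSON body when data was supplied.
  if (data !== undefined) {
    request.headers['Content-Type'] = 'application/json';
    request.body = JSON.stringify(data);
  }
  return request;
}

// Example: a status probe against the default local WDA address.
const statusRequest = buildWdaRequest('http://localhost:8100', 'GET', '/status');
```

The resulting descriptor can be passed to any HTTP client. Keeping the mapping free of network calls makes it easy to check a request before it reaches a device.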
Navigation helpers
- `agent.home(): Promise<void>`: Return to the Home screen.
- `agent.appSwitcher(): Promise<void>`: Reveal the multitasking view.
Helper utilities
agentFromWebDriverAgent()
Connect to WebDriverAgent and return a ready-to-use IOSAgent.
- `opts?: PageAgentOpt & IOSDeviceOpt`: Combine common agent options with `IOSDevice` settings.
Extending custom interaction actions
Extend the Agent's action space by supplying customActions with handlers created via defineAction. These actions appear after the built-in ones and can be called during planning.
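The ordering described above, with built-in actions first and custom ones after, can be sketched with local stand-ins. `ActionSketch`, `defineActionSketch`, `buildActionSpace`, and the `ShakeDevice` action are all hypothetical. Midscene's real `defineAction` and `DeviceAction` types are not reproduced here and may carry more fields.

```typescript
// Local stand-in for an action descriptor; not Midscene's DeviceAction type.
interface ActionSketch<P> {
  name: string;
  description: string;
  call: (param: P) => Promise<void>;
}

// Local stand-in for defineAction: it simply returns the descriptor it is given.
function defineActionSketch<P>(action: ActionSketch<P>): ActionSketch<P> {
  return action;
}

// Built-in actions come first and custom actions are appended after them.
function buildActionSpace(
  builtIn: ActionSketch<any>[],
  custom: ActionSketch<any>[] = [],
): ActionSketch<any>[] {
  return [...builtIn, ...custom];
}

const callLog: string[] = [];

const tap = defineActionSketch<{ x: number; y: number }>({
  name: 'Tap',
  description: 'Tap an element.',
  call: async ({ x, y }) => {
    callLog.push(`tap ${x},${y}`);
  },
});

// Hypothetical custom action used only for illustration.
const shakeDevice = defineActionSketch<Record<string, never>>({
  name: 'ShakeDevice',
  description: 'Simulate a shake gesture.',
  call: async () => {
    callLog.push('shake');
  },
});

const actionSpace = buildActionSpace([tap], [shakeDevice]);
const actionNames = actionSpace.map((action) => action.name);
```

Here `actionNames` lists `Tap` before `ShakeDevice`, mirroring the statement that custom actions appear after the built-in ones.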
See also
- iOS getting started for setup and scripting steps.
- Integrate with any interface for custom actions and schemas.

