iOS getting started
This guide walks you through everything required to automate an iOS device with Midscene: connect a real phone through WebDriverAgent, configure model credentials, try the no-code Playground, and run your first JavaScript script.
Control iOS devices with JavaScript: https://github.com/web-infra-dev/midscene-example/blob/main/ios/javascript-sdk-demo
Integrate Vitest for testing: https://github.com/web-infra-dev/midscene-example/tree/main/ios/vitest-demo
Set up API keys for model
Set your model configuration in environment variables. For details, refer to Model strategy and Model configuration.
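As a minimal sketch of a pre-flight check, the helper below reports which model settings are missing from the environment before a script runs. The variable names in `ASSUMED_MODEL_ENV` are assumptions for illustration, and `missingModelEnv` is a hypothetical helper, not part of Midscene; take the authoritative names from Model configuration.

```typescript
// Assumed variable names, for illustration only; confirm them in Model configuration.
const ASSUMED_MODEL_ENV = [
  'MIDSCENE_MODEL_BASE_URL',
  'MIDSCENE_MODEL_API_KEY',
  'MIDSCENE_MODEL_NAME',
] as const;

type Env = Record<string, string | undefined>;

/** Returns the names from `required` that are unset or blank in `env`. */
function missingModelEnv(
  env: Env,
  required: readonly string[] = ASSUMED_MODEL_ENV,
): string[] {
  return required.filter((name) => (env[name] ?? '').trim() === '');
}
```

Calling `missingModelEnv(process.env)` at the top of a script and stopping when the returned list is non-empty gives a clearer failure than a rejected model request later on.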
Preparation
Install Node.js
Install Node.js 18 or higher.
Prepare API Key
Prepare an API key for a vision-language (VL) model.
You can find supported models and configurations for Midscene.js in the Model strategy documentation.
Prepare WebDriver Server
Before getting started, you need to set up the iOS development environment:
- macOS (required for iOS development)
- Xcode and Xcode command line tools
- iOS Simulator or real device
Environment Configuration
Before using Midscene iOS, you need to prepare the WebDriverAgent service.
WebDriverAgent version must be >= 7.0.0
Please refer to the official documentation for setup:
- Simulator Configuration: Run Prebuilt WDA
- Real Device Configuration: Real Device Configuration
Verify Environment Configuration
After completing the configuration, you can verify whether the service is working properly by accessing WebDriverAgent's status endpoint:
Access URL: http://localhost:8100/status
Correct Response Example:
If you can reach this endpoint and it returns a JSON response similar to the example, WebDriverAgent is properly configured and running.
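The same check can be sketched in code. The helpers below build the status URL and inspect a parsed response body. They assume the body has the shape `{ value: { ready: boolean } }`, which may differ between WebDriverAgent builds; `wdaStatusUrl` and `isWdaReady` are hypothetical names, not part of any SDK.

```typescript
/** Builds the status URL for a WebDriverAgent server. */
function wdaStatusUrl(host = 'localhost', port = 8100): string {
  return `http://${host}:${port}/status`;
}

/**
 * Returns true when a parsed /status body reports readiness.
 * Assumption: the body looks like { value: { ready: boolean } }.
 */
function isWdaReady(body: unknown): boolean {
  if (typeof body !== 'object' || body === null) return false;
  const value = (body as { value?: unknown }).value;
  if (typeof value !== 'object' || value === null) return false;
  return (value as { ready?: unknown }).ready === true;
}
```

On Node.js 18 or higher, the built-in `fetch` can supply the body: request `wdaStatusUrl()`, parse the JSON, and pass the result to `isWdaReady`.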
Try Playground
Playground is the fastest way to validate the connection and observe AI-driven steps without writing code. It shares the same core as @midscene/ios, so anything that works here will behave the same once scripted.

- Launch the Playground CLI:
- Click the gear button to enter the configuration page and paste your API key config. Refer back to Model configuration if you still need credentials.

Start exploring
After configuration, you can start using Midscene right away. Playground provides several key operation tabs:
- Act: interact with the page. This is Auto Planning, corresponding to aiAct.
- Query: extract JSON data from the interface, corresponding to aiQuery. Similar methods include aiBoolean(), aiNumber(), and aiString() for directly extracting booleans, numbers, and strings.
- Assert: understand the page and assert a condition, corresponding to aiAssert. If the condition is not met, an error is thrown.
- Tap: click on an element. This is Instant Action, corresponding to aiTap.
For the difference between Auto Planning and Instant Action, see the API document.
Integration with Midscene Agent
Once Playground works, move to a repeatable script with the JavaScript SDK.
Step 1. Install dependencies
Step 2. Write scripts
Save the following code as ./demo.ts. It opens Safari on the device, searches eBay, and asserts the result list.
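A minimal sketch of the core flow of such a script follows. It is written against a small structural interface so that it does not depend on exact SDK signatures. `AgentLike`, `Product`, and `searchAndVerify` are hypothetical names, the prompts are illustrative, and the method signatures are assumptions; only the method names aiAct, aiQuery, and aiAssert come from this guide.

```typescript
// Structural subset of the agent used below. The method names come from this
// guide; the signatures are assumptions for illustration.
interface AgentLike {
  aiAct(prompt: string): Promise<unknown>;
  aiQuery<T>(prompt: string): Promise<T>;
  aiAssert(assertion: string): Promise<unknown>;
}

interface Product {
  itemTitle: string;
  price: number;
}

/** Searches eBay for a keyword, extracts the result list, then asserts on it. */
async function searchAndVerify(
  agent: AgentLike,
  keyword: string,
): Promise<Product[]> {
  await agent.aiAct(`Open Safari, go to ebay.com and search for "${keyword}"`);
  const items = await agent.aiQuery<Product[]>(
    '{itemTitle: string, price: number}[], product titles and prices in the result list',
  );
  await agent.aiAssert(`The page shows a list of ${keyword} products`);
  return items;
}

// In ./demo.ts the agent would come from @midscene/ios, for example through
// agentFromWebDriverAgent. Calling it without options is an assumption here;
// see the iOS API reference for the supported parameters.
//   const agent = await agentFromWebDriverAgent();
//   console.log(await searchAndVerify(agent, 'Headphones'));
```

Keeping the flow in a function that accepts any object with these three methods also makes it possible to exercise the script logic with a stub agent before a device is connected.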
Step 3. Run
Step 4. View the report
Successful runs print Midscene - report file updated: /path/to/report/some_id.html. Open the generated HTML file in a browser to replay every interaction, query, and assertion.
API reference and more resources
Looking for constructors, helper methods, and platform-only device APIs? See the dedicated iOS API reference for detailed parameter lists plus advanced topics like custom actions. For API surfaces shared across platforms, head to the common API reference.
FAQ
Why can't I control my device through WebDriverAgent even though it's connected?
Please check the following:
- Developer Mode: Ensure it's enabled in Settings > Privacy & Security > Developer Mode
- UI Automation: Ensure it's enabled in Settings > Developer > UI Automation
- Device Trust: Ensure the device trusts the current Mac
What are the differences between simulators and real devices?
How to use custom WebDriverAgent port and host?
You can specify the WebDriverAgent port and host through the IOSDevice constructor or agentFromWebDriverAgent.
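One way to keep the endpoint configurable is to resolve it in a single place and pass the result on. The sketch below is an assumption-laden illustration: `WDA_HOST` and `WDA_PORT` are hypothetical environment variable names, `resolveWdaEndpoint` is a hypothetical helper, and the `wdaHost` and `wdaPort` option names should be confirmed in the iOS API reference. The defaults mirror the localhost:8100 address used in the status check.

```typescript
interface WdaEndpoint {
  wdaHost: string;
  wdaPort: number;
}

/**
 * Resolves the WebDriverAgent endpoint from environment-style input.
 * WDA_HOST and WDA_PORT are hypothetical variable names used only here.
 */
function resolveWdaEndpoint(
  env: Record<string, string | undefined>,
): WdaEndpoint {
  const wdaHost = (env['WDA_HOST'] ?? '').trim() || 'localhost';
  const port = Number(env['WDA_PORT'] ?? '8100');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid WebDriverAgent port: ${env['WDA_PORT']}`);
  }
  return { wdaHost, wdaPort: port };
}

// Assumed usage; confirm the option names in the iOS API reference:
//   const device = new IOSDevice(resolveWdaEndpoint(process.env));
//   const agent = await agentFromWebDriverAgent(resolveWdaEndpoint(process.env));
```

Validating the port up front turns a typo in the configuration into an immediate error instead of a connection timeout.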
For remote devices, you also need to set up port forwarding so that the configured host and port reach WebDriverAgent on the device.
More
- For every Agent method, check the API reference (Common).
- For the iOS API reference, see iOS Agent API.
- Demo projects
  - iOS JavaScript SDK demo: https://github.com/web-infra-dev/midscene-example/blob/main/ios/javascript-sdk-demo
  - iOS + Vitest demo: https://github.com/web-infra-dev/midscene-example/tree/main/ios/vitest-demo

