Integrate with iOS (WebDriverAgent)
How WebDriver Relates to Midscene
WebDriver is a W3C standard protocol for browser automation that provides a unified API for controlling different browsers and applications. It defines how a client and a server communicate (JSON over HTTP), which lets automation tools drive many kinds of user interfaces across platforms.
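To make the client/server split concrete, the sketch below builds (but does not send) two requests defined by the W3C spec: the status probe `GET /status` and "New Session", `POST /session` with a `capabilities` object. The helper name `buildWebDriverRequests` and the base URL are hypothetical, chosen for illustration; Midscene issues the real protocol calls for you.

```typescript
// Shape of a W3C WebDriver HTTP request: method, absolute URL, optional JSON body.
interface WebDriverRequest {
  method: 'GET' | 'POST' | 'DELETE';
  url: string;
  body?: string;
}

// Hypothetical helper: builds (but does not send) two core W3C WebDriver requests.
function buildWebDriverRequests(
  baseUrl: string,
  alwaysMatch: Record<string, unknown> = {},
): { status: WebDriverRequest; newSession: WebDriverRequest } {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return {
    // GET /status reports whether the server is ready to create sessions.
    status: { method: 'GET', url: `${root}/status` },
    // POST /session creates a session; required capabilities go under "alwaysMatch".
    newSession: {
      method: 'POST',
      url: `${root}/session`,
      body: JSON.stringify({ capabilities: { alwaysMatch } }),
    },
  };
}

const reqs = buildWebDriverRequests('http://localhost:8100/');
console.log(reqs.status.url); // http://localhost:8100/status
console.log(reqs.newSession.body); // {"capabilities":{"alwaysMatch":{}}}
```

Every WebDriver server, WebDriverAgent included, speaks this same JSON-over-HTTP shape, which is what lets one client drive many platforms.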
Thanks to the Appium team and other open source communities, there are now many mature libraries that expose desktop and mobile automation through the WebDriver protocol. These include:
- Appium - Cross-platform mobile automation framework
- WebDriverAgent - Service dedicated to iOS device automation
- Selenium - Web browser automation tool
- WinAppDriver - Windows application automation tool
Midscene supports the WebDriver protocol, so developers can use AI models to automate any device that has a WebDriver server. Beyond basic operations such as clicking and typing, this lets Midscene:
- Understand interface content and context
- Execute complex multi-step operations
- Perform intelligent assertions and validations
- Extract and analyze interface data
On iOS, Midscene connects to devices through WebDriverAgent, so you can control iOS apps and the system using natural language descriptions.
Once WebDriverAgent is connected to the device, you can drive it with the Midscene JavaScript SDK.
Prepare WebDriver Service
Before getting started, you need to set up the iOS development environment:
- macOS (required for iOS development)
- Xcode and the Xcode Command Line Tools
- An iOS Simulator or a real device
Environment Configuration
Before using Midscene iOS, you need to set up the WebDriverAgent server. Refer to the official WebDriverAgent documentation for setup instructions.
Verify Environment Configuration
After completing the configuration, you can verify that the service is working by requesting WebDriverAgent's status endpoint:
Access URL: http://localhost:8100/status
Example of a successful response:
{
  "value": {
    "build": {
      "version": "10.1.1",
      "time": "Sep 24 2025 18:56:41",
      "productBundleIdentifier": "com.facebook.WebDriverAgentRunner"
    },
    "os": {
      "testmanagerdVersion": 65535,
      "name": "iOS",
      "sdkVersion": "26.0",
      "version": "26.0"
    },
    "device": "iphone",
    "ios": {
      "ip": "10.91.115.63"
    },
    "message": "WebDriverAgent is ready to accept commands",
    "state": "success",
    "ready": true
  },
  "sessionId": "BCAD9603-F714-447C-A9E6-07D58267966B"
}
If the endpoint responds with JSON similar to the example above, WebDriverAgent is properly configured and running.
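The same check can be scripted, for example as a pre-flight step before a test run. A minimal sketch, assuming Node 18+ (for the global `fetch`) and reading only the `value.ready` flag from the payload shown above; `isWdaReady` and `checkWda` are hypothetical helper names, not part of `@midscene/ios`.

```typescript
// Minimal shape of the /status payload shown above; only the fields read here.
interface WdaStatus {
  value?: { ready?: boolean; state?: string; message?: string };
  sessionId?: string;
}

// True when the parsed /status payload reports that WebDriverAgent is ready.
function isWdaReady(payload: unknown): boolean {
  const status = payload as WdaStatus | null;
  return status?.value?.ready === true;
}

// Requests the status endpoint (Node 18+ global fetch) and reports readiness.
async function checkWda(host = 'localhost', port = 8100): Promise<boolean> {
  try {
    const res = await fetch(`http://${host}:${port}/status`);
    if (!res.ok) return false;
    return isWdaReady(await res.json());
  } catch {
    // Connection refused, DNS failure, etc.: the server is not reachable.
    return false;
  }
}
```

Usage: `checkWda().then((ok) => console.log(ok ? 'WebDriverAgent is ready' : 'WebDriverAgent is not reachable'));`. Keeping the parsing in `isWdaReady` separate from the network call makes the readiness rule easy to test on its own.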
Set up AI model service
Set your model configuration in environment variables. See choose a model for more details.
# replace with your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
# You may need more configs, such as model name and endpoint, please refer to [choose a model](../choose-a-model)
export OPENAI_BASE_URL="..."
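A missing key usually surfaces only when the first AI call fails, so you may prefer to fail fast. A minimal sketch that checks the two variables exported above before the script does any work; `missingEnv` is a hypothetical helper, and the list should be extended with whatever other model settings your provider needs.

```typescript
// Variables exported in the example above; add model name/endpoint settings as needed.
const REQUIRED_ENV = ['OPENAI_API_KEY', 'OPENAI_BASE_URL'] as const;

// Returns the names of required variables that are unset or blank.
function missingEnv(
  env: Record<string, string | undefined>,
  keys: readonly string[] = REQUIRED_ENV,
): string[] {
  return keys.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === '';
  });
}

// Warn early instead of failing on the first AI call.
const missing = missingEnv(process.env);
if (missing.length > 0) {
  console.warn(`Missing model configuration: ${missing.join(', ')}`);
}
```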
Integrate Midscene
Step 1: Install dependencies
npm install @midscene/ios --save-dev
Step 2: Write scripts
Here's an example that uses Safari on iOS to search eBay for headphones.
Write the following code and save it as ./demo.ts
./demo.ts
import {
  IOSAgent,
  IOSDevice,
  agentFromWebDriverAgent,
} from '@midscene/ios';
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    // Method 1: Create device and agent directly
    const page = new IOSDevice({
      wdaPort: 8100,
      wdaHost: 'localhost',
    });
    // 👀 Initialize Midscene agent
    const agent = new IOSAgent(page, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    });
    await page.connect();
    // Method 2: Or use convenience function (recommended)
    // const agent = await agentFromWebDriverAgent({
    //   wdaPort: 8100,
    //   wdaHost: 'localhost',
    //   aiActionContext: 'If any location, permission, user agreement, etc. popup appears, click agree. If login page appears, close it.',
    // });
    // With Method 2 there is no `page` variable: use agent.launch() instead of page.launch()
    // 👀 Directly open ebay.com webpage (recommended approach)
    await page.launch('https://ebay.com');
    await sleep(3000);
    // 👀 Enter keywords and perform search
    await agent.aiAction('Search for "Headphones"');
    // 👀 Wait for loading to complete
    await agent.aiWaitFor('At least one headphone product is displayed on the page');
    // Or you can use a simple sleep:
    // await sleep(5000);
    // 👀 Understand page content and extract data
    const items = await agent.aiQuery(
      '{itemTitle: string, price: number}[], find product titles and prices in the list',
    );
    console.log('Headphone product information', items);
    // 👀 Use AI assertion
    await agent.aiAssert('Multiple headphone products are displayed on the interface');
    await page.destroy();
  })(),
);
Step 3: Run
Use tsx to run the script:
npx tsx demo.ts
Shortly after, you will see output like this:
[
  {
    itemTitle: 'AirPods Pro (2nd generation) with MagSafe Charging Case (USB-C)',
    price: 249
  },
  {
    itemTitle: 'Sony WH-1000XM4 Wireless Premium Noise Canceling Overhead Headphones',
    price: 278
  }
]
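Because `aiQuery` results come from a model, an item can occasionally come back in a different shape than requested (for example, a price as a string). A minimal sketch of a runtime check for the shape used above; `HeadphoneItem` and `parseHeadphoneItems` are hypothetical names for this example, and dropping malformed items (rather than failing) is one possible policy.

```typescript
// Expected shape of each item requested in the aiQuery prompt above.
interface HeadphoneItem {
  itemTitle: string;
  price: number;
}

// Type guard for a single item: a string title and a finite numeric price.
function isHeadphoneItem(value: unknown): value is HeadphoneItem {
  if (typeof value !== 'object' || value === null) return false;
  const item = value as Record<string, unknown>;
  return (
    typeof item.itemTitle === 'string' &&
    typeof item.price === 'number' &&
    Number.isFinite(item.price)
  );
}

// Keeps only well-formed items; throws if the result is not an array at all.
function parseHeadphoneItems(raw: unknown): HeadphoneItem[] {
  if (!Array.isArray(raw)) {
    throw new TypeError('aiQuery result is not an array');
  }
  return raw.filter(isHeadphoneItem);
}
```

In the demo script this would wrap the query result, for example `const items = parseHeadphoneItems(await agent.aiQuery(...))`.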
Step 4: View execution report
When the command above completes successfully, the console prints a line such as: Midscene - report file updated: /path/to/report/some_id.html. Open this file in a browser to view the report.
Constructor and Interface
IOSDevice
Constructor
The IOSDevice constructor supports the following parameters:
- opts?: IOSDeviceOpt - Optional configuration for IOSDevice:
  - wdaPort?: number - Optional, WebDriverAgent port. Default is 8100.
  - wdaHost?: string - Optional, WebDriverAgent host. Default is 'localhost'.
  - autoDismissKeyboard?: boolean - Optional, whether to dismiss the keyboard automatically after text input. Default is true.
  - customActions?: DeviceAction<any>[] - Optional, a list of custom device actions.
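The documented defaults can be summarized in a small sketch. `resolveIOSDeviceOpts`, `wdaBaseUrl`, and the local `DeviceOpts` type are hypothetical, written only to show how omitted options fall back to 8100, 'localhost', and true; they are not part of `@midscene/ios`, and `customActions` is left out for brevity.

```typescript
// Local stand-in for the documented IOSDeviceOpt fields (customActions omitted).
interface DeviceOpts {
  wdaPort?: number;
  wdaHost?: string;
  autoDismissKeyboard?: boolean;
}

// Fills in the documented default for every option the caller left out.
function resolveIOSDeviceOpts(opts: DeviceOpts = {}): Required<DeviceOpts> {
  return {
    wdaPort: opts.wdaPort ?? 8100,
    wdaHost: opts.wdaHost ?? 'localhost',
    autoDismissKeyboard: opts.autoDismissKeyboard ?? true,
  };
}

// Base URL a client would use to reach WebDriverAgent with these options.
function wdaBaseUrl(opts: DeviceOpts = {}): string {
  const { wdaHost, wdaPort } = resolveIOSDeviceOpts(opts);
  return `http://${wdaHost}:${wdaPort}`;
}
```

With no options, `wdaBaseUrl()` yields `http://localhost:8100`, the same address as the status check earlier on this page.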
Additional iOS Agent Interfaces
In addition to the common Agent interfaces in the API Reference, IOSAgent provides the following interfaces:
agent.launch()
Launch a web page or native iOS application.
function launch(uri: string): Promise<void>;
Parameters:
- uri: string - The URI to open. It can be a web URL, a native app bundle identifier, or a custom URL scheme.
Return Value:
- Promise<void>
Example:
import { IOSAgent, IOSDevice, agentFromWebDriverAgent } from '@midscene/ios';
// Method 1: Create device and agent manually
const page = new IOSDevice();
const agent = new IOSAgent(page);
await page.connect();
// Method 2: Use convenience function (recommended) instead of Method 1
// const agent = await agentFromWebDriverAgent();
await agent.launch('https://www.apple.com'); // Open web page
await agent.launch('com.apple.mobilesafari'); // Launch Safari
await agent.launch('com.apple.Preferences'); // Launch Settings app
await agent.launch('myapp://profile/user/123'); // Open app deep link
await agent.launch('tel:+1234567890'); // Make a phone call
await agent.launch('mailto:example@email.com'); // Send an email
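The three kinds of `uri` accepted by `launch()` can be told apart by their syntax. The sketch below only illustrates that distinction, under the assumptions that web URLs start with `http://` or `https://`, other URL schemes start with a `scheme:` prefix, and bundle identifiers are dot-separated names with no colon. `classifyLaunchUri` is hypothetical and is not necessarily how Midscene dispatches internally.

```typescript
type LaunchTarget = 'web' | 'urlScheme' | 'bundleId' | 'unknown';

// Hypothetical classifier for the kinds of URI that launch() accepts.
function classifyLaunchUri(uri: string): LaunchTarget {
  if (/^https?:\/\//i.test(uri)) return 'web';
  // URI scheme: a letter, then letters, digits, "+", "-" or ".", then ":".
  if (/^[a-z][a-z0-9+.-]*:/i.test(uri)) return 'urlScheme';
  // Reverse-DNS style identifier such as com.apple.mobilesafari.
  if (/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i.test(uri)) return 'bundleId';
  return 'unknown';
}
```

One consequence of this kind of rule: a bare domain such as `ebay.com` looks exactly like a bundle identifier, so passing web pages with an explicit `https://` prefix, as the examples above do, avoids ambiguity.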
agentFromWebDriverAgent() (Recommended)
Creates an IOSAgent by connecting to the WebDriverAgent service. This is the most convenient way to get started.
function agentFromWebDriverAgent(
opts?: PageAgentOpt & IOSDeviceOpt,
): Promise<IOSAgent>;
Parameters:
- opts?: PageAgentOpt & IOSDeviceOpt - Optional configuration for initializing the IOSAgent. For PageAgentOpt, see the Agent Constructor in the API Reference; for IOSDeviceOpt, see the IOSDevice Constructor above.
Return Value:
- Promise<IOSAgent> - Resolves to an IOSAgent instance.
Example:
import { agentFromWebDriverAgent } from '@midscene/ios';
// Use the default WebDriverAgent address (localhost:8100)
const agent = await agentFromWebDriverAgent();
// Or use a custom WebDriverAgent address
const customAgent = await agentFromWebDriverAgent({
  wdaHost: 'localhost',
  wdaPort: 8100,
  aiActionContext: 'If popups appear, click agree',
});
Extending Custom Interaction Actions
By passing custom interaction actions defined with defineAction through the customActions option, you can extend the Agent's action space. These actions are appended after the built-in actions, so the Agent can call them during planning.
import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import { agentFromWebDriverAgent } from '@midscene/ios';
const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z
      .number()
      .int()
      .positive()
      .describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
    // Implement custom click logic combining locate + count
  },
});
const agent = await agentFromWebDriverAgent({
  customActions: [ContinuousClick],
});
await agent.aiAction('Click the red button five times');
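The `call` body above is left as a placeholder. One way to fill it is a loop that taps the located center `count` times. In this sketch the tap itself stays abstract: `tapAt` is a hypothetical callback standing in for whatever tap primitive your device layer exposes, `center` is assumed to be an `[x, y]` pair as logged above, and the 100 ms gap between taps is an arbitrary choice.

```typescript
type Point = [number, number];

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Taps `center` `count` times, waiting `intervalMs` between consecutive taps.
async function clickRepeatedly(
  center: Point,
  count: number,
  tapAt: (x: number, y: number) => Promise<void>, // hypothetical tap primitive
  intervalMs = 100,
): Promise<void> {
  for (let i = 0; i < count; i++) {
    await tapAt(center[0], center[1]);
    // No pause is needed after the final tap.
    if (i < count - 1) await delay(intervalMs);
  }
}
```

Inside `call`, this would be invoked as `await clickRepeatedly(locate.center, count, yourTapFunction)`. Passing the tap routine in keeps the repetition logic independent of any particular device API.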
For more details about custom actions, refer to Integrate with any interface.
More
FAQ
Why can't I control my device through WebDriverAgent even though it's connected?
Please check the following:
- Developer Mode: Ensure it's enabled in Settings > Privacy & Security > Developer Mode
- UI Automation: Ensure it's enabled in Settings > Developer > UI Automation
- Device Trust: Ensure the device trusts the current Mac
What are the differences between simulators and real devices?
Feature | Real Device | Simulator |
---|---|---|
Port Forwarding | Requires iproxy | Not required |
Developer Mode | Must enable | Auto-enabled |
UI Automation Settings | Must enable manually | Auto-enabled |
Performance | Real device performance | Depends on Mac performance |
Sensors | Real hardware | Simulated data |
How to use custom WebDriverAgent port and host?
You can specify WebDriverAgent port and host through the IOSDevice constructor or agentFromWebDriverAgent:
// Method 1: Using IOSDevice
const device = new IOSDevice({
wdaPort: 8100, // Custom port
wdaHost: '192.168.1.100', // Custom host
});
// Method 2: Using convenience function (recommended)
const agent = await agentFromWebDriverAgent({
wdaPort: 8100, // Custom port
wdaHost: '192.168.1.100', // Custom host
});
For real devices, you also need to set up port forwarding accordingly, for example:
iproxy 8100 8100 YOUR_DEVICE_ID
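When the same script runs against different devices (for example, a simulator locally and a networked device in CI), reading the address from the environment avoids editing code. In this sketch the variable names `WDA_HOST` and `WDA_PORT` and the helper `wdaAddressFromEnv` are hypothetical, chosen for illustration; Midscene does not read these variables itself.

```typescript
interface WdaAddress {
  wdaHost: string;
  wdaPort: number;
}

// Reads WDA_HOST / WDA_PORT (hypothetical names), falling back to the defaults.
function wdaAddressFromEnv(env: Record<string, string | undefined>): WdaAddress {
  const wdaHost = env['WDA_HOST']?.trim() || 'localhost';
  const rawPort = env['WDA_PORT']?.trim();
  const wdaPort = rawPort ? Number(rawPort) : 8100;
  // Reject non-numeric or out-of-range ports early with a clear message.
  if (!Number.isInteger(wdaPort) || wdaPort < 1 || wdaPort > 65535) {
    throw new RangeError(`Invalid WDA_PORT: ${rawPort}`);
  }
  return { wdaHost, wdaPort };
}
```

The result has the same `wdaHost`/`wdaPort` keys as the options above, so it can be passed straight through, e.g. `await agentFromWebDriverAgent(wdaAddressFromEnv(process.env))`.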
iOS-Specific Actions
The iOS package includes iOS-specific actions that can be used in automation:
// Press home button
await agent.callAction('IOSHomeButton');
// Open app switcher
await agent.callAction('IOSAppSwitcher');
// Long press with custom duration
await agent.callAction('IOSLongPress', {
locate: 'menu item',
duration: 2000, // 2 seconds
});
Best Practices
1. Device Management
Always connect the device before use and destroy it afterwards, even if the script fails:
try {
  await device.connect();
  // Your automation code here
} finally {
  await device.destroy();
}
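If several scripts share this pattern, it can be wrapped once. A minimal sketch, assuming only that the device object has async `connect()` and `destroy()` methods as used above; `withDevice` and the `Connectable` interface are hypothetical names.

```typescript
// The two lifecycle methods this helper relies on, as used in the snippet above.
interface Connectable {
  connect(): Promise<void>;
  destroy(): Promise<void>;
}

// Runs `body` with a connected device; destroy() runs even if connect() or body throws,
// mirroring the try/finally pattern above.
async function withDevice<D extends Connectable, T>(
  device: D,
  body: (device: D) => Promise<T>,
): Promise<T> {
  try {
    await device.connect();
    return await body(device);
  } finally {
    await device.destroy();
  }
}
```

For example, `await withDevice(new IOSDevice(), async (device) => { const agent = new IOSAgent(device); await agent.aiAssert('...'); })`. The body's return value is passed through, so the helper also works for scripts that extract data.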
2. Wait for UI Updates
iOS animations and transitions may need time to complete:
await agent.aiTap('button');
await sleep(1000); // Wait for animation
await agent.aiAssert('new screen loaded');
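A fixed sleep either wastes time or is too short on a slower device. Where a condition can be checked cheaply, polling with a timeout is a common alternative; for visual conditions, `agent.aiWaitFor` (used in the demo script) plays this role. The generic sketch below, `waitUntil`, is a hypothetical helper and not a Midscene API; the 5000 ms timeout and 200 ms interval defaults are arbitrary.

```typescript
const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `check` until it returns true, or throws once `timeoutMs` has elapsed.
async function waitUntil(
  check: () => boolean | Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 200,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await pause(intervalMs);
  }
}
```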
3. Handle Keyboard Input
For better text input handling:
await agent.aiInput('text', 'input field', {
autoDismissKeyboard: true, // Automatically dismiss keyboard
});
4. Bundle Identifiers
Common iOS app bundle identifiers:
- Safari: com.apple.mobilesafari
- Settings: com.apple.Preferences
- Messages: com.apple.MobileSMS
- Camera: com.apple.camera
- Photos: com.apple.mobileslideshow
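Keeping these identifiers in one typed table lets the compiler catch typos in app names. A small sketch: the key names and the `bundleIdFor` helper are arbitrary choices for illustration, and the values are exactly the identifiers listed above.

```typescript
// Bundle identifiers listed above, collected into a typed lookup table.
const BUNDLE_IDS = {
  safari: 'com.apple.mobilesafari',
  settings: 'com.apple.Preferences',
  messages: 'com.apple.MobileSMS',
  camera: 'com.apple.camera',
  photos: 'com.apple.mobileslideshow',
} as const;

// Union of the valid keys: 'safari' | 'settings' | 'messages' | 'camera' | 'photos'.
type SystemApp = keyof typeof BUNDLE_IDS;

// Returns the bundle identifier for one of the apps above.
function bundleIdFor(app: SystemApp): string {
  return BUNDLE_IDS[app];
}
```

Usage: `await agent.launch(bundleIdFor('settings'));`. A misspelled key such as `'setings'` is rejected at compile time.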
Testing Integration
Vitest Integration
test/ios.test.ts
import { describe, it, beforeAll, afterAll } from 'vitest';
import { IOSDevice, IOSAgent } from '@midscene/ios';
describe('iOS App Tests', () => {
  let device: IOSDevice;
  let agent: IOSAgent;
  beforeAll(async () => {
    device = new IOSDevice();
    agent = new IOSAgent(device);
    await device.connect();
    // Or use the convenience function (recommended):
    // agent = await agentFromWebDriverAgent();
  });
  afterAll(async () => {
    await device.destroy();
  });
  // AI calls can take longer than Vitest's default 5-second test timeout,
  // so pass an explicit timeout (in ms) as the third argument.
  it('should launch Safari and navigate', async () => {
    await device.launch('com.apple.mobilesafari');
    await agent.aiAssert('Safari is open');
  }, 60_000);
});
Troubleshooting
WebDriverAgent Connection Issues
If you encounter WebDriverAgent connection issues:
- Check port forwarding:
  lsof -i:8100 # Should show the iproxy process
- Rebuild WebDriverAgent:
  # The iOS package will automatically rebuild when needed
- Check device trust:
  - Ensure your Mac is trusted on the iOS device
  - Check that Developer Mode is enabled
Common Errors
"Device not found":
- Verify device is connected via USB
- Check the device ID with idevice_id -l
- Ensure port forwarding is active
"WebDriverAgent session failed":
- Restart port forwarding
- Check if WebDriverAgent is running on device
- Verify the development team (code signing) configuration for WebDriverAgent in Xcode
"Element not found":
- Use more descriptive element descriptions
- Wait for UI animations to complete
- Check if element is visible on screen
Next Steps