Support iOS automation
From Midscene v0.29, we are happy to announce support for iOS automation. The era of AI-driven iOS automation is here!
Showcases
Open Twitter and auto-like the first tweet by @midscene_ai.
Suitable for all apps
For developers, all you need is WebDriverAgent and a visual-language model (vl-model) service. Everything is ready!
Behind the scenes, we use the visual grounding capabilities of the vl-model to locate target elements on the screen. Whether it's a native iOS app, a Safari web page, or a hybrid app with a WebView makes no difference: developers can write automation scripts without worrying about the app's technology stack.
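As a quick illustration, here is a minimal sketch that reuses the SDK calls shown later in this post. The deviceId value and the com.example.newsapp bundle id are made up for illustration; com.apple.mobilesafari is Safari's bundle id. The same helper drives a native app and Safari alike.

import { IOSAgent, IOSDevice } from '@midscene/ios';

// The same helper works no matter how the app is built:
// the vl-model grounds each instruction on a screenshot, not on the app's view tree.
async function likeFirstItem(bundleId) {
  const device = new IOSDevice({ deviceId: 'iPhone', bundleId }); // deviceId is illustrative
  const agent = new IOSAgent(device);
  await device.connect();
  await device.launchApp();
  await agent.aiTap('the like button of the first item in the list');
}

// a native app (hypothetical bundle id) ...
await likeFirstItem('com.example.newsapp');
// ... or Safari: the script does not change
await likeFirstItem('com.apple.mobilesafari');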
With all the power of Midscene
When using Midscene for web automation, our users love tools like the playground and reports. Now we bring the same power to iOS automation!
Use the playground to run automation without any code
Use the report to replay the whole process
Write automation scripts in a YAML file
Open Safari on an iOS device, search for content, and extract information.
# Open Safari browser on iOS device, search for content and extract information
ios:
  deviceId: "iPhone"
  bundleId: "com.apple.mobilesafari"

tasks:
  - name: search content
    flow:
      - aiAction: tap address bar
      - aiAction: input 'Midscene AI automation'
      - aiAction: tap search button
      - sleep: 3000
      - aiAction: scroll down 500px

  - name: extract search results
    flow:
      - aiQuery: >
          {title: string, url: string, description: string}[],
          return search result titles, links and descriptions
        name: searchResults

  - name: verify page elements
    flow:
      - aiAssert: there is a search results list on the page
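Save this as a .yaml file and run it with the Midscene command-line tool, typically something like npx --yes @midscene/cli ./ios-safari-search.yaml (the file name here is just an example); see the YAML guide linked at the end of this post for the exact setup.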
Use the JavaScript SDK
Use the JavaScript SDK to drive the automation from code.
import { IOSAgent, IOSDevice } from '@midscene/ios';
import "dotenv/config"; // read environment variables from .env file

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    // 👀 initialize iOS device
    const device = new IOSDevice({
      deviceId: 'iPhone',
      bundleId: 'com.apple.mobilesafari'
    });

    // 👀 initialize Midscene agent
    const agent = new IOSAgent(device, {
      aiActionContext:
        'If any permission popup appears, tap allow. If login page pops up, skip it.',
    });

    await device.connect();
    await device.launchApp();
    await sleep(3000);

    // 👀 tap address bar and input search keywords
    await agent.aiAction('tap address bar and input "Midscene automation"');

    // 👀 perform search
    await agent.aiAction('tap search button');

    // 👀 wait for loading to complete
    await agent.aiWaitFor("there is at least one search result on the page");
    // or you may use a plain sleep:
    // await sleep(5000);

    // 👀 understand page content, find search results
    const results = await agent.aiQuery(
      "{title: string, url: string}[], find titles and links in search results list"
    );
    console.log("search results", results);

    // 👀 assert by AI
    await agent.aiAssert("relevant search results are displayed on the page");
  })()
);
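Since the script uses ES module imports, save it as, say, index.mjs (the name is up to you) and run it with Node.js: node index.mjs. The dotenv import at the top loads your model configuration from a local .env file.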
Two API styles for interaction
The auto-planning style:
await agent.ai('tap address bar and input "Midscene automation", then search');
The instant action style:
await agent.aiTap('address bar');
await agent.aiInput('Midscene automation', 'address bar');
await agent.aiTap('search button');
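Auto-planning lets the model split one compound instruction into steps by itself, which reads naturally but leaves more decisions to the model; the instant action methods each perform a single well-defined operation, so they tend to be the faster and more predictable choice for scripts you run repeatedly.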
Quick start
You can use the playground to experience iOS automation without writing any code. Please refer to Quick experience with iOS for more details.
After that, you can drive the iOS device with JavaScript code. Please refer to Integrate with iOS (WebDriverAgent) for more details.
If you prefer YAML files for your automation scripts, please refer to Automate with scripts in YAML.