Support iOS automation

From Midscene v0.29, we are happy to announce support for iOS automation. The era of AI-driven iOS automation is here!

Showcases

Auto-like tweets

Open Twitter and auto-like the first tweet by @midscene_ai.

Suitable for all apps

For developers, all you need is a running WebDriverAgent server and a visual-language model (VL model) service, and everything is ready!
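
As a rough sketch of that setup (the variable names follow Midscene's common model configuration, and the values here are illustrative only, so check the model configuration docs for your provider), the VL model service is usually configured through environment variables, which the JavaScript example further below loads from a .env file via dotenv:

# .env - illustrative values, replace with your own provider settings
OPENAI_API_KEY="your-api-key"
OPENAI_BASE_URL="https://your-model-provider.example.com/v1"
MIDSCENE_MODEL_NAME="qwen-vl-max-latest"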

Behind the scenes, we utilize the visual grounding capabilities of the VL model to locate target elements on the screen. So whether it's a native iOS app, a Safari web page, or a hybrid app with a WebView, it makes no difference. Developers can write automation scripts without worrying about the app's technology stack.
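
To make this concrete, here is a minimal sketch (the bundle ID and locator phrasing are illustrative, and we assume the agent options shown in the full example below are optional): the script drives the native Settings app, yet the same code shape and the same natural-language locators would work for a Safari page or a WebView-based app; only the bundleId changes.

import { IOSAgent, IOSDevice } from '@midscene/ios';

// 'com.apple.Preferences' is the system Settings app; swapping in
// 'com.apple.mobilesafari' or any WebView-based app requires no other changes.
const device = new IOSDevice({
  deviceId: 'iPhone',
  bundleId: 'com.apple.Preferences',
});
const agent = new IOSAgent(device);

await device.connect();
await device.launchApp();

// The element is located visually by the VL model, not through the view hierarchy.
await agent.aiTap('the "General" menu item');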

With all the power of Midscene

When using Midscene for web automation, our users love tools like the playground and reports. Now we bring the same power to iOS automation!

Use the playground to run automation without any code

Use the report to replay the whole process

Write automation scripts in a YAML file

Open Safari on an iOS device, search for content, and extract information.

# Open Safari browser on iOS device, search for content and extract information

ios:
  deviceId: "iPhone"
  bundleId: "com.apple.mobilesafari"

tasks:
  - name: search content
    flow:
      - aiAction: tap address bar
      - aiAction: input 'Midscene AI automation'
      - aiAction: tap search button
      - sleep: 3000
      - aiAction: scroll down 500px

  - name: extract search results
    flow:
      - aiQuery: >
          {title: string, url: string, description: string}[],
          return search result titles, links and descriptions
        name: searchResults

  - name: verify page elements
    flow:
      - aiAssert: there is a search results list on the page

Use the JavaScript SDK

Use the JavaScript SDK to write automations in code.

import { IOSAgent, IOSDevice } from '@midscene/ios';
import "dotenv/config"; // read environment variables from .env file

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
Promise.resolve(
  (async () => {
    // 👀 initialize iOS device
    const device = new IOSDevice({
      deviceId: 'iPhone',
      bundleId: 'com.apple.mobilesafari'
    });

    // 👀 initialize Midscene agent
    const agent = new IOSAgent(device, {
      aiActionContext:
        'If any permission popup appears, tap allow. If login page pops up, skip it.',
    });

    await device.connect();
    await device.launchApp();

    await sleep(3000);

    // 👀 tap address bar and input search keywords
    await agent.aiAction('tap address bar and input "Midscene automation"');

    // 👀 perform search
    await agent.aiAction('tap search button');

    // 👀 wait for loading to complete
    await agent.aiWaitFor("there is at least one search result on the page");
    // or you may use a plain sleep:
    // await sleep(5000);

    // 👀 understand page content, find search results
    const results = await agent.aiQuery(
      "{title: string, url: string}[], find titles and links in search results list"
    );
    console.log("search results", results);

    // 👀 assert by AI
    await agent.aiAssert("relevant search results are displayed on the page");
  })()
);

Two API styles for interaction

The auto-planning style:

await agent.ai('tap address bar and input "Midscene automation", then search');

The instant action style:

await agent.aiTap('address bar');
await agent.aiInput('Midscene automation', 'address bar');
await agent.aiTap('search button');

Quick start

You can use the playground to experience iOS automation without writing any code. Please refer to Quick experience with iOS for more details.

After trying it out, you can integrate with iOS devices using JavaScript code. Please refer to Integrate with iOS (WebDriverAgent) for more details.

If you prefer writing automation scripts in YAML files, please refer to Automate with scripts in YAML.