Support Android automation
From Midscene v0.15, we are happy to announce support for Android automation. The era of AI-driven Android automation is here!
Showcases
Navigate to an attraction
Open Maps, search for a destination, and navigate to it.
Auto-like tweets
Open Twitter and auto-like the first tweet by @midscene_ai.
Suitable for all apps
For developers, all you need is an adb connection and a visual-language model (VL model) service. Everything is ready!
Behind the scenes, we use the visual grounding capability of the VL model to locate target elements on the screen. So whether it's a native app, a Lynx page, or a hybrid app with a webview makes no difference. Developers can write automation scripts without worrying about the app's technology stack.
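Connecting a VL model service is just a matter of configuration. Here is a minimal .env sketch, assuming an OpenAI-compatible endpoint serving a Qwen-2.5-VL model; the variable names follow Midscene's model configuration, while the key, endpoint, and model name are placeholders for your own service:

OPENAI_API_KEY="your-api-key"                            # placeholder: key for your model service
OPENAI_BASE_URL="https://your-vl-service.example.com/v1" # placeholder: OpenAI-compatible endpoint
MIDSCENE_MODEL_NAME="qwen-vl-max-latest"                 # the VL model to use
MIDSCENE_USE_QWEN_VL=1                                   # use Qwen-VL style visual grounding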
With all the power of Midscene
When using Midscene for web automation, our users love tools like the Playground and the visual report. Now we bring the same power to Android automation!
Use the playground to run automation without any code
Use the report to replay the whole process
Write automation scripts in a YAML file
Connect to the device, open ebay.com, and extract some item info.
# search headphones on eBay, extract the item info into a JSON file, and assert the Filter button
android:
  deviceId: s4ey59

tasks:
  - name: search headphones
    flow:
      - aiAction: open browser and navigate to ebay.com
      - aiAction: type 'Headphones' in ebay search box, hit Enter
      - sleep: 5000
      - aiAction: scroll down the page for 800px

  - name: extract headphones info
    flow:
      - aiQuery: >
          {name: string, price: number, subTitle: string}[], return item name, price and the subTitle on the lower right corner of each item
        name: headphones

  - name: assert Filter button
    flow:
      - aiAssert: There is a Filter button on the page
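To run it, point the Midscene command line at the file. A minimal sketch, assuming the script above is saved as ebay-search.yaml (a filename of our choosing) and the model environment variables are configured:

npx --yes @midscene/cli ./ebay-search.yaml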
Use the JavaScript SDK
Use the JavaScript SDK to write the automation in code.
import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';
import "dotenv/config"; // read environment variables from .env file

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    const devices = await getConnectedDevices();
    const page = new AndroidDevice(devices[0].udid);

    // 👀 init Midscene agent
    const agent = new AndroidAgent(page, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup, click agree. If login page pops up, close it.',
    });
    await page.connect();
    await page.launch('https://www.ebay.com');

    await sleep(5000);

    // 👀 type keywords, perform a search
    await agent.aiAction('type "Headphones" in search box, hit Enter');

    // 👀 wait for the loading
    await agent.aiWaitFor('there is at least one headphone item on page');
    // or you may use a plain sleep:
    // await sleep(5000);

    // 👀 understand the page content, find the items
    const items = await agent.aiQuery(
      '{itemTitle: string, price: number}[], find item in list and corresponding price'
    );
    console.log('headphones in stock', items);

    // 👀 assert by AI
    await agent.aiAssert('There is a category filter on the left');
  })()
);
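To try it out, save the script as an ES module, for example ebay-search.mjs (a filename of our choosing), keep your model configuration in a .env file next to it, and run it with Node.js:

node ./ebay-search.mjs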
Two API styles for interaction
The auto-planning style:
await agent.ai('input "Headphones" in search box, hit Enter');
The instant action style:
await agent.aiInput('Headphones', 'search box');
await agent.aiKeyboardPress('Enter');
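With auto-planning, the model parses the instruction and plans the steps on its own; with instant actions, the action itself is fixed and the model only locates the target element, which is usually more predictable. The SDK offers more instant-action methods in the same style. A minimal sketch (the method name follows the Midscene API; the prompt is illustrative):

// tap an element located by a natural-language prompt
await agent.aiTap('the first headphone item in the result list');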
Quick start
You can use the playground to experience Android automation without writing any code. Please refer to Quick experience with Android for more details.
After that, you can integrate with the Android device through JavaScript code. Please refer to Integrate with Android (adb) for more details.
If you prefer YAML files for your automation scripts, please refer to Automate with scripts in YAML.