---
url: /android-api-reference.md
---

# API reference (Android)

Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic [API reference (Common)](./api).

## Action Space

`AndroidDevice` uses the following action space; the Midscene Agent can use these actions while planning tasks:

- `Tap` — Tap an element.
- `DoubleClick` — Double-tap an element.
- `Input` — Enter text with `replace`/`append`/`clear` modes and optional `autoDismissKeyboard`.
- `Scroll` — Scroll from an element or screen center in any direction, with helpers to reach the top, bottom, left, or right.
- `DragAndDrop` — Drag from one element to another.
- `KeyboardPress` — Press a specified key.
- `AndroidLongPress` — Long-press a target element with optional duration.
- `AndroidPull` — Pull up or down (e.g., to refresh) with optional distance and duration.
- `ClearInput` — Clear the contents of an input field.
- `Launch` — Open a web URL or `package/.Activity` string.
- `RunAdbShell` — Execute raw `adb shell` commands.
- `AndroidBackButton` — Trigger the system back action.
- `AndroidHomeButton` — Return to the home screen.
- `AndroidRecentAppsButton` — Open the multitasking/recent apps view.

## AndroidDevice {#androiddevice}

Create a connection to an adb-managed device that an AndroidAgent can drive.

### Import

```ts
import { AndroidDevice, getConnectedDevices } from '@midscene/android';
```

### Constructor

```ts
const device = new AndroidDevice(deviceId, {
  // device options...
});
```

### Device options

- `deviceId: string` — Value returned by `adb devices` or `getConnectedDevices()`.
- `autoDismissKeyboard?: boolean` — Automatically hide the keyboard after input. Default `true`.
- `keyboardDismissStrategy?: 'esc-first' | 'back-first'` — Order for dismissing keyboards. Default `'esc-first'`.
- `androidAdbPath?: string` — Custom path to the adb executable.
- `remoteAdbHost?: string` / `remoteAdbPort?: number` — Point to a remote adb server.
- `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` — Choose when to invoke [yadb](https://github.com/ysbing/YADB) for text input. Default `'yadb-for-non-ascii'`.
- `displayId?: number` — Target a specific virtual display if the device mirrors multiple displays.
- `screenshotResizeScale?: number` — Downscale screenshots before sending them to the model. Defaults to `1 / devicePixelRatio`.
- `alwaysRefreshScreenInfo?: boolean` — Re-query rotation and screen size every step. Default `false`.

### Usage notes

- Discover devices with `getConnectedDevices()`; the `udid` matches `adb devices`.
- Supports remote adb via `remoteAdbHost`/`remoteAdbPort`; set `androidAdbPath` if adb is not on PATH.
- Use `screenshotResizeScale` to cut latency on high-DPI devices.
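A minimal sketch combining several of the options above; the device ID, adb path, and remote adb address are placeholder values to substitute with your own setup:

```ts
import { AndroidDevice } from '@midscene/android';

// Placeholder device ID and adb locations; replace with values from your environment.
const device = new AndroidDevice('s4ey59', {
  androidAdbPath: '/path/to/adb', // adb binary that is not on PATH
  remoteAdbHost: '192.168.1.100', // remote adb server
  remoteAdbPort: 5037,
  screenshotResizeScale: 0.5, // downscale screenshots on a high-DPI device
  autoDismissKeyboard: true, // hide the keyboard after each input (default)
});

await device.connect();
```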
### Examples

#### Quick start

```ts
import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';

const [first] = await getConnectedDevices();
const device = new AndroidDevice(first.udid);
await device.connect();

const agent = new AndroidAgent(device, {
  aiActionContext: 'If a permissions dialog appears, accept it.',
});

await agent.launch('https://www.ebay.com');
await agent.aiAct('search "Headphones" and wait for results');

const items = await agent.aiQuery(
  '{itemTitle: string, price: number}[], find item in list and corresponding price',
);
console.log(items);
```

#### Launch native packages

```ts
await agent.launch('com.android.settings/.Settings');
await agent.back();
await agent.home();
```

## AndroidAgent {#androidagent}

Wire Midscene's AI planner to an AndroidDevice for UI automation.

### Import

```ts
import { AndroidAgent } from '@midscene/android';
```

### Constructor

```ts
const agent = new AndroidAgent(device, {
  // common agent options...
});
```

### Android-specific options

- `customActions?: DeviceAction[]` — Extend planning with actions defined via `defineAction`.
- All other fields match the common constructor parameters in the [API reference (Common)](./api#common-parameters): `generateReport`, `reportFileName`, `aiActionContext`, `modelConfig`, `cacheId`, `createOpenAIClient`, `onTaskStartTip`, and more.

### Usage notes

:::info
- Use one agent per device connection.
- Android-only helpers such as `launch` and `runAdbShell` are also exposed in YAML scripts. See [Android platform-specific actions](./automate-with-scripts-in-yaml#the-android-part).
- For shared interaction methods, see [API reference (Common)](./api#interaction-methods).
:::

### Android-specific methods

#### `agent.launch()`

Launch a web URL or native Android activity/package.

```ts
function launch(uri: string): Promise<void>;
```

- `uri: string` — Either a webpage URL, a package name, or a `package/.Activity` string such as `com.android.settings/.Settings`.

#### `agent.runAdbShell()`

Run a raw `adb shell` command through the connected device.

```ts
function runAdbShell(command: string): Promise<string>;
```

- `command: string` — Command passed verbatim to `adb shell`.

```ts
const result = await agent.runAdbShell('dumpsys battery');
console.log(result);
```

#### Navigation helpers

- `agent.back(): Promise<void>` — Trigger the Android system Back action.
- `agent.home(): Promise<void>` — Return to the launcher.
- `agent.recentApps(): Promise<void>` — Open the Recents/Overview screen.

### Helper utilities

#### `agentFromAdbDevice()`

Create an `AndroidAgent` from any connected adb device.

```ts
function agentFromAdbDevice(
  deviceId?: string,
  opts?: PageAgentOpt & AndroidDeviceOpt,
): Promise<AndroidAgent>;
```

- `deviceId?: string` — Connect to a specific device; omitted means “first available”.
- `opts?: PageAgentOpt & AndroidDeviceOpt` — Combine agent options with [AndroidDevice](#androiddevice) settings.

#### `getConnectedDevices()`

Enumerate adb devices Midscene can drive. Each entry includes the device `udid`.

```ts
function getConnectedDevices(): Promise<Device[]>;
```

### See also

- [Android getting started](./android-getting-started) for setup and scripting steps.

---
url: /android-getting-started.md
---

import { PackageManagerTabs } from '@theme';

# Android Getting Started

This guide walks you through everything required to automate an Android device with Midscene: connect a real phone over adb, configure model credentials, try the no-code Playground, and run your first JavaScript script.
:::info Demo Projects
Control Android devices with JavaScript: [https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo](https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo)

Integrate Vitest for testing: [https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo)
:::

## Set up API keys for model

Set your model configuration in environment variables. See [Model strategy](../model-strategy) for more details.

```bash
export MIDSCENE_MODEL_BASE_URL="https://replace-with-your-model-service-url/v1"
export MIDSCENE_MODEL_API_KEY="replace-with-your-api-key"
export MIDSCENE_MODEL_NAME="replace-with-your-model-name"
export MIDSCENE_MODEL_FAMILY="replace-with-your-model-family"
```

For more configuration details, refer to [Model strategy](../model-strategy) and [Model configuration](../model-config).

## Prepare your Android device

Before scripting, confirm adb can talk to your device and that the device trusts your machine.

### Install adb and set `ANDROID_HOME`

- Install via [Android Studio](https://developer.android.com/studio) or the [command-line tools](https://developer.android.com/studio#command-line-tools-only)
- Verify installation:

```bash
adb --version
```

Example output indicating success:

```log
Android Debug Bridge version 1.0.41
Version 34.0.4-10411341
Installed as /usr/local/bin//adb
Running on Darwin 24.3.0 (arm64)
```

- Set `ANDROID_HOME` as documented in [Android environment variables](https://developer.android.com/tools/variables), then confirm:

```bash
echo $ANDROID_HOME
```

Any non-empty output means it is configured:

```log
/Users/your_username/Library/Android/sdk
```

### Enable USB debugging and verify the device

In the system settings developer options, enable **USB debugging** (and **USB debugging (Security settings)** if present), then connect the device via USB.

*(Screenshot: android usb debug settings)*

Verify the connection:

```bash
adb devices -l
```

Example success output:

```log
List of devices attached
s4ey59  device usb:34603008X product:cezanne model:M2006J device:cezan transport_id:3
```

## Try Playground (no code)

Playground is the fastest way to validate the connection and observe AI-driven steps without writing code. It shares the same core as `@midscene/android`, so anything that works here will behave the same once scripted.

![](/android-playground.png)

1. Launch the Playground CLI:

   ```bash
   npx --yes @midscene/android-playground
   ```

2. Click the gear icon in the Playground window, then paste your API key configuration. Refer back to [Model configuration](./model-config) if you still need credentials.

![](/android-set-env.png)

### Start experiencing

After configuration, you can start using Midscene right away. It provides several key operation tabs:

- **Act**: interact with the page. This is Auto Planning, corresponding to `aiAct`. For example:

  ```
  Type “Midscene” in the search box, run the search, and open the first result
  ```

  ```
  Fill out the registration form and make sure every field passes validation
  ```

- **Query**: extract JSON data from the interface, corresponding to `aiQuery`. Similar methods include `aiBoolean()`, `aiNumber()`, and `aiString()` for directly extracting booleans, numbers, and strings.

  ```
  Extract the user ID from the page and return JSON data in the { id: string } structure
  ```

- **Assert**: understand the page and assert; if the condition is not met, throw an error, corresponding to `aiAssert`.

  ```
  There is a login button on the page, with a user agreement link below it
  ```

- **Tap**: click on an element. This is Instant Action, corresponding to `aiTap`.

  ```
  Click the login button
  ```

> For the difference between Auto Planning and Instant Action, see the [API](../api.mdx) document.

## Integration with Midscene Agent

Once Playground works, move to a repeatable script with the JavaScript SDK.

### Step 1. Install dependencies

Install `@midscene/android` (plus `tsx` for running TypeScript directly) with your preferred package manager.

### Step 2. Write scripts

Save the following code as `./demo.ts`. It opens the browser on the device, searches eBay, and asserts the result list.

```typescript title="./demo.ts"
import {
  AndroidAgent,
  AndroidDevice,
  getConnectedDevices,
} from '@midscene/android';

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

Promise.resolve(
  (async () => {
    const devices = await getConnectedDevices();
    const device = new AndroidDevice(devices[0].udid);

    const agent = new AndroidAgent(device, {
      aiActionContext:
        'If any location, permission, user agreement, etc. popup, click agree. If login page pops up, close it.',
    });
    await device.connect();

    await agent.aiAct('open browser and navigate to ebay.com');
    await sleep(5000);

    await agent.aiAct('type "Headphones" in search box, hit Enter');
    await agent.aiWaitFor('There is at least one headphone product');

    const items = await agent.aiQuery(
      '{itemTitle: string, price: Number}[], find item in list and corresponding price',
    );
    console.log('headphones in stock', items);

    await agent.aiAssert('There is a category filter on the left');
  })(),
);
```

### Step 3. Run

```bash
npx tsx demo.ts
```

### Step 4. View the report

Successful runs print `Midscene - report file updated: /path/to/report/some_id.html`. Open the generated HTML file in a browser to replay every interaction, query, and assertion.

## Advanced

Use this section when you need to customize device behavior, wire Midscene into your framework, or troubleshoot adb issues.
For detailed constructor parameters, jump to the [API reference (Android)](./android-api-reference).

### Extend Midscene on Android

Use `defineAction()` for custom gestures and pass them through `customActions`. Midscene will append them to the planner so the AI can call your domain-specific action names.

```typescript
import { getMidsceneLocationSchema, z } from '@midscene/core';
import { defineAction } from '@midscene/core/device';
import {
  AndroidAgent,
  AndroidDevice,
  getConnectedDevices,
} from '@midscene/android';

const ContinuousClick = defineAction({
  name: 'continuousClick',
  description: 'Click the same target repeatedly',
  paramSchema: z.object({
    locate: getMidsceneLocationSchema(),
    count: z.number().int().positive().describe('How many times to click'),
  }),
  async call(param) {
    const { locate, count } = param;
    console.log('click target center', locate.center);
    console.log('click count', count);
  },
});

const devices = await getConnectedDevices();
const device = new AndroidDevice(devices[0].udid);
await device.connect();

const agent = new AndroidAgent(device, {
  customActions: [ContinuousClick],
});

await agent.aiAct('click the red button five times');
```

See [Integrate with any interface](./integrate-with-any-interface#define-a-custom-action) for a deeper explanation of custom actions and action schemas.

## More

- For every Agent method, check the [API reference (Common)](./api#interaction-methods).
- For the Android API reference, see [Android Agent API](./android-api-reference).
- Demo projects
  - Android JavaScript SDK demo: [https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo](https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo)
  - Android + Vitest demo: [https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo)

### Complete example (Vitest + AndroidAgent)

```typescript
import {
  AndroidAgent,
  AndroidDevice,
  getConnectedDevices,
} from '@midscene/android';
import type { TestStatus } from '@midscene/core';
import { ReportMergingTool } from '@midscene/core/report';
import { sleep } from '@midscene/core/utils';
import type ADB from 'appium-adb';
import {
  afterAll,
  afterEach,
  beforeAll,
  beforeEach,
  describe,
  it,
} from 'vitest';

describe('Android Settings Test', () => {
  let page: AndroidDevice;
  let adb: ADB;
  let agent: AndroidAgent;
  let startTime: number;
  let itTestStatus: TestStatus = 'passed';
  const reportMergingTool = new ReportMergingTool();

  beforeAll(async () => {
    const devices = await getConnectedDevices();
    page = new AndroidDevice(devices[0].udid);
    adb = await page.getAdb();
  });

  beforeEach((ctx) => {
    startTime = performance.now();
    agent = new AndroidAgent(page, {
      groupName: ctx.task.name,
    });
  });

  afterEach((ctx) => {
    if (ctx.task.result?.state === 'pass') {
      itTestStatus = 'passed';
    } else if (ctx.task.result?.state === 'skip') {
      itTestStatus = 'skipped';
    } else if (ctx.task.result?.errors?.[0]?.message?.includes('timed out')) {
      itTestStatus = 'timedOut';
    } else {
      itTestStatus = 'failed';
    }

    reportMergingTool.append({
      reportFilePath: agent.reportFile as string,
      reportAttributes: {
        testId: `${ctx.task.name}`,
        testTitle: `${ctx.task.name}`,
        testDescription: 'description',
        testDuration: (Date.now() - ctx.task.result?.startTime!) | 0,
        testStatus: itTestStatus,
      },
    });
  });

  afterAll(() => {
    reportMergingTool.mergeReports('my-android-setting-test-report');
  });

  it('toggle wlan', async () => {
    await adb.shell('input keyevent KEYCODE_HOME');
    await sleep(1000);
    await adb.shell('am start -n com.android.settings/.Settings');
    await sleep(1000);

    await agent.aiAct('find and enter WLAN setting');
    await agent.aiAct(
      'toggle WLAN status *once*, if WLAN is off pls turn it on, otherwise turn it off.',
    );
  });

  it('toggle bluetooth', async () => {
    await adb.shell('input keyevent KEYCODE_HOME');
    await sleep(1000);
    await adb.shell('am start -n com.android.settings/.Settings');
    await sleep(1000);

    await agent.aiAct('find and enter bluetooth setting');
    await agent.aiAct(
      'toggle bluetooth status *once*, if bluetooth is off pls turn it on, otherwise turn it off.',
    );
  });
});
```

:::tip
Merged reports are stored inside `midscene_run/report` by default. Override the directory with `MIDSCENE_RUN_DIR` when running in CI.
:::

## FAQ

### Why can't I control the device even though I've connected it?

A common error is:

```
Error: Exception occurred while executing 'tap': java.lang.SecurityException: Injecting input events requires the caller (or the source of the instrumentation, if any) to have the INJECT_EVENTS permission.
```

Make sure USB debugging is enabled and the device is unlocked in developer options.

*(Screenshot: android usb debug settings)*

### How do I use a custom adb path or remote adb server?

Set the environment variables first:

```bash
export MIDSCENE_ADB_PATH=/path/to/adb
export MIDSCENE_ADB_REMOTE_HOST=192.168.1.100
export MIDSCENE_ADB_REMOTE_PORT=5037
```

You can also provide the same information via the constructor:

```typescript
const device = new AndroidDevice('s4ey59', {
  androidAdbPath: '/path/to/adb',
  remoteAdbHost: '192.168.1.100',
  remoteAdbPort: 5037,
});
```

---
url: /android-introduction.md
---

# Android Automation Support

Midscene drives Android automation through adb. Because it adopts a visual-model approach, the automation works with any app tech stack—whether built with native code, Flutter, React Native, or Lynx—so developers only need to focus on the final experience when debugging UI automation scripts.

The Android UI automation solution comes with all the features of Midscene:

- Supports zero-code trial using Playground.
- Supports JavaScript SDK.
- Supports automation scripts in YAML format and command-line tools.
- Supports HTML reports to replay all operation paths.

## Showcases

**Prompt**: Open the Booking App, search for a hotel in Tokyo for four adults on Christmas, with a score of 8 or above.