---
url: /android-api-reference.md
---

# API reference (Android)

Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic [API reference (Common)](./api).

## Action Space

`AndroidDevice` uses the following action space; the Midscene Agent can use these actions while planning tasks:

- `Tap` — Tap an element.
- `DoubleClick` — Double-tap an element.
- `Input` — Enter text with `replace`/`append`/`clear` modes and optional `autoDismissKeyboard`.
- `Scroll` — Scroll from an element or screen center in any direction, with helpers to reach the top, bottom, left, or right.
- `DragAndDrop` — Drag from one element to another.
- `KeyboardPress` — Press a specified key.
- `LongPress` — Long-press a target element with optional duration.
- `PullGesture` — Pull up or down (e.g., to refresh) with optional distance and duration.
- `ClearInput` — Clear the contents of an input field.
- `Launch` — Open a web URL or `package/.Activity` string.
- `RunAdbShell` — Execute raw `adb shell` commands.
- `AndroidBackButton` — Trigger the system back action.
- `AndroidHomeButton` — Return to the home screen.
- `AndroidRecentAppsButton` — Open the multitasking/recent apps view.

## AndroidDevice {#androiddevice}

Create a connection to an adb-managed device that an AndroidAgent can drive.

### Import

```ts
import { AndroidDevice, getConnectedDevices } from '@midscene/android';
```

### Constructor

```ts
const device = new AndroidDevice(deviceId, {
  // device options...
});
```

### Device options

- `deviceId: string` — Value returned by `adb devices` or `getConnectedDevices()`.
- `autoDismissKeyboard?: boolean` — Automatically hide the keyboard after input. Default `true`.
- `keyboardDismissStrategy?: 'esc-first' | 'back-first'` — Order for dismissing keyboards. Default `'esc-first'`.
- `androidAdbPath?: string` — Custom path to the adb executable.
- `remoteAdbHost?: string` / `remoteAdbPort?: number` — Point to a remote adb server.
- `imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii'` — Choose when to invoke [yadb](https://github.com/ysbing/YADB) for text input. Default `'yadb-for-non-ascii'`.
  - `'yadb-for-non-ascii'` (default) — Uses yadb for non-ASCII text (accented Latin characters like ö, é, ñ, as well as Chinese and Japanese) and for text containing format specifiers (like %s, %d). Pure ASCII text uses the faster native `adb input text`.
  - `'always-yadb'` — Always uses yadb for all text input, providing maximum compatibility at the cost of slightly slower input for pure ASCII text.
- `displayId?: number` — Target a specific virtual display if the device mirrors multiple displays.
- `screenshotResizeScale?: number` — Downscale screenshots before sending them to the model. Defaults to `1 / devicePixelRatio`.
- `alwaysRefreshScreenInfo?: boolean` — Re-query rotation and screen size every step. Default `false`.
- `scrcpyConfig?: object` — Scrcpy high-performance screenshot configuration, disabled by default. See [Scrcpy Screenshot Mode](#scrcpy) below.

### Scrcpy Screenshot Mode {#scrcpy}

By default, Midscene captures screenshots via `adb shell screencap`, which takes ~500–2000ms per call. Enabling Scrcpy mode streams H.264 video from the device and captures frames in real time, reducing screenshot latency to approximately **100–200ms**.

**How to enable:**

```ts
const device = new AndroidDevice(deviceId, {
  scrcpyConfig: {
    enabled: true,
  },
});
```

**Optional parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `enabled` | `boolean` | `false` | Enable Scrcpy screenshots |
| `maxSize` | `number` | `0` | Max video dimension (width or height). `0` = no scaling |
| `videoBitRate` | `number` | `2000000` | H.264 encoding bitrate (bps) |
| `idleTimeoutMs` | `number` | `30000` | Auto-disconnect after idle (ms). Set to `0` to disable |

:::tip
Scrcpy mode automatically falls back to ADB screenshots if the connection fails. No extra error handling is needed.
:::

### Usage notes

- Discover devices with `getConnectedDevices()`; the `udid` matches the serial shown by `adb devices`.
- Supports remote adb via `remoteAdbHost/remoteAdbPort`; set `androidAdbPath` if adb is not on PATH.
- Use `screenshotResizeScale` to cut latency on high-DPI devices.

### Examples

#### Quick start

```ts
import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';

// Pick the first device reported by adb and connect to it.
const [first] = await getConnectedDevices();
const device = new AndroidDevice(first.udid);
await device.connect();

const agent = new AndroidAgent(device, {
  aiActionContext: 'If a permissions dialog appears, accept it.',
});

// Open a page, let the agent act on it, then extract structured data.
await agent.launch('https://www.ebay.com');
await agent.aiAct('search "Headphones" and wait for results');
const items = await agent.aiQuery(
  '{itemTitle: string, price: number}[], find item in list and corresponding price',
);
console.log(items);
```

#### Launch native packages

```ts
await agent.launch('com.android.settings/.Settings');
await agent.back();
await agent.home();
```

## AndroidAgent {#androidagent}

Wire Midscene's AI planner to an AndroidDevice for UI automation.

### Import

```ts
import { AndroidAgent } from '@midscene/android';
```

### Constructor

```ts
const agent = new AndroidAgent(device, {
  // common agent options...
});
```

### Android-specific options

- `customActions?: DeviceAction[]` — Extend planning with actions defined via `defineAction`.
- `appNameMapping?: Record<string, string>` — Map friendly app names to package names. When you pass an app name to `launch(target)`, the agent looks up the package name in this mapping. If no mapping is found, it attempts to launch `target` as-is. User-provided mappings take precedence over default mappings.
- All other fields match [API constructors](./api#common-parameters): `generateReport`, `reportFileName`, `aiActionContext`, `modelConfig`, `cacheId`, `createOpenAIClient`, `onTaskStartTip`, and more.

### Usage notes

:::info

- Use one agent per device connection.
- Android-only helpers such as `launch` and `runAdbShell` are also exposed in YAML scripts. See [Android platform-specific actions](./automate-with-scripts-in-yaml#the-android-part).
- For shared interaction methods, see [API reference (Common)](./api#interaction-methods).

:::

### Android-specific methods

#### `agent.launch()`

Launch a web URL or native Android activity/package.

```ts
function launch(target: string): Promise<void>;
```

- `target: string` — Can be a web URL, a string in `package/.Activity` format (e.g., `com.android.settings/.Settings`), an app package name, or an app name. If you pass an app name and it exists in `appNameMapping`, it is automatically resolved to the mapped package name; otherwise, `target` is launched as-is.

#### `agent.runAdbShell()`

Run a raw `adb shell` command through the connected device.

```ts
function runAdbShell(command: string): Promise<string>;
```

- `command: string` — Command passed verbatim to `adb shell`.

```ts
const result = await agent.runAdbShell('dumpsys battery');
console.log(result);
```

#### Navigation helpers

- `agent.back(): Promise<void>` — Trigger the Android system Back action.
- `agent.home(): Promise<void>` — Return to the launcher.
- `agent.recentApps(): Promise<void>` — Open the Recents/Overview screen.

### Helper utilities

#### `agentFromAdbDevice()`

Create an `AndroidAgent` from any connected adb device.

```ts
function agentFromAdbDevice(
  deviceId?: string,
  opts?: PageAgentOpt & AndroidDeviceOpt,
): Promise<AndroidAgent>;
```

- `deviceId?: string` — Connect to a specific device; omitted means “first available”.
- `opts?: PageAgentOpt & AndroidDeviceOpt` — Combine agent options with [AndroidDevice](#androiddevice) settings.
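As a rough illustration of the combined `opts` argument, the sketch below builds one flat object that mixes agent-level and device-level fields. The two interfaces are local stand-ins written for this example (an assumption, so the snippet type-checks without the package); they cover only a few of the options documented on this page and are not the real `PageAgentOpt` or `AndroidDeviceOpt` definitions.

```ts
// Local stand-in types (hypothetical), covering a subset of the documented options.
type ImeStrategy = 'always-yadb' | 'yadb-for-non-ascii';

interface DeviceOptsSketch {
  autoDismissKeyboard?: boolean;
  imeStrategy?: ImeStrategy;
  screenshotResizeScale?: number;
  scrcpyConfig?: {
    enabled?: boolean;
    maxSize?: number;
    videoBitRate?: number;
    idleTimeoutMs?: number;
  };
}

interface AgentOptsSketch {
  aiActionContext?: string;
  generateReport?: boolean;
}

// Agent options and device options live side by side in one object,
// mirroring the intersection type in the signature above.
const opts: AgentOptsSketch & DeviceOptsSketch = {
  aiActionContext: 'If a permissions dialog appears, accept it.',
  generateReport: true,
  imeStrategy: 'always-yadb',
  scrcpyConfig: { enabled: true, idleTimeoutMs: 30000 },
};

// With the package installed and a device attached, this object would be passed as:
//   const agent = await agentFromAdbDevice(undefined, opts);
console.log(Object.keys(opts));
```

Leaving `deviceId` undefined relies on the "first available" behavior described above; pass a specific id when more than one device is attached.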
#### `getConnectedDevices()`

Enumerate adb devices Midscene can drive.

```ts
function getConnectedDevices(): Promise<Array<{
  udid: string;
  state: string;
  port?: number;
}>>;
```

### See also

- [Android getting started](./android-getting-started) for setup and scripting steps.

---
url: /android-getting-started.md
---

import { PackageManagerTabs } from '@theme';

# Android Getting Started

This guide walks you through everything required to automate an Android device with Midscene: connect a real phone over adb, configure model credentials, try the no-code Playground, and run your first JavaScript script.

:::info Demo Projects
Control Android devices with JavaScript: [https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo](https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo)

Integrate Vitest for testing: [https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo](https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo)
:::

## Set up API keys for model

Set your model configuration through environment variables.

```bash
export MIDSCENE_MODEL_BASE_URL="https://replace-with-your-model-service-url/v1"
export MIDSCENE_MODEL_API_KEY="replace-with-your-api-key"
export MIDSCENE_MODEL_NAME="replace-with-your-model-name"
export MIDSCENE_MODEL_FAMILY="replace-with-your-model-family"
```

For more configuration details, refer to [Model strategy](../model-strategy) and [Model configuration](../model-config).

## Prepare your Android device

Before scripting, confirm adb can talk to your device and the device trusts your machine.
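Once adb is installed, this check can also be scripted. The sketch below parses the text printed by `adb devices -l` and keeps only entries whose state is `device`; a state of `unauthorized` usually means the debugging prompt has not been accepted on the phone. The names `parseAdbDevices` and `AdbDeviceEntry` are hypothetical helpers written for this example and are not part of Midscene.

```ts
// Sketch: parse `adb devices -l` output and keep only usable devices.
// `parseAdbDevices` and `AdbDeviceEntry` are illustrative names, not a Midscene API.

interface AdbDeviceEntry {
  serial: string;
  state: string; // 'device' when ready; 'unauthorized' or 'offline' otherwise
  props: Record<string, string>; // key:value tokens such as model:M2006J
}

function parseAdbDevices(output: string): AdbDeviceEntry[] {
  const entries: AdbDeviceEntry[] = [];
  for (const rawLine of output.split('\n')) {
    const line = rawLine.trim();
    // Skip blanks, the header line, and adb daemon notices (they start with '*').
    if (line === '' || line.startsWith('List of devices') || line.startsWith('*')) {
      continue;
    }
    const tokens = line.split(/\s+/);
    const props: Record<string, string> = {};
    for (const token of tokens.slice(2)) {
      const idx = token.indexOf(':');
      if (idx > 0) props[token.slice(0, idx)] = token.slice(idx + 1);
    }
    entries.push({ serial: tokens[0] ?? '', state: tokens[1] ?? 'unknown', props });
  }
  return entries;
}

// Sample text in the shape adb prints; the second entry is made up for the example.
const sampleOutput = [
  'List of devices attached',
  's4ey59 device usb:34603008X product:cezanne model:M2006J device:cezan transport_id:3',
  'emulator-5554 unauthorized transport_id:4',
].join('\n');

const ready = parseAdbDevices(sampleOutput).filter((d) => d.state === 'device');
console.log(ready.map((d) => `${d.serial} (${d.props.model})`));
```

Only the first sample entry survives the filter, because the second one is still `unauthorized`.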
### Install adb and set `ANDROID_HOME`

- Install via [Android Studio](https://developer.android.com/studio) or the [command-line tools](https://developer.android.com/studio#command-line-tools-only)
- Verify installation:

```bash
adb --version
```

Example output indicates success:

```log
Android Debug Bridge version 1.0.41
Version 34.0.4-10411341
Installed as /usr/local/bin//adb
Running on Darwin 24.3.0 (arm64)
```

- Set `ANDROID_HOME` as documented in [Android environment variables](https://developer.android.com/tools/variables), then confirm:

```bash
echo $ANDROID_HOME
```

Any non-empty output means it is configured:

```log
/Users/your_username/Library/Android/sdk
```

### Enable USB debugging and verify the device

In the system settings developer options, enable **USB debugging** (and **USB debugging (Security settings)** if present), then connect the device via USB.


Verify the connection:

```bash
adb devices -l
```

Example success output:

```log
List of devices attached
s4ey59 device usb:34603008X product:cezanne model:M2006J device:cezan transport_id:3
```

## Try Playground (no code)

Playground is the fastest way to validate the connection and observe AI-driven steps without writing code. It shares the same core as `@midscene/android`, so anything that works here will behave the same once scripted.