API reference (Android)

Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).

Action Space

AndroidDevice uses the following action space; the Midscene Agent can use these actions while planning tasks:

  • Tap — Tap an element.
  • DoubleClick — Double-tap an element.
  • Input — Enter text with replace/append/clear modes and optional autoDismissKeyboard.
  • Scroll — Scroll from an element or screen center in any direction, with helpers to reach the top, bottom, left, or right.
  • DragAndDrop — Drag from one element to another.
  • KeyboardPress — Press a specified key.
  • AndroidLongPress — Long-press a target element with optional duration.
  • AndroidPull — Pull up or down (e.g., to refresh) with optional distance and duration.
  • ClearInput — Clear the contents of an input field.
  • Launch — Open a web URL or package/.Activity string.
  • RunAdbShell — Execute raw adb shell commands.
  • AndroidBackButton — Trigger the system back action.
  • AndroidHomeButton — Return to the home screen.
  • AndroidRecentAppsButton — Open the multitasking/recent apps view.

AndroidDevice

Create a connection to an adb-managed device that an AndroidAgent can drive.

Import

import { AndroidDevice, getConnectedDevices } from '@midscene/android';

Constructor

const device = new AndroidDevice(deviceId, {
  // device options...
});

Device options

  • deviceId: string — First constructor argument (not part of the options object). Use a value returned by adb devices or getConnectedDevices().
  • autoDismissKeyboard?: boolean — Automatically hide the keyboard after input. Default true.
  • keyboardDismissStrategy?: 'esc-first' | 'back-first' — Order for dismissing keyboards. Default 'esc-first'.
  • androidAdbPath?: string — Custom path to the adb executable.
  • remoteAdbHost?: string / remoteAdbPort?: number — Point to a remote adb server.
  • imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii' — Choose when to invoke yadb for text input. Default 'yadb-for-non-ascii'.
  • displayId?: number — Target a specific virtual display if the device mirrors multiple displays.
  • screenshotResizeScale?: number — Downscale screenshots before sending them to the model. Defaults to 1 / devicePixelRatio.
  • alwaysRefreshScreenInfo?: boolean — Re-query rotation and screen size every step. Default false.
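
To make the screenshotResizeScale default concrete, here is a small arithmetic sketch. The helper name scaledScreenshotSize and the sample screen size are illustrative assumptions, not part of @midscene/android, and Midscene's own rounding may differ.

```typescript
// Illustrative arithmetic only; Midscene's internal rounding may differ.
interface Size {
  width: number;
  height: number;
}

// Hypothetical helper: apply a resize scale to physical screenshot dimensions.
function scaledScreenshotSize(physical: Size, scale: number): Size {
  return {
    width: Math.round(physical.width * scale),
    height: Math.round(physical.height * scale),
  };
}

// Assumed sample device: 1080x2400 physical pixels, devicePixelRatio of 3.
const physical: Size = { width: 1080, height: 2400 };
const devicePixelRatio = 3;
const defaultScale = 1 / devicePixelRatio; // the documented default

const scaled = scaledScreenshotSize(physical, defaultScale);
console.log(scaled); // { width: 360, height: 800 }
```

Under these assumptions the model receives roughly one ninth of the original pixels, which is why a smaller scale tends to reduce latency.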

Usage notes

  • Discover devices with getConnectedDevices(); each entry's udid matches the serial listed by adb devices.
  • Supports remote adb via remoteAdbHost/remoteAdbPort; set androidAdbPath if adb is not on PATH.
  • Use screenshotResizeScale to cut latency on high-DPI devices.
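
The notes above can be sketched as a small options builder. The interface below is a local stand-in for the documented fields, and the setting names ADB_PATH, ADB_HOST, and ADB_PORT are hypothetical; only the option names come from this reference. The fallback port 5037 is the adb server's standard default.

```typescript
// Local stand-in for the subset of device options documented above.
interface DeviceOptionsSketch {
  androidAdbPath?: string;
  remoteAdbHost?: string;
  remoteAdbPort?: number;
}

// Hypothetical helper: build device options from string settings.
// The keys ADB_PATH / ADB_HOST / ADB_PORT are illustrative names only.
function buildDeviceOptions(
  settings: Record<string, string | undefined>,
): DeviceOptionsSketch {
  const opts: DeviceOptionsSketch = {};
  const adbPath = settings['ADB_PATH'];
  const adbHost = settings['ADB_HOST'];
  if (adbPath) {
    opts.androidAdbPath = adbPath; // adb is not on PATH
  }
  if (adbHost) {
    opts.remoteAdbHost = adbHost;
    // 5037 is the default port of the adb server.
    opts.remoteAdbPort = Number(settings['ADB_PORT'] ?? '5037');
  }
  return opts;
}

const remoteOpts = buildDeviceOptions({ ADB_HOST: '10.0.0.5' });
console.log(remoteOpts);
```

The result could then be passed as the second argument of new AndroidDevice(deviceId, ...), for example with process.env as the settings source.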

Examples

Quick start

import { AndroidAgent, AndroidDevice, getConnectedDevices } from '@midscene/android';

const [first] = await getConnectedDevices();
const device = new AndroidDevice(first.udid);
await device.connect();

const agent = new AndroidAgent(device, {
  aiActionContext: 'If a permissions dialog appears, accept it.',
});

await agent.launch('https://www.ebay.com');
await agent.aiAct('search "Headphones" and wait for results');
const items = await agent.aiQuery(
  '{itemTitle: string, price: number}[], find item in list and corresponding price',
);
console.log(items);

Launch native packages

await agent.launch('com.android.settings/.Settings');
await agent.back();
await agent.home();

AndroidAgent

Wire Midscene's AI planner to an AndroidDevice for UI automation.

Import

import { AndroidAgent } from '@midscene/android';

Constructor

const agent = new AndroidAgent(device, {
  // common agent options...
});

Android-specific options

  • customActions?: DeviceAction[] — Extend planning with actions defined via defineAction.
  • All other fields match the common Agent constructor options described in the Common API reference: generateReport, reportFileName, aiActionContext, modelConfig, cacheId, createOpenAIClient, onTaskStartTip, and more.

Android-specific methods

agent.launch()

Launch a web URL or native Android activity/package.

function launch(uri: string): Promise<void>;
  • uri: string — Either a webpage URL or a package/.Activity string such as com.android.settings/.Settings.
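
As a rough illustration of the two accepted uri shapes, the sketch below classifies a string before it is handed to agent.launch(). The function classifyLaunchTarget and its regular expressions are assumptions made for this example; how Midscene itself distinguishes the two forms is not documented here.

```typescript
type LaunchTargetKind = 'url' | 'activity' | 'unknown';

// Hypothetical pre-flight check; not Midscene's own detection logic.
function classifyLaunchTarget(uri: string): LaunchTargetKind {
  // A scheme followed by "://", e.g. https://www.ebay.com
  if (/^[a-z][a-z0-9+.-]*:\/\//i.test(uri)) {
    return 'url';
  }
  // package/.Activity or package/fully.qualified.Activity
  if (/^[A-Za-z][\w.]*\/\.?[A-Za-z][\w.$]*$/.test(uri)) {
    return 'activity';
  }
  return 'unknown';
}

console.log(classifyLaunchTarget('https://www.ebay.com')); // url
console.log(classifyLaunchTarget('com.android.settings/.Settings')); // activity
console.log(classifyLaunchTarget('settings')); // unknown
```

A check like this can surface a malformed target early instead of failing on the device.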

agent.runAdbShell()

Run a raw adb shell command through the connected device.

function runAdbShell(command: string): Promise<string>;
  • command: string — Command passed verbatim to adb shell.

const result = await agent.runAdbShell('dumpsys battery');
console.log(result);

The agent also exposes shortcuts for the system navigation buttons:

  • agent.back(): Promise<void> — Trigger the Android system Back action.
  • agent.home(): Promise<void> — Return to the launcher.
  • agent.recentApps(): Promise<void> — Open the Recents/Overview screen.
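
Because runAdbShell() resolves to a plain string, callers usually need to parse it themselves. The following is a minimal sketch of one way to read "key: value" lines such as those printed by dumpsys battery. The helper parseKeyValueLines is hypothetical, and the sample text is a shortened stand-in for real device output, which varies by Android version.

```typescript
// Hypothetical helper: turn "key: value" lines into a lookup table.
function parseKeyValueLines(output: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const line of output.split('\n')) {
    const idx = line.indexOf(':');
    if (idx === -1) continue; // not a key/value line
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    // Skip section headers such as "Current Battery Service state:".
    if (key && value) result[key] = value;
  }
  return result;
}

// Shortened sample in the style of `dumpsys battery` output.
const sample = [
  'Current Battery Service state:',
  '  AC powered: false',
  '  USB powered: true',
  '  level: 85',
  '  scale: 100',
].join('\n');

const battery = parseKeyValueLines(sample);
console.log(Number(battery['level'])); // 85
```

In a real script, the sample string would be replaced by the value returned from agent.runAdbShell('dumpsys battery').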

Helper utilities

agentFromAdbDevice()

Create an AndroidAgent from any connected adb device.

function agentFromAdbDevice(
  deviceId?: string,
  opts?: PageAgentOpt & AndroidDeviceOpt,
): Promise<AndroidAgent>;
  • deviceId?: string — Connect to a specific device; omitted means “first available”.
  • opts?: PageAgentOpt & AndroidDeviceOpt — Combine agent options with AndroidDevice settings.
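
Since opts is an intersection type, agent and device settings sit side by side in one flat object. The sketch below uses local stand-in interfaces with a few of the fields named in this reference; the real PageAgentOpt and AndroidDeviceOpt types have more fields, and the boolean typing of generateReport is an assumption.

```typescript
// Local stand-ins; the real types in @midscene/android are larger.
interface AgentOptsSketch {
  aiActionContext?: string;
  generateReport?: boolean; // assumed to be a boolean flag
  cacheId?: string;
}

interface DeviceOptsSketch {
  autoDismissKeyboard?: boolean;
  imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii';
  alwaysRefreshScreenInfo?: boolean;
}

// One flat object carries both groups of settings.
const combinedOpts: AgentOptsSketch & DeviceOptsSketch = {
  // agent-level settings
  aiActionContext: 'If a permissions dialog appears, accept it.',
  generateReport: true,
  // device-level settings
  imeStrategy: 'always-yadb',
  alwaysRefreshScreenInfo: true,
};

console.log(Object.keys(combinedOpts));
```

With the real types, an object of this shape would be passed as the second argument, for example agentFromAdbDevice(undefined, combinedOpts) to target the first available device.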

getConnectedDevices()

Enumerate adb devices Midscene can drive.

function getConnectedDevices(): Promise<Array<{
  udid: string;
  state: string;
  port?: number;
}>>;
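
When several devices are attached, it can help to filter the returned list before constructing an AndroidDevice. The sketch below assumes that state mirrors the status column of adb devices ("device", "offline", "unauthorized"); that mapping and the helper pickReadyDevice are assumptions for illustration, not documented behavior.

```typescript
// Mirrors the return shape documented above.
interface ConnectedDevice {
  udid: string;
  state: string;
  port?: number;
}

// Hypothetical helper: choose a usable device, optionally by udid.
function pickReadyDevice(
  devices: ConnectedDevice[],
  preferredUdid?: string,
): ConnectedDevice {
  // Assumption: "device" marks an authorized, online device, as in `adb devices`.
  const ready = devices.filter((d) => d.state === 'device');
  const chosen =
    preferredUdid === undefined
      ? ready[0]
      : ready.find((d) => d.udid === preferredUdid);
  if (!chosen) {
    throw new Error('No matching adb device in the "device" state');
  }
  return chosen;
}

// Sample data standing in for the result of getConnectedDevices().
const sampleDevices: ConnectedDevice[] = [
  { udid: 'emulator-5554', state: 'offline' },
  { udid: 'R58M123ABC', state: 'device' },
];

console.log(pickReadyDevice(sampleDevices).udid); // R58M123ABC
```

The chosen udid can then be passed to new AndroidDevice(udid) or to agentFromAdbDevice(udid).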

See also