API reference (Android)
Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
Action Space
AndroidDevice uses the following action space; the Midscene Agent can use these actions while planning tasks:
- Tap: Tap an element.
- DoubleClick: Double-tap an element.
- Input: Enter text with replace/append/clear modes and optional autoDismissKeyboard.
- Scroll: Scroll from an element or screen center in any direction, with helpers to reach the top, bottom, left, or right.
- DragAndDrop: Drag from one element to another.
- KeyboardPress: Press a specified key.
- AndroidLongPress: Long-press a target element with optional duration.
- AndroidPull: Pull up or down (e.g., to refresh) with optional distance and duration.
- ClearInput: Clear the contents of an input field.
- Launch: Open a web URL or package/.Activity string.
- RunAdbShell: Execute raw adb shell commands.
- AndroidBackButton: Trigger the system back action.
- AndroidHomeButton: Return to the home screen.
- AndroidRecentAppsButton: Open the multitasking/recent apps view.
AndroidDevice
Create a connection to an adb-managed device that an AndroidAgent can drive.
Import
Constructor
Device options
- deviceId: string - Value returned by adb devices or getConnectedDevices().
- autoDismissKeyboard?: boolean - Automatically hide the keyboard after input. Default true.
- keyboardDismissStrategy?: 'esc-first' | 'back-first' - Order for dismissing keyboards. Default 'esc-first'.
- androidAdbPath?: string - Custom path to the adb executable.
- remoteAdbHost?: string / remoteAdbPort?: number - Point to a remote adb server.
- imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii' - Choose when to invoke yadb for text input. Default 'yadb-for-non-ascii'.
- displayId?: number - Target a specific virtual display if the device mirrors multiple displays.
- screenshotResizeScale?: number - Downscale screenshots before sending them to the model. Defaults to 1 / devicePixelRatio.
- alwaysRefreshScreenInfo?: boolean - Re-query rotation and screen size every step. Default false.
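To make the defaults above easier to see at a glance, here is a minimal sketch that applies them to a partial options object. The DeviceOptionsSketch interface and the resolveDeviceOptions helper are hypothetical and defined locally for illustration; they are not imported from Midscene, and the library may resolve its defaults differently.

```typescript
// Local mirror of the options listed above; NOT a type exported by Midscene.
interface DeviceOptionsSketch {
  autoDismissKeyboard?: boolean;
  keyboardDismissStrategy?: 'esc-first' | 'back-first';
  androidAdbPath?: string;
  remoteAdbHost?: string;
  remoteAdbPort?: number;
  imeStrategy?: 'always-yadb' | 'yadb-for-non-ascii';
  displayId?: number;
  screenshotResizeScale?: number;
  alwaysRefreshScreenInfo?: boolean;
}

// Hypothetical helper: fills in the documented defaults for any option
// the caller left out. devicePixelRatio is supplied by the caller here.
function resolveDeviceOptions(opts: DeviceOptionsSketch, devicePixelRatio: number) {
  return {
    ...opts,
    autoDismissKeyboard: opts.autoDismissKeyboard ?? true,
    keyboardDismissStrategy: opts.keyboardDismissStrategy ?? 'esc-first',
    imeStrategy: opts.imeStrategy ?? 'yadb-for-non-ascii',
    screenshotResizeScale: opts.screenshotResizeScale ?? 1 / devicePixelRatio,
    alwaysRefreshScreenInfo: opts.alwaysRefreshScreenInfo ?? false,
  };
}

// Example: a remote adb server, everything else left at its default.
const resolvedOptions = resolveDeviceOptions(
  { remoteAdbHost: '192.168.1.20', remoteAdbPort: 5037 },
  2,
);
console.log(resolvedOptions);
```

On a device with a pixel ratio of 2, the sketch resolves screenshotResizeScale to 0.5, which matches the documented 1 / devicePixelRatio default.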
Usage notes
- Discover devices with getConnectedDevices(); the udid matches adb devices.
- Supports remote adb via remoteAdbHost/remoteAdbPort; set androidAdbPath if adb is not on PATH.
- Use screenshotResizeScale to cut latency on high-DPI devices.
Examples
Quick start
Launch native packages
AndroidAgent
Wire Midscene's AI planner to an AndroidDevice for UI automation.
Import
Constructor
Android-specific options
- customActions?: DeviceAction[] - Extend planning with actions defined via defineAction.
- All other fields match the shared constructor options in API reference (Common): generateReport, reportFileName, aiActionContext, modelConfig, cacheId, createOpenAIClient, onTaskStartTip, and more.
Usage notes
- Use one agent per device connection.
- Android-only helpers such as launch and runAdbShell are also exposed in YAML scripts. See Android platform-specific actions.
- For shared interaction methods, see API reference (Common).
Android-specific methods
agent.launch()
Launch a web URL or native Android activity/package.
uri: string - Either a webpage URL, or a package or package/.Activity string such as com.android.settings/.Settings.
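Because one string parameter carries several kinds of targets, it can help to check what a given value looks like before passing it to agent.launch(). The sketch below is a hypothetical pre-check written for illustration; classifyLaunchTarget and LaunchTargetSketch are not part of Midscene, and the library's own parsing may differ.

```typescript
// Hypothetical classification of the strings accepted by launch().
type LaunchTargetSketch =
  | { kind: 'url'; uri: string }
  | { kind: 'activity'; packageName: string; activity: string }
  | { kind: 'package'; packageName: string };

function classifyLaunchTarget(uri: string): LaunchTargetSketch {
  // Treat anything starting with http:// or https:// as a web page.
  if (/^https?:\/\//i.test(uri)) {
    return { kind: 'url', uri };
  }
  // "package/.Activity" splits at the first slash.
  const slash = uri.indexOf('/');
  if (slash > 0) {
    return {
      kind: 'activity',
      packageName: uri.slice(0, slash),
      activity: uri.slice(slash + 1),
    };
  }
  // Otherwise assume a bare package name.
  return { kind: 'package', packageName: uri };
}

console.log(classifyLaunchTarget('https://www.example.com'));
console.log(classifyLaunchTarget('com.android.settings/.Settings'));
console.log(classifyLaunchTarget('com.android.settings'));
```

A check like this is mainly useful for logging or for rejecting obviously malformed input early; the string itself is still what gets passed to launch().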
agent.runAdbShell()
Run a raw adb shell command through the connected device.
command: string - Command passed verbatim to adb shell.
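Since the command is passed verbatim, any quoting is the caller's responsibility. The following sketch shows one way to build a command string safely using POSIX single-quote escaping. The helpers shellQuote and buildViewIntentCommand are hypothetical and not part of Midscene; am start with -a and -d is the standard Android activity manager invocation, and the commented-out agent.runAdbShell call assumes an already connected agent.

```typescript
// Hypothetical helper: wrap an argument in single quotes for a POSIX shell,
// escaping embedded single quotes as '\''.
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper: build an "am start" command that opens a URL
// through a VIEW intent.
function buildViewIntentCommand(url: string): string {
  return `am start -a android.intent.action.VIEW -d ${shellQuote(url)}`;
}

const viewCommand = buildViewIntentCommand('https://example.com/?a=1&b=2');
console.log(viewCommand);

// With a connected agent, the string would be passed as-is:
// await agent.runAdbShell(viewCommand);
```

Without the quoting, the & in the URL would be interpreted by the device shell as a background operator and the second query parameter would be lost.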
Navigation helpers
- agent.back(): Promise<void> - Trigger the Android system Back action.
- agent.home(): Promise<void> - Return to the launcher.
- agent.recentApps(): Promise<void> - Open the Recents/Overview screen.
Helper utilities
agentFromAdbDevice()
Create an AndroidAgent from any connected adb device.
- deviceId?: string - Connect to a specific device; omitted means “first available”.
- opts?: PageAgentOpt & AndroidDeviceOpt - Combine agent options with AndroidDevice settings.
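The "first available" rule for deviceId can be sketched as a small selection function over a device list. This is an illustration of the documented behavior only, not the library's code: resolveDeviceId and ConnectedDeviceSketch are hypothetical, only the udid field is confirmed by this page, and the state field and the sample device IDs are assumptions.

```typescript
// Assumed shape of one entry from getConnectedDevices(); only `udid` is
// confirmed by this page, `state` is an assumption for illustration.
interface ConnectedDeviceSketch {
  udid: string;
  state?: string;
}

// Hypothetical helper: an explicit deviceId wins; otherwise the first
// device in the list is used.
function resolveDeviceId(devices: ConnectedDeviceSketch[], deviceId?: string): string {
  const [first] = devices;
  if (!first) {
    throw new Error('No adb devices connected');
  }
  if (deviceId === undefined) {
    return first.udid;
  }
  const match = devices.find((d) => d.udid === deviceId);
  if (!match) {
    const known = devices.map((d) => d.udid).join(', ');
    throw new Error(`Device ${deviceId} not found among: ${known}`);
  }
  return match.udid;
}

// Made-up sample data standing in for a real device list.
const sampleDevices: ConnectedDeviceSketch[] = [
  { udid: 'emulator-5554', state: 'device' },
  { udid: 'R58M123ABC', state: 'device' },
];
console.log(resolveDeviceId(sampleDevices));
console.log(resolveDeviceId(sampleDevices, 'R58M123ABC'));
```

Passing an explicit deviceId is the safer choice on machines with several devices attached, since the order of the device list is not something this page guarantees.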
getConnectedDevices()
Enumerate adb devices Midscene can drive.
See also
- Android getting started for setup and scripting steps.

