API Reference (PC Desktop)

This page documents the PC desktop-specific APIs provided by @midscene/computer.

For common APIs that work across all platforms, see Common API Reference.

Agent Creation

agentFromComputer(opts?): Promise<ComputerAgent>

Create an agent for desktop automation.

Parameters:

interface ComputerAgentOpt {
  // Agent options (inherited from AgentOpt)
  aiActionContext?: string;
  cache?: boolean;
  // ... other AgentOpt properties

  // Device options
  displayId?: string;
  customActions?: DeviceAction<any>[];
  headless?: boolean;
  xvfbResolution?: string;
}

displayId (optional): Specify which display to control. Get available displays with ComputerDevice.listDisplays().
customActions (optional): Add custom actions to the device.
headless (optional, Linux only): Set to true to start a virtual display via Xvfb, enabling desktop automation on headless Linux servers and CI environments without a physical display. Can also be set via the MIDSCENE_COMPUTER_HEADLESS_LINUX=true environment variable.
xvfbResolution (optional): Resolution for the Xvfb virtual display. Defaults to '1920x1080x24'.
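
The default value '1920x1080x24' suggests a WIDTHxHEIGHTxDEPTH string, which is the form Xvfb's -screen option takes. Below is a minimal sketch, assuming that format, of validating a resolution string before passing it as xvfbResolution. parseXvfbResolution and XvfbResolution are hypothetical names for illustration and are not part of @midscene/computer.

```typescript
// Hypothetical helper (not part of @midscene/computer). Assumes the
// WIDTHxHEIGHTxDEPTH format implied by the documented default
// '1920x1080x24'.
interface XvfbResolution {
  width: number;
  height: number;
  depth: number;
}

function parseXvfbResolution(value: string): XvfbResolution {
  const match = /^(\d+)x(\d+)x(\d+)$/.exec(value);
  if (!match) {
    throw new Error(`Invalid Xvfb resolution: "${value}"`);
  }
  return {
    width: Number(match[1]),
    height: Number(match[2]),
    depth: Number(match[3]),
  };
}

// Validate a custom resolution before handing it to agentFromComputer.
const resolution = '1280x720x24';
parseXvfbResolution(resolution); // throws if the string is malformed
const headlessOptions = { headless: true, xvfbResolution: resolution };
console.log(headlessOptions);
```

The resulting headlessOptions object could then be passed to agentFromComputer(headlessOptions) on a Linux host.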

Example: Testing Electron Apps on Headless Linux CI

A complete demo of testing Obsidian (an Electron app) on headless Linux CI with @midscene/computer: https://github.com/web-infra-dev/midscene-example/tree/main/computer/electron-demo

Example:

import { ComputerDevice, agentFromComputer } from '@midscene/computer';

// Connect to primary display
const agent = await agentFromComputer({
  aiActionContext: 'You are automating a desktop application.',
});

// Connect to a specific display (here the second one, if present)
const displays = await ComputerDevice.listDisplays();
if (displays.length > 1) {
  const agent2 = await agentFromComputer({
    displayId: displays[1].id,
  });
}

Device Management

ComputerDevice.listDisplays(): Promise<DisplayInfo[]>

List all available displays.

Returns:

interface DisplayInfo {
  id: string;
  name: string;
  primary?: boolean;
}

Example:

import { ComputerDevice } from '@midscene/computer';

const displays = await ComputerDevice.listDisplays();
console.log('Available displays:', displays);
// [
//   { id: '0', name: 'Built-in Display', primary: true },
//   { id: '1', name: 'External Display', primary: false }
// ]
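
Because primary is optional in DisplayInfo, code that picks a display may want a fallback for lists where no entry is flagged. The sketch below shows one way to choose a display id from the returned list. pickDisplayId is a hypothetical helper, and the sample data simply mirrors the output shown above.

```typescript
// Local copy of the DisplayInfo shape documented above.
interface DisplayInfo {
  id: string;
  name: string;
  primary?: boolean;
}

// Hypothetical helper (not part of @midscene/computer): returns the id of
// the primary display, or of a non-primary one when preferSecondary is set.
// Falls back to the primary display if no secondary display exists.
function pickDisplayId(
  displays: DisplayInfo[],
  preferSecondary = false,
): string | undefined {
  const primary = displays.find((d) => d.primary) ?? displays[0];
  if (!primary) return undefined; // empty list
  if (!preferSecondary) return primary.id;
  const secondary = displays.find((d) => d !== primary);
  return (secondary ?? primary).id;
}

const sampleDisplays: DisplayInfo[] = [
  { id: '0', name: 'Built-in Display', primary: true },
  { id: '1', name: 'External Display', primary: false },
];

console.log(pickDisplayId(sampleDisplays));       // '0'
console.log(pickDisplayId(sampleDisplays, true)); // '1'
```

The returned id can be passed as displayId to agentFromComputer; an undefined result leaves the option unset.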

checkComputerEnvironment(): Promise<EnvironmentCheck>

Check if the computer environment is properly configured.

Returns:

interface EnvironmentCheck {
  available: boolean;
  error?: string;
  platform: string;
  displays: number;
}

Example:

import { checkComputerEnvironment } from '@midscene/computer';

const env = await checkComputerEnvironment();
console.log('Environment check:', env);

if (!env.available) {
  console.error('Environment error:', env.error);
}
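
For CI logs it can help to reduce the check result to a single line. The following is a small sketch built only on the EnvironmentCheck fields listed above. describeEnvironment is a hypothetical helper, and the 'linux' platform string is sample data rather than a documented value.

```typescript
// Local copy of the EnvironmentCheck shape documented above.
interface EnvironmentCheck {
  available: boolean;
  error?: string;
  platform: string;
  displays: number;
}

// Hypothetical helper (not part of @midscene/computer): formats the
// check result as a one-line status message.
function describeEnvironment(env: EnvironmentCheck): string {
  if (!env.available) {
    const reason = env.error ?? 'no error message reported';
    return `unavailable on ${env.platform}: ${reason}`;
  }
  const noun = env.displays === 1 ? 'display' : 'displays';
  return `ready on ${env.platform} with ${env.displays} ${noun}`;
}

// Sample input; real values come from checkComputerEnvironment().
console.log(
  describeEnvironment({ available: true, platform: 'linux', displays: 1 }),
);
// prints "ready on linux with 1 display"
```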

ComputerAgent

The ComputerAgent class extends PageAgent<ComputerDevice> and inherits all common agent methods:

aiAct(action: string): Perform an action with AI
aiQuery(query: string): Extract information with AI
aiAssert(assertion: string): Assert a condition with AI
aiWaitFor(condition: string): Wait for a condition
aiLocate(description: string): Locate an element
And more...

See Common API Reference for details.

Available Actions

The ComputerDevice supports the following actions:

Mouse Actions

Tap (Click)

Single click at the target location.

await agent.aiAct('click on the File menu');
await agent.aiAct('click at center of screen');

DoubleClick

Double-click at the target location.

await agent.aiAct('double-click on the desktop icon');

RightClick

Right-click at the target location to open a context menu.

await agent.aiAct('right-click on the desktop');
await agent.aiAct('right-click on the file');

MouseMove

Move the mouse pointer to an element.

await agent.aiAct('move mouse to the menu item');

DragAndDrop

Drag from one location and drop at another.

await agent.aiAct('drag the file to the folder');

Keyboard Actions

KeyboardPress

Press keyboard keys with optional modifiers.

Supported keys:

Regular keys: a-z, 0-9, Enter, Escape, Space, Tab, etc.
Arrow keys: ArrowUp, ArrowDown, ArrowLeft, ArrowRight
Function keys: F1-F12
Modifiers: Command/Cmd (macOS), Control/Ctrl, Alt, Shift, Win (Windows)
Media keys: VolumeUp, VolumeDown, Mute, etc.

Examples:

// Simple key press
await agent.aiAct('press Enter');
await agent.aiAct('press Escape');

// Key combinations (platform-specific)
if (process.platform === 'darwin') {
  // macOS
  await agent.aiAct('press Cmd+Space');  // Open Spotlight
  await agent.aiAct('press Cmd+Tab');    // App switcher
  await agent.aiAct('press Cmd+C');      // Copy
  await agent.aiAct('press Cmd+V');      // Paste
} else {
  // Windows/Linux
  await agent.aiAct('press Windows key'); // Start menu
  await agent.aiAct('press Alt+Tab');     // App switcher
  await agent.aiAct('press Ctrl+C');      // Copy
  await agent.aiAct('press Ctrl+V');      // Paste
}

// Arrow keys
await agent.aiAct('press ArrowDown');
await agent.aiAct('press ArrowUp');

// Function keys
await agent.aiAct('press F5');  // Refresh
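
The platform branching above can be folded into a small prompt builder so that each call site stays on one line. This is only a sketch: shortcutPrompt is a hypothetical helper, and it assumes the Cmd (macOS) versus Ctrl (Windows/Linux) split used in the copy and paste examples, which does not hold for every shortcut (the app switcher, for instance, uses Alt+Tab outside macOS).

```typescript
// Hypothetical helper (not part of @midscene/computer): builds the
// aiAct prompt for a shortcut whose only platform difference is the
// primary modifier (Cmd on macOS, Ctrl elsewhere).
function shortcutPrompt(key: string, platform: string): string {
  const modifier = platform === 'darwin' ? 'Cmd' : 'Ctrl';
  return `press ${modifier}+${key}`;
}

console.log(shortcutPrompt('C', 'darwin')); // press Cmd+C
console.log(shortcutPrompt('V', 'linux'));  // press Ctrl+V

// Intended usage with an agent (not executed here):
//   await agent.aiAct(shortcutPrompt('S', process.platform));
```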

Input

Type text into an input field.

await agent.aiAct('type "Hello World" in the search box');
await agent.aiAct('type "my-document.txt"');

ClearInput

Clear the content of an input field.

await agent.aiAct('clear the text field');

Scroll Actions

Scroll the screen or a specific area.

// Scroll directions
await agent.aiAct('scroll down');
await agent.aiAct('scroll up');
await agent.aiAct('scroll left');
await agent.aiAct('scroll right');

// Scroll to positions
await agent.aiAct('scroll to top');
await agent.aiAct('scroll to bottom');
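
Scrolling is often repeated until some content comes into view. The sketch below combines aiAct and aiQuery in a bounded loop. scrollUntil and ScrollAgent are hypothetical names, and the 'boolean, ...' query form is an assumption modeled on the 'string, ...' form used elsewhere on this page, so check the Common API Reference before relying on it.

```typescript
// Minimal structural view of the two agent methods used here; the real
// ComputerAgent exposes more.
interface ScrollAgent {
  aiAct(action: string): Promise<unknown>;
  aiQuery(query: string): Promise<unknown>;
}

// Hypothetical helper (not part of @midscene/computer): scrolls down
// until `condition` is reported as true, giving up after maxScrolls
// scroll actions. Resolves to true if the condition was met.
async function scrollUntil(
  agent: ScrollAgent,
  condition: string,
  maxScrolls = 5,
): Promise<boolean> {
  for (let i = 0; i <= maxScrolls; i++) {
    // Assumption: a 'boolean, ...' query resolves to a boolean value.
    const met = await agent.aiQuery(`boolean, whether ${condition}`);
    if (met === true) return true;
    if (i < maxScrolls) await agent.aiAct('scroll down');
  }
  return false;
}

// Intended usage with an agent (not executed here):
//   const found = await scrollUntil(agent, 'the Save button is visible');
```

Bounding the loop keeps a test from hanging when the content never appears; each iteration costs one query plus one scroll action.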

Display Actions

ListDisplays

Get information about all connected displays.

const displays = await ComputerDevice.listDisplays();

Examples

Open Application and Navigate

import { agentFromComputer } from '@midscene/computer';

const agent = await agentFromComputer();

// Open application
if (process.platform === 'darwin') {
  await agent.aiAct('press Cmd+Space');
  await agent.aiAct('type "TextEdit" and press Enter');
} else {
  await agent.aiAct('press Windows key');
  await agent.aiAct('type "Notepad" and press Enter');
}

await agent.aiWaitFor('text editor window is visible');

// Type content
await agent.aiAct('type "Hello, Midscene!"');

// Save file
if (process.platform === 'darwin') {
  await agent.aiAct('press Cmd+S');
} else {
  await agent.aiAct('press Ctrl+S');
}

Multi-Display Workflow

import { ComputerDevice, agentFromComputer } from '@midscene/computer';

// List displays
const displays = await ComputerDevice.listDisplays();
console.log(`Found ${displays.length} displays`);

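// Control primary display (fall back to the first entry if none is flagged)
const primary = displays.find((d) => d.primary) ?? displays[0];
const agent1 = await agentFromComputer({
  displayId: primary.id,
});
await agent1.aiAct('move mouse to center of screen');

// Control a secondary display, if one is connected
const secondary = displays.find((d) => d.id !== primary.id);
if (secondary) {
  const agent2 = await agentFromComputer({
    displayId: secondary.id,
  });
  await agent2.aiAct('move mouse to center of screen');
}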

Web Browser Automation

import { agentFromComputer } from '@midscene/computer';

const agent = await agentFromComputer();

// Open browser
if (process.platform === 'darwin') {
  await agent.aiAct('press Cmd+Space');
  await agent.aiAct('type "Safari" and press Enter');
} else {
  await agent.aiAct('press Windows key');
  await agent.aiAct('type "Chrome" and press Enter');
}

await agent.aiWaitFor('browser window is open');

// Navigate
await agent.aiAct('click on address bar');
await agent.aiAct('type "example.com" and press Enter');
await agent.aiWaitFor('page has loaded');

// Extract information
const title = await agent.aiQuery('string, get the page title');
console.log('Page title:', title);

TypeScript Types

import type {
  ComputerAgent,
  ComputerAgentOpt,
  ComputerDevice,
  ComputerDeviceOpt,
  DisplayInfo,
  EnvironmentCheck,
} from '@midscene/computer';
