API Reference (PC Desktop)
This page documents the PC desktop-specific APIs provided by @midscene/computer.
For common APIs that work across all platforms, see Common API Reference.
Agent Creation
agentFromComputer(opts?): Promise<ComputerAgent>
Create an agent for local desktop automation.
agentForRDPComputer(opts): Promise<ComputerAgent<RDPDevice>>
Create an agent for remote Windows desktop automation over RDP.
Parameters:
interface BaseComputerAgentOpt {
// Agent options (inherited from AgentOpt)
aiActionContext?: string;
cache?: boolean;
// ... other AgentOpt properties
customActions?: DeviceAction<any>[];
}
interface LocalComputerAgentOpt extends BaseComputerAgentOpt {
// Local desktop options
displayId?: string;
headless?: boolean;
xvfbResolution?: string;
}
interface RDPComputerAgentOpt extends BaseComputerAgentOpt {
host: string;
port?: number;
username?: string;
password?: string;
domain?: string;
adminSession?: boolean;
ignoreCertificate?: boolean;
securityProtocol?: 'auto' | 'tls' | 'nla' | 'rdp';
desktopWidth?: number;
desktopHeight?: number;
}
Local Desktop Options
displayId (optional): Specify which display to control. Get available displays with ComputerDevice.listDisplays().
customActions (optional): Add custom actions to the device.
headless (optional, Linux only): Set to true to start a virtual display via Xvfb, enabling desktop automation on headless Linux servers and CI environments without a physical display. Can also be set via the MIDSCENE_COMPUTER_HEADLESS_LINUX=true environment variable.
xvfbResolution (optional): Resolution for the Xvfb virtual display. Defaults to '1920x1080x24'.
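As a rough sketch of how these options might be combined in CI, the helper below requests a virtual display only on Linux when no DISPLAY variable is set. The helper name buildLocalOptions and the DISPLAY check are illustrative assumptions, not part of @midscene/computer; the option fields are copied from LocalComputerAgentOpt above.

```typescript
// Local copy of the documented option fields so the sketch type-checks on its own.
interface LocalOptionsSketch {
  displayId?: string;
  headless?: boolean;
  xvfbResolution?: string;
}

// Hypothetical helper (not part of @midscene/computer): decide whether to
// request an Xvfb virtual display from the platform and environment.
function buildLocalOptions(
  platform: string,
  env: Record<string, string | undefined>,
  resolution: string = '1920x1080x24',
): LocalOptionsSketch {
  // Assumption: a missing DISPLAY variable on Linux means no physical display.
  const needsVirtualDisplay = platform === 'linux' && !env['DISPLAY'];
  return needsVirtualDisplay
    ? { headless: true, xvfbResolution: resolution }
    : {};
}
```

The result could then be passed along as agentFromComputer(buildLocalOptions(process.platform, process.env)).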
RDP Options
host: Remote Windows host or IP.
port: RDP port. Defaults to 3389.
username / password: Credentials for the remote session.
domain: Optional Windows domain.
adminSession: Request the remote admin session when the server allows it.
ignoreCertificate: Skip certificate validation for self-signed setups.
securityProtocol: Choose 'auto', 'tls', 'nla', or 'rdp'.
desktopWidth / desktopHeight: Request a specific remote desktop resolution.
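To keep credentials out of source files, the RDP options could be assembled from environment variables. The following is a minimal sketch under stated assumptions: the helper rdpOptionsFromEnv and the variable names RDP_HOST, RDP_PORT, RDP_USERNAME, RDP_PASSWORD, and RDP_DOMAIN are hypothetical and not defined by @midscene/computer. Only the option names and the default port 3389 come from the list above.

```typescript
// Local subset of the documented RDP option fields.
interface RdpOptionsSketch {
  host: string;
  port?: number;
  username?: string;
  password?: string;
  domain?: string;
}

// Hypothetical helper: the environment variable names are assumptions.
function rdpOptionsFromEnv(
  env: Record<string, string | undefined>,
): RdpOptionsSketch {
  const host = env['RDP_HOST'];
  if (!host) {
    throw new Error('RDP_HOST is required');
  }
  // Fall back to the documented default port when RDP_PORT is not set.
  const rawPort = env['RDP_PORT'];
  const port = rawPort ? Number(rawPort) : 3389;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid RDP_PORT: ${rawPort}`);
  }
  return {
    host,
    port,
    username: env['RDP_USERNAME'],
    password: env['RDP_PASSWORD'],
    domain: env['RDP_DOMAIN'],
  };
}
```

The returned object could be spread into the agentForRDPComputer call alongside aiActionContext and any other options.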
Example:
import { agentFromComputer, ComputerDevice } from '@midscene/computer';
// Connect to primary display
const agent = await agentFromComputer({
aiActionContext: 'You are automating a desktop application.',
});
// Connect to specific display
const displays = await ComputerDevice.listDisplays();
const agent2 = await agentFromComputer({
displayId: displays[1].id,
});
Example: connect to a remote Windows desktop over RDP
import { agentForRDPComputer } from '@midscene/computer';
const agent = await agentForRDPComputer({
aiActionContext:
'You are controlling a remote Windows desktop over the RDP protocol.',
host: '10.75.166.249',
port: 3389,
username: 'Admin',
password: 'replace-with-your-password',
ignoreCertificate: true,
});
await agent.aiWaitFor('The remote Windows desktop is visible');
await agent.aiAct('Click the Windows Start button');
await agent.aiAct('Open Settings');
Device Management
ComputerDevice.listDisplays(): Promise<DisplayInfo[]>
List all available displays.
Returns:
interface DisplayInfo {
id: string;
name: string;
primary?: boolean;
}
Example:
import { ComputerDevice } from '@midscene/computer';
const displays = await ComputerDevice.listDisplays();
console.log('Available displays:', displays);
// [
// { id: '0', name: 'Built-in Display', primary: true },
// { id: '1', name: 'External Display', primary: false }
// ]
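Choosing a display from the returned list is a common next step. The sketch below picks a display by a case-insensitive name match and otherwise falls back to the primary display, then to the first entry. The helper pickDisplay is hypothetical and not part of @midscene/computer; only the DisplayInfo shape is taken from above.

```typescript
// Local copy of the documented DisplayInfo shape.
interface DisplayInfoSketch {
  id: string;
  name: string;
  primary?: boolean;
}

// Hypothetical helper: prefer a name match, then the primary display,
// then the first display in the list.
function pickDisplay(
  displays: DisplayInfoSketch[],
  nameHint?: string,
): DisplayInfoSketch {
  const [first] = displays;
  if (first === undefined) {
    throw new Error('No displays available');
  }
  if (nameHint) {
    const wanted = nameHint.toLowerCase();
    const byName = displays.find((d) => d.name.toLowerCase().includes(wanted));
    if (byName) return byName;
  }
  return displays.find((d) => d.primary === true) ?? first;
}
```

The id of the selected entry can then be passed as displayId to agentFromComputer.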
checkComputerEnvironment(): Promise<EnvironmentCheck>
Check if the computer environment is properly configured.
Returns:
interface EnvironmentCheck {
available: boolean;
error?: string;
platform: string;
displays: number;
}
Example:
import { checkComputerEnvironment } from '@midscene/computer';
const env = await checkComputerEnvironment();
console.log('Environment check:', env);
if (!env.available) {
console.error('Environment error:', env.error);
}
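For test suites that should fail fast, the check result can be turned into a single readable message. This is a sketch only: describeEnvironmentProblem is a hypothetical helper, and treating a display count of zero as a problem is an assumption about how the displays field should be interpreted.

```typescript
// Local copy of the documented EnvironmentCheck shape.
interface EnvironmentCheckSketch {
  available: boolean;
  error?: string;
  platform: string;
  displays: number;
}

// Hypothetical helper: returns a message describing the problem,
// or null when the environment looks usable.
function describeEnvironmentProblem(
  env: EnvironmentCheckSketch,
): string | null {
  if (!env.available) {
    const reason = env.error ?? 'unknown error';
    return `Desktop automation is unavailable on ${env.platform}: ${reason}`;
  }
  // Assumption: zero detected displays means there is nothing to control.
  if (env.displays < 1) {
    return `No displays detected on ${env.platform}`;
  }
  return null;
}
```

A setup hook could call this with the result of checkComputerEnvironment() and throw when the return value is not null.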
ComputerAgent
The ComputerAgent class extends PageAgent<ComputerDevice> and inherits all common agent methods:
aiAct(action: string): Perform an action with AI
aiQuery(query: string): Extract information with AI
aiAssert(assertion: string): Assert a condition with AI
aiWaitFor(condition: string): Wait for a condition
aiLocate(description: string): Locate an element
And more...
See Common API Reference for details.
Available Actions
The ComputerDevice supports the following actions:
Mouse Actions
Tap (Click)
Single click at the target location.
await agent.aiAct('click on the File menu');
await agent.aiAct('click at center of screen');
DoubleClick
Double-click at the target location.
await agent.aiAct('double-click on the desktop icon');
RightClick
Right-click at the target location to open the context menu.
await agent.aiAct('right-click on the desktop');
await agent.aiAct('right-click on the file');
MouseMove
Move the mouse pointer to an element.
await agent.aiAct('move mouse to the menu item');
DragAndDrop
Drag from one location and drop at another.
await agent.aiAct('drag the file to the folder');
Keyboard Actions
KeyboardPress
Press keyboard keys with optional modifiers.
Supported keys:
- Regular keys: a-z, 0-9, Enter, Escape, Space, Tab, etc.
- Arrow keys: ArrowUp, ArrowDown, ArrowLeft, ArrowRight
- Function keys: F1-F12
- Modifiers: Command/Cmd (macOS), Control/Ctrl, Alt, Shift, Win (Windows)
- Media keys: VolumeUp, VolumeDown, Mute, etc.
Examples:
// Simple key press
await agent.aiAct('press Enter');
await agent.aiAct('press Escape');
// Key combinations (platform-specific)
if (process.platform === 'darwin') {
// macOS
await agent.aiAct('press Cmd+Space'); // Open Spotlight
await agent.aiAct('press Cmd+Tab'); // App switcher
await agent.aiAct('press Cmd+C'); // Copy
await agent.aiAct('press Cmd+V'); // Paste
} else {
// Windows/Linux
await agent.aiAct('press Windows key'); // Start menu
await agent.aiAct('press Alt+Tab'); // App switcher
await agent.aiAct('press Ctrl+C'); // Copy
await agent.aiAct('press Ctrl+V'); // Paste
}
// Arrow keys
await agent.aiAct('press ArrowDown');
await agent.aiAct('press ArrowUp');
// Function keys
await agent.aiAct('press F5'); // Refresh
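The platform branches above can be collapsed into a small lookup that builds the instruction string. This is a minimal sketch: pressInstruction and the SHORTCUTS table are hypothetical names, and the key combinations are limited to those already shown in this page.

```typescript
type ShortcutName = 'copy' | 'paste' | 'save' | 'switchApp';

// Key combinations taken from the examples on this page.
const SHORTCUTS: Record<ShortcutName, { mac: string; other: string }> = {
  copy: { mac: 'Cmd+C', other: 'Ctrl+C' },
  paste: { mac: 'Cmd+V', other: 'Ctrl+V' },
  save: { mac: 'Cmd+S', other: 'Ctrl+S' },
  switchApp: { mac: 'Cmd+Tab', other: 'Alt+Tab' },
};

// Hypothetical helper: build an aiAct instruction for the given platform.
// The platform argument uses Node's process.platform values, where
// 'darwin' means macOS.
function pressInstruction(shortcut: ShortcutName, platform: string): string {
  const entry = SHORTCUTS[shortcut];
  const keys = platform === 'darwin' ? entry.mac : entry.other;
  return `press ${keys}`;
}
```

A call such as agent.aiAct(pressInstruction('copy', process.platform)) would then replace the explicit if/else at each call site.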
Type text into an input field.
await agent.aiAct('type "Hello World" in the search box');
await agent.aiAct('type "my-document.txt"');
Clear the content of an input field.
await agent.aiAct('clear the text field');
Scroll the screen or a specific area.
// Scroll directions
await agent.aiAct('scroll down');
await agent.aiAct('scroll up');
await agent.aiAct('scroll left');
await agent.aiAct('scroll right');
// Scroll to positions
await agent.aiAct('scroll to top');
await agent.aiAct('scroll to bottom');
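Scrolling is often repeated until some content comes into view. The sketch below wraps that loop with an upper bound on the number of scrolls. The helper scrollUntil and its isDone callback are hypothetical; the agent parameter only needs the aiAct method described on this page, and how isDone decides that the target is visible is left to the caller.

```typescript
// Minimal structural type: any object with an aiAct method fits.
interface ScrollAgentSketch {
  aiAct(instruction: string): Promise<unknown>;
}

// Hypothetical helper: scroll down until isDone() resolves to true
// or maxScrolls scroll actions have been issued.
async function scrollUntil(
  agent: ScrollAgentSketch,
  isDone: () => Promise<boolean>,
  maxScrolls: number = 10,
): Promise<boolean> {
  for (let i = 0; i < maxScrolls; i++) {
    if (await isDone()) {
      return true;
    }
    await agent.aiAct('scroll down');
  }
  // Check once more after the final scroll.
  return isDone();
}
```

The bound keeps a run from scrolling indefinitely when the target never appears; the return value tells the caller whether the condition was met.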
Display Actions
ListDisplays
Get information about all connected displays.
const displays = await ComputerDevice.listDisplays();
When you use RDP, ListDisplays returns the current remote session as a single display.
Examples
Open Application and Navigate
import { agentFromComputer } from '@midscene/computer';
const agent = await agentFromComputer();
// Open application
if (process.platform === 'darwin') {
await agent.aiAct('press Cmd+Space');
await agent.aiAct('type "TextEdit" and press Enter');
} else {
await agent.aiAct('press Windows key');
await agent.aiAct('type "Notepad" and press Enter');
}
await agent.aiWaitFor('text editor window is visible');
// Type content
await agent.aiAct('type "Hello, Midscene!"');
// Save file
if (process.platform === 'darwin') {
await agent.aiAct('press Cmd+S');
} else {
await agent.aiAct('press Ctrl+S');
}
Multi-Display Workflow
import { ComputerDevice, agentFromComputer } from '@midscene/computer';
// List displays
const displays = await ComputerDevice.listDisplays();
console.log(`Found ${displays.length} displays`);
// Control primary display
const agent1 = await agentFromComputer({
displayId: displays[0].id,
});
await agent1.aiAct('move mouse to center of screen');
// Control secondary display
if (displays.length > 1) {
const agent2 = await agentFromComputer({
displayId: displays[1].id,
});
await agent2.aiAct('move mouse to center of screen');
}
Web Browser Automation
import { agentFromComputer } from '@midscene/computer';
const agent = await agentFromComputer();
// Open browser
if (process.platform === 'darwin') {
await agent.aiAct('press Cmd+Space');
await agent.aiAct('type "Safari" and press Enter');
} else {
await agent.aiAct('press Windows key');
await agent.aiAct('type "Chrome" and press Enter');
}
await agent.aiWaitFor('browser window is open');
// Navigate
await agent.aiAct('click on address bar');
await agent.aiAct('type "example.com" and press Enter');
await agent.aiWaitFor('page has loaded');
// Extract information
const title = await agent.aiQuery('string, get the page title');
console.log('Page title:', title);
TypeScript Types
import type {
ComputerAgent,
ComputerAgentOpt,
ComputerDevice,
ComputerDeviceOpt,
DisplayInfo,
EnvironmentCheck,
} from '@midscene/computer';
See Also