API Reference (PC Desktop)
This page documents the PC desktop-specific APIs provided by @midscene/computer.
For common APIs that work across all platforms, see Common API Reference.
Agent Creation
agentFromComputer(opts?): Promise<ComputerAgent>
Create an agent for desktop automation.
Parameters:
interface ComputerAgentOpt {
// Agent options (inherited from AgentOpt)
aiActionContext?: string;
cache?: boolean;
// ... other AgentOpt properties
// Device options
displayId?: string;
customActions?: DeviceAction<any>[];
}
displayId (optional): Specify which display to control. Get available displays with ComputerDevice.listDisplays().
customActions (optional): Add custom actions to the device.
Example:
import { agentFromComputer, ComputerDevice } from '@midscene/computer';
// Connect to primary display
const agent = await agentFromComputer({
  aiActionContext: 'You are automating a desktop application.',
});
// Connect to a specific display (here the second one, if it exists)
const displays = await ComputerDevice.listDisplays();
if (displays.length > 1) {
  const agent2 = await agentFromComputer({
    displayId: displays[1].id,
  });
}
Device Management
ComputerDevice.listDisplays(): Promise<DisplayInfo[]>
List all available displays.
Returns:
interface DisplayInfo {
id: string;
name: string;
primary?: boolean;
}
Example:
import { ComputerDevice } from '@midscene/computer';
const displays = await ComputerDevice.listDisplays();
console.log('Available displays:', displays);
// [
// { id: '0', name: 'Built-in Display', primary: true },
// { id: '1', name: 'External Display', primary: false }
// ]
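Selecting a display by its `primary` flag can be more robust than relying on its position in the array. The following is a minimal sketch, assuming the `DisplayInfo` shape documented above; `pickDisplay` and `sampleDisplays` are hypothetical names introduced for illustration and are not part of @midscene/computer. The interface is redeclared locally so the snippet stands alone.

```typescript
// Local copy of the DisplayInfo shape documented above, so the sketch is self-contained.
interface DisplayInfo {
  id: string;
  name: string;
  primary?: boolean;
}

// Hypothetical helper (not part of @midscene/computer): choose a display to control.
// With wantPrimary = true it returns the first display flagged as primary;
// with wantPrimary = false it returns the first display not flagged as primary.
// Returns undefined when no display matches.
function pickDisplay(
  displays: DisplayInfo[],
  wantPrimary: boolean,
): DisplayInfo | undefined {
  return displays.find((d) => Boolean(d.primary) === wantPrimary);
}

// Sample data mirroring the example output above.
const sampleDisplays: DisplayInfo[] = [
  { id: '0', name: 'Built-in Display', primary: true },
  { id: '1', name: 'External Display', primary: false },
];

const external = pickDisplay(sampleDisplays, false);
console.log(external?.id); // 1
```

The `id` of the returned entry could then be passed as `displayId` to `agentFromComputer`, after checking that the result is not `undefined`.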
checkComputerEnvironment(): Promise<EnvironmentCheck>
Check if the computer environment is properly configured.
Returns:
interface EnvironmentCheck {
available: boolean;
error?: string;
platform: string;
displays: number;
}
Example:
import { checkComputerEnvironment } from '@midscene/computer';
const env = await checkComputerEnvironment();
console.log('Environment check:', env);
if (!env.available) {
console.error('Environment error:', env.error);
}
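The result can also be turned into a list of readable problems before any agent is created. This is a rough sketch, assuming the `EnvironmentCheck` shape documented above and assuming that `displays` is a count of detected displays; `environmentProblems` is a hypothetical helper, not a library function.

```typescript
// Local copy of the EnvironmentCheck shape documented above.
interface EnvironmentCheck {
  available: boolean;
  error?: string;
  platform: string;
  displays: number; // assumed here to be the number of detected displays
}

// Hypothetical helper: collect human-readable problems from an environment check.
// An empty array means nothing looked wrong.
function environmentProblems(env: EnvironmentCheck): string[] {
  const problems: string[] = [];
  if (!env.available) {
    const reason = env.error ?? 'no error message reported';
    problems.push(`Desktop automation unavailable on ${env.platform}: ${reason}`);
  }
  if (env.displays < 1) {
    problems.push('No displays detected');
  }
  return problems;
}

// Illustrative input, not real output from checkComputerEnvironment().
const healthy: EnvironmentCheck = { available: true, platform: 'darwin', displays: 2 };
console.log(environmentProblems(healthy).length); // 0
```

A script might call this with the value returned by `checkComputerEnvironment()` and exit early when the list is not empty.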
ComputerAgent
The ComputerAgent class extends PageAgent<ComputerDevice> and inherits all common agent methods:
- aiAct(action: string): Perform an action with AI
- aiQuery(query: string): Extract information with AI
- aiAssert(assertion: string): Assert a condition with AI
- aiWaitFor(condition: string): Wait for a condition
- aiLocate(description: string): Locate an element
- And more...
See Common API Reference for details.
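Because each of these methods takes a natural-language string, a sequence of actions can be driven from a plain array. The sketch below shows one way to run steps in order and report which one failed. It is written against a small structural type rather than the real `ComputerAgent`; `ActCapable`, `runSteps`, and `fakeAgent` are hypothetical names, and the `Promise<unknown>` return type is an assumption, since only the parameter is documented here.

```typescript
// Minimal structural type covering only the method used below.
// The return type is an assumption; only the parameter is documented above.
interface ActCapable {
  aiAct(action: string): Promise<unknown>;
}

// Hypothetical helper: run natural-language steps one at a time, in order.
// If a step rejects, rethrow with the step number and text for easier debugging.
// Resolves to the number of steps that ran.
async function runSteps(agent: ActCapable, steps: string[]): Promise<number> {
  let index = 0;
  for (const step of steps) {
    index += 1;
    try {
      await agent.aiAct(step);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      throw new Error(`Step ${index} ("${step}") failed: ${reason}`);
    }
  }
  return index;
}

// Stand-in agent that only records the prompts it receives, for demonstration.
const recorded: string[] = [];
const fakeAgent: ActCapable = {
  async aiAct(action: string) {
    recorded.push(action);
  },
};
```

With a real agent, the call would look like `await runSteps(agent, ['click on the File menu', 'press Escape'])`, provided the agent's `aiAct` is compatible with the assumed signature.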
Available Actions
The ComputerDevice supports the following actions:
Mouse Actions
Tap (Click)
Single click at the target location.
await agent.aiAct('click on the File menu');
await agent.aiAct('click at center of screen');
DoubleClick
Double-click at the target location.
await agent.aiAct('double-click on the desktop icon');
RightClick
Right-click to open context menu.
await agent.aiAct('right-click on the desktop');
await agent.aiAct('right-click on the file');
MouseMove
Move mouse to an element.
await agent.aiAct('move mouse to the menu item');
DragAndDrop
Drag from one location and drop at another.
await agent.aiAct('drag the file to the folder');
Keyboard Actions
KeyboardPress
Press keyboard keys with optional modifiers.
Supported keys:
- Regular keys: a-z, 0-9, Enter, Escape, Space, Tab, etc.
- Arrow keys: ArrowUp, ArrowDown, ArrowLeft, ArrowRight
- Function keys: F1-F12
- Modifiers: Command/Cmd (macOS), Control/Ctrl, Alt, Shift, Win (Windows)
- Media keys: VolumeUp, VolumeDown, Mute, etc.
Examples:
// Simple key press
await agent.aiAct('press Enter');
await agent.aiAct('press Escape');
// Key combinations (platform-specific)
if (process.platform === 'darwin') {
// macOS
await agent.aiAct('press Cmd+Space'); // Open Spotlight
await agent.aiAct('press Cmd+Tab'); // App switcher
await agent.aiAct('press Cmd+C'); // Copy
await agent.aiAct('press Cmd+V'); // Paste
} else {
// Windows/Linux
await agent.aiAct('press Windows key'); // Start menu
await agent.aiAct('press Alt+Tab'); // App switcher
await agent.aiAct('press Ctrl+C'); // Copy
await agent.aiAct('press Ctrl+V'); // Paste
}
// Arrow keys
await agent.aiAct('press ArrowDown');
await agent.aiAct('press ArrowUp');
// Function keys
await agent.aiAct('press F5'); // Refresh
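The platform branching above can be folded into a small prompt builder for shortcuts that differ only in their primary modifier (Cmd on macOS, Ctrl elsewhere). This is a sketch under that assumption; `pressShortcut` is a hypothetical helper, and the platform is passed in as a string (for example the value of Node.js `process.platform`) so the function stays pure.

```typescript
// Hypothetical helper: build a 'press ...' prompt using the platform's primary modifier.
// 'darwin' is the value Node.js reports in process.platform on macOS.
function pressShortcut(platform: string, key: string): string {
  const modifier = platform === 'darwin' ? 'Cmd' : 'Ctrl';
  return `press ${modifier}+${key}`;
}

console.log(pressShortcut('darwin', 'C')); // press Cmd+C
console.log(pressShortcut('win32', 'V')); // press Ctrl+V
```

The resulting string would be passed to `agent.aiAct(...)`. Shortcuts that differ in more than the modifier, such as Cmd+Space versus the Windows key, still need explicit branching.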
Input
Type text into an input field.
await agent.aiAct('type "Hello World" in the search box');
await agent.aiAct('type "my-document.txt"');
ClearInput
Clear the content of an input field.
await agent.aiAct('clear the text field');
Scroll
Scroll the screen or a specific area.
// Scroll directions
await agent.aiAct('scroll down');
await agent.aiAct('scroll up');
await agent.aiAct('scroll left');
await agent.aiAct('scroll right');
// Scroll to positions
await agent.aiAct('scroll to top');
await agent.aiAct('scroll to bottom');
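Scrolling is often repeated until some content comes into view. One possible pattern is a bounded loop that alternates a check with a `scroll down` action. The sketch below assumes the caller supplies the check as an async predicate; `ScrollCapable` and `scrollDownUntil` are hypothetical names, and the `Promise<unknown>` return type of `aiAct` is an assumption.

```typescript
// Minimal structural type covering only the method used below (return type assumed).
interface ScrollCapable {
  aiAct(action: string): Promise<unknown>;
}

// Hypothetical helper: scroll down until isDone() resolves to true.
// The check runs before each scroll and once more after the last one.
// Returns the number of scrolls performed, or -1 if the condition was
// still false after maxScrolls scrolls.
async function scrollDownUntil(
  agent: ScrollCapable,
  isDone: () => Promise<boolean>,
  maxScrolls: number = 10,
): Promise<number> {
  for (let count = 0; count <= maxScrolls; count++) {
    if (await isDone()) {
      return count;
    }
    if (count < maxScrolls) {
      await agent.aiAct('scroll down');
    }
  }
  return -1;
}
```

The predicate might wrap one of the agent's own query methods. Capping the number of scrolls keeps the loop from running indefinitely when the target never appears.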
Display Actions
ListDisplays
Get information about all connected displays.
const displays = await ComputerDevice.listDisplays();
Examples
Open Application and Navigate
import { agentFromComputer } from '@midscene/computer';
const agent = await agentFromComputer();
// Open application
if (process.platform === 'darwin') {
await agent.aiAct('press Cmd+Space');
await agent.aiAct('type "TextEdit" and press Enter');
} else {
await agent.aiAct('press Windows key');
await agent.aiAct('type "Notepad" and press Enter');
}
await agent.aiWaitFor('text editor window is visible');
// Type content
await agent.aiAct('type "Hello, Midscene!"');
// Save file
if (process.platform === 'darwin') {
await agent.aiAct('press Cmd+S');
} else {
await agent.aiAct('press Ctrl+S');
}
Multi-Display Workflow
import { ComputerDevice, agentFromComputer } from '@midscene/computer';
// List displays
const displays = await ComputerDevice.listDisplays();
console.log(`Found ${displays.length} displays`);
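// Control the primary display (fall back to the first entry if none is flagged)
const primary = displays.find((d) => d.primary) ?? displays[0];
const agent1 = await agentFromComputer({
  displayId: primary.id,
});
await agent1.aiAct('move mouse to center of screen');
// Control a secondary display, if one is connected
const secondary = displays.find((d) => d.id !== primary.id);
if (secondary) {
  const agent2 = await agentFromComputer({
    displayId: secondary.id,
  });
  await agent2.aiAct('move mouse to center of screen');
}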
Web Browser Automation
import { agentFromComputer } from '@midscene/computer';
const agent = await agentFromComputer();
// Open browser
if (process.platform === 'darwin') {
await agent.aiAct('press Cmd+Space');
await agent.aiAct('type "Safari" and press Enter');
} else {
await agent.aiAct('press Windows key');
await agent.aiAct('type "Chrome" and press Enter');
}
await agent.aiWaitFor('browser window is open');
// Navigate
await agent.aiAct('click on address bar');
await agent.aiAct('type "example.com" and press Enter');
await agent.aiWaitFor('page has loaded');
// Extract information
const title = await agent.aiQuery('string, get the page title');
console.log('Page title:', title);
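Values extracted by an AI query come from a model, so it can be worth validating them before use. The following sketch treats the result as `unknown` and narrows it to a trimmed, non-empty string; `expectNonEmptyString` is a hypothetical helper, and nothing here depends on the actual return type of `aiQuery`.

```typescript
// Hypothetical helper: validate that an extracted value is a non-empty string.
// Returns the trimmed string, or throws a TypeError describing what was received.
function expectNonEmptyString(value: unknown, label: string): string {
  if (typeof value !== 'string' || value.trim() === '') {
    throw new TypeError(
      `${label}: expected a non-empty string, got ${JSON.stringify(value)}`,
    );
  }
  return value.trim();
}

// Illustrative input standing in for a query result.
console.log(expectNonEmptyString('  Example Domain ', 'page title')); // Example Domain
```

In the example above, the `title` value could be passed through this helper before it is logged or compared.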
TypeScript Types
import type {
ComputerAgent,
ComputerAgentOpt,
ComputerDevice,
ComputerDeviceOpt,
DisplayInfo,
EnvironmentCheck,
} from '@midscene/computer';