PC Desktop Automation Support

Midscene can drive native keyboard and mouse controls to support PC desktop automation on Windows, macOS, and Linux.

Because it relies on a visual model, the automation works with any desktop application, whether built with Electron, Qt, WPF, or native technologies. When debugging UI automation scripts, developers only need to focus on the final user experience.

The PC desktop automation solution comes with all the features of Midscene:

  • Supports zero-code trials in the Playground
  • Supports scripting with the JavaScript SDK
  • Supports automation scripts in YAML format and command-line tools
  • Supports HTML reports that replay all operation paths
  • Works across Windows, macOS, and Linux
  • Supports multiple displays for complex setups

Showcases

Prompt (macOS): Help me post a tweet promoting Midscene's support for AutoGLM through Safari, with the following requirements:

  1. Text content: Midscene now supports AutoGLM!
  2. Media content: Use the AutoGLM video from the download folder!

View the full report for this task: report.html

Prompt (Windows): Open the Sauce Demo e-commerce site, log in, and add items to the cart

View the full report for this task: report.html

Prompt (macOS): Open Google and query San Jose tomorrow weather temperature

View the full report for this task: report.html

Prompt (Linux): Open TodoMVC, add multiple tasks and filter them

View the full report for this task: report.html

See more showcases: showcases

Try with Playground

With the Midscene.js Playground, you can try PC desktop automation without writing any code.

See Getting Started to learn how to launch the Playground.

Key Features

Cross-platform Desktop Control

  • Mouse Operations: Click, double-click, right-click, mouse move, drag-and-drop
  • Keyboard Input: Type text, press keys with modifiers (Cmd/Ctrl/Alt/Shift)
  • Screen Capture: Take screenshots of any display
  • Multi-display: Work with multiple monitors simultaneously
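Scripts that run on more than one platform often need a different primary modifier per operating system (Cmd on macOS, Ctrl elsewhere). The sketch below shows one way to build a shortcut label for that case. It is not part of Midscene: `Platform`, `primaryModifier`, and `shortcut` are hypothetical names introduced here, and the platform strings are assumed to follow Node.js `process.platform` values.

```typescript
// Hypothetical helper, not a Midscene API.
// Platform strings are assumed to match Node.js `process.platform` values.
type Platform = 'darwin' | 'win32' | 'linux';

// Pick the primary shortcut modifier: Cmd on macOS, Ctrl on Windows and Linux.
function primaryModifier(platform: Platform): 'Cmd' | 'Ctrl' {
  return platform === 'darwin' ? 'Cmd' : 'Ctrl';
}

// Build a readable shortcut label: primary modifier, then any extra
// modifiers, then the key, joined with "+".
function shortcut(
  platform: Platform,
  key: string,
  extra: Array<'Alt' | 'Shift'> = [],
): string {
  return [primaryModifier(platform), ...extra, key].join('+');
}
```

A label built this way could be embedded in a natural-language instruction such as "press Cmd+S". How reliably a given phrasing is followed depends on the model in use.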

AI-Powered Automation

Using Midscene's AI capabilities, you can automate desktop applications with natural language:

// `agent` is assumed to be an already-initialized Midscene desktop agent.
await agent.aiAct('open the File menu');
await agent.aiAct('click on Save As');
await agent.aiAct('type "my-document" in the filename field');
await agent.aiAct('press Enter');
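The four calls above can be wrapped in a reusable helper so the filename becomes a parameter. The following is a minimal TypeScript sketch under stated assumptions: `ActAgent` and `saveDocumentAs` are hypothetical names introduced here, and the only thing assumed about a real agent is that it exposes `aiAct(prompt)` returning a promise, as in the snippet above.

```typescript
// Hypothetical minimal shape of an agent. Only the `aiAct` call shown
// in the snippet above is assumed; nothing else about the agent is used.
interface ActAgent {
  aiAct(prompt: string): Promise<unknown>;
}

// Hypothetical helper: runs the "Save As" flow one step at a time, in order.
// Each step is awaited, so a rejected step stops the flow at that point.
async function saveDocumentAs(agent: ActAgent, filename: string): Promise<void> {
  await agent.aiAct('open the File menu');
  await agent.aiAct('click on Save As');
  // JSON.stringify wraps the name in double quotes and escapes any quotes inside it.
  await agent.aiAct(`type ${JSON.stringify(filename)} in the filename field`);
  await agent.aiAct('press Enter');
}
```

Because the helper depends only on the small `ActAgent` interface, it can be exercised with a stub that records prompts before it is run against a real desktop session.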

Use Cases

  • Desktop Application Testing: Automate testing for Electron, Qt, or native apps
  • Workflow Automation: Automate repetitive tasks across desktop applications
  • Cross-app Integration: Control multiple applications in sequence
  • UI Testing: Test desktop applications with natural language descriptions

Next Steps