# Midscene - Vision-Driven UI Automation

> AI-powered, vision-driven UI automation for every platform.

## Other

- [API reference (Android)](/android-api-reference.md): Use this doc when you need to customize Midscene's Android automation or review Android-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
- [Android Getting Started](/android-getting-started.md): This guide walks you through everything required to automate an Android device with Midscene: connect a real phone over adb, configure model credentials, try the no-code Playground, and run your first JavaScript script. Demo projects: control Android devices with JavaScript (https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo) and integrate Vitest for testing (https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo).
- [Android Automation Support](/android-introduction.md): Midscene can drive adb to support Android automation. By adopting a visual model solution, the automation process works with any app tech stack, whether built with Native, Flutter, React Native, or Lynx, so developers only need to focus on the final experience when debugging UI automation scripts. The Android UI automation solution comes with all the features of Midscene: zero-code trial using the Playground, a JavaScript SDK, automation scripts in YAML format with command-line tools, and HTML reports that replay all operation paths.
- [API reference (Common)](/api.md)
- [Automate with scripts in YAML](/automate-with-scripts-in-yaml.md): In most cases, developers write automation scripts just to perform smoke tests, like checking for the appearance of certain content or verifying that a key user path is accessible. In such situations, maintaining a large test project is unnecessary.
Midscene offers a way to perform automation using .yaml files, which helps you focus on the script itself rather than the testing framework, so any team member can write automation scripts without needing to learn any API. The doc includes a self-explanatory example, plus sample projects that use YAML scripts for automation on Web, Android, and Computer (Mac/Windows/Linux).
- [Awesome Midscene](/awesome-midscene.md): A curated list of community projects that extend Midscene.js capabilities across different platforms and programming languages.
- [Introducing Instant Actions and Deep Think](/blog-introducing-instant-actions-and-deep-think.md): From Midscene v0.14.0, we have introduced two new features: Instant Actions and Deep Think.
- [Bridge mode by Chrome extension](/bridge-mode.md): The bridge mode in the Midscene Chrome extension lets local scripts control the desktop version of Chrome, connecting to either a new tab or the currently active tab. Using desktop Chrome allows you to reuse all cookies, plugins, page state, and everything else you need while your automation scripts complete their tasks. This mode is commonly referred to as 'man-in-the-loop' in the context of automation. Demo project: https://github.com/web-infra-dev/midscene-example/blob/main/bridge-mode-demo
- [Caching AI planning & locate](/caching.md): Midscene supports caching planning steps and matched DOM element information to reduce AI model calls and greatly improve execution efficiency. Note that the DOM element cache is only supported for web automation tasks and has certain limitations. With a cache hit, time cost is significantly reduced; in one example, execution time dropped from 51 seconds to 28 seconds.
- [Changelog](/changelog.md)
- [](/common/get-cdp-url.md)
- [](/common/prepare-android.md)
- [](/common/prepare-ios.md)
- [](/common/setup-env.md)
- [](/common/start-experience.md)
- [](/common/troubleshooting-llm-connectivity.md)
- [API Reference (PC Desktop)](/computer-api-reference.md): This page documents the PC desktop-specific APIs provided by @midscene/computer. For common APIs that work across all platforms, see the Common API Reference.
- [PC Desktop Getting Started](/computer-getting-started.md): This guide walks you through everything required to automate PC desktop applications with Midscene: install dependencies, configure model credentials, and run your first JavaScript script. Demo projects: control the PC desktop with JavaScript (https://github.com/web-infra-dev/midscene-example/tree/main/computer/javascript-sdk-demo) and integrate Vitest for testing (https://github.com/web-infra-dev/midscene-example/tree/main/computer/vitest-demo).
- [PC Desktop Automation Support](/computer-introduction.md): Midscene can drive native keyboard and mouse controls to support PC desktop automation on Windows, macOS, and Linux. By leveraging a visual model solution, the automation process works with any desktop application, whether built with Electron, Qt, WPF, or native technologies; developers only need to focus on the final user experience when debugging UI automation scripts. The PC desktop automation solution comes with all the features of Midscene: zero-code trial using the Playground, a JavaScript SDK for scripting, automation scripts in YAML format with command-line tools, HTML reports that replay all operation paths, support across Windows, macOS, and Linux, headless mode for Linux CI via Xvfb (no physical display required), and multi-display support for complex setups.
- [Data privacy](/data-privacy.md): Midscene.js is an open-source project (GitHub: Midscene) under the MIT license.
You can see all the code in the public repository. When using Midscene.js, your page data (including screenshots) is sent directly to the AI model provider you choose; no third-party platform has access to this data. All you need to be concerned about is the data privacy policy of the model provider. If you prefer building Midscene.js and its Chrome extension in your own environment instead of using the published versions, refer to the Contributing Guide for build instructions.
- [FAQ](/faq.md)
- [API Reference (HarmonyOS)](/harmony-api-reference.md): When you need to customize device behavior, integrate Midscene into a framework, or troubleshoot HDC issues, refer to this section. For common constructor parameters (reports, hooks, caching, etc.), see the platform-agnostic API Reference.
- [HarmonyOS Getting Started](/harmony-getting-started.md): This guide walks you through everything needed to automate HarmonyOS devices with Midscene: connecting a real device via HDC, configuring model API keys, trying the zero-code Playground, and running your first JavaScript script. Demo projects: control HarmonyOS devices with JavaScript (https://github.com/web-infra-dev/midscene-example/blob/main/harmony/javascript-sdk-demo) and integrate Vitest for testing (https://github.com/web-infra-dev/midscene-example/tree/main/harmony/vitest-demo).
- [HarmonyOS Automation Support](/harmony-introduction.md): Midscene can drive the HDC (HarmonyOS Device Connector) tool to automate HarmonyOS NEXT devices. Thanks to its visual model approach, the entire automation process works with any HarmonyOS app technology stack, whether ArkTS native or other frameworks; developers only need to debug UI automation scripts against the final rendered interface.
The HarmonyOS UI automation solution includes all Midscene features: zero-code trial via the Playground, JavaScript SDK support, YAML-based automation scripts with CLI tools, and HTML report generation for replaying all action paths.
- [Integrate with Android (adb)](/integrate-with-android.md): After connecting an Android device with adb, you can use the Midscene JavaScript SDK to control it. Demo projects: control Android devices with JavaScript (https://github.com/web-infra-dev/midscene-example/blob/main/android/javascript-sdk-demo) and integrate Vitest for testing (https://github.com/web-infra-dev/midscene-example/tree/main/android/vitest-demo). More showcases are also available.
- [Integrate with any interface](/integrate-with-any-interface.md): You can use the Midscene Agent to control any interface, such as IoT devices, in-house apps, and in-vehicle displays, by implementing a UI operation class that conforms to AbstractInterface. After implementing the UI operation class, you get the full capabilities of the Midscene Agent: the TypeScript GUI Automation Agent SDK with integration for any interface, the Playground for debugging, control of the interface with YAML scripts, and an MCP service that exposes UI actions.
- [Integrate with Playwright](/integrate-with-playwright.md): Playwright.js is an open-source automation library developed by Microsoft, mainly used for end-to-end testing and web scraping of web applications. There are two ways to integrate with Playwright: directly call the Midscene Agent from a script, suitable for quick prototyping, data scraping, and automation scripts; or integrate Midscene into Playwright test cases, suitable for UI testing scenarios.
- [Integrate with Puppeteer](/integrate-with-puppeteer.md): Puppeteer is a Node.js library that provides a high-level API to control Chrome or Firefox over the DevTools Protocol or WebDriver BiDi.
Puppeteer runs in headless mode (no visible UI) by default but can be configured to run in a visible ("headful") browser. Demo projects: a Puppeteer demo (https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo) and a Playwright-with-Vitest demo (https://github.com/web-infra-dev/midscene-example/tree/main/playwright-with-vitest-demo).
- [Midscene.js - Vision-Driven Automation by AI](/introduction.md): AI-powered, vision-driven UI automation for every platform.
- [API reference (iOS)](/ios-api-reference.md): Use this doc when you need to customize iOS device behavior, wire Midscene into WebDriverAgent-driven workflows, or troubleshoot WDA requests. For shared constructor options (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
- [iOS getting started](/ios-getting-started.md): This guide walks you through everything required to automate an iOS device with Midscene: connect a real phone through WebDriverAgent, configure model credentials, try the no-code Playground, and run your first JavaScript script. Demo projects: control iOS devices with JavaScript (https://github.com/web-infra-dev/midscene-example/blob/main/ios/javascript-sdk-demo) and integrate Vitest for testing (https://github.com/web-infra-dev/midscene-example/tree/main/ios/vitest-demo).
- [iOS Automation Support](/ios-introduction.md): Midscene can drive WebDriver tools to support iOS automation. By adopting a visual model solution, the automation process works with any app tech stack, whether built with Native, Flutter, React Native, or Lynx; developers only need to focus on the final experience when debugging UI automation scripts. The iOS UI automation solution comes with all the features of Midscene: zero-code trial using the Playground, a JavaScript SDK, automation scripts in YAML format with command-line tools, and HTML reports that replay all operation paths.
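Several entries above mention driving Midscene through its JavaScript SDK. As a concrete illustration, here is a minimal Puppeteer-based sketch. It is not an official snippet: it assumes the `@midscene/web/puppeteer` entry point and the agent methods `aiAction`/`aiAssert` described in the API reference, and it requires a configured model (see the model configuration entry below) to actually run. The target site and instructions are arbitrary examples.

```typescript
// Illustrative sketch: drive a Puppeteer page with a Midscene agent.
// Requires model credentials to be configured via environment variables.
import puppeteer from 'puppeteer';
import { PuppeteerAgent } from '@midscene/web/puppeteer';

const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.example.com');

// The agent takes screenshots, plans with the vision model, and operates the page.
const agent = new PuppeteerAgent(page);
await agent.aiAction('type "Midscene" in the search box and press Enter');
await agent.aiAssert('the results page contains a link to the Midscene site');

await browser.close();
```

The same agent API shape applies to the Playwright, Android, iOS, and PC desktop integrations; only the constructor and the underlying driver differ.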
- [LLMs.txt documentation](/llm-txt.md): How to get tools like Cursor, Windsurf, GitHub Copilot, ChatGPT, and Claude to understand Midscene.js. We support LLMs.txt files for making the Midscene.js documentation available to large language models.
- [MCP server](/mcp-android.md): Midscene provides an MCP server that allows AI assistants to control Android devices and automate mobile app testing tasks. MCP (Model Context Protocol) is a standardized way for AI models to interact with external tools and capabilities; MCP servers expose a set of tools that AI models can invoke to perform various tasks. For Midscene, these tools let AI models connect to Android devices, launch apps, interact with UI elements, and more.
- [Expose devices as an MCP service](/mcp.md): MCP (Model Context Protocol) is a protocol standard that lets AI models interact with external tools and capabilities. Midscene provides MCP services that expose the atomic operations of the Midscene Agent (each action in the Action Space) as MCP tools, so upper-layer agents can use natural language to inspect the UI, precisely operate UI elements, and run automation tasks without needing to understand the underlying implementation. Because the Midscene Agent relies on a vision model, configure the environment variables required by Midscene inside the MCP service instead of reusing the upstream agent's model configuration.
- [Common Model Configuration](/model-common-config.md)
- [Model configuration](/model-config.md): Midscene reads all model configuration from operating-system environment variables. Midscene integrates the OpenAI SDK by default for AI calls; the SDK defines the parameter shape, and most model providers (or deployment tools) offer compatible endpoints. This doc focuses on Midscene model configuration. For how we choose models, see Model Strategy. For quick recipes for popular models, see Common Model Configuration.
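As a sketch of the environment-variable-based model configuration just described: the variable names below (`OPENAI_API_KEY`, `OPENAI_BASE_URL`, `MIDSCENE_MODEL_NAME`) follow the model configuration doc, while the endpoint, key, and model name are placeholder values you would replace with your provider's details.

```shell
# Placeholder values: substitute your provider's endpoint, API key, and model name.
export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://your-provider.example.com/v1"
export MIDSCENE_MODEL_NAME="qwen2.5-vl-72b-instruct"
```

Because configuration lives in environment variables, the same setup applies whether you run Midscene via the SDK, the CLI, or an MCP service.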
- [Model Strategy](/model-strategy.md): If you want to try Midscene right away, pick a model and follow its configuration guide: Doubao Seed Model; Qwen Models (Qwen3.5, Qwen3-VL, Qwen2.5-VL); Zhipu GLM-V; Zhipu AutoGLM; Gemini-3-Pro / Gemini-3-Flash; GPT-5.4; UI-TARS. This guide focuses on how we choose models in Midscene; if you need configuration instructions, head to Model configuration.
- [Quick experience by Chrome extension](/quick-experience.md): Midscene.js provides a Chrome extension that lets you quickly experience the main features of Midscene on any webpage, without needing to set up a code project. The extension shares the same code as the npm @midscene/web packages, so you can think of it as a playground or a way to debug with Midscene. Example prompt: "Sign up for GitHub, you need to pass the form validation, but don't actually click." A full report of this task is available as report.html.
- [](/showcases-android.md)
- [](/showcases-computer.md)
- [](/showcases-harmony.md)
- [](/showcases-ios.md)
- [](/showcases-mcp.md)
- [](/showcases-web.md)
- [Showcases](/showcases.md): This doc showcases some cases built with Midscene.
- [Control any platform with Skills](/skills.md): Agent Skills are a format for extending AI coding agents with specialized capabilities. Midscene provides Agent Skills that let AI coding tools (like Claude Code, Cline, etc.) drive UI automation through CLI commands, with no MCP server setup required. Unlike MCP integration, Skills work by running CLI commands directly in the terminal; the AI agent acts as the brain, taking screenshots, analyzing the UI, and deciding which actions to perform next.
- [Use JavaScript to optimize the AI automation code](/use-javascript-to-optimize-ai-automation-code.md): Many developers love using aiAct or ai to accomplish automation tasks, even packing long, complex logic into a single natural-language instruction.
It feels "smart," but in practice you may run into unstable reproducibility and slower performance. This article shares an approach to writing automation scripts with JavaScript and structured APIs.
- [API reference (Web)](/web-api-reference.md): Use this doc when you need to customize Midscene's browser automation agents or review browser-only constructor options. For shared parameters (reporting, hooks, caching, etc.), see the platform-agnostic API reference (Common).
- [YAML script runner](/yaml-script-runner.md): Midscene defines a YAML-based scripting format so you can quickly author automation scripts, then run them from the command line without extra setup. For more details on YAML scripts, see Automate with scripts in YAML. Write a YAML script and run it with one command; the CLI prints execution progress and generates a visual report when it finishes, while keeping setup simple.
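The YAML script runner entry above summarizes its example away. A script in that format might look like the following sketch; the field names (`web.url`, `tasks`, `name`, `flow`, `ai`, `aiAssert`) follow the format described in "Automate with scripts in YAML", while the target site, task, and filename are arbitrary illustrations.

```yaml
# Illustrative Midscene YAML script (search.yaml); not an official example.
web:
  url: https://www.bing.com

tasks:
  - name: search weather
    flow:
      # A natural-language action, planned and executed by the AI agent
      - ai: type "weather today" in the search box and press Enter
      # A natural-language assertion, checked against the resulting page
      - aiAssert: the result page shows weather information
```

Assuming the `@midscene/cli` package, such a script would then be run with a single command like `npx --yes @midscene/cli ./search.yaml`.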