For the complete changelog, please refer to: Midscene Releases
Added the `_unstableLogContent` API to the agent. It returns Midscene's execution process data, including the time of each step, the AI tokens consumed, and screenshots.
The report is generated from this data, which means you can use it to build your own custom reports.
Read more: API documentation
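As a sketch of what consuming this data might look like: the log shape below is an assumption for illustration, not the documented schema, so check the API documentation for the real field names.

```typescript
// Hypothetical shape of one execution step in the _unstableLogContent data.
// These field names are invented for illustration, not the documented schema.
interface StepLog {
  title: string;
  durationMs: number;
  tokensConsumed: number;
  screenshotBase64?: string;
}

// Summarize total time and token usage across steps, e.g. to feed a custom report.
function summarize(steps: StepLog[]): { totalMs: number; totalTokens: number } {
  return steps.reduce(
    (acc, s) => ({
      totalMs: acc.totalMs + s.durationMs,
      totalTokens: acc.totalTokens + s.tokensConsumed,
    }),
    { totalMs: 0, totalTokens: 0 }
  );
}

const demo: StepLog[] = [
  { title: "locate button", durationMs: 1200, tokensConsumed: 350 },
  { title: "click button", durationMs: 300, tokensConsumed: 120 },
];
console.log(summarize(demo)); // { totalMs: 1500, totalTokens: 470 }
```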
By default, `dotenv` does not override existing global environment variables with the values in the `.env` file. If you want to override them, use the `--dotenv-override` option.
Read more: Use YAML-based Automation Scripts
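A sketch of how this looks in practice; the CLI entry point shown is an assumption, so check the YAML automation docs for your setup:

```bash
# .env in the project root:
#   OPENAI_API_KEY=sk-...

# By default, a variable already set in the shell wins over the .env value.
# With --dotenv-override, the .env value takes precedence instead:
npx @midscene/cli ./my-automation.yaml --dotenv-override
```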
Reduced the size of generated reports by trimming redundant data. For complex pages, the typical report file size has dropped from 47.6 MB to 15.6 MB!
🚀 Midscene has another update! It makes your testing and automation processes even more powerful:
Added the `logScreenshot` API to the agent. It takes a screenshot of the current page and records it as a report node, with an optional node title and description, making the automated testing process more intuitive. Useful for capturing key steps, error states, UI validation, and more.
Optimized input interactions in Android apps and added support for connecting to remote Android devices.
- `autoDismissKeyboard?: boolean` - Optional parameter. Whether to automatically dismiss the keyboard after entering text. Defaults to `true`.
- `androidAdbPath?: string` - Optional parameter. Specifies the path of the adb executable.
- `remoteAdbHost?: string` - Optional parameter. Specifies the remote adb host.
- `remoteAdbPort?: number` - Optional parameter. Specifies the remote adb port.
Examples:
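A minimal sketch of these connection options grouped together; the option names come from the list above, while the surrounding types and values are illustrative:

```typescript
// The four option fields described above, grouped into one options object.
interface AndroidDeviceOptions {
  autoDismissKeyboard?: boolean; // dismiss keyboard after typing, default true
  androidAdbPath?: string;       // path to the adb executable
  remoteAdbHost?: string;        // remote adb host
  remoteAdbPort?: number;        // remote adb port
}

// Example: connect to a remote adb server and keep the keyboard open after input.
const options: AndroidDeviceOptions = {
  autoDismissKeyboard: false,
  remoteAdbHost: "192.168.1.50",
  remoteAdbPort: 5037, // adb's default server port
};

console.log(options.remoteAdbHost); // → 192.168.1.50
```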
Upgrade now to experience these powerful new features!
To support more automation and data-extraction scenarios, the following APIs have been enhanced with an `options` parameter, enabling more flexible control over DOM information and screenshots:
- `agent.aiQuery(dataDemand, options)`
- `agent.aiBoolean(prompt, options)`
- `agent.aiNumber(prompt, options)`
- `agent.aiString(prompt, options)`

The `options` parameter supports:
- `domIncluded`: Whether to pass simplified DOM information to the AI model; off by default. Useful for extracting attributes that are not visible on the page, such as image links.
- `screenshotIncluded`: Whether to pass the screenshot to the AI model; on by default.

Have you ever needed to automate a right-click operation? Midscene now supports a new `agent.aiRightClick()` method!
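A sketch of how the `options` parameter might be used. The agent below is a stub standing in for a real Midscene agent (it just echoes its inputs), so only the call shape is meaningful:

```typescript
// The two option fields described above.
interface QueryOptions {
  domIncluded?: boolean;        // pass simplified DOM info to the model (default: off)
  screenshotIncluded?: boolean; // pass the screenshot to the model (default: on)
}

// Stub agent: a real Midscene agent would call the AI model here; this one
// echoes its inputs so the call shape is visible.
const agent = {
  async aiQuery(dataDemand: string, options?: QueryOptions) {
    return { dataDemand, options };
  },
};

(async () => {
  // Include DOM info to extract attributes that are not visible on the page,
  // such as image links.
  const result = await agent.aiQuery(
    "string[], the image links of all products on the page",
    { domIncluded: true }
  );
  console.log(result);
})();
```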
Perform a right-click operation on the specified element, suitable for scenarios where right-click events are customized on web pages. Please note that Midscene cannot interact with the browser's native context menu after right-click.
- `locate`: Describe the element you want to operate on in natural language.
- `options`: Optional; supports `deepThink` (AI fine-grained positioning) and `cacheable` (result caching).

In this report file, we show a complete example of using the new `aiRightClick` API and the new query options to extract contact data, including hidden attributes.
Report file: puppeteer-2025-06-04_20-34-48-zyh4ry4e.html
The corresponding code can be found in our example repository: puppeteer-demo/extract-data.ts
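A hedged sketch of the same flow; the agent here is a stub that only records calls (a real agent drives the browser), and the prompts are invented for illustration:

```typescript
// Stub agent mirroring the two calls used in the example; a real Midscene
// agent would drive the browser. Only the call shapes below are meaningful.
const agent = {
  calls: [] as string[],
  async aiRightClick(locate: string, options?: { deepThink?: boolean; cacheable?: boolean }) {
    this.calls.push(`rightClick:${locate}`);
  },
  async aiQuery(dataDemand: string, options?: { domIncluded?: boolean }) {
    this.calls.push(`query:${dataDemand}`);
    return []; // a real agent returns the extracted data
  },
};

(async () => {
  // Right-click a contact row to trigger the page's custom context menu…
  await agent.aiRightClick("the first contact card in the list", { deepThink: true });
  // …then extract fields, including attributes not visible on screen.
  const contacts = await agent.aiQuery(
    "{name: string, hiddenEmail: string}[], contact data including hidden attributes",
    { domIncluded: true }
  );
  console.log(contacts);
})();
```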
Use XPath caching instead of coordinates to improve the cache hit rate.
Refactor the cache file format from JSON to YAML to improve readability.
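As a hypothetical illustration of the readability gain; the field names below are invented for illustration and are not Midscene's actual cache schema:

```yaml
# one cached locate result (illustrative fields only)
- type: locate
  prompt: "the login button"
  xpath: "//button[@id='login']"  # survives layout shifts better than x/y coordinates
```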
🤖 Use Cursor / Trae to help write test cases. 🕹️ Quickly implement browser operations akin to the Manus platform. 🔧 Integrate Midscene capabilities swiftly into your platforms and tools.
Read more: MCP
APIs: `aiBoolean`, `aiNumber`, `aiString`, `aiLocate`
Read more: Use JavaScript to Optimize the AI Automation Code
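A sketch of the typed return shapes these APIs imply, using a stub agent; the prompts and stubbed answers are invented for illustration:

```typescript
// Stub agent illustrating the typed return values of the four APIs.
// A real Midscene agent would ask the AI model instead of returning fixtures.
const agent = {
  async aiBoolean(prompt: string): Promise<boolean> { return true; },
  async aiNumber(prompt: string): Promise<number> { return 42; },
  async aiString(prompt: string): Promise<string> { return "Sauvignon Blanc"; },
  async aiLocate(prompt: string): Promise<{ rect: { x: number; y: number } }> {
    return { rect: { x: 100, y: 200 } };
  },
};

(async () => {
  // Typed answers let you branch in plain JavaScript instead of prompting again.
  if (await agent.aiBoolean("is there a cookie banner?")) {
    const price = await agent.aiNumber("the price of the first item");
    console.log(price); // → 42
  }
})();
```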
🤖 AI Playground: natural‑language debugging 📱 Supports native, Lynx & WebView apps 🔁 Replayable runs 🛠️ YAML or JS SDK ⚡ Auto‑planning & Instant Actions APIs
Read more: Android automation
"Instant Actions" introduces new atomic APIs, enhancing the accuracy of AI operations.
Read more: Instant Actions
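A sketch contrasting auto-planning with an atomic Instant Action. The method names follow Midscene's Instant Actions documentation, but the agent below is a stub that only records calls:

```typescript
// Stub contrasting auto-planning (ai) with an atomic Instant Action (aiTap).
// Method names follow Midscene's Instant Actions docs; the bodies are stubs.
const agent = {
  log: [] as string[],
  async ai(instruction: string) {   // auto-planning: the model decides the steps
    this.log.push(`plan:${instruction}`);
  },
  async aiTap(locate: string) {     // instant action: only locating is AI-driven
    this.log.push(`tap:${locate}`);
  },
};

(async () => {
  await agent.ai('type "weather" in the search box, then click the search button');
  // The atomic form skips planning, so its behavior is more predictable:
  await agent.aiTap("the search button");
})();
```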
Enable caching by following the documentation 👉: Enable Caching
Effect after enabling:
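As a sketch; confirm the exact variable name in the caching documentation linked above:

```bash
# Enable Midscene's cache for this run (illustrative; see the caching docs)
MIDSCENE_CACHE=true npx playwright test
```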
UI-TARS is a native GUI agent model released by the Seed team. It is named after the TARS robot in the movie Interstellar, which has high intelligence and autonomous thinking capabilities. UI-TARS takes images and human instructions as input, perceives the correct next action, and step by step approaches the goal of the instruction, achieving the best performance in various GUI automation benchmarks compared with both open-source and closed-source commercial models.
UI-TARS: Pioneering Automated GUI Interaction with Native Agents - Figure 1
UI-TARS: Pioneering Automated GUI Interaction with Native Agents - Figure 4
UI-TARS has the following advantages in GUI tasks:
Target-driven
Fast inference speed
Native GUI agent model
Private deployment without data security issues
With the Midscene browser extension, you can now use scripts to link with the desktop browser for automated operations!
We call it "Bridge Mode".
Compared to previous CI environment debugging, the advantages are:
You can reuse the desktop browser, especially Cookie, login state, and front-end interface state, and start automation without worrying about environment setup.
Support manual and script cooperation to improve the flexibility of automation tools.
Simple business regression, just run it locally with Bridge Mode.
Documentation: Use Chrome Extension to Experience Midscene
Through the Midscene browser extension, you can run Midscene on any page, without writing any code.
Experience it now 👉: Use Chrome Extension to Experience Midscene
Now you don't have to keep re-running scripts to debug prompts!
On the new test report page, you can debug the AI execution results at any time, including page operations, page information extraction, and page assertions.
Summarize the availability of Doubao models:
Currently, Doubao offers only pure-text models, which means "seeing" is not available. In scenarios where pure text is enough for reasoning, it performs well.
If a use case requires visual UI analysis, it is completely unusable.
Example:
✅ The price of a multi-meat grape (can be guessed from the order of the text on the interface)
✅ The language switch text button (can be guessed from the text content on the interface: Chinese, English text)
❌ The left-bottom play button (requires image understanding, failed)
By using the gpt-4o-2024-08-06 model, Midscene now supports structured output, improving stability and reducing costs by more than 40%.
Midscene can now hit GPT-4o's prompt caching, and the cost of AI calls will continue to decrease as the company's GPT platform rolls this out.
Now you can view an animated replay of each step in the test report to quickly debug your running script.
In the new version, we have partially merged the Plan and Locate operations during prompt execution, improving AI response speed by 30%.
Before
After
GPT-4o series models, 100% correct rate
doubao-pro-4k pure-text model, approaching a usable state
Before
After
Support for Azure OpenAI
Support for AI to add, delete, and modify the existing input
Optimize the page information extraction to avoid collecting obscured elements, improving success rate, speed, and AI call cost 🚀
During the AI interaction process, unnecessary attribute fields were trimmed, reducing token consumption.
Optimize the AI interaction process to reduce the likelihood of hallucination in KeyboardPress and Input events
For pagepass, provide an optimization for the flickering that occurs while Midscene is executing
Support for AI to wait for a certain time to continue the subsequent task execution
Playwright AI task report shows the overall time and aggregates AI tasks by test group
Support for using natural language to control puppeteer to implement page automation 🗣️💻
Provide AI cache capabilities for playwright framework, improve stability and execution efficiency
AI report visualization, aggregate AI tasks by test group, facilitate test report distribution
Support for AI to assert the page, let AI judge whether the page meets certain conditions
Support for using natural language to control puppeteer to implement page automation 🗣️💻
Support for using natural language to extract page information 🔍🗂️
AI report visualization, AI behavior, AI thinking visualization 🛠️👀
Direct use of GPT-4o model, no training required 🤖🔧