Midscene.js - Joyful Automation by AI

UI automation tests are often hard to maintain because they depend on a maze of #ids, data-test attributes, and .selectors. Refactoring can turn them into a nightmare, even though that is precisely when UI automation should be most useful.

Introducing Midscene.js, an innovative SDK designed to bring joy back to automation scripts by simplifying the commands.

Midscene.js leverages a multimodal Large Language Model (LLM) to intuitively “understand” your user interface and carry out the necessary actions. You can simply describe the interaction steps or expected data formats, and the AI will handle the execution for you.

The default model is currently OpenAI GPT-4o, but you can configure a different model if needed.

Interact, query and assert by natural language

There are three main capabilities: action (.ai, .aiAction), query (.aiQuery), and assert (.aiAssert).

  • Use .ai to execute a series of actions by describing the steps.
  • Use .aiQuery to extract customized data from the UI. Just describe the JSON format you want, and the AI will answer based on its understanding of the page.
  • Use .aiAssert to perform assertions on the page.

All of these methods take a natural language prompt as their parameter. Because the scripts no longer depend on selectors, they should cost much less to maintain.

For example:

// 👀 type keywords, perform a search
await ai('type "Headphones" in search box, hit Enter');

// 👀 find the items, return in JSON
const items = await aiQuery(
  "{itemTitle: string, price: Number}[], find item in list and corresponding price"
);

console.log("headphones in stock", items);

// 👀 assert by natural language
await aiAssert("There is a category filter on the left");
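Because the result of aiQuery comes from a model, it may be worth checking its shape before relying on it. The following is a minimal sketch of such a check, not part of Midscene itself: the Item interface, isItemArray, findItems, and the fakeAiQuery stub are all hypothetical names introduced here so the example runs without a browser or a model. In a real script you would pass the actual aiQuery function instead of the stub.

```typescript
// Shape requested in the aiQuery prompt above (hypothetical type name).
interface Item {
  itemTitle: string;
  price: number;
}

// Runtime type guard: model output is untyped, so verify it before use.
function isItemArray(value: unknown): value is Item[] {
  return (
    Array.isArray(value) &&
    value.every(
      (v) =>
        typeof v === "object" &&
        v !== null &&
        typeof (v as Record<string, unknown>).itemTitle === "string" &&
        typeof (v as Record<string, unknown>).price === "number"
    )
  );
}

// Hypothetical stand-in for aiQuery; returns canned data instead of calling a model.
async function fakeAiQuery(_prompt: string): Promise<unknown> {
  return [
    { itemTitle: "Wireless Headphones", price: 59.99 },
    { itemTitle: "Studio Headphones", price: 129 },
  ];
}

// Runs the query and fails loudly if the answer does not match the requested shape.
async function findItems(
  query: (prompt: string) => Promise<unknown>
): Promise<Item[]> {
  const result = await query(
    "{itemTitle: string, price: Number}[], find item in list and corresponding price"
  );
  if (!isItemArray(result)) {
    throw new Error("aiQuery result did not match the requested shape");
  }
  return result;
}
```

Calling findItems(fakeAiQuery) resolves to the two stubbed items. A response with a missing field or a string price would throw instead of silently flowing into later steps.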

Multiple ways to integrate

To try the core features of Midscene, we recommend starting with the Chrome extension. You can run Action / Query / Assert in natural language on any webpage, without setting up a code project.

There are also several ways to integrate Midscene into your code project.

Visualized report

Midscene provides a visual report after each run. In this report, you can review the animated replay and inspect the details of each step in the process. The report file also includes a playground where you can adjust your prompt without re-running all your scripts.

Just you and the model provider, no third-party services

Midscene.js is an open-source project (GitHub: Midscene) under the MIT license. You can run it in your own environment. All data gathered from pages is sent directly to OpenAI or the custom model provider according to your configuration. Therefore, only you and the model provider have access to the data. No third-party platform will access it.