Introduction

UI automation can be frustrating, often involving a maze of #ids, data-test attributes, and .selectors that are difficult to maintain, especially when the page undergoes a refactor.

Midscene.js is an SDK that aims to bring the joy back to programming by simplifying automation tasks.

Midscene.js leverages a multimodal Large Language Model (LLM) to intuitively “understand” your user interface and carry out the necessary actions. You can simply describe the interaction steps or expected data formats, and the AI will handle the execution for you.

Features

Interact, query and assert by natural language

Use .aiAction (or its shorthand, .ai) to perform a series of actions by describing the steps, .aiQuery to extract customized data from the UI, and .aiAssert to perform assertions on the page.

It is all based on natural language processing, bringing you a new experience in writing automation.

For example:

// 👀 type keywords, perform a search
await ai('type "Headphones" in search box, hit Enter');

// 👀 find the items
const items = await aiQuery(
  "{itemTitle: string, price: number}[], find item in list and corresponding price"
);

console.log("headphones in stock", items);
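
Because the data returned by aiQuery is produced by a model, it can be worth checking its shape before relying on it. The TypeScript sketch below shows one possible way to do that for the format requested in the query above. The names Item, isItem, and parseItems are hypothetical helpers introduced for illustration; they are not part of Midscene.

```typescript
// Shape requested in the aiQuery prompt above.
interface Item {
  itemTitle: string;
  price: number;
}

// Type guard: true when `value` carries both requested fields with the expected types.
function isItem(value: unknown): value is Item {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.itemTitle === "string" &&
    typeof record.price === "number" &&
    Number.isFinite(record.price)
  );
}

// Hypothetical helper (not a Midscene API): validate a raw query result
// and fail loudly if any entry does not match the requested format.
function parseItems(raw: unknown): Item[] {
  if (!Array.isArray(raw)) {
    throw new TypeError("expected an array of items");
  }
  return raw.map((entry, index) => {
    if (!isItem(entry)) {
      throw new TypeError(`item at index ${index} does not match {itemTitle, price}`);
    }
    return entry;
  });
}
```

Assuming aiQuery resolves to the already-parsed value, the query above could be wrapped as parseItems(await aiQuery(...)), so that a malformed response fails at the point of extraction rather than later in the test.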

Immediate Integration

Quickly integrate GPT-4o and web automation tools like Playwright or Puppeteer into your project with Midscene.

You can set up your automation right away with the tools you already know. No custom training is needed.

Visualization Tool

With our visualization tool, you can easily debug prompts and AI responses. All intermediate data, such as queries, plans, and actions, can be visualized.

You may open the Online Visualization Tool to see the showcase.

Flow Chart

Here is a flowchart that describes the core process of the interaction between Midscene and AI.