v0.17.4 Changelog - Let AI See the DOM of the Page

Data Query API Enhanced

To meet more automation and data extraction scenarios, the following APIs have been enhanced with the options parameter, supporting more flexible DOM information and screenshots:

  • agent.aiQuery(dataDemand, options)
  • agent.aiBoolean(prompt, options)
  • agent.aiNumber(prompt, options)
  • agent.aiString(prompt, options)

Newoptions parameter

  • domIncluded: Whether to pass the simplified DOM information to AI model, default is off. This is useful for extracting attributes that are not visible on the page, like image links.
  • screenshotIncluded: Whether to pass the screenshot to AI model, default is on.

Code Example

// Extract all contact information (including hidden avatarUrl attributes)
const contactsData = await agent.aiQuery(
  "{name: string, id: number, company: string, department: string, avatarUrl: string}[], extract all contact information including hidden avatarUrl attributes",
  { domIncluded: true }
);

// Check if the id attribute of the first contact is 1
const isId1 = await agent.aiBoolean(
  "Is the first contact's id is 1?",
  { domIncluded: true }
);

// Get the ID of the first contact (hidden attribute)
const firstContactId = await agent.aiNumber("First contact's id?", { domIncluded: true });

// Get the avatar URL of the first contact (invisible attribute on the page)
const avatarUrl = await agent.aiString(
  "What is the Avatar URL of the first contact?",
  { domIncluded: true }
);

New Right-Click Ability

Have you ever encountered a scenario where you need to automate a right-click operation? Now, Midscene supports a new agent.aiRightClick() method!

Function

Perform a right-click operation on the specified element, suitable for scenarios where right-click events are customized on web pages. Please note that Midscene cannot interact with the browser's native context menu after right-click.

Parameter Description

  • locate: Describe the element you want to operate in natural language
  • options: Optional, supports deepThink (AI fine-grained positioning) and cacheable (result caching)

Example

// Right-click on a contact in the contacts application, triggering a custom context menu
await agent.aiRightClick("Alice Johnson");

// Then you can click on the options in the menu
await agent.aiTap("Copy Info"); // Copy contact information to the clipboard

A Complete Example

In this report file, we show a complete example of using the new aiRightClick API and new query parameters to extract contact data including hidden attributes.

Report file: puppeteer-2025-06-04_20-34-48-zyh4ry4e.html

The corresponding code can be found in our example repository: puppeteer-demo/extract-data.ts