Each Agent in Midscene has its own constructor.
These Agents share some common constructor parameters:
generateReport: boolean
: If true, a report file will be generated. (Default: true)
autoPrintReportMsg: boolean
: If true, report messages will be printed. (Default: true)
cacheId: string | undefined
: If provided, this cacheId will be used to match the cache. (Default: undefined)
In Puppeteer, there is an additional parameter:
forceSameTabNavigation: boolean
: If true, page navigation is restricted to the current tab. (Default: true)
Below are the main APIs available for the various Agents in Midscene.
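Before turning to the individual APIs, the shared constructor parameters listed above can be pictured as a plain options object. The sketch below is only an illustration of the documented defaults: the type names `CommonAgentOptions`, `PuppeteerAgentOptions`, `ResolvedOptions` and the helper `withDefaults` are hypothetical and are not exported by Midscene.

```typescript
// Hypothetical option types mirroring the parameters documented above.
// These are NOT Midscene's own types; they only illustrate the defaults.
interface CommonAgentOptions {
  generateReport?: boolean;
  autoPrintReportMsg?: boolean;
  cacheId?: string;
}

// Puppeteer adds one more parameter on top of the shared ones.
interface PuppeteerAgentOptions extends CommonAgentOptions {
  forceSameTabNavigation?: boolean;
}

interface ResolvedOptions {
  generateReport: boolean;
  autoPrintReportMsg: boolean;
  cacheId: string | undefined;
  forceSameTabNavigation: boolean;
}

// Fill in the documented default for every option the caller leaves out.
function withDefaults(opts: PuppeteerAgentOptions = {}): ResolvedOptions {
  return {
    generateReport: opts.generateReport ?? true,
    autoPrintReportMsg: opts.autoPrintReportMsg ?? true,
    cacheId: opts.cacheId,
    forceSameTabNavigation: opts.forceSameTabNavigation ?? true,
  };
}

const resolvedOptions = withDefaults({
  cacheId: "login-flow",
  autoPrintReportMsg: false,
});
console.log(resolvedOptions);
```

With a real Agent, an object of this shape would be passed to the Agent's constructor; consult the integration guide for the exact constructor signature.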
In the documentation below, you might see function calls with the agent. prefix. If you use destructuring in Playwright (e.g., async ({ ai, aiQuery }) => { /* ... */ }), you can call these functions without the agent. prefix. This is merely a syntactic difference.
agent.aiAction() or .ai()
This method allows you to perform a series of UI actions described in natural language. Midscene automatically parses and executes the steps.
Parameters:
steps: string - A natural language description of the UI steps.
Return Value:
Examples:
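As a rough sketch of a call with a clear, detailed instruction: the `ActionAgent` interface, the helper names, and the recording stub below are hypothetical stand-ins so the snippet runs without a browser or an AI service. With a real Agent, only the `aiAction` call itself would remain.

```typescript
// Hypothetical minimal interface for the method described above.
interface ActionAgent {
  aiAction(steps: string): Promise<void>;
}

// Spell out each UI step instead of giving a vague command.
function describeSearch(keyword: string): string {
  return `Type "${keyword}" into the search box, then click the search button`;
}

async function searchFor(agent: ActionAgent, keyword: string): Promise<void> {
  // Per the text, an error is thrown if the steps cannot be performed.
  await agent.aiAction(describeSearch(keyword));
}

// Recording stub (assumption for illustration): it stores the instruction
// instead of driving a page.
const recordedSteps: string[] = [];
const stubActionAgent: ActionAgent = {
  async aiAction(steps: string): Promise<void> {
    recordedSteps.push(steps);
  },
};

searchFor(stubActionAgent, "JavaScript").then(() => console.log(recordedSteps));
```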
For optimal results, please provide clear and detailed instructions. Avoid vague commands (e.g., "post a tweet"), as they may lead to unstable or failed execution.
Under the hood, Midscene sends the page context and screenshots to the LLM to plan the steps in detail. It then executes these steps sequentially. If Midscene determines that the actions cannot be performed, an error will be thrown.
Your task is decomposed into the following built-in methods, which you can view in the visual report:
Currently, Midscene does not support planning steps with conditions or loops.
Related Documentation:
agent.aiQuery()
This method allows you to extract data directly from the UI using multimodal AI reasoning capabilities. Simply define the expected format (e.g., string, number, JSON, or an array) in dataDemand, and Midscene will return a result that matches the format.
Parameters:
dataShape: T
: A description of the expected return format.
Return Value:
Describe the expected format in dataDemand, and Midscene will return a matching result.
Examples:
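One way to picture a query is sketched below, under the assumption that the demand is a string stating the expected shape followed by what to extract. The `QueryAgent` interface, the `TodoItem` type, and the stub with canned data are hypothetical and exist only so the snippet runs on its own.

```typescript
// Hypothetical minimal interface for the query method described above.
interface QueryAgent {
  aiQuery<T>(dataDemand: string | Record<string, string>): Promise<T>;
}

interface TodoItem {
  text: string;
  done: boolean;
}

// Expected shape first, then a description of what to extract.
const todoDemand =
  "{ text: string, done: boolean }[], the items in the todo list";

// Ordinary post-processing on the structured result.
function pendingTexts(items: TodoItem[]): string[] {
  return items.filter((item) => !item.done).map((item) => item.text);
}

async function listPending(agent: QueryAgent): Promise<string[]> {
  const items = await agent.aiQuery<TodoItem[]>(todoDemand);
  return pendingTexts(items);
}

// Stub returning canned data (assumption for illustration only).
const stubQueryAgent: QueryAgent = {
  async aiQuery<T>(_dataDemand: string | Record<string, string>): Promise<T> {
    const canned: TodoItem[] = [
      { text: "Buy milk", done: true },
      { text: "Write report", done: false },
    ];
    return canned as unknown as T;
  },
};

listPending(stubQueryAgent).then((texts) => console.log(texts));
```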
agent.aiAssert()
This method lets you specify an assertion in natural language, and the AI determines whether the condition is true. If the assertion fails, the SDK throws an error that includes both the optional errorMsg and a detailed reason generated by the AI.
Parameters:
assertion: string - The assertion described in natural language.
errorMsg?: string - An optional error message to append if the assertion fails.
Return Value:
If the assertion fails, an error is thrown with errorMsg and additional AI-provided information.
Example:
Assertions are critical in test scripts. To reduce the risk of errors due to AI hallucination (e.g., missing an error), you can also combine .aiQuery with standard JavaScript assertions instead of using .aiAssert.
For example, you might replace the above code with:
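A minimal sketch of this pattern: the AI only extracts the value, and ordinary code decides whether the check passes. The `QueryOnlyAgent` interface, the `assertEqual` helper, the product name, and the price are all hypothetical and chosen purely for illustration.

```typescript
// Hypothetical minimal interface; only aiQuery is needed for this pattern.
interface QueryOnlyAgent {
  aiQuery<T>(dataDemand: string): Promise<T>;
}

// Small assertion helper so the sketch needs no test framework.
function assertEqual<T>(actual: T, expected: T, message: string): void {
  if (actual !== expected) {
    throw new Error(
      `${message} (expected ${String(expected)}, got ${String(actual)})`,
    );
  }
}

// Extract with the AI, then compare deterministically in code.
async function checkOnesiePrice(agent: QueryOnlyAgent): Promise<void> {
  const price = await agent.aiQuery<number>(
    'number, the price of "Sauce Labs Onesie" shown on the page',
  );
  assertEqual(price, 7.99, "Unexpected price");
}

// Stub returning a canned price (assumption for illustration only).
const stubPriceAgent: QueryOnlyAgent = {
  async aiQuery<T>(_dataDemand: string): Promise<T> {
    return 7.99 as unknown as T;
  },
};

checkOnesiePrice(stubPriceAgent).then(() => console.log("price check passed"));
```

Because the comparison happens in plain code, a failure points at a concrete extracted value rather than at the model's own judgment of the page.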
agent.aiWaitFor()
This method allows you to wait until a specified condition, described in natural language, becomes true. Because AI calls are costly, the condition is checked no more often than once every checkIntervalMs.
Parameters:
assertion: string - The condition described in natural language.
options?: object - An optional configuration object containing:
timeoutMs?: number - Timeout in milliseconds (default: 15000).
checkIntervalMs?: number - Interval for checking in milliseconds (default: 3000).
Return Value:
Examples:
Because each AI call takes time, .aiWaitFor might not be the most efficient method. Sometimes a simple sleep function is a better alternative.
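To make the roles of timeoutMs and checkIntervalMs concrete, here is a generic polling sketch. It is not Midscene's implementation, only one plausible reading of the behavior described above; `waitForCondition` and its `check` callback (which stands in for one AI evaluation of the condition) are hypothetical names.

```typescript
interface WaitOptions {
  timeoutMs?: number;
  checkIntervalMs?: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `check` until it returns true or the timeout elapses.
// Resolves with the number of checks performed; rejects on timeout.
async function waitForCondition(
  check: () => Promise<boolean>,
  { timeoutMs = 15000, checkIntervalMs = 3000 }: WaitOptions = {},
): Promise<number> {
  const start = Date.now();
  let attempts = 0;
  while (Date.now() - start < timeoutMs) {
    const attemptStart = Date.now();
    attempts += 1;
    if (await check()) return attempts;
    // Pad the attempt so consecutive checks start at least
    // checkIntervalMs apart, which caps the number of (costly) checks.
    const elapsed = Date.now() - attemptStart;
    if (elapsed < checkIntervalMs) await sleep(checkIntervalMs - elapsed);
  }
  throw new Error(`Condition not met within ${timeoutMs} ms`);
}

// Demo: the condition becomes true on the third check.
let demoCalls = 0;
waitForCondition(async () => ++demoCalls >= 3, {
  timeoutMs: 1000,
  checkIntervalMs: 10,
}).then((n) => console.log(`condition met after ${n} checks`));
```

Each check in a loop like this costs one AI round trip, which is why a fixed sleep can be cheaper when the waiting time is predictable.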
agent.runYaml()
This method executes an automation script written in YAML. Only the tasks part of the script is executed, and it returns the results of all .aiQuery calls within the script.
Parameters:
yamlScriptContent: string - The YAML-formatted script content.
Return Value:
Returns an object with a result property that includes the results of all .aiQuery calls.
Example:
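A sketch of how such a call might look. The `YamlAgent` interface and the stub are hypothetical, and the task and flow keys inside the script (`name`, `flow`, `ai`, `sleep`, `aiQuery`) are assumptions about the script format that should be checked against the YAML script documentation.

```typescript
// Hypothetical minimal interface matching the description above.
interface YamlAgent {
  runYaml(
    yamlScriptContent: string,
  ): Promise<{ result: Record<string, unknown> }>;
}

// Only the `tasks` part is executed. The keys used inside each task are
// assumptions for illustration.
const yamlScript = `
tasks:
  - name: search
    flow:
      - ai: type "weather today" into the search box, then click the search button
      - sleep: 3000
  - name: query weather
    flow:
      - aiQuery: "the result shows the weather info, {description: string}"
`;

async function runWeatherScript(
  agent: YamlAgent,
): Promise<Record<string, unknown>> {
  // `result` collects the outputs of the .aiQuery calls in the script.
  const { result } = await agent.runYaml(yamlScript);
  return result;
}

// Stub returning a canned result (assumption for illustration only).
const stubYamlAgent: YamlAgent = {
  async runYaml(_yamlScriptContent: string) {
    return { result: { weather: { description: "sunny" } } };
  },
};

runWeatherScript(stubYamlAgent).then((result) => console.log(result));
```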
For more information about YAML scripts, please refer to Automate with Scripts in YAML.
.reportFile
The path to the report file.
You can override environment variables at runtime by calling the overrideAIConfig method.
Set the MIDSCENE_DEBUG_AI_PROFILE variable to view the execution time and usage for each AI call.
LangSmith is a platform for debugging large language models. To integrate LangSmith, follow these steps:
After starting Midscene, you should see logs similar to: