Automate with Scripts in YAML
In most cases, developers write automation scripts just to perform some smoke tests, like checking for the appearance of certain content or verifying that a key user path is accessible. In such situations, maintaining a large test project is unnecessary.
Midscene offers a way to perform automation using .yaml
files, which helps you focus on the script itself rather than the testing framework. This allows any team member to write automation scripts without needing to learn any API.
Here is an example. By reading its content, you should be able to understand how it works.
web:
url: https://www.bing.com
tasks:
- name: Search for weather
flow:
- ai: Search for "today's weather"
- sleep: 3000
- name: Check results
flow:
- aiAssert: The results show weather information
Sample Project
You can find a sample project that uses YAML scripts for automation here:
Setup AI model service
Set your model configs into the environment variables. You may refer to choose a model for more details.
# replace with your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
Alternatively, you can use a .env
file in the same directory where you run the command to store your configuration. The Midscene command-line tool will automatically load it when running a YAML script.
.env
OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
Install the Command-Line Tool
Install @midscene/cli
globally:
npm i -g @midscene/cli
# Or install it within your project
npm i @midscene/cli --save-dev
Create a file named bing-search.yaml
to drive a web browser automation task:
web:
url: https://www.bing.com
tasks:
- name: Search for weather
flow:
- ai: Search for "today's weather"
- sleep: 3000
- aiAssert: The results show weather information
Or, to drive an Android device automation task (requires the device to be connected via adb):
android:
# launch: https://www.bing.com
deviceId: s4ey59
tasks:
- name: Search for weather
flow:
- ai: Open the browser and navigate to bing.com
- ai: Search for "today's weather"
- sleep: 3000
- aiAssert: The results show weather information
Run the script:
midscene ./bing-search.yaml
# Or if you installed midscene in your project
npx midscene ./bing-search.yaml
You will see the script's execution progress and the visual report file.
Script File Structure
Script files use YAML format to describe automation tasks. It defines the target to be manipulated (like a webpage or an Android app) and the series of steps to perform.
A standard .yaml
script file includes a web
or android
section to configure the environment, and a tasks
section to define the automation tasks.
web:
url: https://www.bing.com
# The tasks section defines the series of steps to be executed
tasks:
- name: Search for weather
flow:
- ai: Search for "today's weather"
- sleep: 3000
- aiAssert: The results show weather information
Theweb
part
web:
# The URL to visit, required. If `serve` is provided, provide the relative path.
url: <url>
# Serve a local path as a static server, optional.
serve: <root-directory>
# The browser user agent, optional.
userAgent: <ua>
# The browser viewport width, optional, defaults to 1280.
viewportWidth: <width>
# The browser viewport height, optional, defaults to 960.
viewportHeight: <height>
# The browser's device pixel ratio, optional, defaults to 1.
deviceScaleFactor: <scale>
# Path to a JSON format browser cookie file, optional.
cookie: <path-to-cookie-file>
# The strategy for waiting for network idle, optional.
waitForNetworkIdle:
# The timeout in milliseconds, optional, defaults to 2000ms.
timeout: <ms>
# Whether to continue on timeout, optional, defaults to true.
continueOnNetworkIdleError: <boolean>
# The path to the JSON file for outputting aiQuery/aiAssert results, optional.
output: <path-to-output-file>
# Whether to save log content to a JSON file, optional, defaults to `false`. If true, saves to `unstableLogContent.json`. If a string, saves to the specified path. The log content structure may change in the future.
unstableLogContent: <boolean | path-to-unstable-log-file>
# Whether to restrict page navigation to the current tab, optional, defaults to true.
forceSameTabNavigation: <boolean>
# The bridge mode, optional, defaults to false. Can be 'newTabWithUrl' or 'currentTab'. See below for more details.
bridgeMode: false | 'newTabWithUrl' | 'currentTab'
# Whether to close newly created tabs when the bridge disconnects, optional, defaults to false.
closeNewTabsAfterDisconnect: <boolean>
# Whether to ignore HTTPS certificate errors, optional, defaults to false.
acceptInsecureCerts: <boolean>
# Background knowledge to send to the AI model when calling aiAction, optional.
aiActionContext: <string>
Theandroid
part
android:
# The device ID, optional, defaults to the first connected device.
deviceId: <device-id>
# The launch URL, optional, defaults to the device's current page.
launch: <url>
# The path to the JSON file for outputting aiQuery/aiAssert results, optional.
output: <path-to-output-file>
Thetasks
part
The tasks
part is an array that defines the steps of the script. Remember to add a -
before each step to indicate it's an array item.
The interfaces in the flow
section are almost identical to the API, with some differences in parameter nesting levels.
tasks:
- name: <name>
continueOnError: <boolean> # Optional, whether to continue to the next task on error, defaults to false.
flow:
# Auto Planning (.ai)
# ----------------
# Perform an interaction. `ai` is a shorthand for `aiAction`.
- ai: <prompt>
cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
# This usage is the same as `ai`.
- aiAction: <prompt>
cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
# Instant Action (.aiTap, .aiHover, .aiInput, .aiKeyboardPress, .aiScroll)
# ----------------
# Tap an element described by a prompt.
- aiTap: <prompt>
deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
# Hover over an element described by a prompt.
- aiHover: <prompt>
deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
# Input text into an element described by a prompt.
- aiInput: <final text content of the input>
locate: <prompt>
deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
# Press a key (e.g., Enter, Tab, Escape) on an element described by a prompt.
- aiKeyboardPress: <key>
locate: <prompt>
deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
# Scroll globally or on an element described by a prompt.
- aiScroll:
direction: 'up' # or 'down' | 'left' | 'right'
scrollType: 'once' # or 'untilTop' | 'untilBottom' | 'untilLeft' | 'untilRight'
distance: <number> # Optional, the scroll distance in pixels.
locate: <prompt> # Optional, the element to scroll on.
deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.
# Log the current screenshot with a description in the report file.
- logScreenshot: <title> # Optional, the title of the screenshot. If not provided, the title will be 'untitled'.
content: <content> # Optional, the description of the screenshot.
# Data Extraction
# ----------------
# Perform a query that returns a JSON object.
- aiQuery: <prompt> # Remember to describe the format of the result in the prompt.
name: <name> # The key for the query result in the JSON output.
# More APIs
# ----------------
# Wait for a condition to be met, with a timeout (in ms, optional, defaults to 30000).
- aiWaitFor: <prompt>
timeout: <ms>
# Perform an assertion.
- aiAssert: <prompt>
errorMessage: <error-message> # Optional, the error message to print if the assertion fails.
# Wait for a specified amount of time.
- sleep: <ms>
# Execute a piece of JavaScript code in the web page context.
- javascript: <javascript>
name: <name> # Optional, assign a name to the return value, which will be used as a key in the JSON output.
- name: <name>
flow:
# ...
Advanced Usage of the Command-Line Tool
@midscene/cli
provides flexible ways to run your automation scripts.
Run One or More Scripts
You can pass a single .yaml
script file or use a glob pattern to match multiple .yaml
files to the midscene
command. This is a shorthand for the --files
argument.
# Run a single script
midscene ./bing-search.yaml
# Use a glob pattern to run all matching scripts
midscene './scripts/**/*.yaml'
Command-Line Options
The command-line tool provides several options to control the execution behavior of your scripts.
--files <file1> <file2> ...
: Specifies a list of script files to execute, which will be run in order. Supports glob patterns, following the syntax supported by glob.
--concurrent <number>
: Sets the number of concurrent executions. Defaults to 1
.
--continue-on-error
: If this flag is set, it will continue to run the remaining scripts even if one fails. Defaults to off.
--share-browser-context
: Shares the same browser context (e.g., Cookies and localStorage
) across all scripts. This is very useful for sequential tests that require a login state. Defaults to off.
--summary <filename>
: Specifies the path for the generated JSON format summary report file.
--headed
: Runs the script in a browser with a graphical user interface, rather than in headless mode.
--keep-window
: Keeps the browser window open after the script execution is complete. This option automatically enables --headed
mode.
--config <filename>
: Specifies a configuration file. Parameters in the config file will be used as default values for the command-line arguments.
--web.userAgent <ua>
: Sets the browser UA, which will override the web.userAgent
parameter in all script files.
--web.viewportWidth <width>
: Sets the browser viewport width, which will override the web.viewportWidth
parameter in all script files.
--web.viewportHeight <height>
: Sets the browser viewport height, which will override the web.viewportHeight
parameter in all script files.
--android.deviceId <device-id>
: Sets the Android device ID, which will override the android.deviceId
parameter in all script files.
--dotenv-debug
: Sets the debug log for dotenv, disabled by default.
--dotenv-override
: Sets whether dotenv overrides global environment variables with the same name, disabled by default.
Examples:
Use the --files
argument to specify the file order and run in parallel.
midscene --files ./login.yaml ./buy/*.yaml ./checkout.yaml
Run all scripts with a concurrency of 4 and continue on any file error.
midscene --files './scripts/**/*.yaml' --concurrent 4 --continue-on-error
Writing Command-Line Arguments in a File
You can write a configuration file in YAML format and reference it with --config
. When invoking the command-line tool, command-line arguments have higher priority than the configuration file.
files:
- './scripts/login.yaml'
- './scripts/search.yaml'
- './scripts/**/*.yaml'
concurrent: 4
continueOnError: true
shareBrowserContext: true
Usage:
midscene --config ./config.yaml
More Features
Using Environment Variables in.yaml
Files
You can use environment variables in your .yaml
files with the ${variable-name}
syntax.
For example, if you have a .env
file with the following content:
You can use the environment variable in your .yaml
file like this:
#...
- ai: type ${topic} in input box
#...
Running in Headed Mode
web
scenarios only
'Headed' mode means the browser window will be visible. By default, scripts run in headless mode.
To run in headed mode, you can use the --headed
option. Additionally, if you want to keep the browser window open after the script finishes, you can use the --keep-window
option. The --keep-window
option automatically enables --headed
mode.
Headed mode consumes more resources, so it is recommended to use it only locally.
# Run in headed mode
midscene /path/to/yaml --headed
# Run in headed mode and keep the browser window open afterward
midscene /path/to/yaml --keep-window
Using Bridge Mode
web
scenarios only
By using bridge mode, you can leverage YAML scripts to automate your existing desktop browser. This is particularly useful for reusing cookies, plugins, and page states, or for interacting manually with automation scripts.
To use bridge mode, you first need to install the Chrome extension and then use the following configuration in the target
section:
web:
url: https://www.bing.com
+ bridgeMode: newTabWithUrl
Please refer to Bridge Mode via Chrome Extension for more details.
Running YAML Scripts with JavaScript
You can also run a YAML script using JavaScript by calling the runYaml
method on the Agent. Note that this method will only execute the tasks
part of the YAML script.
Analyzing Command-Line Tool Results
After execution is complete, the following files will be generated in the output directory:
- The file path specified by the
--summary
option (defaults to index.json
), containing the execution status and statistics of all files.
- The individual execution results of each YAML file (in JSON format).
- The visual reports for each file (in HTML format).
Configure dotenv's Default Behavior
Midscene uses dotenv
to load environment variables from a .env
file.
Disable dotenv's Debug Logs
By default, Midscene prints debug information from dotenv. If you don't want to see this information, you can use the --dotenv-debug
option to disable it.
midscene /path/to/yaml --dotenv-debug=false
Use .env to Override Global Environment Variables
By default, dotenv
will not override global environment variables with the same name found in the .env
file. If you want to override them, you can use the --dotenv-override
option.
midscene /path/to/yaml --dotenv-override=true
FAQ
How can I get cookies in JSON format from Chrome?
You can use this Chrome extension to export cookies in JSON format.
How can I open the debug log for dotenv?
Midscene uses dotenv
to load environment variables from a .env
file. You can use the --dotenv-debug
option to open the debug log for dotenv.
midscene /path/to/yaml --dotenv-debug=true
More
You might also be interested in Prompting Tips.