English

Automate with Scripts in YAML

In most cases, developers write automation scripts just to perform some smoke tests, like checking for the appearance of certain content or verifying that a key user path is accessible. In such situations, maintaining a large test project is unnecessary.

Midscene offers a way to perform automation using .yaml files, which helps you focus on the script itself rather than the testing framework. This allows any team member to write automation scripts without needing to learn any API.

Here is an example. By reading its content, you should be able to understand how it works.

web:
  url: https://www.bing.com

tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - sleep: 3000

  - name: Check results
    flow:
      - aiAssert: The results show weather information

Sample Project

You can find a sample project that uses YAML scripts for automation here:

Web
Android

Setup AI model service

Set your model configs into the environment variables. You may refer to choose a model for more details.

# replace with your own
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"

# You may need more configs, such as model name and endpoint, please refer to [choose a model](../choose-a-model)
export OPENAI_BASE_URL="..."

Alternatively, you can use a .env file in the same directory where you run the command to store your configuration. The Midscene command-line tool will automatically load it when running a YAML script.

OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"

Install the Command-Line Tool

Install @midscene/cli globally:

npm i -g @midscene/cli
# Or install it within your project
npm i @midscene/cli --save-dev

Create a file named bing-search.yaml to drive a web browser automation task:

web:
  url: https://www.bing.com

tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information

Or, to drive an Android device automation task (requires the device to be connected via adb):

android:
  # launch: https://www.bing.com
  deviceId: s4ey59

tasks:
  - name: Search for weather
    flow:
      - ai: Open the browser and navigate to bing.com
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information

Run the script:

midscene ./bing-search.yaml
# Or if you installed midscene in your project
npx midscene ./bing-search.yaml

You will see the script's execution progress and the visual report file.

Script File Structure

Script files use YAML format to describe automation tasks. It defines the target to be manipulated (like a webpage or an Android app) and the series of steps to perform.

A standard .yaml script file includes a web or android section to configure the environment, and a tasks section to define the automation tasks.

web:
  url: https://www.bing.com

# The tasks section defines the series of steps to be executed
tasks:
  - name: Search for weather
    flow:
      - ai: Search for "today's weather"
      - sleep: 3000
      - aiAssert: The results show weather information

The `web` part

web:
  # The URL to visit, required. If `serve` is provided, provide the relative path.
  url: <url>

  # Serve a local path as a static server, optional.
  serve: <root-directory>

  # The browser user agent, optional.
  userAgent: <ua>

  # The browser viewport width, optional, defaults to 1280.
  viewportWidth: <width>

  # The browser viewport height, optional, defaults to 960.
  viewportHeight: <height>

  # The browser's device pixel ratio, optional, defaults to 1.
  deviceScaleFactor: <scale>

  # Path to a JSON format browser cookie file, optional.
  cookie: <path-to-cookie-file>

  # The strategy for waiting for network idle, optional.
  waitForNetworkIdle:
    # The timeout in milliseconds, optional, defaults to 2000ms.
    timeout: <ms>
    # Whether to continue on timeout, optional, defaults to true.
    continueOnNetworkIdleError: <boolean>

  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
  output: <path-to-output-file>

  # Whether to save log content to a JSON file, optional, defaults to `false`. If true, saves to `unstableLogContent.json`. If a string, saves to the specified path. The log content structure may change in the future.
  unstableLogContent: <boolean | path-to-unstable-log-file>

  # Whether to restrict page navigation to the current tab, optional, defaults to true.
  forceSameTabNavigation: <boolean>

  # The bridge mode, optional, defaults to false. Can be 'newTabWithUrl' or 'currentTab'. See below for more details.
  bridgeMode: false | 'newTabWithUrl' | 'currentTab'

  # Whether to close newly created tabs when the bridge disconnects, optional, defaults to false.
  closeNewTabsAfterDisconnect: <boolean>

  # Whether to ignore HTTPS certificate errors, optional, defaults to false.
  acceptInsecureCerts: <boolean>

  # Background knowledge to send to the AI model when calling aiAction, optional.
  aiActionContext: <string>

The `android` part

android:
  # The device ID, optional, defaults to the first connected device.
  deviceId: <device-id>

  # The launch URL, optional, defaults to the device's current page.
  launch: <url>

  # The path to the JSON file for outputting aiQuery/aiAssert results, optional.
  output: <path-to-output-file>

The `tasks` part

The tasks part is an array that defines the steps of the script. Remember to add a - before each step to indicate it's an array item.

The interfaces in the flow section are almost identical to the API, with some differences in parameter nesting levels.

tasks:
  - name: <name>
    continueOnError: <boolean> # Optional, whether to continue to the next task on error, defaults to false.
    flow:
      # Auto Planning (.ai)
      # ----------------

      # Perform an interaction. `ai` is a shorthand for `aiAction`.
      - ai: <prompt>
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # This usage is the same as `ai`.
      - aiAction: <prompt>
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Instant Action (.aiTap, .aiHover, .aiInput, .aiKeyboardPress, .aiScroll)
      # ----------------

      # Tap an element described by a prompt.
      - aiTap: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Hover over an element described by a prompt.
      - aiHover: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Input text into an element described by a prompt.
      - aiInput: <final text content of the input>
        locate: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Press a key (e.g., Enter, Tab, Escape) on an element described by a prompt.
      - aiKeyboardPress: <key>
        locate: <prompt>
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Scroll globally or on an element described by a prompt.
      - aiScroll:
        direction: 'up' # or 'down' | 'left' | 'right'
        scrollType: 'once' # or 'untilTop' | 'untilBottom' | 'untilLeft' | 'untilRight'
        distance: <number> # Optional, the scroll distance in pixels.
        locate: <prompt> # Optional, the element to scroll on.
        deepThink: <boolean> # Optional, whether to use deepThink to precisely locate the element. Defaults to False.
        xpath: <xpath> # Optional, the xpath of the target element for the operation. If provided, Midscene will prioritize this xpath to find the element before using the cache and the AI model. Defaults to empty.
        cacheable: <boolean> # Optional, whether to cache the result of this API call when the [caching feature](./caching.mdx) is enabled. Defaults to True.

      # Log the current screenshot with a description in the report file.
      - logScreenshot: <title> # Optional, the title of the screenshot. If not provided, the title will be 'untitled'.
        content: <content> # Optional, the description of the screenshot.

      # Data Extraction
      # ----------------

      # Perform a query that returns a JSON object.
      - aiQuery: <prompt> # Remember to describe the format of the result in the prompt.
        name: <name> # The key for the query result in the JSON output.

      # More APIs
      # ----------------

      # Wait for a condition to be met, with a timeout (in ms, optional, defaults to 30000).
      - aiWaitFor: <prompt>
        timeout: <ms>

      # Perform an assertion.
      - aiAssert: <prompt>
        errorMessage: <error-message> # Optional, the error message to print if the assertion fails.

      # Wait for a specified amount of time.
      - sleep: <ms>

      # Execute a piece of JavaScript code in the web page context.
      - javascript: <javascript>
        name: <name> # Optional, assign a name to the return value, which will be used as a key in the JSON output.

  - name: <name>
    flow:
      # ...

Advanced Usage of the Command-Line Tool

@midscene/cli provides flexible ways to run your automation scripts.

Run One or More Scripts

You can pass a single .yaml script file or use a glob pattern to match multiple .yaml files to the midscene command. This is a shorthand for the --files argument.

# Run a single script
midscene ./bing-search.yaml

# Use a glob pattern to run all matching scripts
midscene './scripts/**/*.yaml'

Command-Line Options

The command-line tool provides several options to control the execution behavior of your scripts.

--files <file1> <file2> ...: Specifies a list of script files to execute, which will be run in order. Supports glob patterns, following the syntax supported by glob.
--concurrent <number>: Sets the number of concurrent executions. Defaults to 1.
--continue-on-error: If this flag is set, it will continue to run the remaining scripts even if one fails. Defaults to off.
--share-browser-context: Shares the same browser context (e.g., Cookies and localStorage) across all scripts. This is very useful for sequential tests that require a login state. Defaults to off.
--summary <filename>: Specifies the path for the generated JSON format summary report file.
--headed: Runs the script in a browser with a graphical user interface, rather than in headless mode.
--keep-window: Keeps the browser window open after the script execution is complete. This option automatically enables --headed mode.
--config <filename>: Specifies a configuration file. Parameters in the config file will be used as default values for the command-line arguments.
--web.userAgent <ua>: Sets the browser UA, which will override the web.userAgent parameter in all script files.
--web.viewportWidth <width>: Sets the browser viewport width, which will override the web.viewportWidth parameter in all script files.
--web.viewportHeight <height>: Sets the browser viewport height, which will override the web.viewportHeight parameter in all script files.
--android.deviceId <device-id>: Sets the Android device ID, which will override the android.deviceId parameter in all script files.
--dotenv-debug: Sets the debug log for dotenv, disabled by default.
--dotenv-override: Sets whether dotenv overrides global environment variables with the same name, disabled by default.

Examples:

Use the --files argument to specify the file order and run in parallel.

midscene --files ./login.yaml ./buy/*.yaml ./checkout.yaml

Run all scripts with a concurrency of 4 and continue on any file error.

midscene --files './scripts/**/*.yaml' --concurrent 4 --continue-on-error

Writing Command-Line Arguments in a File

You can write a configuration file in YAML format and reference it with --config. When invoking the command-line tool, command-line arguments have higher priority than the configuration file.

files:
  - './scripts/login.yaml'
  - './scripts/search.yaml'
  - './scripts/**/*.yaml'

concurrent: 4
continueOnError: true
shareBrowserContext: true

Usage:

midscene --config ./config.yaml

More Features

Using Environment Variables in `.yaml` Files

You can use environment variables in your .yaml files with the ${variable-name} syntax.

For example, if you have a .env file with the following content:

topic=weather today

You can use the environment variable in your .yaml file like this:

#...
- ai: type ${topic} in input box
#...

Running in Headed Mode

web scenarios only

'Headed' mode means the browser window will be visible. By default, scripts run in headless mode.

To run in headed mode, you can use the --headed option. Additionally, if you want to keep the browser window open after the script finishes, you can use the --keep-window option. The --keep-window option automatically enables --headed mode.

Headed mode consumes more resources, so it is recommended to use it only locally.

# Run in headed mode
midscene /path/to/yaml --headed

# Run in headed mode and keep the browser window open afterward
midscene /path/to/yaml --keep-window

Using Bridge Mode

web scenarios only

By using bridge mode, you can leverage YAML scripts to automate your existing desktop browser. This is particularly useful for reusing cookies, plugins, and page states, or for interacting manually with automation scripts.

To use bridge mode, you first need to install the Chrome extension and then use the following configuration in the target section:

web:
  url: https://www.bing.com
+ bridgeMode: newTabWithUrl

Please refer to Bridge Mode via Chrome Extension for more details.

Running YAML Scripts with JavaScript

You can also run a YAML script using JavaScript by calling the runYaml method on the Agent. Note that this method will only execute the tasks part of the YAML script.

Analyzing Command-Line Tool Results

After execution is complete, the following files will be generated in the output directory:

The file path specified by the --summary option (defaults to index.json), containing the execution status and statistics of all files.
The individual execution results of each YAML file (in JSON format).
The visual reports for each file (in HTML format).

Configure dotenv's Default Behavior

Midscene uses dotenv to load environment variables from a .env file.

Disable dotenv's Debug Logs

By default, Midscene prints debug information from dotenv. If you don't want to see this information, you can use the --dotenv-debug option to disable it.

midscene /path/to/yaml --dotenv-debug=false

Use .env to Override Global Environment Variables

By default, dotenv will not override global environment variables with the same name found in the .env file. If you want to override them, you can use the --dotenv-override option.

midscene /path/to/yaml --dotenv-override=true

FAQ

How can I get cookies in JSON format from Chrome?

You can use this Chrome extension to export cookies in JSON format.

How can I open the debug log for dotenv?

Midscene uses dotenv to load environment variables from a .env file. You can use the --dotenv-debug option to open the debug log for dotenv.

midscene /path/to/yaml --dotenv-debug=true

You might also be interested in Prompting Tips.

#Automate with Scripts in YAML

#Setup AI model service

#Install the Command-Line Tool

#Script File Structure

#The web part

#The android part

#The tasks part

#Advanced Usage of the Command-Line Tool

#Run One or More Scripts

#Command-Line Options

#Writing Command-Line Arguments in a File

#More Features

#Using Environment Variables in .yaml Files

#Running in Headed Mode

#Using Bridge Mode

#Running YAML Scripts with JavaScript

#Analyzing Command-Line Tool Results

#Configure dotenv's Default Behavior

#Disable dotenv's Debug Logs

#Use .env to Override Global Environment Variables

#FAQ

#More