Midscene uses the OpenAI SDK to call AI services. The SDK fixes the input and output format of those calls, but it does not restrict you to OpenAI's models. You can use any model service that supports the same interface (most platforms and tools do).
In this article, we will show you how to configure an AI service provider and how to choose a different model. See Choose a model to learn more about picking a model.
These are the most common configs. Only OPENAI_API_KEY is required.
Name | Description |
---|---|
OPENAI_API_KEY | Required. Your OpenAI API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz") |
OPENAI_BASE_URL | Optional. Custom URL for the API endpoint. Often used to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1") |
MIDSCENE_MODEL_NAME | Optional. Specify a different model name (default is gpt-4o). Often used to switch to a different model. |
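As a minimal sketch, the common configs above could be exported in a shell session before running Midscene. The key and URL are the placeholder values from the table, not working credentials, and gpt-4o is simply the documented default written out explicitly.

```shell
# Required: the API key for the service you are calling (placeholder value).
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
# Optional: point the SDK at an OpenAI-compatible provider (placeholder URL).
export OPENAI_BASE_URL="https://some_service_name.com/v1"
# Optional: choose the model; gpt-4o is the default when this is unset.
export MIDSCENE_MODEL_NAME="gpt-4o"
```

Leave OPENAI_BASE_URL unset if you are calling OpenAI directly.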
Config to use the UI-TARS model:
UI-TARS is a dedicated model for UI automation. See more details in Choose a model.
Name | Description |
---|---|
MIDSCENE_USE_VLM_UI_TARS | Optional. Set to "1" to use UI-TARS model. |
Some advanced configs are also supported. Usually you don't need to use them.
Name | Description |
---|---|
OPENAI_USE_AZURE | Optional. Set to "true" to use Azure OpenAI Service. See more details in the following section. |
MIDSCENE_OPENAI_INIT_CONFIG_JSON | Optional. Custom JSON config for OpenAI SDK initialization |
MIDSCENE_OPENAI_SOCKS_PROXY | Optional. Proxy configuration (e.g. "socks5://127.0.0.1:1080") |
OPENAI_MAX_TOKENS | Optional. Maximum tokens for model response |
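The following is a hedged sketch of how the advanced configs might look when set together. It assumes that MIDSCENE_OPENAI_INIT_CONFIG_JSON is handed to the OpenAI SDK client constructor, so the keys shown (baseURL, defaultHeaders) are OpenAI SDK client options rather than anything confirmed by this page. The header name and the token limit are illustrative values only.

```shell
# Assumption: this JSON is passed to the OpenAI SDK client constructor,
# so the keys follow the SDK's client options (baseURL, defaultHeaders).
export MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"https://some_service_name.com/v1","defaultHeaders":{"X-Example-Header":"value"}}'
# Route requests through a local SOCKS5 proxy (address from the table's example).
export MIDSCENE_OPENAI_SOCKS_PROXY="socks5://127.0.0.1:1080"
# Cap the length of the model response; 2048 is an arbitrary illustrative value.
export OPENAI_MAX_TOKENS=2048
```

Single quotes around the JSON keep the shell from interpreting the inner double quotes.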
Pick one of the following ways to configure the environment variables.
This is what we used in our demo project.
Dotenv is a zero-dependency module that loads environment variables from a .env file into process.env.
Create a .env file in your project root directory, and add the following content. There is no need to add export before each line.
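A .env file along these lines would match the description above. This is a sketch with the placeholder key from the table. The optional entries are commented out so that copying the file does not redirect requests to a placeholder URL.

```shell
# .env: plain KEY=value lines, with no export keyword.
OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
# Optional: uncomment to use another OpenAI-compatible provider.
# OPENAI_BASE_URL="https://some_service_name.com/v1"
# Optional: uncomment to override the default model (gpt-4o).
# MIDSCENE_MODEL_NAME="gpt-4o"
```

Keep this file out of version control, since it contains your API key.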
Import the dotenv module in your script (for example, import 'dotenv/config'). It will automatically read the environment variables from the .env file.
Using claude-3-opus-20240229 from Anthropic
When MIDSCENE_USE_ANTHROPIC_SDK=1 is configured, Midscene will use the Anthropic SDK (@anthropic-ai/sdk) to call the model.
Configure the environment variables:
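A minimal sketch of that configuration is shown here. MIDSCENE_USE_ANTHROPIC_SDK and the model name come from this page. ANTHROPIC_API_KEY is the variable the Anthropic SDK reads by default, and it is an assumption that Midscene takes the key from it. The key value is a placeholder.

```shell
# Switch Midscene from the OpenAI SDK to the Anthropic SDK.
export MIDSCENE_USE_ANTHROPIC_SDK=1
# Assumption: the key is supplied via the Anthropic SDK's default variable.
export ANTHROPIC_API_KEY="your-anthropic-api-key"
# Model name from the heading of this example.
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```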
There are some extra configs when using Azure OpenAI Service.
This mode cannot be used in the Chrome extension.
Using gemini-1.5-pro from Google
Configure the environment variables:
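One possible configuration is sketched below. The base URL is Google's OpenAI-compatible Gemini endpoint as far as we know. Treat it as an assumption and check it against Google's current documentation. The key is a placeholder.

```shell
# Assumption: Google's OpenAI-compatible endpoint for Gemini models.
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai"
# Placeholder: your Gemini API key goes here.
export OPENAI_API_KEY="your-gemini-api-key"
export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
```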
Using qwen-vl-max-latest from Aliyun
Configure the environment variables:
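A sketch of the Aliyun setup follows. The base URL is DashScope's OpenAI-compatible endpoint to the best of our knowledge. Treat it as an assumption and confirm it in the Aliyun console. The key is a placeholder.

```shell
# Assumption: DashScope's OpenAI-compatible endpoint.
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
# Placeholder: your DashScope API key goes here.
export OPENAI_API_KEY="your-dashscope-api-key"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
```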
Using doubao-vision-pro-32k from Volcengine
Create an inference point first: https://console.volcengine.com/ark/region:ark+cn-beijing/endpoint
In the inference point interface, find an ID like ep-202... and use it as the model name.
Configure the environment variables:
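The variables might then look like the sketch below. The base URL is assumed to be the Volcengine Ark endpoint for the cn-beijing region and should be verified in the console. The key is a placeholder, and ep-202... stands for the full inference point ID, which is left elided here exactly as in the text.

```shell
# Assumption: Volcengine Ark OpenAI-compatible endpoint for cn-beijing.
export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
# Placeholder: your Volcengine API key goes here.
export OPENAI_API_KEY="your-volcengine-api-key"
# Replace with the full inference point ID shown in the console.
export MIDSCENE_MODEL_NAME="ep-202..."
```

Note that the model name is the inference point ID, not doubao-vision-pro-32k itself.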
If you want to troubleshoot connectivity issues, you can use the connectivity-test folder in our example project: https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test
Put your .env file in the connectivity-test folder, and run the test with npm i && npm run test.