Midscene uses the OpenAI SDK to call AI services. Using this SDK fixes the input and output schema of the AI service, but it doesn't mean you can only use OpenAI's services; you can use any model service that supports the same interface (most platforms and tools do).

In this article, we will show you how to configure an AI service provider and how to choose a different model. You may want to read Choose a model first to learn more about how to pick a model.
These are the most common configs, of which `OPENAI_API_KEY` is required.
| Name | Description |
|---|---|
| `OPENAI_API_KEY` | Required. Your OpenAI API key (e.g. "sk-abcdefghijklmnopqrstuvwxyz") |
| `OPENAI_BASE_URL` | Optional. Custom base URL for the API endpoint. Use it to switch to a provider other than OpenAI (e.g. "https://some_service_name.com/v1") |
| `MIDSCENE_MODEL_NAME` | Optional. Specify a model name other than the default gpt-4o |
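For example, to switch to an OpenAI-compatible provider other than OpenAI, set all three variables (the endpoint, key, and model name below are placeholders, not real values):

```shell
# Point Midscene at an OpenAI-compatible provider (illustrative values)
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
export OPENAI_BASE_URL="https://some_service_name.com/v1"
export MIDSCENE_MODEL_NAME="some-model-name"
```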
Config to use the Qwen 2.5 VL model:

| Name | Description |
|---|---|
| `MIDSCENE_USE_QWEN_VL` | Optional. Set to "1" to use the Qwen 2.5 VL model |
Config to use the UI-TARS model:

| Name | Description |
|---|---|
| `MIDSCENE_USE_VLM_UI_TARS` | Optional. Set to "1" to use the UI-TARS model |
For more information about the models, see Choose a model.
Some advanced configs are also supported. Usually you don't need to use them.
| Name | Description |
|---|---|
| `OPENAI_USE_AZURE` | Optional. Set to "true" to use Azure OpenAI Service. See more details in the following section. |
| `MIDSCENE_OPENAI_INIT_CONFIG_JSON` | Optional. Custom JSON config for OpenAI SDK initialization |
| `MIDSCENE_OPENAI_SOCKS_PROXY` | Optional. Proxy configuration (e.g. "socks5://127.0.0.1:1080") |
| `OPENAI_MAX_TOKENS` | Optional. Maximum tokens for the model response |
| `MIDSCENE_DEBUG_AI_PROFILE` | Optional. Set to "1" to print AI usage and response time |
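For instance, to route traffic through a local SOCKS proxy and print timing logs (the proxy address is illustrative):

```shell
# Advanced options (illustrative values)
export MIDSCENE_OPENAI_SOCKS_PROXY="socks5://127.0.0.1:1080"
export MIDSCENE_DEBUG_AI_PROFILE=1
```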
Pick one of the following ways to configure environment variables.
This is what we used in our demo project.
Dotenv is a zero-dependency module that loads environment variables from a `.env` file into `process.env`.
Create a `.env` file in your project root directory and add the following content. There is no need to add `export` before each line.
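A minimal `.env` might look like this (the key is a placeholder):

```shell
OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
MIDSCENE_MODEL_NAME="gpt-4o"
```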
Import the dotenv module in your script. It will automatically read the environment variables from the `.env` file.
There are some extra configs when using Azure OpenAI Service.
This mode cannot be used in the Chrome extension.
`gpt-4o` from OpenAI

Configure the environment variables:
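A minimal config; the API key is a placeholder:

```shell
export OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
# Optional, since gpt-4o is already the default:
export MIDSCENE_MODEL_NAME="gpt-4o"
```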
`qwen-vl-max-latest` from Aliyun

Configure the environment variables:
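A sketch of the config; the base URL below is Aliyun's OpenAI-compatible endpoint at the time of writing, so verify it against Aliyun's documentation, and the key is a placeholder:

```shell
export OPENAI_API_KEY="your-aliyun-api-key"
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_NAME="qwen-vl-max-latest"
export MIDSCENE_USE_QWEN_VL=1
```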
`ui-tars-72b-sft`, hosted by yourself

Configure the environment variables:
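A sketch of the config; the endpoint is whatever your own deployment exposes, so both the URL and key below are placeholders:

```shell
export OPENAI_BASE_URL="http://localhost:8000/v1"
export OPENAI_API_KEY="your-api-key"
export MIDSCENE_MODEL_NAME="ui-tars-72b-sft"
export MIDSCENE_USE_VLM_UI_TARS=1
```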
`claude-3-opus-20240229` from Anthropic

When configuring `MIDSCENE_USE_ANTHROPIC_SDK=1`, Midscene will use the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.

Configure the environment variables:
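A sketch of the config; `ANTHROPIC_API_KEY` is the standard variable read by `@anthropic-ai/sdk`, and the key itself is a placeholder:

```shell
export MIDSCENE_USE_ANTHROPIC_SDK=1
export ANTHROPIC_API_KEY="your-anthropic-api-key"
export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
```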
`doubao-vision-pro-32k` from Volcengine

Create an inference point first: https://console.volcengine.com/ark/region:ark+cn-beijing/endpoint

In the inference point interface, find an ID like `ep-202...` and use it as the model name.

Configure the environment variables:
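A sketch of the config; the base URL below is Volcengine Ark's OpenAI-compatible endpoint at the time of writing, so verify it in the Volcengine console, and the key and inference point ID are placeholders:

```shell
export OPENAI_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export OPENAI_API_KEY="your-volcengine-api-key"
# Use your inference point ID as the model name:
export MIDSCENE_MODEL_NAME="ep-202..."
```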
`gemini-1.5-pro` from Google

Configure the environment variables:
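A sketch of the config; the base URL below is Google's OpenAI-compatible endpoint for the Gemini API at the time of writing, so verify it against Google's documentation, and the key is a placeholder:

```shell
export OPENAI_API_KEY="your-google-api-key"
export OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
export MIDSCENE_MODEL_NAME="gemini-1.5-pro"
```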
If you want to troubleshoot connectivity issues, you can use the `connectivity-test` folder in our example project: https://github.com/web-infra-dev/midscene-example/tree/main/connectivity-test
Put your `.env` file in the `connectivity-test` folder, and run the test with `npm i && npm run test`.