You can view the ICAL learned examples here.
# Python 3.10+
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install
pip install -e .
- Set up the standalone environments. Please check out this page for details.
- Configure the URLs for each website:
export CLASSIFIEDS="<your_classifieds_domain>:9980"
export CLASSIFIEDS_RESET_TOKEN="4b61655535e7ed388f0d40a93600254c" # Default reset token for classifieds site, change if you edited its docker-compose.yml
export SHOPPING="<your_shopping_site_domain>:7770"
export REDDIT="<your_reddit_domain>:9999"
export WIKIPEDIA="<your_wikipedia_domain>:8888"
export HOMEPAGE="<your_homepage_domain>:4399"
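Before moving on, it can help to confirm that every variable above is actually set in the current shell. The following is a minimal sketch, not part of the repository: the helper name `missing_site_vars` is hypothetical, and the list of names is copied from the export commands above.

```python
import os

# Variable names copied from the export commands above.
SITE_VARS = [
    "CLASSIFIEDS",
    "CLASSIFIEDS_RESET_TOKEN",
    "SHOPPING",
    "REDDIT",
    "WIKIPEDIA",
    "HOMEPAGE",
]


def missing_site_vars(env=None):
    """Return the names in SITE_VARS that are unset or empty in env.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in SITE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_site_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All site variables are set.")
```

Run it in the same shell session where the exports were made, since exported variables do not carry over to a new terminal.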
You can also run the unit tests to ensure that VisualWebArena is installed correctly:
pytest -x
- Generate config files for each test example:
python scripts/generate_test_data.py
You will see *.json files generated in the config_files folder. Each file contains the configuration for one test example.
- Obtain and save the auto-login cookies for all websites:
bash prepare.sh
- Set up API keys.
If using OpenAI models, set a valid OpenAI API key (starting with sk-) as the environment variable:
export OPENAI_API_KEY=your_key
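The preparation steps above can be sanity-checked before launching an evaluation. The sketch below is an illustration rather than a script shipped with the repository: both function names are hypothetical, and it assumes the generated files sit directly inside config_files (if they are nested in subfolders, `Path.rglob` would be needed instead of `Path.glob`). The key check only tests the sk- prefix mentioned above; it does not verify the key with OpenAI.

```python
import json
import os
from pathlib import Path


def count_config_files(config_dir="config_files"):
    """Count the *.json files directly inside config_dir that parse as JSON.

    Hypothetical helper. Returns 0 if the directory does not exist.
    """
    directory = Path(config_dir)
    if not directory.is_dir():
        return 0
    count = 0
    for path in directory.glob("*.json"):
        with path.open(encoding="utf-8") as f:
            json.load(f)  # raises json.JSONDecodeError on a malformed file
        count += 1
    return count


def looks_like_openai_key(value):
    """Return True if value is a string that starts with "sk-" and continues past it."""
    return isinstance(value, str) and value.startswith("sk-") and len(value) > len("sk-")


if __name__ == "__main__":
    print("config files found:", count_config_files())
    key_ok = looks_like_openai_key(os.environ.get("OPENAI_API_KEY"))
    print("OPENAI_API_KEY has the expected prefix:", key_ok)
```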
To run the evaluation, replace the paths in scripts/run_final_eval.sh with your local paths. Then run the script:
sh scripts/run_ical_gpt4o_vwa.sh
To run the human-in-the-loop ICAL agent and collect human-correctable trajectories, replace the paths in scripts/human_in_the_loop.sh with your local paths. Then run the script:
sh scripts/human_in_the_loop_gpt4o.sh
- Install vLLM:
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
- Run the vLLM server and load your QWEN2VL model:
sh scripts/vllm/run_vllm.sh <path_to_qwen2vl_model>
- Run the evaluation:
sh scripts/run_ical_qwen2vl_vwa.sh
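Before starting the evaluation, it may be worth confirming that the vLLM server is up and serving the expected model. vLLM's server exposes an OpenAI-compatible API, which includes a /v1/models listing endpoint. The sketch below is an illustration only: the function names are hypothetical, and the base URL is an assumption (vLLM's default port is 8000, but scripts/vllm/run_vllm.sh may configure a different host or port).

```python
import json
import urllib.request


def parse_model_ids(payload):
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url="http://localhost:8000", timeout=5.0):
    """Query <base_url>/v1/models and return the ids of the served models.

    The default base_url is an assumption; adjust it to match your server.
    Raises OSError (for example urllib.error.URLError) if the server is unreachable.
    """
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_model_ids(json.load(response))
```

Calling `list_served_models()` once the server has finished loading should return a list containing the model name or path the server was started with.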
We provide a GUI that allows you to specify a config file, user intent, and model, and then step the agent and take actions, with optional human-in-the-loop. First, run the FastAPI server:
python run_gui.py
Then, open http://127.0.0.1:8000/public/index.html in your browser.
We provide our scripts for the VLM abstraction and human-in-the-loop in ICAL_scripts.
This code builds on VisualWebArena. Be sure to check it out!
@inproceedings{sarch2024vlm,
title={VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought},
author={Sarch, Gabriel Herbert and Jang, Lawrence and Tarr, Michael J and Cohen, William W and Marino, Kenneth and Fragkiadaki, Katerina},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024}
}