Running ICAL in VisualWebArena

Contents

  * ICAL examples
  * Install
  * Setup
  * Running the ICAL evaluation
  * Running the Human-in-the-Loop Feature
  * Running with Qwen2-VL
  * Running the human-in-the-loop GUI
  * ICAL scripts
  * Acknowledgements
  * Citation

ICAL examples

You can view the learned ICAL examples here.

Install

# Python 3.10+
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install
pip install -e .
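
If you want to sanity-check the install before moving on, you can confirm that the Playwright CLI and Python package are available (a minimal check only; it assumes the virtual environment created above is active):

# Optional sanity check (assumes the virtualenv created above is active)
playwright --version
python -c "import playwright; print('playwright import OK')"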

Setup

  1. Set up the standalone environments. Please check out this page for details.

  2. Configure the URLs for each website:

export CLASSIFIEDS="<your_classifieds_domain>:9980"
export CLASSIFIEDS_RESET_TOKEN="4b61655535e7ed388f0d40a93600254c"  # Default reset token for classifieds site, change if you edited its docker-compose.yml
export SHOPPING="<your_shopping_site_domain>:7770"
export REDDIT="<your_reddit_domain>:9999"
export WIKIPEDIA="<your_wikipedia_domain>:8888"
export HOMEPAGE="<your_homepage_domain>:4399"
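
Once the variables are set, a rough way to confirm that each site is reachable is to request its front page and print the HTTP status code (this assumes the sites respond over plain HTTP on the ports above):

# Rough reachability check for the configured sites (assumes plain HTTP)
for site in "$CLASSIFIEDS" "$SHOPPING" "$REDDIT" "$WIKIPEDIA" "$HOMEPAGE"; do
  curl -s -o /dev/null -w "%{http_code}  $site\n" "http://$site"
done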

You can also run the unit tests to ensure that VisualWebArena is installed correctly:

pytest -x
  3. Generate config files for each test example:
python scripts/generate_test_data.py

You will see *.json files generated in the config_files folder. Each file contains the configuration for one test example.
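To spot-check the output, you can list the generated files and pretty-print one of them (the exact file names and any subdirectory layout depend on the test split, so the snippet below just picks the first file it finds):

# List the generated configs and pretty-print the first one found
find config_files -name "*.json" | head
python -m json.tool "$(find config_files -name '*.json' | head -n 1)"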

  4. Obtain and save the auto-login cookies for all websites:
bash prepare.sh
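
To verify that the auto-login step succeeded, you can list the saved cookie files; upstream VisualWebArena writes them to a local .auth/ directory by default, so adjust the path if this repository stores them elsewhere (the location is an assumption):

# Check that auto-login cookie files were written
# (.auth/ is the upstream VisualWebArena default; adjust if different here)
ls -l ./.auth/
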
  5. Set up API keys.

If using OpenAI models, set a valid OpenAI API key (starting with sk-) as the environment variable:

export OPENAI_API_KEY=your_key
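
A quick way to confirm the key is valid is to list the models it can access via the OpenAI API (this only checks authentication, not quota or model availability):

# Verify the OpenAI API key by listing accessible models
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head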

Running the ICAL evaluation

To run the evaluation, replace the paths in scripts/run_final_eval.sh with your local paths. Then run the script:

sh scripts/run_ical_gpt4o_vwa.sh

Running the Human-in-the-Loop Feature

To run the human-in-the-loop ICAL agent and collect human-correctable trajectories, replace the paths in scripts/human_in_the_loop.sh with your local paths. Then run the script:

sh scripts/human_in_the_loop_gpt4o.sh

Running with Qwen2-VL

  1. Install vLLM:
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
  2. Run the vLLM server and load your Qwen2-VL model:
sh scripts/vllm/run_vllm.sh <path_to_qwen2vl_model>
  3. Run the evaluation:
sh scripts/run_ical_qwen2vl_vwa.sh
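
Before launching the evaluation, it can help to confirm that the vLLM server is up and serving the model. vLLM exposes an OpenAI-compatible HTTP API, so a request like the one below should list the loaded model (the host and port are assumptions; use whatever run_vllm.sh configures):

# Check that the vLLM server is serving the model
# (host/port are assumptions; match the values used by scripts/vllm/run_vllm.sh)
curl -s http://localhost:8000/v1/models | head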

Running the human-in-the-loop GUI

We provide a GUI that lets you specify a config file, user intent, and model, then step the agent and take actions, with optional human-in-the-loop intervention. First, run the FastAPI server:

python run_gui.py

Then, open http://127.0.0.1:8000/public/index.html in your browser.
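
If the page does not load, you can first check that the FastAPI server is serving the static front end (this only verifies that the server is up; it does not exercise the agent):

# Confirm the GUI server is serving the static front end
curl -I http://127.0.0.1:8000/public/index.html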

ICAL scripts

We provide our scripts for the VLM abstraction and human-in-the-loop phases in the ICAL_scripts directory.

Acknowledgements

This code builds on VisualWebArena. Be sure to check them out!

Citation

@inproceedings{sarch2024vlm,
  title={VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought},
  author={Sarch, Gabriel Herbert and Jang, Lawrence and Tarr, Michael J and Cohen, William W and Marino, Kenneth and Fragkiadaki, Katerina},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}