Sotopia API and UI (#264)
* feat: FastAPI Implementation of Sotopia Part Two (w websocket) (#252)

* api doc

* add PUT

* add a temp example for websocket

* websocket

* update readme

* Update README.md

* update websocket live simulation api doc

* [autofix.ci] apply automated fixes

* update websocket doc

* add api server with websocket as well as a client

* fix mypy errors

* support stopping the chat

* add 404 to the status code

* fix mypy issue

* update the returned message types

* redesign websocket api

* update websocket, fix mypy error

* add example of using websocket

* clean code & change to existing functions for simulation

* fix typing mismatch

* update doc & mypy type fix

* add type check for run_async_server

* move example

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* Add customizable evaluation dimensions (#256)

* add customizable evaluation dimensions

* add docs

* fix mypy error & refactor examples

* add docs for evaluation dimensions

* update docs and examples

* add test cases and fix mypy issue

* fix mypy issue

* Fix test_create_custom_dimension to use CustomEvaluationDimension.get(pk) (#262)

Co-authored-by: openhands <[email protected]>

* Fix/custom eval dimension test (#263)

* Fix test_create_custom_dimension to use CustomEvaluationDimension.get(pk)

* Update documentation for SotopiaDimension and EvaluationDimensionBuilder

* [autofix.ci] apply automated fixes

* Add API documentation for evaluation dimensions

* Refine API documentation for evaluation_dimensions.py to match style

* [autofix.ci] apply automated fixes

---------

Co-authored-by: openhands <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* add doc

---------

Co-authored-by: XuhuiZhou <[email protected]>
Co-authored-by: openhands <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* Feat/additional fast APIs for non-streaming simulation and managing relationships (#265)

* temp run

* add relationship api

* fix mypy error

* update relationship api

* simulate episode non-streaming

* modify sim episodes

* add simulation status

* task error

* add background task

* [autofix.ci] apply automated fixes

* back to arun one episode

* upload the code

* use rq to execute background tasks

* temp sol

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* fix ci error

* solving pytests

* improve the tests

* add custom eval fast api (#268)

* fix mypy error

* aact moderator (#257)

* initial framework

* initial conv

* fix module error

* feat: Add 3 new features to Moderator (#266)

* feat: introduce booting procedure, saving, and ending chat to moderator

* fix: moderator now ignores `none` AgentActions; Observations no longer need to include all channels in the mapping

* merge changes of example into the original one

* fix: 1. save() method now accepts push_to_db config 2. booting()'s waiting time is changed to 0.1 sec

* fix: rewrite booting() so that different agents receive different background information

* fix: moderator now inherits from Node directly, instead of from BaseAgent

---------

Co-authored-by: JXZhou <JXZhou>

* add save condition for moderator

* push to db false

* to fully stop

* stopping all agents

* fix mypy

* fix mypy error

---------

Co-authored-by: JXZhou <[email protected]>

* Deploy the api to modal (#267)

* prototype for modal serving

* add openai secret

* fix type annotation

* add doc

* bug fix for simulation api

* add customize model, evaluator model and evaluation dimensions

* Implement modal API server with Redis integration and FastAPI setup

- Added a new script for the modal API server that initializes a Redis instance.
- Created a persistent volume for Redis data and included a function to download initial data if not present.
- Configured a Docker image with necessary dependencies including Redis Stack and FastAPI.
- Implemented a web API class that sets up and cleans up the Redis connection, ensuring readiness before serving requests.
- Integrated the SotopiaFastAPI application within the modal framework.

---------

Co-authored-by: XuhuiZhou <[email protected]>

* Feature/sotopia demo UI (#261)

* initial

* initial ui

* merge main

* add new ui

* switch to fastAPI

* websocket check

* fix render episode error

* add page; make a simplified page (still WIP)

* [autofix.ci] apply automated fixes

* fix simplified streaming version

* semi-done character page + avatar assets

* Fixed character card styling

* [autofix.ci] apply automated fixes

* unified rendering and chat display

* updated chat character icons

* add some tags

* add typing

* temp fix

* add characters avatar to simulation

* fix episode full avatar

* go to modal config

* clean up code

* add modal streamlit app

* clean codebase except websocket

* remove repeated local css

* clean websocket

* fix get name error

* fix errors

* pre render scenario

* add custom eval

* change streamlit to dynamic path

* new uv

* revert to previous install commands

* a fix for modal

* add customized dimension

* [autofix.ci] apply automated fixes

* sort scenarios in simulation

* for demo video

* update deploy instruction

* update intro page

* update intro page

* [autofix.ci] apply automated fixes

* update intro page

* add customized dimensions

* update api link and modal environment

* move folder

* fix relative import

* update modal image build

* use uv to build environment

* change folder name

* change test

* fix modal serve

* environment change

* refactor

* fix ui

---------

Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: astrophie <[email protected]>

* remove dev tag

* add custom eval

* base dimension

* fix ui mypy

* fix mypy

* add delete dimension

* update streamlit ui

* ignores the ui directory

* Committing changes before push

* pytest for eval dimension

* fix mypy

* clean up comments

* run non streaming simulation

* add pytest for websocket

* fix mypy issue

---------

Co-authored-by: Hao Zhu <[email protected]>
Co-authored-by: Zhe Su <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: openhands <[email protected]>
Co-authored-by: JXZhou <[email protected]>
Co-authored-by: astrophie <[email protected]>
7 people authored Jan 8, 2025
1 parent 61f190e commit d7724db
Showing 59 changed files with 5,844 additions and 1,171 deletions.
1 change: 1 addition & 0 deletions .github/.codecov.yml
@@ -6,6 +6,7 @@ ignore:
- ".github" # ignore the .github directory
- "docs" # ignore the tests directory
- "figs" # ignore the figs directory
- "ui" # ignore the ui directory

coverage:
status:
116 changes: 116 additions & 0 deletions docs/pages/concepts/evaluation_dimension.md
@@ -0,0 +1,116 @@
## Overview

Evaluation dimensions are used to evaluate the quality of social interactions.
The original Sotopia paper evaluates social interactions along 7 dimensions, which we collectively call the `sotopia` evaluation dimensions:
- believability
- relationship
- knowledge
- secret
- social rules
- financial and material benefits
- goal

`SotopiaDimensions` can be used directly without initializing the database. It provides a set of predefined evaluation dimensions that are ready to use for evaluating social interactions. For example:

```python
from sotopia.envs.parallel import ParallelSotopiaEnv
from sotopia.envs.evaluators import EvaluationForTwoAgents, ReachGoalLLMEvaluator, RuleBasedTerminatedEvaluator, SotopiaDimensions

env = ParallelSotopiaEnv(
    env_profile=env_profile,
    model_name=model_names["env"],
    action_order="round-robin",
    evaluators=[
        RuleBasedTerminatedEvaluator(max_turn_number=20, max_stale_turn=2),
    ],
    terminal_evaluators=[
        ReachGoalLLMEvaluator(
            model_names["env"],
            EvaluationForTwoAgents[SotopiaDimensions],  # type: ignore
            # TODO check how to do type annotation
        ),
    ],
)
```


However, we observe that in many use cases people may want to evaluate with customized metrics, so we provide a way to build custom evaluation dimensions.
For a quick reference, check out `examples/use_custom_dimensions.py`.

### CustomEvaluationDimension
The [`CustomEvaluationDimension`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension.
There are four parameters:
- name: the name of the dimension
- description: the description of the dimension
- range_low: the minimum score of the dimension (should be an integer)
- range_high: the maximum score of the dimension (should be an integer)
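
For instance, defining and saving a dimension might look like this (a minimal sketch; the import path follows how the other database classes are exposed via `sotopia.database`, and the `save()` call assumes the usual Redis OM persistence pattern, so a running Redis instance is required):

```python
from sotopia.database import CustomEvaluationDimension

# A hypothetical "transactivity" dimension scored from 0 to 10.
transactivity = CustomEvaluationDimension(
    name="transactivity",
    description="Whether the response builds on the other party's prior reasoning",
    range_low=0,
    range_high=10,
)
transactivity.save()  # assumed Redis OM-style persistence
```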

### CustomEvaluationDimensionList
The [`CustomEvaluationDimensionList`](/python_API/database/evaluation_dimensions) is a class that can be used to create a custom evaluation dimension list based on existing dimensions. It lets you group multiple dimensions together for a specific use case.
There are two parameters:
- name: the name of the dimension list
- dimension_pks: the primary keys of the dimensions in the dimension list
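
Continuing the sketch above, grouping saved dimensions by primary key might look like this (the `pk` attribute is assumed from the Redis OM pattern implied by the `CustomEvaluationDimension.get(pk)` usage in the tests):

```python
from sotopia.database import CustomEvaluationDimensionList

# Hypothetical: group previously saved dimensions under one list name.
collaboration_metrics = CustomEvaluationDimensionList(
    name="collaboration_metrics",
    dimension_pks=[transactivity.pk],  # primary keys of saved CustomEvaluationDimension entries
)
collaboration_metrics.save()
```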

### EvaluationDimensionBuilder
The [`EvaluationDimensionBuilder`](/python_API/database/evaluation_dimensions) is a class that can be used to generate a custom evaluation dimension model based on the existing dimensions.


## Usage
### Initialize the database
The default evaluation metric is still `SotopiaDimensions` in `sotopia.envs.evaluators`. There is no `CustomEvaluationDimension` in the database by default. To initialize the database, please refer to `examples/use_custom_dimensions.py`.


### Use the custom evaluation dimensions
After you have initialized your customized evaluation dimensions, you can use any one of the methods below:

#### Method 1: Choose dimensions by names
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_name(
        ["transactivity", "verbal_equity"]
    )
)
```

#### Method 2: Directly choose the grouped evaluation dimension list
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
        "sotopia"
    )
)
```

#### Method 3: Build a custom evaluation dimension model temporarily
We provide multiple ways to build a custom evaluation dimension model with `EvaluationDimensionBuilder`, specifically:
- `generate_dimension_model`: build an evaluation dimension from existing dimension primary keys.
- `generate_dimension_model_from_dict`: build an evaluation dimension model from a list of dictionaries, each specifying the parameters of a `CustomEvaluationDimension` (a usage sketch follows this list). For example:
```json
[
    {
        "name": "believability",
        "description": "The believability of the interaction",
        "range_low": 0,
        "range_high": 10
    },
    ...
]
```
- `select_existing_dimension_model_by_name`: build an evaluation dimension from existing dimension names. For example `['believability', 'goal']`
- `select_existing_dimension_model_by_list_name`: build an evaluation dimension from existing `CustomEvaluationDimensionList` list names. For example, directly use `sotopia`.
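
A usage sketch of the dictionary-based route (this page names the method `generate_dimension_model_from_dict`, while the API reference documents it as `build_dimension_model_from_dict`; check your installed version for the exact name):

```python
from sotopia.database import EvaluationDimensionBuilder

# Build a temporary dimension model without persisting anything to the database.
dimension_model = EvaluationDimensionBuilder.generate_dimension_model_from_dict(
    dimensions=[
        {
            "name": "believability",
            "description": "The believability of the interaction",
            "range_low": 0,
            "range_high": 10,
        },
    ]
)
```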


After you have the evaluation dimension model, you can pass it as a parameter to the evaluator, for example:
```python
evaluation_dimensions = (
    EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
        "sotopia"
    )
)
terminal_evaluators=[
    ReachGoalLLMEvaluator(
        model_names["env"],
        EvaluationForTwoAgents[evaluation_dimensions],  # type: ignore
    ),
],
```
2 changes: 1 addition & 1 deletion docs/pages/contribution/contribution.md
@@ -133,7 +133,7 @@ Please refer to [Dev Containers](https://containers.dev/supporting#editors) to s

You can also set up the development environment without Dev Containers. There are three things you will need to set up manually:

- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extra`.
- Python and uv: Please start from an environment supporting Python 3.10+ and install uv using `pip install uv; uv sync --all-extras`. (Note that this will install all the extra dependencies)
- Redis: Please refer to introduction page for the set up of Redis.
- Local LLM (optional): If you don't have access to model endpoints (e.g. OpenAI, Anthropic or others), you can use a local model. You can use Ollama, Llama.cpp, vLLM or many others which support OpenAI compatible endpoints.

6 changes: 6 additions & 0 deletions docs/pages/examples/deployment.md
@@ -0,0 +1,6 @@
# Deploy Sotopia Python API to Modal
We offer a script to deploy the Sotopia Python API to [Modal](https://modal.com/).
To do so, go to the root of the `sotopia` repository and run the following command:
```bash
modal deploy sotopia/ui/modal_api_server.py
```
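
If you have not used Modal before, you will need the Modal CLI and an authenticated token first (a sketch of the standard Modal setup flow; adjust to your account):

```bash
pip install modal   # install the Modal client/CLI
modal setup         # authenticate this machine with your Modal account
```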
54 changes: 54 additions & 0 deletions docs/pages/python_API/database/evaluation_dimensions.md
@@ -0,0 +1,54 @@
# `evaluation_dimensions.py`

This module provides classes and utilities for defining and managing custom evaluation dimensions within the Sotopia environment. It includes classes for individual dimensions, lists of dimensions, and a builder for creating dimension models.

## Classes

### `CustomEvaluationDimension`

Represents a custom evaluation dimension with specific attributes such as name, description, and score range.

#### Attributes
- `name`: `str`. The name of the dimension.
- `description`: `str`. A brief description of the dimension.
- `range_low`: `int`. The minimum score for the dimension.
- `range_high`: `int`. The maximum score for the dimension.

### `CustomEvaluationDimensionList`

Groups multiple custom evaluation dimensions together.

#### Attributes
- `name`: `str`. The name of the dimension list.
- `dimension_pks`: `list[str]`. A list of primary keys for the dimensions included in the list.

### `EvaluationDimensionBuilder`

Provides utility methods to create and manage evaluation dimension models.

#### Methods
- `create_range_validator(low: int, high: int)`: Creates a validator for score ranges.

**Arguments:**
- `low`: `int`. The minimum score allowed.
- `high`: `int`. The maximum score allowed.

- `build_dimension_model(dimension_ids: list[str])`: Builds a dimension model from primary keys.

**Arguments:**
- `dimension_ids`: `list[str]`. A list of dimension primary keys.

- `build_dimension_model_from_dict(dimensions: list[dict[str, Union[str, int]]])`: Builds a dimension model from a list of dictionaries.

**Arguments:**
- `dimensions`: `list[dict[str, Union[str, int]]]`. A list of dictionaries specifying dimension attributes.

- `select_existing_dimension_model_by_name(dimension_names: list[str])`: Selects a dimension model by dimension names.

**Arguments:**
- `dimension_names`: `list[str]`. A list of dimension names.

- `select_existing_dimension_model_by_list_name(list_name: str)`: Selects a dimension model by list name.

**Arguments:**
- `list_name`: `str`. The name of the dimension list.
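
A minimal usage sketch tying these together (assuming the methods are called directly on `EvaluationDimensionBuilder` as documented above, and that dimensions named `believability` and `goal` already exist in the database):

```python
from sotopia.database import EvaluationDimensionBuilder

# Select a dimension model built from dimensions already stored in the database.
model = EvaluationDimensionBuilder.select_existing_dimension_model_by_name(
    ["believability", "goal"]
)

# Or select by the name of a stored CustomEvaluationDimensionList.
model = EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name("sotopia")
```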
17 changes: 16 additions & 1 deletion examples/experiment_eval.py
@@ -17,6 +17,7 @@
    EnvAgentComboStorage,
    EnvironmentProfile,
    EpisodeLog,
    EvaluationDimensionBuilder,
)
from sotopia.envs.evaluators import (
    EvaluationForTwoAgents,
@@ -34,6 +35,7 @@
)
from sotopia.server import run_async_server
from sotopia_conf.gin_utils import parse_gin_flags, run
# from sotopia.database import EvaluationDimensionBuilder

_DEFAULT_GIN_SEARCH_PATHS = [
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -109,6 +111,18 @@ def _iterate_env_agent_combo_not_in_db(
    tag: str | None = None,
) -> Generator[EnvAgentCombo[Observation, AgentAction], None, None]:
    """We iterate over each environment and return the **first** env-agent combo that is not in the database."""
    # loading evaluation metric
    try:
        evaluation_dimensions = EvaluationDimensionBuilder.select_existing_dimension_model_by_list_name(
            "sotopia"
        )  # Initialize your customized dimension, please refer to `examples/use_custom_dimensions.py`
    except Exception as e:
        print(
            "No customized evaluation dimensions found, using default SotopiaDimensions",
            e,
        )
        evaluation_dimensions = SotopiaDimensions

    if not env_ids:
        env_ids = list(EnvironmentProfile.all_pks())
    for env_id in env_ids:
@@ -152,7 +166,8 @@ def _iterate_env_agent_combo_not_in_db(
        terminal_evaluators=[
            ReachGoalLLMEvaluator(
                model_names["env"],
                EvaluationForTwoAgents[SotopiaDimensions],
                EvaluationForTwoAgents[evaluation_dimensions],  # type: ignore
                # TODO check how to do type annotation
            ),
        ],
    )
2 changes: 2 additions & 0 deletions examples/experimental/nodes/initial_message_node.py
@@ -18,6 +18,7 @@ def __init__(
        input_tick_channel: str,
        output_channels: list[str],
        env_scenario: str,
        node_name: str,
        redis_url: str = "redis://localhost:6379/0",
    ):
        super().__init__(
@@ -26,6 +27,7 @@
                (output_channel, Text) for output_channel in output_channels
            ],
            redis_url=redis_url,
            node_name=node_name,
        )
        self.env_scenario = env_scenario
        self.output_channels = output_channels
113 changes: 113 additions & 0 deletions examples/experimental/sotopia_original_replica/llm_agent_sotopia.py
@@ -0,0 +1,113 @@
import logging
import sys
from rich.logging import RichHandler

from aact import NodeFactory

from sotopia.experimental.agents.base_agent import BaseAgent
from sotopia.experimental.agents.datamodels import Observation, AgentAction

from sotopia.generation_utils import agenerate
from sotopia.generation_utils.generate import StrOutputParser

# Check Python version
if sys.version_info >= (3, 11):
    pass
else:
    pass

# Configure logging
FORMAT = "%(asctime)s - %(levelname)s - %(name)s - %(message)s"
logging.basicConfig(
    level=logging.WARNING,
    format=FORMAT,
    datefmt="[%X]",
    handlers=[RichHandler()],
)


@NodeFactory.register("llm_agent")
class LLMAgent(BaseAgent[Observation, AgentAction]):
    def __init__(
        self,
        input_channels: list[str],
        output_channel: str,
        query_interval: int,
        agent_name: str,
        node_name: str,
        goal: str,
        model_name: str,
        redis_url: str,
    ):
        super().__init__(
            [(input_channel, Observation) for input_channel in input_channels],
            [(output_channel, AgentAction)],
            redis_url,
            node_name,
        )
        self.output_channel = output_channel
        self.query_interval = query_interval
        self.count_ticks = 0
        self.message_history: list[Observation] = []
        self.name = agent_name
        self.model_name = model_name
        self.goal = goal

    def _format_message_history(self, message_history: list[Observation]) -> str:
        ## TODO: akhatua Fix the mapping of action to be grammatically correct
        return "\n".join(message.to_natural_language() for message in message_history)

    async def aact(self, obs: Observation) -> AgentAction:
        if obs.turn_number == -1:
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="none",
                argument=self.model_name,
            )

        self.message_history.append(obs)

        if len(obs.available_actions) == 1 and "none" in obs.available_actions:
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="none",
                argument="",
            )
        elif len(obs.available_actions) == 1 and "leave" in obs.available_actions:
            self.shutdown_event.set()
            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="leave",
                argument="",
            )
        else:
            history = self._format_message_history(self.message_history)
            action: str = await agenerate(
                model_name=self.model_name,
                template="Imagine that you are a friend of the other person. Here is the "
                "conversation between you and them.\n"
                "You are {agent_name} in the conversation.\n"
                "{message_history}\n"
                "and you plan to {goal}.\n"
                "You can choose to interrupt the other person "
                "by saying something, or not to interrupt by outputting nothing. What would you say? "
                "Please output only a single sentence, or output nothing at all."
                "{format_instructions}",
                input_values={
                    "message_history": history,
                    "goal": self.goal,
                    "agent_name": self.name,
                },
                temperature=0.7,
                output_parser=StrOutputParser(),
            )

            return AgentAction(
                agent_name=self.name,
                output_channel=self.output_channel,
                action_type="speak",
                argument=action,
            )
