Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored reward calculation methods #268

Open
wants to merge 3 commits into
base: refactor-validator
Choose a base branch
from

Conversation

Sarkosos
Copy link
Contributor

@Sarkosos Sarkosos commented Nov 24, 2024

Refactored rewards so that we have a base reward class so that we can implement the rewarding for different tasks in the future

extra_info=batch_rewards_output.extra_info,
)

async def calculate_final_reward(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your flow here, but I think it would make more sense to have class OrganicReward(BaseReward) so you don't have to have any of this baked in logic for checking if a set of data comes from a specific source. It would give us more flexibility, but this is a good start

if job.event["is_organic"]:
organic_multiplier = 10.0

return rewards * priority_multiplier * organic_multiplier
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels like different reward pipelines should come with formulaic constructions of what their reward should be based on priority and if it's organic or not. Not sure, but this is what my gut is saying. In the case of different challenges, this delineation is much more clear, but when it's just is/is_not organics, I guess it's more difficult.

Is there any merit in doing:

class FoldingReward(BaseReward): ...
class SyntheticFoldingReward(FoldingReward)
class OrganicFoldingReward(FoldingReward) 

This way, you can easily call what you need and you don't need to be restricted to looking for tags in the job event, in case things change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and you can just construct the correct pipeline based on the entry point of the query

async def calculate_final_reward(
self, rewards: torch.Tensor, job: Job
) -> torch.Tensor:
# priority_multiplier = 1 + (job.priority - 1) * 0.1 TODO: Implement priority
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotta implement this once the global job pool is on the horizon

organic_multiplier = 1.0
if "is_organic" in job.event.keys():
if job.event["is_organic"]:
organic_multiplier = 10.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this a parameter set in the config, same for the priority multiplier

@mccrindlebrian mccrindlebrian changed the base branch from staging to refactor-validator January 27, 2025 09:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants