Exploring the best solution for generating custom plots: export get_data() or similar from campaign to manually plot, modify, or rearrange data after campaign execution #99

Open
acasadevall opened this issue Aug 26, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request question Further information is requested

Comments

@acasadevall
Collaborator

Overall, I see that the campaign class does not have an easy way to export the generated data once the campaign has finished running.

Use case:
I want to plot a custom graph which requires rearranging my data, adding other columns, etc., and then adding specific plotting features.

Issue:
The campaign currently has campaign.generate_graph(...), which offers a straightforward way to generate graphs based on x, y, and hue params (seaborn/pandas style). This might be enough for simple cases, but more customized graphs require adding pre/post callbacks. Examples: composing several graphs, or handling FacetGrid vs. non-FacetGrid plots.

Possible solutions

  • (manually) Adding something like campaign.get_data() to get raw generated data in DataFrame (pandas) form. Example:
    import seaborn as sns

    # proposed API (does not exist yet): raw data plus the output path generated by benchkit
    output_data, gen_path = campaign.get_data() # <-- here we can also add data_frame callback similarly to current generate_graph approach
    processed_output = output_data  # rearrange data, add columns, etc.
    # adding custom plot
    g = sns.catplot(data=processed_output, kind='bar', x='..', y='..', hue='..', palette='..', ...)
    ax = g.fig.get_axes()[0]
    ax.set_title("Title")
    g.set(ylabel="...", xlabel="...")
    ax.set_yscale('log')
    # saving using output path generated by benchkit/campaign
    for ext in ("png", "pdf"):
        g.fig.savefig(f"{gen_path}.{ext}", transparent=False)
        print(f'[INFO] Saving campaign figure in "{gen_path}.{ext}"')

-- PROS: post-processing can be done within the campaign
-- CONS: mixes responsibilities. The current campaign class already depends on Seaborn/Pandas for generating graphs. Maybe campaign.get_data() should return only CSV data rather than a pandas object.

  • (add more complexity into campaign.generate_graph) Adding more callbacks (pre/post) to add specific calls to the pipeline:
    campaign.generate_graph(
        plot_name="catplot",
        kind="bar",
        orient='v',
        x="...",
        y="...",
        hue="...",
        palette="...",
        ...,
        process_dataframe=df_callback,
        graph_callback=post_graph_callback,  # <-- proposed new callback
    )

-- PROS: callbacks are already used in benchkit, so no new methods are needed
-- CONS: adding more callbacks means adding more complexity. We cannot keep generating wrappers of wrappers to support custom plots. Generating graphs with campaign.generate_graph should not be more complex than doing it the standard Seaborn/Matplotlib way.

  • (out of benchkit) Do the post-processing afterwards on the generated csv/json files. This seems to be a fair solution, but someone could argue that it would be good to have a single pipeline within benchkit.
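
To make the third option concrete, here is a minimal sketch of post-processing a results file outside benchkit, using only the standard library. The CSV content and the column names (lock, nb_threads, throughput) are made up for illustration and are not benchkit's actual output schema; the helper names are hypothetical as well.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for a results file written by a campaign run.
# The columns are illustrative only, not benchkit's real schema.
RESULTS_CSV = """\
lock,nb_threads,throughput
mutex,1,100.0
mutex,2,180.0
spinlock,1,120.0
spinlock,2,150.0
"""


def load_rows(handle):
    """Read CSV records and convert the numeric columns."""
    return [
        {
            "lock": record["lock"],
            "nb_threads": int(record["nb_threads"]),
            "throughput": float(record["throughput"]),
        }
        for record in csv.DictReader(handle)
    ]


def add_speedup(rows):
    """Add a derived column: throughput relative to the 1-thread run of the same lock."""
    baseline = {r["lock"]: r["throughput"] for r in rows if r["nb_threads"] == 1}
    for r in rows:
        r["speedup"] = r["throughput"] / baseline[r["lock"]]
    return rows


def group_by(rows, key):
    """Rearrange the flat records into one list per value of `key`."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r[key]].append(r)
    return dict(grouped)


rows = add_speedup(load_rows(io.StringIO(RESULTS_CSV)))
per_lock = group_by(rows, "lock")
```

The rearranged records can then be handed to any plotting library. The drawback noted above still applies: this step lives outside the campaign, so there are two pipelines to maintain.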
@acasadevall acasadevall added enhancement New feature or request question Further information is requested labels Aug 26, 2024
@open-s4c open-s4c deleted a comment Aug 26, 2024
@apaolillo
Collaborator

I think returning the pandas DataFrame is a reasonable request. I can add that and the rest remains valid.
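
For discussion, a rough sketch of how such an accessor might keep pandas optional, which would also address the "mix of responsibilities" concern raised above. Everything here is an assumption: CampaignResults, get_data() and get_dataframe() are hypothetical names, not benchkit's existing API, and the sketch assumes the campaign results are available as a plain CSV file.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CampaignResults:
    """Hypothetical holder for what a finished campaign could expose."""

    csv_path: Path

    def get_data(self):
        """Return (rows, base output path) without requiring pandas."""
        with self.csv_path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
        # Base path without extension, so callers can append ".png", ".pdf", ...
        return rows, self.csv_path.with_suffix("")

    def get_dataframe(self):
        """Same data as a pandas DataFrame; pandas is imported only on demand."""
        import pandas as pd  # lazy import keeps the dependency optional

        return pd.read_csv(self.csv_path), self.csv_path.with_suffix("")


# Usage with a throwaway file standing in for real campaign output.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "benchmark_results.csv"
    path.write_text("x,y\n1,10\n2,20\n")
    rows, base = CampaignResults(path).get_data()
    base_name = base.name
```

Note that get_data() returns every value as a string, since csv.DictReader does no type conversion; get_dataframe() leaves type inference to pandas.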

@apaolillo apaolillo self-assigned this Aug 26, 2024