Exploring best solution when generating custom plots: export `get_data()` or similar from campaign to manually plot, modify, or rearrange data after campaign execution #99

acasadevall · 2024-08-26T14:04:29Z

Overall, I see that the campaign class does not offer an easy way to export the generated data once the campaign has finished running.

Use case:
I want to plot a custom graph which requires rearranging my data, adding other columns, etc., and then adding specific plotting features.

Issue:
The campaign currently has campaign.generate_graph(...), which offers a straightforward way to generate graphs based on x, y, and hue params (seaborn/pandas style). This might be enough in simple cases, but more customized graphs require adding pre/post callbacks. Examples: composition of graphs, FacetGrid vs. non-FacetGrid plots.

Possible solutions

(manually) Adding something like campaign.get_data() to get raw generated data in DataFrame (pandas) form. Example:

    import seaborn as sns

    # get_data() is the proposed method; a data_frame callback could also be
    # accepted here, similarly to the current generate_graph approach
    output_data, gen_path = campaign.get_data()
    processed_output = output_data  # rearrange, add columns, etc. as needed
    # adding custom plot
    g = sns.catplot(data=processed_output, kind='bar', x='..', y='..', hue='..', palette='..')  # ...
    g.fig.get_axes()[0].set_title("Title")
    g.set(ylabel="...", xlabel="...")
    g.fig.get_axes()[0].set_yscale('log')
    # saving using output path generated by benchkit/campaign
    g.fig.savefig(f"{gen_path}.png", transparent=False)
    print(f'[INFO] Saving campaign figure in "{gen_path}.png"')
    g.fig.savefig(f"{gen_path}.pdf", transparent=False)
    print(f'[INFO] Saving campaign figure in "{gen_path}.pdf"')

-- PROS: allows post-processing within the campaign
-- CONS: mix of responsibilities. The current campaign class already depends on Seaborn/Pandas for graph generation. Maybe campaign.get_data() should return only CSV data rather than a pandas DataFrame.

(add more complexity into campaign.generate_graph) Adding more callbacks (pre/post) to add specific calls to the pipeline:

    campaign.generate_graph(
        plot_name="catplot",
        kind="bar",
        orient='v',
        x="...",
        y="...",
        hue="...",
        palette="...",
        ...,
        process_dataframe=df_callback,
        graph_callback=post_graph_callback,  # proposed new callback
    )

-- PROS: callbacks are already used in benchkit, so no new methods are needed
-- CONS: adding more callbacks means adding more complexity. We cannot keep generating wrappers of wrappers to support custom plots. Generating graphs with campaign.generate_graph should not be more complex than the standard Seaborn/Matplotlib way.
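
To make the trade-off concrete, here is a minimal sketch of how a pre/post callback pipeline might be wired. Everything in it is hypothetical: `generate_graph_sketch`, `plot_fn`, and the toy data are illustration only and are not benchkit's implementation; only the parameter names `process_dataframe` and `graph_callback` are taken from the call above.

```python
from typing import Any, Callable, Optional


def generate_graph_sketch(
    data: Any,
    plot_fn: Callable[[Any], Any],
    process_dataframe: Optional[Callable[[Any], Any]] = None,
    graph_callback: Optional[Callable[[Any], None]] = None,
) -> Any:
    """Hypothetical pipeline: pre-process the data, plot, then post-process the graph."""
    if process_dataframe is not None:
        # Pre callback: returns the rearranged data to plot.
        data = process_dataframe(data)
    graph = plot_fn(data)
    if graph_callback is not None:
        # Post callback: customizes the graph object in place.
        graph_callback(graph)
    return graph


# Toy stand-ins: a list instead of a DataFrame, a dict instead of a figure.
graph = generate_graph_sketch(
    data=[3, 1, 2],
    plot_fn=lambda d: {"points": d, "title": None},
    process_dataframe=sorted,
    graph_callback=lambda g: g.update(title="Title"),
)
print(graph)
```

Each extra customization point becomes another parameter and another callback signature to document, which is the complexity concern raised in the CONS.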

(out of benchkit) Do the post-processing afterwards on the csv/json files that are generated. This seems to be a fair solution, but one could argue it would be good to have a single pipeline within benchkit itself.
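
As a rough sketch of this third option, the post-processing could be a standalone script using only the standard library. The file layout assumed here (`;` as delimiter, metadata lines starting with `#`) and the column names `nb_threads` and `throughput` are assumptions made for illustration, not a description of what benchkit actually writes.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def read_results(csv_text, delimiter=";"):
    """Parse CSV text into row dicts, skipping blank and '#'-prefixed lines."""
    lines = [ln for ln in csv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter))


def mean_by(rows, key, value):
    """Average the numeric column `value` for each distinct value of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(float(row[value]))
    return {k: mean(v) for k, v in groups.items()}


# Hypothetical sample standing in for a generated results file.
sample = """# hypothetical metadata line
nb_threads;throughput
1;100
1;110
2;180
2;200
"""
summary = mean_by(read_results(sample), key="nb_threads", value="throughput")
print(summary)
```

The resulting rows or summary can then be handed to any plotting library, at the cost of maintaining this step outside the campaign script.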

apaolillo · 2024-08-26T14:25:37Z

I think returning the pandas DataFrame is a reasonable request. I can add that and the rest remains valid.

acasadevall added the enhancement (New feature or request) and question (Further information is requested) labels Aug 26, 2024

open-s4c deleted a comment Aug 26, 2024

apaolillo self-assigned this Aug 26, 2024
