Improving bulk metadata issues #4259

Open
nschneid opened this issue Dec 29, 2024 · 4 comments

@nschneid
Contributor

Showing the paper URL and snapshot is fantastic, but it would also be helpful to know how the metadata actually changed to verify that the change is worth accepting.

Brainstorming how this might work:

  1. If the title has changed, list both old and new titles, highlighting the differences.
  2. If the author list has changed:
    2.1 Render the author list as a string by separating authors with commas, and for any name containing spaces in the first and/or last part, use 3 spaces rather than 1 to separate the two parts so the boundary is unambiguous.
    2.2 If any author names have changed, including authors being added or removed, print both the old and new author lists, highlighting the diff. If only the order has changed and the names themselves have not, list the change as "Corrected author order:" followed by the updated string. That is easy to check against the PDF snapshot.
    2.3 If author affiliations have changed, do we want to review them? They are not actually exposed in the site at all.
  3. If the abstract has changed: printing both old and new abstracts might be too much clutter. Is there a compact way to show only their diff?
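
A rough sketch of steps 2.1 and 2.2, to make the idea concrete. It assumes each author arrives as a `(first, last)` tuple, which may not match how the bot actually stores names, and the function names are made up for illustration. Highlighting the differences between the old and new lists is left out.

```python
from collections import Counter

def render_name(first: str, last: str) -> str:
    """Join the two name parts, using 3 spaces if either part contains a space."""
    sep = "   " if (" " in first or " " in last) else " "
    # strip() covers mononyms, where one of the two parts is empty
    return f"{first}{sep}{last}".strip()

def render_authors(authors: list[tuple[str, str]]) -> str:
    """Render the whole author list as one comma-separated string."""
    return ", ".join(render_name(first, last) for first, last in authors)

def describe_author_change(old, new):
    """Return a short message describing the change, or None if nothing changed."""
    if old == new:
        return None
    # Same multiset of names in a different sequence: an order-only change
    if Counter(old) == Counter(new):
        return f"Corrected author order: {render_authors(new)}"
    # Names were edited, added, or removed: show both lists
    return (
        f"Old authors: {render_authors(old)}\n"
        f"New authors: {render_authors(new)}"
    )

# Hypothetical example data
old = [("Ada", "Example"), ("Jean Luc", "Sample")]
new = [("Jean Luc", "Sample"), ("Ada", "Example")]
print(describe_author_change(old, new))
```

Comparing with `Counter` rather than `set` keeps the check correct even if the same name appears twice in a list.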
@mbollmann
Member

I would know how to do this in a Python script, but not necessarily in a GitHub Action... but I also agree that more context would be helpful.

@nschneid
Contributor Author

Actually, could we take advantage of the built-in comment edit feature?
image

If the bot were to first create the issue with the original JSON, could it then immediately edit it to substitute the new values, exposing the diff in the edit history?

@mjpost
Member

mjpost commented Jan 2, 2025

I propose adding an `old` key that holds the previous values of any changed items, which would allow the above to be done using a token-level diff. How did you cross out the above?
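
Here is a minimal sketch of what the token-level diff could look like, using `difflib` from the Python standard library. The record layout (changed fields at the top level, their previous values under `old`) is only my guess at the shape, and `token_diff` / `diff_changed_fields` are hypothetical names. Removed tokens come out as Markdown strikethrough and added tokens as bold.

```python
import difflib

def token_diff(old: str, new: str) -> str:
    """Whitespace-token diff: ~~removed~~ and **added** tokens, the rest unchanged."""
    old_tokens, new_tokens = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_tokens[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("~~" + " ".join(old_tokens[i1:i2]) + "~~")
        if tag in ("insert", "replace"):
            out.append("**" + " ".join(new_tokens[j1:j2]) + "**")
    return " ".join(out)

def diff_changed_fields(record: dict) -> dict:
    """Diff every field listed under the assumed `old` key against its new value."""
    old_values = record.get("old", {})
    return {
        key: token_diff(str(old_value), str(record.get(key, "")))
        for key, old_value in old_values.items()
    }

# Hypothetical record; the real JSON may be shaped differently
record = {
    "title": "A Study of Neural Parsing",
    "old": {"title": "A Study of Parsing"},
}
print(diff_changed_fields(record)["title"])
```

Because only the changed spans are marked up, this would also keep long abstracts compact, since unchanged text is not printed twice.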

@mjpost added this to the 2025Q1 milestone Jan 2, 2025
@nschneid
Contributor Author

nschneid commented Jan 2, 2025

That was from viewing the edit history of a comment.

@mjpost pinned this issue Jan 2, 2025