[Feature request] Ability to define the timestamp part for generation functions #150

Open
attevaltojarvi opened this issue Apr 9, 2024 · 4 comments

Comments

@attevaltojarvi

Hi, and thanks for this package!

I'm proposing an update for the uuid6, uuid7 and uuid8 functions, where you could optionally specify the timestamp that gets used when generating the UUID value. For example for the uuid7 function:

def uuid7(timestamp_ms: Optional[int] = None) -> UUID:  # Optional is from typing
    global _last_v7_timestamp

    if timestamp_ms is None:
        nanoseconds = time.time_ns()
        timestamp_ms = nanoseconds // 10**6
    # (rest of function)

I haven't checked whether the spec allows this, but I think it would be really useful when you need to generate UUIDs for historical data and the records' creation timestamps are available:

# Django model example

for obj in Model.objects.iterator():
    timestamp = calendar.timegm(obj.created_at.utctimetuple())
    timestamp_ms = timestamp * 10**3
    obj.new_id = uuid7(timestamp_ms)
    obj.save()

This would let a system create new records from the current timestamp while a data migration assigns UUIDs to historical records from their creation timestamps, so that all records stay sortable by the UUID's timestamp part.

Thanks in advance!
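
For reference, here is a minimal sketch of how a caller-supplied timestamp could feed the version 7 layout described in RFC 9562 (48-bit Unix millisecond timestamp, 4-bit version, 2-bit variant, all remaining bits random). The name uuid7_at is hypothetical and not part of this package, and the sketch leaves out whatever bookkeeping the package does with _last_v7_timestamp, so two calls with the same millisecond are not guaranteed to sort in call order.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_at(timestamp_ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical helper: build a version 7 UUID for a Unix time in milliseconds."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 10**6
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp_ms must fit in 48 bits")

    # Timestamp in the top 48 bits, 80 random bits below it.
    rand = int.from_bytes(os.urandom(10), "big")
    value = (timestamp_ms << 80) | rand

    # Overwrite the version field (bits 76-79) with 7.
    value &= ~(0xF << 76)
    value |= 0x7 << 76
    # Overwrite the variant field (bits 62-63) with binary 10.
    value &= ~(0x3 << 62)
    value |= 0x2 << 62
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, uuid7_at(1000) < uuid7_at(2000) holds regardless of the random part, which is the sortability the migration relies on.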

@oittaa
Owner

oittaa commented Jul 10, 2024

Sorry, I hadn't checked GitHub in a while. While these options sound like a nice idea, I'm a bit worried that people would misuse these functions. v6 has a weird offset (100-nanosecond intervals counted from 1582-10-15), v7 uses milliseconds since the Unix epoch, v8 nanoseconds... Does anyone have suggestions for how to reasonably avoid disasters like mixing nanoseconds and milliseconds?
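
To make the unit differences concrete, here is a small sketch that derives all three timestamp values from one timezone-aware datetime. The helper names are hypothetical, and the v8 value simply follows the nanosecond convention mentioned above. The offset constant is the number of 100-nanosecond intervals between 1582-10-15 and 1970-01-01, the same value Python's standard uuid module uses for version 1.

```python
from datetime import datetime, timezone

# 100-ns intervals between the Gregorian epoch (1582-10-15) and the Unix epoch.
GREGORIAN_OFFSET_100NS = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_nanos(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch, using integer math only.

    dt must be timezone-aware; a naive datetime raises TypeError here.
    """
    delta = dt - UNIX_EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 10**9 + delta.microseconds * 10**3


def v6_timestamp(dt: datetime) -> int:
    """100-nanosecond intervals since 1582-10-15."""
    return unix_nanos(dt) // 100 + GREGORIAN_OFFSET_100NS


def v7_timestamp(dt: datetime) -> int:
    """Milliseconds since the Unix epoch."""
    return unix_nanos(dt) // 10**6


def v8_timestamp(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch."""
    return unix_nanos(dt)
```

Taking a datetime rather than a bare integer is one way to sidestep the unit question entirely, at the cost of limiting precision to microseconds.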

@attevaltojarvi
Author

I personally think the function signatures should just state clearly which unit they expect to receive:

def uuid7(at_milliseconds: Optional[int] = None) -> UUID:
    ...


def uuid8(at_nanoseconds: Optional[int] = None) -> UUID:
    ...

Getting the order of magnitude wrong is a plain user error, the kind you can make with any other third-party library.

@mfresonke-work

mfresonke-work commented Oct 3, 2024

> Does anyone have suggestions how to reasonably avoid disasters like mixing nanoseconds and milliseconds?

@oittaa This is a fair point. While I agree with @attevaltojarvi that there's only so much you can do, I have found named arguments in JS a good counter to this, since they are self-documenting. It seems you can do something similar in Python with keyword-only arguments:

def uuid7(*, unix_ms: Optional[int] = None):
    ...  # generation code goes here

def uuid8(*, unix_nanos: Optional[int] = None):
    ...  # generation code goes here

That would at least require the caller to blatantly ignore the fact it says _ms or _nanos when calling it with incorrect data.

uuid7(unix_ms=1727920979122)
uuid8(unix_nanos=1727921429461971000)

Additionally, you could do a sanity check on the value, but I understand that is not a perfect solution.
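
One possible shape for such a sanity check, as a sketch only: the function name and the lower threshold are assumptions, not anything this package implements. The upper bound comes from the 48-bit timestamp field in version 7, which already rejects present-day microsecond and nanosecond values. The lower threshold is a heuristic that flags values small enough to be seconds, and it would also flag genuine millisecond timestamps from before March 1973.

```python
import warnings

MAX_UNIX_MS = 2**48 - 1       # largest value the 48-bit v7 field can hold
LIKELY_SECONDS_BELOW = 10**11  # heuristic: 10**11 ms is early March 1973


def check_unix_ms(value: int) -> int:
    """Hypothetical validator for a millisecond Unix timestamp."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("timestamp must be an int")
    if not 0 <= value <= MAX_UNIX_MS:
        raise ValueError(
            "timestamp does not fit in 48 bits; "
            "was it given in microseconds or nanoseconds?"
        )
    if value < LIKELY_SECONDS_BELOW:
        warnings.warn("timestamp is very small; was it given in seconds?")
    return value
```

With the values from the calls above, check_unix_ms(1727920979122) passes through unchanged, while check_unix_ms(1727921429461971000) raises ValueError.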

@attevaltojarvi
Author

Explicitly specifying keyword arguments is definitely a good idea. unix_ts_millis and unix_ts_nanos could be good names for them. 👍
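
A quick sketch of what keyword-only parameters with those names would enforce at the call site. The two functions below are stubs that return the timestamp they were given instead of a real UUID, and call_fails is a hypothetical helper used only to show which calls Python rejects.

```python
from typing import Optional


def uuid7(*, unix_ts_millis: Optional[int] = None) -> Optional[int]:
    """Stub: returns the timestamp it would use, not a real UUID."""
    return unix_ts_millis


def uuid8(*, unix_ts_nanos: Optional[int] = None) -> Optional[int]:
    """Stub: returns the timestamp it would use, not a real UUID."""
    return unix_ts_nanos


def call_fails(func, *args, **kwargs) -> bool:
    """Return True if the call is rejected with a TypeError."""
    try:
        func(*args, **kwargs)
    except TypeError:
        return True
    return False
```

Here call_fails(uuid7, 1727920979122) and call_fails(uuid7, unix_ts_nanos=1727921429461971000) both return True, because a positional value or the wrong unit name never reaches the function body. A caller can still pass a nanosecond value under unix_ts_millis, so this catches mislabelled calls but not wrong values.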
