Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RFC for SortBy #178

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Add RFC for SortBy #178

wants to merge 7 commits into from

Conversation

damon-kwok
Copy link

This PR adds my RFC for SortBy primitive.

@SeanTAllen
Copy link
Member

@damon-kwok can you expand the motivation to explain why, given the existence of Sort that this would be valuable?

Additionally as part the detailed design, you should note there that is an existing implementation at X location. That information is currently hidden in the "How we test".

@damon-kwok
Copy link
Author

@SeanTAllen Updated.

@SeanTAllen
Copy link
Member

@damon-kwok "is not beginner friendly" is a statement. You need as part of your motivation to demonstrate how it isn't beginner friendly and how SortBy is.

@aturley
Copy link
Member

aturley commented Jul 16, 2020

@damon-kwok I think the second argument is really the strong point here. SortBy would let you sort objects that don't implement Comparable. Additionally, it lets you define arbitrary sort orderings so that you aren't constrained by the one chosen by Comparable.

"Beginner friendliness" is kind of arbitrary and may not be a point that you really want to try to argue.

@damon-kwok
Copy link
Author

@aturley Yes, i think you are correct.

@EpicEric
Copy link

EpicEric commented Jul 16, 2020

Copying my comments from the original Zulip thread:

My one qualm is the comparator signature being {(B): USize} whereas you'd usually have {(B, B): ISize} in other languages, and return a numeric value that represents the order of the two elements (-1, 0, or 1)

The former forces you to think of how to assign a numeric value for every element, which doesn't make sense for example if you're sorting strings alphabetically

Another possibility would be {(B, B): Bool} representing a gt implementation. This could also use the interface Comparable methods as a default.

And one example that more closely matches the collections library (ComparableSort would be to Sort what HashMap is to Map, for example):

https://playground.ponylang.io/?gist=4fe7e1b7403332e25a997ad29e482ea7

@damon-kwok
Copy link
Author

Copying zulip conversation context:

04:55 DamonKwok
I think this may be complicated. My idea comes from Hashable, it only needs a fun hash (): USize. so, a lambda is work.
Because a number already has all the interfaces you need.

04:58 Epic Eric
Yeah I had to copy all methods but you could probably do with a single comparator like lt

@jemc
Copy link
Member

jemc commented Jul 16, 2020

I agree with the statements above from @EpicEric - the lambda should be given the elements to compare. Coming up with an object -> integer mapping is not always feasible for the things I may want to sort.

Apart from the specific concerns that @EpicEric mentioned above, I want to add more concerns into the mix:

Imagine I have an object that I want to sort based on comparing multiple fields. In order to convert them to a USize, I'd have to first convert each one to a numeric representation, then do some bitshifting and bitmasking magic to make all of those numeric values fit into the USize in the correct order. This is more complicated and in many cases probably less performant because I'd have to eagerly consider all fields whereas a compare lambda would let me return early as soon as one of the earlier fields was found to be greater/lesser, and only consider later fields in the rarer case that all of the earlier fields are equal.

Further, this approach of converting all fields of the object into a numeric representation and squeezing them into a USize will not be physically possible for some objects, depending on the sizes (and relevant information density) of those fields.

To use a real world example from code I have written, consider a Time class that is implemented with two fields:

  • total_seconds: U64 - The total number of seconds since 0001-01-01 00:00:00.0 UTC
  • nanosecond: U32 - The number of nanoseconds past the designated second

There are 96 bits of information to consider for sorting purposes, and I'll never be able to fit all of those bits into a USize, which is 64 bits on some platforms and 32 bits on others. And that's an example with only two fields.

@SeanTAllen
Copy link
Member

@damon-kwok what "real world" code can you point to that is using this SortBy? I think knowing that could help pinpoint additional tests, usage patterns that should be experimented with before this would be considered for the standard library.

@damon-kwok
Copy link
Author

@jemc If you have an extreme case that Usize cannot represent, that's not what SortBy is dealing with. SortBy focuses on conditional evaluations rather than solving all problems.
See:

#Drawbacks
The user must ensure that the lambda evaluation function is valid.

@damon-kwok
Copy link
Author

damon-kwok commented Jul 16, 2020

@SeanTAllen Many real world cases:

Leveldb manager:

  • sort by key length
  • sort by memory used
  • sort by recent update

A blog system,

  • sort by create date
  • sort by last update time.

A set of JSON data representing the product information

sort by storage date
sort by inventory number

@damon-kwok
Copy link
Author

Focusing on existing types without compareable implementations.
The comparison problem of complex types is reduced to a number comparison problem.
This is what Sortby is going to do.

@chalcolith
Copy link
Member

I agree with @EpicEric and @jemc; I would prefer a lambda that takes two objects of the given type.

@damon-kwok
Copy link
Author

But that's not what I want to do. Maybe someone should add a new RFC for it.

@damon-kwok
Copy link
Author

I named it SortBy rather than any other name, hoping that it would be as simple and elegant as possible, and that no new types should be added for sorting. Otherwise it should be called something else.

@EpicEric
Copy link

EpicEric commented Jul 16, 2020

With a generic lambda approach like the one I showed above, adding a SortBy structure can be done in 10 lines:

primitive SortBy[A: Seq[B] ref, B: Any #read]
  fun apply(seq: A, f: {(B): USize} val): A^ =>
    Sort[A, B, SortFunction[B]](
      seq,
      object val is SortFunction[B]
        fun lt(first: B, second: B): Bool => f(first) < f(second)
        fun le(first: B, second: B): Bool => f(first) <= f(second)
        fun gt(first: B, second: B): Bool => f(first) > f(second)
        fun ge(first: B, second: B): Bool => f(first) >= f(second)
      end)

actor Main
  new create(env:Env) =>
    let array = [ "aa"; "aaa"; "a" ]
    SortBy[Array[String], String](array, {(x: String): USize => x.size() })
    for e in array.values() do
      env.out.print(e) // prints "a \n aa \n aaa"
    end

https://playground.ponylang.io/?gist=144512d4a8417b60452e3084b4c3f32d


As a side note: due to the need for a concrete type, you can't avoid adding the Array[String] type parameter in this case, matching the current Sort type. So either way, the RFC must be fixed to take that into account.

@damon-kwok
Copy link
Author

@EpicEric I have a few questions:

  1. How does the extra 10 lines of code solve the problems raised by @jemc ?
  2. Your SortBy seems to do the same thing as my SortBy, but it requires the overhead of 2 Object Literals?
  3. I don't quite know your intention. My SortBy is a new primitive, it will not interfere with the existing Sort, but you seem to want to modify Sort's implementation. Is that right?

@damon-kwok
Copy link
Author

@EpicEric Can you create a temp git repository and submit a sort*.pony file and a _test.pony file that can be used by others? This can help others more clearly compare the difference between the two.

@EpicEric
Copy link

EpicEric commented Jul 17, 2020

  1. How does the extra 10 lines of code solve the problems raised by @jemc ?

It doesn't. You'd create a sort function using the Sort implementation directly instead, without using SortBy.

primitive TimeSortFunction is SortFunction[Time]
  fun lt(first: Time, second: Time): Bool =>
    (first.total_seconds < second.total_seconds) || (
      (first.total_seconds == second.total_seconds)
        && (first.nanosecond < second.nanosecond))

  fun gt(first: Time, second: Time): Bool =>
    (first.total_seconds > second.total_seconds) || (
      (first.total_seconds == second.total_seconds)
        && first.nanosecond > second.nanosecond))

  fun le(first: Time, second: Time): Bool =>
    (first.total_seconds < second.total_seconds) || (
      (first.total_seconds == second.total_seconds)
        && first.nanosecond <= second.nanosecond))

  fun ge(first: Time, second: Time): Bool =>
    (first.total_seconds > second.total_seconds) || (
      (first.total_seconds == second.total_seconds)
        && (first.nanosecond >= second.nanosecond))

primitive TimeSort[A: Seq[Time] ref]
  fun apply(a: A): A^ =>
    Sort[A, Time, TimeSortFunction](a, TimeSort)
  1. Your SortBy seems to do the same thing as my SortBy, but it requires the overhead of 2 Object Literals?

This RFC doesn't have any code implementation, so I'm assuming you refer to this code. In this case, yes, there is one extra anonymous object allocation.

  1. I don't quite know your intention. My SortBy is a new primitive, it will not interfere with the existing Sort, but you seem to want to modify Sort's implementation. Is that right?

This is only if API names have to follow the same convention as the rest of the collections package. Compatibility could be maintained if Sort and ComparableSort are renamed to, say, RawSort and Sort respectively.

@damon-kwok
Copy link
Author

@EpicEric Thanks! I see. I know the process of adding RFC is long and we can listen to other people's ideas.

@jasoncarr0
Copy link

Do you still want to add this RFC?

While this may be a reasonable addition, I feel that it violates least surprise as it exists. Specifically:

  • Most of the comments here seemed to have expected from the name that it would be a comparison function
  • The comparable interface is based on comparison functions, rather than valuations
  • This is fixed to be USize, rather than any other type (and worse, the information available in USize is system-dependent)

It does not appear to me that a comparison-based solution has a very significant difference in allocation. If it just requires a single method (such as compare), then the allocation characteristics will be identical to a lambda returning USize. Of course as with HashFunctions we can have a generic primitive to dispatch without allocation in the general case

@damon-kwok
Copy link
Author

@jasoncarr0 Do you mean to change USize to U128 or I128?

@jasoncarr0
Copy link

I would actually expect U64. The issue with USize is that the maximum value depends on the system it runs on.

@damon-kwok
Copy link
Author

@jasoncarr0 Updated.

Base automatically changed from master to main February 8, 2021 22:15
@esoterra
Copy link

I agree with the commenters that I would expect something called SortBy to use a comparator.
If that's not possible it should at least use a key which can be complex (e.g. tuple) and support lexicographic sorting.

Here's a survey of languages sorting support

  • JavaScript
  • C++
    • Offers sorting with comparison functions via std::sort
  • Python
    • Sorting is stable for partial orderings and lexicographic for tuples
    • Sort using key available (since Py2.4) where the key can be tuples
    • Before Py3.0, sort offered comparison functions using cmp parameter
  • Java
    • Allows simple sorting and defining a sorting for a type via Comparable
    • Also allows custom sorts to be provided using the Comparator class
  • Rust
    • Offers sorting and searching based on trait std::cmp::Ord
    • Allows sorting of slices using sort_by which takes a comparator.
    • The newtype zero cost abstraction can be used to wrap a type with a new sorting behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants