GH-15677 Fix Match function bug #15896

maurever · 2023-11-01T10:35:15Z

Issue: #15677

TODO:

Do not add new parameter indexes; fix the bug in our implementation according to what R::match does. Using the iris dataset, calling match with c("setosa", "versicolor") should generate a new column with value 1 assigned to rows with "setosa" and 2 assigned to rows with "versicolor". The answer should be the same with all clients, which implies that R and Python should generate the same column values.
The default for nomatch should be NaN for both R and Python. In addition, only NaN or other numerical values are allowed for nomatch; strings are not supported.
Fix the documentation to reflect the changes: nomatch defaults to NaN and accepts only numerical values. Remove the last sentences about incomparables.
Add a start_index parameter with default value 1, so that a user can change the indexing, for example to start from 0.
Fix %in% to work as in the base library, but in a different PR: GH-15677 Fix %in% function #15929.
Add a better example to the docs.
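
As a rough illustration of the behaviour the TODO list above asks for, here is a minimal pure-Python sketch. It is not the H2O implementation in AstMatch.java: `match_positions` is a hypothetical helper written only for this illustration, and it assumes the semantics of base R's match (each value maps to the position of its first occurrence in the table, otherwise to nomatch).

```python
import math
from numbers import Real


def match_positions(values, table, nomatch=math.nan, start_index=1):
    """Return, for each value, the position of its first occurrence in table."""
    # Assumption taken from the TODO list: nomatch must be numeric (NaN included),
    # never a string. Booleans are rejected too, to keep the check strict.
    if isinstance(nomatch, bool) or not isinstance(nomatch, Real):
        raise TypeError("nomatch must be a number or NaN")
    first_position = {}
    for offset, item in enumerate(table):
        # setdefault keeps the earliest position when the table repeats a value.
        first_position.setdefault(item, offset + start_index)
    return [first_position.get(value, nomatch) for value in values]


species = ["setosa", "versicolor", "virginica", "setosa"]
matched = match_positions(species, ["setosa", "versicolor"])
print(matched)  # [1, 2, nan, 1]
```

With a repeated table value and zero-based indexing, for example `match_positions(species, ["setosa", "versicolor", "setosa"], nomatch=-1, start_index=0)`, the sketch still reports the first position of "setosa" and returns `[0, 1, -1, 0]`.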

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java

sebhrusen

Looks good @maurever.
Added some suggestions for the docs and a concern about sorted numerical values.

h2o-py/h2o/frame.py

h2o-r/h2o-package/R/frame.R

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java

maurever · 2023-11-17T14:25:04Z

@hannah-tillman Could you check the documentation of this feature, please? Thanks!

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java

hannah-tillman

made a few docs updates -- good on my end :)

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java

sebhrusen

looking good, thanks @maurever
Apparently, you still need to rebase to resolve conflicts, so I will approve next time.

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java

h2o-py/h2o/frame.py

h2o-py/tests/testdir_apis/Data_Manipulation/pyunit_h2oH2OFrame_match.py

maurever · 2023-12-01T12:36:02Z

Thanks, @sebhrusen; the PR is updated based on your suggestion and rebased.

h2o-py/h2o/frame.py

sebhrusen

tested the Py doc generation with the result rendered in the example, it works nicely!
Also updated the suggestion to use real data from the example.

Maybe the same could be done in R.

h2o-py/h2o/frame.py

maurever · 2023-12-05T14:30:43Z

@hannah-tillman, please check that the documentation is generated correctly. @sebhrusen also suggested adding output to the R example, but I think it will not work in R. If you have any idea how to improve the R example to be similar to the Python example, please let me know (or feel free to commit this change here). Thanks!

sebhrusen

LGTM, thanks @maurever

I took the initiative to commit a small fix in the Py example.
Regarding the R example: looking at the R docs, I don't think it's possible to render the results the same way as we can in Py. Common practice seems to be using a comment, like:

sample ##--> describe shortly what is expected when this is printed

for example, in this case

sample ##--> `match` column should be made of `1` for `setosa`, `2` for `versicolor` and `NaN` for `virginica`

hannah-tillman · 2023-12-05T18:54:00Z

@maurever The python docs build fine 👍

For the R documentation, I went through the current R guide to find an example that includes its output, because I didn't really remember that happening. What I was able to find was a rare case of output being added as a comment, so we could do the same here? It would look like this:

#' \dontrun{
#' h2o.init()
#' data <- as.h2o(iris)
#' match_col <- h2o.match(data$Species, c("setosa", "versicolor", "setosa"))
#' iris_match <- h2o.cbind(data, match_col)
#' sample <- h2o.splitFrame(iris_match, ratios=0.05, seed=1)[1]
#' sample
#' # [[1]]
#' #   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species C1
#' # 1          5.2         3.5          1.5         0.2     setosa  1
#' # 2          5.0         3.5          1.3         0.3     setosa  1
#' # 3          7.0         3.2          4.7         1.4 versicolor  1
#' # 4          4.9         2.4          3.3         1.0 versicolor  1
#' # 5          5.5         2.4          3.8         1.1 versicolor  1
#' # 6          5.8         2.7          5.1         1.9  virginica  0
#' #
#' # [12 rows x 6 columns] 
#' }

This is the only example I was able to find for output in the R docs, though, so it is currently the only idea I have. Let me know if this is what you want.

maurever · 2023-12-11T10:03:51Z

Thanks, @seb and @hannah-tillman. The PR is now ready for review.

sebhrusen

LG, thank you @maurever !

maurever self-assigned this Nov 1, 2023

maurever marked this pull request as draft November 1, 2023 14:01

maurever mentioned this pull request Nov 1, 2023

h2o.match only returns 1 and nomatch #15677

Closed

maurever changed the title ~~GH-15677 Fix Match function bug POC~~ GH-15677 Fix Match function bug Nov 1, 2023

maurever added this to the 3.46.0.1 milestone Nov 1, 2023

wendycwong suggested changes Nov 9, 2023

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (outdated, resolved)

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (outdated, resolved)

maurever requested review from wendycwong, sebhrusen and mn-mikke November 16, 2023 13:15

sebhrusen reviewed Nov 16, 2023

h2o-py/h2o/frame.py (outdated, resolved)

h2o-r/h2o-package/R/frame.R (outdated, resolved)

h2o-r/h2o-package/R/frame.R (outdated, resolved)

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (resolved)

maurever requested a review from sebhrusen November 17, 2023 14:17

maurever commented Nov 17, 2023

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (resolved)

maurever marked this pull request as ready for review November 17, 2023 14:24

maurever requested a review from hannah-tillman November 17, 2023 14:24

sebhrusen reviewed Nov 17, 2023

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (outdated, resolved)

hannah-tillman previously approved these changes Nov 17, 2023

maurever dismissed hannah-tillman’s stale review via 6f7d17d November 20, 2023 14:28

sebhrusen reviewed Nov 20, 2023

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (outdated, resolved)

sebhrusen reviewed Nov 20, 2023

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (outdated, resolved)

sebhrusen reviewed Nov 20, 2023

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (outdated, resolved)

maurever requested a review from sebhrusen November 22, 2023 08:49

sebhrusen reviewed Dec 1, 2023

h2o-core/src/main/java/water/rapids/ast/prims/search/AstMatch.java (outdated, resolved)

h2o-py/h2o/frame.py (outdated, resolved)

h2o-py/tests/testdir_apis/Data_Manipulation/pyunit_h2oH2OFrame_match.py (resolved)

maurever and others added 7 commits December 1, 2023 13:25

Implement the user suggestion POC

74ace2f

Fix R match function

d0a3024

implement match function

397031d

improve doc

a2ec040

Fix bugs, add better example, remove %in% changes

22e1cf6

ht/docs updates

881170e

Return sorting, implement mapping of indexes

c2ce574

maurever added 3 commits December 1, 2023 13:32

fix isin function

9678594

Implement case where match values are not unique

2570c07

Improve doc and tests

4d30c38

maurever force-pushed the maurever_GH-15677_fix_match_bug branch from e6ed78c to 4d30c38 Compare December 1, 2023 12:33

maurever requested a review from sebhrusen December 1, 2023 12:36

fix R and Py example

e18157f

wendycwong previously approved these changes Dec 1, 2023

Unify R and Python examples

d33e3b4

maurever dismissed wendycwong’s stale review via d33e3b4 December 4, 2023 14:05

sebhrusen reviewed Dec 4, 2023

h2o-py/h2o/frame.py (resolved)

sebhrusen reviewed Dec 4, 2023

h2o-py/h2o/frame.py (outdated, resolved)

Fix Ptyhon example

f344c3b

maurever requested review from hannah-tillman, sebhrusen and wendycwong December 5, 2023 14:30

fix column name in example

9ce28be

sebhrusen previously approved these changes Dec 5, 2023

maurever added 2 commits December 11, 2023 10:58

Improve R example

9b0e720

Merge branch 'maurever_GH-15677_fix_match_bug' of github.com:h2oai/h2…

fe9224d

…o-3 into maurever_GH-15677_fix_match_bug

maurever dismissed sebhrusen’s stale review via fe9224d December 11, 2023 10:01

maurever requested a review from sebhrusen December 11, 2023 10:04

sebhrusen approved these changes Dec 11, 2023

maurever merged commit 528edf3 into master Dec 13, 2023
63 of 68 checks passed

maurever deleted the maurever_GH-15677_fix_match_bug branch December 13, 2023 12:18
