Bug when using user features? #40

Open
erlebach opened this issue Jun 30, 2022 · 3 comments

Comments

@erlebach

When running fit() with user features, I get the error:

KeyError: 'the users in [user_features] do not match the users in [interactions]'

which has been reported previously. In my case, I did some debugging in the source code, and found the following. In the function _init_interactions, one finds the statement:

            if np.array_equal(sorted(x_uf.index.values), self.user_idx):
                self.x_uf = np.ascontiguousarray(x_uf.sort_index(), dtype=np.float32)
            else:
                raise KeyError('the users in [user_features] do not match the users in [interactions]')

which is the error in question. Looking at the definition of self.user_idx, one finds, in the same file rankfm.py:

        # store unique values of user/item indexes and observed interactions for each user
        self.user_idx = np.arange(len(self.user_id), dtype=np.int32)
        self.item_idx = np.arange(len(self.item_id), dtype=np.int32)

near line 128. Clearly, self.user_idx holds the consecutive indexes 0, 1, 2, ..., one per unique user id.
However, sorted(x_uf.index.values) is the sorted list of user ids. Thus, the two lists cannot be equal. The code that leads me to this conclusion is:

        if user_features is not None:
            x_uf = pd.DataFrame(user_features.copy())
            x_uf = x_uf.set_index(x_uf.columns[0])
            x_uf.index = x_uf.index.map(self.user_to_index)
            if np.array_equal(sorted(x_uf.index.values), self.user_idx):

As far as I understand, the first column of user_features, which is an argument to the function, should be the actual user_id, which can be anything as long as it does not appear twice in the dataframe. In this case, the conditional (the last line above) cannot be satisfied.
Therefore, I must not understand the data format of user_features. Where is this explained? The documentation states the following:

user_features – dataframe of user metadata features: [user_id, uf_1, … , uf_n]

with no additional information regarding the values of user_id. Any clarification would be most welcome!
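For reference, here is a minimal sketch of what the quoted check does, written outside the library. The helper name users_match, the dict-based user_to_index, and the [user_id, item_id] layout of the interactions frame are assumptions for illustration, not RankFM's actual code. It suggests the check passes only when the feature frame has exactly one row per user that appears in the interactions.

```python
import numpy as np
import pandas as pd

def users_match(interactions, user_features):
    """Hypothetical stand-in for the quoted check (not the library's code)."""
    # Assumption: the first column of `interactions` holds the user ids.
    user_ids = np.sort(interactions.iloc[:, 0].unique())
    user_to_index = {uid: i for i, uid in enumerate(user_ids)}
    user_idx = np.arange(len(user_ids), dtype=np.int32)

    # Same steps as the quoted snippet: index by the first column, map ids to 0..n-1.
    x_uf = pd.DataFrame(user_features.copy())
    x_uf = x_uf.set_index(x_uf.columns[0])
    x_uf.index = x_uf.index.map(user_to_index)
    return bool(np.array_equal(sorted(x_uf.index.values), user_idx))

interactions = pd.DataFrame({"user_id": ["a", "b", "c", "a"], "item_id": [1, 2, 3, 2]})
clean = pd.DataFrame({"user_id": ["c", "a", "b"], "uf_1": [1.0, 0.0, 1.0]})
with_duplicate = pd.DataFrame({"user_id": ["a", "b", "b", "c"], "uf_1": [0.0, 1.0, 1.0, 1.0]})
with_missing = pd.DataFrame({"user_id": ["a", "b"], "uf_1": [0.0, 1.0]})

print(users_match(interactions, clean))           # arbitrary ids are fine once mapped
print(users_match(interactions, with_duplicate))  # a repeated user id breaks the equality
print(users_match(interactions, with_missing))    # so does a user without a feature row
```

Because the ids are mapped to positions before the comparison, the raw user_id values themselves can be anything hashable in this sketch.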

@erlebach changed the title from "But with user features?" to "Bug when using user features?" Jun 30, 2022
@erlebach
Author

Please ignore the question. I forgot to remove duplicate member entries in the user_feature matrix.
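A small sketch of how such duplicates could be spotted and dropped with pandas before calling fit(). The column names and data are made up, and keeping the first row per user is an arbitrary choice; which row to keep depends on the data.

```python
import pandas as pd

# Made-up feature frame in the documented layout [user_id, uf_1, ..., uf_n].
user_features = pd.DataFrame({
    "user_id": ["a", "b", "b", "c"],
    "uf_1": [0.0, 1.0, 1.0, 0.5],
})

id_col = user_features.columns[0]

# Show every row whose user id occurs more than once.
dupes = user_features[user_features.duplicated(subset=id_col, keep=False)]
print(dupes)

# Keep one row per user id (here: the first occurrence).
deduped = user_features.drop_duplicates(subset=id_col, keep="first")
```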

@srinivascnu166

Hi, I have faced the same issue. Can you please provide the details of how you solved it?
It would be great if you could share how you formatted the data for user_features.

@erlebach
Author

Hi @srinivascnu166,
All I did was make sure of two things (each to be checked independently for item and user features):

  1. there are no duplicate rows, i.e., no duplicate items in the item feature list
  2. the set of unique items derived from the user/item interaction list is the same as the set of unique items in the item feature list.

Does this make sense?
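The two checks can be sketched as a small helper. The function name check_features and its arguments are hypothetical, and the sketch assumes the interactions frame is laid out as [user_id, item_id] and that the id sits in the first column of the feature frame.

```python
import pandas as pd

def check_features(interactions, features, id_position):
    """Report ids that would violate either of the two conditions.

    id_position: column position of the id in `interactions`
    (assumed 0 for users, 1 for items).
    """
    ids_seen = set(interactions.iloc[:, id_position])
    feature_ids = features[features.columns[0]]
    return {
        # condition 1: no id may appear twice in the feature frame
        "duplicated": sorted(set(feature_ids[feature_ids.duplicated()])),
        # condition 2: both sets of unique ids must be identical
        "missing_from_features": sorted(ids_seen - set(feature_ids)),
        "not_in_interactions": sorted(set(feature_ids) - ids_seen),
    }

example_interactions = pd.DataFrame({
    "user_id": ["a", "b", "c"],
    "item_id": ["x", "y", "x"],
})
example_item_features = pd.DataFrame({
    "item_id": ["x", "x", "z"],
    "if_1": [1.0, 1.0, 0.0],
})

report = check_features(example_interactions, example_item_features, id_position=1)
print(report)
```

All three lists should be empty before the feature frame is passed to fit(); the same helper can be called with id_position=0 for user features.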
