Bug when using user features? #40

erlebach · 2022-06-30T18:00:09Z

When running fit() with user features, I get the error:

KeyError: 'the users in [user_features] do not match the users in [interactions]'

which has been reported previously. I debugged the source code and found the following. The function _init_interactions contains this statement:

            if np.array_equal(sorted(x_uf.index.values), self.user_idx):
                self.x_uf = np.ascontiguousarray(x_uf.sort_index(), dtype=np.float32)
            else:
                raise KeyError('the users in [user_features] do not match the users in [interactions]')

which raises the error in question. self.user_idx is defined in the same file, rankfm.py:

        # store unique values of user/item indexes and observed interactions for each user
        self.user_idx = np.arange(len(self.user_id), dtype=np.int32)
        self.item_idx = np.arange(len(self.item_id), dtype=np.int32)

near line 128. Clearly, self.user_idx holds the consecutive indexes 0, 1, 2, ... up to one less than the number of user ids.
However, sorted(x_uf.index.values) is the sorted list of user ids. Thus, the two lists cannot be equal. The code that leads me to this conclusion is:

        if user_features is not None:
            x_uf = pd.DataFrame(user_features.copy())
            x_uf = x_uf.set_index(x_uf.columns[0])
            x_uf.index = x_uf.index.map(self.user_to_index)
            if np.array_equal(sorted(x_uf.index.values), self.user_idx):

As far as I understand, the first column of user_features, which is an argument to the function, should hold the actual user_id, which can be anything as long as it does not appear twice in the dataframe. In that case, the conditional (last line) cannot be satisfied.
Therefore, I must be misunderstanding the data format of user_features. Where is this explained? The documentation states the following:

user_features – dataframe of user metadata features: [user_id, uf_1, … , uf_n]

with no additional information regarding the values of user_id. Any clarification would be most welcome!

erlebach · 2022-06-30T20:06:59Z

Please ignore the question. I forgot to remove duplicate member entries in the user_features matrix.
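
To see why duplicate rows trigger this error, here is a minimal sketch that mimics the quoted check outside the library. The sample data, the `passes_check` helper, and the construction of `user_to_index` as a pandas Series are assumptions for illustration; only the four lines inside the helper follow the snippet quoted above.

```python
import numpy as np
import pandas as pd

# Hypothetical interaction data; user ids can be arbitrary labels.
interactions = pd.DataFrame({
    "user_id": ["u7", "u3", "u7", "u9"],
    "item_id": ["a", "b", "c", "a"],
})

# Stand-ins for the attributes quoted above (assumed shapes, not the library's code).
user_id = pd.Series(np.sort(interactions["user_id"].unique()))
user_idx = np.arange(len(user_id), dtype=np.int32)
user_to_index = pd.Series(data=user_idx, index=user_id.values)


def passes_check(user_features):
    """Apply the same steps as the quoted snippet and return the check result."""
    x_uf = pd.DataFrame(user_features.copy())
    x_uf = x_uf.set_index(x_uf.columns[0])
    # After this line the index holds positions 0..n-1, not the raw user ids.
    x_uf.index = x_uf.index.map(user_to_index)
    return bool(np.array_equal(sorted(x_uf.index.values), user_idx))


# One row per user: the mapped index is a permutation of 0..n-1.
clean = pd.DataFrame({"user_id": ["u9", "u3", "u7"], "age": [30.0, 41.0, 25.0]})

# Same frame with one user repeated: the mapped index has n+1 entries.
duplicated = pd.concat([clean, clean.iloc[[0]]], ignore_index=True)

print(passes_check(clean))
print(passes_check(duplicated))
```

Because the index is mapped through `user_to_index` before the comparison, arbitrary user ids are fine; the comparison fails in this sketch only because the duplicated frame yields one extra index entry.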

srinivascnu166 · 2022-08-01T11:33:29Z

Hi, I am facing the same issue. Could you please describe how you solved it?
It would be great if you could share how you formatted the data for user_features.

erlebach · 2022-08-16T14:01:51Z

Hi @srinivascnu166,
All I did was make sure of two things (to be checked independently for item and user features):

- There should be no duplicate rows, i.e., no duplicate items in the item feature list.
- The list of unique items derived from the user/item interaction list should be the same as the list of unique items derived from the item feature list.

Does this make sense?
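
The two conditions above can be checked before calling fit() with a small helper along these lines. This is only a sketch: `check_features`, its arguments, and the sample frames are hypothetical names, and it assumes (as in the documentation line quoted earlier) that the id is the first column of the feature dataframe.

```python
import pandas as pd


def check_features(interactions, features, id_col):
    """Return a list of problems; an empty list means both conditions hold."""
    problems = []
    feature_ids = features[features.columns[0]]

    # Condition 1: no id may appear in more than one feature row.
    dup = feature_ids[feature_ids.duplicated()].unique().tolist()
    if dup:
        problems.append(f"duplicate ids in features: {dup}")

    # Condition 2: the id sets of interactions and features must be identical.
    expected = set(interactions[id_col].unique())
    found = set(feature_ids.unique())
    if expected - found:
        problems.append(f"ids missing from features: {sorted(expected - found)}")
    if found - expected:
        problems.append(f"ids not in interactions: {sorted(found - expected)}")
    return problems


# Hypothetical sample data.
interactions = pd.DataFrame({"user_id": ["u1", "u2", "u2"], "item_id": ["a", "a", "b"]})
user_features = pd.DataFrame({"user_id": ["u1", "u2", "u2", "u5"], "age": [20.0, 35.0, 35.0, 50.0]})

print(check_features(interactions, user_features, "user_id"))

# One possible cleanup: drop repeated ids, then keep only ids seen in interactions.
cleaned = user_features.drop_duplicates(subset="user_id")
cleaned = cleaned[cleaned["user_id"].isin(interactions["user_id"])]
print(check_features(interactions, cleaned, "user_id"))
```

The same helper can be run on item features by passing the item feature dataframe and "item_id" as `id_col`.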

erlebach changed the title ~~But with user features?~~ Bug when using user features? Jun 30, 2022
