When running fit() with user features, I get the error:
KeyError: 'the users in [user_features] do not match the users in [interactions]'
which has been reported previously. In my case, I did some debugging in the source code, and found the following. In the function _init_interactions, one finds the statement:
if np.array_equal(sorted(x_uf.index.values), self.user_idx):
    self.x_uf = np.ascontiguousarray(x_uf.sort_index(), dtype=np.float32)
else:
    raise KeyError('the users in [user_features] do not match the users in [interactions]')
which is the error in question. Looking at the definition of self.user_idx, one finds, in the same file rankfm.py:
# store unique values of user/item indexes and observed interactions for each user
self.user_idx = np.arange(len(self.user_id), dtype=np.int32)
self.item_idx = np.arange(len(self.item_id), dtype=np.int32)
near line 128. Clearly, self.user_idx holds the consecutive indexes 0, 1, 2, ..., n-1, where n is the number of user ids.
However, sorted(x_uf.index.values) is the sorted list of user ids. Thus, the two cannot be equal unless the user ids themselves happen to be 0, 1, 2, ..., n-1. The code that leads me to this conclusion is:
As far as I understand, the first column of user_features, which is an argument to the function, should be the actual user_id, which can be anything as long as it does not appear twice in the dataframe. In this case, the conditional (last line) cannot be satisfied.
Therefore, I must not understand the data format of user_features. Where is this explained? The documentation states the following:
user_features – dataframe of user metadata features: [user_id, uf_1, … , uf_n]
with no additional information regarding the values of user_id. Any clarification would be most welcome!
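To make the comparison described above concrete, here is a minimal standalone sketch using only NumPy and pandas. It is not the library's code: the data, the `user_to_index` mapping, and the variable names other than `user_idx` and `x_uf` are hypothetical, and whether RankFM applies such a mapping before reaching the check is an assumption this issue does not settle.

```python
import numpy as np
import pandas as pd

# Hypothetical data: raw user ids are arbitrary labels, not 0..n-1.
interactions = pd.DataFrame(
    {"user_id": [101, 205, 205, 307], "item_id": [1, 2, 3, 1]}
)
user_features = pd.DataFrame(
    {"user_id": [307, 101, 205], "uf_1": [0.1, 0.2, 0.3]}
)

# Positional indexes, analogous to self.user_idx in the snippet quoted above.
user_id = np.sort(interactions["user_id"].unique())
user_idx = np.arange(len(user_id), dtype=np.int32)

# Features indexed by the raw user id, in the documented [user_id, uf_1, ...] layout.
x_uf = user_features.set_index("user_id")

# Comparing sorted raw ids against positional indexes fails, as described in the issue.
raw_match = np.array_equal(sorted(x_uf.index.values), user_idx)

# Assumption: translating raw ids to positions first makes the check meaningful.
user_to_index = pd.Series(user_idx, index=user_id)
x_uf_mapped = x_uf.copy()
x_uf_mapped.index = x_uf_mapped.index.map(user_to_index)
mapped_match = np.array_equal(sorted(x_uf_mapped.index.values), user_idx)

print(raw_match, mapped_match)
```

If the features frame contained a user absent from the interactions (or the reverse), the mapped comparison would also fail, so comparing the two sets of user ids directly may be a useful first diagnostic.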
erlebach changed the title from "But with user features?" to "Bug when using user features?" on Jun 30, 2022
Hi, I have faced the same issue. Can you please provide the details of how you solved it?
It would be great if you could share how you formatted the data for user_features.