Dealing with cold start users click history #11

igor17400 · 2024-03-26T08:03:43Z

Hello! I'm currently handling a dataset where the histories column might initially be empty, especially for users who are accessing the system for the first time.

Given this context, I'm seeking advice on how to approach a particular situation highlighted in the code found at this GitHub link. The process involves tokenizing the titles of previously clicked news articles, but I'm facing a potential cold start issue for new users without any history. In these instances, should I consider tokenizing empty titles, abstracts, etc.?

igor17400 · 2024-03-27T04:45:55Z

Here is an update on what I did.

Inside __getitem__ in rec_dataset.py, I added the following condition:

if history.size == 1 and history[0] == '':
    history = self._initialize_cold_start()
else:
    history = self.news.loc[history]

where _initialize_cold_start is defined as follows:

def _initialize_cold_start(self):
    """
    In cold-start cases the history can be empty, so we return a
    one-row DataFrame of placeholder values for the embedding.
    """
    # Build the placeholder row directly.
    # (DataFrame.append was removed in pandas 2.0, so it is avoided here.)
    history = pd.DataFrame({
        'title': [''],
        'abstract': [''],
        'sentiment_class': [0],
        'sentiment_score': [0.0]
    })

    # Explicitly set the data types for the entire DataFrame
    history = history.astype({
        'title': 'object',
        'abstract': 'object',
        'sentiment_class': 'int64',
        'sentiment_score': 'float64'
    })

    return history

This may be useful for other people who are trying to solve the same problem.
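
For anyone who wants to try the idea outside the library, the condition and helper above can be sketched as a standalone function. This is only a minimal sketch under assumptions: `lookup_history`, `COLD_START_ROW`, and the toy `news` table are hypothetical names made up for illustration, not part of rec_dataset.py, and the extra `history.size == 0` check is an addition that the snippet above does not have.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder row, mirroring the columns used in the snippet above.
COLD_START_ROW = {
    "title": "",
    "abstract": "",
    "sentiment_class": 0,
    "sentiment_score": 0.0,
}


def lookup_history(news: pd.DataFrame, history: np.ndarray) -> pd.DataFrame:
    """Return the clicked-news rows, or one placeholder row for an empty history."""
    # Assumption: an empty history arrives either as an empty array or as [''].
    if history.size == 0 or (history.size == 1 and history[0] == ""):
        return pd.DataFrame([COLD_START_ROW])
    return news.loc[history, list(COLD_START_ROW)]


# Toy news table indexed by news id (illustrative data only).
news = pd.DataFrame(
    {
        "title": ["A", "B"],
        "abstract": ["a", "b"],
        "sentiment_class": [1, -1],
        "sentiment_score": [0.9, -0.4],
    },
    index=["N1", "N2"],
)

warm = lookup_history(news, np.array(["N2", "N1"]))  # user with two clicks
cold = lookup_history(news, np.array([""]))          # cold-start user
```

Both branches return a DataFrame with the same columns, so whatever tokenizes the titles downstream should not need a special case for new users.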

andreeaiana · 2024-03-27T06:45:34Z

Hi @igor17400,

Thanks for raising this issue. Indeed, the original code did not work with empty user histories, but implementing this functionality should be useful for many users.

I think your solution is simple and elegant. I can have a look at it over the weekend, to test it with both pretrained word embeddings and PLMs, and streamline it across the data preprocessing functions for all datasets. Would you like to open a PR with your proposed solution?
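
To make concrete what an empty title turns into on the word-embedding path, here is a toy sketch. It assumes a plain whitespace tokenizer with a padding id of 0; `encode_title`, `PAD_ID`, `UNK_ID`, and the tiny vocabulary are all hypothetical and do not reflect the library's actual tokenization.

```python
from typing import Dict, List, Tuple

PAD_ID = 0  # assumed padding index
UNK_ID = 1  # assumed out-of-vocabulary index


def encode_title(
    title: str, vocab: Dict[str, int], max_len: int
) -> Tuple[List[int], List[int]]:
    """Whitespace-tokenize, map tokens to ids, pad or truncate to max_len.

    Returns the token ids and a mask with 1 for real tokens and 0 for padding.
    """
    ids = [vocab.get(tok, UNK_ID) for tok in title.lower().split()][:max_len]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [PAD_ID] * (max_len - len(ids))
    return ids, mask


vocab = {"breaking": 2, "news": 3}  # illustrative vocabulary
warm_ids, warm_mask = encode_title("Breaking news today", vocab, max_len=5)
cold_ids, cold_mask = encode_title("", vocab, max_len=5)
```

In this sketch the empty title becomes a sequence of padding ids with an all-zero mask. Whether a given news encoder handles a fully masked input gracefully probably needs to be checked per model, since attention layers that normalize only over unmasked positions may misbehave when there are none.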

igor17400 · 2024-04-02T10:51:19Z

Hi @andreeaiana,

I'm in the process of implementing PP-Rec, as outlined in PR #12. I'm still working through it, making sure that the blocks are accurate and checking the scores and behaviors for MIND large and Adressa, so it is not ready to be merged yet. However, I wanted to let you know that in this PR I'm adding the _initialize_cold_start idea, along with the previously mentioned spinner for score calculation to avoid the terminal freezing.

andreeaiana · 2024-04-04T09:46:32Z

Great, thanks for letting me know and for your contributions to the library.

igor17400 · 2024-04-12T09:10:48Z

@andreeaiana I noticed you filter out cold start users (those with empty histories). Why is that?

Link to the code

I'm wondering if it might be better to use a strategy like the one I previously mentioned (_initialize_cold_start) to pre-populate these cold start users with some placeholder news articles, rather than removing them. But maybe my thinking is wrong.

andreeaiana · 2024-04-15T09:06:21Z

I think that's a good idea, we can try it. I know that some models originally do that, but not all of them.
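
The two options discussed here, filtering cold-start users out versus keeping them, can be compared on a toy behaviors table. This is a rough sketch only: the column names (`user`, `history`, `candidates`, `is_cold_start`) and the data are hypothetical and are not the library's actual preprocessing code.

```python
import pandas as pd

# Toy behaviors table: U2 has an empty history, U3 has a missing one.
behaviors = pd.DataFrame(
    {
        "user": ["U1", "U2", "U3"],
        "history": ["N1 N2", "", None],
        "candidates": ["N3", "N4", "N5"],
    }
)

# Normalize missing histories to "", then split into lists of news ids.
history = behaviors["history"].fillna("").str.split()
is_cold = history.str.len() == 0

# Option 1: drop cold-start users entirely.
filtered = behaviors.loc[~is_cold]

# Option 2: keep every user and flag the cold-start rows, so that a
# placeholder history can be substituted when the sample is loaded.
kept = behaviors.assign(history=history, is_cold_start=is_cold)
```

With option 2 the flagged rows still reach training and evaluation, and a placeholder such as the one from _initialize_cold_start could be filled in at load time.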

igor17400 changed the title ~~Dealing with cold start users~~ Dealing with cold start users click history Mar 26, 2024

andreeaiana added the enhancement New feature or request label Mar 27, 2024

igor17400 mentioned this issue Apr 2, 2024

PP-Rec implementation #12

Open

Poseidondon added a commit to Poseidondon/newsreclib-ru that referenced this issue Jun 24, 2024

lstur andreeaiana#11

9246932
