# Are All Rejected Recommendations Equally Bad?: Towards Analysing Rejected Recommendations

Frumerman, Shir; Shani, Guy; Shapira, Bracha; Shalom, Oren Sar. 2019.

Paper summary


## Idea

When we recommend items to users, some of them are not chosen. These rejected recommendations are usually treated as hard mistakes. The authors argue that such recommendations may still influence the user's choice even though they were not picked. For example, a user did not click on "Die Hard" but then watched another Bruce Willis movie. That recommendation does not seem so bad after all, and perhaps it should not be penalized as harshly as usual. The ultimate goal is to devise a metric whose offline results correlate well with real online performance.

## User study

The authors conducted a user study, showing participants a set of 5 items: a watched movie, 3 rejected recommendations, and an item chosen after the recommendation. The rejected recommendations were generated in 4 groups:

- only high content similarity
- only high collaborative similarity
- only high popularity similarity
- all medium similarities

The question was "**How good is this recommendation (1-5)?**"

| Content | Collaborative | Popularity | Other |
| ------- | ------------- | ---------- | ----- |
| 3.8     | 3.52          | 2.93       | 1.99  |

## Proposal

If standard precision is

$$p_u = \frac{|c_u \cap r_u|}{|r_u|}$$

where $c_u$ are the items chosen by the user and $r_u$ the items recommended to the user, then we can define a refined precision as

$$p_u^{sim} = p_u + \frac{\sum_{i \in r_u \setminus c_u}\max_{j \in \{ c_u:\ t(u,j) > t(u,i)\}}sim(i, j)}{|r_u|}$$

where $t(u,i)$ is the time when user $u$ interacted with item $i$. In words: each rejected recommendation contributes partial credit equal to its maximum similarity to an item the user chose afterwards.

## Evaluation

The authors used the Xing dataset, which contains user interactions with a system for seeking employment opportunities. It includes logs of what was recommended and what was clicked.

### "Online" evaluation

Measure the correlation between the different refined variants of precision for the recommender systems present in the dataset and actual user clicks.
| Content | Collaborative | Regular |
| ------- | ------------- | ------- |
| 0.615   | 0.197         | 0.184   |

### Offline evaluation

Split the logs 70/30 by time and measure the correlation between the number of clicks per user on the test part and the metrics on the train part, as if we were training a model on the train part.

| Train clicks | Content | Collaborative | Random |
| ------------ | ------- | ------------- | ------ |
| 0.5          | 0.35    | 0.16          | 0.087  |

## Open question

What is the best way to calculate item similarity?
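The refined precision $p_u^{sim}$ can be sketched in a few lines, with the similarity function left pluggable since the paper leaves its choice open. This is a minimal illustration, not the authors' implementation; all names are illustrative.

```python
def refined_precision(recommended, chosen, times, sim):
    """Refined precision p_u^sim for a single user.

    recommended, chosen -- sets of item ids (r_u and c_u)
    times -- dict mapping item id to the user's interaction time t(u, i)
    sim -- pluggable similarity function sim(i, j) in [0, 1]
    """
    hits = len(recommended & chosen)
    bonus = 0.0
    for i in recommended - chosen:
        # chosen items the user interacted with after rejecting i
        later = [j for j in chosen if times[j] > times[i]]
        if later:
            bonus += max(sim(i, j) for j in later)
    # standard precision plus similarity credit, both over |r_u|
    return (hits + bonus) / len(recommended)


# toy example: one hit ("b"), one rejected item ("a") followed by
# similar chosen items, constant similarity 0.5
times = {"a": 1, "b": 2, "c": 3}
p = refined_precision({"a", "b"}, {"b", "c"}, times, lambda i, j: 0.5)
print(p)  # (1 + 0.5) / 2 = 0.75
```

With `sim` returning 0 everywhere this reduces to standard precision, which makes it easy to compare both metrics on the same logs.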