Recommender System
Machine Learning
Key Problems:
- How to collect data.
- Extrapolate unknown rating from the known ones.
- Evaluating extrapolation methods.
3 Approaches to recommender systems:
- Content-based
- Collaborative
- Latent factor based
Content-based Recommender System
Recommend to customer x items similar to the items x rated highly.
TF-IDF
$TF_{ij} = \frac{f_{ij}}{\max_k f_{kj}}$
where $f_{ij}$ is the frequency of term (feature) $i$ in doc (item) $j$.
$IDF_i = \log\left(\frac{N}{n_i}\right)$
where $N$ is the number of docs and $n_i$ is the number of docs that mention term $i$.
TF-IDF score:
$w_{ij} = TF_{ij} \times IDF_i$. Doc profile = set of words with the highest TF-IDF scores.
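The two formulas above can be sketched in a few lines of Python (the documents and terms are illustrative toy data):

```python
import math
from collections import Counter

# Toy corpus: doc (item) id -> list of terms (features)
docs = {
    "d1": "the matrix movie sci fi matrix".split(),
    "d2": "romantic comedy movie".split(),
}

N = len(docs)
counts = {j: Counter(words) for j, words in docs.items()}

def tf(i, j):
    # TF_ij = f_ij / max_k f_kj
    f = counts[j]
    return f[i] / max(f.values())

def idf(i):
    # IDF_i = log(N / n_i), n_i = number of docs mentioning term i
    n_i = sum(1 for f in counts.values() if i in f)
    return math.log(N / n_i)

def tfidf(i, j):
    # w_ij = TF_ij * IDF_i
    return tf(i, j) * idf(i)

# "matrix" has the max count (2) in d1 and appears in 1 of 2 docs,
# so TF = 1 and IDF = log(2):
print(tfidf("matrix", "d1"))  # ≈ 0.693
```

The doc profile for each item would then keep only the terms with the highest `tfidf` scores.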
Pros:
- No cold-start for new items (only item features are needed)
- Able to provide explanations
Cons:
- Finding appropriate features is hard (e.g., for images, movies)
- How to build a user profile for new users?
Collaborative Filtering
User-user collaborative filtering: find a set N of users whose ratings are similar to user x's, and estimate user x's ratings based on the users in N.
Item-item collaborative filtering:
- For item i, find other similar items.
- Estimate ratings for item i based on similar items:
$r_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij} \cdot r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$
where $s_{ij}$ is the similarity of items $i$ and $j$, $r_{xj}$ is user x's rating on item $j$, and $N(i;x)$ is the set of items similar to $i$ rated by user x.
In practice, estimate $r_{xi}$ as a baseline plus a weighted average of deviations:
$r_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$
where
$b_{xj} = \mu + b_x + b_j$,
$\mu$ is the overall mean rating,
$b_x = \mu_x - \mu$ is user x's rating deviation ($\mu_x$ = x's mean rating),
$b_j = \mu_j - \mu$ is item j's rating deviation ($\mu_j$ = j's mean rating).
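A minimal sketch of this baseline-corrected item-item prediction; all ratings, deviations, and similarities below are made-up toy values:

```python
# Baseline components: b_xj = mu + b_x + b_j
mu = 3.5                                 # overall mean rating
b_x = 0.2                                # user x's deviation
b = {"i": 0.3, "j1": -0.1, "j2": 0.4}    # item deviations
r_x = {"j1": 4.0, "j2": 3.0}             # x's known ratings on N(i;x)
s = {"j1": 0.8, "j2": 0.5}               # similarity of i to each j

def baseline(item):
    return mu + b_x + b[item]

# r_xi = b_xi + sum_j s_ij * (r_xj - b_xj) / sum_j s_ij
b_xi = baseline("i")
num = sum(s[j] * (r_x[j] - baseline(j)) for j in r_x)
den = sum(s[j] for j in r_x)
r_xi = b_xi + num / den
print(round(r_xi, 3))  # ≈ 3.823
```

Note that item j2's rating (3.0) is *below* its baseline (4.1), so it pulls the prediction below $b_{xi}$ even though the raw rating looks average.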
Pros:
- No feature selection needed
Cons:
- Cold start
- User / Rating matrix sparsity
- Tends to recommend popular items
Practical Tips
- Compare predictions with known ratings:
- Root mean square error: $\sqrt{\frac{1}{|R|}\sum_{(i,x) \in R} (\hat{r}_{xi} - r_{xi})^2}$
- % of correctly predicted items among the top 10
- In practice, we care only about high ratings (those are what get recommended).
- Finding the k most similar items is expensive: use LSH (locality-sensitive hashing).
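The RMSE metric above takes only a couple of lines; the prediction/rating pairs here are illustrative:

```python
import math

# (predicted, actual) pairs over the held-out ratings R
pairs = [(3.8, 4.0), (2.5, 2.0), (4.2, 5.0)]

# RMSE = sqrt( (1/|R|) * sum (r_hat - r)^2 )
rmse = math.sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))
print(round(rmse, 4))  # ≈ 0.5568
```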
Interpolation Weights
$r_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij} \cdot (r_{xj} - b_{xj})$
Learn the $w_{ij}$ that minimize the SSE $\sum_{(i,x) \in R} (\hat{r}_{xi} - r_{xi})^2$ on training data. Minimize
$J(W) = \sum_{x,i} \left( \left[ b_{xi} + \sum_{j \in N(i;x)} w_{ij} (r_{xj} - b_{xj}) \right] - r_{xi} \right)^2$
by gradient descent. The gradient with respect to $w_{ij}$ (fixed item $i$, summing over users $x$ with $j \in N(i;x)$) is
$\frac{\partial J(W)}{\partial w_{ij}} = 2 \sum_{x} \left( \left[ b_{xi} + \sum_{k \in N(i;x)} w_{ik} (r_{xk} - b_{xk}) \right] - r_{xi} \right) (r_{xj} - b_{xj})$
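A gradient-descent sketch for these weights, for a single item $i$. The neighbor set, the precomputed deviations $r_{xj} - b_{xj}$, and the step size are all assumptions for illustration:

```python
# Learn interpolation weights w_ij for one item i by gradient descent.
# Per user: deviations d_xj = r_xj - b_xj for each neighbor j in N(i;x),
# and the target deviation d_xi = r_xi - b_xi (toy values).
neighbors = ["j1", "j2"]
train = [
    ({"j1": 0.4, "j2": -1.1}, -0.2),
    ({"j1": 0.1, "j2": 0.3}, 0.5),
]
w = {j: 0.0 for j in neighbors}
eta = 0.2  # step size (assumption)

for _ in range(2000):
    grad = {j: 0.0 for j in neighbors}
    for d_x, d_xi in train:
        # err = [b_xi + sum_j w_ij * d_xj] - r_xi, in deviation form
        err = sum(w[j] * d_x[j] for j in neighbors) - d_xi
        for j in neighbors:
            grad[j] += 2 * err * d_x[j]  # dJ/dw_ij
    for j in neighbors:
        w[j] -= eta * grad[j]

print({j: round(v, 3) for j, v in w.items()})
```

Unlike the similarity-weighted average, these learned weights need not be positive or sum to 1; they are whatever minimizes the squared error.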
Latent Factor Models
$R$ is the rating matrix, where $r_{xi}$ is user x's rating for item $i$. SVD ($A = U \Sigma V^T$) applied to $R$ gives $R = Q P^T$, where $Q = U$, $P^T = \Sigma V^T$, and $r_{xi} = q_i \cdot p_x$.
SVD isn't defined when entries are missing! Use specialized methods to find $P$, $Q$ over only the known ratings:
$\min_{P,Q} \sum_{(i,x) \in R} (r_{xi} - q_i \cdot p_x)^2$
Introducing regularization:
$\min_{P,Q} \sum_{(i,x) \in R} (r_{xi} - q_i \cdot p_x)^2 + \left[ \lambda_1 \sum_x \|p_x\|^2 + \lambda_2 \sum_i \|q_i\|^2 \right]$
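A minimal stochastic-gradient-descent sketch of this regularized objective. The latent dimension, step size, and λ are illustrative choices, and a single λ is used for both $P$ and $Q$:

```python
import random

random.seed(0)

# Known ratings only: (item i, user x) -> r_xi
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 1.0}
n_items, n_users, k = 2, 2, 2
lam, eta = 0.02, 0.05  # regularization and step size (assumptions)

# Random small init of the factor rows q_i and p_x
Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

for _ in range(2000):
    for (i, x), r in ratings.items():
        err = r - dot(Q[i], P[x])
        for f in range(k):
            qif, pxf = Q[i][f], P[x][f]
            # SGD step on (r_xi - q_i . p_x)^2 + lam * (|q_i|^2 + |p_x|^2)
            Q[i][f] += eta * (err * pxf - lam * qif)
            P[x][f] += eta * (err * qif - lam * pxf)

# Training RMSE over the known entries only
sse = sum((r - dot(Q[i], P[x])) ** 2 for (i, x), r in ratings.items())
print(round((sse / len(ratings)) ** 0.5, 3))
```

The sum runs only over observed $(i, x)$ pairs, which is exactly how this formulation sidesteps the missing entries that break plain SVD.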