Recommender System
Machine Learning
Key Problems:
- How to collect data.
- Extrapolate unknown rating from the known ones.
- Evaluating extrapolation methods.
3 Approaches to recommender systems:
- Content-based
- Collaborative
- Latent factor based
Content-based Recommender System
Recommend to customer x items similar to the items x rated highly.
TF-IDF
$TF_{ij} = \frac{f_{ij}}{\max_k f_{kj}}$
where $f_{ij}$ is the frequency of term (feature) $i$ in doc (item) $j$.
$IDF_i = \log\left(\frac{N}{n_i}\right)$
where $N$ is the number of docs and $n_i$ is the number of docs that mention term $i$.
TF-IDF score:
$w_{ij} = TF_{ij} \times IDF_i$. Doc profile = set of words with the highest TF-IDF scores.
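The two formulas above can be sketched in a few lines of Python (the documents and terms are illustrative toy data):

```python
import math
from collections import Counter

# Toy corpus: doc (item) id -> list of terms (features)
docs = {
    "d1": "the matrix movie sci fi matrix".split(),
    "d2": "romantic comedy movie".split(),
}

N = len(docs)
counts = {j: Counter(words) for j, words in docs.items()}

def tf(i, j):
    # TF_ij = f_ij / max_k f_kj
    f = counts[j]
    return f[i] / max(f.values())

def idf(i):
    # IDF_i = log(N / n_i), n_i = number of docs mentioning term i
    n_i = sum(1 for f in counts.values() if i in f)
    return math.log(N / n_i)

def tfidf(i, j):
    # w_ij = TF_ij * IDF_i
    return tf(i, j) * idf(i)

# "matrix" has the max count (2) in d1 and appears in 1 of 2 docs,
# so TF = 1 and IDF = log(2):
print(tfidf("matrix", "d1"))  # ≈ 0.693
```

The doc profile for each item would then keep only the terms with the highest `tfidf` scores.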
Pros:
- No cold-start for new items (only item features are needed)
- Able to provide explanations
Cons:
- Finding appropriate features is hard (e.g., for images, movies)
- How to build a user profile for new users?
Collaborative Filtering
User-user collaborative filtering: find a set N of users whose ratings are similar to user x's, and estimate user x's ratings based on the users in N.
Item-item collaborative filtering:
- For item i, find other similar items.
- Estimate ratings for item i based on similar items:
$r_{xi} = \frac{\sum_{j \in N(i;x)} s_{ij} \cdot r_{xj}}{\sum_{j \in N(i;x)} s_{ij}}$
where $s_{ij}$ is the similarity of items $i$ and $j$, $r_{xj}$ is user x's rating on item $j$, and $N(i;x)$ is the set of items similar to $i$ rated by user x.
In practice, estimate $r_{xi}$ as a baseline plus a weighted average of deviations:
$r_{xi} = b_{xi} + \frac{\sum_{j \in N(i;x)} s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i;x)} s_{ij}}$
where
$b_{xj} = \mu + b_x + b_j$,
$\mu$ is the overall mean rating,
$b_x = \mu_x - \mu$ is user x's rating deviation ($\mu_x$ = x's mean rating),
$b_j = \mu_j - \mu$ is item j's rating deviation ($\mu_j$ = j's mean rating).
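A minimal sketch of this baseline-corrected item-item prediction; all ratings, deviations, and similarities below are made-up toy values:

```python
# Baseline components: b_xj = mu + b_x + b_j
mu = 3.5                                 # overall mean rating
b_x = 0.2                                # user x's deviation
b = {"i": 0.3, "j1": -0.1, "j2": 0.4}    # item deviations
r_x = {"j1": 4.0, "j2": 3.0}             # x's known ratings on N(i;x)
s = {"j1": 0.8, "j2": 0.5}               # similarity of i to each j

def baseline(item):
    return mu + b_x + b[item]

# r_xi = b_xi + sum_j s_ij * (r_xj - b_xj) / sum_j s_ij
b_xi = baseline("i")
num = sum(s[j] * (r_x[j] - baseline(j)) for j in r_x)
den = sum(s[j] for j in r_x)
r_xi = b_xi + num / den
print(round(r_xi, 3))  # ≈ 3.823
```

Note that item j2's rating (3.0) is *below* its baseline (4.1), so it pulls the prediction below $b_{xi}$ even though the raw rating looks average.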
Pros:
- No feature selection needed
Cons:
- Cold start
- User / Rating matrix sparsity
- Tends to recommend popular items
Practical Tips
- Compare predictions with known ratings:
- Root mean square error: $\sqrt{\frac{1}{|R|}\sum_{(i,x) \in R} (\hat{r}_{xi} - r_{xi})^2}$
- % of correctly predicted items among the top 10
- In practice, we care only about high ratings (those are what get recommended).
- Finding the k most similar items is expensive: use LSH (locality-sensitive hashing).
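The RMSE metric above takes only a couple of lines; the prediction/rating pairs here are illustrative:

```python
import math

# (predicted, actual) pairs over the held-out ratings R
pairs = [(3.8, 4.0), (2.5, 2.0), (4.2, 5.0)]

# RMSE = sqrt( (1/|R|) * sum (r_hat - r)^2 )
rmse = math.sqrt(sum((p - r) ** 2 for p, r in pairs) / len(pairs))
print(round(rmse, 4))  # ≈ 0.5568
```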
Interpolation Weights
$r_{xi} = b_{xi} + \sum_{j \in N(i;x)} w_{ij} \cdot (r_{xj} - b_{xj})$
Learn the $w_{ij}$ that minimize the SSE $\sum_{(i,x) \in R} (\hat{r}_{xi} - r_{xi})^2$ on training data. Minimize
$J(W) = \sum_{x,i} \left( \left[ b_{xi} + \sum_{j \in N(i;x)} w_{ij} (r_{xj} - b_{xj}) \right] - r_{xi} \right)^2$
by gradient descent. The gradient with respect to $w_{ij}$ (fixed item $i$, summing over users $x$ with $j \in N(i;x)$) is
$\frac{\partial J(W)}{\partial w_{ij}} = 2 \sum_{x} \left( \left[ b_{xi} + \sum_{k \in N(i;x)} w_{ik} (r_{xk} - b_{xk}) \right] - r_{xi} \right) (r_{xj} - b_{xj})$
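A gradient-descent sketch for these weights, for a single item $i$. The neighbor set, the precomputed deviations $r_{xj} - b_{xj}$, and the step size are all assumptions for illustration:

```python
# Learn interpolation weights w_ij for one item i by gradient descent.
# Per user: deviations d_xj = r_xj - b_xj for each neighbor j in N(i;x),
# and the target deviation d_xi = r_xi - b_xi (toy values).
neighbors = ["j1", "j2"]
train = [
    ({"j1": 0.4, "j2": -1.1}, -0.2),
    ({"j1": 0.1, "j2": 0.3}, 0.5),
]
w = {j: 0.0 for j in neighbors}
eta = 0.2  # step size (assumption)

for _ in range(2000):
    grad = {j: 0.0 for j in neighbors}
    for d_x, d_xi in train:
        # err = [b_xi + sum_j w_ij * d_xj] - r_xi, in deviation form
        err = sum(w[j] * d_x[j] for j in neighbors) - d_xi
        for j in neighbors:
            grad[j] += 2 * err * d_x[j]  # dJ/dw_ij
    for j in neighbors:
        w[j] -= eta * grad[j]

print({j: round(v, 3) for j, v in w.items()})
```

Unlike the similarity-weighted average, these learned weights need not be positive or sum to 1; they are whatever minimizes the squared error.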
Latent Factor Models
$R$ is the rating matrix, where $r_{xi}$ is user x's rating for item $i$. SVD ($A = U \Sigma V^T$) applied to $R$ gives $R = Q P^T$, where $Q = U$, $P^T = \Sigma V^T$, and $r_{xi} = q_i \cdot p_x$.
SVD isn't defined when entries are missing! Use specialized methods to find $P$, $Q$ over only the known ratings:
$\min_{P,Q} \sum_{(i,x) \in R} (r_{xi} - q_i \cdot p_x)^2$
Introducing regularization:
$\min_{P,Q} \sum_{(i,x) \in R} (r_{xi} - q_i \cdot p_x)^2 + \left[ \lambda_1 \sum_x \|p_x\|^2 + \lambda_2 \sum_i \|q_i\|^2 \right]$
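A minimal stochastic-gradient-descent sketch of this regularized objective. The latent dimension, step size, and λ are illustrative choices, and a single λ is used for both $P$ and $Q$:

```python
import random

random.seed(0)

# Known ratings only: (item i, user x) -> r_xi
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 1.0}
n_items, n_users, k = 2, 2, 2
lam, eta = 0.02, 0.05  # regularization and step size (assumptions)

# Random small init of the factor rows q_i and p_x
Q = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
P = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

for _ in range(2000):
    for (i, x), r in ratings.items():
        err = r - dot(Q[i], P[x])
        for f in range(k):
            qif, pxf = Q[i][f], P[x][f]
            # SGD step on (r_xi - q_i . p_x)^2 + lam * (|q_i|^2 + |p_x|^2)
            Q[i][f] += eta * (err * pxf - lam * qif)
            P[x][f] += eta * (err * qif - lam * pxf)

# Training RMSE over the known entries only
sse = sum((r - dot(Q[i], P[x])) ** 2 for (i, x), r in ratings.items())
print(round((sse / len(ratings)) ** 0.5, 3))
```

The sum runs only over observed $(i, x)$ pairs, which is exactly how this formulation sidesteps the missing entries that break plain SVD.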