@liushiya 2018-11-10T06:41:34.000000Z

Recommender System Based on Matrix Decomposition

Machine Learning, Experiment

A Chinese version of this page is also available.

Motivation

Explore how a recommender system is constructed.
Understand the principle of matrix decomposition.
Become familiar with the use of gradient descent.
Build a recommender system on a small-scale dataset and develop engineering skills.

Dataset

1. The experiment uses the MovieLens-100k dataset.
2. u.data contains 100,000 ratings from 943 users on 1682 movies. Each user has rated at least 20 movies. Users and movies are each numbered consecutively starting from 1. The data is randomly ordered.

user id	item id	rating	timestamp
196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596

3. u1.base / u1.test are the training set and validation set respectively, split from u.data in proportions of 80% and 20%. Any of the pairs from u1.base / u1.test to u5.base / u5.test can be used as training and validation sets in the same way.
4. You can also construct your own training and validation sets according to your own evaluation method.
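As an illustration of item 4, the following is a minimal sketch of parsing the tab-separated u.data layout and making a random 80% / 20% split. It uses only the five sample rows shown above, embedded as a string. The helper names `parse_ratings` and `split_ratings`, the fixed seed, and the shuffle-based split are assumptions made for this example, not part of the dataset or of the assignment.

```python
import random

# The five sample rows shown above, in u.data's tab-separated layout.
SAMPLE = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
    "244\t51\t2\t880606923\n"
    "166\t346\t1\t886397596\n"
)


def parse_ratings(text):
    """Parse 'user id, item id, rating, timestamp' lines into tuples of ints."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            user, item, rating, timestamp = (int(x) for x in line.split("\t"))
            rows.append((user, item, rating, timestamp))
    return rows


def split_ratings(rows, validation_ratio=0.2, seed=0):
    """Shuffle a copy of the rows and split off a validation share (hypothetical helper)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_ratio))
    return shuffled[n_validation:], shuffled[:n_validation]


ratings = parse_ratings(SAMPLE)
train_rows, validation_rows = split_ratings(ratings)
print(len(train_rows), len(validation_rows))
```

For the real file, the same parser can be applied to the text read from u.data. Remember that user and item ids start at 1, so subtract 1 before using them as array indices.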

Experiment Environment

Python 3, with at least the following packages: sklearn, numpy, jupyter, matplotlib.
We recommend installing anaconda3 directly, which already includes the packages mentioned above.

Time and Place

2018-11-24, 8:50 AM to 12:15 PM, B7-138 (Mingkui Tan), B7-238 (Qingyao Wu)

Submission Deadline

2018-12-29 12:00 noon

Experimental Form

Complete in groups.

Experiment Steps

Both the experiment code and the plots are to be completed in Jupyter.

Using alternating least squares optimization (ALS):

1. Read the dataset and split it (or use u1.base / u1.test to u5.base / u5.test directly). Populate the original rating matrix $R_{n\_users,n\_items}$ from the raw data, and fill null values with 0.
2. Initialize the user factor matrix $P_{n\_users,K}$ and the item (movie) factor matrix $Q_{n\_items,K}$, where $K$ is the number of latent features.
3. Determine the loss function and the hyperparameters: the learning rate $\eta$ and the penalty factor $\lambda$.
4. Use alternating least squares to decompose the sparse user rating matrix and obtain the user factor matrix and the item (movie) factor matrix:
    4.1 With the item factor matrix fixed, compute the partial derivative of the loss with respect to each row (column) of the user factor matrix, set the derivative to zero, and update the user factor matrix.
    4.2 With the user factor matrix fixed, compute the partial derivative of the loss with respect to each row (column) of the item factor matrix, set the derivative to zero, and update the item factor matrix.
    4.3 Calculate $L_{validation}$ on the validation set and compare it with the $L_{validation}$ of the previous iteration to determine whether the method has converged.
5. Repeat step 4 several times to obtain a satisfactory user factor matrix $P$ and item factor matrix $Q$, and plot the $L_{validation}$ curve against the number of iterations.
6. The final rating prediction matrix $\hat{R}_{n\_users,n\_items}$ is obtained by multiplying the user factor matrix $P_{n\_users,K}$ by the transpose of the item factor matrix $Q_{n\_items,K}$.
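The ALS steps above can be sketched as follows. The handout does not fix a loss function, so this example assumes the common regularized squared error over observed ratings, $L = \sum_{(u,i)} (r_{ui} - p_u^T q_i)^2 + \lambda (\sum_u \|p_u\|^2 + \sum_i \|q_i\|^2)$. Setting its derivative with respect to $p_u$ to zero gives $p_u = (Q_u^T Q_u + \lambda I)^{-1} Q_u^T r_u$, where $Q_u$ holds the factors of the items rated by user $u$. The update for $q_i$ is symmetric. The function names, the toy matrices, and the use of RMSE as $L_{validation}$ are all assumptions made for illustration.

```python
import numpy as np


def als_step(R, mask, fixed, lam):
    """Closed-form update of every row of the free factor matrix.

    R: (n_rows, n_cols) ratings, mask: True where a rating is observed,
    fixed: (n_cols, K) factor matrix that is held constant in this step.
    """
    n_rows, K = R.shape[0], fixed.shape[1]
    updated = np.zeros((n_rows, K))
    for r in range(n_rows):
        cols = mask[r]
        if not cols.any():
            continue  # no observed ratings: the regularizer alone gives zeros
        F = fixed[cols]                    # factors of the observed columns
        A = F.T @ F + lam * np.eye(K)      # regularized normal equations
        b = F.T @ R[r, cols]
        updated[r] = np.linalg.solve(A, b)
    return updated


def rmse(R, mask, P, Q):
    """Root mean squared error over observed entries (assumed L_validation)."""
    err = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))


def als(R_train, R_val, K=2, lam=0.1, n_iters=10, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R_train.shape
    P = rng.normal(scale=0.1, size=(n_users, K))
    Q = rng.normal(scale=0.1, size=(n_items, K))
    train_mask, val_mask = R_train > 0, R_val > 0   # 0 marks a missing rating
    history = []
    for _ in range(n_iters):
        P = als_step(R_train, train_mask, Q, lam)        # step 4.1
        Q = als_step(R_train.T, train_mask.T, P, lam)    # step 4.2
        history.append(rmse(R_val, val_mask, P, Q))      # step 4.3
    return P, Q, history


# Toy 4 users x 4 items data, invented for this example only.
R_train = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 1, 5, 4]], dtype=float)
R_val = np.array([[0, 0, 0, 0],
                  [0, 3, 0, 0],
                  [0, 0, 5, 0],
                  [1, 0, 0, 0]], dtype=float)

P, Q, history = als(R_train, R_val)
R_hat = P @ Q.T   # step 6: predicted rating matrix
print(R_hat.shape, len(history))
```

`history` holds one validation value per iteration, so it can be passed to `matplotlib.pyplot.plot` to draw the curve asked for in step 5. Note that these closed-form updates do not use a learning rate.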

Using stochastic gradient descent (SGD):

1. Read the dataset and split it (or use u1.base / u1.test to u5.base / u5.test directly). Populate the original rating matrix $R_{n\_users,n\_items}$ from the raw data, and fill null values with 0.
2. Initialize the user factor matrix $P_{n\_users,K}$ and the item (movie) factor matrix $Q_{n\_items,K}$, where $K$ is the number of latent features.
3. Determine the loss function and the hyperparameters: the learning rate $\eta$ and the penalty factor $\lambda$.
4. Use stochastic gradient descent to decompose the sparse user rating matrix and obtain the user factor matrix and the item (movie) factor matrix:
    4.1 Select a sample from the rating matrix at random;
    4.2 Calculate the gradient of this sample's loss with respect to the corresponding row (column) of the user factor matrix and the item factor matrix;
    4.3 Use SGD to update the corresponding row (column) of $P_{n\_users,K}$ and $Q_{n\_items,K}$;
    4.4 Calculate $L_{validation}$ on the validation set and compare it with the $L_{validation}$ of the previous iteration to determine whether the method has converged.
5. Repeat step 4 several times to obtain a satisfactory user factor matrix $P$ and item factor matrix $Q$, and plot the $L_{validation}$ curve against the number of iterations.
6. The final rating prediction matrix $\hat{R}_{n\_users,n\_items}$ is obtained by multiplying the user factor matrix $P_{n\_users,K}$ by the transpose of the item factor matrix $Q_{n\_items,K}$.
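The SGD steps above can be sketched as follows. As the loss is not specified in the handout, this example assumes the per-sample loss $(r_{ui} - p_u^T q_i)^2 + \lambda (\|p_u\|^2 + \|q_i\|^2)$, with the constant factor 2 of the gradient absorbed into $\eta$. The function names, the default hyperparameter values, the toy samples, and the use of RMSE as $L_{validation}$ are assumptions made for illustration. Samples are (user, item, rating) triples with zero-based indices.

```python
import numpy as np


def rmse(samples, P, Q):
    """Root mean squared error over (user, item, rating) triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in samples]
    return float(np.sqrt(np.mean(np.square(errors))))


def sgd_mf(train, val, n_users, n_items, K=2, eta=0.02, lam=0.05,
           n_steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, K))
    Q = rng.normal(scale=0.1, size=(n_items, K))
    history = []
    for _ in range(n_steps):
        u, i, r = train[rng.integers(len(train))]   # step 4.1: random sample
        err = r - P[u] @ Q[i]
        # Step 4.2: descent directions for the affected rows (factor 2 folded into eta).
        step_p = err * Q[i] - lam * P[u]
        step_q = err * P[u] - lam * Q[i]
        # Step 4.3: update only row u of P and row i of Q.
        P[u] += eta * step_p
        Q[i] += eta * step_q
        history.append(rmse(val, P, Q))              # step 4.4
    return P, Q, history


# Toy samples (zero-based ids), invented for this example only.
train = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0), (1, 0, 4.0), (1, 3, 1.0),
         (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0), (3, 1, 1.0), (3, 2, 5.0),
         (3, 3, 4.0)]
val = [(1, 1, 3.0), (2, 2, 5.0), (3, 0, 1.0)]

P, Q, history = sgd_mf(train, val, n_users=4, n_items=4)
R_hat = P @ Q.T   # step 6: predicted rating matrix
print(R_hat.shape, len(history))
```

Evaluating the validation loss after every single update is affordable here only because the toy data is tiny. On the full dataset it is more practical to evaluate every few thousand updates, which is a design choice left open by the steps above.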

You only need to choose one method to complete the experiment: ALS, SGD, or any other method you can find on the Internet. If you use a method from the Internet, you must cite the source.

Organize the experimental results and complete the experiment report.
You can also explore recommender systems for other tasks and functions.

Evaluation

Item	Proportion	Description
Attendance	40%	Ask for leave if you have a time conflict
Code availability	20%	Runs successfully
Report	30%	Follows the report template
Code specification	10%	Mainly whether readable variable names are used

Requirement for Submission

Submission process

1. Access 222.201.187.50:7001.
2. Click on the corresponding submission entry.
3. Fill in your name and student number, then upload the report in pdf format and the code as a zip archive.

Precautions

Experiment reports and code can be uploaded multiple times, and multiple uploads will overwrite previously submitted files.
After uploading, you can refresh the page and check if the upload is successful in the file list below.
The teaching assistants will save all uploaded files at the experiment deadline; files uploaded after the deadline are invalid.
If you write an experiment report in Word, you need to export it to pdf format.
The package format of the code file must be zip. Please do not submit the compressed file in rar format.
The submission URL can only be accessed from the campus network.
The code must be written in Python. For report grading, English is preferred over Chinese, and LaTeX is preferred over Word.

Any suggestions or ideas are welcome; please discuss them with the teaching assistants in the QQ group.

Reference

Matrix Factorization: A Simple Tutorial and Implementation in Python