@liushiya 2018-11-10T06:41:34.000000Z

Recommender System Based on Matrix Decomposition

Machine Learning, Experiment


A Chinese version of this page is also available.

Motivation

  1. Explore the construction of a recommender system.
  2. Understand the principle of matrix decomposition.
  3. Become familiar with the use of gradient descent.
  4. Build a recommender system on a small-scale dataset and cultivate engineering ability.

Dataset

  1. The experiment uses the MovieLens-100k dataset.
  2. u.data -- Contains 100,000 ratings from 943 users on 1682 movies. Each user has rated at least 20 movies. Users and movies are each numbered consecutively starting from 1. The data is randomly ordered. Each row has the following fields:
user id   item id   rating   timestamp
196       242       3        881250949
186       302       3        891717742
22        377       1        878887116
244       51        2        880606923
166       346       1        886397596

  3. u1.base / u1.test are the train set and validation set respectively, split from u.data in an 80% / 20% proportion. The other pairs, u2.base / u2.test through u5.base / u5.test, can be used as train and validation sets in the same way.
  4. You can also construct the train set and validation set according to your own evaluation method.
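
As an illustration of the row format described above, here is a minimal sketch that turns rating rows into a dense user-by-item matrix with 0 for missing ratings. The function name `load_rating_matrix` is hypothetical, and the five sample rows stand in for a real file; the sketch assumes whitespace-separated fields in the order user id, item id, rating, timestamp.

```python
import io
import numpy as np

def load_rating_matrix(file_obj, n_users=943, n_items=1682):
    """Build a dense user x item rating matrix; unrated cells stay 0."""
    ratings = np.zeros((n_users, n_items))
    for line in file_obj:
        if not line.strip():
            continue  # skip blank lines
        user_id, item_id, rating, _timestamp = line.split()
        # IDs in the file start at 1, array indices start at 0.
        ratings[int(user_id) - 1, int(item_id) - 1] = float(rating)
    return ratings

# The five sample rows shown above, standing in for a real file.
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
    "244\t51\t2\t880606923\n"
    "166\t346\t1\t886397596\n"
)
R = load_rating_matrix(sample)
print(R.shape, int(np.count_nonzero(R)))
```

With the real dataset, the same function could be called on an opened file, for example `with open("u1.base") as f: R_train = load_rating_matrix(f)`, assuming the file sits in the working directory.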

Experiment Environment

Python 3, with at least the following packages: sklearn, numpy, jupyter, matplotlib.
We recommend installing Anaconda3 directly, since it already includes the packages mentioned above.

Time and Place

2018-11-24, 8:50 AM-12:15 PM, B7-138 (Mingkui Tan), B7-238 (Qingyao Wu)

Submit Deadline

2018-12-29 12:00 noon

Experimental Form

Completed in groups.

Experiment Step

Both the experiment code and the plots are completed in Jupyter.

Using alternating least squares (ALS) optimization:

  1. Read the dataset and split it (or use the u1.base / u1.test to u5.base / u5.test pairs directly). Populate the original rating matrix from the raw data, filling null values with 0.
  2. Initialize the user factor matrix P and the item (movie) factor matrix Q, where K, the number of columns of each, is the number of latent features.
  3. Determine the loss function and the hyperparameters: the learning rate and the penalty factor.
  4. Use alternating least squares to decompose the sparse user rating matrix into the user factor matrix and the item (movie) factor matrix:
        4.1 With the item factor matrix fixed, take the partial derivative of the loss with respect to each row (column) of the user factor matrix, set it to zero, and update the user factor matrix.
        4.2 With the user factor matrix fixed, take the partial derivative of the loss with respect to each row (column) of the item factor matrix, set it to zero, and update the item factor matrix.
        4.3 Calculate the loss on the validation set and compare it with the loss of the previous iteration to determine whether it has converged.
  5. Repeat step 4 several times to obtain a satisfactory user factor matrix P and item factor matrix Q. Draw a curve of the validation loss against the number of iterations.
  6. The final rating prediction matrix is obtained by multiplying the user factor matrix P by the transpose of the item factor matrix Q.
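
The ALS steps above can be sketched as follows. This is an illustration under stated assumptions, not the required implementation: the loss is taken to be squared error on observed ratings plus an L2 penalty weighted by `lam`, the validation metric is RMSE, and the names `als`, `als_step`, `rmse`, and the tiny toy matrices are all hypothetical. Setting the partial derivative to zero for one user with the item factors fixed gives a small regularized least-squares system, which is solved per row.

```python
import numpy as np

def rmse(R, P, Q):
    """Root-mean-square error over the observed (non-zero) entries of R."""
    mask = R > 0
    err = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

def als_step(R, fixed, lam):
    """Update every row of one factor matrix while the other (`fixed`) is held constant."""
    K = fixed.shape[1]
    updated = np.zeros((R.shape[0], K))
    for row in range(R.shape[0]):
        rated = R[row] > 0
        if not rated.any():
            continue  # no ratings for this row: keep its factors at zero
        F = fixed[rated]
        # Zero-gradient condition: (F^T F + lam * I) x = F^T r
        A = F.T @ F + lam * np.eye(K)
        b = F.T @ R[row, rated]
        updated[row] = np.linalg.solve(A, b)
    return updated

def als(R_train, R_val, K=3, lam=0.1, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R_train.shape
    P = rng.normal(scale=0.1, size=(n_users, K))
    Q = rng.normal(scale=0.1, size=(n_items, K))
    val_losses = []
    for _ in range(n_iters):
        P = als_step(R_train, Q, lam)         # step 4.1: fix Q, update P
        Q = als_step(R_train.T, P, lam)       # step 4.2: fix P, update Q
        val_losses.append(rmse(R_val, P, Q))  # step 4.3: validation loss
    return P, Q, val_losses

# Toy data (made up for illustration): 5 users x 4 items, 0 means "not rated".
R_train = np.array([[5, 0, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 0],
                    [1, 0, 0, 4],
                    [0, 1, 5, 4]], dtype=float)
R_val = np.zeros_like(R_train)
R_val[0, 1] = 3.0  # two held-out ratings used for validation
R_val[2, 3] = 5.0

P, Q, val_losses = als(R_train, R_val)
R_hat = P @ Q.T  # step 6: predicted rating matrix
print(R_hat.shape, len(val_losses))
```

The list `val_losses` holds one value per iteration, so it can be passed directly to a plotting call to draw the curve from step 5.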

Using the stochastic gradient descent (SGD) method:

  1. Read the dataset and split it (or use the u1.base / u1.test to u5.base / u5.test pairs directly). Populate the original rating matrix from the raw data, filling null values with 0.
  2. Initialize the user factor matrix P and the item (movie) factor matrix Q, where K, the number of columns of each, is the number of latent features.
  3. Determine the loss function and the hyperparameters: the learning rate and the penalty factor.
  4. Use stochastic gradient descent to decompose the sparse user rating matrix into the user factor matrix and the item (movie) factor matrix:
        4.1 Randomly select a sample from the rating matrix;
        4.2 Calculate the gradient of this sample's loss with respect to the corresponding row (column) of the user factor matrix and of the item factor matrix;
        4.3 Use SGD to update the corresponding row (column) of P and Q;
        4.4 Calculate the loss on the validation set and compare it with the loss of the previous iteration to determine whether it has converged.
  5. Repeat step 4 several times to obtain a satisfactory user factor matrix P and item factor matrix Q. Draw a curve of the validation loss against the number of iterations.
  6. The final rating prediction matrix is obtained by multiplying the user factor matrix P by the transpose of the item factor matrix Q.
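
The SGD steps above can be sketched in the same spirit. The assumptions here are: a per-sample loss of half the squared error plus half of `lam` times the squared norms of the two factor rows involved, RMSE as the validation metric, and one validation check per pass over the training samples rather than after every single update. Samples are visited in a random order each pass, which is one common way to realize the random selection in step 4.1. The names `sgd_mf` and `sample_rmse` and the toy ratings are hypothetical.

```python
import numpy as np

def sample_rmse(samples, P, Q):
    """RMSE over a list of (user_index, item_index, rating) triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in samples]
    return float(np.sqrt(np.mean(np.square(errors))))

def sgd_mf(train, val, n_users, n_items, K=3, lr=0.02, lam=0.02,
           n_epochs=500, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.random((n_users, K))  # user factor matrix
    Q = rng.random((n_items, K))  # item factor matrix
    val_losses = []
    for _ in range(n_epochs):
        for idx in rng.permutation(len(train)):  # step 4.1: random sample order
            u, i, r = train[idx]
            err = r - P[u] @ Q[i]
            # Step 4.2: gradients of the per-sample loss w.r.t. P[u] and Q[i]
            grad_p = -err * Q[i] + lam * P[u]
            grad_q = -err * P[u] + lam * Q[i]
            # Step 4.3: gradient step on the two affected rows only
            P[u] -= lr * grad_p
            Q[i] -= lr * grad_q
        val_losses.append(sample_rmse(val, P, Q))  # step 4.4, once per pass
    return P, Q, val_losses

# Toy data (made up for illustration), with 0-based user and item indices.
train = [(0, 0, 5), (0, 3, 1), (1, 0, 4), (1, 3, 1), (2, 0, 1), (2, 1, 1),
         (3, 0, 1), (3, 3, 4), (4, 1, 1), (4, 2, 5), (4, 3, 4)]
val = [(0, 1, 3), (2, 3, 5)]

P, Q, val_losses = sgd_mf(train, val, n_users=5, n_items=4)
R_hat = P @ Q.T  # step 6: predicted rating matrix
train_rmse = sample_rmse(train, P, Q)
print(R_hat.shape, len(val_losses))
```

Keeping the ratings as a list of triples avoids iterating over the zero entries of the dense matrix, which matters because most of the MovieLens rating matrix is empty.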

You may choose just one method to complete the experiment: ALS, SGD, or any method you find on the Internet. If you use a method from the Internet, you must cite the source.

Organize the experimental results and complete the experiment report.
You can also explore recommender systems for other tasks and functions.

Evaluation

Item                 Proportion   Description
Attendance           40%          Ask for leave if there is a time conflict
Code availability    20%          Compiles successfully
Report               30%          Follows the report template
Code specification   10%          Mainly considers whether readable variable names are used

Requirement for Submission

Submission process

1. Access 222.201.187.50:7001.
2. Click on the corresponding submission entry.
3. Fill in your name and student number, then upload the report in PDF format and the code as a zip archive.

Precautions


Any advice or ideas are welcome; please discuss them with the teaching assistants in the QQ group.

Reference

Matrix Factorization: A Simple Tutorial and Implementation in Python
