u.data -- Consists of 100,000 ratings by 943 users on 1682 movies. Each user has rated at least 20 movies. Users and movies are each numbered consecutively starting from 1. The data is randomly ordered.
user id    item id    rating    timestamp
196        242        3         881250949
186        302        3         891717742
22         377        1         878887116
244        51         2         880606923
166        346        1         886397596
3. u1.base / u1.test are the training set and validation set respectively, split from the dataset u.data in a proportion of 80% to 20%. The same holds for the remaining pairs, from u2.base / u2.test to u5.base / u5.test.
4. You can also construct the training set and validation set according to your own evaluation method.
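As a rough illustration of items 3 and 4, the sketch below parses rows in the u.data layout and makes a random 80% / 20% split. It assumes the file is tab-separated with the four columns shown above; the helper names parse_ratings and split_ratings and the fixed seed are hypothetical choices for this example, not part of the assignment.

```python
import random

# Sample rows copied from the table above, assuming u.data's layout is
# tab-separated: user id, item id, rating, timestamp.
SAMPLE = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
    "244\t51\t2\t880606923\n"
    "166\t346\t1\t886397596\n"
)


def parse_ratings(text):
    """Parse u.data-style text into (user_id, item_id, rating) tuples."""
    ratings = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        user_id, item_id, rating, _timestamp = line.split("\t")
        ratings.append((int(user_id), int(item_id), int(rating)))
    return ratings


def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Shuffle a copy of the ratings and split off a validation part."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


ratings = parse_ratings(SAMPLE)
train, validation = split_ratings(ratings)
print(len(train), len(validation))  # 4 1
```

To use the real file, the text could be read with open("u.data").read(), assuming the file sits in the working directory.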
Experiment Environment
Python 3, with at least the following packages installed: sklearn, numpy, jupyter, matplotlib.
We recommend installing Anaconda3 directly, which already includes the packages mentioned above.
Time and Place
2018-11-24, 8:50 AM-12:15 PM, B7-138 (Mingkui Tan), B7-238 (Qingyao Wu)
Submit Deadline
2018-12-29 12:00 noon
Experimental Form
Complete in groups.
Experiment Steps
The experiment code and plots are both completed in Jupyter.
Using alternating least squares (ALS) optimization:
1. Read the data set and split it (or use u1.base / u1.test to u5.base / u5.test directly). Populate the original rating matrix from the raw data, filling missing values with 0.
2. Initialize the user factor matrix $P$ (of size $n_{users} \times K$) and the item (movie) factor matrix $Q$ (of size $n_{items} \times K$), where $K$ is the number of latent features.
3. Determine the loss function and the hyperparameters: the learning rate $\eta$ and the penalty factor $\lambda$.
4. Use the alternating least squares method to decompose the sparse user rating matrix into the user factor matrix and the item (movie) factor matrix:
4.1 With the item factor matrix fixed, take the partial derivative of the loss with respect to each row (column) of the user factor matrix, set it to zero, and update the user factor matrix.
4.2 With the user factor matrix fixed, take the partial derivative of the loss with respect to each row (column) of the item factor matrix, set it to zero, and update the item factor matrix.
4.3 Calculate the loss $L_{validation}$ on the validation set and compare it with the $L_{validation}$ of the previous iteration to determine whether it has converged.
5. Repeat step 4 several times to obtain a satisfactory user factor matrix $P$ and item factor matrix $Q$. Draw the curve of $L_{validation}$ against the number of iterations.
6. The final rating prediction matrix $\hat{R}$ is obtained by multiplying the user factor matrix $P$ by the transpose of the item factor matrix $Q$.
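The ALS steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: it assumes a squared-error loss over observed ratings with an L2 penalty, uses RMSE as the validation loss, and runs on a tiny made-up rating matrix. The function names (rmse, als_step, als) and the default values for K, lam, max_iters and tol are hypothetical. Because each update solves a regularized least-squares problem in closed form, this sketch does not use a learning rate.

```python
import numpy as np


def rmse(R, P, Q):
    """Root-mean-square error over the observed (non-zero) entries of R."""
    mask = R > 0
    error = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(error ** 2)))


def als_step(R, fixed, lam):
    """Update one factor matrix row by row while `fixed` is held constant.

    Setting the partial derivative of the penalized squared error to zero
    gives the linear system (F^T F + lam * I) x = F^T r for each row.
    """
    K = fixed.shape[1]
    updated = np.zeros((R.shape[0], K))
    for row in range(R.shape[0]):
        observed = R[row] > 0              # 0 marks a missing rating
        if not observed.any():
            continue                       # no ratings: leave the row at zero
        F = fixed[observed]
        A = F.T @ F + lam * np.eye(K)
        b = F.T @ R[row, observed]
        updated[row] = np.linalg.solve(A, b)
    return updated


def als(R_train, R_val, K=2, lam=0.1, max_iters=20, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R_train.shape
    P = rng.normal(scale=0.1, size=(n_users, K))
    Q = rng.normal(scale=0.1, size=(n_items, K))
    history = []
    for _ in range(max_iters):
        P = als_step(R_train, Q, lam)      # step 4.1: fix Q, update P
        Q = als_step(R_train.T, P, lam)    # step 4.2: fix P, update Q
        history.append(rmse(R_val, P, Q))  # step 4.3: validation loss
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break                          # change is below tol: stop
    return P, Q, history


# Made-up toy data (not from u.data); 0 means "not rated".
R_train = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
R_val = np.array([
    [0, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 5, 0],
    [1, 0, 0, 0],
], dtype=float)

P, Q, history = als(R_train, R_val)
R_hat = P @ Q.T                            # predicted rating matrix
print(R_hat.shape, len(history))
```

The history list holds one validation RMSE per iteration, so it can be passed directly to matplotlib to draw the required curve.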
Using the stochastic gradient descent (SGD) method:
1. Read the data set and split it (or use u1.base / u1.test to u5.base / u5.test directly). Populate the original rating matrix from the raw data, filling missing values with 0.
2. Initialize the user factor matrix $P$ (of size $n_{users} \times K$) and the item (movie) factor matrix $Q$ (of size $n_{items} \times K$), where $K$ is the number of latent features.
3. Determine the loss function and the hyperparameters: the learning rate $\eta$ and the penalty factor $\lambda$.
4. Use the stochastic gradient descent method to decompose the sparse user rating matrix into the user factor matrix and the item (movie) factor matrix:
4.1 Randomly select a sample from the rating matrix;
4.2 Calculate the gradient of this sample's loss with respect to the corresponding row (column) of the user factor matrix and of the item factor matrix;
4.3 Use SGD to update the corresponding row (column) of $P$ and $Q$;
4.4 Calculate the loss $L_{validation}$ on the validation set and compare it with the $L_{validation}$ of the previous iteration to determine whether it has converged.
5. Repeat step 4 several times to obtain a satisfactory user factor matrix $P$ and item factor matrix $Q$. Draw the curve of $L_{validation}$ against the number of iterations.
6. The final rating prediction matrix $\hat{R}$ is obtained by multiplying the user factor matrix $P$ by the transpose of the item factor matrix $Q$.
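The SGD steps above can be sketched roughly as follows, again as an illustration under stated assumptions rather than a reference solution. It assumes a per-sample loss of half the squared error plus an L2 penalty, uses RMSE as the validation loss, and evaluates it once per pass over the training ratings instead of after every single sample. The function names (rmse, sgd_mf), the toy matrices and the default hyperparameter values are hypothetical.

```python
import numpy as np


def rmse(R, P, Q):
    """Root-mean-square error over the observed (non-zero) entries of R."""
    mask = R > 0
    error = (R - P @ Q.T)[mask]
    return float(np.sqrt(np.mean(error ** 2)))


def sgd_mf(R_train, R_val, K=2, eta=0.01, lam=0.1, epochs=200, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    n_users, n_items = R_train.shape
    P = rng.normal(scale=0.1, size=(n_users, K))
    Q = rng.normal(scale=0.1, size=(n_items, K))
    users, items = np.nonzero(R_train)     # indices of observed ratings
    history = []
    for _ in range(epochs):
        # Step 4.1: visit the observed ratings in random order.
        for idx in rng.permutation(len(users)):
            u, i = users[idx], items[idx]
            error = R_train[u, i] - P[u] @ Q[i]
            # Step 4.2: gradients of 0.5 * (error^2 + lam * (|p_u|^2 + |q_i|^2)).
            grad_p = -error * Q[i] + lam * P[u]
            grad_q = -error * P[u] + lam * Q[i]
            # Step 4.3: update only the affected row of P and of Q.
            P[u] -= eta * grad_p
            Q[i] -= eta * grad_q
        # Step 4.4 (here once per pass): track the validation loss.
        history.append(rmse(R_val, P, Q))
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return P, Q, history


# Made-up toy data (not from u.data); 0 means "not rated".
R_train = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
R_val = np.array([
    [0, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 5, 0],
    [1, 0, 0, 0],
], dtype=float)

P, Q, history = sgd_mf(R_train, R_val)
R_hat = P @ Q.T                            # predicted rating matrix
print(R_hat.shape, len(history))
```

On the full data set, the learning rate eta and the penalty factor lam would need tuning; the values here are only placeholders for the toy example.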
You only need to choose one method to complete the experiment: ALS, SGD, or any other method you find on the Internet. If you use a method from the Internet, you must cite its source.
Organize the experimental results and complete the experiment report.
You can also explore recommendation systems for other tasks and functions.
Evaluation
Item                  Proportion    Description
Attendance            40%           Ask for leave if you have a time conflict
Code availability     20%           Compiles successfully
Report                30%           Graded according to the report template
Code specification    10%           Mainly considers whether readable variable names are used
Requirement for Submission
Submission process
1. Access 222.201.187.50:7001.
2. Click on the corresponding submission entry.
3. Fill in your name and student number, then upload the report in PDF format and the code as a zip archive.
Precautions
Experiment reports and code can be uploaded multiple times, and multiple uploads will overwrite previously submitted files.
After uploading, you can refresh the page and check if the upload is successful in the file list below.
Teaching assistants will save all uploaded files at the experiment deadline; files uploaded after the deadline are invalid.
If you write an experiment report in Word, you need to export it to pdf format.
The package format of the code file must be zip. Please do not submit the compressed file in rar format.
The submission URL can only be accessed from the campus network.
The code must be written in Python. For report scoring, English is preferred over Chinese, and LaTeX is preferred over Word.
Any advice or ideas are welcome; please discuss them with the teaching assistants in the QQ group.