@w1024020103
2017-03-21T18:42:52.000000Z
字数 5130
阅读 757
CS61B
2.7. Query Parser and Contest
We've provided you with a query parser for SimpleDB that you can use to write and run SQL queries against your database once you have completed the exercises in this lab.
The first step is to create some data tables and a catalog. Suppose you have a file data.txt with the following contents:
1,10
2,20
3,30
4,40
5,50
5,50
You can convert this into a SimpleDB table using the convert command (make sure to type ant first!):
java -jar dist/simpledb.jar convert data.txt 2 "int,int"
我们给SimpleDB提供了一个查询解析器,你可以用它写SQL命令并在你的数据库上运行。
第一步是要建立一些数据tables和一个catalog,加入你有一个data.txt文件有以下内容:
1,10
2,20
3,30
4,40
5,50
5,50
你可以用convert命令把它转换成一个SimpleDB table:
java -jar dist/simpledb.jar convert data.txt 2 "int,int"
This creates a file data.dat
. In addition to the table's raw data, the two additional parameters specify that each record has two fields and that their types are int and int.
Next, create a catalog file, catalog.txt
, with the follow contents:
data (f1 int, f2 int)
This tells SimpleDB that there is one table, data (stored in data.dat) with two integer fields named f1 and f2.
Finally, invoke the parser. You must run java from the command line (ant doesn't work properly with interactive targets.) From the simpledb/ directory, type:
java -jar dist/simpledb.jar parser catalog.txt
这会创建一个data.dat文件。除了这个table的原始数据,另外两个参数告诉了我们每行有两列以及他们的类型是int和int。接下来,创建一个catalog文件,catalog.txt,有如下内容:
data (f1 int, f2 int)
这告诉SimpleDB这里有一个table,以及有两个名为f1和f2的整数列存在data.dat里。最后,激活解析器。你必须从命令行运行java,输入:
java -jar dist/simpledb.jar parser catalog.txt
You should see output like:
Added table : data with schema INT(f1), INT(f2),
SimpleDB>
Finally, you can run a query:
SimpleDB> select d.f1, d.f2 from data d;
Started a new transaction tid = 1221852405823
ADDING TABLE d(data) TO tableMap
TABLE HAS tupleDesc INT(d.f1), INT(d.f2),
1 10
2 20
3 30
4 40
5 50
5 50
6 rows.
0.16 seconds
SimpleDB>
你应该看到如下的输出:
Added table : data with schema INT(f1), INT(f2),
SimpleDB>
最后,你可以执行一个查询:
SimpleDB> select d.f1, d.f2 from data d;
Started a new transaction tid = 1221852405823
ADDING TABLE d(data) TO tableMap
TABLE HAS tupleDesc INT(d.f1), INT(d.f2),
1 10
2 20
3 30
4 40
5 50
5 50
6 rows.
0.16 seconds
SimpleDB>
The parser is relatively full featured (including support for SELECTs, INSERTs, DELETEs, and transactions), but does have some problems and does not necessarily report completely informative error messages. Here are some limitations to bear in mind:
解析器功能相对完善(支持SELECT, INSERT, DELETE 和 transactions),但也有一些问题而且不能完全汇报错误信息。记住下面的限制:
Exercise 7: Please execute the three queries below using your SimpleDB prototype and report the times in your lab write-up.
Contest (Optional)
We have built a SimpleDB-encoded version of the DBLP database; the needed files are located at: http://www.cs.washington.edu/education/courses/cse444/14sp/labs/lab2/dblp_data.tar.gz
You should download the file and unpack it. It will create four files in the dblp_data directory. Move them into the simpledb directory. The following commands will acomplish this, if you run them from the simpledb directory:
wget http://www.cs.washington.edu/education/courses/cse444/14sp/labs/lab2/dblp_data.tar.gz
tar xvzf dblp_data.tar.gz
mv dblp_data/* .
rm -r dblp_data.tar.gz dblp_data
You can then run the parser with:
java -jar dist/simpledb.jar parser dblp_simpledb.schema
We will start a thread on the course message board inviting anyone interested to post their runtimes for the following three queries (please run the queries on a lab machine and indicate which one you used so it becomes easier to compare runtimes). The contest is just for fun. It will not affect your grade:
SELECT p.title
FROM papers p
WHERE p.title LIKE 'selectivity';
SELECT p.title, v.name
FROM papers p, authors a, paperauths pa, venues v
WHERE a.name = 'E. F. Codd'
AND pa.authorid = a.id
AND pa.paperid = p.id
AND p.venueid = v.id;SELECT a2.name, count(p.id)
FROM papers p, authors a1, authors a2, paperauths pa1, paperauths pa2
WHERE a1.name = 'Michael Stonebraker'
AND pa1.authorid = a1.id
AND pa1.paperid = p.id
AND pa2.authorid = a2.id
AND pa1.paperid = pa2.paperid
GROUP BY a2.name
ORDER BY a2.name;
Note that each query will print out its runtime after it executes.
You may wish to create optimized implementations of some of the operators; in particular, a fast join operator (e.g., not nested loops) will be essential for good performance on queries 2 and 3.
There is currently no optimizer in the parser, so the queries above have been written to cause the parser to generate reasonable plans. Here are some helpful hints about how the parser works that you may wish to exploit while running these queries:
Project(Join(Join(Filter(a),pa),p))