@w1024020103 2017-03-21T10:42:52.000000Z 字数 5130 阅读 838

Proj2 2.7. Query Parser and Contest

CS61B

2.7. Query Parser and Contest

We've provided you with a query parser for SimpleDB that you can use to write and run SQL queries against your database once you have completed the exercises in this lab.
The first step is to create some data tables and a catalog. Suppose you have a file data.txt with the following contents:

1,10
2,20
3,30
4,40
5,50
5,50

You can convert this into a SimpleDB table using the convert command (make sure to type ant first!):

java -jar dist/simpledb.jar convert data.txt 2 "int,int"

我们给SimpleDB提供了一个查询解析器,你可以用它写SQL命令并在你的数据库上运行。
第一步是要建立一些数据tables和一个catalog，加入你有一个data.txt文件有以下内容：

1,10
2,20
3,30
4,40
5,50
5,50

你可以用convert命令把它转换成一个SimpleDB table：

java -jar dist/simpledb.jar convert data.txt 2 "int,int"

This creates a file data.dat. In addition to the table's raw data, the two additional parameters specify that each record has two fields and that their types are int and int.
Next, create a catalog file, catalog.txt, with the follow contents:

data (f1 int, f2 int)

This tells SimpleDB that there is one table, data (stored in data.dat) with two integer fields named f1 and f2.
Finally, invoke the parser. You must run java from the command line (ant doesn't work properly with interactive targets.) From the simpledb/ directory, type:

java -jar dist/simpledb.jar parser catalog.txt

这会创建一个data.dat文件。除了这个table的原始数据，另外两个参数告诉了我们每行有两列以及他们的类型是int和int。接下来，创建一个catalog文件，catalog.txt,有如下内容：

data (f1 int, f2 int)

这告诉SimpleDB这里有一个table,以及有两个名为f1和f2的整数列存在data.dat里。最后，激活解析器。你必须从命令行运行java，输入：

java -jar dist/simpledb.jar parser catalog.txt

You should see output like:
Added table : data with schema INT(f1), INT(f2), SimpleDB>
Finally, you can run a query:

SimpleDB> select d.f1, d.f2 from data d;
Started a new transaction tid = 1221852405823
ADDING TABLE d(data) TO tableMap
TABLE HAS tupleDesc INT(d.f1), INT(d.f2),
1 10
2 20
3 30
4 40
5 50
5 50
6 rows.
0.16 seconds
SimpleDB>

你应该看到如下的输出：
Added table : data with schema INT(f1), INT(f2), SimpleDB>

最后，你可以执行一个查询：

SimpleDB> select d.f1, d.f2 from data d;
Started a new transaction tid = 1221852405823
ADDING TABLE d(data) TO tableMap
TABLE HAS tupleDesc INT(d.f1), INT(d.f2),
1 10
2 20
3 30
4 40
5 50
5 50
6 rows.
0.16 seconds
SimpleDB>

The parser is relatively full featured (including support for SELECTs, INSERTs, DELETEs, and transactions), but does have some problems and does not necessarily report completely informative error messages. Here are some limitations to bear in mind:

You must preface every field name with its table name, even if the field name is unique (you can use table name aliases, as in the example above, but you cannot use the AS keyword.)
Nested queries are supported in the WHERE clause, but not the FROM clause.
No arithmetic expressions are supported (for example, you can't take the sum of two fields.)
At most one GROUP BY and one aggregate column are allowed.
Set-oriented operators like IN, UNION, and EXCEPT are not allowed.
Only AND expressions in the WHERE clause are allowed.
UPDATE expressions are not supported.
The string operator LIKE is allowed, but must be written out fully (that is, the Postgres tilde [~] shorthand is not allowed.)

解析器功能相对完善（支持SELECT, INSERT, DELETE 和 transactions），但也有一些问题而且不能完全汇报错误信息。记住下面的限制：

你必须在列名前加上它的table名作为前缀，就算这个列名是独一无二的（你可以用table name别名，比如上面的例子那样，但你不能用AS关键词）
WHERE从句支持嵌套查询，但FROM从句不支持
算术表达不被支持（比如你不能求两列的和）
最多一个GROUP BY和一个集合列被允许，集合导向的运算符比如：IN, UNION, EXCEPT不被允许。
只有在WHERE从句里面可以使用AND表达
UPDATE表达不被支持
字符串运算符LIKE可行，但必须完整地写出来（也就是说，波浪号 [~]不被允许）

Exercise 7: Please execute the three queries below using your SimpleDB prototype and report the times in your lab write-up.

Contest (Optional)

We have built a SimpleDB-encoded version of the DBLP database; the needed files are located at: http://www.cs.washington.edu/education/courses/cse444/14sp/labs/lab2/dblp_data.tar.gz

You should download the file and unpack it. It will create four files in the dblp_data directory. Move them into the simpledb directory. The following commands will acomplish this, if you run them from the simpledb directory:

wget http://www.cs.washington.edu/education/courses/cse444/14sp/labs/lab2/dblp_data.tar.gz
tar xvzf dblp_data.tar.gz
mv dblp_data/* .
rm -r dblp_data.tar.gz dblp_data

You can then run the parser with:

java -jar dist/simpledb.jar parser dblp_simpledb.schema

We will start a thread on the course message board inviting anyone interested to post their runtimes for the following three queries (please run the queries on a lab machine and indicate which one you used so it becomes easier to compare runtimes). The contest is just for fun. It will not affect your grade:

SELECT p.title
FROM papers p
WHERE p.title LIKE 'selectivity';
SELECT p.title, v.name
FROM papers p, authors a, paperauths pa, venues v
WHERE a.name = 'E. F. Codd'
AND pa.authorid = a.id
AND pa.paperid = p.id
AND p.venueid = v.id;

SELECT a2.name, count(p.id)
FROM papers p, authors a1, authors a2, paperauths pa1, paperauths pa2
WHERE a1.name = 'Michael Stonebraker'
AND pa1.authorid = a1.id
AND pa1.paperid = p.id
AND pa2.authorid = a2.id
AND pa1.paperid = pa2.paperid
GROUP BY a2.name
ORDER BY a2.name;

Note that each query will print out its runtime after it executes.

You may wish to create optimized implementations of some of the operators; in particular, a fast join operator (e.g., not nested loops) will be essential for good performance on queries 2 and 3.

There is currently no optimizer in the parser, so the queries above have been written to cause the parser to generate reasonable plans. Here are some helpful hints about how the parser works that you may wish to exploit while running these queries:

The table on the left side of the joins in these queries is passed in as the first DbIterator parameter to Join.
Expressions in the WHERE clause are added to the plan from top to bottom, such that first expression will be the bottom-most operator in the generated query plan. For example, the generated plan for Query 2 is:

Project(Join(Join(Filter(a),pa),p))

Proj2 2.7. Query Parser and Contest

内容目录