- data: facts and statistics collected together for reference or analysis.
- database: A very large, structured collection of data.
DBMS(database management system) is a software system designed to store, manage, and facilitate query to databases.
- Oracle
- IBM DB2
- Microsoft SQL Server
Database System = Databases + DBMS
Typical Applications Supported by Datatbase Systems
- OLTP (Online Transaction Processing 联机事务处理)
- booking a flight tickets
- electronic banking
- OLAP (Online Analytical Processing 联机分析处理) and Data Warehousing (数据仓库)
- business reporting for sales data
- CRM (Customer Relationship Management 客户关系管理)
WWW ? DBMS
- The Web = Surface web + Deep web
- Surface Web: simply the HTML pages, accessed by 'search'.
- Deep Web: content hidden behind HTML forms, accessed by 'query'.
- Search vs Query
- Search is structure-free. The keywords 'database systems' can appear in anyplace in a HTML pages.
- Query is structure-aware. We restruct that the keywords 'database systems' can only appear in the
title
field. i.e., we assume there is an underlying STRUCTURE.
- Referring to C:
C
struct book {
char TITLE[256];
char AUTHOR[256];
float PRICE;
int YEAR;
}
Files vs DBMS
- We can store data in OS files.
- E.g., Google has its own distributed file system called Google File System (GFS).
- Advantages of DBMS:
- Good data modeling
- data independence (数据独立性)
- data integrity and security (数据的完整性与安全性)
- Simple and efficient ad-hoc queries (简单有效的即席查询)
- reduced application development time (减少应用程序开发时间)
- Concurrency Access (并发存储)
- Crash recovery (故障恢复)
A History Perspective
- IDS (Intergrated Data Store) by Charles Bachman, early 1960s.
- Form the basis of Network data model (网状数据模型).
- IMS (Information Management System) by IBM, late 1960s.
- Form the basis of hierachical data model (层次数据模型).
- Relational Data Model (关系数据模型) by Edgar Codd, 1970.
- System R, by IBM, started in 1974.
- SQL (Structured Query Language)
- INGRES, by Berkeley, started in 1974.
- POSTGRES, Mariposa, C-Store.
- Database Transaction Processing, mainly by Jim Gray.
- Object-Relational DBMS, 1990s.
- IBM's DB2, Oracle database, and Microsoft SQL Server.
From OLTP to OLAP and Data Warehousing
- OLAP, flexible reporting for business intelligence.
- OLTP
- Transactions that involve large numbers of records.
- Frequent Ad-hoc queries and infrequent updates.
- A few decision making users.
- Fast response times.
- Data Warehouses are designed to facilitate reporting and analysis.
- Read-Mostly DBMS: C-Store, MonetDB
- Integrated data spanning long time periods, often augmented with summary infomation.
- Several gigabytes to terabytes common.
- Interactive response times expected for complex queries; ad-hoc updates uncommon.
Data Mining (DM 数据挖掘)
- DM is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data.
Big Data, data sets that are too large and complex to manipulate or interrogate with standard methods or tools, which comes from everywhere...
Big data spans four dimensions:
- Volume (容量): terabytes(TB), even petabytes of information.
- Velocity(速率): sometimes 2 minutes is too late.
- Check 5 million trade events created each day for potential fraud.
- Variety (多样): big data is any type of data - structured and unstructured data.
- Exploit the 80% data growth in images, video and documents to improve customer satisfaction.
- Veracity(诚实)
- 1 in 3 business leaders don't trust the information they use to make decisions.
Describe data: Data Model
- A data model is a collection of concepts for describing data.
- A schema(模式) is a description of a particular collection of data, using a given data model.
- The relational data model is the most widely used model today.
- Main concept: relation, basically a table with rows and columns
- Every relation has a schema, which describes the columns, or fields (their names, types, constraints, etc.).
- A relation schema is a TEMPLATE of the corresponding relation.
Levels of Abstraction in a DBMS
- Many views describe how users see the data.
- Personalized access of data.
- Conceptual schema defines logical structure.
- i.e., what relation to store.
- Physical schema specifies physical structure.
- How the “logical” relations are physically stored on external storage such as disk.
Data Independence (One of the most important benefits of using a DBMS)
- Applications insulated from how data is structured and stored.
- Logical data independence: Protection from changes in logical structure of data.
- Physical data independence: Protection from changes in physical structure of data.
Queries in a Relational DBMS
- Specified in a Non-Procedural(非程式化) way
- Users only specify what data they need.
- A DBMS takes care to evaluate queries as efficiently as possible.
- A Non-Procedural Query Language:
- SQL: Structured Query Language
- Basic form of a SQL query:
Concurrent execution of user programs
- Why?
- Utilize CPU while waiting for disk I/O
- database programs make heavy use of disk
- Avoid short programs waiting behind long ones
- e.g. ATM withdrawal while bank manager sums balance across all accounts
- Interleaving(交叉存取的) actions of different user programs can lead to inconsistency.
- Concurrency Control: DBMS ensures such problems don't arise. Users can pretend they are using a single user system.
Transaction 事务
- An transaction is an atomic sequence of database actions (reads/writes).
- Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins.
Incomplete Transaction and System Crashes
- Incomplete transaction
- Canceled by the transaction or DBMS
- Aborted unexpectedly by system crash
- Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of transactions:
- Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol(Write-Ahead Log); OS support for this is often inadequate.)
- After a crash, the effects of partially executed transactions are undone using the log.
Structure of DBMS
![Architecture of DBMS](images/1-architecture.png)
People who work with Database
- End users and DBMS vendors
- DB application programmers
- Database administrator (DBA)
- Designs logical /physical schemas
- Handles security and authorization
- Data availability, crash recovery
- Database tuning as needs evolve
Summary
- DBMS used to maintain, query large datasets.
- Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
- Levels of abstraction give data independence.
- A DBMS typically has a layered architecture.
- DBAs hold responsible jobs and are well-paid!
- DBMS R&D is one of the broadest, most exciting areas in CS.
- We focus on Relational DBMS:
- maintain/query structured data