Database System: Lecture 1-Introduction
Database system 所有笔记在 “HobbitQia的笔记本” 的基础上修改而成
Database Systems
DBMS (Database Management System)

Purpose of Database Systems
Database systems offer solutions to all the below problems.
- data redundancy (数据冗余) and inconsistency
- inconsistency:可能同一份数据以不同形式存在不同文件中,修改时只修改了其中一部分
- data isolation (数据孤立,数据孤岛)
- 数据可能分布在不同文件,以不同格式组织,没法互通
- difficulty in accessing data
- Integrity problems (完整性问题)
- 完整的约束条件被藏在数据和代码中,而不是显式地声明。 e.g. "account balance 1"
- Atomicity problems (原子性问题)
- Failures may leave database in an inconsistent state with partial updates carried out. e.g. 从 A 账户转账到 B, 我们必须保证 A 转出 B 转入这两件事同时进行,不能被打断。
Clearly, it is essential to database consistency that either both the credit and debit occur, or that neither occur. That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to ensure atomicity in a conventional file-processing system.
- Concurrent access anomalies (并发访问异常)
- Uncontrolled concurrent accesses can lead to inconsistencies
- Security problems
- Authentication (认证), Privilege (权限), Audit (审计)
Characteristics of Databases
- data persistence (数据持久性)
- convenience in accessing data (数据访问便利性)
- data integrity(数据完整性)
- concurrency control for multiple user (多用户并发控制)
- failure recovery (故障恢复)
- security control (安全控制)
View of Data
A database system is a collection of interrelated data and a set of programs that allowusers to access and modify these data. A major purpose of a database system is toprovide users with an abstract view of the data. That is, the system hides certain detailsof how the data are stored and maintained.
Data Models
Data models is a collection of tools for describing data, data relationships, data semantics, data constraints.
- Relational model (关系模型) (表格)数据库系统层面
use a collection of tables to represent both data and the relationships among those data
record-based

- Entity-Relationship (实体-联系) data model 需求分析层面
- Object-based data models
- Object-oriented (面向对象数据模型)
- Object-relational (对象-关系模型模型)
- Semistructured data model (半结构化数据模型)
JSON and Extensible Markup Language (XML) are widely used semi-structured data representations.
- Other older models:
- Network model (网状模型)
- Hierarchical model(层次模型)
Data Abstraction

- physical level: 物理中如何保存、实现(如磁盘)
- logical level: 数据库中存储什么数据?数据之间的关系?
- view level: 不同人看到不同的他们所需要的数据
Advantages:
- Hide the complexities
- Enhance the adaptation to changes 硬件环境变化 (physical level), 可以通过调整逻辑关系和映射来适应新的硬件环境。 逻辑环境变化 (logic level), 可以通过 view 和 logic 的映射使得 view 尽量少变化。
Instance and Schema
The collection of information stored in the database at a particular moment is called an instance of the database. The overall design of the database is called the database schema.
- Schema (模式)– the logical structure of the database (physical/logical)
- Instance (实例) – the actual content of the database at a particular point in time
Data Independence
- Physical Data Independence (物理数据独立性) – the ability to modify the physical schema without changing the logical schema
- Logical Data Independence (逻辑数据独立性) - the ability to modify the logical schema without changing the user view schema
Database Languages
A database system provides a data-definition language (DDL) to specify the databaseschema and a data-manipulation language (DML) to express database queries and updates.
Data Definition Language (DDL, 数据定义语言)
The DDL is also used to specify additional properties of the data.
C 语言里的 struct 经过编译后全部变为了代码(有一张符号表)但数据库里不会。
通过 DDL 中的 data storage and definition language 来定义数据库的存储结构和访问方式,通常不对用户显示。
Database 会支持一些 integrity constrains(“can be test with mammal overhead”):
- Domain Constrains:约束数据类型
- Referential Integrity
There are cases where we wish to ensure that a value that appears in one relation for a given set of attributes also appears in a certain set of attributes in another relation (referential integrity).
- Authorization:不同用户的各种权限
The output of the DDL is placed in the data dictionary, which contains metadata (元数据)—that is, data about data.
Data dictionary 可以被认为是一个特殊的表(generated by DDL complier),只能被数据库本身访问和修改。数据库系统在读入和修改数据前先向 data dictionary 咨询。
The SQL Data-Definition Language
For instance, the following SQL DDL statement defines the department table:
Data Manipulation Language (DML, 数据操作语言)
A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model.
四种访问操作:检索(retrieval),插入(insertion),删除(deletion),modification(修改)
两种 data-manipulation language:
- Procedural DMLs require a user to specify what data are needed and how to get those data. e.g. C
- Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are needed without specifying how to get those data. e.g. SQL
A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language.
query language 可以基本等同于 data-manipulation language
Database Access from Application Programs
Application programs are programs that are used to interact with the database in this fashion.
数据库必须由过程式语言编写。
Application programs generally access databases through one of
- Language extensions to allow embedded SQL e.g. 通过预处理器,将 select 语句识别出来,翻译成 C 语言的函数调用。
- API (Application program interface) e.g. ODBC/JDBC which allow SQL queries to be sent to a database.
… application-program interface (set of procedures) … can be used to send DML and DDL statements to the database and retrieve the results.
Database Design
- Entity Relationship Model (实体-联系模型) 一对一/一对多/多对一/多对多

- Normalization Theory (规范化理论) Formalize what designs are bad, and test for them
Data Engine
A database system is partitioned into modules that deal with each of the responsibilities of the overall system.
Storage Manager
The storage manager is the component of a database system that provides the interfacebetween the low-level data stored in the database and the application programs andqueries submitted to the system.
为了数据持久化,放在硬盘里,但数据处理要进入内存,这之间存在数据鸿沟。
按块访问内存 (4k/16k), 缓冲管理。
The storage manager components include:
- File manager
- Buffer manager
- Authorization and integrity manager
- Transaction manager
… ensures that the database remains in a consistent (correct) state despite system failures, and that concurrent transaction executions proceed without conflicts.
数据库里放的是
- data files 真正的数据
- data dictionary 放的是 metadata
- statistical data 用于数据库的查询处理
- indices
Query Processor
The query processor components include:
- DDL interpreter
interprets DDL statements and records the definitions in the data dictionary.
- DML compiler
- translates DML statements in a query language into an evaluation plan consisting of low-level instructions that the query evaluation engine understands.
- The DML compiler performs query optimization (查询优化); that is, it picks the lowest cost evaluation plan from among the various alternatives. 执行计划会根据统计数据的改变而改变。
- Query evaluation engine
executes low-level instructions generated by the DML compiler.
主要干三件事:Parsing and translation,Optimization,Evaluation

Transaction Management
atomicity:all-or-none
consistency:correctness
durability:despite the possibility of system failure
A transaction is a collection of operations that performs a single logical functionin a database application.
e.g. 银行转账,A 转账到 B, A 余额减掉 B 余额加上。
要有隔离性,延迟写回
- Recover Manager ensures that the database remains in a consistent (correct) state despite system failures (e.g. power failures and operating system crashes) and transaction failures. 日志,防止写回数据库时出现断电或者崩溃。
- Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.
一个完整的 database system structure

Database Users

History of Database Systems
- 1973 Turing Award: Charles W. Bachman — father of databases
- 1981 Turing Award: Edgar F. Codd
- 1998 Turing Award: Jim Gray
- 2014 Turing Award: Michael Stonebraker
Loading...