This is a study note of PostgreSQL 修炼之道.

1 Installation and start service

In Mac computer, for consitency, use brew to install PostgreSQL and pgAdmin. The installation also runs the initdb for the postgres database, no need to run it again.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
# check brew status
brew doctor

# update brew to the latest
brew update

# check the version is the desired
brew info postgresql

# install postgresql
brew install postgresql

# check the version of pgadmin4, use brew cask for gui apps
brew cask info pgadmin4

# install pgadmin 4
brew cask install pgadmin 4

# list all services: brew services
# to start pg when login
brew services start postgresql

# to stop pg: brew services stop postgresql
# to run pg but not in every login: brew services run postgresql
# to run pg when boot: sudo brew services start postgresql

After installation, there are one database postgres and two template databases template0 and template. Template database can be customized to custome cloned databases.

2 PSQL

PSQL can use the following environment variables: PGDATABASE, PGHOST, PGPORT, and PGUSER. psql -E will show actual SQL statements for psql commands.

Common commands:

  • \?: list commands
  • \h [name]: help syntax of a command
  • \q: quit psql
  • \l: list all databases.
  • \c name: connect to a dabased.
  • \d: list of relations.
  • \d[S+] name: describe table, view, sequnce, or index. S show system objects. + show additional details
  • \timing on: display command execution time.
  • \encoding: display or set coding set.
  • pset, \x: set display borders and expanded display.
  • \i filename: execute external file
  • \set ECHO_HIDDEN on | off: switch on/off showing SQL statements for a command.

Psql automatically commit. To change it, use \set AUTOCOMMIT off or begin; commands; rollback;/commit;.

3 Data Types

3.1 Literal Tranform

Simple types don’t need tranform, from example, 1, 3.14, 'heoll world'. For other types,

  • Use type name and single-quoted string. Ex: bit '1101', int '1'.
  • Use CAST. Ex: CAST('5' as int), CAST('2014-07-01' as date).
  • Use ::. Ex: '5'::int, '2014-07-01'::data.

3.2 Boolean Type

Values:

  • True values: TRUE, 'true', 't', 'y', 'tRue', 'yes', '1'
  • False values: FALSE, 'false', 'f', 'n', 'falSE', 'no', '0'

Operatos: AND, OR, NOT. Any value operates with NULL is NULL. Use IS to compare. NULL in IS is UNKNOWN.

3.3 Numeric Types

smallint, int/integer, bigint, numeric/decimal (variable length, accuate, good for money), real, double precision, serial (4 bytes), bigserial (8 bytes).

Floating numbers include three special values: 'Infinity', '-Infinity' and 'NaN'.

serial is auto-incremental type with 4-byte values. bigserial has 8 bytes.

money type is a 8 bytes type whose display depends on lc_monetary.

Opeartors:

  • ^: power
  • |/: square root
  • ||/: cube root
  • |: factorial
  • @: absolute value
  • &, |, #, ~: and, or, xor, not for binary
  • <<, >>: binary left and right shift

3.4 Chars

  • character varying(n)/varchar(n): variable length, maximum 1GB.
  • character/character(n)/char(n): fixed length filled with spaces, the first is 1 char.
  • text: variable length without length limit.

Use text or varchar because char doesn’t perform better in pg.

Functions:

  • ||: concatenation
  • bit_length: binary length
  • char_length/length: number of chars
  • convert: convert to an encoding
  • lower\upper: convert ot lowercase/uppercase
  • overlay: replace substring
  • positon: find the index of a substring
  • substring: extract substring
  • trim([leading|trailing|both]): trim a string

4 Logic Structure

One pg system can have multiple db. A connection connects to a single db.

4.1 Database

When create a db, specify the following args:

  • OWNER: db onwer, default is the current login user.
  • TEMPLATE: template, default is template1.
  • ENCODING: if it is different from template1, then use template0 becaause template0 has no encoding. The common one is UTF8, as defined in template1.
  • TABLESPACE: table space name is used to define locations in the file system.
  • CONNECTION LIMIT: the max number of concurrent connections. The default is -1

4.2 Schema

A database can have multiple schema. A schema works as a namespace to organize db objects such as tables or functions. A db connection can access all schema defined in the db. There is a default schema public when a database is created. It is a good idea to not use the default schema and explicitly specify the db object using schem_name.object_name. The SQL standard uses the username as schema name. In application, we can use a subsystem name as schema name and create a user with the same name as the owner.

If schema is not specified, pg use schema names defined in search_path to search a db object. The default value is "$user", public, it include the current username and public.

Schema privileges include USAGE, CREATE. PUBLIC represents all users.

4.3 Table

TOAST stands for The Oversized-Attribute Storage Technique. It is used to store large variable length value. A value can be compressed and stored in external TOAST table.

Temporary tables can be sessional or transactional.

Table constrains include CHECK, not null, UNIQUE, and REFERENCES.

A table can inherit another table’s definition to extend it. Query parent will show child’s data. Query child doesn’t show parent’s data. A table can inherit multiple tables.

Use inheritance and Trigger/RULE to partition a table: a logic table (parent)can be partitioned into several physical tables (children).

4.4 Trigger and Event Trigger

Trigger can be FOR STATEMENT or FOR ROW, and can be BEFORE or AFTER. Statement triggers alwasy return NULL. Row triggers will use the return value as new value in BEFORE trigger, no operation for NULL. The return value of AFTER row tigger is ignored. Tiggers are executed by the order of its name.

pg defines the following variables:

  • NEW: the new value of type RECORD for INSERT/UPDATE.
  • OLD: the old value of type RECORD for UPDATE/DELETE.
  • TG_NAME/TG_XXX: the trigger name of type nam, XXX is other trigger data such as WHEN, OP, TABLE_NAME etc.

Event trigger is used to support DDL events such as ddl_command_start, ddl_command_end, and sql_drop.

The system view pg_event_trigger has all defined event triggers.

4.5 Indexes

pg support the following indexes:

  • B-tree: best for equal or range search
  • HASH: best for simple equal search
  • GiST: suport index strategy
  • SP-GiST: space based GiST
  • GIN: best for multi-value search, ex: array.

4.6 User and Role

pg uses the role concept to control db access. A role is a set of privileges. A user is a role with an additional LOGIN privilege in pg. Roles are visible glabally in multiple databases of the same database system. There is a super user for each db system.

Some privileges are only created during user creation or ALTER ROLE:

  • superuser
  • db creation
  • login

privilege controlled by GRANT and REVOKE:

  • SCHEMA creation
  • Temp Table
  • db connect
  • creation of db objects such as table, vew and function etc
  • select/update/insert/delete on tables or columns
  • operation on sequence

GRANT is used to grant a user some roles or grant privileges to a role.

4.7 Transaction

pg uses MVCC that alllows parallel write and read. DDL operation is in a transaction. In psql, the AUTOCOMMIT is on. pg supports SAVEPOINT to enable partial transaction.

pg has two real transaction isolation levels: read commited and serialization. In read commited level, two selects in one transaction may read different data.

pg has two types of locks: table lock and row low.

5 pg Architecture

There are two types of pg drivers: libpq-based C interface and pure language drivers such as JDBC that doesn’t use libpq. Inter-computer connections use TCP/IP while intra-computer conntections use domain socket.

When pg starts, it runs Postmager (the same postgres command ) process that forks session service processes Posgtres processes and several hlep processes such as log, journal, archive and vacuum processes.

Each connect is served by a process.

6 Management

pg_ctl is a utility to initialize database system, start/stop/re-start postgresql, check status, dynamic config.

pg use postgresql.conf in the $PGDATA folder. All configurations are in pg_settings view. Detail connection configurations are in pg_hba.conf file.

7 Advanced Topics

7.1 Internals

Every table has system columns:

  • oid: row object id, it is generated globally, for system tables
  • tableoid: table oid (parent or child table)
  • xmin, xmax, cmin, cmax: control visibility in MVCC
  • ctid: row location in a table, maybe change after vacuum

pg uses one or more files to store data. Data is stoed in pages that has a default size of 8KB, max is 32KB. One page can store multiple tuples (rows).

7.2 pg Features

  • Rule system: used to rewrite SQL statements
  • Pattern matching: LIKE, SIMILAR TO, POSIX. substring.
  • Listen and notfiy: communcation channel.
  • Index features: function index, partial index, GiST, and more.
  • Sequence
  • Advisory lock
  • SQL/MED: connect different dbs. There are four db objects: foreign data wrapper (FDW), server, user mapping, foreign table.