Much of this material is a distillation of the PostgreSQL documentation, which you should check out.
SQL Databases are Complicated
There are a lot of moving parts, but at the end of the day, it’s broken into this pipeline of steps:
- A client application establishes a connection to the server.
- The parser takes the query text sent by the client and transforms it into a query tree.
- A rewrite engine looks at the query tree that was just produced, checks the pg_rules system view for any rules it needs to apply, and rewrites the query accordingly. Notably, this is how queries against SQL views are transformed.
- The planner takes the rewritten query and tries to estimate the cheapest way to retrieve the data requested by the client. At the end of this process, we get an optimized (hopefully!) plan (there’s a small sketch after this list showing how to peek at one with EXPLAIN).
- In the final step, the executor takes the plan from the previous stage, retrieves the data, filters and sorts it appropriately, and then hands the results back to the client.
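To make those last couple of steps a little more concrete, PostgreSQL will show you the plan the planner picked if you prefix a query with EXPLAIN. Here’s a minimal sketch in Python with psycopg2; the connection settings and the users table are just assumptions for the example.

```python
# A minimal sketch, assuming a local PostgreSQL instance and a hypothetical
# "users" table, of asking the planner what it came up with.
import psycopg2

conn = psycopg2.connect("dbname=app_db user=app_user host=localhost")
try:
    with conn.cursor() as cur:
        # EXPLAIN returns the plan the planner chose without executing the query.
        cur.execute("EXPLAIN SELECT * FROM users WHERE email = %s", ("me@example.com",))
        for (plan_line,) in cur.fetchall():
            print(plan_line)
finally:
    conn.close()
```

Each row that comes back is one line of the plan the planner settled on; the query itself never actually runs.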
Connection Pooling
The most notable aspect of establishing a connection is that many applications that use a SQL database will implement connection pooling.
But what is connection pooling, and why does it matter?
Let’s imagine a situation without connection pooling. If you want to query a database, you’ll open a network connection, the database will process your query, and then you’ll close that connection. If you’ve got a large amount of traffic hitting the SQL database, the overhead of repeatedly opening and closing connections will eventually become a performance problem.
Instead, you’ll want to pool your connections, which will allow you to reuse your open connections for later queries.
Think of it like this: There’s a popular book at the public library, and so the library decides to purchase 20 copies of the book so that nobody has to wait, or at least wait too long, to read the book. Someone will borrow the book, read it, and then return it to the library.
Database connection pooling works similarly. An application will open up, say, 20 connections and keep them ready in a pool. When a thread in the application needs to access the database, it borrows a connection from the pool, uses it, and then returns it.
Note: This is not a feature of the database itself, but rather a pattern that applications use to improve performance. While applications pool connections, the database often spawns a new process for each connection that’s made (PostgreSQL does exactly this).
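Here’s a rough sketch of what that pattern looks like in Python using psycopg2’s built-in pool. The pool size, connection settings, and the users table are assumptions for illustration, not a recipe.

```python
# A minimal sketch of application-side connection pooling, assuming a local
# PostgreSQL instance and a hypothetical "users" table.
from psycopg2.pool import ThreadedConnectionPool

# Keep at least 1 connection open, and allow the pool to grow to 20.
pool = ThreadedConnectionPool(1, 20, "dbname=app_db user=app_user host=localhost")

def count_users() -> int:
    # Borrow a connection from the pool instead of opening a new one.
    conn = pool.getconn()
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM users")
            return cur.fetchone()[0]
    finally:
        # Return the connection so another thread can reuse it.
        pool.putconn(conn)

print(count_users())
pool.closeall()
```

Every call to count_users reuses a connection that’s already open, which is the whole point: the expensive open/close work happens once, not on every query.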
Parsing
After making a connection, the client will typically want to send over a query to the database to retrieve or modify data.
The database will convert this text input into something it understands by putting it through a lexer and then a parser.
The lexer’s responsibility is to take the raw text and break it into tokens (keywords, identifiers, literals, and so on) to be used by the parser.
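To get a feel for what that means, here’s a toy tokenizer in Python. It’s nothing like PostgreSQL’s real lexer, just an illustration of the idea of turning raw text into tagged tokens.

```python
# A toy sketch of the idea behind a lexer (not PostgreSQL's actual lexer):
# break the raw query text into tagged tokens for the parser to consume.
import re

KEYWORDS = {"SELECT", "FROM", "WHERE"}
TOKEN_PATTERN = re.compile(r"\d+|\w+|[*=,;]")

def lex(sql: str):
    tokens = []
    for text in TOKEN_PATTERN.findall(sql):
        if text.isdigit():
            tokens.append(("NUMBER", text))
        elif text.upper() in KEYWORDS:
            tokens.append(("KEYWORD", text))
        elif text.isidentifier():
            tokens.append(("IDENTIFIER", text))
        else:
            tokens.append(("SYMBOL", text))
    return tokens

print(lex("SELECT name FROM users WHERE id = 42;"))
# [('KEYWORD', 'SELECT'), ('IDENTIFIER', 'name'), ('KEYWORD', 'FROM'),
#  ('IDENTIFIER', 'users'), ('KEYWORD', 'WHERE'), ('IDENTIFIER', 'id'),
#  ('SYMBOL', '='), ('NUMBER', '42'), ('SYMBOL', ';')]
```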
The parser takes the output of the lexer and transforms it into a parse tree. This is where the query’s grammar is checked to ensure that it’s valid. The parser doesn’t care whether the query makes any sense (semantics); it only cares whether the query follows the correct structure (syntax). For example, the parser would happily accept a SELECT query that references a column that doesn’t exist on the table.
After parsing, we know that the query is syntactically valid. The parse tree is then transformed into a query tree. It looks similar to the parse tree, but at this stage the database looks up the tables and columns being referenced to make sure they actually exist.
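You can actually see the difference between those two checks from the client side. The sketch below (again assuming a local database and a hypothetical users table) sends one query with broken grammar and one that’s grammatically fine but references a column that doesn’t exist. The first is rejected by the parser; the second only fails at the later lookup step.

```python
# A minimal sketch, assuming a local PostgreSQL instance and a hypothetical
# "users" table, contrasting a syntax error with a semantic error.
import psycopg2

conn = psycopg2.connect("dbname=app_db user=app_user host=localhost")
conn.autocommit = True  # keep the two failures independent of each other

with conn.cursor() as cur:
    try:
        # Invalid grammar: the parser rejects this immediately.
        cur.execute("SELEC * FROM users")
    except psycopg2.Error as e:
        print("Parser error:", e.pgerror)

    try:
        # Valid grammar, invalid reference: this only fails when the parse
        # tree is turned into a query tree and the column lookup comes up empty.
        cur.execute("SELECT no_such_column FROM users")
    except psycopg2.Error as e:
        print("Analysis error:", e.pgerror)

conn.close()
```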
That’s it for part 1! When we get to part 2, we’ll show more of the querying process.