Last time we learned how to add comments to our SQL queries. This helps tie logged queries back to code. But to truly achieve this goal, we have to solve two more problems: the exfiltration of long-running queries and the injection of a unique identifier that maps to our workloads’ code paths:
- exfiltration: Postgres will not log a query if the client unexpectedly disconnects. Even if a query runs for a very long time, it may not be logged at all. There may be other situations in which these important queries are lost to the ether, so we must detect and save them before that happens.
Let’s use the cron method described in this StackOverflow answer from dland to solve this. dland writes:
To be honest, logging long-running queries via cron does seem like the most KISS solution to me:
```
psql -tc "select now() as t, pid, usename, query from pg_stat_activity where state != 'idle'" > /tmp/pg.running.txt
```
Then you can spot the queries that stay there for several hours and run EXPLAIN on them.
That approach creates a separate log file, so we should think about how to organize this information so that it’s useful and not overwhelming. On this point, I think long-running queries should instead be dumped into the default log file periodically, so that operators can grep the same log file they already use, now enriched with this extra information.
We need to decide how frequently to invoke the check. I think 1 minute is fine; that is already too long for a database query. If something runs for a long time, it will get printed over and over, once every minute.
We do not want to see queries faster than 1 minute that merely happen to be active at the time of the check, so we’ll make sure to only log queries whose runtime exceeds 1 minute, as sketched below.
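For reference, everything needed for that filter is already exposed by the pg_stat_activity view. A minimal sketch of the check (the full logging function appears in section 1):

```
-- active queries that have been running for more than 1 minute
SELECT now() AS t, pid, usename, query
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < now() - interval '1 minute';
```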
Finally, I would like to avoid using an external program and do this all within Postgres; this way we can use the existing logging facilities and keep things self-contained.
- injection: How and what to inject into the queries for your app. When we learned How to trace postgresql database SQL queries, we did not determine how to do it from a conventional web application. Nor did we determine exactly what to inject.
i. How: let’s create a solution compatible with the default ActiveRecord postgres driver used in Ruby on Rails to automatically insert a SQL comment on every query. There is some precedent, e.g. Rails 6 saw the addition of the annotate method, which injects a SQL comment into a query. We are looking for an automated solution, however, since our app has a lot of queries and we do not wish to edit all (any) of them.
ii. What: let’s assume there is some global variable in memory (e.g. set by some other part of the Rails application, such as Rack or Sidekiq) containing a unique identifier and whether or not this is a web request; a sketch follows below. It would be great if we could know what line number this query originated from as well, but sticking an entire call stack into a SQL comment seems garish. Let’s leave that consideration for last, as a finishing garnish.
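As a sketch of that assumption: a small, hypothetical Rack middleware could stash a per-request identifier in a thread-local, which the annotator in section 2 reads back when building the comment. The names here (SqlTracer, Thread.current[:sql_trace]) are illustrative, not from any library:

```
# A hypothetical sketch: store a unique id and the workload kind in a
# thread-local so the SQL annotator (section 2) can read it per query.
require "securerandom"

class SqlTracer
  def initialize(app)
    @app = app
  end

  def call(env)
    Thread.current[:sql_trace] = { id: SecureRandom.uuid, kind: "web" }
    @app.call(env)
  ensure
    Thread.current[:sql_trace] = nil
  end
end

# e.g. in config/application.rb: config.middleware.use SqlTracer
```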
1. Solving the exfiltration problem
Nowadays we can chat with ChatGPT to get a jumpstart on unfamiliar problem spaces. It did not disappoint in producing a solution. Note that this solution uses the pg_cron extension, which runs inside the database server itself, so I am adding the setup steps first and then we’ll get into installing and using ChatGPT’s solution. Let’s begin:
Create Dockerfile
```
FROM postgres:15.1-bullseye
```
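Note that pg_cron does not ship with the official postgres image, so the Dockerfile also needs to install it. Presumably something like the following line, assuming the PGDG package name postgresql-15-cron (the PGDG apt repository comes preconfigured in the official image):

```
# install the pg_cron extension package from the PGDG apt repo
RUN apt-get update && apt-get install -y postgresql-15-cron
```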
Build it. Note I tried to use postgres:11 but apt was broken there, so we’ll be using postgres 15. Depending on when you’re reading this, you may have the same issue with postgres 15, and maybe bumping the versions will work for you too, future person.
```
docker build -t pgwcron .
```
Create postgresql.conf
```
listen_addresses = '*'
```
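The snippet above shows only the listen address. pg_cron must be preloaded at server start and pointed at a database, so the conf presumably also contains lines like these; log_min_messages is my own assumption, set so that RAISE NOTICE output reaches the server log:

```
shared_preload_libraries = 'pg_cron'   # pg_cron must be preloaded at startup
cron.database_name = 'postgres'        # database in which pg_cron runs its jobs
log_min_messages = notice              # assumption: let NOTICE messages reach the log
```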
Start the server
```
docker run --rm --name trace-postgres -v $PWD/postgresql.conf:/etc/postgresql/postgresql.conf -e POSTGRES_PASSWORD=pass pgwcron -c 'config_file=/etc/postgresql/postgresql.conf'
```
Start a client
```
docker exec -it $(docker inspect --format="{{.Id}}" trace-postgres) psql -U postgres
```
Install the pg_cron extension by executing the following SQL statement:
```
CREATE EXTENSION pg_cron;
```
And now we can use ChatGPT’s solution. Note that I had to fix some small errors it made, so the solution you see here is tested working:
This function will use the RAISE NOTICE statement to output a message for each active query in the pg_stat_activity view that has been running for more than 1 minute, with the current timestamp, process ID, username, and query text. These messages will be written to the PostgreSQL log file, which is specified by the log_destination configuration parameter in the postgresql.conf file.
```
CREATE OR REPLACE FUNCTION log_active_queries()
```
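The listing above shows only the signature; here is a minimal sketch of a body consistent with the description and with the NOTICE output shown further below (the exact tested version may differ):

```
CREATE OR REPLACE FUNCTION log_active_queries() RETURNS void AS $fn$
DECLARE
  rec RECORD;
BEGIN
  FOR rec IN
    SELECT now() AS t, pid, usename, query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND query_start < now() - interval '1 minute'
      AND pid <> pg_backend_pid()  -- skip this function's own backend
  LOOP
    RAISE NOTICE 'Active queries as of % at pid %: %', rec.t, rec.pid, rec.query;
  END LOOP;
END;
$fn$ LANGUAGE plpgsql;
```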
To schedule this modified function to run every minute, you can use the pg_cron extension as follows:
```
SELECT cron.schedule('log_active_queries', '* * * * *', $$SELECT log_active_queries();$$);
```
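If you want to confirm the job was registered, pg_cron keeps its schedule in the cron.job table:

```
SELECT jobid, schedule, command FROM cron.job;
```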
To stop a scheduled job that was created using the pg_cron extension, you can use the cron.unschedule function as follows:
```
SELECT cron.unschedule('log_active_queries');
```
Now let’s create some long-running queries with annotations and see if they get logged. I will run this a few times simultaneously using multiple terminal shells.
```
docker exec -it $(docker inspect --format="{{.Id}}" trace-postgres) psql -U postgres -c 'SELECT pg_sleep(120) /* Hello */;'
```
Looks like it works! In our logs we can see the long-running query along with its tracer comment:
```
NOTICE: Active queries as of 2022-12-28 01:18:00.057665+00 at pid 114: SELECT pg_sleep(120) /* Hello */;
```
We may want to improve the function to show other columns from pg_stat_activity that might be useful, or make other changes. But let’s move on from here, having proved this approach.
2. Solving the injection problem (from Rails)
Here is a clean monkey patch which annotates SQL queries executed by the PG::Connection class (the underlying connection used by ActiveRecord):
Create sqlannotator.rb in config/initializers
```
# https://blog.daveallie.com/clean-monkey-patching
```
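The listing above shows only the header comment; here is a minimal sketch of what such a patch can look like, using Module#prepend per the linked post. It reads the thread-local from the middleware sketch earlier (my naming, not from any library) and wraps the PG::Connection methods the ActiveRecord postgres adapter sends SQL through:

```
# config/initializers/sqlannotator.rb
# A sketch: prepend a module so `super` reaches the original methods.
module SqlAnnotator
  def self.annotate(sql)
    trace = Thread.current[:sql_trace]
    trace ? "#{sql} /* #{trace[:kind]} #{trace[:id]} */" : sql
  end

  # Methods the ActiveRecord postgres adapter uses to send SQL.
  %i[async_exec exec_params async_exec_params].each do |meth|
    define_method(meth) do |sql, *args, &block|
      super(SqlAnnotator.annotate(sql), *args, &block)
    end
  end
end

PG::Connection.prepend(SqlAnnotator)
```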
To test it, perform any SQL query, or directly use the connection instance like so:
```
ActiveRecord::Base.connection.raw_connection.async_exec("SELECT now();")
```
This works! You can emulate a lockup situation easily like so, and then check if it’s being logged.
- open up a Rails database console shell (rails dbconsole) and explicitly lock your table:
```
BEGIN;
LOCK TABLE users;  -- assuming the User model's table; ACCESS EXCLUSIVE by default
```
- leave that open, then open a Rails console and run
```
User.all
```
It should get stuck…
- Now go look at your logs, or use the psql shell to check pg_stat_activity:
```
app=# select pid,query from pg_stat_activity where state = 'active';
```
There’s our annotation! We’re done. Just one more thing, and that’s cleaning up this mess.
At this point I wanted to cancel the queries, and a nice answer from Andong Zhan proved helpful:
What I did is first check what the running processes are by
```
SELECT * FROM pg_stat_activity WHERE state = 'active';
```
Find the process you want to kill, then type:
```
SELECT pg_cancel_backend(<pid of the process>);
```
This basically “starts” a graceful cancellation request, which may be satisfied only after some time, though the call itself returns immediately.
If the process cannot be killed, try:
```
SELECT pg_terminate_backend(<pid of the process>);
```