Do your tables start out fast and performant but over time seem to become slow and unusable? If so, you might be facing the same issue that I ran into called bloat.
What is bloat?
Bloat is a side effect of PostgreSQL’s MVCC data model which allows running multiple simultaneous transactions on a single data set. This post won’t go into detail on how MVCC works. The important piece to understand is that as a result of MVCC and maintaining multiple “views” of the same data, eventually some pieces of these views fall out of scope and can no longer be seen by any statement. These are called “dead tuples” and with a significant amount of writes to your tables can come to comprise a significant amount of your table. This is what we mean when we refer to bloat. This can contribute to an increased number of pages that must be searched through and can slow down your queries.
How do I find out if my tables are bloated?
Luckily for us, there are some great scripts out there that will help us to identify bloat. These bloat scripts from pgexperts really do a great job of helping you to identify which tables or indexes are bloated.
How do I fix it?
The simple fix for this problem is to regularly VACUUM
your tables. Because
this is a very common operation, postgres provides an autovacuum daemon to
regularly perform this action. However, there are cases where you will want to
change some of the autovacuum defaults on a per table basis. The reason for
doing this is beacause as table size grows, you may find you need to vacuum it
more frequently so that vacuuming does not use too many resources when it does
run and so that you are regularly able to reuse disk space taken up by deleted
rows.
The main options you might be interested in are:
-
autovacuum_vacuum_scale_factor
At the time of this writing, by default, the
autovacuum_vacuum_scale_factor
is set to 0.2. This corresponds to 20% of table size. However, as table size grows drastically, you might find that autovacuum is not happening frequently enough for your needs. On some of our larger tables we have set this to0.01
so that they get cleaned up very frequently. -
autovacuum_vacuum_threshold
Explicitly setting
autovacuum_vacuum_threshold
can be advisable especially on tables where rows are constantly being added and deleted. In this case, we might want to get more aggressive with autovacuum. On some of our tables we want to vacuum every 1000 inserts, updates, or deletes.ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.0); ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 1000);
In addition to autovacuuming your tables, you will probably also want the
autovacuum daemon to run ANALYZE
as well so that the query planner can update
its knowledge of the tables and how best to execute a query. These common
settings are similar to the ones above:
autovacuum_analyze_threshold
autovacuum_analyze_scale_factor
For a full list of all the options you can configure, head on over to PostgreSQL’s section on Automatic Vacuuming