Stacks on Stacks

Tuning Postgres autovacuum for fun and profit

Do your tables start out fast and performant but over time seem to become slow and unusable? If so, you might be facing the same issue that I ran into called bloat.

What is bloat?

Bloat is a side effect of PostgreSQL’s MVCC data model which allows running multiple simultaneous transactions on a single data set. This post won’t go into detail on how MVCC works. The important piece to understand is that as a result of MVCC and maintaining multiple “views” of the same data, eventually some pieces of these views fall out of scope and can no longer be seen by any statement. These are called “dead tuples” and with a significant amount of writes to your tables can come to comprise a significant amount of your table. This is what we mean when we refer to bloat. This can contribute to an increased number of pages that must be searched through and can slow down your queries.

How do I find out if my tables are bloated?

Luckily for us, there are some great scripts out there that will help us to identify bloat. These bloat scripts from pgexperts really do a great job of helping you to identify which tables or indexes are bloated.

How do I fix it?

The simple fix for this problem is to regularly VACUUM your tables. Because this is a very common operation, postgres provides an autovacuum daemon to regularly perform this action. However, there are cases where you will want to change some of the autovacuum defaults on a per table basis. The reason for doing this is beacause as table size grows, you may find you need to vacuum it more frequently so that vacuuming does not use too many resources when it does run and so that you are regularly able to reuse disk space taken up by deleted rows.

The main options you might be interested in are:

  • autovacuum_vacuum_scale_factor

    At the time of this writing, by default, the autovacuum_vacuum_scale_factor is set to 0.2. This corresponds to 20% of table size. However, as table size grows drastically, you might find that autovacuum is not happening frequently enough for your needs. On some of our larger tables we have set this to 0.01 so that they get cleaned up very frequently.

  • autovacuum_vacuum_threshold

    Explicitly setting autovacuum_vacuum_threshold can be advisable especially on tables where rows are constantly being added and deleted. In this case, we might want to get more aggressive with autovacuum. On some of our tables we want to vacuum every 1000 inserts, updates, or deletes.

    ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.0);
    ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 1000);
    

In addition to autovacuuming your tables, you will probably also want the autovacuum daemon to run ANALYZE as well so that the query planner can update its knowledge of the tables and how best to execute a query. These common settings are similar to the ones above:

  • autovacuum_analyze_threshold
  • autovacuum_analyze_scale_factor

For a full list of all the options you can configure, head on over to PostgreSQL’s section on Automatic Vacuuming