In many of the situations where VACUUM and related cleanup such as VACUUM FULL or CLUSTER need to happen, such as index bloat and excessive dead tuples taking up disk space, the work involved can be so intensive that administrators decide VACUUM is something to be avoided. This is actually the opposite of what should be concluded. In reality, the answer to most vacuum related problems is to vacuum more often.
There are few known issues that you can run into with vacuum and autovacuum that will inhibit attempts to follow recommended vacuum pratice, or otherwise seems unexpected.
autovacuum is running even though it was turned off
If your table is approaching transaction ID wraparound, autovacuum will process it regardless of whether it's turned on or off globally or for that table. This is one of the many reasons why running with autovacuum off, instead of turning it to run smoothly all the time, is dangerous. If you haven't done that tuning, and just turned it off instead, when autovacuum does start anyway to deal with wraparound it will not be adjusted appropriately to avoid interference with activity going on.
autovacuum is constantly running
While autovacuum stays running all the time, it shouldn't be doing active work all the time. If it is, there are two common issues to investigate.
Each time vacuum runs out of its allocated space for maintenance_work_mem, it needs to start over and make an additional pass over the table Accordingly, if you have set this tunable too low relative to the size needed to vacuum your tables, vacuum can run much less efficiently that it should, and therefore autovacuum will be running much more often just to keep up.
Another autovacuum problem related to having a large number of databases. autovacuum tries to start one worker on each database every autovacuum_naptime seconds(defaulting to one minute, up the maximum set by autovacuum_max|-workers. So if you have 60 databases, a new worker will be started every second with the default parameters. You probably need to increase the nap time value if you have more than dozen of databases in your cluster.
Out of memory errors
If you are too aggressive with setting maintenance_work_mem, you may discouver that autovacuum periodically causes out of memory errors on your system. You can typically identify that situation because the amount of memory listed in the logs for the allocation failure attempt will match that database setting.
Don't forget that up to autovacuum_max_workers can be active, each using that much memory.
Not keeping up on a busy server
The default for autovacuum_vacuum_cost_delay of 20 ms is a medium one appropriate for a smaller system. If you have a busy server, you can easily discover that autovacuum never quite keeps up with incoming traffic. If your server I/O performance is good, then probably you can just reduce the delay setting.
Large servers will commonly run with a cost delay in the 1ms to 5ms range. Also at some point you may need to switch to increasing autovacuum_cost_limit instead to get additional autovacuum work accomplicated. That's the general order in which to tune these two settings.
autovacuum is too disruptive
The opposite case to the previous is also possible: autovacuum can easily consume too much i/o capacity on a server without a lot of it to spare. In theat case, increases of autovacuum_vacuum_cost_delay to as high as 100ms are common. this is preferred way to deal with the situation where vacuum seems to be consuming too many resources.
Long running transactions
One problem that can block all vacuum related actvity on a server is a long running transaction. If you have a query running that needs to see very old information, the server can't cleanup past that point until it completes.
The easiest way to monitor long running transactions is watch for them in pg_stat_activity, see article
Free Space Map exhaustion
One of the major problems in PostgreSQL is that the FSM is stored in shared memory, and is of a fixed size. If your database has a large number of deletions, updates, and or rollbacks in between when a manual or automaticvacuum runs, it's possible for this FSM to run out of room and therefore not be able to track additional free space. This can quickly lead to catastrophic amounts of table and index bloat.
Usually you can get clue from manual autovacuum output. Something like
"INFO: free space map contains <num> pages in <num> relations"
To show the FSM size, run