Large Text Indexes in Postgres


Note: This article was written for Postgres 9.6 and below. For Postgres 10+, use hash indexes instead.


An index on a sufficiently large text column can take up more space than the table itself. If you only need to check for equality, you can significantly reduce the size of the index.

At first glance, a hash index seem perfect for this. However, you shouldn’t use them since they are not currently WAL-logged. Instead, use an expression index:

CREATE INDEX CONCURRENTLY ON table_name (CAST(md5(column_name) AS uuid));

Cast to a uuid since it’s 16 bytes - the perfect size to store an md5 hash.

Add an extra condition to your queries so the index is used.

SELECT * FROM table_name WHERE column_name = 'some_value'
  AND md5(column_name)::uuid = md5('some_value')::uuid; -- add this

Keep the original equality comparison in the unlikely chance of a hash collision.

Finally, confirm it worked:

EXPLAIN ANALYZE
  SELECT * FROM table_name WHERE column_name = 'some_value'
  AND md5(column_name)::uuid = md5('some_value')::uuid;

This should show the new index being used.

Index Scan using table_name_md5_idx on table_name  (cost=0.58..8.60 rows=1 width=172) (actual time=0.012..0.012 rows=0 loops=1)
  Index Cond: ((md5(column_name)::uuid = '9619030c-750b-4300-95b1-2d365196de91'::uuid)
  Filter: (column_name = 'some_value'::text)
Planning time: 0.062 ms
Execution time: 0.027 ms

If it’s not, run ANALYZE VERBOSE table_name; and try again.

This reduced the size of one of our indexes at Instacart by 7x! slot_machine

Published October 7, 2016 · Tweet


You might also enjoy

Introducing Dexter, the Automatic Indexer for Postgres POPULAR

Securing User Emails in Rails with Lockbox

Introducing pdscan: Scan Your Data Stores for Unencrypted Personal Data


All code examples are public domain.
Use them however you’d like (licensed under CC0).