Recently one of our Redis production servers started a slow descent in available memory. The cause was most likely a code change in a recently released version, but searching through the code would only lead to guessing. Before we concentrated on a permanent fix, I needed to quickly clean out whichever keys were taking up the majority of the space.

It's common practice to namespace keys in Redis with a colon (for example, collection:person:1234), so what I really needed to do was reduce all the keys down to groups, then count the number of keys and the size of each namespace.
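The grouping step can be sketched in a few lines of Python (the real tool isn't shown here; the `namespace` helper and the sample keys are just illustrations of the idea):

```python
# Reduce full keys to their namespace: everything before the last
# separator, with the final segment replaced by "*".
from collections import Counter

def namespace(key: str, separator: str = ":") -> str:
    """Reduce a key like "collection:person:1234" to "collection:person:*"."""
    head, sep, _ = key.rpartition(separator)
    # Keys without a separator form their own "namespace" and stay as-is.
    return head + sep + "*" if sep else key

keys = ["collection:person:1234", "collection:person:5678", "session:abc"]
counts = Counter(namespace(k) for k in keys)
print(counts)  # Counter({'collection:person:*': 2, 'session:*': 1})
```

Counting keys per namespace is the easy half; estimating each namespace's size is where it gets interesting, as described below.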

I needed a tool that:

  1. Is non-blocking. This Redis server is running in production and has 110 million keys, so blocking commands like KEYS were not an option.
  2. Can scan a subset of the keys, so that we don't need to go through the entire dataset to get an idea of what's going on.
  3. Estimates the memory usage of each namespace. Grouping keys into their namespace is pretty easy (for example, collection:person:1234 reduces to collection:person:*), but the number of keys in a namespace doesn't reflect its memory usage (which is what we really care about), so I needed some way to estimate the memory for a namespace from the sample of keys taken.

Numbers 1 and 2 were pretty easy to solve by using SCAN (greatly sped up with a large COUNT option), but number 3 was trickier because Redis v3.2 doesn't provide a reliable way to ask "how much memory does this key (string, set, etc.) use?"
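The SCAN pattern looks roughly like this. To keep the sketch self-contained, `FakeRedis` below stands in for a real client (a real one, such as redis-py, exposes `scan` with the same cursor/count shape); `scan_keys` shows how the cursor loop and the `-limit` cutoff fit together:

```python
# Cursor-based iteration: each SCAN call returns a new cursor plus a
# small batch of keys, so the server is never blocked the way KEYS is.
class FakeRedis:
    """Stand-in for a real client; serves keys in COUNT-sized batches."""
    def __init__(self, keys):
        self._keys = list(keys)

    def scan(self, cursor=0, count=10):
        batch = self._keys[cursor:cursor + count]
        next_cursor = cursor + count
        if next_cursor >= len(self._keys):
            next_cursor = 0  # cursor 0 signals the iteration is complete
        return next_cursor, batch

def scan_keys(client, count=10, limit=None):
    """Iterate keys with SCAN, optionally stopping after `limit` keys."""
    seen = []
    cursor = 0
    while True:
        cursor, batch = client.scan(cursor=cursor, count=count)
        seen.extend(batch)
        if limit is not None and len(seen) >= limit:
            return seen[:limit]  # subset scan: bail out early
        if cursor == 0:
            return seen

client = FakeRedis(f"collection:person:{i}" for i in range(25))
print(len(scan_keys(client, count=10)))           # 25
print(len(scan_keys(client, count=10, limit=7)))  # 7
```

A larger `count` means fewer round trips per batch of keys, which is why a big COUNT speeds the scan up so much.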

I decided to use DUMP, which produces a serialized binary representation of a key's value, and measure the length of that. This is crude and does not directly translate to memory usage, but it does a good enough job of highlighting keys that contain sets with thousands of elements.

Unfortunately, DUMP is very slow (compared to the other scanning operations) and some of these keys are very large (which translates to network throughput), so it's not practical (or really necessary) to measure the length of every key in each namespace. Instead, I wanted to configure how many items from a given namespace would be sampled, with that average applied to the rest of the items.

This worked really well: a namespace containing thousands of similarly sized items needs only, say, 10 items measured to get a good estimate for the whole namespace.
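The extrapolation itself is simple averaging. A Python sketch (the sampled byte counts here are hypothetical DUMP lengths, not output from a real server):

```python
# Sketch of the dump-limit sampling: measure the serialized length of
# only the first N keys seen in a namespace, then scale the average
# up to the namespace's total key count.
def estimate_namespace_size(sizes, total_keys, dump_limit):
    """Average the first `dump_limit` measured sizes, scaled to `total_keys`."""
    sample = sizes[:dump_limit]
    if not sample:
        return 0  # nothing was measured for this namespace
    return (sum(sample) / len(sample)) * total_keys

# Hypothetical DUMP lengths for 3 sampled keys from a 930-key namespace.
sampled = [16, 18, 14]
print(estimate_namespace_size(sampled, total_keys=930, dump_limit=3))  # 14880.0
```

The estimate is only as good as the assumption that keys in a namespace are similarly sized, which held up well in practice here.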

Ultimately I created a CLI tool, redis-usage, which has a bunch of flexible options:

Usage of redis-usage:
  -count int
        SCAN COUNT option. (default 10)
  -db int
        Redis server database.
  -dump-limit int
        Use DUMP to get key sizes (much slower). If this is zero then DUMP will not be used; otherwise it will sample N sizes for each prefix to calculate an average size for that prefix. To measure the sizes of all keys, set this to a very large number.
  -host string
        Redis server host. (default "localhost")
  -limit int
        Limit the number of keys scanned.
  -match string
        SCAN MATCH option.
  -port int
        Redis server port number. (default 6379)
  -prefixes string
        You may specify custom prefixes (comma-separated).
  -separator string
        Separator for grouping. (default ":")
  -sleep int
        Number of milliseconds to wait between reading keys.
  -timeout int
        Milliseconds for timeout. (default 3000)
  -top int
        Only show the top number of prefixes.

It even has a pretty progress bar (thanks https://github.com/cheggaaa/pb!). It looks something like this:

$ ./redis-usage -limit 1000 -dump-limit 3
1002 / 1002 [=====================================================] 100.00% 15s

orderkey:* -> 930 keys, ~14.5 KB estimated size
live___KountaCacheDependency_mysql:* -> 68 keys, ~3.79 KB estimated size
sqsworker:job:* -> 4 keys, ~52 bytes estimated size

With a larger sample size we were able to find the keys that were causing the problem and apply the appropriate fix to the application. Yay!