Recently one of our production Redis servers started a slow descent in available memory. The cause was almost certainly a code change in a previously released version, but searching through the code would only lead to guessing. I needed to quickly clear out whatever keys were taking up the majority of the space before we concentrated on a more permanent fix.
It's common practice to namespace keys in Redis with a colon (for example, collection:person:1234), so what I really needed to do was reduce all the keys down to groups, then count the number of keys and measure the size of each namespace.
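As a sketch of that reduction (my own illustration, not the tool's actual code), each key keeps everything up to its last separator:

```python
from collections import Counter

def namespace(key, separator=":"):
    """Reduce a key like 'collection:person:1234' to 'collection:person:*'."""
    parts = key.split(separator)
    if len(parts) == 1:
        return key  # un-namespaced keys stay as-is
    return separator.join(parts[:-1]) + separator + "*"

def count_namespaces(keys):
    """Group a sample of keys and count how many fall in each namespace."""
    return Counter(namespace(key) for key in keys)
```

So a sample like `["collection:person:1", "collection:person:2", "session:abc"]` collapses into two groups, `collection:person:*` and `session:*`.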
I needed a tool that:

1. Is non-blocking. This Redis server is running in production with 110 million keys; blocking commands like KEYS were out of the question.
2. Can scan a subset of the keys, so we don't need to walk the entire dataset to get an idea of what is going on.
3. Estimates the memory used by each namespace. Reducing keys to their namespace is easy (for example, collection:person:1234 reduces to collection:person:*), but the number of keys in a namespace doesn't reflect its memory usage (which is what we really care about), so I needed some way to estimate the memory for a namespace from the sample of keys taken.
Numbers 1 and 2 were pretty easy to solve with SCAN (greatly sped up with a large COUNT option), but number 3 was trickier because Redis v3.2 doesn't provide a reliable way to ask "how much memory does this key (string, set, etc.) use?" (the MEMORY USAGE command only arrived in Redis 4.0).
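A minimal sketch of that sampling loop, assuming the redis-py client (the function name and parameters here are my own):

```python
def sample_keys(client, limit=100_000, count=10_000):
    """Collect up to `limit` keys without blocking the server.

    SCAN walks the keyspace incrementally, so Redis keeps serving other
    clients between calls. COUNT is only a hint for how much work each
    call does, but a large value means far fewer round trips against a
    keyspace of 110 million keys.
    """
    keys = []
    cursor = 0
    while True:
        cursor, batch = client.scan(cursor=cursor, count=count)
        keys.extend(batch)
        if cursor == 0 or len(keys) >= limit:
            return keys[:limit]

# Usage (assumes `pip install redis`):
# r = redis.Redis()
# keys = sample_keys(r, limit=1_000_000)
```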
I decided to use DUMP, which produces a serialized binary representation of a key's value, and measure the length of that. This is crude and does not directly relate to memory usage, but it does a good enough job of highlighting keys that contain sets with thousands of elements.
Unfortunately, DUMP is very slow (compared to the other scanning operations) and some of these keys are very large (which translates to network throughput), so it's not practical (or really needed) to measure the length of every key in each namespace. Instead, I needed to be able to configure how many items in a given namespace would be measured, with that average applied to the rest of the items.
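The sampling and extrapolation might look like this (again my own sketch, not the tool's code; `samples` stands in for whatever the configurable limit is called):

```python
def sample_dump_sizes(client, keys, samples=10):
    """Measure the serialized length of only the first `samples` keys,
    since DUMP is slow and the payloads can be large."""
    sizes = []
    for key in keys[:samples]:
        data = client.dump(key)  # None if the key expired mid-scan
        if data is not None:
            sizes.append(len(data))
    return sizes

def estimate_namespace_size(sample_sizes, total_keys):
    """Apply the average sampled size to every key in the namespace."""
    if not sample_sizes:
        return 0
    return int(sum(sample_sizes) / len(sample_sizes) * total_keys)
```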
This worked really well: a namespace containing thousands of similarly sized items needs only, say, 10 items measured to get an idea of the complete namespace.
Ultimately I had to create a CLI tool. The command, redis-usage, has a bunch of flexible options:
It even has a pretty progress bar (thanks https://github.com/cheggaaa/pb!). It looks something like this:
With a larger sample size we were able to find the keys that were causing the problem and apply the appropriate fix to the application. Yay!