Pete Stephenson
2018-06-17 00:18:55 UTC
Hi all,
My server, ams.sks.heypete.com, has been going through periods in which
the sks process uses 100% of the CPU for a few minutes at a time. During
these periods, my Apache reverse proxy logs errors like the following
(client IP address obfuscated for privacy):
[Sun Jun 17 00:00:31.414596 2018] [proxy:error] [pid 4648:tid
139657505371904] [client CLIENT_IP:40327] AH00898: Error reading from
remote server returned by /pks/lookup
This happens across a range of client IP addresses, so it doesn't appear
to be a single malicious user. Rather, it seems that something is
causing the sks process to stall and connections to it time out.
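For anyone who wants to check the same thing on their own server, here
is a rough sketch of how one could tally these errors per client. It
assumes each entry sits on a single line in the actual Apache error log
and follows the format shown above; the regex and the function name
count_proxy_errors are my own illustration, not part of Apache or SKS.

```python
import re
from collections import Counter

# Assumed format: "... [client ADDRESS:PORT] AH00898: ..." on one line.
AH00898_RE = re.compile(r"\[client (?P<ip>[^\]]+):\d+\] AH00898:")


def count_proxy_errors(lines):
    """Return a Counter mapping client address -> number of AH00898 errors."""
    counts = Counter()
    for line in lines:
        match = AH00898_RE.search(line)
        if match:
            counts[match.group("ip")] += 1
    return counts
```

Feeding it the open error log (for example, count_proxy_errors(f) for a
file object f) and looking at most_common() would show whether the
errors cluster on a few addresses or are spread widely.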
After a minute or two, CPU usage drops back to its normal range (a few
percent up to about 15%) and queries are answered promptly until the
CPU usage spikes again and things stall.
The server is in close sync with its peers, with no particular issues on
the recon side.
Any ideas what might be causing this? I'm running 1.1.6 on Debian, and
things have generally been working well for several years. For good
measure, I recently deleted the key database and recreated it from a
fresh dump, but that had no effect.
Potentially related: several clients, evidently corporate mail servers
that query the SKS pool for every email they send or receive, are making
dozens of queries per second to my server. Is it reasonable to impose
rate limits on such clients (e.g. no more than X queries in Y seconds)?
If so, what would reasonable values be for X and Y?
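To make the question concrete, here is a minimal sketch of the kind of
per-client limit I have in mind: a sliding window that allows at most X
queries in any Y-second span. The class name and parameters are
hypothetical, and this is only an illustration of the policy, not
something SKS or Apache provides in this form.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_queries per client within any window_seconds span."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries        # the "X" in the question
        self.window_seconds = window_seconds  # the "Y" in the question
        self.clock = clock                    # injectable for testing
        self._hits = defaultdict(deque)       # client -> times of allowed queries

    def allow(self, client):
        """Return True if this client's query should be served now."""
        now = self.clock()
        hits = self._hits[client]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_queries:
            return False
        hits.append(now)
        return True
```

A front end would call allow(client_ip) for each request and reject the
request when it returns False. Only allowed queries are recorded, so a
client that keeps hammering while blocked is readmitted as soon as its
earlier queries age out of the window.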
Thank you.
Cheers!
-Pete
--
Pete Stephenson