Discussion:
[Sks-devel] disk full, keys.niif.hu crashed
Kiss Gabor (Bitman)
2018-06-15 03:54:19 UTC
Permalink
Yesterday at 18:15 (CEST) keys.niif.hu started to produce tons
of logs in /var/lib/sks/DB. In less than 2 hours the 40 GB filesystem
filled up.
Deleting files and restarting processes did not help:

recon.log:
2018-06-15 05:50:09 Opening log
2018-06-15 05:50:09 sks_recon, SKS version 1.1.6
2018-06-15 05:50:09 Using BerkelyDB version 5.3.28
2018-06-15 05:50:09 Copyright Yaron Minsky 2002-2013
2018-06-15 05:50:09 Licensed under GPL. See LICENSE file for details
2018-06-15 05:50:09 recon port: 11370
2018-06-15 05:50:09 Opening PTree database
2018-06-15 05:50:09 Setting up PTree data structure
2018-06-15 05:50:09 PTree setup complete
2018-06-15 05:50:09 Initiating catchup
2018-06-15 05:50:10 DB closed

db.log:
2018-06-15 05:50:09 Opening log
2018-06-15 05:50:09 sks_db, SKS version 1.1.6
2018-06-15 05:50:09 Using BerkelyDB version 5.3.28
2018-06-15 05:50:09 Copyright Yaron Minsky 2002, 2003, 2004
2018-06-15 05:50:09 Licensed under GPL. See LICENSE file for details
2018-06-15 05:50:09 http port: 11371
2018-06-15 05:50:09 Membership: (zimmermann.mayfirst.org 11370)[], ... (keys.jpbe.de 11370)[]
2018-06-15 05:50:09 address for zimmermann.mayfirst.org:11370 changed from [] to
[<ADDR_INET [2001:470:1:116::6]:11370>, <ADDR_INET [216.66.15.2]:11370>]
...
2018-06-15 05:50:10 address for keys.jpbe.de:11370 changed from [] to [<ADDR_INET [2001:67c:16c8:32cc::1]:11370>, <ADDR_INET [185.120.22.22]:11370>]
2018-06-15 05:50:10 Opening KeyDB database
2018-06-15 05:50:10 Shutting down database

Unfortunately I cannot work on restoration till Sunday evening.

Gabor
André Keller
2018-06-15 07:40:27 UTC
Permalink
Hi,
Post by Kiss Gabor (Bitman)
Yesterday at 18:15 (CEST) keys.niif.hu started to produce tons
of logs in /var/lib/sks/DB. In less than 2 hours the 40 GB filesystem
got fulfilled.
keys.communityrack.org shares the same fate. Trying to get it online
again...


Regards

André
Moritz Wirth
2018-06-15 12:22:03 UTC
Permalink
FWIW, you can set the DB_LOG_AUTOREMOVE flag for the database - the logs
should be removed automatically

[***@instance-4 ~]# cat /var/lib/sks/KDB/DB_CONFIG
set_flags               DB_LOG_AUTOREMOVE
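
A minimal sketch for checking whether a DB_CONFIG file already carries the directive shown above. The function name `has_log_autoremove` is hypothetical, and this is a plain text check under the assumption that lines starting with `#` are comments; it is not a Berkeley DB parser.

```python
def has_log_autoremove(db_config_text: str) -> bool:
    """Return True if a DB_CONFIG body contains 'set_flags DB_LOG_AUTOREMOVE'.

    Assumption: lines starting with '#' are comments and are skipped.
    """
    for raw in db_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Directive name first, flag name second, separated by whitespace.
        if len(fields) >= 2 and fields[0] == "set_flags" \
                and fields[1] == "DB_LOG_AUTOREMOVE":
            return True
    return False
```

It could be applied to the contents of the file quoted above, for example `has_log_autoremove(open("/var/lib/sks/KDB/DB_CONFIG").read())`.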

Best regards,
_______________________________________________
Sks-devel mailing list
https://lists.nongnu.org/mailman/listinfo/sks-devel
Paul M Furley
2018-06-15 12:27:45 UTC
Permalink
Glad I wasn't the only one :) keyserver.paulfurley.com also got
destroyed, rebuilt this morning.

I've been getting a lot of traffic alerts from my host lately (>200MB
per hour), anyone know if there's a reason there's been a lot more
traffic lately?

I haven't yet managed to investigate whether it's peering traffic or
traffic from the pool.

Kind regards,

Paul
Michael Jones
2018-06-15 12:45:40 UTC
Permalink
some nodes have the db cleanup, some nodes have logging;

Graph of disk space

There was definitely an injection of keys, will perform some clean up
ops later.

Kind Regards,
Mike
tiker
2018-06-15 12:48:39 UTC
Permalink
My little Raspberry Pi node is still online but its file system is also
filling up.

It's trying to get updated keys from its peers but is constantly failing
with:
2018-06-15 08:39:53 Error getting missing keys:
Invalid_argument("String.create")

All of my peers have a different number of keys (one peer has 77,
another peer has 30, etc.) so I think all of the nodes are having an issue.

Rob D
Keith Erekson
2018-06-15 15:53:29 UTC
Permalink
This has happened to my keyserver twice in the last two days. I assumed
it was some sort of malicious behavior, because it happened quite
suddenly both times and had the effect of a DoS. ;-)

For example, I have over 1700 binary log files like "log.0000002014",
each 10MB, created in the last 24 hours. (It would have kept going, but
the filesystem filled up.)

The timestamps show that often 30 or 40 of them are created in the same
minute.
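
For scale, 1700 files at 10 MB each is roughly 17 GB. A minimal sketch of how such a count could be gathered: the function name `summarize_bdb_logs` is hypothetical, it assumes the Berkeley DB log files are named `log.` followed by ten digits (as in the example above), and it buckets by modification time, which only approximates creation time.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

# Assumption: transaction logs look like "log.0000002014" (ten digits).
LOG_NAME = re.compile(r"^log\.\d{10}$")


def summarize_bdb_logs(db_dir):
    """Count log files, sum their sizes, and bucket them by mtime minute (UTC)."""
    per_minute = Counter()
    total_bytes = 0
    count = 0
    for path in Path(db_dir).iterdir():
        if not (path.is_file() and LOG_NAME.match(path.name)):
            continue
        st = path.stat()
        count += 1
        total_bytes += st.st_size
        # mtime is the last write, used here as a stand-in for creation time.
        stamp = datetime.fromtimestamp(st.st_mtime, tz=timezone.utc)
        per_minute[stamp.strftime("%Y-%m-%d %H:%M")] += 1
    return count, total_bytes, per_minute
```

Calling `summarize_bdb_logs("/var/lib/sks/DB")` and printing `per_minute.most_common(5)` would show the busiest minutes.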

~Keith
tiker
2018-06-15 16:40:19 UTC
Permalink
The problems seem to be caused by a large key.  There are at least 2
different hash values for this key (so it was probably recently updated)
and one of the versions of the key is 22 MB.  The size is causing
timeouts on some reverse proxies, and the constant retries are causing
the .log files to be created and to keep growing in the DB directory.

When viewing the key through the web interface (both hash versions so
far), one of the UID packets turns into a binary blob of garbage on the
screen.  It does seem to end correctly, though: after the 22 MB of junk
on the screen, the subkeys appear to be OK at the end.  This might be
the cause of the error I posted in my previous message.

I've checked a couple SKS servers for this key and so far, they all seem
to have issues with this key.

This key also appears to have been created yesterday, which may
explain your two crashes.

I don't think I want to post the key ID here because it's hard on the
servers grabbing this key but someone should look at it and figure out
what to do with this.  My node only seems to sync with about 10% of its
peers.

Thanks.
Rob D
tiker
2018-06-15 20:01:46 UTC
Permalink
I don't think so but I could be wrong.  (I'm no expert here.)

Binary attachments (like images) are marked as "uat [contents
omitted]".  In this case, it's a "uid" row that starts the binary data
instead of a text line showing a name.

Here's a (temporary) link to an image of what I see:
http://www.funkymonkey.org/tmp/bigkey.jpg

I'll send an email to Kristian F. with the details about this key to
review and comment on.

Thanks.
Rob D
Post by tiker
The problems seem to be caused by a large key.  There's at least 2
different hash values for this key (so probably recently updated) and
one of the versions of the key is 22mb.  The size is causing timeouts on
some reverse proxies and the constant retries is causing the .log files
to be created and growing in the DB directory.
The current advice over at
https://bitbucket.org/skskeyserver/sks-keyserver/wiki/Peering is to set
client_max_body_size to 8 MiB.
Post by tiker
I don't think I want to post the key ID here because it's hard on the
servers grabbing this key but someone should look at it and figure out
what to do with this.  My node only seems to sync with about 10% of its
peers.
Is this something with a binary image attribute? :(
-Phil
tiker
2018-06-15 21:42:13 UTC
Permalink
Well, it turns out that the cause of our issues, along with the method
to re-create these keys and make things worse, is already posted publicly.

Take a look at the recently reported issues on the SKS bitbucket site.

I don't think my SKS node has enough storage space to survive long
enough for this issue to be fixed.  I may have to shut it down.

Rob D
Andrew Gallagher
2018-06-16 15:02:28 UTC
Permalink
Post by tiker
Well, it turns out that the cause of our issues, the method to re-create
these keys and make things worse is already posted publicly.
There are two main ways in which critical internet infrastructure goes
on fire: a government TLA takes it down for nefarious purposes, or some
random gobshite sets it ablaze as an experiment.

The history of the internet shows that it is almost always the latter.

A
James Cloos
2018-06-15 23:49:12 UTC
Permalink
t> Here's a (temporary) link to an image of what I see:
t> http://www.funkymonkey.org/tmp/bigkey.jpg

It is hard to check w/o knowing the key hash, but can iconv(1) decode
that uid into utf8? Perhaps it is in one of the legacy 16bit encodings?

Can you get that uid (just the uid) into a file so that it can be checked?

-JimC
--
James Cloos <***@jhcloos.com> OpenPGP: 0x997A9F17ED7DAEA6
Andrew Gallagher
2018-06-16 14:52:33 UTC
Permalink
Post by James Cloos
It is hard to check w/o knowing the key hash, but can iconv(1) decode
that uid into utf8? Perhaps it is in one of the legacy 16bit encodings?
According to the person responsible, it's just random noise.

A
Paul Furley
2018-06-16 16:32:01 UTC
Permalink
Alternatively, we can view this as a great opportunity to improve the resilience of this critical infrastructure.

This is a serious, serious flaw... I'm grateful to the individual for taking the time to research and highlight this issue. Sure, not ideal that the network is struggling as a result, but at least we'll have to find a way to fix it!

Paul


Tom at FlowCrypt
2018-06-16 18:34:52 UTC
Permalink
I think there should be a default setting on all installations with a clear
max key size.

8M is a good start, 1M is even better. 1 MB is more than generous for a
public key.

As a user, I shouldn't need to download megabytes of fluff for every
person I want to message.

I propose that we set and enforce max size by default.
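
A minimal sketch of what such a limit could look like in front of a keyserver, reading a submitted key with a hard cap instead of buffering it whole. Everything here is hypothetical (`MAX_KEY_BYTES`, `KeyTooLarge`, `read_key_bounded`); it is not an existing SKS option, and it assumes a buffered binary stream whose `read(n)` returns fewer than `n` bytes only at end of input.

```python
import io

# Hypothetical default taken from the 1M figure proposed above.
MAX_KEY_BYTES = 1024 * 1024


class KeyTooLarge(Exception):
    """Raised when submitted key material exceeds the configured limit."""


def read_key_bounded(stream, limit=MAX_KEY_BYTES):
    """Read at most `limit` bytes of key material, rejecting anything larger."""
    # Ask for one byte more than allowed: getting it means the key is oversized.
    data = stream.read(limit + 1)
    if len(data) > limit:
        raise KeyTooLarge(f"key exceeds {limit} bytes")
    return data


if __name__ == "__main__":
    small = io.BytesIO(b"\x99" * 2048)
    print(len(read_key_bounded(small)))
```

Reading `limit + 1` bytes keeps memory use bounded even when the sender streams something like the 22 MB key discussed earlier in the thread.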
Tom at FlowCrypt
2018-06-16 18:44:37 UTC
Permalink
I should have added, DB_LOG_AUTOREMOVE should probably be a default, too.

Whatever makes the servers more likely to survive out in the wild.
Shengjing Zhu
2018-06-18 09:51:46 UTC
Permalink
Post by Tom at FlowCrypt
I should have added, DB_LOG_AUTOREMOVE should probably be a default, too.
One question about DB_LOG_AUTOREMOVE:
how does it compare to running db_archive? (I usually run this via
crontab once a day.) It seems my server didn't survive this time.
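
A minimal sketch of the daily db_archive approach, wrapped so that a hung run is cut off. The helper names and the timeout value are hypothetical; `-d` (remove log files that are no longer needed) and `-h` (database home) are the db_archive options as commonly documented, and the binary may be installed under a versioned name on some distributions, so check the local man page before relying on this.

```python
import subprocess


def db_archive_command(db_home, binary="db_archive"):
    """Build the argument list for pruning unneeded Berkeley DB log files."""
    # -d: remove log files no longer needed; -h: database environment directory.
    return [binary, "-d", "-h", str(db_home)]


def prune_logs(db_home, binary="db_archive", timeout_s=600):
    """Run db_archive with a timeout; return its exit code.

    subprocess.TimeoutExpired is raised if the command does not finish in time.
    """
    result = subprocess.run(
        db_archive_command(db_home, binary),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode
```

A cron job could call `prune_logs("/var/lib/sks/DB")` once a day. The sketch does not settle how this compares with DB_LOG_AUTOREMOVE; it only shows the periodic variant.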
--
Regards,
Shengjing Zhu
Andrew Gallagher
2018-06-16 19:30:40 UTC
Permalink
Post by Paul Furley
This is a serious, serious flaw... I'm grateful to the individual for taking the time to research and highlight this issue. Sure, not ideal that the network is struggling as a result, but at least we'll have to find a way to fix it!
I’m not complaining about the research. I’m complaining about testing the research against the live infrastructure with no consideration for the consequences.

Absolutely this is important, and we need to fix it. But it would have been a lot easier to fix before the offending key was released into the wild. A responsible researcher would have tested against an isolated server, and not the live infrastructure.

A
Shengjing Zhu
2018-06-18 09:47:47 UTC
Permalink
Hi,

My server's disk is also full of logs.
I tried to run db_archive, but the command never returns.
So I deleted all the log.* files, and now I can't start sks.

Is there anything I can do except rebuilding?

Thanks
Shengjing Zhu
Paul M Furley
2018-06-18 09:57:15 UTC
Permalink
I'm not sure if there's a better way, but I rebuilt. If you've forgotten
how and you're on debian, the following gist might help you:

https://gist.github.com/paulfurley/b901428d1702c613531147f7573757fd

Kind regards,

Paul
Keith Erekson
2018-06-18 19:50:35 UTC
Permalink
Just a heads up for anyone trying to rebuild from the dump on
keyserver.mattrude.com...

Looks like something went wrong with the export, as today's dump is only
4GB, but the day before is 11GB.

Compare the README.txt files:

http://keyserver.mattrude.com/dump/2018-06-17/README.txt

http://keyserver.mattrude.com/dump/2018-06-18/README.txt

~Keith
Gabor Kiss
2018-06-23 10:15:19 UTC
Permalink
Post by Kiss Gabor (Bitman)
Yesterday at 18:15 (CEST) keys.niif.hu started to produce tons
of logs in /var/lib/sks/DB. In less than 2 hours the 40 GB filesystem
got fulfilled.
Unfortunately I cannot work on restoration till Sunday evening.
I've just found a fresh and quickly accessible database dump.
After a 5-hour rebuild, keys.niif.hu is back on the air. :)
My own keydump will be available on Monday as usual.

Gabor
