Discussion:
[Sks-devel] SKS apocalypse mitigation
Andrew Gallagher
2018-03-23 11:10:49 UTC
Hi, all.

I fear I am reheating an old argument here, but news this week caught my
attention:

https://www.theguardian.com/technology/2018/mar/20/child-abuse-imagery-bitcoin-blockchain-illegal-content

tl;dr: Somebody has uploaded child porn to the Bitcoin blockchain. That opens the
possibility that *anyone* using Bitcoin could be prosecuted for
possession. Whether this will actually happen or not is unclear, but
similar abuse of SKS is an apocalyptic possibility that has been
discussed before on this list.

I've read Minsky's paper. The reconciliation process is simply a way of
comparing two sets without having to transmit the full contents of each
set. The process is optimised to be highly efficient when the difference
between the sets is small, and gets less efficient as the sets diverge.

Updating the sets on each side is outside the scope of the recon
algorithm, and in SKS it proceeds by a sequence of client pull requests
to the remote server. This is important, because it opens a way to
implement object blacklists in a minimally-disruptive manner.

An SKS server can unilaterally decide not to request any object it likes
from its peers. In combination with a local database cleaner that
deletes existing objects, and a submission filter that prevents them
from being reuploaded, it is entirely technically possible to blacklist
objects from a given system.
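
To make the moving parts concrete, here is a minimal sketch in Python of
the three hooks just described (request filtering, submission filtering,
and a database cleaner). None of these names exist in SKS; they are
purely illustrative:

```python
# Sketch only: local_bl is a set of banned hashes, db a hash-keyed
# object store. The real implementation would live inside SKS itself.

def hashes_to_request(missing_hashes, local_bl):
    """Never request blacklisted objects discovered during recon."""
    return [h for h in missing_hashes if h not in local_bl]

def accept_submission(obj_hash, obj_data, db, local_bl):
    """Silently drop blacklisted objects at submission time."""
    if obj_hash in local_bl:
        return False
    db[obj_hash] = obj_data
    return True

def clean_database(db, local_bl):
    """One-off cleaner: delete blacklisted objects already stored."""
    for h in [h for h in db if h in local_bl]:
        del db[h]
```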

The problems start when differences in the blacklists between peers
cause their sets to diverge artificially. The normal reconciliation
process will never resolve these differences and a small amount of extra
work will be expended during each reconciliation. This is not fatal in
itself, as SKS imposes a difference limit beyond which peers will simply
stop reconciling, so the increase in load should be contained.

The trick is to ensure that all the servers in the pool agree (to a
reasonable level) on the blacklist. This could be as simple as a file
hosted at a well known URL that each pool server downloads on a
schedule. The problem then becomes a procedural one - who hosts this,
who decides what goes in it, and what are the criteria?
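
The scheduled download could be as simple as the following sketch. The
URL and the one-hash-per-line file format are placeholders, not an
agreed standard:

```python
# Hypothetical scheduled fetch of a shared blacklist file.
import threading
import urllib.request

BLACKLIST_URL = "https://example.org/sks-blacklist.txt"  # placeholder

def refresh_blacklist(state, interval=3600):
    """Re-download the shared blacklist once an hour."""
    with urllib.request.urlopen(BLACKLIST_URL) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    state["local_bl"] = {l.strip() for l in lines if l.strip()}
    threading.Timer(interval, refresh_blacklist, (state, interval)).start()
```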

It has been argued that the current technical inability of SKS operators
to blacklist objects could be used as a legal defence. I'm not convinced
this is tenable even now, and legal trends indicate that it is going to
become less and less tenable as time goes on.

Another effective method that does not require an ongoing management
process would be to blacklist all image IDs - this would also have many
other benefits (I say this as someone who once foolishly added an
enormous image to his key). This would cause a cliff edge in the number
of objects and, unless carefully choreographed, could result in a mass
failure of recon.

One way to prevent this would be to add the blacklist of images in the
code itself during a version bump, but only enable the filter at some
timestamp well in the future - then a few days before the deadline,
increase the version criterion for the pool. That way, all pool members
will move in lockstep and recon interruptions should be temporary and
limited to clock skew.
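
In other words, the filter ships inert and switches itself on. A sketch
of the idea, where the flag-day date is an assumption and the User
Attribute packet tag stands in for the hardcoded filter:

```python
# Time-gated filter sketch: compiled in at the version bump, but a
# no-op until the hardcoded flag day.
from datetime import datetime, timezone

FILTER_ACTIVE_FROM = datetime(2018, 12, 1, tzinfo=timezone.utc)  # assumed
UAT_PACKET_TAG = 17  # OpenPGP User Attribute packet (RFC 4880, 5.12)

def drop_packet(packet_tag, now=None):
    now = now or datetime.now(timezone.utc)
    return now >= FILTER_ACTIVE_FROM and packet_tag == UAT_PACKET_TAG
```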

These two methods are complementary and can be implemented either
together or separately. I think we need to start planning now, before
events take over.
--
Andrew Gallagher
Yaron Minsky
2018-03-23 12:27:55 UTC
FWIW, while I'm effectively no longer involved in SKS development, I
do agree that this is a problem with the underlying design, and
Andrew's suggestions all sound sensible to me.
Andrew Gallagher
2018-03-23 13:05:06 UTC
Post by Andrew Gallagher
Another effective method that does not require an ongoing management
process would be to blacklist all image IDs
It occurs to me that this would be more wasteful of bandwidth than
blocking objects by their hash, as the server would have to request the
object contents before deciding whether to keep it or not. This is
assuming that recon is calculated on pure hashes with no type hints (I'm
99% sure this is the case, correct me if I'm wrong).

We could minimise this by maintaining a local cache of the hashes of
already-seen image objects. This would be consulted during recon and
submission in the same way as an externally-sourced blacklist.
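
Something along these lines, where packets are represented as dicts
with a "tag" field purely for illustration:

```python
# Sketch of a cache of already-seen image-object hashes, consulted
# during recon and submission like a blacklist.
UAT_TAG = 17                  # OpenPGP User Attribute packet (images)
seen_image_hashes = set()

def keep_object(obj_hash, packets):
    """Discard image objects, remembering their hashes."""
    if any(p["tag"] == UAT_TAG for p in packets):
        seen_image_hashes.add(obj_hash)
        return False
    return True

def hashes_to_request(missing_hashes, blacklist):
    """Never re-request a hash we already know to be an image."""
    return [h for h in missing_hashes
            if h not in seen_image_hashes and h not in blacklist]
```
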
--
Andrew Gallagher
Daniel Kahn Gillmor
2018-03-23 14:02:45 UTC
Post by Andrew Gallagher
Updating the sets on each side is outside the scope of the recon
algorithm, and in SKS it proceeds by a sequence of client pull requests
to the remote server. This is important, because it opens a way to
implement object blacklists in a minimally-disruptive manner.
as both an sks server operator, and as a user of the pool, i do not want
sks server operators to be in the position of managing a blacklist of
specific data.
Post by Andrew Gallagher
The trick is to ensure that all the servers in the pool agree (to a
reasonable level) on the blacklist. This could be as simple as a file
hosted at a well known URL that each pool server downloads on a
schedule. The problem then becomes a procedural one - who hosts this,
who decides what goes in it, and what are the criteria?
This is a really sticky question, and i don't believe we have a global
consensus on how this should be done. I don't think this approach is
feasible.
Post by Andrew Gallagher
Another effective method that does not require an ongoing management
process would be to blacklist all image IDs - this would also have many
other benefits (I say this as someone who once foolishly added an
enormous image to his key). This would cause a cliff edge in the number
of objects and, unless carefully choreographed, could result in a mass
failure of recon.
One way to prevent this would be to add the blacklist of images in the
code itself during a version bump, but only enable the filter at some
timestamp well in the future - then a few days before the deadline,
increase the version criterion for the pool. That way, all pool members
will move in lockstep and recon interruptions should be temporary and
limited to clock skew.
I have no problems with blacklisting User Attribute packets from sks,
and i like Andrew's suggestion of an implementation roll-out, followed
by a "switch on" date for the filter. I support this proposal.

I've had no luck getting new filters added to sks in the past [0], so
i'd appreciate if someone who *does* have the skills/time/commit access
could propose a patch for this. I'd be happy to test it.

--dkg

[0] see for example https://bitbucket.org/skskeyserver/sks-keyserver/pull-request/20/trim-local-certifications-from-any-handled
Kristian Fiskerstrand
2018-03-24 18:01:40 UTC
[I previously responded to a specific message not related to this thread
but none the less... ]
Post by Daniel Kahn Gillmor
Post by Andrew Gallagher
Updating the sets on each side is outside the scope of the recon
algorithm, and in SKS it proceeds by a sequence of client pull requests
to the remote server. This is important, because it opens a way to
implement object blacklists in a minimally-disruptive manner.
as both an sks server operator, and as a user of the pool, i do not want
sks server operators to be in the position of managing a blacklist of
specific data.
I would definitely agree with this
Post by Daniel Kahn Gillmor
Post by Andrew Gallagher
The trick is to ensure that all the servers in the pool agree (to a
reasonable level) on the blacklist. This could be as simple as a file
hosted at a well known URL that each pool server downloads on a
schedule. The problem then becomes a procedural one - who hosts this,
who decides what goes in it, and what are the criteria?
This is a really sticky question, and i don't believe we have a global
consensus on how this should be done. I don't think this approach is
feasible.
Post by Andrew Gallagher
Another effective method that does not require an ongoing management
process would be to blacklist all image IDs - this would also have many
other benefits (I say this as someone who once foolishly added an
enormous image to his key). This would cause a cliff edge in the number
of objects and, unless carefully choreographed, could result in a mass
failure of recon.
One way to prevent this would be to add the blacklist of images in the
code itself during a version bump, but only enable the filter at some
timestamp well in the future - then a few days before the deadline,
increase the version criterion for the pool. That way, all pool members
will move in lockstep and recon interruptions should be temporary and
limited to clock skew.
I have no problems with blacklisting User Attribute packets from sks,
and i like Andrew's suggestion of an implementation roll-out, followed
by a "switch on" date for the filter. I support this proposal.
I agree with this as well. UATs generally have very limited value, so if
we introduce a filter to skip all UATs, I'm fine with making that a
requirement across servers in the sks-keyservers.net pools. That isn't
something that restricts servers overall, but anyhow...
Post by Daniel Kahn Gillmor
I've had no luck getting new filters added to sks in the past [0], so
i'd appreciate if someone who *does* have the skills/time/commit access
could propose a patch for this. I'd be happy to test it.
And here comes at least one of the issues: we're talking about a filter
that responds to a specific alteration. Mainly, we need to specify a
specific filter for a specific version and move from there, which can be
relatively easy given sufficient time.
Post by Daniel Kahn Gillmor
--dkg
[0] see for example https://bitbucket.org/skskeyserver/sks-keyserver/pull-request/20/trim-local-certifications-from-any-handled
--
----------------------------
Kristian Fiskerstrand
Blog: https://blog.sumptuouscapital.com
Twitter: @krifisk
----------------------------
Public OpenPGP keyblock at hkp://pool.sks-keyservers.net
fpr:94CB AFDD 3034 5109 5618 35AA 0B7F 8B60 E3ED FAE3
----------------------------
"There is no urge so great as for one man to edit another man's work."
(Mark Twain)
Andrew Gallagher
2018-03-25 11:39:37 UTC
Disappearance of
public keyservers would be a major inconvenience, but not a disaster.
Considering that keyservers are currently the only resilient way to distribute key revocations, I’m not sure I would be so sanguine. If I’m hosting my key exclusively on WKD or some other web based service, it would be easy to prevent key revocations from being distributed. Granted, revocation is imperfect at the best of times. But SKS is the best tool we have at the moment, and the ecosystem would be severely damaged without it.

A
brent s.
2018-03-25 18:12:27 UTC
Disappearance of
public keyservers would be a major inconvenience, but not a disaster.
Considering that keyservers are currently the only resilient way to distribute key revocations, I’m not sure I would be so sanguine. If I’m hosting my key exclusively on WKD or some other web based service, it would be easy to prevent key revocations from being distributed. Granted, revocation is imperfect at the best of times. But SKS is the best tool we have at the moment, and the ecosystem would be severely damaged without it.
A
I strongly and vehemently agree with both sides.


On a more serious note (albeit somewhat off-topic), and admittedly much
less deplorable a consideration - has the topic of copyrighted material
being distributed in keys (notably in the image data) come up at any point?

I suggest the same mechanism used in this approach should be
applicable to those instances as well. Under the DMCA in the US, keyserver
operators would be liable for this data (as we would be "distributing"
it) and responsible for its removal for compliance. I presume many other
countries have similar copyright laws/stipulations as well.




(Ironically, many if not all agents for intellectual property
reclamation have PGP keys themselves on our servers, as one of the
stipulations for a DMCA notice's validity per § 512(c)(3)(A) (found here[0]) is
"A[n] ... electronic signature of a person authorized to act on behalf
of the owner of an exclusive right that is allegedly infringed.")


[0] https://www.law.cornell.edu/uscode/text/17/512
--
brent saner
https://square-r00t.net/
GPG info: https://square-r00t.net/gpg-info
Michael Jones
2018-03-25 23:39:50 UTC
What if the approach were either to have a web of trust to whitelist
users able to upload images or, even more stringently, to strip all
image data?

Is image data essential to operating?

I hardly ever look at the images, and these images could be shared via
other means.

The keyservers would continue to operate with keys and revocation
certificates, but no image data?

From memory, the image can be removed from any key locally, so there is
no reason it could not be removed on submission.

Doesn't solve all the issues, but would prevent malicious use of our
servers in a direct manner.
Hendrik Visage
2018-03-26 05:58:11 UTC
What if the approach were either to have a web of trust to whitelist users able to upload images or, even more stringently, to strip all image data?
Is image data essential to operating?
I’d make the case that it might not be far in the future that we’ll *need* to remove them to keep the database sizes manageable, and
looking at the current size, I’d argue we’d want to remove the images already.
I hardly ever look at the images, and these images could be shared via other means.
Exactly: send an email, or look at a URL with the signed picture.

---
Hendrik Visage
Andrew Gallagher
2018-05-03 18:22:06 UTC
Recent discussion has brought me back to thinking about Phil's
suggestion again.
Treat items in Filtered as part of "what we have" for reconciliation to
find the set difference. That way you never request them. Return HTTP
"410 Gone" for attempts to retrieve things which are marked Filtered.
That way clients don't try to authenticate and you just say "that might
have once existed, but no longer does". Include a custom HTTP header
saying "SKS-Filtered: something".
I don't think we need the custom header - 410 might be sufficient.
Then it's a policy change to not accept UATs and to mark them as things
to be filtered out instead, and a clean-up tool to walk the existing DBs
and decide what should be in Filtered. There will be down-time of some
extent, since SKS doesn't like sharing the DBs
Policy will have to be applied in multiple places. If the local
administrator changes a policy, then we have to walk the database as
above. If we receive a packet (either during catchup or via a
submission) that matches an existing policy, then we add the hash to the
blacklist (with an explanation) and throw away the packet. We also have
to be able to add and delete blacklist entries independently of general
policy.

It would be best if a running SKS was able to dynamically update its
blacklist and policy without having to shut down for maintenance. This
could be as simple as a config file that is reloaded on a schedule.
An SKS version which understands "SKS-Filtered:" headers will add an
entry to its own Filtered DB but _not_ delete stuff already in other
DBs. It should record "I've seen that some peers are unwilling to
provide this to me, I can mark it as unavailable and include it in the
set of things I won't request in future".
We need to distinguish between "things that we have blacklisted"
(authoritative) and "things that our peers have blacklisted" (cache).

The things that we have blacklisted locally (and presumably deleted) are
treated as present for recon, and "410 Gone" for requests.

The things that our peers have blacklisted (and previously returned 410
Gone for) are treated as present for recon *with that specific peer
only*, but otherwise not treated specially. If we don't have it and have
not locally blacklisted it, we should still request it from other peers
that are willing to serve it. If it violates our own policy then we
blacklist it locally. But we can't take our peer's word for that.

So the reconciliation process against "some-peer.net" operates against
the list of unique hashes from the set:

(SELECT hash FROM local_db) UNION (SELECT hash FROM local_bl) UNION
(SELECT hash FROM peer_bl_cache WHERE peer='some-peer.net')

(If we are in sync with "some-peer.net" then they will have generated
the same set, but with the local_bl and peer_bl_cache roles reversed)

But we only return 410 for incoming requests IFF they match:

(SELECT hash FROM local_bl)

If we receive 410 during catchup, then we add a new entry to the
peer_bl_cache: {hash: xxxxx, peer: "some-peer.net"}. All this should do
is ensure that recon against that particular peer stays in sync - it
should not affect the operation of recon with any other peer, nor of
incoming requests.

Since we are keeping a cache of peer blacklists, we have to allow for
cache invalidation. A remote peer might accidentally add a hash to its
blacklist, only to remove it later. We need to walk the peer_bl_cache at
a low rate in the background and HEAD each item just to make sure that
it still returns 410 - otherwise we clear that peer_bl_cache entry and
let it get picked up (if necessary) in the next recon.
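
As a sketch (the hget lookup URL and the cache layout are assumptions
on my part):

```python
# Revalidate a few peer_bl_cache entries per cycle; only a 410 keeps
# an entry, anything else evicts it so normal recon can pick it up.
import urllib.error
import urllib.request

def revalidate(peer_bl_cache, batch=10):
    """peer_bl_cache: dict mapping (hash, peer) -> timestamp."""
    for (h, peer) in list(peer_bl_cache)[:batch]:
        req = urllib.request.Request(
            f"http://{peer}:11371/pks/lookup?op=hget&search={h}",
            method="HEAD")
        try:
            urllib.request.urlopen(req)        # 2xx: no longer gone
            del peer_bl_cache[(h, peer)]
        except urllib.error.HTTPError as e:
            if e.code != 410:                  # still 410: keep entry
                del peer_bl_cache[(h, peer)]
        except OSError:
            pass                               # unreachable; retry later
```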

I believe the above system should allow for recon to be maintained
separately between peer pairs whose blacklists differ, and for one
server to recon with multiple peers that all have differing blacklists.

---

The first, easier, issue with the above is bootstrapping.

Populating a new SKS server requires a dump of keys to be loaded. This
dump is assumed to be a close approximation to the full set of keys in
the distributed dataset. But with per-node policy restrictions, there is
no such thing as a "full set".

A new server populated by a dump from server A may not even be able to
recon with server A initially, because A's local_bl could be larger than
the maximum difference that the recon algorithm can handle. If A
included a copy of its local_bl with the dump, then the new server can
recon with A immediately. But only with A, because every server's
local_bl will be different.

This problem will extend to any two peers attempting to recon for the
first time. Without a local cache of each other's blacklists, the
difference between the datasets could easily be large enough to
overwhelm the algorithm.

There must therefore be a means of preseeding the peer_bl_cache before
first recon with a new peer. This could be done by fetching a recent
blacklist dump from a standard location.

---

The second, harder, issue with the above is eventual consistency.

We assume that every peer will eventually see every packet at some
point. But it is entirely possible that all of my peers will put in
place policies against (say) photo-ids, and therefore I may never see a
photo-id that was not directly submitted to me - even if I have no such
policy myself. I am effectively firewalled behind my peers' policies.

Which then leads to pool consistency issues. If some peers are trapped
behind a policy firewall, not only will they have missing entries, they
may not ever *know* that they have missing entries. And this can break
in both directions simultaneously, as these peers may also contain extra
entries that the rest of the network will never see.

Without policies, indirect connectivity is sufficient for eventual
consistency. This leads to high latencies but is robust up to the point
of complete severance. But we can see that any policy that impedes the
flow of information across the network will potentially break eventual
consistency.

The only general solution is to alter the peering topology. We need to
get rid of membership restrictions for the pool. Any pool member should
be able to recon with any other pool member, ensuring that all members
see all hashes at least once. This would also have performance benefits
even if we don't implement policy blacklists.
--
Andrew Gallagher
Gabor Kiss
2018-05-03 19:18:15 UTC
Post by Andrew Gallagher
The second, harder, issue with the above is eventual consistency.
We assume that every peer will eventually see every packet at some
point. But it is entirely possible that all of my peers will put in
place policies against (say) photo-ids, and therefore I may never see a
photo-id that was not directly submitted to me - even if I have no such
policy myself. I am effectively firewalled behind my peers' policies.
Which then leads to pool consistency issues. If some peers are trapped
behind a policy firewall, not only will they have missing entries, they
may not ever *know* that they have missing entries. And this can break
in both directions simultaneously, as these peers may also contain extra
entries that the rest of the network will never see.
Just a historical note.
Folks, have you noticed the similarity between the distribution of
keys and newsfeeds? ("News" was a very popular communication form
before forums, web2 and high-speed internet access.[1])
News admins had to search for "good" partners if they wanted to get a
rich subset of newsgroups.
At the top of the evolution of news servers you can find INN, with a
lot of sophisticated solutions.
A fraction of the experience from decades of news may be useful here
too.

[1] https://en.wikipedia.org/wiki/Usenet

Gabor
Andrew Gallagher
2018-05-04 16:13:56 UTC
Post by Gabor Kiss
Just a historical note.
Folks, have you noticed the similarity between the distribution of
keys and newsfeeds? ("News" was a very popular communication form
before forums, web2 and high-speed internet access.[1])
News admins had to search for "good" partners if they wanted to get a
rich subset of newsgroups.
At the top of the evolution of news servers you can find INN, with a
lot of sophisticated solutions.
A fraction of the experience from decades of news may be useful here
too.
Yes, but this was driven by the limitations of pre-Internet networking,
where you couldn't assume that you could connect directly to an
arbitrary news server. The same limitations also resulted in bang-path
email routing.

But email has long since migrated to a direct-connection delivery
paradigm, and for good reason. Sure, the idea that absolutely anybody
can set up a mail server and start opening connections to yours is a
little scary if you're not used to it. But that's how any internet
service works.

AFAICT, the limitation that SKS servers should only recon with known
peers was introduced as a measure against abuse. But it's a pretty
flimsy anti-abuse system considering that anyone can submit or search
for anything over the HKP interface without restriction.

I think all SKS servers should attempt to recon with as many other
servers as they can find. The tools exist to walk the network from a
known starting point or points and enumerate all responsive hosts. Why
not have each SKS server walk the network and update the in-memory copy
of its membership on an ongoing basis? If a previously unknown server
does try to recon, what's the harm? So long as it recons successfully it
should go into the list with all the rest.
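
A spider along these lines would do, assuming (hypothetically) that
each server published its peer list on a machine-readable status page:

```python
# Walk the gossip graph from seed hosts, breadth-first.
import json
import urllib.request

def walk_network(seeds, limit=1000):
    seen, frontier = set(), list(seeds)
    while frontier and len(seen) < limit:
        host = frontier.pop()
        if host in seen:
            continue
        seen.add(host)
        try:
            url = f"https://{host}/status.json"   # hypothetical endpoint
            with urllib.request.urlopen(url) as resp:
                frontier.extend(json.load(resp).get("peers", []))
        except (OSError, ValueError):
            pass                                  # skip unreachable hosts
    return seen
```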

That way the membership file as it exists now is just a starting point,
like the DNS root hints. No more begging on the list for peers. Just
pre-seed your membership file with a selection of the most stable SKS
sites (e.g. the ones coloured blue on the pool status page) and within
an hour you're peering with the entire pool, and them with you.

If any SKS server is found to be abusing trust, then block away. But
let's permit by default and block specific abuse rather than the other
way around. There may be a need for rate-limiting recon at some point,
but I don't think the pool is anywhere near that big yet.
--
Andrew Gallagher
Gabor Kiss
2018-05-05 06:00:06 UTC
Post by Andrew Gallagher
I think all SKS servers should attempt to recon with as many other
servers as they can find. The tools exist to walk the network from a
known starting point or points and enumerate all responsive hosts. Why
not have each SKS server walk the network and update the in-memory copy
of its membership on an ongoing basis? If a previously unknown server
does try to recon, what's the harm? So long as it recons successfully it
should go into the list with all the rest.
That way the membership file as it exists now is just a starting point,
like the DNS root hints. No more begging on the list for peers. Just
pre-seed your membership file with a selection of the most stable SKS
sites (e.g. the ones coloured blue on the pool status page) and within
an hour you're peering with the entire pool, and them with you.
Okay, brain storming in progress. :-)

Keep the similarity to the DNS.
Don't collect millions of unwanted keys in advance.
Wait until a user request comes, then ask discovered peers
for the wanted key, merge the results, and send them back to the user.
Also store the key in the local database and provide it to other
key servers if they ask you.

Requests may be "iterative" or "recursive" (words are stolen from DNS).
Users send a recursive request: "I don't care how many peers
you ask, but tell me the key with all signatures."
A cross-server request is iterative: "Send me what you have, no more."
This is to avoid an endless storm of circulating requests.

How to maintain a pool of servers like the above? How to measure their
quality?
It is more difficult than simply comparing the number of locally
stored keys. There will be a dedicated key, PMK. A monitoring station
issues new signatures 3-4 times a day to a random subset of pool
members, then recursively asks all pool members and aspirants whether
they could retrieve all the new sigs on PMK.

To be continued ... :)

Gabor
Andrew Gallagher
2018-05-05 08:00:43 UTC
Post by Gabor Kiss
Okay, brain storming in progress. :-)
:-)
Post by Gabor Kiss
Requests may be "iterative" or "recursive" (words are stolen from DNS).
Users send a recursive request: "I don't care how many peers
you ask, but tell me the key with all signatures."
The DNS has a hierarchical structure that allows the authoritative source for data to be found within a small number of requests that depends on the number of components in the fqdn. There is no such structure in sks, and no way of knowing that all information has been found, so the *best* case scenario is that every server has to be polled for every request.
Post by Gabor Kiss
How to maintain a pool of servers like the above? How to measure their
quality?
Sorry, my use of “pool” was inaccurate. I meant to refer to all connected and responsive servers. “Graph” is maybe the better term.

A
Kiss Gabor (Bitman)
2018-05-05 09:55:09 UTC
Post by Andrew Gallagher
Post by Gabor Kiss
Requests may be "iterative" or "recursive" (words are stolen from DNS).
Users send a recursive request: "I don't care how many peers
you ask, but tell me the key with all signatures."
The DNS has a hierarchical structure that allows the authoritative source for data to be found within a small number of requests that depends on the number of components in the fqdn. There is no such structure in sks, and no way of knowing that all information has been found, so the *best* case scenario is that every server has to be polled for every request.
Suboptimal solutions are also acceptable.
I don't think we always need the best (and most expensive) way.
"Almost the best" is good enough in most practical cases.

However some simulation of spreading keys and signatures would
be really useful.

Gabor
Andrew Gallagher
2018-05-05 10:33:35 UTC
Post by Kiss Gabor (Bitman)
Suboptimal solutions are also acceptable.
I don't think we always need the best (and most expensive) way.
"Almost the best" is good enough in most practical cases.
We need to define our metric to determine which particular degrees of “suboptimal” are acceptable. Timeliness has never been a strong feature of sks, and so probably shouldn’t be prioritised now. Accessibility, however, is crucial, and anything that could result in updated data being uploaded somewhere it is unlikely to be found is a deal breaker imo. DNS uses a hierarchical structure to ensure accessibility; sks uses gossip. If we get rid of gossip but don’t impose a hierarchy, we could go far beyond “suboptimal”.
Post by Kiss Gabor (Bitman)
However some simulation of spreading keys and signatures would
be really useful.
Agreed. Anything that changes the behaviour of a distributed system needs realistic performance testing in virtuo.

A
Andrew Gallagher
2018-05-05 07:53:46 UTC
If you peer with someone with no keys
loaded, it will render your server nearly inoperable.
I was aware that recon would fail in this case but not that the failure mode would be so catastrophic. Is there no test for key difference before recon is attempted?

A
Andrew Gallagher
2018-05-05 08:38:26 UTC
While you could modify the protocol to do something like announce a
key-count first, that's still only protection against accidental
misconfiguration
That’s exactly what I’m talking about. Since the majority of the problems that you have experienced seem to be caused by people not setting it up correctly, it would appear to be sensible, and it’s such a simple thing that I’m surprised it hasn’t been implemented.

Yes, of course a malicious actor can take down an sks server, but you don’t need recon to do it...

A
Andrew Gallagher
2018-05-05 09:27:28 UTC
While you could modify the protocol to do something like announce a
key-count first, that's still only protection against accidental
misconfiguration
Sorry for the double. We don’t need to modify the protocol to enable such checks. Whenever a server tries to recon with us, we can perform a callback against its status page and run whatever sanity tests we want before deciding whether to allow recon to proceed. This could be rolled out without any need for coordination.

A
Andrew Gallagher
2018-05-23 16:31:56 UTC
Hi, all.

There has been a lot of chatter re possible improvements to SKS on the
list lately, and lots of ideas thrown around. So I thought I'd summarise
the proposals here, and try to separate them out into digestible chunks.

I've ordered them from less to more controversial. My personal
preference is for the first two sections (resiliency, type filters) to
be implemented, and for the rest to be parked.

This has turned out to be a much longer document than I expected. I
don't intend to spend any further time or energy on local blacklisting,
as its technical complexity increases every time I think about it, and
its politics and effectiveness are questionable.

A.


Concrete proposals
==================

Version 1.X: Resiliency
-----------------------

These are ideas that fell out of the other discussions, but are
applicable independently. If we want to make backwards-incompatible
changes, then automatic verification of status, versions etc will
probably be necessary to prevent recon failure.

### JSON status

A standardised JSON status page could be served by all SKS-speaking
services. This would ease fault detection and pool management, and is a
prerequisite for reliable sanity callbacks.

### Default initial_stat=true

Also a prerequisite for sanity callbacks. Otherwise useful for debugging
and fault detection.

### Sanity callbacks

Currently, a server has no way to determine if its peers are correctly
set up or have key deltas within the recon limit. If each host served a
JSON status page, peers could perform a sanity check against it before
allowing recon to continue. This would help contain the effects of some
of the more common failure modes.
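
A sketch of such a callback, assuming the JSON status page above
exposes a "numkeys" field (the field name, URL and threshold are all
illustrative):

```python
# Refuse recon if the peer is unreachable, unparseable, or has an
# excessive key-count delta.
import json
import urllib.request

def peer_sane(host, local_numkeys, max_delta=100_000):
    try:
        with urllib.request.urlopen(f"https://{host}/status.json") as r:
            status = json.load(r)
    except (OSError, ValueError):
        return False                   # unreachable or malformed status
    return abs(status.get("numkeys", 0) - local_numkeys) <= max_delta
```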

### Empty db protection

If the number of keys in the local database is less than a configured
threshold, an sks server should disable recon, and throw a warning. The
particular threshold could be set in the conf file, and a sensible
default provided in the distro. This should prevent new servers from
attempting recon until a reasonable dump is loaded.
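
Minimal sketch; the option name and default value are invented for
illustration:

```python
# Empty-db protection: refuse recon below a configured key count.
MIN_KEYS_FOR_RECON = 4_000_000   # hypothetical sks.conf default

def recon_enabled(local_numkeys, threshold=MIN_KEYS_FOR_RECON):
    if local_numkeys < threshold:
        print("warning: key database looks empty; recon disabled")
        return False
    return True
```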


Version 2.0: Type filters with version ratchet
----------------------------------------------

This proposal seems to have the most support in principle. It is
relatively easy to implement, and directly addresses both illegal
content and database bloat. It does however require precise choreography.

It should be possible to alter the sks code during a version bump so that:

1. All objects of an expanded but hardcoded set of types (private keys,
localsigs, photo IDs, ...) are silently dropped if submitted
2. Any existing objects in the database of these types are treated as
nonexistent for all operations (queries, recon, dumps, ...)
3. The above rules are only enabled on a future flag day, say 180 days
after the release date of the new version
4. The version criterion for pool membership is bumped a few days in
advance of the flag day
5. A crufty database could be cleaned by dumping and reloading the db
locally, or a database cleaner could be run on a schedule from within
SKS itself

This would purge the pool of the most obviously objectionable content
(child porn, copyrighted material), with minimal collateral damage.

The disadvantage is that any noncompliant peer would fail recon after
flag day due to excessive delta, and thus would need to be either
depeered manually, or have its recon attempts denied by a sanity callback.

Other implementations (i.e. hockeypuck) would have to move in lockstep
or be depeered.


Future speculation
==================

Future A: Policy blacklisting
-----------------------------

Pay attention, kid. This is where it gets complicated.

Version ratchets may not be flexible or responsive enough to deal with
specific legal issues. Policy-based blacklisting gives server operators
a fine-grained tool to clean their databases of all sorts of content
without having to move in lockstep with their peers.

These proposals are more controversial, given that individual operators
will have hands-on responsibility for managing policy, and thereby
potentially be more exposed legally. It should be noted however that
technical debt may not be a valid defence against legal liability. IANAL.

All of the changes in this section must be made simultaneously,
otherwise various forms of recon failure are inevitable. This will
involve a major rewrite of the code, which may not be considered a good
use of time.

If type filters have been implemented (see above), the need for local
policy would be considerably reduced. If however type filters were not
used, then policy blacklists would be the main method for filtering
objectionable content, which might be prohibitive.

Note that locally-divergent blacklist policies have the potential to
break eventual consistency across the graph (see below).

### Local blacklist

An SKS server may maintain a local blacklist of hashes that it does not
want to store. At submission time, any object found in the blacklist is
silently dropped.

Any requests for objects in the blacklist should return `410 Gone`.

### Local dumps

When an SKS server is making a dump, it should dump all of its
databases, including blacklist, peer_bl_cache and limbo (see below).
This is useful for a) restoring state locally after a disaster, but also
b) helping new servers bootstrap themselves to a low-delta state.

### Bootstrap limbo

When restoring from a dump, a server may simply restore the dumped
blacklist and continue. But if the new server has a different policy
than the source, this is not sufficient. Hashes that were added to the
original blacklist for violating policies that the new server does not
enforce should not be blacklisted on the new server. But they cannot be
added to the local database either, because the actual data will not be
found in the dump.

Instead, these hashes are added to a `limbo` database that will be
progressively drained as and when the hashes are encountered again
during submission or catchup. This is important to ensure that recon can
start immediately with a complete set of hashes.

Any requests for objects in limbo should return `404 Not Found`. If an
object is successfully submitted or fetched that matches a hash in
limbo, then the hash will be removed from limbo before the object is
processed by policy.
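
Pulling the lookup and limbo-draining rules together in a sketch,
where db, local_bl and limbo are hash-keyed stores and all names are
hypothetical:

```python
# Lookup: blacklisted hashes are 410 Gone; limbo and unknown hashes
# are indistinguishable 404s. Arrival drains limbo before policy runs.
def lookup_status(h, db, local_bl, limbo):
    if h in local_bl:
        return 410
    return 200 if h in db else 404     # limbo entries also answer 404

def on_object_arrival(h, data, db, local_bl, limbo, policy_ok):
    limbo.discard(h)                   # drain limbo on first sighting
    if policy_ok(data):
        db[h] = data
    else:
        local_bl.add(h)                # violates policy: blacklist it
```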

### Peer blacklist cache

When fetching new objects from a peer during catchup, the peer may throw
`410 Gone` - if this happens then we know that the peer has blacklisted
it and we should not request it again from that peer for some time. We
store the triple `(hash, peer, timestamp)` in the database `peer_bl_cache`.

Similarly, if we receive `404 Not Found` during catchup, then this
object is in the remote server's limbo. We add it to `peer_bl_cache` as
if it were a `410`. Cache invalidation should reap it eventually.
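
In sketch form, matching the `(hash, peer, timestamp)` triple
suggested above:

```python
# Record peer blacklist/limbo hits seen during catchup.
import time

def on_catchup_response(h, peer, status_code, peer_bl_cache):
    if status_code in (404, 410):      # peer's limbo or blacklist
        peer_bl_cache[(h, peer)] = time.time()
```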

### Fake recon

The recon algorithm is modified to operate against the set of unique hashes:

```
(SELECT hash FROM local_db) UNION
(SELECT hash FROM local_bl) UNION
(SELECT hash FROM limbo) UNION
(SELECT hash FROM peer_bl_cache WHERE peer='$PEER');
```

This ensures that deltas are kept to a minimum.

Note that this may cause the remote server to request items that it does
not have but are in our blacklist or our limbo. This should only happen
once, after which the offending hash should be stored in the peer's
blacklist cache against our hostname.

If the remote server requests an object that we have stored in our
`peer_bl_cache` against its name, then our cache is obviously invalid
and we should remove that entry from the cache and respond with our copy
of the object, if we have one.

### Conditional catchup

Instead of requesting the N missing hashes from the delta, the server
will request the following hashes:

```
(SELECT hash FROM missing_hashes)
UNION (SELECT hash FROM peer_bl_cache
WHERE peer='$PEER' ORDER BY timestamp LIMIT a)
UNION (SELECT hash FROM limbo LIMIT b*N);
```

where `a` is small, perhaps even a weighted random choice from {0, 1},
and `b` is O(1). These parameters will be adjusted so that a balance is
maintained between (on one hand) timely cache invalidation and limbo
draining; and (on the other) the impact upon the remote peer of
excessive requests.

### Policy enforcement

Each server would be able to define its own policy. The simplest policy
would be one that bans certain packet types (e.g. photo IDs).

During both catchup and submission (but after limbo draining), the new
object is compared with local policy. If it offends then its hash is
added to the local blacklist with a reference to the offending policy,
and the data is silently dropped.

Policy should be defined in a canonical form, so that a) local policy
can be reported on the status pages and b) remote dumps can be compared
with local policy to minimise the number of hashes that need to be
placed in limbo during bootstrap.
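
For example, a canonical policy might look something like the
following; the field names are invented, the point being that it is
declarative and machine-comparable:

```python
# Hypothetical canonical policy form, publishable on the status page.
POLICY = {
    "policy_version": 1,
    "drop_packet_tags": [17],      # e.g. User Attribute (photo ID)
    "max_packet_bytes": 65536,
}

def offends(packet, policy=POLICY):
    """packet: dict with 'tag' and 'body' keys (illustrative only)."""
    return (packet["tag"] in policy["drop_packet_tags"]
            or len(packet["body"]) > policy["max_packet_bytes"])
```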

### Local database cleaner

If policy changes, there will in general be objects left behind in the
db that violate the new policy. A cleaner routine should periodically
walk the database and remove any offending objects, adding their hashes
to the local blacklist as if they had been submitted. This could be
implemented as an extension of the type-filter database cleaner above.


Open problem: Eventual Consistency
----------------------------------

Any introduction of blacklists opens the possibility of "policy
firewalls", where servers with permissive policies may be effectively
isolated from each other if all of the recon pathways between them pass
through servers with more restrictive policies. Policy would therefore
not only prevent the storage of violating objects locally, but prevent
their propagation across the network. The only way to break this
firewall is to create a new recon pathway that bypasses it. This could
be done manually, but this places responsibility on operators to
understand the policies of all other servers on the graph.

### Recon for all

It might be possible to move from a recon whitelist to a recon blacklist
model. Servers would spider the graph to find peers and automatically
try to peer with them. This would ensure that eventual consistency is
obtained quickly, by maximising the core graph of servers that are
mutually directly connected (and thus immune to firewalling).

The main objection is that moving from a whitelist to blacklist recon
model opens up a significant attack surface. Sanity callbacks could be
used to mitigate against human error, but not sabotage.

### Hard core

Alternatively, a group of servers that do not intend to introduce any
policy restrictions could agree to remain mutually well-connected, and
stay open to peering requests from all comers (subject to good
behaviour). This would effectively operate as a clearing house for objects.

The main objections are a) these servers must all operate in
jurisdictions where the universality of their databases is legally sound
(e.g. no right to be forgotten), and b) some animals would be more equal
than others.


Future B: Austerity
-------------------

In an extreme scenario, handling of any user IDs may be impossible due
to data protection regulations. On the same grounds, it may not even be
possible to store third-party signatures as these leak relationship
data. In such a case, it may still be possible to run an austere
keyserver network for self-signatures (i.e. expiry dates) and
revocations only. This would require a further version ratchet with a
type filter permitting a minimum of packet types, shorn of all personal
identifying information.

Ari Trachtenberg
2018-05-06 02:07:21 UTC
The underlying recon algorithm can be stopped at any time and only the discovered
differences can be processed. In other words, it should be possible to put an explicit
timeout on recon time - you will get a partial synchronization, but that might be good
enough as long as you reconcile faster than new differences accumulate.
Post by Andrew Gallagher
If you peer with someone with no keys
loaded, it will render your server nearly inoperable.
I was aware that recon would fail in this case but not that the failure mode would be so catastrophic. Is there no test for key difference before recon is attempted?
It's the calculation of the key difference which is the problem. That's
what recon is.
Recon figures out the difference in the keys present. It's highly
optimised; Minsky wrote papers on the topic, leading to his academic
degree; they're linked from
https://bitbucket.org/skskeyserver/sks-keyserver/wiki/Home
After recon figures out what the local server needs, it then requests
those keys using HKP.
While you could modify the protocol to do something like announce a
key-count first, that's still only protection against accidental
misconfiguration: worthwhile and a nice-to-have if there's ever an
incompatible protocol upgrade anyway, to have a safety auto-cutoff to
back up the manual checks people do, but not protection against malice.
Fundamentally, reconciliation between datasets requires computation.
You can add safety cut-offs, and rate-limits per IP and CPU limits per
request and various other things, but none of those help if you're
trying to protect the keyservers from a half of the apocalypse
scenarios.
-Phil
---
Prof. Ari Trachtenberg ECE, Boston University
***@bu.edu http://people.bu.edu/trachten
brent s.
2018-05-05 10:31:20 UTC
Post by Andrew Gallagher
AFAICT, the limitation that SKS servers should only recon with known
peers was introduced as a measure against abuse. But it's a pretty
flimsy anti-abuse system considering that anyone can submit or search
for anything over the HKP interface without restriction.
I think all SKS servers should attempt to recon with as many other
servers as they can find.
The SKS reconciliation algorithm scales with the count of the
differences in key-counts. If you peer with someone with no keys
loaded, it will render your server nearly inoperable.
We've seen this failure mode before. Repeatedly. It's part of why I
wrote the initial Peering wiki document. It's why I walked people
through showing how many keys they have loaded, and is why peering is so
much easier these days: most people who post to sks-devel follow the
guidance and take the hints, and get things sorted out before they post.
indeed; i'm with phil on this. the importing process is integral to the
turnup, which is why i offer keydumps[0] myself (available via both http
and rsync, compressed - maybe FTP someday as well), and offer
instructions in that section. and why i wrote this query tool[1]. and
this dumping script[2]. and packaged this[3].

(thanks, phil, by the way for those instructions. i found them super
helpful when i first turned up. and thanks to whomever it was on IRC(?)
that gave me the brilliant idea of running a modified second SKS
instance locally for no-downtime dumps!)

one of the key (no pun intended) criteria i have for peering is their
delta for # of keys off from mine. (i should add in a delta/comparison
function to [1] at some point. hrmmm...)

it is SO IMPORTANT for both ends of the peering to have a relatively
recent keyset. i don't see how we can "fix" this without entirely
restructuring how HKP recon behaves, which is no easy task from my
understanding (should it even be necessary in the first place - i don't
believe it requires "fixing", personally).
This is why we only peer with people we whitelist, and why most people
look for as much demonstration of Clue as they can get before peering,
and it's a large part of why we do see de-peering when actions
demonstrate a lack of trustworthiness.
relevant to this point, i'm still relatively new to keyserver
administration and this list - is there a sort of established procedure
or policy for "announcing" a peer that individuals should de-peer with
(should they be peering with said peer)? what incident response policy
should one follow? what criteria/actions would lead to suggested de-peering?


i diverted the thread because i feel we're crossing into off-topic with
those questions i had and i don't want to hijack the original topic,
since it seems to still be under consideration.



[0] http://mirror.square-r00t.net/#dumps
[1] https://git.square-r00t.net/OpTools/tree/gpg/keystats.py
[2] https://git.square-r00t.net/OpTools/tree/gpg/sksdump.py
[3] https://aur.archlinux.org/packages/sks-local/
--
brent "i said 'peer(ing|ed|)' too many times in this email" saner
https://square-r00t.net/
GPG info: https://square-r00t.net/gpg-info
Andrew Gallagher
2018-05-05 12:30:03 UTC
Post by brent s.
it is SO IMPORTANT for both ends of the peering to have a relatively
recent keyset. i don't see how we can "fix" this without entirely
restructuring how HKP recon behaves,
Yes. Perhaps it would be a good idea to systematise the dump/restore process so that instead of a human being following written instructions, a new peer of server A will attempt to a) probe server A to find the key difference, b) if the difference is large, download a dump from some standard place, and c) reinitialise itself before trying again.
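
In sketch form (the status endpoint, dump URL and threshold are all
assumptions):

```python
# Automated turn-up: probe the peer's key count, reload from its dump
# if the delta is too large, then retry recon.
import json
import urllib.request

def bootstrap(peer, local_numkeys, load_dump, max_delta=100_000):
    with urllib.request.urlopen(f"https://{peer}/status.json") as r:
        peer_numkeys = json.load(r)["numkeys"]
    if abs(peer_numkeys - local_numkeys) > max_delta:
        load_dump(f"https://{peer}/dump/")  # reinitialise, then retry
        return "reloaded"
    return "delta acceptable; recon may proceed"
```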

Removing human error from such processes is A Good Thing in any case...

A
brent s.
2018-05-05 14:00:30 UTC
Post by Andrew Gallagher
Post by brent s.
it is SO IMPORTANT for both ends of the peering to have a relatively
recent keyset. i don't see how we can "fix" this without entirely
restructuring how HKP recon behaves,
Yes. Perhaps it would be a good idea to systematise the dump/restore process so that instead of a human being following written instructions, a new peer of server A will attempt to a) probe server A to find the key difference b) if the difference is large, download a dump from some standard place c) reinitialise itself before trying again.
Removing human error from such processes is A Good Thing in any case...
A
(a) is taken care of by recon already (in a way), but the problem for
(b) is the "standard place" - SKS/recon/HKP/peering is, by nature,
unfederated/decentralized. sure, there's the SKS pool, but that
certainly isn't required for peering (even with keyservers that ARE in
the pool) nor running sks. how does one decide the "canonical" dump to
be downloaded in (b)?

i WOULD say that removing human error is good, and normally i'd totally
agree - but i think this should instead be solved in documentation, as
implementing it in the software itself seems like a lot of work that
even breaks part of SKS/peering philosophy (to me, at least) with low
payoff. i can't speak to it, but i'd be curious if anyone could
anecdotally recall how often peering requests are made to this list
without them first importing a dump.


i instead propose that:

- in the default membership file, a note should be added to the comments
at the beginning saying that a dump should be imported before peering
with "public(?) peers" (and link to one or both of [0])

- in the man page for sks, under "FILES..membership", a note be added
saying the same/similar

- in <src>/README.md, under "Setup and Configuration..### Membership
file", the same note be added


This way, there is *no possible way* a new keyserver administrator will
even know HOW to peer WITHOUT first knowing that they should use a
keydump import beforehand.


Adding in an optional refusal threshold directive (max_key_delta or
something?) for a keycount delta of more than /n/ to sks.conf
(optionally perhaps with the ability to override that value per-peer in
membership?), however, would absolutely hold value, I think.



[0] https://bitbucket.org/skskeyserver/sks-keyserver/wiki/Peering
https://bitbucket.org/skskeyserver/sks-keyserver/wiki/KeydumpSources
--
brent saner
https://square-r00t.net/
GPG info: https://square-r00t.net/gpg-info
Andrew Gallagher
2018-05-05 14:22:53 UTC
Post by brent s.
(a) is taken care of by recon already (in a way),
According to a list message from earlier today it is not. If the delta is small, recon proceeds. If it is large, it breaks catastrophically. There is no (current) way to test nicely.
Post by brent s.
but the problem for
(b) is the "standard place" - SKS/recon/HKP/peering is, by nature,
unfederated/decentralized. sure, there's the SKS pool, but that
certainly isn't required for peering (even with keyservers that ARE in
the pool) nor running sks. how does one decide the "canonical" dump to
be downloaded in (b)?
There can be no canonical dump of course. Each peer can provide its own dump at a well known local URL. This is even more important if and when we allow divergent policy.

A
brent s.
2018-05-05 16:28:10 UTC
Post by Andrew Gallagher
Post by brent s.
(a) is taken care of by recon already (in a way),
According to a list message from earlier today it is not. If the delta is small, recon proceeds. If it is large, it breaks catastrophically. There is no (current) way to test nicely.
sorry, should have clarified- i mean the "generating deltas" part of (a).
Post by Andrew Gallagher
Post by brent s.
but the problem for
(b) is the "standard place" - SKS/recon/HKP/peering is, by nature,
unfederated/decentralized. sure, there's the SKS pool, but that
certainly isn't required for peering (even with keyservers that ARE in
the pool) nor running sks. how does one decide the "canonical" dump to
be downloaded in (b)?
There can be no canonical dump of course. Each peer can provide its own dump at a well known local URL. This is even more important if and when we allow divergent policy.
Hrm. I suppose, but I'm under the impression that not many keyserver
admins run their own dumps? (Which I don't fault them for; the current
dump I have is 11 GB uncompressed (5054255 keys). Granted, you don't
see new keyserver turnups often, but still -- that can be a lengthy
download, plus the fairly sizeable chunk of time it takes for the
initial import.)
--
brent saner
https://square-r00t.net/
GPG info: https://square-r00t.net/gpg-info
Andrew Gallagher
2018-05-21 18:27:05 UTC
Permalink
Post by brent s.
Post by Andrew Gallagher
Post by brent s.
but the problem for
(b) is the "standard place" - SKS/recon/HKP/peering is, by nature,
unfederated/decentralized. sure, there's the SKS pool, but that
certainly isn't required for peering (even with keyservers that ARE in
the pool) nor running sks. how does one decide the "canonical" dump to
be downloaded in (b)?
There can be no canonical dump of course. Each peer can provide its own dump at a well known local URL. This is even more important if and when we allow divergent policy.
Hrm. I suppose, but I'm under the impression that not many keyserver
admins run their own dumps? (Which I don't fault them for; the current
dump I have is 11 GB uncompressed (5054255 keys). Granted, you don't
see new keyserver turnups often, but still -- that can be a lengthy
download, plus the fairly sizeable chunk of time it takes for the
initial import.)
Right.

I've thought about this a bit more, and the bootstrapping issue can be
solved without requiring every keyserver to produce a unique dump. We
just need one more database [table]...!

Let us call it Limbo. It contains the hashes of objects that the local
server does not have and has never seen (so has never had the chance to
test against policy), but knows must exist because they were in another
server's blacklist.

When bootstrapping, all that the new server needs to know is a
reasonably complete list of hashes. If it knows the real data as well,
all the better. But for recon to get started, given that we can perform
fake recon, the hashes are sufficient.

When performing a dump, a reference server also dumps its local
blacklist. When loading that dump, the blacklist of the reference is
used to populate the fresh server's Limbo. Now, the fresh server can
generate a low-delta fake recon immediately, by merging the DB, Local-BL
(initially empty) and Limbo hash lists. Recon then proceeds as discussed
before, and so long as the peer graph is well-connected, new peers can
be added without having to reference their dumps.
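
To make the merge concrete, here is a minimal Python sketch of the idea
(SKS itself is written in OCaml; every name below is illustrative, not
real SKS code):

    def seed_limbo(reference_bl, db, local_bl):
        """Populate Limbo from a reference server's dumped blacklist:
        hashes this server has never held, but now knows exist."""
        return set(reference_bl) - set(db) - set(local_bl)

    def recon_set(db, local_bl, limbo):
        """Hash set advertised during fake recon: everything we hold
        (DB), plus everything deliberately dropped (Local-BL) or not
        yet fetched (Limbo), keeping the apparent delta small."""
        return set(db) | set(local_bl) | set(limbo)

    # A fresh server loads a dump plus the reference's blacklist:
    limbo = seed_limbo({"h3", "h4"}, db={"h1", "h2"}, local_bl=set())
    assert recon_set({"h1", "h2"}, set(), limbo) == {"h1", "h2", "h3", "h4"}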

Limbo entries will return 404, just like missing entries (and unlike
blacklist entries). But the server will request a proportion of the
Limbo entries from its peers during each catchup. This would happen at a
much higher rate than the blacklist cache refresh, but still low enough
that its peers shouldn't suffer from the extra load.

Let's say that at each recon, the number of missing keys is found to be
N. The local server will then request these N keys from its peer. If at
the same time it were to also request M = a*N limbo entries, as follows:

    SELECT hash FROM limbo
     WHERE hash NOT IN (SELECT hash FROM peer_bl_cache WHERE peer = $PEER)
     LIMIT $M;

then the extra load on the peer should not be excessive, and Limbo
should be drained at a rate roughly proportional to the parameter `a`
and to the rate of new keys.

(This would also be a good place to perform the peer_bl_cache refresh).
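
In sketch form, one catchup round might then look like this (Python
again, invented names; a = 5 is an arbitrary choice of the drain
parameter):

    def limbo_batch(limbo, peer_bl, n, a=5):
        """Choose up to M = a*N limbo hashes to request alongside the
        N missing keys found by recon, skipping hashes this peer has
        blacklisted (the analogue of the SELECT above)."""
        return [h for h in limbo if h not in peer_bl][:a * n]

    # Recon found n = 2 missing keys; with a = 5 we may request up to
    # 10 limbo hashes that this peer has not blacklisted.
    batch = limbo_batch({"h3", "h4", "h5"}, peer_bl={"h4"}, n=2)
    # -> some ordering of ["h3", "h5"]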

When calculating key deltas for pool membership purposes, the fresh
server should not include its Limbo database in the count. This will
ensure that servers do not get added to the pool until their Limbo is
well drained. Alternatively, we could make an explicitly drained Limbo a
condition for pool membership.
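
A sketch of that pool check, under the same caveats (hypothetical
names; max_delta stands in for whatever threshold the pool operator
actually uses):

    def pool_eligible(db_count, limbo_count, pool_median,
                      max_delta=5000, require_drained=False):
        """Limbo is excluded from the advertised key count, so a fresh
        server stays outside the pool until Limbo is largely drained;
        optionally, an empty Limbo is a hard requirement instead."""
        if require_drained and limbo_count > 0:
            return False
        return abs(pool_median - db_count) <= max_delta

    # 4.9M keys held, 150k hashes still in Limbo, pool median 5.05M:
    assert not pool_eligible(4_900_000, 150_000, 5_050_000)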

This still leaves the issue of eventual consistency as an open problem,
but it can be addressed manually by encouraging good graph connectivity.
--
Andrew Gallagher
Alin Anton
2018-03-24 12:04:23 UTC
Permalink
Hello,

Horrible topic, but base64-encoded images or something similar could
also be abused in regular bank transfers.

One could use the image property to store blacklist data, regular
expressions, etc. That key would be a regular one, but with a noisy
picture. Maybe a web of DIStrust is also a good idea for voting out bad
objects: that would mean using your private key to flag an object as
containing malicious or illegal content, in a protocol which allows you
to do so without redownloading the image.

Just some ideas. A very ugly topic anyway - hard to think about, but
easy to imagine.

Alin Anton
--
Sl.univ.dr.ing. Alin-Adrian Anton
Politehnica University of Timisoara
Department of Computer and Information Technology
2nd Vasile Parvan Ave., 300223 Timisoara, Timis, Romania