💾 SuperFloppies Place! 💾

On Rate Limiting and Abuse

Written: Friday, August 31, 2018 at 19:58:45 EDT

Last Update: Sunday, September 2, 2018 at 16:07:07 EDT

NOTE: This is now an issue on tootsuite’s mastodon repository at GitHub. See GitHub: Issue #8575 on tootsuite/mastodon: "Abuse-prevention: rate limits / 'storm shield'" for the issue there.

Do note that this page is still the “full” version; GitHub issues aren’t really intended for a full discussion in a single post. That said, at this point, all technical comments should go to the issue itself, I think. That way discussion moving forward is in one place.

NOTE: All updates will appear at the bottom of this document. So far, three updates have been made, the most recent one at 2018-10-03 08:23:55 -0400 EDT.

So, for the second time, I suggested using Hashcash or a similar system in order to prevent abuses on the ActivityPub fediverse. And for the second time, the idea was shot down as being “wasteful”. Since I’m done debating this in the fediverse (too much to say, not enough space), I’m going to write my full argument here.

Also: I’m not averse to hearing better ideas. But those ideas have to accomplish at least as much as this one does, and ideally in a more performant manner. I’m all about efficiency, as anyone who has ever paid me for code can attest. And I don’t consider efficiency to be a single-tiered variable: it must be a global consideration. It encompasses development time, runtime, robustness, and ease of use. A proposal which is better on some or all of these fronts, and worse on none, is what I’m hoping to hear.

This all having been said, I fully intend to implement this in an ActivityPub server that I’m creating.

Motivation/Rationale

TL;DR: @wilw@mastodon.cloud was quite literally forced out of the ActivityPub federated network by many users across several instances of ActivityPub implementations (mostly, if not entirely, Mastodon instances). The conditions of the network enabled this to happen, and no mechanisms were available to halt its progress. The root of this can be easily identified, as well: there was literally no ability for @wilw@mastodon.cloud to keep up with the assault, due to insufficient tooling and resources. All the manpower in the world would not have made a difference here, either: we need automated tools that can be put in the hands of the abused, and not torn apart by the abusers.

What is needed?

To that effect, what is needed is a system that satisfies the following requirements:

What about a simple server-side rate limit?

Since this is the first thing that is counter-suggested, I’ll address it early. Such a mechanism requires more state than the proposal that follows, and it requires more shared state between individual servers in a cluster. What does that mean? It means adding yet another component to a system which is already showing stresses, and guarding that state with distributed locks. It could be done, but it would absolutely fail to scale properly. Contention becomes an issue in a system like this when it grows beyond a few thousand concurrent actors, and there are in some cases 10,000+ actors in a single instance. So, this cannot be made to work.

Given that it would be opt-in, it is reasonable to assume that it would bring little impact. But what if an entire instance wants this turned on? What if a majority of the users of an instance want to benefit from the feature? This would result in aches and pains felt by the administrators (slow response issues everywhere without a ready explanation in RAM or processor usage), downtime, denial of service, and so forth. In fact, a denial of service attack would be wonderfully easy to carry out: create 2,000 accounts on an instance and bog it down with rate limits on all of them.

Clearly, that cannot work in a network of this size. We’re not talking about the email for a small business, which might transit 1,000 messages in a day. An ActivityPub implementation is a JSON document processor with cryptographic elements present within it. It’s a little more effort than parsing an Internet mail message (see RFC 5322 and RFC 6854 for the details of the format of an Internet mail message, if you are unfamiliar). And this is a good thing: it means that many of the primitives on which a reasonable solution can build are already present, and so the resulting addition to the code bases would be minimal.

So, then, what would work?

At the core of my proposal is to use some sort of proof of work; something like Hashcash, if not Hashcash itself. I say this because there are many types of proof-of-work, and not all of them are appropriate for use in a situation like this.

Challenge-Response

The most common type of proof of work familiar to developers is that of the challenge-response. It is commonly used in authentication systems, such as Kerberos or digest authentication. It is relatively low-overhead in isolation, but many high-traffic Web servers have already done away with digest authentication, particularly for API endpoints, in favor of authentication methods that require less state tracking and fewer network round trips. Credentials passed this way are also frequently used for authorization; the Web server knows who the user is, but now has to look up in a database of some sort whether or not the user is permitted to perform the action.

A challenge-response system could be used, but it would fail in three major ways:

Problem-Solution

The other type of proof of work is the problem-solution type. The problem-solution type works something like the following:

Unlike challenge-response, which is typically used both for authentication and authorization, problem-solution is typically used only for authorization (yes, there are ways to use it for authentication, and some of those methods are even somewhat common; but ActivityPub already has authentication of messages, and so we’re only considering authorization here). Perhaps the most well-known use of this type of authorization is in blockchains, where the “winning hash” is a bearer token to be appended to the blockchain with its associated block.

Also unlike challenge-response, problem-solution algorithms scale A LOT. And for a problem domain such as the one which provides the context for this article, it is an almost perfect solution. So close to it, in fact, that despite having spent a lot of time racking my brain and the Internet to find something better in the past week since I suggested this the first time, I am really unable to find something that provides the same sort of characteristics as this type of solution.

As you probably already guessed if you’ve read this far, Hashcash is one of the members of this family. It’s not the only member of this family, though; there are others. Many of them are extremely complicated systems which are overkill for something like this.

That is where Hashcash comes in.
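For anyone who has not seen it before, a Hashcash-style stamp can be sketched in a few lines. This is only an illustration under some loud assumptions: the names `mint`, `verify`, and `leading_zero_bits` are made up for the example, the `resource:counter` layout is not the real Hashcash stamp format, and SHA-256 stands in for the SHA-1 that classic Hashcash uses. A real stamp would also need to bind a date and the message itself so that it cannot be reused; that is left out here.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(resource: str, counter: int, bits: int) -> bool:
    """Check a stamp with a single hash; this is all the receiver pays."""
    digest = hashlib.sha256(f"{resource}:{counter}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def mint(resource: str, bits: int) -> int:
    """Search for a counter whose hash has at least `bits` leading zero bits.

    The expected cost is about 2**bits hashes, paid entirely by the sender.
    """
    for counter in count():
        if verify(resource, counter, bits):
            return counter


# Hypothetical recipient address, 12 zero bits (about 4,096 tries on average).
stamp = mint("alice@example.social", 12)
print(stamp, verify("alice@example.social", stamp, 12))
```

The point to take away is the shape of the thing: the sender loops, the receiver hashes once.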

But isn’t Hashcash Bad?

For blockchain applications where the blockchain is distributed globally and everyone wants to find the next block… it’s awful. Atrocious. Wasteful. But it was the first method used on a blockchain, and we’ve found better ways to handle that level of scale.

But let’s consider this: the ActivityPub federated universe will never scale to that type of size, for starters, and no entity within the federated universe will become so popular that they’d require exahash, petahash, or even terahash-level power. And that’s where the waste typically and wrongfully associated with Hashcash is found: in the fact that as of the time I’m writing this, the Bitcoin network is at 46.4 exahashes per second.

That’s insane. That’s an incomprehensibly large number of operations; to most people, it is nearly unfathomable. That number looks like this (to three significant digits): 46,400,000,000,000,000,000 hashes per second. That many hashes are being computed by the Bitcoin network miners on average per second in order to try to find one single block every ten minutes.

So it’s not hard to understand why someone would see Hashcash and knee-jerk about it if they do not have a complete understanding of what Hashcash is and how it is (and is not) related to Bitcoin.

It is also proof of just how well it works, despite its massive power usage in an application such as a cryptocurrency. It enforces a rate limit of one block per roughly ten-minute interval, globally.

Think about that for just a minute, and let it sink in.

Hashcash is used to limit the blockchain’s growth to one block per ten minutes, on average, world-freakin-wide. And the hash value required is adjusted once every 2,016 blocks (roughly every two weeks) in an effort to maintain that fixed rate of growth. And it works.

But it wastes power, doesn’t it?

As with literally anything: how it is used determines what it does, how it behaves. Let’s start with what we know already:

And what does that electricity usage generate?

So, all that power isn’t being used because of Hashcash. It is actually being used because the Bitcoin blockchain does not want to have a new block every second or ten seconds; it wants to have a new block once every ten minutes, which means that the Bitcoin blockchain will only grow by about 144 blocks every 24 hours, on average, literally everywhere on planet Earth.

That necessarily takes a lot of something to provide for its security; in this case that something is electricity.
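To put the numbers from this section side by side, here is a rough back-of-the-envelope calculation. It assumes the 46.4 exahash-per-second figure quoted above and the ten-minute block target; the variable names are only for illustration, and the results are averages, not measurements.

```python
HASH_RATE = 46.4e18          # hashes per second, the figure quoted in this article
BLOCK_INTERVAL = 10 * 60     # target seconds per block
RETARGET_BLOCKS = 2016       # Bitcoin's retarget interval in blocks (about 2,000)

# Average work spent worldwide to find one block.
hashes_per_block = HASH_RATE * BLOCK_INTERVAL

# Growth of the chain in one day at the target rate.
blocks_per_day = 24 * 60 * 60 // BLOCK_INTERVAL

# How often the required hash value is re-tuned, in days.
retarget_days = RETARGET_BLOCKS * BLOCK_INTERVAL / 86_400

print(f"{hashes_per_block:.3e} hashes per block on average")
print(blocks_per_day, "blocks per day")
print(retarget_days, "days between difficulty adjustments")
```

Compare that with a single inbox asking for a few seconds of work per message and the difference in scale is about twenty orders of magnitude.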

The Proposal, Formally

NOTE: Please see the updates section below after reading this section.

So, then, here is the proposal:

Since the default would be “off” there would be little to no impact at rollout, except for the new feature’s appearance post-upgrade.

If an account has enabled the boolean preference described above:

But this only limits the client-side posting rate, doesn’t it?

Yes, it does. But more importantly, it has an effect on the posting user. It is a well-known fact that users who think that something is “being slow” are going to give up out of frustration and move on to something else.

There are a few reasons why this makes Hashcash appealing:

  1. This gives the owner of an inbox the ability to control the receipt of messages. Currently, pretty much the entire ActivityPub federated universe has control of how flooded or not an individual user’s inbox becomes, and that is clearly not acceptable.
  2. It does not increase the server’s load, on the average, because:
    • Clients which behave (honor the postage) will have to generate the hash before attempting resubmission of a message rejected for insufficient postage. A client which behaves and submits a proper postage with the message has done nearly all the work; the server can verify without actually redoing the work, in an expedient and efficient manner.
    • Clients which do not behave are easily detected and can be automatically blocked, at the IP layer, because their pattern “sticks out” and can be considered unusual and indicative of potential abuse.
  3. The “scale” at which this operates is tiny: an individual must opt-in (manually toggle the preference on) before anything changes for that account. An individual must also change the slider from zero to a non-zero value for it to become effective. The impact is strictly limited to users who mention or direct message the user, and nobody else.
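The claim in point 2, that the sender does nearly all of the work, can be checked with a small experiment that counts hash evaluations on each side. This is a sketch under assumptions: `stamp_ok` and `mint_counting` are hypothetical names, SHA-256 is an arbitrary choice of hash, and the `message:nonce` layout is invented for the example.

```python
import hashlib


def stamp_ok(payload: bytes, bits: int) -> bool:
    """One SHA-256 evaluation: accept if the top `bits` bits are all zero."""
    digest = int.from_bytes(hashlib.sha256(payload).digest(), "big")
    return digest >> (256 - bits) == 0


def mint_counting(message: str, bits: int) -> tuple[int, int]:
    """Return (nonce, hashes_tried) for the first nonce that passes."""
    nonce = 0
    while not stamp_ok(f"{message}:{nonce}".encode(), bits):
        nonce += 1
    return nonce, nonce + 1


# Sender side: loops until a passing nonce turns up (about 2**12 tries on average).
nonce, sender_hashes = mint_counting("hello @alice", bits=12)

# Receiver side: exactly one hash, whatever the sender had to do.
receiver_ok = stamp_ok(f"hello @alice:{nonce}".encode(), 12)

print(f"sender tried {sender_hashes} hashes; receiver verified with 1: {receiver_ok}")
```

Each extra required bit doubles the sender's average work while the receiver's cost stays at one hash.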

Perhaps the only unappealing thing is that there is no way to know what the slider’s setting should be for any given scenario: every one would be different. Would a 10 second cooling period have dissipated the mob in the case of @wilw@mastodon.cloud? Maybe. What is for sure is that this feature, or something like it, does not exist now. If it did exist, it would grant users additional abilities in controlling their own inbox, at (nearly) no cost to the instance itself: the only additional cost to the server is to reject a message when it has insufficient postage.
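One way to picture the slider is as a target delay that gets translated into a number of required zero bits. The sketch below assumes a client hash rate of one million hashes per second, which is purely a guess for illustration and varies enormously between devices; `bits_for_delay` and `expected_delay` are hypothetical helper names.

```python
import math


def bits_for_delay(delay_seconds: float, hashes_per_second: float) -> int:
    """Zero bits whose expected minting time is closest to the target delay.

    Minting a stamp with b zero bits takes about 2**b hashes on average.
    """
    if delay_seconds <= 0:
        return 0  # slider at zero: the feature is effectively off
    expected_hashes = delay_seconds * hashes_per_second
    return max(0, round(math.log2(expected_hashes)))


def expected_delay(bits: int, hashes_per_second: float) -> float:
    """Average seconds a client at the given rate needs for `bits` zero bits."""
    return 2 ** bits / hashes_per_second


ASSUMED_RATE = 1_000_000  # hashes per second; hypothetical client speed

bits = bits_for_delay(10, ASSUMED_RATE)  # the 10 second cooling period
print(bits, "bits, about", expected_delay(bits, ASSUMED_RATE), "seconds on average")
```

Because the search is random, the delay for any single message will scatter around that average, and faster hardware shortens it. That is part of why the right setting is hard to know in advance.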

OK, but how is this better than server-enforced rate limits?

This is important to understand: this feature is being suggested in order to allow an individual to protect its inbox against assault by other entities within the entire federated universe.

It is not intended to be always-on.

It is intended to be used strictly as a response to a “storm” directed at a single user, as was the case for @wilw@mastodon.cloud.

So this means that an instance should be able to keep the (very few, possibly zero) members who have turned this feature on in a small, in-memory table which contains the user’s local name and the number of zero bits required.
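That table could be as small as a dictionary. The following is a sketch, not Mastodon code: `postage_required` and `has_postage` are hypothetical names, the stamp layout is invented, and SHA-256 is an assumed hash. It shows that an account without the feature costs one lookup and no hashing.

```python
import hashlib

# Hypothetical in-memory table: local user name -> zero bits required.
# Only accounts that have switched the feature on appear here.
postage_required: dict[str, int] = {"wilw": 20}


def has_postage(recipient: str, message: str, nonce: int) -> bool:
    """Accept unless the recipient opted in and the stamp is too weak."""
    bits = postage_required.get(recipient, 0)
    if bits == 0:
        return True  # common case: one dict lookup, no hashing at all
    payload = f"{recipient}:{message}:{nonce}".encode()
    digest = int.from_bytes(hashlib.sha256(payload).digest(), "big")
    return digest >> (256 - bits) == 0


# A recipient who never enabled the feature accepts everything.
print(has_postage("someone_else", "hi there", nonce=0))
```

Nothing in the table changes when a message arrives, so there is nothing to lock and nothing to synchronize between servers beyond the preference itself.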

This also means that if an instance has a high percentage (like, more than 1 or 2% out of a population greater than 100) of people using this feature, something is wrong and this becomes a useful flag to the administrator that this is the case.

Essentially, if the feature incurs any noticeable burden on the server at all, it is because it is host to a large number of people who are either paranoid, or on an instance which is hostile, uncontrolled, or as the Mastodon blocklist says, is a “free-speech zone”. In that case, the instance administrator knows about the costs that it is incurring and likely has to do a lot to keep its personal entertainment running in the first place.

So how is it better than using the database as a lookup source? Because:

Any other solution, to scale, would require additional middleware to offload it from the Mastodon application, increasing the management burden to maintain any size instance that federates.

Simply put: it puts additional control in the hands of the receiver of messages, while at the same time only incurring any cost whatsoever if/when a user enables it.

The feature should be big and scary looking, like a big red button that shuts down a data center. It should be stupidly clear that it is enabled, and even if all that is ignored, the impact is limited to the user who doesn’t want anyone talking to it in the first place. Clients give up quickly; far more quickly than the maximum delay that could be set.

Feedback?

I encourage feedback and discussion of this. I’d like to see this feature, or something like it that empowers the user to control its own inbox. If not this idea, then something else which scales as well as it or better (I don’t think that the impact on the server can be made any more minimal) and gives as much or more control to the user over the user’s own inbox.

Updates

  1. 2018-09-01T13:59:14-04:00: I think that the word “protection” carries with it the implication that the feature should be always on. Another name would be better suited; perhaps “Storm Shield” or something. I don’t know. That sounds cheesy.
  2. 2018-09-01T22:57:34-04:00: A side effect of this is that work is reduced on the administrators/moderators of an instance. I’m not entirely sure if this is a net positive or net negative in the even larger picture. But I remain convinced that the targeted user is the most important thing. I see this as a potentially useful side effect: it gives the people who are performing the harassment something of a chance to consider their behavior and maybe improve themselves before they themselves become reported. Thanks to @codesections@fosstodon.org for pointing this out!
  3. 2018-09-02T01:05:20-04:00: Additionally, a self-cancellation timeout could be implemented. This should, to reduce burden on the server, be implemented as a simple integer count of seconds which is stored alongside the target value, together with a timestamp indicating when the function was enabled so that efficient checks for expiration are possible. A period of 72–144 hours would seem to be reasonable. Thanks to @adz@mastodon.technology for the inspiration here. This has been integrated into the document above.
  4. 2018-09-02T01:05:20-04:00: An alternative idea for the UI: instead of two widgets (on/off switch, slider) as proposed above, the implementation could have a “panic button”. When this panic button is hit, it would introduce a small delay (say, 3 seconds). If the effects of the storm subside, this is all that would then be needed. Each time the button is hit again, the target delay increases by three seconds, perhaps with an upper limit (at which point the button becomes disabled/insensitive/inactive). Do note, however, that this depends on the self-cancellation timeout described above. Again, thanks to @adz@mastodon.technology for the inspiration here. This has been integrated into the document above.
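Updates 3 and 4 fit together into very little state per account. The sketch below is one possible reading, with several assumptions: the class name `StormShield`, the 30 second cap, the choice of 72 hours from the suggested range, and the decision to restart the timeout on every press are all mine for the sake of the example.

```python
import time
from dataclasses import dataclass
from typing import Optional

STEP_SECONDS = 3               # delay added per button press (from the text)
MAX_DELAY_SECONDS = 30         # hypothetical upper limit
TIMEOUT_SECONDS = 72 * 3600    # low end of the suggested 72 to 144 hour range


@dataclass
class StormShield:
    delay_seconds: int = 0                  # current target delay
    enabled_at: float = 0.0                 # timestamp of the last press
    timeout_seconds: int = TIMEOUT_SECONDS  # self-cancellation period

    def press(self, now: Optional[float] = None) -> None:
        """Panic button: raise the delay by one step, up to the cap."""
        now = time.time() if now is None else now
        if self.current_delay(now) == 0:
            self.delay_seconds = 0  # previous activation expired; start over
        self.delay_seconds = min(self.delay_seconds + STEP_SECONDS, MAX_DELAY_SECONDS)
        self.enabled_at = now       # assumption: each press restarts the timeout

    def current_delay(self, now: Optional[float] = None) -> int:
        """Delay in effect right now; zero once the timeout has passed."""
        now = time.time() if now is None else now
        if now - self.enabled_at >= self.timeout_seconds:
            return 0
        return self.delay_seconds


shield = StormShield()
shield.press()
shield.press()
print(shield.current_delay())  # two presses in a row
```

The expiration check is a subtraction and a comparison, so it can run on every incoming message without a background job to switch the feature off.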

Comments?

You can comment on my site by leaving me a comment at @SuperFloppies@mastodon.technology, where I do my communicating.
