Fixing The Reply Count

There are a number of issues with the decentralized nature of the Fediverse. One of them is that we are not seeing replies from people we do not follow.

Firstly, this work is based on the fantastic baseline by Abhinav, who wrote a Python script to fetch context toots for replies.

Byron Miller did a great write-up of why we aren't seeing the like count we are expecting on Fediverse notes (toots).

Update: The service is up and running. You can get a fix for the reply count issue today by authorising combine.social.

The Root Cause

In a sentence: The root cause is that servers only send messages to followers. One side-effect of that is that if a person who you do not follow replies to a (non-local) message, then you do not see the reply.

Image: Wrong favorite count. Image credit: Lu Wilson.

What this leads to is that we have to go to the original post 99% of the time to get the correct counts.

As a self-hosting Mastodon user, this becomes even more of an issue: my single-user instance does not have a local follower for most of the users in the Fediverse, and therefore will not get most of the posts federated to it.

The Usual Answer

Usually, the answer is that I should just add enough relays to get all the updates.

This comes with a few downsides.

One is that not everything is federated. Like counts, for instance, do not get federated. This leads to the issues mentioned above where we don't see the correct like counts.

Another is that simply relaying everything to everyone feels like a very blockchain-y solution, where we totally ignore the global cost overhead of relaying all the posts to all the servers.

A "good" solution would be to update the federation mechanism so that all replies and likes are actually relayed to all followers of the original post.

As a third-party contributor, I have no say in which features Mastodon developers choose to implement. I can of course just add a PR myself, but there is no guarantee that it will ever be included in the product.

A Workaround

Luckily, as a self-hoster, I get to add whatever I want to my own instance.

So I did just that.

I created a service that simply polls my home timeline and fetches replies to all posts.

The workaround flow goes something like this:

  1. Fetch all posts from the home timeline for the last n hours.
  2. Resolve the post (http://local/api/v2/search?q=url&resolve=true).
  3. Fetch context from the remote instance (https://remote/api/v1/statuses/id/context).
  4. For each ancestor and descendant, go to step 2.
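
As a rough illustration of steps 2 and 3, here is a minimal Python sketch that only builds the two request URLs. The function names and the example hosts are hypothetical, and it assumes the remote post URL ends in the remote status id (as in https://remote/@user/12345), which may not hold for every server in the Fediverse.

```python
from urllib.parse import urlencode, urlsplit


def search_url(local_base: str, post_url: str) -> str:
    """Step 2: URL that asks the local instance to resolve a remote post."""
    query = urlencode({"q": post_url, "resolve": "true"})
    return f"{local_base}/api/v2/search?{query}"


def context_url(post_url: str) -> str:
    """Step 3: URL of the context endpoint on the post's home instance.

    Assumption: the last path segment of the post URL is the status id.
    """
    parts = urlsplit(post_url)
    status_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    return f"{parts.scheme}://{parts.netloc}/api/v1/statuses/{status_id}/context"
```

The search call needs the user's access token, while the context call goes to the remote instance; the sketch leaves the actual HTTP requests out.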

The Devil In The Detail

There is of course a devil in the detail, which is that this will loop infinitely unless there is a break condition that detects that a post has already been processed.
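
One simple way to get such a break condition is to remember every post URL that has been handled. The sketch below is an assumption about how that could look, not the service's actual code: `crawl` and `fetch_context` are hypothetical names, and `fetch_context` stands in for "resolve the post and return the URLs of its ancestors and descendants".

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str], fetch_context: Callable[[str], list[str]]) -> list[str]:
    """Visit every reachable post exactly once, even if threads reference each other."""
    seen: set[str] = set()   # the break condition: posts already processed
    queue = deque(seeds)
    order: list[str] = []
    while queue:
        url = queue.popleft()
        if url in seen:
            continue         # already processed, do not fetch it again
        seen.add(url)
        order.append(url)
        queue.extend(fetch_context(url))
    return order
```

Because a post's ancestors and descendants list each other, every thread forms a cycle; the `seen` set is what makes the loop terminate.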

Additionally, Mastodon has a rate limit on most API calls set to 300 requests every 5 minutes per user, per client, and per IP, which works out to one request per second on average. I stay within the limit by throttling the fetch function to one request every two seconds. That uses about half of the quota, so we don't exclude the human user from actually accessing those remote hosts by taking up the entire rate quota.
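
A throttle like this can be sketched in a few lines of Python. This is an illustration under assumptions, not the service's implementation: the `Throttle` class is hypothetical, and the clock and sleep functions are injectable only so the behaviour can be checked without actually waiting.

```python
import time


class Throttle:
    """Allow at most one call per `min_interval` seconds."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no request has been made yet

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Calling `throttle.wait()` before every request keeps the crawler at 150 requests per 5 minutes at most, half of the 300 allowed.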

This also means that it will probably never work for people who follow many users. In my local testing (following ~50 users) it took only ~2 minutes, but once I added @Gargron to the list of people I follow, that time exploded to ~15 minutes. This indicates that the solution does not scale very well. If you are following 500 people, with a handful of those being as popular as @Gargron, then the server will simply not be able to resolve posts as fast as they are written.
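
To put rough numbers on this, here is a back-of-the-envelope sketch. It assumes the run time is dominated by the two-second throttle and that each post costs two requests (one resolve, one context fetch); both are simplifying assumptions, and the function names are hypothetical.

```python
def requests_in(minutes: float, seconds_per_request: float = 2.0) -> float:
    """Number of throttled requests that fit into a run of the given length."""
    return minutes * 60 / seconds_per_request


def keeps_up(new_posts_per_hour: float,
             requests_per_post: int = 2,
             seconds_per_request: float = 2.0) -> bool:
    """True if the throttled crawler can process posts as fast as they arrive."""
    capacity_per_hour = 3600 / seconds_per_request
    return new_posts_per_hour * requests_per_post <= capacity_per_hour
```

Under these assumptions, the ~2 minute run corresponds to about 60 requests and the ~15 minute run to about 450, and the queue starts to grow once the followed threads produce more than roughly 900 new posts per hour.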

Roadmap

I have a clear intention of releasing this into the wild. I'll start by running it locally for a while and monitoring my queue length to see if it blows up. Once I am relatively comfortable that it is sustainable, I will slowly open it up to people who are not following too many people. If it is still running fairly stably, I will start opening it up to the general public. I'll obviously also make it open source, so you can verify that I am not doing anything untoward with your access tokens or the posts that are readable with those tokens.

Steps

  1. Single user testing
  2. Release the source
  3. Friendly user testing with a few low-volume users
  4. Beta testing
  5. ???

I don't know if I could make it scale horizontally to work for many users and users with a large following.

Stay tuned for more updates.


There is a thread about this post on Mastodon.