[feat] Filter fake Create activities out of timelines #1053

Open
opened 2026-01-21 06:45:04 +00:00 by a · 4 comments
Contributor

The idea

When an object is discovered, Akkoma will normalize it to an activity by wrapping it in a fake Create activity.

I would like a way to not have these "fake Creates" show up in timelines. I am not aware of a way to do this with MRF without dropping the post entirely.

In practice, this would probably require two things:

  1. Internally track fake Create vs real Create? This might be detectable by whether the Create has a non-null id or not, but that could fail if receiving Create activities without an id.
  2. Filter on timelines to exclude based on that property.
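
A minimal sketch of the two steps above, not Akkoma code. The `Activity` shape, the `ap_id` field and the function names are all hypothetical, and it assumes the "missing id" heuristic from step 1, which has the caveat already mentioned there:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Activity:
    db_id: int             # local database ID; timelines sort on this
    actor: str
    published: str         # ISO 8601 publication date of the wrapped object
    ap_id: Optional[str]   # id of the Create; None if synthesized locally


def is_fake_create(activity: Activity) -> bool:
    # Heuristic from step 1: a synthesized Create has no id.
    # A real Create delivered without an id would be misclassified.
    return activity.ap_id is None


def home_timeline(activities: List[Activity]) -> List[Activity]:
    # Step 2: newest first by database ID, with fake Creates excluded.
    newest_first = sorted(activities, key=lambda a: a.db_id, reverse=True)
    return [a for a in newest_first if not is_fake_create(a)]


def profile_timeline(activities: List[Activity], actor: str) -> List[Activity]:
    # Profiles keep discovered objects, ordered by publication date.
    own = [a for a in activities if a.actor == actor]
    return sorted(own, key=lambda a: a.published, reverse=True)


activities = [
    Activity(1, "alice", "2026-01-20T10:00:00Z", "https://example.test/creates/1"),
    Activity(2, "alice", "2025-01-01T10:00:00Z", None),  # discovered later
]
```

Here the discovered post (database ID 2) stays out of the home timeline but still appears in alice's profile, below the newer post.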

The reasoning

Objects being discovered are typically less likely to be socially relevant, which means that they should not be shown at the top of timelines as if they were received just now. However, they should still show up in profiles at the appropriate datetime.

Have you searched for this feature request?

  • I have double-checked and have not found this feature request mentioned anywhere.
  • This feature is related to the Akkoma backend specifically, and not pleroma-fe.
Owner

There’s one big issue with this: it’s possible for an activity to get fetched before it gets actively delivered. Not too common, but I’ve seen it happening before. At this point a "fake" activity already exists and you don’t want to change the database ID anymore (as this would mess up things). Even if we were to then just rewrite the ActivityPub ids (everywhere they show up) and/or toggle some "fake" indicator, now a bunch of people are going to miss the post because it predates the min_id/since_id they are using to fetch updates.
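
The missed-post scenario can be simulated in a few lines. This is a simplified model with integer IDs and a hypothetical `fake` flag, not how Akkoma stores activities:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Status:
    id: int             # database ID, fixed once assigned
    fake: bool = False  # hypothetical "synthesized Create" indicator


def poll(statuses: List[Status], since_id: int) -> List[int]:
    # What a client polling with since_id would get: visible IDs above the cursor.
    visible = [s.id for s in statuses if not s.fake and s.id > since_id]
    return sorted(visible, reverse=True)


# Status 107 was fetched (not delivered) and is hidden from the timeline.
db = [Status(100), Status(105), Status(107, fake=True), Status(110)]

first = poll(db, since_id=0)
cursor = first[0]            # client remembers the newest ID it has seen

# The real delivery arrives later; only the flag changes, the ID stays 107.
db[2].fake = False

second = poll(db, since_id=cursor)
seen = set(first + second)
missed = [s.id for s in db if not s.fake and s.id not in seen]
```

After the flag flips, status 107 is visible but sits below the client's cursor, so no later `since_id` poll returns it.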

I’m sorry to say, but I don’t think this can be realised with reasonable performance and effort.

Using the offset from the publication timestamp like ObjectAge does is probably the saner and more stable heuristic for social relevancy in practice.

However... thinking about it now, even ObjectAge’s :strip_follow has an issue: since it strips the follower address (and as:Public) to keep posts from showing up in timelines, belatedly delivered follower-only posts can be stripped of all addressing and become fully invisible to everyone and everywhere.
In practice this is likely not too relevant at the moment since we can't fetch restricted posts anyway (due to unconditionally using the instance actor for signing rather than a follower) and ObjectAge thresholds are usually generous enough for typical federation delays. But allowing fetches of restricted posts is something I wanted to implement at some point™ and then this will cause significant issues :\
Before restricted content becomes fetchable, this should probably be changed to not strip follower addresses by default if the addressing didn’t contain as:Public? This will mean old follower-only posts can again slip into the home timeline as "new" content, but this seems preferable to effectively dropping them completely (and those preferring the latter can change the default).
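
A rough illustration of the problem and the proposed default, modelled on the description above rather than on the actual ObjectAge code. The function names and the follower collection URL are made up; `to` and `cc` are the standard ActivityPub addressing fields:

```python
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def strip_followers(activity: dict, followers: str) -> dict:
    # Behaviour as described above: drop the follower collection
    # and as:Public from both addressing fields.
    drop = {followers, AS_PUBLIC}
    out = dict(activity)
    out["to"] = [a for a in activity.get("to", []) if a not in drop]
    out["cc"] = [a for a in activity.get("cc", []) if a not in drop]
    return out


def strip_followers_guarded(activity: dict, followers: str) -> dict:
    # Proposed default: leave posts alone unless they were
    # addressed to as:Public in the first place.
    recipients = activity.get("to", []) + activity.get("cc", [])
    if AS_PUBLIC not in recipients:
        return dict(activity)
    return strip_followers(activity, followers)


followers = "https://example.test/users/alice/followers"
public_post = {"to": [AS_PUBLIC], "cc": [followers]}
followers_only = {"to": [followers], "cc": []}

unguarded = strip_followers(followers_only, followers)
guarded = strip_followers_guarded(followers_only, followers)
```

With the unguarded variant the follower-only post ends up with empty `to` and `cc`; the guarded variant keeps its addressing, at the cost of letting it reach home timelines again.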

Perhaps we actually need to start back-dating db IDs (after the delay exceeds a generous, configurable threshold) to reliably avoid it showing up as "new" in timelines without stripping it from user profiles. This still has the drawback though that it will then be possible to permanently "miss" posts while traversing a profile or timeline if sufficiently old posts are newly discovered while the traversal happens, as no future fetch will ever include this newly discovered content.
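
To make the back-dating idea and its drawback concrete, here is a sketch with a simplified time-sortable ID. `make_id` is a stand-in and not the real flake format, and the 7-day threshold is an arbitrary example value:

```python
from datetime import datetime, timedelta, timezone


def id_timestamp(published: datetime, received: datetime,
                 threshold: timedelta) -> datetime:
    # Back-date only when the delay exceeds the configured threshold.
    if received - published > threshold:
        return published
    return received


def make_id(ts: datetime, seq: int = 0) -> int:
    # Simplified time-sortable ID: milliseconds since epoch plus a sequence.
    return (int(ts.timestamp() * 1000) << 16) | seq


now = datetime(2026, 1, 21, 12, 0, tzinfo=timezone.utc)
threshold = timedelta(days=7)

# A 30-day-old post discovered now gets an ID from its publication time.
old_id = make_id(id_timestamp(now - timedelta(days=30), now, threshold))
# A post delayed by five minutes keeps a "received now" ID.
recent_id = make_id(id_timestamp(now - timedelta(minutes=5), now, threshold))

# A client paging backwards has already passed the 40-day mark (max_id cursor)
# and has seen everything up to one minute ago (since_id cursor).
max_id_cursor = make_id(now - timedelta(days=40))
since_id_cursor = make_id(now - timedelta(minutes=1))

missed = max_id_cursor < old_id < since_id_cursor
```

The back-dated ID falls between the two cursors, so neither further backwards paging nor forward polling will return the post.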

> This might be detectable by whether the Create has a non-null id or not, but that could fail if receiving Create activities without an id.

It will also fail once we allow fetching of Create (and other) activities directly instead of only the encapsulated Note, (Mastodon-style) Question, Article, ... as allowed by #846

> I am not aware of a way to do this with MRF without dropping the post entirely.

If you’re fine with relying on the missing AP id for now with all the caveats brought up above, you can simply have the MRF check for the presence of a non-nil "id" and then do the same thing as ObjectAge’s :strip_follow to keep it from home timelines.

Author
Contributor

> it’s possible for an activity to get fetched before it gets actively delivered. Not too common, but I’ve seen it happening before. At this point a "fake" activity already exists and you don’t want to change the database ID anymore (as this would mess up things).

i can appreciate that this would be disruptive to the codebase and not low-hanging fruit at all, but i would fundamentally make a distinction between "fetched" and "delivered". you can cache http resources within an http agent or processing layer, and you can separately track deliveries when you get a POST to inbox -- then you can associate deliveries with activities. and in the api, you use the deliveries as source of truth rather than simply activities (and sort by timestamp of deliveries too, not the activity/object).
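
one way to picture this, as a sketch only: a separate `deliveries` table that points at activities, with the timeline built from deliveries. the schema, column names and sample data are invented for illustration and use integer ids instead of flakes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activities (
    id      INTEGER PRIMARY KEY,
    ap_id   TEXT,
    content TEXT NOT NULL
);
CREATE TABLE deliveries (
    id          INTEGER PRIMARY KEY,   -- assigned at delivery time
    activity_id INTEGER NOT NULL REFERENCES activities(id),
    inbox       TEXT NOT NULL
);
""")

# Activity 1 is fetched first; no delivery row exists yet.
conn.execute("INSERT INTO activities VALUES (1, NULL, 'fetched first')")
# Activity 2 arrives via POST to bob's inbox.
conn.execute("INSERT INTO activities VALUES "
             "(2, 'https://example.test/creates/2', 'delivered normally')")
conn.execute("INSERT INTO deliveries VALUES (1, 2, 'bob')")
# Activity 3 is only ever fetched.
conn.execute("INSERT INTO activities VALUES (3, NULL, 'fetched only')")
# Activity 1 is finally delivered; its activity id does not change.
conn.execute("INSERT INTO deliveries VALUES (2, 1, 'bob')")


def delivered_timeline(conn, inbox):
    # Sort by delivery id, not by activity id or publication date.
    rows = conn.execute(
        """
        SELECT a.content
        FROM deliveries d
        JOIN activities a ON a.id = d.activity_id
        WHERE d.inbox = ?
        ORDER BY d.id DESC
        """,
        (inbox,),
    )
    return [row[0] for row in rows]


timeline = delivered_timeline(conn, "bob")
```

the late-delivered activity sorts at the top because its delivery is the newest, and the fetched-only activity never shows up.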

> allowing fetches of restricted posts is something I wanted to implement at some point™ and then this will cause significant issues :\

if the distinction between http resources and inbox deliveries is made, i think it shouldn't cause those issues. fetching posts happens within http cache layer and not inbox delivery layer.

> Perhaps we actually need to start back-dating db IDs

maybe? it might be easier to instead assign ids to deliveries...

> If you’re fine with relying on the missing AP id for now with all the caveats brought up above, you can simply have the MRF check for the presence of a non-nil "id" and then do the same thing as ObjectAge’s :strip_follow to keep it from home timelines.

i don't think i would be fine with those caveats as they seem pretty insufficient ux-wise for what i'd like to do. i'm wondering what you think of the idea about tracking deliveries, though...

Owner

If I understand correctly and assuming you want to minimise the disruption, you are proposing adding a new deliveries table which basically will point at activities objects with entries in the former only being created if the associated activity was received in one of our own inboxes. Timeline APIs would then query the deliveries table and use its flake IDs for sorting.
This in principle would indeed allow robust, delivery-time sorted and delivered-exclusive timelines. Even in this minimal-disruption version, it already is quite disruptive in absolute terms however. But furthermore this, and for the most part any scheme separating fetched and delivered objects, causes a glaring issue wrt Masto API:

Masto timeline API pagination parameters are not opaque values but documented to directly correspond to and affect the IDs of returned statuses (https://docs.joinmastodon.org/methods/timelines/#query-parameters-2). Fetched statuses still will need to have an ID to be able to be viewed and interacted with. At the same time, this ID MUST NOT change even if the activity was first fetched but later also delivered, which results in the sorting issues and missed statuses explained previously.

> > allowing fetches of restricted posts is something I wanted to implement at some point™ and then this will cause significant issues :\
>
> if the distinction between http resources and inbox deliveries is made, i think it shouldn't cause those issues.

I don’t follow. The described issue applies to ObjectAge’s means of removing objects from select timelines regardless of where they come from. A general timeline exclusion just for fetched-only content doesn't address this use case.

> maybe? it might be easier to instead assign ids to deliveries...

Doesn’t look this way

> i'm wondering what you think of the idea about tracking deliveries, though...

It appears to be fundamentally incompatible with Masto API and even if there was a way to resolve this incompatibility I’m skeptical the disruption and effort required to implement it are worth the payoff.

Plus I think it does sometimes make sense to include fetched content in timelines. E.g. due to failed earlier attempts (e.g. downtime) a reply in a thread might arrive a significant amount of time before its parents are delivered. The parents will be fetched though. The whole conversation showing up at once (or if too old, never at all) seems like better UX than having posts you already read pop up hours later on your timeline. I understand the appeal on the basis of it being a technically clean and satisfying criterion and separation, but the practical benefit seems subjective and limited to fringe cases.

Author
Contributor

> you are proposing adding a new deliveries table which basically will point at activities objects with entries in the former only being created if the associated activity was received in one of our own inboxes

that could be one way to do it, yeah?

> Timeline APIs would then query the deliveries table and use its flake IDs for sorting. This in principle would indeed allow robust, delivery-time sorted and delivered-exclusive timelines.

the idea is that a "timeline item" is not strictly "the activity itself" but rather "the thing you received, which contains an activity as its payload".

it doesn't strictly have to be the same timeline api endpoints but api design is something i'm leaving out for now because anything would be fine i guess and the separation can be worked out later if the actual base idea itself is accepted, which seems to be not the case because as you say

> Even in this minimal-disruption version, it already is quite disruptive in absolute terms however. But furthermore this, and for the most part any scheme separating fetched and delivered objects, causes a glaring issue wrt to Masto API

from which i can extract two concerns:

  • disruption (in terms of affecting existing code and requiring work, i assume)
  • masto api (regarding violating the existing assumptions of api clients i assume)

> pagination parameters are not opaque values

i believe the exact stance on pagination by id in masto is that most pagination ids are considered "internal only", and api clients are supposed to use Link headers with rel=next and rel=prev. however, some ids "leak" out of that "internal only" scope because they are public facing -- this is the account and status ids mostly, where mastodon uses snowflake and akkoma uses flake. the idea with max/min/since (+ limit) is to do pagination based on what you already have -- the temporal info encoded into the snowflake/flake is not guaranteed, so api clients shouldn't extract it.

> Fetched statuses still will need to have an ID to be able to be viewed and interacted with. At the same time, this ID MUST NOT change even if the activity was first fetched but later also delivered resulting in sorting issues and missed statuses as explained previously.

i think the "ID assignment" works like this currently, right?

  • when an object is discovered, it gets a status id assigned
  • when an activity is delivered, if the object is newly discovered, it gets a status id assigned

this is fine for the status view which mostly does not care about activities; mastodon:Status is transformed from the as:Note usually and not the as:Create. it's an akkoma thing to keep the Creates around; mastodon doesn't generally do this. (maybe they should, but that's a different concern altogether so i am not going to go into it here)

what is being described might break down into multiple cases:

  • statuses
  • activities
  • deliveries

the entity used to serialize the API response could be selected by a parameter or by separate endpoints. i don't think it breaks masto api to say "give me deliveries" and then use delivery ids to paginate, but it could be tricky to make sure that the server understands which id the client is requesting, i.e. if the client gives a status id for a deliveries api, the server would have to translate; but the better design is to wrap entities instead -- similar to how mastodon 4.5 api's Quote entity can refer to a quoted_status https://docs.joinmastodon.org/entities/Quote/ because the "quote" is separate from the "quoted status", just like how the "timeline item" can be considered separate from the "status". (there is a similar construct in the https://docs.joinmastodon.org/methods/search/ results, or how FeaturedTag and Tag have different ids.)
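
a sketch of what such a wrapper could look like. `TimelineItem` is a made-up entity, not part of the masto api, and ids are plain integers here for brevity:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Status:
    id: int        # stable status id, never changes
    content: str


@dataclass
class TimelineItem:
    id: int        # the wrapper's own id; pagination uses this one
    kind: str      # e.g. "delivered" or "fetched"
    status: Status


def page(items: List[TimelineItem], max_id: Optional[int] = None,
         limit: int = 20) -> List[TimelineItem]:
    # Paginate on the wrapper id; the wrapped status id plays no role.
    newest_first = sorted(items, key=lambda i: i.id, reverse=True)
    if max_id is not None:
        newest_first = [i for i in newest_first if i.id < max_id]
    return newest_first[:limit]


items = [
    TimelineItem(id=10, kind="delivered", status=Status(id=500, content="hello")),
    TimelineItem(id=11, kind="fetched", status=Status(id=120, content="old post")),
    TimelineItem(id=12, kind="delivered", status=Status(id=501, content="reply")),
]

first_page = page(items, limit=2)
second_page = page(items, max_id=first_page[-1].id, limit=2)
```

the old post keeps its low status id (120) but sits between the two newer statuses because its wrapper id reflects when it was discovered.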

broadly, you have the "published date" which is a property of an object and the "discovered date" (received, fetched, synthesized... whatever) which is a property of a wrapper item

> It appears to be fundamentally incompatible with Masto API and even if it there was a way to resolve this incompatibility I’m skeptical the disruption and effort required to implement it are worth the payoff.

i think the incompatibility can be resolved, but if you're not convinced of the payoff then that's a different matter altogether. for me the payoff is clear and i would greatly appreciate seeing "what i actually received" or as close to it as possible. i would at least want a stream of activities instead of a stream of notes. with an activity stream you could say things like "alice created a status" or "system discovered a status by alice (whom you follow)" and the distinction would make sense:

```
actor: <system>
type: Announce
object:
  audience: alice/followers
audience: alice/followers
```
Reference
AkkomaGang/akkoma#1053