[feat] Filter fake Create activities out of timelines #1053
AkkomaGang/akkoma#1053
The idea
When an object is discovered, Akkoma will normalize it to an activity by wrapping it in a fake Create activity.
I would like a way to not have these "fake Creates" show up in timelines. I am not aware of a way to do this with MRF without dropping the post entirely.
In practice, this would probably require two things:
id or not, but that could fail if receiving Create activities without an id.
The reasoning
Objects being discovered are typically less likely to be socially relevant, which means that they should not be shown at the top of timelines as if they were received just now. However, they should still show up in profiles at the appropriate datetime.
Have you searched for this feature request?
There’s one big issue with this: it’s possible for an activity to get fetched before it gets actively delivered. Not too common, but I’ve seen it happening before. At this point a "fake" activity already exists and you don’t want to change the database ID anymore (as this would mess up things). Even if we were to then just rewrite the ActivityPub ids (everywhere it shows up) and/or some "fake" indicator toggle, now a bunch of people are going to miss the post because it predates the min_id/since_id they are using to fetch updates.
I’m sorry to say, but I don’t think this can be realised with reasonable performance and effort.
Using the offset from the publication timestamp like ObjectAge does is probably the saner and more stable heuristic for social relevancy in practice.
However... thinking about it now, even ObjectAge’s :strip_follow has an issue: since it strips the follower address (and as:Public) to keep it from showing up in timelines, belatedly delivered follower-only posts can be stripped of all addressing and become fully invisible to everyone and everywhere.
In practice this is likely not too relevant at the moment since we can't fetch restricted posts anyway (due to unconditionally using the instance actor for signing rather than a follower) and ObjectAge thresholds are usually generous enough for typical federation delays. But allowing fetches of restricted posts is something I wanted to implement at some point™ and then this will cause significant issues :\
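To illustrate the addressing problem described above, here is a minimal Python sketch. It is not Akkoma's actual ObjectAge code: the function names, the 7-day threshold and the example URLs are assumptions made up for illustration, and for simplicity only the follower collection is stripped here.

```python
from datetime import datetime, timedelta, timezone

AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"
# Hypothetical follower collection of the remote author.
FOLLOWERS = "https://remote.example/users/alice/followers"


def strip_follow(activity, follower_address):
    """Return a copy of the activity without the follower collection in to/cc."""
    stripped = dict(activity)
    for field in ("to", "cc"):
        stripped[field] = [a for a in activity.get(field, []) if a != follower_address]
    return stripped


def apply_object_age(activity, follower_address, now, threshold=timedelta(days=7)):
    """Strip follower addressing when the post is older than the threshold (assumed value)."""
    published = datetime.fromisoformat(activity["published"])
    if now - published > threshold:
        return strip_follow(activity, follower_address)
    return activity


now = datetime(2024, 3, 1, tzinfo=timezone.utc)

# A public post keeps as:Public after stripping, so it stays visible on the profile.
public_post = {"published": "2024-01-01T00:00:00+00:00", "to": [AS_PUBLIC], "cc": [FOLLOWERS]}
# A follower-only post is addressed to the follower collection and nothing else.
followers_only = {"published": "2024-01-01T00:00:00+00:00", "to": [FOLLOWERS], "cc": []}

print(apply_object_age(public_post, FOLLOWERS, now))
print(apply_object_age(followers_only, FOLLOWERS, now))  # "to" and "cc" are both empty now
```

In this toy model the belatedly received follower-only post ends up with no recipients at all, which is the "invisible to everyone" case.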
This should probably be changed to by default not strip follower addresses if the addressing becomes empty otherwise (or didn’t contain as:Public) before restricted content becomes fetchable? This will mean old follower-only posts can again slip into the home timeline as "new" content, but this seems preferable to effectively dropping them completely (and those preferring the latter can change the default).
Perhaps we actually need to start back-dating db IDs (after the delay exceeds a generous, configurable threshold) to reliably avoid it showing up as "new" in timelines without stripping it from user profiles. This still has the drawback though that it will then be possible to permanently "miss" posts while traversing a profile or timeline if sufficiently old posts are newly discovered while the traversal happens, as no future fetch will ever include this newly discovered content.
It will also fail once we allow fetching of Create (and other) activities directly instead of only the encapsulated Note, (Mastodon-style) Question, Article, ... as allowed by #846
If you’re fine with relying on the missing AP id for now with all the caveats brought up above, you can simply have the MRF check for the presence of a non-nil "id" and then do the same thing as ObjectAge’s :strip_follow to keep it from home timelines.
i can appreciate that this would be disruptive to the codebase and not low-hanging fruit at all, but i would fundamentally make a distinction between "fetched" and "delivered". you can cache http resources within an http agent or processing layer, and you can separately track deliveries when you get a POST to inbox -- then you can associate deliveries with activities. and in the api, you use the deliveries as source of truth rather than simply activities (and sort by timestamp of deliveries too, not the activity/object).
if the distinction between http resources and inbox deliveries is made, i think it shouldn't cause those issues. fetching posts happens within http cache layer and not inbox delivery layer.
maybe? it might be easier to instead assign ids to deliveries...
i don't think i would be fine with those caveats as they seem pretty insufficient ux-wise for what i'd like to do. i'm wondering what you think of the idea about tracking deliveries, though...
If I understand correctly and assuming you want to minimise the disruption, you are proposing adding a new deliveries table which basically will point at activities objects, with entries in the former only being created if the associated activity was received in one of our own inboxes. Timeline APIs would then query the deliveries table and use its flake IDs for sorting.
This in principle would indeed allow robust, delivery-time sorted and delivered-exclusive timelines. Even in this minimal-disruption version, it already is quite disruptive in absolute terms however. But furthermore this, and for the most part any scheme separating fetched and delivered objects, causes a glaring issue wrt Masto API:
Masto timeline API pagination parameters are not opaque values but documented to directly correspond and affect the ID of returned statuses. Fetched statuses still will need to have an ID to be able to be viewed and interacted with. At the same time, this ID MUST NOT change even if the activity was first fetched but later also delivered resulting in sorting issues and missed statuses as explained previously.
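To make the pagination problem concrete, here is a toy Python model of the deliveries idea. Everything in it is an assumption for illustration (the `Store` class, its method names, and a plain counter standing in for time-ordered flake IDs); it is not Akkoma's schema or API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Store:
    ids: count = field(default_factory=lambda: count(1))  # stand-in for flake IDs
    activities: dict = field(default_factory=dict)        # AP id -> status ID
    deliveries: list = field(default_factory=list)        # (delivery ID, status ID)

    def fetch(self, ap_id):
        """Discover an object by fetching: it gets a status ID but no delivery row."""
        if ap_id not in self.activities:
            self.activities[ap_id] = next(self.ids)
        return self.activities[ap_id]

    def deliver(self, ap_id):
        """Inbox delivery: reuse the existing status ID (it must not change)."""
        status_id = self.fetch(ap_id)
        self.deliveries.append((next(self.ids), status_id))
        return status_id

    def timeline(self, since_id=0):
        """Delivered-only timeline, newest delivery first, filtered by *status* ID."""
        newest_first = sorted(self.deliveries, reverse=True)
        return [sid for _, sid in newest_first if sid > since_id]


store = Store()
a = store.fetch("https://remote.example/notes/a")    # fetched first: status ID 1
b = store.deliver("https://remote.example/notes/b")  # delivered: status ID 2
since = max(store.timeline())                        # client remembers the newest status ID
store.deliver("https://remote.example/notes/a")      # "a" is delivered later, keeps ID 1

print(store.timeline())                # [1, 2]: "a" is the newest delivery
print(store.timeline(since_id=since))  # []: a client polling with since_id never sees "a"
```

The status ID has to stay stable, so a `since_id` that refers to status IDs cannot express "delivered after my last poll" in this model.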
I don’t follow. The described issue applies to ObjectAge’s means of removing objects from select timelines regardless of where they come from. A general timeline exclusion just for fetched-only content doesn't address this use case.
Doesn’t look this way
It appears to be fundamentally incompatible with Masto API and even if there was a way to resolve this incompatibility I’m skeptical the disruption and effort required to implement it are worth the payoff.
Plus I think it does sometimes make sense to include fetched content in timelines. E.g. due to failed earlier attempts (e.g. downtime) a reply in a thread might arrive a significant amount of time before its parents are delivered. The parents will be fetched though. The whole conversation showing up at once (or if too old, never at all) seems like better UX than having posts you already read pop up hours later on your timeline. I understand the appeal on the basis of it being a technically clean and satisfying criterion and separation, but the practical benefit seems subjective and limited to fringe cases.
that could be one way to do it, yeah?
the idea is that a "timeline item" is not strictly "the activity itself" but rather "the thing you received, which contains an activity as its payload".
it doesn't strictly have to be the same timeline api endpoints but api design is something i'm leaving out for now because anything would be fine i guess and the separation can be worked out later if the actual base idea itself is accepted, which seems to be not the case because as you say
from which i can extract two concerns:
i believe the exact stance on pagination by id in masto is that most pagination ids are considered "internal only", and api clients are supposed to use Link headers with rel=next and rel=prev. however, some ids "leak" out of that "internal only" scope because they are public facing -- this is the account and status ids mostly, where mastodon uses snowflake and akkoma uses flake. the idea with max/min/since (+ limit) is to do pagination based on what you already have -- the temporal info encoded into the snowflake/flake is not guaranteed, so api clients shouldn't extract it.
i think the "ID assignment" works like this currently, right?
this is fine for the status view which mostly does not care about activities; mastodon:Status is transformed from the as:Note usually and not the as:Create. it's an akkoma thing to keep the Creates around; mastodon doesn't generally do this. (maybe they should, but that's a different concern altogether so i am not going to go into it here)
what is being described might break down into multiple cases:
with the entity in question used to serialize the API response could be a parameter or could be multiple endpoints. i don't think it breaks masto api to say "give me deliveries" and then use delivery ids to paginate, but it could be tricky to make sure that the server understands which id the client is requesting, i.e. if the client gives a status id for a deliveries api, the server would have to translate; but the better design is to wrap entities instead -- similar to how mastodon 4.5 api's Quote entity can refer to a quoted_status https://docs.joinmastodon.org/entities/Quote/ because the "quote" is separate from the "quoted status", just like how the "timeline item" can be considered separate from the "status". (there is a similar construct in the https://docs.joinmastodon.org/methods/search/ results, or how FeaturedTag and Tag have different ids.)
broadly, you have the "published date" which is a property of an object and the "discovered date" (received, fetched, synthesized... whatever) which is a property of a wrapper item
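a rough python sketch of the wrapper idea, just to show the shape: the class and field names here are made up for illustration and are not mastodon or akkoma api entities. the wrapper has its own id (used for pagination) and its own discovered date, while the published date stays on the wrapped status.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Status:
    id: int
    account: str
    published: datetime   # property of the object itself


@dataclass(frozen=True)
class TimelineItem:
    id: int               # the wrapper's own id, used for pagination
    source: str           # hypothetical values: "delivered" or "fetched"
    discovered: datetime  # property of the wrapper, not of the status
    status: Status


def page(items, max_id=None, limit=20):
    """Newest wrapper first; max_id refers to wrapper ids, never status ids."""
    newest_first = sorted(items, key=lambda item: item.id, reverse=True)
    if max_id is not None:
        newest_first = [item for item in newest_first if item.id < max_id]
    return newest_first[:limit]


utc = timezone.utc
old_note = Status(501, "alice", datetime(2023, 1, 1, tzinfo=utc))
new_note = Status(502, "bob", datetime(2024, 3, 1, tzinfo=utc))
items = [
    TimelineItem(10, "delivered", datetime(2024, 3, 1, 12, 0, tzinfo=utc), new_note),
    TimelineItem(11, "fetched", datetime(2024, 3, 1, 12, 5, tzinfo=utc), old_note),
]

print([item.status.id for item in page(items)])             # [501, 502]
print([item.status.id for item in page(items, max_id=11)])  # [502]
```

the old status can still be sorted by its published date on a profile, because that date is untouched by when the wrapper was created.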
i think the incompatibility can be resolved, but if you're not convinced of the payoff then that's a different matter altogether. for me the payoff is clear and i would greatly appreciate seeing "what i actually received" or as close to it as possible. i would at least want a stream of activities instead of a stream of notes. with an activity stream you could say things like "alice created a status" or "system discovered a status by alice (whom you follow)" and the distinction would make sense: