Incoming Follow activity gets undone by (re)handling Undo of a different Follow #1120
Reference
AkkomaGang/akkoma#1120
Your setup
From source
Extra details
Arch Linux
Version
3.18.0-248-g479c7288-develop
PostgreSQL version
18.3
What were you trying to do?
accept a follow request
What did you expect to happen?
it gets accepted without fail
What actually happened?
it fails
Logs
Severity
I cannot use the software
Have you searched for this issue?
context
non-working nginx config line
server_name media.akkoma.trwnh.com akkoma.trwnh.com;
working nginx config line
server_name akkoma.trwnh.com media.akkoma.trwnh.com;
the problem
nginx $server_name variable only gets set once per server block, NOT based on the Host header of the request
buggy http to https redirect
return 301 https://$server_name$request_uri;
correct http to https redirect
return 301 https://$host$request_uri;
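The difference between the two redirects can be modelled in a few lines. This is only a sketch of the behaviour described above, not nginx itself: `redirect_target` is a hypothetical helper, the `/inbox` path is an arbitrary example, and `$host` is simplified to "whatever the Host header says".

```python
def redirect_target(server_names, host_header, request_uri, variable):
    """Model `return 301 https://<variable><request_uri>;` for one server block."""
    if variable == "$server_name":
        # $server_name is fixed per server block: the first name in server_name.
        name = server_names[0]
    elif variable == "$host":
        # Simplified assumption: $host follows the Host header of the request.
        name = host_header
    else:
        raise ValueError(f"unsupported variable: {variable}")
    return f"https://{name}{request_uri}"


# The non-working order from above: media subdomain listed first.
names = ["media.akkoma.trwnh.com", "akkoma.trwnh.com"]

buggy = redirect_target(names, "akkoma.trwnh.com", "/inbox", "$server_name")
fixed = redirect_target(names, "akkoma.trwnh.com", "/inbox", "$host")

print(buggy)  # https://media.akkoma.trwnh.com/inbox
print(fixed)  # https://akkoma.trwnh.com/inbox
```

With `$server_name`, a plain-http request for the instance domain is redirected to the media subdomain; with `$host`, it stays on the domain that was requested.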
summary
because akkoma's suggested nginx config uses the same server block for both the instance and the media subdomain, $server_name will cause issues and the correct nginx variable is actually $host
external requests are generally made to https and will generally always work
internal requests are made to http / port 80 and were getting redirected to the media subdomain (since it was the first one declared in the server block)
somehow the only noticeable side effect for me was that 1 specific remote user's follow requests would always fail even after i accepted them (although other users on the same remote instance were accepted successfully)
suggested solutions
option 1 is to put out a psa and note prominently in the nginx config that server_name should put the instance domain first and the media domain second. but this does still have the issue of http media redirecting to https instance (not ideal solution)
option 2 is to change the nginx config to use separate server blocks for the instance domain and the media domain, then put out a psa that the nginx config has changed. this repeats the same server code block but fixes the issue without additional issues
option 3 is to change the nginx config to use $host instead of $server_name because really that is what it should have always been. then put out a psa about this so people can check their configs in case they've been silently affected this whole time like me
i recommend option 3
a referenced this issue 2026-05-06 08:00:52 +00:00
This is an actual issue, although if the template is used it only means that if someone takes a media URL and on their own decides to replace `https` with `http`, it won’t resolve.
The template already puts the main domain first.
There is no such thing as “internal [HTTP] requests”. Akkoma never ever performs AP fetches on its own domain (and if it somehow tries anyway, a safety check will make it error out early), nor does it make REST API requests to itself (except of course in the test suite). I have no idea how you got this idea.
Your logs also do not show anything using a `http://` scheme in queries or data.
There’s the `Server: domain:80 (http)` line, but this seems unrelated to actual port 80 usage; it shows up in my logs too even though the redirect was already correct and Akkoma does not listen on port 80 at all, nor are Akkoma’s ports accessible to the outside. Might be phoenix just notices the (internal) `http` usage and autocompletes the port to the default value.
Either way, this still wouldn’t make sense since even the broken redirect should prevent anything from being able to access Akkoma itself via port 80.
I do not see any `port: 80` bits in my logs though. Are you listening on actual port 80 (in a container / on a different IP so it doesn’t conflict with nginx port 80 usage)?
In any event, your Follow-request issue is most certainly unrelated to the nginx HTTP redirect.
The match error suggests it encountered some unexpected error while trying to update the FR state and federating the accept out.
Check the state / data of both the follow request activity in the `activities` table and the `following_relationships` table. For unfathomable reasons the state is tracked at both of these places and most likely something desynced.
Alternatively, reject the FR and temporarily block the other user. This should clear out all FR states. After unblocking, a new FR will likely just work.
Unrelated to the issue, but you just published your OAuth token as part of the debug logs.
You’ll want to log out/in again and go to `Settings → Security` and invalidate the leaked token if still there.
And you are two security releases behind.
Title changed from "[bug] nginx config uses $server_name and multiple server_name, leading to broken http->https redirects on internal requests" to "Cannot accept follow request"
logs were taken before an update so i'm no longer behind. also i cleared my tokens, thank you for the heads-up -- i wasn't even aware that tokens were logged
This line of reasoning was based entirely on the `Server: akkoma.trwnh.com:80 (http)` line, which made it seem to me that Phoenix / Cowboy was trying to access the domain over http instead of accessing localhost + the configured port. I guess it was a red herring, since it was at the top of the most important-looking block (the stack trace of the error itself), but as you say,
As for my own usage,
No, my config uses this in prod.secret.exs:
it's weird that the follow request managed to be successfully accepted after i changed this, though! my procedure with testing this:
then i changed the order of the server_name values so that akkoma.trwnh.com was first (before media.akkoma.trwnh.com)
so that's what got me going down the rabbit hole of the nginx config -- the observation that it seemingly fixed the issue!
now, i say "seemingly", because after some indeterminate time of the follow relationship being successfully established, it looks like it's undone itself at some point within the time window starting a few minutes after accepting the request and ending about two hours later.
i've tried this in the past to no success, but haven't tried this recently (yesterday). i guess i can try to coordinate this during a time when the other person is active so i don't end up making it worse by no longer following them and missing their posts.
here's what i'm seeing:
there is no Follow activity in the `activities` table whatsoever for the affected user.
meanwhile for `following_relationships` i don't know how to make sense of this since the ids don't seem to correlate to anything else?
most of them have state=2, and a few have state=1. the timestamps range from 2023 to 2026-04-30.
i can guess my own id at least:
and all of those have state=2 which seems expected
i'm not sure how to convert an https id to a `follower_id` in the `following_relationships` table.
but after poking around again in the `activities` table, it seems the latest 2 Accept activities are still there?
i'm not sure why there are 2 of them, or why only the latest 2.
for what it's worth, my `activities` table doesn't contain either of the referenced Follow activities, while in the case of other Accept activities I am able to find the accepted Follow in my `activities` table.
if i had to guess, something is causing the Follow to be deleted from `activities`, so even if the Accept remains, it references a no-longer-existing Follow activity.
And that’s how you end up with a `nil` in the error clause which won't match anything. So indeed a desync as suspected.
sorry, forgot to reply to this bit:
follow*_ids are referencing the primary ID of users in the database also matching their API id (but PostgreSQL doesn’t understand the base62 encoding directly; you’ll need to convert to hex).
You can look up the primary key of a user in the `users` table, like `SELECT id FROM users WHERE ap_id = 'https://…';` or use a `JOIN users AS u ON follower_id = u.id` etc to get human-friendly output when inspecting the followers relation table.
The meaning of the state enum values can be found in `lib/pleroma/ecto_enums.ex`:
iirc `follow_reject`s don’t survive long and get cleared out after a while (same for activities of rejected FRs)
I'm wondering if it makes sense to query for Accept activities whose object isn't present in the `activities` table, and then if it makes sense to purge such Accept activities, or purge follow relationships based on such Accept activities?
The problem is that Accept activities can be used for objects that aren't a Follow activity, so this might not be a safe operation.
I don’t know if Accepts are already pruned too when a FR is rejected or the accept `Undo`ne. If not, yes some space could be saved in theory by getting rid of them.
But either way, I don’t think this is related to the desync issue you experienced. There shouldn’t be any issue with multiple Accepts.
There's two aspects here:
The problem is I don't know exactly what to test/query for. All I have managed to reveal so far is that there are Accept activities whose object isn't known locally. Ideally you would be able to fetch the object, but in the case of Follow activities they aren't fetchable on some implementations, and it's not guaranteed in general that all objects are fetchable. I guess it's fine to continue storing them, but they are kind of "reverse orphaned".
Maybe what I should be examining is the relation between Follow `activities` and `following_relationships`, which is where the JOIN statement can come in handy I guess. or otherwise:
So the following relationship between the remote user and my account doesn't exist in the `following_relationships` table.
And as previously established, the Follow activities don't exist in the `activities` table, either... anymore?
And it seems reasonable to assume that the reason that trying to accept the follow request immediately fails is probably something like this:
If I'm wrong about any of that, please correct me.
If that's true, then why might the Follow activity be disappearing on my end? I'll try to get my friend to send another follow request, which I will leave pending, and then run more SQL queries to see what happens.
The desync I’m speaking of is between your `following_relationships` and your `activities` table. As the processing works now, all entries in the former MUST have an associated `Follow` and depending on state also `Accept` or `Reject` activity in the latter and the `state` recorded in the dedicated table and inlined into the `Follow` activity MUST match. If only the activity, or only the relationship entry exist (or both exist with different state etc) then the tables are desynced. Only action on your end should be needed to wipe this clean.
If as you say you still see the FR popping up in API responses, this seems literally impossible. The FR index API endpoint simply scans the `following_relationships` table, nothing else (accepting a FR however, needs the activity too to change its state).

```elixir
def get_follow_requests_query(%User{id: id}) do
  __MODULE__
  |> join(:inner, [r], f in assoc(r, :follower), as: :follower)
  |> where([r], r.state == ^:follow_pending)
  |> where([r], r.following_id == ^id)
  |> where([r, follower: f], f.is_active == true)
end

def get_follow_requesting_users_with_request_id(%User{} = user) do
  get_follow_requests_query(user)
  |> select([r, follower: f], %{id: r.id, entry: f})
end
```

It indeed first tries to find the `Follow` activity, but upon failure simply stops (and ends up without a matching error handling case). It won't ever touch the relationship table at all. Thus nothing gets "destroyed".
If such a desync occurs, the missing entry most likely was simply never persisted to the db due to e.g. the questionable transaction split in side effect handling, also mentioned in `akkoma#888` and some fatal error occurring between the two or in the later stage.
Maybe streaming / push notifications? That could explain why I see it appear in my notifications, but manually checking the follow requests API shows no pending follow requests, and refreshing the frontend makes it disappear (except that one time when it didn't, and I was able to briefly accept it before it undid itself).
Probably worth adding error handling for that case, right? Not sure what that would be, but at the very least logging it and pointing out the Follow activity doesn't exist.
That seems reasonable -- I guess the thing being destroyed is the notification, not the follow request.
If the Follow isn't being persisted, then I guess it could be failing validation somehow (unlikely, both instances are Akkoma 3.19 here), or after it passes validation, it fails persisting (not sure where in the codebase to look for persisting a Follow activity)
Context
OK, managed to capture some more logs: https://gist.github.com/trwnh/ec6101fb698cb936c7c1a187e5d54442
Key terms of interest:
`https://xyzzy.link/activities/196721e3-68f7-43e9-93d5-9dc296d9f9d9` = the Undo Follow I received when the remote user cancels their pending follow request
`https://xyzzy.link/activities/331b3902-7638-429b-9334-cabadda2b44d` = the Follow activity being undone as the Undo.object
`https://xyzzy.link/activities/4d85f170-1cd9-4cd9-9bda-6b5f2dec7b61` = the new Follow activity I received that disappeared
`AQFT4gmoLbbW6TrjMG` = the remote user trying to Follow me
`AQDDR47on7cS3eqxZQ` = my user
`B68miR5szCpnMgPWZk` = the new Follow's activities.id (corresponding to the follow request that gets pushed to notifications)
The first point of interest is in 1.txt:
Then a few seconds later, the stuff in 2.txt happens:
`03:09:13.261` -- The Follow arrives via POST /inbox
`03:09:13.277` -- Handling the Follow activity begins
`03:09:13.290` -- /following collection is refetched for some reason
`03:09:13.461` -- /followers collection is refetched for some reason
`03:09:13.504` -- Trying to push follow relationship update to... the remote user trying to follow me?
`03:09:13.517` -- Insert a follow request notification immediately after this line
`03:09:13.519` -- Insert a timeline marker for the notifications timeline immediately after this line
`03:09:13.539` -- Push notification to SubwayTooter
`03:09:14.262` -- Push notification to Toot!
Some other stuff not as relevant, so I excluded it from the logs:
`03:09:15.592` -- GET /api/v1/notifications
Then, the really interesting part, in 3.txt:
`03:09:38.843` -- I guess Oban is picking up a federator_incoming job...
`03:09:38.845` -- Trying to handle the incoming AP activity, this time it's `https://xyzzy.link/activities/196721e3-68f7-43e9-93d5-9dc296d9f9d9` (the Undo, which failed to handle in 1.txt) instead of `https://xyzzy.link/activities/4d85f170-1cd9-4cd9-9bda-6b5f2dec7b61` (the new Follow, should have been handled in 2.txt)... so I guess it's a retry?
`03:09:38.849` -- DELETE from "notifications"... why?
`03:09:38.850` -- DELETE from "activities"... why? !!![This is where the Follow activity gets purged]!!!
`03:09:38.851` -- Committed the SQL transaction
`03:09:38.852` -- Query the following_relationships... twice? with the same parameters?
`03:09:38.853` -- DELETE from "following_relationships"...
`03:09:38.855` -- something to do with my user, where the following_relationships.state == :follow_accept
`03:09:38.857` -- refetching the remote user via AP
`03:09:38.858` -- get their /following
`03:09:38.986` -- get their /followers
`03:09:39.014` -- something to do with the remote user, where the following_relationships.state == :follow_accept
`03:09:39.014` -- Trying to push follow relationship update to... the remote user trying to follow me, again?
`03:09:39.015` -- Querying for a Follow activity from that remote user to my user
`03:09:39.017` -- UPDATE the Oban job to mark it as "completed"
After that no more results for the key terms of interest.
So naturally I was interested in what `https://xyzzy.link/activities/196721e3-68f7-43e9-93d5-9dc296d9f9d9` was supposed to be, but it wasn't persisted either. Still, looking through the logs, it looks like this is actually the unhandled Undo Follow from 1.txt! It's undoing a Follow with an id of `https://xyzzy.link/activities/331b3902-7638-429b-9334-cabadda2b44d` instead of `https://xyzzy.link/activities/4d85f170-1cd9-4cd9-9bda-6b5f2dec7b61`, so I'm not sure why it would later be (re)processed to Undo a completely different Follow...
The Oban job in 3.txt has an id of `bf09e703-3324-4a5f-90fc-c5b3080ff0c8`, and it appears in the logs at these points:
`03:09:07.630` -- 1.txt, with the unhandled Undo Follow
`03:09:13.276` -- 2.txt, with the POST /inbox and the new Follow
`03:09:13.532` -- 2.txt, somewhere in the middle of INSERT into "notifications"
`03:09:38.843` -- 3.txt, the job gets processed and destroys stuff
`03:09:39.024` -- 3.txt, the job is marked as "completed"
Theory
I think this issue is due to out-of-order processing, combined with confusion about Undo.Follow.id for some reason. It looks like the Undo Follow arrives necessarily (to cancel the pending frq), then the Follow gets processed and wiped out 25 seconds later when the Oban job is processed. The deletions occur despite the Undo.Follow.id and Follow.id being different.
If I can accept the follow within ~25 seconds or so (before the Oban job undoes it), I think it successfully sends an Accept Follow to the remote instance. But that's an incredibly tight time window, so the most likely outcome is that the user tries to accept the follow after it has already been undone, leading to a no-op and the nil error observed earlier. Also, even if the Accept Follow goes out, it ends up being a state desync between the two instances, and you can't remove that follower or you might forget that they're following you since they aren't shown as a follower. They should still see your posts if at least 1 other user is locally known to follow you.
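The theory can be illustrated with a small model. This is a sketch, not Akkoma's code: the `Follow` record and both helper functions are hypothetical, and the only assumption taken from the logs is that the retried Undo is applied after the new Follow has already been stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Follow:
    id: str
    follower: str
    followee: str


def undo_by_pair(stored, undone):
    """Drop every stored Follow with the same follower/followee pair, ignoring ids."""
    pair = (undone.follower, undone.followee)
    return [f for f in stored if (f.follower, f.followee) != pair]


def undo_by_id(stored, undone):
    """Drop only the stored Follow whose id equals the undone Follow's id."""
    return [f for f in stored if f.id != undone.id]


old = Follow("https://xyzzy.link/activities/331b3902-7638-429b-9334-cabadda2b44d", "remote", "me")
new = Follow("https://xyzzy.link/activities/4d85f170-1cd9-4cd9-9bda-6b5f2dec7b61", "remote", "me")

# Out-of-order processing: the new Follow is stored first,
# then the retried Undo of the old Follow is handled.
stored = [new]
after_pair = undo_by_pair(stored, old)  # the new Follow is wiped
after_id = undo_by_id(stored, old)      # the new Follow survives

print(len(after_pair), len(after_id))  # 0 1
```

Matching on the follower/followee pair reproduces the observed deletion even though the two ids differ; matching on the id would leave the new Follow alone.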
Title changed from "Cannot accept follow request" to "Follow activity gets undone before the follow request can be accepted"
Title changed from "Follow activity gets undone before the follow request can be accepted" to "Incoming Follow activity gets undone by Undo of a different Follow"
Title changed from "Incoming Follow activity gets undone by Undo of a different Follow" to "Incoming Follow activity gets undone by (re)handling Undo of a different Follow"
This seems plausible for the processed FR retraction and readd, but probably doesn’t apply to the initial FR.
Indeed, in `transmogrify.ex::handle_incoming_normalised/2`, it simply checks for the `object` in the (required to be inlined (according to comments inlining is also hard-required by Mastodon)) undone `Follow` and later steps then look up the matching `Follow` activity based on the followee, follower pair.
Note though, Mastodon and I believe *key and likely many more will simply create the same (unresolvable) `id` for all follow activities with the same follower-followee pair. Checking the id will thus only help when the request comes from an implementation like *oma. But generally, retracting and resending a FR in rapid succession is simply not safe with common fedi implementations.
@norm looks like you originally added this in https://git.pleroma.social/pleroma/pleroma/pulls/3553 (merged via `c2dcd767cf`); do you still remember why it doesn’t use the `Follow` activities’ `id`? Were there some implementations federating FRs as transient activities? Or just a convenience decision since checking ids does not actually help for non-*oma remotes?
(Mastodon at least already used ids for Follow activities; albeit they were not and still aren’t actually unique for repeated requests)
Ah, right. >_<
Misskey used to do that, but later switched to generating ids for each Follow (https://github.com/misskey-dev/misskey/pull/10600), but they generate those ids local to the Misskey server instead of using the original id (https://github.com/misskey-dev/misskey/issues/11015).
Mastodon I think generates a different id for each Follow, but the id is a fragment id.
`mastodon/mastodon@bbb3392dbe/app/serializers/activitypub/follow_serializer.rb` (L8)
I don't think any fedi implementations warn users about this, do they? Nor do they apply any mitigations to ease this concern.
It's the kind of thing where it makes sense when you say it, but it's not obvious beforehand. Like the Egg of Columbus. I'm probably one of the people who ought to be most familiar with the inner workings of fedi, and even I didn't really consider this as something I was distinctly aware of -- at least enough that it didn't come to mind as a possible explanation for what was going on. (Granted, I was expecting the different ids to possibly be understood, so I wasn't expecting that it might not be understood.)
It's not clear what the correct behavior should be in all cases, either. I'd definitely have to think about it for some time before I could come up with a proper description.
What is `object` in this file? Since the closest thing to a definition in this file I saw is `attribute :virtual_object, key: :object` and `virtual_object` appears to be the target account, I was assuming `object` too refers to the target account, and thus only the follower-followee pair is used in the activity’s `id`. But I’m not familiar with ruby and rails and might be misunderstanding what’s going on there.
Actually, I just remembered I now have a follow relationship to a Mastodon server and can thus check: the `Follow` activity `id` looks completely different than the second branch in the linked code but is just `https://mastodon.example/<some-uuid>`. I guess this means in current versions the first branch always(?) already works and while I can’t tell for sure, presumably the UUID is actually unique to each request.
However, the test sample suggests Mastodon used to create `https://mastodon.example/user/nameof/follows#id_of_sth` URLs back when the ActivityPub `Undo` `Follow` code was added to *oma, so this might be a recent'ish development.
I think the key point to focus on here is what to do when the Undo.object.id isn't there, i.e. when it's undoing a Follow by description rather than by id.
If the id is there, then it should be used. If there is no existing activity with that id, then the Undo is deferred (because that Follow.id could arrive later). But later Follow activities where Follow.id != Undo.object.id shouldn't be dropped. [1]
If the id is not there, then we can use the description. If there is no existing activity with that description, then the Undo is either dropped (we don't expect to receive an anonymous Follow later?) or deferred (and then, does it Undo any matching Follow after this point? Or does it have a limited time window in which case it will do so? What should that time window be?)
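The two cases above can be written down as a decision function. This is a sketch of the proposal only, not existing Akkoma behaviour: `Action`, `decide_undo` and both lookup tables are hypothetical, and the "no id, no match" branch picks "drop" arbitrarily since that question is left open.

```python
from enum import Enum


class Action(Enum):
    UNDO = "undo"
    DEFER = "defer"
    DROP = "drop"


def decide_undo(undo_object, follows_by_id, follows_by_pair):
    """Return (action, Follow ids to undo) for an incoming Undo Follow.

    undo_object: the inlined Follow, a dict with "actor", "object" and maybe "id".
    follows_by_id: dict of locally known Follow ids.
    follows_by_pair: dict mapping (follower, followee) to known Follow ids.
    """
    follow_id = undo_object.get("id")
    if follow_id is not None:
        if follow_id in follows_by_id:
            return Action.UNDO, [follow_id]
        # That Follow may still arrive later; other Follows are left untouched.
        return Action.DEFER, []
    # No id: fall back to the description (follower/followee pair).
    pair = (undo_object["actor"], undo_object["object"])
    matches = follows_by_pair.get(pair, [])
    if matches:
        return Action.UNDO, list(matches)
    # Open question: drop or defer. This sketch drops.
    return Action.DROP, []


known = {"https://remote.example/follows/2": {"actor": "alice", "object": "bob"}}
pairs = {("alice", "bob"): ["https://remote.example/follows/2"]}

mismatch = decide_undo(
    {"id": "https://remote.example/follows/1", "actor": "alice", "object": "bob"},
    known, pairs)
anonymous = decide_undo({"actor": "alice", "object": "bob"}, known, pairs)
```

Here `mismatch` is deferred without touching the stored Follow, while `anonymous` undoes every Follow matching the pair.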
Per `mastodon/mastodon@8bbde181db/app/lib/activitypub/activity.rb` (L98-L104), Mastodon will drop Create activities if a Delete for the same object arrived within the past 6 hours. It will also use that same "delete later" function (6 hours) for Undo activities, per https://github.com/mastodon/mastodon/blob/main/app/lib/activitypub/activity/undo.rb
I don't know if "wait 6 hours before refollowing someone" is sound advice or grounds for standardization, but if an Undo Follow without an Undo.object.id ends up undoing any Follow that matches the description, then it could be.
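A time-limited "delete later" record like the one described for Mastodon could look roughly like this. It is a sketch under assumptions: `UndoTombstones` is a hypothetical helper, the 6-hour default only mirrors the window mentioned above, and time is passed in as plain seconds to keep the example deterministic.

```python
class UndoTombstones:
    """Remember undone Follow ids for a limited window (hypothetical helper)."""

    def __init__(self, ttl_seconds=6 * 60 * 60):
        self.ttl_seconds = ttl_seconds
        self._undone_at = {}

    def record_undo(self, follow_id, now):
        """Note that an Undo for follow_id arrived at time `now` (seconds)."""
        self._undone_at[follow_id] = now

    def should_drop(self, follow_id, now):
        """True if a Follow with this id arrives while its tombstone is still fresh."""
        undone_at = self._undone_at.get(follow_id)
        if undone_at is None:
            return False
        if now - undone_at > self.ttl_seconds:
            # Tombstone expired: forget it and let the Follow through.
            del self._undone_at[follow_id]
            return False
        return True


tombstones = UndoTombstones()
tombstones.record_undo("https://remote.example/follows/1", now=0)

late_same = tombstones.should_drop("https://remote.example/follows/1", now=25)
late_other = tombstones.should_drop("https://remote.example/follows/2", now=25)
expired = tombstones.should_drop("https://remote.example/follows/1", now=6 * 60 * 60 + 1)
```

Keyed by id, only the undone Follow is suppressed (`late_same`), a different Follow from the same actor passes (`late_other`), and the tombstone stops applying after the window (`expired`). Keyed by follower/followee pair instead, the same structure would produce the "wait 6 hours before refollowing" effect.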
Note that in the normal case, an Undo Follow should never arrive before a Follow, but when it does, it can cause problems like this if there isn't an existing Follow to be undone.
[1] The "Misskey uses a local id for all Follows" issue shouldn't be a problem for Undo Follow, because in that case, the Undo and the Follow are on the same instance and the Follow.id should be correct and known by Akkoma. However, it would cause problems for Reject Follow if the Follow.id is checked and found to be invalid due to same-origin concerns or due to the Reject.object.id not being found in Akkoma's activities table.