[bug] 500 internal server error when federating Like activity from Bridgy Fed #438
Reference: AkkomaGang/akkoma#438
Details
tested against akko.wtf, not my own instance
Version
3.5.0-12-g63f2d1cb
PostgreSQL version
No response
What were you trying to do?
Hi! I tried federating a `Like` from Bridgy Fed to this post on akko.wtf (running backend v3.5.0-12-g63f2d1cb), and it failed. Details below. Also tracking here. Not urgent, thanks in advance for looking!
What did you expect to happen?
HTTP 200 or 202 on the inbox delivery request to `https://akko.wtf/users/rei/inbox`.
What actually happened?
HTTP 500 error with body `{"errors":{"detail":"Internal server error"}}`.
This is the same error we got from Pleroma, so it's probably activity handling code that hasn't changed since the fork. I'm guessing it choked on some part of the AS2 that's a composite object when it expects a string, maybe `actor`.
Bridgy Fed log here. Full AS2 object we delivered is below.
Logs
Severity
I cannot use the software
Have you searched for this issue?
my guess would be that the usernames are failing validation since they contain restricted characters, and thus failing the http signature check
I'll have to check, but a rejection would probably be the correct behaviour
Hah, true! Afaik neither the AS2 core nor vocab specs define allowed or restricted characters for `preferredUsername`, but regardless, you're right, that Markdown value is clearly wrong. Good eyes, thanks for the catch, I'll fix and try again.
Ah, my mistake, that was a copy paste artifact from the Bridgy Fed log where it got auto-linked. The actual `preferredUsername` value was just `snarfed.org`. I've updated the activity JSON in the issue here.
I tried debugging, but got stuck :( Here's what I did in case someone else wants to debug further:
content-type: application/activity+json
curl -v 'https://fed.brid.gy/r/https://snarfed.org/2023-01-18_luna-nova' -H 'accept: application/activity+json' | jq .
Next thing I guess is to see if we can trigger a Like from a Bridgy Fed instance and follow in Akkoma what happens with it, but at first glance, that seems more involved than simply making an account and pressing a like button (pls tell me if I'm wrong, or what would be the easiest way to try this).
I added the following test to test/pleroma/web/activity_pub/activity_pub_controller_test.exs under `describe "/inbox"`. Note that I changed the object and actor to ones that I first added in the test. The object shouldn't matter, and I assume the actor also isn't the problem since we can properly fetch that one.
Note that I changed the actor id, but kept it an object.
test/fixtures/fedi-birdyfed-like-activity.json has the following content I got from the OP (I also tried with the content I fetched with `curl 'https://fed.brid.gy/r/https://snarfed.org/2023-01-18_luna-nova' -H 'accept: application/activity+json'`). Note that I changed the id to use domain `http://localhost:4001`. This is because we otherwise get a containment error.
yeah I did some similar testing, to the same result
whatever is happening, it's happening during http sig verification
and bridgy is... not the easiest thing to get a test instance of
Thanks for all the sleuthing, sorry this is hard to test! I'm happy to federate another like from Bridgy Fed whenever you want.
sending the content of the `Like` with signature verification off ends up with it processing just fine, so yea it's 100% in sigs
now to try and isolate the part of bridgy that does that, this will be fun
af769de99e/common.py (L120-L151)
hm, after a rather... painful time trying to extract it, it seems to process fine :<<<
i'm going to have to debug this in prod aren't i
More examples of this 🤷🏻♀️
And, I don't know if it's the same but similar issues in Tootik.
Hi! Sorry this hasn't been easier to debug. Happy to help if I can!
New example activity below that https://idiomdrottning.org/users/Sandra/inbox (@snan which Akkoma version?) 500ed on just now. Often this is because there's a full object (which is still valid AS2/AP) in a field where the receiving server (Akkoma here) expects just an id, eg `attributedTo` below. Not sure if that's contributing here.
The version I was running when these requests happened, and still am running as I am writing this, is git commit `ebfb617b26` probably better known as one commit after the v3.10.4 tag. (It was the head of `stable` when last I recompiled.)
@snarfed, could it be some signature issue? That's another area where Akkoma can be pretty strict 🤷🏻♀️
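As an illustration of the "full object where only an id is expected" case raised above, here is a minimal Python sketch of compacting embedded objects down to their ids before delivery. The function name `compact_ids` and the field list are assumptions made for this example; they are not taken from Bridgy Fed's or Akkoma's code, and the sketch only looks at top-level fields.

```python
# Fields assumed (for illustration only) to be ones a receiver may expect as plain ids.
ID_FIELDS = ("actor", "author", "attributedTo")


def compact_ids(activity):
    """Return a shallow copy of `activity` where embedded objects in
    ID_FIELDS are replaced by their `id` string. Other fields are untouched."""
    compacted = dict(activity)
    for field in ID_FIELDS:
        value = compacted.get(field)
        if isinstance(value, dict) and isinstance(value.get("id"), str):
            # A full embedded object: keep only its id.
            compacted[field] = value["id"]
        elif isinstance(value, list):
            # A list may mix plain ids and embedded objects.
            compacted[field] = [
                v["id"] if isinstance(v, dict) and isinstance(v.get("id"), str) else v
                for v in value
            ]
    return compacted


like = {
    "type": "Like",
    "actor": {"id": "https://fed.brid.gy/snarfed.org", "type": "Person"},
    "object": "https://akko.wtf/notice/example",  # placeholder object id
}
print(compact_ids(like)["actor"])
```

Both forms are valid AS2, so this is purely an interop workaround on the sending side; the original dict is left unmodified.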
Hmm! Other fediverse servers have been accepting Bridgy Fed's signatures for a long time, but sure, it's definitely possible.
BF generates HTTP Sigs based on the cavage 12 draft standard. It includes the Date, Host, Content-Type, Digest (`SHA-256=...`), and special `(request-target)` headers. Code: https://github.com/snarfed/bridgy-fed/blob/main/activitypub.py#L434-L487
I didn't dive too far into your implementation, but one thing that struck me in the comment was the (request-target) header
this is actually a pseudo-header that will not make it through most http clients/reverse proxies
Oh! And I am on a rev proxy!
True! `(request-target)` isn't actually sent as an HTTP header, it's a special synthetic value that's only included in the HTTP Sig.
https://datatracker.ietf.org/doc/html/draft-cavage-http-signatures-12#section-2.3
https://docs.joinmastodon.org/spec/security/#http
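To make the `(request-target)` point concrete, here is a minimal Python sketch of how a cavage-style signing string and `Digest` value are assembled. It follows section 2.3 of the draft linked above as I understand it; the helper names `digest_header` and `signing_string`, the request body, and the `Date` value are assumptions for illustration and are not Bridgy Fed's or Akkoma's actual code.

```python
import base64
import hashlib
from urllib.parse import urlsplit


def digest_header(body: bytes) -> str:
    """Value for the Digest header: base64 of the SHA-256 of the raw body."""
    return "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")


def signing_string(method, url, headers, signed_headers):
    """Build the string that gets signed. `(request-target)` is synthesized
    from the method and path; it is never read from a real HTTP header."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Header names are matched case-insensitively and emitted in lowercase.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    lines = []
    for name in signed_headers:
        if name == "(request-target)":
            lines.append(f"(request-target): {method.lower()} {target}")
        else:
            lines.append(f"{name}: {lowered[name]}")
    return "\n".join(lines)


body = b'{"type": "Like"}'  # placeholder body
headers = {
    "Date": "Thu, 19 Jan 2023 00:00:00 GMT",  # placeholder date
    "Host": "akko.wtf",
    "Content-Type": "application/activity+json",
    "Digest": digest_header(body),
}
print(signing_string(
    "POST",
    "https://akko.wtf/users/rei/inbox",
    headers,
    ["(request-target)", "date", "host", "content-type", "digest"],
))
```

Because the digest covers the raw body bytes, any change to the body on the way (re-serialization, whitespace, escaping) changes the `Digest` value, and a path rewrite by a proxy changes the `(request-target)` line; either makes the receiver's reconstructed string differ from what was signed.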
I started compacting `actor`, `author`, and `attributedTo` down to just string ids in outgoing activities, and that got Akkoma to accept a `Like`! Replies are still 500ing though. Example delivered to https://akko.wtf/users/rei/inbox just now:
Hi again! Friendly ping, looks like this is still happening. `Accept`s of follows are affected too, along with replies and others mentioned above.
@snarfed
Here are some (hopefully) relevant log messages from today:
and
I am behind an nginx rev proxy 🤷🏻♀️
not sure why nginx factors into this, it's pretty clearly an issue validating outgoing requests, and is probably caused by your machine lacking ca-certificates or equivalent for your operating system
additionally those are just following counters, it's non-fatal
The reason I mentioned I was behind a rev proxy was just in case it mattered since you earlier said that that might drop some pseudo headers.
We use let's encrypt 🤷🏻♀️
It's not me that said `TLS :client: In state :hello received SERVER ALERT: Fatal - Internal Error`, that came from the log.
has it become less impossible to run a local instance of the software in question? last time this issue was raised I attempted to delve into it, only to be rebuffed by it more or less not being possible to test locally
if that hasn't changed, I'm not sure if we can do much about this, as the provided logs don't say much of note
Sorry for the trouble! It's definitely possible to run locally, our docs (and CI) show how, but you're right that it's not easy to use locally to interact with other ActivityPub instances. It's not intended to be user-hosted, so we haven't prioritized that.
I'm happy to test against any Akkoma instance you want interactively though! And hopefully the example activities above help some. I can add more if you want.
The object part of the old reply sample no longer matches what’s served when fetching atm and fetching the create activity is impossible. Could you share an updated sample?
And btw Create activities not being fetchable, to my understanding, violates the AP spec; all objects except anonymous (`null` id) and transient (no id) ones must be retrievable via their id (given sufficient perms). See https://www.w3.org/TR/activitypub/#obj-id and the following section
Sure! Here's a reply I delivered to https://ihatebeinga.live/users/FloatingGhost/inbox just now. That POST returned `401` `Request not signed` (not 500), even though the request did have a valid HTTP Sig with `keyID` https://fed.brid.gy/snarfed.org#key , and other fediverse servers are happily accepting these sigs and activities.
Is the problem maybe that `https://w3id.org/security/v1` isn't in `@context`? JSON-LD processing is supposed to be optional in AP, but I should probably still add that.
(Re fetching `Create` activities, that would definitely be nice! It's not entirely clear whether the "MUST present" in https://www.w3.org/TR/activitypub/#retrieving-objects is a MUST for serving in general, or only for handling conneg if you serve...but regardless, ideally yes they should be fetchable. The catch here is that fetching ids with fragments is a famously undefined open question, https://github.com/w3c/activitypub/issues/367 , and Bridgy Fed's `Create` ids only differ from the enclosed object's id by a fragment. We're not alone in using fragments in ids like that, but we're obviously not helping either. Sorry.)
there's your issue
fixed via 4457928e32
i don’t think this is it
Fragments and query parameters were already deleted in `key_id_to_actor`; `known_suffixes` only deals with actual different paths. In an ideal setting actor <-> key id would be a well specified conversion, but afaik it’s not, so ideally we shouldn't rely on being able to transform this at all and should instead look up the actor and check if the key id matches (but this requires DB lookups and is thus more costly)
if that doesn't fix it, then eh
with the utter lack of reproducibility and the only debugging we have being very slow live systems interactions, I will leave this for the bridgy people to solve
Sadly true. https://swicg.github.io/activitypub-http-signature/#how-to-obtain-a-signature-s-public-key
Fair enough!
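The two approaches contrasted above (transforming the key id into an actor id versus looking up the actor and checking its declared key) can be sketched in Python as follows. This is an illustration only: the suffix list is a hypothetical example, and neither function is Akkoma's actual `key_id_to_actor` implementation.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical examples of path suffixes that some servers append to key ids.
KNOWN_SUFFIXES = ("/publickey", "/main-key")


def key_id_to_actor_id(key_id):
    """Heuristic: drop query and fragment, then strip a known path suffix."""
    parts = urlsplit(key_id)
    path = parts.path
    for suffix in KNOWN_SUFFIXES:
        if path.endswith(suffix):
            path = path[: -len(suffix)]
            break
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def key_belongs_to_actor(key_id, actor_doc):
    """Lookup-based check: the (already fetched) actor document must declare
    this exact key id and name itself as the key's owner."""
    public_key = actor_doc.get("publicKey") or {}
    return public_key.get("id") == key_id and public_key.get("owner") == actor_doc.get("id")


print(key_id_to_actor_id("https://fed.brid.gy/snarfed.org#key"))
```

The heuristic needs no extra lookup but only works for key id shapes it knows about; the lookup-based check makes no assumption about the shape, at the cost of needing the actor document first.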
ok, so i can again confirm that the current full Activity passes successfully through our pipeline once it reaches `Federator.incoming_ap_doc`. Meaning on the upside things should just work™ once the signing issue is figured out, no additional lurking problems.
Some comments on the activity though:
- `cc` and `tag` are missing. This is not breaking parsing the document but means servers later fetching the object as part of thread expansion will see it differently than those receiving it through federation
- `to` and `cc` contain the public address; Akkoma treats this as a public post but it is conceivable this may confuse some implementations into thinking it’s unlisted/quiet-public/home whatever you want to call it
- `cc` contains the follower collection address of the person you're sending to; i don’t think you should do this (followers of person A want to get posts from A, not from whoever decides to use A’s following address) and Akkoma strips this out
Debugging the signing issue without being able to locally test things is hard, but here are some pointers which hopefully help you or whoever decides to pick this up, to determine the issue:
IO.puts("text text #{inspect(variable3)} more text")
mix.exs
with e.g.:
And also updated links for Bridgy’s signing code:
2b449c6d31/activitypub.py (L530)
Compliments for actually re-signing on redirects btw, most fedi software doesn't (necessitating workarounds to not break federation) albeit it’s definitely the right thing to do ;^^
On cursory look though (i might easily miss something here, sorry if that’s the case), doesn’t `from_user` get lost on redirects and there’s no protection against redirect loops stalling everything?
For a reply, this is correct imo. It's true that most micro blogging software on fedi doesn't do this, but the spec actually asks you to address the followers collection of who you reply to[1], so bridgy is the one doing it correctly.
If people on the target server don't want to see these, then it's up to that server to handle that[2]. (Which Akkoma apparently does by stripping it out, so OK.)
See https://www.w3.org/TR/activitypub/#inbox-forwarding
[1]
[2]
Thanks for clarifying, TIL!
One more remark i forgot to add before:
The ActivityStreams profile of JSON-LD restricts it such that no JSON-LD processing is required to parse a document, but it’s still a JSON-LD document and yes not specifying all used contexts can lead to interop issues with implementations which do employ JSON-LD processing (not Akkoma but see e.g. #717). The ActivityStream 2.0 spec also mentions this:
@Oneric thank you for confirming that! And for all the info and sleuthing, I really appreciate it. Sounds like `https://w3id.org/security/v1` missing from `@context` is our clearest lead so far, I'll add that and try again and keep you all posted.
Huh, this is news to me, I'll definitely look.
Good questions! Our GETs should all be signed by the instance actor, and POSTs shouldn't follow redirects, so in practice this shouldn't matter, but it'd be good hygiene to still pass `from_user`, agreed. I'll do that.
And you're right, I don't explicitly protect against redirect loops, I let it fall through to the overall request handling deadline. Not a high priority in my threat model, but couldn't hurt to add. Thanks!
Actually, now that I think about it, our outbound activities like #438 (comment) don't include `https://w3id.org/security/v1` in `@context` because they don't use anything from that context. Only actors do, in `publicKey` etc, and Bridgy Fed's actors do include it, eg https://fed.brid.gy/snarfed.org .
I'm still happy to add it if it'll fix things on the Akkoma side, but now I suspect that it won't...
it won’t fix federation with Akkoma, since Akkoma doesn’t do JSON-LD processing; including it for actors might fix federation with some other servers though (similarly for other Mastodon etc extensions)
Yup, thanks. Our actors do include it. So, the sig incompatibility needs more investigation. OK! Thanks again for the links and info. Not my top priority right now, but I'll circle back and dig into this again sooner or later.
4457928e32 sounded likely, but I know you said that wasn't it. I'll also re-plug https://swicg.github.io/activitypub-http-signature/#how-to-obtain-a-signature-s-public-key as maybe the closest thing we have to an authoritative process for fetching a key and verifying a signature in the fediverse. 🤷
it seems unlikely the problem is matching key to actor, else (in recent'ish) akkoma errors would already be logged when fetching the actor, but i didn’t see any such errors
@snarfed if you could provide a full request, including all headers exactly as sent it might also be possible to tell more or maybe even reproduce this without a real local bridgy instance.
If it’s not easily possible to directly grab outgoing requests from Bridgy itself, temporarily setting up an instance of ilja’s ap_logger may be helpful
@Oneric sure, thank you for the offer! Here's a like I sent for https://akko.wtf/notice/AjAjPS5ikUpMpWGqP2 just now:
huh, turning this into a curl request for my local dev instance running current develop (`3ff0f46b9f`) and adding some logging print shows... the signature is successfully verified?
It then still errors later because the localhost user doesn’t show up in recipients (altering the body predictably leads to sig failures due to digest change), but iirc we tested injecting some `Create`s before and even if, such later failures shouldn’t yield a signature error response...
Note: i manually fetched the `https://fed.brid.gy/snarfed.org` actor before, but since brid.gy doesn’t appear to require auth fetches for actors, this doesn’t seem like a bootstrapping issue either
`MIX_ENV=prod` or the default `dev` env doesn’t appear to make a difference
The tested versions should match what’s deployed on akko.wtf apart from some doc changes
Are you sure this really exactly matches what other servers receive; Does some intermediate framework or proxy maybe alter something?
curl request for local dev instance
Extra logging diff
Debug log output
EDIT: Perhaps worth noting i replaced some double backslashes `\\` in the originally posted request with a single backslash `\` each, else it wasn’t valid JSON (and gave a different error). I’m assuming this is just an artefact of whatever this was copied from
Thanks again for looking at this!
Good question. It's possible that the hosting platform I'm on (Google App Engine) adds headers - which shouldn't change sig verification, right? - but I'm pretty sure it doesn't modify existing headers or the body.
Good catch, thanks, sorry about that!
...actually, this request is sent over TLS, and I think my stack does the encryption and constructs the HTTP request, so App Engine shouldn't be able to alter or add anything.
Can you set up a temporary ap_logger (see previous messages) instance? This way you’ll be able to collect the actual request representation on the receiving end and compare it to what you expected
Sure! I've attached an ap_logger log of another inbox delivery of a `Like`. Looks like it's missing one header that the previous request had, `Connection`. I'm guessing that's HTTP 2 vs 1.1, or network path specific or something similar.
Capitalisation of Header names is also different though i’m not sure if that’s due to ap_logger's http lib
Do the contents of all other headers, e.g. `Date`, also match what you expected/saw on the sender side? Ideally we’d get the same request logged from both the sender and receiver side
Here's my record of that request from the sender side. Apart from header capitalization, looks like everything's the same, including `Date`.
Thanks! Unfortunately this didn’t bring me any closer to figuring out what’s going on though.
I checked again and regardless of what happens before, Phoenix/Plug will always lowercase header names anyway.
A tweaked request version still passes signature verification like before.
Even enabling `authorized_fetch` and putting (plain-HTTP) nginx in front of my local test server doesn’t change anything about signature verification
To check i also sent the same curl request to real akko.wtf, but unlike on localhost it fails with a "no signature error" (which might also happen if the signature was invalid and auth-fetch is enabled iiuc).
Since this error apparently persists across all Akkoma instances it also doesn’t seem to be a misconfiguration specific to akko.wtf. Perhaps try interacting with an Akkoma instance which does not require authfetch to get another data point, e.g.: https://fedi.absturztau.be
A `curl`ified full, signed request is quite helpful, but I’m afraid further debugging needs direct access to an affected live server setup which I don’t have
For whatever it's worth I've had AF off when trying to use the bridge so I'm not sure it's AF-related 🤷🏻♀️
I successfully sent a `Like` to https://fedi.absturztau.be/notice/AjzA414GNoB0Wh1g4u via Bridgy Fed just now!
So...maybe it's AF after all?
That's great! Maybe something else has changed because I've had AF off for quite a while. My instance name is Idiomdrottning.
I'd love to get bridged https://bsky.app/profile/Sandra.idiomdrottning.org.ap.brid.gy still shows 0
Ah, looks like you hit one of BF's spam filters, it currently expects that your display name is different from your username: https://fed.brid.gy/docs#troubleshooting
That requirement probably has too many false positives, I should remove it!
Done! @snan try unfollowing and re-following @bsky.brid.gy@bsky.brid.gy
@Oneric, not sure where this leaves us. AF seems like maybe a lead, but if so, I don't understand why. BF is happily fetching objects from akko.wtf (eg https://akko.wtf/objects/e60cda8e-17fb-4b99-b732-fb8a6553d7b2 ) with GETs signed with `keyId=https://fed.brid.gy/fed.brid.gy#key`.
Here are some of the other data points so far afaict:
- `Like` to fedi.absturztau.be successfully
- `Like` to akko.wtf, it gets `HTTP 401 ... Request not signed` back
- `keyId` isn't the issue, which makes sense, since other big projects like Mastodon also have fragments in their `keyId`s
...so maybe AF is still our best lead, even if we don't understand why...?
🧎🏻♀️
I’ve just rechecked this and made sure the setting applied correctly, but as mentioned before the sample request still works for me locally even with authfetch enabled. Authfetch might be a necessary but apparently not the only component here.
There are two places reacting to `authorized_fetch_mode`
`ensure_http_signatures_plug` which ends up rejecting the request with a not quite accurate "no signature" message (same message for no and invalid signature)
(i know without authfetch domain blocks can't be reliably enforced on access, but idk why it isn’t even trying without authfetch)
iirc (server currently down due to power issues) akko.wtf at least doesn’t disclose a block for `fedi.brid.gy`, although bsky.brid.gy was (or maybe still is if lifting was forgotten) temporarily blocked due to technical issues early on.
ihatebeinga.live doesn’t disclose a block for any `*.brid.gy` domain yet was also affected.
Someone with direct admin access to an affected instance will need to take a closer look at this.
Possibly helpful extra debug logging patch
Still didn't fly 🤷🏻♀️ Only says "request sent".
Looks like you're now hitting this same Akkoma issue. Bridgy Fed sent both an `Accept` and a `Follow` from `@bsky.brid.gy@bsky.brid.gy` to https://idiomdrottning.org/users/Sandra/inbox , and it got `500 Server Error: Internal Server Error` responses for both.
@snarfed That's great, good to know that there's not any AF or username issues and that it's instead this mysterious issue. Thank you so much for all your hard work on this bridge. 🙏🏻
For what it's worth, I just ran into a very similar issue with my own AP software.
What I discovered is that akkoma makes a request to the actor's key url with an `Accept` header of `application/ld+json; profile="https://www.w3.org/ns/activitystreams"` - if the `Content-Type` of the response is not exactly this value, (at least, it needs to contain the base content type `application/ld+json` and the profile), the signature is rejected regardless of the response content.
This is required by ActivityPub spec; using any Accept header which doesn’t contain this content type to request AP objects is violating spec.
Note *oma historically used a different header which is also tolerated by most implementations and this issue was filed before the header was fixed.
I believe this is incorrect. All of the following return content types are accepted:
- `application/ld+json` with the AS profile (atm only if AS is the sole profile; after #814 multiple profiles are allowed if AS is one of them)
Spec requires servers to present their AP objects with this Content-Type; if you don’t you’re breaking spec
- `application/activity+json` which the spec suggests but doesn’t require to be accepted. Notably Mastodon uses this type for its objects
In particular accepting JSON-LD without the AS profile is out of question since
a. we are not able to do JSON-LD processing and rely on the representation guarantees given by the AS profile
b. JSON-LD is used outside of ActivityPub so the returned object might not even be intended to be a valid AP object but be e.g. a user uploaded file. We cannot accept this for security reasons
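The acceptance rules listed above can be written down as a small check. The following Python sketch encodes only what this comment describes (the "sole profile" rule as of writing); the function names are made up for this example, the parameter parsing is deliberately naive (it does not handle `;` inside quoted values), and none of it is Akkoma's actual code.

```python
AS_PROFILE = "https://www.w3.org/ns/activitystreams"


def parse_content_type(value):
    """Split a Content-Type value into (media type, parameters dict)."""
    media_type, _, rest = value.partition(";")
    params = {}
    for part in rest.split(";"):
        name, sep, val = part.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


def is_acceptable_ap_content_type(value):
    """True for application/activity+json, or for application/ld+json whose
    profile parameter is exactly the ActivityStreams profile."""
    media_type, params = parse_content_type(value)
    if media_type == "application/activity+json":
        return True
    if media_type == "application/ld+json":
        # A profile parameter may list several space-separated URIs;
        # per the description above, only a sole AS profile passes.
        return params.get("profile", "").split() == [AS_PROFILE]
    return False


for ct in (
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
    "application/activity+json",
    "application/ld+json",
):
    print(ct, "->", is_acceptable_ap_content_type(ct))
```

Under these rules plain `application/ld+json` without a profile is rejected, which matches reasons a. and b. above.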
It may be incorrect, but it's not true - previously, I was returning content type `application/activity+json`, and in the process of debugging I also tried returning `application/ld+json` with no profile, both of which are rejected by akkoma.
(I'm fixing this in my software, and I'm not implying that akkoma is behaving incorrectly here - I just made this note to maybe help debugging the bridgy issue, since I came across this while trying to debug mine)
`application/activity+json` is explicitly accepted in the code and Mastodon returns `application/activity+json` while working fine with Akkoma. I’m fairly sure that, while incorrect per spec, declaring AP objects as `application/activity+json` doesn't break anything wrt federation with Akkoma.