WIP: Extract keys to their own table, match keyID #816

Draft
floatingghost wants to merge 8 commits from keys-extraction into develop

removes the flimsy and generally awful inherited system of assuming key IDs must be their user's AP ID with a suffix. keeping up with these suffixes was a pain, to put it mildly

what this does:

  • creates a new table, `signing_keys` - a User has one Signing Key
  • `signing_keys` contains a mapping between `user_id` and `key_id`, along with the associated PEM(s)
  • precalculates the public keys of local users as well
  • when we get a signed request, see if we've already got the key (rough sketch after this list)
  • if not, request the URL! this will implicitly create a user if we don't already have it
  • this ALSO assumes that we will only get back a minimal key object, so we will then try to dereference from there to the full user
  • returns a minimal user key JSON if a local user is requested via an unsigned request
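
For anyone skimming: a minimal sketch of the table and the lookup half of that flow. The module and function names here are illustrative placeholders, not necessarily what this branch actually uses; the columns mirror the `signing_keys` migration further down.

```elixir
# Illustrative sketch only - module/function names are placeholders,
# the columns mirror the signing_keys migration in this PR.
defmodule Pleroma.User.SigningKey do
  use Ecto.Schema

  alias Pleroma.Repo

  @primary_key {:key_id, :string, autogenerate: false}
  schema "signing_keys" do
    field(:user_id, Ecto.UUID)
    field(:public_key, :string)
    field(:private_key, :string)
    timestamps()
  end

  # signed request comes in: use the stored key if we already have it;
  # on :not_found the caller falls back to fetching the key_id URL,
  # which implicitly creates the user if we don't know them yet
  def get_by_key_id(key_id) do
    case Repo.get(__MODULE__, key_id) do
      nil -> {:error, :not_found}
      %__MODULE__{} = key -> {:ok, key}
    end
  end
end
```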

i have set up a testing instance and made sure it works with all major implementations

for the future:

  • possibility of one user, many keys?
  • can we get away with not preloading as often??? it's annoying
  • is there a better way to handle tests?

Currently 6 tests fail, but the broad scheme works
i will ensure the last 6 work before removing the wip tag

floatingghost added 7 commits 2024-06-30 04:08:12 +00:00
Fix some tests
4f9f16587b
Fix about a million tests
5acceec133
floatingghost reviewed 2024-06-30 04:09:10 +00:00
@ -339,7 +339,6 @@ defmodule Pleroma.Object.Fetcher do
final_url
end
@doc "Do NOT use; only public for use in tests"
Author
Owner

for anyone seeing this: this was already used everywhere...

Member

grepping current develop 3ff0f46b9f for `get_object($|[^A-Za-z_])` in `lib/` only yields:

  • this definition
  • one use inside `fetch_and_contain_remote_object_from_id` above
  • definitions of an unrelated `get_object` in `Pleroma.Object.Containment`

Changing its definition to `defp` also doesn’t lead to any errors in `mix compile`, only a warning about a doc string for a private function

Where are those current uses of safety-checks-bypassing `get_object` and should they be changed to `fetch_and_contain_remote_object_from_id`?

Author
Owner

very fair point, it would probably be better to use the contain function

Member

The (afaict only one) usage of `get_object` added in this PR, in `fetch_remote_key`, can't use the normal containment checks since keys don't follow the usual AP semantics; they're a subproperty of the whole object, so their ID won't directly match the URL. Even non-fragment keys, e.g. GTS, still provide the real key as a `publicKey` subproperty and use the owning actor’s ID for the root object.
A key-specific containment fetch might help avoid the spoofed key IDs you mentioned in the comment below; haven't looked deeper into it yet though (e.g. if the owner equals the key ID without its fragment, or the key ID and owner are on the same domain, consider it valid; otherwise also fetch the owner and check that it links back to the key ID)

If there really are pre-existing `get_object` uses outside of tests and the fetcher itself somewhere, they likely should be switched over
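
For illustration, a sketch of what such a key-specific containment check could look like. The function name and the exact acceptance rules here are assumptions, not anything from this PR:

```elixir
defmodule KeyContainmentSketch do
  # Illustrative only; acceptance rules are assumptions, not code from this PR.
  def contain_key_id(key_id, %{"owner" => owner}) when is_binary(owner) do
    key_uri = URI.parse(key_id)
    owner_uri = URI.parse(owner)

    cond do
      # owner equals the key id once the fragment is stripped
      %{key_uri | fragment: nil} == %{owner_uri | fragment: nil} ->
        :ok

      # same origin: plausible; a stricter variant would additionally fetch
      # the owner and check that it links back to this key id
      key_uri.scheme == owner_uri.scheme and key_uri.host == owner_uri.host ->
        :ok

      true ->
        {:error, :key_id_containment}
    end
  end

  def contain_key_id(_key_id, _data), do: {:error, :key_id_containment}
end
```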

floatingghost reviewed 2024-06-30 04:13:26 +00:00
@ -0,0 +197,4 @@
key_id: key_id
}
Repo.insert(key, on_conflict: :replace_all, conflict_target: :key_id)
Author
Owner

hm. now i read this back, i don't like the potential for spoofed key IDs. the uniq check should probably be a user_id/key_id pair?

Member

Even with `user_id`+`key_id`, key spoofing is possible because we only read data from the HTTP response body in `handle_signature_response` and use `Fetcher.get_object`, which doesn't verify the authenticity of its data at all; that’s why its comment said `Do NOT use`.
E.g. an attacker might trigger a lookup of key id `https://evil.com/spoofed#key`, but the served response then poses as `https://fluffy.example/users/fox#main-key`, belonging to another user.

If I understand correctly, on failure to verify we’ll [refetch the key](https://akkoma.dev/AkkomaGang/http_signatures/src/commit/d44c43d66758c6a73eaa4da9cffdbee0c5da44ae/lib/http_signatures/http_signatures.ex#L51) in case it was recently rotated. Until such a refetch gets triggered though, an attacker should be able to impersonate whomever’s key they spoofed.

Instead the received data needs to be checked similarly to other AP fetches. As mentioned in the other thread, URL comparisons can’t work quite the same way due to key ids not following the usual AP semantics. The checks described in the other thread would probably work, but i think it’ll be cleaner to switch to a "refetch once if id differs" approach in `fetch_and_contain`; we’ll end up fetching the user object then (even for non-fragment keys) and can use the top-level id directly as `user_id` and extract its `publicKey`. This also lets us share the strict-id logic between normal AP objects and keys.

A `user_id`+`key_id` composite primary key would allow shared keys to work if anyone uses them. But to avoid storing key data over and over again, a separate `_user_key` join table would probably be better. Considering such shared keys don’t work with our old key logic either though, i think it’s safe to ignore them for now, just like “multiple keys per user”.
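
For illustration, a very rough sketch of the "refetch once if id differs" idea. The names here are assumptions and this is not the code from this PR or from #819:

```elixir
defmodule RefetchOnceSketch do
  # Illustrative only; `fetch_json` stands in for whatever performs the
  # signed fetch and returns {:ok, map} on success.
  def fetch_and_contain(id, fetch_json, refetched? \\ false) do
    with {:ok, data} <- fetch_json.(id) do
      cond do
        # strict id check passes: use the object as-is
        data["id"] == id ->
          {:ok, data}

        # a key id usually resolves to its owning actor, whose top-level id
        # differs from the key id: refetch once at the claimed id; afterwards
        # the actor's own id becomes user_id and its publicKey gets extracted
        not refetched? and is_binary(data["id"]) ->
          fetch_and_contain(data["id"], fetch_json, true)

        true ->
          {:error, :id_mismatch}
      end
    end
  end
end
```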

Member

> but i think it’ll be cleaner to switch to a "refetch once if id differs" approach in `fetch_and_contain`

done in #819

Oneric reviewed 2024-07-06 01:02:52 +00:00
Oneric left a comment
Member

I’m concerned about the spoofing potential from unchecked `get_object`, but everything else is just nits

@ -19,3 +14,1 @@
|> Map.put(:fragment, nil)
|> Map.put(:query, nil)
|> remove_suffix(@known_suffixes)
# Given the key ID, first attempt to look it up in the signing keys table.
Member

comment doesn't match the visible code beneath it

The comment implies the current function will fall back to something else if the db lookup fails, but it only attempts one function call

@ -224,1 +224,4 @@
# FOR THE FUTURE: We might want to make this a one-to-many relationship
# it's entirely possible right now, but we don't have a use case for it
has_one(:signing_key, SigningKey, foreign_key: :user_id)
Member

this new `has_one` relationship is added, but the old `public_key` field isn’t removed yet

@ -0,0 +6,4 @@
add :user_id, references(:users, type: :uuid, on_delete: :delete_all)
add :key_id, :text, primary_key: true
add :public_key, :text
add :private_key, :text
Member

Might `:binary` be more efficient, to not have to reparse the text representation each time we actually verify/sign something? On the other hand, for local users, serving the text representation via AP fetches might be more common than actually using the key...
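
For reference, roughly the difference in question, using Erlang's :public_key (a sketch; the actual code paths on this branch may differ):

```elixir
defmodule KeyCodecSketch do
  # Sketch only. With a :text column every verify/sign re-parses the PEM first:
  def from_pem(pem_text) do
    [entry] = :public_key.pem_decode(pem_text)
    :public_key.pem_entry_decode(entry)
  end

  # With a :binary column the DER bytes could be stored once at insert time
  # and decoded without any PEM handling on each use:
  def to_der(rsa_public_key), do: :public_key.der_encode(:RSAPublicKey, rsa_public_key)
  def from_der(der_bytes), do: :public_key.der_decode(:RSAPublicKey, der_bytes)
end
```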

@ -0,0 +10,4 @@
timestamps()
end
create unique_index(:signing_keys, [:key_id])
Member

`key_id` already is the primary key; manually creating an index here just duplicates the already existing unique primary key index
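
i.e. with Ecto the column-level primary key already comes with its own unique index, so the migration could drop the extra `unique_index` entirely. A sketch (the migration module name here is made up):

```elixir
defmodule Pleroma.Repo.Migrations.CreateSigningKeys do
  use Ecto.Migration

  # Sketch: key_id's primary key already enforces uniqueness via its own
  # index, so no separate unique_index(:signing_keys, [:key_id]) is needed.
  def change do
    create table(:signing_keys, primary_key: false) do
      add :user_id, references(:users, type: :uuid, on_delete: :delete_all)
      add :key_id, :text, primary_key: true
      add :public_key, :text
      add :private_key, :text

      timestamps()
    end
  end
end
```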

Member

Before I forget again:

> returns a minimal user key json if a local user is requested via unsigned request

03662501c3 already added a first building block for solving the related federation-bootstrapping issue ~2 years ago by setting `limited_ap` in `mapped_signature_to_identity_plug` for user routes **if** the signature is valid but can’t be mapped to a user identity. However `limited_ap` is never checked afterwards and idk if `valid_signature` can even end up as `true` if identity mapping fails.

This bit should be removed now that a more comprehensive solution is in place to avoid confusion.

floatingghost added 1 commit 2024-08-20 10:05:24 +00:00
mix format
18370767c2
This pull request has changes conflicting with the target branch.
  • mix.lock

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin keys-extraction:keys-extraction
git checkout keys-extraction