Commit graph

9337 commits

Author SHA1 Message Date
225f87ad62 Also allow limiting the initial prune_object
May sometimes be helpful to get more predictable runtime
than just with an age-based limit.

The subquery for the non-keep-threads path is required
since delte_all does not directly accept limit().

Again most of the diff is just adjusting indentation, best
hide whitespace-only changes with git diff -w or similar.
2024-05-31 17:16:40 +02:00
e64f031167 Log number of deleted rows in prune_orphaned_activities
This gives feedback when to stop rerunning limited batches.

Most of the diff is just adjusting indentation; best reviewed
with whitespace-only changes hidden, e.g. `git diff -w`.
2024-05-31 17:16:40 +02:00
fa52093bac Add standalone prune_orphaned_activities CLI task
This part of pruning can be very expensive and bog down the whole
instance to an unusable sate for a long time. It can thus be desireable
to split it from prune_objects and run it on its own in smaller limited batches.

If the batches are smaller enough and spaced out a bit, it may even be possible
to avoid any downtime. If not, the limit can still help to at least make the
downtime duration somewhat more predictable.
2024-05-31 17:16:40 +02:00
3126d15ffc refactor: move prune_orphaned_activities into own function
No logic changes. Preparation for standalone orphan pruning.
2024-05-31 17:16:39 +02:00
8f97c15b07 Merge pull request 'Preserve Meilisearch’s result ranking' (#772) from Oneric/akkoma:search-meili-order into develop
Reviewed-on: AkkomaGang/akkoma#772
2024-05-31 14:12:05 +00:00
3af0c53a86 use proper workers for fetching pins instead of an ad-hoc task (#788)
Reviewed-on: AkkomaGang/akkoma#788
Co-authored-by: Floatingghost <hannah@coffee-and-dreams.uk>
Co-committed-by: Floatingghost <hannah@coffee-and-dreams.uk>
2024-05-31 08:58:52 +00:00
fc7e07f424 meilisearch: enable using search_key
Using only the admin key works as well currently
and Akkoma needs to know the admin key to be able
to add new entries etc. However the Meilisearch
key descriptions suggest the admin key is not
supposed to be used for searches, so let’s not.

For compatibility with existings configs, search_key remains optional.
2024-05-29 23:17:27 +00:00
59685e25d2 meilisearch: show keys by name not description
This makes show-key’s output match our documentation as of Meilisearch
1.8.0-8-g4d5971f343c00d45c11ef0cfb6f61e83a8508208. Since I’m not sure
if older versions maybe only provided description, it will fallback to
the latter if no name parameter exists.
2024-05-29 23:17:27 +00:00
65aeaefa41 meilisearch: respect meili’s result ranking
Meilisearch is already configured to return results sorted by a
particular ranking configured in the meilisearch CLI task.
Resorting the returned top results by date partially negates this and
runs counter to what someone with tweaked settings expects.

Issue and fix identified by AdamK2003 in
AkkomaGang/akkoma#579
But instead of using a O(n^2) resorting, this commit directly
retrieves results in the correct order from the database.

Closes: AkkomaGang/akkoma#579
2024-05-29 23:17:27 +00:00
5d6cb6a459 meilisearch: remove duplicate preload 2024-05-29 23:17:27 +00:00
5bdef8c724 Merge pull request 'Allow for attachment to be a single object in user data' (#783) from single-attachment into develop
Reviewed-on: AkkomaGang/akkoma#783
2024-05-27 01:44:53 +00:00
f15eded3e1 Add extra test case for nonsense field, increase timeouts 2024-05-27 02:09:48 +01:00
da67e69af5 Allow for attachment to be a single object in user data 2024-05-26 17:09:26 +01:00
c2d3221be3 Fix Exiftool stderr being read as an image description
Fixes: AkkomaGang/akkoma#773
2024-05-23 14:44:17 -04:00
b72127b45a Merge remote-tracking branch 'oneric-sec/media-owner' into develop 2024-05-22 19:36:10 +01:00
9a91299f96 Don't try to handle non-media objects as media
Trying to display non-media as media crashed the renderer,
but when posting a status with a valid, non-media object id
the post was still created, but then crashed e.g. timeline rendering.
It also crashed C2S inbox reads, so this could not be used to leak
private posts.
2024-05-22 20:30:23 +02:00
fbd961c747 Drop activity_type override for uploads
Afaict this was never used, but keeping this (in theory) possible
hinders detecting which objects are actually media uploads and
which proper ActivityPub objects.

It was originally added as part of upload support itself in
02d3dc6869 without being used
and `git log -S:activity_type` and `git log -Sactivity_type:`
don't find any other commits using this.
2024-05-22 20:30:23 +02:00
0c2b33458d Restrict media usage to owners
In Mastodon media can only be used by owners and only be associated with
a single post. We currently allow media to be associated with several
posts and until now did not limit their usage in posts to media owners.
However, media update and GET lookup was already limited to owners.
(In accordance with allowing media reuse, we also still allow GET
lookups of media already used in a post unlike Mastodon)

Allowing reuse isn’t problematic per se, but allowing use by non-owners
can be problematic if media ids of private-scoped posts can be guessed
since creating a new post with this media id will reveal the uploaded
file content and alt text.
Given media ids are currently just part of a sequentieal series shared
with some other objects, guessing media ids is with some persistence
indeed feasible.

E.g. sampline some public media ids from a real-world
instance with 112 total and 61 monthly-active users:

  17.465.096  at  t0
  17.472.673  at  t1 = t0 + 4h
  17.473.248  at  t2 = t1 + 20min

This gives about 30 new ids per minute of which most won't be
local media but remote and local posts, poll answers etc.
Assuming the default ratelimit of 15 post actions per 10s, scraping all
media for the 4h interval takes about 84 minutes and scraping the 20min
range mere 6.3 minutes. (Until the preceding commit, post updates were
not rate limited at all, allowing even faster scraping.)
If an attacker can infer (e.g. via reply to a follower-only post not
accessbile to the attacker) some sensitive information was uploaded
during a specific time interval and has some pointers regarding the
nature of the information, identifying the specific upload out of all
scraped media for this timerange is not impossible.

Thus restrict media usage to owners.

Checking ownership just in ActivitDraft would already be sufficient,
since when a scheduled status actually gets posted it goes through
ActivityDraft again, but would erroneously return a success status
when scheduling an illegal post.

Independently discovered and fixed by mint in Pleroma
1afde067b1
2024-05-22 20:30:18 +02:00
marcin mikołajczak
3a21293970 Fix tests
Signed-off-by: marcin mikołajczak <git@mkljczk.pl>
2024-05-22 19:27:31 +01:00
marcin mikołajczak
0d66237205 Fix validate_webfinger when running a different domain for Webfinger
Signed-off-by: marcin mikołajczak <git@mkljczk.pl>
2024-05-22 19:20:02 +01:00
6ef6b2a289 Apply rate limits to status updates 2024-05-22 20:18:08 +02:00
94e9c8f48a Purge unused media description update on post
In MastoAPI media descriptions are updated via the
media update API not upon post creation or post update.

This functionality was originally added about 6 years ago in
ba93396649 which was part of
https://git.pleroma.social/pleroma/pleroma/-/merge_requests/626 and
https://git.pleroma.social/pleroma/pleroma-fe/-/merge_requests/450.
They introduced image descriptions to the front- and backend,
but predate adoption of Mastodon API.

For a while adding an `descriptions` array on post creation might have
continued to work as an undocumented Pleroma extension to Masto API, but
at latest when OpenAPI specs were added for those endpoints four years
ago in 7803a85d2c, these codepaths ceased
to be used. The API specs don’t list a `descriptions` parameter and
any unknown parameters are stripped out.

The attachments_from_ids function is only called from
ScheduledActivity and ActivityDraft.create with the latter
only being called by CommonAPI.{post,update} whihc in turn
are only called from ScheduledActivity again, MastoAPI controller
and without any attachment or description parameter WelcomeMessage.
Therefore no codepath can contain a descriptions parameter.
2024-05-22 20:18:08 +02:00
873aa9da1c activity_draft: mark new/2 as private 2024-05-22 20:18:08 +02:00
34a48cb87f scheduled_activity: mark private functions as private
And remove unused due_activities/1
2024-05-22 20:18:08 +02:00
Alex Gleason
a953b1d927 Prevent spoofing webfinger 2024-05-22 19:08:37 +01:00
76ded10a70 Merge pull request 'Backoff on HTTP requests when 429 is recieved' (#762) from backoff-http into develop
Reviewed-on: AkkomaGang/akkoma#762
2024-05-11 04:38:47 +00:00
4457928e32 duct-tape fix for #438
we really need to make this less manual
2024-05-11 05:30:18 +01:00
bd74693db6 additionally support retry-after values 2024-05-06 23:34:48 +01:00
010e8c7bb2 where were you when lint fail 2024-04-26 19:28:01 +01:00
f531484063 Merge branch 'develop' into backoff-http 2024-04-26 19:06:18 +01:00
ec7e9da734 Correct ttl syntax for new cachex 2024-04-26 19:05:12 +01:00
3c384c1b76 Add ratelimit backoff to HTTP get 2024-04-26 19:01:12 +01:00
2437a3e9ba add test for backoff 2024-04-26 19:01:01 +01:00
ad7dcf38a8 Add HTTP backoff cache to respect 429s 2024-04-26 19:00:35 +01:00
828158ef49 Merge remote-tracking branch 'oneric/fedfix-public-ld' into develop 2024-04-26 18:49:31 +01:00
5ee0fb18cb exiftool: make stripped tags configurable 2024-04-26 18:57:24 +02:00
a95af3ee4c exiftool: strip all non-essential tags
Documentation was already clear on this only stripping GPS tags.
But there are more potentially sensitive metadata tags (e.g. author
and possibly description) and the name alone suggests a broader effect.

Thus change the filter to strip all metadata except for colourspace info
and orientation (technically it strips everything and then readds
selected tags).

Explicitly stripping CommonIFD0 is needed since -all does not modify
IFD0 due to TIFF storing some actual image data there. CommonIFD0 then
strips a bunch of commonly used actual metadata tags from IFD0, to my
understanding leaving TIFF image data and custom metadata tags intact.
2024-04-25 23:00:42 +02:00
163cb1d5e0 exiftool: strip JXL and HEIC
As of exiftool 12.57 both formats are supported, but EXIF data is
optional for JXL and if exiftool doesn’t find a preexisting metadata
chunk it will create one and treat it as a minor error resulting in
a non-zero exit code.
Setting -ignoreMinorErrors avoids failing on such uploads.
2024-04-25 23:00:42 +02:00
b0a46c1e2e Normalise public adressing to fix federation
Due to JSON-LD compaction the full address of public scope
may also occur in shorter forms and the spec requires us to treat them
all equivalently. To save us the pain of repeatedly checking for all
variants internally, normalise inbound data to just one form.
See note at: https://www.w3.org/TR/activitypub/#public-addressing

This needs to happen very early, even before the other addressing fixes
else an earlier validator will reject the object. This in turn required
to move the list-tpye normalisation earlier as well, but since I was
unsure about putting empty lists into the data when no such field
existed before, I excluded this case and thus the later fixing had to be
kept as well.

Fixes: AkkomaGang/akkoma#670
2024-04-25 18:45:16 +02:00
b1c6621e66 Merge pull request 'Read image description from EXIF data' (#744) from timorl/akkoma:elseinspe into develop
Reviewed-on: AkkomaGang/akkoma#744
2024-04-25 12:52:31 +00:00
764dbeded4 Merge pull request 'Accept all standard actor types' (#751) from Oneric/akkoma:all-actor-types into develop
Reviewed-on: AkkomaGang/akkoma#751
2024-04-24 17:09:02 +00:00
80e1c094c7 Merge pull request 'Don't strip newlines in pre' (#709) from snan/akkoma:pre into develop
Reviewed-on: AkkomaGang/akkoma#709
2024-04-24 17:00:34 +00:00
4a0e90e8a8 Merge pull request 'ReceiverWorker: Make sure non-{:ok, _} is returned as {:error, …}' (#753) from Oneric/akkoma:receive-worker-return into develop
Reviewed-on: AkkomaGang/akkoma#753
2024-04-24 17:00:18 +00:00
83f75c3e93 Accept all standard actor types 2024-04-23 18:14:34 +02:00
92168fa5a1 Merge remote-tracking branch 'origin/develop' into who-wants-to-yeet-c2s-i-want-to-yeet-c2s 2024-04-23 14:37:05 +01:00
3e199242b0 remove upload_media from AP representation 2024-04-23 14:35:52 +01:00
Haelwenn (lanodan) Monnier
0c2f200b4d ReceiverWorker: Make sure non-{:ok, _} is returned as {:error, …}
Otherwise an error like `{:signature, {:error, {:error, :not_found}}}`
ends up considered a success.

Cherry-picked-from: a299ddb10e
2024-04-21 20:58:06 +02:00
timorl
09d3ccf770
Read description before stripping metadata 2024-04-19 20:51:54 +02:00
timorl
9da0fe930e
Format, but this time with a non-ancient version of elixir 2024-04-19 18:07:50 +02:00
timorl
2a9db73b4c
Merge branch 'develop' into elseinspe 2024-04-19 17:11:55 +02:00