WIP: fix attachment cleanup after media domain migration #818

Draft
Oneric wants to merge 1 commit from Oneric/akkoma:attachcleanup-uploadmigration into develop
Owner

Originally brought up as apart of #789 and depends on this PR

The current patch requires admins to keep a list of all past domains; this isn’t the most elegant solution, but if properly set works reliably and also reduces points of concern for a future™ migration of the whole upload db scheme (see #765; after which this fix becomes obsolete anyway)
(We could also add a mix task to verify all uploads match a listed base url)

An alternative not requiring so much admin action was suggested by FloatingGhost:

suggestion: best-effort in which we check for the same base domain

for example, we consider an attachment deletable if its base domain (x.(MATCHES.net)) is the same as our endpoint OR media URLs - which should account for both media migrated to s3 and people that just moved to a subdomain

This is probably fine for attachment cleanup, but missed mappings will be more severe for the aforementioned migration. Ofc, we could also just delay worrying about the migration until it actually happens (i’d prefer not to though)

Unless there’s a solution working both reliably and not needing so much manual admin action, this is pretty much a subjective weighting between different drawbacks, so i’ll just adapt the patch to match whatever the final decision yields

Originally brought up as apart of #789 and depends on this PR The current patch requires admins to keep a list of all past domains; this isn’t the most elegant solution, but if properly set works reliably and also reduces points of concern for a future™ migration of the whole upload db scheme (see #765; after which this fix becomes obsolete anyway) (We could also add a mix task to verify all uploads match a listed base url) An alternative not requiring so much admin action was suggested by FloatingGhost: > suggestion: best-effort in which we check for the same base domain > > for example, we consider an attachment deletable if its base domain (x.(MATCHES.net)) is the same as our endpoint OR media URLs - which should account for both media migrated to s3 and people that just moved to a subdomain This is probably fine for attachment cleanup, but missed mappings will be more severe for the aforementioned migration. Ofc, we could also just delay worrying about the migration until it actually happens (i’d prefer not to though) Unless there’s a solution working both reliably and not needing so much manual admin action, this is pretty much a subjective weighting between different drawbacks, so i’ll just adapt the patch to match whatever the final decision yields
this is #789
squahsed into a single commit
Allow deleting uploaded files after media migration
Some checks are pending
ci/woodpecker/pr/build-amd64 Pipeline is pending
ci/woodpecker/pr/build-arm64 Pipeline is pending
ci/woodpecker/pr/docs Pipeline is pending
ci/woodpecker/pr/lint Pipeline is pending
ci/woodpecker/pr/test Pipeline is pending
219fffa0c3
Until now only the current base_url was checked.
After a media domain migration all pre-existing files
thus turned undeletable. Fix this with a new config
option allowing to list all old media base urls.
(This may also come in handy for a later db refactor, see
 #765)
Contributor

I'd imagine one way to try to avoid the problem going forward is to record the domains used for each attachment in the database itself somehow. Won't solve past instances of course, but should help going forward.

I'd imagine one way to try to avoid the problem going forward is to record the domains used for each attachment in the database itself somehow. Won't solve past instances of course, but should help going forward.
Author
Owner

Actually looking at this again, even the curent version doesn't deal with domain changes fully correct. It will now kick off a deletion process even for old upload domain versions, but when checking if the attachment is still used by something else the respective URLs are compared without taking a potential base url difference into account. Meaning if file A was first uploaded and used under oldmedia.example.org and later again but now with img.example.org, if either post is deleted the uploaded file will be deleted too.
This should be fixable without too much hassle using trim_first_leading earlier already.

I'd imagine one way to try to avoid the problem going forward is to record the domains used for each attachment in the database itself somehow.

If fixed up similar to the above, i guess this would work about the same for deletions as the old-base-url list. The advantage being not having to test all old base urls, the disadvantages being not handling already existing uploads and not assisting with the intended future™ db schema migration.

In the future schema as it is planned now, only relative URLs (and possibly the uploader type) will be recorded in the DB, which makes things much easier to handle. For this to be possible though, we need to identify and strip the base url of all, even old, uploads during the migration.
I’m also not sure if it is worth it to tweak the upload db schema now, if a larger change is planned anyway which will add further complexity to the migration.

So I think if the fix involves a d migration, it should be something which already prepares and helps with the future schema change. If config changes like the one I proposed are also not acceptable, this only leaves some form of best-effort guess and the question is how to do that and the heuristic we arrive at should be good enough to also use during the schema migration in the future.

Actually looking at this again, even the curent version doesn't deal with domain changes fully correct. It will now kick off a deletion process even for old upload domain versions, but when checking if the attachment is still used by something else the respective URLs are compared without taking a potential base url difference into account. Meaning if file A was first uploaded and used under `oldmedia.example.org` and later again but now with `img.example.org`, if either post is deleted the uploaded file will be deleted too. This should be fixable without too much hassle using `trim_first_leading` earlier already. > I'd imagine one way to try to avoid the problem going forward is to record the domains used for each attachment in the database itself somehow. If fixed up similar to the above, i guess this would work about the same for deletions as the old-base-url list. The advantage being not having to test all old base urls, the disadvantages being not handling already existing uploads and not assisting with the intended future™ db schema migration. In the future schema as it is planned now, only relative URLs (and possibly the uploader type) will be recorded in the DB, which makes things much easier to handle. For this to be possible though, we need to identify and strip the base url of _all_, even old, uploads during the migration. I’m also not sure if it is worth it to tweak the upload db schema now, if a larger change is planned anyway which will add further complexity to the migration. So I think if the fix involves a d migration, it should be something which already prepares and helps with the future schema change. If config changes like the one I proposed are also not acceptable, this only leaves some form of best-effort guess and the question is how to do that and the heuristic we arrive at should be good enough to also use during the schema migration in the future.
Oneric force-pushed attachcleanup-uploadmigration from 219fffa0c3
Some checks are pending
ci/woodpecker/pr/build-amd64 Pipeline is pending
ci/woodpecker/pr/build-arm64 Pipeline is pending
ci/woodpecker/pr/docs Pipeline is pending
ci/woodpecker/pr/lint Pipeline is pending
ci/woodpecker/pr/test Pipeline is pending
to 9a57ab6459
Some checks are pending
ci/woodpecker/pr/build-amd64 Pipeline is pending approval
ci/woodpecker/pr/build-arm64 Pipeline is pending approval
ci/woodpecker/pr/docs Pipeline is pending approval
ci/woodpecker/pr/lint Pipeline is pending approval
ci/woodpecker/pr/test/1 Pipeline is pending approval
ci/woodpecker/pr/test/2 Pipeline is pending approval
2025-05-09 23:45:53 +00:00
Compare
Some checks are pending
ci/woodpecker/pr/build-amd64 Pipeline is pending approval
ci/woodpecker/pr/build-arm64 Pipeline is pending approval
ci/woodpecker/pr/docs Pipeline is pending approval
ci/woodpecker/pr/lint Pipeline is pending approval
ci/woodpecker/pr/test/1 Pipeline is pending approval
ci/woodpecker/pr/test/2 Pipeline is pending approval
This pull request has changes conflicting with the target branch.
  • CHANGELOG.md
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u attachcleanup-uploadmigration:Oneric-attachcleanup-uploadmigration
git switch Oneric-attachcleanup-uploadmigration
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
AkkomaGang/akkoma!818
No description provided.