akkoma/docs
ilja 7695010268
Prune Objects --keep-threads option (#350)
This adds an option to the prune_objects mix task.
The original way deleted all non-local public posts older than a certain time frame.
Here we add a different query which you can call using the option --keep-threads.

From the activities table we query all context ids where
    1. the newest activity with this context is still old
    2. none of the activities with this context is local
    3. none of the activities with this context is bookmarked
and delete all objects with these contexts.
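
To make the three conditions concrete, here is a minimal in-memory sketch in Python. It is not the query the mix task runs. The `Activity` class, its field names and the `prunable_contexts` helper are made up for illustration, and having a `bookmarked` flag on the activity itself is a simplification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Activity:
    # Hypothetical, simplified stand-in for a row in the activities table.
    context: str          # the thread this activity belongs to
    local: bool           # created by a local user (post, reply, like, repeat...)
    bookmarked: bool      # simplification: bookmark state kept on the activity
    updated_at: datetime


def prunable_contexts(activities, cutoff):
    """Return the contexts whose objects may be deleted.

    A context qualifies when its newest activity is older than `cutoff`
    and none of its activities is local or bookmarked.
    """
    newest = {}   # context -> timestamp of its newest activity
    keep = set()  # contexts with at least one local or bookmarked activity
    for a in activities:
        if a.context not in newest or a.updated_at > newest[a.context]:
            newest[a.context] = a.updated_at
        if a.local or a.bookmarked:
            keep.add(a.context)
    return {c for c, ts in newest.items() if ts < cutoff and c not in keep}


now = datetime(2023, 1, 9)
cutoff = now - timedelta(days=90)
activities = [
    # thread-a: only old remote activities, so it can be pruned
    Activity("thread-a", False, False, now - timedelta(days=200)),
    Activity("thread-a", False, False, now - timedelta(days=100)),
    # thread-b: old, but has a local activity, so the whole thread is kept
    Activity("thread-b", False, False, now - timedelta(days=200)),
    Activity("thread-b", True, False, now - timedelta(days=150)),
    # thread-c: old, but bookmarked
    Activity("thread-c", False, True, now - timedelta(days=300)),
    # thread-d: its newest activity is recent
    Activity("thread-d", False, False, now - timedelta(days=200)),
    Activity("thread-d", False, False, now - timedelta(days=1)),
]

result = prunable_contexts(activities, cutoff)
print(sorted(result))  # → ['thread-a']
```

Only `thread-a` is selected: one local or bookmarked activity, or one recent activity, is enough to keep every object in that thread.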

The idea is that posts with local activities (posts, replies, likes, repeats...) may be interesting to keep.
Besides that, a post lives in a certain context (the thread), so we keep the whole thread as well.

Caveats:
* ~~Quotes have a different context. Therefore, when someone quotes a post, it's possible the quoted post will still be deleted.~~ fixed in #379
* Although undocumented (in docs/docs/administration/CLI_tasks/database.md/#prune-old-remote-posts-from-the-database), the 'normal' delete action still kept old remote non-public posts. I added an option to keep this behaviour, but this also means that you now have to explicitly provide that option. **This could be considered a breaking change!**
* ~~Note that this removes from the objects table, but not from the activities.~~ See #427 for that.

Some statistics from EXPLAIN ANALYZE:
(cost=1402845.92..1933782.00 rows=3810907 width=62) (actual time=2562455.486..2562455.495 rows=0 loops=1)
 Planning Time: 505.327 ms
 Trigger for constraint chat_message_references_object_id_fkey: time=651939.797 calls=921740
 Trigger for constraint deliveries_object_id_fkey: time=52036.009 calls=921740
 Trigger for constraint hashtags_objects_object_id_fkey: time=20665.778 calls=921740
 Execution Time: 3287933.902 ms

***
**TODO**
1. [x] **Question:** Is it OK to keep it like this with regard to quote posts? If not (i.e. posts quoted by local users should also be kept), should we give quotes the same context as the post they are quoting? (If we don't want to give them the same context, I'll have to see how/if I can do it without being too costly)
    * See #379
2. [x] **Question:** The "original" query only deletes public posts (this is undocumented, but you can check the code). This new one doesn't take scope into account. From the docs I gather that the idea is that posts can be refetched when needed. But I have it from a trusted source that Pleroma can't refetch non-public posts. I assume that's the reason why they are kept here. I see different options to deal with this:
    1. ~~We keep it as currently implemented and just don't care about scope with this option~~
    2. ~~We add logic to not delete non-public posts either (I'll have to see how costly that becomes)~~
    3. We add an extra --keep-non-public parameter. This is technically speaking breakage (you didn't have to provide a param before for this, now you do), but I'm inclined to not care much because it wasn't documented nor tested in the first place.
3. [x] See if we can do the query using Elixir
4. [x] Test on a bigger DB to see that we don't run into a timeout
5. [x] Add docs

Co-authored-by: ilja <git@ilja.space>
Reviewed-on: #350
Co-authored-by: ilja <akkoma.dev@ilja.space>
Co-committed-by: ilja <akkoma.dev@ilja.space>
2023-01-09 22:15:41 +00:00
docs Prune Objects --keep-threads option (#350) 2023-01-09 22:15:41 +00:00
theme/partials Documentation updates for stable release (#73) 2022-07-15 12:27:16 +00:00
Makefile add manual deploy for docs 2022-11-10 10:55:57 +00:00
mkdocs.yml Add dark and light theme mode to docs, detection, and button 2022-12-09 22:51:43 -05:00
Pipfile Documentation updates for stable release (#73) 2022-07-15 12:27:16 +00:00
Pipfile.lock varnish config/docs (#342) 2022-12-05 13:39:27 +00:00
README.md Documentation updates for stable release (#73) 2022-07-15 12:27:16 +00:00
requirements.txt fix requirements 2022-11-11 16:07:07 +00:00

Building the docs

You don't need to build and test the docs as long as you make sure the syntax is correct. But in case you do want to build the docs, feel free to do so.

You'll need to install mkdocs; see the mkdocs installation guide for details. Generally it's best to install it using pip. You'll also need to install the correct dependencies.

Example using a Debian-based distro

1. Install pipenv and dependencies

pip install pipenv
pipenv sync

2. (Optional) Activate the virtual environment

Since the dependencies are installed in a virtual environment, you can't use them directly. To use them, either prefix each command with pipenv run, or activate the virtual environment for the current shell by executing pipenv shell once.

3. Build the docs using the script

[pipenv run] make all

4. Serve the files

A folder named site containing the static HTML pages will have been created. You can serve these by pointing your server software (nginx, Apache...) to this location. During development, you can run a local server with

[pipenv run] mkdocs serve

This sets up an HTTP server and rebuilds the docs when files change. You can then access the docs at http://127.0.0.1:8000