Removing unused indexes #764
Labels
No labels
approved, awaiting change
bug
configuration
documentation
duplicate
enhancement
extremely low priority
feature request
Fix it yourself
help wanted
invalid
mastodon_api
needs docs
needs tests
not a bug
planned
pleroma_api
privacy
question
static_fe
triage
wontfix
No milestone
No project
No assignees
11 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: AkkomaGang/akkoma#764
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Been discussing with @Oneric in private about this for a bit. There's quite a bit of indexes that are either unused, null, or duplicates of others.
On my instance, this is what Phoenix LiveDashboard reports:
Not listed are indices that have a low hit rate, since I don't really know what's a good threshold for those.
Removing unused indices should help with disk space usage and potentially also make write operations faster.
Apparently the
aa_users_ap_id_COALESCE_follower_address_index
does get used a lot more in some other instances, so it would be nice to get more data about indexes on other instances that are unused.Instance admins can get the same info as above by navigating to
$INSTANCE_DOMAIN/phoenix/live_dashboard/ecto_stats
. Phoenix LiveDashboard is only accessible to admins.Some more details about those listed above:
Null indices
email
andlast_active_at
are only set for local users, so most entries in those indices are just onNULL
values and useless. Turning them into partial indices might save some space and perf overhead, but we should ensure relevant queries then also explicitly setWHERE local = 't'
and thus can use the partial index.Duplicate indices
user_frontend_setting_profiles_user_id_frontend_name_profile_na
really seems redundant with the (implicitly created) primary-key index; we should be able to just drop thisnotifications_id_desc_nulls_last_index
differs in ordering from the primary-key index; if that ordering is actually used or useful i don’t think this duplication can be avoideddata_migration
stuff is managed by Ecto; we shouldn’t touch that (downstream; we could send a patch after investigating this more) but it also doesn’t take much space in the first placeUnused indices
Those will need more investigation and ideally stats from several real-world instances. Given we already have an optional migration (adopted from Pleroma?) dropping some of the top
activities_
contenders, at least those (and likely more) should be safe to purgeOn my tiny local test instance (which has some manually fetched remote posts), this is the most used index and it does get used during activity. Just starting the server, logging in and loading the timeline added about 50 index scans to it. But, for akko.wtf it doesn't seem to get used at all during normal operation. It also wasn’t used during an earlier index rebuild, so it was at actual zero index accesses.
When it was originally added it was noted to still help during normal operation (on a very large instance), but its main selling point was speeding up restores (from 180h to 2h).
Later the index was renamed in migration
20230127143303
toaa_
because(which seems rather fragile; with
-j
restore can parallelise index creation and there is no stable order. Even in single-threaded mode, PostgreSQL may decide to change up the order at any time)as reported by ihba:
no major indices unused which is positive
Answering to request from chat. This is for ilja.space
Total users 15, 6 monthly active. But really only 1 is very relevant (rest are bots or accounts I only use sporadically). Server is running since 2018, last restore was probably years ago (I don't remember restoring since I switched to Akkoma end of 2022, and I don't know how long it was before then). No relays, I do run the prune task from time to time. Total database size is 13 GB (
pg_database_size
gives 14491308591).Forwarding diagnostic info from ikkoma on IRC for a ~6 month old single user instance:
Ecto’s diagnostics query (part1, part2) excludes indices which are themselves currently too small or belong to a too small table (reasoning being they might only start to get used once bigger).
This might explain why fewer entries show up here, but will need to check.
_ccnew
indices are from a currently running or in the past aborted REINDEX (if the latter they can be safely deleted)birthday stuff are leftovers from Pleroma and even the table field is never used as it doesn't exist in a clean Akkoma install
The following weren’t listed above in akko.wtf’s diagnostic query, but they show up in the full output with zero scans:
Adding those and filtering out
ccnew
and pleroma-only indices, aggregating all reports so far gives:Note:
activities_hosts
)(+1 because the migration also drops
activities_actor_index
which just doesn't exist in a clean Akkoma install; prob something from Pleroma)user_frontend_setting_profiles_user_id_frontend_name_profile_name
is just a duplicate of this table’s primary key indexGrepping and glancing over the code, from the 11 always (nearly) unused indices
users.invisible
actually being used as a selector in queries — but it was always combined with several other conditions (e.g.discoverable
orsuggested
) whose indices gets used instead, so this index doesn't prove useful.last_status_at
only occurs for sorting — but the index is neither sorted nor is it likely to help if it was since other selectors are applied before.activities_to
could in theory be useful for DM timelines, but clearly it’s not actually used atm (and just for this it would probably better be a partial index or tracked some other way; GIN indices over arrays are relatively expensive)activities_hosts
however is actually always used when using the?instance=
parameter of the public federated timeline. The thing is though, akkoma-fe and afaik nothing else either has support for this parameter so it’s probably only used on manual API requests.For the remaining indices:
apps_
(the below show up with 0 scans for all but ikkoma but there are more apps indices; almost certainly just excluded due to small size)The OAuth plug fetches apps via its primary key, not sure what else fetches it. Since the index didn’t show up here (nor in unfiltered akko.wtf stats for zero-accesses) at least
apps_client_id
is apparently used in practiceapps_client_id_client_secret_index
even if a use case shows up at some point this can already use the more usefulapps_client_id_index
. Surely a single client won't have so many tokens for an index to be necessaryapps_user_id_index
no known usageactivities_local
(all but ikkoma) seems like it should be useful and could be used byActivityPub.restrict_local
/ActivityPub.restrict_remote
, but clearly it doesn’t. Maybe because other filters are applied first or maybe becauseactivities_id_desc_nulls_last_local_index
gets used instead (multicol B-TREE indices are useful for any subset of their columns, but using leading columns is more efficient)(though it is also a bit redundant with
activities_hosts
albeit somewhat faster for the particular usecase)users_is_suggested
(all but ikkoma) is used by the api/v2/suggested MastoAPI you mentioned but probably barely anyone uses this (might actually be cleaner and cheaper to implement as a separate table)conversation_participation*
not sure about any of themusers_is_discoverable_index
(ihba and ilja) seems like it should be used in/api/v1/directory
and apparently gets used by akko.wtf and maybe ikkoma; prob better keep or investigate more firstactivities_visibility_index
(ilja) not sure but seems useful everywhere else, so prob best to keepusers_uri
(akko) will iinm rarely be used in search as most of the time something else hits or the URL gets fetched via HTTP before it hits the index lookup; akko.wtf reindexing recent'ish is prob why there just weren’t yet enough logged scans to pull it out of unusedaa_users_ap_id_COALESCE_follower_address_index
(akko) no idea why this is unused on akko but it’s clearly beneficial everywhere else (even if a bit hacky; might need to update docs to discourage parellelised restore until (if ever) there’s some better way to deal with this)tl;dr i think it’s safe to promote and extend the optional migration to drop all except
activities_hosts
from the 11 everywhere unused indices but addapps_client_id_client_secret_index
,apps_user_id_index
and the duplicateuser_frontend_setting_profiles_user_id_frontend_name_profile_name
.Maybe also some
conversations_
indices, but i’m not familiar enough with what those tables are used for and how to make a judgment on them with any degree of confidence.Possible later follow ups (testing and patches welcome i guess):
users_email_index
to a partial index and if necessary update affected queriesis_suggested
with a separate table (it’s effectively a null index atm, just withfalse
instead ofnull
)?instance=otp.akkoma.dev
feature if it apparently is basically unused but adds bloat for everyone (e.g. with 152MiB it’s the tenth largest index on akko.wtf + ofc the CPU overhead to maintain it)activities_local_index
furtherPossibly this could use
activities_id_desc_nulls_last_local_index
instead (but investigate whether it really makes sense to have sorted ids first). Or if it staysactivities_hosts
. Also investigate why it is unused for 3/4 instances and if queries can be tweaked to reliably use it or an equivalent (will need extensive testing on large'ish real-world instances)since everyone's been posting their full unused index report, here's mine
Since norm asked for it on fedi here are my unsued indexes for akko.lightnovel-dungeon.de, a rather small instance with very few users (its not quite single user) also is fairly new since I recently moved it over to a VPS and started fresh on a new domain.
As reported by ak.cute.rest (7 month old single user instance)
Statistics from social.qunn.eu (two years old instance with 5 MAU)
Hi, here are mine. This (SingleUser)Instance is nearly 2 years old:
Unused indexes
Null indexes
Duplicate indexes
This is from miku.place (around 8 users, running since August of 2023)
Unused indexes:
Null indexes:
Duplicate indexes:
Here are
girlth.ing
's statistics (mostly single user)thanks everyone! i’ll reaggregate and check if anything unexpected shows up later
@tea your data is the only one where
activities_hosts
is not nearly unused. Do you use or used a frontend or app which actually supports instance-specific federated timelines (which one)?note, the full report has a higher scan limit than the diagnostic output and does not filter out indices ≤1MB, but it still contains the hardcoded filter for table whose pure data is ≤40KiB. In practice this probably at most affects
apps
and maybeconversation_*
related indices on small instances, basically never the mainactivities
andobjects
offenders.Good question, the index also only has a size of 2.7 MB.
I currently use Moshidon but I've tried too many apps to remember…
Without instances names since otherwise Forgejo’s soft line wrapping impedes readability (full data further below):
Full stats with instance names
The "everywhere unused" indices are basically still the same except for
activities_hosts
(which will be kept anyway) andusers_tags
not being listed forgirlth.ing
; due to instance size+age and this index being close to 1MiB for other small instances, i suspect it was just cut by the 1MiB index limit from the diagnostic summary.The two apps indices scheduled for removal are also only missing for
girlth.ing
and ikkoma, both of which likely just hot size filters in the diagnostic output (only the diagnostic summary was provided). Where more detail was provided the only uses were a single access ofapps_user_id_index
for yokairick, lightnovel-dungeon.de and qunn.eu each.There’s also a clear cut between "
user_is_suggested
+ the 13 to-be-dropped indices" and all other indices by going from being reported on 8/10 instances to 4/10.So this confirmed previous results (thx) and i think it’s safe to go ahead with the 13 trivial drops and opened a PR: #767.
Possible later followups to clean up even more (but requiring some effort) is still the same as before but (i now fixed an inconsistency where the followup list was referring to an
activites_local
removal eventhough this wasn't planned anymore already and) additionally i think, sinceactivities_hosts
is only used for querying remote local timelines; if we keep this feature we should still be able to reduce overhead by turning it into a partial index ondata->>'type' = 'Create'
.(and look into whether we actually need the ordering for the otherwise duplicated
notifications_id_desc_nulls_last_index
)welp, guess some more research is needed then to decide what to do with
?instance=
. And to avoid possible confusion later: Moshidon has a feature for adding public timelines of remote servers, but afaict it doesn’t use the *oma specific?instance=
to query it from the homeserver, but directly polls the remote target server. (which allows it to work with Mastodon home servers)Not sure if still needed, but here are mine from junimo.party:
Oh, also, I might still be a version behind...
This fits well with the other results, thanks!
~3 month old private instance with few users; nothing unusual afaik:
Unused:
Null
Bloat (top relative bloat):
Bloat (top absolute waste):