[bug] Japanese full text search using tsja doesn't work when logged in #650
Labels
No labels
approved, awaiting change
bug
configuration
documentation
duplicate
enhancement
extremely low priority
feature request
Fix it yourself
help wanted
invalid
mastodon_api
needs docs
needs tests
not a bug
planned
pleroma_api
privacy
question
static_fe
triage
wontfix
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: AkkomaGang/akkoma#650
Loading…
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Your setup
From source
Extra details
NixOS @ 81e8f48ebdecf07aab321182011b067aafc78896
Version
3.10.4
PostgreSQL version
13
What were you trying to do?
I followed this guide. At first I wanted to open a report about Japanese search not working at all, but then I tried a search in private mode, and it works. However, when I'm logged in, it still doesn't work.
What did you expect to happen?
I expected Japanese search to always work.
What actually happened?
Japanese search doesn't work when logged in.
Logs
No response
Severity
I can manage
Have you searched for this issue?
When enabling debug logs
is there in the anonymous search results, but there's nothing that includes 678472 (the search language oid) in the logged in search results. Still,
select current_setting('default_text_search_config')::regconfig::oid;
runs in both cases. I assume this means the query fails when logged in.postgresql logs confirm this:
this happens both with missing words like ajwoawjfwao and Japanese words
And postgres complains
WARNING: invalid value for parameter "default_text_search_config": "japanese"
every once in a while, even though it doesn't log anything when tryingSELECT * FROM ts_debug('日本語が喋れるんかい')
After trying out various queries, the main issue seems to be that the query simply takes too long to execute for most queries
notably, if I replace
websearch_to_tsquery()
function call with the result directly, it gets noticeably faster (previously I couldn't wait for it to finish, now it takes a couple seconds). If it's cached, that would constitute a partial fix.@floatingghost I've verified that caching
websearch_to_tsquery
massively helps (though plainSELECT * FROM objects WHERE data->>content = ...
is still much much faster), but I have no idea how to do it - Elixir most likely doesn't support thetsquery
Postgres type, so it probably has to become an even more convoluted SQL expression, and both my SQL and Elixir skills aren't enough for that. Alternatively, a SQL expression reordering could help too (doing theSELECT * FROM objects
before everything else), but, again, I have no idea how to do that. As Russians say, "this is the extent of my authority".hmm, i can’t quite reproduce this with
english
. Currently we use the single argument version ofwebsearch_to_tsvector
which is indeed not ideal since it’s not immutable like the two argument version with explicit language configuration. The latter really ought to be cached by PostgreSQL already but the current version should already be cached too since the config can’t change during an (implicit) transaction. With PostgreSQL both appear to get cached for me, with just a minor cost difference since the one arg version needs to query the current config once.Unless ofc setting up a custom config somehow degraded
to_tsquery
into an unstable function.In theory we could extract the call into the
FROM
part with a cross join (as used for better readability in modern PostgreSQL docs about textseach), but there’s a problem: this requests a join operation which is costly and prior to PostgreSQL 13 (meaning your setup just barely make the cut), Postgres wasn’t smart enough to optimise this join away incurring a substantial performance penalty.Can you test if perf is improved if you change the
websearch_to_tsquery(?)
call here towebsearch_to_tsquery('japanese', ?)
?A plain comparison will actually need to iterate through the entire table and compare each post’s content to the search term. Fulltext search with ts_queries and ts_vector can use the
objects_fts
index and should be much faster. Are you sure you run the mix task to update your index to the new language setting?Are you sure you setup tsja and the default_text_search_config for the right user and database? I’m not really familiar with custom text search configs but this does seem odd
For more insight please connect to the akkoma database as the akkoma user and run