[bug] Japanese full text search using tsja doesn't work when logged in #650

Open
opened 2023-10-20 10:02:12 +00:00 by chayleaf · 3 comments

Your setup

From source

Extra details

NixOS @ 81e8f48ebdecf07aab321182011b067aafc78896

Version

3.10.4

PostgreSQL version

13

What were you trying to do?

I followed this guide. At first I wanted to open a report about Japanese search not working at all, but then I tried a search in private mode, and it works. However, when I'm logged in, it still doesn't work.

What did you expect to happen?

I expected Japanese search to always work.

What actually happened?

Japanese search doesn't work when logged in.

Logs

No response

Severity

I can manage

Have you searched for this issue?

  • I have double-checked and have not found this issue mentioned anywhere.
### Your setup From source ### Extra details NixOS @ 81e8f48ebdecf07aab321182011b067aafc78896 ### Version 3.10.4 ### PostgreSQL version 13 ### What were you trying to do? I followed [this guide](https://docs.akkoma.dev/stable/configuration/howto_search_cjk/). At first I wanted to open a report about Japanese search not working at all, but then I tried a search in private mode, and [it works](https://fedi.pavluk.org/search?query=%E8%A8%80%E8%AA%9E). However, when I'm logged in, it still doesn't work. ### What did you expect to happen? I expected Japanese search to always work. ### What actually happened? Japanese search doesn't work when logged in. ### Logs _No response_ ### Severity I can manage ### Have you searched for this issue? - [x] I have double-checked and have not found this issue mentioned anywhere.
chayleaf added the
bug
label 2023-10-20 10:02:12 +00:00
Author

When enabling debug logs

SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (a0."local" = TRUE) ORDER BY a0."id" desc nulls last LIMIT $4 OFFSET $5 ["https://www.w3.org/ns/activitystreams#Public", 678472, "日本語", 20, 0]

is there in the anonymous search results, but there's nothing that includes 678472 (the search language oid) in the logged in search results. Still, select current_setting('default_text_search_config')::regconfig::oid; runs in both cases. I assume this means the query fails when logged in.

postgresql logs confirm this:

Oct 20 23:46:07 server postgres[1983882]: [1983882] ERROR:  canceling statement due to user request
Oct 20 23:46:07 server postgres[1983882]: [1983882] STATEMENT:  SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (NOT (a0."actor" = ANY($4))) ORDER BY a0."id" desc nulls last LIMIT $5 OFFSET $6
Oct 20 23:46:07 server postgres[1984689]: [1984689] FATAL:  terminating connection due to administrator command
Oct 20 23:46:07 server postgres[1984689]: [1984689] STATEMENT:  SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (NOT (a0."actor" = ANY($4))) ORDER BY a0."id" desc nulls last LIMIT $5 OFFSET $6
Oct 20 23:46:07 server postgres[1984690]: [1984690] FATAL:  terminating connection due to administrator command
Oct 20 23:46:07 server postgres[1984690]: [1984690] STATEMENT:  SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (NOT (a0."actor" = ANY($4))) ORDER BY a0."id" desc nulls last LIMIT $5 OFFSET $6
Oct 20 23:46:07 server postgres[1805940]: [1805940] LOG:  background worker "parallel worker" (PID 1984689) exited with exit code 1
Oct 20 23:46:07 server postgres[1805940]: [1805940] LOG:  background worker "parallel worker" (PID 1984690) exited with exit code 1

this happens both with missing words like ajwoawjfwao and Japanese words

And postgres complains WARNING: invalid value for parameter "default_text_search_config": "japanese" every once in a while, even though it doesn't log anything when trying SELECT * FROM ts_debug('日本語が喋れるんかい')

When enabling debug logs ``` SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (a0."local" = TRUE) ORDER BY a0."id" desc nulls last LIMIT $4 OFFSET $5 ["https://www.w3.org/ns/activitystreams#Public", 678472, "日本語", 20, 0] ``` is there in the anonymous search results, but there's nothing that includes 678472 (the search language oid) in the logged in search results. Still, `select current_setting('default_text_search_config')::regconfig::oid;` runs in both cases. I assume this means the query fails when logged in. postgresql logs confirm this: ``` Oct 20 23:46:07 server postgres[1983882]: [1983882] ERROR: canceling statement due to user request Oct 20 23:46:07 server postgres[1983882]: [1983882] STATEMENT: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (NOT (a0."actor" = ANY($4))) ORDER BY a0."id" desc nulls last LIMIT $5 OFFSET $6 Oct 20 23:46:07 server postgres[1984689]: [1984689] FATAL: terminating connection due to administrator command Oct 20 23:46:07 server postgres[1984689]: [1984689] STATEMENT: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (NOT (a0."actor" = ANY($4))) ORDER BY a0."id" desc nulls last LIMIT $5 OFFSET $6 Oct 20 23:46:07 server postgres[1984690]: [1984690] FATAL: terminating connection due to administrator command Oct 20 23:46:07 server postgres[1984690]: [1984690] STATEMENT: SELECT a0."id", a0."data", a0."local", a0."actor", a0."recipients", a0."inserted_at", a0."updated_at", o1."id", o1."data", o1."inserted_at", o1."updated_at" FROM "activities" AS a0 INNER JOIN "objects" AS o1 ON (o1."data"->>'id') = COALESCE(a0."data"->'object'->>'id', a0."data"->>'object') INNER JOIN LATERAL (SELECT is_active from users WHERE ap_id = a0."actor" AND is_active = TRUE) AS f2 ON TRUE WHERE (a0."data"->>'type' = 'Create') AND ($1 = ANY(a0."recipients")) AND (to_tsvector($2::oid::regconfig, o1."data"->>'content') @@ websearch_to_tsquery($3)) AND (NOT (a0."actor" = ANY($4))) ORDER BY a0."id" desc nulls last LIMIT $5 OFFSET $6 Oct 20 23:46:07 server postgres[1805940]: [1805940] LOG: background worker "parallel worker" (PID 1984689) exited with exit code 1 Oct 20 23:46:07 server postgres[1805940]: [1805940] LOG: background worker "parallel worker" (PID 1984690) exited with exit code 1 ``` this happens both with missing words like ajwoawjfwao and Japanese words And postgres complains `WARNING: invalid value for parameter "default_text_search_config": "japanese"` every once in a while, even though it doesn't log anything when trying `SELECT * FROM ts_debug('日本語が喋れるんかい')`
Author

After trying out various queries, the main issue seems to be that the query simply takes too long to execute for most queries

notably, if I replace websearch_to_tsquery() function call with the result directly, it gets noticeably faster (previously I couldn't wait for it to finish, now it takes a couple seconds). If it's cached, that would constitute a partial fix.

After trying out various queries, the main issue seems to be that the query simply takes too long to execute for most queries notably, if I replace `websearch_to_tsquery()` function call with the result directly, it gets noticeably faster (previously I couldn't wait for it to finish, now it takes a couple seconds). If it's cached, that would constitute a partial fix.
Author

@floatingghost I've verified that caching websearch_to_tsquery massively helps (though plain SELECT * FROM objects WHERE data->>content = ... is still much much faster), but I have no idea how to do it - Elixir most likely doesn't support the tsquery Postgres type, so it probably has to become an even more convoluted SQL expression, and both my SQL and Elixir skills aren't enough for that. Alternatively, a SQL expression reordering could help too (doing the SELECT * FROM objects before everything else), but, again, I have no idea how to do that. As Russians say, "this is the extent of my authority".

@floatingghost I've verified that caching `websearch_to_tsquery` massively helps (though plain `SELECT * FROM objects WHERE data->>content = ...` is still much much faster), but I have no idea how to do it - Elixir most likely doesn't support the `tsquery` Postgres type, so it probably has to become an even more convoluted SQL expression, and both my SQL and Elixir skills aren't enough for that. Alternatively, a SQL expression reordering could help too (doing the `SELECT * FROM objects` before everything else), but, again, I have no idea how to do that. As Russians say, "this is the extent of my authority".
Sign in to join this conversation.
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: AkkomaGang/akkoma#650
No description provided.