forked from AkkomaGang/akkoma
[TESTING] dbsearch: actually rank search results
Until now we always sorted search matches by id (basically: most recent first) and grabbed the first couple. However, if no special input modifiers are used, websearch_to_tsquery creates a pretty loose search query, and the most recent matches might be far from the most relevant ones. Thus rank results by their relevancy instead.

Normalisation mode 16 takes the number of unique words in a post into account (logarithmically), since a post with 500 unique words matching a 5-word query is likely less relevant than a post consisting entirely of the 5 queried words.

Let’s hope this improves on our notoriously bad search.

The tradeoff is more costly queries. On a simplified mockup query without any filtering or joins with activities, the planner’s cost estimate didn’t change much, but measured wall clock time for a single query increased from ~1.7ms to 2.3ms.

As far as I can tell, the cost of ts_rank(_cd) was never discussed before. The original version (9f0a2a714b) didn’t sort at all and there’s no associated discussion. Later, results were sorted by date (1dd2c8163f), but this quickly changed to sorting by id (ff5e957476), which isn’t too different with both the old sequential ids and the current FlakeIDs. This was carried forward until eventually being removed in 817c66bc3e, probably because pagination sorts anyway.

For now, RUM results continue to be ranked solely by recency, as they have been since RUM support was introduced in 01c45ddc9e and f1e67bdc31. It is possible to make RUM use an efficient relevancy-based ranking, but this requires changes to its index, which is beyond the scope of this commit.

TODO: not sure how much the normalisation helps, or whether a non-log or non-unique-word normalisation would be better.
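To get a feel for what a logarithmic unique-word normalisation does to the example in the message, here is a minimal sketch in plain Elixir. The module and function names are hypothetical and not part of Akkoma or PostgreSQL. The divisor `log2(unique_words + 1)` is an assumption chosen for illustration; the PostgreSQL documentation words mode 16 as "1 + the logarithm of the number of unique words", and the exact form should be checked there.

```elixir
defmodule RankNormalisationSketch do
  # Hypothetical helper for illustration only.
  # Assumption: the divisor grows with the base-2 logarithm of the
  # number of unique words; PostgreSQL's exact formula may differ.
  def log_unique_divisor(unique_words)
      when is_integer(unique_words) and unique_words > 0 do
    :math.log2(unique_words + 1)
  end

  # Scales a raw rank down by the divisor above.
  def normalise(raw_rank, unique_words) do
    raw_rank / log_unique_divisor(unique_words)
  end
end

# Same raw rank, different post sizes (numbers from the commit message).
short = RankNormalisationSketch.normalise(1.0, 5)
long = RankNormalisationSketch.normalise(1.0, 500)

IO.puts("5 unique words:   #{Float.round(short, 3)}")
IO.puts("500 unique words: #{Float.round(long, 3)}")
```

Because the divisor is logarithmic, the long post is penalised, but far less than it would be with a plain division by the unique-word count, which is the alternative the TODO wonders about.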
This commit is contained in:
parent 403913a2e1
commit bc84698087
2 changed files with 19 additions and 2 deletions
@@ -6,6 +6,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

 ## Unreleased

+## Fixed
+- Search results for the default built-in GIN search are now actually ranked by relevancy
+
 ## 2024.04

 ## Added
@@ -15,6 +15,9 @@ defmodule Pleroma.Search.DatabaseSearch do

   @behaviour Pleroma.Search.SearchBackend

+  # See: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING
+  @rank_normalisation 16
+
   def search(user, search_query, options \\ []) do
     index_type = if Pleroma.Config.get([:database, :rum_enabled]), do: :rum, else: :gin
     limit = Enum.min([Keyword.get(options, :limit), 40])
@@ -31,7 +34,7 @@ def search(user, search_query, options \\ []) do
       |> maybe_restrict_author(author)
       |> maybe_restrict_blocked(user)
       |> Pagination.fetch_paginated(
-        %{"offset" => offset, "limit" => limit, "skip_order" => index_type == :rum},
+        %{"offset" => offset, "limit" => limit, "skip_order" => true},
         :offset
       )
       |> maybe_fetch(user, search_query)
@@ -86,7 +89,18 @@ defp query_with(q, :gin, search_query) do
           o.data,
           ^tsc,
           ^search_query
-        )
+        ),
+      order_by: [
+        desc:
+          fragment(
+            "ts_rank_cd(to_tsvector(?::oid::regconfig, ?->>'content'), websearch_to_tsquery(?::oid::regconfig, ?), ?)",
+            ^tsc,
+            o.data,
+            ^tsc,
+            ^search_query,
+            @rank_normalisation
+          )
+      ]
     )
   end
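The behavioural change in this diff (order by rank rather than by id, then take a capped number of results) can be illustrated without Ecto or PostgreSQL. The following is a rough in-memory sketch, not the actual query path: the module name, the `:id` and `:rank` fields and the sample data are all made up for illustration, and only the cap of 40 is taken from the `limit` line in the diff.

```elixir
defmodule RankedSearchSketch do
  # Cap taken from the diff; everything else here is hypothetical.
  @max_limit 40

  # Old behaviour, roughly: newest matches (highest id) first.
  def by_recency(matches, limit) do
    matches
    |> Enum.sort_by(& &1.id, :desc)
    |> Enum.take(min(limit, @max_limit))
  end

  # New behaviour, roughly: highest relevancy rank first.
  def by_relevance(matches, limit) do
    matches
    |> Enum.sort_by(& &1.rank, :desc)
    |> Enum.take(min(limit, @max_limit))
  end
end

# Made-up matches: the oldest post happens to be the most relevant one.
matches = [
  %{id: 1, rank: 0.9},
  %{id: 2, rank: 0.1},
  %{id: 3, rank: 0.4}
]

IO.inspect(RankedSearchSketch.by_recency(matches, 2), label: "by id")
IO.inspect(RankedSearchSketch.by_relevance(matches, 2), label: "by rank")
```

With a small limit, sorting by id drops the most relevant (but oldest) match entirely, which is the problem the commit message describes.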