client: Sort emojis by query similarity in fuzzy picker #156

Michcio · 2022-09-15T08:52:24Z

Michcio commented

2022-09-15 08:52:24 +00:00

I added a faux commit by Tosti just applying the patch to retain authorship (@toast is this okay?)

Changes from the original patch:

- moved the dependency to the client package
- typed `emojiSearch` more strongly
- `distances` is indexed by emoji name, because per the type spec objects can't be indexed with objects
- `aliases` turned from a ternary into ifs (for typing and readability) (is there really a performance impact at this level?)

Co-authored-by: Chloe Kudryavtsev <code@toast.bunkerlabs.net>


toast commented

2022-09-15 09:08:48 +00:00

> is this okay?

Yeah, that's ok.

My concern with the indexing was that you could theoretically have multiple emojis with the same name (I recall that being an issue, anyway).
But we can't really go by id either because of unicode emoji.
So I just indexed the whole JS object.

We don't keep the data around - it's essentially a cache, so it should be fine, really.
But it MIGHT be faster to index on name, maybe?
But yeah ideally we'd index by a unique int if there is one in common (without resorting to a second aliases-like func).
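To make the trade-off concrete, here is a minimal sketch of the two indexing options. The `Emoji` shapes and both helper names are hypothetical and only for illustration; the real client types may differ.

```typescript
// Hypothetical emoji shapes (assumption): unicode emoji carry a `char`, custom ones an `id`.
type UnicodeEmoji = { name: string; char: string };
type CustomEmoji = { name: string; id: string };
type Emoji = UnicodeEmoji | CustomEmoji;

// A Map compares object keys by identity, so two emojis that share a name
// still get separate entries.
function scoreByObject(emojis: Emoji[], score: (e: Emoji) => number): Map<Emoji, number> {
  const scores = new Map<Emoji, number>();
  for (const emoji of emojis) scores.set(emoji, score(emoji));
  return scores;
}

// A plain record keyed by name is easier to type, but same-named emojis
// overwrite each other: the last one wins.
function scoreByName(emojis: Emoji[], score: (e: Emoji) => number): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const emoji of emojis) scores[emoji.name] = score(emoji);
  return scores;
}
```

Since the scores are thrown away after each search, either form works as a cache; the difference only shows up when names collide.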


toast commented

2022-09-15 09:24:37 +00:00

Oh also, purely technically, this isn't edit distance.
Edit distance would be Levenshtein and co., but this is gestalt pattern matching (Ratcliff/Obershelp).
It's a similarity rating based on recursively finding longest common substrings (it uses LCS internally).
LCS-based algorithms make sense given that the "fuzziness" here is very much subpattern-based (the fuzzy part is in between subpatterns).
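For reference, a minimal sketch of the Ratcliff/Obershelp rating, assuming the textbook definition (twice the number of matching characters divided by the combined length). This is not the library the patch uses; all function names here are made up for illustration.

```typescript
// Finds the longest common contiguous block of `a` and `b` (first one found on ties).
function longestCommonSubstring(a: string, b: string): { aStart: number; bStart: number; length: number } {
  let best = { aStart: 0, bStart: 0, length: 0 };
  // prev[j] holds the length of the common suffix ending at a[i-2] and b[j-1].
  let prev: number[] = new Array<number>(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const cur: number[] = new Array<number>(b.length + 1).fill(0);
    for (let j = 1; j <= b.length; j++) {
      if (a[i - 1] === b[j - 1]) {
        const len = (prev[j - 1] ?? 0) + 1;
        cur[j] = len;
        if (len > best.length) best = { aStart: i - len, bStart: j - len, length: len };
      }
    }
    prev = cur;
  }
  return best;
}

// K_m: size of the longest common block plus, recursively, the matches
// to its left and to its right.
function matchingChars(a: string, b: string): number {
  const { aStart, bStart, length } = longestCommonSubstring(a, b);
  if (length === 0) return 0;
  return length
    + matchingChars(a.slice(0, aStart), b.slice(0, bStart))
    + matchingChars(a.slice(aStart + length), b.slice(bStart + length));
}

// Similarity in [0, 1]: 2 * K_m / (|a| + |b|). Two empty strings count as identical.
function gestaltSimilarity(a: string, b: string): number {
  if (a.length + b.length === 0) return 1;
  return (2 * matchingChars(a, b)) / (a.length + b.length);
}
```

For `WIKIMEDIA` vs `WIKIMANIA` the matched blocks are `WIKIM` and `IA`, so K_m is 7 and the rating is 14/18. Note that higher means more similar, the opposite of a distance.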

Due to how our fuzzy picker works, we know that the "primary" metric is going to be equivalent - namely the 2K_m term (twice the number of matching characters).
As such, for now, we can estimate the similarity rating using `query.split(' ').join('').length / matched_alias_or_name.length`.
Keeping the matched entity around would be a speed optimization.
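A minimal sketch of that estimate, assuming every non-space character of the query is already known to match inside the name or alias. The function name and the empty-string guard are additions for illustration.

```typescript
// Estimates the rating as (query length without spaces) / (matched string length).
// Assumption: the picker has already confirmed that the query matches `matched`.
function estimateSimilarity(query: string, matched: string): number {
  const joined = query.split(' ').join('');
  if (matched.length === 0) return 0; // guard against division by zero
  return joined.length / matched.length;
}

// Usage: 'smi cat' has 6 non-space characters, 'smile_cat' has 9.
const estimate = estimateSimilarity('smi cat', 'smile_cat'); // 6 / 9
```

For a fixed query this ranks shorter matched strings higher, which is the same ordering 2K_m / (|query| + |matched|) gives when K_m is constant.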

However, it's not really worth it, because I may be interested in implementing "real" precomputed fuzzy search, likely using skeleton or exclusion fingerprints, at which point gestalt will need to be used "properly" anyways.


Michcio changed title from ~~client: Sort emojis by edit distance in fuzzy picker~~ to client: Sort emojis by query similarity in fuzzy picker

2022-09-15 09:25:34 +00:00

norm reviewed 2022-09-15 14:24:22 +00:00

packages/client/src/components/emoji-picker.vue Outdated

					
				@@ -141,0 +151,4 @@
					const distance = (str: string): number => rodistance(joinq, str);
					const mindistance = (strs: string[]): number => Math.min(...strs.map(distance));
					const distinguisher = (emoji: Type): string => 'char' in emoji ? emoji.char : emoji.id;
					matches.forEach(emoji => distances[distinguisher(emoji)] = Math.min(distance(emoji.name), mindistance(aliases(emoji))));
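A rough sketch of how a `distances` table keyed this way could then be used to order the matches. The `Emoji` type and `sortByDistance` are hypothetical, and the sketch assumes a smaller value means a closer match, as the `distance` naming suggests; if the library actually returns a similarity, the comparator would need to be flipped.

```typescript
// Hypothetical emoji shapes (assumption): unicode emoji have `char`, custom emoji have `id`.
type Emoji = { name: string; char: string } | { name: string; id: string };

// Same idea as the `distinguisher` in the diff: one string key per emoji.
const distinguisher = (emoji: Emoji): string => ('char' in emoji ? emoji.char : emoji.id);

// Returns a sorted copy, smallest distance first; emojis without a score go last.
function sortByDistance(matches: Emoji[], distances: Record<string, number>): Emoji[] {
  const score = (e: Emoji): number => distances[distinguisher(e)] ?? Number.POSITIVE_INFINITY;
  return [...matches].sort((a, b) => {
    const da = score(a);
    const db = score(b);
    // Explicit comparison avoids NaN from Infinity - Infinity.
    return da === db ? 0 : da < db ? -1 : 1;
  });
}
```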