client: Sort emojis by query similarity in fuzzy picker #156

Merged
norm merged 6 commits from Michcio/FoundKey-0x7f:tostis-emojosort into main 2022-09-19 14:43:13 +00:00
Contributor

I added a faux commit by Tosti just applying the patch to retain authorship (@toast is this okay?)

Changes from the original patch:

  • moved dependency to client package
  • typed emojiSearch stronger
  • distances indexed by emoji name cause by type spec objects can't be indexed with objects
  • aliases turned from ternary to ifs (for typing and readability) (is there a performance impact at this level really?)

Co-authored-by: Chloe Kudryavtsev code@toast.bunkerlabs.net

I added a faux commit by Tosti just applying the patch to retain authorship (@toast is this okay?) Changes from the original patch: - moved dependency to client package - typed `emojiSearch` stronger - `distances` indexed by emoji name cause by type spec objects can't be indexed with objects - `aliases` turned from ternary to ifs (for typing and readability) (is there a performance impact at this level really?) Co-authored-by: Chloe Kudryavtsev <code@toast.bunkerlabs.net>
Owner

is this okay?

Yeah, that's ok.

My concern with the indexing was that you could theoretically have multiple emojis with the same name (I recall thatbeing an issue, anyway).
But we can't really go by id either because of unicode emoji.
So I just indexed the whole JS object.

We don't keep the data around - it's essentially a cache, so it should be fine, really.
But it MIGHT be faster to index on name, maybe?
But yeah ideally we'd index by a unique int if there is one in common (without resorting to a second aliases-like func).

> is this okay? Yeah, that's ok. My concern with the indexing was that you could theoretically have multiple emojis with the same name (I recall thatbeing an issue, anyway). But we can't really go by id either because of unicode emoji. So I just indexed the whole JS object. We don't keep the data around - it's essentially a cache, so it should be fine, really. But it MIGHT be faster to index on name, maybe? But yeah ideally we'd index by a unique int if there is one in common (without resorting to a second aliases-like func).
Owner

Oh also, purely technically, this isn't edit distance.
Edit distance would be levenshtein and co, but this is gestalt/ratcliff-obershelp.
It's a similarity rating based on recursive longest subsequences (it uses LCS internally).
LCS-based algs make sense given the "fuzziness" here is very much subpattern-based (the fuzzy is in-between subpatterns).

Due to how our fuzzy picker works, we know that the "primary" metric is going to be equivalent - namely the 2K_m.
As such, as of currently, we can estimate the similarity rating using query.split(' ').join('').length() / matched_alias_or_name.length().
Keeping the matched entity around would be a speed optimization.

However, it's not really worth it, because I may be interested in implementing "real" precomputed fuzzy search, likely using skeleton or exclusion fingerprints, at which point gestalt will need to be used "properly" anyways.

Oh also, purely technically, this isn't edit distance. Edit distance would be levenshtein and co, but this is gestalt/ratcliff-obershelp. It's a similarity rating based on recursive longest subsequences (it uses LCS internally). LCS-based algs make sense given the "fuzziness" here is very much subpattern-based (the fuzzy is in-between subpatterns). Due to how our fuzzy picker works, we know that the "primary" metric is going to be equivalent - namely the 2K_m. As such, as of currently, we can estimate the similarity rating using `query.split(' ').join('').length() / matched_alias_or_name.length()`. Keeping the matched entity around would be a speed optimization. However, it's not *really* worth it, because I may be interested in implementing "real" precomputed fuzzy search, likely using skeleton or exclusion fingerprints, at which point gestalt will need to be used "properly" anyways.
Michcio changed title from client: Sort emojis by edit distance in fuzzy picker to client: Sort emojis by query similarity in fuzzy picker 2022-09-15 09:25:34 +00:00
norm reviewed 2022-09-15 14:24:22 +00:00
@ -141,0 +151,4 @@
const distance = (str: string): number => rodistance(joinq, str);
const mindistance = (strs: string[]): number => Math.min(...strs.map(distance));
const distinguisher = (emoji: Type): string => 'char' in emoji ? emoji.char : emoji.id;
matches.forEach(emoji => distances[distinguisher(emoji)] = Math.min(distance(emoji.name), mindistance(aliases(emoji))));
Owner

Is there a reason forEach is used instead of for (const emoji of matches)?

Is there a reason `forEach` is used instead of `for (const emoji of matches)`?
Author
Contributor

No, it just wasn't an error. I too prefer for (...).

No, it just wasn't an error. I too prefer `for (...)`.
Michcio marked this conversation as resolved
norm approved these changes 2022-09-19 05:13:44 +00:00
norm left a comment
Owner

Other than that one comment (which is a small nitpick - feel free to ignore), it looks good to go.

Other than that one comment (which is a small nitpick - feel free to ignore), it looks good to go.
Michcio force-pushed tostis-emojosort from 9b4607dd5c to 395e6171b0 2022-09-19 10:57:48 +00:00 Compare
norm merged commit d8a8306603 into main 2022-09-19 14:43:13 +00:00
Sign in to join this conversation.
No reviewers
No labels
feature
fix
upkeep
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: FoundKeyGang/FoundKey#156
No description provided.