[bug] Invalid Emoji ID #820

Open
opened 2024-07-13 13:51:59 +00:00 by silverpill · 7 comments

Your setup

From source

Extra details

No response

Version

No response

PostgreSQL version

No response

What were you trying to do?

The id property of Emoji objects (e.g. in EmojiReact activity) is not always a valid URI.
According to the ActivityPub spec, IDs are "Publicly dereferencable URIs". A URI may only contain a limited set of allowed characters in its path component. However, Akkoma sometimes puts unescaped whitespace there, which is not allowed.

To comply with the URI standard, non-allowed characters should be percent-encoded.
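As an illustration of the percent-encoding described above, here is a minimal Python sketch (not Akkoma code; the helper name `percent_encode_path` and the example URL are made up for this example). It only re-encodes the path component and assumes any `%` already present belongs to a valid escape sequence:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_path(url: str) -> str:
    """Percent-encode characters that are not allowed in the path of a URI."""
    parts = urlsplit(url)
    # Keep "/" as the path separator and "%" so that existing escapes
    # are not encoded a second time (assumption: they are already valid).
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


print(percent_encode_path("https://example.social/emoji/custom/blob cat.png"))
```

With this sketch the space in the file name becomes `%20`, and applying the function a second time leaves the result unchanged.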

I submitted the same bug report to Pleroma (the problem has been resolved): https://git.pleroma.social/pleroma/pleroma/-/issues/3280

What did you expect to happen?

No response

What actually happened?

No response

Logs

No response

Severity

I cannot use it as easily as I'd like

Have you searched for this issue?

  • I have double-checked and have not found this issue mentioned anywhere.
silverpill added the bug label 2024-07-13 13:51:59 +00:00
Member

Will be fixed by #815

According to the ActivityPub spec, IDs are "Publicly dereferencable URIs".

Note that the spec further requires objects to be fetchable from their id (unless it’s a transient (no id) or anonymous (explicit null id) object). Pleroma’s and current Akkoma’s use of an image URL as id conflicts with this and can lead to other issues.

Author

Will be fixed by #815

How are consumers supposed to de-duplicate Emoji objects without an id? Emojis are usually not ephemeral: they have a persistent name and can be added to multiple objects. I guess icon.url can be used, but id is more appropriate and is supported by existing implementations.
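To make the de-duplication question concrete, here is a hedged Python sketch of how a consumer might pick a key for an Emoji object. It is not taken from any existing implementation; the function name `emoji_dedup_key` and the example URLs are hypothetical. It prefers `id` and falls back to `icon.url` when `id` is missing or `null`:

```python
from typing import Optional, Tuple


def emoji_dedup_key(emoji: dict) -> Optional[Tuple[str, str]]:
    """Return a (source, value) key for de-duplicating an Emoji object."""
    emoji_id = emoji.get("id")
    if isinstance(emoji_id, str) and emoji_id:
        return ("id", emoji_id)
    # Anonymous (id: null) or transient (no id) emoji: fall back to icon.url.
    icon = emoji.get("icon")
    url = icon.get("url") if isinstance(icon, dict) else None
    if isinstance(url, str) and url:
        return ("icon.url", url)
    return None


anonymous = {
    "type": "Emoji",
    "id": None,
    "name": ":blobcat:",
    "icon": {"type": "Image", "url": "https://example.social/emoji/blobcat.png"},
}
print(emoji_dedup_key(anonymous))
```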

Note spec further requires objects to be fetchable from their id (unless it’s a transient (no id) or anonymous (explicit null id) object). Pleroma’s and current Akkoma’s use of an image URL as id conflicts with this and can lead to other issues.

Yes, but in practice non-fetchable IDs are common in the Fediverse (especially in activities), so most implementations can deal with that. What issues do you observe (or expect)?
Instead of setting it to null, I think it would be better to either generate a synthetic ID (that is, any URI that resolves to 404 Not Found), or generate a dereferencable URI like Mastodon does (example: https://scalie.zone/emojis/5075).

Member

What issues do you observe (or expect)?

See the linked PR and the issue it links, for example. Given that it breaks the spec, and in a different way than Mastodon’s top-level fragment ids, there might be many more issues lurking in the current behaviour.

Instead of setting to null, I think it would be better to either [...] , or generate a dereferencable URI like Mastodon does

This would indeed be ideal, but *oma currently doesn’t track emoji at all, neither local nor remote ones, so this would be a larger effort. It is also not necessary for fixing most known federation issues (the only known federation benefit would be proper cache invalidation through a correct update_at field).

I think it would be better to either generate a synthetic ID (that is, any URI that resolves to 404 Not Found)

I personally strongly oppose this; Mastodon’s fragment activities are already bad enough (and at least dereference the object they acted upon, which means in some sense the effect of their action will be reflected in the response). Making things worse is an awful idea and there isn’t even any scenario it’s known to help with (see below).

How consumers are supposed to de-duplicate Emoji objects without an id?

I’m not sure what you mean. *oma doesn't ever read the emoji id field. *key tracks remote emoji and explicitly handles null emoji ids. Iceshrimp also handles null ids (and sends them sometimes). Within a post there’s no such thing as deduplicating emoji by id; only the name is ever relevant.

Emojis are usually not ephemeral, they have a persistent name and can be added to multiple objects

I disagree; emoji are inherently ephemeral in *oma and in general should always be regarded this way unless a stable id is provided. E.g. Pleroma allows fully custom emoji, where users can use whatever image they want with whatever shortcode in posts via C2S; those emoji need not exist or be registered anywhere. Even for "registered" emoji, *oma doesn’t persistently track them (and as a consequence can’t set meaningful updated fields); whatever happens to be in a particular directory tree on the filesystem at the moment is available.

But also, anonymous objects are not necessarily "ephemeral"; that’d be transient objects (no id at all). Anonymous objects are those which only exist within a parent context, which perfectly reflects *oma’s current state of emoji tracking. At the time the post was created, the contained emoji existed under the given name and the given icon url for the author (or, if created via MastoAPI, for the whole instance), but it might since have disappeared, been renamed or changed icon, or might never have existed outside this one post to begin with (C2S).

Author

I see, so in Akkoma custom emojis are just images that can be inserted into text, and not independent entities? That's fine, but other applications may work differently. For example, in my software custom emojis are first-class objects that should always have an id, so if you set id to null my software will have to ignore them.

There's no formal standard for working with Emoji objects, and as far as I know the only existing guidance is Mastodon documentation where the object is not anonymous.

I personally strongly oppose this; Mastodon’s fragment activities are already bad enough (and at least dereference the object they acted upon, which means in some sense the effect of their action will be reflected in the response). Making things worse is an awful idea and there isn’t even any scenario it’s known to help with (see below)

I'm not suggesting fragment IDs; it could be a regular ID that resolves to 404. Non-public objects often can't be fetched either. This is a normal situation, and I think the ActivityPub spec shouldn't be interpreted as "objects must be fetchable from their IDs or be anonymous". As long as the emoji ID is a valid URI and is globally unique, there shouldn't be any federation issues. Even if Akkoma doesn't have any database record of the Emoji object, you can still generate a unique synthetic ID from the file name or its content hash.
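The content-hash variant of this suggestion could look roughly like the following Python sketch. This only illustrates the proposal and is not how Akkoma or Pleroma build IDs; the function name `synthetic_emoji_id`, the `/emojis/` path prefix and the example host are assumptions made up for this example:

```python
import hashlib


def synthetic_emoji_id(base_url: str, image_bytes: bytes) -> str:
    """Derive a stable emoji ID from the image content.

    The SHA-256 hex digest only contains [0-9a-f], so the resulting
    path needs no percent-encoding.
    """
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{base_url.rstrip('/')}/emojis/{digest}"


print(synthetic_emoji_id("https://example.social/", b"fake image bytes"))
```

The same image bytes always map to the same ID, while a changed image produces a different one. Whether such an ID should be dereferencable is the point under discussion in this thread and is not settled by the sketch.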

See the linked PR and the issue it links, for example. Given that it breaks the spec, and in a different way than Mastodon’s top-level fragment ids, there might be many more issues lurking in the current behaviour.

I found a link to issue #694. A properly constructed synthetic ID should solve that too; I assume the implementation in question will not attempt to fetch the object if the origin is the same.

Member

so in Akkoma custom emojis are just images that can be inserted into text, and not independent entities?

In Akkoma it’s not quite "arbitrary image" anymore, but it’s not too far from it either; no info about emoji is stored in the database. For any C2S-capable server (e.g. Pleroma) it’s entirely up to the client which images and names are used. (In theory some hypothetical C2S server may explicitly strip all but certain endorsed emoji, but i see little reason to and Pleroma doesn’t.)

There's no formal standard for working with Emoji objects, and as far as I know the only existing guidance is Mastodon documentation where the object is not anonymous.

Mastodon’s documentation only mentions name and the icon property; imo those are the only ones you should hard rely on. Everything else only appears in an illustrative example of what might be.

Note, Iceshrimp already federates some emoji as anonymous objects, and Misskey has explicitly handled null ids since before Iceshrimp’s existence iinm, hinting at other implementations doing so as well. Therefore you already need to handle null IDs anyway for compatibility reasons.
Going by the AP spec, an anonymous object is afaict perfectly appropriate at this place too, and as mentioned before, for e.g. C2S it’s entirely impossible to have meaningful IDs. Fabricated IDs, on the other hand, blatantly go against AP design.

it could be a regular ID that resolves to 404. Non-public objects often can't be fetched either

Putting aside how this might go against the spirit of AP spec, there’s still a difference. The only thing someone unauthorised ever sees of an access-restricted post is

I assume the implementation in question will not attempt to fetch object if origin is the same.

This assumption is not entirely unreasonable (it saves work and at the top level is required due to Mastodon shenanigans). But given it’s not uncommon for inlined objects to be minimised in some form, fetching them to expand all details isn’t entirely unreasonable either.

To reiterate:

Anonymous emoji already exist. It is outright impossible for some implementations to provide meaningful non-anonymous emoji. Anonymous objects are part of core AP spec. Mastodon docs don’t even mention an emoji id outside one illustrative example.

Unless troubles with major other implementations show up in testing, i personally really see no reason why fabricating a fake ID instead, breaking expected AP semantics, would be preferable. (And even then i’d still report it as a bug to affected implementations so future implementers won’t be held up by this).
Whatever format we fabricate now might also get in the way should we later implement more advanced emoji management and tracking.

Author

Note, Iceshrimp already federates some emoji as anonymous objects and Misskey explicitly handles null ids since before Iceshrimp’s existence iinm hinting at other implementations doing so as well. Therefore you already need to handle null IDs anyway for compatibility reasons.

No, existing implementations use non-anonymous emojis; I have never seen an Emoji object without an id. This is the de-facto standard that emerged over the years. I don't think Iceshrimp.NET is a good counter-example because it is new software (their site says it's beta).

Unless troubles with major other implementations show up in testing, i personally really see no reason why fabricating a fake ID instead, breaking expected AP semantics, would be preferable. (And even then i’d still report it as a bug to affected implementations so future implementers won’t be held up by this).

ActivityPub spec doesn't say that anonymous objects are always valid, obviously that depends on the object type and the context. So, if software doesn't support anonymous emojis this is not a bug, just an implementation detail.

I agree that fabricated emoji IDs are not ideal, but they work fine in practice (Pleroma used them for a long time). On the other hand, setting id to null is a breaking change that will make Akkoma emojis incompatible with some other implementations.

Member

I continue to disagree and i’ll once more point out it isn’t just Iceshrimp.NET so anything assuming emoji are first-class objects is already broken.

One more thing to note: if we were to fabricate local ids for remote or custom user emoji, this may become an inter-instance moderation issue, as emoji not approved by the instance admin get labeled as official instance emoji. This may lead to misunderstandings or break existing strict blocking policies, which use the domain of id or, as a fallback, icon.url.
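The moderation concern can be sketched with a hypothetical host-based blocking policy in Python. This is not an existing Akkoma policy; the function names and the hosts `local.example` and `blocked.example` are invented for illustration. The policy uses the host of `id` and only falls back to `icon.url` when there is no `id`:

```python
from typing import Optional, Set
from urllib.parse import urlsplit


def emoji_origin_host(emoji: dict) -> Optional[str]:
    """Host the policy attributes the emoji to: id first, then icon.url."""
    emoji_id = emoji.get("id")
    if isinstance(emoji_id, str) and emoji_id:
        return urlsplit(emoji_id).hostname
    icon = emoji.get("icon")
    url = icon.get("url") if isinstance(icon, dict) else None
    if isinstance(url, str) and url:
        return urlsplit(url).hostname
    return None


def is_blocked(emoji: dict, blocked_hosts: Set[str]) -> bool:
    return emoji_origin_host(emoji) in blocked_hosts


blocked = {"blocked.example"}
icon = {"type": "Image", "url": "https://blocked.example/emoji/x.png"}

# Anonymous emoji: the policy falls back to icon.url and blocks it.
print(is_blocked({"id": None, "icon": icon}, blocked))
# Same image with a fabricated local id: the policy sees the local host.
print(is_blocked({"id": "https://local.example/emojis/x", "icon": icon}, blocked))
```

In this sketch the fabricated local id hides the remote origin of the image from the policy, which is the kind of misattribution described above.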

Given no agreement is in sight and we just go in circles, i’ll stop responding here unless something noteworthy and new comes up. Practical tests and/or Floati’s decision will have to decide this.

Reference: AkkomaGang/akkoma#820