Undo activities may not be handled. #91

Open
opened 2022-08-28 16:15:34 +00:00 by Johann150 · 2 comments
Owner

Some Undo activities are not handled correctly. The problem is that the remote server sends the object as only its id instead of the full object.

Because the original activity was undone, the remote server returns a 404 error code when Misskey tries to fetch the object over HTTP. In particular, this behaviour is part of Litepub with its plausible deniability "features".

example Activity
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "https://pleroma.example/schemas/litepub-0.1.jsonld",
    {
      "@language": "und"
    }
  ],
  "type": "Undo",
  "id": "https://pleroma.example/activities/320d5a6f-2a0b-4d3f-b59b-77fbe66c9796",
  "actor": "https://pleroma.example/users/example",
  "to": [ "https://pleroma.example/users/example/followers" ],
  "cc": [ "https://www.w3.org/ns/activitystreams#Public" ],
  "object": "https://pleroma.example/activities/a684eca1-3c1d-4ec9-b259-323c2b1312e5"
}

In theory the id is known to Misskey so it should be able to handle the Undo. However, this id is either not saved when the Create activity is handled (e.g. for reactions) or not looked up when handling the Undo, even if external resolution fails (e.g. renotes).
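A minimal sketch of the idea, assuming an in-memory store in place of the database. All names here (`Reaction`, `handleLike`, `handleUndo`) are hypothetical and not taken from the Misskey or Foundkey code base. It stores the activity id when the reaction arrives and resolves the `Undo` locally by that id, so no remote fetch is needed.

```typescript
// Hypothetical sketch: these types and functions are illustrative only.
type Reaction = { noteUri: string; actorUri: string; activityUri: string };
type LikeActivity = { id: string; actor: string; object: string };
type UndoActivity = { actor: string; object: string | { id: string } };

// Stand-in for the reactions table.
const reactions: Reaction[] = [];

// Save the activity id together with the reaction so it can be found later.
function handleLike(like: LikeActivity): void {
  reactions.push({ noteUri: like.object, actorUri: like.actor, activityUri: like.id });
}

// Handle an Undo whose object is either a bare id or an embedded object.
// The lookup is purely local, so a 404 from the remote server does not matter.
// Returns true if a stored reaction was removed.
function handleUndo(undo: UndoActivity): boolean {
  const targetUri = typeof undo.object === 'string' ? undo.object : undo.object.id;
  // Only the actor who created the reaction may undo it.
  const index = reactions.findIndex(
    (r) => r.activityUri === targetUri && r.actorUri === undo.actor,
  );
  if (index === -1) return false;
  reactions.splice(index, 1);
  return true;
}
```

With this shape, the example activity above would be resolved by matching its `object` against the stored `activityUri`.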

Steps to Reproduce

  1. React to a note from Pleroma
  2. Remove reaction
  3. Reaction is still visible on Foundkey

See also https://github.com/misskey-dev/misskey/issues/8796

Johann150 added this to the (deleted) project 2022-08-28 16:15:39 +00:00
Author
Owner

Something I proposed in the Misskey issue was to have all ActivityPub ids in a separate database table. I wonder if that could also be used to implement #53.

But I don't really like the solution because of the potential added SQL join in some places.

Johann150 added the fix label 2022-12-23 10:22:12 +00:00
Johann150 removed this from the (deleted) project 2022-12-23 10:22:15 +00:00
Author
Owner

Here is a discussion of all options I can think of. The final goal is having an easy way to go from database object to URI and back. This means it requires a relation containing: database id, table name and URI.
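To make the goal concrete, here is a rough in-memory model of such a relation. The names `UriMapping` and `UriMap` are made up for illustration, and a real implementation would live in the database rather than in two `Map`s. It only shows the two lookup directions and the uniqueness check.

```typescript
// Hypothetical shape of the relation: table name, database id and URI.
type UriMapping = { table: string; dbId: string; uri: string };

class UriMap {
  private byUri = new Map<string, UriMapping>();
  private byRow = new Map<string, UriMapping>();

  private static rowKey(table: string, dbId: string): string {
    return `${table}:${dbId}`;
  }

  // Register a mapping; each URI and each (table, id) pair may appear only once.
  add(entry: UriMapping): void {
    const key = UriMap.rowKey(entry.table, entry.dbId);
    if (this.byUri.has(entry.uri) || this.byRow.has(key)) {
      throw new Error(`duplicate mapping for ${entry.uri}`);
    }
    this.byUri.set(entry.uri, entry);
    this.byRow.set(key, entry);
  }

  // URI -> database object.
  findByUri(uri: string): UriMapping | undefined {
    return this.byUri.get(uri);
  }

  // Database object -> URI.
  findUri(table: string, dbId: string): string | undefined {
    return this.byRow.get(UriMap.rowKey(table, dbId))?.uri;
  }
}
```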

1) moving the URIs into a separate table

This is the solution I already mentioned previously.

The idea is that the current url columns/attributes would be removed and their values moved to a new table (e.g. uri_map). In the code, the uri attributes would be replaced with a join column that refers to a new entity.

Advantages

  • It would easily be possible to check that IDs and URIs are only used once on the entire server.
  • Indices can make a search quick.

Disadvantages

  • Most likely hard to refactor since all uses of the previous uri attributes have to be checked and maybe replaced.
  • Introduces more joins.
  • Makes code more complicated because it has to be checked whether the URI relation was already loaded.
  • Since multiple tables would have a relation to uri_map, a foreign key with ON DELETE CASCADE would only be possible in the wrong direction, so more complicated logic to delete data from uri_map would probably be necessary.

2) duplicating the URIs in a separate table

Similar to 1) insofar as there is a new table created. However, the existing columns and attributes remain as is.

Advantages

  • It would easily be possible to check that IDs and URIs are only used once on the entire server.
  • Indices can make a search quick.
  • Easier to refactor than option 1) since existing columns and attributes remain as they are.

Disadvantages

  • Duplicates data among multiple tables.
  • Makes code more complicated to insert or delete entries in the uri_map table every time.

3) creating a view

The implementation is somewhat similar to option 2, however no data is actually duplicated. A view is basically a query that is stored by the database engine and run every time you request data from the view. In this case the view would be a UNION of all the different tables that contain a URI.

Advantages

  • Easier to refactor than option 1) since existing columns and attributes remain as they are.
  • No code is required to keep the view up to date because it queries the underlying tables instead.

Disadvantages

  • Indices cannot be defined on views, so lookups could be very slow. (Maybe the indices of the underlying tables would be used?)
  • Uniqueness cannot easily be checked because there are no indices over the whole view.
  • "custom" URIs like mentioned above for contexts cannot be added since the data is strictly data from the existing tables.
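As an illustration of what such a view could look like, the sketch below generates the `CREATE VIEW` statement from a list of source tables. The table and column names are assumptions, not the actual schema. It uses `UNION ALL` rather than plain `UNION`, since rows from different tables already differ in the table name column.

```typescript
// Hypothetical description of one table that has a URI column.
type UriSource = { table: string; idColumn: string; uriColumn: string };

// Build a CREATE VIEW statement that exposes (table_name, db_id, uri) for all
// source tables. Identifiers are interpolated directly, so they must be
// trusted constants and never user input.
function buildUriViewSql(viewName: string, sources: UriSource[]): string {
  if (sources.length === 0) {
    throw new Error('at least one source table is required');
  }
  const selects = sources.map(
    (s) =>
      `SELECT '${s.table}' AS table_name, "${s.idColumn}" AS db_id, "${s.uriColumn}" AS uri ` +
      `FROM "${s.table}" WHERE "${s.uriColumn}" IS NOT NULL`,
  );
  return `CREATE VIEW "${viewName}" AS\n${selects.join('\nUNION ALL\n')};`;
}
```

For example, calling it with `note` and `user` as sources yields one `SELECT` per table joined by `UNION ALL`.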

4) creating a materialized view

The initial implementation is similar to option 3, but then using it is more similar to option 2.

Advantages

  • Indices can make a search quick.
  • It would easily be possible to check that IDs and URIs are only used once on the entire server.
  • Easier to refactor than option 1) since existing columns and attributes remain as they are.
  • No code is required to keep the view up to date because it queries the underlying tables instead (when refreshed).

Disadvantages

  • Additional code is required to refresh the materialized view. It is unclear when this would happen, perhaps periodically? In many use cases, the most current data may not be necessary.
  • Refreshing the view can add performance overhead.
  • Duplicates data from the tables because a materialized view is stored to disk.
  • "custom" URIs like mentioned above for contexts cannot be added since the data is strictly data from the existing tables.
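One possible answer to the question of when to refresh is a staleness limit: refresh only if the last refresh is older than some maximum age. The sketch below assumes a hypothetical `ViewRefresher` class; the `refresh` callback stands in for whatever would actually issue the refresh against the database, and the clock is injectable so the policy can be tested.

```typescript
// Hypothetical staleness-based refresh policy for a materialized view.
class ViewRefresher {
  private lastRefresh = Number.NEGATIVE_INFINITY;
  private readonly refresh: () => void;
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(refresh: () => void, maxAgeMs: number, now: () => number = Date.now) {
    this.refresh = refresh; // would run the actual refresh statement
    this.maxAgeMs = maxAgeMs; // how stale the view is allowed to get
    this.now = now; // injectable clock
  }

  // Refresh if the view is older than maxAgeMs. Returns whether a refresh ran.
  ensureFresh(): boolean {
    const t = this.now();
    if (t - this.lastRefresh < this.maxAgeMs) return false;
    this.refresh();
    this.lastRefresh = t;
    return true;
  }
}
```

Callers that can tolerate slightly old data would call `ensureFresh()` before a lookup; a strictly periodic job would be an alternative with the same callback.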
Reference: FoundKeyGang/FoundKey#91