deliver Delete activities to all known instances #198

Merged
toast merged 4 commits from deliver-delete-everyone into main 2022-10-16 13:46:24 +00:00
Owner
No description provided.
Johann150 added 1 commit 2022-10-10 22:32:48 +00:00
8920eeb86a
ActivityPub: allow all known shared inboxes to be addressed
This is oriented on this paragraph from the AP spec:

> Additionally, if an object is addressed to the Public special collection,
> a server MAY deliver that object to all known sharedInbox endpoints
> on the network.
helene requested changes 2022-10-11 17:10:46 +00:00
@ -88,0 +101,4 @@
if (this.recipes.some(r => isEveryone(r))) {
// deliver to all of known network
const sharedInboxes = await Users.createQueryBuilder('users')
Contributor

SELECT DISTINCT would be preferable. Overall, it seems that target inboxes are not deduplicated.

`SELECT DISTINCT` would be preferable. Overall, it seems that target inboxes are not deduplicated.
Author
Owner

That is what the groupBy in line 111 was meant for. I didn't know, but just checked and there is .distinct(true) available for query builders to achieve SELECT DISTINCT. I'm not sure if it makes a difference to GROUP BY.

The overall deduplication is achieved because inboxes is a Set and not an array.

That is what the `groupBy` in line 111 was meant for. I didn't know, but just checked and there is `.distinct(true)` available for query builders to achieve `SELECT DISTINCT`. I'm not sure if it makes a difference to `GROUP BY`. The overall deduplication is achieved because `inboxes` is a `Set` and not an array.
Contributor

Best way to check that would be via Postgres EXPLAIN ANALYZE, I believe.

Best way to check that would be via Postgres `EXPLAIN ANALYZE`, I believe.
Author
Owner

I ran it myself but I also had a look at this Stackoverflow answer.

result of running EXPLAIN ANALYZE. tl;dr the query plans are identical
misskey=> explain analyze select distinct "sharedInbox" from "user" where "sharedInbox" is not null and "host" is not null;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1482.82..1509.61 rows=2679 width=28) (actual time=14.398..14.724 rows=2684 loops=1)
   Group Key: "sharedInbox"
   Batches: 1  Memory Usage: 369kB
   ->  Seq Scan on "user"  (cost=0.00..1427.35 rows=22187 width=28) (actual time=0.013..9.184 rows=22216 loops=1)
         Filter: (("sharedInbox" IS NOT NULL) AND (host IS NOT NULL))
         Rows Removed by Filter: 144
 Planning Time: 0.798 ms
 Execution Time: 14.933 ms
(8 rows)

misskey=> explain analyze select "sharedInbox" from "user" where "sharedInbox" is not null and "host" is not null group by "sharedInbox";
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1482.82..1509.61 rows=2679 width=28) (actual time=14.026..14.350 rows=2684 loops=1)
   Group Key: "sharedInbox"
   Batches: 1  Memory Usage: 369kB
   ->  Seq Scan on "user"  (cost=0.00..1427.35 rows=22187 width=28) (actual time=0.010..8.622 rows=22216 loops=1)
         Filter: (("sharedInbox" IS NOT NULL) AND (host IS NOT NULL))
         Rows Removed by Filter: 144
 Planning Time: 0.124 ms
 Execution Time: 14.491 ms
(8 rows)

misskey=> select version();
                                                          version
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 13.8 (Debian 13.8-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

From the Stackoverflow answer:

For only a few records (f.e. 100000) it doesn't really matter. Both will use the same HashAggregate method. [...]
But for a larger recordset there's a difference. [...]
So in this example the DISTINCT was still faster. [...]

I ran it myself but I also had a look at [this Stackoverflow answer](https://stackoverflow.com/a/70436474). <details><summary>result of running <code>EXPLAIN ANALYZE</code>. tl;dr the query plans are identical</summary> ```text misskey=> explain analyze select distinct "sharedInbox" from "user" where "sharedInbox" is not null and "host" is not null; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ HashAggregate (cost=1482.82..1509.61 rows=2679 width=28) (actual time=14.398..14.724 rows=2684 loops=1) Group Key: "sharedInbox" Batches: 1 Memory Usage: 369kB -> Seq Scan on "user" (cost=0.00..1427.35 rows=22187 width=28) (actual time=0.013..9.184 rows=22216 loops=1) Filter: (("sharedInbox" IS NOT NULL) AND (host IS NOT NULL)) Rows Removed by Filter: 144 Planning Time: 0.798 ms Execution Time: 14.933 ms (8 rows) misskey=> explain analyze select "sharedInbox" from "user" where "sharedInbox" is not null and "host" is not null group by "sharedInbox"; QUERY PLAN ------------------------------------------------------------------------------------------------------------------ HashAggregate (cost=1482.82..1509.61 rows=2679 width=28) (actual time=14.026..14.350 rows=2684 loops=1) Group Key: "sharedInbox" Batches: 1 Memory Usage: 369kB -> Seq Scan on "user" (cost=0.00..1427.35 rows=22187 width=28) (actual time=0.010..8.622 rows=22216 loops=1) Filter: (("sharedInbox" IS NOT NULL) AND (host IS NOT NULL)) Rows Removed by Filter: 144 Planning Time: 0.124 ms Execution Time: 14.491 ms (8 rows) misskey=> select version(); version --------------------------------------------------------------------------------------------------------------------------- PostgreSQL 13.8 (Debian 13.8-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit (1 row) ``` </details> From the Stackoverflow answer: > For only a few records (f.e. 100000) it doesn't really matter. Both will use the same HashAggregate method. [...] > But for a larger recordset there's a difference. [...] > So in this example the `DISTINCT` was still faster. [...]
Owner

Fun thing just happend, postgres 15 came out and...

Queries using SELECT DISTINCT can now be executed in parallel.

Would be nice to see if we're affected.

Fun thing just happend, postgres 15 came out and... > Queries using SELECT DISTINCT can now be executed in parallel. Would be nice to see if we're affected.
Johann150 marked this conversation as resolved
norm reviewed 2022-10-11 17:11:53 +00:00
@ -11,3 +11,3 @@
import { Notes, Users, Instances } from '@/models/index.js';
import { notesChart, perUserNotesChart, instanceChart } from '@/services/chart/index.js';
import { deliverToFollowers, deliverToUser } from '@/remote/activitypub/deliver-manager.js';
import { DeliverManager } from '@/remote/activitypub/deliver-manager.js';
Owner

This seems to cause an import error:

file:///home/foundkey/foundkey/packages/backend/built/services/note/delete.js:11
import { DeliverManager } from '../../remote/activitypub/deliver-manager.js';
         ^^^^^^^^^^^^^^
SyntaxError: The requested module '../../remote/activitypub/deliver-manager.js' does not provide an export named 'DeliverManager'
    at ModuleJob._instantiate (node:internal/modules/esm/module_job:123:21)
    at async ModuleJob.run (node:internal/modules/esm/module_job:189:5)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:526:24)
    at async workerMain (file:///home/foundkey/foundkey/packages/backend/built/boot/worker.js:9:5)
    at async default (file:///home/foundkey/foundkey/packages/backend/built/boot/index.js:25:9)
This seems to cause an import error: ``` file:///home/foundkey/foundkey/packages/backend/built/services/note/delete.js:11 import { DeliverManager } from '../../remote/activitypub/deliver-manager.js'; ^^^^^^^^^^^^^^ SyntaxError: The requested module '../../remote/activitypub/deliver-manager.js' does not provide an export named 'DeliverManager' at ModuleJob._instantiate (node:internal/modules/esm/module_job:123:21) at async ModuleJob.run (node:internal/modules/esm/module_job:189:5) at async Promise.all (index 0) at async ESMLoader.import (node:internal/modules/esm/loader:526:24) at async workerMain (file:///home/foundkey/foundkey/packages/backend/built/boot/worker.js:9:5) at async default (file:///home/foundkey/foundkey/packages/backend/built/boot/index.js:25:9) ```
Johann150 marked this conversation as resolved
Johann150 force-pushed deliver-delete-everyone from 103348ee3a to 421b42d07d 2022-10-11 17:32:37 +00:00 Compare
Johann150 added 1 commit 2022-10-11 18:16:48 +00:00
ci/woodpecker/push/lint-client Pipeline was successful Details
ci/woodpecker/push/lint-backend Pipeline was successful Details
ci/woodpecker/push/build Pipeline was successful Details
ci/woodpecker/push/lint-foundkey-js Pipeline was successful Details
ci/woodpecker/push/test Pipeline was successful Details
0b8fa2665c
use DISTINCT instead of GROUP BY
This should have better performance for large recordsets.

Ref: #198 (comment)
norm requested changes 2022-10-11 19:14:41 +00:00
@ -140,1 +139,4 @@
manager.addDirectRecipe(remoteUser);
}
if (['public', 'home', 'followers'].contains(note.visibility)) {
Owner

Should be includes instead of contains.

Should be `includes` instead of `contains`.
norm marked this conversation as resolved
@ -141,0 +143,4 @@
manager.addFollowersRecipe();
}
if (['public', 'home'].contains(note.visibility)) {
Owner

Also should be changed to includes

Also should be changed to `includes`
norm marked this conversation as resolved
Owner

Also seems like it's spending a lot of time fetching the nodeinfo for every instance, making things very slow

Also seems like it's spending a lot of time fetching the nodeinfo for every instance, making things very slow
Johann150 added 1 commit 2022-10-11 19:26:32 +00:00
ci/woodpecker/push/build Pipeline was successful Details
ci/woodpecker/push/lint-backend Pipeline was successful Details
ci/woodpecker/push/lint-foundkey-js Pipeline was successful Details
ci/woodpecker/push/lint-client Pipeline was successful Details
ci/woodpecker/push/test Pipeline was successful Details
ci/woodpecker/pr/lint-backend Pipeline was successful Details
ci/woodpecker/pr/lint-foundkey-js Pipeline was successful Details
ci/woodpecker/pr/build Pipeline was successful Details
ci/woodpecker/pr/lint-client Pipeline failed Details
ci/woodpecker/pr/test Pipeline failed Details
7cd11e7afd
fix function name
Owner

Does seem to work now, albeit it does seem to be spending a lot of time going through all of the dead instances it knows (which I assume would be fixed once #200 is resolved).

Does seem to work now, albeit it does seem to be spending a lot of time going through all of the dead instances it knows (which I assume would be fixed once #200 is resolved).
Author
Owner

Well since the deliver activity includes fetching instance data always, I'm not sure if these things are related?

Well since the deliver activity includes fetching instance data always, I'm not sure if these things are related?
Owner

The nodeinfo thing seems to have resolved itself after a bit but the actual sending of delete activites is still related to the issue of dealing with dead instances since we're now sending those to every known instance.

The nodeinfo thing seems to have resolved itself after a bit but the actual sending of delete activites is still related to the issue of dealing with dead instances since we're now sending those to every known instance.
Owner

Just applied this on toast.cafe - let's see how that goes.

Just applied this on toast.cafe - let's see how that goes.
Owner

So update, it's not a great experience.
I get floods of
[chart] instance:SOME_INSTANCE(hour): New commit created
and
[chart] instance:SOME_INSTANCE: Updated
during which everything basically stops dead for a while.
I wonder exactly what's going on there, and whether that could be stuffed into a priority queue in redis instead 🤔

So update, it's not a great experience. I get floods of `[chart] instance:SOME_INSTANCE(hour): New commit created` and `[chart] instance:SOME_INSTANCE: Updated` during which everything basically stops dead for a while. I wonder exactly what's going on there, and whether that could be stuffed into a priority queue in redis instead 🤔
Author
Owner

Those two are the log messages for starting and finishing an update of another instances information (e.g. nodeinfo).

Those two are the log messages for starting and finishing an update of another instances information (e.g. nodeinfo).
Johann150 added a new dependency 2022-10-16 12:47:31 +00:00
toast merged commit 811d5cd0d7 into main 2022-10-16 13:46:24 +00:00
toast deleted branch deliver-delete-everyone 2022-10-16 13:46:24 +00:00
Sign in to join this conversation.
No reviewers
No Label
feature
fix
upkeep
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: FoundKeyGang/FoundKey#198
No description provided.