backend: add automatic dead instance detection #204

toast · 2022-10-16T12:13:25Z

toast commented

2022-10-16 12:13:25 +00:00

It works by having a day-long cache of
"when did we last successfully communicate with this instance?"
Anything over a specified threshold (1 month) will act as though the instance
is suspended - all outgoing jobs are dropped on processing.
The day-long cache is in place because the ordering is necessarily a
linear scan.
Once an instance comes back online, we will detect that is the case as soon as
we receive an activity from them (which will update the "last communicated at")
field.

Potential future TODOs:

Improve the caching system, it's actually pretty inefficient as it is.
CacheBox with a call override?
Think of ways to make it not-a-linear-scan, since the instances table can get
pretty big. It's around 4500 on toast cafe.

Recommend fast-forward merging.

It works by having a day-long cache of "when did we last successfully communicate with this instance?" Anything over a specified threshold (1 month) will act as though the instance is suspended - all outgoing jobs are dropped on processing. The day-long cache is in place because the ordering is necessarily a linear scan. Once an instance comes back online, we will detect that is the case as soon as we receive an activity from them (which will update the "last communicated at") field. Potential future TODOs: * Improve the caching system, it's actually pretty inefficient as it is. CacheBox with a call override? * Think of ways to make it not-a-linear-scan, since the instances table can get pretty big. It's around 4500 on toast cafe. Recommend fast-forward merging.

🎉 1

toast force-pushed dead-instance from 32b208298b to 91a4f38871

2022-10-16 12:16:12 +00:00

Compare

toast commented

2022-10-16 12:22:56 +00:00

This fixes #200 and should be merged before #198 (which IMO should be merged immediately afterwards).

Johann150 added a new dependency 2022-10-16 12:47:31 +00:00

#198 deliver Delete activities to all known instances

Johann150 added a new dependency 2022-10-16 12:47:52 +00:00

#200 Stop federation attempts with dead instances

Johann150 requested changes 2022-10-16 12:51:33 +00:00

packages/backend/src/queue/processors/deliver.ts Outdated

					
				@ -20,1 +21,4 @@

				const suspendedHostsCache = new Cache<Instance[]>(1000 * 60 * 60);

				// dead host list is a linear scan, so cache it longer

				const deadHostsCache = new Cache<Instance[]>(1000 * 60 * 60 * 24);

				const deadThreshold = 1000 * 60 * 60 * 24 * 30; // 1 month

Johann150 commented

2022-10-16 12:39:00 +00:00

You should probably use the time constants here:

import { DAY } from '@/const.ts';

//...
const deadHostsCache = new Cache<Instance[]>(DAY);
const deadThreshold = 30 * DAY;

You should probably use the time constants here: ```typescript import { DAY } from '@/const.ts'; //... const deadHostsCache = new Cache<Instance[]>(DAY); const deadThreshold = 30 * DAY; ```

toast commented

2022-10-16 13:03:50 +00:00

Should probably also do that for prelude/time.ts huh
TODO ig

Should probably also do that for prelude/time.ts huh TODO ig

toast marked this conversation as resolved

packages/backend/src/queue/processors/deliver.ts Outdated

					
				@ -45,0 +53,4 @@

							lastCommunicatedAt: LessThan(deadTime),

						});

						deadHostsCache.set(null, deadHosts);

					}

Johann150 commented

2022-10-16 12:51:08 +00:00

This whole bit with the caching I don't really understand. Couldn't you just as well just request that one instance? Perhaps something like this:

const deadTime = new Date(Date.now() - deadThreshold);
const isDead = await Instances.countBy({
	host,
	lastCommunicatedAt: LessThan(deadTime),
});

if (isDead) return 'skip (dead instance)';

This would also have the advantage of immediately resuming contact with an instance instead of having to wait for the cache to time out.

This whole bit with the caching I don't really understand. Couldn't you just as well just request that one instance? Perhaps something like this: ```typescript const deadTime = new Date(Date.now() - deadThreshold); const isDead = await Instances.countBy({ host, lastCommunicatedAt: LessThan(deadTime), }); if (isDead) return 'skip (dead instance)'; ``` This would also have the advantage of immediately resuming contact with an instance instead of having to wait for the cache to time out.

toast commented

2022-10-16 13:03:00 +00:00

Would it make sense to do that for suspended instances too? (same logic)

Johann150 commented

2022-10-16 13:04:57 +00:00

I think yes? I don't know why it was done that way and how the different performances are ~~but it might well have just been waves hands classic Misskey~~.

I think yes? I don't know why it was done that way and how the different performances are ~~but it might well have just been *waves hands* classic Misskey~~.

👍 1

Johann150 commented

2022-10-16 13:25:53 +00:00

I think you removed the definition of deadTime.

I think you removed the definition of `deadTime`.

toast commented

2022-10-16 13:30:44 +00:00

how to tell I'm not finished caffeinating 👍

toast marked this conversation as resolved

toast added 1 commit 2022-10-16 13:23:25 +00:00

backend: simplify suspended and dead queries

ci/woodpecker/push/build Pipeline was successful

Details

ci/woodpecker/push/lint-foundkey-js Pipeline was successful

Details

ci/woodpecker/push/lint-client Pipeline was successful

Details

ci/woodpecker/push/lint-backend Pipeline was successful

Details

ci/woodpecker/push/test Pipeline was successful

Details

ci/woodpecker/pr/lint-foundkey-js Pipeline was successful

Details

ci/woodpecker/pr/lint-backend Pipeline was successful

Details

ci/woodpecker/pr/build Pipeline was successful

Details

ci/woodpecker/pr/lint-client Pipeline failed

Details

ci/woodpecker/pr/test Pipeline failed

Details

21c1e5c06c

This should also have better latency due to being a single query.
Furthermore, it's no longer a linear scan, since host is indexed.
Would be cool to simplify it further to a single query for blocks also...
Why exactly are blocks not in the db?

toast requested review from Johann150 2022-10-16 13:23:44 +00:00

toast added 1 commit 2022-10-16 13:32:23 +00:00

backend: fixup missing deadTime and incorrect import

ci/woodpecker/pr/lint-backend Pipeline was successful

Details

ci/woodpecker/pr/build Pipeline was successful

Details

ci/woodpecker/pr/lint-client Pipeline failed

Details

ci/woodpecker/pr/lint-foundkey-js Pipeline was successful

Details

ci/woodpecker/push/build Pipeline was successful

Details

ci/woodpecker/push/lint-client Pipeline was successful

Details

ci/woodpecker/push/lint-backend Pipeline was successful

Details

ci/woodpecker/push/lint-foundkey-js Pipeline was successful

Details

ci/woodpecker/push/test Pipeline was successful

Details

ci/woodpecker/pr/test Pipeline failed

Details

d762143b89

toast merged commit d762143b89 into main

2022-10-16 13:45:29 +00:00

toast deleted branch dead-instance

2022-10-16 13:45:30 +00:00

Sign in to join this conversation.

No reviewers

No labels

feature

fix

upkeep

No milestone

No project

No assignees

2 participants

Notifications

Due date

The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks

#198 deliver Delete activities to all known instances}

FoundKeyGang/FoundKey

#200 Stop federation attempts with dead instances}

FoundKeyGang/FoundKey

Reference: FoundKeyGang/FoundKey#204