backend: add automatic dead instance detection #204

Merged
toast merged 3 commits from dead-instance into main 2022-10-16 13:45:29 +00:00
Owner

It works by having a day-long cache of
"when did we last successfully communicate with this instance?"
Anything over a specified threshold (1 month) will act as though the instance
is suspended - all outgoing jobs are dropped on processing.
The day-long cache is in place because the ordering is necessarily a
linear scan.
Once an instance comes back online, we will detect that is the case as soon as
we receive an activity from them (which will update the "last communicated at")
field.

Potential future TODOs:

  • Improve the caching system, it's actually pretty inefficient as it is.
    CacheBox with a call override?

  • Think of ways to make it not-a-linear-scan, since the instances table can get
    pretty big. It's around 4500 on toast cafe.

    Recommend fast-forward merging.

It works by having a day-long cache of "when did we last successfully communicate with this instance?" Anything over a specified threshold (1 month) will act as though the instance is suspended - all outgoing jobs are dropped on processing. The day-long cache is in place because the ordering is necessarily a linear scan. Once an instance comes back online, we will detect that is the case as soon as we receive an activity from them (which will update the "last communicated at") field. Potential future TODOs: * Improve the caching system, it's actually pretty inefficient as it is. CacheBox with a call override? * Think of ways to make it not-a-linear-scan, since the instances table can get pretty big. It's around 4500 on toast cafe. Recommend fast-forward merging.
toast force-pushed dead-instance from 32b208298b to 91a4f38871 2022-10-16 12:16:12 +00:00 Compare
Author
Owner

This fixes #200 and should be merged before #198 (which IMO should be merged immediately afterwards).

This fixes #200 and should be merged before #198 (which IMO should be merged immediately afterwards).
Johann150 added a new dependency 2022-10-16 12:47:31 +00:00
Johann150 added a new dependency 2022-10-16 12:47:52 +00:00
Johann150 requested changes 2022-10-16 12:51:33 +00:00
@ -20,1 +21,4 @@
const suspendedHostsCache = new Cache<Instance[]>(1000 * 60 * 60);
// dead host list is a linear scan, so cache it longer
const deadHostsCache = new Cache<Instance[]>(1000 * 60 * 60 * 24);
const deadThreshold = 1000 * 60 * 60 * 24 * 30; // 1 month
Owner

You should probably use the time constants here:

import { DAY } from '@/const.ts';

//...
const deadHostsCache = new Cache<Instance[]>(DAY);
const deadThreshold = 30 * DAY;
You should probably use the time constants here: ```typescript import { DAY } from '@/const.ts'; //... const deadHostsCache = new Cache<Instance[]>(DAY); const deadThreshold = 30 * DAY; ```
Author
Owner

Should probably also do that for prelude/time.ts huh
TODO ig

Should probably also do that for prelude/time.ts huh TODO ig
toast marked this conversation as resolved
@ -45,0 +53,4 @@
lastCommunicatedAt: LessThan(deadTime),
});
deadHostsCache.set(null, deadHosts);
}
Owner

This whole bit with the caching I don't really understand. Couldn't you just as well just request that one instance? Perhaps something like this:

const deadTime = new Date(Date.now() - deadThreshold);
const isDead = await Instances.countBy({
	host,
	lastCommunicatedAt: LessThan(deadTime),
});

if (isDead) return 'skip (dead instance)';

This would also have the advantage of immediately resuming contact with an instance instead of having to wait for the cache to time out.

This whole bit with the caching I don't really understand. Couldn't you just as well just request that one instance? Perhaps something like this: ```typescript const deadTime = new Date(Date.now() - deadThreshold); const isDead = await Instances.countBy({ host, lastCommunicatedAt: LessThan(deadTime), }); if (isDead) return 'skip (dead instance)'; ``` This would also have the advantage of immediately resuming contact with an instance instead of having to wait for the cache to time out.
Author
Owner

Would it make sense to do that for suspended instances too? (same logic)

Would it make sense to do that for suspended instances too? (same logic)
Owner

I think yes? I don't know why it was done that way and how the different performances are but it might well have just been waves hands classic Misskey.

I think yes? I don't know why it was done that way and how the different performances are ~~but it might well have just been *waves hands* classic Misskey~~.
Owner

I think you removed the definition of deadTime.

I think you removed the definition of `deadTime`.
Author
Owner

how to tell I'm not finished caffeinating 👍

how to tell I'm not finished caffeinating 👍
toast marked this conversation as resolved
toast added 1 commit 2022-10-16 13:23:25 +00:00
backend: simplify suspended and dead queries
Some checks failed
ci/woodpecker/push/build Pipeline was successful
ci/woodpecker/push/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/lint-client Pipeline was successful
ci/woodpecker/push/lint-backend Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/pr/lint-foundkey-js Pipeline was successful
ci/woodpecker/pr/lint-backend Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
ci/woodpecker/pr/lint-client Pipeline failed
ci/woodpecker/pr/test Pipeline failed
21c1e5c06c
This should also have better latency due to being a single query.
Furthermore, it's no longer a linear scan, since host is indexed.
Would be cool to simplify it further to a single query for blocks also...
Why exactly are blocks not in the db?
toast requested review from Johann150 2022-10-16 13:23:44 +00:00
toast added 1 commit 2022-10-16 13:32:23 +00:00
backend: fixup missing deadTime and incorrect import
Some checks failed
ci/woodpecker/pr/lint-backend Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
ci/woodpecker/pr/lint-client Pipeline failed
ci/woodpecker/pr/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/build Pipeline was successful
ci/woodpecker/push/lint-client Pipeline was successful
ci/woodpecker/push/lint-backend Pipeline was successful
ci/woodpecker/push/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/pr/test Pipeline failed
d762143b89
toast merged commit d762143b89 into main 2022-10-16 13:45:29 +00:00
toast deleted branch dead-instance 2022-10-16 13:45:30 +00:00
Sign in to join this conversation.
No reviewers
No labels
feature
fix
upkeep
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference: FoundKeyGang/FoundKey#204
No description provided.