backend: add automatic dead instance detection #204
Blocks
#198 deliver Delete activities to all known instances (FoundKeyGang/FoundKey)
#200 Stop federation attempts with dead instances (FoundKeyGang/FoundKey)
Reference: FoundKeyGang/FoundKey#204
It works by keeping a day-long cache of the answer to
"when did we last successfully communicate with this instance?"
Any instance whose last successful communication is older than a specified
threshold (1 month) is treated as though it were suspended: all outgoing jobs
to it are dropped on processing.
The day-long cache is in place because the lookup is necessarily a
linear scan.
Once an instance comes back online, we will detect that as soon as
we receive an activity from it (which will update the "last communicated at"
field).
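To make the described behaviour concrete, here is a minimal sketch of the idea, not the code in this PR. The `InstanceRecord` shape, the `DeadHostTracker` class and its method names are all hypothetical, and an in-memory array stands in for the instances table. Evicting a revived host from the cached set is an assumption of the sketch; the real code may only pick the change up when the cache expires.

```typescript
// Hypothetical stand-in for a row of the instances table.
interface InstanceRecord {
  host: string;
  lastCommunicatedAt: Date;
}

const DAY = 1000 * 60 * 60 * 24;
const DEAD_THRESHOLD = 30 * DAY; // 1 month, as in the description
const CACHE_LIFETIME = DAY; // the "day-long cache"

class DeadHostTracker {
  private instances: InstanceRecord[];
  private cached: Set<string> | null = null;
  private cachedAt = 0;

  constructor(instances: InstanceRecord[]) {
    this.instances = instances;
  }

  // Linear scan over every known instance, redone at most once per day.
  private deadHosts(now: number): Set<string> {
    let hosts = this.cached;
    if (hosts === null || now - this.cachedAt >= CACHE_LIFETIME) {
      const cutoff = now - DEAD_THRESHOLD;
      hosts = new Set(
        this.instances
          .filter((i) => i.lastCommunicatedAt.getTime() < cutoff)
          .map((i) => i.host),
      );
      this.cached = hosts;
      this.cachedAt = now;
    }
    return hosts;
  }

  // Outgoing jobs for a dead host are dropped, as for a suspended host.
  shouldSkip(host: string, now: number): boolean {
    return this.deadHosts(now).has(host);
  }

  // Receiving an activity updates "last communicated at".
  // Assumption of this sketch: the host is also evicted from the cached set.
  markAlive(host: string, now: number): void {
    const inst = this.instances.find((i) => i.host === host);
    if (inst) inst.lastCommunicatedAt = new Date(now);
    this.cached?.delete(host);
  }
}
```

A delivery worker would call `shouldSkip(host, Date.now())` before sending, and the inbox handler would call `markAlive` whenever an activity arrives.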
Potential future TODOs:
Improve the caching system; it's actually pretty inefficient as it is. CacheBox with a call override?
Think of ways to make it not a linear scan, since the instances table can get pretty big. It's around 4500 entries on toast cafe.
Recommend fast-forward merging.
32b208298b to 91a4f38871
This fixes #200 and should be merged before #198 (which IMO should be merged immediately afterwards).
@@ -20,1 +21,4 @@
const suspendedHostsCache = new Cache<Instance[]>(1000 * 60 * 60);
// dead host list is a linear scan, so cache it longer
const deadHostsCache = new Cache<Instance[]>(1000 * 60 * 60 * 24);
const deadThreshold = 1000 * 60 * 60 * 24 * 30; // 1 month
You should probably use the time constants here:
Should probably also do that for prelude/time.ts huh
TODO ig
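For illustration, the time-constant suggestion could look roughly like the sketch below. The constant names (`SECOND`, `MINUTE`, `HOUR`, `DAY` and the three derived values) are hypothetical; the project's actual constants module may use different names or live elsewhere.

```typescript
// Hypothetical time constants in milliseconds; real names may differ.
const SECOND = 1000;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;
const DAY = 24 * HOUR;

// The magic numbers from the hunk above, expressed with the constants.
const SUSPENDED_CACHE_LIFETIME = HOUR; // was 1000 * 60 * 60
const DEAD_CACHE_LIFETIME = DAY; // was 1000 * 60 * 60 * 24
const DEAD_THRESHOLD = 30 * DAY; // was 1000 * 60 * 60 * 24 * 30, 1 month
```

The values are unchanged; only the spelling differs, which makes the "1 month" comment easier to check at a glance.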
@@ -45,0 +53,4 @@
lastCommunicatedAt: LessThan(deadTime),
});
deadHostsCache.set(null, deadHosts);
}
I don't really understand this whole bit with the caching. Couldn't you just as well request that one instance? Perhaps something like this:
This would also have the advantage of immediately resuming contact with an instance instead of having to wait for the cache to time out.
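As a rough illustration of the per-instance idea (a sketch only, not the snippet proposed in this review), the check could look up just the target host and compare its timestamp. `InstanceStore`, `findByHost`, `shouldSkipInstance` and the `isSuspended` field are hypothetical names; a `Map` stands in for an indexed single-row query, which in a real backend would be asynchronous. Treating an unknown host as deliverable is also an assumption.

```typescript
// Hypothetical stand-in for a row of the instances table.
interface InstanceRecord {
  host: string;
  lastCommunicatedAt: Date;
  isSuspended: boolean;
}

const DEAD_THRESHOLD = 1000 * 60 * 60 * 24 * 30; // 1 month

// In-memory store keyed by host, standing in for a lookup of one row.
class InstanceStore {
  private byHost = new Map<string, InstanceRecord>();

  upsert(record: InstanceRecord): void {
    this.byHost.set(record.host, record);
  }

  findByHost(host: string): InstanceRecord | undefined {
    return this.byHost.get(host);
  }
}

// Decide per delivery whether the target host should be skipped.
function shouldSkipInstance(store: InstanceStore, host: string, now: number): boolean {
  const instance = store.findByHost(host);
  // Assumption: a host we have no record of is not treated as dead.
  if (instance === undefined) return false;
  if (instance.isSuspended) return true;
  return now - instance.lastCommunicatedAt.getTime() > DEAD_THRESHOLD;
}
```

Because nothing is cached, a host whose timestamp was just updated is deliverable again on the very next check, and the same function covers suspended hosts.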
Would it make sense to do that for suspended instances too? (same logic)
I think yes? I don't know why it was done that way or how the performance compares,
but it might well have just been *waves hands* classic Misskey.
I think you removed the definition of deadTime.
how to tell I'm not finished caffeinating 👍