backend: add automatic dead instance detection #204

Merged
toast merged 3 commits from dead-instance into main 2022-10-16 13:45:29 +00:00

3 commits

Author SHA1 Message Date
d762143b89 backend: fixup missing deadTime and incorrect import
Some checks failed
ci/woodpecker/pr/lint-backend Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
ci/woodpecker/pr/lint-client Pipeline failed
ci/woodpecker/pr/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/build Pipeline was successful
ci/woodpecker/push/lint-client Pipeline was successful
ci/woodpecker/push/lint-backend Pipeline was successful
ci/woodpecker/push/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/pr/test Pipeline failed
2022-10-16 09:32:01 -04:00
21c1e5c06c backend: simplify suspended and dead queries
Some checks failed
ci/woodpecker/push/build Pipeline was successful
ci/woodpecker/push/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/lint-client Pipeline was successful
ci/woodpecker/push/lint-backend Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
ci/woodpecker/pr/lint-foundkey-js Pipeline was successful
ci/woodpecker/pr/lint-backend Pipeline was successful
ci/woodpecker/pr/build Pipeline was successful
ci/woodpecker/pr/lint-client Pipeline failed
ci/woodpecker/pr/test Pipeline failed
This should also have better latency due to being a single query.
Furthermore, it's no longer a linear scan, since host is indexed.
Would be cool to simplify it further to a single query for blocks also...
Why exactly are blocks not in the db?
2022-10-16 09:22:05 -04:00
91a4f38871 backend: add automatic dead instance detection
All checks were successful
ci/woodpecker/push/build Pipeline was successful
ci/woodpecker/push/lint-client Pipeline was successful
ci/woodpecker/push/lint-backend Pipeline was successful
ci/woodpecker/push/lint-foundkey-js Pipeline was successful
ci/woodpecker/push/test Pipeline was successful
It works by having a day-long cache of
"when did we last successfully communicate with this instance?"
Anything over a specified threshold (1 month) will act as though the instance
is suspended - all outgoing jobs are dropped on processing.
The day-long cache is in place because the ordering is necessarily a
linear scan.
Once an instance comes back online, we will detect that is the case as soon as
we receive an activity from them (which will update the "last communicated at")
field.

Potential future TODOs:
* Improve the caching system, it's actually pretty inefficient as it is.
  CacheBox with a call override?
* Think of ways to make it not-a-linear-scan, since the instances table can get
  pretty big. It's around 4500 on toast cafe.

ChangeLog: Added
2022-10-16 12:16:04 +00:00