Make Remote Cache purge itself free from old media #71
Remote Cache is somewhat broken (as we all know), in that it doesn't purge old cached data, leading to an ever-growing cache directory that you have to clean manually.
With that, enabling remote cache doesn't make a lot of sense.
So I suggest adding a way to purge old remote media automatically, or at least a CLI task that does it, so we could run a cron job to handle it.
Or remove it entirely (although I would like to have a working remote cache; it makes loading media way smoother).
The question is what criteria to use for expiration, and what to do if someone requests expired media. For example, we could keep media for a (perhaps configurable) number of days, say 30. After that they are removed from disk, and if someone requests them they are handled as if remote cache were disabled for them.
Implementation-wise, this expiry could work by checking `drive_file.createdAt`. Disabling remote cache for those files should be possible with `UPDATE drive_file SET "storedInternal" = false, "isLink" = true` when they are removed from disk. Not sure if this correctly handles cases where the file is stored in S3.

I think the criterion should be the last time it was fetched from the remote, or the last time it was accessed. Or it could also use a threshold on disk usage (might be harder to implement). Maybe we can peek at what Pleroma uses as its criteria.
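A purge task along the `createdAt` lines discussed above could be sketched roughly as follows. This is only a sketch: the `DriveFile` shape is simplified, and the helper names and the 30-day default are assumptions, not actual Misskey code. The real task would additionally delete the blob from disk or S3 and then run the `UPDATE` statement mentioned above for each expired file.

```typescript
// Simplified stand-in for the drive_file row; real columns live in the DB.
interface DriveFile {
  id: string;
  createdAt: Date;
  isLink: boolean;         // true once the file is no longer cached locally
  storedInternal: boolean; // true while the blob is on local disk
}

const DAY_MS = 24 * 60 * 60 * 1000;

// A cached remote file is considered expired once it is older than maxAgeDays.
function isExpired(file: DriveFile, now: Date, maxAgeDays = 30): boolean {
  return now.getTime() - file.createdAt.getTime() > maxAgeDays * DAY_MS;
}

// Select the files a purge task would act on. The task itself would then
// remove each blob from storage and mark the row with:
//   UPDATE drive_file SET "storedInternal" = false, "isLink" = true WHERE id = ...
function selectExpired(files: DriveFile[], now: Date, maxAgeDays = 30): DriveFile[] {
  return files.filter(
    (f) => f.storedInternal && !f.isLink && isExpired(f, now, maxAgeDays),
  );
}
```

Run as a cron-driven CLI task, this would keep the cache directory bounded without needing any extra columns beyond `createdAt`.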
Pleroma does refetch it and cache it again, AFAIK (might be wrong on that, though).
I don't think that's a good way of handling it. I think in that case it should be cached again, but maybe for a shorter time? If remote cache is enabled, a client should never have to fetch from another place.
I was trying to avoid having to keep track of that. Having to store the last access time could be annoying, and I'm not sure we could use filesystem metadata reliably for this. Further, files might be cached upstream, in which case we would have no way to know the actual last access time.
Mmh, I think the file is only fetched once? Although I guess with Misskey drive it could be used on multiple posts.
Again, I was trying to avoid keeping track of additional data, but it might be unavoidable.
I never said they should fetch it from the remote. We could use media proxy. In that case we might also be able to leverage HTTP caching and/or caching of the web server to avoid having to handle caching ourselves.
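Leveraging HTTP caching from the media proxy could look something like the sketch below. The header names are standard HTTP; the handler shape and the one-day `max-age` default are hypothetical, not actual Misskey code. Any fronting web server (or the browser itself) would then handle revalidation for us.

```typescript
// Sketch: headers a media-proxy response could set so that browsers and any
// fronting web server (e.g. Nginx with proxy_cache) cache the media for us.
// maxAgeSeconds is an assumed default, not a value from the codebase.
function cacheHeaders(etag: string, maxAgeSeconds = 86400): Record<string, string> {
  return {
    // 'immutable' is reasonable because cached remote media never changes in place.
    'Cache-Control': `public, max-age=${maxAgeSeconds}, immutable`,
    // ETag lets clients revalidate cheaply with If-None-Match after expiry.
    'ETag': `"${etag}"`,
  };
}
```

With headers like these, an expired-and-refetched file costs the origin one fetch; repeat requests are absorbed by the HTTP cache layer instead of our own storage.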
Yeah, if we use the filesystem, it may just not work if it's mounted with `noatime`, which means no access time is recorded. Probably the best option is using the time it was fetched from the remote.

Does Misskey perform a duplicate check of some sort when fetching files?
I believe Pleroma does use Nginx's server-side caching for their implementation (could be wrong), so it's possible we could do the same if that's the case.
We currently only have the `createdAt` time in the database.

Yes, it uses an MD5 hash of the file. If there is another file with the same MD5 hash (and the `force` option is not set), the temporary file that was just downloaded will be discarded: 0965d3cbd9/packages/backend/src/services/drive/add-file.ts (L356-L367)
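The duplicate check described above can be sketched like this. It is a hedged approximation of the logic in `add-file.ts`, not the actual code: `findByMd5` stands in for the real repository lookup, and the function only decides whether the freshly downloaded temporary file should be kept.

```typescript
import * as crypto from 'crypto';

// MD5 digest of the downloaded file's contents, as Misskey uses for dedup.
function md5Of(data: Buffer): string {
  return crypto.createHash('md5').update(data).digest('hex');
}

// Returns true if the downloaded data should be stored as a new file.
// findByMd5 is a hypothetical stand-in for the DB lookup in add-file.ts;
// when a file with the same hash exists (and `force` is not set), the
// temporary download is discarded in favor of the existing file.
function shouldStore(
  data: Buffer,
  findByMd5: (md5: string) => { id: string } | undefined,
  force = false,
): boolean {
  if (force) return true; // `force` bypasses the duplicate check
  return findByMd5(md5Of(data)) === undefined;
}
```

For the purge discussion this matters because a refetched expired file would dedupe against any surviving row with the same hash rather than creating a new one.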