Exiftool: Strip all non-essential metadata tags #745

Merged
floatingghost merged 5 commits from Oneric/akkoma:exiftool-strip-all into develop 2024-04-26 17:38:47 +00:00
Member

Documentation was already clear on this only stripping GPS tags. But there are more metadata tags which can potentially be sensitive (e.g. author and possibly description) and the filter’s name alone suggests a broader effect than it currently has.
Thus change the filter to strip all metadata except for colourspace info and orientation.

Additionally fix a doc typo and enable stripping of HEIC and JXL (confirmed to work with exiftool 12.70 and 12.57, released on Feb. 23, 2023)
Judging from release logs, write support for JPEG XL was added in 12.25 (2021-04-22) and for HEIC in 11.33 (2019-03-28).
As a point of reference, Debian stable (bookworm) has a new enough version for both, oldstable (bullseye) has exiftool 12.16 lacking JXL (however, release logs mentioned making JXL support "official" in 12.25, potentially it already worked to some degree earlier)

HEIC and JXL samples for testing can be found here:

If desired i can make “strip all” vs “strip only GPS” a config option though personally i currently don’t see why one would want to strip one but not the other

.

There’s some relation to description extraction from #744 / #241:

  • if we now strip everything renaming the filter to StripLocation no longer makes sense
  • Pleroma.Upload.Filter.Exiftool.ReadDescriptionmust be placed before metadata stripping to wrok correctly
    If the image description contained sensitive data, uploaders will now get a chance to review and edit it before publishing a post with it (assuming metadata stripping is active).
Documentation was already clear on this only stripping GPS tags. But there are more metadata tags which can potentially be sensitive (e.g. author and possibly description) and the filter’s name alone suggests a broader effect than it currently has. Thus change the filter to strip all metadata except for colourspace info and orientation. Additionally fix a doc typo and enable stripping of HEIC and JXL (confirmed to work with exiftool 12.70 and 12.57, released on Feb. 23, 2023) Judging from release logs, write support for JPEG XL was added in 12.25 (2021-04-22) and for HEIC in 11.33 (2019-03-28). As a point of reference, Debian stable (bookworm) has a new enough version for both, oldstable (bullseye) has exiftool 12.16 lacking JXL *(however, release logs mentioned making JXL support "official" in 12.25, potentially it already worked to some degree earlier)* HEIC and JXL samples for testing can be found here: - https://heic.digital/samples/ - https://jpegxl.info/art/ If desired i can make “strip all” vs “strip only GPS” a config option though personally i currently don’t see why one would want to strip one but not the other . There’s some relation to description extraction from #744 / #241: - if we now strip everything renaming the filter to `StripLocation` no longer makes sense - `Pleroma.Upload.Filter.Exiftool.ReadDescription`must be placed before metadata stripping to wrok correctly If the image description contained sensitive data, uploaders will now get a chance to review and edit it before publishing a post with it *(assuming metadata stripping is active)*.
Oneric added 3 commits 2024-04-15 22:16:49 +00:00
7a7ebecf00 exiftool: strip JXL and HEIC
As of exiftool 12.57 both formats are supported, but EXIF data is
optional for JXL and if exiftool doesn’t find a preexisting metadata
chunk it will create one and treat it as a minor error resulting in
a non-zero exit code.
Setting -ignoreMinorErrors avoids failing on such uploads.
ci/woodpecker/pr/build-amd64 Pipeline is pending Details
ci/woodpecker/pr/build-arm64 Pipeline is pending Details
ci/woodpecker/pr/docs Pipeline is pending Details
ci/woodpecker/pr/lint Pipeline is pending Details
ci/woodpecker/pr/test Pipeline is pending Details
5a2fd2908f
exiftool: strip all non-essential tags
Documentation was already clear on this only stripping GPS tags.
But there are more potentially sensitive metadata tags (e.g. author
and possibly description) and the name alone suggests a broader effect.

Thus change the filter to strip all metadata except for colourspace info
and orientation (technically it strips everything and then readds
selected tags).

Explicitly stripping CommonIFD0 is needed since -all does not modify
IFD0 due to TIFF storing some actual image data there. CommonIFD0 then
strips a bunch of commonly used actual metadata tags from IFD0, to my
understanding leaving TIFF image data and custom metadata tags intact.
Author
Member

TODO note for me, make this more configurable as suggested by timorl

Quoting my comment over at the “read description“ PR

As for metadata stripping being more configurable in general (but independent of other loaded filters) to allow preserving/deleting specific tag (groups) this is a good point. It would probably be easiest to expose a purge and preserve string list which will just get mapped directly to exiftool commandline parameters. The drawback being misspelling something here will result in all uploads failing

TODO note for me, make this more configurable as suggested by timorl Quoting my comment over at the “read description“ PR > As for metadata stripping being more configurable in general (but independent of other loaded filters) to allow preserving/deleting specific tag (groups) this is a good point. It would probably be easiest to expose a purge and preserve string list which will just get mapped directly to exiftool commandline parameters. The drawback being misspelling something here will result in all uploads failing
Oneric force-pushed exiftool-strip-all from 5a2fd2908f to f01ace481a 2024-04-18 22:36:23 +00:00 Compare
Author
Member

Exact metadata tags to be stripped are now configurable, plus more tests.
For new config to show up in admin-fe AkkomaGang/admin-fe#25 is required

cc @timorl: to save you the trouble of now also having to migrate the filter’s config option, adjust all their docs and the relevant admin-fe code, i pulled in the renaming bits into this PR. You can just drop everything related to renaming

Exact metadata tags to be stripped are now configurable, plus more tests. For new config to show up in admin-fe https://akkoma.dev/AkkomaGang/admin-fe/pulls/25 is required cc @timorl: to save you the trouble of now also having to migrate the filter’s config option, adjust all their docs and the relevant admin-fe code, i pulled in the renaming bits into this PR. You can just drop everything related to renaming
Contributor

Aaa, I should've checked here before writing the other comment! :D

Looks nice, what do you think about checking the config on startup as I suggested in #744? Is that doable and a good idea even?

Aaa, I should've checked here before writing the other comment! :D Looks nice, what do you think about checking the config on startup as I suggested in #744? Is that doable and a good idea even?
Oneric force-pushed exiftool-strip-all from f01ace481a to 09798f5121 2024-04-20 23:24:36 +00:00 Compare
Author
Member

Looks nice, what do you think about checking the config on startup as I suggested in #744? Is that doable and a good idea even?

In theory this might be nice, but we’d need to ship unremovable fixed-location sample files and e.g. Mogrify i think already suffers from the same problem and if we add too much it will slow down startup.

More problematic however is something i just recalled from 3.12.2’s base_url check: those deprecation checks happen before the database is loaded (since in theory deprecated options might also affect how to connect to the db?). And once the db is up the whole server is running and accepting requests.
Meaning, we can’t know which filters and which options are selected here if in-database config is used. Ofc this also means, this deprecation warning and gracefull fallback as well as all other deprecation checks don’t work for in-db config... but given the renames comes with a migration for in-db config this shouldn’t actually be an issue here (but it would be for the startup test).

> Looks nice, what do you think about checking the config on startup as I suggested in #744? Is that doable and a good idea even? In theory this might be nice, but we’d need to ship unremovable fixed-location sample files and e.g. `Mogrify` i think already suffers from the same problem and if we add too much it will slow down startup. More problematic however is something i just recalled from 3.12.2’s `base_url` check: those deprecation checks happen _before_ the database is loaded *(since in theory deprecated options might also affect how to connect to the db?)*. And once the db is up the whole server is running and accepting requests. Meaning, we can’t know which filters and which options are selected here if in-database config is used. Ofc this also means, this deprecation warning and gracefull fallback as well as all other deprecation checks don’t work for in-db config... but given the renames comes with a migration for in-db config this shouldn’t actually be an issue here (but it would be for the startup test).

bit of a merge conflict here after the other exiftool PR i think

bit of a merge conflict here after the other exiftool PR i think
Oneric force-pushed exiftool-strip-all from 09798f5121 to 3430aaec91 2024-04-25 23:11:43 +00:00 Compare
Author
Member

bit of a merge conflict here after the other exiftool PR i think

rebased and resolved conflicts dropping the duplicate rename
should be good again now

> bit of a merge conflict here after the other exiftool PR i think rebased and resolved conflicts dropping the duplicate rename should be good again now
Oneric added 1 commit 2024-04-26 00:13:38 +00:00
ci/woodpecker/pr/lint Pipeline was successful Details
ci/woodpecker/pr/test Pipeline failed Details
ci/woodpecker/pr/build-arm64 unknown status Details
ci/woodpecker/pr/build-amd64 unknown status Details
ci/woodpecker/pr/docs unknown status Details
fb7c172d5e
changelog: add note about StripMetadata and ReadDescription order
Author
Member

noticed this leaves the temporary stripped files around after running tests.
I’m not sure how this was cleaned up prevuiously, the officially sanctioned methods seem to be an on_exit callback or @(module)tag :tmp_dir (the latter since elixir 1.11)
Will update shortly

noticed this leaves the temporary stripped files around after running tests. I’m not sure how this was cleaned up prevuiously, the officially sanctioned methods seem to be an `on_exit` callback or `@(module)tag :tmp_dir` (the latter since elixir 1.11) Will update shortly
Oneric force-pushed exiftool-strip-all from fb7c172d5e to 5bc64c5753 2024-04-26 16:58:02 +00:00 Compare
Author
Member

Will update shortly

done (using tmp_dir)

> Will update shortly done (using `tmp_dir`)
floatingghost reviewed 2024-04-26 17:35:50 +00:00
floatingghost left a comment
Owner

yeppers looks cool, i'll wait for tests to pass :nod:

yeppers looks cool, i'll wait for tests to pass :nod:
@ -15,3 +15,3 @@
It is required for the following Akkoma features:
* `Pleroma.Upload.Filters.Mogrify`, `Pleroma.Upload.Filters.Mogrifun` upload filters (related config: `Plaroma.Upload/filters` in `config/config.exs`)
* `Pleroma.Upload.Filters.Mogrify`, `Pleroma.Upload.Filters.Mogrifun` upload filters (related config: `Pleroma.Upload/filters` in `config/config.exs`)

hehe plaroma

hehe plaroma
@ -32,0 +41,4 @@
assert String.match?(exif_filtered, ~r/Color Space/)
end
# this is a nonsensical configuration, but it shouldn't explode

good call, we all know the first thing someone will do is supply a nonsensical config :hehe:

good call, we all know the first thing someone will do is supply a nonsensical config :hehe:
floatingghost merged commit 7da6f41718 into develop 2024-04-26 17:38:47 +00:00
floatingghost deleted branch exiftool-strip-all 2024-04-26 17:38:47 +00:00
Sign in to join this conversation.
No description provided.