Fix HTML attribute parsing for escaped quotes #480

Merged
Oneric merged 2 commits from mkljczk/akkoma-fe:get-attrs-fix into develop 2026-02-19 12:32:01 +00:00
Contributor

I cherry-picked an upstream commit that is merely a workaround for the bug I encountered but at least keeps akkoma-fe from crashing, and I added another commit that slightly improves the regex for this specific case.

The post that breaks our akkomas: https://www.shrimple.pl/2026/02/links2-how-are-things/
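The actual regex change isn't reproduced on this page. As a minimal sketch of the general technique only, assuming the helper receives a single tag as a string (the name `getAttrsSketch` and its signature are hypothetical, not akkoma-fe's code), a quoted-value pattern can consume a backslash plus the following character as one unit, so that `\"` does not terminate the value:

```typescript
// Hypothetical sketch; not the actual akkoma-fe helper.
// Extracts name="value" and name='value' pairs from one tag string.
function getAttrsSketch(tag: string): Record<string, string> {
  // Name: anything except whitespace, quotes, '>', '/', '='.
  // Value: either an ordinary character or a backslash plus ANY next
  // character ([\s\S] also covers newlines), so \" never ends the value.
  const attrRe =
    /([^\s"'>\/=]+)\s*=\s*(?:"((?:[^"\\]|\\[\s\S])*)"|'((?:[^'\\]|\\[\s\S])*)')/g;
  // No prototype, so a name like "__proto__" is stored as a plain key.
  const attrs = Object.create(null) as Record<string, string>;
  let m: RegExpExecArray | null;
  while ((m = attrRe.exec(tag)) !== null) {
    const name = (m[1] ?? "").toLowerCase(); // HTML attribute names are case-insensitive
    attrs[name] = m[2] ?? m[3] ?? ""; // value kept raw; escapes are not undone
  }
  return attrs;
}

const attrs = getAttrsSketch('<a href="\\"$l[1]\\"" class="u-url">');
console.log(attrs["href"], attrs["class"]);
```

The two alternatives inside each value group never match the same character, so a failed match cannot trigger the exponential backtracking that overlapping alternatives under a quantifier can cause. Unquoted values are out of scope, and in real HTML a backslash is not an escape character at all, so this only mirrors the "escaped quotes" framing of this PR.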

Signed-off-by: nicole mikołajczyk <git@mkljczk.pl>
Fix HTML attribute parsing for escaped quotes
Some checks failed
ci/woodpecker/pull_request_metadata/woodpecker Pipeline is pending approval
ci/woodpecker/pr/woodpecker Pipeline failed
ci/woodpecker/pull_request_closed/woodpecker Pipeline is pending approval
4ab3424508
Signed-off-by: nicole mikołajczyk <git@mkljczk.pl>
Oneric approved these changes 2026-02-19 12:29:55 +00:00
Oneric left a comment
Owner

Curiously, in the `content` from the API response, the offending `href` attributes have the embedded quote character HTML-escaped, like `href="\&quot;$l[1]\&quot;"`, which should just work as is. But apparently something converts this to a regular `"` character before it reaches this code.
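The thread doesn't identify which step does the decoding. As an illustration only, any entity-decoding pass turns the API form into the raw-quote form the regex then sees. The decoder below (`decodeBasicEntities` is a hypothetical name) handles just a few named entities plus numeric references, far less than the full HTML entity table:

```typescript
// Hypothetical illustration: decodes only &quot; &amp; &lt; &gt; &apos;
// and numeric references, not the full HTML named-entity table.
const NAMED_ENTITIES = new Map<string, string>([
  ["quot", '"'],
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["apos", "'"],
]);

function decodeBasicEntities(s: string): string {
  // Single pass: each reference is replaced once, so "&amp;quot;"
  // becomes "&quot;" rather than being decoded twice.
  return s.replace(
    /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (whole: string, body: string): string => {
      if (body[0] === "#") {
        const isHex = body[1] === "x" || body[1] === "X";
        const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
        // Out-of-range references are left untouched in this sketch.
        return code <= 0x10ffff ? String.fromCodePoint(code) : whole;
      }
      // Unknown names are left as-is.
      return NAMED_ENTITIES.get(body) ?? whole;
    },
  );
}

console.log(decodeBasicEntities('href="\\&quot;$l[1]\\&quot;"'));
```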

It feels like there ought to be a better way to actually properly parse HTML in a web-aligned language like JS/ECMA, but at first glance I only see either an unsafe injection vulnerability footgun or an experimental API with poor browser support :\

Notably, however, the part of the regex matching the key isn't correct either. Many more characters are valid in attribute names: https://html.spec.whatwg.org/multipage/syntax.html#attributes-2
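Per the linked section, an attribute name is one or more characters other than controls, U+0020 SPACE, `"`, `'`, `>`, `/`, `=`, and noncharacters. A small checker following that definition might look like this (all function names are hypothetical; this is a sketch, not akkoma-fe code):

```typescript
// Sketch of the WHATWG attribute-name rule: one or more code points that are
// not controls, U+0020 SPACE, " ' > / =, or noncharacters.
function isControl(cp: number): boolean {
  // C0 controls (U+0000-U+001F, which includes tab/LF/FF/CR) plus U+007F-U+009F.
  return cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
}

function isNoncharacter(cp: number): boolean {
  // U+FDD0-U+FDEF, plus the last two code points of every plane
  // (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFF).
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

// SPACE, ", ', >, /, =
const FORBIDDEN_IN_NAME = new Set<number>([0x20, 0x22, 0x27, 0x3e, 0x2f, 0x3d]);

function isValidAttrName(name: string): boolean {
  if (name.length === 0) return false;
  for (let i = 0; i < name.length; ) {
    const cp = name.codePointAt(i) as number; // i < length, so never undefined
    if (isControl(cp) || isNoncharacter(cp) || FORBIDDEN_IN_NAME.has(cp)) return false;
    i += cp > 0xffff ? 2 : 1; // step over both halves of a surrogate pair
  }
  return true;
}

console.log(isValidAttrName("@click"), isValidAttrName("a=b"));
```

Names such as `@click`, `:href`, or `data-foo` pass, while a key pattern limited to word characters (for example `\w+`) would not match any of them as a whole.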

Since this appears to be used only to extract mentions and the like, I guess this is good enough for now™ to avoid locking up. Thanks!

Oneric merged commit a123b41a2f into develop 2026-02-19 12:32:01 +00:00