Federate MFM in content field using HTML #343

Open
opened 2023-02-18 15:39:51 +00:00 by ilja · 5 comments

See Akkoma issue AkkomaGang/akkoma#381 (https://akkoma.dev/AkkomaGang/akkoma/issues/381).

Generally, fedi servers put HTML in the content field, which receivers can then use directly. For example, if you write *something* in Markdown, it is federated as <i>something</i> in the content field. The advantage is that implementations only need to understand HTML, not every input format some server comes up with.

With MFM, however, this isn't always the case. This causes compatibility problems for other software, which now has to implement a whole new parser and re-parse the incoming source whenever it is MFM.
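To make the general approach concrete, here is a toy TypeScript sketch of the sending side: the server renders its own input format to HTML for the content field and keeps the original text in source. It only handles *emphasis*, and the names buildNote and renderEmphasis are hypothetical, not taken from Akkoma, FoundKey, or any other server.

```typescript
// Toy sketch of "render on send": HTML goes into `content`, the original
// input stays in `source`. buildNote/renderEmphasis are hypothetical names.

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// *something* -> <i>something</i>; all other text is HTML-escaped.
function renderEmphasis(src: string): string {
  return escapeHtml(src).replace(/\*([^*\n]+)\*/g, "<i>$1</i>");
}

interface Note {
  type: "Note";
  content: string; // HTML that receivers can use directly
  source: { content: string; mediaType: string }; // original input
}

function buildNote(src: string, mediaType: string): Note {
  return {
    type: "Note",
    content: renderEmphasis(src),
    source: { content: src, mediaType },
  };
}

console.log(buildNote("*something*", "text/markdown").content);
// -> <i>something</i>
```

A receiver that only reads content never needs to know that the input was Markdown.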

Example (from https://ilja.space/notice/ASo2tpidQ5yVRUYxG4):
When you post

\(x= \frac{-b' \pm \sqrt{(b')^2-ac}}{a}\)
$[flip.h,v FoundKey expands the world of the Fediverse]

It now federates as

<code>x= \\frac{-b' \\pm \\sqrt{(b')^2-ac}}{a}</code><span><br></span><i><span>FoundKey expands the world of the Fediverse</span></i>

I would expect it to look more like the HTML I see when I view the post through the FoundKey frontend:

<span><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi><mo>=</mo><mfrac><mrow><mo>−</mo><msup><mi>b</mi><mo lspace="0em" rspace="0em" mathvariant="normal">′</mo></msup><mo>±</mo><msqrt><mrow><mo stretchy="false">(</mo><msup><mi>b</mi><mo lspace="0em" rspace="0em" mathvariant="normal">′</mo></msup><msup><mo stretchy="false">)</mo><mn>2</mn></msup><mo>−</mo><mi>a</mi><mi>c</mi></mrow></msqrt></mrow><mi>a</mi></mfrac></mrow><annotation encoding="application/x-tex">x= \frac{-b' \pm \sqrt{(b')^2-ac}}{a}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em;"></span><span class="mord mathnormal">x</span><span class="mspace" style="margin-right:0.2778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em;"></span></span><span class="base"><span class="strut" style="height:1.6746em;vertical-align:-0.345em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3296em;"><span style="top:-2.655em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">a</span></span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.6038em;"><span class="pstrut" style="height:3em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">−</span><span class="mord mtight"><span class="mord mathnormal mtight">b</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8278em;"><span style="top:-2.931em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span></span></span></span></span><span class="mbin mtight">±</span><span class="mord sqrt mtight"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0369em;"><span class="svg-align" style="top:-3.4286em;"><span class="pstrut" style="height:3.4286em;"></span><span class="mord mtight" style="padding-left:1.19em;"><span class="mopen mtight">(</span><span class="mord mtight"><span class="mord mathnormal mtight">b</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6828em;"><span style="top:-2.786em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span></span></span></span></span><span class="mclose mtight"><span class="mclose mtight">)</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7463em;"><span style="top:-2.786em;margin-right:0.0714em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span><span class="mbin mtight">−</span><span class="mord mathnormal mtight">a</span><span class="mord mathnormal mtight">c</span></span></span><span style="top:-3.0089em;"><span class="pstrut" style="height:3.4286em;"></span><span class="hide-tail mtight" style="min-width:0.853em;height:1.5429em;"><svg xmlns="http://www.w3.org/2000/svg" width="400em" height="1.5429em" viewBox="0 0 400000 1080" preserveAspectRatio="xMinYMin slice"><path d="M95,702
c-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14
c0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54
c44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10
s173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429
c69,-144,104.5,-217.7,106.5,-221
l0 -0
c5.3,-9.3,12,-14,20,-14
H400000v40H845.2724
s-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7
c-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z
M834 80h400000v40h-400000z"></path></svg></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.4197em;"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.345em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span><br><span style="display: inline-block; transform: scale(-1);">FoundKey expands the world of the Fediverse</span>
Owner

general case

As you have probably noticed, in general we do federate HTML markup for "normal" things.

line breaks

Looking at https://ilja.space/notice/ASo4tEHCpV7nE1ZaEK, though, I see that you are ignoring the line break markup: in the snippet you gave, the newline is marked up as <span><br></span>, which is apparently ignored. I'm not sure why the line break is inside a span; if that is causing issues, I could remove it.

reparsing MFM

If you are re-parsing MFM, note that every line break in the input will result in a line break in the output.

Side note, because I also learned this while embarking on the MFM parser journey in #338: a blockquote also ends at a line break. However, a single empty line between blockquotes is treated as if it didn't exist:

> a

> b
c

means

<blockquote>a<br>b</blockquote><br>c
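The line-break and blockquote rules above can be sketched in TypeScript as follows. This is a hypothetical helper that only models the behaviour described here (every line break becomes <br>, a blockquote ends at a line break, a single empty line between quote lines is skipped); it is not FoundKey's or Misskey's actual MFM parser and ignores all inline syntax.

```typescript
// Hypothetical model of the MFM line-break/blockquote rules; not a real MFM parser.

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderBreaksAndQuotes(src: string): string {
  const lines = src.split("\n");
  const items: string[] = []; // top-level items, joined by <br> at the end
  let quote: string[] | null = null; // lines of the currently open blockquote

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    const next = i + 1 < lines.length ? lines[i + 1] : "";
    if (/^>/.test(line)) {
      // Quote line: strip ">" plus one optional space and extend the open quote.
      if (quote === null) quote = [];
      quote.push(escapeHtml(line.replace(/^>\s?/, "")));
    } else if (quote !== null && line === "" && /^>/.test(next)) {
      // A single empty line between quote lines is treated as if it didn't exist.
      continue;
    } else {
      // Any other line closes the open blockquote and becomes its own item.
      if (quote !== null) {
        items.push(`<blockquote>${quote.join("<br>")}</blockquote>`);
        quote = null;
      }
      items.push(escapeHtml(line));
    }
  }
  if (quote !== null) items.push(`<blockquote>${quote.join("<br>")}</blockquote>`);
  // Every remaining line break in the input becomes a <br> in the output.
  return items.join("<br>");
}

console.log(renderBreaksAndQuotes("> a\n\n> b\nc"));
// -> <blockquote>a<br>b</blockquote><br>c
```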

KaTeX

The reason we do not federate as in your example is simple: we would not understand it if we received it, so we don't expect others to understand it either. As far as I am aware, there is no KaTeX in Pleroma (and I think Misskey even removed it?).

Misskey and its derivatives do not store HTML: incoming notes are converted from HTML to MFM. As you can probably guess from the monstrosity of the generated HTML, you cannot transform it back into KaTeX, much less MFM. HTML tags are also not supported in MFM (apart from some exceptions). So if you federated the last example to FoundKey without an MFM source, it would throw out the KaTeX part completely.

(In theory I would like to fix this and store HTML at some point. But probably in the very distant future because it would be a HUGE refactor.)

MFM functions

Regarding $[flip.h,v example] and its transformation to <span style="display: inline-block; transform: scale(-1);">example</span>: I would be surprised if the HTML sanitizer of any fediverse software let the style attribute through.

I am currently working on switching to an "actual" Markdown parser. Taking inspiration from sfr's MFM extension for marked.js, it would render this example as:

<span class="mfm-flip" data-mfm-h data-mfm-v>example</span>

My idea is that this output would eventually also be federated like that. However, I'm only beginning to experiment with it for FoundKey Pages; see also #338 (https://akkoma.dev/FoundKeyGang/FoundKey/pulls/338).
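A minimal TypeScript sketch of rendering a single MFM function call into the class/data-attribute form shown above. It handles only one non-nested call, and renderMfmFunction is a hypothetical name. The key=value handling (e.g. data-mfm-speed="2s") is an assumption about how argument values might be represented, not something the proposal specifies.

```typescript
// Hypothetical renderer for ONE non-nested MFM function call, e.g. $[flip.h,v example].

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Returns null if the input is not exactly one function call.
function renderMfmFunction(src: string): string | null {
  const match = /^\$\[([a-z0-9]+)(?:\.([^\s\]]+))?\s([^\]]*)\]$/.exec(src);
  if (match === null) return null;
  const name = match[1];
  const args = match[2] ?? ""; // e.g. "h,v"; undefined when there are no arguments
  const body = match[3];
  const attrs = args
    .split(",")
    .map((arg) => {
      const eq = arg.indexOf("=");
      const key = eq === -1 ? arg : arg.slice(0, eq);
      // Only simple keys become attribute names; anything else is dropped.
      if (!/^[a-z0-9-]+$/.test(key)) return "";
      // Assumption: "key=value" keeps its value as the attribute value.
      return eq === -1
        ? ` data-mfm-${key}`
        : ` data-mfm-${key}="${escapeHtml(arg.slice(eq + 1))}"`;
    })
    .join("");
  return `<span class="mfm-${name}"${attrs}>${escapeHtml(body)}</span>`;
}

console.log(renderMfmFunction("$[flip.h,v example]"));
// -> <span class="mfm-flip" data-mfm-h data-mfm-v>example</span>
```

Because the output uses only class and data-* attributes, a receiver can style it with a shared style sheet instead of re-parsing MFM.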

Author

> you are ignoring the line break markup though, in the snippet you gave there is <span><br></span>

Indeed, we do ignore this! Akkoma checks whether the source has the mediaType text/x.misskeymarkdown. If it does, we can't use the content (hence this issue), so we re-process the MFM source instead. So we don't just ignore the <span><br></span>, but actually the entire content!
The reason there is no line break is that we interpreted the source as Markdown, and Markdown doesn't treat a single \n as a line break. For a line break in Markdown you need either \n\n (which is more like a paragraph break, because it also adds a blank line) or two trailing spaces before the \n.
I'm sure we can fix this on Akkoma's side for MFM, though. (Done.)
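On the receiving side, the behaviour described above amounts to roughly the following TypeScript sketch: if source.mediaType is text/x.misskeymarkdown, re-render the MFM source, otherwise use content. htmlForNote and mfmLineBreaksOnly are hypothetical names, and this is not Akkoma's implementation; the stand-in renderer only shows the line-break difference (in MFM a single \n is a line break, in Markdown it is not).

```typescript
// Hypothetical receiving-side choice between `content` and an MFM `source`.

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

interface NoteSource {
  content: string;
  mediaType: string;
}

interface IncomingNote {
  content: string; // HTML as sent by the remote server
  source?: NoteSource;
}

const MFM_MEDIA_TYPE = "text/x.misskeymarkdown";

// Re-render MFM sources; for anything else, trust the HTML in `content`.
function htmlForNote(note: IncomingNote, renderMfm: (src: string) => string): string {
  const source = note.source;
  if (source !== undefined && source.mediaType === MFM_MEDIA_TYPE) {
    return renderMfm(source.content);
  }
  return note.content;
}

// Stand-in MFM renderer that only handles line breaks: every "\n" becomes <br>.
function mfmLineBreaksOnly(src: string): string {
  return escapeHtml(src).replace(/\n/g, "<br>");
}

const note: IncomingNote = {
  content: "first<span><br></span>second",
  source: { content: "first\nsecond", mediaType: MFM_MEDIA_TYPE },
};
console.log(htmlForNote(note, mfmLineBreaksOnly)); // -> first<br>second
```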

I don't think you really need to store the HTML; you can just generate it when sending out (which already happens). But you do make a good point about the HTML sanitizer. I've never actually worked with it, so I'm unsure how fine-grained its rules can be 🤔
Using custom classes/attributes instead of what we have now does sound like a good solution to me. In practice, remote instances would still need to add some support, but then it's mostly just some CSS and not a whole parser.
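To illustrate the kind of fine-grained rule a sanitizer would need, here is a hypothetical TypeScript allowlist for span attributes: it keeps mfm-* classes and data-mfm-* attributes and drops everything else, including style. Real sanitizers have their own configuration formats; filterSpanAttributes is a stand-alone illustration, not any sanitizer's API.

```typescript
// Hypothetical attribute allowlist for <span> elements in incoming HTML.
type Attrs = Record<string, string>;

function filterSpanAttributes(attrs: Attrs): Attrs {
  const kept: Attrs = {};
  for (const name of Object.keys(attrs)) {
    const value = attrs[name];
    if (name === "class") {
      // Keep only mfm-* class names; other classes (e.g. "katex") are dropped.
      const classes = value.split(/\s+/).filter((c) => /^mfm-[a-z0-9]+$/.test(c));
      if (classes.length > 0) kept["class"] = classes.join(" ");
    } else if (/^data-mfm-[a-z0-9-]+$/.test(name)) {
      kept[name] = value;
    }
    // Everything else, notably style, is dropped.
  }
  return kept;
}

const cleaned = filterSpanAttributes({
  style: "display: inline-block; transform: scale(-1);",
  class: "mfm-flip",
  "data-mfm-h": "",
  "data-mfm-v": "",
});
console.log(JSON.stringify(cleaned));
// -> {"class":"mfm-flip","data-mfm-h":"","data-mfm-v":""}
```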

I understand this isn't something that will happen very soon, and I understand why, so thank you for the quick response 🤗❤️

Owner

> Using classes instead of attributes indeed seems like a good solution to me. In practice remote instance still need to add some support, but then it's mostly just some css and not a whole parser.

Well, the nice thing is that since we would have to do the same, we have to write such a style sheet anyway, which you could of course use. Some additional support would still be required, though, because some data-mfm-... attributes would have to be made accessible to CSS (unless CSS attr() is suddenly fully implemented by all major browsers, see https://caniuse.com/css3-attr). But that would be minor compared with having to re-parse MFM.

As I said, the eventual goal is to federate proper markup for everything, but as you noted, it's probably going to take a long time.

Johann150 added this to the replace MFM with HTML milestone 2023-02-18 19:18:55 +00:00
Author

Maybe relevant for the KaTeX part:

Someone in the Akkoma chat mentioned this: https://codeberg.org/fediverse/fep/src/branch/main/fep/dc88/fep-dc88.md

Apparently there's something called MathML, which is basically a set of extra elements in HTML, and browsers already seem to support it. I don't know whether it can handle formulas as complex as what MFM currently allows, but if so, maybe one day the KaTeX parts could be transformed to MathML.

Owner

KaTeX already renders to MathML, so adding that for federating HTML (https://akkoma.dev/FoundKeyGang/FoundKey/commit/f6c3d442655931533d5ec52e6515275a3e10a2b2) was not that hard.

I did not see a fallback level in the FEP, though, so I guess formulas will now be scrubbed completely by some receiving implementations. I don't have an idea for how a fallback could be implemented either. 🤷

Reference: FoundKeyGang/FoundKey#343