[bug] See if we can use the content for posts, also when it's MFM. #381

Open
opened 2022-12-19 13:49:11 +00:00 by ilja · 4 comments
Contributor

Your setup

From source

Extra details

No response

Version

No response

PostgreSQL version

No response

What were you trying to do?

Generally, when posts are sent between servers, the content field contains HTML. As I understand it, for MFM posts we don't use the content field, because it has the MFM stripped away.

If Misskey (or whatever server sends the MFM) and Akkoma interpret MFM differently, the post renders differently from what the author intended. This happens, for example, with Misskey interpreting a newline as a line break, while Markdown doesn't.
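
To make the newline difference concrete, here is a minimal Python sketch. It is not the actual parser of Misskey or Akkoma; both function names are made up for illustration, and the Markdown-like rule is heavily simplified (blank line starts a paragraph, single newline is folded into a space).

```python
import html


def render_newline_as_break(text: str) -> str:
    """Hypothetical Misskey-like rule: every newline becomes a <br>."""
    return html.escape(text).replace("\n", "<br>")


def render_markdown_like(text: str) -> str:
    """Hypothetical Markdown-like rule: blank lines separate paragraphs,
    a single newline inside a paragraph is folded into a space."""
    paragraphs = [p for p in html.escape(text).split("\n\n") if p.strip()]
    rendered = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.split("\n") if line.strip()]
        rendered.append("<p>" + " ".join(lines) + "</p>")
    return "".join(rendered)


source = "first line\nsecond line\n\nnew paragraph"
print(render_newline_as_break(source))
print(render_markdown_like(source))
```

The same source text yields `<br>` line breaks under the first rule but a single merged paragraph line under the second, which is the kind of mismatch described above.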

It's also inconsistent, I believe: I assume we only do this for MFM.

Example post: curl -Lv 'https://mk.toast.cafe/notes/98ycd9vc6b' -H 'Accept: application/activity+json' | jq .

What did you expect to happen?

I expect the HTML version from the content to be used. If the current way of stripping the HTML strips too much, then we should see if we can address it there. I'm unsure at the moment how easy, or even possible, that is though.

What actually happened?

We ignore the content and use the source content in some cases (at least for MFM, not sure about others).
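
A rough Python sketch of the behaviour described here, under the assumption that the decision hinges on the `source.mediaType` of the incoming object. This is an illustration only, not the actual Akkoma code (which is Elixir); `choose_body` and `render_mfm` are hypothetical names, and `render_mfm` is a stand-in that only mimics the newline folding.

```python
import html

MFM_MEDIA_TYPE = "text/x.misskeymarkdown"


def render_mfm(source_text: str) -> str:
    """Stand-in for a local MFM parser (hypothetical, far simpler than a real one)."""
    return "<p>" + html.escape(source_text).replace("\n", " ") + "</p>"


def choose_body(note: dict) -> str:
    """Return the HTML that would end up being displayed for an incoming note."""
    source = note.get("source")
    if isinstance(source, dict) and source.get("mediaType") == MFM_MEDIA_TYPE:
        # The remote `content` is ignored and the MFM source is re-parsed locally.
        return render_mfm(source.get("content", ""))
    # Any other post: the HTML in `content` is used.
    return note.get("content", "")


note = {
    "type": "Note",
    "content": "<p>line one<br>line two</p>",
    "source": {"content": "line one\nline two", "mediaType": MFM_MEDIA_TYPE},
}
print(choose_body(note))
```

In this sketch the remote server's `<br>` is lost, because the local re-parse of the source applies its own newline rule instead of using the received HTML.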

Logs

No response

Severity

I can manage

Have you searched for this issue?

  • I have double-checked and have not found this issue mentioned anywhere.
ilja added the bug label 2022-12-19 13:49:11 +00:00

Ref AkkomaGang/pleroma-fe#155 for context

Author
Contributor

Another reason to keep content, katex, https://snug.moe/notes/9azp4eandb 🥺

cheat sheet at /mfm-cheat-sheet e.g. https://snug.moe/mfm-cheat-sheet

Author
Contributor

I now understand why we do what we do. The problem is that the content doesn't contain the needed HTML. I have now opened an issue for Foundkey: https://akkoma.dev/FoundKeyGang/FoundKey/issues/343.

The way I currently see it

  1. If it gets fixed in Foundkey (won't happen soon, and may cause additional problems with HTML sanitizing, so I'm unsure at the moment how feasible it would be)
    1. Remove the re-parsing from the fix_misskey_content function in Pleroma.Web.ActivityPub.ObjectValidators.ArticleNotePageValidator
    2. Make sure MFM still works properly (i.e. we don't strip too much HTML)
    3. Also check that the FE shows it properly
    4. Maybe write an MRF module for other *key software that still has this issue. (Maybe we can somehow check whether re-parsing is needed; there's no need to re-parse for Akkoma or Foundkey in this case.)
  2. If it's not something that can be fixed in Foundkey
    1. See if we can at least fix the newline issue. Maybe the markdown parser has an option which we can pass when doing MFM. Otherwise we can change the MFMParser to translate \n to <br> and see if it can work that way. See #478
Author
Contributor

I had an awesome new revelation on this. If we can figure out a proper way to represent MFM using less complex HTML (see FoundKeyGang/FoundKey#343), then we can do the following, and we don't have to wait on other software!

  1. Change it so that the front-end gets a proper "less complex HTML" representation that isn't scrubbed away.
    • An example given in the Foundkey issue is turning $[flip.h,v example] into <span class="mfm-flip" data-mfm-h data-mfm-v>example</span> (now it's <span style="display: inline-block; transform: scale(-1);">example</span>, which is way more complex to get through the scrubber).
    • Obviously also make sure the front-end handles this properly.
  2. Make the "less complex HTML" representation also the content that is federated to other instances.
    • Instances that do not understand MFM will now receive proper HTML. If they wish to show it, they can, without having to implement a special parser for it.
    • We still send source with mediaType text/x.misskeymarkdown, so software that currently uses MFM can still re-parse and render it the way it always has.
  3. Set some random flag no one knows about. Idk, something like mfm_html: true (other naming is also good, it's just an example)
    • Software that uses these HTML tags can now check for this flag. If it's set to true, we don't re-parse the MFM. If it's not set to true, we re-parse the MFM if the source is text/x.misskeymarkdown.
    • If others pick up this "less complex HTML" representation, this mfm_html: true flag may eventually become obsolete.
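
The three steps above could be sketched roughly as follows, in Python for brevity. Everything here is an assumption made for illustration: the function names are hypothetical, the regex only handles a single non-nested `$[name.arg1,arg2 body]` construct with valueless arguments, and `mfm_html` is just the example flag name from the list, not an agreed-upon property.

```python
import html
import re

MFM_MEDIA_TYPE = "text/x.misskeymarkdown"

# Matches one non-nested MFM function such as $[flip.h,v example].
MFM_FN = re.compile(r"\$\[(\w+)(?:\.([\w,]+))? ([^\[\]]*)\]")


def mfm_to_simple_html(text: str) -> str:
    """Step 1: turn MFM functions into class/data-attribute spans (simplified)."""

    def replace(match: re.Match) -> str:
        name, args, body = match.group(1), match.group(2), match.group(3)
        arg_list = [arg for arg in (args or "").split(",") if arg]
        attrs = "".join(f" data-mfm-{arg}" for arg in arg_list)
        return f'<span class="mfm-{name}"{attrs}>{body}</span>'

    # Escape first, so only the spans generated here are real markup.
    return MFM_FN.sub(replace, html.escape(text, quote=False))


def build_outgoing_note(mfm_source: str) -> dict:
    """Steps 2 and 3: federate the simple HTML, keep the source, set the flag."""
    return {
        "type": "Note",
        "content": mfm_to_simple_html(mfm_source),
        "source": {"content": mfm_source, "mediaType": MFM_MEDIA_TYPE},
        "mfm_html": True,  # example flag name only
    }


def needs_reparse(note: dict) -> bool:
    """Receiving side: re-parse MFM only when the flag is absent."""
    if note.get("mfm_html") is True:
        return False
    source = note.get("source")
    return isinstance(source, dict) and source.get("mediaType") == MFM_MEDIA_TYPE


outgoing = build_outgoing_note("$[flip.h,v example]")
print(outgoing["content"])
print(needs_reparse(outgoing))
```

A note built this way carries HTML that a server without MFM support can display as-is, while a note from software that sends MFM source without the flag would still be re-parsed as before.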

That way

  • It should be easier for us to get MFM through the scrubber, making it work properly again.
  • Everything will keep working as before for remote servers that use MFM.
  • It will be similar or an improvement for remote servers that currently don't use MFM.
  • If other servers using MFM also start using this "less complex HTML" representation, we won't have issues with different parser implementations any more.

Everybody wins 🎉

Reference: AkkomaGang/akkoma#381