418068793f
This is basically a rewrite of large parts of the parser and introduces many breaking changes. The parser was originally written mostly as an exercise for myself and was not really intended for practical use as-is. An adapted version has been used in Akkoma, however, and this exposed serious flaws in how MFM was handled on the fediverse in general. This was discussed [on the Foundkey issue tracker](FoundKeyGang/FoundKey#343), where a better approach was decided on. At the time of writing, this is being formalised into [FEP-c16b](https://codeberg.org/fediverse/fep/src/branch/main/fep/c16b/fep-c16b.md). This commit rewrites the parser to be FEP-c16b compliant.

Previously, the parser had knowledge of the specific MFM functions, which was useful for setting default attribute values and adding function-specific CSS. It also understood the concept of newlines. Neither is the case any more: the parser only does a "simple" translation from MFM function notation to FEP-c16b compliant HTML.

Because of this, the parser no longer adds CSS either. It is up to the software that uses this HTML to decide which functions to support and to supply the correct CSS. In practice, the CSS from this parser was never used in Akkoma, so it is not really a loss.
# MfmParser

A simple [FEP-c16b](https://codeberg.org/fediverse/fep/src/branch/main/fep/c16b/fep-c16b.md) compliant parser for Misskey's [Markup language For Misskey](https://misskey-hub.net/en/docs/for-users/features/mfm/) (MFM) functions.

It only parses the MFM-specific syntax of the form `$[name.attributes content]`.
That means that it doesn't parse e.g. links, usernames, HTML, Markdown or KaTeX.

The Parser returns a tree. For example, `it's not chocolatine, it's $[spin.alternate,speed=0.5s pain au chocolat]` will look like

    [
      %MfmParser.Node.Text{
        content: "it's not chocolatine, it's "
      },
      %MfmParser.Node.MFM{
        name: "spin",
        attributes: [{"alternate"}, {"speed", "0.5s"}],
        content: [
          %MfmParser.Node.Text{
            content: "pain au chocolat"
          }
        ]
      }
    ]

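Because the tree is ordinary nested data, it can be walked with plain recursion. The following is a minimal sketch, not part of the library: `TreeWalk` and `function_names/1` are hypothetical names, and plain maps stand in for the `%MfmParser.Node.MFM{}` and `%MfmParser.Node.Text{}` structs so that the snippet runs on its own.

```elixir
defmodule TreeWalk do
  # Hypothetical helper, not part of MfmParser.
  # Assumption: maps with a :name key stand in for MFM nodes,
  # maps with only :content stand in for text nodes.

  # A list of sibling nodes: collect the names from each node in order.
  def function_names(nodes) when is_list(nodes) do
    Enum.flat_map(nodes, &function_names/1)
  end

  # An MFM node: its own name first, then the names nested inside it.
  def function_names(%{name: name, content: content}) do
    [name | function_names(content)]
  end

  # A text node contains no MFM functions.
  def function_names(%{content: _text}), do: []
end

tree = [
  %{content: "it's not chocolatine, it's "},
  %{
    name: "spin",
    attributes: [{"alternate"}, {"speed", "0.5s"}],
    content: [%{content: "pain au chocolat"}]
  }
]

IO.inspect(TreeWalk.function_names(tree))
```

A consumer could use a list like this to check which MFM functions a post uses before deciding what to render.
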
You can also convert the tree into FEP-c16b compatible HTML.

    it's not chocolatine, it's <span class="mfm-spin" data-mfm-alternate data-mfm-speed="0.5s">pain au chocolat</span>

## Examples

Here we turn our input into a tree

    iex> "$[twitch.speed=0.5s 🍮]" |> MfmParser.Parser.parse()
    [
      %MfmParser.Node.MFM{
        name: "twitch",
        attributes: [{"speed", "0.5s"}],
        content: [%MfmParser.Node.Text{content: "🍮"}]
      }
    ]

Here we pipe the MFM notation through the parser and then the encoder, turning the MFM into FEP-c16b compatible HTML.

    iex> "$[twitch.speed=0.5s 🍮]" |> MfmParser.Parser.parse() |> MfmParser.Encoder.to_html()
    "<span class=\"mfm-twitch\" data-mfm-speed=\"0.5s\">🍮</span>"

Or we can use `MfmParser.Encoder.to_html/1` directly without having to call the parser ourselves.

    iex> "$[twitch.speed=0.5s 🍮]" |> MfmParser.Encoder.to_html()
    "<span class=\"mfm-twitch\" data-mfm-speed=\"0.5s\">🍮</span>"

## Reading

### The Parser

A [parser](https://en.wikipedia.org/wiki/Parsing#Parser) takes in structured text and outputs a so-called "tree". A tree is a data structure which can be more easily worked with.

A parser typically consists of three parts

* a Reader
* a Lexer (aka Tokeniser)
* the Parser

A Reader typically has a `next` function which takes the next character out of the input and returns it.
A `peek` function allows it to peek at the next character without changing the input.
There's also some way of detecting whether the EOF (End Of File) has been reached.
Depending on the needs of the parser, it may be implemented to allow asking for the nth character instead of just the next.

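To make the Reader description concrete, here is a minimal sketch in Elixir. It is not the library's actual Reader: the module name `SketchReader` and the choice to pass the remaining input around as a plain binary are assumptions made for illustration.

```elixir
defmodule SketchReader do
  # Hypothetical module, not the library's actual Reader.
  # Assumption: the "input" is simply the remaining text as a binary.

  # Takes the next character out of the input.
  # Returns {character, rest}, or :eof when nothing is left.
  def next(input) do
    case String.next_grapheme(input) do
      {char, rest} -> {char, rest}
      nil -> :eof
    end
  end

  # Looks at the next character without consuming it.
  def peek(input), do: peek(input, 1)

  # Looks at the nth upcoming character (1-based) without consuming anything.
  def peek(input, n) when is_integer(n) and n >= 1 do
    String.at(input, n - 1) || :eof
  end

  # True when the end of the input has been reached.
  def eof?(input), do: input == ""
end

{char, rest} = SketchReader.next("$[spin some text]")
IO.inspect({char, rest})
IO.inspect(SketchReader.peek(rest))
```

Since Elixir data is immutable, `next/1` hands back the remaining input instead of mutating a cursor, and `peek` naturally leaves the input untouched.
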
A Lexer uses the Reader. It also has a `peek` and a `next` function, but instead of returning the next (or nth) character, it returns the next (or nth) token.
E.g. if you have the MFM `$[spin some text]`, then `$[spin`, `some text`, and `]` can be considered three different tokens.

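The tokenising step can be sketched as follows. This is an illustration under stated assumptions rather than the library's Lexer: `SketchLexer` is a hypothetical module, it returns all tokens at once instead of offering `peek` and `next`, it works on the binary directly instead of going through a Reader, and it represents tokens as `{:open, head}`, `{:text, text}` and `:close`. It also assumes the function head is always followed by a space.

```elixir
defmodule SketchLexer do
  # Hypothetical lexer, not the library's actual Lexer module.
  # Tokens: {:open, "name.attributes"}, {:text, "..."} and :close.

  def tokens(input), do: lex(input, [])

  defp lex("", acc), do: Enum.reverse(acc)

  # "$[" starts a function; everything up to the first space is its head.
  # Assumption: the head is followed by a space.
  defp lex("$[" <> rest, acc) do
    case String.split(rest, " ", parts: 2) do
      [head, tail] -> lex(tail, [{:open, head} | acc])
      [head] -> lex("", [{:open, head} | acc])
    end
  end

  # "]" closes the innermost open function.
  defp lex("]" <> rest, acc), do: lex(rest, [:close | acc])

  # Anything else is text, up to the next "$[" or "]" (or the end of input).
  defp lex(input, acc) do
    case :binary.match(input, ["$[", "]"]) do
      {pos, _len} ->
        <<text::binary-size(pos), rest::binary>> = input
        lex(rest, [{:text, text} | acc])

      :nomatch ->
        lex("", [{:text, input} | acc])
    end
  end
end

IO.inspect(SketchLexer.tokens("$[spin some text]"))
```

For `$[spin some text]` this yields the three tokens named above, in the tagged form `{:open, "spin"}`, `{:text, "some text"}`, `:close`.
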
The Parser takes in the tokens and forms the tree. This is typically a data structure the programming language understands and can more easily work with.

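How tokens turn into a tree can be sketched with a small recursive function. Everything here is hypothetical: `SketchParser`, the token shapes (`{:open, head}`, `{:text, text}`, `:close`) and the plain maps used as nodes are assumptions for illustration, not the library's implementation. The sketch also assumes the brackets in the token list are balanced.

```elixir
defmodule SketchParser do
  # Hypothetical parser over hand-written tokens.
  # Assumption: plain maps stand in for the library's node structs,
  # and every {:open, _} token has a matching :close.

  def parse(tokens) do
    {nodes, []} = parse_nodes(tokens, [])
    nodes
  end

  # The end of the input or a closing bracket ends the current sibling list.
  defp parse_nodes([], acc), do: {Enum.reverse(acc), []}
  defp parse_nodes([:close | rest], acc), do: {Enum.reverse(acc), rest}

  defp parse_nodes([{:text, text} | rest], acc) do
    parse_nodes(rest, [%{type: :text, content: text} | acc])
  end

  # An opening token starts a nested list of children, parsed recursively.
  defp parse_nodes([{:open, head} | rest], acc) do
    {name, attributes} = split_head(head)
    {children, rest} = parse_nodes(rest, [])
    node = %{type: :mfm, name: name, attributes: attributes, content: children}
    parse_nodes(rest, [node | acc])
  end

  # "spin.alternate,speed=0.5s" -> {"spin", [{"alternate"}, {"speed", "0.5s"}]}
  defp split_head(head) do
    case String.split(head, ".", parts: 2) do
      [name] -> {name, []}
      [name, attrs] -> {name, attrs |> String.split(",") |> Enum.map(&attribute/1)}
    end
  end

  defp attribute(attr) do
    case String.split(attr, "=", parts: 2) do
      [key] -> {key}
      [key, value] -> {key, value}
    end
  end
end

tokens = [
  {:text, "it's not chocolatine, it's "},
  {:open, "spin.alternate,speed=0.5s"},
  {:text, "pain au chocolat"},
  :close
]

IO.inspect(SketchParser.parse(tokens))
```

Each `{:open, _}` token starts a recursive call that collects children until the matching `:close`, which is what gives the result its nested shape.
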
### The Encoder

Once we have a good data structure, we can process it and do things with it.
E.g. an Encoder encodes the tree into a different format.

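As an illustration of encoding, the sketch below turns a tree into HTML of the shape shown earlier in this README (a `span` with an `mfm-` class and `data-mfm-` attributes). It is not the library's Encoder: `SketchEncoder` is a hypothetical module, plain maps stand in for the node structs, and HTML escaping is deliberately left out, which real code would have to consider.

```elixir
defmodule SketchEncoder do
  # Hypothetical encoder, not the library's actual Encoder module.
  # Assumption: plain maps stand in for the library's node structs.
  # No HTML escaping is done here.

  # A list of sibling nodes: encode each one and concatenate the results.
  def to_html(nodes) when is_list(nodes), do: Enum.map_join(nodes, &to_html/1)

  # Text nodes are emitted as-is.
  def to_html(%{type: :text, content: text}), do: text

  # MFM nodes become a span; attributes become data-mfm-* attributes.
  def to_html(%{type: :mfm, name: name, attributes: attributes, content: content}) do
    attrs = Enum.map_join(attributes, &attribute/1)
    inner = to_html(content)
    ~s(<span class="mfm-#{name}"#{attrs}>#{inner}</span>)
  end

  # {"alternate"} is a flag without a value, {"speed", "0.5s"} has one.
  defp attribute({key}), do: " data-mfm-#{key}"
  defp attribute({key, value}), do: ~s( data-mfm-#{key}="#{value}")
end

tree = [
  %{type: :text, content: "it's not chocolatine, it's "},
  %{
    type: :mfm,
    name: "spin",
    attributes: [{"alternate"}, {"speed", "0.5s"}],
    content: [%{type: :text, content: "pain au chocolat"}]
  }
]

IO.puts(SketchEncoder.to_html(tree))
```

Because the encoder only pattern matches on node shape, it needs no knowledge of specific MFM functions such as `spin` or `twitch`.
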
### The code

The code can be found in the *lib* folder. It contains, among other things, the Reader, Lexer, Parser, and Encoder modules.

The *test* folder contains the tests.

## License

A parser/encoder for Misskey Flavoured Markdown.
Copyright (C) 2024 ilja.space

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.