[feat] Allow configuration of title-less items in RSS feeds #391

opened 2022-12-22 02:01:48 +00:00 by pzingg · 0 comments

The idea

1. Add a boolean configuration key :parse_source, with a default value of false, under config :pleroma, :feed, :post_title. An example description for the key is shown below.
2. Modify the function Pleroma.Formatter.truncate/3 to accept zero as the value of Pleroma.Config.get([:feed, :post_title, :max_length]). Currently, this function raises an exception if max_length is less than String.length(omission). In that case it should instead return String.slice(text, 0, max_length) as the truncated text, without appending the omission.
3. Modify the logic in Pleroma.Web.Feed.FeedView.activity_title/2 to check whether the opts parameter contains %{parse_source: true} and the activity has non-nil values at both activity.data["source"]["mediaType"] and activity.data["source"]["content"]. If both conditions hold, do not use the activity's HTML content to produce the title. Instead, parse the title from activity.data["source"]["content"] according to the rules for its content type (see below), ignoring the :max_length configuration value.
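
The truncation change in item 2 could look roughly like the sketch below. TruncateSketch is a hypothetical stand-in module, not the existing Pleroma.Formatter code, and the default argument values are assumptions chosen for illustration.

```elixir
defmodule TruncateSketch do
  # Hypothetical stand-in for the proposed behavior; not Pleroma's actual code.
  def truncate(text, max_length \\ 200, omission \\ "...") do
    cond do
      # Short enough already: return the text unchanged.
      String.length(text) <= max_length ->
        text

      # No room for the omission string: plain slice, which is "" when max_length is 0.
      max_length < String.length(omission) ->
        String.slice(text, 0, max_length)

      # Usual case: reserve room for the omission string.
      true ->
        String.slice(text, 0, max_length - String.length(omission)) <> omission
    end
  end
end
```

With max_length set to zero this returns an empty string, which the feed view can then treat as "no title".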

Notes

If the :parse_source option is set, and the rules set forth below fail to parse out an authored title, the :max_length option will be used as before to render a plaintext title sub-element and a description sub-element that contains HTML encoded from the original full activity.data["source"]["content"].

If the :parse_source option is set, no title is parsed, and :max_length is zero, the feed item/entry will be rendered without a title sub-element.

If the :parse_source option is set, and the rules below parse out an authored title, the description sub-element of the feed item/entry will contain HTML encoded from a slice of the activity.data["source"]["content"] beginning at an offset just beyond the end of the parsed title. In this way, the description sub-element will not repeat the title that was parsed out.

In all cases, including under the current behavior, the title sub-element's value should be a single line of plain text, trimmed of leading and trailing whitespace, without links or emojis. If no title is produced, the feed item/entry will be rendered without a title sub-element.
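
The notes above can be condensed into one decision function. This is only a sketch under stated assumptions: TitleFlowSketch and resolve/4 are hypothetical names, a parser is assumed to return either {title, rest} or nil, and truncation is reduced to a plain slice. Stripping links and emojis is not shown.

```elixir
defmodule TitleFlowSketch do
  # Returns {title_or_nil, description_source}. A nil title means the feed
  # item/entry is rendered without a title sub-element.

  # A title was parsed: the description starts just after it.
  def resolve({title, rest}, _fallback_text, _source, _max_length) when is_binary(title) do
    {normalize(title), rest}
  end

  # Nothing parsed: fall back to max_length truncation and keep the full source.
  def resolve(nil, fallback_text, source, max_length) do
    {fallback_text |> String.slice(0, max_length) |> normalize(), source}
  end

  # Collapse whitespace to get a single trimmed line; empty means no title.
  defp normalize(text) do
    case text |> String.replace(~r/\s+/u, " ") |> String.trim() do
      "" -> nil
      line -> line
    end
  end
end
```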

Example :parse_source in description.exs

          %{
            key: :parse_source,
            type: :boolean,
            description: "Use content type-specific parsers to extract title (ignores max_length)",
            suggestions: [true]
          }

Parsing title from "text/html" content

The title is the content of an initial h1 or h2 element in the content, or nil if this is not satisfied.
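
A minimal sketch of this rule, assuming a regex is good enough for illustration. A real implementation would more likely use a proper HTML parser. HtmlTitleSketch and the {title, rest} return shape are hypothetical.

```elixir
defmodule HtmlTitleSketch do
  # Returns {title, rest_of_html}, or nil when the content does not start
  # with an h1 or h2 element.
  def parse(html) do
    case Regex.run(~r{\A\s*<(h[12])\b[^>]*>(.*?)</\1>(.*)\z}is, html) do
      [_all, _tag, inner, rest] ->
        # Drop nested tags so the title is plain text.
        title = inner |> String.replace(~r/<[^>]*>/, "") |> String.trim()
        if title == "", do: nil, else: {title, rest}

      nil ->
        nil
    end
  end
end
```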

Parsing title from "text/plain" content

The title is a leading single line of text, separated from the description by two newlines, or nil if this is not satisfied. This is the same logic that separates the headers of a plaintext email from its body.
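
One possible reading of the plain-text rule, as a sketch. PlainTitleSketch is a hypothetical name, and accepting CRLF line endings is an assumption.

```elixir
defmodule PlainTitleSketch do
  # Returns {title, rest}, or nil unless the text starts with exactly one
  # line followed by a blank line.
  def parse(text) do
    case String.split(text, ~r/\r?\n\r?\n/, parts: 2) do
      [first, rest] ->
        title = String.trim(first)

        if title != "" and not String.contains?(title, "\n") do
          {title, rest}
        else
          nil
        end

      _ ->
        nil
    end
  end
end
```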

Parsing title from "text/bbcode" content

The title is the content of a leading [b] element, separated from the description by two newlines, or nil if this is not satisfied.
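
A sketch of the BBCode rule. BBCodeTitleSketch is a hypothetical name. As a simplifying assumption, the title may not contain nested tags or square brackets.

```elixir
defmodule BBCodeTitleSketch do
  # Returns {title, rest}, or nil unless the text starts with a single
  # [b]...[/b] element on its own line followed by a blank line.
  def parse(text) do
    case Regex.run(~r{\A\s*\[b\]([^\r\n\[\]]+)\[/b\][ \t]*\r?\n\r?\n(.*)\z}is, text) do
      [_all, title, rest] ->
        case String.trim(title) do
          "" -> nil
          trimmed -> {trimmed, rest}
        end

      nil ->
        nil
    end
  end
end
```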

Parsing title from "text/markdown" content

The title is the content of a leading # or ## (h1 or h2) element, separated from the description by two newlines, or nil if this is not satisfied.
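
A sketch of the Markdown rule, assuming ATX-style headings only. Setext headings (underlined with = or -) and closing # sequences are not handled. MarkdownTitleSketch is a hypothetical name.

```elixir
defmodule MarkdownTitleSketch do
  # Returns {title, rest}, or nil unless the text starts with a "# " or
  # "## " heading line followed by a blank line.
  def parse(text) do
    case Regex.run(~r/\A\s*##?[ \t]+([^\r\n]+?)[ \t]*\r?\n\r?\n(.*)\z/s, text) do
      [_all, title, rest] -> {title, rest}
      nil -> nil
    end
  end
end
```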

Parsing title from "text/x.misskeymarkdown" content

The title is the content of a leading ** (bold) element, separated from the description by two newlines, or nil if this is not satisfied. (I am not sure whether the Pleroma implementation of x.misskeymarkdown supports a title element, which could be used in place of **.)
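
The same idea for the ** rule, as a sketch. MfmTitleSketch is a hypothetical name. As a simplifying assumption, the title may not contain an asterisk.

```elixir
defmodule MfmTitleSketch do
  # Returns {title, rest}, or nil unless the text starts with a single
  # **...** element on its own line followed by a blank line.
  def parse(text) do
    case Regex.run(~r/\A\s*\*\*([^\r\n*]+)\*\*[ \t]*\r?\n\r?\n(.*)\z/s, text) do
      [_all, title, rest] ->
        case String.trim(title) do
          "" -> nil
          trimmed -> {trimmed, rest}
        end

      nil ->
        nil
    end
  end
end
```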

The reasoning

Admins configuring a server should be able to choose between:

1. Truncating titles for all items in RSS feeds from the HTML-encoded description (i.e. from activity.data["content"]). This is the current behavior (:max_length greater than zero and :parse_source false).
2. Making all items in RSS feeds title-less (:max_length equal to zero and :parse_source false). Rationales for this are explored in Why Mastodon should have title-less feeds (http://scripting.com/2022/12/10.html) and Common features that a "document" should support (http://this.how/whatIsADocument/).
3. Letting post authors specify a title for individual posts by following the rules listed above for each content type, and parsing (not truncating) titles from activity.data["source"]["content"] (:parse_source equal to true). This gives authors a way to state a title explicitly, so the feed reflects what they intended.
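
For illustration, the three choices might map onto configuration like this. The :parse_source key is the one proposed in this issue and does not exist yet, and the max_length of 100 and the "..." omission are example values only.

```elixir
import Config

# 1. Current behavior: titles truncated from the HTML content.
config :pleroma, :feed,
  post_title: %{max_length: 100, omission: "...", parse_source: false}

# 2. Title-less items: set max_length to zero instead.
#    post_title: %{max_length: 0, omission: "...", parse_source: false}

# 3. Author-specified titles, with truncation as the fallback.
#    post_title: %{max_length: 100, omission: "...", parse_source: true}
```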

Titles in other ActivityPub aware server applications:

The main Mastodon application and its most popular forks, Hometown and glitch-soc, do not expose title elements for RSS feed items.
Another example of microblogging software that does not expose titles is Manton Reece's micro.blog feed (https://www.manton.org/feed.xml).
Apparently, Matt Mullenweg has indicated that title-less items will be supported in upcoming versions of WordPress and Tumblr.

Have you searched for this feature request?

I have double-checked and have not found this feature request mentioned anywhere.
This feature is related to the Akkoma backend specifically, and not pleroma-fe.
