Skip posts in indexer where publish date is nil #356

Merged
floatingghost merged 1 commit from sn0w/akkoma:feature/indexer-skip-broken-activities into develop 2022-12-09 20:28:49 +00:00

Currently, indexing can fail in a somewhat cryptic way when the DB contains broken activities whose publish date is null. Specifically, the indexer crashes like this:

```
Created indices. Starting to insert posts.
Entries to index: 9615011
** (FunctionClauseError) no function clause matching in Calendar.ISO.parse_utc_datetime/2

    The following arguments were given to Calendar.ISO.parse_utc_datetime/2:

        # 1
        nil

        # 2
        :extended

    Attempted function clauses (showing 1 out of 1):

        def parse_utc_datetime(string, format) when is_binary(string) and format === :basic or format === :extended

    (elixir 1.14.2) Calendar.ISO.parse_utc_datetime/2
    (elixir 1.14.2) lib/calendar/datetime.ex:1169: DateTime.from_iso8601/3
    (pleroma 3.3.1-475-g0681a26d-sn0w+cofe) lib/pleroma/search/meilisearch.ex:132: Pleroma.Search.Meilisearch.object_to_search_data/1
    (elixir 1.14.2) lib/stream.ex:612: anonymous fn/4 in Stream.map/2
    (elixir 1.14.2) lib/enum.ex:4751: Enumerable.List.reduce/3
    (elixir 1.14.2) lib/stream.ex:1026: Stream.do_transform_inner_list/7
    (elixir 1.14.2) lib/stream.ex:1811: Enumerable.Stream.do_each/4
    (elixir 1.14.2) lib/stream.ex:942: Stream.do_transform/5
exit code: 1
```

Looking through my DB, it seems there are only four posts in total with this broken structure, most of them from people messing around with test instances or custom ActivityPub servers. Of course I could just delete those, but since this kind of stuff might federate in again at any time, I think it makes sense to skip them in the indexer.
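The crash happens because `DateTime.from_iso8601/1` only has a clause for binaries, so a `nil` publish date raises instead of returning an error tuple. A minimal sketch of the "skip instead of crash" idea follows, assuming a simplified object shape (a plain map with `"id"` and `"published"` keys). The module `IndexerSketch` and the functions `to_search_data/1` and `build_batch/1` are hypothetical illustrations, not the actual `Pleroma.Search.Meilisearch.object_to_search_data/1` or the change merged in this PR.

```elixir
defmodule IndexerSketch do
  # Hypothetical module: NOT Akkoma's Pleroma.Search.Meilisearch code.
  # Objects are assumed to be plain maps with "id" and "published" keys.

  # Converts an object into a search document.
  # Returns {:ok, doc}, or :skip when "published" is unparsable.
  def to_search_data(%{"published" => published} = object) when is_binary(published) do
    case DateTime.from_iso8601(published) do
      {:ok, datetime, _offset} ->
        {:ok, %{id: object["id"], published: DateTime.to_unix(datetime)}}

      {:error, _reason} ->
        :skip
    end
  end

  # "published" is nil or absent: skip the object instead of letting
  # DateTime.from_iso8601/1 raise a FunctionClauseError.
  def to_search_data(_object), do: :skip

  # Lazily converts objects and drops the ones that were skipped.
  def build_batch(objects) do
    objects
    |> Stream.map(&to_search_data/1)
    |> Stream.flat_map(fn
      {:ok, doc} -> [doc]
      :skip -> []
    end)
    |> Enum.to_list()
  end
end

objects = [
  %{"id" => "a", "published" => "2022-12-09T20:04:37Z"},
  %{"id" => "b", "published" => nil},
  %{"id" => "c"},
  %{"id" => "d", "published" => "not a date"}
]

IO.inspect(IndexerSketch.build_batch(objects))
```

Guarding with `is_binary/1` in the function head means a `nil` or missing `published` never reaches `DateTime.from_iso8601/1`, and malformed strings are skipped via the `{:error, _}` branch rather than a failed `{:ok, ...} =` match. Rejecting such activities at the schema level, as suggested in the review, would complement this by keeping them out of the DB in the first place.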

sn0w added 1 commit 2022-12-09 20:04:37 +00:00
Skip posts in indexer where publish date is nil
4c0911592b

Yeah, we should mandate `published` on the schema anyhow.

I'll make it so we reject activities without `published`.

floatingghost referenced this pull request from a commit 2022-12-09 20:28:19 +00:00
floatingghost approved these changes 2022-12-09 20:28:41 +00:00
floatingghost merged commit f667884962 into develop 2022-12-09 20:28:49 +00:00
floatingghost deleted branch feature/indexer-skip-broken-activities 2022-12-09 20:28:49 +00:00