Does ActivityPub send those to other instances, or does ActivityPub only send the original post and the rest (upvotes, downvotes, replies) are stored only on the original server where the post was made?

  • iso@lemmy.dbzer0.com
    11 months ago

    I haven’t worked with AP yet, but as a webdev I’m certain it’s original server only. Syncing upvotes between nodes would mean an insane data volume, and it would be hell to keep properly in sync to begin with.

    • Dave@lemmy.nz
      11 months ago

      My instance has 800 users and is 4 months old, and the database alone is over 30GB. It is an insane amount of data.

        • Dave@lemmy.nz
          11 months ago

          I’m a bad example. I haven’t properly tuned the settings, so currently RAM usage will grow to whatever is available.

          I’m very lucky, the instance is running in a proxmox container alongside some other fediverse servers (run by others), on dedicated hardware in a datacentre. The sysadmin has basically thrown me plenty of spare resources since the other containers aren’t using them and RAM not used is wasted, so I’ve got 32GB allocated currently. I still need to restart once a week or that RAM gets used up and the database container crashes.

          It’s been on my list of things to do for a while, try some different postgres configs, but I just haven’t got around to it.

          I know a couple of months back lemmy.world were restarting every 30 mins so they didn’t use up all the RAM and crash. I presume some time and some lemmy updates later that’s no longer the case.

          I know some smaller servers get away with 2GB of RAM, and we should be able to use a lot less than 32GB if I actually try to tune the postgres config.

      • nutomic@lemmy.ml
        11 months ago

        There is a postgres command to show the size of each table. Most likely the space is taken up by the activity tables, which can be cleared out to free it.
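
        A minimal sketch of such a size check, assuming PostgreSQL's built-in `pg_total_relation_size` and `pg_size_pretty` functions and the `pg_statio_user_tables` view. The database and user names (`lemmy`) are placeholders, and the Python wrapper only assembles a `psql` command line rather than running it:

```python
# SQL listing user tables by total on-disk size (table + indexes + TOAST).
TABLE_SIZE_SQL = """
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 20;
""".strip()


def psql_command(database: str = "lemmy", user: str = "lemmy") -> list[str]:
    """Build the argv for running the size query through psql.

    The database and user names are placeholders (assumptions), not
    values taken from this thread; adjust them for your deployment.
    """
    return ["psql", "-U", user, "-d", database, "-c", TABLE_SIZE_SQL]


if __name__ == "__main__":
    # Print the query so it can be pasted into any psql session.
    print(TABLE_SIZE_SQL)
```

        The query itself can also be pasted straight into a `psql` prompt; the largest tables appear first.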

        • Dave@lemmy.nz
          11 months ago

          After the second-to-last update the database shrank and I was under the impression there was some automatic removal happening. Was this not the case?

          It’s helpful info for others, but personally I’m not that worried about the database size. The size of the pictrs cache is much more of a concern, and as I understand it there isn’t an easy way to identify and remove cached images without accidentally taking out user image uploads.

          • nutomic@lemmy.ml
            11 months ago

            Yes, there is automatic removal, so if you have enough disk space there is no need to worry about it.

            The pictrs storage only consists of uploads from local users, and thumbnails for both local and remote posts. Thumbnails for remote posts could theoretically be wiped and loaded from the other instance, but they shouldn’t take much space anyway.

            • Dave@lemmy.nz
              11 months ago

              > Yes, there is automatic removal, so if you have enough disk space there is no need to worry about it.

              What triggers this? My DB was about 30GB, then the update shrank it down to 5GB, then it grew back to 30GB.

              > The pictrs storage only consists of uploads from local users, and thumbnails for both local and remote posts. Thumbnails for remote posts could theoretically be wiped and loaded from the other instance, but they shouldn’t take much space anyway.

              I’d be pretty confident that the 140GB pictrs volume I have is mostly cache. There are occasionally users uploading images, but we don’t have that many active users. I’d be surprised if there was more than a few GB of image uploads in total out of that 140GB. We just aren’t that big of a server.

              The pictrs volume also grows consistently at a little under 1GB per day. I just went and had a look, in the files directory there are 6 directories from today (the day only has a couple of hours left), and these sum to almost 700MB of images and almost 6000 files, or a little over 100KB each.
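
              The arithmetic behind those figures, as a quick sketch. The inputs are the rounded numbers quoted above ("almost" values taken at face value), so the results are only approximate:

```python
# Figures quoted in the comment, rounded.
bytes_today = 700 * 10**6      # ~700 MB written today
files_today = 6000             # ~6000 files written today
cache_total_gb = 140           # current size of the pictrs volume
growth_gb_per_day = 1.0        # "a little under 1GB per day", rounded up

avg_kb_per_file = bytes_today / files_today / 1000
days_to_reach_total = cache_total_gb / growth_gb_per_day

print(f"average file size: {avg_kb_per_file:.0f} KB")      # 117 KB
print(f"days of growth to reach {cache_total_gb} GB: {days_to_reach_total:.0f}")  # 140
```

              Roughly 140 days of growth at that rate is in the same ballpark as the age of the instance, which fits the idea that nothing is being pruned.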

              The instance has had just 27 active users today (though of course users not posting will still generate thumbnails).

              While the cached images may be small, they add up really quickly.

              As far as I can tell there is no cache pruning, as the cache goes up pretty consistently each day.

              • nutomic@lemmy.ml
                11 months ago

                The activities table is cleared out automatically every week: items older than 3 months are deleted. During the update only a small number of rows was migrated, so the db was temporarily smaller. You can manually clear older items in sent_activity and received_activity to free more space.
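
                A rough sketch of that retention rule, run against an in-memory SQLite database as a stand-in for Postgres. The table names come from the comment above; the `published` column, the two-column schema, and the `prune_activities` helper are assumptions made for illustration, so check the real schema before deleting anything on a live instance:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Table names come from the comment above; the column name is an assumption.
ACTIVITY_TABLES = ("sent_activity", "received_activity")


def prune_activities(conn, table, max_age_days=90, now=None):
    """Delete rows older than max_age_days and return how many were removed."""
    if table not in ACTIVITY_TABLES:  # never interpolate arbitrary names
        raise ValueError(f"unexpected table: {table}")
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=max_age_days)).isoformat()
    cur = conn.execute(f"DELETE FROM {table} WHERE published < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


# Stand-in database with a hypothetical two-column schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE received_activity (ap_id TEXT, published TEXT)")
now = datetime(2023, 11, 1, tzinfo=timezone.utc)
rows = [
    ("https://example.invalid/activities/1", (now - timedelta(days=200)).isoformat()),
    ("https://example.invalid/activities/2", (now - timedelta(days=10)).isoformat()),
]
conn.executemany("INSERT INTO received_activity VALUES (?, ?)", rows)

removed = prune_activities(conn, "received_activity", now=now)
remaining = conn.execute("SELECT COUNT(*) FROM received_activity").fetchone()[0]
print(removed, remaining)  # 1 1
```

                Lowering `max_age_days` is the manual equivalent of clearing out older items to free more space.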

                Actually, I’m wrong about images: it turns out that all remote images are mirrored locally in order to generate thumbnails. 0.19 will have an option to disable that. This could use more improvements; the whole image handling is rather confusing right now.

                • Dave@lemmy.nz
                  11 months ago

                  Thanks for the info! For performance reasons it would be nice to have a way to configure how long the cache is kept rather than disable it completely, but I understand you probably have other priorities.

                  Would disabling the cache remove images cached up to that point?

    • Skull giver@popplesburger.hilciferous.nl
      11 months ago

      Yes, it does. Lemmy keeps a record of all votes on the server and rebroadcasts them to other servers (most of the time). Other servers may get out of sync, especially when you take defederation into account, but that’s not a huge problem in my experience.

      Network traffic is not as bad as you may think, especially with modern HTTPS libraries that will keep connections open while also multiplexing requests.

      The protocol is described in https://www.w3.org/TR/activitypub/ (implementations support only some of the objects and add a few extensions of their own, as the spec allows).

      This is an example from the spec:

      {"@context": "https://www.w3.org/ns/activitystreams",
       "type": "Like",
       "id": "https://social.example/alyssa/posts/5312e10e-5110-42e5-a09b-934882b3ecec",
       "to": ["https://chatty.example/ben/"],
       "actor": "https://social.example/alyssa/",
       "object": "https://chatty.example/ben/p/51086"}
      

      That’s about 287 characters per vote.
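
      To get a feel for the payload size, here is a small sketch that rebuilds the same activity and measures it. The `build_like` helper and its ID scheme are made up for illustration and are not how any particular server generates IDs:

```python
import json
import uuid


def build_like(actor: str, obj: str, recipient: str) -> dict:
    """Assemble a Like activity shaped like the spec example above."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Like",
        # The ID scheme below is invented for this example.
        "id": f"{actor}posts/{uuid.uuid4()}",
        "to": [recipient],
        "actor": actor,
        "object": obj,
    }


like = build_like(
    actor="https://social.example/alyssa/",
    obj="https://chatty.example/ben/p/51086",
    recipient="https://chatty.example/ben/",
)
pretty = json.dumps(like, indent=1)
compact = json.dumps(like, separators=(",", ":"))
print(len(pretty), len(compact))
```

      Serialising without whitespace trims a little off, and either way a single vote stays in the range of a few hundred bytes before HTTP headers and signatures are added.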

      • Tehhund@lemmy.world (OP)
        11 months ago

        Thanks, that’s very informative. How does this work given that ActivityPub is also used by other software, e.g., Mastodon? They ignore any “Type” entries that they don’t support?

        • Skull giver@popplesburger.hilciferous.nl
          11 months ago

          > They ignore any “Type” entries that they don’t support?

          Basically. For example, ActivityPub objects such as events or locations aren’t supported by many platforms (though they do exist).

          Exact implementations differ per platform. Mastodon doesn’t have a like button, but it does have a favourite button, which is translated into a like when the activity federates. Downvotes are implemented as dislikes (an Activity Streams 2.0 feature, not part of the ActivityPub spec itself), but Mastodon just ignores those.
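
          A toy sketch of that behaviour, not taken from any real server: a receiver applies the activity types it understands and silently drops the rest. The `apply_activity` function and the `supports_dislike` switch are hypothetical names used only here:

```python
from collections import Counter


def apply_activity(scores: Counter, activity: dict, supports_dislike: bool = True) -> bool:
    """Apply a vote activity to a score table; return False if it was ignored."""
    kind = activity.get("type")
    target = activity.get("object")
    if kind == "Like":
        scores[target] += 1
        return True
    if kind == "Dislike" and supports_dislike:
        scores[target] -= 1
        return True
    # Unknown or unsupported types are dropped without raising an error.
    return False


post = "https://chatty.example/ben/p/51086"
incoming = [
    {"type": "Like", "object": post},
    {"type": "Dislike", "object": post},
    {"type": "Travel", "object": post},  # valid AS2 type, unhandled here
]

with_downvotes = Counter()     # a platform that understands Dislike
without_downvotes = Counter()  # a platform that ignores Dislike
for activity in incoming:
    apply_activity(with_downvotes, activity, supports_dislike=True)
    apply_activity(without_downvotes, activity, supports_dislike=False)

print(with_downvotes[post], without_downvotes[post])  # 0 1
```

          The same three activities produce different scores on the two receivers, which is one way vote counts end up differing between platforms.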

          Furthermore, there are tons of extra JSON fields and extensions that allow servers of a particular type to talk to each other better. For example, take the JSON returned when I query for details on your user account:

          curl -LH 'Accept: application/ld+json; profile="https://www.w3.org/ns/activitystreams"' https://lemmy.world/u/Tehhund | jq
          {
            "@context": [
              "https://www.w3.org/ns/activitystreams",
              "https://w3id.org/security/v1",
              {
                "lemmy": "https://join-lemmy.org/ns#",
                "litepub": "http://litepub.social/ns#",
                "pt": "https://joinpeertube.org/ns#",
                "sc": "http://schema.org/",
                "ChatMessage": "litepub:ChatMessage",
                "commentsEnabled": "pt:commentsEnabled",
                "sensitive": "as:sensitive",
                "matrixUserId": "lemmy:matrixUserId",
                "postingRestrictedToMods": "lemmy:postingRestrictedToMods",
                "removeData": "lemmy:removeData",
                "stickied": "lemmy:stickied",
                "moderators": {
                  "@type": "@id",
                  "@id": "lemmy:moderators"
                },
                "expires": "as:endTime",
                "distinguished": "lemmy:distinguished",
                "language": "sc:inLanguage",
                "identifier": "sc:identifier"
              }
            ],
            "type": "Person",
            "id": "https://lemmy.world/u/Tehhund",
            "preferredUsername": "Tehhund",
            "inbox": "https://lemmy.world/u/Tehhund/inbox",
            "outbox": "https://lemmy.world/u/Tehhund/outbox",
            "publicKey": {
              "id": "https://lemmy.world/u/Tehhund#main-key",
              "owner": "https://lemmy.world/u/Tehhund",
              "publicKeyPem": "-----BEGIN PUBLIC KEY-----\nMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAr8QYBRNqyM3A8JHL+rWD\nN22EJDEBd+1D8hzbOnevWnmalBhbp94MY5xyTCOfGIxYo1tZs5BeuM79JRT7eFV6\nefSPZclwri4XOmizgMY2VVRw2zH3zVXmKjbIn84JaNIUez5z5NAtqgzPr+UDxWIZ\n2lH0kJuZ2YBBvH3Bk1xsJznQ3olnh0hGD9+wU10fTSI4d/razTO+4btOMV5yQYry\noZ3RWD4Zq9nhKw5s4Sb5QPQ0NNHnPsnsZPip5FfN67XOQn/d/H2TzBAdKUtEIVBH\nDivI3FWPWmCbdaz3LImS5FpKNoJvoh7Dwlfh2eIE7mkZ9FH64DNw6cd6A2fSOm1w\nXQIDAQAB\n-----END PUBLIC KEY-----\n"
            },
            "endpoints": {
              "sharedInbox": "https://lemmy.world/inbox"
            },
            "published": "2023-06-11T19:07:49.583473+00:00"
          }
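
          For scripting, the same lookup can be sketched with Python's standard library. The `Accept` value is the media type and profile given in the ActivityPub spec; the helper names `actor_request` and `fetch_actor` are made up for this example, and `fetch_actor` performs a real network request when called:

```python
import json
import urllib.request

# Media type the ActivityPub spec asks clients to send when fetching objects.
ACTIVITY_JSON = 'application/ld+json; profile="https://www.w3.org/ns/activitystreams"'


def actor_request(url: str) -> urllib.request.Request:
    """Build a GET request that asks for the ActivityStreams representation."""
    return urllib.request.Request(url, headers={"Accept": ACTIVITY_JSON})


def fetch_actor(url: str, timeout: float = 10.0) -> dict:
    """Fetch and decode an actor document (performs a network request)."""
    with urllib.request.urlopen(actor_request(url), timeout=timeout) as resp:
        return json.load(resp)


# Building the request does not touch the network.
req = actor_request("https://lemmy.world/u/Tehhund")
print(req.get_method(), req.full_url)
```

          Calling `fetch_actor(...)` on an actor URL should return a dictionary with the same keys as the JSON shown above, provided the server is reachable.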
          

          Notice the special fields for PeerTube, LitePub, and Matrix in the context object: these are additional fields to provide optional metadata for compatibility, in case they’re necessary. In your case (and in most cases, to be honest), they’re not used.
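
          One way to see which extension each field belongs to is to group the context terms by their namespace prefix. This is a simplified sketch rather than a real JSON-LD processor; `extension_terms` is a hypothetical helper, and the sample context is a trimmed copy of the one above:

```python
def extension_terms(context) -> dict:
    """Group JSON-LD context terms by the namespace prefix they map to."""
    grouped = {}
    for entry in context if isinstance(context, list) else [context]:
        if not isinstance(entry, dict):
            continue  # plain URL strings are shared vocabularies, not local terms
        for term, value in entry.items():
            iri = value.get("@id", "") if isinstance(value, dict) else value
            if "://" in iri or ":" not in iri:
                continue  # prefix definitions such as "lemmy": "https://..."
            prefix = iri.split(":", 1)[0]  # "lemmy:stickied" -> "lemmy"
            grouped.setdefault(prefix, []).append(term)
    return grouped


context = [
    "https://www.w3.org/ns/activitystreams",
    {
        "lemmy": "https://join-lemmy.org/ns#",
        "pt": "https://joinpeertube.org/ns#",
        "commentsEnabled": "pt:commentsEnabled",
        "stickied": "lemmy:stickied",
        "moderators": {"@type": "@id", "@id": "lemmy:moderators"},
    },
]
grouped = extension_terms(context)
print(grouped)
```

          A client that only cares about the core vocabulary can skip every term that shows up in this grouping.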

          ActivityPub has a relatively simple core architecture with lots of flexibility. You can ignore most of that flexibility to get an extremely simple client, or you can go through every server and find all the rich content they provide to build the mother of all social media apps.

    • Max-P@lemmy.max-p.me
      11 months ago

      It does sync them. I can even query all of your votes in my local DB for every community my instance is tracking.
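
      What such a lookup could look like, sketched against an in-memory SQLite database as a stand-in for the instance's Postgres. The `post_like` table, its three columns, and the sample rows are assumptions made for illustration; a real instance's schema has more columns and may differ:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified vote table; a real instance stores more columns.
conn.execute("CREATE TABLE post_like (person_id INTEGER, post_id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO post_like VALUES (?, ?, ?)",
    [(1, 100, 1), (1, 101, -1), (2, 100, 1)],  # made-up sample votes
)


def votes_by_person(conn, person_id):
    """Return (post_id, score) pairs for one user, local or remote."""
    cur = conn.execute(
        "SELECT post_id, score FROM post_like WHERE person_id = ? ORDER BY post_id",
        (person_id,),
    )
    return cur.fetchall()


votes = votes_by_person(conn, 1)
print(votes)  # [(100, 1), (101, -1)]
```

      Because federated votes arrive as individual activities tied to an actor, every receiving instance can store them per user in this way.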