2023-08-19

RoundSparrow @ BT · 11 months ago

RoundSparrow @ BT · edit-2 11 months ago

PostgreSQL keeps failing

And I feel like the project keeps ignoring that basic fact. Servers crashing isn’t a feature, it’s a bug! Yes, there are now 1500 instances to brag about, but they are all pulling data from lemmy.world, and all the broken things in federation are smoldering issues.

join_collapse_limit is the PostgreSQL design team telling you: don’t build apps with 15 JOINs on real-time, no-caching queries. And look what happens: the query goes off into wild behaviors depending on how much data has built up on a given server. New instances starting with zero data give the illusion that the problem is solved… but once data starts accumulating in that database, the overhead of all that JOIN logic and counting grows and grows.
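To put a rough number on why a planner would want a limit like this, here is a minimal sketch that counts left-deep join orderings for a given number of tables. It is only an illustration of how fast the search space grows, not a description of what the PostgreSQL planner literally enumerates; the function name is made up for this example. The documented default for join_collapse_limit is 8.

```python
import math


def left_deep_join_orders(n_tables: int) -> int:
    """Count the left-deep join orderings of n tables, which is n!.

    Illustration only: a real planner prunes and uses smarter search,
    but the raw number of candidate orderings still grows factorially.
    """
    if n_tables < 1:
        raise ValueError("need at least one table")
    return math.factorial(n_tables)


for n in (8, 15):
    print(n, left_deep_join_orders(n))
# 8 40320
# 15 1307674368000
```

Going from 8 tables to 15 multiplies the candidate orderings by more than 30 million. Past the limit, the planner mostly sticks to the join order as written in the query, so plan quality depends far more on how the machine-generated SQL happens to be laid out.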

RoundSparrow @ BT · 11 months ago

The reason join_collapse_limit needs OPEN DISCUSSION is that it highlights the core of the problem: too many JOINs in the primary logic of listing posts. The ‘too many fields’ part was kind of obvious; the SELECT statement is huge! It’s machine generated.

And I can’t even REMOVE joins that aren’t needed for anonymous users. The Rust objects are so tightly bound that “saved posts”, which cannot exist for an anonymous user, can’t be decoupled.
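What I mean by decoupling, as a minimal sketch in Python rather than the project’s Rust: build the listing query so the “saved” join only exists when there is a logged-in viewer. The table and column names (post, post_saved, person_id) and the %s placeholder style are assumptions for illustration, not the project’s actual query code.

```python
from typing import List, Optional, Tuple


def build_post_list_sql(viewer_id: Optional[int]) -> Tuple[str, List[int]]:
    """Return (sql, params) for a post listing.

    Hypothetical schema. The point is only that the join on saved posts
    is skipped entirely for anonymous readers.
    """
    columns = ["p.id", "p.name", "p.published"]
    joins: List[str] = []
    params: List[int] = []

    if viewer_id is not None:
        # Only a logged-in user can have saved posts, so only then join.
        columns.append("ps.post_id IS NOT NULL AS saved")
        joins.append(
            "LEFT JOIN post_saved ps "
            "ON ps.post_id = p.id AND ps.person_id = %s"
        )
        params.append(viewer_id)
    else:
        # Anonymous: the answer is always false, no table needed.
        columns.append("FALSE AS saved")

    parts = ["SELECT " + ", ".join(columns), "FROM post p"]
    parts.extend(joins)
    parts.append("ORDER BY p.published DESC LIMIT 20")
    return " ".join(parts), params


anon_sql, anon_params = build_post_list_sql(None)
user_sql, user_params = build_post_list_sql(42)
```

The same pattern would apply to every other per-user join (votes, blocks, read markers): the anonymous front page ends up with a much smaller query.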

The servers crashing isn’t treated like an actual problem, like a siren going off saying the code design is faulty. The mere fact that join_collapse_limit is being ignored as a topic shows the lack of design concern. Now instance blocking is being added, another new layer of work for this query.

RoundSparrow @ BT · 11 months ago

Back to Basics

All this INSERT overhead, real-time counting, real-time votes. But all it does is pile up dead tuples, with constant rewrites of PostgreSQL rows to +1 every single thing on the site to give non-cached results.

And it isn’t benefiting the SELECT side of reading that data, it’s burdening it.
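One alternative to rewriting a counter row on every single vote, as a minimal sketch: collect the deltas in memory and write them out in batches, so a post that gets 1000 votes in a minute costs one row rewrite instead of 1000. The class name, the flush interval (left to the caller), and the post_aggregates table in the comment are assumptions for illustration. The trade-off is that displayed counts lag slightly, and the individual vote rows still need to be stored durably elsewhere.

```python
from collections import Counter
from typing import List, Tuple


class VoteBuffer:
    """Accumulate score changes per post and emit them in batches."""

    def __init__(self) -> None:
        self._pending: Counter = Counter()

    def record(self, post_id: int, delta: int) -> None:
        """Note a vote (+1 or -1) without touching the database."""
        self._pending[post_id] += delta

    def flush(self) -> List[Tuple[int, int]]:
        """Return (delta, post_id) pairs for one UPDATE per changed post.

        Each pair would feed something like (hypothetical table name):
          UPDATE post_aggregates SET score = score + %s WHERE post_id = %s
        Posts whose votes cancelled out produce no write at all.
        """
        batch = [
            (delta, post_id)
            for post_id, delta in sorted(self._pending.items())
            if delta != 0
        ]
        self._pending.clear()
        return batch


buffer = VoteBuffer()
for _ in range(1000):
    buffer.record(post_id=1, delta=1)
buffer.record(post_id=2, delta=1)
buffer.record(post_id=2, delta=-1)
print(buffer.flush())  # [(1000, 1)]
```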

The subscribed table is likely merged for federated and local users. But when it comes time to list posts, having to sort through remote users’ data in the same table is overhead for every post listing. The same goes for votes, and yes, every SELECT looks at granular votes, because it wants to show the UI which items were already voted on. But that table holds a huge amount of data to filter through: votes on outdated posts, votes from users not even on this server, etc.

And there are no limits… you could block every person and make the database labor away filtering out everyone you blocked. You can block a community. The testing code to reproduce these edge cases alone is a lot of work that isn’t being done… and it creates a ticking time bomb: some user who hits ‘save’ on every post or ‘block’ on every user throws queries into wild behaviors.

I think some sanity has to be considered, like organizing data around “2 weeks’ worth of posts”… then at least there is a cut-off when someone goes wild with saving posts or blocking users.

I think the personalization of data should pretty much be an entire post-production layer of the app. The core engine should be focused on post and comment storage and retrieval. “saved post” lists, blocking of instances, blocking of persons… let post-production deal with that.
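A minimal sketch of what I mean by a post-production layer, assuming the core engine hands back plain posts with no per-user logic and a separate step applies blocks and saved flags afterwards. All the names here (Post, ViewerPrefs, personalize) are made up for illustration and are not the project’s types.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Post:
    id: int
    creator_id: int
    community_id: int
    instance: str


@dataclass
class ViewerPrefs:
    """Per-user data, loaded separately from the post listing."""
    blocked_persons: Set[int] = field(default_factory=set)
    blocked_communities: Set[int] = field(default_factory=set)
    blocked_instances: Set[str] = field(default_factory=set)
    saved_posts: Set[int] = field(default_factory=set)


def personalize(
    posts: Iterable[Post], prefs: ViewerPrefs
) -> List[Tuple[Post, bool]]:
    """Drop blocked posts and tag each remaining post with a saved flag."""
    visible = []
    for post in posts:
        if post.creator_id in prefs.blocked_persons:
            continue
        if post.community_id in prefs.blocked_communities:
            continue
        if post.instance in prefs.blocked_instances:
            continue
        visible.append((post, post.id in prefs.saved_posts))
    return visible
```

One consequence of this design: a page can come back short after filtering, so the core layer would have to over-fetch or fetch again. In exchange, the core query is identical for every reader and can be cached.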

There will be major world news events where people want to get in and see the latest comments, and the code will be crashing left and right because of personal block lists that a handful of users built up to 80,000 people (on a single account) with some script file. Meanwhile, nobody has made a test script file to see what happens at 80,000 people on a block list…
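The shape of such a test script is not complicated. Here is a minimal sketch that uses an in-memory SQLite database purely as a stand-in so it runs anywhere. The schema (post, person_block) is hypothetical, and SQLite timings say nothing about how PostgreSQL would plan the real query; a real test would point the same idea at a PostgreSQL instance loaded with production-sized data.

```python
import sqlite3
import time


def run_block_list_probe(n_blocked=80_000, n_posts=1_000, viewer_id=1):
    """Load one huge block list, then time a listing that filters on it.

    Returns (visible_post_count, elapsed_seconds).
    """
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE post (
            id INTEGER PRIMARY KEY,
            creator_id INTEGER NOT NULL
        );
        CREATE TABLE person_block (
            person_id INTEGER NOT NULL,
            target_id INTEGER NOT NULL,
            PRIMARY KEY (person_id, target_id)
        );
        """
    )
    # Post authors are spread out as ids 100, 200, 300, ...
    db.executemany(
        "INSERT INTO post (id, creator_id) VALUES (?, ?)",
        ((i, i * 100) for i in range(1, n_posts + 1)),
    )
    # The viewer blocks person ids 1..n_blocked, as a script might.
    db.executemany(
        "INSERT INTO person_block (person_id, target_id) VALUES (?, ?)",
        ((viewer_id, target) for target in range(1, n_blocked + 1)),
    )
    db.commit()

    started = time.perf_counter()
    (visible,) = db.execute(
        """
        SELECT count(*) FROM post p
        WHERE NOT EXISTS (
            SELECT 1 FROM person_block b
            WHERE b.person_id = ? AND b.target_id = p.creator_id
        )
        """,
        (viewer_id,),
    ).fetchone()
    elapsed = time.perf_counter() - started
    db.close()
    return visible, elapsed


visible, elapsed = run_block_list_probe()
print(f"{visible} posts visible, listing took {elapsed:.4f}s")
```

Run it at 0, 1,000 and 80,000 blocked persons and compare. If the listing time climbs with the block list, that is the time bomb showing up in a test instead of in production.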

RoundSparrow @ BT · 11 months ago

Ok… so where to begin?

Language choices. I think it’s a noble gesture, but it’s hard to ignore the overhead and all the end users who accidentally hide their own posts and comments because they get confused by it.
All sorts except “Most Comments”, “Old”, and “Controversial” come down to recent posts. Nobody is complaining about a 3-week-old post not appearing… with one exception: featured. I think I have some tricks to play with featured. Can some basic sanity be added to the project by putting a limit on time? 3 days? Are most people here to browse the most recent 3 days of content? 7 days? Can all data be divided and organized around this, with the exception being a single community?
Is there a limp mode? Something short of Beehaw and Lemmy.world turning off their entire front page needs to be built into the app. I think it needs to be done. In emergency / limp mode, you could cut off old data, or cut off personalization.
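A minimal sketch of how the time limit and limp mode could be expressed as one policy decision made before any query is built. The 3-day and 7-day values are just the numbers floated above, not measured recommendations, and every name here (ListingPolicy, choose_policy, oldest_allowed) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class ListingPolicy:
    max_age: Optional[timedelta]  # None means no time cut-off
    personalize: bool             # apply saved/blocked/vote lookups?


def choose_policy(limp_mode: bool, single_community: bool) -> ListingPolicy:
    """Pick the cut-offs for one listing request."""
    if limp_mode:
        # Emergency: shortest window, no per-user work at all.
        return ListingPolicy(max_age=timedelta(days=3), personalize=False)
    if single_community:
        # The stated exception: browsing one community may go back in time.
        return ListingPolicy(max_age=None, personalize=True)
    return ListingPolicy(max_age=timedelta(days=7), personalize=True)


def oldest_allowed(policy: ListingPolicy, now: datetime) -> Optional[datetime]:
    """Earliest publish time a post may have under this policy."""
    if policy.max_age is None:
        return None
    return now - policy.max_age


now = datetime(2023, 8, 19, tzinfo=timezone.utc)
print(oldest_allowed(choose_policy(limp_mode=True, single_community=False), now))
# 2023-08-16 00:00:00+00:00
```

An admin could flip limp_mode during a traffic spike and keep the front page up with recent, non-personalized content instead of turning it off entirely.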

I think the project has fundamentally misinformed the population that servers are too busy because of too many users. I just don’t see that many users!! Everything I see is too many JOIN statements! Moving to a brand-new server means starting with zero data; that’s why it worked. Lemmy.world has way more data than some empty instance that is 3 weeks old. And the project leaders have failed to understand or communicate this basic issue.