Some feedback regarding to database performance (General)

by Joe I, Sunday, April 02, 2023, 19:03 (361 days ago)

Hopefully, some of this is useful. We are on a decent shared cloud plan, with shared MySQL. Performance is obviously relative, but should be instructional.
[For reference, our board currently has ~500,000 posts. The board is open for viewing, but only available to prior vetted users for posting. Hence, spam is not an issue for us.]

Hello

I've now completed a test upgrade to 20220803.1. The upgrade went fairly smoothly once I got a hang of the process and some cPanel File Manager limitations.


What limitation did you encounter?

The key cPanel limitation is that it does not allow folder uploads, only files. The workaround is to upload a ZIP of the forum install into a different directory and then copy the appropriate files.

As you mention, 20220803.1 does appear to run significantly slower for us than even 2.0.2, on every page.


Wow, slower than 2.0.2. That's disappointing.

Yes, definitely. I'll post some results / conclusions below.

Based on yours and Daniel's comments, I understand the DB changes from MyISAM to InnoDB, and the additional spam features added seem to be the primary culprits. Note that 2.4.24 provided the (significantly) best performance for us across all upgrade steps.


Maybe it's a good idea to stay with version 2.4.24 for the production/real system.

Yes, thanks. This is exactly where I am leaning at this point, particularly after digging into the SQL a bit.

I'm going to take a look at the current DB structure a little more and see if anything stands out from a SQL performance perspective. The fact that performance is slow on every page, including Users / Admin / Forum Settings, etc, is a bit surprising here.


Thank you for your effort. I hope, that, looking from different perspectives onto the issue, more eyes see more.

Glad to be of any service. It's the least I can do to appreciate all the work you and everyone have done to keep this board alive all these years.


OK, for some results / conclusions:

1) As noted previously, every page runs slowly on our forum in 20220803.1. Admin / user / help pages all take ~5 seconds to load. Home page, with 20 threads and 15 last postings takes ~ 10 seconds. Given this info, I focused on the "totals" line at the bottom of the page to start, since that displays on every forum page.

2) I tweaked 2 files (/includes/index.inc.php and includes/main.inc.php to remove any checks in the SQL WHERE clause for .spam = [0 || 1]. These fields are in the mlf2_akismet_rating and mlf2_b8_rating tables. I did leave the table JOINs in place, since that did not impact performance. After digging, I came to the conclusion these tweaks also impacted performance of the home page thread list and last postings.

3) After these quick changes:
a. Performance of the home page is significantly improved (~80% faster) with the changes to the above files.
b. Performance of the home page is significantly worse (75-125% slower) for non-logged in visitors over logged in visitors. 2-3 seconds for logged in visitors, 5-6 seconds for non logged in visitors.

4) Note #a is promising, but note #b is baffling. I have not done any digging yet on why performance might differ here, but welcome to any feedback.

5) I can duplicate the issues Daniel mentions in your performance issues link. You can see the references to "WHERE / AND (mlf2_akismet_rating.spam = 0 AND mlf2_b8_rating.spam = 0)" or similar in all the queries he posts about.

6) From a SQL perspective, the biggest issue I see with these spam checks is there is basically no cardinality on these fields (0 or 1 only), plus MySQL does not have a great way of handling boolean fields. In MySQL, TinyInt(1) and Bit field types both use a full byte and do not have true boolean characteristics according to SQL standards. Since cardinality is so low, an index on these fields (even though they exist) is not of great value to the queries. This likely causes a table scan and definitely affects performance. I am not sure what the best answer moving forward is, but wanted to shed some light on the issue.

7) One note regarding the upgrade process. After upgrading, I looked at the SQL table indexes, and compared it to a clean install. After upgrade to 20220803.1, the mlf2_akismet_rating table was missing indexes on the spam and spam_check_status fields. These indexes exist on a clean install, but not an upgrade.

I will post some performance results after I run through a separate instance upgrade to 2.4.24.

Joe

Tags:
performance, 20220803.1, database

Avatar

Some feedback regarding to database performance

by Auge ⌂, Monday, April 03, 2023, 21:27 (360 days ago) @ Joe I

Hello

[checks for spam]

See therefore also my observations, which more or less confirm what you said, documented in the corresponding Github issue (#673).

6) From a SQL perspective, the biggest issue I see with these spam checks is there is basically no cardinality on these fields (0 or 1 only), plus MySQL does not have a great way of handling boolean fields.

MySQL has no boolean field type at all even there is one. This …

In MySQL, TinyInt(1) and Bit field types both use a full byte and do not have true boolean characteristics according to SQL standards.

… because the existing type boolean is a tinyint behind the hood (at least in MySQL 5.anything).

Since cardinality is so low, an index on these fields (even though they exist) is not of great value to the queries.

According to my linked observations the index in mlf2_b8_rating seems to work while the index in the similar structured table mlf2_akismet_rating does not work.

7) One note regarding the upgrade process. After upgrading, I looked at the SQL table indexes, and compared it to a clean install. After upgrade to 20220803.1, the mlf2_akismet_rating table was missing indexes on the spam and spam_check_status fields. These indexes exist on a clean install, but not an upgrade.

Thank you for spotting this.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Tags:
performance, database, MySQL

Some feedback regarding to database performance

by Joe I, Monday, April 03, 2023, 23:24 (360 days ago) @ Auge


MySQL has no boolean field type at all even there is one. This …
… because the existing type boolean is a tinyint behind the hood (at least in MySQL 5.anything).

According to my linked observations the index in mlf2_b8_rating seems to work while the index in the similar structured table mlf2_akismet_rating does not work.

Yes to all this, and very interesting on your observation between the 2 tables (b8_rating vs akismet_rating). Let me do a little more digging here.

FYI, Here's some performance comparison between the versions I'm testing. All on the same server, but each has an identical copy of the same DB (with upgrades). Many runs of each to mitigate shared server performance issues.

1. Performance of forum running MLF 2.0.2

  • 1. Home Page (20, 15), not logged in: 2 - 2.25 seconds
  • 2. Home Page (20, 15), logged in: 2.25 - 3 seconds
  • 3. Admin Page, logged in: 1 second
  • 4. Contact Page, not logged in: 1 second

2. Performance of forum running MLF 2.3.5

  • 1. Home Page (20, 15), not logged in: 1.75 - 2.5 seconds
  • 2. Home Page (20, 15), logged in: 2.25 - 3.25 seconds
  • 3. Admin Page, logged in: 1 second
  • 4. Contact Page, not logged in: 1 second

3. Performance of forum running MLF 2.4.24

  • 1. Home Page (20, 15), not logged in: 4 - 5.5 seconds
  • 2. Home Page (20, 15), logged in: 0.75 - 1.5 seconds
  • 3. Admin Page, logged in: 0.5 seconds
  • 4. Contact Page, not logged in: 0.5 second

4. Performance of forum running MLF 20220803.1

  • 1. Home Page (20, 15), not logged in: 5.5 - 6.75 seconds
  • 2. Home Page (20, 15), logged in: 3 - 4 seconds
  • 3. Admin Page, logged in: 3 seconds
  • 4. Contact Page, not logged in: 3 seconds

2.4.24 is a combination of the best and almost worst for performance. Main page is fantastic when logged in, but when not logged in, performance suffers dramatically. This is also impacted in 20220803.1, but not earlier versions. I'm going to take a deeper look at the differences in queries when logged in vs not.

Tags:
performance, database, MySQL

Performance update

by Joe I, Tuesday, April 04, 2023, 15:56 (359 days ago) @ Joe I

Please ignore the performance measurements in the last post. I didn't include performance stats for 20220803.1 for both base and modified code. I highlighted the particularly notable stats.

[For reference, our board currently has ~500,000 posts.]


1. Performance of forum running MLF 2.0.2

  • Home Page (20, 15), not logged in: 2 - 2.25 seconds
  • Home Page (20, 15), logged in: 2.25 - 3 seconds
  • Admin Page, logged in: 1 second
  • Contact Page, not logged in: 1 second

2. Performance of forum running MLF 2.3.25

  • Home Page (20, 15), not logged in: 1.75 - 2.5 seconds
  • Home Page (20, 15), logged in: 2.25 - 3.25 seconds
  • Admin Page, logged in: 1 second
  • Contact Page, not logged in: 1 second

3. Performance of forum running MLF 2.4.24

  • Home Page (20, 15), not logged in: 4 - 5.5 seconds
  • Home Page (20, 15), logged in: 0.75 - 1.5 seconds
  • Admin Page, logged in: 0.5 seconds
  • Contact Page, not logged in: 0.5 second

4. Performance of forum running base MLF 20220803.1

  • Home Page (20, 15), not logged in: 9 - 13 seconds
  • Home Page (20, 15), logged in: 10.5 - 14.5 seconds
  • Admin Page, logged in: seconds
  • Contact Page, not logged in: seconds

5. Performance of forum running modified (remove .spam check) MLF 20220803.1

  • Home Page (20, 15), not logged in: 5 - 7 seconds
  • Home Page (20, 15), logged in: 3 - 4.5 seconds
  • Admin Page, logged in: 3 seconds
  • Contact Page, not logged in: 3 seconds

Tags:
performance, database, MySQL

2.4.24 performance comments.

by Joe I, Thursday, April 06, 2023, 16:43 (357 days ago) @ Joe I
edited by Joe I, Thursday, April 06, 2023, 17:06

I dug into the issue I encountered within 2.4.24 where home page performance was much slower for non logged-in users versus logged in. That assessment was close, but not complete. The performance issue relates to whether a given user has any records in the mlf2_read_entries table or not, and specifically to the query(ies) which retrieve last_postings. [Note: our forum ~500,000 entries.]

Following is the code to retrieve last_postings, with $categories == false. There are different queries based on forum categories being active and containing records, but the structure is similar. This is in /include/index_inc.php:

Base Code:
SELECT id, pid, tid, name, user_name, ft.user_id, UNIX_TIMESTAMP(ft.time) AS time, UNIX_TIMESTAMP(ft.time + INTERVAL 0 MINUTE) AS timestamp, UNIX_TIMESTAMP(last_reply) AS last_reply, subject, category, rst.user_id AS req_user
FROM mlf2_entries AS ft
LEFT JOIN mlf2_userdata ON mlf2_userdata.user_id = ft.user_id
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND rst.user_id = 0
WHERE spam = 0
ORDER BY ft.time DESC LIMIT 1;

When running this on our forum for users who have mlf2_read_entries records, it runs <0.25 seconds.
When running this on our forum for users who do not have mlf2_read_entries or are not logged in, it runs over 3 seconds.

The issue here stems from performing a LEFT JOIN on a table that returns no results. I am not sure if it's because this query decided to use a table scan instead of an index, but it's significant. I made a simple but obscure tweak to the query, and performance is now quick in all instances. We'll stick with the modified 2.4.24 until I can get better performance out of 20220803.1 (or later).

Modified Code:
SELECT id, pid, tid, name, user_name, ft.user_id, UNIX_TIMESTAMP(ft.time) AS time, UNIX_TIMESTAMP(ft.time + INTERVAL 0 MINUTE) AS timestamp, UNIX_TIMESTAMP(last_reply) AS last_reply, subject, category, rst.user_id AS req_user
FROM mlf2_entries AS ft
LEFT JOIN mlf2_userdata ON mlf2_userdata.user_id = ft.user_id
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND IFNULL(rst.user_id, NULL) = 0
WHERE spam = 0
ORDER BY ft.time DESC LIMIT 1;

The only difference is on this line, and I've bolded the change:
Base:
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND rst.user_id = 0
Modified:
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND IFNULL(rst.user_id, NULL) = 0

I've come across this before and it always seems bizarre, but it works.

FYI my background is also in MS SQL, and thanks for your continued work on this.

Joe

Avatar

2.4.24 performance comments.

by Micha ⌂, Friday, April 07, 2023, 12:38 (356 days ago) @ Joe I

Hi Joe,

Base:
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND rst.user_id = 0
Modified:
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND IFNULL(rst.user_id, NULL) = 0

I've come across this before and it always seems bizarre, but it works.

This looks strange. In the first statement, rst.user_id is compared with zero. It is like

if (rst.user_id == 0) {
 //do something
}

In your modified statement, a second condition is added in front of the prior condition. However, the second one is identical to the condition of the first statement.

if (rst.user_id == NULL) {
  rst.user_id = NULL;
}
if (rst.user_id == 0) {
 //do something
}

I could have immediately understood that the following change would make an improvement (no further type-cast),

if (rst.user_id == NULL) {
  rst.user_id = -1;
}

However, if such a simple change works, we should change the statements...

/Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences

2.4.24 performance comments.

by Joe I, Wednesday, April 12, 2023, 22:11 (351 days ago) @ Micha

Hi Micha,

Base:
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND rst.user_id = 0
Modified:
LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = ft.id AND IFNULL(rst.user_id, NULL) = 0

I've come across this before and it always seems bizarre, but it works.


This looks strange. In the first statement, rst.user_id is compared with zero. It is like

if (rst.user_id == 0) {
//do something
}

In your modified statement, a second condition is added in front of the prior condition. However, the second one is identical to the condition of the first statement.

if (rst.user_id == NULL) {
rst.user_id = NULL;
}
if (rst.user_id == 0) {
//do something
}

I could have immediately understood that the following change would make an improvement (no further type-cast),

if (rst.user_id == NULL) {
rst.user_id = -1;
}

Yes to most of the above, and yes it's strange.

For clarity, the 0 in "AND rst.user_id = 0" refers to the logged in User ID.
The actual code in index.inc.php is AND rst.user_id = ". intval($tmp_user_id) ."
If there is no logged in user, then tmp_user_id is set to 0, and the query would be "AND rst.user_id = 0"
If Admin is logged in, the query would be "AND rst.user_id = 1"

For our DB:
In any instance where the specified user has a record in mlf2_read_entries, the result is immediate.
In any instance where the specified user does not have a record, the response is quite slow.

NOTE: From some quick testing on my local machine, there appears to be a difference in the way MySQL 8 and MariaDB 10 handle this.
MySQL appears to run this query quickly regadless.
MariaDB gives the performances differences noted above.

On MariaDB, I did confirm material query plan differences when using using IFNULL and without, for different User IDs, and that is the underlying issue. IFNULL is a workaround to make MariaDB act smartly.

Given the above notes, YMMV based on installation type.


However, if such a simple change works, we should change the statements...

/Micha

For someone still running 2.4.24 and having performance issues, I think this would be a very easy change, and potentially very helpful.
It may also be helpful for 20220803.1, so let me check into that as well and post back here.


Joe

Avatar

2.4.24 performance comments.

by Auge ⌂, Thursday, April 13, 2023, 10:54 (350 days ago) @ Joe I

Hello

For clarity, the 0 in "AND rst.user_id = 0" refers to the logged in User ID.
The actual code in index.inc.php is AND rst.user_id = ". intval($tmp_user_id) ."
If there is no logged in user, then tmp_user_id is set to 0, and the query would be "AND rst.user_id = 0"
If Admin is logged in, the query would be "AND rst.user_id = 1"

For our DB:
In any instance where the specified user has a record in mlf2_read_entries, the result is immediate.
In any instance where the specified user does not have a record, the response is quite slow.

Ok, now your and Michas discussion becomes more clear to me. For the read state we can assume the following preconditions.

1. Only registered users can have records about read entries. This is a feature for registered users only.
2. Marking entries as read in the thread trees and the latest entries listing makes no sense for not logged in users (registered or not).

It would be optimal to have two queries at this point, one for logged in users ($tmp_user_id > 0) and one for all others. Our problem is, that there are already several queries with several additions for optional conditions (categories, etc.). It is already complicated to read the code, such a change would make it more complicated.

Stored procedures would be a nice replacement for the complex evaluation of conditions on the side of PHP and would lower the complexity because of the separation of the SQL code from the PHP scripts. As shown a few entries aside it would be possible to have clearly arranged SQL code that can cover several (sub) cases in one procedure that would be called from the PHP script in one or a few lines of code.

If and when we decide and implement the stored procedures, we are at the point to say, from version xy on we only support installations, where the MySQL- or MariaDB-server supports the database user to create, alter and execute the stored procedures.

For now it would be enough to check the cases and to separate the query strings.

A similar although not identical case is the spam check status. If a forum is a fresh installation of a version of the 2.5 branch (20220508.1 or newer), there will be entries in the spam check tables (mlf2_b8_rating and mlf2_akismet_rating) for every created entry. In upgraded forums the update script should create those spam check entries for all existing forum entries. As a result there should be no forum entry without the corresponding spam check entries. But now we join the spam check tables regardless of the actual settings for checking forum entries with B8 and/or Akismet (or not).

With separating the different cases the queries could be simpler depending on the actual settings. That makes it not unnecessary to check, why the queries are so slow, at least for admins and mods (all reports was made by admins/mods/forum operators).

NOTE: From some quick testing on my local machine, there appears to be a difference in the way MySQL 8 and MariaDB 10 handle this.
MySQL appears to run this query quickly regadless.
MariaDB gives the performances differences noted above.

Beside from getting the code generally so fast as it is possible, we have no influence on the existing software equipment.

For someone still running 2.4.24 and having performance issues, I think this would be a very easy change, and potentially very helpful.
It may also be helpful for 20220803.1, so let me check into that as well and post back here.

Thank you for your effort.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

2.4.24 performance comments.

by Joe I, Tuesday, April 18, 2023, 21:58 (345 days ago) @ Auge

Hello

Ok, now your and Michas discussion becomes more clear to me. For the read state we can assume the following preconditions.

1. Only registered users can have records about read entries. This is a feature for registered users only.
2. Marking entries as read in the thread trees and the latest entries listing makes no sense for not logged in users (registered or not).

OK, good and yes.

It would be optimal to have two queries at this point, one for logged in users ($tmp_user_id > 0) and one for all others. Our problem is, that there are already several queries with several additions for optional conditions (categories, etc.). It is already complicated to read the code, such a change would make it more complicated.

Testing on my board indicates that the simple change of wrapping rst.user_id inside an IFNULL (for the latest_postings query) should work well across all cases.

Stored procedures would be a nice replacement for the complex evaluation of conditions on the side of PHP and would lower the complexity because of the separation of the SQL code from the PHP scripts. As shown a few entries aside it would be possible to have clearly arranged SQL code that can cover several (sub) cases in one procedure that would be called from the PHP script in one or a few lines of code.

Agree on stored procedures overall.

If and when we decide and implement the stored procedures, we are at the point to say, from version xy on we only support installations, where the MySQL- or MariaDB-server supports the database user to create, alter and execute the stored procedures.

For now it would be enough to check the cases and to separate the query strings.

see above; I don't even think that separate queries are necessary in this instance.

A similar although not identical case is the spam check status. If a forum is a fresh installation of a version of the 2.5 branch (20220508.1 or newer), there will be entries in the spam check tables (mlf2_b8_rating and mlf2_akismet_rating) for every created entry. In upgraded forums the update script should create those spam check entries for all existing forum entries. As a result there should be no forum entry without the corresponding spam check entries. But now we join the spam check tables regardless of the actual settings for checking forum entries with B8 and/or Akismet (or not).

With separating the different cases the queries could be simpler depending on the actual settings. That makes it not unnecessary to check, why the queries are so slow, at least for admins and mods (all reports was made by admins/mods/forum operators).

Yes, understood. At many points in development you have to weight the complexity / performance tradeoff.

NOTE: From some quick testing on my local machine, there appears to be a difference in the way MySQL 8 and MariaDB 10 handle this.
MySQL appears to run this query quickly regadless.
MariaDB gives the performances differences noted above.


Beside from getting the code generally so fast as it is possible, we have no influence on the existing software equipment.

Understood. That was more a mental note that the DB in use can affect performance on top of so many other variables.

For someone still running 2.4.24 and having performance issues, I think this would be a very easy change, and potentially very helpful.
It may also be helpful for 20220803.1, so let me check into that as well and post back here.


Thank you for your effort.

You're quite welcome and see my 20220803.1 notes I'm about to post.

Tschö, Auge

20220803.1 performance comments.

by Joe I, Wednesday, April 19, 2023, 00:09 (345 days ago) @ Joe I

OK, so here's an update. Performance notes across all MLF versions tested:

1. Performance of forum running MLF 2.0.2

  • Home Page (20, 15), not logged in: 2 - 2.25 seconds
  • Home Page (20, 15), logged in: 2.25 - 3 seconds
  • Admin Page, logged in: 1 second
  • Contact Page, not logged in: 1 second

2. Performance of forum running MLF 2.3.25

  • Home Page (20, 15), not logged in: 1.75 - 2.5 seconds
  • Home Page (20, 15), logged in: 2.25 - 3.25 seconds
  • Admin Page, logged in: 1 second
  • Contact Page, not logged in: 1 second

3. Performance of forum running base MLF 2.4.24

  • Home Page (20, 15), not logged in: 4 - 5.5 seconds
  • Home Page (20, 15), logged in: 0.75 - 1.5 seconds
  • Admin Page, logged in: 0.5 seconds
  • Contact Page, not logged in: 0.5 second

4. Performance of forum running Mod 1 (latest_postings) MLF 2.4.24

  • Home Page (20, 15), not logged in: 0.75 - 1.5 seconds
  • Home Page (20, 15), logged in: 0.75 - 1.5 seconds
  • Admin Page, logged in: 0.5 seconds
  • Contact Page, not logged in: 0.5 second

5. Performance of forum running base MLF 20220803.1

  • Home Page (20, 15), not logged in: 9 - 13 seconds
  • Home Page (20, 15), logged in: 10.5 - 14.5 seconds
  • Admin Page, logged in: 4 seconds
  • Contact Page, not logged in: 4 seconds

6. Performance of forum running modified (remove .spam check) MLF 20220803.1

  • Home Page (20, 15), not logged in: 5 - 7 seconds
  • Home Page (20, 15), logged in: 3 - 4.5 seconds
  • Admin Page, logged in: 3 seconds
  • Contact Page, not logged in: 3 seconds

7. Performance of forum running Mod 2 (total_spam, total_postings, latest_postings) MLF 20220803.1

  • Home Page (20, 15), not logged in: 2.5 seconds
  • Home Page (20, 15), logged in: 3 seconds
  • Admin Page, logged in: 2 seconds
  • Contact Page, not logged in: 2 seconds

Notes:

Item #4 is as I tested and noted in a 2.4.24 performance post, which contains only a simple change in latest_postings to wrap rst.user_id in an IFNULL.
This is now the best performing version across all tested.

Item #7 is after tweaking and testing the longer running queries in 20220803.1, and mods to the total_spam, total_postings, and latest_postings queries in index.inc.php and main.inc.php. Note that these results are the best I've gotten so far with simple tweaks to the above queries, but I'll take one more run at them. Significantly better performance than Item #5 - base 20220803.1.

I was successful in getting even better performance out of these above queries, but changes to the code get more complex, mostly because of the additional changes necessary to display_query_where and display_query_and includes and I didn't yet dig into where those two "includes" are also used.

I have attempted a number of variations on each of these queries, using ANALYZE for results testing, and attempting subqueries, temp tables, JOIN rearranging, FORCE INDEXes, and various other methods.

My mods to 20220803.1 are below (note, these are the queries as run through SQL testing. I can post the actual php code changes if desired. The php code mods should have a net zero effect on all board installations, regardless of configuration.

total_spam
base

-- total_spam base (650 ms)
-- main.inc.php
SELECT COUNT(*) FROM mlf2_entries 
LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id 
WHERE (mlf2_akismet_rating.spam = 1 OR mlf2_b8_rating.spam = 1)

mod

-- total_spam Mod 2 (400 ms)
SELECT COUNT(*) FROM mlf2_entries 
LEFT JOIN 
 (SELECT eid AS ak_eid, spam AS ak_spam FROM mlf2_akismet_rating WHERE spam = 1) AS akismet 
    ON ak_eid = mlf2_entries.id
LEFT JOIN 
 (SELECT eid AS b8_eid, spam AS b8_spam FROM mlf2_b8_rating WHERE spam = 1) AS b8 
    ON b8_eid = mlf2_entries.id 
   WHERE ak_spam = 1 OR b8_spam = 1

The differences between these two are:

  • akismet and b8 JOINs are moved into subqueries


total_postings
base

-- total_postings (no spam) - base [1 s]
-- main.inc.php
SELECT COUNT(*) 
FROM mlf2_entries 
LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id 
WHERE (mlf2_akismet_rating.spam = 0 AND mlf2_b8_rating.spam = 0)

mod

 
-- total_postings (no spam) - Mod 7 [650 ms] 
-- Faster options require logic rewrite to display_spam_query_where and display+spam_query_and
-- main.inc.php
SELECT STRAIGHT_JOIN COUNT(*) 
FROM mlf2_entries 
LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id 
WHERE (mlf2_akismet_rating.spam = 0 AND mlf2_b8_rating.spam = 0)

The differences between these two are:

  • Add STRAIGHT_JOIN to force order of JOINs


latest_postings
base

-- latest_postings base [1.75 s]
SELECT 
   id, pid, tid, name, user_name, mlf2_entries.user_id, UNIX_TIMESTAMP(mlf2_entries.time) AS TIME , UNIX_TIMESTAMP(mlf2_entries.time + INTERVAL 0 MINUTE) AS TIMESTAMP, UNIX_TIMESTAMP(last_reply) AS last_reply, subject, category, rst.user_id AS req_user
    FROM mlf2_entries -- FORCE INDEX (timeid)
 LEFT JOIN mlf2_userdata ON mlf2_userdata.user_id = mlf2_entries.user_id
 LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = mlf2_entries.id AND rst.user_id = 0
 LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
 LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id
 WHERE mlf2_akismet_rating.spam = 0 AND mlf2_b8_rating.spam = 0
 ORDER BY mlf2_entries.time DESC 
 LIMIT 15;

mod

-- latest_postings Mod 3 [b][1.5 ms][/b]
SELECT 
   id, pid, tid, name, user_name, mlf2_entries.user_id, UNIX_TIMESTAMP(mlf2_entries.time) AS TIME , UNIX_TIMESTAMP(mlf2_entries.time + INTERVAL 0 MINUTE) AS TIMESTAMP, UNIX_TIMESTAMP(last_reply) AS last_reply, subject, category, rst.user_id AS req_user
    FROM mlf2_entries FORCE INDEX (TIME)
 LEFT JOIN mlf2_userdata ON mlf2_userdata.user_id = mlf2_entries.user_id
 LEFT JOIN mlf2_read_entries AS rst ON rst.posting_id = mlf2_entries.id AND IFNULL(rst.user_id, NULL) = 0
 LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
 LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id
 WHERE mlf2_akismet_rating.spam = 0 AND mlf2_b8_rating.spam = 0
 ORDER BY mlf2_entries.time DESC 
 LIMIT 15 

The differences between these two are:

  • Add FORCE INDEX (time) on mlf2_entries
  • IFNULL(rst.user_id, NULL)

IFNULL is the same change I made to improve 2.4.24 performance. Here, I’ve also needed. To add FORCE INDEX.

In the table above, you can see that just these changes make a significant difference in performance over the base 20220803.1 code. Feel free to incorporate any/all of these changes into MLF code, or ask any questions.

[edit by Auge]: I enclosed the provided SQL-code in code blocks for better readability. Column lists can overflow the block width in a few cases, but the interesting things are all still visible.

Avatar

20220803.1 performance comments, first questions

by Auge ⌂, Wednesday, April 19, 2023, 11:49 (344 days ago) @ Joe I

Hello Joe

First questions. More will follow.

My mods to 20220803.1 are below (note, these are the queries as run through SQL testing. I can post the actual php code changes if desired. The php code mods should have a net zero effect on all board installations, regardless of configuration.

total_spam
base

-- total_spam base (650 ms)
-- main.inc.php
SELECT COUNT(*) FROM mlf2_entries 
LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id 
WHERE (mlf2_akismet_rating.spam = 1 OR mlf2_b8_rating.spam = 1)

mod

-- total_spam Mod 2 (400 ms)
SELECT COUNT(*) FROM mlf2_entries 
LEFT JOIN 
(SELECT eid AS ak_eid, spam AS ak_spam FROM mlf2_akismet_rating WHERE spam = 1) AS akismet 
ON ak_eid = mlf2_entries.id
LEFT JOIN 
(SELECT eid AS b8_eid, spam AS b8_spam FROM mlf2_b8_rating WHERE spam = 1) AS b8 
ON b8_eid = mlf2_entries.id 
WHERE ak_spam = 1 OR b8_spam = 1

The differences between these two are:

  • akismet and b8 JOINs are moved into subqueries

It's looking plausible to use pre filtered results for joining. Additionally this removes the OR in the WHERE-clause, that alone can lead to a full table scan instead of using an existing index, from the where clause.

total_postings
base

-- total_postings (no spam) - base [1 s]
-- main.inc.php
SELECT COUNT(*) 
FROM mlf2_entries 
LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id 
WHERE (mlf2_akismet_rating.spam = 0 AND mlf2_b8_rating.spam = 0)

mod

 
-- total_postings (no spam) - Mod 7 [650 ms] 
-- Faster options require logic rewrite to display_spam_query_where and display+spam_query_and
-- main.inc.php
SELECT STRAIGHT_JOIN COUNT(*) 
FROM mlf2_entries 
LEFT JOIN mlf2_akismet_rating ON mlf2_akismet_rating.eid = mlf2_entries.id 
LEFT JOIN mlf2_b8_rating ON mlf2_b8_rating.eid = mlf2_entries.id 
WHERE (mlf2_akismet_rating.spam = 0 AND mlf2_b8_rating.spam = 0)

The differences between these two are:

  • Add STRAIGHT_JOIN to force order of JOINs

Here I have a serious question. This is the first time I read about STRAIGHT_JOIN. But allexamples I can find are using it in the FROM … JOIN … section but not directly after SELECT. Additionally the MySQL documentation states STRAIGHT_JOIN as a replacement for pure JOIN (which itself is a shorthand for INNER JOIN). We are using LEFT JOINs. Does this not contradict itself?

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

20220803.1 performance comments, first questions

by Joe I, Friday, April 28, 2023, 17:36 (335 days ago) @ Auge
edited by Joe I, Friday, April 28, 2023, 17:46


First questions. More will follow.

My mods to 20220803.1 are below (note, these are the queries as run through SQL testing. I can post the actual php code changes if desired. The php code mods should have a net zero effect on all board installations, regardless of configuration.

total_spam

The differences between these two are:

  • akismet and b8 JOINs are moved into subqueries


It's looking plausible to use pre filtered results for joining. Additionally this removes the OR in the WHERE-clause, that alone can lead to a full table scan instead of using an existing index, from the where clause.

Yes, agreed on pre filtering and table scan issues. In this particular case, the above changes can be made without any pre filtering and the performance difference is quite significant.

total_postings

The differences between these two are:

  • Add STRAIGHT_JOIN to force order of JOINs


Here I have a serious question. This is the first time I read about STRAIGHT_JOIN. But allexamples I can find are using it in the FROM … JOIN … section but not directly after SELECT. Additionally the MySQL documentation states STRAIGHT_JOIN as a replacement for pure JOIN (which itself is a shorthand for INNER JOIN). We are using LEFT JOINs. Does this not contradict itself?

Tschö, Auge

Ahh, good question. I will caveat this first by noting that I am running on MariaDB. MySQL has an additional optimizer hint (/*+ JOIN_FIXED_ORDER */) that will also work in forcing the order of the JOINs, but MariaDB has not (yet) implemented this hint. Fort compatibility across both, I'm using STRAIGHT_JOIN.

Both MySQL and MariaDB allow for use of the STRAIGHT_JOIN in either the SELECT or FROM clauses.

You are correct in that using the STRAIGHT_JOIN in the FROM clause will act as an INNER JOIN.

In MariaDB (and my untested understanding for MySQL), using the STRAIGHT_JOIN in the SELECT will simply force the order of the JOINs as specified in the FROM clause, but will not force INNER JOIN for tables in the FROM clause. I have confirmed this to be true in MariaDB, and the LEFT JOINs still work properly.


Note: I am still working on performance improvements to 20220803.1. It's not yet fast enough for us to roll out on our forum, so I'm sticking with my modded 2.4.24 for now. Performant JOINs on the akismet_rating and b8_rating tables are still eluding me, but I'll keep trying.

Avatar

20220803.1 performance comments, first questions

by Micha ⌂, Saturday, April 29, 2023, 07:35 (334 days ago) @ Joe I

Hello,

In MariaDB (and my untested understanding for MySQL), using the STRAIGHT_JOIN in the SELECT will simply force the order of the JOINs as specified in the FROM clause, but will not force INNER JOIN for tables in the FROM clause. I have confirmed this to be true in MariaDB, and the LEFT JOINs still work properly.

The use of STRAIGHT_JOIN is not without controversy. I didn't find a real recommendation but several critical posts.

/Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences

20220803.1 performance comments, first questions

by Joe I, Tuesday, May 02, 2023, 20:57 (331 days ago) @ Micha

The use of STRAIGHT_JOIN is not without controversy. I didn't find a real recommendation but several critical posts.

/Micha

From a DB administrator perspective, I have no argument with a quote from one of your linked threads: "Optimizer hints should generally be your last resort.". I completely agree, and you'll see those same comments for just about every hint available for use in SQL. Hints should always be used with caution, but can be very helpful when you can't find a more direct way to cajole SQL into using the most efficient query plan.

To put a finer point on it, some hints aren't much more than a strong recommendation, others tell SQL to perform as instructed, unless it deems a significant performance hit, and yet others tell SQL it must do this regardless. Proceed with caution. And yes, efficiency can change with DB statistics updates, index changes, data changes, field changes, and event DB versions, so SQL life is not static.

Regarding the performance comments I've been posting, and my philosophy in general, hints really shouldn't be used unless there is a dramatic performance improvement, and you've exhausted all other mechanisms. As such, the only change I've noted so far that I'd strongly consider implementing is the mod to latest_postings, where run time on our DB goes from 1.75 sec to 1.5 ms (20220803.1).

My mods to total_postings (and some other poorly performing queries) are still an exercise in progress, and do not yet see a material enough performance gain to warrant implementing. My comments were more an exercise in the approaches I've undertaken to try to get them more performant. As noted, the JOINs on _akismet_rating and _b8_rating, in conjunction with a WHERE clause on the spam fields are what is causing poor performance due to table scans.

Anyway, I'm going to keep at it to see what I can further squeeze out of them.

Joe

Avatar

Some ideas regarding to database performance

by Auge ⌂, Wednesday, April 05, 2023, 15:56 (358 days ago) @ Joe I

Hello

Because of the reported performance issues I have been tossing around ideas to solve the problem. One was to rely a few functions, that are follow ups of database operations, on database triggers, another one was to outsource tasks more or less completely to the database, that needs nowadays complex preparations in our PHP-code.

I begun with the tests independent from the corresponding issue #673. I thereby draw on my experience with MS SQL and its implementation of stored procedures (from now on SP). There I impemented a program with only SP, that decomposes the nested levels of a JSON object into its parts and key-value-pairs, analises them and selects the data depending of the curent dataset, to decide what of the dataset should be stored in another database.

As mentioned somewhere else for triggers in MySQL, I have – in a similar manner – no real experience with SP in MySQL. It's a different syntax as well as a smaller featureset than in MS SQL and, not to forget, I don't know anything about the performance in comparision with transmitting queries as strings from PHP (including their previous preparation in PHP). So I started from scratch with a small task. Getting the categories, a user has access to. Reminder: I do not expect to see a relevant change of the performance with my tests. This is only an experiment at the moment.

Currently getting the categories is a two stage task with the functions get_categories for selecting the accessible categories (if categories exist) and get_category_ids for extracting a list of their IDs from the previous generated category list. The first stage selects the categories depending of the function/rank of the requesting user (not logged in visitor, logged in user, moderator and administrator). The forum operator can regulate the access to categories depending of the rank (even this is not a full matching term). So the function checks the existence of categories and the rank of the requsting user to provide the proper list of categories (or nothing in case of not existing categories). Remind, that the general access to the forum is managed somewhere else.

Because of the limitations of MySQL I had to split similar tasks into different but similar (even not identival) looking SP.

Counting the categories for the decision to generate the selection (no categories means also no need to select them).

DROP PROCEDURE IF EXISTS get_CategoriesCount;
DELIMITER //
CREATE PROCEDURE IF NOT EXISTS get_CategoriesCount(OUT cCats INT)
BEGIN
 SELECT COUNT(*) INTO cCats FROM mlf24_categories;
END //
DELIMITER ;

Select the list of accessible categories depending from the user rank with their ID and name

DROP PROCEDURE IF EXISTS get_AllowedCategories;
DELIMITER //
CREATE PROCEDURE IF NOT EXISTS get_AllowedCategories(IN userID INT)
BEGIN
 DECLARE cntCategories INT DEFAULT 0;
 DECLARE userType SMALLINT DEFAULT -1;
 CALL get_CategoriesCount(cntCategories);
 
 IF cntCategories > 0 THEN
  IF userID IS NOT NULL AND userID > 0 THEN
   SELECT user_type INTO userType FROM mlf24_userdata WHERE user_id = userID;
   IF userType = 0 THEN
    SELECT id, category FROM mlf24_categories WHERE accession IN(0, 1) ORDER BY order_id ASC;
   ELSEIF userType IN(1, 2) THEN
    SELECT id, category FROM mlf24_categories WHERE accession IN(0, 1, 2) ORDER BY order_id ASC;
   ELSE
    SELECT id, category FROM mlf24_categories WHERE accession = 0 ORDER BY order_id ASC;
   END IF;
  ELSE
   SELECT id, category FROM mlf24_categories WHERE accession = 0 ORDER BY order_id ASC;
  END IF;
 END IF;
END //
DELIMITER ;

Select the list of accessible categories depending from the user rank with their ID only

DROP PROCEDURE IF EXISTS get_AllowedCategoryIDs;
DELIMITER //
CREATE PROCEDURE IF NOT EXISTS get_AllowedCategoryIDs(IN userID INT)
BEGIN
 DECLARE cntCategories INT DEFAULT 0;
 DECLARE userType SMALLINT DEFAULT -1;
 CALL get_CategoriesCount(cntCategories);
 
 IF cntCategories > 0 THEN
  IF userID IS NOT NULL AND userID > 0 THEN
   SELECT user_type INTO userType FROM mlf24_userdata WHERE user_id = userID;
   IF userType = 0 THEN
    SELECT id FROM mlf24_categories WHERE accession IN(0, 1);
   ELSEIF userType IN(1, 2) THEN
    SELECT id FROM mlf24_categories WHERE accession IN(0, 1, 2);
   ELSE
    SELECT id FROM mlf24_categories WHERE accession = 0;
   END IF;
  ELSE
   SELECT id FROM mlf24_categories WHERE accession = 0;
  END IF;
 END IF;
END //
DELIMITER ;

After failing several times with generating the different PS because of syntax errors (MySQL is very strict with closing commands lines with semicola when it comes to SP) I moved on to a slightly more complex task. Counting the number of entries in accessible categories.

 
DROP PROCEDURE IF EXISTS get_EntriesCount;
DELIMITER //
CREATE PROCEDURE IF NOT EXISTS get_EntriesCount(IN userID INT)
BEGIN
 DECLARE cntCategories INT DEFAULT 0;
 DECLARE userType SMALLINT DEFAULT -1;
 CALL get_CategoriesCount(cntCategories);
 
 IF cntCategories > 0 THEN
  IF userID IS NOT NULL AND userID > 0 THEN
   SELECT user_type INTO userType FROM mlf24_userdata WHERE user_id = userID;
   IF userType = 0 THEN
    SELECT
     COUNT(*)
    FROM mlf24_entries
     LEFT JOIN mlf24_akismet_rating ON mlf24_akismet_rating.eid = mlf24_entries.id
     LEFT JOIN mlf24_b8_rating ON mlf24_b8_rating.eid = mlf24_entries.id
    WHERE mlf24_entries.category IN(SELECT id FROM mlf24_categories WHERE accession IN(0, 1))
     AND (mlf24_akismet_rating.spam = 0 AND mlf24_b8_rating.spam = 0);
   ELSEIF userType IN(1, 2) THEN
    SELECT
     COUNT(*)
    FROM mlf24_entries
     LEFT JOIN mlf24_akismet_rating ON mlf24_akismet_rating.eid = mlf24_entries.id
     LEFT JOIN mlf24_b8_rating ON mlf24_b8_rating.eid = mlf24_entries.id
    WHERE mlf24_entries.category IN(SELECT id FROM mlf24_categories WHERE accession IN(0, 1, 2))
     AND (mlf24_akismet_rating.spam = 0 AND mlf24_b8_rating.spam = 0);
   ELSE
    SELECT
     COUNT(*)
    FROM mlf24_entries
     LEFT JOIN mlf24_akismet_rating ON mlf24_akismet_rating.eid = mlf24_entries.id
     LEFT JOIN mlf24_b8_rating ON mlf24_b8_rating.eid = mlf24_entries.id
    WHERE mlf24_entries.category IN(SELECT id FROM mlf24_categories WHERE accession = 0)
     AND (mlf24_akismet_rating.spam = 0 AND mlf24_b8_rating.spam = 0);
   END IF;
  ELSE
   SELECT
    COUNT(*)
   FROM mlf24_entries
    LEFT JOIN mlf24_akismet_rating ON mlf24_akismet_rating.eid = mlf24_entries.id
    LEFT JOIN mlf24_b8_rating ON mlf24_b8_rating.eid = mlf24_entries.id
   WHERE mlf24_entries.category IN(SELECT id FROM mlf24_categories WHERE accession = 0)
    AND (mlf24_akismet_rating.spam = 0 AND mlf24_b8_rating.spam = 0);
  END IF;
 ELSE
  SELECT
   COUNT(*)
  FROM mlf24_entries
   LEFT JOIN mlf24_akismet_rating ON mlf24_akismet_rating.eid = mlf24_entries.id
   LEFT JOIN mlf24_b8_rating ON mlf24_b8_rating.eid = mlf24_entries.id
  WHERE (mlf24_akismet_rating.spam = 0 AND mlf24_b8_rating.spam = 0);
 END IF;
END //
DELIMITER ;

I would expect the SP to be sligthly faster than the generation of the query strings with PHP. There would be only one much simpler query per task. The only data to transmit would be the user-id, everything else would be decided inside the SP on the database server. That way we could start with substituting complex constructions in PHP which prepares code for querying the database here and there with first SP which would move the complexity to the database server.

Until now I have not performed any test regarding running SP with PHP.

Pitfall here is, to being sure, that a common forum operator is allowed to create, alter and execute SP on her/his MySQL database instance when running a forum upgrade or installation. It would make absolutely no sense to begin with the implementation if forum operators would be unable to install or upgrade their forum instances because of strict limitations of their hosting ISPs. On the other hand we would be absolutely incapable to maintain a version, that (partially) runs with SP and another one, that runs the "classic" way.

With running SHOW GRANTS FOR 'database-user-name'; when logged in in phpMyAdmin I got the following result

Grants for database-user-name@%  
GRANT USAGE ON *.* TO `database-user-name`@`database-host`
GRANT ALL PRIVILEGES ON `database-name`.* TO `database-user-name`@`database-host`

With GRANT ALL PRIVILEGES my user is allowed to do more or less anything. This output could look completely different on other servers. The following permissions are explicitly required to handle SP: CREATE ROUTINE, ALTER ROUTINE and EXECUTE. It would be nice to see the anonymised output from other database servers. So please execute the query GRANT ALL PRIVILEGES on your server and post the anonymised output here in this thread.

Tschö, Auge

--
Trenne niemals Müll, denn er hat nur eine Silbe!

Tags:
performance, database, MySQL

SHOW GRANTS

by Joe I, Thursday, April 06, 2023, 16:16 (357 days ago) @ Auge
edited by Joe I, Thursday, April 06, 2023, 16:50

Hello

Because of the reported performance issues I have been tossing around ideas to solve the problem. One was to rely a few functions, that are follow ups of database operations, on database triggers, another one was to outsource tasks more or less completely to the database, that needs nowadays complex preparations in our PHP-code.

I begun with the tests independent from the corresponding issue #673. I thereby draw on my experience with MS SQL and its implementation of stored procedures (from now on SP). There I impemented a program with only SP, that decomposes the nested levels of a JSON object into its parts and key-value-pairs, analises them and selects the data depending of the curent dataset, to decide what of the dataset should be stored in another database.

With running SHOW GRANTS FOR 'database-user-name'; when logged in in phpMyAdmin I got the following result

Grants for database-user-name@%
GRANT USAGE ON *.* TO `database-user-name`@`database-host`
GRANT ALL PRIVILEGES ON `database-name`.* TO `database-user-name`@`database-host`

With GRANT ALL PRIVILEGES my user is allowed to do more or less anything. This output could look completely different on other servers. The following permissions are explicitly required to handle SP: CREATE ROUTINE, ALTER ROUTINE and EXECUTE. It would be nice to see the anonymised output from other database servers. So please execute the query GRANT ALL PRIVILEGES on your server and post the anonymised output here in this thread.

Tschö, Auge

ok, my results.

Command (For me, I was required to include @'127.0.0.1'):
SHOW GRANTS FOR 'myTestUser'@'127.0.0.1';

Returns:
Grants for myTestUser@127.0.0.1
GRANT USAGE ON *.* TO `myTestUser `@`127.0.0.1`
GRANT ALL PRIVILEGES ON `myTestDB `.* TO `myTestUser `@`127.0.0.1`

This is a good start. I am also able to create the SPs you mention above. I am running on a shared account using a shared MySQL (MariaDB) server from Hostinger. Other hosts may provide different access, so as you mention we will likely need to get more data from other users here just to confirm it's standard practice.

Some ideas regarding to database performance

by Andreas Schneider, Wednesday, April 19, 2023, 17:49 (344 days ago) @ Auge
edited by Andreas Schneider, Wednesday, April 19, 2023, 17:55

Hi,

(my skill does not reach up to yours, but i do my best...)

the combination of
Hoster: IONOS Germany
DB: MariaDB 10.6
PHP: 8.1
Forum: 20220803.1

and this php script as a dummy test:

<?
$xtext="CREATE PROCEDURE Test5 () 
        NOT DETERMINISTIC 
        CONTAINS SQL 
        SQL SECURITY DEFINER 
                BEGIN 
                select count(*) from mlf2_settings; 
                END 
 
        ; "
 
?>

works to create a SP.
I put an include at the end of includes/admin.inc.php (just before th subtemplate is called)
for this

and a
$result = mysqli_query($connid, $xtext);

as next.

Using phpmyadmin shows syntax differences between php and the frontend of phpmyadmin.
Testing SQL code in phpmyadmin, you have to use the Delimiter field below of the input field.
Works, but only for testing.

Best
Andi

Some ideas regarding to database performance

by Joe I, Friday, April 28, 2023, 17:43 (335 days ago) @ Andreas Schneider

This is a great idea for testing. Let me see if I can put this all together in a single script that people can upload and run for confirmation.

RSS Feed of thread