Avatar

Profile IN() statement in queries (General)

by Micha ⌂, Saturday, March 25, 2017, 11:17 (2586 days ago) @ danielb987
edited by Auge, Monday, March 27, 2017, 06:50

Hi,

the profiling tool works fine but I have to modify the profile_mysqli.php to use it on a windows system (WAMPP).

$file = str_replace($forum_folder, "", $file);
$file = str_replace("\\", "/", $file);

Okay, I take a look to that queries and it seems, that all (or most of them) queries contains an IN() statement.

category IN (0, 4, 5, 6, 7, 8, 9, 25, 14, 23, 19, 22, 24, 10, 21, 11, 12, 13, 15, 16, 26, 18, 20, 27)
 

I made some enquiries how to improve the IN() statement. Someones said, the IN() query is faster than an OR but slower than an INNER JOIN statement.

A query like:

SELECT id, tid FROM mlf2_entries 
WHERE 
pid = 0 AND 
spam = 0 AND 
category IN (0, 4, 5, 6, 7, 8, 9, 25, 14, 23, 19, 22, 24, 10, 21, 11, 12, 13, 15, 16, 26, 18, 20, 27) 
ORDER BY sticky DESC, TIME DESC LIMIT 0, 40

can rewritten to

SELECT `id`, `tid` FROM `mlf2_entries` 
INNER JOIN (
SELECT 0 AS `cat_id` 
UNION ALL SELECT 4
UNION ALL SELECT 5 
UNION ALL SELECT 6
UNION ALL SELECT 7
UNION ALL SELECT 8
UNION ALL SELECT 9 
UNION ALL SELECT 25
UNION ALL SELECT 14
UNION ALL SELECT 23 
UNION ALL SELECT 19 
UNION ALL SELECT 22
UNION ALL SELECT 24 
UNION ALL SELECT 10 
UNION ALL SELECT 21
UNION ALL SELECT 11 
UNION ALL SELECT 12
UNION ALL SELECT 13
UNION ALL SELECT 15
UNION ALL SELECT 16
UNION ALL SELECT 26
UNION ALL SELECT 18
UNION ALL SELECT 20
UNION ALL SELECT 27 ) AS `tmp_table` 
ON `category` = `cat_id` 
WHERE 
`pid` = 0 AND 
`spam` = 0 
ORDER BY sticky DESC, TIME DESC LIMIT 0, 40

and corresponding PHP-code index.inc.php line ~69ff

 $sqlQuery = "SELECT `id`, `tid` FROM `".$db_settings['forum_table']."` INNER JOIN (SELECT ".$category_ids[0]." AS `cat_id`";
 for ($c=1; $c<count($category_ids); $c++) {
  $sqlQuery .= " UNION ALL SELECT ".$category_ids[$c];
 }
 $sqlQuery .= ") AS `tmp_table` ON `category` = `cat_id` WHERE `pid` = 0 AND `spam` = 0 ORDER BY sticky DESC, time DESC LIMIT 0, 40";
 $result=profile_mysqli_query($connid, $sqlQuery);
 $result=profile_mysqli_query($connid, "SELECT id, tid FROM ".$db_settings['forum_table']." WHERE pid = 0".$display_spam_query_and." AND category IN (".$category_ids_query.") ORDER BY sticky DESC, ".$db_thread_order." ".$descasc." LIMIT ".$ul.", ".$settings['threads_per_page']) or raise_error('database_error',mysqli_error($connid));

I test it and it seems(!) that the INNER JOIN is a little bit faster that the query that uses an IN() condition. Can someone confirm this behaviour?


Furthermore, I have take a look to the get_category_ids($categories) function. I think, we can improve it by

/**
 * returns all available catgory ids
 *
 * @return array
 */
function get_category_ids($categories) {
 if(!$categories)
  return false;
 
 return array_keys($categories);
}


We can also improve the get_categories() function for forums without restricted categories because, if the IN() statement contains ALL available categories, we don't need to restrict the query. One suggestion:

/**
 * returns all available categories
 *
 * @return array
 */
function get_categories($compact = false) {
 global $settings, $connid, $db_settings;
 $count_result = profile_mysqli_query($connid, "SELECT COUNT(*) FROM ".$db_settings['category_table']);
 list($category_count) = mysqli_fetch_row($count_result);
 mysqli_free_result($count_result);
 
 if($category_count) {
  if (empty($_SESSION[$settings['session_prefix'].'user_id']))
   $result = profile_mysqli_query($connid, "SELECT id, category FROM ".$db_settings['category_table']." WHERE accession = 0 ORDER BY order_id ASC");
  elseif (isset($_SESSION[$settings['session_prefix'].'user_id']) && isset($_SESSION[$settings['session_prefix'].'user_type']) && $_SESSION[$settings['session_prefix'].'user_type'] == 0)
   $result = profile_mysqli_query($connid, "SELECT id, category FROM ".$db_settings['category_table']." WHERE accession IN (0, 1) ORDER BY order_id ASC");
  elseif (isset($_SESSION[$settings['session_prefix'].'user_id']) && isset($_SESSION[$settings['session_prefix'].'user_type']) && ($_SESSION[$settings['session_prefix'].'user_type'] == 1 || $_SESSION[$settings['session_prefix'].'user_type'] == 2))
   $result = profile_mysqli_query($connid, "SELECT id, category FROM ".$db_settings['category_table']." WHERE accession IN (0, 1, 2) ORDER BY order_id ASC");
 
  if(!$result) 
   raise_error('database_error',mysqli_error($connid));
 
  if ($compact && mysqli_num_rows($result) == $category_count)
   return false;
 
  $categories[0]='';
  while ($line = mysqli_fetch_array($result)) {
   $categories[$line['id']] = htmlspecialchars($line['category']);
  }
  mysqli_free_result($result);
  return $categories;
 }
 return false;
}

If $compact == true and all categorises are selected, the function returns false. In index.inc.php line ~56 this value is evaluated. I only found a single call to this function in main.inc.php line 121 and set it to $categories = get_categories(true);. The improvement was significant.

Duo to the fact that IN() is faster than OR, I replaced the OR statements by an equivalent IN, e.g. accession IN (0, 1, 2). The columns order_id and accession should be indexed too.


regards
Micha

--
applied-geodesy.org - OpenSource Least-Squares Adjustment Software for Geodetic Sciences


Complete thread:

 RSS Feed of thread