Skip to content

Optimize plan for filecache search performance especially on Chinese text search #26293

Open
@jgzh

Description

As my nextcloud instance filecache record raise up to 2 millions rows, the search performance comes slow. By read about the source code, I found 2 reasons:

  1. lib/private/Files/View.php:searchCommon function will call multi-times Cache search function (at lib/private/Files/Cache/Cache.php) using the same pattern (e.g. user has some share mount points). In this situation, the difference between each query SQL is storage ID as every substorage has it's own storage ID in filecache table. This situation can be optimized by introducing a caching mechanism (e.g. using Redis in this article).

  2. lib/private/Files/Cache/Cache.php:search function always executes a database query vie LIKE %%, which means file search cannot use any indexing mechanism. This situation can be optimized by use different matching methods according to different patterns (e.g. using MATCH...AGAINST match MySQL FULLTEXT index in this article).

Here is the code I've tested on version 20.0.8 with add FULLTEXT index to the name column of filecache table.
lib/private/Files/Cache/Cache.php:search:

public function search($pattern) {
	// normalize pattern
	$pattern = $this->normalize($pattern);

	if ($pattern === '%%') {
		return [];
	}

	/**
	 * 3/25/2021 - Add fulltext search support for Chinese, and caching query results by using Redis
	 * Pre-works:
	 *   1. Add FULLTEXT index to the `name` column of `filecache` table.
	 *   2. Modify *PREFIX* for `filecache` table & set up your Redis server IP, port below.
	 * Steps:
	 *   1. Remove % in $pattern.
	 *   2. Initialize Redis server.
	 *   3. Check if $pattern has been queried recently (find raw-results in Redis which contain all storage ID).
	 *   4-1. Take out raw-results stored in Redis if it's true.
	 *   4-2. Perform database query if not.
	 *   4-2-1. Using MATCH...AGAINST if $pattern is Chinese, else using LOCATE instead of LIKE %%.
	 *   4-2-2. Save raw-results to Redis only for a while (60s is enough in usual).
	 *   5. Filter storage ID separately with out done by database query.
	 **/ 
	$pattern = ltrim(rtrim($pattern,"%"),"%");
	$_redis_keyname = 'fastsearch-'.md5($pattern);
	$_redis_logfile = fopen('/tmp/nextcloud-fastsearch-redis.log', 'a');
	if(class_exists('\Redis')){
		//fwrite($_redis_logfile, "\nCreate Redis instance...\n");
		$_redis = new \Redis();
		if(!$_redis->connect('127.0.0.1', '6379')) {
			unset($_redis);
			fwrite($_redis_logfile, "\nFailed connect to Redis...\n");
		};
	} else {
		fwrite($_redis_logfile, "\nRedis class not exists.\n");
	}
	// Check if $pattern has been queried recently.
	if(isset($_redis)) {
		// Get raw-results if Redis cache exists.
		if ($_redis->exists($_redis_keyname)) {
			$result_files = unserialize($_redis->get($_redis_keyname));
			//fwrite($_redis_logfile, "\nGot key:".$_redis_keyname."\n");
		} else {
			//fwrite($_redis_logfile, "\nNot exists key:".$_redis_keyname."\n");
		}
	}
	fclose($_redis_logfile);
	// Perform database query if not.
	if(!isset($result_files)) {
		$sql = 'SELECT `filecache`.`fileid`, `storage`, `path`, `path_hash`, `filecache`.`parent`, '.
				'`name`, `mimetype`, `mimepart`, `size`, `mtime`, `storage_mtime`, `encrypted`, '.
				'`etag`, `permissions`, `checksum`, `metadata_etag`, `creation_time`, `upload_time` '.
			'FROM `oc_filecache` `filecache` '.
			'IGNORE INDEX(`fs_storage_path_hash`, `fs_parent_name_hash`, `fs_storage_mimetype`, `fs_storage_mimepart`, `fs_storage_size`, `fs_mtime`) '.
			'LEFT JOIN `oc_filecache_extended` `fe` '.
			'ON `filecache`.`fileid` = `fe`.`fileid` ';
		if (preg_match('/^[\x{4e00}-\x{9fa5}]+$/u', $pattern)) {
			// Using MATCH...AGAINST if $pattern is Chinese.
			$sql = 'SELECT * FROM '.
				'(' . $sql . ' WHERE MATCH(`name`) AGAINST (?)) `t` '.
				'WHERE LOCATE(?, `t`.`name`);';
			$params = [$pattern, $pattern];
		} else {
			// Else using LOCATE instead of LIKE %%.
			$sql = $sql . 'WHERE LOCATE(?, `name`);';
			$params = [$pattern];
		}
		$result = $this->connection->executeQuery($sql, $params);
		$result_files = $result->fetchAll();
		$result->closeCursor();
		// Save raw-results to Redis only for a while (60s is enough in usual).
		if (isset($_redis)) {
			$_redis->set($_redis_keyname, serialize($result_files));
			$_redis->expire($_redis_keyname, 60);
		}
	}
	// Filter storage ID separately.
	$files = array_filter($result_files, function($file) {
		return ((int)$file['storage']===$this->getNumericStorageId());
	});

	if (isset($_redis)) {
		$_redis->close();
		unset($_redis);
	}

	return array_map(function (array $data) {
		return self::cacheEntryFromData($data, $this->mimetypeLoader);
	}, $files);
}

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions