Description
Thanks for the great work on the Spring Data JPA project! There is one potential important improvement. Currently, the Spring Data JPA generates queries like:
SELECT COUNT(*) > 0 FROM my_table WHERE some_column = ?
for repository methods like:
boolean existsBySomeColumn(String value);
While this works just fine, a more optimal SQL pattern exists and is supported by all major relational databases: using SELECT EXISTS (...)
. This allows the database engine to short-circuit and return as soon as a matching row is found, potentially improving performance, especially on very large datasets. (think about tables with rows magnitude of 10^x).
Another reason: many DB engines skip full index/table scans for EXISTS
, but must scan all matching rows for COUNT(*)
. This can be proved using EXPLAIN ANALYZE
or EXPLAIN
for both queries.
Main proposal
Instead of generating:
SELECT COUNT(*) > 0 FROM my_table WHERE some_column = ?
Generate, using universal JPQL as usual (no need of nativeQuery):
SELECT EXISTS (SELECT 1 FROM my_table WHERE some_column = ?)
Compatibility
The SELECT EXISTS (...)
form is ANSI SQL-compliant and has been supported by virtually all relational databases for decades. There is no risk of incompatibility. MySQL, Postgres, SQLite, SQLServer, Oracle and more.
Quick Example
Repository:
public interface UserRepository extends JpaRepository<User, Long> {
boolean existsByEmail(String email);
}
Generated SQL (suggested):
SELECT EXISTS (SELECT 1 FROM users WHERE email = ?)
Backwards Compatibility Concerns ?
If there are edge cases or dialect-specific concerns, or backwards compatibility (opting-out), perhaps this could be controllable via a Spring Data JPA property (in .yml or .properties). For example:
spring.data.jpa.repository.exists-query-strategy=exists-subquery