Skip to content

500 Error using _lastUpdated parameter to search count #3021

Open

Description

Describe the bug
FHIR sever with large size database has poor performance and returned 500 Error when searching count with _lastUpdated parameter. 500 Error can stably be reproduced on FHIR server with 1tb database and sometimes happened on 500GB.
image
FHIR Version?
Stu3/R4/R5

Data provider?
SQL Server

To Reproduce
Steps to reproduce the behavior:

Get https://<name>.fhir.azurehealthcareapis.com?_summary=count&_lastUpdated=lt2022-12-05T02:50:36&_lastUpdated=ge2022-01-01T11:02:05

on FHIR server with large sql database.

Expected behavior
Return resource count.

Actual behavior
500 internal error.

My Debug Process

After debugging locally, I find the original request will be translated and rewrite to a SQL query:

SELECT COUNT_BIG(*)
      FROM dbo.Resource r
      WHERE ResourceSurrogateId < @p0
          AND ResourceSurrogateId >= @p1
          AND ResourceTypeId IN (@p2,@p3,@p4,@p5,@p6,@p7,@p8,@p9,@p10,@p11,@p12,@p13,@p14,@p15,@p16,@p17,@p18,@p19,@p20,@p21,@p22,@p23,@p24,@p25,@p26,@p27,@p28,@p29,@p30,@p31,@p32,@p33,@p34,@p35,@p36,@p37,@p38,@p39,@p40,@p41,@p42,@p43,@p44,@p45,@p46,@p47,@p48,@p49,@p50,@p51,@p52,@p53,@p54,@p55,@p56,@p57,@p58,@p59,@p60,@p61,@p62,@p63,@p64,@p65,@p66,@p67,@p68,@p69,@p70,@p71,@p72,@p73,@p74,@p75,@p76,@p77,@p78,@p79,@p80,@p81,@p82,@p83,@p84,@p85,@p86,@p87,@p88,@p89,@p90,@p91,@p92,@p93,@p94,@p95,@p96,@p97,@p98,@p99,@p100,@p101,@p102,@p103,@p104,@p105,@p106,@p107,@p108,@p109,@p110,@p111,@p112,@p113,@p114,@p115,@p116,@p117,@p118,@p119,@p120,@p121,@p122,@p123,@p124,@p125,@p126,@p127,@p128,@p129,@p130,@p131,@p132,@p133,@p134,@p135,@p136,@p137,@p138,@p139,@p140,@p141,@p142,@p143,@p144,@p145,@p146,@p147)

          AND IsHistory = 0
          AND IsDeleted = 0

The query will add all resource types and there is an inline comment in code :
fhir-server/PartitionEliminationRewriter.cs at main · microsoft/fhir-server (github.com)
image
It seems make sense since resourceTypeId is partition key in SQL database.
So I did some experiments in SQL server and explore the reason why this "better query plan" still has poor performance which led to 500 Error.

My Experiment
I created a 4T FHIR SQL database using OSS project and compared different queries:

  1. Filter by ResourceSurrogateId and all ResourceTypeId (This is the query plan used by FHIR server. )
    Search result: 2427786740
    Time: 6min49s

image

2. Only filter by ResourceSurrogateId . Search result:2427786740 Time:1min10s

image

  1. Filter by ResourceSurrogateId with one ResourceTypeID
    Search result: 1166112668(half of the previous)
    Time: 9.3s

image

From this set of experiments, we find that:

  1. Query with one resource type has good performance since resource type is partition key.
  2. Enumerate all resource type has unacceptable performance, although query on each resource type is quick, the overall time is much longer than query without resource type. I guess much overhead is used for parallelization (inner join and aggregate).

According to the experiment, it verified that it's not a good idea to simply enumerate all resource types in the sql query.

I also try to analysis it in theory:

For Resource table, it creates three un-clustered indexes to improve query performance:
IX_Resource_ResourceTypeId_ResourceId
IX_Resource_ResourceTypeId_ResourceId_Version
IX_Resource_ResourceTypeId_ResourceSurrgateId

All of them use ResourceTypeId as the first index key and these indexes will only be used when query parameters include resource type. To use these indexes, it's reasonable to add resource type in SQL query. But when all resource types are added, all of the improvement of indexes will be offset by poor performance of inner join and aggregate.

At last, I try to build a new index on ResourceSurrogateId, and using this index, the request search count will improve 5-6 times than original search method.

Adding a new index will change the schema of SQL database and may not an only way. I hope my experiment and attempts results can help the team solve this problem and improve the search performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions