Quad Database

Jussi Saarivirta edited this page Aug 1, 2021 · 8 revisions

Overview

To match stars in an image to catalog stars, we need a database of catalog stars to compare against. More specifically, we're not interested in individual stars - instead we're interested in the geometric shapes the stars form. Four stars form a shape that is unique enough for our purposes; such a shape is called a quad. The database is built using the Gaia2 Quad Database Creator.

The most obvious step in solving is to compare the quads made of the image's stars with quads made of catalog stars with known positions. In order to speed up the calculations, we precalculate the catalog star quads in advance and store them in a database. The pros of this approach are:

  • Fewer and simpler calculations required during the solving process
  • More compact size of the database

The cons are:

  • You need to figure out star/quad densities at the time of building the database
  • Multiple passes need to be built with different densities to cover both large and small telescope FOVs and short/long exposures

However, the pros outweigh the cons here, because the performance boost is significant.

Additionally, we want to try solving in many locations simultaneously to speed up the process, especially with blind solves. That is why the quad database is split into multiple segments, each covering roughly the same amount of surface area. This also helps us run tasks in parallel when building and using the database. The chosen approach was to use the 406 equal-area cell division described in a paper by Zinovy Malkin.

So in the end we have 406 files for the database, one per cell, each covering a roughly 10x10 degree patch of the sky.
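As a quick sanity check of the "roughly 10x10 degree" figure, the area of the whole sky can be divided by the cell count. This is only arithmetic on the numbers above, not part of the database code; the variable names are made up for the illustration.

```python
import math

# Full sphere area in square degrees: 4*pi steradians converted to deg^2.
SKY_AREA_SQ_DEG = 4 * math.pi * (180 / math.pi) ** 2

CELL_COUNT = 406  # number of equal-area cells, one database file each

cell_area = SKY_AREA_SQ_DEG / CELL_COUNT
cell_side = math.sqrt(cell_area)  # side of a square with the same area

print(f"{cell_area:.1f} sq deg per cell, about {cell_side:.1f} deg per side")
```

Each cell comes out at a little over 100 square degrees, which is where the 10x10 degree approximation comes from.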

Passes

Because the images we're trying to solve may have short or long exposures and a large or small FOV (Field of View), we need to cover multiple different quad/star densities. Longer exposures show more stars, so the quads formed from the image are more numerous and include fainter stars. Short, wide-field exposures have fewer stars in them. If the star/quad density of the database is not close enough to that of the image, chances are that we will not form matching quads. That's why we need multiple passes.

Depending on the target parameters, the database can be formed with as many passes as are wanted, with the practical limits being the source material (star catalog) and disk space.

Subdivisions

As our FOV gets smaller and smaller, we want to test the image quads against fewer and fewer catalog quads. Our roughly 10x10 degree patches contain a lot of quads and we'd rather not go through all of them, so to keep things fast the cells are subdivided into sub cells inside each file. Each pass in each file has its own set of sub cells - how many of them depends on the quad density, i.e. the number of quads in the pass. The more quads, the more subdivisions. Quads are grouped by sub cell, so we can exclude all the quads in a sub cell if the sub cell itself is not inside the given search radius.
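The exclusion idea can be sketched as follows. This is a minimal illustration, not the actual implementation: the SubCell type, the function names and the per-sub-cell radius are all assumptions made for the example (the file itself only stores each sub cell's center RA, Dec and data length).

```python
import math
from typing import List, NamedTuple


class SubCell(NamedTuple):
    """Hypothetical stand-in for a sub cell entry."""
    ra: float      # center right ascension, degrees
    dec: float     # center declination, degrees
    radius: float  # assumed angular radius covering the sub cell, degrees


def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def sub_cells_in_range(sub_cells: List[SubCell], ra: float, dec: float,
                       search_radius: float) -> List[SubCell]:
    """Keep only the sub cells that can overlap the search circle."""
    return [c for c in sub_cells
            if angular_separation(ra, dec, c.ra, c.dec) <= search_radius + c.radius]


cells = [SubCell(10.0, 10.0, 1.0), SubCell(50.0, -20.0, 1.0)]
nearby = sub_cells_in_range(cells, ra=10.5, dec=10.2, search_radius=2.0)
print(nearby)  # only the first sub cell survives the filter
```

Only the quads belonging to the surviving sub cells would then need to be read and compared.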

The file format

The quad database format has gone through a few iterations and optimizations to reach its current form. It's a binary format, with the data streamlined and packed tightly to save disk space, as the database can get quite large when more high-density passes and stars are included in it. The file starts with a header, followed by the data for each pass and each sub cell, stored as a continuous blob of bytes.

The files are usually named after the source (Gaia2), the band and cell id, the pass count and the lowest quad density the file contains (e.g. gaia2-b01c02-11-20.qdb), but the filename has no actual significance.

The file contents are organized as follows. The header first contains the per-file information, then a set of per-pass information, inside which there is the per-sub cell information. The data then follows in the same order: per pass, per sub cell.

FileID

All the database files must start with WATNEYQDB. If that is not present, the file is considered invalid.
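A minimal sketch of such a validity check is shown below. It assumes the FileID is stored as plain ASCII bytes at the very beginning of the file; the function name is hypothetical and not taken from the project's code.

```python
import io

FILE_ID = b"WATNEYQDB"  # assumed to be stored as plain ASCII bytes


def has_valid_file_id(stream) -> bool:
    """Return True if the binary stream begins with the expected FileID."""
    return stream.read(len(FILE_ID)) == FILE_ID


# Usage with in-memory data; a real file would be opened with open(path, "rb").
print(has_valid_file_id(io.BytesIO(b"WATNEYQDB" + b"\x01rest of file")))  # True
print(has_valid_file_id(io.BytesIO(b"NOTAQDBFILE")))                       # False
```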

Version

The file format version. If the structure of the database file changes, the format version changes as well, and the database code knows which format versions it supports. This guarantees we're not trying to read an unsupported file format.

Human readable header

This is a human readable JSON string that briefly describes the file's contents. Its only purpose is to help identify the file; it is not used in any way by the solver/quad database code.

Null byte

This marks the end of the human readable header.

Band

The index of the band of this cell. Bands are the roughly 10 degree tall latitude bands defined in the SkySegmentSphere class. The band index is effectively the Y-coordinate in the 2D grid of cells.

Cell

The index of the cell inside the band. The cell index is effectively the X-coordinate in the 2D grid of cells.

Pass count

The number of quad passes this file contains.

Density

The quad density of the pass, in quads per square degree.

Subdivs

How many subdivisions the pass has (N x N, N == Subdivs)

Subcell count

How many sub cells the pass has, total (N x N == Subcell count)

Subcell RA, Dec and data length

The sub cell's center point RA, Dec coordinates and the data length of the quad data in bytes (number of quads * size of single quad as bytes)

Quad data bytes

Each quad consists of the five ratios, the longest distance and the RA, Dec coordinates of the calculated center of the quad. The ratios are stored as ushort to save space: the fractional numbers 0..1 fit into two bytes with enough accuracy. A ushort can store numbers from 0 to 65535, which is sufficient, since the largest packed value (for a ratio of 1) is 50 000. For example, consider the number 0.78225627:

  • Multiply by 100 000 => 78 225.627
  • Divide by 2 => 39 112.8135
  • Strip digits => 39 112 => we can now store it as unsigned short

To unpack, do it in reverse:

  • Multiply by 2 => 78 224
  • Divide by 100 000 => 0.78224

We do lose some accuracy, but not enough to matter. The extra digits make no difference when we're trying to find matches with a deviation of about 1%, so we leave them out to save space.
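The packing and unpacking steps above can be sketched as a pair of small functions. The function names are made up for this illustration and are not the names used in the database code.

```python
def pack_ratio(ratio: float) -> int:
    """Pack a ratio in 0..1 into an integer that fits an unsigned 16-bit value."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be within 0..1")
    packed = int(ratio * 100_000 / 2)  # int() strips the fractional digits
    assert packed <= 0xFFFF            # at most 50 000, well inside ushort range
    return packed


def unpack_ratio(packed: int) -> float:
    """Reverse of pack_ratio; the result is off by less than 0.00002."""
    return packed * 2 / 100_000


packed = pack_ratio(0.78225627)
print(packed)                # 39112
print(unpack_ratio(packed))  # 0.78224
```

Because the fractional digits are stripped after halving, the round trip loses less than 0.00002 of the original ratio, far below the roughly 1% tolerance used when matching.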

Usage in code

The implementation lives in the CompactQuadDatabase class. Note that the class implements the IDisposable interface. The class has disposable members, and some resources stay reserved until the instance is disposed - namely the open streams to the quad database files. To reduce the IO overhead of opening and closing files, pools of FileStreams are kept open so that they can be reused rather than opening a new stream every time a cell file needs to be accessed. If there is no stream to reuse, a new stream is opened and added to the pool. When the CompactQuadDatabase is disposed, all the pooled FileStreams are closed and disposed. If new CompactQuadDatabase instances keep being created without ever being disposed, they will slowly eat up open file handles. The right thing to do is therefore to treat the CompactQuadDatabase as a singleton, or to scope it to the duration of the solve or solves.
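The pooling and cleanup idea can be illustrated with a small sketch. This is not the CompactQuadDatabase code (which is C#); FileStreamPool and its methods are hypothetical names, and a Python context manager stands in for the IDisposable/using pattern.

```python
import os
import tempfile
from collections import defaultdict


class FileStreamPool:
    """Illustrative pool of open file handles, keyed by path."""

    def __init__(self):
        self._idle = defaultdict(list)  # path -> handles ready for reuse
        self._all = []                  # every handle opened, for cleanup

    def acquire(self, path):
        """Reuse an idle handle for the path, or open a new one."""
        if self._idle[path]:
            return self._idle[path].pop()
        handle = open(path, "rb")
        self._all.append(handle)
        return handle

    def release(self, path, handle):
        """Return a handle to the pool once the caller is done with it."""
        handle.seek(0)
        self._idle[path].append(handle)

    def close(self):
        """Close every pooled handle; the counterpart of Dispose()."""
        for handle in self._all:
            handle.close()
        self._all.clear()
        self._idle.clear()

    # Context manager support plays the role of a C# `using` block.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


# Usage: the pool lives for the duration of the "solve", then frees its handles.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cell.qdb")
    with open(path, "wb") as f:
        f.write(b"WATNEYQDB")
    with FileStreamPool() as pool:
        first = pool.acquire(path)
        pool.release(path, first)
        second = pool.acquire(path)  # the idle handle is reused
        reused = second is first
    closed_after_exit = first.closed
print(reused, closed_after_exit)  # True True
```

Scoping the pool to a single block like this mirrors the advice above: one long-lived instance per solve (or set of solves), disposed exactly once at the end.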