Create partitions with more similar size

# Partitions with similar size
## Description
Right now, our partition implementation uses one partition per repository. This means partitions can differ greatly in size, which causes problems such as very long processing times for 5% of the total number of repositories (only one thread parsing a huge repo).

With this proposal, I want to normalize partition sizes by changing the way we split the data across threads.

## Proposal
Change the current partition ID from:
```
[REPOSITORY_ID]
```
To:
```
[REPOSITORY_ID]+[PACKFILE_HASH]+[OFFSET_FROM]+[OFFSET_TO]
```

## Values description
- `[REPOSITORY_ID]`: gitbase ID identifying the repository. Examples: `gitbase`, `kubernetes`, `spark`.
- `[PACKFILE_HASH]`: hash of the packfile. Example: `ccbe18d4ba50a14a28855a7b53880d93d87dc498`
- `[OFFSET_FROM]`: packfile offset of the first object in the partition. Example: `23144`
- `[OFFSET_TO]`: packfile offset of the last object in the partition. Example: `7009211`
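
As an illustration of the format, here is a minimal Python sketch that builds and parses such an ID. It assumes the `+` in the proposal is a literal separator between the four fields; `PartitionID`, `encode` and `decode` are hypothetical names, not existing gitbase code.

```python
from typing import NamedTuple


class PartitionID(NamedTuple):
    repository_id: str
    packfile_hash: str
    offset_from: int
    offset_to: int

    def encode(self) -> str:
        # Assumption: "+" is a literal separator between the four fields.
        return (
            f"{self.repository_id}+{self.packfile_hash}"
            f"+{self.offset_from}+{self.offset_to}"
        )

    @classmethod
    def decode(cls, raw: str) -> "PartitionID":
        # Split from the right so a "+" inside the repository ID is preserved.
        repo, pack, start, end = raw.rsplit("+", 3)
        return cls(repo, pack, int(start), int(end))


pid = PartitionID("gitbase", "ccbe18d4ba50a14a28855a7b53880d93d87dc498", 23144, 7009211)
print(pid.encode())
```

Splitting from the right is one way to keep parsing unambiguous, since the last three fields never contain `+`.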

## How to do the repartitioning
We need to decide how many objects go in each partition in order to generate the offsets. That object count will be a fixed constant and must not change. If it were changeable through configuration, indexes and other internal modules would stop working correctly.

Partitions will be computed by consulting the packfile indexes when a `PartitionIter` is requested. All partitions will contain the same number of objects, except the last one. This operation should be fast enough to run per query, and we can cache the partition names if necessary.
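
The splitting step could look roughly like the Python sketch below. It assumes the packfile index has already been read into a plain list of object offsets; `OBJECTS_PER_PARTITION` and `partition_ranges` are hypothetical names, and the value `4` is only there to keep the example small.

```python
# Hypothetical value: the proposal fixes one constant but does not name a number.
OBJECTS_PER_PARTITION = 4


def partition_ranges(offsets, size=OBJECTS_PER_PARTITION):
    """Return (offset_from, offset_to) pairs, each covering `size` objects.

    `offsets` are the object offsets listed in one packfile index.
    Only the last pair may cover fewer than `size` objects.
    """
    ordered = sorted(offsets)
    chunks = (ordered[i:i + size] for i in range(0, len(ordered), size))
    return [(chunk[0], chunk[-1]) for chunk in chunks]


offsets = [12, 150, 420, 980, 1300, 2048, 4096, 5000, 7000, 9100]
for offset_from, offset_to in partition_ranges(offsets):
    print(offset_from, offset_to)
```

With ten objects and four objects per partition, this yields two full partitions and a shorter final one.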

This approach remains valid when repositories are updated. Partitions are tied to the packfile hash, so their content cannot change; packfiles can only be deleted or newly created.
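
To make the update argument concrete, the following Python sketch computes partition IDs before and after a new packfile appears. The packfile names (`pack-a`, `pack-b`), the offsets, and the `partition_ids` helper are all made up for illustration, and `+` is assumed to be a literal separator.

```python
def partition_ids(repository_id, packfiles, size):
    """Build the set of partition IDs for a repository.

    `packfiles` maps a packfile hash to the object offsets in its index.
    """
    ids = set()
    for pack_hash, offsets in packfiles.items():
        ordered = sorted(offsets)
        for i in range(0, len(ordered), size):
            chunk = ordered[i:i + size]
            ids.add(f"{repository_id}+{pack_hash}+{chunk[0]}+{chunk[-1]}")
    return ids


before = partition_ids("gitbase", {"pack-a": [12, 150, 420, 980, 1300]}, size=2)

# An update adds a new packfile; the existing packfile is untouched.
after = partition_ids(
    "gitbase",
    {"pack-a": [12, 150, 420, 980, 1300], "pack-b": [12, 90]},
    size=2,
)

print(before <= after)         # existing partition IDs are all still present
print(sorted(after - before))  # only IDs for the new packfile are added
```

Because each ID embeds the packfile hash, the partitions of an unchanged packfile keep the same names across updates.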

## How to handle these partitions in each main table
### Object tables (Commits, Blobs, Trees, Tree entries)
Each partition will iterate only the objects within its range, so no object will be duplicated in the output.
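
A minimal Python sketch of that range-restricted iteration, assuming the packfile index is available as a list of `(offset, object_hash)` pairs sorted by offset. The function name and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def objects_in_partition(index, offset_from, offset_to):
    """Return the hashes of objects whose offset is in [offset_from, offset_to].

    `index` is a list of (offset, object_hash) pairs sorted by offset.
    """
    offsets = [offset for offset, _ in index]
    lo = bisect_left(offsets, offset_from)
    hi = bisect_right(offsets, offset_to)
    return [object_hash for _, object_hash in index[lo:hi]]


index = [(12, "c1"), (150, "t1"), (420, "b1"), (980, "c2"), (1300, "t2")]
ranges = [(12, 150), (420, 980), (1300, 1300)]

outputs = [objects_in_partition(index, start, end) for start, end in ranges]
print(outputs)
```

Since the ranges do not overlap, every object is emitted by exactly one partition.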

### References
A partition will output only the references that point to an object inside its range. We can use the packfile indexes to check this, which should be fast.
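
The reference filter could be sketched in Python as follows. It assumes the packfile index can be queried as a mapping from object hash to offset; `refs_in_partition`, the reference names, and the hashes are hypothetical.

```python
def refs_in_partition(refs, hash_to_offset, offset_from, offset_to):
    """Keep the references whose target object lies inside the partition range.

    `refs` maps a reference name to its target object hash.
    `hash_to_offset` maps object hashes to offsets for ONE packfile.
    """
    selected = {}
    for name, target in refs.items():
        # None means the target object is stored in a different packfile.
        offset = hash_to_offset.get(target)
        if offset is not None and offset_from <= offset <= offset_to:
            selected[name] = target
    return selected


hash_to_offset = {"c1": 12, "c2": 980, "c3": 1300}
refs = {
    "refs/heads/master": "c2",
    "refs/heads/dev": "c3",
    "refs/tags/v1": "c9",  # not in this packfile
}

print(refs_in_partition(refs, hash_to_offset, 420, 980))
```

A reference whose target is missing from this packfile's index is skipped here and would be emitted by a partition of the packfile that contains it.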

### Files
Files will appear only in the partition that contains the root tree entry. It might be necessary to fetch tree_entries from other packfile ranges, using the repository, to generate the path.

## Caveats
### Problems
- We need to check how go-git behaves when several threads use it at the same time.
### Benefits
- Partitions will be more homogeneous and will take roughly the same time to process, giving the user a good approximation of the size of the query and the remaining time.

Create partitions with more similar size #619
