Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve naming in data frame abstraction #857

Merged
merged 4 commits into from
Jun 1, 2024
Merged

Conversation

bchapuis
Copy link
Member

@bchapuis bchapuis commented May 30, 2024

This is an attempt at improving the data frame abstraction. The following diagramm describes the overall architecture.

classDiagram
    class DataColumn {
        +String name()
        +Type type()
    }
    class DataSchema {
        +String name()
        +List<DataColumn> columns()
        +DataRow createRow()
    }
    class DataRow {
        +DataSchema schema()
        +List<?> values()
        +Object get(String column)
        +Object get(int index)
        +void set(String column, Object value)
        +void set(int index, Object value)
        +DataRow with(String column, Object value)
        +DataRow with(int index, Object value)
    }
    class DataTable {
        +DataSchema schema()
        +boolean add(DataRow row)
        +void clear()
        +long size()
        +Iterator<DataRow> iterator()
    }
    class DataStore {
        +List<String> list()
        +DataTable get(String name)
        +void add(DataTable table)
        +void add(String name, DataTable table)
        +void remove(String name)
    }
    DataStore --> DataTable : has
    DataTable --> DataRow : has
    DataTable --> DataSchema : follows
    DataRow --> DataSchema : follows
    DataSchema --> DataColumn : has
Loading

@bchapuis
Copy link
Member Author

bchapuis commented May 30, 2024

@sebr72 @Drabble I'm trying to refactor the org.apache.baremaps.data.schema package. Naming these classes is quite hard and I also struggle with the package name.

I think the new naming convention (see diagram) is a bit less confusing. What do you think?

Regarding the module and package names, I hesitate between many alternatives dataframe, datastore, storage, etc. The difficulty is that the abstraction can be used for in-memory, off-heap, and on-disk collections/dataframes. It is also use to access many files (geopackage, geoparquet, shapefile, etc.) in a uniform way.

Here are a comple of alternatives (module / package):

  • baremaps-data / org.apache.baremaps.data.store
  • baremaps-data / org.apache.baremaps.data.frame
  • baremaps-storage / org.apache.baremaps.storage.store
  • baremaps-storage / org.apache.baremaps.storage.frame
  • baremaps-storage / org.apache.baremaps.storage.datastore

Any idea or suggestion?

@sebr72
Copy link
Contributor

sebr72 commented May 30, 2024

@bchapuis : Following our conversation, I would suggest we keep DataTable rather than Frame.
As well as ...data.schema becomes ...data.storage

- The DataFrame class becomes the DataTable class
- The data.schema package becomes the data.storage package
- The method and variable names in the unit tests have been improved
@@ -22,22 +22,22 @@
import org.apache.baremaps.testing.TestFiles;
import org.junit.jupiter.api.Test;

class GeoPackageDataSchemaTest {
class GeoPackageDataStoreTest {

Check notice

Code scanning / CodeQL

Unused classes and interfaces Note test

Unused class: GeoPackageDataStoreTest is not referenced within this codebase. If not used as an external API it should be removed.
@@ -22,22 +22,22 @@
import org.apache.baremaps.testing.TestFiles;
import org.junit.jupiter.api.Test;

class GeoParquetDataSchemaTest {
class GeoParquetDataStoreTest {

Check notice

Code scanning / CodeQL

Unused classes and interfaces Note test

Unused class: GeoParquetDataStoreTest is not referenced within this codebase. If not used as an external API it should be removed.
@bchapuis bchapuis marked this pull request as ready for review May 30, 2024 18:22
@bchapuis bchapuis requested a review from Drabble May 30, 2024 18:22
@bchapuis
Copy link
Member Author

@sebr72 I can't add you as a reviewer for now, but let me know if you think this refactoring corresponds to the changes we discussed yesterday.

Copy link
Contributor

@sebr72 sebr72 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good.
One comment to fix

import java.util.List;

/**
* A {@link DataSchema} is a collection of {@link DataTable}s.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataSchema is DataStore in comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link

sonarqubecloud bot commented Jun 1, 2024

Quality Gate Passed Quality Gate passed

Issues
14 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@bchapuis bchapuis merged commit ee7aed7 into main Jun 1, 2024
9 checks passed
@bchapuis bchapuis deleted the 849-refactoring-datastore branch June 1, 2024 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants