Skip to content
This repository has been archived by the owner on Mar 21, 2024. It is now read-only.

Indexing CSVs #28

Merged
merged 15 commits into from
Jun 16, 2021
Merged

Indexing CSVs #28

merged 15 commits into from
Jun 16, 2021

Conversation

gmourier
Copy link
Member

@gmourier gmourier commented Apr 9, 2021

The goal of this PR is to specify CSV file indexing feature.

@gmourier gmourier added the Draft Feature specification is in draft state. Summary and Motivation parts need to be written. label Apr 9, 2021
@gmourier gmourier added In Progress Feature specification is in elaboration. Important changes can still occurs in the specification. and removed Draft Feature specification is in draft state. Summary and Motivation parts need to be written. labels Apr 9, 2021
@MarinPostma
Copy link
Contributor

Should this be a joint spec with json line? Support already exist, and just needs to be exposed

@gmourier
Copy link
Member Author

gmourier commented Apr 9, 2021

I personally advocate for a separation of two distinct specs as we can use the merged pull requests to reflect what the engine is capable of as a single source of truth (like a spec dictionary?). Nonetheless, we need to take a decision about this statement. Do we use it as a task tracker or do we use it as a support for mapping the whole Meilisearch capabilities at a given point in time? I've added a commit on the last specification guide proposal to elaborate a specification lifecycle (github automation can't do all this steps, so it will need a lot of manual work I think).

What do you mean by saying "expose"? Thinking about this one, I'm wondering if we need to expose it as an SDK's method to pass CSV file through http. It would be nice to have for sure. Especially to propose an interface for the future dashboard or DX but maybe we can just use it as a discovery tool to measure usage and payload size modifications on the CLI level. It still permits for MeiliSearch to iterate in the future.

@MarinPostma
Copy link
Contributor

What I mean by expose is that this feature is already implemented and supported by the engine, and is disabled for use by the user.

I get your point about isolation of specs, I'm just wondering if that since this is already implemented, and is basically the same thing, we should maybe ship the two at once, and avoid unecessary overhead.

As for the sdk, what they decide to implement usually follows what MeiliSearch exposes in it's api.

text/csv-indexation.md Outdated Show resolved Hide resolved
text/csv-indexation.md Outdated Show resolved Hide resolved
text/csv-indexation.md Outdated Show resolved Hide resolved
text/csv-indexation.md Outdated Show resolved Hide resolved
text/csv-indexation.md Outdated Show resolved Hide resolved
@gmourier gmourier marked this pull request as ready for review April 13, 2021 16:20
@gmourier gmourier added Ready For Review Feature specification must be reviewed. and removed In Progress Feature specification is in elaboration. Important changes can still occurs in the specification. labels Apr 13, 2021
@gmourier
Copy link
Member Author

Just to follow up with you @curquiza, I will update the tech part as soon as I've updated the PR related to the change in spec definition.

@gmourier gmourier requested a review from curquiza April 13, 2021 16:26
@gmourier
Copy link
Member Author

As mentioned on meilisearch/transplant#142, tech part needs to be updated to mention that.

@gmourier gmourier added In Progress Feature specification is in elaboration. Important changes can still occurs in the specification. and removed Ready For Review Feature specification must be reviewed. labels Apr 14, 2021
@gmourier gmourier added Ready For Review Feature specification must be reviewed. and removed In Progress Feature specification is in elaboration. Important changes can still occurs in the specification. labels Apr 14, 2021
@gmourier
Copy link
Member Author

As mentioned in meilisearch/transplant#142 it will be a candidate for v0.22

@Kerollmops Kerollmops changed the title CSV Indexation Indexing CSVs Apr 14, 2021
@gmourier gmourier changed the title Indexing CSVs CSV Support Apr 15, 2021
Copy link
Member

@curquiza curquiza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bout

text/0028-csv-indexation.md Outdated Show resolved Hide resolved
text/0028-csv-indexation.md Outdated Show resolved Hide resolved
@gmourier gmourier changed the title CSV Support Indexing CSVs Apr 16, 2021
gmourier and others added 2 commits May 25, 2021 16:00
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
bors bot added a commit to meilisearch/milli that referenced this pull request May 31, 2021
184: Transfer numbers and strings facets into the appropriate facet databases r=Kerollmops a=Kerollmops

This pull request is related to #152 and changes the layout of the facets values, numbers and strings are now in dedicated databases and the user no more needs to define the type of the fields. No more conversion between the two types is done, numbers (floats and integers converted to f64) go to the facet float database and strings go to the strings facet database.

There is one related issue that I found regarding CSVs, the values in a CSV are always considered to be strings, [meilisearch/specifications#28](https://github.com/meilisearch/specifications/blob/d916b57d748628c3b45500e9dfa46b50dfa2ad3f/text/0028-indexing-csv.md) fixes this issue by allowing the user to define the fields types using `:` in the "CSV Formatting Rules" section.

All previous tests on facets have been modified to pass again and I have also done hand-driven tests with the 115m songs dataset. Everything seems to be good!

Fixes #192.

Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: Kerollmops <clement@meilisearch.com>
Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you :)

text/0028-indexing-csv.md Outdated Show resolved Hide resolved
text/0028-indexing-csv.md Outdated Show resolved Hide resolved
text/0028-indexing-csv.md Outdated Show resolved Hide resolved
text/0028-indexing-csv.md Outdated Show resolved Hide resolved
text/0028-indexing-csv.md Outdated Show resolved Hide resolved
text/0028-indexing-csv.md Outdated Show resolved Hide resolved
text/0028-indexing-csv.md Outdated Show resolved Hide resolved
Co-authored-by: Clément Renault <clement@meilisearch.com>
@gmourier gmourier merged commit 91d41f6 into main Jun 16, 2021
@gmourier gmourier deleted the csv-support branch June 16, 2021 13:27
@gmourier gmourier added Ready To Be Implemented Feature specification is ready to be implemented. Missing Tracking-Issue and removed Ready For Review Feature specification must be reviewed. Ready To Be Implemented Feature specification is ready to be implemented. labels Jun 16, 2021
gmourier added a commit that referenced this pull request Aug 17, 2021
* Initiate csv indexation support specification

* update spec file name to match pull request id

* update csv indexation spec

* fix code examples and typos

* fix typos

* update spec name

* Update header part to match MeiliSearch Tracking-Issues

* update spec from the equivalent ndjson spec reviews

* update --data sample examples

* add information about giving application/json content-type or not for a json payload

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

* Change curl --data param to --binary-data in examples

Co-authored-by: Clément Renault <clement@meilisearch.com>

* updates error codes

* moved behavior about missing content-type in explanation part

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

Co-authored-by: Clément Renault <clement@meilisearch.com>
gmourier added a commit that referenced this pull request Aug 17, 2021
* Initiate csv indexation support specification

* update spec file name to match pull request id

* update csv indexation spec

* fix code examples and typos

* fix typos

* update spec name

* Update header part to match MeiliSearch Tracking-Issues

* update spec from the equivalent ndjson spec reviews

* update --data sample examples

* add information about giving application/json content-type or not for a json payload

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

* Change curl --data param to --binary-data in examples

Co-authored-by: Clément Renault <clement@meilisearch.com>

* updates error codes

* moved behavior about missing content-type in explanation part

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

Co-authored-by: Clément Renault <clement@meilisearch.com>
gmourier added a commit that referenced this pull request Sep 13, 2021
* Initiate csv indexation support specification

* update spec file name to match pull request id

* update csv indexation spec

* fix code examples and typos

* fix typos

* update spec name

* Update header part to match MeiliSearch Tracking-Issues

* update spec from the equivalent ndjson spec reviews

* update --data sample examples

* add information about giving application/json content-type or not for a json payload

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

* Change curl --data param to --binary-data in examples

Co-authored-by: Clément Renault <clement@meilisearch.com>

* updates error codes

* moved behavior about missing content-type in explanation part

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

Co-authored-by: Clément Renault <clement@meilisearch.com>
@gmourier gmourier added v0.23 Ready To Be Implemented Feature specification is ready to be implemented. and removed Missing Tracking-Issue labels Sep 15, 2021
gmourier added a commit that referenced this pull request Oct 11, 2021
* Initiate csv indexation support specification

* update spec file name to match pull request id

* update csv indexation spec

* fix code examples and typos

* fix typos

* update spec name

* Update header part to match MeiliSearch Tracking-Issues

* update spec from the equivalent ndjson spec reviews

* update --data sample examples

* add information about giving application/json content-type or not for a json payload

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

* Change curl --data param to --binary-data in examples

Co-authored-by: Clément Renault <clement@meilisearch.com>

* updates error codes

* moved behavior about missing content-type in explanation part

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

Co-authored-by: Clément Renault <clement@meilisearch.com>
gmourier added a commit that referenced this pull request Oct 12, 2021
* Indexing NDJSONs (#29)

* initialize a draft for json lines indexation support specification

* update filename number to match related pull-request

* update specs

* update link to CSV spec

* update spec name

* Apply typos correction from code review

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

* fix typo

* update impact on documentation part

* replace file by data

* add information about giving application/json content-type or not for a json payload

* updates error codes, curl instructions

* moved behavior about missing content-type in explanation part

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>

* Indexing CSVs (#28)

* Initiate csv indexation support specification

* update spec file name to match pull request id

* update csv indexation spec

* fix code examples and typos

* fix typos

* update spec name

* Update header part to match MeiliSearch Tracking-Issues

* update spec from the equivalent ndjson spec reviews

* update --data sample examples

* add information about giving application/json content-type or not for a json payload

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

* Change curl --data param to --binary-data in examples

Co-authored-by: Clément Renault <clement@meilisearch.com>

* updates error codes

* moved behavior about missing content-type in explanation part

* Apply suggestions from code review

Co-authored-by: Clément Renault <clement@meilisearch.com>

Co-authored-by: Clément Renault <clement@meilisearch.com>

* Geosearch (#59)

* Initialize draft specification for geo-search feature

* add future possibilities

* Update specification

* mention errors and aspects about filterableAttributes and sortableAttributes

* Add measure and finalized key changes

* Add description in OpenApi

* remove old falsy sentence

* Add definition and explanation for  error

* fix rebase on develop

* Specify missing edge cases (#63)

* Initialize draft specification for geo-search feature

* add future possibilities

* Update specification

* mention errors and aspects about filterableAttributes and sortableAttributes

* Add measure and finalized key changes

* Add description in OpenApi

* remove old falsy sentence

* Add definition and explanation for  error

* fix rebase on develop

* update open-api.yml with description on _geoPoint built-in sort rule and _geo field

* Apply suggestions from code review

Co-authored-by: gui machiavelli <gui@meilisearch.com>

* remove - char in geo-search

* update invalid_geo_field error definition

Co-authored-by: gui machiavelli <gui@meilisearch.com>

* Patch GeoSearch specification to mention technical limit on `desc` ordering around a _geoPoint (#66)

* mention decision and expected behavior for a desc ordering around a geoPoint

* add desc ordering around a geoPoint as a future possibility

* Patch error codes for csv and ndjson formats specs (#64)

* specify error codes dedicated to payload format for post/put documents endpoints

* Udpdate error codes naming

* Add errors definition

* update errors and cURL examples

* Add alternative message for reserved keyword and update invalid_criterion error definition (#67)

* add alternative message for reserved keyword and update invalid_criterion error

* update error name in link field for invalid_ranking_rule error

* update invalid_geo_field error message

* fix typo

* Add future possibilities from irevoire to geosearch spec file (#74)

* add descending order capability for _geoPoint built-in sort (#77)

* Add variant message to ensure _geoPoint and _geoRadius expressions are not used as a ranking rule to help the user (#78)

* Mention the supported separator character (#81)

* Add content-type header requirements

* update bump.yml configuration

Co-authored-by: cvermand <33010418+bidoubiwa@users.noreply.github.com>
Co-authored-by: Clément Renault <clement@meilisearch.com>
Co-authored-by: gui machiavelli <gui@meilisearch.com>
@gmourier gmourier added Implemented Feature specification has been implemented. and removed Ready To Be Implemented Feature specification is ready to be implemented. labels Oct 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Implemented Feature specification has been implemented. Q3:2021 v0.23
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants