Add bulk import (#386)
## Problem

Implement the following new methods:

- `start_import`
- `describe_import`
- `list_imports`
- `cancel_import`

## Solution

#### Code generation changes

Because these features are in prerelease, they exist only in the spec for
the upcoming 2024-10 API version. This required changes to the codegen
script, which is now run as:

```
./codegen/build-oas.sh 2024-07 false && ./codegen/build-oas.sh 2024-10 true
```

The second argument is a boolean that tells the codegen script whether
to store the generated code in a new `pinecone/core_ea` subpackage. In
the future we should probably do more to hide this complexity from the
developer, but for now it is good enough.
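
As a rough illustration of what the second argument controls, the branch at the top of `codegen/build-oas.sh` can be mirrored in Python. The names `CodegenTarget`, `resolve_target`, and `package_name` are hypothetical and exist only for this sketch; the paths and module names are the ones the script sets.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CodegenTarget:
    """Where one run of the codegen script writes its output (illustrative only)."""

    destination: str
    modules: Tuple[str, ...]
    py_module_name: str


def resolve_target(is_early_access: bool) -> CodegenTarget:
    # Mirrors the if/else on $is_early_access in codegen/build-oas.sh.
    if is_early_access:
        return CodegenTarget("pinecone/core_ea/openapi", ("db_control", "db_data"), "core_ea")
    return CodegenTarget("pinecone/core/openapi", ("control", "data"), "core")


def package_name(target: CodegenTarget, module_name: str) -> str:
    # Same shape as the packageName value the script passes to the generator.
    return f"pinecone.{target.py_module_name}.openapi.{module_name}"


stable = resolve_target(False)
early = resolve_target(True)
print(package_name(stable, "data"))
print(package_name(early, "db_data"))
```

Under these assumptions, the two invocations shown above write to separate subpackages, so the early-access code does not overwrite the stable generated code.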

#### Code organization

The hand-written code that wraps the generated code lives in a new
class, `ImportFeatureMixin`, which the `Index` class inherits from.
These functions could all have been implemented directly in the `Index`
class, but it seemed tidier to keep them in a separate place rather than
put everything into one giant file.
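
A minimal sketch of the mixin arrangement follows. Only the class names and the four method names come from this PR; the parameters (`uri`, `import_id`), the return shapes, the status strings, and the in-memory dict standing in for the generated API client are all assumptions made for illustration.

```python
import itertools
from typing import Dict, List


class ImportFeatureMixin:
    """Illustrative stand-in that keeps the import-related methods in one place.

    The real class wraps generated API code. Here an in-memory dict plays that
    role, and every parameter name and return shape is an assumption.
    """

    def __init__(self) -> None:
        self._imports: Dict[str, Dict[str, str]] = {}
        self._ids = itertools.count(1)

    def start_import(self, uri: str) -> Dict[str, str]:
        # Record a new import operation and return its description.
        import_id = str(next(self._ids))
        self._imports[import_id] = {"id": import_id, "uri": uri, "status": "Pending"}
        return self._imports[import_id]

    def describe_import(self, import_id: str) -> Dict[str, str]:
        return self._imports[import_id]

    def list_imports(self) -> List[Dict[str, str]]:
        return list(self._imports.values())

    def cancel_import(self, import_id: str) -> None:
        self._imports[import_id]["status"] = "Cancelled"


class Index(ImportFeatureMixin):
    """Hypothetical Index that inherits the import methods instead of defining them."""

    def __init__(self, name: str) -> None:
        super().__init__()
        self.name = name


index = Index("example-index")
operation = index.start_import("s3://example-bucket/data/")
index.cancel_import(operation["id"])
print(index.describe_import(operation["id"]))
```

The point of the layout is that callers still see the four methods on `Index`, while the feature's code stays in its own module.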

#### Overridden repr on generated objects

The default print output of the generated classes comes from pprint and
looks quite poor for large objects, so I installed overrides that dump
the objects as formatted JSON instead. I had previously done something
similar for `describe_index` and related methods, so for this PR it was
just a matter of cleaning up that logic a bit and moving it somewhere it
could be reused.

So far, I haven't changed the generated classes to use this approach
across the board because it doesn't work well for long arrays of vector
values.
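
The override can be sketched as follows. The helper name `install_json_repr_override` appears in this PR, but the body here is a guess based on the lambda it replaced in `pinecone/control/repr_overrides.py`, and `FakeModel` is a hypothetical stand-in for a generated class that exposes `to_dict()`.

```python
import json
import pprint


def install_json_repr_override(klass):
    # Replace the class-level __repr__ with indented JSON built from to_dict().
    # Assumed body, modeled on the lambda previously used in repr_overrides.py.
    klass.__repr__ = lambda self: json.dumps(self.to_dict(), indent=4, sort_keys=False)


class FakeModel:
    """Hypothetical stand-in for a generated model exposing to_dict()."""

    def __init__(self, **fields):
        self._fields = fields

    def to_dict(self):
        return dict(self._fields)

    def __repr__(self):
        # Roughly what the generated classes do by default.
        return pprint.pformat(self._fields)


install_json_repr_override(FakeModel)
model = FakeModel(name="example-index", dimension=8)
print(model)
```

Because the override is installed per class, it can be applied to small response objects while leaving classes that carry long vector arrays untouched.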

## Type of Change

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Infrastructure change (CI configs, etc)
- [ ] Non-code change (docs, etc)
- [ ] None of the above: (explain here)

## Test Plan

Manual testing with a dev release is in this [demo
notebook](https://colab.research.google.com/drive/1W3OhMDG1yW2rdwx-ZulYH847m9R_IUuK#scrollTo=gGvVbfkYNz61).
jhamon authored Sep 18, 2024
1 parent 4e9a40c commit ff7b81d
Showing 131 changed files with 19,549 additions and 202 deletions.
1 change: 1 addition & 0 deletions .github/workflows/alpha-release.yaml
@@ -49,3 +49,4 @@ jobs:
secrets:
PYPI_USERNAME: __token__
PYPI_PASSWORD: ${{ secrets.PROD_PYPI_PUBLISH_TOKEN }}

23 changes: 23 additions & 0 deletions CONTRIBUTING.md
@@ -142,3 +142,26 @@ Hello, from your virtualenv!
```

If you experience any issues please [file a new issue](https://github.com/pinecone-io/pinecone-python-client/issues/new).


## Consuming API version upgrades

These instructions can only be followed by Pinecone employees with access to our private APIs repository.

Prerequisites:
- You must be an employee with access to private Pinecone repositories
- You must have [Docker Desktop](https://www.docker.com/products/docker-desktop/) installed and running. Our code generation script uses a dockerized version of the OpenAPI CLI.
- You must have initialized the git submodules under codegen

```sh
git submodule
```


To regenerate the generated portions of the client with the latest version of the API specifications, you need to have Docker Desktop running on your local machine.



```sh
./codegen/
```
2 changes: 1 addition & 1 deletion Makefile
@@ -11,7 +11,7 @@ develop:

test-unit:
@echo "Running tests..."
poetry run pytest --cov=pinecone --timeout=120 tests/unit
poetry run pytest --cov=pinecone --timeout=120 tests/unit -s -vv

test-integration:
@echo "Running integration tests..."
2 changes: 1 addition & 1 deletion codegen/apis
Submodule apis updated from 062b11 to 3b7369
37 changes: 27 additions & 10 deletions codegen/build-oas.sh
@@ -3,9 +3,21 @@
set -eux -o pipefail

version=$1 # e.g. 2024-07
modules=("control" "data")
is_early_access=$2 # e.g. true

# if is_early_access is true, add the "ea" module
if [ "$is_early_access" = "true" ]; then
destination="pinecone/core_ea/openapi"
modules=("db_control" "db_data")
py_module_name="core_ea"
template_dir="codegen/python-oas-templates/templates5.2.0"
else
destination="pinecone/core/openapi"
modules=("control" "data")
py_module_name="core"
template_dir="codegen/python-oas-templates/templates5.2.0"
fi

destination="pinecone/core/openapi"
build_dir="build"

update_apis_repo() {
@@ -58,11 +70,9 @@ generate_client() {
local module_name=$1

oas_file="codegen/apis/_build/${version}/${module_name}_${version}.oas.yaml"
openapi_generator_config="codegen/openapi-config.${module_name}.json"
template_dir="codegen/python-oas-templates/templates5.2.0"
package_name="pinecone.${py_module_name}.openapi.${module_name}"

verify_file_exists $oas_file
verify_file_exists $openapi_generator_config
verify_directory_exists $template_dir

# Cleanup previous build files
@@ -73,13 +83,20 @@ generate_client() {
docker run --rm -v $(pwd):/workspace openapitools/openapi-generator-cli:v5.2.0 generate \
--input-spec "/workspace/$oas_file" \
--generator-name python \
--config "/workspace/$openapi_generator_config" \
--additional-properties=packageName=$package_name,pythonAttrNoneIfUnset=true \
--output "/workspace/${build_dir}" \
--template-dir "/workspace/$template_dir"

# Hack to prevent coercion of strings into datetimes within "object" types while still
# allowing datetime parsing for fields that are explicitly typed as datetime
find "${build_dir}" -name "*.py" | while IFS= read -r file; do
sed -i '' "s/bool, date, datetime, dict, float, int, list, str, none_type/bool, dict, float, int, list, str, none_type/g" "$file"
done

# Copy the generated module to the correct location
rm -rf "${destination}/${module_name}"
cp -r "build/pinecone/core/openapi/${module_name}" "${destination}/${module_name}"
mkdir -p "${destination}"
cp -r "build/pinecone/$py_module_name/openapi/${module_name}" "${destination}/${module_name}"
}

extract_shared_classes() {
@@ -118,13 +135,13 @@ extract_shared_classes() {

# Adjust import paths in every file
find "${destination}" -name "*.py" | while IFS= read -r file; do
sed -i '' 's/from \.\.model_utils/from pinecone\.core\.openapi\.shared\.model_utils/g' "$file"
sed -i '' "s/from \.\.model_utils/from pinecone\.$py_module_name\.openapi\.shared\.model_utils/g" "$file"

for module in "${modules[@]}"; do
sed -i '' "s/from pinecone\.core\.openapi\.$module import rest/from pinecone\.core\.openapi\.shared import rest/g" "$file"
sed -i '' "s/from pinecone\.$py_module_name\.openapi\.$module import rest/from pinecone\.$py_module_name\.openapi\.shared import rest/g" "$file"

for sharedFile in "${sharedFiles[@]}"; do
sed -i '' "s/from pinecone\.core\.openapi\.$module\.$sharedFile/from pinecone\.core\.openapi\.shared\.$sharedFile/g" "$file"
sed -i '' "s/from pinecone\.$py_module_name\.openapi\.$module\.$sharedFile/from pinecone\.$py_module_name\.openapi\.shared\.$sharedFile/g" "$file"
done
done
done
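
The date/datetime substitution in `generate_client` above amounts to a literal string replacement over each generated file. A minimal Python sketch of the same substitution follows; the names `strip_date_coercion`, `OLD_TYPES`, and `NEW_TYPES` are hypothetical, and the script itself uses `sed` rather than Python.

```python
OLD_TYPES = "bool, date, datetime, dict, float, int, list, str, none_type"
NEW_TYPES = "bool, dict, float, int, list, str, none_type"


def strip_date_coercion(source: str) -> str:
    """Drop date/datetime from single-line 'any type' tuples in generated code."""
    return source.replace(OLD_TYPES, NEW_TYPES)


before = '"details": ({str: (bool, date, datetime, dict, float, int, list, str, none_type)},),  # noqa: E501'
after = strip_date_coercion(before)
print(after)

# A field typed explicitly as datetime does not match the pattern and is left alone.
print(strip_date_coercion('"created_at": (datetime,),'))
```

Since only the full "any type" tuple is matched, fields explicitly typed as datetime keep their datetime parsing, which is the behavior the comment in the script describes.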
4 changes: 0 additions & 4 deletions codegen/openapi-config.control.json

This file was deleted.

4 changes: 0 additions & 4 deletions codegen/openapi-config.data.json

This file was deleted.

2 changes: 1 addition & 1 deletion codegen/python-oas-templates
2 changes: 2 additions & 0 deletions pinecone/config/openapi.py
@@ -24,6 +24,8 @@ def build(cls, api_key: str, host: Optional[str] = None, **kwargs):
openapi_config.host = host
openapi_config.ssl_ca_cert = certifi.where()
openapi_config.socket_options = cls._get_socket_options()
openapi_config.discard_unknown_keys = True

return openapi_config

@classmethod
5 changes: 2 additions & 3 deletions pinecone/control/repr_overrides.py
@@ -1,8 +1,7 @@
from pinecone.utils import install_json_repr_override
from pinecone.models.index_model import IndexModel
from pinecone.core.openapi.control.models import CollectionModel

import json


def install_repr_overrides():
"""
@@ -14,4 +13,4 @@ def install_repr_overrides():
query results.
"""
for model in [IndexModel, CollectionModel]:
model.__repr__ = lambda self: json.dumps(self.to_dict(), indent=4, sort_keys=False)
install_json_repr_override(model)
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/collection_list.py
@@ -72,8 +72,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/collection_model.py
@@ -76,8 +76,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
@@ -74,8 +74,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
@@ -72,8 +72,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
@@ -69,8 +69,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
@@ -70,8 +70,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/create_index_request.py
@@ -89,8 +89,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/deletion_protection.py
@@ -66,8 +66,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/embed_request.py
@@ -74,8 +74,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/embed_request_inputs.py
@@ -65,8 +65,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
@@ -65,8 +65,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/embedding.py
@@ -65,8 +65,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/embeddings_list.py
@@ -74,8 +74,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/embeddings_list_usage.py
@@ -65,8 +65,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/error_response.py
@@ -72,8 +72,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
8 changes: 3 additions & 5 deletions pinecone/core/openapi/control/model/error_response_error.py
@@ -87,8 +87,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
@@ -112,7 +110,7 @@ def openapi_types():
return {
"code": (str,), # noqa: E501
"message": (str,), # noqa: E501
"details": ({str: (bool, date, datetime, dict, float, int, list, str, none_type)},), # noqa: E501
"details": ({str: (bool, dict, float, int, list, str, none_type)},), # noqa: E501
}

@cached_property
@@ -169,7 +167,7 @@ def _from_openapi_data(cls, code, message, *args, **kwargs): # noqa: E501
Animal class but this time we won't travel
through its discriminator because we passed in
_visited_composed_classes = (Animal,)
details ({str: (bool, date, datetime, dict, float, int, list, str, none_type)}): Additional information about the error. This field is not guaranteed to be present.. [optional] # noqa: E501
details ({str: (bool, dict, float, int, list, str, none_type)}): Additional information about the error. This field is not guaranteed to be present.. [optional] # noqa: E501
"""

_check_type = kwargs.pop("_check_type", True)
@@ -262,7 +260,7 @@ def __init__(self, code, message, *args, **kwargs): # noqa: E501
Animal class but this time we won't travel
through its discriminator because we passed in
_visited_composed_classes = (Animal,)
details ({str: (bool, date, datetime, dict, float, int, list, str, none_type)}): Additional information about the error. This field is not guaranteed to be present.. [optional] # noqa: E501
details ({str: (bool, dict, float, int, list, str, none_type)}): Additional information about the error. This field is not guaranteed to be present.. [optional] # noqa: E501
"""

_check_type = kwargs.pop("_check_type", True)
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/index_list.py
@@ -72,8 +72,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/index_model.py
@@ -91,8 +91,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/index_model_spec.py
@@ -74,8 +74,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/index_model_status.py
@@ -76,8 +76,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/pod_spec.py
@@ -82,8 +82,6 @@ def additional_properties_type():
lazy_import()
return (
bool,
date,
datetime,
dict,
float,
int,
@@ -65,8 +65,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,
2 changes: 0 additions & 2 deletions pinecone/core/openapi/control/model/serverless_spec.py
@@ -71,8 +71,6 @@ def additional_properties_type():
"""
return (
bool,
date,
datetime,
dict,
float,
int,