Merging all the codes of dev to main branch. #9

Merged
merged 64 commits into from
Oct 19, 2022
64 commits
37650b3
[ML] Improve NLP model import by using nicely defined types (#459)
benwtrent May 3, 2022
74245f5
[ML] add support for question_answering NLP tasks (#457)
benwtrent May 4, 2022
f0b7272
[ML] improve general pytorch model import and add tests (#463)
benwtrent May 5, 2022
f468edb
Release 8.2.0
sethmlarson May 5, 2022
893c89d
[ML] fixes decision tree classifier upload to account for probabiliti…
benwtrent May 17, 2022
3ec7e6a
Add authentication methods for import model script (#466)
lcawl May 18, 2022
3a2353d
Ignore type checking for `agg_value`
technige May 31, 2022
29d3498
[DOCS] Adds question_answering task type for eland_import_hub_model
lcawl May 31, 2022
9865bed
Stop explicitly pulling master
sethmlarson May 31, 2022
205f989
Remove 'numpydoc' to stop reformatting
sethmlarson May 31, 2022
07b088a
Also pin traitlets
sethmlarson May 31, 2022
003a47d
[DOCS] Include missing attributes (#468)
lcawl May 31, 2022
becb9b4
[ML] ensure quantization is applied (#472)
benwtrent Jun 15, 2022
48969b9
Freeze the traced PyTorch model
davidkyle Jun 21, 2022
c8e1138
Bump minimum PyTorch version to 1.11
davidkyle Jun 21, 2022
3ba0ec8
[ML] adds new auto task type that attempts to automatically determine…
benwtrent Jun 23, 2022
1c6e8c1
added opensearch as dependency
LEFTA98 Jul 1, 2022
6bdaa33
replaced core mentions of elasticsearch client w opensearch
LEFTA98 Jul 1, 2022
0044256
changed index names for testing
LEFTA98 Jul 5, 2022
699c6bd
modified test dataframes to accommodate opensearch indexing
LEFTA98 Jul 6, 2022
eb95724
fixed aggregatable field name tests
LEFTA98 Jul 7, 2022
b4e1a71
fixing pytests that mention indices of ed/pd dataframes
LEFTA98 Jul 7, 2022
6323b05
fixed equality boolean filter to accommodate terminology difference i…
LEFTA98 Jul 7, 2022
70b688f
fixed pytests with indexing issues, geolocation field renaming issues
LEFTA98 Jul 7, 2022
849d45f
modified test setup code to work for opensearch
LEFTA98 Jul 8, 2022
f1c43f7
reverted many erroneous "fixes" to tests
LEFTA98 Jul 8, 2022
ee41ba6
fixed opensearch integration so remaining non-ml tests run
LEFTA98 Jul 8, 2022
c0a227a
added initial connection to predicting with sagemaker
LEFTA98 Jul 19, 2022
100aa13
added sagemaker predict api
LEFTA98 Jul 21, 2022
397a65a
added band-aid to fix iterating over rows
LEFTA98 Jul 25, 2022
a9952f1
debugging indexing issue
LEFTA98 Jul 25, 2022
691797e
reverted indexing change for sagemaker predict
LEFTA98 Jul 28, 2022
b8887c5
added deprecation warnings to ml module
LEFTA98 Aug 2, 2022
f8f6420
refactoring elasticsearch names to opensearch
LEFTA98 Aug 2, 2022
995912e
continued renaming opensearch variables
LEFTA98 Aug 3, 2022
7a4ee0a
more renaming changes
LEFTA98 Aug 4, 2022
e2d547f
first commit for ml common integration
LEFTA98 Aug 11, 2022
afa0446
PoC for model upload
LEFTA98 Aug 12, 2022
881a228
renamed model chunk uploading path
LEFTA98 Aug 23, 2022
3c5fd17
added total chunks to model upload
LEFTA98 Aug 24, 2022
9503bc1
fixed docstring typo
LEFTA98 Aug 31, 2022
6318e3e
added first iteration of custom model load supprot
LEFTA98 Sep 6, 2022
12e4fae
removed unsupported features
LEFTA98 Sep 6, 2022
e675a6a
renaming all instances of elastic in code
LEFTA98 Sep 8, 2022
11364b0
created new dev requirements file
LEFTA98 Sep 8, 2022
d2f2768
typo fix
LEFTA98 Sep 9, 2022
27f89e7
PR feedback
LEFTA98 Sep 6, 2022
7c22abf
implement PR feedback
LEFTA98 Sep 6, 2022
50395ad
PR feedback
LEFTA98 Sep 6, 2022
f909aed
implement pr feedback
LEFTA98 Sep 9, 2022
a540cae
Update README.md
LEFTA98 Sep 9, 2022
a56871d
added demo materials
LEFTA98 Sep 9, 2022
1995a65
refactoring
dhrubo-os Sep 29, 2022
de7a635
refactoring code and changed code to address some of the deprection w…
dhrubo-os Oct 3, 2022
9ed7f6f
adding header license info
dhrubo-os Oct 4, 2022
c2033b5
formatted code with black, isort, mypy
dhrubo-os Oct 4, 2022
4c216fc
updating git ci workflow
dhrubo-os Oct 4, 2022
c930411
refactoring code + adding pytest in the ci workflow
dhrubo-os Oct 7, 2022
10e99ed
removing test from ci workflow
dhrubo-os Oct 7, 2022
0a7b43b
setup CI for integration test
dhrubo-os Oct 12, 2022
9a1293b
resolving conflicts
dhrubo-os Oct 19, 2022
f8e57ad
adding files required for CI
dhrubo-os Oct 19, 2022
5d916a3
adding files which got deleted during merge
dhrubo-os Oct 19, 2022
07800e6
adding deleted files by git merge
dhrubo-os Oct 19, 2022
refactoring elasticsearch names to opensearch
Signed-off-by: Dhrubo Saha <dhrubo@amazon.com>
LEFTA98 authored and dhrubo-os committed Oct 18, 2022
commit f8f64206a34f834b09d6c6f6da6e72a9e963c75d
14 changes: 7 additions & 7 deletions CHANGELOG.rst
@@ -25,7 +25,7 @@ Added
Added
^^^^^

* Added support for ``eland.Series.unique()`` (`#448`_, contributed by `@V1NAY8`_)
* Added support for ``opensearch_py_ml.Series.unique()`` (`#448`_, contributed by `@V1NAY8`_)
* Added ``--ca-certs`` and ``--insecure`` options to ``eland_import_hub_model`` for configuring TLS (`#441`_)

.. _#448: https://github.com/elastic/eland/pull/448
@@ -95,7 +95,7 @@ Added
* Added support for Pandas 1.3.x (`#362`_, contributed by `@V1NAY8`_)
* Added support for LightGBM 3.x (`#362`_, contributed by `@V1NAY8`_)
* Added ``DataFrame.idxmax()`` and ``DataFrame.idxmin()`` methods (`#353`_, contributed by `@V1NAY8`_)
* Added type hints to ``eland.ndframe`` and ``eland.operations`` (`#366`_, contributed by `@V1NAY8`_)
* Added type hints to ``opensearch_py_ml.ndframe`` and ``opensearch_py_ml.operations`` (`#366`_, contributed by `@V1NAY8`_)

Removed
^^^^^^^
@@ -350,8 +350,8 @@ Deprecated
^^^^^^^^^^

* Deprecated ``info_es()`` in favor of ``es_info()`` (`#208`_)
* Deprecated ``eland.read_csv()`` in favor of ``eland.csv_to_eland()`` (`#208`_)
* Deprecated ``eland.read_es()`` in favor of ``eland.DataFrame()`` (`#208`_)
* Deprecated ``opensearch_py_ml.read_csv()`` in favor of ``opensearch_py_ml.csv_to_eland()`` (`#208`_)
* Deprecated ``opensearch_py_ml.read_es()`` in favor of ``opensearch_py_ml.DataFrame()`` (`#208`_)

Changed
^^^^^^^
@@ -373,7 +373,7 @@ Fixed
in the index if a sized operation like ``.head(X)`` was applied to the data
frame (`#205`_, contributed by `@mesejo`_)
* Fixed issue where both ``scikit-learn`` and ``xgboost`` libraries were
required to use ``eland.ml.ImportedMLModel``, now only one library is
required to use ``opensearch_py_ml.ml.ImportedMLModel``, now only one library is
required to use this feature (`#206`_)

.. _#200: https://github.com/elastic/eland/pull/200
@@ -402,13 +402,13 @@ Added
* Added ``es_type_overrides`` parameter to ``pandas_to_eland()`` (`#181`_)
* Added ``NDFrame.var()``, ``.std()`` and ``.median()`` aggregations (`#175`_, `#176`_, contributed by `@mesejo`_)
* Added ``DataFrame.es_query()`` to allow modifying ES queries directly (`#156`_)
* Added ``eland.__version__`` (`#153`_, contributed by `@mesejo`_)
* Added ``opensearch_py_ml.__version__`` (`#153`_, contributed by `@mesejo`_)

Removed
^^^^^^^

* Removed support for Python 3.5 (`#150`_)
* Removed ``eland.Client()`` interface, use
* Removed ``opensearch_py_ml.Client()`` interface, use
``elasticsearch.Elasticsearch()`` client instead (`#166`_)
* Removed all private objects from top-level ``eland`` namespace (`#170`_)
* Removed ``geo_points`` from ``pandas_to_eland()`` in favor of ``es_type_overrides`` (`#181`_)
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -143,7 +143,7 @@ currently using a minimum version of PyCharm 2019.2.4.

* Enter the URL to your fork of eland

(e.g.  `git@github.com:stevedodson/eland.git` )
(e.g.  `git@github.com:stevedodson/opensearch_py_ml.git` )

* Click \'Yes\' for \'Checkout from Version Control\'
* Configure PyCharm environment:
@@ -190,7 +190,7 @@ currently using a minimum version of PyCharm 2019.2.4.
* To validate installation, open python console and run

``` bash
> import eland as ed
> import opensearch_py_ml as ed
> ed_df = ed.DataFrame('localhost', 'flights')
```

42 changes: 21 additions & 21 deletions README.md
@@ -1,22 +1,22 @@
<div align="center">
<a href="https://github.com/elastic/eland">
<img src="https://raw.githubusercontent.com/elastic/eland/main/docs/sphinx/logo/eland.png" width="30%"
<img src="https://raw.githubusercontent.com/elastic/eland/main/docs/sphinx/logo/opensearch_py_ml.png" width="30%"
alt="Eland" />
</a>
</div>
<br />
<div align="center">
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/v/eland.svg" alt="PyPI Version"></a>
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/v/opensearch_py_ml.svg" alt="PyPI Version"></a>
<a href="https://anaconda.org/conda-forge/eland"><img src="https://img.shields.io/conda/vn/conda-forge/eland"
alt="Conda Version"></a>
<a href="https://pepy.tech/project/eland"><img src="https://pepy.tech/badge/eland" alt="Downloads"></a>
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/status/eland.svg"
<a href="https://pypi.org/project/eland"><img src="https://img.shields.io/pypi/status/opensearch_py_ml.svg"
alt="Package Status"></a>
<a href="https://clients-ci.elastic.co/job/elastic+eland+main"><img
src="https://clients-ci.elastic.co/buildStatus/icon?job=elastic%2Beland%2Bmain" alt="Build Status"></a>
<a href="https://github.com/elastic/eland/blob/main/LICENSE.txt"><img src="https://img.shields.io/pypi/l/eland.svg"
<a href="https://github.com/elastic/eland/blob/main/LICENSE.txt"><img src="https://img.shields.io/pypi/l/opensearch_py_ml.svg"
alt="License"></a>
<a href="https://eland.readthedocs.io"><img
<a href="https://opensearch_py_ml.readthedocs.io"><img
src="https://readthedocs.org/projects/eland/badge/?version=latest" alt="Documentation Status"></a>
</div>

@@ -38,13 +38,13 @@ Eland also provides tools to upload trained machine learning models from common
Eland can be installed from [PyPI](https://pypi.org/project/eland) with Pip:

```bash
$ python -m pip install eland
$ python -m pip install opensearch_py_ml
```

Eland can also be installed from [Conda Forge](https://anaconda.org/conda-forge/eland) with Conda:

```bash
$ conda install -c conda-forge eland
$ conda install -c conda-forge opensearch_py_ml
```

### Compatibility
@@ -73,20 +73,20 @@ Users wishing to use Eland without installing it, in order to just run the avail
container:

```bash
$ docker build -t elastic/eland .
$ docker build -t elastic/opensearch_py_ml .
```

The container can now be used interactively:

```bash
$ docker run -it --rm --network host elastic/eland
$ docker run -it --rm --network host elastic/opensearch_py_ml
```

Running installed scripts is also possible without an interactive shell, e.g.:

```bash
$ docker run -it --rm --network host \
elastic/eland \
elastic/opensearch_py_ml \
eland_import_hub_model \
--url http://host.docker.internal:9200/ \
--hub-model-id elastic/distilbert-base-cased-finetuned-conll03-english \
@@ -103,7 +103,7 @@ You can pass either an instance of `elasticsearch.Elasticsearch` to Eland APIs
or a string containing the host to connect to:

```python
import eland as ed
import opensearch_py_ml as ed

# Connecting to an Elasticsearch instance running on 'localhost:9200'
df = ed.DataFrame("localhost:9200", es_index_pattern="flights")
@@ -120,23 +120,23 @@ df = ed.DataFrame(es, es_index_pattern="flights")

## DataFrames in Eland

`eland.DataFrame` wraps an Elasticsearch index in a Pandas-like API
`opensearch_py_ml.DataFrame` wraps an Elasticsearch index in a Pandas-like API
and defers all processing and filtering of data to Elasticsearch
instead of your local machine. This means you can process large
amounts of data within Elasticsearch from a Jupyter Notebook
without overloading your machine.

➤ [Eland DataFrame API documentation](https://eland.readthedocs.io/en/latest/reference/dataframe.html)
➤ [Eland DataFrame API documentation](https://opensearch_py_ml.readthedocs.io/en/latest/reference/dataframe.html)

➤ [Advanced examples in a Jupyter Notebook](https://eland.readthedocs.io/en/latest/examples/demo_notebook.html)
➤ [Advanced examples in a Jupyter Notebook](https://opensearch_py_ml.readthedocs.io/en/latest/examples/demo_notebook.html)

```python
>>> import eland as ed
>>> import opensearch_py_ml as ed

>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('localhost:9200', 'flights')

# eland.DataFrame instance has the same API as pandas.DataFrame
# opensearch_py_ml.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
AvgTicketPrice Cancelled ... dayOfWeek timestamp
@@ -149,7 +149,7 @@ without overloading your machine.
[5 rows x 27 columns]

>>> df.info()
<class 'eland.dataframe.DataFrame'>
<class 'opensearch_py_ml.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
# Column Non-Null Count Dtype
@@ -191,13 +191,13 @@ std 4.578263e+03 2.663867e+02
Eland allows transforming trained regression and classification models from scikit-learn, XGBoost, and LightGBM
libraries to be serialized and used as an inference model in Elasticsearch.

➤ [Eland Machine Learning API documentation](https://eland.readthedocs.io/en/latest/reference/ml.html)
➤ [Eland Machine Learning API documentation](https://opensearch_py_ml.readthedocs.io/en/latest/reference/ml.html)

➤ [Read more about Machine Learning in Elasticsearch](https://www.elastic.co/guide/en/machine-learning/current/ml-getting-started.html)

```python
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
>>> from opensearch_py_ml.ml import MLModel

# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
@@ -236,8 +236,8 @@ $ eland_import_hub_model \
```python
>>> import elasticsearch
>>> from pathlib import Path
>>> from eland.ml.pytorch import PyTorchModel
>>> from eland.ml.pytorch.transformers import TransformerModel
>>> from opensearch_py_ml.ml.pytorch import PyTorchModel
>>> from opensearch_py_ml.ml.pytorch.transformers import TransformerModel

# Load a Hugging Face transformers model directly from the model hub
>>> tm = TransformerModel("elastic/distilbert-base-cased-finetuned-conll03-english", "ner")
6 changes: 4 additions & 2 deletions bin/eland_import_hub_model
@@ -33,11 +33,13 @@ import textwrap

from elastic_transport.client_utils import DEFAULT
from elasticsearch import AuthenticationException, Elasticsearch
from warnings import warn

MODEL_HUB_URL = "https://huggingface.co"


def get_arg_parser():
warn('function has been deprecated - only works for ElasticSearch', DeprecationWarning, stacklevel=2)
parser = argparse.ArgumentParser()
location_args = parser.add_mutually_exclusive_group(required=True)
location_args.add_argument(
@@ -166,8 +168,8 @@ if __name__ == "__main__":
logger.setLevel(logging.INFO)

try:
from eland.ml.pytorch import PyTorchModel
from eland.ml.pytorch.transformers import (
from opensearch_py_ml.ml.pytorch import PyTorchModel
from opensearch_py_ml.ml.pytorch.transformers import (
SUPPORTED_TASK_TYPES,
TaskTypeError,
TransformerModel,
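
Review note on the `warn(...)` call added to `get_arg_parser()` above: passing `stacklevel=2` attributes the warning to the caller of the function rather than to the function itself. The following is a minimal sketch of how that call can be exercised and checked in isolation. `get_arg_parser_stub` and `capture_deprecation` are hypothetical names used only for illustration; only the `warn` arguments are taken from the diff.

```python
import warnings


def get_arg_parser_stub():
    # Hypothetical stand-in for get_arg_parser(); only the warn() call mirrors the diff.
    warnings.warn(
        "function has been deprecated - only works for ElasticSearch",
        DeprecationWarning,
        stacklevel=2,  # report the caller's line, not this function's
    )
    return "parser"


def capture_deprecation():
    # Record warnings in a list instead of printing them, and disable
    # filtering so the DeprecationWarning is always captured.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = get_arg_parser_stub()
    return result, caught


result, caught = capture_deprecation()
```

Each entry in `caught` exposes `.category` and `.message`, which is enough for a unit test to confirm that the deprecation is raised exactly once per call.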
8 changes: 4 additions & 4 deletions docs/guide/dataframes.asciidoc
@@ -1,19 +1,19 @@
[[dataframes]]
== Data Frames

`eland.DataFrame` wraps an Elasticsearch index in a Pandas-like API
`opensearch_py_ml.DataFrame` wraps an Elasticsearch index in a Pandas-like API
and defers all processing and filtering of data to Elasticsearch
instead of your local machine. This means you can process large
amounts of data within Elasticsearch from a Jupyter Notebook
without overloading your machine.

[source,python]
-------------------------------------
>>> import eland as ed
>>> import opensearch_py_ml as ed
>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('http://localhost:9200', 'flights')

# eland.DataFrame instance has the same API as pandas.DataFrame
# opensearch_py_ml.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
AvgTicketPrice Cancelled ... dayOfWeek timestamp
@@ -26,7 +26,7 @@ without overloading your machine.
[5 rows x 27 columns]

>>> df.info()
<class 'eland.dataframe.DataFrame'>
<class 'opensearch_py_ml.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
# Column Non-Null Count Dtype
6 changes: 3 additions & 3 deletions docs/guide/machine-learning.asciidoc
@@ -12,7 +12,7 @@ model in {es}.
[source,python]
------------------------
>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel
>>> from opensearch_py_ml.ml import MLModel

# Train and exercise an XGBoost ML model locally
>>> xgb_model = XGBClassifier(booster="gbtree")
@@ -61,8 +61,8 @@ $ eland_import_hub_model <authentication> \ <1>
------------------------
>>> import elasticsearch
>>> from pathlib import Path
>>> from eland.ml.pytorch import PyTorchModel
>>> from eland.ml.pytorch.transformers import TransformerModel
>>> from opensearch_py_ml.ml.pytorch import PyTorchModel
>>> from opensearch_py_ml.ml.pytorch.transformers import TransformerModel

# Load a Hugging Face transformers model directly from the model hub
>>> tm = TransformerModel("elastic/distilbert-base-cased-finetuned-conll03-english", "ner")
6 changes: 3 additions & 3 deletions docs/guide/overview.asciidoc
@@ -2,7 +2,7 @@
== Overview

Eland is a Python client and toolkit for DataFrames and {ml} in {es}.
Full documentation is available on https://eland.readthedocs.io[Read the Docs].
Full documentation is available on https://opensearch_py_ml.readthedocs.io[Read the Docs].
Source code is available on https://github.com/elastic/eland[GitHub].

[discrete]
@@ -28,7 +28,7 @@ Create a `DataFrame` object connected to an {es} cluster running on `http://loca

[source,python]
------------------------------------
>>> import eland as ed
>>> import opensearch_py_ml as ed
>>> df = ed.DataFrame(
... es_client="http://localhost:9200",
... es_index_pattern="flights",
@@ -57,7 +57,7 @@ You can also connect Eland to an Elasticsearch instance in Elastic Cloud:

[source,python]
------------------------------------
>>> import eland as ed
>>> import opensearch_py_ml as ed
>>> from elasticsearch import Elasticsearch

# First instantiate an 'Elasticsearch' instance connected to Elastic Cloud
12 changes: 6 additions & 6 deletions docs/sphinx/conf.py
@@ -41,13 +41,13 @@

# -- Project information -----------------------------------------------------

project = "eland"
project = "opensearch_py_ml"
copyright = f"{datetime.date.today().year}, Elasticsearch BV"

# The full version, including alpha/beta/rc tags
import eland
import opensearch_py_ml

version = str(eland._version.__version__)
version = str(opensearch_py_ml._version.__version__)

release = version

@@ -67,7 +67,7 @@

doctest_global_setup = """
try:
import eland as ed
import opensearch_py_ml as ed
except ImportError:
ed = None
try:
@@ -100,7 +100,7 @@
plot_html_show_formats = False
plot_html_show_source_link = False
plot_pre_code = """import numpy as np
import eland as ed"""
import opensearch_py_ml as ed"""

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]
@@ -127,7 +127,7 @@
# so a file named "default.css" will overwrite the builtin "default.css".
# html_static_path = ['_static']

html_logo = "logo/eland.png"
html_logo = "logo/opensearch_py_ml.png"
html_favicon = "logo/eland_favicon.png"

master_doc = "index"
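
Review note on the rename pattern in this commit: the hunks above replace `eland` with `opensearch_py_ml` in Python imports and dotted module references, and also inside some URLs and file paths (for example the Read the Docs hostname and `html_logo`), which may or may not be intended. The following is a minimal sketch of a substitution restricted to module usage, under the assumption that only imports and dotted references should change. It is not the method used in this PR; `rename_module` and both regular expressions are hypothetical.

```python
import re

# `eland` at the start of an import statement, optionally after a doctest prompt.
_IMPORT = re.compile(r"^(\s*(?:>>> )?(?:import|from)\s+)eland\b", re.MULTILINE)

# `eland` used as a dotted prefix (eland.ml, eland._version), but not when it is
# part of a URL or path, i.e. not directly preceded by a word char, '.', '/' or '-'.
_DOTTED = re.compile(r"(?<![\w./-])eland(?=\.[A-Za-z_])")


def rename_module(text, new_name="opensearch_py_ml"):
    """Rewrite module references to `eland`, leaving URLs and file paths alone."""
    text = _IMPORT.sub(lambda m: m.group(1) + new_name, text)
    return _DOTTED.sub(new_name, text)
```

With this sketch, `import eland as ed` and `str(eland._version.__version__)` are rewritten, while strings such as `https://eland.readthedocs.io` and `logo/eland.png` are left as they are. Prose mentions of the package name that are not dotted would still need manual review.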
4 changes: 2 additions & 2 deletions docs/sphinx/development/contributing.rst
@@ -150,7 +150,7 @@ Configuring PyCharm And Running Tests
Control\'-\>\'Git\' on the \"Welcome to PyCharm\" page <or other>

- Enter the URL to your fork of eland
<e.g. `git@github.com:stevedodson/eland.git`>
<e.g. `git@github.com:stevedodson/opensearch_py_ml.git`>

- Click \'Yes\' for \'Checkout from Version Control\'

@@ -189,7 +189,7 @@ Configuring PyCharm And Running Tests
- To validate installation, open python console and run
.. code-block:: bash

import eland as ed
import opensearch_py_ml as ed
ed_df = ed.DataFrame('localhost', 'flights')

- To run the automatic formatter and check for lint issues