docs: enable spellcheck on all python files, notebooks, and docs (#733)
We use `aspell`, `codespell`, and `pyenchant`, with the list of ignored
words kept in `spelling.txt`:

`aspell`: rst and python files
`pyenchant`: notebook markdown cells
`codespell`: rst and python files

We currently do not use `pylint` as it is slow and not very useful.
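As a rough illustration of the notebook check (a sketch only, not the repository's actual script; the notebook path is a placeholder):

```python
# Spell-check the markdown cells of one notebook with pyenchant,
# skipping any word listed in spelling.txt (assumed one word per line).
import json
import re

import enchant  # provided by the pyenchant package

ignored = set(open("spelling.txt").read().split())
checker = enchant.Dict("en_US")

notebook = json.load(open("tutorials/01-example.ipynb"))  # placeholder path
for cell in notebook["cells"]:
    if cell["cell_type"] != "markdown":
        continue
    for word in re.findall(r"[A-Za-z]+", "".join(cell["source"])):
        if word not in ignored and not checker.check(word):
            print("possible typo:", word)
```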

---------

Co-authored-by: Gaurav <gaurav21776@gmail.com>
jarulraj and gaurav274 authored May 15, 2023
1 parent b4fa64e commit 2893b3d
Showing 81 changed files with 1,575 additions and 693 deletions.
2 changes: 2 additions & 0 deletions .circleci/config.yml
@@ -89,6 +89,8 @@ jobs:
- run:
name: Install EVA package from GitHub repo with all dependencies
command: |
sudo apt-get update
sudo apt-get install -y enchant-2 aspell
"python<< parameters.v >>" -m venv test_evadb
pip install --upgrade pip
source test_evadb/bin/activate
4 changes: 4 additions & 0 deletions .gitignore
@@ -192,3 +192,7 @@ prof/
output.txt
MagicMock/
queries.txt
dep.txt

*.bak
\<Magic*
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -186,7 +186,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
* PR #495: docs: improve read-the-docs
* PR #488: docs: fix notebooks
* PR #486: docs: update notebooks + banner
* PR #476: feat: GPU jenkins support
* PR #476: feat: GPU Jenkins support

## [0.1.0] - 2022-11-12
### [Added]
@@ -256,7 +256,7 @@ Thanks to @gaurav274, @jarulraj, @xzdandy, @LordDarkula, @eloyekunle, and @devsh
### [Added]

* PR #295: Improve Error Messages and Query responses
* PR #292: Upating read the docs + website
* PR #292: Updating read the docs + website
* PR #288: Update README.md


2 changes: 1 addition & 1 deletion README.md
@@ -186,7 +186,7 @@ The following architecture diagram presents the critical components of the EVA d
|---------------|--------------|
|<img alt="Source Video" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/mnist-input.webp" width="150"> |<img alt="Query Result" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/mnist-output.webp" width="150"> |

### 🔮 [Movie Emotion Analysis](https://evadb.readthedocs.io/en/stable/source/tutorials/03-emotion-analysis.html) (Face Detection + Emotion Classfication Models)
### 🔮 [Movie Emotion Analysis](https://evadb.readthedocs.io/en/stable/source/tutorials/03-emotion-analysis.html) (Face Detection + Emotion Classification Models)

| Source Video | Query Result |
|---------------|--------------|
4 changes: 2 additions & 2 deletions data/ua_detrac/README.md
@@ -15,7 +15,7 @@
* __8250 vehicles__ that are manually annotated
* Vehicle categories are __Car, Bus, Van,__ and __Other__
* Weather categories are __Night, Sunny, Rainy,__ and __Cloudy__
* Other annotations include __Scale of Vehicle, Occulsion Ratio,__ and __Truncation Ratio__.
* Other annotations include __Scale of Vehicle, Occlusion Ratio,__ and __Truncation Ratio__.



@@ -52,7 +52,7 @@ __Position information of target trajectories out of the general background, whi

__Unzipping the dataset__

You can use your own method for unzipping .zip files or for conveniency
You can use your own method for unzipping .zip files or for convenience

`sudo apt-get install unzip`

4 changes: 2 additions & 2 deletions docs/index.rst
@@ -46,7 +46,7 @@ Key Features

1. With EVA, you can **easily combine SQL and deep learning models to build next-generation database applications**. EVA treats deep learning models as functions similar to traditional SQL functions like SUM().

2. EVA is **extensible by design**. You can write an **user-defined function** (UDF) that wraps arounds your custom deep learning model. In fact, all the built-in models that are included in EVA are written as user-defined functions.
2. EVA is **extensible by design**. You can write an **user-defined function** (UDF) that wraps around your custom deep learning model. In fact, all the built-in models that are included in EVA are written as user-defined functions.

3. EVA comes with a collection of **built-in sampling, caching, and filtering optimizations** inspired by relational database systems. These optimizations help **speed up queries on large datasets and save money spent on model inference**.

@@ -107,7 +107,7 @@ Illustrative EVA Applications

|pic3| |pic4|

|:desert_island:| Movie Analysis Application using Face Detection + Emotion Classfication Models
|:desert_island:| Movie Analysis Application using Face Detection + Emotion Classification Models
~~~~

.. |pic5| image:: https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/gangubai-input.webp
28 changes: 14 additions & 14 deletions docs/source/contribute/new_command.rst
@@ -7,7 +7,7 @@ Command Handler
----

An input query string is handled by **Parser**,
**StatementTOPlanConvertor**, **PlanGenerator**, and **PlanExecutor**.
**StatementTOPlanConverter**, **PlanGenerator**, and **PlanExecutor**.
We discuss each part separately.

.. code:: python
@@ -19,7 +19,7 @@ We discuss each part separately.
#1. parser
stmt = Parser().parse(query)[0]
#2. statement to logical plan
l_plan = StatementToPlanConvertor().visit(stmt)
l_plan = StatementToPlanConverter().visit(stmt)
#3. logical to physical plan
p_plan = PlanGenerator().build(l_plan)
#4. parser
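# (hypothetical continuation: the executor step is collapsed in this hunk,
#  so the call below is an assumption rather than the documented snippet)
output = PlanExecutor(p_plan).execute_plan()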
@@ -31,7 +31,7 @@ We discuss each part separately.
---------

The parser firstly generate **syntax tree** from the input string, and
then tansform syntax tree into **statement**.
then transform syntax tree into **statement**.

The first part of Parser is build from a LARK grammar file.

@@ -68,7 +68,7 @@ parser/lark_visitor

.. code:: python
from src.parser.parser_visitor._create_statement import CenameTable
from src.parser.parser_visitor._create_statement import CreateTable
class ParserVisitor(CommonClauses, CreateTable, Expressions,
Functions, Insert, Select, TableSources,
Load, Upload):
@@ -80,9 +80,9 @@ parser/
called in ``_[cmd]_statement.py``
- ``types.py`` - register new StatementType

.. _2-statementtoplanconvertor:
.. _2-statementtoplanconverter:

2. Statement To Plan Convertor
2. Statement To Plan Converter
---------------------------

The part transforms the statement into corresponding logical plan.
@@ -123,7 +123,7 @@ Optimizer
.. code:: python
# May need to convert the statement into another data type.
# The new data type is usable for excuting command.
# The new data type is usable for executing command.
# For example, column_list -> column_metadata_list
def visit_create(self, statement: AbstractStatement):
@@ -139,7 +139,7 @@ Optimizer
video_ref, column_metadata_list, if_not_exists)
self._plan = create_opr
- modify visit function to call the right visit_[cmd] funciton
- modify visit function to call the right visit_[cmd] function

.. code:: python
@@ -186,9 +186,9 @@ optimizer/rules

- Import operators
- Register new ruletype to **RuleType** and **Promise** (place it
**before IMPLEMENTATION_DELIMETER** !!)
- implement class ``Logical[cmd]ToPhysical``, its memeber function
apply() will construct a corresbonding\ ``[cmd]Plan`` object.
**before IMPLEMENTATION_DELIMITER** !!)
- implement class ``Logical[cmd]ToPhysical``, its member function
apply() will construct a corresponding\ ``[cmd]Plan`` object.

.. code:: python
@@ -210,14 +210,14 @@ optimizer/rules
- ``rules_base.py``-

- Register new ruletype to **RuleType** and **Promise** (place it
**before IMPLEMENTATION_DELIMETER** !!)
**before IMPLEMENTATION_DELIMITER** !!)

- ``rules_manager.py``-

- Import rules created in ``rules.py``
- Add imported logical to physical rules to ``self._implementation_rules``

.. _4-planexcutor:
.. _4-PlanExecutor:

4. Plan Executor
--------------
@@ -259,7 +259,7 @@ Key data structures in EVA:

- ``file_url`` - used to access the real table in storage engine.

- For the ``RENAME`` table command, we use the ``old_table_name`` to access the corresponing entry in metadata table, and the ``modified name`` of the table.
- For the ``RENAME`` table command, we use the ``old_table_name`` to access the corresponding entry in metadata table, and the ``modified name`` of the table.

- **Storage Engine**:

2 changes: 1 addition & 1 deletion docs/source/overview/aidb.rst
@@ -1,7 +1,7 @@
EVA AI-Relational Database System
====

Over the last decade, deep learning models have radically changed the world of computer vision and natural languague processing. They are accurate on a variety of tasks ranging from image classification to question answering. However, there are two challenges that prevent a lot of users from benefiting from these models.
Over the last decade, deep learning models have radically changed the world of computer vision and natural language processing. They are accurate on a variety of tasks ranging from image classification to question answering. However, there are two challenges that prevent a lot of users from benefiting from these models.

Usability and Application Maintainability
^^^^
2 changes: 1 addition & 1 deletion docs/source/reference/evaql/udf.rst
@@ -19,7 +19,7 @@ Here is a list of built-in user-defined functions in EVA.
FastRCNNObjectDetector is a model for detecting objects. MVITActionRecognition is a model for recognizing actions.

ArrayCount and Crop are utility functions for counting the number of objects in an array and cropping a bounding box from an image, resepectively.
ArrayCount and Crop are utility functions for counting the number of objects in an array and cropping a bounding box from an image, respectively.

SELECT WITH MULTIPLE UDFS
----
2 changes: 1 addition & 1 deletion docs/source/reference/gpu.rst
@@ -10,7 +10,7 @@ Configure GPU
A valid output from the command indicates that your GPU is configured and ready to use. If not, you will need to install the appropriate GPU driver. `This page <https://towardsdatascience.com/deep-learning-gpu-installation-on-ubuntu-18-4-9b12230a1d31>`_ provides a step-by-step guide on installing and configuring the GPU driver in the Ubuntu Operating System.

* When installing an NVIDIA driver, ensure that the version of the GPU driver is correct to avoid compatibiility issues.
* When installing an NVIDIA driver, ensure that the version of the GPU driver is correct to avoid compatibility issues.
* When installing cuDNN, you will need to create an account and ensure that you get the correct `deb` files for your operating system and architecture.

2. You can run the following code in a Jupyter notebook to verify that your GPU is detected by PyTorch:
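A minimal check along these lines works (a sketch; the exact snippet in the documentation is collapsed in this diff):

.. code-block:: python

    import torch

    # True when PyTorch can see a CUDA device; print its name as a sanity check
    if torch.cuda.is_available():
        print("GPU detected:", torch.cuda.get_device_name(0))
    else:
        print("No GPU detected by PyTorch")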
2 changes: 1 addition & 1 deletion docs/source/reference/udf.rst
@@ -43,7 +43,7 @@ Example of a Setup function

.. code-block:: python
@setup(cachable=True, udf_type="object_detection", batchable=True)
@setup(cacheable=True, udf_type="object_detection", batchable=True)
def setup(self, threshold=0.85):
#custom setup function that is specific for the UDF
self.threshold = threshold
10 changes: 5 additions & 5 deletions docs/source/reference/udfs/custom.rst
@@ -45,15 +45,15 @@ Example of the `setup` function:

.. code-block:: python
@setup(cachable=True, udf_type="object_detection", batchable=True)
@setup(cacheable=True, udf_type="object_detection", batchable=True)
def setup(self, threshold=0.85):
self.threshold = threshold
self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
weights="COCO_V1", progress=False
)
self.model.eval()
In this instance, we have configured the `cachable` and `batchable` attributes to `True`. As a result, EVA will cache the UDF outputs and utilize batch processing for increased efficiency.
In this instance, we have configured the `cacheable` and `batchable` attributes to `True`. As a result, EVA will cache the UDF outputs and utilize batch processing for increased efficiency.
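As a rough illustration of the mechanism (not EVA's actual implementation), a decorator like `setup` can simply record these flags on the decorated method so the engine can inspect them later:

.. code-block:: python

    # Illustration only: one way such a decorator could attach the flags.
    def setup(cacheable=False, udf_type=None, batchable=False):
        def wrap(fn):
            fn.cacheable = cacheable
            fn.udf_type = udf_type
            fn.batchable = batchable
            return fn
        return wrap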

Forward
--------
@@ -130,7 +130,7 @@ Here, is an example query that registers a UDF that wraps around the ``fasterrcn

.. code-block:: sql
CREATE UDF FastrcnnObjectDetector
CREATE UDF FastRCNNObjectDetector
IMPL 'eva/udfs/fastrcnn_object_detector.py';
@@ -140,11 +140,11 @@

.. code-block:: sql
SELECT FastrcnnObjectDetector(data) FROM MyVideo WHERE id < 5;
SELECT FastRCNNObjectDetector(data) FROM MyVideo WHERE id < 5;
Drop the UDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: sql
DROP UDF IF EXISTS FastrcnnObjectDetector;
DROP UDF IF EXISTS FastRCNNObjectDetector;
2 changes: 1 addition & 1 deletion eva/README.md
@@ -4,7 +4,7 @@

* `server` - Code for launching server that sends client commands to command handler.
* `parser` - Converts SQL queries to statements (e.g., CREATE, SELECT, INSERT, and LOAD statements).
* In a SELECT statement, some tokens are direclty mapped to expressions (`expression`). For instance, an user-defined function is mapped to function expression.
* In a SELECT statement, some tokens are directly mapped to expressions (`expression`). For instance, an user-defined function is mapped to function expression.
* `optimizer / statement_to_opr_convertor.py` - Optimizer transforms every statement to a tree-structured query plan (`optimizer / operators.py`).
* For statements other than complex SELECT queries, it is mostly one-to-one mapping from statement to operator tree.
* SELECT statements are expanded to different operators PROJECT and FILTER, etc.
20 changes: 2 additions & 18 deletions eva/binder/statement_binder.py
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from functools import singledispatchmethod
from pathlib import Path

from eva.binder.binder_utils import (
@@ -46,22 +46,6 @@
from eva.utils.generic_utils import get_file_checksum, load_udf_class_from_file
from eva.utils.logging_manager import logger

if sys.version_info >= (3, 8):
from functools import singledispatchmethod
else:
# https://stackoverflow.com/questions/24601722/how-can-i-use-functools-singledispatch-with-instance-methods
from functools import singledispatch, update_wrapper

def singledispatchmethod(func):
dispatcher = singledispatch(func)

def wrapper(*args, **kw):
return dispatcher.dispatch(args[1].__class__)(*args, **kw)

wrapper.register = dispatcher.register
update_wrapper(wrapper, func)
return wrapper
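# Aside (illustration only, not part of statement_binder.py): on Python >= 3.8
# the standard-library decorator imported above covers the same dispatch pattern.
from functools import singledispatchmethod

class _ExampleVisitor:
    @singledispatchmethod
    def visit(self, node):
        raise NotImplementedError(f"no visitor registered for {type(node)}")

    @visit.register
    def _(self, node: int):
        return f"int node: {node}"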


class StatementBinder:
def __init__(self, binder_context: StatementBinderContext):
@@ -259,7 +243,7 @@ def _bind_func_expr(self, node: FunctionExpression):
else:
if udf_obj.type == "ultralytics":
# manually set the impl_path for yolo udfs we only handle object
# detection for now, hopefully this can be generelized
# detection for now, hopefully this can be generalized
udf_obj.impl_file_path = (
Path(f"{EVA_DEFAULT_DIR}/udfs/yolo_object_detector.py")
.absolute()
8 changes: 4 additions & 4 deletions eva/binder/statement_binder_context.py
@@ -87,7 +87,7 @@ def add_derived_table_alias(
Add a alias -> derived table column mapping
Arguments:
alias (str): name of alias
target_list: list of Tuplevalue Expression or FunctionExpression or UdfIOCatalogEntry
target_list: list of TupleValueExpression or FunctionExpression or UdfIOCatalogEntry
"""
self._check_duplicate_alias(alias)
col_alias_map = {}
@@ -108,7 +108,7 @@ def get_binded_column(
"""
Find the binded column object
Arguments:
col_name (str): columna name
col_name (str): column name
alias (str): alias name
Returns:
@@ -123,7 +123,7 @@ def raise_error():
if not alias:
alias, col_obj = self._search_all_alias_maps(col_name)
else:
# serach in all alias maps
# search in all alias maps
col_obj = self._check_table_alias_map(alias, col_name)
if not col_obj:
col_obj = self._check_derived_table_alias_map(alias, col_name)
@@ -137,7 +137,7 @@ def _check_table_alias_map(self, alias, col_name) -> ColumnCatalogEntry:
"""
Find the column object in table alias map
Arguments:
col_name (str): columna name
col_name (str): column name
alias (str): alias name
Returns:
3 changes: 1 addition & 2 deletions eva/catalog/catalog_utils.py
@@ -134,8 +134,7 @@ def construct_udf_cache_catalog_entry(
expression tree. The cache name is represented by the signature of the function
expression.
Args:
func_expr (FunctionExpression): the function expression with which the cache is
assoicated
func_expr (FunctionExpression): the function expression with which the cache is associated
Returns:
UdfCacheCatalogEntry: the udf cache catalog entry
"""
2 changes: 1 addition & 1 deletion eva/catalog/models/association_models.py
@@ -16,7 +16,7 @@

from eva.catalog.models.base_model import BaseModel

# dependency table to maintain a many-to-many relationship between udf_catalog and udf_cache_catalog. This is important to ensure that any changes to udf are propogated to udf_cache. For example, deletion of a udf should also clear the associated caches.
# dependency table to maintain a many-to-many relationship between udf_catalog and udf_cache_catalog. This is important to ensure that any changes to udf are propagated to udf_cache. For example, deletion of a udf should also clear the associated caches.

depend_udf_and_udf_cache = Table(
"depend_udf_and_udf_cache",
2 changes: 1 addition & 1 deletion eva/catalog/models/base_model.py
@@ -30,7 +30,7 @@ class CustomModel:
It skips the attributes that are not present for the model, thus if a
dict is passed with some unknown attributes for the model on creation,
it won't complain for `unkwnown field`s.
it won't complain for `unknown field`s.
Declares and int `_row_id` field for all tables
"""

4 changes: 2 additions & 2 deletions eva/catalog/models/column_catalog.py
@@ -33,7 +33,7 @@

class ColumnCatalog(BaseModel):
"""The `ColumnCatalog` catalog stores information about the columns of the table.
It maintinas the following information for each column
It maintains the following information for each column
`_row_id:` an autogenerated identifier
`_name: ` name of the column
`_type:` the type of the column, refer `ColumnType`
@@ -93,7 +93,7 @@ def array_dimensions(self):

@array_dimensions.setter
def array_dimensions(self, value: Tuple[int]):
# This tranformation converts the ANYDIM enum to
# This transformation converts the ANYDIM enum to
# None which is expected by petastorm.
# Before adding data, petastorm verifies _is_compliant_shape
# and any unknown dimension is expected to be None