docs: enable spellcheck on all python files, notebooks, and docs (#733)
We use `aspell`, `codespell`, and `pyenchant`, with the list of ignored
words kept in `spelling.txt`:

`aspell`: rst and python files
`pyenchant`: notebook markdown cells
`codespell`: rst and python files

We currently do not use `pylint` as it is slow and not very useful.
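As a rough illustration of the notebook check (a sketch only, not the repository's actual script; the notebook path is a placeholder):

```python
# Spell-check the markdown cells of one notebook with pyenchant,
# skipping any word listed in spelling.txt (assumed one word per line).
import json
import re

import enchant  # provided by the pyenchant package

ignored = set(open("spelling.txt").read().split())
checker = enchant.Dict("en_US")

notebook = json.load(open("tutorials/01-example.ipynb"))  # placeholder path
for cell in notebook["cells"]:
    if cell["cell_type"] != "markdown":
        continue
    for word in re.findall(r"[A-Za-z]+", "".join(cell["source"])):
        if word not in ignored and not checker.check(word):
            print("possible typo:", word)
```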

---------

Co-authored-by: Gaurav <gaurav21776@gmail.com>
jarulraj and gaurav274 authored May 15, 2023
1 parent b4fa64e commit 2893b3d
Showing 81 changed files with 1,575 additions and 693 deletions.
2 changes: 2 additions & 0 deletions .circleci/config.yml
@@ -89,6 +89,8 @@ jobs:
- run:
name: Install EVA package from GitHub repo with all dependencies
command: |
sudo apt-get update
sudo apt-get install -y enchant-2 aspell
"python<< parameters.v >>" -m venv test_evadb
pip install --upgrade pip
source test_evadb/bin/activate
4 changes: 4 additions & 0 deletions .gitignore
@@ -192,3 +192,7 @@ prof/
output.txt
MagicMock/
queries.txt
dep.txt

*.bak
\<Magic*
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -186,7 +186,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
* PR #495: docs: improve read-the-docs
* PR #488: docs: fix notebooks
* PR #486: docs: update notebooks + banner
* PR #476: feat: GPU jenkins support
* PR #476: feat: GPU Jenkins support

## [0.1.0] - 2022-11-12
### [Added]
@@ -256,7 +256,7 @@ Thanks to @gaurav274, @jarulraj, @xzdandy, @LordDarkula, @eloyekunle, and @devsh
### [Added]

* PR #295: Improve Error Messages and Query responses
* PR #292: Upating read the docs + website
* PR #292: Updating read the docs + website
* PR #288: Update README.md


2 changes: 1 addition & 1 deletion README.md
@@ -186,7 +186,7 @@ The following architecture diagram presents the critical components of the EVA d
|---------------|--------------|
|<img alt="Source Video" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/mnist-input.webp" width="150"> |<img alt="Query Result" src="https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/mnist-output.webp" width="150"> |

### 🔮 [Movie Emotion Analysis](https://evadb.readthedocs.io/en/stable/source/tutorials/03-emotion-analysis.html) (Face Detection + Emotion Classfication Models)
### 🔮 [Movie Emotion Analysis](https://evadb.readthedocs.io/en/stable/source/tutorials/03-emotion-analysis.html) (Face Detection + Emotion Classification Models)

| Source Video | Query Result |
|---------------|--------------|
4 changes: 2 additions & 2 deletions data/ua_detrac/README.md
@@ -15,7 +15,7 @@
* __8250 vehicles__ that are manually annotated
* Vehicle categories are __Car, Bus, Van,__ and __Other__
* Weather categories are __Night, Sunny, Rainy,__ and __Cloudy__
* Other annotations include __Scale of Vehicle, Occulsion Ratio,__ and __Truncation Ratio__.
* Other annotations include __Scale of Vehicle, Occlusion Ratio,__ and __Truncation Ratio__.



@@ -52,7 +52,7 @@ __Position information of target trajectories out of the general background, whi

__Unzipping the dataset__

You can use your own method for unzipping .zip files or for conveniency
You can use your own method for unzipping .zip files or for convenience

`sudo apt-get install unzip`

4 changes: 2 additions & 2 deletions docs/index.rst
@@ -46,7 +46,7 @@ Key Features

1. With EVA, you can **easily combine SQL and deep learning models to build next-generation database applications**. EVA treats deep learning models as functions similar to traditional SQL functions like SUM().

2. EVA is **extensible by design**. You can write an **user-defined function** (UDF) that wraps arounds your custom deep learning model. In fact, all the built-in models that are included in EVA are written as user-defined functions.
2. EVA is **extensible by design**. You can write an **user-defined function** (UDF) that wraps around your custom deep learning model. In fact, all the built-in models that are included in EVA are written as user-defined functions.

3. EVA comes with a collection of **built-in sampling, caching, and filtering optimizations** inspired by relational database systems. These optimizations help **speed up queries on large datasets and save money spent on model inference**.

@@ -107,7 +107,7 @@ Illustrative EVA Applications

|pic3| |pic4|

|:desert_island:| Movie Analysis Application using Face Detection + Emotion Classfication Models
|:desert_island:| Movie Analysis Application using Face Detection + Emotion Classification Models
~~~~

.. |pic5| image:: https://github.com/georgia-tech-db/eva/releases/download/v0.1.0/gangubai-input.webp
28 changes: 14 additions & 14 deletions docs/source/contribute/new_command.rst
@@ -7,7 +7,7 @@ Command Handler
----

An input query string is handled by **Parser**,
**StatementTOPlanConvertor**, **PlanGenerator**, and **PlanExecutor**.
**StatementTOPlanConverter**, **PlanGenerator**, and **PlanExecutor**.
We discuss each part separately.

.. code:: python
@@ -19,7 +19,7 @@ We discuss each part separately.
#1. parser
stmt = Parser().parse(query)[0]
#2. statement to logical plan
l_plan = StatementToPlanConvertor().visit(stmt)
l_plan = StatementToPlanConverter().visit(stmt)
#3. logical to physical plan
p_plan = PlanGenerator().build(l_plan)
#4. parser
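# (hypothetical continuation: the executor step is collapsed in this hunk,
#  so the call below is an assumption rather than the documented snippet)
output = PlanExecutor(p_plan).execute_plan()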
@@ -31,7 +31,7 @@ We discuss each part separately.
---------

The parser firstly generate **syntax tree** from the input string, and
then tansform syntax tree into **statement**.
then transform syntax tree into **statement**.

The first part of Parser is build from a LARK grammar file.

@@ -68,7 +68,7 @@ parser/lark_visitor

.. code:: python
from src.parser.parser_visitor._create_statement import CenameTable
from src.parser.parser_visitor._create_statement import CreateTable
class ParserVisitor(CommonClauses, CreateTable, Expressions,
Functions, Insert, Select, TableSources,
Load, Upload):
@@ -80,9 +80,9 @@ parser/
called in ``_[cmd]_statement.py``
- ``types.py`` - register new StatementType

.. _2-statementtoplanconvertor:
.. _2-statementtoplanconverter:

2. Statement To Plan Convertor
2. Statement To Plan Converter
---------------------------

The part transforms the statement into corresponding logical plan.
@@ -123,7 +123,7 @@ Optimizer
.. code:: python
# May need to convert the statement into another data type.
# The new data type is usable for excuting command.
# The new data type is usable for executing command.
# For example, column_list -> column_metadata_list
def visit_create(self, statement: AbstractStatement):
@@ -139,7 +139,7 @@ Optimizer
video_ref, column_metadata_list, if_not_exists)
self._plan = create_opr
- modify visit function to call the right visit_[cmd] funciton
- modify visit function to call the right visit_[cmd] function

.. code:: python
@@ -186,9 +186,9 @@ optimizer/rules

- Import operators
- Register new ruletype to **RuleType** and **Promise** (place it
**before IMPLEMENTATION_DELIMETER** !!)
- implement class ``Logical[cmd]ToPhysical``, its memeber function
apply() will construct a corresbonding\ ``[cmd]Plan`` object.
**before IMPLEMENTATION_DELIMITER** !!)
- implement class ``Logical[cmd]ToPhysical``, its member function
apply() will construct a corresponding\ ``[cmd]Plan`` object.

.. code:: python
@@ -210,14 +210,14 @@ optimizer/rules
- ``rules_base.py``-

- Register new ruletype to **RuleType** and **Promise** (place it
**before IMPLEMENTATION_DELIMETER** !!)
**before IMPLEMENTATION_DELIMITER** !!)

- ``rules_manager.py``-

- Import rules created in ``rules.py``
- Add imported logical to physical rules to ``self._implementation_rules``

.. _4-planexcutor:
.. _4-PlanExecutor:

4. Plan Executor
--------------
@@ -259,7 +259,7 @@ Key data structures in EVA:

- ``file_url`` - used to access the real table in storage engine.

- For the ``RENAME`` table command, we use the ``old_table_name`` to access the corresponing entry in metadata table, and the ``modified name`` of the table.
- For the ``RENAME`` table command, we use the ``old_table_name`` to access the corresponding entry in metadata table, and the ``modified name`` of the table.

- **Storage Engine**:

2 changes: 1 addition & 1 deletion docs/source/overview/aidb.rst
@@ -1,7 +1,7 @@
EVA AI-Relational Database System
====

Over the last decade, deep learning models have radically changed the world of computer vision and natural languague processing. They are accurate on a variety of tasks ranging from image classification to question answering. However, there are two challenges that prevent a lot of users from benefiting from these models.
Over the last decade, deep learning models have radically changed the world of computer vision and natural language processing. They are accurate on a variety of tasks ranging from image classification to question answering. However, there are two challenges that prevent a lot of users from benefiting from these models.

Usability and Application Maintainability
^^^^
2 changes: 1 addition & 1 deletion docs/source/reference/evaql/udf.rst
@@ -19,7 +19,7 @@ Here is a list of built-in user-defined functions in EVA.
FastRCNNObjectDetector is a model for detecting objects. MVITActionRecognition is a model for recognizing actions.

ArrayCount and Crop are utility functions for counting the number of objects in an array and cropping a bounding box from an image, resepectively.
ArrayCount and Crop are utility functions for counting the number of objects in an array and cropping a bounding box from an image, respectively.

SELECT WITH MULTIPLE UDFS
----
2 changes: 1 addition & 1 deletion docs/source/reference/gpu.rst
@@ -10,7 +10,7 @@ Configure GPU
A valid output from the command indicates that your GPU is configured and ready to use. If not, you will need to install the appropriate GPU driver. `This page <https://towardsdatascience.com/deep-learning-gpu-installation-on-ubuntu-18-4-9b12230a1d31>`_ provides a step-by-step guide on installing and configuring the GPU driver in the Ubuntu Operating System.

* When installing an NVIDIA driver, ensure that the version of the GPU driver is correct to avoid compatibiility issues.
* When installing an NVIDIA driver, ensure that the version of the GPU driver is correct to avoid compatibility issues.
* When installing cuDNN, you will need to create an account and ensure that you get the correct `deb` files for your operating system and architecture.

2. You can run the following code in a Jupyter notebook to verify that your GPU is detected by PyTorch:
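A minimal check along these lines works (a sketch; the exact snippet in the documentation is collapsed in this diff):

.. code-block:: python

    import torch

    # True when PyTorch can see a CUDA device; print its name as a sanity check
    if torch.cuda.is_available():
        print("GPU detected:", torch.cuda.get_device_name(0))
    else:
        print("No GPU detected by PyTorch")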
2 changes: 1 addition & 1 deletion docs/source/reference/udf.rst
@@ -43,7 +43,7 @@ Example of a Setup function

.. code-block:: python
@setup(cachable=True, udf_type="object_detection", batchable=True)
@setup(cacheable=True, udf_type="object_detection", batchable=True)
def setup(self, threshold=0.85):
#custom setup function that is specific for the UDF
self.threshold = threshold
10 changes: 5 additions & 5 deletions docs/source/reference/udfs/custom.rst
@@ -45,15 +45,15 @@ Example of the `setup` function:

.. code-block:: python
@setup(cachable=True, udf_type="object_detection", batchable=True)
@setup(cacheable=True, udf_type="object_detection", batchable=True)
def setup(self, threshold=0.85):
self.threshold = threshold
self.model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
weights="COCO_V1", progress=False
)
self.model.eval()
In this instance, we have configured the `cachable` and `batchable` attributes to `True`. As a result, EVA will cache the UDF outputs and utilize batch processing for increased efficiency.
In this instance, we have configured the `cacheable` and `batchable` attributes to `True`. As a result, EVA will cache the UDF outputs and utilize batch processing for increased efficiency.
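As a rough illustration of the mechanism (not EVA's actual implementation), a decorator like `setup` can simply record these flags on the decorated method so the engine can inspect them later:

.. code-block:: python

    # Illustration only: one way such a decorator could attach the flags.
    def setup(cacheable=False, udf_type=None, batchable=False):
        def wrap(fn):
            fn.cacheable = cacheable
            fn.udf_type = udf_type
            fn.batchable = batchable
            return fn
        return wrap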

Forward
--------
@@ -130,7 +130,7 @@ Here, is an example query that registers a UDF that wraps around the ``fasterrcn

.. code-block:: sql
CREATE UDF FastrcnnObjectDetector
CREATE UDF FastRCNNObjectDetector
IMPL 'eva/udfs/fastrcnn_object_detector.py';
@@ -140,11 +140,11 @@

.. code-block:: sql
SELECT FastrcnnObjectDetector(data) FROM MyVideo WHERE id < 5;
SELECT FastRCNNObjectDetector(data) FROM MyVideo WHERE id < 5;
Drop the UDF
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: sql
DROP UDF IF EXISTS FastrcnnObjectDetector;
DROP UDF IF EXISTS FastRCNNObjectDetector;
2 changes: 1 addition & 1 deletion eva/README.md
@@ -4,7 +4,7 @@

* `server` - Code for launching server that sends client commands to command handler.
* `parser` - Converts SQL queries to statements (e.g., CREATE, SELECT, INSERT, and LOAD statements).
* In a SELECT statement, some tokens are direclty mapped to expressions (`expression`). For instance, an user-defined function is mapped to function expression.
* In a SELECT statement, some tokens are directly mapped to expressions (`expression`). For instance, an user-defined function is mapped to function expression.
* `optimizer / statement_to_opr_convertor.py` - Optimizer transforms every statement to a tree-structured query plan (`optimizer / operators.py`).
* For statements other than complex SELECT queries, it is mostly one-to-one mapping from statement to operator tree.
* SELECT statements are expanded to different operators PROJECT and FILTER, etc.
20 changes: 2 additions & 18 deletions eva/binder/statement_binder.py
@@ -12,7 +12,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from functools import singledispatchmethod
from pathlib import Path

from eva.binder.binder_utils import (
@@ -46,22 +46,6 @@
from eva.utils.generic_utils import get_file_checksum, load_udf_class_from_file
from eva.utils.logging_manager import logger

if sys.version_info >= (3, 8):
from functools import singledispatchmethod
else:
# https://stackoverflow.com/questions/24601722/how-can-i-use-functools-singledispatch-with-instance-methods
from functools import singledispatch, update_wrapper

def singledispatchmethod(func):
dispatcher = singledispatch(func)

def wrapper(*args, **kw):
return dispatcher.dispatch(args[1].__class__)(*args, **kw)

wrapper.register = dispatcher.register
update_wrapper(wrapper, func)
return wrapper
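# Aside (illustration only, not part of statement_binder.py): on Python >= 3.8
# the standard-library decorator imported above covers the same dispatch pattern.
from functools import singledispatchmethod

class _ExampleVisitor:
    @singledispatchmethod
    def visit(self, node):
        raise NotImplementedError(f"no visitor registered for {type(node)}")

    @visit.register
    def _(self, node: int):
        return f"int node: {node}"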


class StatementBinder:
def __init__(self, binder_context: StatementBinderContext):
@@ -259,7 +243,7 @@ def _bind_func_expr(self, node: FunctionExpression):
else:
if udf_obj.type == "ultralytics":
# manually set the impl_path for yolo udfs we only handle object
# detection for now, hopefully this can be generelized
# detection for now, hopefully this can be generalized
udf_obj.impl_file_path = (
Path(f"{EVA_DEFAULT_DIR}/udfs/yolo_object_detector.py")
.absolute()
8 changes: 4 additions & 4 deletions eva/binder/statement_binder_context.py
@@ -87,7 +87,7 @@ def add_derived_table_alias(
Add a alias -> derived table column mapping
Arguments:
alias (str): name of alias
target_list: list of Tuplevalue Expression or FunctionExpression or UdfIOCatalogEntry
target_list: list of TupleValueExpression or FunctionExpression or UdfIOCatalogEntry
"""
self._check_duplicate_alias(alias)
col_alias_map = {}
@@ -108,7 +108,7 @@ def get_binded_column(
"""
Find the binded column object
Arguments:
col_name (str): columna name
col_name (str): column name
alias (str): alias name
Returns:
@@ -123,7 +123,7 @@ def raise_error():
if not alias:
alias, col_obj = self._search_all_alias_maps(col_name)
else:
# serach in all alias maps
# search in all alias maps
col_obj = self._check_table_alias_map(alias, col_name)
if not col_obj:
col_obj = self._check_derived_table_alias_map(alias, col_name)
@@ -137,7 +137,7 @@ def _check_table_alias_map(self, alias, col_name) -> ColumnCatalogEntry:
"""
Find the column object in table alias map
Arguments:
col_name (str): columna name
col_name (str): column name
alias (str): alias name
Returns:
3 changes: 1 addition & 2 deletions eva/catalog/catalog_utils.py
@@ -134,8 +134,7 @@ def construct_udf_cache_catalog_entry(
expression tree. The cache name is represented by the signature of the function
expression.
Args:
func_expr (FunctionExpression): the function expression with which the cache is
assoicated
func_expr (FunctionExpression): the function expression with which the cache is associated
Returns:
UdfCacheCatalogEntry: the udf cache catalog entry
"""
2 changes: 1 addition & 1 deletion eva/catalog/models/association_models.py
@@ -16,7 +16,7 @@

from eva.catalog.models.base_model import BaseModel

# dependency table to maintain a many-to-many relationship between udf_catalog and udf_cache_catalog. This is important to ensure that any changes to udf are propogated to udf_cache. For example, deletion of a udf should also clear the associated caches.
# dependency table to maintain a many-to-many relationship between udf_catalog and udf_cache_catalog. This is important to ensure that any changes to udf are propagated to udf_cache. For example, deletion of a udf should also clear the associated caches.

depend_udf_and_udf_cache = Table(
"depend_udf_and_udf_cache",
2 changes: 1 addition & 1 deletion eva/catalog/models/base_model.py
@@ -30,7 +30,7 @@ class CustomModel:
It skips the attributes that are not present for the model, thus if a
dict is passed with some unknown attributes for the model on creation,
it won't complain for `unkwnown field`s.
it won't complain for `unknown field`s.
Declares and int `_row_id` field for all tables
"""

4 changes: 2 additions & 2 deletions eva/catalog/models/column_catalog.py
@@ -33,7 +33,7 @@

class ColumnCatalog(BaseModel):
"""The `ColumnCatalog` catalog stores information about the columns of the table.
It maintinas the following information for each column
It maintains the following information for each column
`_row_id:` an autogenerated identifier
`_name: ` name of the column
`_type:` the type of the column, refer `ColumnType`
@@ -93,7 +93,7 @@ def array_dimensions(self):

@array_dimensions.setter
def array_dimensions(self, value: Tuple[int]):
# This tranformation converts the ANYDIM enum to
# This transformation converts the ANYDIM enum to
# None which is expected by petastorm.
# Before adding data, petastorm verifies _is_compliant_shape
# and any unknown dimension is expected to be None