Commit

add spell-checking to CI
darthtrevino committed Oct 26, 2023
1 parent 37a78e9 commit 41e43df
Showing 112 changed files with 1,311 additions and 259 deletions.
16 changes: 13 additions & 3 deletions .vscode/settings.json
@@ -31,8 +31,8 @@
"*.jsx": "${capture}.js",
"*.tsx": "${capture}.ts, ${capture}.hooks.ts, ${capture}.hooks.tsx, ${capture}.stories.tsx, ${capture}.story.tsx, ${capture}.spec.tsx, ${capture}.base.ts, ${capture}.base.tsx, ${capture}.types.ts, ${capture}.styles.ts, ${capture}.styles.tsx, ${capture}.utils.ts, ${capture}.utils.tsx, ${capture}.constants.ts, ${capture}.module.scss, ${capture}.module.css, ${capture}.md, ${capture}.css",
"tsconfig.json": "tsconfig.*.json",
-"package.json": "turbo.json, tsconfig.json, rome.json, .npmignore",
-"README.md": "SECURITY.md, SUPPORT.md, CODE_OF_CONDUCT.md, LICENSE",
+"package.json": "package-lock.json, turbo.json, tsconfig.json, rome.json, .npmignore, dictionary.txt, cspell.config.yaml",
+"README.md": "SECURITY.md, SUPPORT.md, CODE_OF_CONDUCT.md, LICENSE, CODEOWNERS",
".eslintrc": ".eslintignore",
".prettierrc": ".prettierignore",
".gitattributes": ".gitignore",
@@ -52,5 +52,15 @@
"source.organizeImports": true
}
},
-"autoDocstring.docstringFormat": "sphinx"
+"autoDocstring.docstringFormat": "sphinx",
+"cSpell.customDictionaries": {
+"project-words": {
+"name": "project-words",
+"path": "${workspaceRoot}/dictionary.txt",
+"description": "Words used in this project",
+"addWords": true
+},
+"custom": true,
+"internal-terms": true
+}
}
4 changes: 2 additions & 2 deletions README.md
@@ -7,13 +7,13 @@ There are four goals of the project:
1. Create a shareable client/server schema for serialized wrangling instructions. This is in the ./schema folder. TypeScript types and JSONSchema generation is in javascript/schema, and published schemas are copied out to ./schema along with test cases that are executed by JavaScript and Python builds to ensure parity.
2. Maintain an implementation of a basic client-side wrangling engine (largely based on [Arquero](https://github.com/uwdata/arquero)). This is in the ./javascript folder.
3. Maintain a python implementation using common wrangling libraries (e.g., [pandas](https://pandas.pydata.org/)) for backend or data science deployments. This is in the ./python folder.
-4. Provide some reusable React components so wrangling operations can be incorporated into webapps easily. This is in the ./javascript/react folder.
+4. Provide some reusable React components so wrangling operations can be incorporated into web applications easily. This is in the ./javascript/react folder.

Individual documentation for the JavaScript and Python implementations can be found in their respective folders. Broad documentation about building pipelines and the available verbs is available in the [docs](docs) folder

We currently have six primary JavaScript packages:

-- [react](javascript/react/docs/markdown/index.md) - this is a set of React components for each verb that you can include in web apps that enable tranformation pipeline building.
+- [react](javascript/react/docs/markdown/index.md) - this is a set of React components for each verb that you can include in web apps that enable transformation pipeline building.
- [schema](javascript/schema/docs/markdown/index.md) - this is a set of core types and associated JSONSchema definitions for formalizing our data package and resource models (including the definitions for table parsing, Codebooks, and Workflows).
- [tables](javascript/tables/docs/markdown/index.md) - this is the primary set of utilities for loading and parsing data tables, using Arquero under the hood.
- [utilities](javascript/utilities/docs/markdown/index.md) - this is a set of helpers for working with files, etc., to ease building data wrangling applications.
26 changes: 26 additions & 0 deletions cspell.config.yaml
@@ -0,0 +1,26 @@
$schema: https://raw.githubusercontent.com/streetsidesoftware/cspell/main/cspell.schema.json
version: '0.2'
allowCompoundWords: true
dictionaryDefinitions:
- name: dictionary
path: './dictionary.txt'
addWords: true
dictionaries:
- dictionary
ignorePaths:
- 'node_modules'
- 'storybook-static'
- 'output'
- 'dist'
- 'build'
- 'javascript/*/docs'
- './javascript/webapp/public/schema'
- './schema'
- .turbo
- '*.csv'
- '*.parquet'
- '*.arrow'
- smoking.json
- __pycache__
- pyproject.toml
- '*.ipynb'
94 changes: 94 additions & 0 deletions dictionary.txt
@@ -0,0 +1,94 @@
# Python Idioms
PYTHONPATH
pycache
nopython
virtualenv
pyproject
ipynb
pymodule

# JavaScript Idioms
QNAN
href
hrefs
noscript

# Libraries
Arquero
pandarallel
numpy
linspace
immer
ahooks
fluentui

# Library Methods/Args
Expando
dtype
mkdirp
iloc
virtualenvs
iterrows
dropna
astype
aggfunc
fillna
isna
arange
bindvar
atable
strptime
isin

# Technical Terms / Studies
Freedman-Diaconis
Doane
Sturges
NHEFS
NHANES
Hyattsville
NCHS
QUDT
Subform
binarized
binnable

# Verbs
binarize
genid
umap
concat
onehot
unhot
groupby
ungroup
unorder
rollup
dedupe
cume_dist

# Args & Functions
stdevp
nand
xnor
nunique
unapply
unlisten
toposort
castable
stdev

# Corporate Terms
MSRC
msrc
Dayenne
Souza
Carvajal
Worthen
Blanco

# Test
derp
Hola
ABCDEFGHIJKLMPQRSTUVWXYZ
ZNGA
2 changes: 1 addition & 1 deletion docs/README.md
@@ -1,4 +1,4 @@
-The core idea with these components is largely analagous to the object-oriented chain-of-responsibility pattern. We construct a workflow, which is a series of table transformation steps (e.g., middleware). We supply a table store to read and write to (e.g., context). After workflow execution is complete, we retrieve one or more output tables from the context.
+The core idea with these components is largely analogous to the object-oriented chain-of-responsibility pattern. We construct a workflow, which is a series of table transformation steps (e.g., middleware). We supply a table store to read and write to (e.g., context). After workflow execution is complete, we retrieve one or more output tables from the context.

The fundamental unit of work in the system is a **verb**. Verbs represent primitive operations that return a table. Most verbs require an input table to transform.

8 changes: 4 additions & 4 deletions docs/datatypes.md
@@ -3,9 +3,9 @@ Data types present a number of thorny edge cases when dealing with different lan

## Common tricky use cases:
- Text-based data files may contain strings that represent primitive values. Parsing these files should respect the data file's intent even if it overrides default language behavior. The most common example of this is probably boolean data columns with the values "true" and "false". JavaScript will naturally parse any non-empty string as `true`, so "false" -> `true`. A similar situation has been observed with "null".
-- Dates can be represented in a wide variey of formats, and parsing/guessing implementations differ by platform and library.
+- Dates can be represented in a wide variety of formats, and parsing/guessing implementations differ by platform and library.
- `new Date()` in JavaScript is problematic, and may also conflict with pandas' [date guessing](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html?highlight=date#date-handling).
-- Autotomatic type discovery for columns is performed by both Arquero and pandas, but may have different results.
+- Automatic type discovery for columns is performed by both Arquero and pandas, but may have different results.
- Some verbs can only be performed on certain data types, and other verbs can work with different data types but have different operators available. For example:
- [bin](./verbs/bin.md) requires numeric input types.
- [filter](./verbs/filter.md) requires different comparison operators depending on type (e.g., string 'contains' versus numeric 'less than').
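
As a rough illustration of the string-to-boolean pitfall described in the list above, the following Python sketch parses boolean text explicitly instead of relying on truthiness. The `parse_bool` helper and the spellings it accepts are assumptions made for this example, not the project's parser. Python is used because, like JavaScript, it treats any non-empty string as truthy.

```python
from typing import Optional

# Hypothetical spellings accepted by this sketch (compared case-insensitively);
# the defaults of a real reader such as pandas may differ.
TRUE_STRINGS = {"true"}
FALSE_STRINGS = {"false"}


def parse_bool(text: str) -> Optional[bool]:
    """Return True/False for recognized boolean text, or None when the text is not a boolean."""
    normalized = text.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    # Anything else (including "null" or an empty cell) is treated as missing.
    return None


# Plain truthiness evaluates the string "false" as True, which is the bug to avoid.
truthy_result = bool("false")
parsed_result = parse_bool("false")
```

Returning `None` for unrecognized text keeps missing values distinct from `False`, so a later step can decide how nulls should propagate.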
@@ -19,8 +19,8 @@ The following rules will be observed across implementations to ensure consistent
- Pandas' [missing data logic](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data) will be used for computations and boolean evaluations.
- In general, this means null values are carried forward and may result in null outputs.
- For boolean comparisons, null propagation is situation-dependent (see [three-valued logic](https://en.wikipedia.org/wiki/Three-valued_logic#Kleene_and_Priest_logics)). For example, if any operand in an OR comparison is `true`, the evaluation can return `true` even with nulls present.
-- Coercing unparseable strings to dates will result in an `Invalid Date` (JavaScript) or `NaT` (pandas.to_datetime with errors='coerce').
-- Coercing unparseable strings to numbers will result in `NaN` (pandas.to_numeric with errors='coerce').
+- Coercing unparsable strings to dates will result in an `Invalid Date` (JavaScript) or `NaT` (pandas.to_datetime with errors='coerce').
+- Coercing unparsable strings to numbers will result in `NaN` (pandas.to_numeric with errors='coerce').
- When reading text files, the pandas default strings for [missing values](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#na-values) and [booleans](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#boolean-values) will be used.
- [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) will be used for standard date formatting. Other date formats will not be auto-guessed.
- When providing a custom parse or format pattern, we follow python and use the [1989 C standard tokens](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior). [d3-time-format](https://github.com/d3/d3-time-format) supports this standard for JavaScript.
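
The date rules above can be sketched with the Python standard library alone. This is a minimal illustration, not the project's implementation: `parse_date` is a hypothetical helper, and returning `None` stands in for the `NaT` / `Invalid Date` results mentioned in the rules. Without a pattern it accepts only ISO 8601 text; with a pattern it uses the 1989 C standard tokens via `datetime.strptime`.

```python
from datetime import datetime
from typing import Optional


def parse_date(text: str, pattern: Optional[str] = None) -> Optional[datetime]:
    """Parse ISO 8601 text by default, or use a strptime pattern when one is supplied.

    Unparsable input yields None (a stand-in for NaT) instead of raising.
    """
    try:
        if pattern is None:
            # No format guessing: only ISO 8601 strings are accepted here.
            return datetime.fromisoformat(text)
        return datetime.strptime(text, pattern)
    except ValueError:
        return None


iso_value = parse_date("2023-10-26")
custom_value = parse_date("26/10/2023", "%d/%m/%Y")
bad_value = parse_date("26/10/2023")  # not ISO 8601 and no pattern given
```

Both successful calls produce the same `datetime`, while the third shows that a non-ISO string is not guessed at when no pattern is provided.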
2 changes: 1 addition & 1 deletion docs/resources/tablebundle/index.md
@@ -2,7 +2,7 @@

The **table bundle** brings together all of the component resources to fully materialize a table for analytic use. This can include a base [data table](../datatable/index.md) (e.g., CSV or JSON data file), a [codebook](../codebook/index.md) that defines the schema, and a [workflow](../workflow/index.md) defining transformations to apply.

-You can contruct table bundles in a variety of ways, including symlinking from one to another to create derived collections that dynamically update as child dependencies are modified.
+You can construct table bundles in a variety of ways, including symlinking from one to another to create derived collections that dynamically update as child dependencies are modified.

## Table view

2 changes: 1 addition & 1 deletion docs/verbs/bin.md
@@ -7,7 +7,7 @@ _The input column for a binning operation must be a numeric data type._
Multiple binning strategies are supported. Please see the [numpy documentation](https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html) for detailed descriptions of the algorithms.

- Auto: uses automatic bin boundary guessing to create optimal default bins.
-- Fd: Freedman diaconis estimator, resilient to outliers.
+- Fd: Freedman-Diaconis estimator, resilient to outliers.
- Doane: Better for non-normal datasets.
- Scott: Less robust but takes data variability into account.
- Stone: Based on leave-one-out cross-validation.
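
To make one of the strategies above concrete, here is a small sketch of the Freedman-Diaconis bin width, computed as 2 * IQR / n^(1/3). It uses only the Python standard library and is an illustration under stated assumptions rather than the code the project runs: `fd_bin_width` is a hypothetical helper, and the quartiles use the `inclusive` method, which interpolates linearly between data points.

```python
import statistics
from typing import Sequence


def fd_bin_width(values: Sequence[float]) -> float:
    """Freedman-Diaconis bin width: 2 * IQR / cube root of the sample size."""
    n = len(values)
    # quantiles() sorts the data internally; n=4 yields the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return 2 * (q3 - q1) / n ** (1 / 3)


sample = [1, 2, 3, 4, 5, 6, 7, 8]
width = fd_bin_width(sample)
```

Because the width depends on the interquartile range rather than the full range, a single extreme value changes it very little, which is why the estimator is described as resilient to outliers.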
2 changes: 1 addition & 1 deletion docs/verbs/rollup.md
@@ -18,7 +18,7 @@ Performs aggregation operations on table columns. Normally the table should be [
- `median`: finds the median of the values
- `stdev`: computes the standard deviation of the values
- `stdevp`: computes the population standard deviation of the values
-- `variance`: computes the variane of the values
+- `variance`: computes the variance of the values
- `array_agg`: collects all of the values in an array
- `array_agg_distinct`: collects all of the unique values in an array
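
The difference between `stdev` and `stdevp` in the list above can be shown with Python's `statistics` module. This sketch assumes that `stdev` denotes the sample standard deviation (n - 1 denominator), which the list does not state explicitly; `stdevp` is the population form (n denominator).

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(values)

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(values)
```

For this data the squared deviations from the mean of 5 sum to 32, so the population value is sqrt(32 / 8) = 2 and the sample value is sqrt(32 / 7), which is slightly larger.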

2 changes: 1 addition & 1 deletion javascript/app-framework/README.md
@@ -1,6 +1,6 @@
# app-framework

-The DataShaper app-framework package provides infrastructure for creating new applications that include core DataShaper functionality by default, as well as extensibility to build your own interfaces that are managedby the system consistently.
+The DataShaper app-framework package provides infrastructure for creating new applications that include core DataShaper functionality by default, as well as extensibility to build your own interfaces that are managed by the system consistently.

## Resources

2 changes: 1 addition & 1 deletion javascript/app-framework/docs/markdown/app-framework.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions javascript/app-framework/docs/report/app-framework.api.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

@@ -18,7 +18,7 @@ export const Delimiter: React.FC<{
useBoolean(false)
const [value, setValue] = useState(isOther ? selected : '')

-const onDelimeterChange = useCallback(
+const onDelimiterChange = useCallback(
(option?: IChoiceGroupOption) => {
if (option?.key === 'Other') {
customDelimiter()
@@ -38,7 +38,7 @@ export const Delimiter: React.FC<{
},
)

-const onChangeCustomDelimeter = useCallback(
+const onChangeCustomDelimiter = useCallback(
(
_: React.FormEvent<HTMLInputElement | HTMLTextAreaElement>,
newValue?: string,
@@ -58,15 +58,15 @@ export const Delimiter: React.FC<{
label='Delimiter'
defaultSelectedKey={selected}
options={delimiterOptions}
-onChange={(_, option) => onDelimeterChange(option)}
+onChange={(_, option) => onDelimiterChange(option)}
/>
<TextField
autoComplete='off'
title='custom delimiter'
name='customDelimiter'
disabled={!isOther}
value={value}
-onChange={onChangeCustomDelimeter}
+onChange={onChangeCustomDelimiter}
/>
</DelimiterContainer>
)

@@ -49,8 +49,8 @@ export const Parser: React.FC<ParserProps> = memo(function Parser({ parser }) {
<FlexContainer>
<Delimiter
selected={delimiter}
-onChange={(delim: string) => {
-parser.delimiter = delim
+onChange={(delimiter: string) => {
+parser.delimiter = delimiter
}}
/>
</FlexContainer>

@@ -9,7 +9,7 @@ import type { FileDefinition } from '../ResourcesPane/index.js'

export interface DataShaperAppProps<T = unknown> {
/**
-* CSS Classname
+* CSS class name
*/
className?: string


@@ -22,7 +22,7 @@ import type { FileDefinition } from './ResourcesPane.types.js'
import { useLoadDataPackage } from '../../../hooks/useLoadDataPackage.js'

/**
-* Gets the file-managament commandbar items
+* Gets the file-management commandbar items
*
* @param examples - The provided examples
* @param expanded - Whether the pane is expended

@@ -65,7 +65,7 @@ function makeTreeItem(
// (a) rendered if empty, so that options can be selected, and
// (b) rendered _in place of_ the child resource, so we don't have redundant child entries.
// so:
-// 1: iterate the field wells if present, creating a tree item for each. these should have no href, so are "unclickable"
+// 1: iterate the field wells if present, creating a tree item for each. these should have no href, so are "un-clickable"
// 1.1: if any well has a selected key, then we don't need to render the child resource, save it for later
// 2: iterate the child resources and create an item for each one that isn't already marked from the wells
const handled = new Set<string>()

@@ -92,7 +92,7 @@ export function useOnDeleteStep(
* Get a function to call when a step is created
* @param save - The save function to call when the step is created
* @param selectOutput - A function to select the output after the step is created
-* @param dismissModal - The function used to dismill the modal
+* @param dismissModal - The function used to dismiss the modal
* @returns
*/
export function useOnCreateStep(