-
Notifications
You must be signed in to change notification settings - Fork 44
Deep Data Structures Manipulation Design
Ilya Sher edited this page Oct 18, 2021
·
26 revisions
As data-focused programming language, Next Generation Shell should have deep data structures manipulaton as part of the standard library.
NGS GitHub Issue #437: Design Deep Data Structures Operations
- Minimal syntax changes (implemented
pattern -> action
) - Customizable for user-defined types (postponed)
- Should work with tree-like data structure where "pre" and "post" visit are alternatives (processing the node or the children first)
- Should work with flat data structures as well (
Arr
,Hash
) (implemented) - Integrate well with existing facilities (implemented)
-
=~
and patterns - Universal Patten Matching
-
- support transformation (map-like behavior) (implemented)
- support filtering (filter-like behavior) (implemented)
- provide context to
deep_...
functions callbacks (path, etc), not only current value (implemented) - support passing arbitrary values from the callback to other callbacks invocations (todo)
- support in-place modifications? (maybe later)
- maybe: support for data which is not present yet - when needed additional data will be lazily fetched (think AWS results pagination, where you maybe find what you are looking for before fetching all pages) (maybe later)
- maybe: support implicit loops (not sure what's the use case) (maybe later)
- maybe: support of pub/sub (maybe somewhat like Go channels and
select
) (maybe later)
- Remove all
Hash
es that match{'url': ...}
which are in an array which resides underextension
key in anyHash
in the data. (implemented:my_data.tr({"url": AnyOf("http://hl7.org/fhir/StructureDefinition/maxValue", "http://hl7.org/fhir/StructureDefinition/minValue")} -> skip(Y))
) - Remove all empty arrays/hashes/strings (or strings containing only spaces) from a data structure (implemented:
my_data.tr([] -> skip(Y))
) - Parse JSON files into NGS objects (more thought needed regarding schema and serialization/deserialization in general) (maybe later)
- Prepend/append element to an array (implemented:
tr([1,2,[3,4]], Arr -> { ["start"] + descend(B) + ["end"] } )
) - "Deep uniq" - remove duplicate primitive values anywhere, remove empty Hashes or Arrays. See https://github.com/ngs-lang/nsd/blob/5c6cd31930cabe28b20bea6fb13e0631e655e2f5/data-manipulation/deep-uniq.ngs (maybe later)
In no particular order:
- XPath
- XSLT
- FHIRPath - note that FHIRPath was created for validation and hence lacks semantics for modification of the data. "... needs for path navigation, selection and formulation of invariants ..."
- GraphQL - not sure if it's a good match for arbitrary nested data such as FHIR
- Neo4j - native graph database platform
- https://github.com/tobega/tailspin-v0/blob/master/TailspinReference.md#projections-and-lenses (thanks u/tobega)
- JsonPath
- JMESPath
- Raku Grammars
- Libraries
- New type -
TrContext
- transformation context (implemented) -
PatternAction
objects are restricted to (Any -> Fun) type in this design, at least for now. Actions are called if patterns match.- Note that this allows defining functions like for example
skip(Any, TrContext)
and then using them likeSomething -> skip
. (don't know how to solve yet. dangerous because laterskip(?, pattern:Any)
can be defined and might potentially take precedence)
- Note that this allows defining functions like for example
- New method
tr(data:Any, transformation:Any)
UPM patterns will be used. UPM patterns need to be updated to return MatchResult
(containing both indexed and named matches)
TBD: Whether UPM functionality will be extended with something specific for tr()
.
UPM should have a way to match on the context.
Action callbacks (in pattern -> action
) will be called with the following arguments:
- Current matched data
- Context object with:
-
[N]
- Positional matches (up until now) -
.something
- Named matches (up until now), can not start with_
. -
._path
- an array which represents the path of current focus relative to object root.[]
would signify the root. (implemented) -
._root
- the root object with whichtr()
(or=~
) was invoked. (implemented) -
._cur
- current item being processed (implemented, used bydescend(TrContext)
, not sure should be "public API") -
._transformation
- currently active transformation rules (implemented) -
.del()
method to delete currently matching data:bad_data_pattern -> { B.del() }
(maybe later) -
.replace()
...bad_data_pattern -> { B.replace("(warning, bad data: $A)") }
(maybe later) -
.return(v)
- returnv
as top level value of thetr()
operation (not done yet, open issue: how to set a default if.return()
is not called) -
.descend()
- continue recursion into current focus object. (implemented) TBD: pass rules/mode? Maybe.tr()
? -
.skip()
- exclude current focus from the new structure (implemented)
-
An action callback can either
- Return a value to be used as replacement in the newly generated data structure or
- Call one of the methods on the context object (to affect original data structure or do other, TBD, operations)
# Removing specific extension from FHIR Questionnaire (implemented)
questionnaire.tr({'url': 'my_extension'} -> skip(Y))
# Removing extension can lead to empty array, which is invalid in FHIR
questionnaire.tr({[] -> skip(Y)})
# Is our extension used?
questionnaire.tr({'url': 'my_extension'} -> {B.return(true)})
# maybe # our_ext_used = questionnaire =~ Deep({'url': 'my_extension'})
- What would
tr(data:Any)
mean? (Without any transformations passed). Currently nothing. Maybe: cleanup all empty arrays/hashes/strings. - Think about peg/leg style parsing, where
- The focus is moving inside a string
- Patterns which are defined later are referenced from earlier rules
- Augment UPM with
=~~
and!~~
for recursive search? Probably not but provideDeep(pattern)
facility. - Start deprecating the older
~
match? - Base the solution exclusively on Universal Pattern Matching?
- Probably not, sounds like
filter()
-like andmap()
-like functionality should be in anyway.
- Probably not, sounds like
- A function to get to element at given path? Example:
data.at([0, "a", 10, "b"])
- How to behave for missing path? Exception?
- Would it work with patterns for path and handle multiple paths? (Example:
data.at([Int, {"x": 1}])
)
- Visit the node first or children first?
- Implement as
visit_node()
andvisit_children()
? -
visit_node()
would be a no-op for non-tree containers such asArr
?
- Implement as
- "view" design that would allow modifying or operating on previously selected paths/items. -- supported by specifying the rules that only work with "interesting" objects + default rules to recursively apply rules.
- Context object (to pass to callbacks to each/visit/etc) with
- Current path
- Most recent container, index/key, and value
- Should callbacks be multimethods? (Considering that currently there is no syntax for expressing unnamed multimethods so specifying multimethod inline is not straightforward).
- Naming:
deep_...
vs..._deep
- Searches:
nested data structures
deep data structures
-
graph data structures
(less applicable)
NGS official website is at https://ngs-lang.org/