-
Notifications
You must be signed in to change notification settings - Fork 7
Add abstract semantics for remaining data frame operations #1583
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
ydq23
merged 37 commits into
absint-dataframe-staging
from
1580-add-abstract-semantics-for-remaining-data-frame-operations
May 1, 2025
Merged
Add abstract semantics for remaining data frame operations #1583
ydq23
merged 37 commits into
absint-dataframe-staging
from
1580-add-abstract-semantics-for-remaining-data-frame-operations
May 1, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…-cfg-for-forward-traversal
…o 1525-adapt-the-data-frame-processors-and-semantics-to-the-new-cfg
…e-processors-and-semantics-to-the-new-cfg
…he-new-cfg' into 1580-add-abstract-semantics-for-remaining-data-frame-operations
Base automatically changed from
1525-adapt-the-data-frame-processors-and-semantics-to-the-new-cfg
to
absint-dataframe-staging
April 20, 2025 14:03
…ics-for-remaining-data-frame-operations
EagleoutIce
approved these changes
May 1, 2025
EagleoutIce
added a commit
that referenced
this pull request
Jul 20, 2025
* Hooks for AbsInt as Decorators (#1418) * feat: basic concept for data frame create hook * feat: data frame hook for string based column access * feat: map variables to data frame domain at assignment * feat: data frame semantics for assignment and expression list * refactor: structure abstract interpretation hooks * feat: basic tests for data frame abstract domain * feat: test framework for inferred data frame properties * test: add tests for data frame domain * refactor: restructure processor decorators and function mapping * lint-fix: linting errors --------- Co-authored-by: Oliver <oliver.gerstl@uni-ulm.de> * feat: restructure absint info to capture state for each node * feat: add data frame processor for pipe operator * feat: interface to resolve data frame values * refactor: relocate mapping for expression semantics * Support multiple criteria in data frame tests (#1478) * test: add support for multiple criteria for data frame tests * test: test label support and clearer test entry type * feat: apply semantics also to data frame operation arguments * Improve performance of data frame absint tests (#1490) * test: parse code only once and check criteria on result * test: support parser selection for data frame assert tests * Implement fold over CFG for forward traversal (#1530) * feat: basic fold over CFG for forward traversal * feat: add simplified forward-connected control flow graph * feat-fix: correct label for CFG exit point edges * test: unit tests for simple control flow graph * feat: add CFG visitor for fixpoint iteration for data frames * feat-fix: add missing negation in has-changed check * feat-fix: subtract top from colnames should have no effect * Adapt the data frame processors and semantics to the new CFG (#1531) * feat: basic fold over CFG for forward traversal * feat: add simplified forward-connected control flow graph * feat-fix: correct label for CFG exit point edges * test: unit tests for simple control flow graph * feat: add CFG visitor for fixpoint iteration for data frames * feat-fix: add missing negation in has-changed check * feat: setup for data frame processors and semantics * feat: directly evaluate arguments when processing function * feat: add semantics mapper for column/row access and assignment * feat: implement data frame processors for basic nodes * feat: return resulting data frame domain * feat: support different types of inferred constraints * test: add tests for data frame state domain * feat: store abstract state in nodes * feat: support control flow constructs in abstract interpretation * test: add tests for control flow support * feat: support colnames assignment, cbind and rbind * Add abstract semantics for remaining data frame operations (#1583) * feat: basic fold over CFG for forward traversal * feat: add simplified forward-connected control flow graph * feat-fix: correct label for CFG exit point edges * test: unit tests for simple control flow graph * feat: add CFG visitor for fixpoint iteration for data frames * feat-fix: add missing negation in has-changed check * feat: setup for data frame processors and semantics * feat: directly evaluate arguments when processing function * feat: add semantics mapper for column/row access and assignment * feat: implement data frame processors for basic nodes * feat: return resulting data frame domain * feat: support different types of inferred constraints * test: add tests for data frame state domain * feat: store abstract state in nodes * feat: support control flow constructs in abstract interpretation * test: add tests for control flow support * feat: support colnames assignment, cbind and rbind * feat: support data frame head and tail function * feat: support data frame subsetting via access operator * feat: support data frame subset, filter, and select * feat: support column and row assignment and magrittr pipe * feat: support mutate, left_join and group_by * test-fix: try to fix CI data frame test errors * test-fix: skip tests with library to test CI * test-fix: check evaluation of R code with library call in CI * test-fix: unload library after test * test-fix: restart R shell after library test * refactor: restructure semantics and processors * feat: support summarize and limit col names * feat: basic delayed widening for fixpoint iteration * test: add test for multiple else-if branches * feat: use dataflow graph to resolve origin processor * feat(df-shape): provide query (#1658) * Prepare the first evaluation of abstract interpretation (#1666) * feat(benchmark): benchmark and summarizer for abstract interpretation * test(benchmark): test for abstract interpretation benchmark * feat-fix: adapt new absint info format * perf: improve performance and fix max call stack exceeded * perf: optimize performance of abstract domain operators * feat-fix: summarize data frame operations * feat: optional slicing criterion argument for df-shape query * feat-fix: only map data frame operation if arg is data frame * feat-fix: only infer colnames at creation if args are vectors * feat: add number of files containing data frames to benchmark * feat: add number of files without data frames to benchmark * perf: improve performance of queue pop * feat-fix: small issues in function mapping * feat-fix: fix function mapping of replacement functions * Use the new CFG visitor for abstract interpretation (#1750) * feat-fix: issues after merging new CFG * feat-fix: fix constant propagation of vectors * feat: use new CFG visitor for abstract interpretation * feat: adapt semantic mappers and support named arguments * feat-fix: restore auto-trimmed trailing whitespace * test-fix: adapt config amendment * Support loading data frames from external files (#1772) * wip: basic idea to extract data frame shapes from files * feat: extract data frame shapes from external files * test: tests for extracting data frame shapes from files * feat: separate abstract operation for read * feat: support read functions from readr * feat: support comments and custom quotes in files * feat-fix: unresolvable arguments are top * feat: correct over-approximation for special and duplicate colnames * feat: support reading CSV from text for tests * test-fix: skip tests with libraries to fix CI * refactor: outsource request retrieval and content parsing * Add more tests to verify capabilities of shape inference (#1787) * test: document and extend shape inference test framework * feat: categorize data frame shape inference tests * test: support tests asserting that no shape is inferred * test: tests for create, convert, read, access capabilities * test: tests for assignment, names, bind, head/tail capabilities * test: tests for subset, filter, select, transform, mutate capabilities * test: tests for group-by, summarize, left-join, merge capabilities * test: install required libraries before tests * test-fix: increase timeout for installing packages * refactor: outsource test function for tests with sources * test: skip tests if required packages not installed * refactor: outsource helper functions and add comments * feat-fix: more robust regex for parsing data frame properties * feat-fix: outsource check functions * Fix remaining special edge cases of wrong over-approximation (#1795) * refactor: remove redundant utility functions * feat: library origin and return type for data frame functions * feat: sound over-approximation for unsupported function arguments * feat: simplify abstract data frame operations * feat: support colnames checks and duplicate colnames * feat: support subsetting swith empty and duplicate columns * feat: support column deletion via column assignment to NULL * feat: support assigning a subset of column names * feat: extended support for subset and select function * feat: support multiple conditions in filter function * feat: support mixed column subset and duplicate row subset * feat: support mutate edge cases, mutating group-by, and empty summarize * feat: extended support for merge and join functions * test: fail tests for remaining unsupported special edge cases * feat: support merge by column indices * feat: support relocate witch changed colnames * refactor: outsource argument helper functions * feat-fix: operand modification at replacement functions * refactor: add semantic names for conditons * test-fix: adapt values of shape inference benchmark tests * More documentation for abstract interpretation of data frames (#1804) * refactor: document and refactor mappers and absint info * refactor: document and refactor absint visitor and shape inference * doc: abstract domains and semantics appliers * refactor: consistent naming of the data frame shape inference * test: simplify over-approximation specification for tests * feat: more documentation for data frame test framework * Add config options for data frame shape inference (#1801) * refactor: remove redundant utility functions * feat: library origin and return type for data frame functions * feat: sound over-approximation for unsupported function arguments * feat: simplify abstract data frame operations * feat: support colnames checks and duplicate colnames * feat: support subsetting swith empty and duplicate columns * feat: support column deletion via column assignment to NULL * feat: support assigning a subset of column names * feat: extended support for subset and select function * feat: support multiple conditions in filter function * feat: support mixed column subset and duplicate row subset * feat: support mutate edge cases, mutating group-by, and empty summarize * feat: extended support for merge and join functions * test: fail tests for remaining unsupported special edge cases * feat: support merge by column indices * feat: support relocate witch changed colnames * refactor: outsource argument helper functions * feat: config options for data frame shape inference * feat: add config flag to disable reading loaded datasets * doc: abstract interpretation configuration options * refactor: reorganize config options for reading external data files * Implement a linter rule for accessed data frame columns and rows (#1802) * feat: basic idea for data frame access linter rule * test: add tests for data frame access linter rule * doc: include data frame access linter into wiki page * Prepare the second evaluation of data frame shape inference (#1815) * feat: over-approximation of the rows for files with too many lines * feat-fix: support rbind with empty data frame * test-fix: remove redundant config and fix no inference case * feat: add list of unsupported data frame function for over-approximation * feat-fix: correct alias tracking for vector arguments * test-fix: file read check bug * test: simplify printed output format of instrumented code * feat: add config for data frame access linter rule * feat: documentation test cases for data frame access linter rule * feat-fix: restrict allowed unescaped regex tokens * feat: add number of inferred exact intervals to benchmark * feat: restructure and extend unsupported data frame functions * feat-fix: simplify escape regex and use single file existence check * doc-fix: documentation of vector resolve functions * feat-fix: simplify cast for critical parameters * feat-fix: implement feedback from review comments This is the master's thesis of Oliver Gerstl. --------- Co-authored-by: Oliver <oliver.gerstl@uni-ulm.de>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.