-
Notifications
You must be signed in to change notification settings - Fork 97
Updates PR #627 - Suffix Arrays #865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@zhihuidu, are you okay with the changes I've made here? I specifically made the move/rename Once I have your approval, I'll invite the other core team members to review Edit: I've heard back from zhihui and he is happy with this restructuring |
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl
537182a to
3fba0a3
Compare
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl
3fba0a3 to
0719d14
Compare
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl - Adds docstrings to SuffixArrayConstruction.chpl and removes duplicated logic by modifying `radixPass` to accept int or uint(8)
0719d14 to
42a7de2
Compare
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl - Adds docstrings to SuffixArrayConstruction.chpl and removes duplicated logic by modifying `radixPass` to accept int or uint(8)
42a7de2 to
3ef7ca5
Compare
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl - Adds docstrings to SuffixArrayConstruction.chpl and removes duplicated logic by modifying `radixPass` to accept int or uint(8)
3ef7ca5 to
4d82c0a
Compare
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl - Adds docstrings to SuffixArrayConstruction.chpl and removes duplicated logic by modifying `radixPass` to accept int or uint(8) - Updates SegmentedSuffixArray to use ServerErrors per PR#882
4d82c0a to
05bd50d
Compare
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl - Adds docstrings to SuffixArrayConstruction.chpl and removes duplicated logic by modifying `radixPass` to accept int or uint(8) - Updates SegmentedSuffixArray to use ServerErrors per PR#882
05bd50d to
935f29b
Compare
commit 82776f9 Author: Zhihui Du <zhihuidu@gmail.com> Date: Fri Mar 19 18:19:08 2021 -0400 solve the inconsistency in dtype commit 87c6327 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Mar 15 21:33:30 2021 -0400 merge with the latest version commit 924ac94 Merge: 18f481c 678097f Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Mar 15 20:33:30 2021 -0400 Merge branch 'master' of github.com:mhmerrill/arkouda commit 18f481c Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Mar 15 18:08:01 2021 -0400 handle some bytes to string commit fb89d88 Merge: c002419 6d65335 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Mar 15 17:05:35 2021 -0400 solve conflict of suffix array, SegmentedMsg, SegmentedArray and run_benchmarks commit 6d65335 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Jan 10 22:18:31 2021 -0500 tuple data type commit f5ca67a Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Jan 10 20:44:09 2021 -0500 remove mypy CI check error commit 010a446 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Jan 10 19:35:09 2021 -0500 return the string for suffix_array_file commit 80f78c6 Merge: 00b3579 e192878 Author: Zhihui Du <zhihuidu@gmail.com> Date: Fri Jan 8 16:08:57 2021 -0500 resolve convlict commit 00b3579 Author: Zhihui Du <zhihuidu@gmail.com> Date: Fri Jan 8 15:00:43 2021 -0500 single locales for C code commit c782b5a Author: Zhihui Du <zhihuidu@gmail.com> Date: Fri Jan 8 13:19:21 2021 -0500 add switch betwteen different SA algorithms commit 2d40c0e Author: Zhihui Du <zhihuidu@gmail.com> Date: Tue Jan 5 14:14:36 2021 -0500 solve the sphinx error commit 2af5ce8 Author: Zhihui Du <zhihuidu@gmail.com> Date: Tue Jan 5 09:57:42 2021 -0500 check the comments to remove docs CI check error commit 0bff3e4 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Jan 4 18:45:28 2021 -0500 remove the enhenced attribute in sym table commit a8a195a Merge: 221679e f1781e8 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Jan 4 17:40:48 2021 -0500 Merge pull request Bears-R-Us#4 from reuster986/master Fixed bug in UnitTestPeelStick commit 221679e Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Jan 4 17:23:31 2021 -0500 update lcp related code n commit f1781e8 Author: Bill Reus <reuster986@gmail.com> Date: Mon Jan 4 16:18:51 2021 -0500 Fixed bug in UnitTestPeelStick commit 50be2e3 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 27 22:25:52 2020 -0500 copy string_test.py commit 6fd3b05 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 27 18:31:09 2020 -0500 change suffix array return as an int array commit ac7e209 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 27 16:40:20 2020 -0500 add corectness check in sa.py commit 9f5c3d3 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sat Dec 26 10:28:59 2020 -0500 check test/*.chpl commit ed98498 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sat Dec 26 01:47:35 2020 -0500 make sa.py check easy commit 0aa835c Author: Zhihui Du <zhihuidu@gmail.com> Date: Sat Dec 26 01:33:12 2020 -0500 copy master gather.py commit 189c32e Author: Zhihui Du <zhihuidu@gmail.com> Date: Sat Dec 26 01:15:26 2020 -0500 add an empty correctness function commit 267238c Author: Zhihui Du <zhihuidu@gmail.com> Date: Sat Dec 26 00:55:43 2020 -0500 import SArrays class in pdarraysetops.py commit 4f773e2 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sat Dec 26 00:38:10 2020 -0500 remove the bug causing wrong return string value commit 6074b60 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Dec 24 00:20:55 2020 -0500 remove binary op commit edc3f63 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Dec 24 00:16:22 2020 -0500 remove binary op commit 3861e62 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Dec 24 00:07:05 2020 -0500 bool or pdarray commit e1c3173 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Dec 24 00:01:20 2020 -0500 data type commit 0cda91d Author: Zhihui Du <zhihuidu@gmail.com> Date: Wed Dec 23 23:31:10 2020 -0500 type match commit 59174ac Author: Zhihui Du <zhihuidu@gmail.com> Date: Wed Dec 23 22:47:58 2020 -0500 correct a typo commit 9d60563 Author: Zhihui Du <zhihuidu@gmail.com> Date: Wed Dec 23 22:20:13 2020 -0500 align with strings function commit 16cca77 Author: Zhihui Du <zhihuidu@gmail.com> Date: Wed Dec 23 21:49:28 2020 -0500 remove unused import commit a6c536e Author: Zhihui Du <zhihuidu@gmail.com> Date: Tue Dec 22 11:18:51 2020 -0500 update the SegSArray commit 32ecc0c Merge: 38818a3 18e8acf Author: Zhihui Du <zhihuidu@gmail.com> Date: Tue Dec 22 10:59:10 2020 -0500 solve the conflict commit 38818a3 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Dec 21 09:44:13 2020 -0500 follow suggestions from community commit 6b6e41a Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 20 18:08:04 2020 -0500 update third party config commit da30cfa Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 20 17:15:09 2020 -0500 remove suffixarray_test.py commit 2db17d8 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 20 15:47:18 2020 -0500 datatype in string.py commit e90ca27 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 20 14:56:33 2020 -0500 remove tab in MultiTypeSymEntry.chpl commit 7a0b197 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 20 14:44:20 2020 -0500 suffix_arry_file updated commit 9702d46 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 20 13:57:26 2020 -0500 include sa.py into run_benchmarks.py commit 42e3ba7 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Dec 17 18:13:07 2020 -0500 change to relative directory commit b6228e5 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Dec 14 23:17:34 2020 -0500 remove tab, remove unused codes commit 8f88f4c Merge: c9e03fb d184048 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Dec 14 14:28:09 2020 -0500 Merge pull request Bears-R-Us#3 from alvaradoo/patch-1 Update SACA.chpl comments commit d184048 Author: Oliver Alvarado Rodriguez <41132909+alvaradoo@users.noreply.github.com> Date: Mon Dec 14 13:39:23 2020 -0500 Update SACA.chpl comments reworded and added some comments commit c9e03fb Author: David Bader <dbader13@gmail.com> Date: Sun Dec 13 22:21:31 2020 -0500 updated commit 4131820 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 13 21:56:00 2020 -0500 add thirdparty files commit 3a220dc Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 13 21:29:48 2020 -0500 solve conflict commit 9f22f79 Merge: e517e36 940cb79 Author: Zhihui Du <zhihuidu@gmail.com> Date: Sun Dec 13 20:16:35 2020 -0500 add thirdpary files commit 940cb79 Author: Zhihui Du <zhihuidu@gmail.com> Date: Wed Dec 9 15:01:34 2020 -0500 confirm submit all changes commit 21df359 Author: Zhihui Du <zhihuidu@gmail.com> Date: Tue Dec 8 14:19:37 2020 -0500 add the lcp array method commit dbd6d96 Author: Zhihui Du <zhihuidu@gmail.com> Date: Mon Dec 7 19:35:43 2020 -0500 add Chapel skew suffix array algorithm commit c81d755 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Nov 26 17:47:45 2020 -0500 add suffix_array Python test commit 9a22704 Author: Zhihui Du <zhihuidu@gmail.com> Date: Wed Nov 25 20:18:20 2020 -0500 change name commit d288c10 Author: Zhihui Du <zhihuidu@gmail.com> Date: Tue Nov 24 23:37:55 2020 -0500 add read file suffix array function and all libdivsufsort files commit 7af0b51 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Nov 19 11:36:20 2020 -0500 add suffix array benchmark sa.py commit 65ad500 Author: Zhihui Du <zhihuidu@gmail.com> Date: Thu Nov 19 10:33:48 2020 -0500 add the suffix array function to Arkouda
…e with master and pass tests: - Updates logging statements to use logLevel instead of v (PR Bears-R-Us#760) - Updates `check` to `check_table` (PR Bears-R-Us#792) - Removes outdated `Strings.attach` function which used `_` for subcomponents (new convention is `.` per PR Bears-R-Us#774) - Makes `unregister_strings_by_name` a staticmethod Manually applied recent changes to `SegmentedArray.chpl` because automatic merge was thrown off by addition of `class SegSArray` including: - PR Bears-R-Us#666 string test ends_with failure - PR Bears-R-Us#781 off by one in SegmentedArray peel method
- Moves SArrarys to suffix_array.py - Minimizes diff with master to highlight functionality changes - Refomats suffix-array.py - Reformats SACA.chpl to fix bracket alignment and to have more uniform formattting - Removes unnecessary commented out code - Add type hints to size and bytes in strings.py for mypy - In SegmentedArray.chpl replaces writelns in SegSArray class with logging statements following the example of SegString class - Renames `class SegSArray` to `class SegSuffixArray` and moved this class to SegmentedSuffixArray.chpl (previously in SegmentedArray.chpl along with SegString) - Removes external `libdivsufsort` code and references. Changes default to @zhihuidu's native chapel suffix array construction implementation using skew algorithm
- Removes unused `fields` variable in _parse_single_int_array_value - Passes Exception into ValueError in _parse_single_int_array_value - Inverts logic of `in1d_int` to avoid duplicating the `ind1d` logic - Adds `self.name` to `SArray` class and updates attach/register functions to use dots instead of dashs to be consistent with` the `Strings` class - Removes unnecessary cast in `suffix_array_file` - Changes typing from `sipHash128` and `computeSipHashLocalized` to accept int and uint(8) - Updates suffix_array registration and info functions - Renames SACA.chpl to SuffixArrayConstruction.chpl - Adds UnitTestSuffixArrayConstruction.chpl - Adds docstrings to SuffixArrayConstruction.chpl and removes duplicated logic by modifying `radixPass` to accept int or uint(8) - Updates SegmentedSuffixArray to use ServerErrors per PR#882
…ations and shortens looping variables
cefd5eb to
176fd08
Compare
|
@zhihuidu, I rebased over master so the |
@pierce314159 I appreciate it. In fact, you almost rewrite the code. |
|
@pierce314159 @zhihuidu I would like to see this updated and moved to the |
|
@mhmerrill @pierce314159 No problem. I can rewrite the suffix array code based on Kyle's framework to integrate new functionalities and push it to arkouda-njit first. |
@zhihuidu It might be easier than that, let me take a crack at it and see if I can use git magic to get it in line with the |
|
@pierce314159 If some tools can do it automatically, it will be better. Please go ahead. |
Note: It looked like @pierce314159 didn't have access to the repo, that should be fixed now. 👍 |
|
I am going to close this because we moved this to Bears-R-Us/arkouda-njit#3 |
This pull request updates PR #627 submitted by @zhihuidu.
This PR:
string-suffix-array-functionalityup to date with masterstringsandsuffix arraysand minimize diff with master/highlight changeslibdivsufsortin favor of zhihuidu's native chapel suffix array construction implementationbf1757c Repackages zhihuidu's commits in PR#627
07f51d8 Applies original minimum changes to resolve conflicts with master
checktocheck_tableStrings.attachfunction which used_for subcomponents (new convention is.)unregister_strings_by_namea staticmethodSegmentedArray.chpl(automatic merge was thrown off by addition ofclass SegSArray) including:ca4941b Implements changes suggested in PR 627 feedback
SArrarysclass tosuffix_array.pysuffix_array.pySACA.chplto fix bracket alignmentstrings.pyfor mypywritelnsinSegmentedArray.chplwith logging statements following the example ofSegStringclassPreviously stand-alone commits have been squashed into this one since Zhihui approved them
libdivsufsortcodebase into arkouda or forcing the dependency on arkouda users.class SegSArraytoclass SegSuffixArrayand moves this class toSegmentedSuffixArray.chpl(previously inSegmentedArray.chplalong withSegString). Rearranged to increase separation withstringsand minimize diff with master.935f29b Implements changes suggested in this PR's feedback
fieldsvariable in_parse_single_int_array_valueExceptionintoValueErrorin_parse_single_int_array_valuein1d_intto avoid duplicating theind1dlogicself.nametoSArrayclass and updates attach/register functions to use dots instead of dashs to be consistent with theStringsclasssuffix_array_filesipHash128andcomputeSipHashLocalizedto acceptintanduint(8)suffix_arrayregistration and info functionsSACA.chpltoSuffixArrayConstruction.chplUnitTestSuffixArrayConstruction.chplSuffixArrayConstruction.chpland removes duplicated logic by modifyingradixPassto acceptintoruint(8)