
Commit 4df4ac6

Added the new testcases that were added to the circom tests. Concretely: a few new testcases in the database file, plus circuit changes and expanded tests for both hardcoded test projects (from_addr and to_addr).
Also added more labels to the database file `regex_db_for_bench` so that they can be compared easily in the future, plus an explanation in test_suite/README.
1 parent 59c1d9f commit 4df4ac6

11 files changed: +679 -3566 lines changed


test_suite/README.md

Lines changed: 43 additions & 0 deletions
@@ -2,6 +2,8 @@
 
 This test-suite is a tool for testing and benchmarking the zk-regex library module that generates Noir code to check if certain regex is matched.
 
+In addition, all the tests for which the circom implementation of zk-regex gets tested are implemented. A large part can be run automatically through the automated testing/benchmarking functionality for which a database has to be provided (and `regex_db_for_bench.json` contains all the necessary testdata) and there are 2 testcases that have been implemented separately in `hardcoded_tests`. The latter is necessary because for those tests multiple circuits have to be combined manually.
+
 ## Requirements
 
 - Install zk-regex command following the instructions in the documentation.
@@ -126,8 +128,49 @@ If you want to execute both the testing and the benchmarking you need to follow
 RUST_LOG=info cargo run -- -t <no-time | with-time>
 ```
 
+## Circom testing compatibility
+
+The file `regex_db_for_bench.json` contains all testcases that the [circom implementation tests](https://github.com/zkemail/zk-regex/tree/main/packages/circom/tests) for and some additional ones.
+
+The circom tests are contained in `.test.js` files and usually test for various circuits within a single file. In `regex_db_for_bench.json` the information is organized per circuit, but we've added information to link them back to the circom tests. If there's a specific circuit file it relates to, this is indicated with `circuit_name`, otherwise there is a reference in `test_name`. Examples:
+```json
+"circuit_name": "asterisk1"
+
+"test_name": "simple_regex.test.js"
+```
+
+Furthermore, to relate the test inputs to the specific testcases, `circom_testname` is indicated for passing samples, and separately in an array for failing samples. E.g.:
+
+```json
+"samples_pass": [
+    {
+        "input": "xb",
+        "expected_substrings": [],
+        "circom_testname": "asterisk1 valid case 1"
+    },
+    {
+        "input": "xab",
+        "expected_substrings": ["a"],
+        "circom_testname": "asterisk1 valid case 2"
+    }
+],
+"samples_fail": [
+    "xaaa",
+    "aaabx"
+],
+"circom_testnames_fails": [
+    "asterisk1 invalid case 1",
+    "asterisk1 invalid case 2"
+]
+```
+
+These labels make it easier to verify whether all circom tests have been implemented.
+
+Note that as mentioned in the introduction, there are a few tests that are implemented in `hardcoded_tests` as they combine multiple circuits, which cannot be done through the automated process.
+
 ## Limitations
 
 For some regexes the random sampling is not possible, because the sampling library is limited. For example the end anchor (`$`) is not supported.
 
 Random sample testing for the `gen_substrs` setting is only supported for `decomposed`. In the `raw` setting, the substrings are determined via a json file that contains the transition information. Determining what the substring parts are would be quite involved since it requires building the DFA.
+
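The labelling scheme described in the README hunk above can be checked mechanically. The following is a minimal Python sketch, not part of the commit. It assumes that `circuit_name`, `samples_pass`, `samples_fail` and `circom_testnames_fails` all sit in the same per-circuit entry, and that failing samples and failing labels line up one-to-one, as in the README example. The helper names `circom_labels` and `failing_labels_aligned` are hypothetical.

```python
import json

# Entry fragment copied from the README example. The enclosing braces and the
# placement of "circuit_name" in the same object are assumptions made so the
# fragment parses as one JSON object.
ENTRY_JSON = """
{
  "circuit_name": "asterisk1",
  "samples_pass": [
    {"input": "xb", "expected_substrings": [], "circom_testname": "asterisk1 valid case 1"},
    {"input": "xab", "expected_substrings": ["a"], "circom_testname": "asterisk1 valid case 2"}
  ],
  "samples_fail": ["xaaa", "aaabx"],
  "circom_testnames_fails": ["asterisk1 invalid case 1", "asterisk1 invalid case 2"]
}
"""


def circom_labels(entry):
    """Collect every circom test label attached to one database entry."""
    passing = [
        sample["circom_testname"]
        for sample in entry.get("samples_pass", [])
        if "circom_testname" in sample
    ]
    failing = list(entry.get("circom_testnames_fails", []))
    return passing + failing


def failing_labels_aligned(entry):
    """True when the entry has as many failing labels as failing samples."""
    return len(entry.get("samples_fail", [])) == len(
        entry.get("circom_testnames_fails", [])
    )


entry = json.loads(ENTRY_JSON)
labels = circom_labels(entry)
```

Comparing the collected labels against the test names in the circom `.test.js` files would then show which circom cases are still missing from the database.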

test_suite/bench_result.csv

Lines changed: 37 additions & 2 deletions
@@ -1,2 +1,37 @@
-acir_opcodes,circuit_size,regex,with_gen_substr,proving_time
-43,3175,(a|b)+c,false,0.18447697588
+acir_opcodes,circuit_size,regex,with_gen_substr
+779,8828,"x[0-9]{2}-y{2,3}_z$",false
+801,19600,email was meant for @[a-z]+,false
+788,9603,1=(a|b) (2=(b|c)+ )+d,false
+781,4220,xa*b,false
+780,3451,ab*,false
+781,4220,a(x|y)*b,false
+782,4221,^a,false
+782,4221,^(a|b|c),false
+782,4221,(^|a)b+,false
+783,4990,(\n|^)x(a|b)+,false
+790,10373,(\n|^)x[^abc]+,false
+772,3445,a[bc]$,false
+775,4984,(\n|^)xa[bc]$,false
+787,8834,.,false
+789,10372,a.b,false
+820,34211,Latin-Extension=[¡-ƿ]+ Greek=[Ͱ-Ͽ]+ Cyrillic=[Ѐ-ӿ]+,false
+827,39594, Arabic=[؀-ۿ]+ Devanagari=[ऀ-ॿ]+ Hiragana&Katakana=[ぁ-ヿ]+,false
+786,8065,( )?(c|C)ode( )?(0|1|2|3|4|5|6|7|8|9|a|b|c|d|e|f)+,false
+790,11141,a:[^abcdefghijklmnopqrstuvwxyz\.]+\.,false
+787,8834,[^ab],false
+781,4220,a+b,false
+781,4220,a(b|c)+,false
+782,4989,a(bc)+,false
+784,6527,(12|345)+b,false
+781,4220,a?b,false
+783,5758,(1x?2)+b,false
+783,5758,12(a|b)?c,false
+782,4989,aba,false
+781,4220,a[ab],false
+802,20369,email was meant for @[a-zA-Z0-9_]+\.,false
+782,4989,[A-Za-z0-9!#$%&'*+=?\-\^_`{|}~./@]+@[A-Za-z0-9.\-]+,false
+782,4989,[A-Za-z0-9!#$%&'*+=?\-\^_`{|}~./]+@[A-Za-z0-9.\-@]+,false
+798,16525,(\r\n|^)message-id:<[A-Za-z0-9=@\.\+_-]+>\r\n,false
+800,18063,(\r\n|^)subject:[^\r\n]+\r\n,false
+813,28060,(\r\n|^)dkim-signature:([a-z]+=[^;]+; )+t=[0-9]+;,false
+814,28829,(\r\n|^)dkim-signature:([a-z]+=[^;]+; )+bh=[a-zA-Z0-9+/=]+;,false
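Because some regexes contain commas (for example the quoted `{2,3}` quantifier in the first row), the benchmark file is best read with a real CSV parser instead of a plain split on `,`. A minimal Python sketch, using four rows copied from the new `bench_result.csv`; the helper name `load_bench` is hypothetical and not part of the test suite:

```python
import csv
import io

# Four rows copied from the new bench_result.csv, header included.
SAMPLE = '''acir_opcodes,circuit_size,regex,with_gen_substr
779,8828,"x[0-9]{2}-y{2,3}_z$",false
801,19600,email was meant for @[a-z]+,false
781,4220,xa*b,false
780,3451,ab*,false
'''


def load_bench(text):
    """Parse benchmark CSV text into a list of typed row dictionaries."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "acir_opcodes": int(row["acir_opcodes"]),
                "circuit_size": int(row["circuit_size"]),
                "regex": row["regex"],
                "with_gen_substr": row["with_gen_substr"] == "true",
            }
        )
    return rows


rows = load_bench(SAMPLE)
# Row with the largest circuit among the sampled regexes.
largest = max(rows, key=lambda r: r["circuit_size"])
```

The `csv` module unquotes the first regex, so the `regex` field keeps its embedded comma intact.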

test_suite/hardcoded_tests/from_addr/README.md

Lines changed: 6 additions & 6 deletions
@@ -2,24 +2,24 @@
 
 This test for the circom code uses a circuit that includes 3 templates. The 3 templates have been generated with decomposed regex information (see `/decomposed`):
 - "from_all"
-- "email_addr_with_name"
+- "reversed_bracket"
 - "email_addr"
 
-They are combined as follows: "from_all" extracts a substring `s`. Then `s` is used as the input for both "email_addr_with_name" and "email_addr".
+They are combined as follows: "from_all" extracts a substring `s`. The reversed version of `s` is used as the input for "reversed_bracket" and the standard version as input for "email_addr".
 
-- "email_addr_with_name" matches an email address between `<>`
+- "reversed_bracket" matches an email address between `<>`. Since the input is the reversed string, it takes into account the last email address that appears between `<>`
 - "email_addr" only matches the email address
 
-If "email_addr_with_name" found an email address, that one is returned. Otherwise the email address found with "email_addr" is returned. If nothing was found, the match fails.
+If "reversed_bracket" found an email address, that one is reversed and returned. Otherwise the email address found with "email_addr" is returned. If nothing was found, the match fails.
 
 ## Recreate templates
 
 ```
 zk-regex decomposed -d decomposed/from_all.json --noir-file-path src/from_all.nr -g true
 
-zk-regex decomposed -d decomposed/email_addr_with_name.json --noir-file-path src/email_addr_with_name.nr -g true
+zk-regex decomposed -d decomposed/reversed_bracket.json --noir-file-path src/reversed_bracket.nr -g true
 
 zk-regex decomposed -d decomposed/email_addr.json --noir-file-path src/email_addr.nr -g true
 ```
 
-Note that `email_addr_with_name.nr` and `email_addr.nr` have to be adjusted to return a bool instead of fail the assertion if they don't match the regex. Otherwise we can't execute both functions and see which one returns a substring.
+Note that `reversed_bracket.nr` and `email_addr.nr` have to be adjusted to return a bool instead of failing the assertion if they don't match the regex. Otherwise we can't execute both functions and see which one returns a substring.
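The fallback logic described in this README can be illustrated outside the circuit. The following Python sketch is only an approximation: the two patterns are simplified, hypothetical stand-ins for the decomposed regexes (the real definitions live in `decomposed/*.json` and are not reproduced here), and `extract_from_addr` is an invented name. It takes the substring `s` that "from_all" would extract as its input.

```python
import re

# Hypothetical, simplified stand-ins for the decomposed regexes.
# REVERSED_BRACKET runs on the reversed string, so ">" comes before "<".
REVERSED_BRACKET = re.compile(r">([^<>]+)<")
EMAIL_ADDR = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+")


def extract_from_addr(s):
    """Return the address in `s`, preferring the last one wrapped in <>."""
    # The first match in the reversed string is the last <...> in the original.
    match = REVERSED_BRACKET.search(s[::-1])
    if match:
        # The capture is still reversed, so reverse it back before returning.
        return match.group(1)[::-1]
    # Fall back to a bare address in the unreversed string.
    match = EMAIL_ADDR.search(s)
    if match:
        return match.group(0)
    # Nothing found: the overall match fails.
    raise ValueError("no email address found")
```

For an input such as `"Alice <a@x.com> <b@y.org>"` this sketch picks the last bracketed address. Trying the second pattern only after the first one fails is the reason the generated Noir functions have to return a bool instead of asserting.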

test_suite/hardcoded_tests/from_addr/decomposed/email_addr_with_name.json

Lines changed: 0 additions & 16 deletions
This file was deleted.
