# :white_check_mark: 🇮🇷 🇮🇷 🇮🇷 Regex for Persian (Farsi) Language 🇮🇷 🇮🇷 🇮🇷
- ## :white_check_mark: Regular expressions for Persian aka Farsi language 🇮🇷
- #### Regular expressions for validating, sanitizing and filtering strings that must be Persian.
+ #### Collection of regexes for validating, filtering, sanitizing and finding Persian strings.
+ ### Introduction
+ For historical reasons, many Arabic characters found their way into the Persian language and transformed it. In recent years, governmental and non-governmental organizations have made many efforts to revive the standing of the Persian language, and this project is one of them.
#### :eight_pointed_black_star: Notes
- * For historical reasons, many Arabic characters found their way into the Persian language and transformed it,
-   but in recent years governmental and non-governmental organizations have made many efforts to give the Persian language its own identity.
- * The Persian alphabet consists of 32 characters, but for the reasons above there are five more Arabic characters that are used in many old texts, so they are supported in the regex.
+ * The Persian alphabet consists of 32 characters and 3 vowel marks, but for the reasons above there are 6 more Arabic characters and 8 more vowel marks that are used in many texts.
+ * The important part of this effort is the codepoint ranges: you can create your own regex for validating, filtering and finding strings by putting the desired range in it.<br>
+   For example, when a string should contain only Persian words and spaces, concatenate the space codepoints and the Persian alphabet codepoints in the final regex, and so on.
- * The important part of this effort is the codepoint ranges: you can create your own regex any way you want by putting the desired codepoint range in it.
- * All patterns match only one word, meaning characters with no spaces. To run a pattern against more than one word,
-   concatenate the space pattern to the desired pattern.
+ * Characters in the table are sorted by codepoint.
+ * See the tests after reading.
+ ---
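
The note above about concatenating codepoint ranges can be sketched roughly as follows. This is a minimal Python illustration, not the repository's own patterns: the names `PERSIAN_LETTERS`, `SPACES`, `is_persian_word` and `is_persian_text` are hypothetical, the letter class is an assumed set covering the 32 Persian letters plus alef with madda (U+0622), and the space class is an assumed subset containing only the plain space and the zero-width non-joiner (U+200C).

```python
import re

# Illustrative character sets (assumptions, not the repository's exact tables).
# The 32 Persian letters plus alef with madda (U+0622), written as escapes
# so that Persian yeh (U+06CC) is not confused with Arabic yeh (U+064A).
PERSIAN_LETTERS = (
    "\u0622\u0627\u0628\u067E\u062A-\u062C\u0686\u062D-\u0632\u0698"
    "\u0633-\u063A\u0641\u0642\u06A9\u06AF\u0644-\u0648\u06CC"
)
# Plain space and zero-width non-joiner (U+200C).
SPACES = "\u0020\u200C"

# One word: letters only.
PERSIAN_WORD = re.compile(f"[{PERSIAN_LETTERS}]+")
# Several words: letter and space codepoints concatenated into one class.
PERSIAN_TEXT = re.compile(f"[{PERSIAN_LETTERS}{SPACES}]+")


def is_persian_word(s: str) -> bool:
    """True if s is a non-empty run of Persian letters with no spaces."""
    return PERSIAN_WORD.fullmatch(s) is not None


def is_persian_text(s: str) -> bool:
    """True if s contains only Persian letters and the assumed space characters."""
    return PERSIAN_TEXT.fullmatch(s) is not None


print(is_persian_word("\u0633\u0644\u0627\u0645"))  # "salam": True
print(is_persian_word("\u0639\u0644\u064A"))        # ends in Arabic yeh: False
print(is_persian_text("\u0632\u0628\u0627\u0646 \u0641\u0627\u0631\u0633\u06CC"))  # two words: True
```

Keeping each group of codepoints in its own string makes it easy to build stricter or looser patterns by concatenating only the groups a given field should accept.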
- ### :black_square_button: Codepoints range
+ ### :black_square_button: Codepoints Range
### :white_square_button: Space
- Includes all kinds of space, especially the zero-width space that is used heavily in Persian.
+ These ranges include all kinds of space, especially the zero-width space that is used heavily in Persian texts.
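
Because zero-width characters are invisible, it can help to locate them explicitly. The following Python sketch reports every space-like character in a string. The `SPACE_CLASS` set is an assumed subset (space, no-break space, zero-width space, zero-width non-joiner) and may not match the table in this section, and `find_spaces` is a hypothetical helper name.

```python
import re
import unicodedata

# Assumed subset of space-like codepoints; the table in this section may differ.
SPACE_CLASS = "\u0020\u00A0\u200B\u200C"
SPACE_RE = re.compile(f"[{SPACE_CLASS}]")


def find_spaces(text: str) -> list[tuple[int, str, str]]:
    """Return (index, codepoint, Unicode name) for each space-like character."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}", unicodedata.name(m.group()))
        for m in SPACE_RE.finditer(text)
    ]


# A verb written with a zero-width non-joiner, followed by a space and a second word.
sample = "\u0645\u06CC\u200C\u0631\u0648\u0645 \u0645\u0646"
for index, codepoint, name in find_spaces(sample):
    print(index, codepoint, name)
# 2 U+200C ZERO WIDTH NON-JOINER
# 6 U+0020 SPACE
```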
#### :eight_pointed_black_star: For more common punctuation marks like `” | « | » | ? | ; | : | ...` <br> see the [General Punctuation section of the Unicode character list](https://en.wikipedia.org/wiki/List_of_Unicode_characters#General_Punctuation)