- Improving case transform logic (now it respects the case from the original word, addressing #30 and partially #24) - Including new basic case transform operations - Implementing extensive case transform mode - Fixing typos
r3nt0n · Sep 2, 2024 · 9104d04
1 parent c66f91a
Showing 12 changed files with 65 additions and 52 deletions.
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -10,3 +10,8 @@ prune */__pycache__
 prune build
 prune dist
 prune bopscrk/tests
+prune img
+prune .gitignore
+prune .gitmodules
+prune build_package.sh
+prune requirements.txt
diff --git a/README.md b/README.md
@@ -35,9 +35,9 @@ Thanks dude :)
     <br />
     <a href="#about-the-project">View Demo</a>
     ·
-    <a href="/r3nt0n/bopscrk">Report Bug</a>
+    <a href="/r3nt0n/bopscrk/issues">Report Bug</a>
     ·
-    <a href="/r3nt0n/bopscrk">Request Feature</a>
+    <a href="/r3nt0n/bopscrk/issues">Request Feature</a>
   </p>
 </div>
 
@@ -66,8 +66,8 @@ Thanks dude :)
       <ul>
         <li><a href="#how-it-works">How it works</a></li>
         <li><a href="#tips">Tips</a></li>
-        <li><a href="#lyricpass">Lyricpass</a></li>
         <li><a href="#advanced-usage">Advanced usage</a></li>
+        <li><a href="#lyricpass">Lyricpass</a></li>
       </ul>
     </li>
     <li><a href="#roadmap">Roadmap</a></li>
@@ -90,7 +90,7 @@ Thanks dude :)
 <!-- ABOUT THE PROJECT -->
 ## About the Project
 
-<p align="center"><img src="https://github.com/r3nt0n/bopscrk/blob/master/img/bopscrk-2.4.5.gif" /></p>  
+<p align="center"><img src="https://github.com/r3nt0n/bopscrk/blob/master/img/bopscrk.gif" /></p>  
 
 
 
@@ -108,7 +108,7 @@ Thanks dude :)
 
 ### What's new
 
-**2.4.6 RELEASED** (30/08/2024): Speed and performance dramatically increased, real multiprocessing implementation.   
+- **2.4.7 RELEASED** (02/09/2024): Speed and performance dramatically increased. New extensive case transform mode allows to generate all possible case transforms.
 
 [//]: # (<p align="center"><img src="https://github.com/r3nt0n/bopscrk/blob/master/img/progressbar_example1.gif" /></p>)
 
@@ -183,18 +183,15 @@ _For more information, please refer to the [Advanced usage](#advanced-usage) sec
 + You **can use accentuation** in your words and special chars (if you use the non-interactive mode, escape special chars like `'` and `"` with backslashes, e.g.: `bopscrk -w John,O\'hara,Doe,foo,bar`).
 + In the others field you can write **several words comma-separated**. *Example*: 2C,Flipper.
 + If you want to produce **all possible leet transformations**, enable the **recursive_leet option** in configuration file.
++ If you want to produce **all possible case transformations**, enable the **extensive_case option** in configuration file.
 + You can **select which transforms to apply on lyrics phrases** found through the **cfg file**.
 + Using the **non-interactive mode**, you should provide years in the long and short way (1970,70) to get the same result than the interactive mode.
 + You have to be careful with **-n** argument. If you set a big value, it could result in **too huge wordlists**. I recommend values between 2 and 5.
 + To provide **several artist names** through command line you should provide it **comma-separated**. *Example*: `-a johndoe,johnsmith`
 + To provide **artist names with spaces** through command line you should provide it **quotes-enclosed**. *Example*: `-a "john doe,john smith"`
 
-### Lyricpass 
-<p align="center"><img src="https://github.com/R3nt0n/bopscrk/blob/master/img/lyricpass_demo.png" /></p>  
-
-This feature is based in a modified version of a [tool](https://github.com/initstring/lyricpass) developed originally by [initstring](https://github.com/initstring/). The changes are made to integrate input and output's tool with bopscrk.  
+<p align="right">(<a href="#top">back to top</a>)</p>
 
-It will retrieve all lyrics from all songs which belongs to artists that you provide. **By default it will store each artist, each phrase found with space substitution, each phrase found reduced to its initials** (which will be transformed later if you have activated leet and case transforms).
 
 ### Advanced usage
 
@@ -204,7 +201,8 @@ It will retrieve all lyrics from all songs which belongs to artists that you pro
   + **separators_chars**: characters to use in extra-combinations. *Can be a single char or a string of chars, e.g.: `!?-/&(`*  
   + **separators_strings**: strings  to use in extra-combinations. *Can be a single string or a list of strings space-separated, e.g.: `123` `34!@`*
   + **leet_charset**: characters to replace and correspondent substitute in leet transforms, *e.g.: `e:3 b:8 t:7 a:4`* 
-  + **recursive_leet**: enables a recursive call to leet_transforms() function to get all possible leet transforms (*disabled by default*). *WARNING*: enabled with huge --max parameters (e.g.: greater than 18) could take even days. *Can be true or false.* 
+  + **recursive_leet**: enables a recursive call to leet_transforms() function to get all possible leet transforms. *WARNING*: enabled with huge `--max` values (e.g.: greater than 18) could take a long time. *Can be true or false.*
+  + **extensive_case**: by default, bopscrk only applies the more common case transforms: all chars to lower, all chars to upper, each char to upper, all pairs to upper, all odds to upper, all consonants to upper and all vowels to upper. You can enable this option to obtain ALL possible case transforms, which can result in much larger wordlists, but might be useful in some scenarios. *Can be true or false.*
   + **remove_parenthesis**: remove all parenthesis in lyrics found before any transform  
   + **take_initials**: produce words based on initial of each word in lyric phrases found (if enabled with remove_parenthesis disabled, it can produce useless words)
   + **artist_split_by_word**: split artist names and add each word as a new one 
@@ -222,7 +220,14 @@ It will retrieve all lyrics from all songs which belongs to artists that you pro
 
 <p align="right">(<a href="#top">back to top</a>)</p>
 
+### Lyricpass 
+<p align="center"><img src="https://github.com/R3nt0n/bopscrk/blob/master/img/lyricpass_demo.png" /></p>  
+
+This feature is based in a modified version of a [tool](https://github.com/initstring/lyricpass) developed originally by [initstring](https://github.com/initstring/). The changes are made to integrate input and output's tool with bopscrk.  
+
+It will retrieve all lyrics from all songs which belongs to artists that you provide. **By default it will store each artist, each phrase found with space substitution, each phrase found reduced to its initials** (which will be transformed later if you have activated leet and case transforms).
 
+<p align="right">(<a href="#top">back to top</a>)</p>
 
 <!-- ROADMAP -->
 ## Roadmap
@@ -271,6 +276,12 @@ Thank you all!
 
 ## Changelist
 [//]: # (+ `last development version &#40;available on Github&#41;`)
++ `2.4.7 version notes (02/09/2024)` 
+  + Improving **case transform logic** (now it respects the case from the original word)
+  + Including **new basic case transform operations**
+  + Implementing **extensive case transform mode**
+  + Fixing typos
+
 + `2.4.6 version notes (30/08/2024)`
   + **Increasing parallelism performance** (real multiprocessing implementation)
   + Better handling of config parser errors
@@ -360,6 +371,7 @@ bopscrk: [Github](https://github.com/r3nt0n/bopscrk) - [Pypi](https://pypi.org/p
 
 * lyricpass module is based on a [project](https://github.com/initstring/lyricpass) created by [initstring](https://github.com/initstring).
 * [Pixel Gothic font](https://dafonttop.com/pixel-gothic-font.font) by [Kajetan Andrzejak](https://dafonttop.com/tags.php?key=Kajetan%20Andrzejak).
+* [Best-README-Template](https://github.com/othneildrew/Best-README-Template) by [othneildrew](https://github.com/othneildrew/).
 
 <p align="right">(<a href="#top">back to top</a>)</p>
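
Review note on the `extensive_case` README entry: the warning about "much larger wordlists" can be made concrete. Each letter that has both a lower and an upper form doubles the number of variants, so a word with n such letters yields 2^n entries. The helper below is a hypothetical illustration and is not part of bopscrk:

```python
def extensive_case_count(word):
    """Number of distinct upper/lower combinations for `word` (hypothetical helper)."""
    # Only characters whose lower and upper forms differ contribute a factor of 2.
    cased = sum(1 for char in word if char.lower() != char.upper())
    return 2 ** cased


for word in ('doe', 'flipper', 'johnsmith1970'):
    print(word, extensive_case_count(word))
# doe 8
# flipper 128
# johnsmith1970 512
```

These counts apply per word, before any combination or leet step multiplies them further.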
 

diff --git a/bopscrk/bopscrk.cfg b/bopscrk/bopscrk.cfg
@@ -15,10 +15,10 @@
 [COMBINATIONS]
 # Enables extra combination and additions at beginning and end of words
 # example: (john, doe) => 123john, john123, 123doe, doe123, john123doe doe123john
-extra_combinations=true
+extra_combinations=false
 # SEPARATORS CHARSET - Characters to use in extra-combinations
-separators_chars=._-$%%&#@
-separators_strings=!! 123 xXx
+separators_chars=._-+&@!
+separators_strings=!! 123
 # To get extensive charsets, uncomment the following lines:
 #separators_chars=!"#$%%&'()*+,-./:;<=>?@[\]^_`{|}~
 #separators_strings=!! ¡¡ !!! ¡¡¡ ¡!¡ !¡! 123 1234 xXx XxX WwW wWw
@@ -35,6 +35,7 @@ leet_charset=a:4 e:3 i:1 o:0 s:$
 # Comment this line or set it to false in case you don't want to get all possible leet transforms
 # (!) Warning: enabled with huge --max parameters (e.g.: greater than 18) could take several minutes
 recursive_leet=true
+extensive_case=false
 
 [LYRICS]
 # Remove all parenthesis in lyrics found before any transform
@@ -49,7 +50,7 @@ lyric_space_replacement=true
 # SPACE REPLACEMENT CHARSET - Characters and/or strings to insert instead of spaces
 # inside an artist name or a lyric phrase
 # Comment two above lines or set it empty in order to don't replace spaces, just remove them
-space_replacement_chars=!@+._-
+space_replacement_chars=._-+&@!
 space_replacement_strings=
 # To get an extensive charset, uncomment the following line
 #space_replacement_chars=!"#$%%&'()*+,-./:;<=>?@[\]^_`{|}~
diff --git a/bopscrk/bopscrk.py b/bopscrk/bopscrk.py
@@ -6,7 +6,7 @@
 
 name = 'bopscrk.py'
 desc = 'Generate smart and powerful wordlists'
-__version__ = '2.4.6'
+__version__ = '2.4.7'
 __author__ = 'r3nt0n'
 __status__ = 'Development'
 

diff --git a/bopscrk/modules/args.py b/bopscrk/modules/args.py
@@ -160,13 +160,13 @@ def set_interactive_options(self):
         self.base_wordlist = []
         # here I can select on which wordlist include each info by their weight (to implement)
         if not is_empty(firstname):
-            firstname = firstname.lower()
+            firstname = firstname
             self.base_wordlist.append(firstname)
         if not is_empty(surname):
-            surname = surname.lower()
+            surname = surname
             self.base_wordlist.append(surname)
         if not is_empty(lastname):
-            lastname = lastname.lower()
+            lastname = lastname
             self.base_wordlist.append(lastname)
         if not is_empty(birth):
             birth = birth.split('/')
@@ -176,12 +176,12 @@ def set_interactive_options(self):
         if not is_empty(others):
             others = others.split(',')
             for i in others:
-                self.base_wordlist.append(i.lower())
+                self.base_wordlist.append(i)
 
     def set_cli_options(self):
         self.base_wordlist = []
         if self.args.words:
-            [self.base_wordlist.append(word.lower()) for word in ((self.args.words).split(','))]
+            [self.base_wordlist.append(word) for word in ((self.args.words).split(','))]
         self.min_length = self.args.min
         self.max_length = self.args.max
         self.leet = self.args.leet
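
Review note on this hunk: with `.lower()` removed, the `firstname = firstname` style assignments are no-ops, and the list comprehension in `set_cli_options` is used only for its side effect. A minimal sketch of an equivalent, more idiomatic form follows; `split_words` is a hypothetical helper, and skipping empty entries is an assumption about the desired behavior, not something the commit does:

```python
def split_words(raw_words):
    """Split a comma-separated string, keeping each word's original case."""
    # Empty entries (e.g. from a trailing comma) are skipped: an assumption,
    # the committed code appends them as-is.
    return [word for word in raw_words.split(',') if word]


base_wordlist = []
base_wordlist.extend(split_words("John,O'hara,Doe,"))
print(base_wordlist)  # ['John', "O'hara", 'Doe']
```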

diff --git a/bopscrk/modules/banners.py b/bopscrk/modules/banners.py
@@ -19,7 +19,7 @@ def banner(name, version, author="r3nt0n"):
         name_rand_leet = name_rand_leet[randint(0, (len(name_rand_leet) - 1))]
     except:
         name_rand_leet = name
-    name_rand_case = case_transforms(name)
+    name_rand_case = case_transforms_basic(name)
     name_rand_case = name_rand_case[randint((len(name_rand_case) - 3), (len(name_rand_case) - 1))]
     #version = version[:3]
     print('  ,----------------------------------------------------,   ,------------,');sleep(interval)
@@ -33,15 +33,7 @@ def banner(name, version, author="r3nt0n"):
     print('  `----------------------------------------------------´   `------------´\n');sleep(interval)
 
 def help_banner():
-    print(u'  +---------------------------------------------------------------------+');sleep(interval)
-    print(u'  | Fields can be left empty.  You can use accentuation in your words.  |');sleep(interval)
-    print(u'  | If you enable case transforms,  won\'t matter the lower/uppercases   |');sleep(interval)
-    print(u'  | in your input. In "others" field (interactive mode), you can write  |');sleep(interval)
-    print(u'  | several words comma-separated (e.g.: 2C,Flipper).                   |');sleep(interval)
-    print(u'  |                                                                     |');sleep(interval)
-    print(u'  |                              For advanced usage and documentation:  |');sleep(interval)
-    print(u'  |                                  {}https://github.com/r3nt0n/bopscrk{}  |'.format(color.ORANGE,color.END));sleep(interval)
-    print(u'  +---------------------------------------------------------------------+\n');sleep(interval)
+    print(u'    Advanced usage and documentation: {}https://github.com/r3nt0n/bopscrk{}'.format(color.ORANGE,color.END));sleep(interval)
 
 def bopscrk_banner():
     sleep(interval * 4)

diff --git a/bopscrk/modules/config.py b/bopscrk/modules/config.py
@@ -43,6 +43,7 @@ def setup(self):
                                                       self.read_config('COMBINATIONS', 'separators_strings'))
         self.LEET_CHARSET = (self.read_config('TRANSFORMS', 'leet_charset')).split()
         self.RECURSIVE_LEET = self.parse_booleans(self.read_config('TRANSFORMS', 'recursive_leet'))
+        self.EXTENSIVE_CASE = self.parse_booleans(self.read_config('TRANSFORMS', 'extensive_case'))
         self.REMOVE_PARENTHESIS = self.parse_booleans(self.read_config('LYRICS', 'remove_parenthesis'))
         self.TAKE_INITIALS = self.parse_booleans(self.read_config('LYRICS', 'take_initials'))
         self.ARTIST_SPLIT_BY_WORD = self.parse_booleans(self.read_config('LYRICS', 'artist_split_by_word'))
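
Review note on this hunk: users who run bopscrk with an older custom config file will not have an `extensive_case` key. How `read_config` and `parse_booleans` behave in that case is not visible in this diff. The sketch below shows one way to default to `false`, using the standard library's `configparser.getboolean` in place of the project's own helpers (an assumption made for illustration only):

```python
import configparser

# Sample config mimicking an older file that lacks the new extensive_case key.
OLD_CFG = """
[TRANSFORMS]
recursive_leet=true
"""

parser = configparser.ConfigParser()
parser.read_string(OLD_CFG)

# fallback=False keeps config files without the key working.
extensive_case = parser.getboolean('TRANSFORMS', 'extensive_case', fallback=False)
recursive_leet = parser.getboolean('TRANSFORMS', 'recursive_leet', fallback=False)
print(extensive_case, recursive_leet)  # False True
```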

diff --git a/bopscrk/modules/main.py b/bopscrk/modules/main.py
@@ -155,7 +155,7 @@ def run(name, version):
         # LEET TRANSFORMS
         if args.leet:
             if not Config.LEET_CHARSET:
-                print('  {}[!]{} Any leet charset specified in {}'.format(color.ORANGE, color.END, args.cfg_file))
+                print('  {}[!]{} No leet charset specified in {}'.format(color.ORANGE, color.END, args.cfg_file))
                 print('  {}[!]{} Skipping leet transforms...'.format(color.ORANGE, color.END, args.cfg_file))
             else:
                 recursive_msg = ''
@@ -176,7 +176,10 @@ def run(name, version):
 
         # CASE TRANSFORMS
         if args.case:
-            print('  {}[+]{} Applying case transforms to {} words...'.format(color.BLUE, color.END,len(final_wordlist)))
+            extensive_msg = ''
+            if Config.EXTENSIVE_CASE:
+                extensive_msg = '{}extensive{} '.format(color.ORANGE, color.END)
+            print('  {}[+]{} Applying {}case transforms to {} words...'.format(color.BLUE, color.END, extensive_msg, len(final_wordlist)))
 
             # transform_cached_wordlist_and_save(case_transforms, args.outfile) # not working yet, infinite loop ?¿?¿
 

diff --git a/bopscrk/modules/transforms.py b/bopscrk/modules/transforms.py
@@ -3,18 +3,28 @@
 # https://github.com/r3nt0n/bopscrk
 # bopscrk - transform functions module
 
+import itertools
 from multiprocessing import cpu_count, Pool
-
 from alive_progress import alive_bar
 
 from . import Config
 from .excluders import remove_duplicates
 from .auxiliars import append_wordlist_to_file
 
+# EXTENSIVE: generates all case transforms possibilities
+def case_transforms_extensive(word):
+    word = word.lower()
+    return [new_word for new_word in map(''.join, itertools.product(*zip(word.upper(), word.lower())))]
 
-def case_transforms(word):
+# BASIC: generates the more probable case transformations, but not all possibilities
+def case_transforms_basic(word):
+    word = word.lower()
     new_wordlist = []
 
+    # Include all chars to lower and all chars to upper
+    new_wordlist.append(word)
+    new_wordlist.append(word.upper())
+
     # Make each one upper (hello => Hello, hEllo, heLlo, helLo, hellO)
     i=0
     for char in word:
@@ -55,15 +65,12 @@ def case_transforms(word):
         else: new_word += char
     if new_word not in new_wordlist: new_wordlist.append(new_word)
 
-    # recursive call function (not working, maybe this option won't be even useful)
-    # for new_word in new_wordlist:
-    #     original_size = len(new_wordlist)
-    #     new_wordlist.extend(case_transforms(new_word))
-    #     if len(new_wordlist) == original_size:
-    #         break  # breaking recursive call
-
     return new_wordlist
 
+def case_transforms(word):
+    if Config.EXTENSIVE_CASE:
+        return case_transforms_extensive(word)
+    return case_transforms_basic(word)
 
 def leet_transforms(word):
     new_wordlist = []
@@ -77,14 +84,9 @@ def leet_transforms(word):
                 leeted_char = lchar[-1:]
                 new_word = word[:i] + leeted_char + word[i + 1:]
                 if new_word not in new_wordlist: new_wordlist.append(new_word)
-                # dont break to allow multiple transforms to a single char (e.g. a into 4 and @)
+                # don't break to allow multiple transforms to a single char (e.g. a into 4 and @)
         i += 1
 
-    # MULTITHREAD RECURSIVE call function (doesn't seem efficient)
-    # if Config.RECURSIVE_LEET and (len(new_wordlist) > original_size):
-    #     new_wordlist += multithread_transforms(leet_transforms, new_wordlist)
-
-    # UNITHREAD RECURSIVE call function
     if Config.RECURSIVE_LEET:
         for new_word in new_wordlist:
             original_size = len(new_wordlist)
@@ -181,7 +183,4 @@ def transform_cached_wordlist_and_save(transform_type, filepath):
             f.seek(last_position)  # put point on last position
             line = f.readline()
             if not line:
-                break
-
-
-
+                break
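
Review note on `case_transforms_extensive`: `itertools.product(*zip(word.upper(), word.lower()))` pairs every character with itself when it has no case, so a word such as `ab1` produces 8 entries in which each variant appears twice. This matters only if nothing deduplicates the list afterwards. A minimal sketch of a variant that avoids the duplicates at the source follows; `all_case_variants` is a hypothetical name, not a function from the project:

```python
import itertools


def all_case_variants(word):
    """Return every upper/lower combination of `word`, without duplicates."""
    options = []
    for char in word:
        lower, upper = char.lower(), char.upper()
        # Characters without case (digits, symbols) get a single option.
        options.append((lower,) if lower == upper else (lower, upper))
    return [''.join(combo) for combo in itertools.product(*options)]


variants = all_case_variants('ab1')
print(variants)       # ['ab1', 'aB1', 'Ab1', 'AB1']
print(len(variants))  # 4
```

Building the options per character also avoids relying on `word.upper()` and `word.lower()` having the same length, which does not hold for every Unicode character.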
diff --git a/img/bopscrk-2.3.gif b/img/bopscrk-2.3.gif
diff --git a/img/bopscrk-2.4.5.gif b/img/bopscrk-2.4.5.gif
diff --git a/img/bopscrk.gif b/img/bopscrk.gif