Customizable open-source software to generate randomized sketched web-pages.
Mapping hand sketches to computer code is an active AI research problem. A prominent application is generating web-page front-end code from a hand sketch. Training a deep learning model for this task requires a large and varied dataset, but sketching a large number of web-pages by hand is tedious and time-consuming. The software we publish is an efficient tool for generating any number of unique sketched web-pages.
This program was a main part of our graduation project, in which we developed an end-to-end deep learning model that outputs web-page code from an input sketch. We could not find a suitable dataset for our case, so we built a dataset generator that creates randomized web-pages and then turns them into matching sketches.
We created a DSL (domain-specific language) dictionary that maps each block of code to a single token, which simplifies the problem. The generator builds unique random pages from this dictionary under a set of structural rules and outputs DSL files, which a DSL compiler then maps to realistic-looking web-pages. The resulting web-pages are rendered with a special CSS file using the PhantomJS engine. Finally, we apply simple object detection to each rendered web-page to locate its elements and create a matching sketch: each detected element is replaced by an actual hand-drawn sketch of that element, chosen at random from the set of images provided for that element type.
Here is the block diagram for the generator:
The main rule is to follow a correct web-page structure when creating a new randomized page.
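As a rough illustration of rule-constrained random generation, the sketch below builds a page tree that always has a header, one or more rows of columns, and a footer, then serializes it to a brace-delimited DSL. The token names (`nav-link`, `single`, `double`, `quadruple`, and the leaf tokens) and the helpers `random_page` and `to_dsl` are hypothetical; the project's actual DSL dictionary and rules may differ.

```python
import random

# Hypothetical token names; the real DSL dictionary may use different ones.
LEAF_TOKENS = ["title", "text", "button", "image"]
COLUMN_LAYOUTS = {1: "single", 2: "double", 4: "quadruple"}


def random_page(rng, max_rows=3):
    """Build a (token, children) tree shaped header -> rows -> footer."""
    header = ("header", [("nav-link", []) for _ in range(rng.randint(2, 5))])
    rows = []
    for _ in range(rng.randint(1, max_rows)):
        n_cols = rng.choice(list(COLUMN_LAYOUTS))
        columns = []
        for _ in range(n_cols):
            leaves = [(rng.choice(LEAF_TOKENS), [])
                      for _ in range(rng.randint(1, 3))]
            # The column token is fixed by the number of columns in the row.
            columns.append((COLUMN_LAYOUTS[n_cols], leaves))
        rows.append(("row", columns))
    footer = ("footer", [])
    return ("page", [header] + rows + [footer])


def to_dsl(node, depth=0):
    """Serialize the tree as indented, brace-delimited DSL text."""
    token, children = node
    pad = "  " * depth
    if not children:
        return pad + token
    inner = "\n".join(to_dsl(child, depth + 1) for child in children)
    return f"{pad}{token} {{\n{inner}\n{pad}}}"


page = random_page(random.Random(0))
dsl_text = to_dsl(page)
print(dsl_text)
```

Because the structure is fixed by construction and only the counts and leaf tokens are sampled, every generated page is structurally valid regardless of the random seed.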
Mapping the DSL to HTML code.
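The mapping step can be pictured as a recursive template expansion. The sketch below is a minimal illustration, not the project's compiler: the `DSL_TO_HTML` table, its token names, and the HTML snippets are assumptions, with `{}` marking where the compiled children are inserted.

```python
# Hypothetical mapping from DSL tokens to HTML snippets.
# "{}" marks the slot where the compiled children are placed.
DSL_TO_HTML = {
    "page": "<html><body>{}</body></html>",
    "header": "<nav>{}</nav>",
    "row": '<div class="row">{}</div>',
    "single": '<div class="col-12">{}</div>',
    "double": '<div class="col-6">{}</div>',
    "title": "<h4>Title</h4>",
    "text": "<p>Lorem ipsum</p>",
    "button": '<button class="btn">Button</button>',
}


def compile_node(node):
    """Expand a (token, children) node into HTML, depth first."""
    token, children = node
    inner = "".join(compile_node(child) for child in children)
    # Leaf templates contain no "{}", so replace() leaves them unchanged.
    return DSL_TO_HTML[token].replace("{}", inner)


tree = ("page", [("row", [("single", [("title", []), ("button", [])])])])
html = compile_node(tree)
print(html)
```

Keeping the mapping in a plain dictionary means new element types can be added without touching the compile function.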
The rendered web-page without and with the special CSS file applied:
The element detection process and sketched output:
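The detection and sketch-placement idea can be sketched in plain Python under a simplifying assumption: the rendered page has already been reduced to a small grid in which each cell holds an element-type label (0 for background). The project lists OpenCV among its dependencies, so its real detection code probably looks different. The flood fill below is a stand-in technique, and the label values, file names, and the helpers `detect_elements` and `plan_sketch` are all hypothetical.

```python
import random
from collections import deque


def detect_elements(grid, background=0):
    """Return (label, x, y, w, h) for each 4-connected region of one label."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            label = grid[y][x]
            if label == background or seen[y][x]:
                continue
            # Breadth-first flood fill, tracking the bounding box as we go.
            queue = deque([(x, y)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not seen[ny][nx] and grid[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((label, x0, y0, x1 - x0 + 1, y1 - y0 + 1))
    return boxes


# Hypothetical sketch images available for each element label.
SKETCHES = {1: ["button_a.png", "button_b.png"], 2: ["image_a.png"]}


def plan_sketch(boxes, rng):
    """Pick one random sketch image per detected box."""
    return [(rng.choice(SKETCHES[label]), x, y, w, h)
            for label, x, y, w, h in boxes]


grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
boxes = detect_elements(grid)
plan = plan_sketch(boxes, random.Random(0))
```

Each entry in `plan` names a sketch image and the box it should be resized and pasted into, which is the information a drawing step would need to compose the final sketch.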
- Generate any number of unique web-pages.
- Generate any number of different sketches for each web-page.
- Automatically save batches of generated sketches in zip files.
- Stop and resume the generation process while keeping the outputs unique.
- Option to save intermediate outputs.
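One way to keep outputs unique across a stop and resume is to persist a fingerprint of every generated page and reload it on the next run. The sketch below illustrates that idea only. The `UniquenessRegistry` class, the JSON file, and the use of SHA-256 over the DSL text are assumptions, not a description of how the project does it.

```python
import hashlib
import json
import os
import tempfile


class UniquenessRegistry:
    """Remember SHA-256 digests of generated DSL text across runs."""

    def __init__(self, path):
        self.path = path
        self.seen = set()
        if os.path.exists(path):
            # Resuming: reload the digests saved by a previous run.
            with open(path) as f:
                self.seen = set(json.load(f))

    def add_if_new(self, dsl_text):
        """Record the page and return True, or return False if already seen."""
        digest = hashlib.sha256(dsl_text.encode("utf-8")).hexdigest()
        if digest in self.seen:
            return False
        self.seen.add(digest)
        return True

    def save(self):
        with open(self.path, "w") as f:
            json.dump(sorted(self.seen), f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seen.json")
    first_run = UniquenessRegistry(path)
    added = first_run.add_if_new("header { nav-link }")
    first_run.save()

    resumed = UniquenessRegistry(path)  # simulates a restarted process
    duplicate_accepted = resumed.add_if_new("header { nav-link }")
    new_accepted = resumed.add_if_new("footer")
```

A generator loop would call `add_if_new` on each candidate page and simply draw again whenever it returns `False`.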
- Python 3.6
- Dependencies:
pip install opencv-python numpy selenium imutils imagesize
python main.py --help
Main.py [-h] --number NUMBER [--fresh] [--variations VARIATIONS]
[--intermediate] [--height HEIGHT] [--zipping]
[--batchsize BATCHSIZE] [--verbose]
optional arguments:
-h, --help show this help message and exit
--number NUMBER, -n NUMBER
Number of samples to be generated. Starts with n=0 if
the output directory is empty.
--fresh, -f Fresh start; removes any existing outputs.
--variations VARIATIONS, -v VARIATIONS
Number of different sketches for each generated
webpage.
--intermediate, -i Save intermediate outputs from rendering during
generation process.
--height HEIGHT Specify page height in pixels. Note: page width is
1200px
--zipping, -z Store batches of output files as zipped files. Default
batch size is 500.
--batchsize BATCHSIZE, -s BATCHSIZE
Number of pages to be zipped together.
--verbose Printing in console during execution.
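For readers who want to see how these options fit together, here is a minimal `argparse` sketch that accepts the same flags as the help text above and parses one example command line. It is a reconstruction from the help output, not the project's source. The default for `--variations` and the absence of a default for `--height` are assumptions, while the batch-size default of 500 comes from the help text.

```python
import argparse


def build_parser():
    """Parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--number", "-n", type=int, required=True,
                        help="Number of samples to be generated.")
    parser.add_argument("--fresh", "-f", action="store_true",
                        help="Fresh start; removes any existing outputs.")
    parser.add_argument("--variations", "-v", type=int, default=1,  # assumed
                        help="Number of different sketches per webpage.")
    parser.add_argument("--intermediate", "-i", action="store_true",
                        help="Save intermediate outputs from rendering.")
    parser.add_argument("--height", type=int, default=None,  # default unknown
                        help="Page height in pixels.")
    parser.add_argument("--zipping", "-z", action="store_true",
                        help="Store batches of output files as zip files.")
    parser.add_argument("--batchsize", "-s", type=int, default=500,
                        help="Number of pages to be zipped together.")
    parser.add_argument("--verbose", action="store_true",
                        help="Print progress to the console.")
    return parser


# Equivalent to: python main.py -n 1000 -v 3 -z -s 250
args = build_parser().parse_args(["-n", "1000", "-v", "3", "-z", "-s", "250"])
```

With these arguments the run would request 1000 pages with 3 sketches each, zipped in batches of 250.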
- Currently, data augmentation is done by generating a variety of sketches for each web-page. Creating different modified versions of the same sketch is disabled as it needs more tuning.
- Elements may intersect or overlap. This depends heavily on the provided sketched elements.
- Random selection of sketched elements should be biased toward sketches whose size and aspect ratio are close to those of the detected element. This in turn requires more sketch variations for each element type.
- Some performance optimizations need to be done.
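The size and aspect-ratio biasing mentioned above could be approached with a weighted random choice. The sketch below is one possible scheme, not something the project implements: the weighting formula, the `sharpness` parameter, and the candidate names and sizes are all hypothetical.

```python
import math
import random


def similarity_weight(target, candidate, sharpness=4.0):
    """Weight in (0, 1]; 1.0 when aspect ratio and area match exactly."""
    tw, th = target
    cw, ch = candidate
    # Log ratios make "twice as wide" and "half as wide" equally distant.
    aspect_gap = abs(math.log((tw / th) / (cw / ch)))
    area_gap = abs(math.log((tw * th) / (cw * ch)))
    return math.exp(-sharpness * (aspect_gap + 0.5 * area_gap))


def pick_sketch(target, candidates, rng):
    """Choose a sketch at random, favouring sizes close to the target box."""
    names = list(candidates)
    weights = [similarity_weight(target, candidates[name]) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical sketch images with their (width, height) in pixels.
candidates = {"wide.png": (220, 50), "square.png": (100, 100)}
rng = random.Random(0)
picks = [pick_sketch((200, 50), candidates, rng) for _ in range(1000)]
wide_count = picks.count("wide.png")
square_count = picks.count("square.png")
```

Selection stays random, so poorly matching sketches can still appear occasionally; raising `sharpness` makes that rarer.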
- The DSL method was mainly inspired by the pix2code project by Tony Beltramelli.
- The DSL compiler we use is a modified version of the pix2code DSL compiler.
- We used simple Start Bootstrap templates as a reference for our generated web-pages.
- We use the PhantomJS engine to render the web-pages from Python.
MIT License Copyright (c) 2019 Abdelrahman Abdelhamid, Ahmed Bally, Eman El-Sheikh and Abdelrahman Metwally.