Discussion
Suggestions, feature requests and discussion go here. See the TODO page for features which have not yet been implemented.
Things have been going well. WhatWeb 0.4.5 is a good, stable tool and has earned community recognition.
We've been tearing webpages apart and fingerprinting them piece by piece. We've built plugins for many web applications, client-side libraries and HTML elements, but now we have a few important issues to consider regarding WhatWeb's direction.
Design Philosophy
- Always use an intuitive interface. Never force a user to choose an option when a default is better. The following command must always work:
./whatweb slashdot.org
- Never take choices away from the user. Each automatic decision should be a default for a configurable option, for example following redirects.
- Avoid premature over-engineering. Do not implement core code to handle types of information that few plugins currently return. Instead, allow plugins to return the information in generic formats such as :string. Wait until many plugins are returning the same type of information, such as operating systems, file paths, versions, or modules, before considering how to solve this problem in the core. Premature over-engineering is the type of error that kills a project.
- When a solution to a problem is inelegant, do not implement it in WhatWeb. Instead, continue to meditate on the problem for as long as required. If you need a fast solution, hack up your own version of WhatWeb and do not introduce the patch into the core; I have done this many times.
- WhatWeb must grow horizontally and vertically together. WhatWeb must be good at solving a type of problem before entering a new area. For example, WhatWeb must be competent at identifying a system before it starts becoming good at identifying versions of systems. If WhatWeb is known to be patchy in its coverage, this could kill the project. This is the rationale behind not implementing security checks yet. It is also in line with the Unix philosophy of doing one thing and doing it really well.
- Breaking backwards compatibility is OK.
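As a rough illustration of the "default for a configurable option" rule, the sketch below uses Ruby's standard OptionParser. The --follow-redirect flag and its values are hypothetical here, chosen only to mirror the follow-redirects example. With no flags the default applies, so a bare target still works.

```ruby
require 'optparse'

# Hypothetical option: the name and values are illustrative only.
REDIRECT_MODES = %w[never always].freeze

# Every automatic decision starts life as a default...
DEFAULTS = { follow_redirect: 'always' }.freeze

def parse_options(argv)
  options = DEFAULTS.dup # ...which the user may override on the command line
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: whatweb-sketch [options] <target>'
    opts.on('--follow-redirect=WHEN', REDIRECT_MODES,
            "When to follow redirects (#{REDIRECT_MODES.join(', ')}). Default: always") do |mode|
      options[:follow_redirect] = mode
    end
  end
  targets = parser.parse(argv) # returns the non-option arguments
  [options, targets]
end

options, targets = parse_options(['slashdot.org'])
puts options[:follow_redirect] # prints "always"
p targets                      # prints ["slashdot.org"]
```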
Multi-App plugins
We're at a fork in the road on this one. On one side, we can fingerprint each application individually and write a plugin for each one. On the other we can incorporate many different applications of the same type into one plugin, for example all third party javascript libraries.
bcoles: I'm in favor of categorizing plugins rather than combining multiple applications into a single plugin. I'd rather see output such as Google-Analytics[713526426] than Third-Party-Library[Google-Analytics[713526426]].
Exceptions:
An exception would be fingerprinting generic web apps, for example admin panels or web backdoors. These are applications that can only be fingerprinted generically using subtle clues, such as "/admin/", "/login/" or "?cmd=" in the URL. Such a clue doesn't necessarily mean that an admin panel or backdoor is present, but it's a good indication.
It is also acceptable to write plugins which return different models for hardware, since it is not feasible to write a different plugin for every model.
Output becomes a wall of text
We now have numerous plugins which return a file path from the source of an HTML element. For example,
- Redirect-Location
- Frame
- RSS-Feed
- Mailto
- Title
- Script
- Shortcut Icon
These types of plugins are great for plugin development, data mining or noticing patterns across networks.
The problem is that the WhatWeb output becomes a massive wall of text, even in --log-brief mode.
One way around this is by putting these types of plugins in a "plugin development" category and allowing the user to enable/disable certain categories.
For now, most of these plugins are in the ./plugins-disabled directory.
One solution is a new output format combined with plugin categories (see below).
Should plugins be categorized? If so, should they be layered (i.e., sub-categories)?
- Server
- Language
- Program
- Third Party Library
or
- HTML Elements
- Program
- Vendor
- Server
- Development
- Config/Log files
or
- HTTP Server. Apache, Nginx
- Language. PHP, ASP, ASP.NET, ColdFusion
- Framework. Cake, Zend, Ruby on Rails (can you tell this from the language and CMS?)
- CMS/Blog. WordPress, Joomla, Drupal
- JS Library. script.aculo.us, Prototype, jQuery, Google Analytics
- Hardware devices. Xerox Printers, Cisco routers, D-link cameras
- Common. Title, Subdomains, Uncommon-headers, X-Powered-By, Mailto
- Hashes. Header-hash, footer-hash
bcoles: Categories for plugins should be defined as an array of tags within the plugin file. Tagging is superior to categorization.
I (Andrew) like the above categories best, but the list is far from complete. The first categories break down nicely into an OSI-like set of layers. The 'hardware devices' category should be considered as covering all layers from the server to the JS library. The common category defines plugins that are common to all types of websites, not necessarily commonly found plugins. The hashes are kept separate from the common plugins because hashes are primarily used to discover common content after a scan, and a user may wish to disable them.
Here is a set of categories from builtwith.com:
- Ads
- Analytics
- Blog
- CDN
- CMS
- DocInfo
- Ecommerce
- Encoding (utf-8, big5)
- Feeds (feed types and feed providers)
- Framework (includes languages and frameworks)
- JS (javascript libraries, not including analytics)
- Media (Media provider such as youtube)
- Server
- Software (operating systems)
- Widgets
Here is a set of categories from Wappalyzer:
- CMS
- Message Boards
- Database managers
- Documentation tools
- Widgets
- Web shops
- Photo galleries
- Wikis
- Hosting panels
- Analytics
- Blogs
- JavaScript frameworks
- Issue trackers
- Video Players
- Comment Systems
- CAPTCHAs
- Font scripts
- Web frameworks
- Miscellaneous
- Editors
- LMS
- Web servers
- Cache tools
Some problems are:
- Encoding should be a plugin value, not a plugin
- Ecommerce has a lot of CMSs
- Blogs and CMSs overlap, for example WordPress
- Client-Side fits into a lot of categories, but should probably be kept separate
Some notes are: The Analytics category could be included in JS, but it's better to give it its own category.
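Here is a minimal sketch of bcoles' tagging suggestion, assuming each plugin carries an array of tags. The TaggedPlugin struct, the tag names and select_plugins are hypothetical, not WhatWeb's plugin API. Because a plugin can carry several tags, WordPress can be both "cms" and "blog", which sidesteps the crossover problem listed above. A user could also switch off noisy "development" plugins to shrink the wall of text.

```ruby
# Hypothetical structure: a plugin name plus an array of tags.
TaggedPlugin = Struct.new(:name, :tags)

PLUGINS = [
  TaggedPlugin.new('Apache', %w[http-server]),
  TaggedPlugin.new('WordPress', %w[cms blog]), # one plugin, two tags
  TaggedPlugin.new('Google-Analytics', %w[js-library analytics]),
  TaggedPlugin.new('Title', %w[common development]),
  TaggedPlugin.new('Header-Hash', %w[hashes development])
].freeze

# Keep a plugin if it carries at least one enabled tag (every plugin when
# enable is nil) and none of the disabled tags.
def select_plugins(plugins, enable: nil, disable: [])
  plugins.select do |plugin|
    wanted = enable.nil? || (plugin.tags & enable).any?
    wanted && (plugin.tags & disable).empty?
  end
end

p select_plugins(PLUGINS, disable: %w[development]).map(&:name)
# prints ["Apache", "WordPress", "Google-Analytics"]
p select_plugins(PLUGINS, enable: %w[blog]).map(&:name)
# prints ["WordPress"]
```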
Types of authentication to potentially support:
- HTTP Basic Authentication - currently supported by --header
- HTTP Digest Authentication - currently supported by --header
- URL parameter with session token
- HTTP Cookies - currently supported by --header
- SSL Certificate Support
- HTTP Forms with passwords
Curl supports these and it might make sense for WhatWeb to copy curl's command line syntax.
One method, though not necessarily a good one, is to load WhatWeb with username and password combinations which it will try whenever it discovers a password prompt.
Using HTTP authorization would be nice for fingerprinting devices with default credentials. This belongs in aggression level 5 which has not yet been implemented.
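For the HTTP Basic case, the header passed via --header is just the credentials, base64-encoded. The helper below is a hypothetical sketch that builds the header value (RFC 7617). Base64 is an encoding, not encryption, so the credentials are effectively sent in clear text unless HTTPS is used.

```ruby
# Build the value of an HTTP Basic "Authorization" header (RFC 7617):
# the word "Basic", a space, then base64("user:password").
# pack('m0') is Ruby's strict Base64 encoding (no trailing newline).
def basic_auth_value(user, password)
  "Basic #{["#{user}:#{password}"].pack('m0')}"
end

puts basic_auth_value('Aladdin', 'open sesame')
# prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
```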
Aung Khant: Some frameworks issue a unique error response when sent an invalid POST request:
:url_post=>'/', :post_data=>'null=null'
bcoles: POST can be achieved with custom Ruby, but POST request support would be worth adding. Support for OPTIONS requests may also be useful, for example for WebDAV.
Andrew: No. Not yet at least. I want good coverage of plugins to identify systems first including aggressive plugins to detect exact version numbers.
Plugins that test for vulnerabilities, if or when introduced, should be at a different aggression level, maybe 5. Exploiting full path disclosure, default credentials and weak access controls fit into this category.
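If POST support were added, Aung Khant's invalid-POST probe could look roughly like the sketch below, which uses Ruby's standard Net::HTTP. The helper names are hypothetical, and this is not how WhatWeb issues requests. The sketch posts the body null=null to / and keeps the status code and body for fingerprinting.

```ruby
require 'net/http'

# Build (but do not send) a POST whose form body is null=null.
def build_null_post(path = '/')
  request = Net::HTTP::Post.new(path)
  request.set_form_data('null' => 'null') # urlencoded body: null=null
  request
end

# Send the probe and return [status_code, body] for fingerprinting.
def send_null_post(base_url)
  uri = URI(base_url)
  path = uri.path.empty? ? '/' : uri.path
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    response = http.request(build_null_post(path))
    [response.code, response.body]
  end
end

# Example (performs a real request):
#   code, body = send_null_post('http://example.com/')
```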
The anemone library does not support redirects. It is also limited to extracting links from <a href="*"> tags. It may be worthwhile to rewrite the anemone library at some point in the distant future.
According to the WhatWeb design philosophy: avoid premature over-engineering. Do not implement core code to handle types of information that few plugins currently return.
The following are candidates for additional data types for plugins to return (alongside :version, :string, :firmware, etc.), as it may be useful to separate them from results in :string=>:
- :hostname=> - Internal host name - not widely used
- :ip=> - Used for internal IP addresses and the IP plugin - not widely used
- :mac=> - MAC address - not widely used
- :year=> - The age of an installation can often be roughly determined by the year(s) in copyright messages. Several plugins report the year.
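The following sketch illustrates a dedicated :year result by pulling years out of copyright notices. The regex, the helper names and the { year: ... } result shape are assumptions for illustration, not an existing WhatWeb data type.

```ruby
# Matches "Copyright", "&copy;", the copyright sign or "(c)", followed by a
# year or a year range such as 2004-2010. Expects a UTF-8 String (a body
# read as binary would need force_encoding first).
COPYRIGHT_YEAR = /(?:copyright|&copy;|\u00a9|\(c\))\W*?(\d{4})(?:\s*-\s*(\d{4}))?/i

# Every distinct year mentioned in copyright notices, sorted.
def copyright_years(body)
  body.scan(COPYRIGHT_YEAR).flatten.compact.map(&:to_i).uniq.sort
end

# Hypothetical result format: one { year: ... } hash per year, in the
# spirit of WhatWeb's :version / :string results.
def year_results(body)
  copyright_years(body).map { |year| { year: year } }
end

p copyright_years('<p>Copyright 2004-2010 Example Ltd. &copy; 2009</p>')
# prints [2004, 2009, 2010]
```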
Add option to save HTTP response (HTML + HTTP headers).
- option 1 (hostnames backwards by TLD, IPs forwards by octet)
  - login.yahoo.com becomes: com/yahoo/login/head and com/yahoo/login/body
  - 208.51.4.1 becomes: 208/51/4/1/head and 208/51/4/1/body
- option 2 (MD5 hash of the URL; this is kind of brutal)
  9e107d9d372bb6826bd81d3542a419d6.head
  9e107d9d372bb6826bd81d3542a419d6.body
- option 3 (URL-encode every special character after the hostname. Should dots remain dots?)
  login.yahoo.com%2findex.html.head
  login.yahoo.com%2findex.html.body
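The three naming options can be compared with a short sketch. The helper names are hypothetical, option 1's IP test only recognises dotted IPv4 addresses, and option 3 keeps dots as dots, matching the example above.

```ruby
require 'digest/md5'

IPV4 = /\A\d{1,3}(?:\.\d{1,3}){3}\z/
# Bytes left as-is by option 3; everything else is percent-encoded.
SAFE_BYTES = [*'A'..'Z', *'a'..'z', *'0'..'9', '.', '-', '_'].map(&:ord).freeze

# Option 1: hostnames reversed by label (TLD first), IPv4 addresses kept
# in octet order. Returns [head_path, body_path].
def option1_paths(host)
  labels = host.split('.')
  dir = (host.match?(IPV4) ? labels : labels.reverse).join('/')
  ["#{dir}/head", "#{dir}/body"]
end

# Option 2: MD5 hex digest of the full URL.
def option2_paths(url)
  digest = Digest::MD5.hexdigest(url)
  ["#{digest}.head", "#{digest}.body"]
end

# Option 3: hostname kept as-is; every byte of the path except letters,
# digits, '.', '-' and '_' is percent-encoded with lowercase hex.
def option3_paths(host, path)
  encoded = path.bytes.map { |b| SAFE_BYTES.include?(b) ? b.chr : format('%%%02x', b) }.join
  ["#{host}#{encoded}.head", "#{host}#{encoded}.body"]
end

p option1_paths('login.yahoo.com') # prints ["com/yahoo/login/head", "com/yahoo/login/body"]
p option1_paths('208.51.4.1')      # prints ["208/51/4/1/head", "208/51/4/1/body"]
p option3_paths('login.yahoo.com', '/index.html')
# prints ["login.yahoo.com%2findex.html.head", "login.yahoo.com%2findex.html.body"]
```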
Thoughts:
- WhatWeb now supports reading HTTP headers + HTML content from a single local file so it's probably not necessary to separate the two.
- large sets - splitting the hostnames across directories (option 1)
- small sets - one directory for all hosts (keep the dots)
- URL encode every special character for the path
- Store files in an optional folder? There should also be options for saving to databases such as GridFS, SQLite, etc.
This feature should provide a gentle introduction into custom usage of WhatWeb and eventually lead into plugin writing.
Aims of the feature:
Reduce the barrier to entry for custom searching with WhatWeb and remove the need for anyone to write this:
printf 'GET / HTTP/1.0\r\nHost: whatweb.net\r\n\r\n' | netcat whatweb.net 80 | grep -Eo "<title>([^<]+)</title>"
For example:
$ ./whatweb --custom-plugin "{:string=>/<title>([^<]+)<\/title>/i}" whatweb.net
This option allows WhatWeb to act as a powerful, threaded, grep-powered platform for HTTP(S).
Unfortunately, the --custom-plugin option needs to be escaped and, in some cases such as :regexp=>//, double-escaped, as it is parsed directly from the command line. This results in a complicated and unintuitive command-line argument.
Splitting each match method into its own command-line argument would help reduce the complexity:
- option 1: --custom-plugin-text, --custom-plugin-regex
- option 2: --find-text, --find-regex, --find-md5
- option 3: --match-text, --match-regex, --match-md5
- option 4: --grep-text, --grep-regex, --grep-md5
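To show why separate flags help, here is a sketch of option 4 built on Ruby's OptionParser. Only the flag names come from the list above. The matcher lambdas and parse_grep_options are hypothetical. The regex is compiled with Regexp.new from the raw argument, so it only needs ordinary shell quoting (for example --grep-regex '<title>([^<]+)</title>'), not Ruby-level double escaping.

```ruby
require 'optparse'
require 'digest/md5'

# One flag per match method. Each flag adds a matcher lambda that takes a
# response body and returns an array of results.
def parse_grep_options(argv)
  matchers = []
  parser = OptionParser.new do |opts|
    opts.on('--grep-text=TEXT', 'Match literal text') do |text|
      matchers << ->(body) { body.include?(text) ? [text] : [] }
    end
    opts.on('--grep-regex=REGEX', 'Match a case-insensitive regular expression') do |source|
      regex = Regexp.new(source, Regexp::IGNORECASE)
      # scan returns strings, or arrays of groups when the regex captures
      matchers << ->(body) { body.scan(regex).map { |m| m.is_a?(Array) ? m.first : m } }
    end
    opts.on('--grep-md5=HASH', 'Match the MD5 of the whole body') do |digest|
      matchers << ->(body) { Digest::MD5.hexdigest(body) == digest.downcase ? [digest] : [] }
    end
  end
  targets = parser.parse(argv)
  [matchers, targets]
end

matchers, targets = parse_grep_options(['--grep-regex=<title>([^<]+)</title>', 'whatweb.net'])
body = '<html><TITLE>WhatWeb</TITLE></html>'
p matchers.flat_map { |m| m.call(body) } # prints ["WhatWeb"]
p targets                                # prints ["whatweb.net"]
```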
A GUI would be nice. Options:
- Add GUI to WhatWeb (Ruby) and launch with command line option --gui
- Add GUI to WhatWeb (Ruby) and provide two branches: CLI and GUI
- Write a separate application (wrapper). Using Ruby would make sense.
bcoles: I'm concerned that using a wrapper will be slow. That said, I've written a threaded GUI wrapper in C# for use on Windows systems as a working proof of concept. Contact me if you would like a copy. Keep in mind that it is a proof of concept only and suffers from the following flaws:
- you cannot select plugins (all enabled plugins are run by default)
- logging is limited to brief-logging
- scanning local files is buggy (Windows file paths are not escaped properly)
Addons in the ./addons directory allow users to extend WhatWeb. These tools have been kept separate for several reasons:
- This helps us keep unsupported features out of the core until they have been thoroughly tested.
- It follows the UNIX philosophy: do one thing and do it well.
- It assists in preventing premature over-engineering.
The following are potential addons which might be worth writing.
build-report
A tool to build a report file. Use XML+XSL format?
- Could include (fav)icons for different software.
- CVE#/OSVDB#/bugtraq#/etc optional.
- Allow report generation based on grouping:
  - this URL matches these plugins, or
  - this plugin matches these URLs
passive-vuln-detection
A tool to return CVE#/OSVDB#/bugtraq#/etc. for known vulnerable software versions.
Set a maximum file size for remote files to stop WhatWeb getting "stuck" on huge files or streaming data.
--max-filesize=SIZE Set the maximum allowed file size for remote files. Default: (1MB)
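One possible shape for --max-filesize, assuming Ruby's Net::HTTP, is to stream the body with read_body and stop once the limit is hit. fetch_capped and accumulate_capped are hypothetical names. Checking the Content-Length header first would be a cheap extra guard, but servers sending streaming data may not provide one.

```ruby
require 'net/http'

MAX_FILESIZE = 1024 * 1024 # proposed default: 1MB

# Append chunks until max_bytes is reached. Returns [body, truncated?].
# The body is kept as binary; a multibyte character may be cut at the limit.
def accumulate_capped(chunks, max_bytes)
  body = String.new # empty, unfrozen, ASCII-8BIT
  chunks.each do |chunk|
    remaining = max_bytes - body.bytesize
    if chunk.bytesize > remaining
      body << chunk.b.byteslice(0, remaining)
      return [body, true] # stop reading: the rest of the stream is ignored
    end
    body << chunk.b
  end
  [body, false]
end

# Stream a GET response body in chunks instead of loading it all at once.
def fetch_capped(url, max_bytes = MAX_FILESIZE)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(Net::HTTP::Get.new(uri)) do |response|
      return accumulate_capped(response.enum_for(:read_body), max_bytes)
    end
  end
end

# Example (performs a real request):
#   body, truncated = fetch_capped('http://example.com/')
```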
Follow frames
Many websites still use frames on intro pages. A --follow-frames option would allow WhatWeb to grab these URLs instead of being stuck trying to fingerprint an HTML frameset.
--follow-frames=WHEN Control when to follow frames. WHEN may be `never',
`frame-only', `iframe-only', `same-site', `same-domain'
or `always'. Default: never
Should frames be followed by default? Should following off-site frames be ignored or be a configurable option? Would never or same-site be the best default?
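The sketch below shows how the WHEN values might filter frame URLs. The mode meanings are one reading of the help text above, and two of them are assumptions: "same-site" is taken to mean the same host, and "same-domain" the same last two host labels. Extraction uses a regex, which is fragile compared to an HTML parser.

```ruby
require 'uri'

# Regex-based extraction of <frame src> and <iframe src>; captures
# [tag name, src]. An HTML parser would be more robust.
FRAME_SRC = /<(frame|iframe)\b[^>]*\bsrc\s*=\s*["']?([^"'\s>]+)/i

# Assumption: "same-domain" compares only the last two host labels, a crude
# stand-in for the registered domain (wrong for e.g. example.co.uk).
def registered_domain(host)
  host.to_s.split('.').last(2).join('.')
end

# Returns the absolute frame URLs worth following under the given WHEN mode.
def frame_urls(html, base_url, mode)
  return [] if mode == 'never'
  base = URI(base_url)
  html.scan(FRAME_SRC).filter_map do |tag, src|
    tag = tag.downcase
    next if mode == 'frame-only' && tag != 'frame'
    next if mode == 'iframe-only' && tag != 'iframe'
    url = begin
      base.merge(src)
    rescue URI::Error
      next # skip malformed src values
    end
    allowed = case mode
              when 'same-site'   then url.host == base.host
              when 'same-domain' then registered_domain(url.host) == registered_domain(base.host)
              else true # 'always', 'frame-only', 'iframe-only'
              end
    url.to_s if allowed
  end
end

html = '<frameset><frame src="/main.html"><frame src="http://ads.example.net/x"></frameset>'
p frame_urls(html, 'http://www.example.com/', 'same-site')
# prints ["http://www.example.com/main.html"]
```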
Andre Gironda: I would love to see WhatWeb identify candidate insertion points for testing, especially marking insertion points that are user-controllable HTML element attributes.
bcoles: Any suggestions on how the results for candidate insertion points should be formatted?
Andre Gironda: The ProxMon and Casaba Watcher tools do it right; they are open source.
bcoles: This could be achieved with a plugin. Something like:
- GET params: split base_uri by ? then &
  - Extract params from /base_uri[^'"]+\?([^=]+)=([^&]+)/
  - Extract params from
- POST params: The ./plugins-disabled/POST-Parameters.rb plugin exists for this purpose
- Elements: grep for the GET param values and extract the relevant HTML element type
  - Will most likely result in false positives unless non-default GET parameter values are sent
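The GET params step could start out like the sketch below. It resolves href/src values against the page URL and splits their query strings with the standard URI.decode_www_form. The helper names and result hashes are hypothetical. POST parameters and the element-matching step are left out.

```ruby
require 'uri'

# Quoted href/src attribute values, as a rough stand-in for an HTML parser.
LINK_ATTR = /\b(?:href|src)\s*=\s*["']([^"']+)["']/i

# "split base_uri by ? then &": URI#query is the part after '?', and
# URI.decode_www_form splits it on '&' and '=' (and percent-decodes).
def get_params(url)
  query = URI(url).query
  return [] if query.nil? || query.empty?
  URI.decode_www_form(query)
rescue URI::Error, ArgumentError
  []
end

# One candidate insertion point per GET parameter found in the page's links.
def insertion_points(html, base_url)
  base = URI(base_url)
  html.scan(LINK_ATTR).flatten.flat_map do |link|
    absolute = begin
      base.merge(link.gsub('&amp;', '&')).to_s # undo HTML-escaping of '&'
    rescue URI::Error
      next []
    end
    get_params(absolute).map do |name, value|
      { url: absolute.split('?').first, param: name, value: value }
    end
  end
end

p get_params('http://example.com/search.php?q=test&page=2')
# prints [["q", "test"], ["page", "2"]]
```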