This script searches for indicators in doc(x), xls(x), pdf, txt, and csv files. It also enriches IP addresses with additonal data from ipinfo.io.
Default indicators that it looks for:
- URL's
- E-mail addresses
- mobile phone numbers
- IP addresses
- MD5 hashes
- SHA1 hashes
- SHA256 hashes
- custom indicators
It outputs them to a csv file:
$ Parse-and-Enrich.py -i Input/*.csv
2022-08-11_132728_results.csv
| Regex result | Count | Type | Found in file(s) | City | Country | Organization | Full | Error
|---------------|-------|-----------|------------------|---------------|---------|--------------------|------|------
| 8.8.8.8 | 4 | ipaddress | ['file1.txt'] | Mountain View | US | AS15169 Google LLC | ... |
| j@mail.com | 1 | email | ['random.csv'] | | | | |
The script will take Office 365 UAL logs in the form of CSV files and enrich IP addresses with data from ipinfo.io.
Input (AuditRecords.csv):
timestamp, user, ip
2022-08-11 13:05:01, user1@company.nl, 8.8.8.8
Command:
$ Parse-and-Enrich.py -i Input/AuditRecords.csv -csv_e
Output (AuditRecords.csv_enriched.csv):
timestamp, user, ip, ip_info
2022-08-11 13:05:01, user1@company.nl, 8.8.8.8, {"ip": "8.8.8.8", "hostname": "dns.google", "anycast": "True", "city": "Mountain View", "region": "California", "country": "US", "loc": "37.4056,-122.0775", "org": "AS15169 Google LLC", "postal": "94043", "timezone": "America/Los_Angeles", "country_name": "United States", "latitude": "37.4056", "longitude": "-122.0775"}
If you would like to use the enrich feature, you need to put the API key (https://ipinfo.io/) in the file 'ip_info.key'. The file must be located in the same folder as the script.
- If you provide multiple csv files specified with the '-i' parameter (for example: -i input/*csv), they must be of the same encoding type. You can specify the encoding type that the csv's have: -csv_c UTF8 (see all encoding types: https://docs.python.org/3/library/codecs.html#standard-encodings)