Description
I use a fork of your n0s1 code to scan our (large) Confluence Cloud instance. Thanks for that, it is very useful.
However, I found that not all spaces were being scanned, even though I got no error message or timeout. I only noticed because a test space I had added was missing from the report. The total scan took about 5 hours. My guess is that the connection was closed at some point during the run, leaving the client object empty. I saw that you recently added error handling and did some refactoring. The strange thing is that we did not get any errors, but I will adopt the error handling in any case.
For now, I solved the missing-spaces issue by calling self._connect() in the method get_data for every batch of spaces that is collected. There might be a better way, but for now this works.
def set_config(self, config):
    SERVER = config.get("server", "")
    EMAIL = config.get("email", "")
    TOKEN = config.get("token", "")
    LABEL_FALSE_POSITIVE = config.get("label_false_positive", "cict-no-secrets-confirmed")
    self._url = SERVER
    self._user = EMAIL
    self._password = TOKEN
    self.label_false_positive = LABEL_FALSE_POSITIVE
    self._connect()
    return self.is_connected()

def _connect(self):
    from atlassian import Confluence
    # Use username/password auth when an email is configured, token auth otherwise.
    if self._user:
        self._client = Confluence(url=self._url, username=self._user, password=self._password)
    else:
        # SERVER and TOKEN are local to set_config, so use the stored attributes here.
        self._client = Confluence(url=self._url, token=self._password)
And in get_data:
def get_data(self, include_comments=False, test=""):
    if not self._client:
        return None, None, None, None, None, None
    start = 0
    limit = 50
    finished = False
    while not finished:
        logging.info(f"Spaces batch: {start} - {start+limit}")
        # reconnect for every batch
        self._connect()
        if not test:
            res = self._client.get_all_spaces(
                start=start, limit=limit, expand="history"
            )
            start += limit
            spaces = res.get("results", [])
        else:
            key = test
            res = self._client.get_space(key, expand="history")
            finished = True
            spaces = [res]
Because the total scan takes such a long time, I also added a test parameter that restricts the scan to a single space, identified by its key.
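The per-batch reconnect and the single-space test mode described above can be sketched in isolation as follows. This is only a minimal sketch: `iter_spaces`, `FakeConfluence` and the `connect` factory are hypothetical names used for illustration and are not part of n0s1 or of the atlassian library, and stopping on a short batch is an assumption about how the end of the listing is detected.

```python
from typing import Any, Callable, Iterator


def iter_spaces(connect: Callable[[], Any], limit: int = 50,
                test: str = "") -> Iterator[dict]:
    """Yield spaces batch by batch, building a fresh client for each batch."""
    if test:
        # Single-space mode: fetch only the space with the given key.
        yield connect().get_space(test, expand="history")
        return
    start = 0
    while True:
        client = connect()  # reconnect for every batch
        res = client.get_all_spaces(start=start, limit=limit, expand="history")
        spaces = res.get("results", [])
        yield from spaces
        if len(spaces) < limit:
            break  # a short or empty batch is taken to mean the last page
        start += limit


class FakeConfluence:
    """Hypothetical stand-in for the real client, for illustration only."""

    def __init__(self, keys):
        self._keys = keys

    def get_all_spaces(self, start=0, limit=50, expand=None):
        batch = self._keys[start:start + limit]
        return {"results": [{"key": k} for k in batch]}

    def get_space(self, key, expand=None):
        return {"key": key}


all_keys = [f"SPACE{i}" for i in range(120)]
connections = []


def connect():
    # Record every client that gets created so the reconnects are visible.
    client = FakeConfluence(all_keys)
    connections.append(client)
    return client


found = [space["key"] for space in iter_spaces(connect, limit=50)]
```

With 120 fake spaces and a batch size of 50, the generator creates one client per batch and stops after the third, shorter batch.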
In case it is of interest: another improvement for our use case is a change to the generic-api-key rule in config.yaml. We got tons of false positives because this regex matched the Confluence user macro and link macro, both of which contain the word 'key'. The negative lookbehind at the start of the regex now excludes them:
- id: generic-api-key
  description: Generic API Key
  regex: >-
    (?i)(?<!ri:user|CDATA\[\<add )(?:key|api|token|secret|client|passwd|password|auth|access)(?:[0-9a-z\-_\t
    .]{0,20})(?:[\s|']|[\s|"]){0,3}(?:=|>|:{1,3}=|\|\|:|<=|=>|:|\?=)(?:'|\"|\s|=|\x60){0,5}([0-9a-z\-_.=]{10,150})(?:['|\"|\n|\r|\s|\x60|;]|$)
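The effect of the lookbehind can be checked with a small script. This is a sketch under a few assumptions: the sample strings are made up to resemble the user macro and link macro, and the single lookbehind is split into two because Python's built-in `re` module only accepts fixed-width lookbehinds (`ri:user` and `CDATA[<add ` differ in length). Whether that split is needed depends on the regex engine that actually evaluates the rule.

```python
import re

KEYWORDS = r"(?:key|api|token|secret|client|passwd|password|auth|access)"
TAIL = (
    r"(?:[0-9a-z\-_\t .]{0,20})"
    r"(?:[\s|']|[\s|\"]){0,3}"
    r"(?:=|>|:{1,3}=|\|\|:|<=|=>|:|\?=)"
    r"(?:'|\"|\s|=|\x60){0,5}"
    r"([0-9a-z\-_.=]{10,150})"
    r"(?:['|\"|\n|\r|\s|\x60|;]|$)"
)

# Rule without the exclusion, and the same rule with the two lookbehinds.
ORIGINAL = re.compile(KEYWORDS + TAIL, re.IGNORECASE)
PATCHED = re.compile(
    r"(?<!ri:user)(?<!CDATA\[<add )" + KEYWORDS + TAIL, re.IGNORECASE
)

# Illustrative samples only, not copied from a real Confluence page.
user_macro = '<ri:user ri:userkey="8a7f808a5c1b2d3e4f5a6b7c8d9e0f1a" />'
link_macro = '<![CDATA[<add key="abcdefghij1234567890"/>]]>'
real_secret = 'api_key = "abcd1234efgh5678"'

for sample in (user_macro, link_macro, real_secret):
    before = ORIGINAL.search(sample) is not None
    after = PATCHED.search(sample) is not None
    print(f"original={before} patched={after} :: {sample}")
```

Both macro samples are flagged by the rule without the lookbehind and skipped with it, while the genuine-looking assignment is still reported.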
We also added a method that skips a page when it carries a specific label. If a reported secret is only meant as an example, the user can add that label to the page to mark it as a false positive.
def is_false_positive(self, page_id):
    labels_json = self._client.get_page_labels(page_id)
    labels = labels_json.get("results", [])
    for label in labels:
        if label["name"] == self.label_false_positive:
            logging.info(f"INFO: page {page_id} is false positive due to label {label}")
            return True
    return False
And in the method get_data:
for p in pages:
    comments = []
    title = p.get("title", "")
    page_id = p.get("id", "")
    if self.is_false_positive(page_id):
        continue
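Since the label check adds one API call per page, a failing lookup could interrupt a long scan. A possible variation is sketched below as a standalone function that scans the page anyway when the labels cannot be fetched. The names `has_false_positive_label` and `StubClient` are hypothetical, and failing open in this way is an assumption about the desired behavior, not something n0s1 does.

```python
import logging


def has_false_positive_label(client, page_id,
                             label_name="cict-no-secrets-confirmed"):
    """Return True only if the page carries the false-positive label."""
    try:
        labels = client.get_page_labels(page_id).get("results", [])
    except Exception as exc:  # assumption: any lookup failure means "scan it"
        logging.warning("Could not fetch labels for page %s: %s", page_id, exc)
        return False
    return any(label.get("name") == label_name for label in labels)


class StubClient:
    """Hypothetical stand-in for the Confluence client, for illustration only."""

    def __init__(self, labels_by_page):
        self._labels_by_page = labels_by_page

    def get_page_labels(self, page_id):
        if page_id not in self._labels_by_page:
            raise RuntimeError("label lookup failed")
        names = self._labels_by_page[page_id]
        return {"results": [{"name": n} for n in names]}


client = StubClient({
    "100": ["cict-no-secrets-confirmed", "howto"],
    "200": ["howto"],
})

pages = [{"id": "100"}, {"id": "200"}, {"id": "300"}]
# Pages that would still be scanned after the label check.
to_scan = [p["id"] for p in pages
           if not has_false_positive_label(client, p["id"])]
```

Page 100 is skipped because of its label, page 200 has no matching label, and page 300 is scanned because its label lookup fails.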
In any case, thanks for your code. Hope my comments are useful.
Kind regards,
Mariska