Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patching PDFCore.py to properly check for ids in self.objects #7

Closed
wants to merge 2 commits into from

Conversation

wroersma
Copy link

So I ran into a bug with the following pdf md5:bd23ad33accef14684d42c32769092a0
this would cause peepdf to break with the following error

  File "build/bdist.linux-x86_64/egg/peepdf/main.py", line 409, in main
    ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 7116, in parse
    ret = body.updateObjects()
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 4326, in updateObjects
    object = self.objects[id].getObject()
KeyError: 1
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/peepdf/main.py", line 409, in main
    ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 7122, in parse
    ret = body.updateObjects()
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 4326, in updateObjects
    if self.objects[str(id)]:
KeyError: '1'
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/peepdf/main.py", line 409, in main
    ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 7123, in parse
    ret = body.updateObjects()
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 4326, in updateObjects
    if self.objects[str(id)]:
KeyError: '1'
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/peepdf/main.py", line 409, in main
    ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 7124, in parse
    ret = body.updateObjects()
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 4326, in updateObjects
    if self.objects[str(id)]:
KeyError: '1'
Traceback (most recent call last):
  File "build/bdist.linux-x86_64/egg/peepdf/main.py", line 409, in main
    ret, pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 7126, in parse
    ret = body.updateObjects()
  File "build/bdist.linux-x86_64/egg/peepdf/PDFCore.py", line 4327, in updateObjects
    if self.objects[id]:
KeyError: 1

This patch makes it so it doesn't hard break and does a better check for the ID in self.object so it doesn't totally break trying to set the value as None.

After fix output is as follows

File: 1.pdf
MD5: bd23ad33accef14684d42c32769092a0
SHA1: 0d3f335ccca4575593054446f5f219eba6cd93fe
SHA256: 4b672deae5c1231ea20ea70b0bf091164ef0b939e2cf4d142d31916a169e8e01
Size: 82344 bytes
Version: 1.7
Binary: False
Linearized: True
Encrypted: False
Updates: 0
Objects: 50
Streams: 14
URIs: 0
Comments: 0
Errors: 0

Version 0:
        Catalog: 10
        Info: No
        Objects (50): [6, 7, 9, 10, 11, 12, 14, 15, 17, 19, 20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 61, 62, 63, 64, 65, 66]
                Errors (11): [14, 15, 17, 25, 31, 32, 33, 49, 51, 55, 56]
        Streams (14): [14, 15, 17, 25, 31, 32, 33, 34, 49, 51, 55, 56, 57, 62]
                Encoded (11): [14, 15, 17, 25, 31, 32, 33, 49, 51, 55, 56]
                Decoding errors (11): [14, 15, 17, 25, 31, 32, 33, 49, 51, 55, 56]
        Suspicious elements:
                /AcroForm (1): [10]
                /OpenAction (1): [10]
                /JS (1): [11]
                /JavaScript (1): [11]

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 24.547% when pulling 38b1554 on wroersma:master into 8d6a370 on jbremer:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 24.547% when pulling 38b1554 on wroersma:master into 8d6a370 on jbremer:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.01%) to 24.547% when pulling 38b1554 on wroersma:master into 8d6a370 on jbremer:master.

@coveralls
Copy link

coveralls commented May 18, 2018

Coverage Status

Coverage decreased (-10.2%) to 14.409% when pulling 996d64c on wroersma:master into 8d6a370 on jbremer:master.

@wroersma
Copy link
Author

wroersma commented Jun 5, 2018

So any reason you don't want to merge this that I can resolve?

@jbremer
Copy link
Member

jbremer commented Jun 5, 2018

@wroersma sorry, mostly just forgot about it. Going to check soon!
Thanks for including sample hash, BTW.

@jbremer
Copy link
Member

jbremer commented Jun 11, 2018

@wroersma I can't imagine that your patch works; first of all the keys to self.objects are integers and secondly, I'm pretty sure that in any case self.objects will be non-None such that the if-statement will always fall-through to basically do nothing. What do you think about the patch below?

jbremer added a commit that referenced this pull request Jun 11, 2018
@wroersma
Copy link
Author

I'll take a look when I get a second to see what that looks like running through with a debugger. I mean it doesn't hard break for me and we're using it with our custom cuckoo, custom sflock and now custom peepdf in production and it's working but I probably could've spent more time really stepping through that code to fully understand the impacts it would have. I will get back to you shortly on this.

@jbremer
Copy link
Member

jbremer commented Jun 11, 2018

print repr(self.objects.keys()) shows at least that all keys are integers :)
Feel free to share the other patches that you have in-place, too ;-)

@wroersma
Copy link
Author

Your fix is much cleaner for sure they both work so that would resolve the bug so that works for me.

As far as the other code goes we still don't have a VBN file to share publicly to add that into sflock but we might just release vbncarver as a python package that you could possibly use as a third party call. The cuckoo stuff well that's a lot of changes as it would remove a lot of functionality from cuckoo itself and move it into a bunch of micro services etc. I am also working on changes to move away from django as well.
We will try and push as much as we can back into master though for sure because I think many would like object storage feature options vs local etc and using MQ for scaling etc. We should chat maybe in IM or email or something some time so I can explain in more detail.

@wroersma
Copy link
Author

So do you want to just commit that then and I can just close this PR? It works perfectly and is very clean so that makes sense and it's not any of my code so haha. I'm good with either way though.

@jbremer
Copy link
Member

jbremer commented Jun 11, 2018

Sure thing! Drop me an email at jbr@cuckoo.sh such that I can invite you for our Slack channel :-)

@jbremer jbremer closed this Jun 11, 2018
jbremer added a commit that referenced this pull request Jun 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants