-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diffing server exception handling #185
Diffing server exception handling #185
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to the project, @ftsalamp! This is wonderful. Thanks, especially, for including tests. I left a bunch of notes inline about how it could be even better.
A few general style notes, too:
-
Except for docstrings and multiline strings, we use single-quotes everywhere. (We should probably do it for those strings, too, but haven’t been consistent.)
-
Make sure to include spaces around operators.
-
Wherever possible, use format strings,
string.format()
, or string interpolation instead ofstring + string
web_monitoring/diffing_server.py
Outdated
@@ -15,6 +16,8 @@ | |||
) | |||
import web_monitoring.html_diff_render | |||
import web_monitoring.links_diff | |||
import json | |||
import sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please alphabetize the imports!
web_monitoring/diffing_server.py
Outdated
@@ -67,17 +69,47 @@ def get(self, differ): | |||
try: | |||
func = self.differs[differ] | |||
except KeyError: | |||
self.send_error(404) | |||
self.send_error(404, reason="Diffing method not found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we’re improving these messages, one important practice we should follow is to, wherever reasonable, include information about what value was wrong and how it could be fixed.
This message would be even better if were something like:
self.send_error(404, reason=f'Unknown diffing method: `{differ}`. You can get a list of supported differs from the `/` endpoint.')
web_monitoring/diffing_server.py
Outdated
b = query_params.pop('b') | ||
except KeyError: | ||
self.send_error( | ||
400, reason="Malformed request. A URL parameter is missing.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, it’s really helpful to be able to say what parameter was missing. So maybe something like:
self.send_error(
400,
reason='Malformed request. You must provide a URL as the value for both `a` and `b` query parameters.')
Even better to be able say specifically whether it was a
, b
, or both that were missing, but that’s a lot more code and I think the above is probably good enough :)
web_monitoring/diffing_server.py
Outdated
except (ValueError, IOError): | ||
if res_a.code == 599: | ||
self.send_error( | ||
400, reason=str(res_a.error)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a comment describing why you are re-mapping the 599
code to 400
here? I think I understand why, but this is the sort of thing I think benefits from a clear explanation for others who are working on this code later :)
web_monitoring/diffing_server.py
Outdated
res_a.code, reason=str(res_a.error)) | ||
return | ||
|
||
if res_b.error is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two error handling blocks (for res_a
and res_b
) are exactly the same, so it would be btter to encapsulate this error logic into a function that you can call once for each response.
web_monitoring/diffing_server.py
Outdated
except UndecodableContentError: | ||
exceptionMessage = str(sys.exc_info()[1]) | ||
self.send_error(422, reason=exceptionMessage) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These errors should already be handled in write_error()
. Was that not working for your case?
write_error
does some important extra work, like:
- differentiating between errors that should be publicly visible and errors that should not (because that could leak security-sensitive information),
- including stack traces only when the server is configured to do so (again, this is a big security concern), and
- formatting the errors in a JSON structure (all output should be JSON here, though we are not perfect at that yet).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right about this. I think I was just too eager to handle everything, saw this in the stack trace and went ahead and re-handled it without checking the code first. I'm sorry about this.
self.assertEqual(response.code, 422) | ||
|
||
def mockDiffingMethod(c_body): | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure all files end with a blank line!
return | ||
|
||
def testInvalidURLaFormat(self): | ||
response = self.fetch('http://localhost:'+str(self.get_http_port())+'/html_token?format=json&include=all&a=example.org&b=https://example.org', method='GET') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please use a format string for this? e.g:
port = self.get_http_port()
response = self.fetch(
f'http://localhost:{port}/html_token?format=json&include=all&a=example.org&b=https://example.org',
method='GET')
Same for all the other tests here.
|
||
def testInvalidURLaFormat(self): | ||
response = self.fetch('http://localhost:'+str(self.get_http_port())+'/html_token?format=json&include=all&a=example.org&b=https://example.org', method='GET') | ||
self.assertEqual(response.code, 400) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We haven’t done a good job of this everywhere, but we are trying to make sure that this server always responds with JSON. It would be great if these tests didn’t only check the response code, but also checked that they:
- Have the
application/json
content-type - Are parse-able JSON
- The JSON data is an object with at least the fields
code
(a number) anderror
(a string) present
Same for all the other tests here. You could probably write a method that does those assertions and use it in each of the tests.
app.listen(str(self.get_http_port())) | ||
return | ||
|
||
def testInvalidURLaFormat(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use underscore case for method names: def test_invalid_url_a_format
. Same for the rest of these tests.
Woot! Welcome @ftsalamp! 🎉 That's awesome that you've taken this on and glad that you're here 😀 |
I think I have implemented all the requested changes. Thank you for your help and input @Mr0grog ! |
Hello @ftsalamp ! Thank you for your contribution! |
def get_app(self): | ||
app = df.make_app() | ||
app.listen(str(self.get_http_port())) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should just return the app, not start it:
def get_app(self):
return df.make_app()
Doing that enables you to write each test more simply:
def test_some_request(self):
response = self.fetch('/html_token?whatever_args')
# Instead of:
port = self.get_http_port()
response = self.fetch(f'http://localhost:{port}/html_token?whatever_args')
Sorry I missed this before.
web_monitoring/diffing_server.py
Outdated
@@ -2,11 +2,14 @@ | |||
from docopt import docopt | |||
import hashlib | |||
import inspect | |||
import json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t used anymore, so you should remove it.
web_monitoring/diffing_server.py
Outdated
import re | ||
import sys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t used anymore, so you should remove it.
web_monitoring/diffing_server.py
Outdated
import tornado.gen | ||
import tornado.httpclient | ||
import tornado.ioloop | ||
import tornado.web | ||
from tornado.stack_context import ExceptionStackContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t used anymore, so you should remove it.
self.check_response_for_error(res_a) | ||
self.check_response_for_error(res_b) | ||
except tornado.httpclient.HTTPError: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m pretty sure this is a logic error. In check_response_for_error()
, we have:
if response.error is not None:
try:
response.rethrow()
except (ValueError, IOError):
# send back the right error response
raise tornado.httpclient.HTTPError(0)
# But we didn’t handle tornado.httpclient.HTTPError errors!
If response.rethrow()
raised a tornado.httpclient.HTTPError
, it won’t ever get an error response because we didn’t handle it inside check_response_for_error()
, but we still stopped here in DiffHandler.get()
as if we did. It basically stops here, but never sends the error message. You can try this by adding a test:
def test_a_is_404(self):
response = self.fetch(f'/html_token?format=json&include=all&a=http://httpstat.us/404&b=https://example.org')
print(f'STATUS:: {response.code}')
print('BODY:::: "%s"' % response.body.decode('utf-8'))
self.assertEqual(response.code, 400)
self.json_check(response)
…which fails and prints out:
STATUS:: 200
BODY:::: ""
web_monitoring/diffing_server.py
Outdated
if response.error is not None: | ||
try: | ||
response.rethrow() | ||
except (ValueError, IOError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works, but IOError
got merged into OSError
in Python 3.3, so it might be better to use that instead.
web_monitoring/diffing_server.py
Outdated
except (ValueError, IOError): | ||
#Response code == 599 means that no HTTP response was received. | ||
#In this case the error code should become 400 indicating that the error was | ||
#raised because of a bad request parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is actually a specific HTTP error for this case: 504 GATEWAY TIMEOUT
https://httpstatuses.com/504
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If no HTTP response is received tornado seems to return 599, according to the docs. http://www.tornadoweb.org/en/stable/httpclient.html#tornado.httpclient.HTTPError
How should we go about this? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, Tornado’s doing the correct thing here. Sorry if I was unclear — I didn’t mean we shouldn’t be checking for a 599; I meant our server should return a 504 when it receives a 599. 504 specifically means “this server timed out when asking another server for the data you requested.” It’s a more specific code than 400 in this case.
That said, it would likely be good to make sure here that we really are dealing with a timeout and not something else, like a malformed address that was never attempted (that should probably still be a 400 or maybe a 422). For example, I think your test_invalid_url_a_format
test is triggering a socket.gaierror
here (subclass of OSError
), which would be a 400/422 because it’s a bad address, not a timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either way, I’m not too worried about sending exactly the right code here; just handling this case better than we previously were is probably an improvement. We can do a separate PR later for more specific and correct codes here.
So, I guess I’m saying you can skip changing this if you want. I’m much more concerned about the logical error above: #185 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my latest commit I fixed the logical error. check_response_for_error now also handles tornado.httpclient.HTTPError. I also added the test you gave as an example (thanks for this) and it no longer fails.
Sorry I wasn’t able to get back to this yesterday, @ftsalamp! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome, @ftsalamp!
web_monitoring/diffing_server.py
Outdated
@@ -90,7 +102,7 @@ def get(self, differ): | |||
actual_hash = hashlib.sha256(res.body).hexdigest() | |||
if actual_hash != expected_hash: | |||
self.send_error( | |||
500, reason="Fetched content does not match hash.") | |||
500, reason = 'Fetched content does not match hash.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elaborating on @Mr0grog's request for spaces around operators, Python's recommended style does not put spaces around =
when it's used in a keyword argument like
def f(x=3):
...
f(x=2)
only when it's used in a assignment like
y = 6
You can use a tool like flake8 to check the code style and catch these things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @danielballan ! Having worked for 6 months mostly with Swift, the tool you suggested is a great way to remember Python's coding style.
The tests are failing on a problem that was removed in #186, so I will rebase this on master to incorporate that fix in this PR. |
Made the requested changes and fixed all of flake8 warnings.
be33a03
to
50ef7db
Compare
Rebased. |
Sorry I was so slow to get back to this! Thanks, @ftsalamp. |
Hello everyone! I am the GSoC student selected by the Internet Archive to work on the project of bringing together the Wayback Machine and the scanner software. I wrote some code trying to improve the exception handling in the diffing_server. I also wrote some tests on the code I added.
I'm very excited to work on this project and looking forward to your comments.