Large logs cause mongo to error out #129

Closed
rclough opened this issue Nov 24, 2014 · 5 comments · Fixed by #132

rclough (Collaborator) commented Nov 24, 2014

Forking this off from #76, which has the same underlying issue but causes a different error (an SMTP error there, a mongo error here).

We have one particular job that generates a large log output for one task. We started noticing that no completion email would be sent. It turns out this is the error that set up the initial conditions for #126, by raising an exception before _on_completion finished.

The error looks generally like this:

Exception in thread Thread-1037:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/threading.py", line 1082, in run
    self.function(*self.args, **self.kwargs)
  File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 828, in check_complete
    complete_time=datetime.utcnow())
  File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 1031, in _task_complete
    self.parent_job._complete_task(self.name, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 549, in _complete_task
    self._on_completion()
  File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 594, in _on_completion
    self._serialize(include_run_logs=True))
  File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 641, in _serialize
    for task in self.tasks.itervalues()]
  File "/usr/local/lib/python2.7/site-packages/dagobah/core/core.py", line 1048, in _serialize
    self.name)
  File "/usr/local/lib/python2.7/site-packages/dagobah/backend/mongo.py", line 139, in get_latest_run_log
    for rec in cur:
  File "/usr/local/lib/python2.7/site-packages/pymongo/cursor.py", line 1058, in next
    if len(self.__data) or self._refresh():
  File "/usr/local/lib/python2.7/site-packages/pymongo/cursor.py", line 1002, in _refresh
    self.__uuid_subtype))
  File "/usr/local/lib/python2.7/site-packages/pymongo/cursor.py", line 940, in __send_message
    self.__compile_re)
  File "/usr/local/lib/python2.7/site-packages/pymongo/helpers.py", line 109, in _unpack_response
    error_object)
OperationFailure: database error: Runner error: Overflow sort stage buffered data usage of 33758790 bytes exceeds internal limit of 33554432 bytes

In case the line numbers are off (the version of dagobah running on our server is slightly behind master), here's the line in question (the last_run = ... line):

# found in task.serialize
        if include_run_logs:
            last_run = self.backend.get_latest_run_log(self.parent_job.job_id,
                                                       self.name)

And the relevant mongo code:

    def get_latest_run_log(self, job_id, task_name):
        q = {'job_id': ObjectId(job_id),
             'tasks.%s' % task_name: {'$exists': True}}
        cur = self.log_coll.find(q).sort([('save_date', pymongo.DESCENDING)])
        for rec in cur: # Line 139
            return rec
        return {}

It seems like others hitting similar issues are being told to add an index on the field they're sorting by; not sure if this helps.

I'm not very experienced with mongo, so I'm just showing what I've found so far.
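
For context: when a sort can't use an index, MongoDB buffers the matching documents in memory and aborts once the sort stage passes its internal 32 MB limit, which matches the OperationFailure above. Here's a minimal sketch of how one might confirm the sort is unindexed with pymongo's explain() (the connection details, database, collection, and task names are assumptions):

    import pymongo
    from pymongo import MongoClient

    # Assumed connection details and names; adjust to your deployment.
    client = MongoClient('localhost', 27017)
    log_coll = client['dagobah']['dagobah_log']

    q = {'tasks.some_task': {'$exists': True}}  # hypothetical task name
    plan = log_coll.find(q).sort([('save_date', pymongo.DESCENDING)]).explain()

    # Without an index on save_date, the (pymongo 2.x / mongo 2.x) plan
    # reports a full scan ('cursor': 'BasicCursor') with 'scanAndOrder': True,
    # i.e. the sort happens entirely in memory.
    print plan.get('cursor'), plan.get('scanAndOrder')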

rclough (Collaborator, Author) commented Nov 25, 2014

You know, I'm not even entirely convinced it's the logs being big. I was debugging the job in question today and looking at its logs, and they really don't seem that huge. I can't share one publicly, but I pasted it into a text editor and saved it: it's 489 KB, about 3,000 log lines. I'm not sure what's a typical log size, but I have a feeling the issue is less about size and more about the sort call for some reason.

thieman (Owner) commented Nov 26, 2014

I can't recall whether we ever establish indexes in Mongo. Sounds like we should. Can you try adding an index on save_date and see if that makes it go away?
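
For anyone following along, a sketch of adding that index by hand to an existing deployment with pymongo (connection details and names are assumptions; the descending direction matches the sort in get_latest_run_log):

    import pymongo
    from pymongo import MongoClient

    # Assumed connection details and names; adjust to your deployment.
    client = MongoClient('localhost', 27017)
    log_coll = client['dagobah']['dagobah_log']

    # ensure_index is a no-op if the index already exists.
    log_coll.ensure_index([('save_date', pymongo.DESCENDING)])

For a single-field sort, an ascending index would work just as well, since mongo can walk an index in either direction.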

rclough (Collaborator, Author) commented Dec 1, 2014

We've added the index; I'll report back on whether it works. We're rerunning the job to catch up on what we missed, so I should know by EOD.

rclough (Collaborator, Author) commented Dec 1, 2014

Adding an index on save_date fixed it! Now the question is how we make sure this index is created programmatically in mongo (and added to existing installations).

thieman (Owner) commented Dec 1, 2014

Pymongo makes this really easy; it's just a one-liner to ensure_index.
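
A sketch of what that could look like, assuming it lives in the Mongo backend's constructor (the placement is hypothetical; since ensure_index only creates the index if it's missing, running it on every startup also covers existing installations):

    # hypothetical placement: inside the Mongo backend's __init__,
    # once self.log_coll has been set up
    self.log_coll.ensure_index([('save_date', pymongo.DESCENDING)])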
