Console output of MR jobs fails to properly update progress #323

@simleo

This is a problem with our mapreduce version of the submitter. The original mapred submitter is unaffected.

The minimal setup is a map-only application that uses the Java record reader and writer:

import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class Mapper(api.Mapper):
    def map(self, context):
        context.emit(context.key, len(context.value))

def __main__():
    pipes.run_task(pipes.Factory(mapper_class=Mapper))

Run this with a single mapper on a substantial amount of input (e.g., replicate examples/input/alice_1.txt 1000 times) and monitor the job on the console. With our mapreduce submitter, progress remains stuck at 0% and then jumps to 100% right before the job ends. With the mapred submitter, progress is updated gradually, as expected.
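For anyone trying to reproduce this, one possible way to build the large input is sketched below. It only uses the Python standard library; the helper name `replicate_file` and the output file name are hypothetical and not part of Pydoop.

```python
import shutil


def replicate_file(src_path, dst_path, times):
    """Write `times` back-to-back copies of src_path into dst_path."""
    with open(dst_path, "wb") as dst:
        for _ in range(times):
            # Reopen the source on each pass so every copy starts at offset 0.
            with open(src_path, "rb") as src:
                shutil.copyfileobj(src, dst)


# Example call (paths are assumptions; adjust to your checkout):
# replicate_file("examples/input/alice_1.txt", "alice_x1000.txt", 1000)
```

The resulting file still has to be copied to HDFS before submitting the job, for instance with `hadoop fs -put`.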

Note that this was NOT fixed by #322.
