format README.md
yongkun.wang committed Nov 15, 2013
1 parent 6a741bb commit 3928569
Showing 1 changed file with 9 additions and 7 deletions.
README.md: 16 changes (9 additions, 7 deletions)
@@ -6,7 +6,7 @@ It is forked from Cloudera's 0.9.3-cdh3u0 distribution (Flume-OG).

Two major enhancements:

1) Masterless ACK (MA)
1. Masterless ACK (MA)
----------------------

This enhancement aims to increase the reliability and throughput of the whole distributed collecting system.
@@ -17,22 +17,24 @@ However, the ACKs are sent back through master, which could be a single-point-of
Therefore, I redesigned the ACK system so that each ACK goes back along the route of its event.

Main enhancements
~~~~~~~~~~~~~~~~~
a) Distribute ACK traffic across the Flume nodes: each ACK goes back to the original agent through the route of its event.
b) Reuse the connection for event transmission.
c) Buffer the input/output streams for Thrift to improve throughput. https://github.com/yongkun/thrift

..1. Distribute ACK traffic across the Flume nodes: each ACK goes back to the original agent through the route of its event.
..2. Reuse the connection for event transmission.
..3. Buffer the input/output streams for Thrift to improve throughput. https://github.com/yongkun/thrift

The Apache JIRA issue, with some documents, is here: https://issues.apache.org/jira/browse/FLUME-640
This enhancement was supposed to be merged into 0.10 had Flume not been upgraded to NG.

2) Append to HDFS with a new file rotation method
2. Append to HDFS with a new file rotation method
-----------------------------------------------

Use HDFS append() and change the file rotation mechanism to create large HDFS files. Larger files can improve the performance of Map/Reduce programs that use them as input, and they reduce the number of block mapping entries in the Hadoop NameNode.


This modified version has been heavily used inside our company (Rakuten, http://en.wikipedia.org/wiki/Rakuten), with a single collector receiving more than 300GB of data per day (more than 328 million events per day, with a peak throughput of about 26K events per second).
##### This modified version has been heavily used inside our company (Rakuten, http://en.wikipedia.org/wiki/Rakuten), with a single collector receiving more than 300GB of data per day (more than 328 million events per day, with a peak throughput of about 26K events per second).

Contact
yongkun at gmail.com

Repository
https://github.com/yongkun/flume-0.9.3-cdh3u0-rakuten
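The Masterless ACK section of the README says each ACK returns to the originating agent along the route its event took, instead of going through the master. The sketch below illustrates that reverse-path idea only; it is not the code from this fork. `Node`, `send`, `ack` and `upstream_of` are hypothetical names, and the calls are synchronous for clarity, whereas a real deployment would send ACKs asynchronously over the network.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical Flume hop (agent, relay, or collector); not Flume's API."""
    name: str
    downstream: Optional["Node"] = None
    # event_id -> hop the event arrived from; this is the ACK's way back.
    upstream_of: Dict[str, "Node"] = field(default_factory=dict)
    # Events this node originated and has seen acknowledged end to end.
    acked: List[str] = field(default_factory=list)

    def send(self, event_id: str, sender: Optional["Node"] = None) -> List[str]:
        """Forward an event downstream; return the hop names its ACK visits."""
        if sender is not None:
            self.upstream_of[event_id] = sender  # remember the reverse route
        if self.downstream is not None:
            return self.downstream.send(event_id, sender=self)
        # Terminal hop (collector): assume the event is stored, so start the ACK.
        return self.ack(event_id)

    def ack(self, event_id: str) -> List[str]:
        """Pass the ACK one hop back; no master node is involved."""
        prev = self.upstream_of.pop(event_id, None)
        if prev is None:
            # No recorded upstream hop: this node originated the event.
            self.acked.append(event_id)
            return [self.name]
        return [self.name] + prev.ack(event_id)
```

Wiring `agent → relay → collector` and calling `agent.send("e1")` returns `["collector", "relay", "agent"]`: the ACK retraces the event's path, so ACK traffic is spread over the same hops that carried the events.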
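The HDFS section of the README says the fork uses append() and a changed rotation mechanism to produce large files, but it does not say what triggers a rotation. The sketch below assumes a size threshold purely for illustration. `AppendingRoller`, `write_batch` and `target_bytes` are hypothetical names, and an in-memory dict stands in for HDFS, so no Hadoop API is called.

```python
from typing import Dict


class AppendingRoller:
    """Hypothetical sink: appends each batch to the current file and starts
    a new file only when the current one would exceed target_bytes."""

    def __init__(self, target_bytes: int, prefix: str = "events") -> None:
        self.target_bytes = target_bytes
        self.prefix = prefix
        # In-memory stand-in for HDFS: path -> file contents.
        self.files: Dict[str, bytearray] = {}
        self._seq = 0
        self._current = self._new_path()

    def _new_path(self) -> str:
        path = f"{self.prefix}.{self._seq:05d}"
        self._seq += 1
        self.files[path] = bytearray()
        return path

    def write_batch(self, data: bytes) -> str:
        """Append data and return the path it was written to."""
        current = self.files[self._current]
        # Rotate only if a non-empty file would overflow; an oversized batch
        # still goes into an empty file, so there is no endless rotation.
        if current and len(current) + len(data) > self.target_bytes:
            self._current = self._new_path()
        # The append() step: extend the existing file instead of opening a new one.
        self.files[self._current].extend(data)
        return self._current
```

With `target_bytes=1000`, writing 25 batches of 100 bytes yields three files of 1000, 1000 and 500 bytes, whereas a sink that opened a new file for every batch would create 25. Fewer, larger files is what reduces the NameNode's per-file and per-block bookkeeping.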
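As a rough check on the Rakuten figures quoted in the README, the daily totals convert to an average rate and an average event size as follows. This assumes 1 GB = 10^9 bytes and an 86,400-second day, and since both daily totals are stated as lower bounds ("more than"), the results are approximations, not measurements.

```math
\bar{r} = \frac{328 \times 10^{6}\ \text{events}}{86\,400\ \text{s}} \approx 3\,796\ \text{events/s},
\qquad
\frac{r_{\text{peak}}}{\bar{r}} \approx \frac{26\,000}{3\,796} \approx 6.8,
\qquad
\bar{s} \approx \frac{300 \times 10^{9}\ \text{bytes}}{328 \times 10^{6}\ \text{events}} \approx 915\ \text{bytes/event}
```

So the quoted peak is roughly seven times the daily average rate, with events averaging a little under 1 KB.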
