A Python implementation of the paper Spell: Streaming Parsing of System Event Logs
by Min Du and Feifei Li (University of Utah).
This implementation is a refactored and enhanced version of logpai's logparser.
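For orientation, the paper's central idea is to compare each incoming log line with known templates through the longest common subsequence (LCS) of their tokens, and to mark the positions that differ with <*>. The block below is a minimal sketch of that idea under simplifying assumptions; it is not the code in spellpy. The names `lcs`, `merge_template`, and `is_match` are hypothetical, the `tau` threshold is only modeled on the paper's matching threshold, and adjacent wildcards are collapsed here, which the real parser may not do.

```python
from typing import List


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Return one longest common subsequence of two token lists."""
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover the subsequence itself.
    out: List[str] = []
    i, j = 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def is_match(template: List[str], tokens: List[str], tau: float = 0.5) -> bool:
    """Assumed rule: the LCS must cover at least tau of the new line's tokens."""
    return len(lcs(template, tokens)) >= tau * len(tokens)


def merge_template(template: List[str], tokens: List[str]) -> List[str]:
    """Keep tokens shared with the new line; replace the rest with <*>."""
    common = lcs(template, tokens)
    merged: List[str] = []
    k = 0
    for tok in template:
        if k < len(common) and tok == common[k]:
            merged.append(tok)
            k += 1
        elif not merged or merged[-1] != "<*>":
            # Simplification: runs of differing tokens become one wildcard.
            merged.append("<*>")
    return merged


old = "PacketResponder 1 for block blk_1 terminating".split()
new = "PacketResponder 2 for block blk_2 terminating".split()
print(" ".join(merge_template(old, new)))
# PacketResponder <*> for block <*> terminating
```

The two sample lines are made up for illustration. They share four of six tokens, so they pass the assumed threshold and merge into a single template.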
pip install spellpy
python example.py
After running the command above, a result folder is created containing two files: structured.csv and templates.csv.
*_main_structured.csv
... | Level | Component | Content | EventId | EventTemplate | ParameterList |
---|---|---|---|---|---|---|
... | INFO | dfs.DataNode$DataXceiver | Receiving block blk_-1608999687919862906 src: /10.250.19.102:54106 dest: /10.250.19.102:50010 | f57d69cf | Receiving block blk_-1608999687919862906 src <*> <*> dest <*> 50010 | ['/10.250.19.102:54106', '/10.250.19.102'] |
... | INFO | dfs.DataNode$PacketResponder | PacketResponder 1 for block blk_-1608999687919862906 terminating | 7b619377 | PacketResponder <*> for block blk_-1608999687919862906 terminating | ['1'] |
... | INFO | dfs.DataNode$DataXceiver | Receiving block blk_-1608999687919862906 src: /10.250.10.6:40524 dest: /10.250.10.6:50010 | f57d69cf | Receiving block blk_-1608999687919862906 src <*> <*> dest <*> 50010 | ['/10.250.10.6:40524', '/10.250.10.6'] |
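To make the relationship between Content, EventTemplate, and ParameterList concrete, here is a small sketch that recovers the wildcard values by turning a template into a regular expression. `extract_parameters` is a hypothetical helper written for this illustration; the extraction logic inside spellpy may group values differently, as the Receiving block rows above suggest.

```python
import re
from typing import List


def extract_parameters(template: str, content: str) -> List[str]:
    """Return the substrings of content that line up with each <*> in template."""
    # Escape the literal pieces and join them with non-greedy capture groups.
    literal_parts = [re.escape(part) for part in template.split("<*>")]
    pattern = "^" + "(.*?)".join(literal_parts) + "$"
    match = re.match(pattern, content)
    return list(match.groups()) if match else []


template = "PacketResponder <*> for block blk_-1608999687919862906 terminating"
content = "PacketResponder 1 for block blk_-1608999687919862906 terminating"
print(extract_parameters(template, content))
# ['1']
```

A template without any <*> yields an empty list, and so does a line that does not fit the template at all.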
*_main_templates.csv
EventId | EventTemplate | Occurrences |
---|---|---|
6af214fd | Receiving block <*> src <*> <*> dest <*> 50010 | 5 |
26ae4ce0 | BLOCK* NameSystem.allocateBlock <*> | 2 |
dc2c74b7 | PacketResponder <*> for block <*> terminating | 4 |
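The templates file can be inspected with the standard library alone. The sketch below assumes the file on disk is comma-separated with a header row matching the column names shown above; that layout is an assumption, and `load_templates` is a hypothetical helper. The sample rows are copied from the table so the block runs without any files.

```python
import csv
import io

# Rows copied from the table above; a real run would open the CSV file instead.
SAMPLE = """EventId,EventTemplate,Occurrences
6af214fd,Receiving block <*> src <*> <*> dest <*> 50010,5
26ae4ce0,BLOCK* NameSystem.allocateBlock <*>,2
dc2c74b7,PacketResponder <*> for block <*> terminating,4
"""


def load_templates(handle):
    """Return template rows sorted by occurrence count, most frequent first."""
    rows = list(csv.DictReader(handle))
    for row in rows:
        row["Occurrences"] = int(row["Occurrences"])
    return sorted(rows, key=lambda row: row["Occurrences"], reverse=True)


templates = load_templates(io.StringIO(SAMPLE))
for row in templates:
    print(row["EventId"], row["Occurrences"], row["EventTemplate"])
```

To read an actual output file, pass `open(path, newline="")` in place of the `io.StringIO` object.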
As you can see, there are three test log files. A for loop over them simulates a (nearly) streaming scenario.
The result folder contains _main_*.csv files and *.log_*.csv files. The _main_*.csv files are cumulative: each new log is appended to them once it has been parsed.
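The append behavior described above can be sketched as follows. This is an illustration of the pattern only, not the code in spellpy: `append_rows` and `count_data_rows` are hypothetical helpers, the output file names are made up, and the rows are placeholder data rather than real parser output.

```python
import csv
import os
import tempfile


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only when the file is new."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(header)
        writer.writerows(rows)


def count_data_rows(path):
    """Count the rows in a CSV file, excluding the header."""
    with open(path, newline="") as handle:
        return sum(1 for _ in csv.reader(handle)) - 1


header = ["Level", "Content"]
# Placeholder batches standing in for the parsed output of each log file.
batches = {
    "tiny_hdfs_1.log": [["INFO", "first line"]],
    "tiny_hdfs_2.log": [["INFO", "second line"], ["WARN", "third line"]],
}

with tempfile.TemporaryDirectory() as out_dir:
    main_csv = os.path.join(out_dir, "demo_main_structured.csv")
    for log_name, rows in batches.items():
        # One file per input log, plus the shared main file that keeps growing.
        per_file_csv = os.path.join(out_dir, log_name + "_structured.csv")
        append_rows(per_file_csv, header, rows)
        append_rows(main_csv, header, rows)
    total = count_data_rows(main_csv)

print(total)
# 3
```

Writing the header only for a new or empty file keeps the cumulative CSV valid no matter how many batches arrive.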
We can use Graphviz to visualize the tree structure of the parser.
python plot_tree.py
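As a rough idea of what such a visualization involves, the sketch below builds a generic prefix tree over template tokens and prints it as Graphviz DOT text using only the standard library. It is an assumption-laden illustration and may differ from what plot_tree.py actually draws; `build_trie` and `to_dot` are hypothetical names.

```python
from typing import Dict, List


def build_trie(templates: List[List[str]]) -> Dict:
    """Insert each tokenized template into a nested-dict prefix tree."""
    root: Dict = {}
    for tokens in templates:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
    return root


def to_dot(trie: Dict) -> str:
    """Serialize the prefix tree as DOT source with one node per token."""
    lines = ["digraph tree {", '  n0 [label="root"];']
    counter = [0]  # mutable counter shared by the recursive walk

    def walk(node: Dict, parent_id: int) -> None:
        for tok, child in node.items():
            counter[0] += 1
            child_id = counter[0]
            # Escape backslashes and quotes so the label stays valid DOT.
            label = tok.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'  n{child_id} [label="{label}"];')
            lines.append(f"  n{parent_id} -> n{child_id};")
            walk(child, child_id)

    walk(trie, 0)
    lines.append("}")
    return "\n".join(lines)


templates = [
    "BLOCK* NameSystem.allocateBlock <*>".split(),
    "PacketResponder <*> for block <*> terminating".split(),
]
dot_source = to_dot(build_trie(templates))
print(dot_source)
```

Saving the printed text to a file such as tree.gv and running `dot -Tpng tree.gv -o tree.gv.png` with the Graphviz command-line tools renders it as an image.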
sh test_coverage.sh
Name | Stmts | Miss | Cover |
---|---|---|---|
spellpy/__init__.py | 3 | 0 | 100% |
spellpy/spell.py | 319 | 174 | 45% |
tests/test_spellpy.py | 71 | 1 | 98% |
TOTAL | 393 | 175 | 55% |
This tree structure was generated with the macOS terminal tool tree and then pasted into README.md.
tree -I "__pycache__|tmp.*" >> tmp.txt
.
├── LICENSE
├── MANIFEST.in
├── README.md
├── data
│   ├── empty_log.log
│   ├── tiny_hdfs_1.log
│   ├── tiny_hdfs_2.log
│   └── tiny_hdfs_3.log
├── example.py
├── plot
│   ├── tree.gv
│   └── tree.gv.png
├── plot_tree.py
├── requirements.txt
├── setup.cfg
├── setup.py
├── spellpy
│   ├── __init__.py
│   └── spell.py
└── tests
    ├── __init__.py
    ├── test_data.log
    └── test_spellpy.py