# lsfmq

This project provides one way for LSF users to integrate different kinds of message queues with LSF.

lsfeventsbeat is provided as an LSF event publisher, enhanced from Elastic Filebeat, that loads and parses the latest LSF events from LSF log files (lsb.acct/lsb.stream) and publishes the data into message queues such as Kafka, RabbitMQ, etc. Three kinds of data are published into the message queue:

+ "lsf_acct": Job finish events (from the lsb.acct file)
+ "lsf_events": All job events and LSF performance metrics (from the lsb.stream file)
+ "lsf_job_status": Job status tracing, published whenever a job status changes, including the current status, the previous status, and the failure reason if the job exits (transformed from the lsb.stream file)

A sample LSF data consumer is provided for each kind of message queue to help LSF users customize their own data consumers based on their business needs.

# Build the lsf publisher (lsfeventsbeat)

__Note__
+ Make sure the network is available
+ Make sure git has been installed
+ Download the Go installation package from [Go download](https://golang.org/dl/) and set up your Go environment.

__Build Steps__
1. Copy the code into the target directory.
2. Specify the parameters below in build.sh.
    + LSF_VERSION - LSF version. The default value is LSF10
    + BNAME - OS platform. The default value is linux-x86_64
    + LSF_LIB_PATH - LSF library path, e.g. /opt/lsf10_1_0_7/10.1/linux2.6-glibc2.3-x86_64/lib
3. Run build.sh. If all goes well, a package like lsfeventsbeat-6.4.2-${LSF_VERSION}-${BNAME}.tar.gz will be generated.
``` bash
 sh build.sh
```
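
The name of the generated package can be previewed from the two build parameters; the values below are the documented defaults:

``` bash
# Documented default build parameters from build.sh
LSF_VERSION=LSF10
BNAME=linux-x86_64

# build.sh produces a package following this naming pattern
echo "lsfeventsbeat-6.4.2-${LSF_VERSION}-${BNAME}.tar.gz"
# → lsfeventsbeat-6.4.2-LSF10-linux-x86_64.tar.gz
```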
| 31 | + |
# Setup the lsf publisher for your message queue

__Note__: The package can be installed on any server, either inside or outside the lsf cluster.

1. Uncompress the tar.gz package generated above.
``` bash
 tar -zxvf lsfeventsbeat-6.4.2-LSF10-linux-x86_64.tar.gz
```
2. Copy libreadlsbevents.so to the LSF_LIB_PATH specified in the __Build__ step.
``` bash
 cp lsfeventsbeat-6.4.2-LSF10-linux-x86_64/lib/libreadlsbevents.so ${LSF_LIB_PATH}
```
3. Add LSF_LIB_PATH to LD_LIBRARY_PATH.
``` bash
 export LD_LIBRARY_PATH=${LSF_LIB_PATH}:${LD_LIBRARY_PATH}
```
| 48 | + |
# Config the lsf publisher for your message queue

Be sure to specify correct values for the parameters below:
+ In the "filebeat.inputs" section:
    - 'paths' is the absolute path of the latest lsf event file: the lsb.stream file for topic "lsf_events" and the lsb.acct file for topic "lsf_acct". To guarantee the order of events sent to the message queue, only the latest lsb.stream and lsb.acct files should be harvested.
    - 'cluster_name' is the name of your lsf cluster.

+ In the "output.*" section:
    - hosts: specify the correct "$ip:$port" of your message queue broker server(s).

__Note__:
You can also refer to [Configuring Filebeat](https://www.elastic.co/guide/en/beats/filebeat/current/configuring-howto-filebeat.html) for common Filebeat configuration.
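
For example, for Kafka the hosts setting lives under the standard Filebeat output.kafka section; the broker addresses below are placeholders:

``` yml
output.kafka:
  hosts: ["10.0.0.1:9092", "10.0.0.2:9092"]
```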
An option named lsf_topics has been added in the filebeat.inputs section for lsf-specific configuration. Take the sample below as an example.
``` yml
lsf_topics:
  - topic_name: "lsf_events"
    type: "job.raw"
    include_fields:
      - version
      - event_type
      - event_time
    exclude_fields:
      - job_description
    add_fields: {cluster_name: "lsf_cluster"}
```

+ topic_name - Output message queue topic name
    - for Kafka, it is the topic name
    - for RabbitMQ, it is the exchange name
+ type - Parsed data type
    - "job.raw" represents raw job event data
    - "job.status.trace" represents job status trace data generated from the raw job event data
+ include_fields - A list of field names you want lsfeventsbeat to include
+ exclude_fields - A list of field names you want lsfeventsbeat to exclude
    - If both include_fields and exclude_fields are defined, lsfeventsbeat executes include_fields first and then exclude_fields. The order in which the two options are defined doesn't matter: include_fields is always executed before exclude_fields, even if exclude_fields appears before include_fields in the config file.
+ add_fields - Optional fields that you can specify to add additional information to the output
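
As a further illustration, a topic entry of type "job.status.trace" publishes the transformed status-trace data described earlier; the topic name and cluster name below are placeholders:

``` yml
lsf_topics:
  - topic_name: "lsf_job_status"
    type: "job.status.trace"
    add_fields: {cluster_name: "lsf_cluster"}
```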
# Run the lsf publisher for Kafka

Enter the target directory and run lsfeventsbeat as below.
``` bash
 ./lsfeventsbeat -c lsfeventsbeat.yml
```

# Consume lsf events data in Kafka

For example, use the Kafka built-in consumer tool "kafka-console-consumer.sh" to subscribe to the Kafka topic "lsf_job_status":

``` bash
bin/kafka-console-consumer.sh --bootstrap-server 9.21.51.241:9092 --topic lsf_job_status --from-beginning
{"app_profile":"","begin_time":0,"cluster_name":"test_cluster1","command":"sleep 10","cwd":"/env/lsf/work/cluster1/logdir/stream","depend_cond":"","event_time":"2018-10-18T05:47:38-0400","event_time_utc":1539856058,"event_type":"JOB_NEW","job_arr_idx":0,"job_description":"","job_group":"","job_id":101,"job_name":"yytest","num_arr_elements":1,"out_file":"","project_name":"default","queue_name":"normal","req_num_procs_max":1,"res_req":"","sla":"","src_cluster_name":"","submission_host_name":"icp5x1","submit_time":1539856058,"user_group_name":"","user_name":"u1","version":"10.1"}
{"change_reason":"new job submitted","cluster_name":"","current_status":"PEND","job_arr_idx":0,"job_id":101}
{"cluster_name":"test_cluster1","event_time":1539856059,"event_time_utc":1539856059,"event_type":"JOB_START_ACCEPT","job_arr_idx":0,"job_id":101,"start_time":1539856059,"version":"10.1"}
{"change_reason":"job starts","cluster_name":"","current_status":"RUN","job_arr_idx":0,"job_id":101}
{"cluster_name":"test_cluster1","cpu_time":0.076,"end_time":1539856070,"event_time":"2018-10-18T05:47:50-0400","event_time_utc":1539856070,"event_type":"JOB_STATUS","exit_info":0,"exit_status":0,"job_arr_idx":0,"job_id":101,"job_status":"DONE","job_status_code":64,"max_mem":0,"stime":0.06,"utime":0.016,"version":"10.1"}
{"change_reason":"","cluster_name":"","current_status":"DONE","job_arr_idx":0,"job_id":101,"last_status":"RUN"}
{"cluster_name":"test_cluster1","cpu_time":0.076,"end_time":1539856070,"event_time":"2018-10-18T05:47:50-0400","event_time_utc":1539856070,"event_type":"JOB_STATUS","exit_info":0,"exit_status":0,"job_arr_idx":0,"job_id":101,"job_status":"DONE+PDONE","job_status_code":192,"max_mem":0,"stime":0,"utime":0,"version":"10.1"}
{"change_reason":"","cluster_name":"","current_status":"DONE+PDONE","job_arr_idx":0,"job_id":101}
```
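
If you only care about the status transitions, the raw JSON stream can be condensed with a small sed filter. The sketch below is illustrative, not part of the project; it assumes the field order shown in the sample output (current_status appears before job_id), and in practice you would pipe the kafka-console-consumer.sh output into it instead of the here-document:

``` bash
# Condense lsf_job_status messages into "job <id> -> <status>" lines.
# In practice: bin/kafka-console-consumer.sh ... --topic lsf_job_status | sed -n '...'
sed -n 's/.*"current_status":"\([^"]*\)".*"job_id":\([0-9]*\).*/job \2 -> \1/p' <<'EOF'
{"change_reason":"new job submitted","cluster_name":"","current_status":"PEND","job_arr_idx":0,"job_id":101}
{"change_reason":"job starts","cluster_name":"","current_status":"RUN","job_arr_idx":0,"job_id":101}
EOF
# → job 101 -> PEND
# → job 101 -> RUN
```

A real consumer would normally parse the JSON with a proper library rather than a regex; the filter above is only a quick way to eyeball the status flow.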