Skip to content
This repository was archived by the owner on Sep 29, 2025. It is now read-only.

Using Flink Table API to read write Excel files

Jörn Franke edited this page Sep 30, 2018 · 4 revisions

This is a Flink application demonstrating some of the capabilities of the hadoopoffice Flink TableSource/TableSink library. It takes as input a set of Excel files. It outputs them to a new Excel file into the Sheet "Sheet2". It has successfully been tested with the HDP Sandbox VM 2.5, but other Hadoop distributions should work equally well, if they support Flink. You will need at least Flink 1.5 (older versions may work though). Note: you may need to put the file flink-hadoop-compatibility*.jar in the lib folder of your flink distribution.

Getting an example Excel

You can create yourself an Excel file in LibreOffice or Microsoft Excel. Alternatively, you can download an Excel file that is used for unit testing of hadoopoffice library by executing the following command:

wget --no-check-certificate https://github.com/ZuInnoTe/hadoopoffice/blob/master/examples/scala-flinkts-excel/src/it/resources/testsimple.xlsx?raw=true

You can put it on your HDFS cluster by executing the following commands:

hadoop fs -mkdir -p /user/flink/office/excel/input
hadoop fs -put ./testsimple.xlsx /user/flink/office/excel/input

After it has been copied you are ready to use the example.

Building the example

Note the HadoopOffice library and the Flink TableSource/TableSink is available on Maven Central.

Execute

git clone https://github.com/ZuInnoTe/hadoopoffice.git hadoopoffice

You can build the application by changing to the directory hadoopoffice/examples/scala-flinkts-excel and using the following command:

sbt +clean +it:test +assembly

Running the example

Make sure that the output directory is empty:

hadoop fs -rm -R /user/flink/office/excel/output

Execute the following command:

flink run example-ho-flink-ts-scala-excel.jar --input hdfs://localhost:8020/user/flink/office/excel/input --output hdfs://localhost:8020/user/flink/office/excel/output/output.xlsx                                                                                                                                     

After the job finished you can download the Excel file and view it in LibreOffice, Excel or any other spreadsheet tool.

hadoop fs -copyToLocal /user/flink/office/excel/output/output.xlsx      

Other features

Find here further configuration options of the HadoopOffice library, such as encryption, decryption, locale, meta data filter, linked workbooks and filtering by sheets.

Clone this wiki locally