Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Improve] [Connector-V2] File Connector add lzo compression way. #3782

Merged
merged 12 commits into from
Dec 30, 2022

Conversation

lightzhao
Copy link
Contributor

Purpose of this pull request

Check list

@Hisoka-X Hisoka-X changed the title add lzo compression way. [Improve] [Connector-V2] File Connector add lzo compression way. Dec 22, 2022
@TyrantLucifer
Copy link
Member

IMO, using the enum to limit the method of compress is better. Orc and parquet file format also have other compress method

@lightzhao
Copy link
Contributor Author

IMO, using the enum to limit the method of compress is better. Orc and parquet file format also have other compress method

Add compression enum completed.

@@ -50,7 +50,7 @@ public class BaseFileSinkConfig implements DelimiterConfig, CompressConfig, Seri

public BaseFileSinkConfig(@NonNull Config config) {
if (config.hasPath(BaseSinkConfig.COMPRESS_CODEC.key())) {
if ("lzo".equals(config.getString(BaseSinkConfig.COMPRESS_CODEC.key()))) {
if (CompressFormat.LZO.getCompressCodec().equals(config.getString(BaseSinkConfig.COMPRESS_CODEC.key()))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch is better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks.

@@ -115,7 +116,7 @@ private FSDataOutputStream getOrCreateOutputStream(@NonNull String filePath) {
FSDataOutputStream fsDataOutputStream = beingWrittenOutputStream.get(filePath);
if (fsDataOutputStream == null) {
try {
if ("lzo".equals(compressCodec)) {
if (CompressFormat.LZO.getCompressCodec().equals(compressCodec)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch is better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks.

@lightzhao lightzhao requested review from Hisoka-X and TyrantLucifer and removed request for Hisoka-X and TyrantLucifer December 24, 2022 12:27
@Hisoka-X
Copy link
Member

image
Please add license header.

@lightzhao
Copy link
Contributor Author

image Please add license header.

done,thanks.

@TyrantLucifer
Copy link
Member

image

Please fix CI problem.

@lightzhao
Copy link
Contributor Author

image

Please fix CI problem.

done, thanks.

@lightzhao lightzhao requested review from Hisoka-X and removed request for Hisoka-X December 27, 2022 02:37
@TyrantLucifer
Copy link
Member

Waiting CI completed.

@TyrantLucifer
Copy link
Member

TyrantLucifer commented Dec 27, 2022

Could you please offer test screenshot images to verify this pull request is work? The following points need to be verified:

  1. The file name is correct
  2. Data can be read normally in hive or spark

@lightzhao
Copy link
Contributor Author

Could you please offer test screenshot images to verify this pull request is work? The following points need to be verified:

  1. The file name is correct
  2. Data can be read normally in hive or spark

hdfs file:
image
hive query:
image

@TyrantLucifer
Copy link
Member

TyrantLucifer commented Dec 27, 2022

Could you please offer test screenshot images to verify this pull request is work? The following points need to be verified:

  1. The file name is correct
  2. Data can be read normally in hive or spark

hdfs file: image hive query: image

*.lzo.txt might be better? We can use the suffix to Identify the file type quickly, BTW could you please test csv file type?

@lightzhao
Copy link
Contributor Author

csv file type

The traditional method is to end with a compressed code, but it doesn't use the ending method you mentioned. csv is just a separation symbol problem, which has been tested

@TyrantLucifer
Copy link
Member

csv file type

The traditional method is to end with a compressed code, but it doesn't use the ending method you mentioned. csv is just a separation symbol problem, which has been tested

cc @EricJoy2048

@lightzhao
Copy link
Contributor Author

@TyrantLucifer plesase review and merge , thanks.

@EricJoy2048
Copy link
Member

Could you please offer test screenshot images to verify this pull request is work? The following points need to be verified:

  1. The file name is correct
  2. Data can be read normally in hive or spark

hdfs file: image hive query: image

*.lzo.txt might be better? We can use the suffix to Identify the file type quickly, BTW could you please test csv file type?

I agree with you, *.lzo.txt is better.

@lightzhao
Copy link
Contributor Author

@lightzhao

done, complete file name ends with *. lzo.txt.

@lightzhao lightzhao requested review from Hisoka-X and removed request for TyrantLucifer December 28, 2022 10:59
@lightzhao
Copy link
Contributor Author

@TyrantLucifer @Hisoka-X please review .thanks.

@Hisoka-X Hisoka-X requested a review from EricJoy2048 December 29, 2022 02:03
@lightzhao
Copy link
Contributor Author

@EricJoy2048 plesase review .

@TyrantLucifer TyrantLucifer merged commit 8875d02 into apache:dev Dec 30, 2022
lhyundeadsoul pushed a commit to lhyundeadsoul/incubator-seatunnel that referenced this pull request Jan 3, 2023
…apache#3782)

* add lzo compression way.

* format code style

* add CompressFormat enum.

* Judge with switch statement.

* fix CompressFormat enum valueOf adapt

* Repair NPE when compression is not set.

* add license header.

* fix CI problem.

* CompressFormat enum add default value NONE.

* Supplement lzo compressed documents.

* Supplement lzo compressed documents.

* modify file name ends with *.lzo.txt

Co-authored-by: zhaoliang01 <zhaoliang01@58.com>
lhyundeadsoul pushed a commit to lhyundeadsoul/incubator-seatunnel that referenced this pull request Jan 3, 2023
…apache#3782)

* add lzo compression way.

* format code style

* add CompressFormat enum.

* Judge with switch statement.

* fix CompressFormat enum valueOf adapt

* Repair NPE when compression is not set.

* add license header.

* fix CI problem.

* CompressFormat enum add default value NONE.

* Supplement lzo compressed documents.

* Supplement lzo compressed documents.

* modify file name ends with *.lzo.txt

Co-authored-by: zhaoliang01 <zhaoliang01@58.com>
@lightzhao lightzhao deleted the add-lzo-compression branch February 22, 2023 05:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants