Skip to content

Conversation

@zuochunwei
Copy link
Collaborator

@zuochunwei zuochunwei commented May 17, 2023

Why I do this? please refer to discussion

We use the proto::Stream_Kind_xxx both for dwrf and orc, and this usage is dangerous, so I refactor this part by the following modifications:

  1. add another constructor with a param proto::orc::Stream for DwrfStreamIdentifier
  2. add another member method of forKind(const proto::orc::Stream_Kind kind) for EncodingKey
  3. using union to represent of proto::orc::StripeFooter* footerOrc_ for orc
  4. add StripeStreams::getEncodingOrc() for orc, the response to StripeStreams::getEncoding() for dwrf
  5. refactor loadStreams to avoid the mess
  6. add init for sub-classes of SelectiveColumnReader, check the format in init(), call initDwrf for dwrf and call initOrc for orc

@zuochunwei zuochunwei changed the title [WIP] refactor dwrf and orc [WIP] refactor the reader of format dwrf and orc May 17, 2023
@zuochunwei
Copy link
Collaborator Author

@rui-mo @JkSelf

@zuochunwei zuochunwei changed the title [WIP] refactor the reader of format dwrf and orc [DWIO] refactor the reader of dwrf/orc Jun 6, 2023
@rui-mo
Copy link
Collaborator

rui-mo commented Jun 6, 2023

Do we plan to open a PR to Velox upstream for this change?

}
}

void DwrfData::initDwrf(StripeStreams& stripe) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we use a little template magic to reduce code duplication?

void DwrfData::init(StripeStreams& stripe) {
  auto format = stripe.format();
  if (format == DwrfFormat::kDwrf) {
    init<DwrfFormat::kDwrf>(stripe);
  } else {
    VELOX_CHECK(format == DwrfFormat::kOrc);
    init<DwrfFormat::kOrc>(stripe);
  }
}

template <DwrfFormat kFormat>
void DwrfData::init(StripeStreams& stripe) {
  EncodingKey encodingKey{nodeType_->id, flatMapContext_.sequence};

  auto kind = kFormat == DwrfFormat::kDwrf ? proto::Stream_Kind_PRESENT : proto::orc::Stream_Kind_PRESENT;
  std::unique_ptr<dwio::common::SeekableInputStream> stream = stripe.getStream(
      encodingKey.forKind(kind), false);
  if (stream) {
    notNullDecoder_ = createBooleanRleDecoder(std::move(stream), encodingKey);
  }

  // We always initialize indexStream_ because indices are needed as
  // soon as there is a single filter that can trigger row group skips
  // anywhere in the reader tree. This is not known at construct time
  // because the first filter can come from a hash join or other run
  // time pushdown.
  kind = kFormat == DwrfFormat::kDwrf ? proto::Stream_Kind_ROW_INDEX : proto::orc::Stream_Kind_ROW_INDEX;
  indexStream_ = stripe.getStream(
      encodingKey.forKind(proto::orc::Stream_Kind_ROW_INDEX), false);
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good

Copy link
Collaborator Author

@zuochunwei zuochunwei Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the operator of condition ? a : b requires that type a is the same as type b, while proto::Stream_Kind_PRESENT is a different type from proto::orc::Stream_Kind_PRESENT, so it can't be compiled successfully.

let's keep it this way, a method for dwrf and another method for orc.

It's clean although it seems poor skills.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid these duplications as it would increase maintenance burden. We can do it like this instead:

template <DwrfFormat kFormat>
void DwrfData::init(StripeStreams& stripe) {
  EncodingKey encodingKey{nodeType_->id, flatMapContext_.sequence};
  auto id = kFormat == DwrfFormat::kDwrf ? encodingKey.forKind(proto::Stream_Kind_PRESENT) : encodingKey.forKind(proto::orc::Stream_Kind_PRESENT);
  std::unique_ptr<dwio::common::SeekableInputStream> stream = stripe.getStream(id, false);
  if (stream) {
    notNullDecoder_ = createBooleanRleDecoder(std::move(stream), encodingKey);
  }

  // We always initialize indexStream_ because indices are needed as
  // soon as there is a single filter that can trigger row group skips
  // anywhere in the reader tree. This is not known at construct time
  // because the first filter can come from a hash join or other run
  // time pushdown.
  id = kFormat == DwrfFormat::kDwrf ? encodingKey.forKind(proto::Stream_Kind_ROW_INDEX) : encodingKey.forKind(proto::orc::Stream_Kind_ROW_INDEX);
  indexStream_ = stripe.getStream(
      encodingKey.forKind(proto::orc::Stream_Kind_ROW_INDEX), false);
}

Copy link
Collaborator Author

@zuochunwei zuochunwei Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let‘s reduce the duplications in a new PR, I will ask my partner to do this for me in the following days.

He is a new member of our team, so it's a good start for him to make such changes.

@zuochunwei
Copy link
Collaborator Author

Do we plan to open a PR to Velox upstream for this change?

Yes

@zhejiangxiaomai
Copy link
Collaborator

if(NOT DEFINED CMAKE_INSTALL_LIBDIR)
set(CMAKE_INSTALL_LIBDIR "lib")
endif()

*/
std::string writerVersionToString(WriterVersion kind);

// Stream Kind of Dwrf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment starts with capitalized letter and ends with .. Same for below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

zuochunwei added 2 commits June 7, 2023 20:14
@rui-mo rui-mo merged commit e01c54f into oap-project:main Jun 8, 2023
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jun 9, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
@zuochunwei zuochunwei deleted the dwrfOrc branch June 19, 2023 06:14
zhejiangxiaomai pushed a commit that referenced this pull request Jun 20, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jun 25, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jun 25, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jun 26, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jun 27, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jul 3, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jul 4, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jul 11, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jul 12, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jul 12, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
zhejiangxiaomai pushed a commit to zhejiangxiaomai/velox that referenced this pull request Jul 17, 2023
Co-authored-by: zuochunwei <zuochunwei@meituan.com>
marin-ma pushed a commit to marin-ma/velox-oap that referenced this pull request Dec 15, 2023
…tpch workload (oap-project#261)

* add the workload for tpch

* Add the readme to tpch workload

* refine readme

Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>

Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants