Skip to content

Conversation

@iverase
Copy link
Contributor

@iverase iverase commented Sep 27, 2023

Adds a new method to RandomAccessInput tio bulk read bytes into a provided byte array. The default implementation reads byte by byte but faster implementations are provided for all lucene implementations.

@uschindler I tried to provide implementations for the MemorySegmentIndexInputProvider. I am a bit confused because I run TestMmapDirectory several times hoping it will cover single and multiple segments.I never saw it running woth multiplr segments, is there another test or something I need to do? Anyway, it would be great if you can review it.

closes #12599

@iverase iverase requested a review from uschindler September 27, 2023 13:36
@uschindler
Copy link
Contributor

Hi,

@uschindler I tried to provide implementations for the MemorySegmentIndexInputProvider. I am a bit confused because I run TestMmapDirectory several times hoping it will cover single and multiple segments.I never saw it running woth multiplr segments, is there another test or something I need to do? Anyway, it would be great if you can review it.

Write a test in TestMultiMmap. The other test is only about general setup. The "cool tests" using multiple segments or multiple buffers is there.

@uschindler
Copy link
Contributor

I need more time to review this as I am a bit crowded. Sorry for possible delays.

If you get tests running with all kinds of mmap, just give me some feedback.

@uschindler uschindler marked this pull request as draft September 27, 2023 14:34
@uschindler
Copy link
Contributor

I changed to draft until all is tested.

@iverase iverase marked this pull request as ready for review October 2, 2023 09:19
Copy link
Contributor

@uschindler uschindler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the code looks fine. To be safe, I will change the Policeman Jenkins Job temporarily to your branch.

I left some minor comments about method placement and a future simplification. In general, once we add Java 22 support (final version of Java Panama FFI API), I try to cleanup the older version.

At least in main (and possibly 9.x) we should remove Java 19 and Java 20 support (out of maintenance) and only keep Java 21+ for Panama Vector and Panama MMap.

@uschindler
Copy link
Contributor

uschindler commented Oct 3, 2023

I changed the https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/ (Linux MMAP Job) to use your branch. Please wait a bit until the checker is happy, as it tries all different java versions. This gives us confidence. Except the MultiMMap test do we have other productive ode already using the new API?

After merging just ping me to revert the branch settings in the policeman job - of course do not delete the branch!

The first run using this branch is: 1287 -- https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/1287/

Uwe

@uschindler
Copy link
Contributor

uschindler commented Oct 3, 2023

In general to me it is still questionable if we really need a bulk random access byte[] support in RandomaccessInput as this is not required by "core Lucene". I am partly agree with this, but if somebody asks for float[] or long[] bulk reads with positional access I will -1 that.

So I leave it up to others to decide if this makes sense not only for DocValues but possibly also other code in Lucene.

@iverase
Copy link
Contributor Author

iverase commented Oct 3, 2023

I am just trying to have an abstraction that can replace the BytesRef output for binary doc values with something that does not impose the internal representation of the bytes like BytesRef does. I hope this will avoid some extra byte copying when reading and when writing and avoid the risk of a humongous allocation when reading binary doc values.

I am ok if we said that RandomaccessInput is not the right abstraction and we need something else. On the other hand it seems to fit well.

@iverase iverase merged commit c4694c3 into apache:main Oct 4, 2023
@iverase iverase deleted the randomaccessinput-readbytes branch October 4, 2023 14:23
@iverase
Copy link
Contributor Author

iverase commented Oct 4, 2023

@uschindler I merged the change.

I tried to backported but it is not possible ByteBuffer#get(int, byte[], int, int) is not available in the java version on line 9.x. I think it is ok to leave this change until 10.

@iverase iverase added this to the 10.0.0 milestone Oct 4, 2023
@uschindler
Copy link
Contributor

uschindler commented Oct 4, 2023

Hi @iverase,
oh yeah. The absolute ByteBuffer gets are not available in older Java versions.

If you want to backport, you could create a temporary ByteBuffer slice, but if you're fine to delay this change, I am fine. An alternative is to not implement the method and fallback to the default?!

I changed the Policeman Jenkins MMAP job back to Lucene Main branch. The next run would have failed as you deleted the branch!

@uschindler
Copy link
Contributor

P.S.: See docs here. The method came with Java 13.

@uschindler
Copy link
Contributor

@iverase, I think you have to move the changes entry to Lucene 10.

@iverase
Copy link
Contributor Author

iverase commented Oct 4, 2023

@iverase, I think you have to move the changes entry to Lucene 10.

I did it already in ba74da1

I changed the Policeman Jenkins MMAP job back to Lucene Main branch. The next run would have failed as you deleted the branch!

Oops, I delete them mechanically, glad you changed quickly.

An alternative is to not implement the method and fallback to the default?!

I think that is the best approach, We can do that if there is need to have it in the 9.x branch.

s1monw pushed a commit to s1monw/lucene that referenced this pull request Oct 10, 2023
Adds a new method to RandomAccessInput tio bulk read bytes into a provided byte array.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add readBytes method to RandomAccessInput

2 participants