When the length of the array exceeds 1073741824, it will cause the current thread to be in a blocked state #21918

Closed
tzSharing opened this issue Jul 23, 2024 · 13 comments
Labels
Bug This tag is applied to issues which reports bugs.

Comments

@tzSharing
Contributor

tzSharing commented Jul 23, 2024

Describe the bug

Appending an element to an array once its length exceeds 1073741824 causes the current thread to block, without any error or exception information.

Reproduction Steps

V code:

fn main() {
	mut arr := []u8{}
	buf := [1024]u8{}
	for {
		if arr.len < 1073741400 {
			arr << buf[..]
		} else {
			arr << 1 // once len exceeds 1073741824, the thread blocks here
		}
		println('len=${arr.len}')
		if arr.len >= max_int { // max_int=2147483647
			break
		}
	}
	println('main  len=${arr.len}')
}

output:

...
len=1073729536
len=1073730560
len=1073731584
len=1073732608
len=1073733632
len=1073734656
len=1073735680
len=1073736704
len=1073737728
len=1073738752
len=1073739776
len=1073740800
len=1073741824
^C
C:\Users\TZ\Desktop>

The main thread is blocked and can only be interrupted forcibly.
I ran the equivalent code in Kotlin for comparison. Kotlin did not block: it threw an exception instead, probably because the heap available to the Java virtual machine was too small. Most importantly, the exception included location information, which is crucial for tracking down errors, especially in complex projects.

Kotlin code:

fun main(){
	val arr=mutableListOf<Int>()
	val buf=List(1024,{it})
	while (true){
		arr.addAll(buf)
		println(arr.size)
		if(arr.size>=Int.MAX_VALUE){
			println("Break out of the loop")
			break
		}
	}
	println("main  arr size=${arr.size}")
}

output:

...
34466816
34467840
34468864
34469888
34470912
34471936
34472960
34473984
34475008
34476032
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3512)
        at java.base/java.util.Arrays.copyOf(Arrays.java:3481)
        at java.base/java.util.ArrayList.grow(ArrayList.java:238)
        at java.base/java.util.ArrayList.addAll(ArrayList.java:761)
        at TestKt.main(test.kt:5)
        at TestKt.main(test.kt)
        at java.base/java.lang.invoke.LambdaForm$DMH/0x00000141d600f800.invokeStatic(LambdaForm$DMH)
        at java.base/java.lang.invoke.LambdaForm$MH/0x00000141d6010800.invoke(LambdaForm$MH)
        at java.base/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(DirectMethodHandleAccessor.java:154)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at org.jetbrains.kotlin.runner.AbstractRunner.run(runners.kt:70)
        at org.jetbrains.kotlin.runner.Main.run(Main.kt:183)
        at org.jetbrains.kotlin.runner.Main.main(Main.kt:193)

C:\Users\TZ\Desktop>

Expected Behavior

Since the length of the array is of type int, the length should be <= max_int. Even if appending fails for some reason, there should be an error message instead of a blocked thread.

Current Behavior

The running thread is blocked

Possible Solution

No response

Additional Information/Context

No response

V version

V 0.4.6 fd7986c

Environment details (OS name and version, etc.)

C:\Users\TZ\Desktop>v doctor
V full version: V 0.4.6 fd7986c
OS: windows, Microsoft Windows 10 Professional v19045 64-bit
Processor: 4 cpus, 64bit, little endian,

getwd: C:\Users\TZ\Desktop
vexe: D:\v\v.exe
vexe mtime: 2024-07-23 12:42:44

vroot: OK, value: D:\v
VMODULES: OK, value: C:\Users\TZ\.vmodules
VTMP: OK, value: C:\Users\TZ\AppData\Local\Temp\v_0

Git version: git version 2.45.1.windows.1
Git vroot status: weekly.2024.30-4-gfd7986c7 (4 commit(s) behind V master)
.git/config present: true

CC version: Error: 'cc' is not recognized as an internal or external command,
operable program or batch file.

thirdparty/tcc status: thirdparty-windows-amd64 b425ac82


@tzSharing tzSharing added the Bug This tag is applied to issues which reports bugs. label Jul 23, 2024
@JalonSolov
Contributor

The main difference is that Java has a runtime, so extra checks like this can be done at runtime. If you want this with V, you will need to do the checks yourself, since V doesn't have a runtime.

If you really need to work with huge arrays like this, it would be best to declare a struct, with the @[heap] attribute on it to ensure it's never created on the stack. Put the array in the struct and always access it from there.

Otherwise, you can try increasing values with the V command line option -thread-stack-size <number of bytes> when you build/run your code, until you give a large enough value to make it work.
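
As a rough illustration of what "doing the checks yourself" could look like, here is a minimal sketch. The helper `append_checked` and its `limit` parameter are hypothetical names made up for this example; they are not part of V's standard library.

```v
// Hypothetical helper (not part of V's standard library): refuses to
// append when the resulting length would exceed `limit`.
fn append_checked(mut arr []u8, data []u8, limit int) ! {
	// Add in i64 so that the sum itself cannot overflow a 32-bit int.
	if i64(arr.len) + i64(data.len) > i64(limit) {
		return error('append would exceed ${limit} elements (len=${arr.len}, adding ${data.len})')
	}
	arr << data
}

fn main() {
	mut arr := []u8{}
	buf := []u8{len: 4}
	// The first append fits within the limit of 6 elements.
	append_checked(mut arr, buf, 6) or { panic(err) }
	// The second append would reach 8 elements, so it is refused.
	append_checked(mut arr, buf, 6) or {
		println('refused: ${err}')
		return
	}
}
```

In a real program the limit would presumably be chosen below the point where the array's growth strategy runs into trouble, rather than a small number like 6.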

@tzSharing
Contributor Author

tzSharing commented Jul 24, 2024

> The main difference is that Java has a runtime, so extra checks like this can be done at runtime. If you want this with V, you will need to do the checks yourself, since V doesn't have a runtime.
>
> If you really need to work with huge arrays like this, it would be best to declare a struct, with the @[heap] attribute on it to ensure it's never created on the stack. Put the array in the struct and always access it from there.
>
> Otherwise, you can try increasing values with the V command line option -thread-stack-size <number of bytes> when you build/run your code, until you give a large enough value to make it work.

A brief description of my use case may help. I was writing a file transfer tool using the network IO APIs in the V standard library. The transfer worked, and I tested files ranging from a few MB to several hundred MB without problems. But when I tested a movie file of about 2 GB, the transfer had still not finished after about 30 minutes. I realized something was wrong, because copying and pasting the same file through the operating system takes only about 30 seconds. At first I thought my code was inefficient, but after several hours of investigation I found that the cause was the thread blocking while appending elements to a V array.
In addition, I made a new discovery. Here is an excerpt of the code:

...
file_path := 'C:\\Users\\TZ\\Desktop\\星际穿越.rmvb'
mut file := os.open(file_path) or {
	eprintln(err.str())
	return
}
defer { file.close() }
// os module signature: `pub fn (f &File) read_bytes(size int) []u8`
bytes := file.read_bytes(int(os.file_size(file_path)))
println('file bytes len=${bytes.len}') // file bytes len=1911250979
...

The printed array length is 1911250979, which clearly exceeds 1073741824. `pub fn (f &File) read_bytes(size int) []u8` in the os module read all the data into a []u8 without blocking the thread, running out of memory, or hitting any other problem. This shows that an array length above 1073741824 is allowed. The problem I reported is therefore most likely a bug in the implementation of the array append operation (arr << element).

@kbkpbot
Contributor

kbkpbot commented Jul 24, 2024

It seems that the array length can't exceed 1073741824; see #17958.

@spytheman
Member

Yes, it is a known problem. It is caused by V using 32-bit integers for the .len and .cap fields of its dynamic arrays, combined with the default strategy for growing a dynamic array (doubling the allocated memory when the capacity is reached).
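
The interaction between a 32-bit capacity and a doubling strategy can be modelled with a small sketch. This is a simplified model under stated assumptions, not V's actual array code: `double_i32` and `find_cap` are hypothetical names, and the growth loop is assumed to have the form "double until the capacity is large enough". The doubling is done in i64 and then narrowed to i32, so the loss of the high bits is explicit; the step count is bounded so the example always terminates.

```v
// Hypothetical model of doubling a capacity stored in a 32-bit signed field.
// The product is computed in i64 and then narrowed back to i32.
fn double_i32(cap i32) i32 {
	return i32(i64(cap) * 2)
}

// Doubles `start` until it reaches `required`, giving up after `max_steps`
// doublings. An unbounded version of this loop would never finish if the
// narrowed value stops growing.
fn find_cap(start i32, required i32, max_steps int) ?i32 {
	mut cap := start
	for _ in 0 .. max_steps {
		if cap >= required {
			return cap
		}
		cap = double_i32(cap)
	}
	return none
}

fn main() {
	// 1073741824 is 2^30; one more element needs a capacity above it.
	if cap := find_cap(1073741824, 1073741825, 8) {
		println('found cap=${cap}')
	} else {
		println('no capacity found within 8 doublings')
	}
}
```

Doubling 2^30 gives 2^31, which does not fit in a signed 32-bit integer. On platforms where the narrowing cast wraps, the modelled capacity never reaches the required value, which matches the reported behavior of the append never returning.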

@spytheman
Member

spytheman commented Jul 27, 2024

In the case of read_bytes, the resulting array's capacity is known beforehand (from the file size), so it is allocated right away and never needs to be reallocated after that. In that way the array can reach up to 2^31 - 1 = 2147483647 bytes without overflowing its .cap and .len fields.
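
The same idea can be applied when appending in chunks: if the final size is known, the capacity can be reserved up front. The sketch below uses small stand-in sizes, and `fill` is a hypothetical helper written for this example; it is not how read_bytes itself is implemented.

```v
// Hypothetical helper: builds an array of `total` bytes from fixed-size
// chunks, reserving the full capacity first so that appends never need
// to grow the array. Assumes `total` is a multiple of `chunk_len`.
fn fill(total int, chunk_len int) []u8 {
	mut arr := []u8{cap: total}
	chunk := []u8{len: chunk_len}
	for arr.len < total {
		arr << chunk
	}
	return arr
}

fn main() {
	// 4096 stands in for a known file size; 1024 for a read buffer size.
	arr := fill(4096, 1024)
	println('len=${arr.len} cap=${arr.cap}')
}
```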

@spytheman
Member

As for the behavior, it would indeed be better to panic in that case, with a stack trace, instead of looping indefinitely due to the overflow of the .cap field.
I have a local patch that does this, but I have to run more tests before making a PR:
[screenshot]

@medvednikov
Member

I'm making int 64 bit on 64 bit systems right now. So this issue will be fixed soon.

@tzSharing
Contributor Author

Okay, I see.

spytheman added a commit that referenced this issue Jul 28, 2024
…1, instead of overflowing a.cap (partial fix for #21918) (#21947)
@spytheman
Member

spytheman commented Jul 30, 2024

With latest V, it will print a backtrace on reaching the limit, and that will be at 2^31, not 2^30 (as a temporary fix, until the int becomes 64 bit).

@tzSharing
Contributor Author

> With latest V, it will print a backtrace on reaching the limit, and that will be at 2^31, not 2^30 (as a temporary fix, until the int becomes 64 bit).

Yes, I have tried it and it works fine. But there are still some things I can't figure out. What happens when int becomes 64-bit on 64-bit systems? Will the number of bytes it occupies change? I remember the documentation saying "Unlike C and Go, int is always a 32-bit integer". And when that change lands, will your changes be reverted? Will programmers still need to check for out-of-bounds themselves?

@medvednikov
Member

array.len will be i64, so the size of the array structure will grow.
This check will probably still stay.

@spytheman
Member

I consider this specific issue solved: the blocking behavior due to the infinite loop is fixed, the memory limit is increased, and there are better diagnostics now when the limit is reached.

@medvednikov, @tzSharing do you agree?

@tzSharing
Contributor Author

Yes, I agree. @spytheman
