
very unstable #1

Open
@salier

Description

I am using CUDA 12.1 and PyTorch 2.1.2. Although the project deployed successfully, I have not yet managed to generate a model. I would appreciate some help.

The following errors occur:

CUDA kernel failed : no kernel image is available for execution on the device
void group_points_kernel_wrapper(int, int, int, int, int, const float *, const int *, float *) at L:38 in D:\TriplaneGaussian\tgs\models\snowflake\pointnet2_ops_lib\pointnet2_ops\_ext-src\src\group_points_gpu.cu
(It seems my GPU architecture is not supported.) (Occasional)
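"no kernel image is available" usually means the compiled pointnet2_ops extension does not contain kernels for this GPU's compute capability (an RTX 4090 is sm_89). A quick check of what the installed PyTorch build supports, assuming a CUDA-enabled install:

```python
import torch

if torch.cuda.is_available():
    # An RTX 4090 reports compute capability (8, 9), i.e. sm_89.
    print(torch.cuda.get_device_capability(0))
    # CUDA architectures this PyTorch build ships kernels for, e.g. ['sm_80', 'sm_86', ...].
    print(torch.cuda.get_arch_list())
```

If sm_89 (or a compatible PTX target) is missing from the extension build, setting TORCH_CUDA_ARCH_LIST=8.9 in the environment and reinstalling pointnet2_ops_lib is the usual remedy; this is only a guess based on the error text, not something I have confirmed on this machine.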

Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\TriplaneGaussian\gradio_app.py", line 39, in &lt;module&gt;
    model = TGS(cfg=base_cfg.system).to(device)
  File "D:\TriplaneGaussian\infer.py", line 94, in __init__
    self.load_weights(self.cfg.weights, self.cfg.weights_ignore_modules)
  File "D:\TriplaneGaussian\infer.py", line 50, in load_weights
    state_dict = load_module_weights(
  File "D:\TriplaneGaussian\tgs\utils\misc.py", line 37, in load_module_weights
    ckpt = torch.load(path, map_location=map_location)
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\serialization.py", line 993, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\serialization.py", line 447, in __init__
    super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
(The model checkpoint fails to load.) (Occasional)
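The "failed finding central directory" error suggests the checkpoint file handed to torch.load is not a valid zip archive, e.g. a truncated or corrupt download. Note that this traceback comes from a spawned multiprocessing child that re-imports gradio_app.py, so on Windows the DataLoader workers re-load the model at import time. A minimal integrity check (the checkpoint path below is a placeholder for whichever file is actually configured in cfg.weights):

```python
import os
import zipfile

# Placeholder path; substitute the checkpoint file configured in cfg.weights.
ckpt_path = r"D:\TriplaneGaussian\checkpoints\model.ckpt"

if not os.path.exists(ckpt_path):
    print("Checkpoint file not found.")
elif not zipfile.is_zipfile(ckpt_path):
    # A PyTorch >= 1.6 checkpoint is a zip archive; a truncated or partial download
    # fails this check with the same "central directory" error PytorchStreamReader raises.
    print("Checkpoint is not a valid zip archive - re-download it.")
else:
    print("Checkpoint looks structurally intact.")
```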

Traceback (most recent call last):
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1132, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\multiprocessing\queues.py", line 114, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\queueing.py", line 456, in call_prediction
    output = await route_utils.call_process_api(
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\blocks.py", line 1522, in process_api
    result = await self.call_function(
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\blocks.py", line 1144, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\TriplaneGaussian\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\TriplaneGaussian\env\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "D:\TriplaneGaussian\env\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\utils.py", line 674, in wrapper
    response = f(*args, **kwargs)
  File "D:\TriplaneGaussian\gradio_app.py", line 111, in run
    infer(image_path, cam_dist, only_3dgs=True)
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\TriplaneGaussian\gradio_app.py", line 96, in infer
    for batch in dataloader:
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 630, in __next__
    data = self._next_data()
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1328, in _next_data
    idx, data = self._get_data()
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1294, in _get_data
    success, data = self._try_get_data()
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1145, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 33816) exited unexpectedly

Traceback (most recent call last):
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1132, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "C:\Users\Sariel\AppData\Local\Programs\Python\Python310\lib\multiprocessing\queues.py", line 114, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\queueing.py", line 456, in call_prediction
    output = await route_utils.call_process_api(
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\blocks.py", line 1522, in process_api
    result = await self.call_function(
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\blocks.py", line 1144, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\TriplaneGaussian\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\TriplaneGaussian\env\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "D:\TriplaneGaussian\env\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\utils.py", line 674, in wrapper
    response = f(*args, **kwargs)
  File "D:\TriplaneGaussian\gradio_app.py", line 111, in run
    infer(image_path, cam_dist, only_3dgs=True)
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\TriplaneGaussian\gradio_app.py", line 96, in infer
    for batch in dataloader:
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 630, in __next__
    data = self._next_data()
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1328, in _next_data
    idx, data = self._get_data()
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1294, in _get_data
    success, data = self._try_get_data()
  File "D:\TriplaneGaussian\env\lib\site-packages\torch\utils\data\dataloader.py", line 1145, in _try_get_data
    raise RuntimeError(f'DataLoader worker (pid(s) {pids_str}) exited unexpectedly') from e
RuntimeError: DataLoader worker (pid(s) 33816) exited unexpectedly

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\queueing.py", line 501, in process_events
    response = await self.call_prediction(awake_events, batch)
  File "D:\TriplaneGaussian\env\lib\site-packages\gradio\queueing.py", line 465, in call_prediction
    raise Exception(str(error) if show_error else None) from error
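The _queue.Empty / "DataLoader worker exited unexpectedly" chain is what PyTorch reports when a worker subprocess dies. On Windows, workers use the spawn start method and re-import the launching script, which matches the spawn traceback above. A minimal sketch of a possible workaround (forcing num_workers=0), with a dummy dataset standing in for the real inference dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the inference dataset built in infer().
dataset = TensorDataset(torch.zeros(4, 3))

# num_workers=0 keeps data loading in the main process; spawned worker processes
# on Windows re-import the launching script and can die silently, producing the
# "exited unexpectedly" error above.
dataloader = DataLoader(dataset, batch_size=1, num_workers=0)

for batch in dataloader:
    pass  # iterate without spawning worker processes
```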

These errors are also accompanied by various crashes that produce no error message at all.

I have confirmed that the online demo runs on an A10G. I am using an RTX 4090 with 64 GB of RAM, which should be sufficient, yet the process frequently stops.
