
Froze after around 2 hours #9

Closed
YuqiHuai opened this issue Mar 16, 2024 · 9 comments

Comments

@YuqiHuai
Contributor

YuqiHuai commented Mar 16, 2024

Hi Mingfei,

After about 2 hours of running scenario 1, it froze. I have attached the terminal log below. Since SVL has been discontinued, I was using the SVL client you provided, connected to a local cloud.

2024-03-16 05:13:48.237 | INFO     | common.simulator:load_map:64 - [Simulator] Loaded map: SanFrancisco_correct
2024-03-16 05:13:51.918 | INFO     | common.simulator:init_environment:246 - Load environment - Finish
2024-03-16 05:13:52.096 | INFO     | lgsvl.dreamview.dreamview:setup_apollo:314 - {'Camera': False, 'Canbus': False, 'Control': True, 'GPS': False, 'Guardian': False, 'Localization': True, 'Perception': False, 'Planning': True, 'Prediction': True, 'Radar': False, 'Recorder': False, 'Routing': True, 'Storytelling': True, 'Third Party Perception': False, 'Traffic Light': False, 'Transform': True, 'Velodyne': False}
2024-03-16 05:13:52.099 | INFO     | lgsvl.dreamview.dreamview:enable_apollo:283 - Starting Localization module...
2024-03-16 05:13:52.099 | INFO     | lgsvl.dreamview.dreamview:enable_apollo:283 - Starting Transform module...
2024-03-16 05:13:52.099 | INFO     | lgsvl.dreamview.dreamview:enable_apollo:283 - Starting Routing module...
2024-03-16 05:13:52.099 | INFO     | lgsvl.dreamview.dreamview:enable_apollo:283 - Starting Prediction module...
2024-03-16 05:13:52.100 | INFO     | lgsvl.dreamview.dreamview:enable_apollo:283 - Starting Planning module...
2024-03-16 05:13:52.100 | INFO     | lgsvl.dreamview.dreamview:enable_apollo:283 - Starting Control module...
2024-03-16 05:13:52.100 | INFO     | lgsvl.dreamview.dreamview:enable_apollo:283 - Starting Storytelling module...
2024-03-16 05:13:52.152 | INFO     | lgsvl.dreamview.dreamview:on_control_received:332 - Control message received
2024-03-16 05:13:54.193 | INFO     | common.simulator:run:316 - [Simulator] Set Apollo (EGO) destination: -435.52519353421843,410.3606234656769
nohup: appending output to 'nohup.out'
2024-03-16 05:13:57.365 | INFO     | lgsvl.simulator:run_custom:114 - [PythonAPI] simulator.run_custom
2024-03-16 05:13:57.365 | INFO     | lgsvl.remote:command_run:118 - [PythonAPI] Start Running
2024-03-16 05:14:28.533 | INFO     | common.simulator:run:457 - simulation finished, total frames: 301
2024-03-16 05:14:33.210 | INFO     | common.simulator:run:477 - [Simulator] Restart all simulator modules in case high delays.

Have you seen this issue before, or is this likely an SVL issue caused by not using the official cloud?

@YuqiHuai
Contributor Author

YuqiHuai commented Mar 16, 2024

I terminated the process just after submitting this issue and noticed the following traceback:

  File "/apollo/./bazel-bin/BehAVExplor/main.runfiles/apollo/BehAVExplor/main.py", line 150, in <module>
    fuzzer.loop(int(params['total_test_time'])) # seconds
  File "/apollo/./bazel-bin/BehAVExplor/main.runfiles/apollo/BehAVExplor/main.py", line 119, in loop
    scenario_recorder, scenario_id = runner.run(scenario_obj)
  File "/apollo/BehAVExplor/common/runner.py", line 52, in run
    sim_recorder = self.sim.run(scenario_obj, scenario_id, self.record_apollo_path)
  File "/apollo/BehAVExplor/common/simulator.py", line 478, in run
    utils.close_modules(dv, self.modules)
  File "/apollo/BehAVExplor/common/utils.py", line 13, in close_modules
    module_status = dv.get_module_status()
  File "/home/yuqi/.local/lib/python3.6/site-packages/lgsvl/dreamview/dreamview.py", line 221, in get_module_status
    self.ws.recv()

So the actual problem is that Dreamview's websocket is broken. I have seen this before when communicating with Dreamview frequently over the socket. Does BehAVExplor expect me to restart Dreamview manually when this problem happens?

@MingfeiCheng
Owner

Hi Yuqi,

Sorry, I don't recall encountering a similar situation before, so I am not sure how to solve this issue effectively. Restarting may be a good solution. Thanks.
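A minimal sketch of the restart approach, assuming the failure always shows up as a call that never returns (as with `self.ws.recv()` inside `get_module_status` in the traceback above): run the blocking call in a worker thread, stop waiting after a timeout, and hand control to a restart hook. `call_with_timeout` and the `on_timeout` callback are hypothetical names, not part of BehAVExplor or the lgsvl client, and how Dreamview is actually restarted is left to that callback.

```python
import threading
from typing import Any, Callable


def call_with_timeout(fn: Callable[[], Any], timeout: float,
                      on_timeout: Callable[[], None]) -> Any:
    """Run fn() in a daemon thread and wait at most `timeout` seconds.

    If fn() has not returned by then, on_timeout() is called (e.g. a
    user-supplied Dreamview restart) and TimeoutError is raised. A thread
    stuck in a blocking socket read cannot be killed, so it is a daemon
    thread and will not keep the process alive at exit.
    """
    outcome = {}

    def worker() -> None:
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # re-raised in the caller's thread below
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        on_timeout()
        raise TimeoutError(f"call did not return within {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

For example, `call_with_timeout(dv.get_module_status, 30.0, restart_dreamview)`, where `restart_dreamview` is a hypothetical function you would supply. After a timeout the old `dv` connection should be treated as unusable and recreated, since its worker thread may still be blocked on it.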

@YuqiHuai
Contributor Author

Hi Mingfei,

I ran BehAVExplor again, and this time it seemed to run smoothly. However, after a few hours, I can no longer access Dreamview at localhost:8888, while BehAVExplor is still running.

See screenshot: [Screenshot from 2024-03-18 10-06-24]

I reported this to Apollo before, but I could not provide enough information for them to debug the issue.
ApolloAuto/apollo#13134 (comment)

Dreamview's log suggests that its backend is still working:

I0318 10:03:33.893414 4047347 simulation_world_updater.cc:656] Constructed RoutingRequest to be sent:
waypoint {
  id: "lane_477"
  s: 18.651878762971553
  pose {
    x: 593241.49687995552
    y: 4135030.957659022
  }
}
waypoint {
  id: "lane_570"
  s: 59.999857800182347
  pose {
    x: 593130.360748291
    y: 4134914.525177
  }
}
W0318 10:03:36.611380 4047413 rate.cc:96] Detect forward jumps in time
I0318 10:03:38.099026 4047342 simulation_world_service.h:240] Has not received any data from /apollo/audio_detection
W0318 10:03:40.798460 4047413 rate.cc:96] Detect forward jumps in time

So this is likely an issue with Apollo's Dreamview, and I'll close this issue.
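Because the backend log kept advancing while the page at localhost:8888 stopped loading, a fuzzing loop could probe the frontend between scenarios and stop or restart early instead of running on blindly. A minimal sketch, assuming Dreamview serves HTTP on localhost:8888 as in this setup; `dreamview_reachable` is a hypothetical helper, not part of BehAVExplor or the lgsvl client.

```python
import http.client
import urllib.error
import urllib.request


def dreamview_reachable(url: str = "http://localhost:8888", timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` gets a non-error response within `timeout` seconds.

    This only probes the web frontend; it says nothing about the websocket
    connection or about the Apollo modules behind it.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, socket timeout, HTTP 4xx/5xx (HTTPError), or a malformed reply.
        return False
```

A server that accepts the TCP connection but never answers makes the read time out, so the probe returns `False` rather than hanging indefinitely.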

@looles

looles commented Sep 3, 2024

@YuqiHuai Hi, I'm having a similar problem. I get the following error when I run it for the first time. Can you help me out? [screenshot]

@YuqiHuai
Contributor Author

YuqiHuai commented Sep 3, 2024

When you first run it? Did you compile Apollo first? It looks like you have not compiled it, or you compiled it as root rather than as a regular user (inside Docker).
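To test the root-versus-regular-user hypothesis, one can compare the owner of the build output with the user running BehAVExplor inside the container. A minimal sketch using only the Python standard library (Unix only); the helper names are hypothetical, and `/apollo/bazel-bin` is simply the build directory that appears in the traceback earlier in this thread.

```python
import os
import pwd


def owner_name(path: str) -> str:
    """Return the user name owning `path` (following symlinks), or its numeric uid."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # uid has no /etc/passwd entry, which can happen in containers
        return str(uid)


def owned_by_current_user(path: str) -> bool:
    """True if `path` (following symlinks) is owned by the user running this process."""
    return os.stat(path).st_uid == os.getuid()
```

For example, if `owner_name("/apollo/bazel-bin")` returns `root` while `owned_by_current_user("/apollo/bazel-bin")` is `False` for the user launching BehAVExplor, that would be consistent with the build having been done as root.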

@looles

looles commented Sep 3, 2024

  1. I am running it for the first time.
  2. Apollo is already compiled.
  3. I retried with a regular user, and it still doesn't work.
    [four screenshots]
    Based on the screenshots above, can you tell which step I am getting wrong?

@YuqiHuai
Contributor Author

YuqiHuai commented Sep 3, 2024

Nope, I cannot determine the issue yet. When you enter the container, can you run ‘cyber_recorder’?
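The same check can be scripted. A minimal sketch, assuming only that `cyber_recorder` should be on the `PATH` of the shell used inside the container; `locate_tool` and `tool_launches` are hypothetical helpers, and the exit code is deliberately ignored because this only tests whether the binary can be started.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def locate_tool(name: str) -> Optional[str]:
    """Return the absolute path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


def tool_launches(name: str, args: Sequence[str] = (), timeout: float = 10.0) -> bool:
    """True if `name` is found, starts, and exits within `timeout` seconds.

    The exit code is ignored: this only checks that the binary can be executed.
    """
    path = locate_tool(name)
    if path is None:
        return False
    try:
        subprocess.run([path, *args], stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout)
    except (OSError, subprocess.SubprocessError):  # not executable, missing libs, or timed out
        return False
    return True
```

Inside the container, `locate_tool("cyber_recorder")` returning `None` would mean the current shell cannot find the binary at all, which is a different problem from the binary failing once it starts.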

@looles

looles commented Sep 3, 2024

Hi, I have installed "cyber_recorder" as per the tutorial, but it still doesn't work.

