This repository was archived by the owner on Aug 5, 2022. It is now read-only.

Pod-manager cannot detect PSME #58

@rwleea

Dear Intel,

I found an issue: pod-manager cannot detect a PSME after the PSME has been restarted.

Steps to reproduce:

  1. Kill the PSME RESTful server and the compute/chassis agents
  2. Start the PSME RESTful server and the compute/chassis agents

After performing these steps, I expected to see this PSME in external_service, but it was not there, so I checked the log:

Log:

Thread-1 (Discovery job)
2018-04-07 23:47:22,404 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-1] INFO c.i.p.d.external.DiscoveryRunner - Polling data from ExternalService {UUID=52dc9db9-784a-4857-9c7c-3fab259fb13f, baseUri=https://10.3.0.113:8407/redfish/v1, type=PSME} started
2018-04-07 23:47:22,821 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-1] WARN c.i.p.d.external.DiscoveryRunner - Connection error while getting data from ExternalService {UUID=52dc9db9-784a-4857-9c7c-3fab259fb13f, baseUri=https://10.3.0.113:8407/redfish/v1, type=PSME} service - performing check on this service


Thread-8 (Service removal job)

2018-04-07 23:48:01,162 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-8] INFO c.i.p.d.external.ServiceRemovalTask[ServiceLifecycle] - ExternalService {UUID=52dc9db9-784a-4857-9c7c-3fab259fb13f, baseUri=https://10.3.0.113:8407/redfish/v1, type=PSME, unreachableSince=2018-04-07T23:47:23.235} is unreachable longer than PT0S - will be evicted.


Thread-3 (Service detect job)

2018-04-07 23:48:02,077 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-3] DEBUG c.i.p.d.e.ServiceDetectionListenerImpl - Service Detect ..Schedule deep disvocery for 52dc9db9-784a-4857-9c7c-3fab259fb13f
2018-04-07 23:48:02,078 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-3] DEBUG c.i.p.c.s.SerialExecutorImpl - Registering new async task for 52dc9db9-784a-4857-9c7c-3fab259fb13f
2018-04-07 23:48:02,198 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-3] INFO c.i.p.d.e.event.EventSubscribeRunner - Successfully subscribed to event service for '52dc9db9-784a-4857-9c7c-3fab259fb13f'
2018-04-07 23:48:02,200 [ForkJoinPool.commonPool-worker-1] DEBUG c.i.p.c.s.SerialExecutorImpl - Requesting execution of next ASYNC operation(DiscoveryRunner(52dc9db9-784a-4857-9c7c-3fab259fb13f)) for (52dc9db9-784a-4857-9c7c-3fab259fb13f)


Thread-7 (Discovery Job)

2018-04-07 23:48:02,216 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-7] INFO c.i.p.d.external.DiscoveryRunner - Polling data from ExternalService {UUID=52dc9db9-784a-4857-9c7c-3fab259fb13f, baseUri=https://10.3.0.113:8407/redfish/v1, type=PSME, unreachableSince=2018-04-07T23:47:23.235} started
2018-04-07 23:48:02,317 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-7] DEBUG c.i.p.c.e.u.l.MethodInvocationLoggingInterceptor - [Discovery] RestGraphBuilderImpl.build - started
2018-04-07 23:48:02,378 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-7] DEBUG c.i.p.c.e.u.l.MethodInvocationLoggingInterceptor - [Discovery] RestGraphBuilderImpl.build - ended, execution time: 60.98 ms
2018-04-07 23:48:05,381 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-7] DEBUG c.i.p.c.e.u.l.MethodInvocationLoggingInterceptor - [Discovery] EntityGraphMapper.map - started
2018-04-07 23:48:05,396 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-7] ERROR c.i.p.d.external.DiscoveryRunner - Error while polling data from ExternalService {UUID=52dc9db9-784a-4857-9c7c-3fab259fb13f, baseUri=https://10.3.0.113:8407/redfish/v1, type=PSME, unreachableSince=2018-04-07T23:47:23.235}
javax.ejb.EJBTransactionRolledbackException: there is no service with UUID '52dc9db9-784a-4857-9c7c-3fab259fb13f'


I think the sequence is as follows:

  1. Thread 1: pod-manager cannot connect to the PSME (because I killed it).

  2. Thread 8: the service removal task prepares to remove the PSME from the external_service table.

  3. Thread 3: service detection fires (because I started the PSME again).
    *** pod-manager picks up the old external_service record that Thread 8 is about to delete.
    (You can see this in the following log line:
    2018-04-07 23:48:02,216 [EE-ManagedScheduledExecutorService-TasksExecutor-Thread-7] INFO c.i.p.d.external.DiscoveryRunner - Polling data from ExternalService {UUID=52dc9db9-784a-4857-9c7c-3fab259fb13f, baseUri=https://10.3.0.113:8407/redfish/v1, type=PSME, unreachableSince=2018-04-07T23:47:23.235} started
    Why would a newly detected service have unreachableSince set?)

  4. Thread 7: pod-manager starts polling the PSME.
    …. (Thread 8) removes the PSME from the external_service table.
    pod-manager throws the exception: there is no service with UUID '52dc9db9-784a-4857-9c7c-3fab259fb13f'
    --> because the row in external_service has already been removed by Thread 8.
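
To make the suspected interleaving easier to follow, here is a minimal single-threaded sketch of it. It is written in Python only for brevity and is not pod-manager's actual code: `ExternalService`, `ExternalServiceTable`, and `replay_race` are hypothetical names I made up, and the four threads are replayed as explicit steps so the outcome is deterministic.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExternalService:
    uuid: str
    base_uri: str
    unreachable_since: Optional[str] = None


class ExternalServiceTable:
    """Hypothetical stand-in for the external_service table."""

    def __init__(self) -> None:
        self.rows: Dict[str, ExternalService] = {}

    def find(self, uuid: str) -> Optional[ExternalService]:
        return self.rows.get(uuid)

    def delete(self, uuid: str) -> None:
        self.rows.pop(uuid, None)

    def update(self, service: ExternalService) -> None:
        # Updating a row that no longer exists fails, like the rolled-back
        # transaction in the log.
        if service.uuid not in self.rows:
            raise LookupError(f"there is no service with UUID '{service.uuid}'")
        self.rows[service.uuid] = service


def replay_race() -> str:
    uuid = "52dc9db9-784a-4857-9c7c-3fab259fb13f"
    table = ExternalServiceTable()
    table.rows[uuid] = ExternalService(uuid, "https://10.3.0.113:8407/redfish/v1")

    # Step 1 (Thread 1): polling fails, the row is marked unreachable.
    table.rows[uuid].unreachable_since = "2018-04-07T23:47:23.235"

    # Step 2 (Thread 8): the removal task decides to evict, but has not deleted yet.
    evict_pending = table.find(uuid).unreachable_since is not None

    # Step 3 (Thread 3): detection finds the old row (still carrying
    # unreachable_since) and schedules discovery for it.
    stale = table.find(uuid)

    # Step 4a (Thread 8): the row is now actually deleted.
    if evict_pending:
        table.delete(uuid)

    # Step 4b (Thread 7): discovery tries to persist results for the stale row.
    try:
        table.update(stale)
        return "persisted"
    except LookupError as err:
        return str(err)


print(replay_race())
```

In this model the final update fails with the same message as in the log, because the row that detection reused was deleted between steps 3 and 4b.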

Here is my assumption about why pod-manager cannot persist the newly detected PSME to its database, even though the PSME keeps running:

  1. The pod-manager service detection thread reads the old external_service data, which is then deleted by the service removal thread. As a result, service detection fails and the PSME is never added to the external_service table.
  2. The PSME is running and keeps sending alive messages. Even though the PSME is not in the external_service table, it is still in the SSDP registry cache.
    When pod-manager receives an M-SEARCH response or a notification from this PSME, it treats the PSME as a known service (because it is already in the SSDP registry cache) and never fires the service detection task again.
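
The second assumption can be sketched in the same way. Again, this is a hypothetical Python model and not the real implementation: `SsdpRegistry`, `on_alive`, and `simulate` are names I invented, and I am assuming that detection fires only the first time a UUID is seen by the cache.

```python
class SsdpRegistry:
    """Hypothetical model of the SSDP registry cache."""

    def __init__(self) -> None:
        self.known = set()   # UUIDs already seen via SSDP
        self.detected = []   # UUIDs for which a detect task was fired

    def on_alive(self, uuid: str) -> bool:
        """Handle an alive notification or M-SEARCH response.

        Returns True if a service detection task was fired.
        """
        if uuid in self.known:
            # Treated as a known service: no new detection task.
            return False
        self.known.add(uuid)
        self.detected.append(uuid)
        return True


def simulate():
    uuid = "52dc9db9-784a-4857-9c7c-3fab259fb13f"
    registry = SsdpRegistry()
    external_service = set()  # UUIDs persisted in the external_service table

    # The first message after the restart fires detection once.
    fired_first = registry.on_alive(uuid)

    # Assume that detection fails (as in the log), so nothing is persisted
    # and external_service stays empty.

    # Later alive messages hit the cache and never fire detection again.
    fired_later = [registry.on_alive(uuid) for _ in range(3)]

    return fired_first, fired_later, uuid in external_service


print(simulate())
```

If the cache really behaves like this, the PSME stays "known" to SSDP while missing from external_service, which matches what I observe.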

Could you please help to check this issue?
