
Pixie doesn’t work on RKE #450

Open
aslafy-z opened this issue Jun 2, 2022 · 3 comments
Labels
area/datacollector Issues related to Stirling (datacollector) kind/compatibility Compatibility related to either K8s, architecture, library versions priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@aslafy-z

aslafy-z commented Jun 2, 2022

Describe the bug

Some tables are not found: http_events, dns_events, conn_stats.

Vizier PEM logs the following:

pem cannot attach kprobe, probe entry may not exist
Source Connector (registry name=socket_tracer) not instantiated, error: Internal : Unable to attach kprobe for connect using syscall__probe_entry_connect
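The first log line says the probe entry point may not exist on this kernel. As a hedged diagnostic sketch (not part of Pixie), the following Python scans /proc/kallsyms for the symbols a kprobe on connect() would typically target. The candidate names are an assumption based on how BCC-style tools resolve syscall entry points (for example __x64_sys_connect on x86_64 kernels 4.17 and later, sys_connect on older ones); find_connect_symbols and check_host are hypothetical helpers.

```python
from typing import Iterable, List

# Assumed candidate entry points for connect(); the exact name depends on the
# kernel version and architecture (BCC-style tools try several prefixes).
CANDIDATES = ("__x64_sys_connect", "__arm64_sys_connect", "sys_connect")


def find_connect_symbols(kallsyms_lines: Iterable[str]) -> List[str]:
    """Return the candidate connect() symbols found in /proc/kallsyms-style lines."""
    found: List[str] = []
    for line in kallsyms_lines:
        # Each line has the form "<address> <type> <name> [module]".
        parts = line.split()
        if len(parts) >= 3 and parts[2] in CANDIDATES and parts[2] not in found:
            found.append(parts[2])
    return found


def check_host(path: str = "/proc/kallsyms") -> List[str]:
    """Scan the running kernel's symbol table (Linux only)."""
    with open(path) as f:
        return find_connect_symbols(f)
```

On the affected node, print(check_host()) lists which candidates exist. An empty list would be consistent with the "probe entry may not exist" message; a non-empty list suggests looking elsewhere, such as the environment PEM runs in.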

To Reproduce
Steps to reproduce the behavior:

  1. Apply these manifests (generated with the px deploy --extract_yaml command, plus patches that change the image registry): pixie_yamls_public.zip
  2. Wait for the vizier-pem pods to spawn in the px namespace.
  3. Check their logs: they contain the errors above.
  4. Open the UI and check for metrics: the tables are empty.

Expected behavior

Vizier PEM attaches the probes and metrics fill the tables.


Logs
Output of px collect-logs:

pixie_logs_20220530115134.zip

Output of px.display(px.DataFrame("stirling_error")):

pixie-errors.log

App information:

  • Pixie images:
- repository: operator-framework/olm
  host: quay.io
  digest: sha256:b706ee6583c4c3cf8059d44234c8a4505804adcc742bcddb3d1e2f6eff3d6519
- repository: pixie-oss/pixie-prod/operator/bundle_index
  host: gcr.io
  tag: 0.0.1
- repository: operator-framework/configmap-operator-registry
  host: quay.io
  tag: latest
- repository: coreos/etcd
  host: quay.io
  tag: v3.4.3
- repository: nats
  host: docker.io
  tag: 2.4.0-alpine3.14
- repository: pixie-oss/pixie-dev-public/curl
  host: gcr.io
  tag: 1.0
- repository: pixie-oss/pixie-prod/vizier/pem_image
  host: gcr.io
  tag: 0.11.1
- repository: pixie-oss/pixie-prod/vizier/query_broker_server_image
  host: gcr.io
  tag: 0.11.1
- repository: pixie-oss/pixie-prod/vizier/metadata_server_image
  host: gcr.io
  tag: 0.11.1
- repository: pixie-oss/pixie-prod/vizier/cloud_connector_server_image
  host: gcr.io
  tag: 0.11.1
- repository: pixie-oss/pixie-prod/vizier/kelvin_image
  host: gcr.io
  tag: 0.11.1
  • K8s cluster version: RKE v1.20.9
  • Node Kernel version: 5.13.0-1017-azure

Additional context
This issue was submitted to slack first: https://pixie-community.slack.com/archives/CQ63KEVFY/p1653649104892539

@aimichelle changed the title from 'vizier-pem pods fail with "pem cannot attach kprobe"' to 'Pixie doesn’t work on RKE' on Jun 8, 2022
@yzhao1012
Contributor

Sorry for the delay. I followed the Rancher quickstart on GCP and managed to install Pixie on one of the created Kubernetes clusters.

Afterwards, I was able to see the HTTP traffic for pod vizier-operator-77f959df76-k7blz, as shown in the attached screenshots: the service graph shows the pod sending and receiving HTTP requests and responses, and the other screenshot shows that the pod was created by Rancher.

One thing I am not clear on yet: is this the same RKE you originally mentioned? Searching for "RKE on GCP" returns the instructions I followed (Rancher quickstart on GCP).
Also, we do not usually test on AWS.

Screen Shot 2022-06-13 at 1 52 41 PM

Screen Shot 2022-06-13 at 1 52 55 PM

@aslafy-z
Author

https://rancher.com/docs/rancher/v2.6/en/quick-start-guide/deployment/google-gcp-qs/ is for deploying Rancher Server (not RKE) on a K3s cluster. That Rancher Server can then be used to provision an RKE cluster by starting an agent on Linux nodes (guide: https://rancher.com/docs/rancher/v2.6/en/cluster-provisioning/rke-clusters/custom-nodes/, node requirements: https://rancher.com/docs/rancher/v2.6/en/cluster-provisioning/node-requirements/).

Also, I'm using a hardened Ubuntu image; I'll try with a vanilla Ubuntu 20.04. Thank you for your time @yzhao1012

@zasgar added the triage/needs-information, area/datacollector, priority/awaiting-more-evidence, and kind/compatibility labels (kind/feature was added and then removed) on Jun 15, 2022
@yzhao1012
Contributor

Docker (or any container-based Kubernetes desktop environment) is not compatible with Pixie (the PEM specifically).

The reason is that the PEM needs access to the host kernel's system filesystem to compile BPF code.
A container-based Kubernetes desktop environment hides the host filesystem from the PEM, which breaks it.
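To check this precondition from a given vantage point (the node itself, or a pod with the host filesystem mounted), a sketch like the following reports whether two paths that BCC-style BPF compilation commonly relies on are visible: kernel headers under /lib/modules/<release>/build and the tracing filesystem. These paths, and the root parameter for a hypothetical host mount point, are assumptions rather than Pixie's documented requirements; check_bpf_host_paths is a hypothetical helper.

```python
import os
import platform
from typing import Dict, Optional


def check_bpf_host_paths(root: str = "/", release: Optional[str] = None) -> Dict[str, bool]:
    """Report whether kernel headers and tracefs are visible under `root`."""
    # Release string of the running kernel (uname -r); containers share it.
    release = release or platform.release()
    # Headers for the running kernel (often a symlink; isdir follows it).
    headers = os.path.join(root, "lib", "modules", release, "build")
    # tracefs may be mounted at either location depending on the kernel/setup.
    tracefs_dirs = [
        os.path.join(root, "sys", "kernel", "tracing"),
        os.path.join(root, "sys", "kernel", "debug", "tracing"),
    ]
    return {
        "kernel_headers": os.path.isdir(headers),
        "tracefs": any(os.path.isdir(p) for p in tracefs_dirs),
    }
```

For example, check_bpf_host_paths() checks the current filesystem, while check_bpf_host_paths(root="/host") would check a host mount at the hypothetical path /host. A False inside the PEM's environment but True on the node itself would be consistent with the host filesystem being hidden.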

I can probably find time later to experiment with RKE following #450 (comment). However, if RKE's host runtime is a container-based environment (rather than something equivalent to a normal VM), the problem described above will cause this failure.

The workaround is to use a VM or a normal host-based container runtime.
