Merged
19 commits
a2109e0
Merge pull request #1 from microsoft/main
Tatsuya-hasegawa Mar 17, 2023
170bd0e
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Mar 27, 2023
40a3520
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Apr 18, 2023
0dfb9bd
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Apr 18, 2023
ef63223
Merge pull request #2 from microsoft/main
Tatsuya-hasegawa Jul 27, 2023
cb4986d
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Aug 29, 2023
9f32942
Merge branch 'main' of https://github.com/Tatsuya-hasegawa/msticpy
Tatsuya-hasegawa Oct 6, 2023
2b6a353
Merge branch 'main' of https://github.com/Tatsuya-hasegawa/msticpy
Tatsuya-hasegawa Mar 29, 2024
51384b2
Merge branch 'main' of https://github.com/Tatsuya-hasegawa/msticpy
Tatsuya-hasegawa May 10, 2024
1064652
add_post_data_styles_to_splunk_uploader
Tatsuya-hasegawa May 10, 2024
79da7be
fix the argument name and add tests for them (no verify)
Tatsuya-hasegawa May 21, 2024
3287e43
Merge branch 'main' into mod_splunk_uploader_post
ianhelle May 24, 2024
a812dbf
Merge branch 'main' into mod_splunk_uploader_post
ianhelle May 28, 2024
c0edebf
fix the new argument position
Tatsuya-hasegawa May 31, 2024
c40ed2e
fix the new argument position and slightly modified index_name param …
Tatsuya-hasegawa May 31, 2024
ea49ab4
Merge branch 'mod_splunk_uploader_post' of https://github.com/Tatsuya…
Tatsuya-hasegawa May 31, 2024
70d0413
Merge branch 'main' into mod_splunk_uploader_post
ianhelle Jun 25, 2024
d9745b8
Merge branch 'main' into mod_splunk_uploader_post
ianhelle Jul 3, 2024
5439943
Merge branch 'main' into mod_splunk_uploader_post
ianhelle Jul 3, 2024
54 changes: 37 additions & 17 deletions docs/source/data_acquisition/UploadData.rst
@@ -81,7 +81,8 @@ Instantiating the Splunk uploader
The first step in uploading data is to instantiate an uploader for the location we wish to upload data to.
For Splunk there are three parameters that need to be passed at this stage: the Splunk host name, a username,
and a password. You can also pass a ``port`` parameter; by default this value is 8089.
In addition, a security auth token can be passed via ``bearer_token``
instead of a username and password, as with the Splunk QueryProvider.

.. code:: ipython3

@@ -97,32 +98,43 @@ On the other hand, You can use the stored credentials in msticpyconfig.yaml to S
from msticpy.data.uploaders.splunk_uploader import SplunkUploader
spup = SplunkUploader()

*Note: Due to the way the Splunk APIs work, the time taken to upload a file to
Splunk can be significantly longer than with Log Analytics.*

Uploading a DataFrame to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a Pandas DataFrame to Splunk you simply pass the DataFrame to ``.upload_df()`` along with the name of a table,
and index you wish the data to be uploaded to. If the index provided does not exist and you want it to be created,
To upload a Pandas DataFrame to Splunk, pass the DataFrame to ``.upload_df()`` along with the index you wish the data to be uploaded to.
The ``source_type`` parameter accepts ``csv``, ``json``, or other values; each row is rendered with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively, and **json** is the default.
The ``table_name`` parameter remains for backward compatibility.
If the index provided does not exist and you want it to be created,
you can pass the parameter ``create_index = True``.

.. Note – table name for Splunk refers to sourcetype.
.. note:: The table name for Splunk refers to the source type.

.. code:: ipython3

spup.upload_df(data=DATAFRAME, table_name=TABLE_NAME, index_name=INDEX_NAME)
spup.upload_df(data=DATAFRAME, index_name=INDEX_NAME)

During upload, a progress bar is shown indicating the progress of the upload.
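The per-row serialization behavior described above can be sketched independently of Splunk (a simplified illustration; ``serialize_row`` is a hypothetical helper written for this example, not part of msticpy):

```python
import pandas as pd

def serialize_row(row: pd.Series, source_type: str) -> str:
    # Mirror the uploader's per-row dispatch: json and csv get
    # dedicated serializers, anything else falls back to to_string().
    if source_type == "json":
        return row.to_json()
    if source_type == "csv":
        return row.to_csv()
    return row.to_string()

df = pd.DataFrame({"host": ["web01"], "status": [200]})
row = df.iloc[0]
print(serialize_row(row, "json"))  # {"host":"web01","status":200}
```

Each serialized row is what gets submitted to the Splunk index as a single event, which is why the choice of ``source_type`` changes the event format rather than just a label.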

Uploading a File to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a file to Splunk pass the path to the file to ``.upload_file()`` along with the name of the index you
want the data uploaded to. By default a comma separated value file is expected but if you have some other separator
value you can pass this with the ``delim`` parameter. You can specify a table name to upload the data to with that ``table_name``
parameter but by default the uploader will upload to a table with the same name as the file. As with uploading a DataFrame
if the index provided does not exist and you want it to be created, you can pass the parameter ``create_index = True``.
To upload a file to Splunk, pass the path to the file to ``.upload_file()`` along with the name of
the index you want the data uploaded to.
By default a comma-separated value file is expected, but if your file uses
some other separator you can pass it with the ``delim`` parameter.
You can specify the sourcetype to upload the data to with the ``source_type`` parameter;
``csv``, ``json``, or other values cause each row to be rendered with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively.
If ``source_type`` is not set, the legacy ``table_name`` parameter (retained only for
backward compatibility) is used instead; if neither is supplied, the uploader falls back
to a sourcetype named after the file.
As with uploading a DataFrame,
if the index provided does not exist and you want it to be created, you can pass
the parameter ``create_index = True``.
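The fallback order just described can be sketched as a small function (``resolve_source_type`` is a hypothetical helper for illustration, not part of the msticpy API):

```python
from pathlib import Path
from typing import Optional

def resolve_source_type(
    file_path: Path,
    source_type: Optional[str] = None,
    table_name: Optional[str] = None,
) -> str:
    # An explicit source_type wins; table_name is honored only for
    # backward compatibility; otherwise fall back to the file stem.
    if source_type:
        return source_type
    if table_name:
        return table_name
    return file_path.stem

print(resolve_source_type(Path("logs/syslog_data.csv")))  # syslog_data
```

Keeping the legacy parameter lowest-priority after ``source_type`` means existing callers that pass ``table_name`` keep working unchanged.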

.. code:: ipython3

@@ -131,16 +143,24 @@ if the index provided does not exist and you want it to be created, you can pass
Uploading a Folder to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also upload a whole folder of files. To do this simply pass the folder path to
``.upload_folder()`` along with the
name of the index you want the data uploaded to. By default this will upload all csv files in that folder to Splunk,
with each file being uploaded to a sourcetype with a name corresponding to the file name. Alternatively you can also
specify single a table sourcetype which all files will be uploaded with the ``table_name`` parameter. If you have some
specify a single sourcetype to which all files will be uploaded using the ``source_type`` parameter.
``csv``, ``json``, or other values for ``source_type`` cause each row to be rendered with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively.
If ``source_type`` is not set, the legacy ``table_name`` parameter (retained only for
backward compatibility) is used; if neither is supplied, each file name is used as the sourcetype.
If your files use some
other separator you can pass ``delim`` with the specified delimiter value; however, currently only
a single delimiter type is supported across files. By default this method attempts to upload all files in the specified
folders; if you want to process only certain file extensions you can pass the ``glob`` keyword parameter
with a pattern for files to attempt to upload.
The pattern format required follows the ``pathlib.glob()`` pattern - more details are
available `here <https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob>`_.
As with the other methods, if the index provided does not exist and you want it to be created,
you can pass the parameter ``create_index = True``.
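Because the ``glob`` keyword follows ``pathlib`` glob semantics, you can preview which files a pattern would pick up before uploading. A small self-contained sketch using a throwaway folder:

```python
import tempfile
from pathlib import Path

# Create a temporary folder with mixed file types.
folder = Path(tempfile.mkdtemp())
for name in ("auth.csv", "dns.csv", "notes.txt"):
    (folder / name).touch()

# "*.csv" uses pathlib.Path.glob() semantics - the same pattern
# style accepted by upload_folder(glob=...).
matched = sorted(p.name for p in folder.glob("*.csv"))
print(matched)  # ['auth.csv', 'dns.csv']
```

Running the same pattern through ``Path.glob()`` first is a cheap way to confirm the upload will touch only the files you expect.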

.. code:: ipython3

90 changes: 68 additions & 22 deletions msticpy/data/uploaders/splunk_uploader.py
@@ -72,7 +72,7 @@ def _post_data(
self,
data: pd.DataFrame,
index_name: str,
table_name: Any,
source_type: Any,
host: str = None,
**kwargs,
):
@@ -85,10 +85,12 @@ def _post_data(
Data to upload.
index_name : str
Name of the Splunk Index to add data to.
table_name : str
source_type : str
The Splunk sourcetype that the data will be uploaded to.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
host : str, optional
The hostname associated with the uploaded data, by default "Upload".
The hostname associated with the uploaded data, by default "msticpy_splunk_uploader".

"""
if not self.connected:
@@ -97,28 +99,36 @@ def _post_data(
title="Splunk host not connected",
)
if not host:
host = "Upload"
host = "msticpy_splunk_uploader"
index = self._load_index(index_name, kwargs.get("create_index", False))
progress = tqdm(total=len(data.index), desc="Rows", position=0)
source_types = []
for row in data.iterrows():
data = row[1].to_csv() # type: ignore
if source_type == "json":
data = row[1].to_json() # type: ignore
elif source_type == "csv":
data = row[1].to_csv() # type: ignore
else:
data = row[1].to_string() # type: ignore
try:
data.encode(encoding="latin-1") # type: ignore
except UnicodeEncodeError:
data = data.encode(encoding="utf-8") # type: ignore
index.submit(data, sourcetype=table_name, host=host)
index.submit(data, sourcetype=source_type, host=host)
source_types.append(source_type)
progress.update(1)
progress.close()
if self._debug is True:
print("Upload complete")
print(f"Upload complete: Splunk sourcetype = {source_types}")
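The latin-1/utf-8 handling in the loop above can be exercised in isolation: rows that fit latin-1 are submitted as ``str``, anything else as utf-8 ``bytes`` (a simplified sketch of the same logic; ``prepare_payload`` is a hypothetical name used only for this example):

```python
def prepare_payload(text: str):
    # Mirror the uploader's check: if the text is latin-1-encodable,
    # keep it as a str; otherwise submit utf-8 encoded bytes.
    try:
        text.encode(encoding="latin-1")
        return text
    except UnicodeEncodeError:
        return text.encode(encoding="utf-8")

print(type(prepare_payload("plain ascii")).__name__)  # str
print(type(prepare_payload("日本語ログ")).__name__)  # bytes
```

This matters because log data (for example Japanese syslog messages) routinely contains characters outside latin-1, and submitting them unencoded could fail.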

# pylint: disable=arguments-differ
def upload_df( # type: ignore
self,
data: pd.DataFrame,
table_name: Optional[str],
index_name: str,
table_name: Optional[str] = None,
index_name: Optional[str] = None,
create_index: bool = False,
source_type: Optional[str] = None,
**kwargs,
):
"""
@@ -128,8 +138,13 @@ def upload_df(  # type: ignore
----------
data : pd.DataFrame
Data to upload.
table_name : str
source_type : str, optional
The Splunk sourcetype that the data will be uploaded to.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
The default is "json".
table_name : str, optional
Deprecated alias for source_type, kept for backward compatibility.
index_name : str
Name of the Splunk Index to add data to.
host : str, optional
Expand All @@ -138,6 +153,12 @@ def upload_df( # type: ignore
Set this to true to create the index if it doesn't already exist. Default is False.

"""
if not source_type:
if not table_name:
source_type = "json"
else:
source_type = table_name

host = kwargs.get("host", None)
if not index_name:
raise ValueError("parameter `index_name` must be specified")
@@ -148,10 +169,10 @@ def upload_df(  # type: ignore
)
self._post_data(
data=data,
table_name=table_name,
index_name=index_name,
create_index=create_index,
host=host,
source_type=source_type,
create_index=create_index,
)

def upload_file( # type: ignore
@@ -161,6 +182,7 @@ def upload_file(  # type: ignore
delim: str = ",",
index_name: Optional[str] = None,
create_index: bool = False,
source_type: Optional[str] = None,
**kwargs,
):
"""
@@ -172,9 +194,14 @@
Path to the file to upload.
index_name : str
Name of the Splunk Index to add data to.
table_name : str, optional
source_type : str, optional
The Splunk sourcetype that the data will be uploaded to.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
If not set, table_name and then the file name are used as the sourcetype.
table_name : str, optional
Deprecated alias for source_type, kept for backward compatibility.
delim : str, optional
Separator value in file, by default ","
host : str, optional
@@ -195,13 +222,17 @@
"Incorrect file type.",
) from parse_err

if not table_name:
table_name = path.stem
if not source_type:
if table_name:
source_type = table_name
else:
source_type = path.stem

self._post_data(
data=data,
table_name=table_name,
index_name=index_name,
host=host,
source_type=source_type,
create_index=create_index,
)

@@ -212,6 +243,7 @@ def upload_folder(  # type: ignore
delim: str = ",",
index_name: Optional[str] = None,
create_index=False,
source_type: Optional[str] = None,
**kwargs,
):
"""
@@ -223,9 +255,13 @@
Path to folder to upload.
index_name : str
Name of the Splunk Index to add data to, if it doesn't exist it will be created.
table_name : str, optional
source_type : str, optional
The Splunk sourcetype that the data will be uploaded to.
If not set the file name will be used.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
If not set, table_name and then each file name are used as the sourcetype.
table_name : str, optional
Deprecated alias for source_type, kept for backward compatibility.
delim : str, optional
Separator value in files, by default ","
host : str, optional
@@ -248,18 +284,28 @@
"The file specified is not a separated value file.",
title="Incorrect file type.",
) from parse_err
if not table_name:
table_name = path.stem

if not source_type:
if table_name:
source_type = table_name
else:
source_type = path.stem

self._post_data(
data=data,
table_name=table_name,
index_name=index_name,
host=host,
source_type=source_type,
create_index=create_index,
)
f_progress.update(1)
if self._debug is True:
print(f"{str(path)} uploaded to {table_name}")
print(
    f"{str(path)} uploaded to "
    f"index={index_name} "
    f"source_type={source_type} "
    f"host={host}"
)
f_progress.close()

# pylint: enable=arguments-differ
48 changes: 48 additions & 0 deletions tests/data/uploaders/test_splunk_uploader.py
@@ -56,18 +56,39 @@ def test_df_upload(sp_upload):
sp_upload.upload_df(data, index_name="test_upload", table_name="test_upload")


def test_df_upload_sourcetype(sp_upload):
"""Test DataFrame upload."""
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
data = pd.read_csv(data_file, parse_dates=["TimeGenerated"])
sp_upload.upload_df(data, index_name="test_upload", source_type="test_upload")


def test_df_failure(sp_upload):
"""Test DataFrame upload failure."""
with pytest.raises(MsticpyUserError):
sp_upload.upload_df("123", index_name="test_upload", table_name="test_upload")


def test_df_failure_sourcetype(sp_upload):
"""Test DataFrame upload failure."""
with pytest.raises(MsticpyUserError):
sp_upload.upload_df("123", index_name="test_upload", source_type="test_upload")


def test_file_upload(sp_upload):
"""Test file upload."""
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
sp_upload.upload_file(data_file, index_name="test_upload", table_name="test_upload")


def test_file_upload_sourcetype(sp_upload):
"""Test file upload."""
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
sp_upload.upload_file(
data_file, index_name="test_upload", source_type="test_upload"
)


def test_file_failure(sp_upload):
"""Test file upload failure."""
data_file = Path(_TEST_DATA).joinpath("win_proc_test.pkl")
@@ -77,6 +98,15 @@ def test_file_failure(sp_upload):
)


def test_file_failure_sourcetype(sp_upload):
"""Test file upload failure."""
data_file = Path(_TEST_DATA).joinpath("win_proc_test.pkl")
with pytest.raises(MsticpyUserError):
sp_upload.upload_file(
data_file, index_name="test_upload", source_type="test_upload"
)


def test_folder_upload(sp_upload):
"""Test folder upload."""
data_folder = Path(_TEST_DATA).joinpath("uploader")
@@ -85,6 +115,14 @@
)


def test_folder_upload_sourcetype(sp_upload):
"""Test folder upload."""
data_folder = Path(_TEST_DATA).joinpath("uploader")
sp_upload.upload_folder(
data_folder, index_name="test_upload", source_type="test_upload"
)


def test_folder_upload_no_name(sp_upload):
"""Test folder upload with no table name specified."""
data_folder = Path(_TEST_DATA).joinpath("uploader")
@@ -99,3 +137,13 @@ def test_not_connected(sp_upload):
sp_upload.upload_file(
data_file, index_name="test_upload", table_name="test_upload"
)


def test_not_connected_sourcetype(sp_upload):
"""Test no connection is handled correctly."""
sp_upload.connected = False
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
with pytest.raises(MsticpyConnectionError):
sp_upload.upload_file(
data_file, index_name="test_upload", source_type="test_upload"
)