Merged
19 commits
a2109e0
Merge pull request #1 from microsoft/main
Tatsuya-hasegawa Mar 17, 2023
170bd0e
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Mar 27, 2023
40a3520
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Apr 18, 2023
0dfb9bd
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Apr 18, 2023
ef63223
Merge pull request #2 from microsoft/main
Tatsuya-hasegawa Jul 27, 2023
cb4986d
Merge branch 'microsoft:main' into main
Tatsuya-hasegawa Aug 29, 2023
9f32942
Merge branch 'main' of https://github.com/Tatsuya-hasegawa/msticpy
Tatsuya-hasegawa Oct 6, 2023
2b6a353
Merge branch 'main' of https://github.com/Tatsuya-hasegawa/msticpy
Tatsuya-hasegawa Mar 29, 2024
51384b2
Merge branch 'main' of https://github.com/Tatsuya-hasegawa/msticpy
Tatsuya-hasegawa May 10, 2024
1064652
add_post_data_styles_to_splunk_uploader
Tatsuya-hasegawa May 10, 2024
79da7be
fix the argument name and add tests for them (no verify)
Tatsuya-hasegawa May 21, 2024
3287e43
Merge branch 'main' into mod_splunk_uploader_post
ianhelle May 24, 2024
a812dbf
Merge branch 'main' into mod_splunk_uploader_post
ianhelle May 28, 2024
c0edebf
fix the new argument position
Tatsuya-hasegawa May 31, 2024
c40ed2e
fix the new argument position and slightly modified index_name param …
Tatsuya-hasegawa May 31, 2024
ea49ab4
Merge branch 'mod_splunk_uploader_post' of https://github.com/Tatsuya…
Tatsuya-hasegawa May 31, 2024
70d0413
Merge branch 'main' into mod_splunk_uploader_post
ianhelle Jun 25, 2024
d9745b8
Merge branch 'main' into mod_splunk_uploader_post
ianhelle Jul 3, 2024
5439943
Merge branch 'main' into mod_splunk_uploader_post
ianhelle Jul 3, 2024
54 changes: 37 additions & 17 deletions docs/source/data_acquisition/UploadData.rst
@@ -81,7 +81,8 @@ Instantiating the Splunk uploader
The first step in uploading data is to instantiate an uploader for the location we wish to upload data to.
For Splunk there are three parameters that need to be passed at this stage: the Splunk host name, a username,
and a password. You can also pass a ``port`` parameter; by default this value is 8089.
In addition, a security auth token can be passed via ``bearer_token``
instead of a username and password, as with the Splunk QueryProvider.

.. code:: ipython3

@@ -97,32 +98,43 @@ On the other hand, You can use the stored credentials in msticpyconfig.yaml to S
from msticpy.data.uploaders.splunk_uploader import SplunkUploader
spup = SplunkUploader()

*Note: Due to the way the Splunk APIs work, the time taken to upload a file to
Splunk can be significantly longer than with Log Analytics.*

Uploading a DataFrame to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a Pandas DataFrame to Splunk you simply pass the DataFrame to ``.upload_df()`` along with the name of a table,
and index you wish the data to be uploaded to. If the index provided does not exist and you want it to be created,
To upload a Pandas DataFrame to Splunk, pass the DataFrame to ``.upload_df()`` along with the index you wish the data to be uploaded to.
The ``source_type`` parameter accepts ``csv``, ``json``, or other values; each row is rendered with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively, and **json** is the default.
The ``table_name`` parameter remains for backward compatibility.
If the index provided does not exist and you want it to be created,
you can pass the parameter ``create_index = True``.

.. Note – table name for Splunk refers to sourcetype.
.. note:: The table name for Splunk refers to the source type.

.. code:: ipython3

spup.upload_df(data=DATAFRAME, table_name=TABLE_NAME, index_name=INDEX_NAME)
spup.upload_df(data=DATAFRAME, index_name=INDEX_NAME)

During upload, a progress bar is shown indicating the progress of the upload.
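The per-row serialization behavior described above can be sketched independently of Splunk (a simplified illustration; ``serialize_row`` is a hypothetical helper written for this example, not part of msticpy):

```python
import pandas as pd

def serialize_row(row: pd.Series, source_type: str) -> str:
    # Mirror the uploader's per-row dispatch: json and csv get
    # dedicated serializers, anything else falls back to to_string().
    if source_type == "json":
        return row.to_json()
    if source_type == "csv":
        return row.to_csv()
    return row.to_string()

df = pd.DataFrame({"host": ["web01"], "status": [200]})
row = df.iloc[0]
print(serialize_row(row, "json"))  # {"host":"web01","status":200}
```

Each serialized row is what gets submitted to the Splunk index as a single event, which is why the choice of ``source_type`` changes the event format rather than just a label.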

Uploading a File to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a file to Splunk pass the path to the file to ``.upload_file()`` along with the name of the index you
want the data uploaded to. By default a comma separated value file is expected but if you have some other separator
value you can pass this with the ``delim`` parameter. You can specify a table name to upload the data to with that ``table_name``
parameter but by default the uploader will upload to a table with the same name as the file. As with uploading a DataFrame
if the index provided does not exist and you want it to be created, you can pass the parameter ``create_index = True``.
To upload a file to Splunk, pass the path to the file to ``.upload_file()`` along with the name of
the index you want the data uploaded to.
By default a comma-separated value file is expected, but if your file uses
some other separator you can pass it with the ``delim`` parameter.
You can specify the sourcetype to upload the data to with the ``source_type`` parameter;
``csv``, ``json``, or other values cause each row to be rendered with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively.
If ``source_type`` is not set, the legacy ``table_name`` parameter (retained only for
backward compatibility) is used instead; if neither is supplied, the uploader falls back
to a sourcetype named after the file.
As with uploading a DataFrame,
if the index provided does not exist and you want it to be created, you can pass
the parameter ``create_index = True``.
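The fallback order just described can be sketched as a small function (``resolve_source_type`` is a hypothetical helper for illustration, not part of the msticpy API):

```python
from pathlib import Path
from typing import Optional

def resolve_source_type(
    file_path: Path,
    source_type: Optional[str] = None,
    table_name: Optional[str] = None,
) -> str:
    # An explicit source_type wins; table_name is honored only for
    # backward compatibility; otherwise fall back to the file stem.
    if source_type:
        return source_type
    if table_name:
        return table_name
    return file_path.stem

print(resolve_source_type(Path("logs/syslog_data.csv")))  # syslog_data
```

Keeping the legacy parameter lowest-priority after ``source_type`` means existing callers that pass ``table_name`` keep working unchanged.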

.. code:: ipython3

@@ -131,16 +143,24 @@ if the index provided does not exist and you want it to be created, you can pass
Uploading a Folder to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also upload a whole folder of files. To do this simply pass the folder path to
``.upload_folder()`` along with the
name of the index you want the data uploaded to. By default this will upload all csv files in that folder to Splunk,
with each file being uploaded to a sourcetype with a name corresponding to the file name. Alternatively you can also
specify single a table sourcetype which all files will be uploaded with the ``table_name`` parameter. If you have some
specify a single sourcetype to which all files will be uploaded using the ``source_type`` parameter.
``csv``, ``json``, or other values for ``source_type`` cause each row to be rendered with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively.
If ``source_type`` is not set, the legacy ``table_name`` parameter (retained only for
backward compatibility) is used; if neither is supplied, each file name is used as the sourcetype.
If your files use some
other separator you can pass ``delim`` with the specified delimiter value; however, currently only
a single delimiter type is supported across files. By default this method attempts to upload all files in the specified
folders; if you want to process only certain file extensions you can pass the ``glob`` keyword parameter
with a pattern for files to attempt to upload.
The pattern format required follows the ``pathlib.glob()`` pattern - more details are
available `here <https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob>`_.
As with the other methods, if the index provided does not exist and you want it to be created,
you can pass the parameter ``create_index = True``.
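Because the ``glob`` keyword follows ``pathlib`` glob semantics, you can preview which files a pattern would pick up before uploading. A small self-contained sketch using a throwaway folder:

```python
import tempfile
from pathlib import Path

# Create a temporary folder with mixed file types.
folder = Path(tempfile.mkdtemp())
for name in ("auth.csv", "dns.csv", "notes.txt"):
    (folder / name).touch()

# "*.csv" uses pathlib.Path.glob() semantics - the same pattern
# style accepted by upload_folder(glob=...).
matched = sorted(p.name for p in folder.glob("*.csv"))
print(matched)  # ['auth.csv', 'dns.csv']
```

Running the same pattern through ``Path.glob()`` first is a cheap way to confirm the upload will touch only the files you expect.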

.. code:: ipython3

90 changes: 68 additions & 22 deletions msticpy/data/uploaders/splunk_uploader.py
@@ -72,7 +72,7 @@ def _post_data(
self,
data: pd.DataFrame,
index_name: str,
table_name: Any,
source_type: Any,
host: str = None,
**kwargs,
):
@@ -85,10 +85,12 @@ def _post_data(
Data to upload.
index_name : str
Name of the Splunk Index to add data to.
table_name : str
source_type : str
The Splunk sourcetype that the data will be uploaded to.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
host : str, optional
The hostname associated with the uploaded data, by default "Upload".
The hostname associated with the uploaded data, by default "msticpy_splunk_uploader".

"""
if not self.connected:
@@ -97,28 +99,36 @@ def _post_data(
title="Splunk host not connected",
)
if not host:
host = "Upload"
host = "msticpy_splunk_uploader"
index = self._load_index(index_name, kwargs.get("create_index", False))
progress = tqdm(total=len(data.index), desc="Rows", position=0)
source_types = []
for row in data.iterrows():
data = row[1].to_csv() # type: ignore
if source_type == "json":
data = row[1].to_json() # type: ignore
elif source_type == "csv":
data = row[1].to_csv() # type: ignore
else:
data = row[1].to_string() # type: ignore
try:
data.encode(encoding="latin-1") # type: ignore
except UnicodeEncodeError:
data = data.encode(encoding="utf-8") # type: ignore
index.submit(data, sourcetype=table_name, host=host)
index.submit(data, sourcetype=source_type, host=host)
source_types.append(source_type)
progress.update(1)
progress.close()
if self._debug is True:
print("Upload complete")
print(f"Upload complete: Splunk sourcetype = {source_types}")
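The latin-1/utf-8 handling in the loop above can be exercised in isolation: rows that fit latin-1 are submitted as ``str``, anything else as utf-8 ``bytes`` (a simplified sketch of the same logic; ``prepare_payload`` is a hypothetical name used only for this example):

```python
def prepare_payload(text: str):
    # Mirror the uploader's check: if the text is latin-1-encodable,
    # keep it as a str; otherwise submit utf-8 encoded bytes.
    try:
        text.encode(encoding="latin-1")
        return text
    except UnicodeEncodeError:
        return text.encode(encoding="utf-8")

print(type(prepare_payload("plain ascii")).__name__)  # str
print(type(prepare_payload("日本語ログ")).__name__)  # bytes
```

This matters because log data (for example Japanese syslog messages) routinely contains characters outside latin-1, and submitting them unencoded could fail.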

# pylint: disable=arguments-differ
def upload_df( # type: ignore
self,
data: pd.DataFrame,
table_name: Optional[str],
index_name: str,
table_name: Optional[str] = None,
index_name: Optional[str] = None,
create_index: bool = False,
source_type: Optional[str] = None,
**kwargs,
):
"""
@@ -128,8 +138,13 @@ def upload_df(  # type: ignore
----------
data : pd.DataFrame
Data to upload.
table_name : str
source_type : str, optional
The Splunk sourcetype that the data will be uploaded to.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
The default is "json".
table_name : str, optional
Deprecated alias for source_type, kept for backward compatibility.
index_name : str
Name of the Splunk Index to add data to.
host : str, optional
Expand All @@ -138,6 +153,12 @@ def upload_df( # type: ignore
Set this to true to create the index if it doesn't already exist. Default is False.

"""
if not source_type:
if not table_name:
source_type = "json"
else:
source_type = table_name

host = kwargs.get("host", None)
if not index_name:
raise ValueError("parameter `index_name` must be specified")
@@ -148,10 +169,10 @@ def upload_df(  # type: ignore
)
self._post_data(
data=data,
table_name=table_name,
index_name=index_name,
create_index=create_index,
host=host,
source_type=source_type,
create_index=create_index,
)

def upload_file( # type: ignore
@@ -161,6 +182,7 @@ def upload_file(  # type: ignore
delim: str = ",",
index_name: Optional[str] = None,
create_index: bool = False,
source_type: Optional[str] = None,
**kwargs,
):
"""
@@ -172,9 +194,14 @@
Path to the file to upload.
index_name : str
Name of the Splunk Index to add data to.
table_name : str, optional
source_type : str, optional
The Splunk sourcetype that the data will be uploaded to.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
If not set, table_name and then the file name are used as the sourcetype.
table_name : str, optional
Deprecated alias for source_type, kept for backward compatibility.
delim : str, optional
Separator value in file, by default ","
host : str, optional
@@ -195,13 +222,17 @@
"Incorrect file type.",
) from parse_err

if not table_name:
table_name = path.stem
if not source_type:
if table_name:
source_type = table_name
else:
source_type = path.stem

self._post_data(
data=data,
table_name=table_name,
index_name=index_name,
host=host,
source_type=source_type,
create_index=create_index,
)

@@ -212,6 +243,7 @@ def upload_folder(  # type: ignore
delim: str = ",",
index_name: Optional[str] = None,
create_index=False,
source_type: Optional[str] = None,
**kwargs,
):
"""
@@ -223,9 +255,13 @@
Path to folder to upload.
index_name : str
Name of the Splunk Index to add data to, if it doesn't exist it will be created.
table_name : str, optional
source_type : str, optional
The Splunk sourcetype that the data will be uploaded to.
If not set the file name will be used.
csv, json, or other values are rendered with
df.to_csv(), df.to_json(), or df.to_string() respectively.
If not set, table_name and then each file name are used as the sourcetype.
table_name : str, optional
Deprecated alias for source_type, kept for backward compatibility.
delim : str, optional
Separator value in files, by default ","
host : str, optional
@@ -248,18 +284,28 @@
"The file specified is not a separated value file.",
title="Incorrect file type.",
) from parse_err
if not table_name:
table_name = path.stem

if not source_type:
if table_name:
source_type = table_name
else:
source_type = path.stem

self._post_data(
data=data,
table_name=table_name,
index_name=index_name,
host=host,
source_type=source_type,
create_index=create_index,
)
f_progress.update(1)
if self._debug is True:
print(f"{str(path)} uploaded to {table_name}")
print(
    f"{str(path)} uploaded to "
    f"index={index_name} "
    f"source_type={source_type} "
    f"host={host}"
)
f_progress.close()

# pylint: enable=arguments-differ
48 changes: 48 additions & 0 deletions tests/data/uploaders/test_splunk_uploader.py
@@ -56,18 +56,39 @@ def test_df_upload(sp_upload):
sp_upload.upload_df(data, index_name="test_upload", table_name="test_upload")


def test_df_upload_sourcetype(sp_upload):
"""Test DataFrame upload."""
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
data = pd.read_csv(data_file, parse_dates=["TimeGenerated"])
sp_upload.upload_df(data, index_name="test_upload", source_type="test_upload")


def test_df_failure(sp_upload):
"""Test DataFrame upload failure."""
with pytest.raises(MsticpyUserError):
sp_upload.upload_df("123", index_name="test_upload", table_name="test_upload")


def test_df_failure_sourcetype(sp_upload):
"""Test DataFrame upload failure."""
with pytest.raises(MsticpyUserError):
sp_upload.upload_df("123", index_name="test_upload", source_type="test_upload")


def test_file_upload(sp_upload):
"""Test file upload."""
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
sp_upload.upload_file(data_file, index_name="test_upload", table_name="test_upload")


def test_file_upload_sourcetype(sp_upload):
"""Test file upload."""
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
sp_upload.upload_file(
data_file, index_name="test_upload", source_type="test_upload"
)


def test_file_failure(sp_upload):
"""Test file upload failure."""
data_file = Path(_TEST_DATA).joinpath("win_proc_test.pkl")
@@ -77,6 +98,15 @@ def test_file_failure(sp_upload):
)


def test_file_failure_sourcetype(sp_upload):
"""Test file upload failure."""
data_file = Path(_TEST_DATA).joinpath("win_proc_test.pkl")
with pytest.raises(MsticpyUserError):
sp_upload.upload_file(
data_file, index_name="test_upload", source_type="test_upload"
)


def test_folder_upload(sp_upload):
"""Test folder upload."""
data_folder = Path(_TEST_DATA).joinpath("uploader")
@@ -85,6 +115,14 @@
)


def test_folder_upload_sourcetype(sp_upload):
"""Test folder upload."""
data_folder = Path(_TEST_DATA).joinpath("uploader")
sp_upload.upload_folder(
data_folder, index_name="test_upload", source_type="test_upload"
)


def test_folder_upload_no_name(sp_upload):
"""Test folder upload with no table name specified."""
data_folder = Path(_TEST_DATA).joinpath("uploader")
@@ -99,3 +137,13 @@ def test_not_connected(sp_upload):
sp_upload.upload_file(
data_file, index_name="test_upload", table_name="test_upload"
)


def test_not_connected_sourcetype(sp_upload):
"""Test no connection is handled correctly."""
sp_upload.connected = False
data_file = Path(_TEST_DATA).joinpath("syslog_data.csv")
with pytest.raises(MsticpyConnectionError):
sp_upload.upload_file(
data_file, index_name="test_upload", source_type="test_upload"
)