
Commit ac0e63f

update names

1 parent 9e01df9

4 files changed: +25 -25 lines changed


README.md (1 addition, 1 deletion)

@@ -46,7 +46,7 @@ spark.readStream.format("fake").load().writeStream.format("console").start()
 | [KaggleDataSource](pyspark_datasources/kaggle.py) | `kaggle` | Read datasets from Kaggle | `kagglehub`, `pandas` |
 | [SimpleJsonDataSource](pyspark_datasources/simplejson.py) | `simplejson` | Write JSON data to Databricks DBFS | `databricks-sdk` |
 | [OpenSkyDataSource](pyspark_datasources/opensky.py) | `opensky` | Read from OpenSky Network. | None |
-| [SalesforceDataSource](pyspark_datasources/salesforce.py) | `salesforce` | Streaming sink for writing data to Salesforce | `simple-salesforce` |
+| [SalesforceDataSource](pyspark_datasources/salesforce.py) | `pyspark.datasource.salesforce` | Streaming datasource for writing data to Salesforce | `simple-salesforce` |
 
 See more here: https://allisonwang-db.github.io/pyspark-data-sources/.
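
Since only the registered format name changes here, downstream pipelines need just the new format string. A minimal usage sketch, assuming PySpark 4.0+ with the Python data source API; the session name, credentials, and checkpoint path are illustrative placeholders, not values from this repo:

from pyspark.sql import SparkSession
from pyspark_datasources.salesforce import SalesforceDataSource

spark = SparkSession.builder.appName("salesforce-demo").getOrCreate()
spark.dataSource.register(SalesforceDataSource)

# Rate source as a stand-in for any streaming DataFrame whose columns
# match the target Salesforce object's fields.
df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
accounts = df.selectExpr("concat('Account ', value) AS Name")

query = (
    accounts.writeStream
    .format("pyspark.datasource.salesforce")  # previously "salesforce"
    .option("username", "your-username@company.com")
    .option("password", "your-password")
    .option("security_token", "your-security-token")
    .option("checkpointLocation", "/tmp/salesforce_checkpoint")
    # plus whatever target-object option the class docstring documents
    # (that part of the docstring is elided in this diff)
    .start()
)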

docs/index.md (1 addition, 1 deletion)

@@ -38,6 +38,6 @@ spark.readStream.format("fake").load().writeStream.format("console").start()
 | [HuggingFaceDatasets](./datasources/huggingface.md) | `huggingface` | Read datasets from the HuggingFace Hub | `datasets` |
 | [StockDataSource](./datasources/stock.md) | `stock` | Read stock data from Alpha Vantage | None |
 | [SimpleJsonDataSource](./datasources/simplejson.md) | `simplejson` | Write JSON data to Databricks DBFS | `databricks-sdk` |
-| [SalesforceDataSource](./datasources/salesforce.md) | `salesforce` | Write streaming data to Salesforce objects | `simple-salesforce` |
+| [SalesforceDataSource](./datasources/salesforce.md) | `pyspark.datasource.salesforce` | Write streaming data to Salesforce objects | `simple-salesforce` |
 | [GoogleSheetsDataSource](./datasources/googlesheets.md) | `googlesheets` | Read table from public Google Sheets document | None |
 | [KaggleDataSource](./datasources/kaggle.md) | `kaggle` | Read datasets from Kaggle | `kagglehub`, `pandas` |

examples/salesforce_example.py (8 additions, 8 deletions)

@@ -1,9 +1,9 @@
 #!/usr/bin/env python3
 # -*- coding: utf-8 -*-
 """
-Salesforce Sink Example
+Salesforce Datasource Example
 
-This example demonstrates how to use the SalesforceDataSource as a streaming sink
+This example demonstrates how to use the SalesforceDataSource as a streaming datasource
 to write data from various sources to Salesforce objects.
 
 Requirements:
@@ -61,10 +61,10 @@ def example_1_rate_source_to_accounts():
         .getOrCreate()
 
     try:
-        # Register Salesforce sink
+        # Register Salesforce Datasource
         from pyspark_datasources.salesforce import SalesforceDataSource
         spark.dataSource.register(SalesforceDataSource)
-        print("✅ Salesforce sink registered")
+        print("✅ Salesforce datasource registered")
 
         # Create streaming data from rate source
         streaming_df = spark.readStream \
@@ -131,7 +131,7 @@ def example_2_csv_to_contacts():
         .getOrCreate()
 
     try:
-        # Register Salesforce sink
+        # Register Salesforce datasource
        from pyspark_datasources.salesforce import SalesforceDataSource
         spark.dataSource.register(SalesforceDataSource)
 
@@ -426,8 +426,8 @@ def example_4_custom_object():
 
 def main():
     """Run all examples"""
-    print("🚀 Salesforce Sink Examples")
-    print("This demonstrates various ways to use the Salesforce streaming sink")
+    print("🚀 Salesforce Datasource Examples")
+    print("This demonstrates various ways to use the Salesforce streaming datasource")
 
     try:
         # Run examples
@@ -440,7 +440,7 @@ def main():
         print("✅ All examples completed!")
         print("="*60)
         print("\n💡 Key takeaways:")
-        print("   - Salesforce sink supports various input sources (rate, CSV, etc.)")
+        print("   - Salesforce datasource supports various input sources (rate, CSV, etc.)")
         print("   - Checkpoint functionality enables exactly-once processing")
         print("   - Custom schemas allow flexibility for different Salesforce objects")
         print("   - Batch processing optimizes Salesforce API usage")
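
The diff elides the body of example_2_csv_to_contacts; a condensed sketch of that register-then-write pattern follows, with illustrative input path, header option, and checkpoint location (the Contact field names mirror the schema shown in the class docstring):

from pyspark.sql import SparkSession
from pyspark_datasources.salesforce import SalesforceDataSource

spark = SparkSession.builder.appName("csv-to-contacts").getOrCreate()
spark.dataSource.register(SalesforceDataSource)

# Streaming CSV input; path and options are placeholders for illustration.
contacts = (
    spark.readStream
    .format("csv")
    .option("header", "true")
    .schema("FirstName STRING, LastName STRING, Email STRING, Phone STRING")
    .load("/tmp/contacts_input")
)

query = (
    contacts.writeStream
    .format("pyspark.datasource.salesforce")
    .option("username", "your-username@company.com")
    .option("password", "your-password")
    .option("security_token", "your-security-token")
    .option("checkpointLocation", "/tmp/contacts_checkpoint")
    # plus the target-object option documented in the class docstring
    .start()
)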

pyspark_datasources/salesforce.py (15 additions, 15 deletions)

@@ -17,20 +17,20 @@ class SalesforceCommitMessage(WriterCommitMessage):
 
 class SalesforceDataSource(DataSource):
     """
-    A Salesforce streaming sink for PySpark to write data to Salesforce objects.
+    A Salesforce streaming datasource for PySpark to write data to Salesforce objects.
 
-    This data sink enables writing streaming data from Spark to Salesforce using the
+    This datasource enables writing streaming data from Spark to Salesforce using the
     Salesforce REST API. It supports common Salesforce objects like Account, Contact,
     Opportunity, and custom objects.
 
-    Note: This is a write-only sink, not a full bidirectional data source.
+    Note: This is a write-only datasource, not a full bidirectional data source.
 
     Name: `salesforce`
 
     Notes
     -----
     - Requires the `simple-salesforce` library for Salesforce API integration
-    - **Write-only sink**: Only supports streaming write operations (no read operations)
+    - **Write-only datasource**: Only supports streaming write operations (no read operations)
     - Uses Salesforce username/password/security token authentication
     - Supports batch writing with Salesforce Composite Tree API for efficient processing
     - Implements exactly-once semantics through Spark's checkpoint mechanism
@@ -61,7 +61,7 @@ class SalesforceDataSource(DataSource):
 
     Examples
     --------
-    Register the Salesforce sink:
+    Register the Salesforce Datasource:
 
     >>> from pyspark_datasources import SalesforceDataSource
     >>> spark.dataSource.register(SalesforceDataSource)
@@ -82,9 +82,9 @@ class SalesforceDataSource(DataSource):
     ...     (col("value") * 100000).cast("double").alias("AnnualRevenue")
     ... )
     >>>
-    >>> # Write to Salesforce using the sink
+    >>> # Write to Salesforce using the datasource
     >>> query = account_data.writeStream \\
-    ...     .format("salesforce") \\
+    ...     .format("pyspark.datasource.salesforce") \\
     ...     .option("username", "your-username@company.com") \\
     ...     .option("password", "your-password") \\
     ...     .option("security_token", "your-security-token") \\
@@ -102,7 +102,7 @@ class SalesforceDataSource(DataSource):
     ... )
     >>>
     >>> query = contact_data.writeStream \\
-    ...     .format("salesforce") \\
+    ...     .format("pyspark.datasource.salesforce") \\
     ...     .option("username", "your-username@company.com") \\
     ...     .option("password", "your-password") \\
     ...     .option("security_token", "your-security-token") \\
@@ -118,7 +118,7 @@ class SalesforceDataSource(DataSource):
     ... )
     >>>
     >>> query = custom_data.writeStream \\
-    ...     .format("salesforce") \\
+    ...     .format("pyspark.datasource.salesforce") \\
     ...     .option("username", "your-username@company.com") \\
     ...     .option("password", "your-password") \\
     ...     .option("security_token", "your-security-token") \\
@@ -132,7 +132,7 @@ class SalesforceDataSource(DataSource):
     >>> contact_schema = "FirstName STRING NOT NULL, LastName STRING NOT NULL, Email STRING, Phone STRING"
     >>>
     >>> query = contact_data.writeStream \\
-    ...     .format("salesforce") \\
+    ...     .format("pyspark.datasource.salesforce") \\
     ...     .option("username", "your-username@company.com") \\
     ...     .option("password", "your-password") \\
     ...     .option("security_token", "your-security-token") \\
@@ -152,7 +152,7 @@ class SalesforceDataSource(DataSource):
     ... )
     >>>
     >>> query = opportunity_data.writeStream \\
-    ...     .format("salesforce") \\
+    ...     .format("pyspark.datasource.salesforce") \\
     ...     .option("username", "your-username@company.com") \\
     ...     .option("password", "your-password") \\
     ...     .option("security_token", "your-security-token") \\
@@ -163,7 +163,7 @@ class SalesforceDataSource(DataSource):
 
     Key Features:
 
-    - **Write-only sink**: Designed specifically for writing data to Salesforce
+    - **Write-only datasource**: Designed specifically for writing data to Salesforce
     - **Batch processing**: Uses Salesforce Composite Tree API for efficient bulk writes
     - **Exactly-once semantics**: Integrates with Spark's checkpoint mechanism
     - **Error handling**: Graceful fallback to individual record creation if batch fails
@@ -172,7 +172,7 @@ class SalesforceDataSource(DataSource):
 
     @classmethod
     def name(cls) -> str:
-        """Return the short name for this Salesforce sink."""
+        """Return the short name for this Salesforce datasource."""
         return "pyspark.datasource.salesforce"
 
     def schema(self) -> str:
@@ -200,12 +200,12 @@ def schema(self) -> str:
         """
 
     def streamWriter(self, schema: StructType, overwrite: bool) -> "SalesforceStreamWriter":
-        """Create a stream writer for Salesforce sink integration."""
+        """Create a stream writer for Salesforce datasource integration."""
         return SalesforceStreamWriter(schema, self.options)
 
 
 class SalesforceStreamWriter(DataSourceStreamWriter):
-    """Stream writer implementation for Salesforce sink integration."""
+    """Stream writer implementation for Salesforce datasource integration."""
 
     def __init__(self, schema: StructType, options: Dict[str, str]):
         self.schema = schema
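
The diff cuts off at __init__. For orientation, the stream-writer contract that SalesforceStreamWriter fulfils looks roughly like the sketch below. This is a minimal illustration of PySpark's pyspark.sql.datasource API, not the repo's actual implementation; the class and field names are illustrative stand-ins:

from dataclasses import dataclass
from typing import Iterator, List

from pyspark.sql import Row
from pyspark.sql.datasource import DataSourceStreamWriter, WriterCommitMessage

@dataclass
class SketchCommitMessage(WriterCommitMessage):
    # Illustrative stand-in for the module's SalesforceCommitMessage.
    records_written: int

class SketchStreamWriter(DataSourceStreamWriter):
    """Minimal sketch of the DataSourceStreamWriter contract."""

    def write(self, iterator: Iterator[Row]) -> WriterCommitMessage:
        # Runs on executors, once per partition: push rows to the external
        # system (the real writer batches them through the Salesforce
        # Composite Tree API) and report what was written.
        count = sum(1 for _ in iterator)
        return SketchCommitMessage(records_written=count)

    def commit(self, messages: List[WriterCommitMessage], batchId: int) -> None:
        # Runs on the driver after every partition of a microbatch succeeds;
        # combined with checkpointing, this is what gives exactly-once semantics.
        pass

    def abort(self, messages: List[WriterCommitMessage], batchId: int) -> None:
        # Runs on the driver if any partition of the microbatch fails.
        pass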
