Initial Signals Docs #1197
Conversation
| Field | Description | Type |
| --- | --- | --- |
| `events` | List of Snowplow Events that the Attribute is calculated on | List of `Event` type |
| `aggregation` | The aggregation type of the Attribute | One of: `counter`, `sum`, `min`, `max`, `mean`, `first`, `last`, `unique_list` |
| `property_syntax` | The syntax used to reference the property | One of: `snowflake`, `blobl` |
| `property` | The property of the event or entity you wish to use in the aggregation | `string` |
I think we'll need a bit more documentation on this as there are complexities about the syntax and naming that should be used here.
We decided that we'll use Snowflake syntax for accessing nested properties within events and entities. Also the column names are the same as in the atomic events table. There was a bit more detail in this doc.
One can also access other properties in the atomic events table, like `app_id` and more. It'd be great to provide some examples.
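For illustration, a hedged sketch of what such definitions might look like, based only on the field table above. The import path, the `Event` constructor, and the nested property path are assumptions, not documented API:

```python
# Hypothetical sketch -- field names are taken from the table above;
# everything else (import path, Event signature, property paths) is assumed.
from snowplow_signals import Attribute, Event

# Referencing a plain atomic events table column directly, e.g. app_id
last_app_id = Attribute(
    name="last_app_id",
    type="string",
    events=[Event(name="page_view")],
    aggregation="last",
    property_syntax="snowflake",
    property="app_id",  # atomic column, same name as in the atomic events table
)

# Referencing a nested property with Snowflake syntax; the entity column
# and path here are made up for illustration
product_price_sum = Attribute(
    name="product_price_sum",
    type="float",
    events=[Event(name="add_to_cart")],
    aggregation="sum",
    property_syntax="snowflake",
    property="contexts_com_example_product_1[0]:price",
)
```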
I think it deserves a separate paragraph/section as users will definitely need to figure out how to do that.
@ilias1111 left some comments re the batch tutorial
@@ -0,0 +1,5 @@
{
  "title": "Snowplow Signals CLI Tutorial",
@ilias1111 can we rename this to "Create Batch Attributes using Snowplow Signals"?
Create Batch Attributes using Snowplow Signals
But that isn't what we are doing; we are creating dbt projects (batch engine projects) based on already generated attributes :/ What should we say?
Before starting, ensure you have:

- Python 3.11+ installed
- Snowplow Signals SDK installed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Show them how to install this - have a pip install command or something.
Also it's not totally obvious to me that installing a Python package means it can be used as a CLI tool, maybe just clarify that piece.
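For reference, a sketch of what that could look like. The package name matches the `pip install` shown later in this PR; the CLI entry-point name is an assumption:

```bash
# Install the SDK (same package name used later in these docs)
pip install snowplow-signals

# Installing the package is expected to register a console entry point,
# so it can then also be invoked as a CLI tool (command name hypothetical):
snowplow-signals --help
```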
This is skipped for now, in the sense that the CLI is not live on PyPI, so I cannot test the command and there is no point in documenting it. I will add it when we are live with the SDK.
- API URL
- API Key
- API Key ID
- Organization ID
Tell them how to find all these bits of information - you can copy from the Quickstart guide.
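As a starting point, a minimal sketch of passing those four values to a client, assuming a `Signals` class with these constructor arguments (not confirmed API):

```python
# Hypothetical connection sketch -- class name and arguments are assumed
from snowplow_signals import Signals

sp_signals = Signals(
    api_url="https://YOUR_SIGNALS_API_URL",  # API URL
    api_key="YOUR_API_KEY",                  # API Key
    api_key_id="YOUR_API_KEY_ID",            # API Key ID
    org_id="YOUR_ORG_ID",                    # Organization ID
)
```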
--api-key "YOUR_API_KEY" \
--api-key-id "YOUR_API_KEY_ID" \
--org-id "YOUR_ORG_ID" \
--repo-path "./my_snowplow_repo" \
You should show a step to create the repo earlier on (mkdir...)
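Something like the following, placed before the CLI invocation quoted above, would cover it (the directory name mirrors the `--repo-path` flag):

```bash
# Create the target repo directory before pointing the CLI at it
mkdir -p my_snowplow_repo
```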
## Choosing Between Options

- Use **Option 1** when you want to set up projects for all your batch views at once
I had already forgotten what Option 1 and 2 were here, could you just use their descriptive names?
* Add missing batch content
* Updates
* Updates on the installation
* John changes

Co-authored-by: Ilias Xenogiannis <ilias1111@users.noreply.github.com>

* Rename signals to sp_signals
* Rename cli tutorial
* Remove cli details from batch engine
* Add view prerequisite to cli tutorial
* Add online offline to view def
* Clarify online offline matrix
* Update docs/signals/views_services/index.md
Can someone check this PR against Vale please? I haven't finished reviewing yet but I've seen a few formatting problems.
Add the new Signals terms to the Glossary please
I haven't reviewed the Tutorial part since the main docs are more urgent
page_view_count = Attribute(
    name="page_view_count",
    type="int32",
why does type come up a different colour in the codeblock?
`type` gets special treatment in the highlighting because it's a language "builtin".
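For context, the truncated snippet above might continue roughly as follows; this is a hedged reconstruction using only fields from the attribute table earlier in this PR, with an assumed `Event` constructor:

```python
# Hedged reconstruction -- import path and Event signature are assumptions
from snowplow_signals import Attribute, Event

page_view_count = Attribute(
    name="page_view_count",
    type="int32",
    events=[Event(name="page_view")],  # the events the attribute is calculated on
    aggregation="counter",             # count occurrences of matching events
)
```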
$ pip install snowplow-signals
```

To connect to your Signals deployment, you will need 4 values.
isn't this 3 things
## Step 1: Installation and Setup

The Snowplow Signals SDK allows you to define attributes, create views, and retrieve user features. It requires Python 3.12 or above.

Install the SDK using pip:
where?
There is a second layer of incremental processing logic dictated by the `daily_aggregation_manifest` table. After the `filtered_events` table is created or updated, the `daily_aggregates` table gets updated with the help of this manifest. It is needed because of late-arriving data, which may mean that some days need to be reprocessed as a whole. For optimization purposes there are variables to fine-tune how this works, such as `snowplow__reprocess_days` and `snowplow__min_rows_to_process`.

Finally, the `Attributes` table is generated, which is a drop-and-recompute table, fully updated each time an incremental update runs. This is made possible without much effort as the data is already pre-aggregated at a daily level.

this diagram would work better vertically, then the text wouldn't be tiny
Finally, the `Attributes` table is generated, which is a drop-and-recompute table, fully updated each time an incremental update runs. This is made possible without much effort as the data is already pre-aggregated at a daily level.



## Variables
what are the defaults?
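For orientation, a hedged sketch of how these variables might be overridden in a project's `dbt_project.yml`; the values shown are illustrative, not the package defaults (which the docs should state):

```yaml
# dbt_project.yml -- illustrative values only, not documented defaults
vars:
  # trailing window of days to reprocess, to catch late-arriving data
  snowplow__reprocess_days: 3
  # presumably the minimum number of rows before a day is reprocessed
  snowplow__min_rows_to_process: 1
```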
docs/signals/index.md
Outdated
sidebar_label: "Signals"
---

Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud, allows you to create, manage, and access user attributes by using the Signals SDKs.
what SDKs? how many are there? link to github repos
 | ||
|
||
Signals allows users to enhance their applications by aggregating user attributes and providing near real-time visibility into customer behavior. With seamless access to user history, it simplifies creating personalized, intelligent experiences. | ||
|
add a short paragraph or section here about how to use Signals
docs/signals/index.md
Outdated
sidebar_label: "Signals"
---

Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud, allows you to create, manage, and access user attributes by using the Signals SDKs.
should be named Profiles Store to match the official announcement
Suggested change:
- Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud allows you to create, manage and access user attributes by using the Signals SDKs.
+ Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profiles Store API, hosted in your BDP cloud allows you to create, manage and access user attributes by using the Signals SDKs.
Signals allows users to enhance their applications by aggregating user attributes and providing near real-time visibility into customer behavior. With seamless access to user history, it simplifies creating personalized, intelligent experiences.

### Sources
this section refers to views, should go below View section
### Services

A `Service` is a collection of `Views` that are grouped to make the retrieval of attributes simpler.
expand on this - there's more info to copy in the other page. why would they want multiple views?
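To make that concrete, a hedged sketch of what grouping several views into a service might look like in the SDK; the `Service` class and its fields are assumptions:

```python
# Hypothetical sketch -- Service signature is assumed; the views are
# placeholders standing in for Views defined elsewhere in the project
from snowplow_signals import Service

personalization_service = Service(
    name="personalization",
    views=[page_metrics_view, checkout_metrics_view],  # related Views, retrieved together
)
```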
@@ -0,0 +1,16 @@
---
title: "Sources"
sidebar_position: 10
this section should be below Attributes
A `View` is a versioned collection of attributes grouped by a common `Entity` (e.g., `session_id` or `user_id`). Once defined, a `View` allows you to retrieve the calculated values of the attributes it contains.

### What is a Service?

A `Service` is a collection of views that streamlines the retrieval of multiple views. By grouping related views into a `Service`, you can efficiently manage and access user insights, making it easier to personalize applications and analyze behavior.
can one service have views with different sources?
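As an aside, a minimal sketch of a versioned `View` keyed on an entity, consistent with the definition quoted above; the class and field names are assumptions:

```python
# Hypothetical sketch -- View signature assumed; page_view_count stands in
# for an Attribute defined elsewhere
from snowplow_signals import View

page_metrics_view = View(
    name="page_metrics",
    version=1,
    entity="domain_userid",        # the common Entity the attributes are grouped by
    attributes=[page_view_count],  # Attributes calculated for this entity
)
```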
The entity here is typically the user, which may be the `domain_userid` or other Snowplow identifier fields, such as the logged-in `user_id`.

:::info
For now, only the `domain_userid` can be used, but shortly we will extend support for all Snowplow identifiers.
does this also apply to the stream source?
Finally, the `Attributes` table is generated, which is a drop-and-recompute table, fully updated each time an incremental update runs. This is made possible without much effort as the data is already pre-aggregated at a daily level.



## Variables
this section should go higher on the page, under Generating the dbt project
Signals allows users to enhance their applications by aggregating user attributes and providing near real-time visibility into customer behavior. With seamless access to user history, it simplifies creating personalized, intelligent experiences.

### Sources
actually put these bits under a subheading
Signals components
The core components are Attributes, Sources, Views, and Services.
You will need to create a new Python project that imports the Signals SDK. Configure Signals by defining these components and deploying them to the Profiles Store. You can then pull aggregated attributes, using the Signals SDK, to use in your applications.
[diagram like this - is this right?]
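If it helps, a hedged end-to-end sketch of that workflow, with every class name, signature, and retrieval call assumed rather than taken from a published API:

```python
# Hypothetical end-to-end sketch: define components, deploy them to the
# Profiles Store, then pull aggregated attributes. All names are assumed.
from snowplow_signals import Signals, Attribute, Event, View, Service

sp_signals = Signals(
    api_url="...", api_key="...", api_key_id="...", org_id="...",
)

page_view_count = Attribute(
    name="page_view_count",
    type="int32",
    events=[Event(name="page_view")],
    aggregation="counter",
)

page_metrics_view = View(
    name="page_metrics",
    version=1,
    entity="domain_userid",
    attributes=[page_view_count],
)

personalization_service = Service(
    name="personalization",
    views=[page_metrics_view],
)

# Deploy the definitions to the Profiles Store
sp_signals.apply([page_metrics_view, personalization_service])

# Later, in your application: retrieve the aggregated attributes for a user
attributes = sp_signals.get_service_attributes(  # hypothetical retrieval call
    service="personalization",
    identifier="a-domain-userid",
)
```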
Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud, allows you to create, manage, and access user attributes by using the Signals SDKs.

move this diagram into the new "Signals components" subheading below, replace it with the fancy official one that matches the docs homepage architecture diagram
Co-authored-by: Miranda Wilson <miranda@snowplow.io>