
Initial Signals Docs #1197


Open · Jack-Keene wants to merge 17 commits into main

Conversation

Jack-Keene
Contributor

No description provided.

@Jack-Keene Jack-Keene marked this pull request as ready for review April 7, 2025 16:34

netlify bot commented Apr 7, 2025

Deploy Preview for snowplow-docs ready!

🔨 Latest commit 6149798
🔍 Latest deploy log https://app.netlify.com/projects/snowplow-docs/deploys/68370c645ad6290008100840
😎 Deploy Preview https://deploy-preview-1197--snowplow-docs.netlify.app

@mscwilson mscwilson added the "do not merge" label Apr 7, 2025 (Flag to denote an Issue or PR which should not yet be merged, usually pending a release)
| Field | Description | Type |
| --- | --- | --- |
| `events` | List of Snowplow Events that the Attribute is calculated on | List of `Event` type |
| `aggregation` | The aggregation type of the Attribute | One of: `counter`, `sum`, `min`, `max`, `mean`, `first`, `last`, `unique_list` |
| `property_syntax` | The syntax used to reference the property | One of: `snowflake`, `blobl` |
| `property` | The property of the event or entity you wish to use in the aggregation | `string` |
Contributor

I think we'll need a bit more documentation on this, as there are complexities about the syntax and naming that should be used here.

We decided that we'll use Snowflake syntax for accessing nested properties within events and entities. Also, the column names are the same as in the atomic events table. There was a bit more detail in this doc.

One can also access other properties in the atomic events table, like `app_id` and more. It'd be great to provide some examples.
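To illustrate the two cases above, a hypothetical sketch of attribute definitions: one referencing a plain atomic column (`app_id`), one drilling into a nested entity property with Snowflake syntax. The field names match the table in this diff; the import path and the `Event` constructor arguments are assumptions.

```python
from snowplow_signals import Attribute, Event  # assumed import path

# Plain atomic column from the events table:
last_app_id = Attribute(
    name="last_app_id",
    type="string",
    events=[Event(name="page_view")],  # assumed Event signature
    aggregation="last",
    property="app_id",
)

# Nested entity property via Snowflake syntax: the column name matches the
# atomic events table, "[0]" picks the first entity, ":" drills into it.
last_page_id = Attribute(
    name="last_page_id",
    type="string",
    events=[Event(name="page_view")],
    aggregation="last",
    property="contexts_com_snowplowanalytics_snowplow_web_page_1[0]:id",
    property_syntax="snowflake",
)
```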

Contributor

I think it deserves a separate paragraph/section as users will definitely need to figure out how to do that.

Contributor

@johnmicahreid johnmicahreid left a comment


@ilias1111 left some comments re the batch tutorial

@@ -0,0 +1,5 @@
{
"title": "Snowplow Signals CLI Tutorial",
Contributor

@ilias1111 can we rename this to Create Batch Attributes using Snowplow Signals?

Contributor

"Create Batch Attributes using Snowplow Signals"? But that isn't what we are doing; we are creating dbt projects (batch engine projects) based on already generated attributes :/ What should we say?

Before starting, ensure you have:

- Python 3.11+ installed
- Snowplow Signals SDK installed
Contributor

Show them how to install this: have a `pip install` command or something.

Also, it's not totally obvious to me that installing a Python package means it can be used as a CLI tool, maybe just clarify that piece.

Contributor

This is skipped for now, in the sense that the CLI is not live on PyPI, so I can't test the command and there's no point in documenting it. Will add it when we are live with the SDK.

- API URL
- API Key
- API Key ID
- Organization ID
Contributor

Tell them how to find all these bits of information - you can copy from the Quickstart guide.
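For reference, a hypothetical sketch of where these values might plug in; the `Signals` class name and parameter names are assumptions inferred from the values listed above and the `--api-key`/`--api-key-id`/`--org-id` CLI flags in this diff.

```python
from snowplow_signals import Signals  # assumed import path

# Placeholder values; the Quickstart guide should explain where to find them.
sp_signals = Signals(
    api_url="https://YOUR_API_URL",
    api_key="YOUR_API_KEY",
    api_key_id="YOUR_API_KEY_ID",
    org_id="YOUR_ORG_ID",
)
```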

--api-key "YOUR_API_KEY" \
--api-key-id "YOUR_API_KEY_ID" \
--org-id "YOUR_ORG_ID" \
--repo-path "./my_snowplow_repo" \
Contributor

You should show a step to create the repo earlier on (mkdir...)


## Choosing Between Options

- Use **Option 1** when you want to set up projects for all your batch views at once
Contributor

I had already forgotten what Option 1 and 2 were here, could you just use their descriptive names?

agnessnowplow and others added 2 commits April 25, 2025 16:37
* Add missing batch content

* Updates

* Updates on the installation

* John changes

---------

Co-authored-by: Ilias Xenogiannis <ilias1111@users.noreply.github.com>
* Rename signals to sp_signals

* Rename cli tutorial

* Remove cli details from batch engine

* Add view prerequisite to cli tutorial

* Add online offline to view def

* Clarify onlne offline matrix

* Update docs/signals/views_services/index.md
@johnmicahreid johnmicahreid requested a review from mscwilson May 21, 2025 10:45
@mscwilson
Collaborator

Can someone check this PR against Vale please? I haven't finished reviewing yet, but I've seen a few formatting problems.

Collaborator

@mscwilson mscwilson left a comment


Add the new Signals terms to the Glossary please

I haven't reviewed the Tutorial part since the main docs are more urgent


page_view_count = Attribute(
    name="page_view_count",
    type="int32",
Collaborator

why does `type` come up a different colour in the code block?

Contributor

`type` gets special treatment in the highlighting because it's a language "builtin".

$ pip install snowplow-signals
```

To connect to your Signals deployment, you will need 4 values.
Collaborator

isn't this 3 things

## Step 1: Installation and Setup
The Snowplow Signals SDK allows you to define attributes, create views, and retrieve user features. It requires Python 3.12 or above.

Install the SDK using pip:
Collaborator

where?

There is a second layer of incremental processing logic dictated by the `daily_aggregation_manifest` table. After the `filtered_events` table is created or updated, the `daily_aggregates` table gets updated with the help of this manifest. It is needed because of late-arriving data, which may mean that some days need to be reprocessed as a whole. For optimization purposes, there are variables to fine-tune how this works, such as `snowplow__reprocess_days` and `snowplow__min_rows_to_process`.

Finally, the `Attributes` table is generated, which is a drop-and-recompute table, fully updated each time an incremental update runs. This is possible without much effort because the data is already pre-aggregated at a daily level.
![](../images/batch_engine_data_models.png)
Collaborator

this diagram would work better vertically, then the text wouldn't be tiny

Finally, the `Attributes` table is generated, which is a drop-and-recompute table, fully updated each time an incremental update runs. This is possible without much effort because the data is already pre-aggregated at a daily level.
![](../images/batch_engine_data_models.png)

## Variables
Collaborator

what are the defaults?
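To make the reprocessing idea concrete, a toy sketch (not the dbt implementation, which lives in SQL/Jinja): with `snowplow__reprocess_days` set to N, the most recent N days are recomputed as whole days so late-arriving events get folded in. The values shown are placeholders, not documented defaults.

```python
from datetime import date, timedelta

def days_to_reprocess(last_processed: date, reprocess_days: int) -> list[date]:
    """Return the trailing window of days to rebuild as whole days."""
    return [last_processed - timedelta(days=i) for i in range(reprocess_days)]

# With snowplow__reprocess_days = 3 and a manifest high-water mark of 2025-05-21:
print(days_to_reprocess(date(2025, 5, 21), 3))
# [datetime.date(2025, 5, 21), datetime.date(2025, 5, 20), datetime.date(2025, 5, 19)]
```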

sidebar_label: "Signals"
---

Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud, allows you to create, manage, and access user attributes using the Signals SDKs.
Collaborator

what SDKs? how many are there? link to github repos

![](./images/signals.png)

Signals allows users to enhance their applications by aggregating user attributes and providing near real-time visibility into customer behavior. With seamless access to user history, it simplifies creating personalized, intelligent experiences.

Collaborator

add a short paragraph or section here about how to use Signals

sidebar_label: "Signals"
---

Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud, allows you to create, manage, and access user attributes using the Signals SDKs.
Collaborator

should be named Profiles Store to match the official announcement

Suggested change
Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud, allows you to create, manage, and access user attributes using the Signals SDKs.
Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profiles Store API, hosted in your BDP cloud, allows you to create, manage, and access user attributes using the Signals SDKs.


Signals allows users to enhance their applications by aggregating user attributes and providing near real-time visibility into customer behavior. With seamless access to user history, it simplifies creating personalized, intelligent experiences.

### Sources
Collaborator

this section refers to views, should go below View section


### Services

A `Service` is a collection of `Views` that are grouped to make the retrieval of attributes simpler.
Collaborator

expand on this - there's more info to copy in the other page. why would they want multiple views?

@@ -0,0 +1,16 @@
---
title: "Sources"
sidebar_position: 10
Collaborator

this section should be below Attributes

A `View` is a versioned collection of attributes grouped by a common `Entity` (e.g., `session_id` or `user_id`). Once defined, a `View` allows you to retrieve the calculated values of the attributes it contains.

### What is a Service?
A `Service` is a collection of views that streamlines the retrieval of multiple views. By grouping related views into a `Service`, you can efficiently manage and access user insights, making it easier to personalize applications and analyze behavior.
Collaborator

can one service have views with different sources?
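As an illustration, a hypothetical sketch of grouping two views into a service so an application can fetch both sets of attributes in one call; the `Service` and `View` constructor arguments here are assumptions.

```python
from snowplow_signals import Service, View  # assumed import path

# Two previously defined views (placeholder definitions):
recent_activity = View(name="recent_activity", version=1, attributes=[])
purchase_history = View(name="purchase_history", version=1, attributes=[])

# A service bundles related views behind a single retrieval point:
checkout_service = Service(
    name="checkout_personalization",
    views=[recent_activity, purchase_history],
)
```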

The entity here is typically the user, which may be the `domain_userid` or other Snowplow identifier fields, such as the logged in `user_id`.

:::info
For now, only the `domain_userid` can be used, but we will shortly extend support to all Snowplow identifiers.
Collaborator

does this also apply to the stream source?

Finally, the `Attributes` table is generated, which is a drop-and-recompute table, fully updated each time an incremental update runs. This is possible without much effort because the data is already pre-aggregated at a daily level.
![](../images/batch_engine_data_models.png)

## Variables
Collaborator

this section should go higher on the page, under Generating the dbt project


Signals allows users to enhance their applications by aggregating user attributes and providing near real-time visibility into customer behavior. With seamless access to user history, it simplifies creating personalized, intelligent experiences.

### Sources
Collaborator

@mscwilson mscwilson May 21, 2025

actually put these bits under a subheading


Signals components

The core components are Attributes, Sources, Views, and Services.

You will need to create a new Python project that imports the Signals SDK. Configure Signals by defining these components and deploying them to the Profiles Store. You can then pull aggregated attributes, using the Signals SDK, to use in your applications.

[diagram like this - is this right?]

signals_components
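To make that workflow concrete, a hypothetical end-to-end sketch: connect, define a component, deploy it to the Profiles Store, then pull attributes. Every class and method name here (`Signals`, `apply`, `get_online_attributes`) is an assumption, not the documented API.

```python
from snowplow_signals import Attribute, Event, Signals, View  # assumed imports

# Connect with the four values from the setup docs:
sp_signals = Signals(
    api_url="https://YOUR_API_URL",
    api_key="YOUR_API_KEY",
    api_key_id="YOUR_API_KEY_ID",
    org_id="YOUR_ORG_ID",
)

# Define an attribute and group it into a view:
page_view_count = Attribute(
    name="page_view_count",
    type="int32",
    events=[Event(name="page_view")],  # assumed Event signature
    aggregation="counter",
)
user_view = View(name="user_engagement", version=1, attributes=[page_view_count])

# Deploy the definitions to the Profiles Store:
sp_signals.apply([user_view])

# Later, from your application, pull the aggregated attributes:
attributes = sp_signals.get_online_attributes(user_view, "some-domain-userid")
```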


Snowplow Signals is a personalization engine built on Snowplow’s behavioral data pipeline. The Profile API, hosted in your BDP cloud, allows you to create, manage, and access user attributes using the Signals SDKs.

![](./images/signals.png)
Collaborator

@mscwilson mscwilson May 21, 2025

move this diagram into the new "Signals components" subheading below, replace it with the fancy official one that matches the docs homepage architecture diagram

@johnmicahreid johnmicahreid requested review from miike and jethron May 22, 2025 09:10
matus-tomlein and others added 6 commits May 23, 2025 14:29
Co-authored-by: Miranda Wilson <miranda@snowplow.io>
Co-authored-by: Miranda Wilson <miranda@snowplow.io>
Co-authored-by: Miranda Wilson <miranda@snowplow.io>
Labels
do not merge: Flag to denote an Issue or PR which should not yet be merged (usually pending a release)
8 participants