Streaming Data Transform is an essential tool for professionals working with the Salesforce Data Cloud. It allows Data Cloud developers, architects, and engineers to clean, enhance, and analyze streaming data as it flows into the data infrastructure. Whether you're building data pipelines, optimizing real-time processing, or curating custom datasets, Streaming Data Transform delivers unmatched accuracy and flexibility.
This blog will provide a deep-dive into Streaming Data Transform, explain its core capabilities, and break down SQL operations that help unlock the power of real-time transformation. We'll also explore use cases with examples to help you build smarter and faster data pipelines.
At its core, Streaming Data Transform processes incoming event-driven data in real-time. It's an on-the-fly data transformation tool for structured and semi-structured data, leveraging SQL to curate data for immediate use in downstream applications.
The platform simplifies creating transformations using a GUI (Graphical User Interface) and offers rich SQL syntax capabilities. It ensures that raw data transforms into actionable insights as it enters the Data Cloud. Think of it as both your clean-up crew and a data magician, merging datasets, creating dynamic lists, and optimizing performance without the need for batch processing or manual intervention.
A crucial limitation to keep in mind when working with Streaming Data Transform is that you cannot use more than two Data Lake Objects (DLOs) at a time in the Builder. If you need to merge more than two DLOs, you'll need to write the SQL manually, ensuring that the sources share a common key and map to a single primary target DLO.
Example:
If you want to merge customer data and purchase history stored across three different sources, write SQL to combine them under one logical DLO. This ensures streamlined integration and reduces redundancy in your data pipeline.
SQL tip for combining DLOs effectively:
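As an illustration (the DLO and field names below are hypothetical), you could join a customer DLO with a purchase-history DLO on a shared customer key so that the merged rows feed one target DLO:

```sql
-- Hypothetical DLOs: customers__dll and purchase_history__dll,
-- joined on a shared customer key to populate a single target DLO.
SELECT
  customers__dll.customer_id__c as customer_id__c,
  customers__dll.full_name__c as full_name__c,
  purchase_history__dll.order_id__c as order_id__c,
  purchase_history__dll.order_total__c as order_total__c
FROM customers__dll
JOIN purchase_history__dll
  ON customers__dll.customer_id__c = purchase_history__dll.customer_id__c
```

The key point is that both sources share customer_id__c, so the combined result maps cleanly onto one target DLO instead of leaving redundant copies of the data across sources.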
Streaming Data Transform allows for several operations that simplify and optimize your data workflow. Here's what you can achieve using SQL-powered transformations:
When working with streaming data, the query structure typically includes a SELECT statement, a FROM clause referencing a Data Lake Object (DLO), and WHERE conditions to filter the incoming data. Here's a basic example of a streaming data query:
```sql
SELECT event_time, user_id, action_type
FROM event_stream
WHERE action_type = 'purchase'
  AND event_time >= CURRENT_TIMESTAMP - INTERVAL '1 HOUR';
```
Explanation of Key Components:

- SELECT clause: event_time, user_id, and action_type are specified to extract specific columns from the incoming data stream. These fields represent the most relevant pieces of information for the use case, ensuring the query returns meaningful results.
- FROM clause: the query reads from event_stream, which represents the dataset being processed. This ensures that data is continuously fetched from the live stream, enabling real-time transformations and analysis.
- WHERE clause: the filter keeps only rows where action_type is equal to 'purchase' and limits the time range to events occurring in the last hour. This filtering is essential for narrowing the scope of analysis, reducing noise, and focusing on actionable insights.

The SQL foundation for Streaming Data Transform is built around fundamental functions and operations. Below, we explore examples of powerful SQL features and their output.
Imagine you have a dataset where JSON data is stored in a single column. Here's a simplified version of what that might look like:
Dataset: customers_with_orders__dll
| order_id__c | json_data__c |
|---|---|
| 1 | {"customer": "John Doe", "items": [{"name": "Widget", "price": 25.00}]} |
| 2 | {"customer": "Jane Smith", "items": [{"name": "Gadget", "price": 15.00}]} |
After Using JSON_TABLE
By using JSON_TABLE, you can extract and structure this JSON data into separate columns, making it easier to query and analyze.
Transformed Dataset
| order_id__c | customer__c | item_name__c | item_price__c |
|---|---|---|---|
| 1 | John Doe | Widget | 25.00 |
| 2 | Jane Smith | Gadget | 15.00 |
SQL Query Using JSON_TABLE
```sql
SELECT
  customers_with_orders__dll.order_id__c as order_id__c,
  customers.customer as customer__c,
  items.item_name as item_name__c,
  items.item_price as item_price__c
FROM customers_with_orders__dll,
  JSON_TABLE(
    customers_with_orders__dll.json_data__c, '$' COLUMNS (
      customer TEXT PATH '$.customer'
    )
  ) customers,
  JSON_TABLE(
    customers_with_orders__dll.json_data__c, '$.items[*]' COLUMNS (
      item_name TEXT PATH '$.name',
      item_price NUMBER PATH '$.price'
    )
  ) items
```
This query extracts the customer, item_name, and item_price from the JSON data, creating a more structured and query-friendly dataset.
Let's look at an example of how you can use the CONCAT() function to concatenate fields in a dataset.
Example Dataset
Imagine you have a dataset of customer information with separate fields for first and last names:
| customer_id | first_name | last_name |
|---|---|---|
| 1 | John | Doe |
| 2 | Jane | Smith |
Using CONCAT() to Combine Fields
You can use the CONCAT() function to combine the first_name and last_name fields into a single full_name field.
SQL Query
```sql
SELECT
  customers__dll.customer_id as customer_id__c,
  CONCAT(first_name, ' ', last_name) AS full_name__c
FROM customers__dll
```
Resulting Dataset
| customer_id__c | full_name__c |
|---|---|
| 1 | John Doe |
| 2 | Jane Smith |
In this example, the CONCAT() function is used to merge the first_name and last_name fields, with a space in between, to create a new full_name field. This makes it easier to display or use the full name in reports or applications.
The COALESCE() function is used to return the first non-null value from a list of arguments. It's particularly useful for handling missing data.
Example Dataset
Let's say you have a dataset of customer contact information where some customers have a phone number, others have an email, and some have both:
| customer_id | phone_number | email |
|---|---|---|
| 1 | 123-456-7890 | NULL |
| 2 | NULL | jane@example.com |
| 3 | 987-654-3210 | john@example.com |
Using COALESCE() to Select the First Non-Null Contact Method
You can use the COALESCE() function to create a new field that selects the first available contact method, either phone or email.
SQL Query
```sql
SELECT
  customers__dll.customer_id as customer_id__c,
  COALESCE(phone_number, email) AS primary_contact__c
FROM customers__dll
```
Resulting Dataset
| customer_id__c | primary_contact__c |
|---|---|
| 1 | 123-456-7890 |
| 2 | jane@example.com |
| 3 | 987-654-3210 |
In this example, the COALESCE() function checks each row and returns the phone_number if it's available; otherwise, it returns the email. This way, you ensure that the primary_contact field always has a value, even if one of the contact methods is missing.
Let's explore how to use the LOWER() and UPPER() functions to change the case of text in a dataset.
Example Dataset
Suppose you have a dataset with product names that are inconsistently cased:
| product_id | product_name |
|---|---|
| 1 | Widget |
| 2 | gadget |
| 3 | THINGAMABOB |
Using LOWER() and UPPER() to Standardize Case
You can use the LOWER() function to convert all product names to lowercase, or the UPPER() function to convert them to uppercase, depending on your needs.
SQL Query to Convert to Lowercase
```sql
SELECT
  products__dll.product_id as product_id__c,
  LOWER(product_name) AS product_name_lower__c
FROM products__dll
```
Resulting Dataset (Lowercase)
| product_id__c | product_name_lower__c |
|---|---|
| 1 | widget |
| 2 | gadget |
| 3 | thingamabob |
SQL Query to Convert to Uppercase
```sql
SELECT
  products__dll.product_id as product_id__c,
  UPPER(product_name) AS product_name_upper__c
FROM products__dll
```
Resulting Dataset (Uppercase)
| product_id__c | product_name_upper__c |
|---|---|
| 1 | WIDGET |
| 2 | GADGET |
| 3 | THINGAMABOB |
In these examples, the LOWER() function is used to ensure all product names are in lowercase, while the UPPER() function converts them to uppercase. This can be particularly useful for standardizing data for display or comparison purposes.
The TO_DATETIME() function is used to convert a date string into a datetime format, which is essential for performing date-based operations and comparisons.
Example Dataset
Consider a dataset where dates are stored as strings:
| event_id | event_date_string |
|---|---|
| 1 | 2025-01-01 |
| 2 | 2025-02-15 |
| 3 | 2025-03-30 |
Using TO_DATETIME() to Convert Strings to Datetime
You can use the TO_DATETIME() function to convert these string representations into a datetime format.
SQL Query
```sql
SELECT
  events__dll.event_id as event_id__c,
  TO_DATETIME(event_date_string) AS event_date__c
FROM events__dll
```
Resulting Dataset
| event_id__c | event_date__c |
|---|---|
| 1 | 2025-01-01 00:00:00 |
| 2 | 2025-02-15 00:00:00 |
| 3 | 2025-03-30 00:00:00 |
In this example, the TO_DATETIME() function is used to convert the event_date_string field into a datetime format, allowing for more complex date manipulations and queries.
When working with null values in data, functions like ISNULL(), IFNULL(), NULLIF(), and IS NOT NULL play a crucial role in handling missing or undefined data effectively. Let's explore how these functions work with examples for each.
ISNULL()
The ISNULL() function checks whether an expression is NULL and returns a boolean value.
Example Dataset
| customer_id | phone_number |
|---|---|
| 1 | 123-456-7890 |
| 2 | NULL |
| 3 | 987-654-3210 |
SQL Query Using ISNULL()
```sql
SELECT
  customers__dll.customer_id as customer_id__c,
  ISNULL(phone_number) AS is_phone_null__c
FROM customers__dll
```
Resulting Dataset
| customer_id__c | is_phone_null__c |
|---|---|
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
IFNULL()
The IFNULL() function returns a specified value if the expression is NULL; otherwise, it returns the expression.
SQL Query Using IFNULL()
```sql
SELECT
  customers__dll.customer_id as customer_id__c,
  IFNULL(phone_number, 'No Phone') AS phone_number__c
FROM customers__dll
```
Resulting Dataset
| customer_id__c | phone_number__c |
|---|---|
| 1 | 123-456-7890 |
| 2 | No Phone |
| 3 | 987-654-3210 |
NULLIF()
The NULLIF() function returns NULL if two expressions are equal; otherwise, it returns the first expression.
Example Dataset
| product_id | current_price | previous_price |
|---|---|---|
| 1 | 100 | 100 |
| 2 | 150 | 120 |
| 3 | 200 | 200 |
SQL Query Using NULLIF()
```sql
SELECT
  products__dll.product_id as product_id__c,
  NULLIF(current_price, previous_price) AS price_change__c
FROM products__dll
```
Resulting Dataset
| product_id__c | price_change__c |
|---|---|
| 1 | NULL |
| 2 | 150 |
| 3 | NULL |
IS NOT NULL
The IS NOT NULL condition checks whether an expression is not NULL.
SQL Query Using IS NOT NULL
```sql
SELECT
  customers__dll.customer_id as customer_id__c,
  customers__dll.phone_number as phone_number__c
FROM customers__dll
WHERE phone_number IS NOT NULL
```
Resulting Dataset
| customer_id__c | phone_number__c |
|---|---|
| 1 | 123-456-7890 |
| 3 | 987-654-3210 |
These functions are useful for handling NULL values in different scenarios, allowing you to manage missing or undefined data effectively.
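These null-handling functions can also be combined. As a sketch (treating empty strings as missing data is an assumption about the incoming feed, and the TRIM() call is illustrative), nesting NULLIF() inside IFNULL() lets you handle both NULL and empty-string phone numbers in one expression:

```sql
-- NULLIF turns '' into NULL; IFNULL then substitutes the fallback value.
-- A NULL phone_number passes through NULLIF unchanged and is also replaced.
SELECT
  customers__dll.customer_id as customer_id__c,
  IFNULL(NULLIF(TRIM(phone_number), ''), 'No Phone') AS phone_number__c
FROM customers__dll
```

This pattern is handy for streaming sources that send empty strings where you would expect NULLs, since both cases end up with the same fallback value.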
Suppose you have a students dataset with a numeric score for each student. You can use the CASE WHEN statement to assign letter grades based on the score.
SQL Query
```sql
SELECT
  student_id,
  score,
  CASE
    WHEN score >= 90 THEN 'A'
    WHEN score >= 80 THEN 'B'
    WHEN score >= 70 THEN 'C'
    WHEN score >= 60 THEN 'D'
    ELSE 'F'
  END AS grade
FROM students
```
Resulting Dataset
| student_id | score | grade |
|---|---|---|
| 1 | 85 | B |
| 2 | 72 | C |
| 3 | 90 | A |
| 4 | 65 | D |
4 | 65 | D |
In this example, the CASE WHEN statement evaluates each student's score and assigns a grade based on the specified conditions. If none of the conditions are met, it defaults to assigning an 'F'. This approach is useful for categorizing or transforming data based on specific criteria.
Let's visualize the data transformation process by showing the structure and content of the tables before and after applying the UNION operation.
Before UNION: Source Table
Table: Contacts_with_multiple_phone_numbers__dll
| contactid__c | mobilenumber__c | worknumber__c | homenumber__c |
|---|---|---|---|
| 1 | 123-456-7890 | 234-567-8901 | 345-678-9012 |
| 2 | 456-789-0123 | 567-890-1234 | 678-901-2345 |
| 3 | 789-012-3456 | 890-123-4567 | 901-234-5678 |
After UNION: Target Table
Table: PhoneNumbers
| ID__c | ContactID__c | Phone_Number__c | Phone_Number_Type__c |
|---|---|---|---|
| Mobile_1 | 1 | 123-456-7890 | Mobile |
| Work_1 | 1 | 234-567-8901 | Work |
| Home_1 | 1 | 345-678-9012 | Home |
| Mobile_2 | 2 | 456-789-0123 | Mobile |
| Work_2 | 2 | 567-890-1234 | Work |
| Home_2 | 2 | 678-901-2345 | Home |
| Mobile_3 | 3 | 789-012-3456 | Mobile |
| Work_3 | 3 | 890-123-4567 | Work |
| Home_3 | 3 | 901-234-5678 | Home |
Explanation
Before UNION: The source table Contacts_with_multiple_phone_numbers__dll contains multiple phone numbers in each row, categorized by type (mobile, work, home) for each contact.
After UNION: The target table PhoneNumbers is the result of the UNION operation, where each phone number is extracted into its own row. The ID__c column uniquely identifies each phone number entry by combining the type and contact ID. The Phone_Number_Type__c column indicates the type of phone number. This transformation normalizes the data, making it easier to manage and query.
Let's break down the data transformation process using the UNION operator to normalize phone numbers from the Contacts_with_multiple_phone_numbers__dll table into the PhoneNumbers table. Each SELECT statement extracts a different type of phone number: mobile, work, and home.
SQL Query
```sql
SELECT
  CONCAT('Mobile_', Contacts_with_multiple_phone_numbers__dll.contactid__c) as ID__c,
  Contacts_with_multiple_phone_numbers__dll.contactid__c as ContactID__c,
  TRIM(Contacts_with_multiple_phone_numbers__dll.mobilenumber__c) as Phone_Number__c,
  'Mobile' as Phone_Number_Type__c
FROM Contacts_with_multiple_phone_numbers__dll
WHERE Contacts_with_multiple_phone_numbers__dll.mobilenumber__c IS NOT NULL
  AND Contacts_with_multiple_phone_numbers__dll.mobilenumber__c <> ''
UNION
SELECT
  CONCAT('Work_', Contacts_with_multiple_phone_numbers__dll.contactid__c) as ID__c,
  Contacts_with_multiple_phone_numbers__dll.contactid__c as ContactID__c,
  TRIM(Contacts_with_multiple_phone_numbers__dll.worknumber__c) as Phone_Number__c,
  'Work' as Phone_Number_Type__c
FROM Contacts_with_multiple_phone_numbers__dll
WHERE Contacts_with_multiple_phone_numbers__dll.worknumber__c IS NOT NULL
  AND Contacts_with_multiple_phone_numbers__dll.worknumber__c <> ''
UNION
SELECT
  CONCAT('Home_', Contacts_with_multiple_phone_numbers__dll.contactid__c) as ID__c,
  Contacts_with_multiple_phone_numbers__dll.contactid__c as ContactID__c,
  TRIM(Contacts_with_multiple_phone_numbers__dll.homenumber__c) as Phone_Number__c,
  'Home' as Phone_Number_Type__c
FROM Contacts_with_multiple_phone_numbers__dll
WHERE Contacts_with_multiple_phone_numbers__dll.homenumber__c IS NOT NULL
  AND Contacts_with_multiple_phone_numbers__dll.homenumber__c <> ''
```
A few closing best practices:

- Use functions like TRIM(), ISNULL(), and TO_DATE() to standardize input data as it arrives.
- Use UNION to normalize wide, multi-column records into rows for better scalability.
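Putting the standardization functions together, a transform that cleans data on arrival might look like this sketch (the source DLO raw_signups__dll and its fields are hypothetical):

```sql
-- Hypothetical DLO raw_signups__dll: standardize fields as they stream in.
SELECT
  raw_signups__dll.signup_id__c as signup_id__c,
  TRIM(LOWER(raw_signups__dll.email__c)) as email__c,        -- normalize case and whitespace
  IFNULL(raw_signups__dll.phone__c, 'No Phone') as phone__c, -- fill missing values
  TO_DATE(raw_signups__dll.signup_date_string__c) as signup_date__c -- string to date
FROM raw_signups__dll
```

Because the cleanup happens as records enter the Data Cloud, every downstream consumer sees consistent casing, no missing contact values, and proper date types without any batch post-processing.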