
Unlocking the Power of Streaming Data Transform in Data Cloud

Streaming Data Transform is an essential tool for professionals working with the Salesforce Data Cloud. It allows Data Cloud developers, architects, and engineers to clean, enhance, and analyze streaming data as it flows into the data infrastructure. Whether you're building data pipelines, optimizing real-time processing, or curating custom datasets, Streaming Data Transform delivers unmatched accuracy and flexibility.

This blog will provide a deep dive into Streaming Data Transform, explain its core capabilities, and break down the SQL operations that help unlock the power of real-time transformation. We'll also explore use cases with examples to help you build smarter and faster data pipelines.

What Is Streaming Data Transform?

At its core, Streaming Data Transform processes incoming event-driven data in real time. It's an on-the-fly transformation tool for structured and semi-structured data, leveraging SQL to curate data for immediate use in downstream applications.

The platform simplifies creating transformations using a GUI (Graphical User Interface) and offers rich SQL syntax capabilities. It ensures that raw data transforms into actionable insights as it enters the Data Cloud. Think of it as both your clean-up crew and a data magician, merging datasets, creating dynamic lists, and optimizing performance without the need for batch processing or manual intervention.

Key Rule for Using DLOs in Streaming Data Transform

A crucial limitation to keep in mind when working with Streaming Data Transform is that you cannot use more than two Data Lake Objects (DLOs) at a time in the Builder. If you need to merge more than two DLOs, you'll need to write SQL manually, making sure the sources share common keys and map to a single target DLO.

Example:

If you want to merge customer data and purchase history stored across three different sources, write SQL to combine them under one logical DLO. This ensures streamlined integration and reduces redundancy in your data pipeline.

SQL tips for combining DLOs effectively (a sketch follows the list):

  • Match primary keys across datasets.
  • Define target relationships clearly.
  • Avoid nested operations that complicate transformations.
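
As a minimal sketch of this pattern, assume three hypothetical DLOs (customers__dll, orders__dll, and order_items__dll) that share customer and order keys; the table and field names here are illustrative, not part of any standard schema:

SELECT
  customers__dll.customer_id__c as customer_id__c,
  customers__dll.customer_name__c as customer_name__c,
  orders__dll.order_id__c as order_id__c,
  order_items__dll.item_name__c as item_name__c
FROM customers__dll
-- flat joins on the shared primary keys; no nested subqueries
JOIN orders__dll
  ON orders__dll.customer_id__c = customers__dll.customer_id__c
JOIN order_items__dll
  ON order_items__dll.order_id__c = orders__dll.order_id__c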

Core Operations in Streaming Data Transform

Streaming Data Transform allows for several operations that simplify and optimize your data workflow. Here's what you can achieve using SQL-powered transformations (a combined sketch follows the list):

  • Clean Data – Perform operations like removing null values or trimming unnecessary characters.
  • Format Data – Standardize date fields, capitalize names, or change values into readable text.
  • Merge Data – Combine multiple datasets using UNION or join conditions.
  • Analyze in Real Time – Employ conditional logic and aggregate functions to extract key insights immediately.
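
To see several of these operations in one place, here's a sketch against a hypothetical raw_events__dll; all table and field names are assumptions for illustration:

SELECT
  -- Clean: strip stray whitespace from the identifier
  TRIM(raw_events__dll.user_id__c) as user_id__c,
  -- Format: standardize casing for readable, comparable text
  UPPER(raw_events__dll.action_type__c) as action_type__c,
  -- Analyze: conditional logic evaluated as each record arrives
  CASE
    WHEN raw_events__dll.amount__c >= 100 THEN 'High Value'
    ELSE 'Standard'
  END as order_tier__c
FROM raw_events__dll
-- Clean: drop records with no user attached
WHERE raw_events__dll.user_id__c IS NOT NULL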

Example of Streaming Data Query Structure

When working with streaming data, the query structure typically includes a SELECT clause, a Data Lake Object (DLO) as the source, and WHERE conditions to filter the incoming data. Here's a basic example of a streaming data query:


SELECT event_time, user_id, action_type
FROM event_stream
WHERE action_type = 'purchase'
  AND event_time >= CURRENT_TIMESTAMP - INTERVAL '1 HOUR';

Explanation of Key Components:

  1. Fields (SELECT Clause): The fields event_time, user_id, and action_type are specified in the SELECT clause to extract specific columns from the incoming data stream. These fields represent the most relevant pieces of information for the use case, ensuring the query returns meaningful results.
  2. Data Lake Object (DLO): The FROM clause defines the streaming source, event_stream, which represents the Data Lake Object being processed. This source ensures that data is continuously fetched from the live stream, enabling real-time transformations and analysis.
  3. WHERE Condition: The WHERE clause filters the data to include only records where action_type is equal to 'purchase' and limits the time range to events occurring in the last hour. This filtering is essential for narrowing the scope of analysis, reducing noise, and focusing on actionable insights.

SQL Structure for Streaming Data Transform

The SQL foundation for Streaming Data Transform is built around fundamental functions and operations. Below, we explore examples of powerful SQL features and their output.

1. JSON_TABLE Function

Imagine you have a dataset where JSON data is stored in a single column. Here's a simplified version of what that might look like:

Dataset: customers_with_orders__dll

order_id__c  json_data__c
1            {"customer": "John Doe", "items": [{"name": "Widget", "price": 25.00}]}
2            {"customer": "Jane Smith", "items": [{"name": "Gadget", "price": 15.00}]}

After Using JSON_TABLE

By using JSON_TABLE, you can extract and structure this JSON data into separate columns, making it easier to query and analyze.

Transformed Dataset

order_id__c  customer__c  item_name__c  item_price__c
1            John Doe     Widget        25.00
2            Jane Smith   Gadget        15.00

SQL Query Using JSON_TABLE


SELECT
  customers_with_orders__dll.order_id__c as order_id__c,
  customers.customer as customer__c,
  items.item_name as item_name__c,
  items.item_price as item_price__c
FROM customers_with_orders__dll,
JSON_TABLE(
  customers_with_orders__dll.json_data__c, '$' COLUMNS (
    customer TEXT PATH '$.customer'
  )
) customers,
JSON_TABLE(
  customers_with_orders__dll.json_data__c, '$.items[*]' COLUMNS (
    item_name TEXT PATH '$.name',
    item_price NUMBER PATH '$.price'
  )
) items

This query extracts the customer, item_name, and item_price from the JSON data, creating a more structured and query-friendly dataset.

2. Concatenate Fields

Let's look at an example of how you can use the CONCAT() function to concatenate fields in a dataset.

Example Dataset

Imagine you have a dataset of customer information with separate fields for first and last names:

customer_id  first_name  last_name
1            John        Doe
2            Jane        Smith

Using CONCAT() to Combine Fields

You can use the CONCAT() function to combine the first_name and last_name fields into a single full_name field.

SQL Query


SELECT
  customers__dll.customer_id__c as customer_id__c,
  CONCAT(first_name, ' ', last_name) AS full_name__c
FROM customers__dll

Resulting Dataset

customer_id__c  full_name__c
1               John Doe
2               Jane Smith

In this example, the CONCAT() function is used to merge the first_name and last_name fields, with a space in between, to create a new full_name field. This makes it easier to display or use the full name in reports or applications.
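
One caution: in many SQL engines, CONCAT() propagates NULL, so a missing last_name can blank out the whole result. A defensive sketch, assuming your environment behaves this way, wraps each field in COALESCE():

SELECT
  customers__dll.customer_id__c as customer_id__c,
  -- COALESCE substitutes an empty string so a missing last name
  -- doesn't turn the whole result NULL; TRIM removes the stray space
  TRIM(CONCAT(COALESCE(first_name, ''), ' ', COALESCE(last_name, ''))) AS full_name__c
FROM customers__dll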

3. Handle Null Values

The COALESCE() function is used to return the first non-null value from a list of arguments. It's particularly useful for handling missing data.

Example Dataset

Let's say you have a dataset of customer contact information where some customers have a phone number, others have an email, and some have both:

customer_id  phone_number  email
1            123-456-7890  NULL
2            NULL          jane@example.com
3            987-654-3210  john@example.com

Using COALESCE() to Select the First Non-Null Contact Method

You can use the COALESCE() function to create a new field that selects the first available contact method, either phone or email.

SQL Query


SELECT 
  customers__dll.customer_id as customer_id__c, 
  COALESCE(phone_number, email) AS primary_contact__c
FROM customers__dll

Resulting Dataset

customer_id__c  primary_contact__c
1               123-456-7890
2               jane@example.com
3               987-654-3210

In this example, the COALESCE() function checks each row and returns the phone_number if it's available; otherwise, it returns the email. This way, you ensure that the primary_contact field always has a value, even if one of the contact methods is missing.
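
Because COALESCE() accepts a whole list of arguments, you can chain several fallbacks and end with a literal default. A quick sketch (the 'No Contact Info' label is just an illustrative choice):

SELECT
  customers__dll.customer_id as customer_id__c,
  -- first non-null among phone, then email, then a fixed label
  COALESCE(phone_number, email, 'No Contact Info') AS primary_contact__c
FROM customers__dll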

4. Change Case

Let's explore how to use the LOWER() and UPPER() functions to change the case of text in a dataset.

Example Dataset

Suppose you have a dataset with product names that are inconsistently cased:

product_id  product_name
1           Widget
2           gadget
3           THINGAMABOB

Using LOWER() and UPPER() to Standardize Case

You can use the LOWER() function to convert all product names to lowercase, or the UPPER() function to convert them to uppercase, depending on your needs.

SQL Query to Convert to Lowercase


SELECT 
  products__dll.product_id as product_id__c, 
  LOWER(product_name) AS product_name_lower__c
FROM products__dll

Resulting Dataset (Lowercase)

product_id__c  product_name_lower__c
1              widget
2              gadget
3              thingamabob

SQL Query to Convert to Uppercase


SELECT 
  products__dll.product_id as product_id__c, 
  UPPER(product_name) AS product_name_upper__c
FROM products__dll

Resulting Dataset (Uppercase)

product_id__c  product_name_upper__c
1              WIDGET
2              GADGET
3              THINGAMABOB

In these examples, the LOWER() function is used to ensure all product names are in lowercase, while the UPPER() function converts them to uppercase. This can be particularly useful for standardizing data for display or comparison purposes.
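
The same functions also work inside a WHERE clause to make matching case-insensitive. A short sketch using the dataset above:

SELECT
  products__dll.product_id as product_id__c,
  products__dll.product_name as product_name__c
FROM products__dll
-- lowercasing the column makes 'Widget', 'WIDGET', and 'widget' all match
WHERE LOWER(product_name) = 'widget'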

5. Convert Data Types

The TO_DATETIME() function is used to convert a date string into a datetime format, which is essential for performing date-based operations and comparisons.

Example Dataset

Consider a dataset where dates are stored as strings:

event_id  event_date_string
1         2025-01-01
2         2025-02-15
3         2025-03-30

Using TO_DATETIME() to Convert Strings to Datetime

You can use the TO_DATETIME() function to convert these string representations into a datetime format.

SQL Query


SELECT 
  events__dll.event_id as event_id__c, 
  TO_DATETIME(event_date_string) AS event_date__c
FROM events__dll

Resulting Dataset

event_id__c  event_date__c
1            2025-01-01 00:00:00
2            2025-02-15 00:00:00
3            2025-03-30 00:00:00

In this example, the TO_DATETIME() function is used to convert the event_date_string field into a datetime format, allowing for more complex date manipulations and queries.
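
Once converted, the value behaves as a true datetime, so range filters work reliably. A sketch that keeps only events from February 2025 onward (the cutoff is illustrative):

SELECT
  events__dll.event_id as event_id__c,
  TO_DATETIME(event_date_string) AS event_date__c
FROM events__dll
-- comparing real datetimes avoids the pitfalls of string comparison
WHERE TO_DATETIME(event_date_string) >= TO_DATETIME('2025-02-01')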

6. Handle Logical Conditions

When working with null values in data, constructs like ISNULL(), IFNULL(), NULLIF(), and the IS NOT NULL condition play a crucial role in handling missing or undefined data effectively. Let's explore how each of them works with an example.

ISNULL()

The ISNULL() function checks whether an expression is NULL and returns a boolean value (1 when the expression is NULL, 0 otherwise).

Example Dataset

customer_id  phone_number
1            123-456-7890
2            NULL
3            987-654-3210

SQL Query Using ISNULL()


SELECT 
  customers__dll.customer_id as customer_id__c, 
  ISNULL(phone_number) AS is_phone_null__c
FROM customers__dll

Resulting Dataset

customer_id__c  is_phone_null__c
1               0
2               1
3               0

IFNULL()

The IFNULL() function returns a specified value if the expression is NULL; otherwise, it returns the expression.

SQL Query Using IFNULL()


SELECT 
  customers__dll.customer_id as customer_id__c, 
  IFNULL(phone_number, 'No Phone') AS phone_number__c
FROM customers__dll

Resulting Dataset

customer_id__c  phone_number__c
1               123-456-7890
2               No Phone
3               987-654-3210

NULLIF()

The NULLIF() function returns NULL if two expressions are equal; otherwise, it returns the first expression.

Example Dataset

product_id  current_price  previous_price
1           100            100
2           150            120
3           200            200

SQL Query Using NULLIF()


SELECT 
  products__dll.product_id as product_id__c, 
  NULLIF(current_price, previous_price) AS price_change__c
FROM products__dll

Resulting Dataset

product_id__c  price_change__c
1              NULL
2              150
3              NULL

IS NOT NULL

The IS NOT NULL condition checks whether an expression is not NULL.

SQL Query Using IS NOT NULL


SELECT 
  customers__dll.customer_id as customer_id__c, 
  customers__dll.phone_number as phone_number__c
FROM customers__dll
WHERE phone_number IS NOT NULL

Resulting Dataset

customer_id__c  phone_number__c
1               123-456-7890
3               987-654-3210

These functions are useful for handling NULL values in different scenarios, allowing you to manage missing or undefined data effectively.
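
They also compose naturally. For instance, NULLIF() and IFNULL() together can report an unchanged price as an explicit 0 instead of NULL; a sketch using the earlier products__dll data:

SELECT
  products__dll.product_id as product_id__c,
  -- NULLIF yields NULL when the price is unchanged;
  -- IFNULL then replaces that NULL with an explicit 0
  IFNULL(NULLIF(current_price, previous_price), 0) AS price_change__c
FROM products__dll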

7. CASE WHEN

Suppose you have a dataset of student scores. You can use the CASE WHEN statement to assign a letter grade based on each score.

SQL Query


SELECT 
  student_id, 
  score,
  CASE 
    WHEN score >= 90 THEN 'A'
    WHEN score >= 80 THEN 'B'
    WHEN score >= 70 THEN 'C'
    WHEN score >= 60 THEN 'D'
    ELSE 'F'
  END AS grade
FROM students

Resulting Dataset

student_id  score  grade
1           85     B
2           72     C
3           90     A
4           65     D

In this example, the CASE WHEN statement evaluates each student's score and assigns a grade based on the specified conditions. If none of the conditions are met, it defaults to assigning an 'F'. This approach is useful for categorizing or transforming data based on specific criteria.
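
The same pattern applies directly to streaming records. A sketch that tiers incoming purchase events by amount (the DLO, fields, and thresholds are assumptions for illustration):

SELECT
  purchase_events__dll.event_id__c as event_id__c,
  -- tier each purchase as it streams in
  CASE
    WHEN purchase_events__dll.amount__c >= 500 THEN 'VIP'
    WHEN purchase_events__dll.amount__c >= 100 THEN 'Preferred'
    ELSE 'Standard'
  END AS purchase_tier__c
FROM purchase_events__dll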

8. Perform Union Operations

Let's visualize the data transformation process by showing the structure and content of the tables before and after applying the UNION operation.

Before UNION: Source Table

Table: Contacts_with_multiple_phone_numbers__dll

contactid__c  mobilenumber__c  worknumber__c  homenumber__c
1             123-456-7890     234-567-8901   345-678-9012
2             456-789-0123     567-890-1234   678-901-2345
3             789-012-3456     890-123-4567   901-234-5678

After UNION: Target Table

Table: PhoneNumbers

ID__c     ContactID__c  Phone_Number__c  Phone_Number_Type__c
Mobile_1  1             123-456-7890     Mobile
Work_1    1             234-567-8901     Work
Home_1    1             345-678-9012     Home
Mobile_2  2             456-789-0123     Mobile
Work_2    2             567-890-1234     Work
Home_2    2             678-901-2345     Home
Mobile_3  3             789-012-3456     Mobile
Work_3    3             890-123-4567     Work
Home_3    3             901-234-5678     Home

Explanation

Before UNION: The source table Contacts_with_multiple_phone_numbers__dll contains multiple phone numbers in each row, categorized by type (mobile, work, home) for each contact.

After UNION: The target table PhoneNumbers is the result of the UNION operation, where each phone number is extracted into its own row. The ID__c column uniquely identifies each phone number entry by combining the type and contact ID. The Phone_Number_Type__c column indicates the type of phone number. This transformation normalizes the data, making it easier to manage and query.

Let's break down the data transformation process using the UNION operator to normalize phone numbers from the Contacts_with_multiple_phone_numbers__dll table into the PhoneNumbers table. Each SELECT statement extracts a different type of phone number: mobile, work, and home.

SQL Query


SELECT
  CONCAT('Mobile_', Contacts_with_multiple_phone_numbers__dll.contactid__c) as ID__c,
  Contacts_with_multiple_phone_numbers__dll.contactid__c as ContactID__c,
  TRIM(Contacts_with_multiple_phone_numbers__dll.mobilenumber__c) as Phone_Number__c,
  'Mobile' as Phone_Number_Type__c
FROM Contacts_with_multiple_phone_numbers__dll
WHERE Contacts_with_multiple_phone_numbers__dll.mobilenumber__c IS NOT NULL
  AND TRIM(Contacts_with_multiple_phone_numbers__dll.mobilenumber__c) <> ''

UNION

SELECT
  CONCAT('Work_', Contacts_with_multiple_phone_numbers__dll.contactid__c) as ID__c,
  Contacts_with_multiple_phone_numbers__dll.contactid__c as ContactID__c,
  TRIM(Contacts_with_multiple_phone_numbers__dll.worknumber__c) as Phone_Number__c,
  'Work' as Phone_Number_Type__c
FROM Contacts_with_multiple_phone_numbers__dll
WHERE Contacts_with_multiple_phone_numbers__dll.worknumber__c IS NOT NULL
  AND TRIM(Contacts_with_multiple_phone_numbers__dll.worknumber__c) <> ''

UNION

SELECT
  CONCAT('Home_', Contacts_with_multiple_phone_numbers__dll.contactid__c) as ID__c,
  Contacts_with_multiple_phone_numbers__dll.contactid__c as ContactID__c,
  TRIM(Contacts_with_multiple_phone_numbers__dll.homenumber__c) as Phone_Number__c,
  'Home' as Phone_Number_Type__c
FROM Contacts_with_multiple_phone_numbers__dll
WHERE Contacts_with_multiple_phone_numbers__dll.homenumber__c IS NOT NULL
  AND TRIM(Contacts_with_multiple_phone_numbers__dll.homenumber__c) <> ''

Practical Tips for Leveraging Streaming Data Transform

  • Clean and Prep Early: Use functions like TRIM(), ISNULL(), and TO_DATETIME() to standardize input data as it arrives (a combined sketch follows this list).
  • Optimize Efficiency: Merge similar datasets with UNION for better scalability.
  • Document Complex Logic: Label custom transformations clearly for easy debugging.
  • Collaborate Actively: Share SQL queries within teams to build faster, actionable workflows.
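
As a closing sketch that combines the first two tips, assume a hypothetical raw_signups__dll with untidy incoming fields; the table and field names are illustrative:

SELECT
  raw_signups__dll.signup_id__c as signup_id__c,
  -- clean and prep early: trim stray whitespace on arrival
  TRIM(raw_signups__dll.email__c) as email__c,
  -- standardize the string date into a real datetime
  TO_DATETIME(raw_signups__dll.signup_date_string__c) as signup_date__c
FROM raw_signups__dll
WHERE raw_signups__dll.email__c IS NOT NULL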