The data that we have to work with in Salesforce Marketing Cloud rarely comes in the format that we need to be able to further use it inside journeys, emails, etc. One of the common use cases that repeatedly comes up on Salesforce Stack Exchange is splitting values from a comma-delimited list into separate rows. Unfortunately, in Marketing Cloud, the usual SQL methods such as STRING_SPLIT are not supported, which results in a heavy dependence on makeshift solutions.
Below you will find two SQL-based solutions that I found most reliable, although you need to be warned of a limitation that comes with both of them and is tied to the usage of the CROSS APPLY function: you need to be able to predict what is the maximum number of comma separated values in one field, as it’s not able to dynamically loop through all of them.
Data schema for our use case
I’ve come up with a very simple use case to illustrate the problem: we have a Data Extension with unique SubscriberKey
s and for each subscriber, we have a list of comma-separated OrderIDs
. Here’s a preview of the Data Extension that we’re working with:
SubscriberKey | OrderIDs |
---|---|
Test_1 | apples, bananas, oranges |
Test_2 | potatoes, tomatoes |
And this is the result that we’re after – each OrderID
in a separate row, with it’s corresponding SubscriberKey
:
SubscriberKey | OrderID |
---|---|
Test_1 | apples |
Test_1 | bananas |
Test_1 | oranges |
Test_2 | potatoes |
Test_2 | tomatoes |
Solution 1: Cross Apply with Substring
I’ve adapted this solution from Adam Spriggs’ SFSE answer and its logic is based on the SUBSTRING function which is used to find each of the comma-separated values and split it accordingly. In the below query, you will need to change the following to make it work with your data:
- In the SELECT statement in line 2, add your desired output column names. Note that
SubscriberKey
is the original PK column in the initial DE, while theOrderID
column is the result we want to get from the originalOrderIDs
column - In the SELECT statement in line 5, update the column names with the original PK column name and the name of the column which holds the comma-delimited values
- Do the same for each SELECT statement that follows after that
- In the last line, in the WHERE clause, make sure to again use the desired output colum name, in my case, it’s
OrderID
Also, note that inside the query we are UNION
-ing 6 select statements – this is directly related to the limitation mentioned at the beginning: you need to be able to predict the maximum number of comma-delimited values that can be present in one field. The below solution will work with up to 6 values – if your use case requires more, you’ll need to UNION
additional SELECT
statements and adjust them accordingly.
Solution 2: Cross Apply with XML
This solution comes from Gregory Gifford’s blog and it turns the list of comma-separated values into an XML to later split it into separate rows. In the below query, you will need to change the following to make it work with your data:
- In the SELECT statement in the first line, add all the column names from your original Data Extension
- In the SELECT statement in line 5, replace
OrderIDs
with the name of the column that holds the comma-separated values in your DE - Repeat this for every subsequent SELECT statement
- In the last line, in the WHERE clause, also update the column name with the original column name that holds the comma-separated values
Just like in the previous query, you will see that we have a bunch of UNION
s that determine the number of comma-delimited values that can be processed. If you need to add more than the 9 that we have below, just add another UNION
with an adjusted SELECT
statement.
Performance and ease of use
I’ve tested both queries in Query Studio on a dataset with 1 million records, each record with 4 comma-delimited values in the OrderIDs
field and this is how they performed:
- The Cross Apply with Substring solution took ~2 minutes to process the data
- The Cross Apply with XML solution took ~6 minutes to process the data
That said, the Cross Apply with Substring solution, although more complex to adapt, seems to be more performant, but if you’re not worried about working with large datasets, the Cross Apply with XML is much easier to adjust to your DE schema and works just as well.
Questions? Comments?
Leave a comment below or email me at zuzanna@sfmarketing.cloud.