Splitting a string of comma-separated values in a Data Extension field using SQL

The data that we have to work with in Salesforce Marketing Cloud rarely comes in the format that we need to be able to further use it inside journeys, emails, etc. One of the common use cases that repeatedly comes up on Salesforce Stack Exchange is splitting values from a comma-delimited list into separate rows. Unfortunately, in Marketing Cloud, the usual SQL methods such as STRING_SPLIT are not supported, which results in a heavy dependence on makeshift solutions.

Below you will find two SQL-based solutions that I found most reliable, although you need to be warned of a limitation that comes with both of them and is tied to the usage of the CROSS APPLY function: you need to be able to predict what is the maximum number of comma separated values in one field, as it’s not able to dynamically loop through all of them.

Data schema for our use case

I’ve come up with a very simple use case to illustrate the problem: we have a Data Extension with unique SubscriberKeys and for each subscriber, we have a list of comma-separated OrderIDs. Here’s a preview of the Data Extension that we’re working with:

SubscriberKeyOrderIDs
Test_1apples, bananas, oranges
Test_2potatoes, tomatoes

And this is the result that we’re after – each OrderID in a separate row, with it’s corresponding SubscriberKey:

SubscriberKeyOrderID
Test_1apples
Test_1bananas
Test_1oranges
Test_2potatoes
Test_2tomatoes

Solution 1: Cross Apply with Substring

I’ve adapted this solution from Adam Spriggs’ SFSE answer and its logic is based on the SUBSTRING function which is used to find each of the comma-separated values and split it accordingly. In the below query, you will need to change the following to make it work with your data:

  • In the SELECT statement in line 2, add your desired output column names. Note that SubscriberKey is the original PK column in the initial DE, while the OrderID column is the result we want to get from the original OrderIDs column
  • In the SELECT statement in line 5, update the column names with the original PK column name and the name of the column which holds the comma-delimited values
  • Do the same for each SELECT statement that follows after that
  • In the last line, in the WHERE clause, make sure to again use the desired output colum name, in my case, it’s OrderID

Also, note that inside the query we are UNION-ing 6 select statements – this is directly related to the limitation mentioned at the beginning: you need to be able to predict the maximum number of comma-delimited values that can be present in one field. The below solution will work with up to 6 values – if your use case requires more, you’ll need to UNION additional SELECT statements and adjust them accordingly.

select
orders.SubscriberKey, orders.OrderID
from (
SELECT
a.SubscriberKey
, SUBSTRING(a.OrderIDs, 1, ISNULL(p1 – 1, LEN(a.OrderIDs))) OrderID
from AllOrders a
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
union
SELECT
a.SubscriberKey
, SUBSTRING(a.OrderIDs, p1 + 1, ISNULL(p2, LEN(a.OrderIDs) + 1) – p1 – 1) OrderID
from AllOrders a
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
union
SELECT
a.SubscriberKey
, SUBSTRING(a.OrderIDs, p2 + 1, ISNULL(p3, LEN(a.OrderIDs) + 1) – p2 – 1) OrderID
from AllOrders a
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)
union
SELECT
a.SubscriberKey
, SUBSTRING(a.OrderIDs, p3 + 1, ISNULL(p4, LEN(a.OrderIDs) + 1) – p3 – 1) OrderID
from AllOrders a
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p3 + 1), 0)) e(p4)
union
SELECT
a.SubscriberKey
, SUBSTRING(a.OrderIDs, p4 + 1, ISNULL(p5, LEN(a.OrderIDs) + 1) – p4 – 1) OrderID
from AllOrders a
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p3 + 1), 0)) e(p4)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p4 + 1), 0)) f(p5)
union
SELECT
a.SubscriberKey
, SUBSTRING(a.OrderIDs, p5 + 1, ISNULL(p6, LEN(a.OrderIDs) + 1) – p5 – 1) OrderID
from AllOrders a
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p3 + 1), 0)) e(p4)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p4 + 1), 0)) f(p5)
CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p5 + 1), 0)) g(p6)
) orders
where orders.OrderID is not null

Solution 2: Cross Apply with XML

This solution comes from Gregory Gifford’s blog and it turns the list of comma-separated values into an XML to later split it into separate rows. In the below query, you will need to change the following to make it work with your data:

  • In the SELECT statement in the first line, add all the column names from your original Data Extension
  • In the SELECT statement in line 5, replace OrderIDs with the name of the column that holds the comma-separated values in your DE
  • Repeat this for every subsequent SELECT statement
  • In the last line, in the WHERE clause, also update the column name with the original column name that holds the comma-separated values

Just like in the previous query, you will see that we have a bunch of UNIONs that determine the number of comma-delimited values that can be processed. If you need to add more than the 9 that we have below, just add another UNION with an adjusted SELECT statement.

SELECT p.SubscriberKey
, c.OrderIDs
FROM [AllOrders] p
CROSS APPLY (
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[1]','nvarchar(max)') as OrderIDs1
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[2]','nvarchar(max)') as OrderIDs2
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)') as OrderIDs3
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[4]','nvarchar(max)') as OrderIDs4
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[5]','nvarchar(max)') as OrderIDs5
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[6]','nvarchar(max)') as OrderIDs6
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[7]','nvarchar(max)') as OrderIDs7
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[8]','nvarchar(max)') as OrderIDs8
UNION ALL
Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[9]','nvarchar(max)') as OrderIDs9
) c (OrderIDs)
WHERE ISNULL(c.OrderIDs,'') <> ''

Performance and ease of use

I’ve tested both queries in Query Studio on a dataset with 1 million records, each record with 4 comma-delimited values in the OrderIDs field and this is how they performed:

  • The Cross Apply with Substring solution took ~2 minutes to process the data
  • The Cross Apply with XML solution took ~6 minutes to process the data

That said, the Cross Apply with Substring solution, although more complex to adapt, seems to be more performant, but if you’re not worried about working with large datasets, the Cross Apply with XML is much easier to adjust to your DE schema and works just as well.


Questions? Comments?

Leave a comment below or email me at zuzanna@sfmarketing.cloud.


One thought on “Splitting a string of comma-separated values in a Data Extension field using SQL

  1. Eashwer B. Iyer

    With the recent upgrade to SQL server 2019 , will SFMC support STRING_SPLIT for splitting a string of comma-separated values into multiple rows?

    Like

Leave a comment