Splitting a string of comma-separated values in a Data Extension field using SQL

The data that we have to work with in Salesforce Marketing Cloud rarely comes in the format that we need to be able to further use it inside journeys, emails, etc. One of the common use cases that repeatedly comes up on Salesforce Stack Exchange is splitting values from a comma-delimited list into separate rows. Unfortunately, in Marketing Cloud, the usual SQL methods such as STRING_SPLIT are not supported, which results in a heavy dependence on makeshift solutions.

Below you will find two SQL-based solutions that I found most reliable, although you need to be warned of a limitation that comes with both of them and is tied to the usage of the CROSS APPLY function: you need to be able to predict what is the maximum number of comma separated values in one field, as it’s not able to dynamically loop through all of them.

Data schema for our use case

I’ve come up with a very simple use case to illustrate the problem: we have a Data Extension with unique SubscriberKeys and for each subscriber, we have a list of comma-separated OrderIDs. Here’s a preview of the Data Extension that we’re working with:

SubscriberKey	OrderIDs
Test_1	apples, bananas, oranges
Test_2	potatoes, tomatoes

And this is the result that we’re after – each OrderID in a separate row, with it’s corresponding SubscriberKey:

SubscriberKey	OrderID
Test_1	apples
Test_1	bananas
Test_1	oranges
Test_2	potatoes
Test_2	tomatoes

Solution 1: Cross Apply with Substring

I’ve adapted this solution from Adam Spriggs’ SFSE answer and its logic is based on the SUBSTRING function which is used to find each of the comma-separated values and split it accordingly. In the below query, you will need to change the following to make it work with your data:

In the SELECT statement in line 2, add your desired output column names. Note that SubscriberKey is the original PK column in the initial DE, while the OrderID column is the result we want to get from the original OrderIDs column
In the SELECT statement in line 5, update the column names with the original PK column name and the name of the column which holds the comma-delimited values
Do the same for each SELECT statement that follows after that
In the last line, in the WHERE clause, make sure to again use the desired output colum name, in my case, it’s OrderID

Also, note that inside the query we are UNION-ing 6 select statements – this is directly related to the limitation mentioned at the beginning: you need to be able to predict the maximum number of comma-delimited values that can be present in one field. The below solution will work with up to 6 values – if your use case requires more, you’ll need to UNION additional SELECT statements and adjust them accordingly.

	select
	orders.SubscriberKey, orders.OrderID
	from (

	SELECT
	a.SubscriberKey
	, SUBSTRING(a.OrderIDs, 1, ISNULL(p1 – 1, LEN(a.OrderIDs))) OrderID
	from AllOrders a
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)

	union

	SELECT
	a.SubscriberKey
	, SUBSTRING(a.OrderIDs, p1 + 1, ISNULL(p2, LEN(a.OrderIDs) + 1) – p1 – 1) OrderID
	from AllOrders a
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)

	union

	SELECT
	a.SubscriberKey
	, SUBSTRING(a.OrderIDs, p2 + 1, ISNULL(p3, LEN(a.OrderIDs) + 1) – p2 – 1) OrderID
	from AllOrders a
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)

	union

	SELECT
	a.SubscriberKey
	, SUBSTRING(a.OrderIDs, p3 + 1, ISNULL(p4, LEN(a.OrderIDs) + 1) – p3 – 1) OrderID
	from AllOrders a
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p3 + 1), 0)) e(p4)

	union

	SELECT
	a.SubscriberKey
	, SUBSTRING(a.OrderIDs, p4 + 1, ISNULL(p5, LEN(a.OrderIDs) + 1) – p4 – 1) OrderID
	from AllOrders a
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p3 + 1), 0)) e(p4)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p4 + 1), 0)) f(p5)

	union

	SELECT
	a.SubscriberKey
	, SUBSTRING(a.OrderIDs, p5 + 1, ISNULL(p6, LEN(a.OrderIDs) + 1) – p5 – 1) OrderID
	from AllOrders a
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs), 0)) b(p1)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p1 + 1), 0)) c(p2)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p2 + 1), 0)) d(p3)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p3 + 1), 0)) e(p4)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p4 + 1), 0)) f(p5)
	CROSS APPLY (select NULLIF(CHARINDEX(',', a.OrderIDs, p5 + 1), 0)) g(p6)

	) orders
	where orders.OrderID is not null

view raw split_csv_string.sql hosted with ❤ by GitHub

Solution 2: Cross Apply with XML

This solution comes from Gregory Gifford’s blog and it turns the list of comma-separated values into an XML to later split it into separate rows. In the below query, you will need to change the following to make it work with your data:

In the SELECT statement in the first line, add all the column names from your original Data Extension
In the SELECT statement in line 5, replace OrderIDs with the name of the column that holds the comma-separated values in your DE
Repeat this for every subsequent SELECT statement
In the last line, in the WHERE clause, also update the column name with the original column name that holds the comma-separated values

Just like in the previous query, you will see that we have a bunch of UNIONs that determine the number of comma-delimited values that can be processed. If you need to add more than the 9 that we have below, just add another UNION with an adjusted SELECT statement.

	SELECT p.SubscriberKey
	, c.OrderIDs
	FROM [AllOrders] p
	CROSS APPLY (
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[1]','nvarchar(max)') as OrderIDs1
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[2]','nvarchar(max)') as OrderIDs2
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[3]','nvarchar(max)') as OrderIDs3
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[4]','nvarchar(max)') as OrderIDs4
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[5]','nvarchar(max)') as OrderIDs5
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[6]','nvarchar(max)') as OrderIDs6
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[7]','nvarchar(max)') as OrderIDs7
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[8]','nvarchar(max)') as OrderIDs8
	UNION ALL
	Select CAST('<x>' + REPLACE(p.OrderIDs,',','</x><x>') + '</x>' AS XML).value('/x[9]','nvarchar(max)') as OrderIDs9
	) c (OrderIDs)
	WHERE ISNULL(c.OrderIDs,'') <> ''

view raw split_string_xml.sql hosted with ❤ by GitHub

Performance and ease of use

I’ve tested both queries in Query Studio on a dataset with 1 million records, each record with 4 comma-delimited values in the OrderIDs field and this is how they performed:

The Cross Apply with Substring solution took ~2 minutes to process the data
The Cross Apply with XML solution took ~6 minutes to process the data

That said, the Cross Apply with Substring solution, although more complex to adapt, seems to be more performant, but if you’re not worried about working with large datasets, the Cross Apply with XML is much easier to adjust to your DE schema and works just as well.

Questions? Comments?

Leave a comment below or email me at zuzanna@sfmarketing.cloud.

One thought on “Splitting a string of comma-separated values in a Data Extension field using SQL”

Eashwer B. Iyer

With the recent upgrade to SQL server 2019 , will SFMC support STRING_SPLIT for splitting a string of comma-separated values into multiple rows?

LikeLike

January 8, 2024 at 20:00 Reply