Inventory of Data Extensions and their data sources

It’s very likely that at some point in your career you will inherit or take over a Salesforce Marketing Cloud instance that has been set up and used by someone else in the past. In case it hasn’t been well documented, it can be a struggle to understand all the data related processes and clean up all the unused Data Extensions.

Gregory Gifford has written a very useful blog post, where he explains how to create a Data Extension Inventory for your SFMC Business Unit. Using the code snippets he provided, you will be able to populate a complete list of all the Data Extensions in your Business Unit, along with their properties:

Source: https://gortonington.com/data-extension-inventory-for-your-sfmc-business-unit/

But where does the data in those Data Extensions come from? Of course, it won’t be possible to track everything – things like manual data imports, data uploaded through scripts or coming in through API won’t leave any trace in the system. Fortunately, we can track data-related activities that are scheduled in Automation Studio: Imports and Queries. We will use two SOAP API objects to obtain all the information we need: ImportDefinition and QueryDefinition.

But before we do that, let’s first create a Data Extension where we will populate the results of our calls. In the script, we are going to retrieve the following data:

  • Data Extension’s Name,
  • Data Extension’s External Key,
  • the name of the activity in Automation Studio which populates the data,
  • the type of the activity (Import/Query),
  • the description of the activity if one has been set up.

Here’s the Data Extension structure you will need:

Retrieve the ImportDefinition data

We will use WSProxy to retrieve information about Imports that have been set up and used in Automation Studio and to get information about Data Extensions they populate with data. The properties we will use in our script are the following:

  • Name – name of the Import Definition,
  • Description – description of the Import Definition,
  • ObjectID – ID of the Import Definition,
  • DestinationObject.ObjectID – the ID of the destination. In this context, this could be either a Data Extension or a List, so we will need to filter out Lists later on.

As you can see above, we can get only as much as the Data Extension’s ObjectID from the ImportDefinition object, so we will add another call to our script. We will call the DataExtension object to check the Name and the CustomerKey of that Data Extension using the ObjectID value to match it correctly.

Let’s combine all of the above together and add pagination, so that we are not limited by the fact that most API objects have a predefined batch size of 2500. Here’s the full function: [see code snippet]

function getImports(objectType, cols) {
var prox = new Script.Util.WSProxy(),
objectType = "ImportDefinition",
cols = ["DestinationObject.ObjectID", "Description", "Name", "ObjectID"],
moreData = true,
reqID = null,
numItems = 0;
while (moreData) {
moreData = false;
var data = reqID == null ? prox.retrieve(objectType, cols) : prox.getNextBatch(objectType, reqID);
if (data != null) {
moreData = data.HasMoreRows;
reqID = data.RequestID;
if (data && data.Results) {
for (var i = 0; i < data.Results.length; i++) {
if (data.Results[i].DestinationObject.ObjectID) {
var cols = ["Name", "CustomerKey", "ObjectID"];
var filter = {
Property: "ObjectID",
SimpleOperator: "equals",
Value: data.Results[i].DestinationObject.ObjectID
};
var deName = prox.retrieve("DataExtension", cols, filter);
if (deName.Results[0].Name) {
logDE.Rows.Add({
DEName: deName.Results[0].Name,
DEKey: deName.Results[0].CustomerKey,
ActivityName: data.Results[i].Name,
ActivityType: "Import",
ActivityDescription: data.Results[i].Description
});
}
}
numItems++;
}
}
}
}
Write("<br />" + numItems + " total " + objectType + "<br>");
}
view raw get-imports.js hosted with ❤ by GitHub

Retrieve the QueryDefinition data

The QueryDefinition object is a bit easier to work with because SQL queries in Salesforce Marketing Cloud only have one type of destination: a Data Extension. That’s why the QueryDefinition object will be able to return everything we need in one call. The properties we will use in our script are the following:

  • Name – name of the Query Definition,
  • Description – description of the Query Definition,
  • DataExtensionTarget.Name – name of the target Data Extension,
  • DataExtensionTarget.CustomerKey – external key of the target Data Extension.

Again, we will add pagination to make sure we retrieve everything. Here’s the full function: [see code snippet]

function getQueries(objectType, cols) {
var prox = new Script.Util.WSProxy(),
objectType = "QueryDefinition",
cols = ["Name", "Description", "ObjectID", "DataExtensionTarget.Name", "DataExtensionTarget.CustomerKey"],
moreData = true,
reqID = null,
numItems = 0;
while (moreData) {
moreData = false;
var data = reqID == null ? prox.retrieve(objectType, cols) : prox.getNextBatch(objectType, reqID);
if (data != null) {
moreData = data.HasMoreRows;
reqID = data.RequestID;
if (data && data.Results) {
for (var i = 0; i < data.Results.length; i++) {
logDE.Rows.Add({
DEName: data.Results[i].DataExtensionTarget.Name,
DEKey: data.Results[i].DataExtensionTarget.CustomerKey,
ActivityName: data.Results[i].Name,
ActivityType: "Query",
ActivityDescription: data.Results[i].Description
});
numItems++;
}
}
}
}
Write("<br />" + numItems + " total " + objectType + "<br>");
}
view raw get-queries.js hosted with ❤ by GitHub

The full script

Now let’s combine everything together and run our script – you can either run it on a CloudPage, or in Automation Studio. Note, that depending on the volume of activities in your account, this script can take even several minutes to resolve. All you will need to do with below script is to provide the External Key of the Data Extension you created earlier, which will hold the results of our calls: [see code snippet]

<script runat="server">
Platform.Load("Core", "1");
try {
//initiate WSProxy
var prox = new Script.Util.WSProxy();
//initiate data extension
var targetDE = 'xxxxxxx-xxxxxxx-xxxxxxxxx'; //pass external key of the target DE
var logDE = DataExtension.Init(targetDE);
function getImports(objectType, cols) {
var objectType = "ImportDefinition",
cols = ["DestinationObject.ObjectID", "Description", "Name", "ObjectID"],
moreData = true,
reqID = null,
numItems = 0;
while (moreData) {
moreData = false;
var data = reqID == null ? prox.retrieve(objectType, cols) : prox.getNextBatch(objectType, reqID);
if (data != null) {
moreData = data.HasMoreRows;
reqID = data.RequestID;
if (data && data.Results) {
for (var i = 0; i < data.Results.length; i++) {
if (data.Results[i].DestinationObject.ObjectID) {
var cols = ["Name", "CustomerKey", "ObjectID"];
var filter = {
Property: "ObjectID",
SimpleOperator: "equals",
Value: data.Results[i].DestinationObject.ObjectID
};
var deName = prox.retrieve("DataExtension", cols, filter);
if (deName.Results[0].Name) {
logDE.Rows.Add({
DEName: deName.Results[0].Name,
DEKey: deName.Results[0].CustomerKey,
ActivityName: data.Results[i].Name,
ActivityType: "Import",
ActivityDescription: data.Results[i].Description
});
}
}
numItems++;
}
}
}
}
Write("<br />" + numItems + " total " + objectType + "<br>");
}
function getQueries(objectType, cols) {
var objectType = "QueryDefinition",
cols = ["Name", "Description", "ObjectID", "DataExtensionTarget.Name", "DataExtensionTarget.CustomerKey"],
moreData = true,
reqID = null,
numItems = 0;
while (moreData) {
moreData = false;
var data = reqID == null ? prox.retrieve(objectType, cols) : prox.getNextBatch(objectType, reqID);
if (data != null) {
moreData = data.HasMoreRows;
reqID = data.RequestID;
if (data && data.Results) {
for (var i = 0; i < data.Results.length; i++) {
logDE.Rows.Add({
DEName: data.Results[i].DataExtensionTarget.Name,
DEKey: data.Results[i].DataExtensionTarget.CustomerKey,
ActivityName: data.Results[i].Name,
ActivityType: "Query",
ActivityDescription: data.Results[i].Description
});
numItems++;
}
}
}
}
Write("<br />" + numItems + " total " + objectType + "<br>");
}
getImports(objectType, cols)
getQueries(objectType, cols)
} catch (e) {
Write(Stringify(e));
}
</script>

Questions? Comments?

Leave a comment below or email me at zuzanna@sfmarketing.cloud.


7 thoughts on “Inventory of Data Extensions and their data sources

  1. Piyush

    Hi Zuzanna,

    Script is working and I am getting data in DE but it is keep on failing after 30 min. any solution on that. Also, Do you have any solution where I can store file name from FTP to DE.

    Thanks
    Piyush

    Liked by 1 person

    1. Hi Piyush,
      You can try running the scripts to get Imports and Queries separately in two different activities, or just run the whole thing on a CloudPage.
      Regarding FTP, I am not sure what you are referring to, as this article does not mention any FTP related activities.
      Zuzanna

      Liked by 1 person

      1. Piyush

        Hi Zuzanna

        Sorry for confusion, FTP was a separate question.
        I am getting a file on FTP as subscribersinfo_23245.csv , I want this 23245 to add as one column in DE. Is this achievable ?

        Thanks
        Piyush

        Liked by 1 person

  2. Hi Zuzanna,
    Thank you for the article. Do you have an idea why I’m getting only 2 records related to import (I have hundreds of data extensions and automation in the system).

    Like

Leave a comment