Identity Management

Leveling Up Identity Resolution: The Four Biggest Pitfalls of DIY Identity Resolution

We explore four pitfalls of building your own identity resolution.

Apr 5, 2023

By Segment

Personalization is only possible with a true understanding of your customers: their shopping and browsing preferences, the interactions they have with your support team, how they use your product, and more. But these activities occur across a wide variety of channels, apps, and devices. Without the ability to stitch all of these customer behaviors together into a unified view of the customer, a relationship that feels truly personal is impossible. This is why identity resolution continues to be a top priority for data teams.

In our last post, we covered the challenges and best practices of identity resolution. In this post, we will look at the pitfalls of do-it-yourself (DIY) identity resolution, and the challenges that lead many data teams to seek out a trusted identity resolution solution like Segment.

What is Identity Resolution?

Identity resolution is the critical process of linking all of a customer’s interactions with your business into one unified profile that can then be leveraged by all customer-facing applications for more personalized interactions. Identity resolution is the only way to understand how customers behave across email, the web, your mobile sites, and even how they interact anonymously with your business, and to deliver a customer experience that takes a customer’s history and preferences into account.

With identity resolution, data teams can equip business teams with valuable insights that drive customer satisfaction, loyalty, and growth, while making it easier for business teams to use that intelligence to drive bottom-line revenue.

How Identity Resolution Projects End Up as “DIY” Projects

Identity resolution projects arise because business teams want to know:

What campaigns are effective with what kinds of customers?
What customer behaviors are most likely to predict churn?
What is the lifetime value of every customer?
What marketing touches turned a prospect into a customer, or drove the highest volume of repeat purchases?
How can we tell if a customer is no longer engaged?

Business teams know the answer to these questions lies in the data, but they have no idea how to tie it together. While data teams are expected to help the business stitch customer interactions together and deliver insights, they are often left out of the loop when it comes to establishing the “how.”

By the time many data science teams are brought into the discussion, marketing has already gone down the road of choosing a vendor to manage their identity resolution. Alternatively, business teams may come to data science teams with no plan for execution, expecting data teams to manage the entire process from ideation to execution. In either case, data science often plays a role in helping business teams decide how to take identity resolution from idea to reality. And the biggest question to answer is whether the business should work with an identity resolution provider or build its own solution.

When you consider the costs of working with an outside vendor, it can be tempting to manage identity resolution internally, especially now that modern warehouse vendors like Snowflake have made it easier to perform complex queries and transformations in the warehouse without having to worry about resource limits. There is a litany of resources that advocate for leveraging a slew of connected tools to manage the entire process yourself, with the promise of unparalleled flexibility, freedom, and the ability to get into the weeds of modeling the data “just right” for your business.

While it’s nice to think that identity resolution can be done with some brains and elbow grease, the reality is that many teams that do it alone end up falling far short of the goal of delivering true customer insights. In 2020, over half of senior marketers reported disappointment in the results achieved from marketing analytics, and a full 60% of marketing teams said they planned to cut investments in analytics due to poor outcomes, according to Gartner.

The truth is, if you’re a SQL genius and want to spend the majority of your work life creating, fixing, and re-tweaking data models, and finding new ways to join data tables from completely disparate solutions, DIY identity resolution is great. But if you want to quickly deliver value to the business teams you support, DIY identity resolution is a road plagued with potholes.

From our experience working with 25,000 companies to solve identity resolution challenges, here are the four biggest pitfalls we’ve heard from data teams who have taken the “go it alone” approach.

The four biggest pitfalls of DIY identity resolution

Pitfall number one: Your customer engagement strategy has changed over time… A lot more than you realize.

The first step of any identity resolution project is to create a basic historical record of who everyone is in the database. While this sounds easy, it is actually incredibly complex. Not only do you have to inventory the data as it stands today, but you also have to understand how it’s changed or drifted over the years.

It’s tempting to think that most businesses apply a level of rigor to customer engagement, but the reality is that the way a business captures customer data changes wildly across different business stages.

Business teams have historically captured customer data in a variety of ways: through a list purchase, a bulk CSV upload into a CRM, or a name dropped into a bowl at an event. As a business grows in sophistication, forms, mobile apps, and logged-in user actions begin to capture the customer story. Each of these mechanisms for capturing customer data has its own set of objects and data structures, and possibly its own unique identifiers. Reconciling records across all of these disparate sources is extremely challenging on its own, but it becomes an order of magnitude more difficult when you have developers throwing in code requests and changing the logic of how information is captured and categorized along the way. For every change your business has made, there are hundreds, if not thousands, of lines of code that have to be written to reconcile the difference and get to a clean ID graph.

This means that getting a simple view of “what has happened” requires not just finding all the different data sources from the beginning of time, but also understanding all of the different identifiers and all of the possible IDs and combinations that appear in silos, and pairing that with a change log of all the ways your logic may have changed over time. And this is just to get a cursory understanding of the ID graph!

Pitfall number two: The incredible complexity of building an ID graph

Once you have a handle on all of the data that is “out there,” the next task for data scientists is to assemble the data together in the data warehouse, and develop the ID graph that will be used for all future data queries and sent to the application layer for use by the business teams.

It sounds simple enough, but in reality, “analytics engineering” is such a time-consuming task that it’s increasingly becoming its own job. Stitching identities in the warehouse by performing transformations can consume an entire data science team for years. One of the biggest challenges of processing data is the simple fact that, as a data scientist, you’re just not that close to the customer. You have access to all the data, but deciding what data is important, and what data is useful, is a really tough call to make on your own. Conversely, the business teams have the most insight about “what makes sense” when it comes to looking at customer behavior through data, but no business team should be looking at raw data in the warehouse.

The biggest challenge here is that what’s good for marketing isn’t always good for clean data practices. (How many times have your marketers copied and pasted an email form, created fields called EMAIL, e-mail, and Email, or purchased a list of 100k email addresses to add to the inventory?) Especially when you consider that sales and marketing tools often aren’t designed for flattening data collection purposes, it’s no wonder that these “easy to use” business solutions can become a nightmare for data teams.
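To make the EMAIL, e-mail, and Email problem concrete, here is a minimal Python sketch of the kind of cleanup a data team ends up writing. The `CANONICAL_FIELDS` mapping, the function names, and the sample rows are all hypothetical, and treating email addresses as case-insensitive is an assumption that may not hold for every system.

```python
import re

# Hypothetical mapping from normalized field names to one canonical key.
CANONICAL_FIELDS = {"email": "email", "emailaddress": "email"}


def normalize_key(raw_key: str) -> str:
    """Lowercase a field name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", raw_key.lower())


def canonicalize_record(record: dict) -> dict:
    """Return a copy of the record with known field-name variants merged."""
    cleaned = {}
    for raw_key, value in record.items():
        key = normalize_key(raw_key)
        key = CANONICAL_FIELDS.get(key, key)
        if value in (None, ""):
            continue  # skip empty values so they never shadow real data
        if key == "email":
            # Assumption: emails are compared case-insensitively.
            value = value.strip().lower()
        # Keep the first non-empty value seen for each canonical field.
        cleaned.setdefault(key, value)
    return cleaned


# Hypothetical rows captured by three copies of the same form.
form_rows = [
    {"EMAIL": "Ada@Example.com "},
    {"e-mail": "ada@example.com"},
    {"Email": "", "Email Address": "ADA@example.com"},
]
canonical_rows = [canonicalize_record(row) for row in form_rows]
```

All three rows collapse to the same `email` value here, but only because the variants were anticipated in the mapping. Every new spelling a form introduces needs another entry, which is part of why this cleanup work tends to grow over time.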

We recommend keeping business users away from raw data in the warehouse, which can be very difficult to make sense of. They do, however, need a way to look at the most complete form of customer data in a shape that makes sense. Data teams assist in this process by developing an ID graph from data that has been captured, cleaned, and transformed. Before making data available to business teams, data teams must preprocess the data and decide what kind of data can be trusted, what identifiers should be used, and how to handle the massive influx of data that usually exists. This process requires SQL files that are thousands of lines long, followed by a painstaking process of combining, enriching, filtering, and cleaning to produce a clean new set of data.

Pitfall number three: You’ll need to buy, connect, and maintain a whole new set of tools to action the data
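As a rough illustration of what “developing an ID graph” involves, the sketch below groups identifiers that were observed together into profiles using a union-find structure. This is one common technique, not a description of how Segment or any particular warehouse solution works. The `IdentityGraph` class, the identifier types, and the sample events are hypothetical.

```python
from collections import defaultdict


class IdentityGraph:
    """Union-find over identifiers such as ("email", "ada@example.com")."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        """Return the root identifier of the profile that contains node."""
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while self._parent[node] != root:
            next_node = self._parent[node]
            self._parent[node] = root
            node = next_node
        return root

    def link(self, a, b):
        """Record that identifiers a and b were seen on the same event."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def profiles(self):
        """Group every known identifier under its resolved profile."""
        groups = defaultdict(set)
        for node in list(self._parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


# Hypothetical events: each one carries the identifiers observed together.
events = [
    [("anonymous_id", "anon-1"), ("email", "ada@example.com")],
    [("email", "ada@example.com"), ("user_id", "u-42")],
    [("anonymous_id", "anon-2"), ("email", "grace@example.com")],
]

graph = IdentityGraph()
for identifiers in events:
    first = identifiers[0]
    # Linking first to itself registers events that carry a single identifier.
    for other in identifiers:
        graph.link(first, other)

resolved = graph.profiles()
```

With these sample events, `anon-1`, `ada@example.com`, and `u-42` resolve to one profile and the remaining two identifiers to another. The sketch merges unconditionally. A real system would also need rules for when not to merge, for example when two people share a device, and that is where much of the hand-written logic comes from.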

While many teams go the DIY identity resolution route to save money by reducing the involvement of third-party vendors, the reality is that DIY identity resolution requires the support of a lot of vendors!

It is possible to create a basic version of an ID graph using SQL and data in the warehouse, but making the benefits of identity resolution available to the business will require investment in a host of solutions to make the data portable. Once you have resolved customer profiles, you need to make that information available to business teams in their systems of choice so they can deliver personalized messages, campaigns, and more. This information needs to be actionable, and as more customer interactions change the customer record, this information has to come back into the system of record, and update the customer source of truth.

The more underlying infrastructure (servers/VMs, compute, storage) and vendors (like AWS) that need to be connected to purpose-built tech stacks (like martech, CRM, support, and ecommerce tools), the more complicated the data pipelines become. And the more data tools you bring into the mix, the more vendors you need to work with as well.

A classic configuration can require up to 5 different tools and 5 different vendors:

Event Streaming
ETL
Reverse ETL
Data warehouse
Data transformation

As the requirements to connect more and more systems grow, so do the time and expense of navigating the myriad protocols, languages, interfaces, and UI/UX choices in an ecosystem of providers. Many of these tools lack “out of the box” connectors, so connecting the dots between systems requires a lot of custom code. This is especially true for legacy systems, which require a different approach than more agile cloud-based systems and lack the API coverage to easily connect to new tools.

This requirement to evaluate, purchase, set up, and maintain new solutions to make identity resolution outputs useful to the business dramatically increases the costs (time and money) of DIY identity resolution, and makes many data teams regret the day they decided to pursue building a “simple” in-house solution.

Pitfall number four: When you think the job’s over, you realize it’s just begun

By the time you’ve collected, cleaned, assembled, reconciled, gone back and forth to the marketing team, and written thousands of lines of SQL to get to a basic ID graph, it feels like you’ve taken a journey of a thousand miles. Unfortunately, even at this point, fully realized identity resolution is still out on the horizon.

That’s because the biggest challenge of identity resolution isn’t gathering the data or creating the ID graph; it’s maintaining the system over time.

For data teams that have built their own warehouse-centric solution, this is the biggest and most challenging pitfall to overcome. When data teams maintain an identity resolution solution internally, they must staff and manage all of the following entirely in-house:

Lengthy test, debug, audit cycles
Navigating legal/regulatory and privacy compliance
Security compliance
First level issue support

Of all of the challenges listed above, the biggest challenge is the impossibility of QA. While data science teams are well-versed in working with data, they have much less context than the business teams, so it’s tough to recognize when the data is doing something “funny.” (Did that customer really enter three different emails on their path to purchase, or did your identity resolution just make a mistake?)
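One partial answer to the “three different emails” question is an automated sanity check that flags profiles for human review. The sketch below is a minimal example under stated assumptions: the `MAX_VALUES_PER_TYPE` thresholds, the profile layout, and the sample data are all hypothetical, and sensible limits would have to come from the business teams who know what normal customer behavior looks like.

```python
from collections import Counter

# Hypothetical thresholds: more than this many values of one identifier type
# on a single profile is treated as worth a human look.
MAX_VALUES_PER_TYPE = {"email": 2, "user_id": 1}


def suspicious_profiles(profiles):
    """Yield (profile_id, identifier_type, count) for profiles over a limit."""
    for profile_id, identifiers in profiles.items():
        counts = Counter(id_type for id_type, _ in identifiers)
        for id_type, limit in MAX_VALUES_PER_TYPE.items():
            if counts[id_type] > limit:
                yield profile_id, id_type, counts[id_type]


# Hypothetical resolved profiles: profile id -> set of (type, value) pairs.
profiles = {
    "p1": {("email", "ada@example.com"), ("user_id", "u-42")},
    "p2": {
        ("email", "a@example.com"),
        ("email", "b@example.com"),
        ("email", "c@example.com"),
        ("user_id", "u-7"),
    },
}
flagged = list(suspicious_profiles(profiles))
```

A check like this only surfaces candidates. Deciding whether profile `p2` is one customer with three addresses or a bad merge still takes the business context described above.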

Consider the many tools and vendors that are required to operate your business every day. Creating a DIY identity resolution solution on top of the customer experience becomes difficult because these tools create a world of silos, which makes mistakes and gaps in identity resolution incredibly hard to catch. The more opaque and roughly stitched together your identity resolution process is, the more challenging it becomes to have confidence in outcomes.

A data scientist can easily build and deploy a lead scoring model, but its successful adoption depends on a sales rep choosing to trust that model. By guaranteeing that data and business teams operate on the same view of the customer journey, defined by a shared identity resolution system, you enable that salesperson to trust the data on a lead being good or bad. When internal teams trust that they are all working with the same reliable data, it will lead to adoption of new data models that can drive better results (not to mention more personalized outreach!).

Maintaining the solution would be challenging even in a world where business never changes and customer behavior never evolves, but that is not the world most data teams operate in. Most businesses continue to adopt new technologies for communicating with customers, which means new data, new objects and fields to map, and new identifiers to merge.

Every time someone introduces a new silo, data teams must go back to every single file they’ve ever written and add logic pertaining to that new input, usually one file at a time. For example, when a new type of identifier is added, data teams must create new logic that considers the new identifier, create rules for how it pairs with emails, understand how to cookie the ID, and more. And the effect of every new identifier that gets added on top is multiplicative.

With this much complexity, it eventually becomes impossible for data teams to stay ahead of the game, which is why so many DIY projects ultimately fail. According to Forrester, while 66% of brands have had an identity program in place for at least 12 months, only half of marketers say they are fully capable of fundamental identity resolution.
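One way to see why each new identifier costs more than the last is to count pairing rules. The sketch below assumes, purely for illustration, that every pair of identifier types needs its own matching rule. The identifier names, including the new `phone` type, are hypothetical.

```python
from itertools import combinations


def pairing_rules(identifier_types):
    """List every unordered pair of identifier types that needs a rule."""
    return list(combinations(identifier_types, 2))


current = ["email", "user_id", "anonymous_id"]
with_new = current + ["phone"]  # hypothetical new identifier type

rules_before = pairing_rules(current)
rules_after = pairing_rules(with_new)
# Rules that exist only because the new identifier was added.
new_rules = [pair for pair in rules_after if pair not in rules_before]
```

Under this assumption, going from three identifier types to four doubles the number of pairing rules from three to six, because the new type has to be reconciled against every type that already exists.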

Conclusion: There is a fine line between flexibility and chaos…

The quest to create a solution that enables 360-degree customer visibility and a 1:1 personalized customer experience is a noble one, but it is also a difficult one. For most teams who “go it alone,” the benefits of the project are often undercut by the complexity, inefficiency, and difficulty of maintaining the solution. It’s not just that it can take years to build your own solution; it’s that, even after the solution has been “built,” it already needs to be rebuilt.

Much better outcomes can be generated with a future-proofed digital strategy and architecture that is API-first and easy for developers to engage with. This is one reason 25,000+ companies choose solutions like Segment CDP to do the heavy lifting of identity resolution. Segment helps data teams build profiles that business teams can trust, with expertise built by refining our Identity Resolution system to solve use cases across various industries. Profiles can be easily enriched with data in the warehouse to build a complete view of the customer that includes data from events like web and app interactions as well as systems like CRM or customer support applications.

At Segment, our customer data platform has helped thousands of businesses merge the complete history of each customer into a single profile. Segment provides engineers with a tested framework for identity resolution that gives full visibility into and access to all of a customer’s traits in the warehouse. This enables robust and complex audience building capabilities, and powers personalization and retention strategies.

When data teams can offload the heavy lifting of building and maintaining Identity Resolution, it’s much easier to focus on solving complex use cases for business users, like attribution analysis, calculating average customer lifetime value, churn prediction, and more.

Test drive Segment CDP today

It’s free to connect your data sources and destinations to the Segment CDP. Use one API to collect analytics data across any platform.

Get started
