7 Tips to Clean Up Your Data in 2016
Dec 23, 2015
By Diana Smith
This time of year, many of us are cooking up New Year’s resolutions to kickstart our personal growth — Hit the gym 3 times a week; be kinder; read 5 books. But what about resolutions for our work lives?
This year, we invite you to join in on our 2016 New Year’s Resolution: Clean Slate Data. Together we want to answer, “How would you redo your tracking if you could start from scratch?” and implement it.
This article outlines 7 tips for you to get started.
Here’s the problem
You go to run your end-of-year reports, and it takes three times longer than you thought. The dropdowns in your analytics tools are stuffed with user actions you stopped caring about a while ago. A bunch of different developers added tracking at different times, so your events are named inconsistently.
You have to ask three teammates what one event (Subscription Started) in your reporting tool means because there is another one (Added Credit Card) that might be telling you the same thing.
You run funnel reports with both Account Created and Signup, but they don’t match. You don’t know which one to believe.
You have a basic grip on the data model, but onboarding a new teammate next year to learn the idiosyncratic schema seems daunting.
It doesn’t have to be this way. What if your data was squeaky clean?
The promise of clean data
To us, clean data means recording just the important events, in a standard format, with a spec accessible to your whole team. With clean data, you’ll be able to:
Reduce time to insight with fewer events clogging your systems
Empower each team to answer their questions with easy to understand event naming
Quickly measure company and team-specific KPIs by designing tracking with these in mind
Make confident decisions based on your data again
To make it here, you need to ask yourself a few tough questions:
How would you set up your tracking if you started from scratch? What would you cut? What would you add? How can you make the data in end tools more accurate and easier to use?
Why now
It’s very hard to revamp your data model without breaking analyses. That’s why many people avoid the clean up and live with crufty data.
But all of your reports turn over on the first of the year. You’ve done your reporting for 2015. It’s the perfect time to get clean. Plus, you’re probably planning for 2016 goals across your org. You’ll want to update your tracking to reflect the metrics you’re focused on in the coming months.
If you think about it, the time and effort you put into the scrub down will compound down the road: not only will your whole team save time on analysis, but the data will also be more actionable and more accurate. You can stop second-guessing your charts and start making confident decisions.
7 tips for getting started
If you’re convinced, here are 7 steps you can follow to achieve clean data.
1. Finalize, document, and save your 2015 reports. If you’re going to be moving to new tracking, it’s important you save all reports that document your performance in 2015. You’ll want to be able to look back and see how your current metrics compare to last year. Save your reports, document what each chart means, and circulate to your team to make sure you’re not missing anything.
2. Identify the problems with your current tracking. Before you start making changes, identify what’s wrong with your current setup. Do you have duplicate events you need to cut? Are your events inaccurate? Should you be sending them from a different location in your app? Does no one on the team know how they can use your data? Collect these issues, and make sure you address them with your new plan.
3. Focus on the most important metrics. Your company overall and each team likely landed on the most important metrics and goals for 2016. Identify the events you need to track to very simply calculate those.
For example, if you’re focusing on driving “paid accounts” as a company goal, make paid account a user trait, rather than forcing your team to analyze that with a bunch of events like Upgraded, Growth Plan, and Startup Plan. If your marketing team needs to know how many pieces of content and which content drives acquisition and retention, make it easy for them with a Marketing Content Viewed event that can be tied to Account Created and Account Upgraded.
Don’t add in a bunch of superfluous events that aren’t answering critical questions you have about your customers or that aren’t tied to your KPIs. Focus on the metrics that matter.
4. Create a new implementation spec, or tracking plan. This tracking plan should serve as the “source of truth” for your tracking. It identifies each event, the properties associated with it, where it should fire, and why you are collecting it. (You can download a sample tracking plan template here.)
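To make the idea concrete, here is a minimal sketch of a tracking plan kept as data, with a small check that an incoming event matches it. The EventSpec fields mirror the four things a plan records (event, properties, where it fires, why). The plan property, the reasons, and the check_event helper are hypothetical and only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventSpec:
    """One row of a tracking plan: what to track, where, and why."""
    name: str
    properties: list[str] = field(default_factory=list)
    fires_from: str = "client"  # assumed values: "client" or "server"
    reason: str = ""


# Hypothetical plan entries built from event names used in this article.
TRACKING_PLAN = {
    spec.name: spec
    for spec in [
        EventSpec("Account Created", ["plan"], "server", "Measures new signups"),
        EventSpec("Viewed Landing Page", ["name", "url"], "client",
                  "Shows which landing pages attract visitors"),
    ]
}


def check_event(name: str, properties: dict) -> list[str]:
    """Return a list of problems; an empty list means the event matches the plan."""
    spec = TRACKING_PLAN.get(name)
    if spec is None:
        return [f"'{name}' is not in the tracking plan"]
    missing = [p for p in spec.properties if p not in properties]
    return [f"'{name}' is missing property '{p}'" for p in missing]
```

A check like this could run in code review or in tests, so events that are not in the plan get caught before launch.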
To simplify your setup, try to group as many similar events as possible into a higher-level rollup event. For example, instead of capturing Viewed Warehouses Landing Page and Viewed Integrations Landing Page, create one Viewed Landing Page event with name and url properties.
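As a rough sketch of that rollup, the function below turns the two page-specific events into one Viewed Landing Page event with name and url properties. The mapping, the roll_up helper, and the example URL are hypothetical. How you actually send the event depends on your tracking library.

```python
# Hypothetical mapping from page-specific event names to the `name` property.
LANDING_PAGE_EVENTS = {
    "Viewed Warehouses Landing Page": "Warehouses",
    "Viewed Integrations Landing Page": "Integrations",
}


def roll_up(event: str, url: str) -> dict:
    """Turn a page-specific event into the single rollup event."""
    if event not in LANDING_PAGE_EVENTS:
        # Leave unrelated events untouched.
        return {"event": event, "properties": {"url": url}}
    return {
        "event": "Viewed Landing Page",
        "properties": {"name": LANDING_PAGE_EVENTS[event], "url": url},
    }
```

With one event name, a single report can break landing page views down by the name property instead of requiring a new event for every page.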
Also, be adamant about naming conventions. Choose one way to name events, whether that’s plain text (Completed Order) or object-action (Order Completed). Make sure each event follows your preferred convention.
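If you settle on object-action, a small check can flag names that drift from the convention. This is only a sketch under assumptions: it treats "Title Case words ending in an approved action" as the rule, and the APPROVED_ACTIONS list is hypothetical. Your team would supply its own.

```python
import re

# Assumed convention: Title Case words separated by single spaces,
# ending in an action from a list the team approves (object-action style).
APPROVED_ACTIONS = {"Created", "Completed", "Viewed", "Upgraded", "Started"}
TITLE_CASE = re.compile(r"^[A-Z][a-z]*( [A-Z][a-z]*)+$")


def follows_convention(event_name: str) -> bool:
    """True if the name is Title Case and its last word is an approved action."""
    if not TITLE_CASE.match(event_name):
        return False
    return event_name.split(" ")[-1] in APPROVED_ACTIONS
```

Under these assumptions, Order Completed passes, while Completed Order and order_completed do not.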
Once you’ve got your events all spec’d out, send the tracking document to the entire company, so they can use it to build their own analyses, funnels, and marketing segments. No more questions like, “How do I measure X?” should circulate around.
5. Build a “translator” for the old tracking to the new. As you switch over to the new schema, it will be helpful to document how old events translate into new ones. If your team was used to looking at Logins, and those will now be under App Login, you should help them easily discover changes.
To make the new schema more accessible, we suggest you create a “translator” attached to your tracking plan to help people understand how events they were used to seeing are now tracked and why you may have deleted some things. Think of this document as a “French to English” dictionary.
The translator should also help you in 6 months when you want to look back at your 2015 reports to compare key metrics. By then, you’ll probably have forgotten how you changed your schema.
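A translator can be as simple as a lookup table. The sketch below assumes a dictionary from old event names to a new name plus a short note. Apart from Logins becoming App Login, the entries and notes are hypothetical examples, not a recommendation for your schema.

```python
# Hypothetical translator: old event name -> (new event name, note).
# A new name of None means the event was dropped on purpose.
TRANSLATOR = {
    "Logins": ("App Login", "Renamed in the new schema"),
    "Viewed Warehouses Landing Page": (
        "Viewed Landing Page", "Rolled up; see the name property"),
    "Added Credit Card": (None, "Assumed duplicate of Subscription Started"),
}


def translate(old_name: str) -> str:
    """Explain in one line what happened to an old event."""
    if old_name not in TRANSLATOR:
        return f"{old_name}: no entry; it may be unchanged"
    new_name, note = TRANSLATOR[old_name]
    if new_name is None:
        return f"{old_name}: no longer tracked ({note})"
    return f"{old_name} -> {new_name} ({note})"
```

Keeping the note next to each mapping records why something changed, which is the part you are most likely to forget in six months.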
6. Work with tools that can clean up your data. There are a few different options for switching over to your new data model in end analytics tools. The least manual option (great if you don’t need to see your 2015 and 2016 data together) is to send your new tracking to a new project in your out-of-the-box tools. You can easily switch between projects for old analysis, but the new data will be squeaky clean.
Only one out-of-the-box tool that we know of, Indicative, lets you merge two events and rename them without code. (That’s a sweet feature! Other platforms, take note!)
If you’re not using Indicative and you need your historical data to match your new schema, then you can write a script to translate data from the old to the new events. This is possible for Segment customers and folks using other analytics platforms with an HTTP API. (Write a script on top of the S3 integration back into the HTTP API). We’ll admit this is a pretty manual process, so consider if just switching projects will work for your team.
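Here is a rough sketch of the translation step in such a script. It assumes your exported logs are newline-delimited JSON records with an event field, which you should verify against your own export. The RENAMES table is hypothetical, and sending the rewritten records back through an HTTP API is deliberately left out.

```python
import json

# Hypothetical old-to-new event names for illustration.
RENAMES = {"Logins": "App Login"}


def rewrite_line(line: str) -> str:
    """Rewrite one JSON record so its event name matches the new schema."""
    record = json.loads(line)
    old_name = record.get("event")
    if old_name in RENAMES:
        record["event"] = RENAMES[old_name]
    return json.dumps(record)


def rewrite_export(lines):
    """Yield rewritten records, skipping blank lines in the export."""
    for line in lines:
        if line.strip():
            yield rewrite_line(line)
```

Because rewrite_export only transforms text, you can run it over a sample file and inspect the output before replaying anything into a live project.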
If you’re using a data warehouse, your schema consolidation will be much easier. Tools like Xplenty can help you merge and clean columns, or you can use a SQL INSERT INTO ... SELECT statement to append old event tables onto new ones.
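As a minimal sketch of appending an old event table onto a new one, the example below uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for a real warehouse. The table names, columns, and rows are hypothetical. In many SQL dialects the append itself is written as INSERT INTO ... SELECT, which is the form shown here, and the exact syntax in your warehouse may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
conn.executescript("""
    CREATE TABLE logins (user_id TEXT, received_at TEXT);     -- old event table
    CREATE TABLE app_login (user_id TEXT, received_at TEXT);  -- new event table
    INSERT INTO logins VALUES ('u1', '2015-11-02'), ('u2', '2015-12-14');
    INSERT INTO app_login VALUES ('u3', '2016-01-04');
""")

# Append the old rows onto the new table, naming columns explicitly
# so the copy does not depend on column order.
conn.execute("""
    INSERT INTO app_login (user_id, received_at)
    SELECT user_id, received_at FROM logins
""")
conn.commit()

rows = conn.execute(
    "SELECT user_id, received_at FROM app_login ORDER BY received_at"
).fetchall()
```

After the append, app_login holds both the 2015 and 2016 rows, so one query can span the schema change.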
7. Communicate the new process to your team. Now that you’ve scrubbed down your data, it’s important to keep it clean! The best way to do this is to host a meeting with your team where you outline your new schema, share the tracking plan, explain your naming convention, and discuss the process for tracking new events.
Who do they have to run events by? When in the product development process are events decided and implemented? (Hint: Before launch.) Write this process up for future hires.
In a number of our customers’ companies, we’re seeing the rise of a new role that we’re calling the “Data Czar.” This person is responsible for owning the schema and approving all new events that go into their apps and websites. They work with product and analytics leads to ensure each new event is necessary, follows their spec, and is being tracked from the right location (client vs. server). You might want to consider it!
We’re embarking on our own data clean up project this year, and we’ll be sharing our progress along the way. We’d also love to tell your stories. Hit us up with how you’ve been able to clean up your data, and we’d love to feature you on the blog!
Happy cleaning. 🚿
— The Segment Team