If you’re an early-stage startup, you’re probably just getting started with analytics and figuring out how people use your product. For that, an out-of-the-box tool like Google Analytics, Mixpanel, or Amplitude should be enough, and building infrastructure to dump your data into Amazon S3, Postgres, or BigQuery may seem like overkill.
But step back and look one or two years ahead, and you’ll see how valuable it is to be able to run any analysis you can imagine without resorting to hacky exports from your analytics tools. That requires owning and managing your own database.
In this lesson, we’ll share why you might want to own your data and your options for data storage.
Now versus later
Let’s start with now. Today, you are probably asking fairly simple questions of your data: Where do users get confused and drop off in my funnel? What’s my overall retention rate? What are my top referral sources for sign-ups? Most analytics tools answer these questions easily, and they’ll get you pretty far. Google Analytics, Mixpanel, and Amplitude make it easy to build funnel reports, graphs, and dashboards, and you can start making decisions as soon as the tool is set up. Aside from implementing the tracking, no further engineering is required.
But at a certain point, it will be hard for you to get to the bottom of a question or an unusual signal in your data. When you start to hit product-market fit and gain momentum, you’ll have more nuanced questions: How many users had multiple referral sources, and how do I weigh the value of each channel? What happens when a user starts their journey on mobile and finishes on web? Unfortunately, it may be difficult or even impossible for some tools to provide that depth and specificity in their reporting. If you don’t control your analytics data, you’ll be left with a choice: answer with the tools you have or don’t answer at all.
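With raw, user-level events in hand, the first of those questions takes only a little code. The sketch below is illustrative only: it assumes a hypothetical row shape of `(user_id, event_name, referral_source)` and a hypothetical conversion event named `Signed Up`, and it uses linear attribution (equal credit to every distinct source a converting user came through) as one possible way to weigh channels. First-touch, last-touch, or time-decay models are equally reasonable choices.

```python
from collections import defaultdict

# Hypothetical raw event rows: (user_id, event_name, referral_source).
# In practice these would come from a SQL query against your own warehouse.
EVENTS = [
    ("u1", "Visited", "twitter"),
    ("u1", "Visited", "newsletter"),
    ("u1", "Signed Up", None),
    ("u2", "Visited", "google"),
    ("u2", "Signed Up", None),
    ("u3", "Visited", "twitter"),
]


def sources_per_user(events):
    """Map each user to the set of distinct referral sources they arrived through."""
    sources = defaultdict(set)
    for user_id, _, source in events:
        if source is not None:
            sources[user_id].add(source)
    return sources


def multi_source_users(events):
    """Count users who arrived through more than one distinct referral source."""
    return sum(1 for s in sources_per_user(events).values() if len(s) > 1)


def linear_attribution(events, conversion_event="Signed Up"):
    """Split one unit of credit per converting user equally across their sources."""
    sources = sources_per_user(events)
    converted = {user_id for user_id, name, _ in events if name == conversion_event}
    credit = defaultdict(float)
    for user_id in converted:
        user_sources = sources.get(user_id, set())
        for source in user_sources:
            credit[source] += 1.0 / len(user_sources)
    return dict(credit)
```

Here `u1` splits its credit between `twitter` and `newsletter`, `u2` gives full credit to `google`, and `u3` never converts, so it contributes nothing.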
This is the point where owning your data gives you a leg up. Owning your data is valuable for two reasons:
Storing data in your own data warehouse in a normalized format (not JSON blobs shoved into individual cells) and layering BI tools or command-line SQL on top gives you the most flexible way to answer your business questions.
Owning your data in a database you manage also future-proofs your marketing tech stack choices. Because you can always import your historical data into a new tool, you’re free to switch tools whenever you need to.
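To illustrate what a normalized format buys you, here is a minimal sketch that uses Python’s built-in `sqlite3` as a stand-in for a real warehouse. The payload shape and the `order_completed` table are hypothetical. Each JSON payload is flattened into typed columns at load time, so ordinary SQL can answer questions like revenue by platform, or how many buyers purchased on more than one platform (a simple version of the mobile-to-web question).

```python
import json
import sqlite3

# Hypothetical raw payloads, shaped the way an event-tracking tool might deliver them.
RAW_PAYLOADS = [
    '{"userId": "u1", "event": "Order Completed", "properties": {"revenue": 40.0, "platform": "web"}}',
    '{"userId": "u2", "event": "Order Completed", "properties": {"revenue": 25.5, "platform": "ios"}}',
    '{"userId": "u1", "event": "Order Completed", "properties": {"revenue": 10.0, "platform": "ios"}}',
]


def load_normalized(conn, payloads):
    """Flatten each payload into typed columns instead of storing the raw JSON blob."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS order_completed (
               user_id  TEXT NOT NULL,
               platform TEXT,
               revenue  REAL
           )"""
    )
    rows = []
    for raw in payloads:
        doc = json.loads(raw)
        if doc.get("event") != "Order Completed":
            continue  # this sketch keeps one table per event type
        props = doc.get("properties", {})
        rows.append((doc["userId"], props.get("platform"), props.get("revenue")))
    conn.executemany(
        "INSERT INTO order_completed (user_id, platform, revenue) VALUES (?, ?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for Redshift, BigQuery, Postgres, etc.
load_normalized(conn, RAW_PAYLOADS)

# With real columns, plain SQL answers business questions directly.
revenue_by_platform = conn.execute(
    "SELECT platform, SUM(revenue) FROM order_completed GROUP BY platform ORDER BY platform"
).fetchall()
cross_platform_buyers = conn.execute(
    """SELECT COUNT(*) FROM (
           SELECT user_id FROM order_completed
           GROUP BY user_id
           HAVING COUNT(DISTINCT platform) > 1
       )"""
).fetchone()[0]
```

Had the payloads been stored as opaque JSON strings in a single column, each of these queries would first need to parse every row.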
Tales of data ownership
Here are stories from two companies that illustrate both points.
Case Study: Tradesy
Tradesy is a popular peer-to-peer marketplace for buying and selling luxury clothing and accessories. Tradesy’s data engineering team had implemented Google Analytics to understand shopper behavior, hoping to turn browsers into buyers through great app design. Google Analytics gave some insight into customer behavior, but only through simple aggregated conversion metrics, for instance, whether a customer who landed anywhere on the site or app eventually purchased something. In addition, the team couldn’t tie funnel data to actual users, because Google Analytics anonymizes every interaction.
In order to improve conversion rates and learn more about their customers, the team needed more information. Why were customers bouncing? What made them stay? Where exactly in the checkout process did they abandon the chase?
Tradesy chose to track additional data points and load their data into Amazon S3 alongside Google Analytics. Amazon S3 offers scalable storage for raw data, which the team uses to power exploratory analysis of the checkout funnel and other user flows. They also use the data to feed machine learning algorithms that provide product recommendations to each customer.
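Because raw data in your own storage stays tied to individual users, a checkout funnel can report not just drop-off counts but exactly who stopped where. The sketch below is hypothetical: the checkout step names are placeholders, events are assumed to be already grouped by user, and for brevity it only checks which steps each user reached, ignoring event order and timestamps.

```python
from collections import Counter

# Hypothetical checkout steps, in order; substitute your own event names.
CHECKOUT_STEPS = ["Cart Viewed", "Shipping Entered", "Payment Entered", "Order Completed"]


def funnel_report(events_by_user, steps=CHECKOUT_STEPS):
    """Count users reaching each step and list who abandoned after each step."""
    reached = Counter()
    abandoned = {step: [] for step in steps[:-1]}  # nobody "abandons" after the last step
    for user_id, events in sorted(events_by_user.items()):
        seen = set(events)  # order and timestamps ignored for brevity
        furthest = None
        for step in steps:
            if step not in seen:
                break  # strict funnel: every earlier step must be present
            reached[step] += 1
            furthest = step
        if furthest in abandoned:
            abandoned[furthest].append(user_id)
    return reached, abandoned


# Illustrative raw events grouped by user.
events_by_user = {
    "u1": ["Cart Viewed", "Shipping Entered", "Payment Entered", "Order Completed"],
    "u2": ["Cart Viewed", "Shipping Entered"],
    "u3": ["Cart Viewed"],
    "u4": ["Cart Viewed", "Shipping Entered", "Payment Entered"],
}
reached, abandoned = funnel_report(events_by_user)
```

The `abandoned` lists are the part an aggregate-only report cannot give you: actual user IDs you can join against other data to ask why they left.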
Case Study: LogMeIn
LogMeIn offers widely adopted access management software. As their team grew, they realized it was very difficult for different departments to try out new marketing and analytics tools. They felt locked into their existing vendors, even though those vendors weren’t scaling with their business. Beyond swapping out old vendors, they needed to add entirely new types of tools to support new business functions.
To give themselves more optionality, the IT team transitioned to owning their own data and adopting a flexible architecture for onboarding historical data into new tools. The company now uses a wide variety of analytics, tracking, and marketing tools. This collection of best-of-breed tools empowers LogMeIn employees to analyze the user experience, report on marketing campaigns, message customers, A/B test designs, and more. They even test multiple tools at once to find the best fit.
Ways to own your data
If you’re not owning your data yet, fear not! There are a lot of ways for you to take control over your data destiny.
Google Analytics, a crowd favorite, is great for slicing and dicing aggregate visitor data. Unfortunately, the free version of Google Analytics does not give you access to your raw data. Unfiltered, raw analytics data is available in Google Analytics 360 via Google BigQuery, but Google Analytics 360 is quite a bit pricier.
Mixpanel, an event-based analytics platform, provides an API for exporting your data. Hitting their raw data export API is totally free, but note that you can only run one export per project at any given time. Their documentation explains how to get started. Similarly, Amplitude offers an Export API.
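A request to Mixpanel’s raw export endpoint can be sketched with the standard library alone. The endpoint URL and the `project_id`, `from_date`, and `to_date` parameters reflect Mixpanel’s export documentation as we understand it, and the service-account username and secret are placeholders; check the current docs before relying on any of this (for example, projects with EU data residency use a different host). The response is treated as newline-delimited JSON, one event per line.

```python
import base64
import json
import urllib.parse
import urllib.request

# Mixpanel raw export endpoint as we understand its docs; verify before use.
EXPORT_URL = "https://data.mixpanel.com/api/2.0/export"


def build_export_request(project_id, from_date, to_date, username, secret):
    """Build (but do not send) a GET request for raw events between two YYYY-MM-DD dates."""
    query = urllib.parse.urlencode(
        {"project_id": project_id, "from_date": from_date, "to_date": to_date}
    )
    # HTTP Basic auth with service-account credentials (placeholders here).
    token = base64.b64encode(f"{username}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{EXPORT_URL}?{query}", headers={"Authorization": f"Basic {token}"}
    )


def parse_export(body):
    """Turn a newline-delimited JSON body into a list of event dicts."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]
```

To run an export, send the request with `urllib.request.urlopen(req)` and pass the decoded response body to `parse_export`. Since only one export can run per project at a time, issue these requests one after another rather than in parallel.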
Segment: If you’re one of our customers, there are three ways to access your data in near real time: webhooks, S3, and warehouse integrations. Our webhooks integration forwards your customer data to an endpoint you host. Many of our customers use webhooks to keep a copy of their data and power internal apps. This integration can be free, depending on how much data you send us. If you don’t want to deal with setting up storage, you can use our S3 integration to automatically copy your Segment data to your S3 bucket every hour. Lastly, you can check out our warehouse integrations, which programmatically schematize and load your web and mobile data into your own database. We have connectors for Amazon Redshift, Postgres, Google BigQuery, Snowflake, Microsoft Azure, and more.
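Keeping a copy of webhook data can start very small. The sketch below is a hypothetical receiver built on Python’s standard `http.server`: it accepts POSTed JSON and appends each event as one line to a local file (`segment_events.jsonl` is a placeholder name). The payload fields mentioned in the comments are illustrative. It deliberately omits authentication, HTTPS, and retries, all of which a production endpoint would need.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder archive file; a real setup might write to object storage or a database.
ARCHIVE_PATH = "segment_events.jsonl"
_write_lock = threading.Lock()


def append_event(body, path=ARCHIVE_PATH):
    """Parse one JSON payload and append it as a single line; return None if invalid."""
    try:
        event = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    with _write_lock, open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event


class WebhookHandler(BaseHTTPRequestHandler):
    """Receives POSTs such as {"type": "track", "event": "Signed Up", "userId": "u1"}."""

    def do_POST(self):
        # Assumes the sender sets Content-Length (no chunked uploads in this sketch).
        length = int(self.headers.get("Content-Length", 0))
        event = append_event(self.rfile.read(length))
        self.send_response(200 if event is not None else 400)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


def serve(port=8080):
    """Block and serve forever on the given port."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `serve(8080)` blocks and listens on port 8080. Separating `append_event` from the handler keeps the storage logic easy to test and easy to swap for a database insert later.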
The last method is to build the entire customer data pipeline yourself. If you’re interested, a few data pipeline technologies to consider are Apache Kafka and Amazon Kinesis Firehose. We won’t dive into the nitty-gritty details here. One word of caution: only go this route if you have serious engineering resources to dedicate to analytics.
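At the core of any self-built pipeline is batching: collecting individual events and writing them out in chunks, by size or by age, which is what a managed service like Kinesis Firehose handles for you. The toy class below illustrates only that pattern; it is not the Firehose or Kafka API, and its thresholds are arbitrary placeholders. It also checks batch age only when a new record arrives, whereas a real pipeline would flush on a timer as well.

```python
import json
import time


class BatchingBuffer:
    """Toy size-or-age batching; the sink receives each batch as newline-delimited JSON."""

    def __init__(self, sink, max_records=500, max_age_seconds=60.0, clock=time.monotonic):
        self.sink = sink  # e.g. a function that writes a file to S3 or loads a warehouse table
        self.max_records = max_records
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self._records = []
        self._first_at = None  # when the oldest buffered record arrived

    def put(self, record):
        if not self._records:
            self._first_at = self.clock()
        self._records.append(json.dumps(record))
        if len(self._records) >= self.max_records or self._is_stale():
            self.flush()

    def _is_stale(self):
        return self.clock() - self._first_at >= self.max_age_seconds

    def flush(self):
        """Hand any buffered records to the sink and reset the buffer."""
        if self._records:
            self.sink("\n".join(self._records) + "\n")
            self._records = []
            self._first_at = None


# Usage: collect batches in a list instead of writing to real storage.
batches = []
buffer = BatchingBuffer(batches.append, max_records=2)
buffer.put({"event": "Signed Up", "userId": "u1"})
buffer.put({"event": "Order Completed", "userId": "u1"})
```

Even this toy version hints at why the route needs engineering resources: durability, retries, ordering, and schema changes are all left unsolved.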
Control your fate—and your data
There will come a day when your analytics needs can no longer be met without raw, direct access to your customer data through flexible interfaces like SQL. Until then, it’s smart to lean on hosted analytics tools and their reporting as much as possible. But setting up a quick dump of your data into a low-cost data warehouse now will save you tons of time and give you analytics superpowers later.