At the beginning of the summer, the MindTouch engineering team decided to find a way to provide our enterprise customers with the more detailed data our system collects on their customers. We thought this raw data, which we use to derive usage metrics and other insights for our own business, could also be valuable to our customers in analytics-ready form. Getting user data to our customers would also increase the value of our service and strengthen our relationships with them.
So we started architecting the system by identifying the high-level properties that we wanted it to have, and eventually we settled on three main goals:
A highly reliable and durable archiving mechanism
Clean data we can rely on for further processing
Query access to the data
With those goals in mind, we identified which parts of our current architecture and infrastructure had to be changed, replaced, or added. We then divided the project into four main categories: data capture, data archiving, data cleanse, and data analysis.
For data capture, we had to make sure that as MindTouch grew, the system would continue to keep up with demand. Therefore, we architected the system to use a high-capacity data transfer mechanism and high-throughput techniques, so we could support projected growth in user traffic.
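To make the idea concrete, here is a minimal sketch of one common high-throughput capture pattern: buffer incoming events in a bounded queue and hand them downstream in batches, so per-event transfer overhead is amortized. This is illustrative only, not our production code; the class name, field names, and sizes are all assumptions.

```python
import queue


class EventCapture:
    """Buffers incoming events and hands them off in batches.

    A bounded queue applies backpressure instead of growing without
    limit; batching amortizes downstream transfer cost. (Sketch only;
    names and sizes are assumptions, not MindTouch internals.)
    """

    def __init__(self, batch_size=100, max_buffered=10_000):
        self.batch_size = batch_size
        self._queue = queue.Queue(maxsize=max_buffered)

    def record(self, event):
        # Block briefly under backpressure rather than silently drop data.
        self._queue.put(event, timeout=5)

    def drain_batch(self):
        """Pull up to batch_size buffered events without blocking."""
        batch = []
        while len(batch) < self.batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

In a real deployment the drained batches would be shipped over the wire by a background worker; here the point is simply that capture and transfer are decoupled by the buffer.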
For data archiving, we built on our experience with a highly durable and reliable object storage system to make sure no data is ever lost.
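The archiving pattern can be sketched as write-once objects with deterministic keys and checksums verified on read. The dictionary below merely stands in for a durable object store, and the key scheme, format, and class name are assumptions for illustration.

```python
import gzip
import hashlib
import json


class BatchArchiver:
    """Archives event batches as immutable, checksummed objects.

    The in-memory dict stands in for a durable object store; in
    practice the same write-once, verify-on-read discipline applies.
    (Sketch only; key scheme and format are assumptions.)
    """

    def __init__(self):
        self._store = {}  # key -> (sha256 hex digest, compressed payload)

    def archive(self, batch_id, events):
        key = f"events/batch-{batch_id}.json.gz"
        if key in self._store:
            # Write-once: never overwrite an archived batch.
            raise ValueError(f"refusing to overwrite {key}")
        payload = gzip.compress(json.dumps(events).encode("utf-8"))
        self._store[key] = (hashlib.sha256(payload).hexdigest(), payload)
        return key

    def restore(self, key):
        checksum, payload = self._store[key]
        if hashlib.sha256(payload).hexdigest() != checksum:
            raise IOError(f"corrupt archive object: {key}")
        return json.loads(gzip.decompress(payload))
```

Because every object is content-checksummed and never rewritten, a successful `restore` is strong evidence the batch survived intact.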
For data cleanse, we explored several options and finally settled on leveraging a column-based data store to perform data de-duplication.
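De-duplication itself is simple to state: keep only the first record seen for each unique key. A columnar store does this efficiently by scanning just the key column; the in-memory version below shows the same logic. The `event_id` field name is an assumption for illustration.

```python
def deduplicate(events, key="event_id"):
    """Keep the first occurrence of each key value, preserving order.

    Illustrative in-memory version of the de-duplication a columnar
    store performs by scanning only the key column. (The key field
    name is an assumption.)
    """
    seen = set()
    unique = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            unique.append(event)
    return unique
```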
For data analysis, we reused the same data store that powers the data cleanse, giving our systems query access to the cleansed data.
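The kinds of queries this enables are mostly group-by aggregations, which columnar stores answer quickly because they scan only the referenced columns. A toy equivalent, with field names that are assumptions for illustration:

```python
from collections import Counter


def views_per_user(events):
    """Count page views per user, a typical group-by aggregation.

    A columnar store answers this by scanning only the user_id and
    action columns; this sketch shows the same result in memory.
    (Field names are assumptions.)
    """
    return Counter(
        e["user_id"] for e in events if e.get("action") == "page_view"
    )
```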
Each of those areas presented challenges, yet our amazing MindTouch engineering team came up with clever and innovative solutions that allowed us to move forward, complete the project, and deliver on the goal of empowering our customers to understand their users even more.
We are really excited to share our experience, so in the next blog post we will dive into the details of each phase of the data’s journey.