My colleagues and I were recently presented with an interesting challenge while building out a business intelligence solution for a mid-size client. The client wanted to archive a large amount of data, over and above the current reporting requirements, for potential future use. Unfortunately, the client’s proprietary database system was designed primarily for data analysis, not storage, and could not be leveraged as a solution. Our search for alternatives focused on evaluating options made possible by significant changes in the data analytics field. As we discussed the new technologies and methodologies, I found myself drawing parallels to how the Apple Macintosh, in 1984, brought computing power from the mainframe to the masses. Similarly, data analytics, once the realm of large corporations, is now within reach of small- to medium-sized businesses.
Previously, extensive data analytics capabilities existed only in the domain of large enterprises. The work was done on expensive, proprietary database management systems running on costly hardware. Operating these systems required specialized skills and training, which were often unavailable outside of bigger companies. Furthermore, data storage came at a premium, and the price of the required computing power was prohibitive. Thus, as with mainframe computers, high barriers to entry existed. These barriers virtually ensured that only large corporations could handle significant data analysis beyond small, rudimentary work.
Then, as with the release of the Macintosh and the other GUI-based PCs it spawned, things began to change. Storage became cheaper, allowing a wider variety and quantity of data to be kept at lower cost. The introduction of multi-core processors improved the economics of computing power. The “Cloud” allowed a sliding scale for investment, with payments for computing made on an as-needed basis. Open source solutions for data management and analysis became more prevalent.
User communities sprang up around these technologies. Libraries of published algorithms appeared, allowing users to share code and best practices more easily. “Massive Open Online Courses” (MOOCs), such as those offered through edX, Coursera, and Udemy, made training and education more readily available at lower cost (sometimes free) to smaller organizations. Users could download open source tools and example code from vendors. More recently, governments, trade groups, research institutions, private industry, and even social networks have published datasets or APIs in the hope of solving key problems faster.
Significant opportunities and advantages exist for both individual users and organizations. First, interested and industrious parties can now enhance their data analytics skills and abilities via a multitude of low-cost training options and online communities. Second, there is tremendous potential value for companies of all sizes, not just large enterprises. Organizations from non-profits to large corporations now have access to a variety of solutions that can be implemented at low initial cost, and they can scale their chosen solution up as business value is realized over time. In fact, we proposed a leading open source data storage solution in the Big Data space as the answer to our client’s archiving needs. Its low initial cost and scalability attracted them to the idea and ultimately sold them on the proposition.
The Mac was once advertised as the “computer for the rest of us”. It made significant computing power available without large upfront investment, high ongoing costs, or expensive, specialized training. Today, data analytics has followed a similar path. Open source applications, trends in computing and data storage, and the wide dissemination of training and collaboration have made powerful analytics capabilities accessible without huge cash outlays or difficult, costly education.