The Hype

Rising out of the Big Data movement and dubbed the Sexiest Job in the Next Decade, the Data Scientist role is in hot demand. New online courses and university programs are popping up to remedy the looming skills shortage. Big Data conferences are filled with professionals hoping to discover the magical path to becoming the next golden professional. As with Big Data, the purpose of Data Science is not well understood.

What is Data Science and why is it relevant?

Is Data Science a new branch of inquiry created by the upsurge of Big Data? Not really. Most would argue it is not science in the classic sense at all. Rather, it incorporates models and techniques from scientific disciplines, including mathematics, statistics, and computer science, with the goal of gaining greater understanding from data. As data volumes and variety grow, the opportunity arises to leverage data assets to make more informed decisions and create competitive advantage. This requires moving past rudimentary reporting and trend analysis toward predictive modeling. However, these models and new tools require greater discipline and care. Using more powerful tools against larger and more varied data poses a danger: identifying patterns or correlations that reflect no real relationship and have no predictive power. Decisions based on flawed analysis could produce worse results than decisions made using domain experience alone. The intent of Data Science is to apply rigor to data analysis to produce the most accurate and meaningful results for data-driven decisions.
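The danger of finding correlations with no real relationship behind them can be illustrated with a small simulation. The following is a minimal sketch, not a prescribed method: the function names, the 20-row sample size, and the feature counts are all assumptions chosen for illustration. It generates a random target and many unrelated random features, then reports the strongest correlation found purely by chance.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    return covariance / spread


def strongest_spurious_correlation(n_rows, n_features, seed=0):
    """Largest |r| between a random target and unrelated random features.

    Hypothetical helper: every column is independent noise, so any
    correlation it reports is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    strongest = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        strongest = max(strongest, abs(pearson(feature, target)))
    return strongest


if __name__ == "__main__":
    # Screening more candidate features makes a strong-looking match more likely.
    for n_features in (10, 100, 1000):
        r = strongest_spurious_correlation(n_rows=20, n_features=n_features)
        print(f"{n_features:>5} random features: strongest |r| = {r:.2f}")
```

Because every column here is noise, the strongest correlation tends to grow simply because more candidates were screened. A pattern that looks compelling in a wide data set should therefore be checked against fresh data before it informs a decision.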

Data Scientist or Data Science Team?

A Data Scientist can be thought of more as a distinct professional specialization than as a new profession. Based on wildly variable descriptions in the industry, a Data Scientist possesses a mashup of deep applied mathematical and scientific skills, statistical acumen, data integration proficiency, technical expertise in a wide variety of Big Data technologies and programming languages, data visualization artistry, and the ability to weave results into a cohesive narrative for management and end users. That is a scarce and expensive combination of skills. In a more likely scenario, the Data Science function is a team of individuals with distinct skills collaborating to produce useful analytics.

The types of skills for a Data Science team include:

  • Technology Specialist: Skills to manage the infrastructure and software used for Big Data Analysis.
  • Data Integration and Data Management: Skills in data acquisition, data cleansing and transformation, and data management.
  • Advanced Data Analysis: Skills in applying statistical and mathematical modeling to analyze large data sets. This role understands the potential biases and limitations in large data sets, the application of appropriate models, and the interpretation of results. This could be a new role for many organizations.
  • Data Presentation and Communication: Skills to create easy-to-understand graphical presentations of complex or novel relationships, providing context and meaning. This goes beyond traditional reporting or dashboards to telling a story with the data, specifically a contextually accurate story.

The above are generalized categories whose functional responsibilities may overlap.

Conclusion

Big Data technologies provide a means to perform deep analysis on an increasing scale, but they do not exempt analysts from understanding the limitations of statistical models or from the responsibility for evaluating data for bias and quality issues. This is where individuals with statistical modeling experience might be added to augment existing Business Intelligence capabilities. Whether choosing to embark on a Big Data initiative or build a Data Science team, the key question remains the same: what is the desired outcome and its value to the organization? Lead with the business case and evaluate the tools, skills, and processes for fit.

Tim Eck