It’s 2020. Are you hiring Data Scientists or Analysts?
Establishing distinctions between conventional analysts and data scientists at varying degrees of organizational data maturity.
We are asked about this quandary over and over again in our consulting work with all types of organizations. Data is now a fundamental business asset and its requirements have slowly reshaped organizational roles and specializations. Where the act of data collection was once laborious and intentional, advances in software have enabled the measurement and metrification of almost every aspect of business and consumer activity.
Organizations and roles had to adapt or die.
However, the goalposts for successfully extracting value from data collection, management, and analysis only continued to move further down-field. Traditional analysts often weren’t able to navigate massive, complex data storage systems; software engineers weren’t equipped with the domain expertise to develop robust statistics or business-relevant insights.
In response, data science—and data scientists—emerged as a multidisciplinary bricolage rooted in three core competencies:
- Statistics and analytics. Extracting meaning or value from sets of data.
- Business and domain expertise. Translating between technical realities and business goals and constraints.
- Computation and information systems. Developing software solutions to otherwise intractable problems around the storage and processing of data.
Due to the sweeping breadth of these competencies, the role of a data scientist has tended toward more ambiguity than the traditional analyst or engineer, causing confusion in some of the organizations we’ve encountered. However, in many organizations, the data scientist has adopted a generalist scope with a consultative approach to navigate technical environments and to negotiate evolving business needs.
That’s a good sign for your business and culture. That said, we’ve seen that how an organization manages this shift depends a lot on where it is in its journey to becoming fully data-driven.
In moderately data-mature organizations, the lines between analysts and data scientists are straightforward and skill-based: data scientists perform an analyst’s responsibilities in technically complex or loosely defined contexts. As these organizations recognize and realize their new needs, we tend to see data scientists being distinguished from analysts in several ways:
- Projects take weeks to months instead of hours to days.
- Data comes in unstructured, disjointed, or messy formats rather than ready-to-analyze sheets and tables.
- Methods are specific and directed (e.g. context-appropriate statistical tests and techniques) rather than out-of-the-box or plug-and-play (e.g. online tools and built-in calculators).
- Datasets are measured in the millions or billions of rows rather than the hundreds or thousands.
- Deriving insights and value requires computer programming rather than general-purpose software.
Of course, these distinctions vary between organizations, but the common wisdom remains the same: when an analyst can’t do it, you should hire a data scientist. Furthermore, whenever we discuss this topic with the human beings actually doing the day-to-day work of data, they are usually very open to these distinctions (leaving organizational politics at the door for the moment) and often share united objectives and FAST goals. In theory, this results in clear role separations and management strategies.
In practice, the lines are much blurrier. The processes and personnel supporting data-driven initiatives have grown ever more specialized. As such, data practitioners operate in many different subdisciplines (processing and pipelining, tracking and instrumentation, machine learning and artificial intelligence, presentation and visualization, to name only a few), and data science has become an ungainly label for such a broad swath of data-oriented business practices.
This lack of clarity has often translated into “special projects” which fizzle out more often than not. Hiring a gaggle of PhDs from academia without giving them a clear mandate has been the folly of many of the organizations we’ve worked with.
So what should you do inside your own organization?
As we’ve seen (and as you may find if you look deep inside your own org), the now-clumsy distinction between analysts and data scientists is becoming a point of contention inside mature, data-driven enterprises. Analysts are doing more work that feels like data science and vice versa, and this overlap is driving uncomfortable conversations around management structure, career advancement, and compensation.
The reality is that data practitioners aren’t tidily arranged around the three core competencies of data science; they live in a much broader and fuzzier space of intersecting skills and capabilities. And, to make things even more confusing, fully mature data organizations require baseline data literacy from every employee, rendering moot almost all of the distinctions we listed above.
We believe the new managing principle for data-driven organizations is a spectrum, not a binary, between exploratory and confirmatory data questions:
- Confirmatory. Conventional analyst role. Extracting value from challenging, complex, but ultimately well-mapped data sources using common tools and techniques to maintain consistency throughout the business.
- Exploratory. Conventional data scientist role. Discovering value by finding and defining unknown or unconventional data sources through a technically diverse, machine-augmented toolbox.
Fully mature, data-driven organizations are less concerned with the measurement and management of metrics; they organize around, and thrive on, the generation and refinement of knowledge. There is still plenty to be done to translate this principle into how roles are defined, titled, and managed, but this approach re-levels the playing field for all data practitioners and ensures they are working on the questions that are right for them and their level of technical ability.
So what do you think? Leave us a note in the comments.