Fuzzy Math: Silicon Valley’s Diversity Statistics

Many articles have been written about a diversity crisis in Silicon Valley, perhaps due to tech firms’ initial resistance to releasing employee demographics.  Enter Tracy Chou, an early Pinterest engineer, who launched a (still running) Google spreadsheet tracking the number of female hires.  Around the same time, several of the largest tech companies released diversity statistics from their Department of Labor EEO-1 reports, and several publications displayed side-by-side comparisons.  But if a picture (or chart) is worth a thousand words, then it’s important to understand what it does and doesn’t say.  In this post, I’ll focus on three ways employers can shift diversity stats from being merely self-congratulatory to being meaningful for potential employees deciding between firms.

The Atlantic points out:

It has become a grand gesture in tech this summer for big companies to release demographic data about their workforces…that formula has now become the de facto way to share (and apologize for) diversity data in Silicon Valley.  It goes something like this:

  1. Write a blog post about the importance of transparency, acknowledging how your company has a long way to go and outlining a few diversity-related initiatives
  2. Include a sleek graph showing how few women and minorities you employ
  3. When asked to talk about the issue, decline interview requests and redirect people back to the original blog post

Diversity disclosure can help employees pick the firm that is best for them.  But there’s a reason I say can, not will: data without context doesn’t mean very much.  Data’s potential power lies in confirming whether individual “anecdata” reflect a broader trend.

Specifically, employees would benefit if employers provided baseline numbers, standardized their data and disaggregated it.

Solution 1. Provide baseline numbers

There are two kinds of baseline numbers I’ll discuss here: absolute numbers and starting points.  Both of these help contextualize diversity data: absolute numbers account for a company’s size, while starting points account for trends.

Without absolute numbers, a percentage increase in the number of underrepresented individuals hired may be (inadvertently or otherwise) misleading.  This problem may be exacerbated at early-stage startups.  Specifically, the same absolute change produces a larger percentage increase, and a larger overall proportion, at a smaller company.  For example, if a 20-person startup reports that 5% of its employees come from underrepresented groups, that means only one person in the office is a minority.  A prospective employee who sees only the rate, without the total number of employees, may think that there are more underrepresented employees in the office than there actually are.
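To make the arithmetic concrete, here is a minimal sketch in Python.  The headcounts are hypothetical (the 10,000-person company is invented purely for comparison), and the helper names share and share_after_hires are my own, not part of any reporting standard.  Both companies start at 5% and each hires one additional underrepresented employee.

```python
def share(count, total):
    """Return `count` as a percentage of `total`."""
    return 100 * count / total


def share_after_hires(count, total, new_hires):
    """Percentage after `new_hires` underrepresented employees join the company."""
    return share(count + new_hires, total + new_hires)


# Hypothetical headcounts: both companies start at 5% and hire one more person.
startup_before = share(1, 20)
startup_after = share_after_hires(1, 20, 1)
large_before = share(500, 10_000)
large_after = share_after_hires(500, 10_000, 1)

for name, before, after in [
    ("20-person startup", startup_before, startup_after),
    ("10,000-person company", large_before, large_after),
]:
    # Report the change in percentage points alongside the two rates.
    print(f"{name}: {before:.1f}% -> {after:.1f}% ({after - before:+.2f} points)")
```

With the same single hire, the startup’s share nearly doubles while the large company’s barely moves, which is why a rate reported without a headcount is hard to interpret.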

Similarly, knowing starting points is important.  Earlier this year, Intel announced it had increased its proportion of female employees by 5.4%.  While this increase in diversity is commendable, Wired points out that even with that increase, the percentage of female employees at Intel remains below the industry average.

Companies should list their total number of employees as easily accessible information next to their diversity statistics.  If there is a diversity pie chart, each segment of the pie could show the number of people it represents.  Absolute, proportional and starting numbers all matter, and can work together to provide accurate diversity figures.

Solution 2. Standardize data

While the EEO-1 form requires companies to tabulate diversity statistics by job type, the technical versus non-technical classification standards are outdated given new developments in software.  For instance, it’s not clear whether product management – one of the most popular jobs in tech – belongs in the technical or non-technical category.

In response, tech companies have adopted nonstandard reporting methods to reflect aspects of jobs that they think are important.  For example, Apple, instead of using the Department’s categories, divides its data into tech, non-tech, leadership, retail and retail leadership (given its abundance of retail stores).  Slack, a startup, reports the percentage of employees with a female manager.

However, these metrics may not be all that useful.  They might theoretically provide more information overall, but because not every company uses the same metrics, a prospective employee will find it difficult to compare different companies on the basis of diversity.

As major tech players collectively share their statistics, the next step is for them to also collectively decide what to share and how to share it in a standardized fashion.  That might include job classifications, both in terms of necessary skills and rank.  Companies and the EEOC could even base these classifications on the Department of Labor’s long-running O*NET survey.

Solution 3. Disaggregate data

Finally, the way that diversity data is presented, in aggregate, may conceal interactions between multiple diversity factors.  Specifically, each company release I’ve seen displays charts based on race or gender separately, but the same EEO-1 data could be broken down by both at once.

A visualization that breaks the data down by both race and gender yields more pessimistic results than what companies report.  For example, even though women and Asians make up approximately 30% and 23% of the workforce at seven large tech companies, Asian women make up only 6% of the total.  In light of last year’s well-publicized survey showing how women of color have particularly negative experiences in STEM, it’s important to consider how gender and race might together relate to the tech industry.

That might start by disaggregating statistics to examine potential race and gender differences.
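As a rough illustration of what disaggregation adds, the sketch below cross-tabulates a hypothetical 100-person workforce.  The headcount of 100 and the cell counts are assumptions chosen only so that the shares match the approximate percentages quoted above (30% women, 23% Asian, 6% Asian women).  The two-way gender and race split is a simplification, and the function names are my own.

```python
from collections import Counter

# Hypothetical workforce of 100, keyed by (gender, race).
# Counts are illustrative, picked to mirror the approximate shares in the text.
cells = Counter({
    ("woman", "Asian"): 6,
    ("woman", "non-Asian"): 24,
    ("man", "Asian"): 17,
    ("man", "non-Asian"): 53,
})


def marginal_share(cells, position, value):
    """Percentage whose label at `position` (0 = gender, 1 = race) equals `value`."""
    matching = sum(n for key, n in cells.items() if key[position] == value)
    return 100 * matching / sum(cells.values())


def joint_share(cells, gender, race):
    """Percentage of the workforce in one specific (gender, race) cell."""
    return 100 * cells[(gender, race)] / sum(cells.values())


women = marginal_share(cells, 0, "woman")           # aggregate view by gender
asian = marginal_share(cells, 1, "Asian")           # aggregate view by race
asian_women = joint_share(cells, "woman", "Asian")  # disaggregated view

print(f"women: {women:.0f}%, Asian: {asian:.0f}%, Asian women: {asian_women:.0f}%")
```

The two aggregate figures alone cannot tell you the size of the intersection; only the cell-level counts do, which is the information an aggregate chart leaves out.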

Similarly, we could disaggregate diversity statistics by job characteristics.  Slack, for example, has shared information on engineering versus non-engineering roles.  Coupled with standardization, as suggested by Solution 2, this disaggregation could show us who is more likely to sort into more and less technical jobs, what companies they are going to, and in turn, where to focus diversity efforts.

In contrast, without disaggregation, we are currently stuck with speculation about why gender and race gaps in hiring exist.  For instance, one theory is that “women choose to go into more soft-skills positions in tech (and therefore not engineering)” because women are generally friendlier.  Without data actually showing these preferences, that theory is simply a stereotype.

Where do we go?

The easy answer is more disclosure.  That’s certainly a good thing – some suggestions include integrating diversity disclosure with pay information or reporting data on promotions, applicants and undergraduate interns.

But how data is presented is currently up to the company.  Without the clarity that baseline numbers, standardization and disaggregation provide, it’s up to employees to carefully make comparisons and to determine how the numbers might relate to their day-to-day experience.