Gender Bias 102 For Mathematicians: Merit

A long time ago, I promised a follow-up to my Gender Bias 101 post. One thing I’ve found out the hard way is that I can’t promise to post anything here on a regular schedule, or according to deadlines. Paid work takes precedence, as do vacation time and my other interests – that’s one problem. The other one is that I don’t really have much to say about gender that’s not complicated. That’s why, instead of one follow-up, you’ll get several “Gender Bias 102” posts on different topics. This is the first one. The rest will follow… oh, whenever I get around to it. I did mention a paid job that takes precedence.

I’ve said already that this is complicated. That’s my main point here. There’s no such thing as a complete explanation of sexism that will fit in a single post. You shouldn’t assume that you can learn everything you need to know from me, either. There are a lot of women out there, with different experiences, and none of us have all the facts or answers. What I’m aiming for is this. When the subject of gender bias comes up, well-meaning colleagues like to offer one-sentence explanations and simple solutions, for instance (today’s example) that we should “just” evaluate everyone based on merit and not gender. I’ll try to give you reasons to stop and think about it twice. Once you do that, it’s not hard to find further reading, should you be so inclined.

Deal? OK, let’s get started.

MYTH: We should just evaluate everyone based on objective merit, regardless of gender, race, or other similar considerations.

FACT: Wouldn’t it be nice if we could actually do that. Unfortunately, it’s much easier said than done.

First, we do not evaluate people or their work objectively, even when we think we are doing just that. Gender is a known risk factor. I cited this Yale study last time, and an older one with similar conclusions can be found here (PDF):


In the present study, both male and female academicians were significantly more likely to hire a potential male colleague than an equally qualified potential female colleague. Furthermore, both male and female participants were more likely to positively evaluate the research, teaching, and service contributions of a male job applicant than a female job applicant with an identical record. These results are consistent with previous research that has shown that department heads were significantly more likely to indicate that they would hire female candidates at the assistant professor level and male candidates with identical records at the associate professor level (Fidell, 1970).

Incidentally, if you believe you have no gender bias, then statistically you are in fact more likely to be biased. That’s not self-help mumbo-jumbo, that’s Nate Silver.

Race is also a known factor. Here’s an account of what happened when a black unemployed woman pretended to be white, and yes, studies have shown that job applicants with black-sounding names are indeed less likely to get callbacks than those whose names sound neutral.

If you look around, you should be able to find much more. One well-known case in point involves blind orchestra auditions: female musicians auditioning for orchestras were judged to play much better when they played behind a screen concealing their identities. Yet, while blind auditions ushered in a good deal of progress, they did not “solve” the problem entirely as biased attitudes persisted elsewhere in the system. See here for a particularly notorious case, and if you have more time, then this thesis has extensive analysis and documentation. It should be required reading for those suggesting, for instance, blind peer review in science journals.

Some of this can be attributed to outright conscious preferences. There’s no shortage of folks believing that women and/or blacks are in fact worse employees. While I’d rather they didn’t, the problem does not end with them, extending instead to the subconscious thought processes of genuinely well-meaning people. To wit: how about the ease of pronunciation of your last name? Would anyone really believe that a John Smith should be a better employee than a Gareth Vorkosigan? Take a look at this fascinating article on “disfluency”:


When you meet someone for the first time, if they’re a stranger, does the ease with which you can pronounce their name influence how you feel about them? We’ve shown, for example, in some studies that if you look at lawyers who join law firms they tend to ascend up the legal hierarchy much more quickly when their names are easy to pronounce or process. That’s independent of a whole lot of other factors, like how foreign the name is. Even if you look at American male names, you’ll find that the ones that are easier to pronounce, there tends to be a relationship where they tend to progress up the hierarchy more quickly. […]

We’ve also looked at the effect of this fluency phenomenon on stocks in the stock market, and we’ve shown, for example, that when a stock first comes out on the market, people don’t really know what to make of that stock. But if you look at the performance of stock over the first day or week after it’s come out on the market, you can predict its performance pretty well by looking at how easy it is to pronounce its name. And, again, that’s controlling for all sorts of other factors like which industry the stock is from, the size of the company.

(See also Kahneman’s System 1 and System 2 in “Thinking, Fast And Slow”.)

We may be unaware of our cognitive biases and distortions, or unwilling to acknowledge them, but the science is clear. Those “objective” judgements we make, the obvious ones, those we’re so sure of that we don’t even need to stop to think about them? They’re not objective at all. Reliable, trustworthy evaluation of professional merit is hard work. It requires formal procedures with explicit criteria, effective cross-checks, and conscious effort to remove known and suspected cognitive biases from the process.

Where such procedures are followed, women indeed tend to be treated more fairly. This has been shown many times, including at my own university. But you don’t just have to do it for the women. Read that article about disfluency again:


[I]f you present the questions in a font that’s a little bit more difficult to read, we found that you can increase their accuracy pretty dramatically. They make fewer of those intuitive responses. They take the time to reconsider their initial responses. They assume that the task is more difficult. They have a bit less confidence in their initial response, and so they tend to do a little bit better at the task. The same is true when you ask them to complete syllogism questions, logical syllogisms, any questions that ask you to think more deeply about a particular topic, where thinking more deeply will lead you to the right answer more often. We’ve shown that with disfluency people are more likely to do that.

In mathematics, “male” is still the default, “female” the exception. Is it possible that men are getting the benefits of fluent thinking, but also falling into the traps associated with it? Could they for instance be more prone to having their work evaluated based on their institutional affiliation or their advisor’s credentials instead of actual merit, by the same mechanism of unexamined initial responses that’s also keeping women back?

Second, what is “merit” anyway? The studies linked above didn’t need to define it, as the researchers were evaluating the same application materials with only the names changed. Yet, “merit” is a composite of many different criteria that can be weighed in different ways with different outcomes. In evaluating just a single paper or grant proposal, different referees or editors can come to different conclusions regarding its quality and significance. When judging publication records of individuals, we must weigh the number of publications against their quality and “impact,” if we can even agree what that means. In hiring and promotion situations, evaluations are based on teaching, research and service, often with “leadership,” “collegiality” or “fit” thrown in the mix for good measure. In college admissions and scholarship decisions, we use “academic excellence,” “research ability,” and “leadership,” in various proportions.

These all look like gender-neutral criteria, or race-neutral for that matter. Unfortunately, history says otherwise, at least with regard to the latter:


By the 1920s, the drawbacks of academic merit as a key admissions criterion were evident: a small but growing number of Jewish students from Eastern European backgrounds were achieving their way into elite colleges. The “ ‘Jewish problem’ ” – or the “ ‘Hebrew invasion,’ ” as some at Yale called it – gravely concerned administrators, who believed deeply in Anglo-Saxon superiority and the “preservation of Anglo-Saxon dominance” at their institutions. Viewing Jews as an “ ‘alien and unwashed element’ ” and fearing that Jewish students would drive away their Gentile alumni donors and their children, they institutionalized anti-Semitism by imposing quotas on the percentage of Jewish students who could be admitted.

To ensure control over the process, each school redefined “merit” and designed a more subjective admissions process to produce the desired outcome. Lengthy background questionnaires, letters of recommendation, applicant photographs, emphasis on non-academic achievement and face-to-face interviews allowed the Big Three to “screen out ‘undesirables.’ ” Indeed, the country’s first admissions offices were established “as a direct response to the ‘Jewish problem,’ ” Karabel writes. Newly emphasized were extra-curricular activities and that intangible known as “character” of the “sturdy, all-round boy,” which, presumably, Protestant boys possessed and Jewish boys did not. The adoption of the “character standard,” one Harvard administrator admitted in 1922, “would suffice to prevent a ‘Jewish inundation.’ ”

I don’t want to get started on “quota” and affirmative action right now – I’m saving that for later. The point that’s relevant here is that notions of merit that appear objective, colour-blind and gender-blind on paper may well work very differently in actual practice.

It doesn’t have to be so blatantly intentional, either. It’s quite enough when the gatekeepers and the decision-makers start defining merit in terms of what they see when they look in the mirror. We think back on what contributed to our own successes and look for the same qualities in others. It’s the “obvious” and “natural” thing to do, if you don’t think about it too much. It’s also deeply unfair to those who come from backgrounds different from ours. Ultimately, it’s misguided, not even so much for reasons of fairness or diversity, but because conformity is a very small measure of success.

I’ve written before about the “extracurricular activities” and how the rich have a leg up in that regard. Instead of repeating myself on that, I want to mention something else I’m worried about.

Every now and then, an argument is made somewhere that internet participation (Math Overflow, blogs, discussion boards, comment sections, online collaborative projects) should count towards merit indicators for purposes such as hiring, promotion and pay increases. It’s a completely reasonable notion, given that such activity can greatly benefit the community. Also every now and then, someone asks why there are so few women on Math Overflow or on Stack Exchange, or at least so few women posting under female names, real or pseudonymous. There’s no single answer to that, but one issue that women themselves (myself included) have suggested is their actual negative experience with many internet discussion sites and comment sections. (I won’t do that detour now. I’ve written several posts on the subject already, and you should also read the other answers on the pages I linked.)

How about we connect the two? Will “internet participation” be to women in mathematics as “leadership” was to Jewish boys applying to Princeton? It doesn’t matter why women participate less; the fact is that it happens and that it will affect us.

I’ll leave you with one more thing to consider. Our society, and perhaps especially academia, embraces meritocracy. We take it as self-evident that meritocracy is the best and most fair system possible ever. But what if it weren’t true? That’s the argument that Chris Hayes makes in Twilight of the Elites:


A pure functioning meritocracy would produce a society with growing inequality, but that inequality would come along with a correlated increase in social mobility. As the educational system and business world got better and better at finding inherent merit wherever it lay, you would see the bright kids of the poor boosted to the upper echelons of society, with the untalented progeny of the best and brightest relegated to the bottom of the social pyramid where they belong.

But the Iron Law of Meritocracy makes a different prediction: that societies ordered around the meritocratic ideal will produce inequality without the attendant mobility. Indeed, over time, a society will become more unequal and less mobile as those who ascend its heights create means of preserving and defending their privilege and find ways to pass it on across generations. And this, as it turns out, is a pretty spot-on description of the trajectory of the American economy since the mid-1970s.

Read the whole book if you can. It’s short and well worth your time. Hayes writes about the political, financial and academic elites in general, but I also found myself thinking of, for example, some of the larger research groups in my own department, or of Ivy League schools compared to state universities, or of how some areas of mathematics are, supposedly, intrinsically more worthy than others.

But that’s a list that I’m sure you can continue on your own.

Filed under academia, feminism, women in math
