women’s style recommendation with artificial intelligence (part #2)

In “women’s style recommendation with artificial intelligence (part #1)”, I introduced my work toward developing artificial intelligence (AI) for fashion and style recommendation. Essentially, it’s an expert system built on a Bayesian belief network. Now I discuss model validation and next steps in the design iteration process.

I first wanted to see if the trained network correctly returned known recommendations (“wear” or “don’t wear”) based on known clothing selections. This procedure successfully validated the code I wrote. Then I wanted to see if the model could derive new style rules. I experienced partial success on this count; I will outline a possible strategy for improving it.

The rest of this article details the processes summarized in the previous paragraph:

Consider the following trained Bayesian belief network structure:

While calculating the structure, the learning algorithm also calculated the node value probability distributions from the training set:

We first evaluate the model on three fashion rules, asking whether the selected node combination’s values are okay to wear:

  • IF body shape = “apple” AND skirt zipper = “on front” THEN wear = “No!” [1]
  • IF body shape = “apple” AND skirt zipper = “on side” THEN wear = “Yes” [1]
  • IF shoes = “flip-flops” THEN wear = “No!” [2]

(I trained the model on 126 such rules simultaneously.)

Running the inference code:

All looks good. As a control, I added “shoes = pumps” (instead of flip-flops) to the above calculation, and saw that these are okay to wear, as expected. (However, see the discussion below, where I ran into trouble.)

So now I start to derive new style rules from the model. Suppose we simply want to find out if it is okay to “wear” an “apple” body shape. We expect the model to report “yes”, as it does, assigning a probability to the conclusion:

However, the model cannot handle the addition of a shoe type to the “apple” body shape query above:

The problem is “fixed” when I add a style rule specifically allowing apple-shaped folks to wear pumps, but I am not happy with this. The ideal outcome would be for the inference to reach this conclusion on its own. I’m first going to check the dependency encoding. If that solves the problem, it stresses the importance of specifying dependencies well, in addition to lateral relationships. For example, I might establish a “human” node, and indicate that each clothing article and feature proves appropriate for humans to wear. Then I’ll declare that each body shape associates with “human = true”.

Nonetheless, the progress reported here is significant!

I’ll keep you posted.

– Emily

References

women’s style recommendation with artificial intelligence (part #1)

Introduction

We know several basic style “rules” (ha!) based on body shape:

  • Skirts:
    • “Apple” Body Shape:
      • IF body shape is apple AND skirt has front zipper THEN don’t wear
      • IF body shape is apple AND skirt has side zipper THEN wear
      • IF body shape is apple AND skirt has no zipper THEN wear
    • “Rectangular” Body Shape:
      • IF body shape is rectangle AND skirt has front zipper THEN wear
      • IF body shape is rectangle AND skirt has side zipper THEN wear
      • IF body shape is rectangle AND skirt has no zipper THEN wear
      • IF body shape is rectangle AND skirt is A-line THEN wear
  • Pants:
    • “Apple” Body Shape:
      • IF body shape is apple AND jeans have flare THEN wear
      • IF body shape is apple AND jeans have pleats THEN don’t wear
      • IF body shape is apple AND jeans have stretch THEN wear
      • IF body shape is apple AND trousers have flare THEN wear
      • IF body shape is apple AND trousers have pleats THEN don’t wear
      • IF body shape is apple AND trousers have stretch THEN wear
    • “Rectangle” Body Shape:
      • IF body shape is rectangle AND jeans have flare THEN wear
      • IF body shape is rectangle AND jeans have pleats THEN wear
      • IF body shape is rectangle AND jeans have stretch THEN wear
      • IF body shape is rectangle AND trousers have flare THEN wear
      • IF body shape is rectangle AND trousers have pleats THEN wear
      • IF body shape is rectangle AND trousers have stretch THEN wear

We want to create an artificially intelligent system that probabilistically answers a query such as “I have an ‘apple’ body shape and am thinking of wearing a skirt with a zipper in front. Should I?” To accomplish this, we use these rules to train a Bayesian network, and then use the network to make inferences on queries such as the one given above.

Training the Network

From these rules we derive the 13 nodes of our Bayesian network:

Node
apple
jeans.with.flare
jeans.with.pleats
jeans.with.stretch
rectangle
skirt.with.a.line
skirt.with.front.zipper
skirt.with.no.zipper
skirt.with.side.zipper
trousers.with.flare
trousers.with.pleats
trousers.with.stretch
wear

We use the rules and the nodes to produce an automatically generated graph, helping the process along with some expert knowledge.

We seed the model structure identification algorithm with some basic expert knowledge by manually specifying the following 12 causal relationships:

From To
rectangle wear
apple wear
skirt.with.front.zipper wear
skirt.with.side.zipper wear
skirt.with.no.zipper wear
skirt.with.a.line wear
jeans.with.flare wear
jeans.with.stretch wear
jeans.with.pleats wear
trousers.with.flare wear
trousers.with.stretch wear
trousers.with.pleats wear

(We will see later that the automated graph structure learning procedure adds one more edge.)

We save these relationships in “output/style_edges.csv” for later import using R.

We then encode the rules as dictionaries/hashes of the items that occur together in a rule. For example, we express the skirt-related rules pertaining to apple-shaped bodies in JSON as:

    [
        {
            "wear": "Yes",
            "apple": "1",
            "skirt.with.no.zipper": "1"
        },
        {
            "wear": "Yes",
            "apple": "1",
            "skirt.with.side.zipper": "1"
        },
        {
            "wear": "No",
            "apple": "1",
            "skirt.with.front.zipper": "1"
        }
    ]

For each entry, we zero out all other nodes (except for “wear”, which keeps the rule’s “Yes” or “No” value), and express all 19 rules as a data frame, where the column order corresponds to the node order displayed above:

0,0,0,0,1,0,0,1,0,0,0,0,Yes
0,0,0,0,1,0,1,0,0,0,0,0,Yes
0,0,0,0,1,0,0,0,1,0,0,0,Yes
0,0,0,0,1,1,0,0,0,0,0,0,Yes
0,1,0,0,1,0,0,0,0,0,0,0,Yes
0,0,0,1,1,0,0,0,0,0,0,0,Yes
0,0,1,0,1,0,0,0,0,0,0,0,Yes
0,0,0,0,1,0,0,0,0,1,0,0,Yes
0,0,0,0,1,0,0,0,0,0,0,1,Yes
0,0,0,0,1,0,0,0,0,0,1,0,Yes
1,0,0,0,0,0,0,1,0,0,0,0,Yes
1,0,0,0,0,0,0,0,1,0,0,0,Yes
1,0,0,0,0,0,1,0,0,0,0,0,No
1,1,0,0,0,0,0,0,0,0,0,0,Yes
1,0,0,1,0,0,0,0,0,0,0,0,Yes
1,0,1,0,0,0,0,0,0,0,0,0,No
1,0,0,0,0,0,0,0,0,1,0,0,Yes
1,0,0,0,0,0,0,0,0,0,0,1,Yes
1,0,0,0,0,0,0,0,0,0,1,0,No

We save this data frame as “output/style_rules.csv” for later import by R.
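The expansion from rule dictionaries to data frame rows can be sketched in Python as follows. This is only an illustration under assumptions: the function name `rule_to_row` is hypothetical, the node order is copied from the node table above, and the output is written to an in-memory buffer rather than to “output/style_rules.csv”.

```python
import csv
import io

# Node order taken from the node table above; "wear" is the last column.
NODES = [
    "apple", "jeans.with.flare", "jeans.with.pleats", "jeans.with.stretch",
    "rectangle", "skirt.with.a.line", "skirt.with.front.zipper",
    "skirt.with.no.zipper", "skirt.with.side.zipper", "trousers.with.flare",
    "trousers.with.pleats", "trousers.with.stretch", "wear",
]


def rule_to_row(rule):
    """Expand one rule dictionary into a full row, zeroing unmentioned nodes."""
    row = []
    for node in NODES:
        if node == "wear":
            row.append(rule["wear"])       # keep the rule's Yes/No verdict
        else:
            row.append(rule.get(node, "0"))  # nodes absent from the rule become 0
    return row


# The three apple-shape skirt rules from the JSON example above.
rules = [
    {"wear": "Yes", "apple": "1", "skirt.with.no.zipper": "1"},
    {"wear": "Yes", "apple": "1", "skirt.with.side.zipper": "1"},
    {"wear": "No", "apple": "1", "skirt.with.front.zipper": "1"},
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for rule in rules:
    writer.writerow(rule_to_row(rule))
print(buffer.getvalue())
```

The three rows produced this way correspond to the apple-shape skirt rows in the data frame listed above.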

In R, we load the necessary libraries and the CSV files. We also ensure everything is a factor in the rules data frame:

We look at the expert-specified edges, noting the existence of 12 relationships. After running the hill climbing algorithm to derive the network structure from the prior-specified edges and the rules, we notice that now 13 edges are present:

Here is the added edge:

From To
apple rectangle

We derive the model’s parameters from the training data, and then compile it for use in inference.

Results

Suppose we have an “apple” body shape, and want to choose a skirt using this model. We try the following skirt types against the apple body shape to infer whether or not to wear a particular skirt:

The first result in the image above resoundingly rejects wearing a skirt with a front zipper when one has an apple-shaped body. By contrast, the second result approves of skirts with side zippers for apple-shaped folks. Both results agree with the IF-THEN-ELSE rules initially specified. The third result proves interesting: we did not provide a rule for apple-shaped bodies and A-line skirts, so the model provides no conclusion.

We observe similar results for trousers: The first two outcomes match the rules, but the third provides no decision because we provided no information about whether flare and stretch may be used together in a pair of trousers for apple-shaped bodies, or for any body shape for that matter!
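One way to see why unseen combinations yield no conclusion is to estimate the “wear” distribution for a parent configuration by simple counting over the training rows. The Python sketch below is a simplified stand-in for that idea, not the R fitting and inference code used for the results above; the function name `wear_distribution` is hypothetical, and only the apple-shape rows of the rule table are included.

```python
from collections import Counter

# Apple-shape rows from the rule table: (active nodes, wear value).
ROWS = [
    ({"apple", "skirt.with.no.zipper"}, "Yes"),
    ({"apple", "skirt.with.side.zipper"}, "Yes"),
    ({"apple", "skirt.with.front.zipper"}, "No"),
    ({"apple", "jeans.with.flare"}, "Yes"),
    ({"apple", "jeans.with.stretch"}, "Yes"),
    ({"apple", "jeans.with.pleats"}, "No"),
    ({"apple", "trousers.with.flare"}, "Yes"),
    ({"apple", "trousers.with.stretch"}, "Yes"),
    ({"apple", "trousers.with.pleats"}, "No"),
]


def wear_distribution(active_nodes):
    """Return {wear value: relative frequency} for one configuration,
    or None when that configuration never occurs in the training rows."""
    wanted = set(active_nodes)
    counts = Counter(wear for nodes, wear in ROWS if nodes == wanted)
    total = sum(counts.values())
    if total == 0:
        return None  # no training evidence, so no conclusion
    return {value: count / total for value, count in counts.items()}


print(wear_distribution({"apple", "skirt.with.front.zipper"}))
print(wear_distribution({"apple", "skirt.with.a.line"}))
print(wear_distribution({"apple", "trousers.with.flare", "trousers.with.stretch"}))
```

In this sketch the A-line skirt and the flare-plus-stretch trousers queries both return `None`, because no training row contains those combinations.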

Issues to Resolve

As indicated in the last paragraph, in practice a pair of trousers may have both flare and the ability to stretch. Each of these traits alone proves great for apple-shaped individuals, so I manually infer that the two together are at least okay and may even be preferable. However, the model does not derive such a conclusion. In other words, we need to add rules saying these two traits may coexist.

Also, this effort took a lot of manual “expert” specification of the initial “seed” graph structure. Ideally one would learn the final structure purely from rules. My thinking is that the rule data frame is rather sparse, making it hard to learn the structure in an automated fashion. On the other hand, I may not have chosen the best learning algorithm.

Stay tuned…

– Emily

Update 16 April 2018

I’m onto the next iteration of the model design. A visual of results so far:

tracking my gender transition through computational linguistics and machine learning

I wrote 299 blog posts in the last decade, roughly half on badassdatascience.com and half on genderpunk360.com. I produced most of the Badass Data Science content while publicly expressing as a man, and most of the Gender Punk 360 content as a woman. Some articles appear on both blogs (this one, for example), and in the analysis described below I account for such duplication.

My speech therapist observed that I successfully employ feminine language in my recent video “radical forgiveness”. This led me to thinking: Has the language I use in my prose evolved as I blossomed into femininity? I detail my attempt to answer this question using mathematical analysis below.

Two Caveats

I make two major assumptions in this analysis, assumptions I will address in future work:

First, I assume my writing skill remained constant throughout the last ten years. Not a great assumption in the long haul but necessary to simplify the math for this “back of the envelope” analysis.

Second, the two blogs cover different subjects, and the first one even contains source code on occasion. This may distort the clustering process described below. Again, ignoring this concern proves acceptable for this “quick-and-dirty” calculation to enable exploration of the problem domain.



Method

I downloaded each of my blog posts and then calculated the part of speech (POS) for each word in the post. After that I computed the frequency distribution of the POSs. I then performed hierarchical clustering using a similarity matrix defined by the dot product of each pair of posts’ POS use frequency distribution vectors. The resulting dendrogram looks like:

I recommend downloading the image to view it at full size.

Each vertical line represents a blog post, and the trees linking the vertical lines indicate the degree of similarity between any two blog posts. For example, in the above image, the cyan and magenta colored posts prove similar but the green and black posts diverge significantly in terms of their POS use frequency distributions. The asterisks indicate posts created after I started expressing publicly as a woman full-time. The colors divide the tree into sections that group similar blog posts. Please note that I chose the grouping threshold manually (but rationally).
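The similarity matrix that feeds the clustering can be sketched in Python as follows. This is a minimal illustration under assumptions: the posts and their tags are made up, the words are assumed to be POS-tagged already (the tagger is not shown), and the function names are hypothetical. The resulting matrix would then be handed to a hierarchical clustering routine, which is omitted here.

```python
from collections import Counter


def pos_frequencies(tags, tagset):
    """Relative frequency of each POS tag in one post, in tagset order."""
    counts = Counter(tags)
    total = len(tags)
    return [counts[tag] / total for tag in tagset]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical, already-tagged posts (tag sequences only, not real data).
posts = {
    "post_a": ["NOUN", "VERB", "NOUN", "ADJ"],
    "post_b": ["NOUN", "VERB", "NOUN", "NOUN"],
    "post_c": ["ADJ", "ADJ", "ADV", "VERB"],
}

# Use one shared tag order so every post's vector has the same layout.
tagset = sorted({tag for tags in posts.values() for tag in tags})
vectors = {name: pos_frequencies(tags, tagset) for name, tags in posts.items()}

names = sorted(vectors)
similarity = [[dot(vectors[a], vectors[b]) for b in names] for a in names]

for name, row in zip(names, similarity):
    print(name, row)
```

Posts with similar POS usage (here `post_a` and `post_b`) get a larger dot product than posts that use different parts of speech.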

Results

By visually inspecting the density of these asterisks for the different color groups we derive an indication of how “feminine” or how “masculine” we might regard each group of blog posts. For example, we see sparse femininity in the green, yellow, and black groups; while we see enriched femininity in the cyan and purple group. The algorithm clearly found little distinction between the posts within the large red group, but even there we visually recognize sections of diminished femininity and sections of enhanced femininity.

So a linguistic difference between my pre- and post-transition writing appears to exist. But is it real? Can we conclude that my prose grew more feminine after my public transition? Not so fast! We must build a model that includes time as a variable to cancel out the possible influence of improvement in my writing skill, and then test that model for significance. I’ll save this work for a later date.

artificial intelligence in fashion (part two: a first step)

In my recent post, “artificial intelligence in fashion (part one: brainstorming)“, I produced a list of big ideas on how machine learning and artificial intelligence may be applied to the fashion industry. I addressed sizing, marketing, and design activities when brainstorming this list.

This post doesn’t specifically cover an artificial intelligence solution, but it lays groundwork that I need in place to get to an AI-based style recommendation engine based on body shapes that I’d like to build. Essentially, most fashion dictums take the form of IF-THEN-ELSE rules, where the IF clause generally starts with specifying one’s body shape.



So I needed a way for many individuals at once to determine their body shape, which led to creation of a web-based body shape calculator, pictured below. Several of these already exist, but I really needed my own for my AI project for the following reasons:

  • I can include this work into larger AI software pipelines.
    • Cannot easily include others’ tools, by comparison.
  • I understand the computational method behind what I’m offering.
    • Others’ tools are black boxes.
  • The computation method I used comes from academic literature, so it is peer-reviewed.
  • I can show ads to users to generate some cash flow.

Here is a picture of the web-application I created for this task. Click here to use the application!

artificial intelligence in fashion (part one: brainstorming)

Brainstorming as usual:

  1. Fashion dictums involve many IF-THEN-ELSE rules. One can convert this into a decision engine (inference engine).
  2. User specifies their body shape, and a recommendation engine selects suitable clothing for them, taking into account the user’s tastes.
  3. Upload an image of a dress you want to buy, and specify the dress’s given size. At the same time, upload your measurements. The algorithm then tells you the likelihood of fit.
  4. Upload your measurements. The algorithm searches for clothes that fit well.
  5. Upload your measurements. The algorithm searches for clothes that flatter your body shape.
  6. User submits 10+ images of dresses they like, with the option to add more. Moreover, they submit their measurements. The algorithm then designs dresses for them.
  7. Automate difficult design tasks. My model here is the AI drummer in GarageBand which provides very sophisticated beats, and which I use in all my songs.
  8. Enhance design. Algorithms can produce combinations that have not been thought of before. Here I envision designer as “pilot” and algorithm as “vehicle”.
  9. Create fiber optic dresses that light up responsively to movement, such that the changes in lighting accentuate curves.
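As a rough illustration of the first idea in the list, IF-THEN rules can be stored as data and evaluated by a very small decision function. The Python sketch below is hypothetical: the rule format and the name `decide` are assumptions, and the two example rules are borrowed from the body-shape style rules discussed elsewhere in this series.

```python
# Each rule pairs a set of required facts with a verdict.
RULES = [
    ({"body shape": "apple", "skirt zipper": "front"}, "don't wear"),
    ({"body shape": "apple", "skirt zipper": "side"}, "wear"),
]


def decide(facts, rules=RULES):
    """Return the verdict of the first rule whose conditions all hold, else None."""
    for conditions, verdict in rules:
        if all(facts.get(key) == value for key, value in conditions.items()):
            return verdict
    return None  # no rule applies to these facts


print(decide({"body shape": "apple", "skirt zipper": "front"}))
print(decide({"body shape": "rectangle", "skirt zipper": "front"}))
```

A lookup like this only returns what was written down; it cannot assign probabilities or generalize to unseen combinations.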



Collaborate!

If you would like to collaborate with me to make these or similar ideas happen, I’m an extremely experienced data scientist and would love to work with you! Please contact me through Facebook if you are interested.

AI-Driven Fashion Show

Holding a fashion show for AI-created styles sounds fabulous!

Next Steps

See what tools exist already. See what APIs exist. Determine if measurement statistics are known. Investigate the Computer Science and Home Economics academic literature.

What data is out there?

See Also

body shape calculator

hypothesis #1

Hypothesis: Women generally excel at mindfulness over men because living in a patriarchy forces us to.

I can envision an experiment to test the first part of this hypothesis:  Put statistically representative samples of men and women through a battery of psychological tests to measure mindfulness, and then compare the sample medians.

However, establishing the proposed causality would prove tremendously difficult.

estrogen deficit disorder

Potential correlation: I’ve recently upped my estrogen dose, and have recently been happier than I’ve been at any time in the last two years. What if the two are related? What if my brain expects a certain baseline level of estrogen to function best that it never received until now?

There is evidence that hormone administration improves psychological functioning in transgender people (see my post “the science of gender identity (part 3: psychology)” for a discussion of this evidence).

Perhaps I finally hit a (psychiatrically) clinical dose.

the science of gender identity (part 1: genetics)

This is the first in a multi-part series surveying the current science of gender identity, particularly with regard to the transgendered population. I intend to discuss the genetic, brain anatomic, and neuropsychological findings of recent studies on the matter. As always, I will incorporate my own statistical analysis of raw study data wherever possible.

Here I start by discussing four studies involving genetic variations thought to be correlated with transsexualism. Some of these studies show promising leads toward increasing our understanding; others report limited or no findings. A lack of findings does not imply that no genetic factors relate to transsexualism, just that none were found for the particular gene variant examined by the study.

My only beef with these studies is that they consider only one or a few genetic variations at a time. This is a limitation of the technology used. As the cost of whole-genome sequencing decreases, we’ll be able to look for simultaneous genetic variations that play a role in concert with each other.

Code and data for the analyses presented below are attached.

A Bit About the Words I’m Using

Two words I use in this post bother me, so I thought I’d explain my choice to use them.

First, I’d prefer to use the umbrella term “transgender” to label the study participants described below. However, “transgender” is too broad, as the research I describe focused on those who particularly modify their bodies to become a member of a different sex, which not all transgendered individuals want to do. Therefore I use the medical term for this population: “transsexuals”.

Second, “nucleotide variation”, which I associate below through analysis with transsexualism, implies there is a “normal” non-variation. The word is used to indicate that the particular DNA sequence involved is not present in most individuals’ genomes. More common DNA variations are those that result in blue eyes vs. the more frequent brown, and certainly nothing is pathological about having blue eyes. In the same vein, I assert that nothing is pathological about transsexualism; its hypothesized genetic component is simply part of our genetic diversity.

Gene Promoter Variation rs549669867

A nucleotide variation (rs549669867) in the promoter for the gene CYP17A1 associates with female-to-male transsexualism according to a study outlined in [1]. CYP17A1 is a key gene involved in steroid metabolism, and this particular variation causes carriers to possess higher concentrations of both testosterone and estradiol in their bodies [1]. These findings are consistent with a prevailing theory that extra testosterone causes masculinization of the female brain during fetal development, thereby contributing to development of gender dysphoria.

Here I present independent statistical reasoning based on data obtained from the study paper, which supports the researchers’ conclusions. These conclusions do not fully explain the origins of female-to-male transsexualism, as there were non-transsexuals included in the study who had the nucleotide variation, and there were transsexuals in the study who did not. However, the difference in frequencies of the variation’s occurrence between the transsexual and non-transsexual study participant groups is statistically significant.

First I’ll discuss the nucleotide variation itself. The following screenshot from the UCSC Genome Browser [2] shows 50 nucleotides upstream and downstream from the start of gene CYP17A1 on chromosome 10 of the human genome:

The variation we are examining is shown in the lower left, 34 nucleotides before the start of CYP17A1 (this is inside the “promoter” region of the gene). For the genomic strand sequenced in the study (either of the two could have been chosen), the normal nucleotide at this position is a “T” and the variation is a “C”. From analysis of 1000 Genomes Project data, this variation is expected to occur on one of an individual’s two copies of chromosome 10 with a frequency of 0.02% [3].

Now the statistical analysis:

The study recruited 49 female-to-male transsexuals and 913 female controls, then sequenced their DNA in the promoter region of gene CYP17A1 to determine their genotype. The genotype could be one of three outcomes: “TT”, indicating lack of the nucleotide variation on both copies of chromosome 10; “CT”, indicating the variation occurs on only one of the chromosome 10 copies; and “CC”, indicating the variation is present on both copies of chromosome 10. The genotypes and their frequencies by group are listed in the following table:

We make two comparisons: The number of recessive genotypes vs. non-recessive genotypes (CC vs. CT + TT), and the number of dominant genotypes vs. non-dominant genotypes (TT vs. CT + CC). A variation often has to be recessive (present on both copies of its chromosome) to be biologically active, though this is not always the case.

Testing recessive vs. non-recessive genotype counts by study group using a Chi-square test yields a p-value of 0.04034, indicating a statistically significant difference exists between the transsexual and non-transsexual groups with regard to presence or absence of the recessive genotype.

Testing dominant vs. non-dominant genotype counts by study group using a Chi-square test yields a p-value of 0.06322, which is just over the commonly used threshold for declaring statistical significance.

From this data and analysis we can conclude that the recessive genotype is associated with female-to-male transsexualism. Again, this association does not explain all cases, e.g., why some non-transsexuals also have the recessive genotype, but it contributes to scientific efforts to understand transsexualism’s origins.
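For readers who want to see the mechanics of such a test, here is a minimal Python sketch of a Pearson Chi-square test on a 2x2 table. It is not the attached analysis code: the function name is hypothetical, the example counts are invented rather than taken from the study, and no continuity correction is applied, so it should not be expected to reproduce the p-values reported above.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value) with one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts, NOT the study's data.
# Rows: two study groups; columns: genotype of interest present / absent.
example = [[10, 20], [20, 10]]
statistic, p_value = chi_square_2x2(example)
print(statistic, p_value)
```

With small cell counts, such as a rare recessive genotype, an exact test or a continuity correction is usually worth considering alongside this approximation.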

Gene Variation rs743572

Nucleotide variation rs743572 also impacts gene CYP17A1. Rather than residing in the promoter region of the gene as did rs549669867, this variation lies within the gene itself.

In my analysis of this variation’s study data discussed below [4], the association between the variation and transsexualism (comparing transsexuals vs. controls) is not significant. However, the difference in the frequency of the variation between female-to-male transsexuals and male-to-female transsexuals is significant according to the statistical test I conducted. (The study authors concluded the same thing, just with different p-values.) Therefore I’m reporting this variation as notable with regard to our efforts to understand the genetic underpinnings of transsexualism. The difference between this variation’s frequency in female-to-male transsexuals vs. male-to-female transsexuals may lead to insight into the origin of each outcome separately (per nominal biological sex), rather than help provide a “one size fits all” explanation for transsexualism.

rs743572 resides 139 nucleotide positions from the start of gene CYP17A1. It occurs on one of individuals’ two copies of chromosome 10 with a frequency of 41% [5]. The fact that this variation is much more common than rs549669867 probably explains why the transsexualism vs. control association for the variation I investigate below does not prove statistically significant. The following screenshot from the UCSC Genome Browser [2] shows the variation on gene CYP17A1 within chromosome 10 of the human genome:

The study [4] whose data I analyze here recruited 151 male-to-female and 142 female-to-male transsexuals. The researchers also recruited 167 male and 168 female non-transsexuals. All were Spaniards with no possibly confounding health issues. Of these subjects, 36% of the male-to-female and 45% of the female-to-male transsexuals carried the variation. 39% of the male and 38% of the female non-transsexuals also carried the variation. Presence or absence of the variation was determined through DNA sequencing. From this data I constructed the following contingency table, rounding to get whole numbers:
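The rounding step can be sketched in Python as follows, using only the group sizes and percentages stated above. Because the reported percentages are themselves rounded, the resulting counts are approximations; the function name `carrier_counts` is hypothetical.

```python
# Group sizes and carrier percentages as stated in the paragraph above.
groups = {
    "male-to-female": (151, 36),
    "female-to-male": (142, 45),
    "male controls": (167, 39),
    "female controls": (168, 38),
}


def carrier_counts(size, percent):
    """Round the carrier percentage to whole people; the rest are non-carriers."""
    carriers = round(size * percent / 100)
    return carriers, size - carriers


table = {name: carrier_counts(size, pct) for name, (size, pct) in groups.items()}
for name, (carriers, non_carriers) in table.items():
    print(f"{name}: {carriers} with variation, {non_carriers} without")
```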

Performing pairwise comparisons of the count proportions using a Chi-squared goodness of fit test yields the following p-values:

As mentioned above, the only significant difference in variation proportions is in the comparison of female-to-male vs. male-to-female transsexuals. Therefore this variation does not by itself seem a strong contributor to our effort to explain the transgendered experience in terms of genetics. However, a whole-genome comparison study on similar test subjects could elucidate whether this variation interacts with other variations to form a combined association with transsexualism.

Androgen Receptor Repeat Length Variation rs193922933

A study [6] correlated the androgen receptor (AR) gene’s CAG repeat length variation (rs193922933) with male-to-female transsexualism. I feel the researchers did not perform their statistical analysis correctly, and have remedied the situation below. However, my conclusion was the same.

The AR gene’s CAG repeat length is highly variable between individuals. Each occurrence of the repeat appends an extra amino acid to the androgen receptor protein, as shown below. No information about the frequency distribution of this variation was readily available [7].

Longer CAG repeat lengths are known to diminish testosterone signaling, which impacts masculinization of the brain during development [6].

The study authors sequenced the CAG repeat region of 112 male-to-female transsexuals and 258 male controls. They report the length data in the following plot (but not their raw data) [6]:

Using the GNU Image Manipulation Program, I measured each bar to determine the percentages and reconstructed the source data, re-plotted as follows:

Here we see that the CAG repeat length medians between the transsexual subjects and the controls differ by one (with the transsexual group’s median being longer), and that the interquartile limits are identical. The control group has a heavier lower tail.

The researchers compared the means using a t-test, which I am uncomfortable with due to the skew in the male controls’ distribution. Therefore I performed a quasi-Poisson regression since this is underdispersed count data. That analysis reported a statistically significant difference between the two groups (p = 0.0269).

I could not find data on the practical significance of a median difference of one CAG repeat length.

Negative Results

Another study [8] found no association between CAG repeat length variation in the AR gene and transsexualism. Furthermore, it found no association between transsexualism and variations in four other sex hormone-related genes: estrogen receptors alpha and beta, aromatase CYP19, and progesterone receptor PGR.

More Research Needed

A search of DisGeNET (a database of disease*-gene annotations) [9] for the term “transsexualism” shows only five genes and five PubMed publications covering the subject. This reveals the dearth of research on the matter. The image below showing the genes and PubMed articles extracted from the search comes from my own implementation of DisGeNET’s data within a graph database, which I discuss here.

*I of course object to DisGeNET’s labeling of “transsexualism” as a disease, and to its connection with the MeSH term “mental disorders”. I’ve contacted DisGeNET and MeSH about this issue and will report back on their response shortly.

Related Posts

the science of gender identity (part 2: brain anatomy)

the science of gender identity (part 3: psychology)

Code and Data

code_and_data

References

  1. http://www.ncbi.nlm.nih.gov/pubmed/17765230
  2. https://genome.ucsc.edu/
  3. http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs549669867
  4. http://www.ncbi.nlm.nih.gov/pubmed/25929975
  5. http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?searchType=adhoc_search&type=rs&rs=rs743572
  6. http://www.ncbi.nlm.nih.gov/pubmed/18962445
  7. http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs193922933
  8. http://www.ncbi.nlm.nih.gov/pubmed/19604497
  9. http://www.disgenet.org/web/DisGeNET/menu

HRC Corporate Equality Index correlates with Fortune’s 50 most admired companies

The Human Rights Campaign, one of America’s largest civil rights groups, scores companies in its yearly Corporate Equality Index (CEI) according to their treatment of lesbian, gay, bisexual, and transgender employees [1]. The companies automatically evaluated are the Fortune 1000 and American Lawyer’s top 200. Additionally, any sufficiently large private sector organization can request inclusion in the CEI [2].

Similarly, Fortune Magazine publishes an annual list of 50 of the world’s most admired companies [3]. Companies are rated by financial health, stock performance, leadership effectiveness, customer sentiment, scandals, and social responsibility.

I became curious whether CEI scores correlate with membership in Fortune’s most admired list, so I matched the two datasets and analyzed the outcome. The results (below) are striking. Code implementing the calculations, with the source data, is attached.

Results

Plotting the CEI score distributions by whether a company was included in Fortune’s list produced:

From this difference in distributions it is clear that the status of being “most admired” correlates with a high CEI score, though there are a few outliers. In the distribution on the left, we see that over 50% of the companies in Fortune’s list held the top CEI score of 100, whereas only 25% of the companies not contained in Fortune’s list held the top score. The median score for the most admired group was 100, while for the companies not included in Fortune’s list it was about 80. Over 80% of the most admired companies scored 90 or above. The variance is much larger for the companies not included on the list. Statistical analysis comparing the two groups, detailed below, confirms the correlation.



While correlation does not imply causality, this analysis suggests two things: First, the type of leadership necessary to achieve a high CEI score is the same type of leadership that leads to inclusion in Fortune’s most admired companies group. Second, any company aspiring to membership in the most admired group might consider developing its CEI score.

There is one possible source of bias, but I don’t expect that it is large: “Social responsibility” is used in Fortune’s rankings, which may include CEI scores (I don’t know). However, Fortune’s emphasis on financial health and stock price probably trumps any contribution that the CEI would generate alone. Furthermore, in the CEI score distribution for the most admired companies, there are outliers containing extremely low scores. This suggests that the CEI played little if any role in the selection of most admired companies.

Method

I manually copied and pasted the company names and scores from the CEI online database [1]. Then I cleaned up the results to create a manageable CSV file. Similarly, I copied and pasted the Fortune 50 most admired company list [3] into another CSV file. After that, I matched the two datasets by hand. Perhaps I could have performed the match algorithmically, but I would have had to worry about different representations of company names between the two datasets, e.g. “3M Co.” vs. “3M”. There were only 50 cases, so the manual match did not take long.
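For comparison, an algorithmic match might look something like the Python sketch below. This is not what I ran; the suffix list and the function names `normalize` and `match` are hypothetical, and the approach only handles simple differences such as a trailing “Co.”.

```python
import re

# Hypothetical list of corporate suffixes to ignore; a real match would need more.
SUFFIXES = {"co", "corp", "corporation", "inc", "company", "llc", "ltd"}


def normalize(name):
    """Lowercase, strip punctuation, and drop trailing corporate suffixes."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    while words and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words)


def match(fortune_names, cei_names):
    """Map each Fortune name to a CEI name with the same normalized form, or None."""
    lookup = {normalize(name): name for name in cei_names}
    return {name: lookup.get(normalize(name)) for name in fortune_names}


print(match(["3M", "BMW"], ["3M Co."]))
```

Names that differ by more than a suffix, such as a parent company versus its US subsidiary, would come back as `None` and still need a manual decision.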

Two cases in Fortune’s list had to be excluded, BMW and Singapore Airlines, because they were not included in the CEI, possibly because they are based outside the USA. In the case of two other non-US companies in Fortune’s list, Toyota and Volkswagen, I matched to Toyota Motor Sales USA and Volkswagen Group of America, respectively.

Finally, I plotted the CEI score distributions shown above and performed the statistical analysis reported below using the attached Python code.

Statistical Analysis

The extreme difference in variance between the two groups makes it impossible to compare medians using a non-parametric test, and the distribution of the CEI scores does not lend itself to a clean regression analysis. Therefore I built the following contingency table from the data:

The p-value for this table obtained from Fisher’s exact test is 4.53e-08, indicating that the proportions are significantly different.
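As an illustration of what Fisher’s exact test computes for a 2×2 table, here is a minimal standard-library sketch. It is not the attached code, and the function name `fisher_exact_two_sided` is my own. It assumes the common two-sided definition: hold the table margins fixed, then sum the probabilities of every table that is no more probable than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Small relative tolerance so floating-point ties count as "as extreme".
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# Toy table for illustration only; these are NOT the counts from this post.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

To apply it to this analysis, the four arguments would be the cell counts of the contingency table above, for example most admired vs. not admired in the rows and the two CEI score categories in the columns.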

References

  1. http://www.hrc.org/campaigns/corporate-equality-index
  2. http://www.hrc.org/resources/entry/corporate-equality-index-what-businesses-are-rated-and-how-to-participate
  3. http://fortune.com/worlds-most-admired-companies/

Code and Data

HRC_Fortune_data_and_code

bias reinforcement through survey questionnaires

Today I play media theorist and examine how survey questionnaires reinforce survey designers’ biases:

That survey questionnaires transmit biases is nothing new. The extreme case, “push-polling”, intentionally guides the questionnaire reader toward a viewpoint, without real interest in their prior opinion. Any survey writer willing to push-poll already understands my concerns about bias (because they are propagandists).

It is the unintended or “honest” biases that concern me here.



Consider, for example, the common belief that every individual can be categorized as a member of exactly one of four or five distinct racial groups, a belief reflected in the many survey questionnaires that ask respondents to indicate which race they belong to. This is an example of what I call an “honestly” projected bias; the survey writer likely has limited awareness that there is even a problem, and does not expect their respondents to question the belief. In these cases, the bias enters the survey questionnaire through the questionnaire writers’ phrasing and provided options, and is confirmed when each respondent chooses one of the options.

Stepping back, we observe “bias in, bias out” where the belief itself gains strength across the survey process. It strengthens among the respondents as they accept the belief when answering the questions, and strengthens in the mind of the survey creator when they see tacit acceptance of the bias in the responses. At each step, neural pathways supporting the belief become stronger due to exercise.

I’ve mapped this process below, illustrating the cumulative bias amplification by degree of red in the arrows’ color:

While we cannot completely escape projecting our biases through our measurement instruments, I call on questionnaire writers to step back and consider what we might be propagating. We may have to become more creative to limit the damage. (For one example of a creative approach, see my post “a better way to ask about gender in survey questionnaires”, which suggests how to avoid propagating the binary sex/gender bias through survey questionnaires.)

a better way to ask about gender in survey questionnaires

Survey questionnaires regularly ask respondents’ sex or gender, and mostly offer only the binary options:

When presented with such a survey on paper, I typically add and then select a third option: “Fuck you”. (I do the same with race/ethnicity questions when asked to choose one out of four or five options.)

However, we increasingly answer surveys online, making this write-in approach unavailable. Furthermore, scrawling profanities onto survey forms fails to positively address the very serious problem underlying my anger: that the binary sex/gender classification erases folks who, for a variety of reasons, do not fit within it.

In what I perceive as an honest attempt at inclusion—and I sincerely appreciate the effort—Google offers an “Other” option in its Google+ profile form:

But simply adding an “other” option still emphasizes the binary classification system; it reminds respondents that they either fit in, or don’t. Very few of those who don’t fit in enjoy that interjection when it involves something as fundamental as gender identity.



I recommend the following alternative for collecting gender and sex data from survey respondents:

Here the use of sliders reflects the continuous nature of sex and gender, while the division of the query into separate, orthogonal dimensions accounts for the distinctness of biology (sex) from social artifact (gender).

Certainly this scheme fails to capture all the nuances of gender identity, particularly its flux within individuals, but it reaches for a more honest and inclusive world.