Welcome to FEE – Fair Embedding Engine’s documentation!

Fair Embedding Engine: A Library for Analyzing and Mitigating Gender Bias in Word Embeddings.

Non-contextual word embedding models have been shown to inherit human-like stereotypical biases of gender, race and religion from their training corpora. To counter this issue, a large body of research has emerged that aims to mitigate these biases while keeping the syntactic and semantic utility of embeddings intact. This paper describes Fair Embedding Engine (FEE), a library for analysing and mitigating gender bias in word embeddings. FEE combines various state-of-the-art techniques for quantifying, visualising and mitigating gender bias in word embeddings under a standard abstraction. FEE will aid practitioners in fast-track analysis of existing debiasing methods on their embedding models. Further, it will allow rapid prototyping of new methods by evaluating their performance on a suite of standard metrics.

Role of FEE in propagating research in fairness

Despite the development of a large number of debiasing methods, bias in word representations persists, making it an active area of research. We believe that the design and wide variety of tools provided by FEE can play a significant role in assisting practitioners and researchers to develop better debiasing and evaluation methods. The following figures portray FEE-assisted workflows, which abstract away the routine engineering tasks and allow users to invest more time in the intellectually demanding questions.

[Figures: _images/dev1.png and _images/dev2.png, FEE-assisted workflows]

FEE serves as a centralized resource for practitioners and researchers to develop novel debiasing methods and bias evaluation metrics. The figures illustrate a possible workflow for each of these two tasks, both made possible by the abstraction provided by FEE.

Documentation

The following is the documentation for the constituent classes in the five major components of FEE – Loader, Debiasing, Reports, Metrics and Visualizations.

Loader

class fee.embedding.loader.WE[source]

The Word embedding class.

The main class that facilitates the word embedding structure.

dim (int) – Dimension of embedding
vecs (np.array)

Initialize WE object.

fname_to_format(fname)[source]

Get embedding format from file name.

Format can usually be extracted from the filename extension. We currently support the loading of embeddings in binary (.bin), text (.txt) and numpy format (.npy).

Parameters:fname (str) – file name
Returns:format (txt, bin or npy)
Return type:format (str)
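
The extension lookup described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper name guess_format, the lookup table and the error raised for an unknown extension are all assumptions.

```python
import os

# Hypothetical lookup table built only from the three extensions listed above.
_EXT_TO_FORMAT = {".bin": "bin", ".txt": "txt", ".npy": "npy"}

def guess_format(fname):
    """Return 'bin', 'txt' or 'npy' based on the extension of fname."""
    ext = os.path.splitext(fname)[1].lower()
    if ext not in _EXT_TO_FORMAT:
        # Assumption: unknown extensions are rejected rather than guessed.
        raise ValueError(f"Cannot infer embedding format from {fname!r}")
    return _EXT_TO_FORMAT[ext]

print(guess_format("glove6B.txt"))     # txt
print(guess_format("vectors.wv.npy"))  # npy
```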
get_gensim_word_vecs(model)[source]

Load words and vectors using gensim scripts.

Parameters:model – Model for accessing all the words in vocab, and their vectors.
load(fname=None, format=None, ename=None, normalize=False, dim=300)[source]

Load word embedding from filename or embedding name.

Loads word embeddings from either the filename fname or the embedding name ename. The following formats are supported:

  • bin: Binary format, loaded through gensim.
  • txt: Text w2v or GloVe format.
  • npy: Numpy format. fname.wv.npy contains the numpy vectors while fname.vocab contains the vocabulary list.

All Gensim pre-trained embeddings are integrated for easy access via ename. The ename values follow the gensim naming conventions.

Example

    from fee.embedding.loader import WE

    # Load from a file on disk
    we = WE()
    E = we.load('glove6B.txt', dim=300)

    # Load a gensim pre-trained embedding by name
    we = WE()
    E = we.load(ename='glove-wiki-gigaword-50')

Parameters:
  • fname (str) – Path to the embedding file on disk.
  • format (str) –

    Format of word embedding. Following are the supported formats:

    • binary
    • text
    • numpy array
  • ename (str) – Name of embedding. This will download embedding using the Downloader class. In case both ename and fname are provided, ename is given priority.
  • normalize (bool) – Normalize word vectors or not.
  • dim (int) – The dimension of embedding vectors. Default dimension is 300
Returns:

Return self, the word embedding object.

Return type:

self (WE object)
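
To make the txt format more concrete, here is a rough sketch of how a text embedding file can be parsed. It assumes one "word v1 ... v_dim" entry per line separated by single spaces, and it skips any line of a different width (such as the "vocab_size dim" header of the w2v text format). The function name read_text_embeddings is hypothetical and the library's own parser may differ.

```python
import io

def read_text_embeddings(handle, dim):
    """Parse 'word v1 ... v_dim' lines from an open text handle."""
    words, vecs = [], []
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            # Assumption: lines of another width (e.g. a header) are ignored.
            continue
        words.append(parts[0])
        vecs.append([float(x) for x in parts[1:]])
    return words, vecs

# Toy in-memory file with a w2v-style header followed by two entries.
sample = io.StringIO("2 3\nhe 0.1 0.2 0.3\nshe 0.3 0.2 0.1\n")
words, vecs = read_text_embeddings(sample, dim=3)
print(words, vecs)
```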

normalize()[source]

Normalize word embeddings.

Normalization is done as follows:

\vec{v}_{norm} := \vec{v} / |\vec{v}|

where |\vec{v}| is the L2 norm of \vec{v}
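
The formula can be sketched in plain Python as below. The helper name l2_normalize and the handling of the zero vector are assumptions made for illustration; the library operates on its own array of vectors.

```python
import math

def l2_normalize(vec):
    """Divide vec by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Assumption: a zero vector is returned unchanged.
        return list(vec)
    return [x / norm for x in vec]

unit = l2_normalize([3.0, 4.0])
print(unit)
```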

reindex()[source]

Reindex word vectors.

v(word)[source]

Access vector for a word

Returns the self.dim dimensional vector for the word word.

Example

    E = WE().load('glove')
    test = E.v('test')

Parameters:word (str) – Word to access vector of.
Returns:self.dim dimension vector for word.
Return type:vec (np.array)

Debiasing

class fee.debias.hard_debias.HardDebias(E, g=None)[source]

Hard debiasing class.

HardDebias debiasing method class.

This method debiases word vectors in two stages: it first neutralizes and then equalizes the vectors.

Parameters:
  • E (WE class object) – Word embeddings object.
  • g (np.array) – Gender direction. If None, it is computed again.
equalize(E)[source]

Equalize word vectors using the gender direction and a set of equalizing word pairs. This is the second step of hard debiasing procedure.

Parameters:E (WE class object) – Word embeddings object.
neutralize(E, word_list)[source]

Neutralize word vectors using the gender direction. This is the first step of hard debiasing procedure.

Parameters:
  • E (WE class object) – Word embeddings object.
  • word_list (list) – List of words to debias.
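
As an illustration of the neutralize step, the sketch below removes the component of a vector along the gender direction and rescales the remainder to unit length. This is the usual projection-removal formulation; whether the library rescales, and the helper names used here, are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neutralize_vector(vec, g):
    """Remove the component of vec along the unit direction g, then rescale."""
    proj = dot(vec, g)
    residual = [v - proj * gi for v, gi in zip(vec, g)]
    norm = math.sqrt(dot(residual, residual))
    if norm == 0.0:
        # vec was parallel to g, so nothing is left to rescale.
        return residual
    return [r / norm for r in residual]

g = [1.0, 0.0]  # toy unit-length gender direction
neutral = neutralize_vector([0.6, 0.8], g)
print(neutral, dot(neutral, g))
```

After this step the vector has no component along g, which is what the zero dot product in the toy example shows.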
run(word_list)[source]

Debias word vectors using the hard debiasing method.

Parameters:word_list (list) – List of words to debias.
Returns:Debiased word vectors
class fee.debias.hsr_debias.HSRDebias(E)[source]

HSR Debiasing class (Half Sibling Regression)

HSR debiasing method class.

Parameters:E (WE class object) – Word embeddings object.
hsr(gender_vecs, nongender_vecs, nongender_list, alpha)[source]

Half Sibling Regression method

Parameters:
  • gender_vecs (np.array) – 2D numpy array of gendered words
  • nongender_vecs (np.array) – 2D numpy array of non-gendered words
  • nongender_list (list) – list of nongender words.
  • alpha (float) – alpha hyperparameter.
run(gender_list, nongender_list=None, alpha=60)[source]

Run the Half Sibling Regression method

Parameters:
  • gender_list (list) – list of gendered words.
  • nongender_list (list) – list of nongendered words.
  • alpha (float) – alpha hyperparameter.
subset(words)[source]

Create subset such that the words exist in vocabulary.

Parameters:words (list) – list of words to debias.
class fee.debias.ran_debias.RANDebias(E, g=None)[source]

Class to perform Repulsion-Attraction-Neutralization based debiasing.

Parameters:
  • E (WE class object) – Word embeddings object.
  • g (np.array) – Gender direction. If None, it is computed again.
minimize(word, X, lr, max_epochs, *args, **kwargs)[source]

Minimize the RAN objective using gradient optimization.

Parameters:
  • word (str) – word to debias
  • X (np.array) – the initialized new debiased vector
  • lr (float) – learning rate for gradient descent
  • max_epochs (int) – number of epochs

run(words)[source]

Run RANDebias for word list words

Parameters:words (list) – Words list to debias.
class fee.debias.ran_debias.RANOpt(E, word, X, N, g, ws=[0.33, 0.33, 0.33], ripa=False)[source]

RAN objective optimization class.

Parameters:
  • E (WE class object) – Word embeddings object.
  • word (np.array) – the original vector to debias
  • N (dict) – neighbourhood dictionary
  • g (np.array) – Gender direction. If None, it is computed again.
  • ws (list) – weights for RAN objective
  • ripa (bool) – use RIPA based neutralization or not
fee.debias.ran_debias.calc_ns_idb(E, word, g)[source]

Calculate neighbourhood dictionary for one word with indirect bias pair information.

Parameters:
  • word (str) – word to compute neighbours and pair idb
  • E (WE class object) – Word embeddings object
fee.debias.ran_debias.get_neighbors_idb_dict(words, E)[source]

create neighbourhood dictionary for each word with indirect bias pair information.

Parameters:
  • words (list) – list of words to compute neighbours and pair idb
  • E (WE class object) – Word embeddings object
fee.debias.ran_debias.get_ns_idb(word, N)[source]

Get indirect bias and neighbours

Parameters:
  • word (str) – word to get neighbours and pair idb
  • N (dict) – neighbourhood-idb dictionary
fee.debias.ran_debias.init_vector(word, E)[source]

initialize vector for optimization

Parameters:
  • word (str) – word to debias
  • E (WE class object) – Word embeddings object
fee.debias.ran_debias.ran_objective(X, sel, desel, g, ws)[source]

objective function for RAN

Parameters:
  • X (torch.Tensor) – original word vector
  • sel (torch.Tensor) – selection tensors for attraction
  • desel – deselection tensors for repulsion
  • g (torch.Tensor) – gender direction tensor
  • ws (list) – list of objective weights
fee.debias.ran_debias.torch_cosine_similarity(X, vectors)[source]

torch tensor cosine similarity for groups

Parameters:
  • X (torch.Tensor) – torch tensor 1D
  • vectors (torch.Tensor) – torch tensor 2D

Reports

class fee.reports.biased_neighbours.NeighboursAnalysis(E, g=None, random_state=42)[source]

NeighboursAnalysis report

Analyze the neighbours of a word in the embedding through cosine similarities and bias by projection.

Parameters:
  • E (WE class object) – Word embeddings object
  • g (np.array) – gender direction
  • random_state (int) – random seed for reproduction

generate(word, n=100, ret_report=True)[source]

Generate the report for the neighbours of word.

Parameters:
  • word (str) – Word to generate report for
  • n (int) – number of neighbours to compute
  • ret_report (bool) – return or print the report dataframe

get_neighbours(word, n=100)[source]

Compute the list of n neighbours for word.

Parameters:
  • word (str) – Word to compute neighbours for
  • n (int) – number of neighbours to compute

print_neighbours(words, n)[source]

Pretty print n neighbours.

Parameters:
  • words (list) – List of neighbours
  • n (int) – number of neighbours to compute

class fee.reports.word_report.WordReport(E, g=None)[source]

WordReport class

Generate a word-level report for some word in E. This report computes and prints direct bias, proximity bias and neighbours of a word along with their bias by projection (same as direct bias). Additionally, the report includes a tSNE visualization for the neighbourhood of word, color coded by bias by projection. Finally, the report also plots the WordCloud of the given word, size coded by bias by projection.

Parameters:
  • E (WE class object) – Word embeddings object
  • g (np.array) – gender direction

generate(word, n=50, figsize=None, dpi=100)[source]

Generate the word report for word.

Parameters:
  • word (str) – Word to show the report for
  • n (int) – number of neighbours of word to consider
  • figsize (tuple) – size of figures in (HxW)
  • dpi (int) – dpi of the figures

class fee.reports.global_report.GlobalReport(E, g=None)[source]

GlobalReport Class

Generate a global bias report for a word embedding. This report computes the least and most biased words in an embedding and plots them. Bias by projection (direct bias) is used as the metric to compute this report. The report also plots the overall distribution of bias in the embedding E.

Parameters:
  • E (WE class object) – Word embeddings object
  • g (np.array) – gender direction

generate(n=10, ret_df=False, plot=True)[source]

Generate the global report for embedding E.

Parameters:n (int) – No. of most/least biased words to print.

get_values_and_words()[source]

Get the list of words in E sorted by bias by projection.
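
A rough sketch of sorting words by bias by projection follows. It assumes that bias by projection is the dot product of a word vector with the gender direction g and that the sort is on the signed value; the function name, the dict-based toy embedding and the return order are hypothetical.

```python
def sort_by_projection(vectors, g):
    """Return (values, words) sorted by the projection of each vector onto g."""
    scored = sorted(
        (sum(a * b for a, b in zip(vec, g)), word)
        for word, vec in vectors.items()
    )
    values = [value for value, _ in scored]
    words = [word for _, word in scored]
    return values, words

g = [1.0, 0.0]  # toy gender direction
toy = {"w1": [0.7, 0.1], "w2": [-0.8, 0.6], "w3": [0.0, 1.0]}
values, words = sort_by_projection(toy, g)
print(list(zip(words, values)))
```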

plot(values)[source]

Plot the biased words.

Parameters:values (list) – list of bias by projection

print_df(sorted_values, sorted_words, n)[source]

Pretty print the dataframe containing most and least biased words in E.

Parameters:
  • sorted_values (list) – list of bias by projection for sorted_words
  • sorted_words (list) – list of words
  • n (int) – no. of least/most biased words to print

Metrics

class fee.metrics.weat.WEAT(E)[source]

Perform WEAT (Word Embedding Association Test) bias tests on a language model. Follows from Caliskan et al 2017 (10.1126/science.aal4230). Code mostly “stolen” from: https://github.com/chadaeun/weat_replication/blob/master/lib/weat.py

fee.metrics.weat.cos_sim(v1, v2)[source]

Returns cosine of the angle between two vectors

fee.metrics.weat.unit_vector(vec)[source]

Returns unit vector
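
The two helpers above compose naturally: the cosine of the angle between two vectors is the dot product of their unit vectors. A plain-Python sketch, using lists instead of numpy arrays as an assumption for self-containment:

```python
import math

def unit_vector(vec):
    """Scale vec to unit L2 norm (raises ZeroDivisionError for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cos_sim(v1, v2):
    """Cosine of the angle between v1 and v2."""
    u1, u2 = unit_vector(v1), unit_vector(v2)
    return sum(a * b for a, b in zip(u1, u2))

orthogonal = cos_sim([1.0, 0.0], [0.0, 2.0])
parallel = cos_sim([1.0, 1.0], [3.0, 3.0])
print(orthogonal, parallel)
```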

fee.metrics.weat.weat_association(W, A, B)[source]

Returns the association of each word w in W with the attributes for the WEAT score, s(w, A, B).

Parameters:
  • W – target words’ vector representations
  • A – attribute words’ vector representations
  • B – attribute words’ vector representations
Returns:(len(W), ) shaped numpy ndarray. Each row represents the association of the word w in W.

fee.metrics.weat.weat_differential_association(X, Y, A, B)[source]

Returns the differential association of two sets of target words with the attributes for the WEAT score, s(X, Y, A, B).

Parameters:
  • X – target words’ vector representations
  • Y – target words’ vector representations
  • A – attribute words’ vector representations
  • B – attribute words’ vector representations
Returns:differential association (float value)
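
The two quantities can be sketched as follows, using the definitions from Caliskan et al.: s(w, A, B) is the mean cosine similarity of w with A minus its mean cosine similarity with B, and s(X, Y, A, B) is the sum of s(x, A, B) over X minus the sum of s(y, A, B) over Y. The function names and the list-based vectors are assumptions; the library works on numpy arrays.

```python
import math

def cos_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def association(w, A, B):
    """s(w, A, B): mean similarity to A minus mean similarity to B."""
    mean_a = sum(cos_sim(w, a) for a in A) / len(A)
    mean_b = sum(cos_sim(w, b) for b in B) / len(B)
    return mean_a - mean_b

def differential_association(X, Y, A, B):
    """s(X, Y, A, B): summed association of X minus summed association of Y."""
    return sum(association(x, A, B) for x in X) - sum(association(y, A, B) for y in Y)

# Toy 2-D vectors: X aligns with attribute set A, Y aligns with B.
A, B = [[1.0, 0.0]], [[0.0, 1.0]]
X, Y = [[1.0, 0.0]], [[0.0, 1.0]]
diff = differential_association(X, Y, A, B)
print(diff)
```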

fee.metrics.weat.weat_p_value(X, Y, A, B)[source]

Returns the one-sided p-value of the permutation test for the WEAT score.

CAUTION: this function is not appropriately implemented, so it runs very slowly.

Parameters:
  • X – target words’ vector representations
  • Y – target words’ vector representations
  • A – attribute words’ vector representations
  • B – attribute words’ vector representations
Returns:p-value (float value)
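
For intuition, a one-sided permutation test can be sketched as below: it enumerates every split of the pooled target words into two sets of the original sizes and reports the fraction of splits whose differential association is strictly greater than the observed one. Exhaustive enumeration is an assumption chosen for clarity and is only practical for small word sets; the helper names are hypothetical.

```python
import math
from itertools import combinations

def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def _assoc(w, A, B):
    return sum(_cos(w, a) for a in A) / len(A) - sum(_cos(w, b) for b in B) / len(B)

def _diff(X, Y, A, B):
    return sum(_assoc(x, A, B) for x in X) - sum(_assoc(y, A, B) for y in Y)

def permutation_p_value(X, Y, A, B):
    """Fraction of re-partitions of X + Y that beat the observed statistic."""
    observed = _diff(X, Y, A, B)
    pooled = X + Y
    indices = range(len(pooled))
    total = exceed = 0
    for chosen in combinations(indices, len(X)):
        chosen_set = set(chosen)
        Xi = [pooled[i] for i in chosen]
        Yi = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if _diff(Xi, Yi, A, B) > observed:
            exceed += 1
    return exceed / total

A, B = [[1.0, 0.0]], [[0.0, 1.0]]
X = [[1.0, 0.0], [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
p_aligned = permutation_p_value(X, Y, A, B)   # observed split is the most extreme
p_reversed = permutation_p_value(Y, X, A, B)  # observed split is the least extreme
print(p_aligned, p_reversed)
```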

fee.metrics.weat.weat_score(X, Y, A, B, p_val)[source]

Returns the WEAT score. X, Y, A and B must be (len(words), dim) shaped numpy ndarrays.

CAUTION: this function assumes that there is no word shared between X and Y.

Parameters:
  • X – target words’ vector representations
  • Y – target words’ vector representations
  • A – attribute words’ vector representations
  • B – attribute words’ vector representations
Returns:WEAT score

class fee.metrics.sembias.SemBias(E)[source]

Sembias test Class

Parameters:E (WE class object) – Word embeddings object.
compute()[source]

Returns accuracy for definitional, stereotype and none type analogies.

eval_bias_analogy()[source]

Source: https://github.com/uclanlp/gn_glove

class fee.metrics.proximity_bias.ProxBias(E, g=None, thresh=0.05, n=100)[source]

Proximity Bias Metric Class

Parameters:E (WE class object) – Word embeddings object.
kwargs:

  • g (np.array) – Gender direction.
  • thresh (float) – The minimum indirect bias threshold, above which the association between a word and its neighbour is considered biased.
  • n (int) – Top n neighbours according to the cosine similarity.

compute(words)[source]
Parameters:words (str) – A word or a list of words to compute the ProxBias for.
Returns:The average proximity bias for the given list of words. In simple terms, the proximity bias of a word is the ratio of its neighbours that are biased according to indirect bias.
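
The ratio described above can be sketched as follows, assuming the indirect bias of each neighbour has already been computed. The function names and the toy values are hypothetical, and "above" is read here as a strict comparison.

```python
def proximity_bias(idb_values, thresh=0.05):
    """Fraction of neighbours whose indirect bias is above thresh."""
    if not idb_values:
        return 0.0
    biased = sum(1 for value in idb_values if value > thresh)
    return biased / len(idb_values)

def average_proximity_bias(per_word_idb, thresh=0.05):
    """Mean proximity bias over a dict of word -> list of neighbour IDB values."""
    scores = [proximity_bias(values, thresh) for values in per_word_idb.values()]
    return sum(scores) / len(scores)

# Toy indirect-bias values for the neighbours of two words.
toy = {
    "w1": [0.10, 0.01, 0.07, 0.02],  # 2 of 4 above 0.05
    "w2": [0.00, 0.06, 0.20, 0.30],  # 3 of 4 above 0.05
}
avg = average_proximity_bias(toy)
print(avg)
```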
class fee.metrics.pmn.PMN(E, g=None, n=100)[source]

The class that computes the Percentage of Male Neighbours (PMN) in the top n neighbours for a word.

Parameters:E (WE class object) – Word embeddings object.
kwargs:
  • g (np.array) – Gender direction.
  • n (int) – Top n neighbours according to the cosine similarity.
compute(words)[source]
Parameters:words (str or list[str]) – A word or a list of words to compute the PMN for.
Returns:
The percentage of male neighbours. Note that the remaining percentage of neighbours can be considered to be female.
class fee.metrics.gipe.GIPE(E_new, E_orig=None, g=None, thresh=0.05, n=100)[source]

The GIPE metric class

Parameters:
  • E_new (WE class object) – Represents the new embedding object, which consists of the debiased embeddings.
  • E_orig (WE class object) – Represents the old/original embedding object, which consists of the non-debiased embeddings.
kwargs:

  • g (np.array) – Gender direction.
  • thresh (float) – The minimum indirect bias threshold, above which the association between a word and its neighbour is considered biased.
  • n (int) – The top n neighbours to be considered.

compute(words)[source]
Parameters:words (list[str]) – A list of string of words, on which GIPE will be computed.
Returns:The final computed GIPE score of a word embedding over the given word lists, and the corresponding created BBN.
fee.metrics.gipe.get_neighbors(N, word, k)[source]
Parameters:
  • N (dict[dict]) – A dict of dicts, storing the neighbours of each word with IDB values.
  • word (str) – The target word whose neighbours are to be fetched.
  • k (int) – top k neighbours.
Returns:

A list of top k neighbours for word.

fee.metrics.gipe.get_neighbors_idb_dict(E, words)[source]
Parameters:
  • E (WE class object) – Word embeddings object.
  • word (str) – The word in consideration.
Returns:A dict of dicts, storing the neighbours of each word with IDB values.
The key of the larger dict represents the source node. Its value is again a dict, whose keys and values are respectively the target node and the weight of the edge that is conceptually formed between the two nodes (the keys of the two dicts).
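
The nested structure can be pictured with a small toy example. The words and values below are made up, and the helper that returns the first k neighbours assumes that neighbours were inserted in decreasing order of cosine similarity, which is not stated in this documentation.

```python
# Hypothetical toy data: source word -> {neighbour: indirect bias (IDB)}.
N = {
    "source_a": {"n1": 0.12, "n2": 0.03, "n3": 0.30},
    "source_b": {"n4": 0.08},
}

def first_k_neighbours(N, word, k):
    """Return the first k neighbours of word in insertion order."""
    return list(N[word].keys())[:k]

def edge_weight(N, source, target):
    """Weight of the conceptual edge between source and target."""
    return N[source][target]

print(first_k_neighbours(N, "source_a", 2))
print(edge_weight(N, "source_a", "n3"))
```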
fee.metrics.gipe.get_ns_idb(E, word, g)[source]
Parameters:
  • E (WE class object) – Word embeddings object.
  • word (str) – The word in consideration.
  • g (np.array) – Gender direction.
Returns:

A dictionary of top 100 neighbours of word and the indirect bias between word and each neighbour.

fee.metrics.gipe.prox_bias(vals, l, thresh)[source]
Returns:the ratio of total neighbours that have IDB above thresh.
fee.metrics.gipe.score(vals, weights)[source]

Score the values and weights.

Parameters:
  • vals (dict) – A dict of words and their corresponding proximity bias.
  • weights (dict) – The weights of an edge according to GIPE metric.
Returns:

The final computed GIPE score

class fee.metrics.direct_bias.DirectBias(E, c=1, g=None)[source]

Direct Bias Metric

Direct bias calculation.

Parameters:E (WE class object) – Word embeddings object
kwargs:
  • c (float) – strictness factor
  • g (np.array) – gender direction
compute(word_list)[source]

Compute direct bias

Parameters:word_list (list) – list of words to compute bias for.
Returns:The direct bias of each word in the word_list.
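
A per-word direct bias can be sketched as below, assuming the common definition |cos(w, g)|^c, where c is the strictness factor. The function name and the list-based vectors are assumptions for illustration.

```python
import math

def direct_bias(vec, g, c=1):
    """|cos(vec, g)| raised to the strictness factor c."""
    dot = sum(a * b for a, b in zip(vec, g))
    norm_v = math.sqrt(sum(a * a for a in vec))
    norm_g = math.sqrt(sum(b * b for b in g))
    return abs(dot / (norm_v * norm_g)) ** c

g = [1.0, 0.0]  # toy gender direction
mild = direct_bias([0.6, 0.8], g, c=1)
strict = direct_bias([0.6, 0.8], g, c=2)
print(mild, strict)
```

A larger c shrinks small cosines faster than large ones, so only strongly aligned words keep a noticeable score.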
class fee.metrics.indirect_bias.IndirectBias(E, g=None)[source]

The class for computing indirect bias between a pair of words.

Parameters:E (WE class object) – Word embeddings object.
kwargs:
g (np.array): Gender direction.
compute(w, v)[source]
Parameters:
  • w (str) – One of a pair of words.
  • v (str) – The other word from the pair.
Returns:

The indirect bias between the embeddings of w and v.
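
One common formulation of indirect bias compares the similarity of two unit vectors before and after their components along the gender direction are removed. The sketch below assumes that formulation; the helper names are hypothetical and the degenerate cases (orthogonal inputs, vectors parallel to g) are not handled.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(a):
    norm = math.sqrt(_dot(a, a))
    return [x / norm for x in a]

def indirect_bias(w, v, g):
    """Share of the similarity between w and v that is explained by g."""
    w, v, g = _unit(w), _unit(v), _unit(g)
    wv = _dot(w, v)
    # Components of w and v orthogonal to the gender direction.
    w_perp = [wi - _dot(w, g) * gi for wi, gi in zip(w, g)]
    v_perp = [vi - _dot(v, g) * gi for vi, gi in zip(v, g)]
    cos_perp = _dot(w_perp, v_perp) / (
        math.sqrt(_dot(w_perp, w_perp)) * math.sqrt(_dot(v_perp, v_perp))
    )
    return (wv - cos_perp) / wv

g = [1.0, 0.0, 0.0]
# These two vectors are similar only through their shared component along g.
only_gender = indirect_bias([0.6, 0.8, 0.0], [0.6, 0.0, 0.8], g)
# Identical vectors stay identical once g is removed.
no_change = indirect_bias([0.6, 0.8, 0.0], [0.6, 0.8, 0.0], g)
print(only_gender, no_change)
```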

fix(w, eps=0.001)[source]
Parameters:w (np.array) – word vector
kwargs:
eps (float): threshold. If the difference between the norm of w and 1 is greater than eps, then normalize w.
Returns:The normalized w.

Visualizations

class fee.visualize.neighbour_bias_wordcloud.NBWordCloud(E, g=None, random_state=42)[source]

NBWordCloud Class

WordCloud for the neighbourhood of a word. The size of neighbouring words is directly proportional to the bias by projection of these words.

Parameters:
  • E (WE class object) – Word embeddings object
  • g (np.array) – gender direction
  • random_state (int) – for reproducibility
run(word, title=None, n=100, dpi=300, figsize=(8, 5), width=800, height=500)[source]

Run the NBWordCloud visualization

Parameters:
  • word (str) – word to compute neighbours of and make this plot
  • title (str) – title of the plot
  • n (int) – number of neighbours to consider
  • figsize (tuple) – size of figures in (HxW)
  • dpi (int) – dpi of the figures
  • width (int) – width of the wordcloud image
  • height (int) – height of the wordcloud image
visualize(freq_dict, title, figsize, dpi, width, height)[source]

Main NBWordCloud visualization driver function

Parameters:
  • freq_dict (dict) – dictionary to map size of each word
  • title (str) – title of the plot
  • figsize (tuple) – size of figures in (HxW)
  • dpi (int) – dpi of the figures
  • width (int) – width of the wordcloud image
  • height (int) – height of the wordcloud image
class fee.visualize.neighbour_plot.NeighbourPlot(E, g=None, random_state=42)[source]

NeighbourPlot Class

tSNE plot for the neighbourhood of a word color coded by their bias by projection.

Parameters:
  • E (WE class object) – Word embeddings object
  • g (np.array) – gender direction
  • random_state (int) – for reproducibility
bias_by_projection_sort(words)[source]

Sort words by bias by projection.

Parameters:words (list) – list of words (str)
get_neighbours(word, n=100)[source]

Get n neighbours for word word.

Parameters:
  • word (str) – word to compute neighbours for
  • n (int) – number of neighbours to compute
run(word, title=None, n=100, dpi=300, figsize=(8, 5), colors=['blue', 'red'], fontsize=7, annotate=True)[source]

Run the NeighbourPlot visualization

Parameters:
  • word (str) – word to compute neighbours of and make this plot
  • title (str) – title of the plot
  • n (int) – number of neighbours to consider
  • figsize (tuple) – size of figures in (HxW)
  • dpi (int) – dpi of the figures
  • colors (list) – list of two matplotlib compatible colors
  • fontsize (int) – matplotlib compatible font size for scatter plot
  • annotate (bool) – annotate points in scatter plot?
visualize(words, ranks, title, figsize, dpi, colors, s, annotate=False)[source]

Run the NeighbourPlot visualization

Parameters:
  • words (list) – list of neighbours
  • title (str) – title of the plot
  • figsize (tuple) – size of figures in (HxW)
  • dpi (int) – dpi of the figures
  • colors (list) – list of two matplotlib compatible colors
  • s (int) – matplotlib compatible font size for scatter plot
  • annotate (bool) – annotate points in scatter plot?
fee.visualize.neighbour_plot.color_fader(c1, c2, mix=0)[source]

Fade color between c1 and c2 with mix. See: https://stackoverflow.com/questions/25668828/how-to-create-colour-gradient-in-python

fee.visualize.neighbour_plot.generate_palette(c1, c2, n)[source]

Generate color palette of length n between c1 and c2
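
The idea behind both helpers is linear interpolation between two colours. The sketch below works on "#rrggbb" hex strings only, which is an assumption made to stay dependency-free; the library accepts matplotlib-compatible colours, and the function names here are hypothetical.

```python
def _hex_to_rgb(color):
    color = color.lstrip("#")
    return [int(color[i:i + 2], 16) for i in (0, 2, 4)]

def fade_hex(c1, c2, mix=0.0):
    """Blend c1 into c2; mix=0 gives c1 and mix=1 gives c2."""
    rgb1, rgb2 = _hex_to_rgb(c1), _hex_to_rgb(c2)
    blended = [round((1 - mix) * a + mix * b) for a, b in zip(rgb1, rgb2)]
    return "#" + "".join(format(channel, "02x") for channel in blended)

def palette_hex(c1, c2, n):
    """n evenly spaced colours from c1 to c2 (inclusive)."""
    if n == 1:
        return [fade_hex(c1, c2, 0.0)]
    return [fade_hex(c1, c2, i / (n - 1)) for i in range(n)]

palette = palette_hex("#0000ff", "#ff0000", 3)
print(palette)
```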

class fee.visualize.gender_cluster_tsne.GCT(E, random_state=0)[source]

GCT Class

Gender Cluster tSNE plot: plot the tSNE visualization for the neighbourhood of word, color coded by computed cluster.

Parameters:
  • E (WE class object) – Word embeddings object
  • random_state (int) – for reproducibility
cluster(vecs)[source]

Apply kmeans clustering over vecs

Parameters:vecs (np.array) – list of word vectors to cluster
run(word_list, title=None, dpi=300, annotate=False, figsize=(8, 5), colors=['k', 'r'])[source]

Run the GCT visualization

Parameters:
  • word_list (list) – list of words (list of string)
  • title (str) – title of the plot
  • dpi (int) – dpi of the figures
  • annotate (bool) – annotate each point in scatter plot
  • figsize (tuple) – size of figures in (HxW)
  • colors (list) – list of two matplotlib compatible colors
visualize(vecs, words, labels, title, annotate, figsize, dpi, colors)[source]

Main GCT visualization driver function

Parameters:
  • vecs (np.array) – list of word vectors to cluster
  • words (list) – list of words (list of string)
  • labels (list) – cluster labels (list of 0s and 1s)
  • title (str) – title of the plot
  • annotate (bool) – annotate each point in scatter plot
  • figsize (tuple) – size of figures in (HxW)
  • dpi (int) – dpi of the figures
  • colors (list) – list of two matplotlib compatible colors
class fee.visualize.pca_components.PCAComponents(E)[source]

PCAComponents Class

Plot the PCA principal component bar graph for some direction of E computed using a list of pairs of words.

Parameters:E (WE class object) – Word embeddings object
PlotPCA(pairs, title, num_components, dpi, figsize)[source]

Main PCAComponents visualization driver function

Parameters:
  • pairs (list) – A list of pair (tuple/list) of words. The direction is computed by PCA of set of differences of these words.
  • title (str) – title of the plot
  • num_components (int) – number of principal components
  • dpi (int) – dpi of the figures
  • figsize (tuple) – size of figures in (HxW)
run(pairs, title=None, num_components=10, dpi=300, figsize=(8, 5))[source]

Run the PCAComponents visualization

Parameters:
  • pairs (list) – A list of pair (tuple/list) of words. The direction is computed by PCA of set of differences of these words.
  • title (str) – title of the plot
  • num_components (int) – number of principal components
  • figsize (tuple) – size of figures in (HxW)
  • dpi (int) – dpi of the figures
