Conference papers

Słowik A., Bottou L. (2021) Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation. Under review. arxiv

TLDR: Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, which is not exposed by a low average error for the entire dataset. In consequential social and economic applications, where data represent people, this can lead to discrimination against underrepresented gender and ethnic groups. Given the importance of bias mitigation in machine learning, the topic leads to contentious debates on how to ensure fairness in practice (data bias versus algorithmic bias). Distributionally Robust Optimization (DRO) seemingly addresses this problem by minimizing the worst expected risk across subpopulations. We establish theoretical results that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset. The results cover finite and infinite numbers of training distributions, as well as convex and non-convex loss functions. We show that neither DRO nor curating the training set should be construed as a complete solution for bias mitigation: in the same way that there is no universally robust training set, there is no universal way to set up a DRO problem and ensure a socially acceptable set of results. We then leverage these insights to provide a minimal set of practical recommendations for addressing bias with DRO. Finally, we discuss ramifications of our results in other related applications of DRO, using an example of adversarial robustness. Our results show that there is merit to both the algorithm-focused and the data-focused side of the bias debate, as long as arguments in favor of these positions are precisely qualified and backed by relevant mathematics known today.
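
As a minimal sketch of the two objectives contrasted above (not the paper's code; the function names, toy losses, and group labels are assumptions made for illustration), the snippet below computes the worst-group risk that DRO minimizes and the risk averaged under a chosen group weighting. For a fixed model, the worst-group risk is the largest weighted average over all weightings, which is the elementary link between the two views.

```python
import numpy as np

def group_risks(losses, groups, num_groups):
    """Mean loss within each group (subpopulation)."""
    return np.array([losses[groups == g].mean() for g in range(num_groups)])

def dro_objective(risks):
    """Worst expected risk across groups, the quantity DRO minimizes."""
    return float(np.max(risks))

def weighted_objective(risks, weights):
    """Risk averaged under group weights (non-negative, summing to 1)."""
    return float(np.dot(weights, risks))

# Hypothetical per-example losses of one fixed model; group 1 is the minority.
losses = np.array([0.1, 0.2, 0.3, 0.9, 1.1])
groups = np.array([0, 0, 0, 1, 1])
risks = group_risks(losses, groups, num_groups=2)  # roughly [0.2, 1.0]

# Weighting groups by their share of the data recovers the plain average,
# which hides the poor performance on group 1.
average_risk = weighted_objective(risks, np.array([0.6, 0.4]))

# Putting all weight on the worst group recovers the DRO objective.
worst_case = weighted_objective(risks, np.array([0.0, 1.0]))

print(average_risk, dro_objective(risks), worst_case)
```

No weighting can exceed the worst-group risk, so choosing the weights and choosing the set of subpopulations are two descriptions of the same design decision in this toy setting.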

Lamb A., Goyal A., Słowik A., Beaudoin P., Mozer M., Bengio Y. (2020) Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers. The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021). proceedings

TLDR: Feed-forward neural networks consist of a sequence of layers, in which each layer performs some processing on the information from the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than a particular part of the state which is most relevant for that module. Methods which only operate on a small number of input variables are an essential part of most programming languages, and they allow for improved modularity and code re-usability. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most of the work in the context of feed-forward networks combining top-down and bottom-up feedback is limited to classification problems. The key contribution of our work is to combine attention, sparsity, top-down and bottom-up feedback, in a flexible algorithm which, as we show, improves the results in standard classification, out-of-domain generalization, generative modeling, and learning representations in the context of reinforcement learning.

Danel T., Spurek P., Tabor J., Śmieja M., Struski Ł., Słowik A., Maziarka Ł. (2019) Spatial Graph Convolutional Networks. International Conference on Neural Information Processing (ICONIP 2020). arXiv code

TLDR: Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remedy this issue, we propose the Spatial Graph Convolutional Network, which uses spatial features to efficiently learn from graphs that can be naturally located in space. Our contribution is threefold: we propose a GCN-inspired architecture which (i) leverages node positions, (ii) is a proper generalization of both GCNs and Convolutional Neural Networks (CNNs), and (iii) benefits from augmentation, which further improves the performance and ensures invariance with respect to the desired properties. Empirically, our model outperforms state-of-the-art graph-based methods on image classification and chemical tasks.

Workshop papers

Aubin B., Słowik A., Arjovsky M., Bottou L., Lopez-Paz D. (2020) Linear unit-tests for invariance discovery. NeurIPS Workshop on Causal Discovery and Causality-Inspired Machine Learning. arxiv code presentation

TLDR: There is an increasing interest in algorithms able to learn invariant correlations across training environments. A big share of the current proposals find theoretical support in the causality literature, but how useful are they in practice? The purpose of this note is to propose six linear low-dimensional problems—“unit tests”—to evaluate out-of-distribution generalization algorithms. Following initial experiments, none of three recently proposed alternatives pass these tests. By providing the code to automatically replicate all the results in this manuscript, we hope that these unit tests become a standard stepping stone for researchers in out-of-distribution generalization.

Guo S., Ren Y., Słowik A., Mathewson K. (2020) Inductive Bias and Language Expressivity in Emergent Communication. 4th NeurIPS Workshop on Emergent Communication: Talking to Strangers: Zero-Shot Emergent Communication. code paper

TLDR: Referential games and reconstruction games are the most common game types for studying emergent languages. We investigate how the type of the language game affects the emergent language in terms of: i) language compositionality and ii) transfer of an emergent language to a task different from its origin, which we refer to as language expressivity. With empirical experiments on a handcrafted symbolic dataset, we show that languages that emerge from different games differ in compositionality and also in expressivity.

Gupta^* A., Słowik^* A., Hamilton W. L., Jamnik M., Holden S. B., Pal C. (2020) Analyzing Structural Priors in Multi-Agent Communication. Workshop on Adaptive and Learning Agents at AAMAS 2020.

TLDR: Human language and thought are characterized by the ability to systematically generate a potentially infinite number of complex structures (e.g., sentences) from a finite set of familiar components (e.g., words). Recent works in emergent communication have discussed the propensity of artificial agents to develop a systematically compositional language through playing co-operative referential games. The degree of structure in the input data was found to affect the compositionality of the emerged communication protocols. Thus, we explore various structural priors in multi-agent communication and propose a novel graph referential game. We compare the effect of structural inductive bias (bag-of-words, sequences and graphs) on the emergence of compositional understanding of the input concepts measured by topographic similarity and generalization to unseen combinations of familiar properties. We empirically show that graph neural networks induce a better compositional language prior and a stronger generalization to out-of-domain data. We further perform ablation studies that show the robustness of the emerged protocol in graph referential games.

Słowik^* A., Gupta^* A., Hamilton W. L., Jamnik M., Holden S. B. (2020) Towards Graph Representation Learning in Emergent Communication. Workshop on Reinforcement Learning in Games at AAAI 2020. arxiv

TLDR: Recent findings in neuroscience suggest that the human brain represents information in a geometric structure (for instance, through conceptual spaces). In order to communicate, we flatten the complex representation of entities and their attributes into a single word or a sentence. In this paper we use graph convolutional networks to support the evolution of language and cooperation in multi-agent systems. Motivated by an image-based referential game, we propose a graph referential game with varying degrees of complexity, and we provide strong baseline models that exhibit desirable properties in terms of language emergence and cooperation. We show that the emerged communication protocol is robust, that the agents uncover the true factors of variation in the game, and that they learn to generalize beyond the samples encountered during training.

Słowik A., Mangla C., Jamnik M., Holden S. B., Paulson L. C. (2019) Bayesian Optimisation for Heuristic Configuration in Automated Theorem Proving. The 6th Vampire Workshop at The 22nd International Conference on Theory and Applications of Satisfiability Testing (SAT 2019), The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020). Presented as a talk at SAT 2019 (Workshop track) and as a poster at AAAI 2020 (Student Abstract track). proceedings

TLDR: Modern theorem provers utilise a wide array of heuristics to control the search space explosion, thereby requiring optimisation of a large set of parameters. An exhaustive search in this multi-dimensional parameter space is intractable in most cases, yet the performance of the provers is highly dependent on the parameter assignment. In this work, we introduce a principled probabilistic framework for heuristic optimisation in theorem provers. We present results using a heuristic for premise selection and the Archive of Formal Proofs (AFP) as a case study.

Antoniou A., Słowik A., Crowley E. J., Storkey A. J. (2018) Dilated DenseNets for Relational Reasoning. Women in Machine Learning Workshop at NeurIPS 2018 (WiML 2018), Theoretical Foundations of Machine Learning (TFML 2019), Transylvanian Machine Learning Summer School (TMLSS). arXiv slides poster

TLDR: Despite their impressive performance in many tasks, deep neural networks often struggle at relational reasoning. This has recently been remedied with the introduction of a plug-in relational module that considers relations between pairs of objects. Unfortunately, this is combinatorially expensive. In this extended abstract, we show that a DenseNet incorporating dilated convolutions excels at relational reasoning on the Sort-of-CLEVR dataset, allowing us to forgo this relational module and its associated expense.
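
To illustrate what a dilated convolution does, here is a minimal one-dimensional sketch in NumPy. It is an illustrative assumption, not the paper's two-dimensional DenseNet implementation, and the function name is hypothetical. The kernel taps are spaced `dilation` positions apart, so the receptive field grows without adding parameters.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D cross-correlation with a dilated kernel."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Distance between the first and last kernel tap on the input.
    span = (len(kernel) - 1) * dilation
    out_len = len(signal) - span
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    # Input offsets touched by the kernel, e.g. [0, 2, 4] for dilation 2.
    offsets = np.arange(len(kernel)) * dilation
    return np.array([np.dot(signal[i + offsets], kernel) for i in range(out_len)])

x = np.arange(7)
dense = dilated_conv1d(x, [1, 1, 1], dilation=1)   # each output sees 3 adjacent inputs
sparse = dilated_conv1d(x, [1, 1, 1], dilation=2)  # each output spans 5 inputs
print(dense, sparse)
```

With the same three weights, the dilated version covers a window of five inputs instead of three, which is how stacked dilated layers can relate distant objects cheaply.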

Słowik A., Czarnecki W. (2016) Random projections in Extreme Learning Machines. Women in Machine Learning Workshop at NIPS 2016 (WiML 2016). slides

TLDR: Despite growing research in random neural networks, the problem of choosing the weight distribution has been ignored for a long time, and sampling from Gaussian noise remains the state of the art. My BSc thesis compares three groups of random projections: data-independent probability distributions, semi-supervised sampling from clustered training examples, and supervised sampling in the region of the decision boundary. I analyse the effect of the distribution choice on the performance of Extreme Learning Machines, the most popular random neural network model, introduced by Guang-Bin Huang. I report the results using a suite of 5 classification and 5 regression datasets.
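
For readers unfamiliar with Extreme Learning Machines, the sketch below shows the general recipe with the Gaussian baseline mentioned above: hidden weights are drawn at random and left untrained, and only the output weights are fitted by least squares. This is not the thesis code; the function names, the tanh activation, the number of hidden units, and the toy regression data are all assumptions made for illustration.

```python
import numpy as np

def fit_elm(X, y, hidden_units=50, seed=0):
    """Fit an ELM: random Gaussian hidden layer, least-squares output layer."""
    rng = np.random.default_rng(seed)
    # Random projection: these weights are sampled once and never trained.
    W = rng.standard_normal((X.shape[1], hidden_units))
    b = rng.standard_normal(hidden_units)
    H = np.tanh(X @ W + b)  # hidden-layer activations
    # Only the output weights are learned, via linear least squares.
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, beta

def predict_elm(X, W, b, beta):
    """Apply the fixed random projection, then the learned output weights."""
    return np.tanh(X @ W + b) @ beta

# Toy 1-D regression problem (hypothetical data).
X = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])

W, b, beta = fit_elm(X, y)
train_mse = float(np.mean((predict_elm(X, W, b, beta) - y) ** 2))
print(train_mse)
```

Swapping the two `rng.standard_normal` calls for another sampling scheme is the place where a different choice of weight distribution would enter this sketch.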