Keywords: Ethical statistical practice; Interpretable machine learning; Generative modeling; Data generation
Researchers in the social sciences have become increasingly cognizant of the machine learning models now available for analytical purposes. Of particular note is the advent of generative models (such as generative adversarial networks and variational autoencoders) that allow for the representation, reconstruction, and creation of novel data closely resembling a source dataset. Alongside this development is the consideration of “black box” methods, such as neural networks, that allow for highly accurate predictions without concomitantly interpretable outputs. This paper examines one example of the use of such complex models and the potential benefits they offer to the social sciences. Through this example and a discussion of historical ethical issues in statistical practice, the paper considers the question of ethical practice and proposes a solution: making model characteristics (e.g., bias) interpretable through Shapley values and Local Interpretable Model-Agnostic Explanations. Several recommendations are offered for maintaining an ethical practice of data analysis and machine learning in the social sciences.