The robot sociologist - Alec's Project Documentation

[Link to the running instance of the tester](http://ec2-54-191-145-248.us-west-2.compute.amazonaws.com:8095/app/tasks/interview_each_other)

# Overview

> This project aims to understand human communication and cognition deeply, in order to teach a machine how to speak and understand. This aim is sociological (the subject of my PhD) in that communication is social, meaning is intersubjective, people think about people, and what they think affects others.

+ Behavior, perception and understanding bind people together and form society.
+ Sociologists work to untangle the content of these links, hoping to gain some understanding of the forces which structure the social world.
+ The methodological tools available to the social sciences for analyzing understandings either suffer from assuming homogeneity or are limited in their societal scope.
+ I present in this paper a chatbot which can understand and explain beliefs and understandings.
+ The chatbot acts as interviewer, collecting conversations and the understandings it derives from them.
+ It is able to update what it understands, as well as how it understands, through interactive dialogue.
+ I will discuss the need, theory, and logistics of building such a chatbot.

The crucial difference between this aim and previous work dealing with contextual meaning through discourse is that my chatbot will learn through interaction: through clarification and question-asking with a warm human being. Collins (2018) believes this is the crucial ingredient in a machine actually being able to understand what we say.

# Further resources

+ The initial motivational essay inspiring this proposal ([here](http://academics.alecmcgail.com/pdf/readableSocialRepresentations.pdf))
+ A literature review motivating the collection of better data to contribute to sociological knowledge of systems of beliefs ([here](http://academics.alecmcgail.com/pdf/litReview.secondDraft.pdf)).
+ [Example interviews](/robotsoc/interview-examples/family-life) for reference and motivation
+ [Logistics, using ParlAI for testing](/robotsoc/parlai)

## Annotated bibliography

+ Jurafsky, D., & Martin, J. H. (2007). Dialogue and Conversational Agents. In Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.
    + I should definitely have a solid understanding of the canonical theory in this area!
    + [TOC](/uploads/journal-analysis/dialogue-agents.png "Dialogue Agents")
+ Collins, Harry (2018). Artifictional Intelligence: Against Humanity's Surrender to Computers.
    + "No computer will be fluent in a natural language, pass a severe Turing Test and have full human-like intelligence unless it is fully embedded in normal human society."
    + "No computer will be fully embedded in human society as a result of incremental progress based on current techniques."
+ Galantucci, B., & Roberts, G. (2014). Do we notice when communication goes awry? An investigation of people’s sensitivity to coherence in spontaneous conversation. PLoS ONE, 9(7), 1–5. https://doi.org/10.1371/journal.pone.0103182
    + "The single biggest problem in communication is the illusion that it has taken place." (George Bernard Shaw)
    + A significant proportion of conversants didn't notice that their conversations had been "crossed" while discussing what was in a drawing (i.e., they were talking about different pictures).
+ Jerolmack, C., & Khan, S. (2014). Talk Is Cheap: Ethnography and the Attitudinal Fallacy. Sociological Methods and Research, 43(2), 178–209. https://doi.org/10.1177/0049124114523396
    + Argues that self-reports of attitudes and behaviors aren't of much value
    + These descriptions are abstracted from lived experience
    + "Meaning and action are collectively negotiated and context-dependent"
+ Glaeser, A. (2005). An Ontology for the Ethnographic Analysis of Social Processes: Extending the Extended-Case Method. Social Analysis: The International Journal of Social and Cultural Practice, 49(3), 16–45.
    + Consequent processualism "explodes" dichotomies: micro-macro, event-structure, agency-social structure
    + Theory is instrumental to delimiting fruitful field sites
    + Ethnography should be the method of choice for developing social theory
+ He, H., Balakrishnan, A., Eric, M., & Liang, P. (2017). Learning Symmetric Collaborative Dialogue Agents with Dynamic Knowledge Graph Embeddings. https://doi.org/10.18653/v1/P17-1162
    + "Collected a dataset of 11K human-human dialogues"
    + "propose a neural model with dynamic knowledge graph embeddings that evolve as the dialogue progresses"
+ Akman, V. (1995). V. Lifschitz, ed., Formalizing Common Sense: Papers by John McCarthy. Artificial Intelligence, 77(2), 359–369. https://doi.org/10.1016/0004-3702(95)90018-7
    + A summary of John McCarthy's goal to formalize common sense
    + A short, concise introduction to a large body of philosophical work in the area
+ Kamp, H., Van Genabith, J., & Reyle, U. (2011). Discourse Representation Theory. Handbook of Philosophical Logic, (1), 125–394. https://doi.org/10.1007/978-94-007-0485-5_3
    + DRT represents meaning in context formally
    + It's messy and incomplete, but gets closer to the answer. This chapter gives a nice overview of the issues of context in meaning-making
+ Grosz, B. J., & Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175–204. https://www.aclweb.org/anthology/J86-3001
    + Super important. Dialogue can be decomposed into 1) attentional state, 2) structure of utterances, and 3) structure of intents
    + Within this framework, describes cue phrases, referring expressions, and interruptions
+ Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A Simplest Systematics for the Organization of Turn-Taking for Conversation. Language, 50(4), 696–735.

---

+ Zarisheva, E., & Scheffler, T. (2015). Dialog Act Annotation for Twitter Conversations. 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue, (August), 114–123. Retrieved from http://anthology.aclweb.org/W/W14/W14-43.pdf#page=219
    + Produced gold-standard dialog act tagging for 172 dialogues in German

# Methodology

## The component speech acts of the qualitative interviewer

An interview can be usefully understood as a sequence of speech acts, of intents and expectations embedded in a commonly understood pattern [^austin]. Conversants take turns, each turn with some intent. In a question, the intent might be to elicit information or to understand a word or phrase better. In clearing up a verbal miscommunication, we observe a defined ritual. With each statement we occupy a known position in the well-trodden conversational artifact. The full flow of conversation in a clarification is depicted below:

[^austin]: Austin, John Langshaw. 1975. *How to Do Things with Words*. Oxford: Oxford University Press.

![Clarificationsa](/uploads/clarificationsa.png "Clarificationsa")

**Figure credit:** Walton, D. (2007). The Speech Act of Clarification in a Dialogue Model. Studies in Communication Sciences, 7(2), 165–197.

This network among states of expectation in conversational flow is self-contained, independent of the network representing talk about the weather or questions about family history. In Wittgensteinian terms, this network is a language game [^grayling]. Because each of these games is separable from the rest, independent and self-contained, programming a qualitative researcher becomes a decomposable task: conversation, as a complex and varied structure, can be expressed in terms of discrete modules. Our task here will be to break the competencies of a qualitative interviewer into their component speech acts and code them.

[^grayling]: For a summary, see p. 83 in Grayling, A. C. 2001. *Wittgenstein: A Very Short Introduction*. Oxford: Oxford University Press.
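
The following is a minimal Python sketch of one such self-contained network of expectation states. It models the clarification game as a small finite-state machine. Each speech act moves the conversation from one expectation state to the next, and an act that is not expected in the current state is rejected. The `LanguageGame` class is hypothetical, and so are the state and act names (`misunderstanding`, `request_clarification`, `explain`, `accept`, `give_up`). They are simplifications for illustration and do not reproduce the exact states of Walton's model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LanguageGame:
    """A self-contained network of expectation states (a 'language game').

    `transitions` maps (current state, speech act) -> next state.
    Hypothetical illustration, not an existing library class.
    """
    name: str
    start: str
    terminal: set[str]
    transitions: dict[tuple[str, str], str] = field(default_factory=dict)

    def step(self, state: str, act: str) -> str:
        """Return the next state, or raise if `act` is not an expected move here."""
        try:
            return self.transitions[(state, act)]
        except KeyError:
            raise ValueError(f"{act!r} is not expected in state {state!r}") from None

    def run(self, acts: list[str]) -> str:
        """Play a sequence of speech acts from the start state; return the final state."""
        state = self.start
        for act in acts:
            state = self.step(state, act)
        return state

    def is_finished(self, state: str) -> bool:
        """True if the game has reached one of its terminal states."""
        return state in self.terminal


# Simplified clarification game (state and act names are assumptions).
clarification = LanguageGame(
    name="clarification",
    start="misunderstanding",
    terminal={"understood", "abandoned"},
    transitions={
        ("misunderstanding", "request_clarification"): "awaiting_explanation",
        ("awaiting_explanation", "explain"): "evaluating",
        ("evaluating", "accept"): "understood",
        ("evaluating", "request_clarification"): "awaiting_explanation",
        ("evaluating", "give_up"): "abandoned",
    },
)

final = clarification.run(
    ["request_clarification", "explain", "request_clarification", "explain", "accept"]
)
print(final, clarification.is_finished(final))  # understood True
```

Each game is its own object with its own states and transitions. Adding a competency, such as talk about the weather or questions about family history, therefore means adding another `LanguageGame` rather than modifying an existing one. This is the decomposability argued for above.
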
## Constructing a Python package for enacting speech act networks

I've set about accomplishing this goal by conceptualizing language as a game, a sequence of speech acts, and representing these games as objects in Python. I program "ways of communicating": routines that know how to interpret a large collection of surface forms as speech acts within a language game.

One of the most important "games" I've created is clarification (shown in the figure above). The algorithm initiates this game when it doesn't understand something the person said. It asks for clarification of the part it didn't understand, and eventually either gives up or understands the person's explanation. If the algorithm understands what the person meant through this process, it updates its bank of "ways of communicating" for future use.

By mirroring the code for one side of this game, we get the other. For example, because the algorithm knows how to express misunderstanding, it also knows how to recognize when the other person expresses misunderstanding, and can take part in the other side of this language game.

Take a sentence, replace surface forms with (possible) alternative forms, and try to interpret them in a way suitable to the context. Each sentence is, in a way, an act. With a parser, I could generate "reasonable" ways to answer, as well as "reasonable" parsings of a sentence.
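
The clarification loop and the mirroring of its two sides can be sketched as follows. The sketch assumes exact matching on normalized strings in place of a real parser. `WaysOfCommunicating`, `clarify`, and the `ask` callback are hypothetical names chosen for this illustration, not the package's actual API. The `normalize` step is only a crude stand-in for replacing surface forms with alternative forms.

```python
from __future__ import annotations

from typing import Callable, Optional


class WaysOfCommunicating:
    """A bank mapping surface forms to speech acts (hypothetical sketch).

    Exact matching on normalized strings stands in for a real parser.
    """

    def __init__(self) -> None:
        self.forms: dict[str, str] = {}

    @staticmethod
    def normalize(utterance: str) -> str:
        # Crude stand-in for mapping alternative surface forms onto one key:
        # lowercase, trim end punctuation, collapse whitespace.
        return " ".join(utterance.lower().strip(" ?!.").split())

    def learn(self, utterance: str, act: str) -> None:
        self.forms[self.normalize(utterance)] = act

    def interpret(self, utterance: str) -> Optional[str]:
        """Recognize: surface form -> speech act, or None if not understood."""
        return self.forms.get(self.normalize(utterance))

    def express(self, act: str) -> Optional[str]:
        """Mirror of interpret: speech act -> a known surface form."""
        for form, known_act in self.forms.items():
            if known_act == act:
                return form
        return None


def clarify(
    bank: WaysOfCommunicating,
    utterance: str,
    ask: Callable[[str], str],
    max_attempts: int = 2,
) -> Optional[str]:
    """Play the clarification game for one utterance.

    `ask` is a hypothetical I/O hook: it sends a prompt to the person and
    returns their reply. On success the bank learns the new form; after
    `max_attempts` unhelpful replies the bot gives up and returns None.
    """
    act = bank.interpret(utterance)
    if act is not None:
        return act
    # The bot expresses misunderstanding with a form it also recognizes.
    request = bank.express("request_clarification") or "what do you mean"
    for _ in range(max_attempts):
        reply = ask(f'{request}? ("{utterance}")')
        act = bank.interpret(reply)
        if act is not None:
            bank.learn(utterance, act)  # update the bank for future use
            return act
    return None


bank = WaysOfCommunicating()
bank.learn("What do you mean?", "request_clarification")
bank.learn("yes", "agree")

print(bank.express("request_clarification"))  # what do you mean
print(bank.interpret("What do you mean??"))    # request_clarification

replies = iter(["hmm", "yes"])
act = clarify(bank, "yeah, for sure", ask=lambda prompt: next(replies))
print(act)                                     # agree
print(bank.interpret("Yeah, for sure"))        # agree
```

Keeping a single table behind both `interpret` and `express` is one way to get the mirroring described above almost for free: any form the bot learns to recognize becomes a form it can also produce. A fuller version would presumably replace `normalize` with a parser and the string table with richer speech-act representations.
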
Eventually, the system needs the ability to modify its parsers in response to discussions about what the person meant. It should also eventually be able to carry on multiple conversations at the same time.

## Relatively independent functional modules

+ Examination of real interviews to get an idea, inductively, of how this works
+ Testing in specific contexts
+ Plus the ability to hand off between modules when one stops understanding

## Continuous real-world testing

## Hiring undergraduates to code

This project is ideal for skilled student coders. The tasks are bite-sized, and each can be completed in full by a single student working 9 hours per week for one semester. I can get funding for these students either through an NSF grant or through a class I teach (e.g. [Knight Foundation](https://knight.as.cornell.edu/prizes-awards)). I should contact professors in other departments about how to find these resources. I [emailed my intro students](/robotsoc/lettertostudents); let's see how many reply.

**Ambitions**

+ I've collected a petition of 50 students who are interested in working.
+ I would need X ± 50 coder hours to complete Z1, Z2, Z3

# Broader Impact

## Giving society a tool with which to communicate

Humans' potential for negatively impacting the planet is undeniable, and for the first time in human history we are forced to consider how to preserve the Earth's ability to support human life. As the technological age accelerates and the world becomes increasingly complex, there is an urgent need for cooperation, communication, and understanding.

Echo chambers plague our ability to communicate effectively as a society. This is nearly true by definition: an echo chamber is simply a stable lack of communication between groups. In some cases this cross-group communication is nearly impossible, either because of the emotional response it would arouse, or because the groups don't understand each other's language.
It is not that individuals don't want to communicate; they lack the time, the energy, the language, or the medium. People fundamentally wish to be understood, and we would all live in a safer and more just world if we understood each other. A full understanding of the conditions under which a product or service is produced (the true sacrifices and costs, or perhaps benefits, of its production chain) enables the consumer to improve the social constructions under which we all live. On the flip side, ideological manipulation and misinformation are the most effective tools for maintaining unjust and potentially disastrous social constructions.

## An intermediary (not quite a broker)

Communication is greatly facilitated by an intermediary: a translator, one who understands and explains to both sides, learning in the process. This translator is instrumental in true globalization, in the sense of becoming one [^jijon]. Typically, one person constructs an explanation specifically for another person, or at the very least for a type of person.

[^jijon]: Jijon, Isabel. 2019. “Toward a Hermeneutic Model of Cultural Globalization: Four Lessons from Translation Studies.” *Sociological Theory* 37(2):142–61.

Explanation is best done in person, one-on-one, because it allows for clarification and questions, and these opportunities make explaining-to much more direct and effective. Yet interactive explanation (and understanding) is an extremely scarce resource. A person has room for only a few people whom they can come to understand and spend time explaining themselves to. Each of those people, in turn, has only a few people they come to understand. For most, this sparse social network does not reach very far in social space. Those who are close emotionally are most often close spatially, creating a natural clustering.
The broker, who facilitates understanding between people who otherwise would never understand each other, can bring new understanding, although rarely by breaching the walls of an echo chamber.

# Contributions to Science

## Scalability and breadth of access

> The researcher, once built, can be accessed and communicated with by anyone with an internet connection, and will be especially convenient for those with a smartphone.
>
> Although smartphone users are in one sense not representative of the population, they reside in every country in the world, in every demographic slice, and in many previously "hard-to-reach" contexts (citation).

The researcher could also collect data from an effectively unlimited number of people at once, allowing sample sizes on an unprecedented scale. In an ideal scenario the Twitter handle [@RobotSociologist](http://twitter.com/sociorobot) could have 1 million followers. For comparison, `@wizkhalifa` has 35 million followers. It could be carrying on a conversation with 10,000 of them at any given time. I don't expect adoption this large, at least not within the next few years, but it is a possible outcome of such an algorithm.

## Acts as an efficient and indefatigable qualitative researcher

The primary occupation of a researcher in sociology is to ask people questions and watch what they do and what they make. The aim is to understand how the social infrastructure we participate in maintains itself, and the logic by which this infrastructure changes over time. To do this, sociologists must always seek to understand the taken-for-granted social world of individuals. Society is structured insofar as it is meaningful.

But while the main occupation of the sociologist may be to understand, this means nothing if the sociologist cannot communicate this understanding convincingly. Methods for explaining sociological findings are elaborate, consisting of a nuanced collaboration between theory and data.
In a written report, the researcher must navigate institutional practices of citation, framing, and presentation of results. The study must also be easily translated into "layman's terms" for efficient incorporation into the community of knowledge in which it participates. This hermeneutic praxis is exhausting, even more so for the qualitative researcher. The collection, interpretation, and analysis of qualitative interviews must be limited to a small set of topics and individuals. One cannot survey the entirety of biography, experience, social networks, interaction with institutions, child rearing, and cultural assimilation in one study. Reality is at least this complex and could support dozens of projects at once, but the interviewer must stay focused.

> Eliminate the researcher-researched dichotomy

This tool not only enables the qualitative researcher to study large populations; it also allows individuals to come to understand each other. When we are explaining ourselves to each other, it open-sources 'sociology', and ... makes it more about us understanding each other and explaining ourselves to each other. Sociology shouldn't be an oligarchy of prescribed or used truths.

Qualitative data collection leaves the analyst as a privileged speaker of the reality of those they observe. One aspect of this privilege is the separate space of the academic journal, cut off from the subjects of the study by discourse, paywalls, or lack of knowledge. When this separation exists, no dialogue can develop between theory and reality, between the analyzed and the analyst. Bryman considers this topic in depth, and summarizes: "What has proved to be disquieting to some commentators, both within and outside the qualitative approach, is whether researchers really can provide accounts from the perspective of those whom they study and how we can evaluate the validity of their interpretations of those perspectives." [^bryman]

[^bryman]: Bryman, Alan. 1988.
*Quantity and Quality in Social Research*. New York, NY: Routledge. p. 73.

## Theoretically relevant data

The interviewer would enable large-scale collection of theoretically important social data that have never before been collected systematically. How a person explains their beliefs and political stances, their prejudices and opinions of prejudice, ... all of these would be collected in machine-readable form. This allows researchers to ask and answer *empirically* questions that were previously purely theoretical, such as:

* Why do people mobilize?
* What buttresses ideological, racial, or class divides?

# Footnotes