On March 1, 2018, Microsoft released its latest large-scale dataset for machine reading comprehension and question answering. You can access the public dataset, named MS MARCO, at //www.msmarco.org/. As human beings, we are so habituated to the daily routines of reading, writing, and conversing that we may assume it is simple to transfer our language understanding capabilities to machines. In reality, machine reading comprehension and dialogue are subdomains of artificial intelligence (AI) in which the fanfare of media coverage overshoots the actual state-of-the-art performance. This article surveys the evolving landscape of human-technology interaction and highlights the role of open data in our journey toward intelligent machines that deliver lasting value to society.
Conversation as the New User Interface
Conversations are not just for humans anymore. Slowly but surely, we are turning to machines to accomplish our daily routines (“OK Google, turn off the reading light”), quench our thirst for knowledge (“Hey Cortana, tell me more about the Innovator’s Dilemma”), or even vent our frustration and anger (“I want my money back, or I’m switching to Lyft!”). Conversational systems, opines Ron Kaplan from Wired, can unlock new customer value propositions that traditional user interfaces cannot even imagine. Venture capitalist Matt Hartman equates conversational user interfaces to the new “hidden homescreen.” David Marcus, head of Facebook Messenger, seems to have taken the idea to an extreme: “…just have a message within a nicely designed bubble … [that’s a] much nicer experience than any app.” The business impact of intelligent conversational agents has been felt across industry boundaries. The phrase "personalization at scale" has become the new holy grail in product development as companies invest heavily in human-computer interfaces to “mass produce” customer delight at minimal marginal cost.
However, despite recent advances in artificial intelligence in the domains of natural language understanding and speech recognition, conversational systems backed by machine learning algorithms do not yet rival their human counterparts in either cognitive capabilities (IQ) or emotional intelligence (EQ).
One challenge is the difficulty for machines to infer the true meaning from a span of words or sound bites. Consider the following tweet-sized passage:
“Caffeine is found in almost every over-the-counter fat-burning supplement commercially available today. You could burn more fat.”
As human readers, we immediately sense the “positive vibe” in the text, which describes one of the benefits of caffeine. Until recently, however, most machine reading comprehension algorithms would pick up a predominantly negative sentiment, because the text sample, on the surface level, includes negative words such as “fat” and “burn.”
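To make the surface-level failure concrete, here is a minimal sketch of a word-counting sentiment scorer applied to the passage above. The tiny lexicon and the function name `naive_sentiment` are hypothetical and chosen purely for illustration; they do not represent any particular production system.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only):
# each word maps to a fixed polarity, regardless of context.
TOY_LEXICON = {
    "fat": -1,
    "burn": -1,
    "burning": -1,
    "good": 1,
    "benefit": 1,
}


def naive_sentiment(text: str) -> int:
    """Sum per-word polarity scores, ignoring word order and context."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(TOY_LEXICON.get(token, 0) for token in tokens)


passage = (
    "Caffeine is found in almost every over-the-counter fat-burning "
    "supplement commercially available today. You could burn more fat."
)

score = naive_sentiment(passage)
print(score)  # negative: "fat" (twice), "burning", and "burn" each count against the text
```

Because the scorer never looks at how the words relate to one another, "burn more fat" is counted as three strikes against the text rather than as a benefit.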
Another challenge stems from the fact that many knowledge-based questions are ambiguous in nature and require multi-perspective answers. Is caffeine good or bad? How about cholesterol? Are gun control laws effective? As humans, we tend to answer “umm… it depends.” In the domain of machine reading comprehension, however, synthesizing answers from multiple perspectives has proven to be a daunting task. First comes the “pollution” by low-quality content, especially against the backdrop of today’s clickbait practices in digital content creation. Then, we run the risk of a surface-level interpretation instead of a deep semantic inference. Last, but not least, we need a reliable ranking algorithm to prioritize, merge, and synthesize the answers from multiple perspectives.
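The filtering and ranking steps described above can be sketched roughly as follows. This is only an illustration under strong assumptions: the `Candidate` fields, the stance labels, the scores, and the sample sentences are all made up, and the hard part (deep semantic inference) is assumed to have already produced those scores.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    stance: str       # hypothetical perspective label, e.g. "pro" or "con"
    quality: float    # hypothetical source-quality score in [0, 1]
    relevance: float  # hypothetical relevance score in [0, 1]


def synthesize(candidates, min_quality=0.5):
    """Drop low-quality candidates, then keep the most relevant one per stance."""
    best = {}
    for cand in candidates:
        if cand.quality < min_quality:
            continue  # filter out "polluted" content such as clickbait
        current = best.get(cand.stance)
        if current is None or cand.relevance > current.relevance:
            best[cand.stance] = cand
    # Present the surviving perspectives, most relevant first.
    return sorted(best.values(), key=lambda c: c.relevance, reverse=True)


# Made-up sample data for the question "Is caffeine good or bad?"
candidates = [
    Candidate("Moderate caffeine intake can improve alertness.", "pro", 0.9, 0.8),
    Candidate("Too much caffeine can disrupt sleep.", "con", 0.8, 0.7),
    Candidate("This one weird caffeine trick melts fat!", "pro", 0.1, 0.95),
]

answers = synthesize(candidates)
for answer in answers:
    print(f"[{answer.stance}] {answer.text}")
```

Note that the clickbait entry has the highest relevance score but is removed by the quality filter before ranking, which is why filtering comes first in this sketch.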
Conversational AI & Open Datasets
Despite the challenges, AI researchers across the globe have been closing the gap in both IQ and EQ between humans and machine-learned conversational agents, largely thanks to advances in multi-layer artificial neural networks. For example, the Facebook AI Research (FAIR) group has trained a conversational bot to successfully negotiate deals with humans. Microsoft’s AI and Search division has been tackling automated question answering by leveraging deep neural networks (DNNs), and recently exceeded human performance on a key dimension of the Stanford Question Answering Dataset (SQuAD), an industry-standard benchmark. The Bing search engine today returns a multi-perspective synthesis if you ask, “is cholesterol good?”
Large-scale algorithms for language comprehension and knowledge aggregation are extremely data-hungry. Training the algorithms requires a large amount of clean and unbiased data. In the MS MARCO dataset, all questions are sampled from real users, and answers to the queries have been generated and curated by human judges. At its completion, the dataset will contain one million question-answer pairs, which will make it the most comprehensive real-world dataset of its kind in both quantity and quality. In addition to empowering a global community of AI researchers, the open dataset, just like Wikipedia, can be further edited and refined based on user feedback, which further shields the dataset from the “curses of bias and variance,” a phrase used by famed AI scientist Pedro Domingos to describe the scarcity of high-quality data. It is now up to us to create the next generation of intelligent products and services that combine advanced technology with deep human insights.
Shu “Steve” Zheng (HBS ’18) was a program manager at Microsoft’s Silicon Valley office, driving product development in AI and Search. Born and raised in Shanghai, he studied computer science at the Institute down the Charles River.
Anton McGonnell (HBS ’18) is from Ireland. He has spent his career to date in health tech and enterprise tech and has contributed to innovation policy in Northern Ireland. He is passionate about artificial intelligence and its future applications. He plans to found a tech company after HBS and make trillions. He also wants to make Belfast the new startup epicenter of Europe.