Episode 102 | December 11, 2019

With all the buzz surrounding AI, it can be tempting to envision it as a stand-alone entity that optimizes for accuracy and displaces human capabilities. But Dr. Besmira Nushi, a senior researcher in the Adaptive Systems and Interaction group at Microsoft Research, envisions AI as a cooperative entity that enhances human capabilities and optimizes for team performance.

On today’s podcast, Dr. Nushi talks about what it takes to develop collaborative AI systems and unpacks the unique challenges machine learning engineers face in their version of the software development cycle. She also reveals why understanding the “terrain of failure” can help researchers develop AI systems that perform as well in the real world as they do in the lab.


Transcript

Besmira Nushi: What I’d like AI to be, I’d like it to be a technology that enables everyone, and that is built for us. It’s built for people. My parents should be able to use it, an environmental scientist should be able to use it and make new discoveries, or a policy maker, in order to make good decisions.

Host: You’re listening to the Microsoft Research Podcast, a show that brings you closer to the cutting-edge of technology research and the scientists behind it. I’m your host, Gretchen Huizinga.

Host: With all the buzz surrounding AI, it can be tempting to envision it as a stand-alone entity that optimizes for accuracy and displaces human capabilities. But Dr. Besmira Nushi, a senior researcher in the Adaptive Systems and Interaction group at Microsoft Research, envisions AI as a cooperative entity that enhances human capabilities and optimizes for team performance.

On today’s podcast, Dr. Nushi talks about what it takes to develop collaborative AI systems, and unpacks the unique challenges machine learning engineers face in their version of the software development cycle. She also reveals why understanding the “terrain of failure” can help researchers develop AI systems that perform as well in the real world as they do in the lab. That and much more on this episode of the Microsoft Research Podcast.

Host: Besmira Nushi, welcome to the podcast!

Besmira Nushi: Thank you. It’s great to be here. I’ve been following the podcast in the last year, and, you know, it’s always interesting. Every new episode is different.

Host: You’ve been following the podcast! That’s nice!

Besmira Nushi: I have!

Host: Well, I’ve talked to you before. Last time, you were on a Research in Focus panel at Faculty Summit in 2017, and you talked about ML troubleshooting in real-time systems. Let’s go there again. As a senior researcher in the Adaptive Systems and Interaction group, you work at what you call the intersection of human and machine intelligence.

Besmira Nushi: Yup, yup.

Host: Which I love. So we’ll get to your specific work in a minute, but in broad strokes, what’s going on at that intersection? What gets you up in the morning?

Besmira Nushi: Well, the intersection is a rich field, and it really goes both ways. It goes in the direction of: how can we build systems that learn from human feedback, input, and intervention, and maybe learn from the way people solve problems and understand the world? And it also goes in the other direction: how can we augment human capabilities by using artificial intelligence systems? How can we make people more productive at work? It’s about putting the best of both worlds together.

Host: Let’s talk a little bit more about this human-AI collaboration. You’ve framed it in terms of complementarity…

Besmira Nushi: Mmm-hmm.

Host: …because humans and machines have different strengths and weaknesses. And you’ve also characterized it as putting humans and machines together to quote-unquote “optimize for team performance.”

Besmira Nushi: Yes!

Host: So, elaborate on that for us. How should we understand AI as collaborator versus AI designed to work on its own?

Besmira Nushi: You know, people and algorithms have very different skills. We’re really good at reasoning and imagination. And machines are good at processing these terabytes of data for us and giving us these patterns. So, you know, if we can use the machine capabilities in an efficient way, we can be quicker and faster, as I said. But then, on the other hand, these are concepts that, if you think deeply about it, are not that new. When we invented personal computing in the 1980s, this is one of the reasons why it became so successful: the personal computer was suddenly this “buddy” that could help you do things faster and quicker. But there is another thing that enabled that development in those years, and I think that is the field of human-computer interaction.

Host: Yeah.

Besmira Nushi: What HCI did in those years is that it made the interface understandable from a human perspective, and it really made computing technology accessible for everybody. So now we see billions of people around the world who use some form of computation without making any significant effort.

Host: Right.

Besmira Nushi: And I think that today we are facing a similar moment in artificial intelligence. We are, in a way, in a position where we can innovate in how people interact with AI technologies, but we still need to make that leap and make AI accessible for users.

Host: Right.

Besmira Nushi: And this is what I mean by the fact that, so far, we have been optimizing AI for performance only, and for performance when the AI is designed to play alone in the field.

Host: Yeah.

Besmira Nushi: But if it has to play together with a human, there are other scores that we need to think about. For example, one of them is interpretability, in that people should be able to understand how a machine makes a prediction. Another one that we focus a lot on is predictability of errors. And what this really means is that, if I’m working with an AI algorithm, I should be able to, kind of, understand when that AI algorithm is going to make mistakes. And this is important for me as a user because, if I have the agency to make the final decision at the end, I need to know when it’s right or wrong…

Host: Right.

Besmira Nushi: …so that I can correct it at the right time, as it goes.

Host: Let’s drill in on the topic of AI as a collaborator then.

Besmira Nushi: Okay.

Host: We’ve talked a little bit about AI working alone and being designed to optimize for performance and speed. How do you then go about training ML models with collaborative properties in mind, instead of optimizing only for speed and performance? What are the tradeoffs in the algorithmic design, and how do you go about enforcing them?

Besmira Nushi: Right, right. Yeah, so you’re right that it is always a tradeoff. It is a tradeoff for the machine learning developer to decide which model to deploy. Should I deploy a model that is the most accurate on its own, or a model that optimizes team performance? But the difficulty is that this tradeoff is not always visible or easy to access. So I see the work that we do in our group as an enabling paradigm that allows you to explore this tradeoff, or, in a way, to extend it and show it to the developer so that the developer can make the right choices.

Host: Okay.

Besmira Nushi: And there are two ways you can go about this. The first one is what happens during grid search. In machine learning, grid search is the process where you search through many, many, many parameter settings. Through this search, you try to find the model that pleases you the most, in that it is accurate. But what we suggest is that you should also be looking at these other scores that matter for collaboration, like predictability of errors. The second way you can go about this is to include these definitions in the training objective itself.
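As a rough illustration of the grid-search idea above, the Python sketch below tries every parameter combination, records both accuracy and a collaboration score for each candidate, and ranks candidates by a weighted mix of the two, with an optional floor on the collaboration score. The names `grid_search` and `train_and_score`, the weighting scheme, and the toy scorer are assumptions made for this example; the podcast does not specify how such scores are computed or combined.

```python
from itertools import product


def grid_search(param_grid, train_and_score, weight=0.5, min_collab=None):
    """Try every parameter combination and return the best team-aware candidate.

    train_and_score(params) must return (accuracy, collab_score), both in [0, 1].
    weight: share of the ranking score given to the collaboration score
            (0.0 ranks by accuracy alone).
    min_collab: optional floor; candidates scoring below it are discarded.
    """
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        accuracy, collab = train_and_score(params)
        results.append((params, accuracy, collab))

    eligible = [r for r in results if min_collab is None or r[2] >= min_collab]
    if not eligible:
        return None, results
    best = max(eligible, key=lambda r: (1 - weight) * r[1] + weight * r[2])
    return best[0], results


# Hypothetical stand-in for training: deeper models score higher on accuracy
# but lower on a made-up "error predictability" score.
def toy_train_and_score(params):
    depth = params["depth"]
    return 0.80 + 0.03 * depth, 0.90 - 0.10 * depth


grid = {"depth": [1, 2, 3]}
print(grid_search(grid, toy_train_and_score, weight=0.0)[0])  # {'depth': 3}
print(grid_search(grid, toy_train_and_score, weight=0.5)[0])  # {'depth': 1}
```

Setting `min_collab` turns the collaboration score into a hard requirement rather than a tradeoff, which loosely mirrors the constraint-based option she mentions later in the conversation.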

Host: Okay.

Besmira Nushi: And we have been doing more and more work in this part, because we think that this explores the tradeoff even more. It extends it. It gives you a rich perspective on how many other parameters you can optimize, and you augment the objective function so that it thinks about accuracy, but it also thinks, with a certain weight, about these human collaboration scores.

Host: Right.

Besmira Nushi: And the best way to go is to do both, during training and during grid search, so that you really get the algorithm that works best for humans.

Host: So, you’ve talked a bit about enforcing properties within the algorithmic design.

Besmira Nushi: Yeah.

Host: Unpack that a little bit more.

Besmira Nushi: Yeah, so the enforcement usually comes from the optimization stage. During optimization in machine learning, we define a loss function. This loss function gives a signal to the training algorithm on how well it is doing, whether it is a good model or a bad model, and it’s really the only signal that we get, and it’s computed on the data, right? So we’re saying that signal should be augmented with these human collaboration scores, and the two put together, so that when we train the algorithm, these properties get enforced.
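A minimal sketch of such an augmented loss, assuming a binary classifier and one illustrative choice of collaboration penalty: extra loss on examples where the user already expects the model to be right (for instance, examples an earlier model version handled correctly), so that new, surprising errors cost more. The function names, the `trusted` flags, and the penalty itself are assumptions; the podcast does not say which collaboration scores are used.

```python
import math


def log_loss(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p = P(y = 1), clamped for stability."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def team_aware_loss(probs, labels, trusted, lam=1.0):
    """Average task loss plus lam times a hypothetical collaboration penalty.

    trusted[i] is True for examples where the user expects the model to be right.
    The penalty re-counts the loss on those examples, so mistakes there cost more.
    """
    n = len(labels)
    task = sum(log_loss(p, y) for p, y in zip(probs, labels)) / n
    penalty = sum(log_loss(p, y) for p, y, t in zip(probs, labels, trusted) if t) / n
    return task + lam * penalty


probs = [0.9, 0.2, 0.6]        # model's P(y = 1) for three examples
labels = [1, 0, 0]
trusted = [True, True, False]  # hypothetical: the user relies on the first two
print(team_aware_loss(probs, labels, trusted, lam=0.0))  # plain task loss
print(team_aware_loss(probs, labels, trusted, lam=1.0))  # task loss plus penalty
```

During training, an optimizer would minimize this combined value instead of the task loss alone; `lam` is the weight that sets how much the collaboration term counts relative to accuracy.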

Host: Okay.

Besmira Nushi: The other way you can do it is by adding constraints to the training algorithm, saying that, whatever you do, whatever model you find, you shouldn’t go below or above a particular score.

(music plays)

Host: Well, let’s turn our attention to the methodologies and tools you’re working on for machine learning systems. And before we get specific, I’d like you to give us a general overview of the software development cycle, writ large, talking about the differences between traditional software development and software engineering for ML systems.

Besmira Nushi: Yeah, so machine learning has really been, until recently, a research-like field, in that the algorithms were in the lab and they were accessed by machine learning scientists. But now that we have machine learning deployed out there in the field in different products, machine learning software is being combined with traditional software. However, these two are very different in nature, in that machine learning software can be non-deterministic. It may be hard for you to say what it is going to do. It also may have a black-box nature, in that it’s hard to understand what exactly it is going to say. And it may make mistakes that do not happen because of bugs in the code. They may happen just because the data that you are training the algorithm on might be insufficient, in that there is not enough to learn what you want to learn, or it may not quite resemble what is out there in the real world. It is just lab data, and it’s never going to be as rich as the world that we live in. And also, needless to say, and this is a very subtle difference that we often forget about, in machine learning we don’t really write the code that is going to execute the program. We write the algorithm that is going to process the data and come up with a function that is the actual code. So all these differences make the process very, very different. It is very data-dependent. In traditional software engineering, for example, we didn’t have these parts of the lifecycle that we currently care so much about, like data collection, and data cleaning when the data has mistakes. In fact, according to a study that we recently did at Microsoft, collection and cleaning take at least 50% of a machine learning engineer’s time. And, you know, this is significant.

Host: Yeah.

Besmira Nushi: It’s a lot of time that is spent on these new stages.

Host: Yeah.

Besmira Nushi: The other thing is versioning. So, in traditional software, we know very well how to do versioning. We have tools like GitHub and other versioning tools that do that for us. But in machine learning, we have to version not only code, we have to version the model, we have to version the data, and the parameters of the model… So there are all these things that are entangled together and that we need to version in the right way.
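One lightweight way to picture versioning code, data, parameters, and model together is to fingerprint each with a content hash and derive a single version ID from the four fingerprints, so that changing any one of them produces a new version. This is a sketch under that assumption, with hypothetical function names; it does not describe how any particular team or tool handles ML versioning.

```python
import hashlib
import json


def fingerprint(obj):
    """SHA-256 of a JSON-serializable object; sorted keys make dict key order irrelevant."""
    blob = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def experiment_version(code_text, data_rows, params, model_weights):
    """Fingerprint the four entangled artifacts and derive one combined version ID."""
    manifest = {
        "code": hashlib.sha256(code_text.encode("utf-8")).hexdigest(),
        "data": fingerprint(data_rows),
        "params": fingerprint(params),
        "model": fingerprint(model_weights),
    }
    manifest["version"] = fingerprint(manifest)[:12]  # short combined ID
    return manifest


code = "def predict(x): return x > 0.5"
data = [[0.1, 0], [0.7, 1]]
v1 = experiment_version(code, data, {"lr": 0.1, "depth": 3}, [0.42, -1.3])
v2 = experiment_version(code, data + [[0.9, 1]], {"lr": 0.1, "depth": 3}, [0.42, -1.3])
print(v1["version"], v2["version"])  # differ because only the data changed
```

Keeping the per-artifact hashes in the manifest, not just the combined ID, makes it possible to tell afterward which of the four things changed between two versions.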

Host: Okay. So, let’s go a little deeper there. You’ve got your traditional software engineer who is very comfortable with how this all works, and now you’ve got machine learning engineers who are adding layer upon layer to this software development cycle. How are you addressing that?

Besmira Nushi: So far, what we have done is start by understanding the needs of machine learning engineers, and understanding their background as well.

Host: Mmm-hmm.

Besmira Nushi: You know, because machine learning engineers, they may come from different fields. Some of them may not have a computer science background. They may be data scientists.

Host: Okay.

Besmira Nushi: They may be statisticians. Right? And the practices that are used in statistics and computer science may be very, very different. Within the same team, you may have people with so many different backgrounds and you need to put them together to speak the same language.

Host: Right.

Besmira Nushi: So we started by trying to understand: what is their background, and what are their problems? And the number one challenge that they have is getting end-to-end tooling that can support all these different stages in the lifecycle. It will take some time, but I think we’re getting closer!

Host: Well, have you laid out the stages, as it were? I mean, the software development cycle is very precise.

Besmira Nushi: Yep, yep, yep.

Host: And the machine learning cycle is a lot bigger, isn’t it?

Besmira Nushi: Yeah. It is. So, we have defined a few stages, and there is other work that has tried to do the same thing. We have stages like data collection, data cleaning, model training, feature engineering, and then model monitoring, debugging, and maintenance. So these are kind of the main stages, if I didn’t forget any of them!

Host: Yeah, I know!

Besmira Nushi: But, but what is different – there’s something that is very interesting in the difference between the two –

Host: Mmm-hmm.

Besmira Nushi: – is that the machine learning life cycle is very experimental in that it is a little bit of trial and error, in a way, this grid search that I mentioned earlier.

Host: Yeah.

Besmira Nushi: It’s a little bit of trial and error. You have to try different things to see whether it works for your domain. Maybe you clean the data once more. Maybe you add some more features or a different representation. So there is a lot of experimentation. And when there is a lot of experimentation, there is a lot of uncertainty. You don’t know, as an engineer, whether it’s going to work or not.

Host: Right.

Besmira Nushi: So it has changed even the way we plan and manage projects.

Host: Well, let’s go a little deeper and talk about that troubleshooting and debugging that you’re working on. It’s a key challenge for all software systems, but it’s particularly challenging for debugging a black box.

Besmira Nushi: Yeah.

Host: And especially for complex, mission- and safety-critical software and hardware, which you’re dealing with all the time…

Besmira Nushi: Mmm-hmm.

Host: …in the real world, so how do you go about – let’s get real – how do you go about designing methodologies for the debugging phase in the ML development cycle?

Besmira Nushi: Yeah. It’s a topic that is really timely in that, you know, if this is deployed in places that are high stakes, like in medicine or autonomous driving, this can really have, like, either a very good impact or a very bad impact on people…

Host: Or flying.

Besmira Nushi: …or flying, yeah… exactly! It can have a, you know, a different impact on people’s lives. One of the things that we say in our work is that good debugging practices start from rigorous evaluation. You know how many times we hear things such as, this model is 90% accurate on a particular benchmark. And we use that one single score to describe the whole performance of one algorithm on the whole data set. Often, that single number may hide so many important conditions of failure that we don’t know about and that are so important to know if you are an engineer. What we suggest is that that performance number should be sliced down into different demographics, and different groups in the data, so that we really understand, is there any pocket in the data that is maybe underrepresented and maybe the error rate is higher?
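The slicing she recommends can be sketched in a few lines of Python: given one record per evaluated example, compute accuracy overall and per value of an attribute. The record layout and the toy data, chosen to echo her “90% accurate” example, are made up for illustration.

```python
from collections import defaultdict


def sliced_accuracy(records, attribute):
    """Return overall accuracy and accuracy per value of one attribute.

    records: dicts with a boolean 'correct' field plus descriptive attributes.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [num_correct, num_examples]
    for record in records:
        slot = counts[record[attribute]]
        slot[0] += record["correct"]
        slot[1] += 1
    overall = sum(r["correct"] for r in records) / len(records)
    per_slice = {value: correct / total for value, (correct, total) in counts.items()}
    return overall, per_slice


# Toy data: a large group the model handles well and a small, under-represented one.
records = [{"group": "A", "correct": True}] * 8 + [
    {"group": "B", "correct": True},
    {"group": "B", "correct": False},
]
overall, per_slice = sliced_accuracy(records, "group")
print(overall)    # 0.9
print(per_slice)  # {'A': 1.0, 'B': 0.5}
```

The single aggregate of 0.9 hides a slice where the model is right only half the time, which is exactly the kind of pocket a one-number benchmark score conceals.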

Host: Right.

Besmira Nushi: So, these are the things that we suggest doing. Then we also go further and build interpretable models in order to explain to the engineer exactly when and how a machine learning model fails. And we often suggest doing this for different groups. I’ll just give you a simple example. We were recently looking at gender recognition software that works on photos, and we noticed that when these models are trained only on celebrity data, they have a much higher error rate for women who have short hair, who do not wear eye makeup, and who do not smile in the photo. It’s complicated. It is all these different dimensions that are put together…

Host: Right.

Besmira Nushi: …and the process of finding this particular thing, for example, I would have never thought to go and look for it, but this is what the interpretable model gives to you. And, you know, it takes away a lot of load from the engineer if you can at least automate part of this process.

Host: So, how are you automating parts of the process?

Besmira Nushi: Yeah, so what we’re doing is gathering this data together, and we are asking engineers not only to store the final aggregate performance number, but also to give us the performance numbers on each example. And, at that point, you become super powerful…

Host: Right.

Besmira Nushi: …in that you can put in an interpretable model that can slice and dice the data in the right way and can show you visualizations about where these pockets of errors happen.
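Once per-example results are stored, even a simple search can surface error pockets. The sketch below is a brute-force stand-in for the interpretable model she describes, not the actual algorithm behind her group’s tooling: it enumerates conjunctions of up to two attribute values, keeps those covering enough examples, and ranks them by error rate. The attribute names echo her example; the records are invented.

```python
from itertools import combinations


def error_pockets(records, attributes, max_conditions=2, min_support=2):
    """Rank slices (conjunctions of attribute=value) by error rate, highest first.

    Only slices covering at least min_support examples are reported.
    Each result is (error_rate, support, {attribute: value, ...}).
    """
    pockets = []
    for k in range(1, max_conditions + 1):
        for attrs in combinations(attributes, k):
            for vals in {tuple(r[a] for a in attrs) for r in records}:
                members = [r for r in records
                           if all(r[a] == v for a, v in zip(attrs, vals))]
                if len(members) < min_support:
                    continue
                error_rate = sum(not r["correct"] for r in members) / len(members)
                pockets.append((error_rate, len(members), dict(zip(attrs, vals))))
    pockets.sort(key=lambda p: (-p[0], -p[1]))
    return pockets


def make(hair, smiling, correct):
    return {"hair": hair, "smiling": smiling, "correct": correct}


# Invented per-example results: errors concentrate in one combination of attributes.
records = ([make("short", False, False)] * 2 + [make("short", False, True)]
           + [make("short", True, True)] * 3 + [make("long", False, True)] * 3
           + [make("long", True, True)] * 3)
print(error_pockets(records, ["hair", "smiling"])[0])  # the short-hair, not-smiling slice
```

Each single attribute looks only mildly worse here (one error in three), while the conjunction of both reaches two errors in three, which is why searching over combinations matters.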

Host: Well, and it also comes back to what data is available. What data you have access to. What data people are allowing you to use. Because there’s a tradeoff right there.

Besmira Nushi: Yeah.

Host: You want fully accurate and representative models…

Besmira Nushi: Mmm-hmm.

Host: …but celebrities are out in the public domain and a lot of people just say, I don’t want my picture in anybody’s data set.

Besmira Nushi: Yeah, yeah, yeah, yeah.

Host: And therefore, you’re precluding some of the important nuances that you might need in your data…

Besmira Nushi: Yup.

Host: …to get accurate models.

Besmira Nushi: Yup. So, there is a tension between, you know, being ethical about collecting your data and being accurate in the data. I think, as a community, and also as an industry, we need to think deeply about how to standardize this process.

Host: Well, as we’ve just kind of laid out, it’s hard out there for a machine learning engineer. These people need a whole new tool belt for the job. How is your research in Adaptive Systems helping to equip them with the tools they need for an AI world?

Besmira Nushi: For these methodologies that I just mentioned, in the last two years we have worked hard, with many people at MSR AI but also in Microsoft Cognition, to build concrete tools that can automate part of this process.

Host: Okay.

Besmira Nushi: And the tool that we are building now is called Error Terrain Analysis. It really helps in understanding the terrain of failure. This is an effort that I’m leading together with Ece Kamar and a lot of people from the Ethics and Society team, which cares a lot about these types of problems in the company and beyond. And really, what we are doing with the tool is building workflows and modular visualizations that can be put together like Lego pieces, so that you can go from a general view of errors to a more detailed one, and to an even more detailed one, looking at particular instances.

Host: Let me ask you one question on that, because we talked – I’m harkening back to our conversation at Faculty Summit –

Besmira Nushi: Yeah.

Host: – and we talked about how modularity is good, both in software development and for understanding how things work, but it also can have problems in the debugging because you have these different modules that aren’t all connected and if something goes wrong with one or something is better in one and you’ve got an older module, it poses new problems.

Besmira Nushi: Yeah. It does. In this case, we’re thinking about these modules more as visualization modules, in that first you want to have a broad overview of the data, and this would be one module. Then you want to drill down into the other ones so that you do not get overwhelmed as an engineer, so that it is not too much information for you.

Host: Got it. Okay.

Besmira Nushi: Yeah.

Host: Go back to a phrase you just used, the “terrain of failure.”

Besmira Nushi: Mmm-hmm.

Host: Unpack that for me. That’s very intriguing!

Besmira Nushi: Yeah. So, what we mean by the terrain of failure is that, if you think about it as a set of mountains and hills and seaside, there are cases when, you know, the terrain of failure is really calm in parts of the data, in that the examples are easy to classify, there is nothing special about them, and everything is flat. And there are other cases where the data is so rich, there is so much diversity in the data, in demographics or other properties, that the error can fluctuate a lot. And we want to feel that terrain and really understand what it looks like. Yeah.

Host: That’s one of the most evocative phrases I’ve heard. What other kinds of tools do ML engineers need that are being worked on sort of upstream in the research community?

Besmira Nushi: I’d like to mention a set of other tools that we are also building in the group. One of them is called InterpretML. This is work led by Rich Caruana in the Adaptive Systems and Interaction group. They are building a toolset for training interpretable models and generating explanations from these models. Another tool, and this one is shiny new, is called TensorWatch. It was built by Shital Shah for real-time debugging, so that you can see the errors and the training loss of machine learning models on the fly. That said, I think that there is still a lot to do when it comes to stitching all this together into one single framework. As I said, we need end-to-end frameworks for versioning, data provenance, and data documentation, and tools that allow us to take the insights we get from troubleshooting and debugging and integrate them back into the system to fix the problems. I will not claim that everything is going to be automated, but at least there should be a workflow and a process for when that happens.

Host: Well, at this point, I’ll take good over automated, right?

Besmira Nushi: Right, right! Yeah, yeah!

Host: Well, hype notwithstanding, AI is still highly dependent on people, and I’m not sure that’s a bad thing, I think that might be a good thing. Why does ML add a difficult layer to this idea of “self-healing” software, that’s one of the things you talked about at Faculty Summit, where one component fixes another based on feedback? And how can strong causal reasoning tools and counterfactual analysis tools help us better understand what went wrong?

Besmira Nushi: Yeah. It is hard to heal machine learning software, but it is even harder to heal a system that has many, many machine learning components tied together. And the reason why that is difficult is that sometimes it is hard to understand the dynamics and interactions between the components. We’ve done work, which I also talked about during the Faculty Summit, on generating counterfactuals in the subcomponents in order to understand how changes in the subcomponents affect the larger system.

Host: Right.

Besmira Nushi: And again, we are using human intervention to generate these counterfactuals for us so that we can understand the dynamics better.
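The idea of using human fixes as counterfactuals can be sketched on a toy two-stage pipeline: re-run the system with one component’s output replaced by a human-corrected version and compare end-to-end accuracy. The components, data, and function names below are hypothetical; under these toy assumptions, the change in accuracy estimates how much of the system’s failure traces back to the fixed component.

```python
def detect(image):
    """Toy object detector with a built-in flaw: it never reports frisbees."""
    return [obj for obj in image["objects"] if obj != "frisbee"]


def describe(objects):
    """Toy caption generator that lists the detected objects."""
    return "a " + " and a ".join(sorted(objects))


def system_accuracy(examples, fixes=None):
    """Share of examples whose final caption matches the reference.

    fixes maps an example id to a human-corrected detector output; when present,
    it replaces the detector's output (the counterfactual), and the rest of the
    pipeline runs unchanged.
    """
    fixes = fixes or {}
    hits = 0
    for ex in examples:
        objects = fixes[ex["id"]] if ex["id"] in fixes else detect(ex["image"])
        hits += describe(objects) == ex["reference"]
    return hits / len(examples)


examples = [
    {"id": 1, "image": {"objects": ["dog", "frisbee"]}, "reference": "a dog and a frisbee"},
    {"id": 2, "image": {"objects": ["cat"]}, "reference": "a cat"},
]
before = system_accuracy(examples)
after = system_accuracy(examples, fixes={1: ["dog", "frisbee"]})
print(before, after)  # 0.5 1.0
```

Because software, unlike a patient, can simply be run again, the same loop can be repeated with fixes applied to each component in turn to compare their contributions.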

Host: Right.

Besmira Nushi: Debadeepta Dey is starting a new line of work in this space, to optimize large integrative systems on the fly, in real time. So, this is something new that is happening. Overall, though, you know, the good news about causal reasoning in these systems, for debugging in particular, is that, as opposed to other fields like medicine, we can actually run the system again. If we want to apply a fix and see how it works, we can apply the fix and see what the impact is, which is something that you cannot easily do in other fields, so that’s good. The not-as-good news is that we still have to understand the dynamics of the components, and we have to understand the data distribution (how is the data generated?) in order to make the right assumptions when we do causal reasoning.

Host: Right. So, drilling in a little bit on Debadeepta Dey’s work and – it’s all of your work, but he’s like a focal point there – and he talked a little bit about this at Faculty Summit as well, these big systems with many parts and pieces… and you’ve got to be able to troubleshoot and debug in real time…

Besmira Nushi: Yeah.

Host: I want you to talk a little bit more about how that variable changes the game.

Besmira Nushi: Yeah, so in his work, Debadeepta talks about things such as, you might get new instances that are running through the system that the system has never seen before.

Host: Right.

Besmira Nushi: It doesn’t know how to optimize for these new instances. But by using the technique that he is building with reinforcement learning and off-policy learning, you can really try to adapt with fewer examples…

Host: Right.

Besmira Nushi: …and try to manage these instances that, you know, are not that well known to the system.
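For readers unfamiliar with off-policy learning, the core trick is to evaluate a new policy using logs collected while an older policy was running. A standard estimator for this is inverse propensity scoring (IPS), sketched below. Whether Debadeepta Dey’s technique uses IPS specifically is not stated in the conversation; the logs and policy here are invented for illustration.

```python
def ips_estimate(logs, new_policy):
    """Inverse propensity scoring estimate of a new policy's average reward.

    logs: (context, action, reward, logging_prob) tuples, where logging_prob is
          the probability the logging policy assigned to the action it took.
    new_policy(context, action): probability the new policy takes that action.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        # Reweight each logged reward by how much more (or less) often the new
        # policy would have taken that action than the logging policy did.
        total += reward * new_policy(context, action) / logging_prob
    return total / len(logs)


# Logs from a uniform-random policy over two actions (probability 0.5 each).
logs = [("a", 0, 1.0, 0.5), ("a", 1, 0.0, 0.5), ("b", 0, 0.0, 0.5), ("b", 1, 1.0, 0.5)]


def tailored(context, action):
    """Deterministic policy: action 0 in context 'a', action 1 in context 'b'."""
    best = {"a": 0, "b": 1}
    return 1.0 if action == best[context] else 0.0


print(ips_estimate(logs, tailored))  # 1.0
```

The estimate is unbiased only if the logging policy gave every action the new policy might take a nonzero probability; with few logged examples it can also have high variance.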

Host: Right. And so that’s real-world, real-time life.

Besmira Nushi: Yeah, exactly.

Host: Which is what humans are good at, is adapting, and machines are still a ways back.

Besmira Nushi: Yeah. It’s kind of… yeah, adapting to an unknown world in a way.

Host: The uncertainty.

Besmira Nushi: Yeah.

Host: All right. Well, I always like to know where research is on the delivery spectrum. And on a scale of “ten years out or more” to “already shipped,” where can we find what I would call trustworthy AI in products and services now, and given what you’ve told us, how confident should we be…

Besmira Nushi: About…

Host: …that it’s going to work as advertised?

Besmira Nushi: Yeah. So, I think that there exist some grand opportunities for us as researchers to work together with engineers to really improve this tooling for rigorous evaluation and debugging. And I think that if we put in the right effort, and if we do this the right way, we can really make progress in five years. Not in order to solve the general intelligence problem, but in order to be able to make the right promises to the user.

Host: Right.

Besmira Nushi: You know, one of the problems that we currently have is that we cannot really promise to the user, or specify, the performance of a system. We still need to learn how to do that and how to debug the bad cases. So, if we kind of work from both ends, if we are able to explain the performance in the right way and also understand it in the right way, we can kind of meet the user in the middle and set the expectations right.

Host: I would think it would be really important at this point to manage expectations, as they say, in terms of what you referred to as promises that you make to the user. So what are you doing in terms of communication and education about what you want to put out there in these real time systems?

Besmira Nushi: Yeah, so, exactly. One of the things that we’d like to do is to be able to generate reports that describe the behavior of the system and really quantify, in numbers, how we have seen this system behave in the past. It is 90% accurate for this group of people. It is 50% accurate if you stand in a particular pose, meaning that you shouldn’t use the system in those cases. So, being able to decompose the performance and break it down into these cases will set the expectations right for the user. If you see it on paper, here it’s green, here it’s red, you can kind of understand that, well, the system is not perfect.
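A behavior report of the kind she describes could start as simply as flagging each slice green or red against an accuracy threshold. In this sketch, the 90% and 50% figures come from her example; the threshold, the slice names, the function name, and the output format are assumptions.

```python
def behavior_report(slice_accuracy, threshold=0.8):
    """Format per-slice accuracy as report lines, best slice first.

    Slices at or above the (hypothetical) threshold are flagged GREEN;
    slices below it are flagged RED, meaning the system should not be relied on there.
    """
    lines = []
    for name, accuracy in sorted(slice_accuracy.items(), key=lambda kv: -kv[1]):
        flag = "GREEN" if accuracy >= threshold else "RED"
        lines.append(f"{flag:<5}  {accuracy:6.1%}  {name}")
    return lines


report = behavior_report({"particular pose": 0.5, "this group of people": 0.9})
print("\n".join(report))
```

Where the threshold sits is a product decision rather than a technical one; the point of the report is that the user sees the red cases up front instead of discovering them in use.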

Host: Right.

Besmira Nushi: And these are the right cases where I need to use it and others that maybe I shouldn’t.

(music plays)

Host: Well, this is the part of the podcast where I like to ask, what keeps you up at night? We could go in a couple directions here, either on the technical front or the social front, or maybe even both. What risks are you trying to mitigate and how are you going about that?

Besmira Nushi: Yeah, so sometimes I wonder whether we are building the right thing. I worry that we end up building things that are isolated from the world and maybe not safe. So, what I’d like AI to be, I’d like it to be a technology that enables everyone, and that is built for us. It’s built for people. My parents should be able to use it, an environmental scientist should be able to use it and make new discoveries, or a policy maker, in order to make good decisions. And these are the things we really have a responsibility for, as researchers: making sure we are building the right thing, that it’s safe, and that it’s a technology we can rely on.

Host: Well, there are all kinds of iterations, in different verticals and in different horizontals, where we’re envisioning our future with AI. A lot of companies are thinking about how they can do this for businesses, you know, with speech recognition. And other places have maybe some more nefarious purposes for AI, and they are not saying much about it. So, is there anything you particularly see – let’s talk about the social front for a second – in terms of what we ought to be thinking about now, as potential end-users of this?

Besmira Nushi: I think that there is a big question about how we manage the inventions that come up, either, you know, as academics, or as industry. There are decisions that need to be made in terms of, like, how do you monitor? And how do you report how you are using a certain product?

Host: Right.

Besmira Nushi: Right? Because we see these questions coming up even for other technologies that are not really related to intelligence. And there should be some sort of protocol, when you buy a certain product, as a user, to really declare which scenarios you are going to use it in and for what reason.

Host: So, that ends up on the upstream, regulatory end of things.

Besmira Nushi: Yeah.

Host: And it goes into much more of the ethics and policy around AI. Well, tell us your story, Besmira. How did you get started in computer science, where has your journey taken you, and how did you end up at Microsoft Research doing the work you’re doing?

Besmira Nushi: I did my undergrad in Albania, which is my home country. It’s a small country in southeastern Europe. One interesting thing about how I started is that, in Albania, computer science is a very gender-balanced field, in that 50% of my peers at the university were women. In a way, I feel really lucky that I started my career in such an environment. It gives you the type of confidence that you might not get in a different environment. After that, I went for a master’s. It was a double-degree master’s in Germany and in Italy, so I ended up spending one year in each of those. This was in data mining and HCI. Then I started my PhD. I spent five beautiful years in Switzerland, at ETH Zurich. And this was, again, at the intersection of human computation and machine learning.

Host: Yeah.

Besmira Nushi: So, in a way, this thing about me being at the intersection of machine learning and people has followed me in my career, and I think it has really been because I cannot give up any of them! The intersection keeps me motivated.

Host: Right.

Besmira Nushi: And it keeps me focused. And I kind of make sure that what I’m doing is useful and it is good for people out there.

Host: Right. So, from Switzerland to Redmond, how did that happen?

Besmira Nushi: How did that happen! Oh, wow! Yeah, so I came for an internship during my PhD here. I spent three months – Seattle is beautiful in the summer, and…

Host: That’s how we get you!

Besmira Nushi: Exactly! I like the group a lot. I still work with the same people.

Host: Who do you work with?

Besmira Nushi: I work with Ece Kamar very closely, Saleema Amershi, Eric Horvitz quite a lot and you know, we are surrounded by an amazing group of people…

Host: Yeah.

Besmira Nushi: …who come from very diverse backgrounds.

Host: Well, continuing on a personal note….

Besmira Nushi: Mmm-hmm.

Host: Tell us something we don’t know about you. I mean you already just did. I didn’t know that about you.

Besmira Nushi: Spoiler alert!

Host: Yeah, right, spoiler alert, I’m from Albania. Tell us something we don’t know about you, a defining experience, an interesting hobby, a personal characteristic, a side quest, any of those, that may have defined your direction in life?

Besmira Nushi: Yeah, so as you notice, I’ve moved quite a bit. The US is the fifth country I’m living in, and really, when I think about it, I’ve met so many interesting people. I’ve met dear friends over the years, and it’s really these people who have shaped my personality. And they have really helped me to think outside the box, to be creative, but also to learn about different perspectives. All my friends in my network think in many different ways. They come from very diverse cultural backgrounds, and this really helps you to pause and think further, beyond what you have learned in school or in papers and books.

Host: All right, so you’ve got Albanian, English, German, Italian, what else do you speak?

Besmira Nushi: I speak C++, yup!

Host: C++! As we close, I want to give you the last word. What should our listeners know about what’s next in Adaptive Systems, and I know you don’t know all the answers – there’s a lot of uncertainty there, just like the field! –

Besmira Nushi: Yeah.

Host: – but what are the big unanswered questions and who do you need to help you answer them?

Besmira Nushi: Yeah, so we have different directions in the Adaptive Systems and Interaction group. There is the whole direction of interpretability and debugging, then a lot happening on human-AI collaboration, either for decision-making or in the physical world for human-robot interaction. There is a lot of work happening in reinforcement learning and robotics and decision-making under uncertainty. Overall, if I have to put a theme around this, it is that we like to think about problems that are happening out there in the real world, not in the lab. And we want to build trustworthy AI systems that operate out there. And as such, in all this diversity, we look for people who have a strong technical background, but we also look for people who can speak all these different languages and are eager to learn more about each other’s fields.

Host: Besmira Nushi, thank you for joining us today.

Besmira Nushi: Thanks for having me.

(music plays)

To learn more about Dr. Besmira Nushi and the latest research at the intersection of human and machine intelligence, visit Microsoft.com/research