ProBeat: Microsoft Teams video calls and the ethics of invisible AI
Microsoft Teams is getting a bunch of new features this year. Robert Aichner, Microsoft Teams group program manager, recently told us how Microsoft is building the most interesting one: real-time noise suppression, which uses machine learning to filter out background noise during calls.
Because the feature isn’t final, we also had a secondary discussion about the user experience. It’s one thing to filter out someone typing, a barking dog, a door being shut, a rustling chip bag, or a vacuum cleaner running in the background. It’s another to filter out something that call participants may want to hear (in fact, the team has decided not to filter out certain noises, including musical instruments, laughter, and singing).
Governments and companies alike have been abusing AI in a variety of ways. But even AI features clearly designed to help humans do something as basic as communicate clearly come with ethical questions for the companies and developers that build them. The discussion Aichner and I had gave me a brief glimpse into what businesses and developers wrestle with as they embed AI features deep into their products. I hope you find it as interesting as I did — a lightly edited transcript follows.
VentureBeat: How does it actually work in practice? Once the feature is rolled out, will you have to turn it on, or is it enabled by default for every call and every speaker?
VentureBeat: So it’s going to be on by default for recipients?
Aichner: It works on the send side. It works on the person who has the client. If I have this new client, then whoever I call will not get noise from me.
VentureBeat: Will there be any sort of indicator to the sender that there’s extra noise that is being filtered out or to the recipient that something is being filtered out? Any sort of visual indication that noise suppression is activated or anything like that?
Aichner: I think that’s some detail we haven’t really decided on, so I guess I can’t really comment on whether we would indicate that to the user. I assume you’re going down that road of ‘Hey, if you have this cool capability, why not indicate to the user that there’s a lot of noise but you can’t hear it?’
VentureBeat: There’s that, and there’s also wanting to demo something to someone. Say I call my dog over and tell him or her to speak, and then the recipient doesn’t hear it because of the noise suppression. Or, say an audio company wants to demo something to a client. I’m sure we can think of better examples, but what happens when noise suppression gets in the way and it isn’t working as expected? You’re telling me you can’t turn the feature off, but you’re also telling me that the person that should be hearing the noise will not even know that noise suppression is modifying the call.
Aichner: That’s good feedback. We have discussed this topic. And as I said, we haven’t really made a final decision on that yet. The noise types we have … For example, typing noise, it’s not likely that you want to transmit typing noise. We are careful that certain things where we feel ‘OK, that might not actually be perceived as noise,’ then maybe we shouldn’t filter it out. But, yeah, that’s good feedback. We have thought about that, we just haven’t made the decision on that yet.
VentureBeat: Did I understand correctly that there are certain noises that you have deemed to be OK that aren’t speech, and they’re in your machine learning model?
Aichner: We looked at whether certain noises really are something which we want to filter out. I think one example, which you can probably think of, is music. Do you want to filter out music or do you not want to filter out music? Because there are cases where you could say, ‘Hey, I want to show you how I play the violin.’ And then we are filtering out the violin. That’s probably not what you want. So we are trying to see what are the cases where something would be considered as a desired signal you want to transmit.
VentureBeat: Those noise types that you want to keep, those are still up in the air?
Aichner: We have looked at certain noise types and basically said that for certain noise types we are not optimizing for those now to remove them. We would try to keep those.
VentureBeat: Right now, if I’m on a group call and someone accidentally has background noise, I can mute them. I can also unmute them, they can mute themselves, and so on, depending on the type of call. But this, because it’s automatic, and there’s no indication, that could be problematic. I think recipients should see an indication that background noise is being filtered out, and there should be a toggle that lets you turn the filter off. I think it makes sense to filter it out by default, especially if your team is confident that you get it right 99% of the time or whatever. But the user should have an override. Otherwise, you’re going to get into situations where users realize what’s going on, ditch Teams, and go use something else where they can hear the person unfiltered. Apply it to video. You wouldn’t want certain parts of someone’s camera feed filtered out without your knowledge.
Aichner: I get your point. I think the question is, what are you suppressing? The current noise suppression, we usually don’t give you a way to turn it off because it’s really more stationary noise and we believe that most people don’t want to hear that. But even there, there are special cases. In Skype we have a special mode where you can turn off noise suppression and you can turn off automatic gain control because you have professional audio setup and you don’t want the client to mess with your audio signal at all. I think there are cases where that’s justified. And then, I get your feedback. We have thought about that, and we haven’t made a final decision yet. We are looking into how confident are we that by default we would do the right thing. But then, as you said, it might make complete sense that you have some way to override it. Whether that’s a prominent button or whether that’s somewhere in the settings, that’s all something we would need to decide.
VentureBeat: I think there’s two things here. There’s the debate of whether the user should be made aware that noise suppression is activated. You should have that debate and it seems like you are. And then if you do that, there should be a secondary debate: Should there be an option to turn it off right then and there or do you have to go into the settings? Can you turn it off per call or is it a global switch for all of Teams? Those are UX decisions but, I think, also ethical decisions.
Aichner: Yep, fully agree.
ProBeat is a column in which Emil rants about whatever crosses him that week.