Literally the day after I got the Echo to work with my home automation, there were a flurry of announcements about the Alexa Skills Kit. That’s a subject for another post. It turns out that this hack to do direct home automation is a better experience than what you can get by making an Echo “app”.
Anyone who has taken steps toward home automation can probably relate to the feeling of wrongness that the Amazon Echo has such limited options for integrating into a smart home. I don’t use the Belkin WeMo system or Philips Hue light bulbs. But it just seems like I should be able to say, “Alexa, turn on the kitchen light” and make it work with my setup. There’s just enough already built in that not being able to do this is frustrating.
Here’s what I did to get it to work. My solution is general enough that it can be easily tweaked to work with many different technologies as long as you’ve got some kind of API available.
Here are some various technologies I’ve collected and installed over the years:
- Lights: Insteon with ISY-994i controller
- Alarm: Elk M1-Gold with Elk M1XEP Ethernet Interface
- Audio/Video control: Global Caché GC-100
- Media: Kodi (formery XBMC) on Ubuntu
Tying this all together, I have a dedicated internal server running Apache and wsgi Python code. I’ve built small Python modules that implement functions for common actions on each device. So I have an elk.py, isy.py, xbmc.py, etc. I then provide a more-or-less unified RESTful API on an internal URL.
Amazon Echo’s Current Home Automation Support
Until the recent addition of support for the Wink hub, the Echo knew how to talk to only the Belkin WeMo switches and the Philips Hue lights. That’s great if you have those devices, but even with the added support for Wink, there’s no overlap between my home setup and what the Echo supports. And the minute you want to do something more complex, say a macro that turns down the lights and turns on the TV, you’re out of luck.
The WeMo devices use the UPnP protocol to advertise themselves on the network, respond to searches from controllers, and define the details of their control interfaces. The Echo searches for the WeMo devices specifically and is programmed to know about the WeMo API. The minimal amount that the Echo uses the UPnP protocol means that it should be possible to emulate WeMo devices on the network in software.
Finding out how the Echo and the WeMo interact took some network sniffing with Wireshark. Because the Echo and WeMo are both WiFi devices, capturing the network traffic required a wireless adapter that could be put into “monitor mode”. This is often not possible under Windows. I used a USB WiFi adapter on a Linux system and had no problems. The next obstacle is the encryption between the access point and the WiFi devices. I wasn’t willing to turn off my security. Wireshark can decrypt the traffic if you tell it your SSID and passphrase, as long as the captured data includes the four EAPOL handshake packets from each device. This handshake is done when the device connects to the access point, so Wireshark needs to be capturing when you plug in the Echo and the WeMo.
Here’s an overview of the sequence of events that takes place when the Echo discovers WeMo switches and then responds to a voice command to turn the switch on:
Emulating the WeMo Switch
Creating a software emulation of the WeMo switch would allow me to have as many virtual WeMo devices as I wanted on my network, each with a different name. Each switch can be told to turn “on” or “off”, so the interface is pretty basic. But with an unlimited number of virtual switches, it doesn’t cost anything to have one called “Television”, another called “T. V.” that does the same thing as “Television”, and one more called “T. V. Mode” that turns the television on or off and sets the lights to their desired states.
Here’s what I decided I needed for my virtual WeMo cloud:
- An IP address for each virtual switch.
- A listener for UDP broadcasts to address 184.108.40.206 on port 1900.
- A listener on port 49153 for each switch on its associated IP address.
- Logic to customize the search response and the setup.xml to conform to the UPnP protocol and give the Echo the right information about each switch.
- Logic to respond to the on and off commands sent by the Echo and tie them to whatever action I wanted to really perform.
I don’t know that the Echo requires a different IP address for each switch or if I can use multiple ports on a single IP address or even multiple URLs on a single port. In theory, if the Echo honors the data sent in response to its searches and requests, it should be possible to just assign each virtual switch its own URL. But it’s also possible that the Echo is hard-coded to use port 49153 and to POST to /upnp/control/basicevent1. I haven’t yet experimented to find the minimum conditions that will work, so I started by emulating each switch with its own IP address. With the Linux server I’m using, it’s easy to create eth0:1, eth0:2, etc. each with its own IP address.
220.127.116.11:1900 is the address and port specified by the UPnP protocol. Only one such listener is needed since it can send multiple responses, one for each switch, in response to a search request.
A search request from the Echo is a UDP broadcast formatted as an HTTP request, with HTTP headers indicating what is being searched for. There is no body. The search request comes from the Echo’s IP address and (at least in my case) port 50000. The request looks like this:
M-SEARCH * HTTP/1.1 HOST: 18.104.22.168:1900 MAN: "ssdp:discover" MX: 15 ST: urn:Belkin:device:**
Each UPnP device on the network that satisfies the search term is supposed to send a UDP message to the IP address and port that made the search request. The response is formatted as an HTTP response. But this is not TCP, there aren’t really any connections involved. One request from the Echo generates many responses depending on the number of switches on the network. The response from a switch looks like this:
HTTP/1.1 200 OK CACHE-CONTROL: max-age=86400 DATE: Mon, 22 Jun 2015 17:24:01 GMT EXT: LOCATION: http://192.168.5.190:49153/setup.xml OPT: "http://schemas.upnp.org/upnp/1/0/"; ns=01 01-NLS: 905bfa3c-1dd2-11b2-8928-fd8aebaf491c SERVER: Unspecified, UPnP/1.0, Unspecified X-User-Agent: redsonic ST: urn:Belkin:device:** USN: uuid:Socket-1_0-221517K0101769::urn:Belkin:device:**
In order to avoid potential problems with my switch emulation, I tried to make it as compliant as possible with the UPnP specification and the actual behavior of the WeMo devices. I suspect that the Echo is probably not very picky, though, and many of the details could be ignored.
The 01-NLS header is supposed to contain a UUID that changes with every reboot. I assign a UUID each time the virtual switch is created, which happens each time I restart my WeMo emulation program.
The X-User-Agent is not part of the UPnP spec. However, this is what the WeMo sends and there’s at least one project on the web I’ve seen where they look for this string to find WeMo devices.
The uuid part of the USN header is supposed to be persistent for each device even across reboots. Belkin appears to use the fixed string “Socket-1_0-” followed by the switch serial number. I created a simple transformation of the switch’s friendly name into a generated serial number. That way, a switch with a given name will always generate the same USN. Having multiple switches on the network with the same name would cause unpredictable problems.
Once the Echo receives the search response, it sends an HTTP GET request to the URL specified in the LOCATION header. That request is very minimal:
GET /setup.xml HTTP/1.1 Host: 192.168.5.189:49153 Accept: */*
And the switch responds with the device description file, which is 133 lines long in the WeMo switch that I tested. The top part of that file contains this:
<?xml version="1.0"?> <root xmlns="urn:Belkin:device-1-0"> <specVersion> <major>1</major> <minor>0</minor> </specVersion> <device> <deviceType>urn:Belkin:device:controllee:1</deviceType> <friendlyName>kitchen light</friendlyName> <manufacturer>Belkin International Inc.</manufacturer> <manufacturerURL>http://www.belkin.com</manufacturerURL> <modelDescription>Belkin Plugin Socket 1.0</modelDescription> <modelName>Socket</modelName> <modelNumber>1.0</modelNumber> <modelURL>http://www.belkin.com/plugin/</modelURL> <serialNumber>221517K0101769</serialNumber> <UDN>uuid:Socket-1_0-221517K0101769</UDN> <UPC>123456789</UPC> <macAddress>94103E3489C0</macAddress> <firmwareVersion>WeMo_WW_2.00.8326.PVT-OWRT-SNS</firmwareVersion> <iconVersion>0|49153</iconVersion>
The <friendlyName> element is what the Echo will listen for when you tell it turn it on or off. This needs to be set differently for each virtual switch.
The <serialNumber> and <UDN> fields use the same information that was used to populate the USN header of the search response.
Those are the only fields I change in my generated setup.xml response. I haven’t noticed any problems by not changing the <macAddress> even though it’s the same value for all of my virtual switches.
When you tell the Echo to turn a device on or off, this is what it sends as an HTTP request to the device:
POST /upnp/control/basicevent1 HTTP/1.1 Host: 192.168.5.189:49153 Accept: */* Content-type: text/xml; charset="utf-8" SOAPACTION: "urn:Belkin:service:basicevent:1#SetBinaryState" Content-Length: 299 <?xml version="1.0" encoding="utf-8"?><s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><s:Body><u:SetBinaryState xmlns:u="urn:Belkin:service:basicevent:1"><BinaryState>1</BinaryState></u:SetBinaryState></s:Body></s:Envelope>
That’s an awful lot of stuff just for the “1” or “0” you actually care about in the <BinaryState> element. I didn’t bother with a SOAP parser. I simply look for the string “SOAPACTION: “urn:Belkin:service:basicevent:1#SetBinaryState”” in the data and, if it’s there, look for either “<BinaryState>1</BinaryState>” or “<BinaryState>0</BinaryState>”. Similarly for the response, which has no dynamic data other than the date and the content-length:
HTTP/1.1 200 OK CONTENT-LENGTH: 295 CONTENT-TYPE: text/xml; charset="utf-8" DATE: Mon, 22 Jun 2015 22:45:57 GMT EXT: SERVER: Unspecified, UPnP/1.0, Unspecified X-User-Agent: redsonic <s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"><s:Body> <u:SetBinaryStateResponse xmlns:u="urn:Belkin:service:basicevent:1"> <CountdownEndTime>0</CountdownEndTime> </u:SetBinaryStateResponse> </s:Body> </s:Envelope>
That’s all it takes to finish the dialog necessary to make the Echo think the software is a genuine WeMo switch. A tiny bit of extra code wired the “1” and “0” commands into REST API requests, and I was able to make it all work from voice command to action. When I walk in from the garage at night, I can tell Alexa to turn on the kitchen lights without ever having to reach for a wall switch.
That’s really all I care about for my WeMo emulation. In particular, the search request that Echo broadcasts is not the same as the search performed by the WeMo app on my Android phone. So the WeMo app can’t see or control my virtual switches. This is not a drawback for me, so I don’t intend to flesh out the UPnP handling.
So Where’s the Code?
The code is now available. You can read about it in my Virtual WeMo Code for Amazon Echo post.
The code is a single Python file and isn’t very complex. I am happy to release the source except that right now it contains the entire contents of the setup.xml I captured from my WeMo switch. I don’t know how Belkin would feel about me publishing that file beyond the short excerpt above.
My intention is to spend one or two more evenings experimenting to see which components the Echo cares about and which ones can be removed or changed and have the Echo still work. I’ll then produce my own contents that make the Echo happy. Of course, that’ll make my emulated switches even less compatible with other WeMo apps, but I don’t consider that a downside. As soon as that’s done, I’ll update the code and post it.
Less than 24 hours after I got this working and was controlling my home automation with voice commands, Amazon opened access to the Alexa Skills Kit. My next post about the Echo will talk about why it’s not as good as the WeMo emulation for basic on/off control, and what I’ve done with it to integrate the Kodi media system with the Echo.