All necessary components should be installed in one housing. The magic mirror should be able to talk with a user and recognise objects (e.g. "What is it that I'm holding in my hand?" - answer: "That's a bicycle.").
In the end we decided on the following components:
Now Alexa could contact the Raspberry Pi directly and get the information it needed. The first piece of necessary information was its IP address. In our case, the mirror should be able to be taken to any home and connected to the Wi-Fi there, getting a new IP address that Alexa then needed to know. We worked around this by using ngrok, a service 'creating tunnels to localhost'. It gave us a URL on its domain, and this URL was always forwarded to our device, wherever it was. A minor issue was that the free version could not keep the same URL between runs (reboots), so we had to change the skill settings on each boot; a few lines of code run at startup fixed this by updating the skill settings on each start. All other services we tested were not reliable enough for our needs.
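To illustrate the idea, here is a minimal sketch (not our original code) of such a startup script. It reads the current public URL from ngrok's local inspection API; the update_skill_endpoint helper is a hypothetical placeholder for whatever actually rewrites the skill configuration (for example via the Alexa Skill Management API):

```python
import requests

def get_ngrok_url() -> str:
    # ngrok exposes a local inspection API on port 4040;
    # we pick the public HTTPS tunnel it reports.
    tunnels = requests.get("http://127.0.0.1:4040/api/tunnels").json()["tunnels"]
    for tunnel in tunnels:
        if tunnel["public_url"].startswith("https://"):
            return tunnel["public_url"]
    raise RuntimeError("no HTTPS ngrok tunnel found")

def update_skill_endpoint(url: str) -> None:
    # Hypothetical placeholder: in practice this would call the
    # Alexa Skill Management API (SMAPI) to point the skill's
    # endpoint at the new URL.
    print(f"new skill endpoint: {url}")

if __name__ == "__main__":
    update_skill_endpoint(get_ngrok_url())
```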
For the skill itself, instead of writing all the code on our own, we decided to use the Jovo framework. This framework helps to write skills that are compatible with both Alexa and Google Assistant at the same time.
Finally, after the skill was done, an Alexa client application had to be deployed. Had we been using an Amazon Echo, this functionality would already have been included, but since we had our own device, something needed to take the microphone input and stream it to Alexa. The wake-up phrase also had to be recognised on-device, so we needed an on-device speech recogniser dedicated to spotting this single phrase. Once this 'simple' recogniser detected the phrase ('Alexa'), the microphone input was sent to Alexa. Fortunately, Amazon had already made such sample applications available; gone are the days when we had to implement the raw HTTP/2 connections (with multiple streams) ourselves by following the specification. First we deployed the sample Java application, but we ran into some issues (it blocked for reasons unknown to us). In the end we took the then-new SDK application, which ran flawlessly. (Actually, it had one issue: it interrupted itself whenever it uttered a word containing 'Alexa'. For example, when it said 'OK, I'm sending your selfie to Alexander', it recognised the 'Alexa' in 'Alexander' and started listening again. But let's assume for now that our visitors won't send selfies to Alexander, and leave the topic of handling barge-in for another time.)
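The resulting control flow can be summarised in a few lines. The following sketch is purely illustrative: detect_wake_word stands in for the on-device keyword spotter shipped with Amazon's sample applications, and stream_to_alexa for the client that streams audio to the Alexa Voice Service:

```python
import pyaudio

CHUNK = 512    # audio frames per buffer
RATE = 16000   # AVS expects 16 kHz, 16-bit mono PCM

def detect_wake_word(frame: bytes) -> bool:
    """Hypothetical stand-in for the on-device keyword spotter
    that listens for the single phrase 'Alexa'."""
    return False

def stream_to_alexa(stream) -> None:
    """Hypothetical stand-in for the AVS client that streams the
    microphone input to Alexa until the end of the utterance."""

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

while True:
    frame = stream.read(CHUNK)
    if detect_wake_word(frame):   # cheap check, runs continuously on-device
        stream_to_alexa(stream)   # expensive path only after 'Alexa'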
The idea behind this work package was simple:
Object recognition is already a well-developed field, with many training libraries and pre-trained models available, mostly based on deep neural networks (DNNs).
It turned out that object recognition was also useful for saving energy: the microphone and display were only activated when a person stood in front of the magic mirror.
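The gating logic for this is short. The sketch below assumes a hypothetical person_in_frame helper that wraps the object detector described in the next section; vcgencmd display_power is the Raspberry Pi command that switches the HDMI output on and off:

```python
import subprocess

def set_display(on: bool) -> None:
    # 'vcgencmd display_power' switches the Raspberry Pi's
    # HDMI output on (1) or off (0).
    subprocess.run(["vcgencmd", "display_power", "1" if on else "0"],
                   check=True)

def person_in_frame() -> bool:
    """Hypothetical hook: returns True when the object detector
    reports a 'person' in the current camera frame."""
    return False

set_display(person_in_frame())
```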
We decided to use pre-trained models for object classification. There is a large choice of freely available models, trained on different object sets and varying in complexity. As we wanted them to run on a Raspberry Pi, we needed only the simplest ones. But no matter how small and simple the models were, we could not reach acceptable speed: each camera frame took a few seconds, which was a considerable delay. Note that on top of this classification there was additional communication with the Alexa server, which took a fraction of a second to complete (and the skill server was running on the Raspberry Pi over Wi-Fi, which was not the fastest setup).
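The timing problem is easy to reproduce with a small benchmark such as the following sketch, which loads a pre-trained MobileNet-SSD through OpenCV's dnn module (the model and image file names are placeholders) and measures a single forward pass on the Pi's CPU:

```python
import time
import cv2

# Placeholder file names for a pre-trained MobileNet-SSD Caffe model.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

frame = cv2.imread("frame.jpg")  # placeholder camera frame
blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
net.setInput(blob)

start = time.time()
detections = net.forward()       # runs on the Pi's CPU
print(f"inference took {time.time() - start:.2f} s")
```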
We needed a way to speed up object detection without losing accuracy. The Intel (Movidius) Neural Compute Stick (NCS) came to our aid: it accelerated the detection remarkably and made the delay entirely acceptable.
However, as Rosebrock later remarked, the NCS is not completely trouble-free to set up:
"The install process is not entirely isolated and can/will change existing libraries on your system."
Nevertheless, we followed the original instructions, which worked sufficiently well for our application.
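With the SDK installed, offloading inference to the stick follows the pattern below. This is a condensed sketch based on the NCSDK's Python API; the 'graph' file is a model precompiled with the SDK's mvNCCompile tool, and the exact preprocessing depends on the model:

```python
import cv2
import numpy as np
from mvnc import mvncapi as mvnc

# Find and open the first attached Neural Compute Stick.
devices = mvnc.EnumerateDevices()
if not devices:
    raise RuntimeError("no NCS device found")
device = mvnc.Device(devices[0])
device.OpenDevice()

# Load a network precompiled for the stick (file name is a placeholder).
with open("graph", "rb") as f:
    graph = device.AllocateGraph(f.read())

# The stick expects half-precision input; real code would also apply
# the model's mean subtraction and scaling here.
frame = cv2.imread("frame.jpg")
blob = cv2.resize(frame, (300, 300)).astype(np.float16)

graph.LoadTensor(blob, "frame")   # offload inference to the stick
output, _ = graph.GetResult()

graph.DeallocateGraph()
device.CloseDevice()
```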
To display some useful information on the screen, we used the "MagicMirror" software. Practically all of the information on the display is shown by this application. It has a large selection of customisable extensions. We decided on a personal calendar, the local news, and the local weather forecast.
Finally, we had one more challenge to overcome: with all the above-mentioned components running together, the Raspberry Pi heated up considerably, which was signalled by a red thermometer shown on the screen.
In the worst case we measured 85 °C, the temperature at which the processor starts to throttle. Both effects were disadvantageous for our talking mirror: sustained high temperatures could damage the hardware, and the throttling reduced performance, while we were already at the limit of acceptable delays.
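We therefore kept an eye on the SoC temperature. On a Raspberry Pi it can be polled with the vcgencmd tool, as in this small sketch:

```python
import re
import subprocess

def soc_temperature() -> float:
    # 'vcgencmd measure_temp' prints e.g. "temp=85.0'C".
    out = subprocess.check_output(["vcgencmd", "measure_temp"], text=True)
    return float(re.search(r"[\d.]+", out).group())

# The firmware throttles the CPU at around 85 °C.
if soc_temperature() >= 80.0:
    print("warning: approaching the throttling threshold")
```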
Despite some trial-and-error phases, we managed to get all features running on the small Raspberry Pi in time. The talking Magic Mirror was very well liked by our visitors and was in demand all day long.