Should I Care about the Alexa Platform? A Developer's View

Note: I recently gave a talk at Pittsburgh Tech Fest on "Building Applications for the Alexa Platform". The slides are on my site, and the sample code is on github.com/throp/mathwhiz.

Who knows what the future holds, but at least at this moment, it's safe to say that Amazon is killing it with Alexa. Just 6 months ago, the total number of Echos, Dots, and Taps sold was around 5 million, and the number of available skills was about 5,000. Today, those numbers have grown to 8.2 million devices and 10,000 skills, respectively, with the skill count roughly doubling. What's more, it seems like every day now there's some new partnership, integration, or spin-off product to boast of, like the unveiling of the Echo Show (voice + screen), the integration with Ikea, or the release of the first phone with Alexa built in. Even the "just-plain-fun" stuff, like the Seattle Mariners' skill for their luxury boxes or the Echo Silver, makes it clear that Alexa has made it into the zeitgeist.

So with everyone jumping on the Alexa bandwagon, the question for software developers is, should we jump on as well? In other words, is it worth it for us to invest in acquiring the knowledge and tools to build Alexa applications (or skills)? And if we do, what would we build? What are the "right" applications for this platform?

To answer these questions, it's probably helpful to first step back and take a look at what Alexa as a platform has going both for and against it. In my opinion (for whatever that's worth), the matter of whether Alexa will rule the (voice) world is still open: there are reasons for both optimism and concern. Here are my 2 cents...

What Alexa has going in its favor

As already covered, there are a ton of fun things happening in the world of Alexa recently. It feels a bit like the early days of the internet or of iOS apps, where people are just throwing things at the wall to see what sticks. There's a sense that it could be big, but it's not yet clear how or in what way. And this uncertainty and openness is what's so exciting. As developers, we have the opportunity to be a "first mover": to make a huge impact with a relatively simple idea (if we can just come up with the right one!).

Even better, the barriers to entry for building are pretty low. Having been on the market for almost 3 years now, the platform is stable and rich. There's plenty of great documentation, a vibrant community, and ample choice from a tech stack perspective (with frameworks for Java, JavaScript, and others). Moreover, from a conceptual perspective, there's not all that much there. Compared to what you need to learn to build an iOS or web app, building a voice-based application is a snap. Sure, there's some new jargon (like "utterance", "intent", "slot", etc.), but with a few hours of study you're basically there.
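To give a feel for how little there is to that jargon, here is a toy sketch of the idea: a skill declares *intents* (what the user wants), each intent lists sample *utterances* (ways of saying it), and *slots* are the variable words inside an utterance. This is purely illustrative and assumes nothing about the real Alexa Skills Kit: the dictionary layout, the intent names (`OrderPizzaIntent`, `FindPhoneIntent`), and the `resolve` function are all hypothetical, and Amazon's actual language understanding is far more sophisticated than regex matching.

```python
import re
from dataclasses import dataclass

# Hypothetical interaction model: intent name -> sample utterances.
# Curly-brace placeholders mark slots, the variable parts of an utterance.
INTERACTION_MODEL = {
    "OrderPizzaIntent": [
        "order me a {size} pizza",
        "get me a {size} pizza",
    ],
    "FindPhoneIntent": [
        "find my phone",
        "where is my phone",
    ],
}


@dataclass
class IntentMatch:
    intent: str
    slots: dict


def _template_to_regex(template):
    # Split on {slot} markers, escape the literal words, and turn each
    # slot into a named capture group.
    pattern = ""
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{") and part.endswith("}"):
            pattern += f"(?P<{part[1:-1]}>.+?)"
        else:
            pattern += re.escape(part)
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def resolve(utterance, model=INTERACTION_MODEL):
    # Return the first intent whose sample utterance matches, with slot values.
    text = utterance.strip().lower()
    for intent, templates in model.items():
        for template in templates:
            match = _template_to_regex(template).match(text)
            if match:
                return IntentMatch(intent, match.groupdict())
    return None  # nothing in the model matches this phrasing


if __name__ == "__main__":
    print(resolve("Order me a large pizza"))
    print(resolve("what's the weather"))
```

Here "Order me a large pizza" resolves to `OrderPizzaIntent` with the slot `size` filled in as `large`, while a phrase that isn't in the model resolves to nothing. That is the whole conceptual core: the hard part is choosing good intents and utterances, not the code.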

What Alexa has going against it

Given all this (millions of users, plenty of room in the market, and a low cost to build), why wouldn't we all jump in? Well, there are a few reasons to be wary, I think.

The first cautionary flag is the "stickiness" of Alexa. It's true that everyone on your block just bought an Echo, but that doesn't mean they're actually using it. A report from VoiceLabs recently showed that whereas for an Android or iOS app there's about a 12 percent chance that a user will still be using it after 2 weeks, for an Alexa skill that chance is only about 3 percent. Further, of the 7,000+ skills available (at the time of the report), 69% have zero or one review. These stats seem consistent with what I hear anecdotally from friends and family. The novelty of Alexa can wear off pretty quickly, and within weeks she can go relatively unused or might even end up relegated to the desk drawer.

If this is the case, and there is indeed a retention problem, it raises the question: what's the reason? Is it too few useful skills or something inherent to the platform itself?

If it's about the quality of skills, then the reason is probably obvious. As yet, there isn't a clear way for developers to monetize their creations: everything on Alexa is free. As a consumer, this is great, but as a developer not so much. Am I really going to sink dozens of hours into building a skill without a hope that there could be some (even meager) pot of gold at the end of my development hours? I'm not so sure. And this seems clear from even a quick perusal of the skills available. There are a ton of banal "fact of the day" skills, but very few skills of any depth, quality, or creativity (though there are some). Basically, it seems that developers are intrigued enough with the technology to dabble a bit, but without a proper incentive, very few have taken the time to build anything of real value.

It could also be the case, however, that the platform itself is to blame. Many have voiced their concern about some of the inherent limitations of Alexa. Among the major gripes are: an inability to push notifications to the user (pretty much a necessity for social apps), no coordination between multiple Alexa devices (e.g. keeping my music in sync on all devices as I walk through my house), no way to identify who is speaking by their voice, and no access to the raw audio recording.

Lastly, not all users are impressed with the overall experience. Relative to Siri or Google Assistant, the conversations can seem more robotic and less natural. Users have to phrase things the way Alexa expects, not necessarily how they would typically speak. For example, whereas I might naturally want to say "Alexa, order me a pizza from Pizza Hut", Alexa wants me to say "Alexa, tell Pizza Hut to order me a pizza". Maybe not a big deal in the long run (as we adjust), but in my experience I do find myself having to think hard for the right words to speak - "hmmm, what the hell is that skill I enabled to find my phone again? Is it 'Phone Finder'...or 'Find Phone'...or 'Find My Phone'? Fuck it. I'll just look under the couch cushions again."
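To see why the exact wording matters so much, here is a deliberately strict toy model of the "tell <skill> to <request>" pattern. It is only a sketch of the idea, not how Alexa actually routes requests (her real speech recognition and skill routing are far more sophisticated than a regular expression). The `ENABLED_SKILLS` table, the skill names, and the `route` function are hypothetical.

```python
import re

# Hypothetical skills the user has enabled, keyed by invocation name.
ENABLED_SKILLS = {
    "pizza hut": "PizzaHutSkill",
    "find my phone": "PhoneFinderSkill",
}

# The "Alexa, tell/ask <invocation name> to <request>" pattern described above.
INVOCATION = re.compile(
    r"^alexa,?\s+(?:tell|ask)\s+(?P<name>.+?)\s+to\s+(?P<request>.+)$",
    re.IGNORECASE,
)


def route(command):
    # Returns (skill, request) only if the phrasing and invocation name line up.
    match = INVOCATION.match(command.strip())
    if match is None:
        return None  # natural phrasing like "... from Pizza Hut" never reaches the skill
    skill = ENABLED_SKILLS.get(match.group("name").lower())
    if skill is None:
        return None  # close-but-wrong invocation name, e.g. "Phone Finder"
    return skill, match.group("request")


if __name__ == "__main__":
    print(route("Alexa, tell Pizza Hut to order me a pizza"))
    print(route("Alexa, order me a pizza from Pizza Hut"))
    print(route("Alexa, ask Phone Finder to ring my phone"))
```

Only the first command reaches a skill. The second uses the natural phrasing, and the third uses a plausible but wrong name for an enabled skill; both fall through, which is exactly the couch-cushion experience described above.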

What to Build? What's the "Killer App"?

I'm certainly not a Gartner analyst or an oracle of consumer behavior, so to me it seems that the future is anything but clear. Yes, the potential of the Alexa platform seems huge, but there are definitely some glaring red flags as well. Assuming, however, that you're undaunted by these challenges and want to push forward and test your hand at publishing an Alexa skill, the question (of course) is, what should you build? What is the best use of this technology? Is there a "killer app"? And if so, what is it?

Given the ubiquity of smart devices in general, the crux to me seems to be in thinking about the personal trade-off we all make as users in our everyday lives. When are we willing (or compelled) to give up the power and richness of our smart phone or desktop GUIs in favor of the austerity and simplicity of voice? Amazon is making the bet that voice is always simpler, but I'm not so sure. When I'm lounging on the couch, for example, would I rather ask Alexa to play that new album (that I can't exactly remember the name of), or just pick up my phone that's sitting 1 foot away from me on the coffee table (if not in my hand!) and make a few taps? In 2017, as humans have essentially reached a singularity with their smart phones, tapping seems almost easier and less conscious than speech (as scary as that sounds).

My sense is that we need to think about situations in which using our screen-based smart devices is just not possible. This seems to be the premise of a great skill by AllRecipes: when you're in the kitchen, your hands are dirty, so it's better to speak than tap. Another great hands-free zone is of course the car, and it appears that Amazon is working on it.

There are also huge (and obvious) wins for the visually impaired, and companies can use Alexa to create an omni-channel experience to tap into this, albeit smaller, market. For someone with low vision, the Alexa skill might be the best (or only) option to order an Uber, even if the majority of people would still prefer to do it via their smart phone.

Lastly, I wonder if the consumer-facing applications of Alexa are being overshadowed by the potential for business applications. There have long been companies that have offered voice-driven functionality to employees (e.g. factory workers, warehouse pickers, health care workers, etc.), and I would think that Alexa's open platform could expand this market further.

In the end, my projections and pontifications might be obvious or off-base, but I do find the Alexa platform to be exciting. I'd like to hear your thoughts though. As a developer, what do you think about the Alexa platform? Is it worth your while to jump in and learn about building applications with Alexa? Or have you already? What do you think will be the most useful skills?

Comments (3)

Josh

May 30, 2017

Thanks for this post. My company is wrestling with how much work to put into an Alexa skill, and you have given us food for thought.

And M

I'd like to point out something regarding the Skill ecosystem:
1) we are still not used to interacting with technology through voice, both from a user and a developer point of view.
2) making voice experiences is incredibly hard, much harder than coding the actual skill.
3) the technology has been there for a while but we can all agree that the VUI (voice user interface) paradigm is a new thing. While it's an unfair comparison to say that the voice experience space is like the early days of the WWW, it's also an unfair comparison to say that voice experiences TODAY should be as fluid, complex and involving as the UIs of websites or apps we use daily, which have evolved over years and years of refinement and research.
I think we should focus less on the product and more on the experience. Voice skills need to stop being a sugar-layer on top of normal apps, and be something that a web app could never be.

Ben

May 31, 2017

@Josh - thanks! Glad it helped.

@And M - Really, really great points. I'm relatively new to the voice space (I did a stint with a voice technology company a while back, and now with Alexa), but I would definitely agree with your assessment. It seems that people do think of voice as just a "sugar layer" on top of a normal app, as you put it. Of the skills I've enabled for Alexa, I'd say very few have an elegant or impressive user experience. Anyway, thanks for the comment.