Bridgeworks CEO David Trossell features in this recent article from Forbes about why data-driven chatbots are turning to new technologies such as the award-winning data acceleration application WANrockIT.
October 1, 2018
Are Visual Chatbots The Next Step Of The Conversational Interfaces Revolution?
Which jacket looks better on me, the blue or the black one? Most of us have asked this question of a good friend or family member on a shopping spree. Now, Amazon Echo users can shoot the same question to an AI-powered chatbot even if they are shopping online.
This is just one example of visual chatbots entering the ecommerce domain. These bots use visual cues where their predecessors relied on typed or spoken input alone. In this case, Amazon’s ‘Look’ will analyze photos of two different outfits and choose the better one based on different parameters.
The Look may only appeal to people who are really into fashion, but it does highlight some things that are missing in the chatbot space. Let’s be honest: most of us have had frustrating experiences with chatbots. They don’t always understand context, and they often can’t deal well with unpredictable speech. All of this can lead to a less than ideal user experience.
The flawed human-chatbot connection
Chatbots can only be as successful as the user’s ability to communicate an issue and the bot’s ability to understand it. Communication through voice and text alone has undeniable limitations, and it leads to frustrating, inaccurate experiences. Anyone who has received cluelessly unhelpful answers from a chatbot understands this. Research shows that 59% of people think that chatbots are slow to resolve problems.
This can be attributed to the fact that the connection between humans and bots is inherently flawed. People don’t just communicate with words. They use inflection, body language and visuals. Have you ever grabbed a pen and paper and used a sketch to describe something when words weren’t quite adequate? Visual chatbots could be the missing link in closing this communication gap.
How visual chatbots operate
Powering up a chatbot with vision capabilities has become possible thanks to advances in deep learning, and image recognition in particular, which allow AI to recognize different patterns with high accuracy and progressively learn over time to tackle more complex visuals. Visual chatbots will likely evolve through the next four stages:
Receiving text and serving an image: This is the simplest phase and also the one users are most likely to encounter today. Bots receive ‘let me see’ requests and then serve up the correct image based on that input. For example, someone interacting with a car dealership bot might say, ‘Let me see the interior of a 2017 Escalade.’ The bot would then search a database of images and display the correct one.
One roadblock at this stage, however, is that large-scale image processing requires significant computing power and bandwidth – not something every company can afford to deliver. Bridgeworks is now working to tackle this issue with a new WAN acceleration solution for encrypted data stored and transmitted by different apps over the Internet. “In the future, this could eliminate the danger of performance degradation in visual chatbots as they are tasked to deal with more and more incoming data,” said David Trossell, CEO of Bridgeworks.
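The stage-one lookup described above – receiving a ‘let me see’ request and returning the best-matching picture – can be sketched very simply. The following is a hypothetical illustration, not any vendor’s actual implementation: it scores images in a keyword-tagged index by how many words they share with the request. All file names and tags are invented.

```python
# Minimal sketch of stage 1: matching a "let me see" request against
# a keyword-indexed image database. All names and data are hypothetical;
# a production bot would use proper natural-language parsing and search.

def find_image(request, image_index):
    """Return the image whose tags best overlap the words in the request."""
    words = set(request.lower().replace(",", " ").split())
    best_url, best_score = None, 0
    for url, tags in image_index.items():
        score = len(words & set(tags))  # count shared keywords
        if score > best_score:
            best_url, best_score = url, score
    return best_url

# Hypothetical dealership index mapping image files to descriptive tags.
index = {
    "escalade_2017_interior.jpg": ["2017", "escalade", "interior"],
    "escalade_2017_exterior.jpg": ["2017", "escalade", "exterior"],
    "tahoe_2018_interior.jpg": ["2018", "tahoe", "interior"],
}

print(find_image("Let me see the interior of a 2017 Escalade", index))
# -> escalade_2017_interior.jpg
```

Even this toy version shows why indexing matters: the image must already be tagged with every term a customer might use, which is exactly the data problem discussed below.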
Serving and receiving text and images. Here, bots will use image recognition along with text, then return relevant text and images. For instance, a user uploads an image of an auto part. The bot recognizes it and returns the name and price of that part. Then it shares pictures of several other parts, informing the customer that they may need to buy those parts as well.
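That stage-two flow can be sketched as below. This is an assumed, simplified design: the classifier is a stand-in stub where a real bot would call a trained vision model, and the catalog entries and prices are invented for illustration.

```python
# Hypothetical stage-2 flow: an image-recognition model identifies a part,
# and the bot replies with its price plus commonly paired parts.

# Invented catalog for illustration.
CATALOG = {
    "brake pad": {"price": 34.99, "pairs_with": ["brake disc", "caliper kit"]},
    "brake disc": {"price": 52.50, "pairs_with": ["brake pad"]},
}

def classify_image(image_bytes):
    # Stub standing in for a deep-learning model; a real one would
    # return a label and confidence from the uploaded photo.
    return "brake pad", 0.93

def handle_photo(image_bytes):
    label, confidence = classify_image(image_bytes)
    if confidence < 0.8 or label not in CATALOG:
        return "Sorry, I couldn't identify that part. Could you try another angle?"
    part = CATALOG[label]
    related = ", ".join(part["pairs_with"])
    return (f"That looks like a {label} (${part['price']:.2f}). "
            f"Customers who replace this often also need: {related}.")

print(handle_photo(b"...jpeg bytes..."))
```

Note the confidence check: because a part can be photographed from hundreds of angles, a cautious bot asks for another photo rather than guessing.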
However, in order for chatbots to develop visual intelligence, they must be exposed to and understand a tremendous amount of visual data. Even something as simple as a request to see a car interior requires that the bot has seen it, recognizes it, and has it appropriately indexed in all the various ways a customer might phrase that request.
That’s only what’s required when receiving text and serving images. It’s even more complex when the customer is sending an image. To properly recognize a car part, the bot must be able to recognize it from hundreds of possible angles.
The only way to solve this problem is to create massive data sets from which these bots can learn. The challenge is that doing this is exceptionally time-consuming and labor intensive. Still, the work is progressing. Much of it is being crowdsourced through sites like Mechanical Turk. Workers annotate images and then verify the annotations of others so that machines can better understand this visual input, among other tasks.
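The verify-each-other’s-annotations step mentioned above is often consolidated by simple agreement rules. The sketch below is illustrative only – it is not Mechanical Turk’s actual API – and uses a plain majority vote, flagging images whose annotators disagree for another round of review.

```python
# Illustrative consolidation of crowdsourced image labels by majority
# vote. Image ids and labels are invented; real pipelines add worker
# quality scores, gold-standard checks, etc.

from collections import Counter

def consolidate(annotations, min_agreement=0.5):
    """annotations: dict mapping image id -> list of worker labels."""
    consensus, needs_review = {}, []
    for image_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) > min_agreement:
            consensus[image_id] = label      # workers broadly agree
        else:
            needs_review.append(image_id)    # send back for re-verification
    return consensus, needs_review

raw = {
    "img_001": ["brake pad", "brake pad", "brake disc"],
    "img_002": ["headlight", "fog lamp", "tail light"],
}
consensus, review = consolidate(raw)
print(consensus)   # {'img_001': 'brake pad'}
print(review)      # ['img_002']
```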
Receiving and analyzing images, serving images and recommendations. In these cases, the bot doesn’t just recognize the image, it analyzes it. Think of a customer taking a picture of a home after a house fire. The bot might be able to recognize areas of smoke and fire damage to create a report for the insurance company, and identify what repairs are needed.
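A speculative sketch of that stage-three output: assuming an image-analysis model has already produced per-region damage detections, the bot only has to turn them into a readable report. Every detection value and repair mapping below is invented for illustration.

```python
# Speculative stage-3 sketch: formatting per-region damage detections
# (assumed to come from an upstream image-analysis model) into a report
# an insurer could review. All data here is hypothetical.

REPAIRS = {"smoke": "repaint and deodorize", "fire": "replace structure"}

def damage_report(detections):
    """detections: list of (region, damage_type, severity in 0-1)."""
    lines = []
    # Most severe damage first.
    for region, damage, severity in sorted(detections, key=lambda d: -d[2]):
        repair = REPAIRS.get(damage, "inspect further")
        lines.append(f"{region}: {damage} damage ({severity:.0%}) -> {repair}")
    return "\n".join(lines)

print(damage_report([
    ("kitchen ceiling", "smoke", 0.6),
    ("north wall", "fire", 0.9),
]))
```

The hard part, of course, is the detection model itself, not the report – which is why this stage sits further out than the first two.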
Interactive image analysis. Eventually, chatbots will serve customers through live video. Imagine walking through the process of assembling a piece of equipment while the bot recognizes the pieces as you pick them up and is able to instruct you as you work.
At the moment, visual chatbots are still at a very early stage, capable of tackling only the simpler requests described in phases 1 and 2. With rapid technological advances, however, we should expect them to evolve from occasionally helpful text assistants into full visual interactors, capable of delivering on-point advice for any occasion.