I’d like to tell you about my friend Karl Wiegand and his amazing research. He’s one of the smartest and most capable researchers I know, and it doesn’t hurt that he spends his valuable time and immense brain power on making the world a truly better place. Karl Wiegand is a doctoral candidate at Northeastern University in Boston in the College of Computer and Information Science. He is a computer scientist in Dr. Rupal Patel’s Communication Analysis & Design Laboratory (CADLab). Karl’s area of research is Augmentative and Alternative Communication (AAC). More specifically, he focuses on icon-based AAC and AAC facilitated by an exciting technology known as brain-computer interfaces (BCI). Karl will be presenting on these topics in San Diego at the 28th Annual International Technology and Persons with Disabilities Conference (CSUN for short). I encourage anybody who will be at CSUN this year to attend Karl’s session on Friday, March 1st, at 3:10 p.m. PST in the Ford AB room on the 3rd floor of the Manchester Grand Hyatt. His presentation is titled Novel Approaches to Icon-Based AAC.
I strongly encourage you to read Laura Legendary’s excellent write-up on Karl’s work at the Accessible Insights Blog. Laura does a great job of presenting AAC as well as defining some of the other terms often used in this line of research. I recommend reading Laura’s article for an overview of the terms and topics I’ll be discussing here. I’ll also link to relevant Wikipedia articles.
The following is taken from various discussions I’ve had with Karl about his work. He and I both encourage you to contact him about any questions you may have. Contact information can be found at the end of this post.
Explanation and Disclaimer
This was originally going to be one giant post covering both Karl’s work on Symbol Path, his icon-based AAC system, and his work on brain-computer interfaces, but it turns out that writing about someone else’s PhD is as hard as, if not harder than, writing about your own (or, in my case, just writing your own); therefore, this post on Karl’s work will be split into two parts. This first part covers Symbol Path and the natural language processing work that Karl does, and then in part 2, I’ll discuss his exciting work with brain-computer interfaces. I’d also like to preemptively take full responsibility for any errors or omissions. As Bertrand Russell said, “A stupid man’s report of what a clever man says can never be accurate, because he unconsciously translates what he hears into something he can understand”. Karl’s quite the clever fellow, and I hope to do his amazing work a little bit of justice in this post.
Karl informs us that current icon-based AAC systems make assumptions that contribute to suboptimal design decisions or outcomes. For example, users might press a discrete physical or virtual button, which causes specific letters, words, or phrases to be either spoken or written out. While this approach is certainly useful to individuals with locked-in syndrome or other conditions that prevent standard verbal communication, Karl feels there is much room for advancement and, quite possibly, disruption. Here are a few assumptions Karl lays out that we should keep in mind.
- For many AAC users, it’s hard to make precise movements. They might have not only speech impairments but also have trouble with fine motor control.
- It is fatiguing to make these movements for more than a short period of time.
- Most AAC systems artificially limit the types of input signals to either arm or hand movements.
Karl goes on to ask: what if you could signal with your feet, your head, your eyes, etc.? Or, in the case of BCI, what if you could signal with your brain? We could have a politeness or courtesy signal that helps phrase sentences in a more polite way: “I want food” can become “I would like some food, please”. To this end, Karl’s research explores alternative input techniques with an eye on the following two goals.
Two Goals for Easier Input
- Relieve motor fatigue.
- Increase message construction speed.
When I asked Karl, “So are you starting to see AAC systems take advantage of more input signals?”, he responded that “some letter-based systems are starting to do that now. One example is in BCI-based AAC, where we are starting to see motor imagery used. The user is told to imagine squeezing a fist, kicking, punching, etc. Now, you have several input signals; for example, left and right arm and leg, blink and double blink, etc.” These can indicate movement of a cursor, speeding up or slowing down, activation of a letter, backspace, and so forth. As Karl informs us, however, the use of motor imagery can be taxing to users. Even though one is not actually forming a fist, concentrating on performing these motor actions is tiring.
Fully Generative or Not
This discussion of letter-based AAC leads us to an important distinction. Karl explains, “They, the letter-based systems, have an advantage in that they are fully generative, which means that you can generate any possible word, but they can be fatiguing to use”. Because of the lack of a keyboard or other arbitrary letter-based input, icon-based AAC systems like Symbol Path are not fully generative. One can generate dozens, if not hundreds, of phrases, but not every possible phrase in the target language.
Karl has devised a system called Symbol Path that allows a user to select a series of icons on the screen, which the computer stitches together into meaningful phrases. Symbol Path is quite powerful in that it doesn’t statically map icons to a simple set of phrases. Instead, it allows for out-of-order selection and also accommodates tremor and other physical manifestations of different disabilities. Also, by allowing continuous selection, the user can slide over to the required icons instead of having to rely upon fine motor control to select the desired icon. For example, if one wishes to say the sentence “The boy likes the computer”, then one can select the computer icon and slide over, even with jerky motions, to the “boy” icon, and then perhaps over to the icon representing “like”, and Symbol Path stitches together the right phrase. The order of the icons could be rearranged a bit, and Symbol Path will still arrive at the correct sentence. One more thing to keep in mind about Symbol Path’s interaction paradigm is that the system is not hardcoded to a touch screen. Any continuous signal will do, for example, blowing into a straw, moving a mouse, making vowel sounds with your mouth, etc., but currently, Symbol Path supports touch screen input and cursor input from a mouse. Due to the generalizable nature of the work, these continuous input signals don’t have to be perfect. Just as the user’s hand can jerk or tremor, so can another input signal such as making continuous sounds with one’s mouth. This adaptiveness is possible because Symbol Path corrects for this error at the input layer.
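The continuous-selection idea can be sketched in a few lines of Python. This is purely my own illustration, not Symbol Path’s actual code; the tolerance radius, coordinates, and icon labels are all invented for the example:

```python
from math import hypot

def icons_on_path(path, icons, tolerance=40.0):
    """path: list of (x, y) samples from any continuous input signal.
    icons: dict mapping icon label -> (x, y) center on screen.
    Returns icon labels intersected by the path, in first-hit order."""
    selected = []
    for px, py in path:
        for label, (ix, iy) in icons.items():
            # The tolerance radius absorbs tremor and jitter in the signal.
            if label not in selected and hypot(px - ix, py - iy) <= tolerance:
                selected.append(label)
    return selected

icons = {"boy": (100, 100), "like": (250, 120), "computer": (400, 90)}
# A jerky path that wanders near all three icons:
path = [(95, 110), (180, 140), (255, 115), (330, 130), (395, 95)]
print(icons_on_path(path, icons))  # ['boy', 'like', 'computer']
```

Note that nothing here assumes a touch screen; any device that yields a stream of (x, y) samples could feed such a function.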
Natural language processing (NLP) is a rich field full of thousands of papers and many dedicated researchers. I encourage you to investigate this field further with some of the provided links and resources in this article, but I don’t aim to give an introduction to the field in this post. Instead, let’s jump right into what Karl does in his work, and see where that takes us. The first small thing Symbol Path does is to add articles in the right places. For example, if someone selects the “dog”, “car”, and “chases” icons, it might output “The dog chases the car.” The addition of articles and prepositions is relatively straightforward, but it is an important step in outputting natural-sounding speech. How does Symbol Path achieve the impressive out-of-order rearranging? The answer: Karl employs two powerful techniques.
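Before moving on to those techniques, here’s a toy version of the article-insertion step described above. This is my own illustration with an invented noun list, not Symbol Path’s actual code:

```python
def add_articles(words, nouns):
    """Insert "the" before each known noun, then produce a sentence."""
    out = []
    for w in words:
        if w in nouns:
            out.append("the")
        out.append(w)
    return " ".join(out).capitalize() + "."

print(add_articles(["dog", "chases", "car"], nouns={"dog", "car"}))
# The dog chases the car.
```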
Two Powerful Techniques
- Semantic frames
- Semantic grams
Let’s discuss these in order.
The use of semantic frames is a fairly old technique in the literature, but Karl is using it in a novel way. The technique stems from Charles J. Fillmore’s case grammar work. The theory simply states that the verb of a sentence is the central component of a message; thus, we can make stronger assertions and inferences about the roles of other words in the message. An example of this in action can be observed with the word “giving”. In the semantic frame for “giving” (to give), there are three slots to fill in.
Semantic Frame Slots for Giving
- The agent that does the giving.
- The receiver that receives the gift.
- The object that can be given or received.
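As a data structure, a frame like this might look something like the sketch below. The slot names and the `fill_frame` helper are my own invention for illustration; the CADLab’s actual representation may well differ:

```python
# A semantic frame for "giving": the verb plus named role slots.
giving_frame = {
    "verb": "give",
    "slots": {
        "agent": None,     # who does the giving
        "receiver": None,  # who receives the gift
        "object": None,    # what is given or received
    },
}

def fill_frame(frame, role_guesses):
    """role_guesses: mapping of word -> its most likely semantic role."""
    filled = {"verb": frame["verb"], "slots": dict(frame["slots"])}
    for word, role in role_guesses.items():
        if role in filled["slots"]:
            filled["slots"][role] = word
    return filled

print(fill_frame(giving_frame,
                 {"boy": "agent", "dog": "receiver", "bone": "object"}))
```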
So what does that get us?
It turns out that this semantic frame way of looking at sentences buys us the following two things.
- When the user draws the line through icons in Symbol Path, there won’t be many verbs. The system takes what few verbs are intersected by the drawn line and looks up their semantic frames. Then the system gets the semantic roles of all the non-verbs, iterates through the possibilities, and ranks them statistically. This ranking process allows the system to intelligently guess, with a high degree of accuracy, which words were meant by the user.
- Now that the system has values for the verb and the various words, it can determine the order.
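One plausible (and entirely hypothetical) reading of that ranking step: enumerate the possible assignments of non-verb words to the frame’s roles, score each assignment with corpus-derived statistics, and keep the best one. The numbers below are toy stand-ins, not real corpus counts:

```python
from itertools import permutations

def best_assignment(words, roles, score):
    """score(word, role) -> probability-like weight from corpus stats."""
    best, best_score = None, float("-inf")
    for perm in permutations(words, len(roles)):
        s = sum(score(w, r) for w, r in zip(perm, roles))
        if s > best_score:
            best, best_score = dict(zip(roles, perm)), s
    return best

# Toy numbers standing in for real corpus statistics:
stats = {("boy", "agent"): 0.9, ("bone", "object"): 0.8,
         ("boy", "object"): 0.1, ("bone", "agent"): 0.05}
score = lambda w, r: stats.get((w, r), 0.01)
print(best_assignment(["boy", "bone"], ["agent", "object"], score))
# {'agent': 'boy', 'object': 'bone'}
```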
N-Grams and Sem-Grams
If you read about NLP, you will come across n-grams: sets of words in a specific order. Examples of n-grams that are 3 words long would be phrases such as “to the store” or “in the box”. One problem is that n-grams imply correct order. Another problem is that they are a fixed length. Sem-grams, on the other hand, are bags of words at the sentence level. The theory behind sem-grams assumes that sentences are cohesive thoughts and that most words within a sentence are related. So, a sem-gram is a unique set of words that can co-occur in a sentence. Unlike n-grams, sem-grams are not dependent on order and are not a fixed length. So why does the system use sem-grams? “It allows us to create statistics and glean relationships from large corpora like the NY Times that we can then use on sentences we’ve never seen before”, Karl explains. This means that the system does not have to brute force, or iterate through all possibilities of ever larger n-grams, to predict what the user is trying to say; and by having these sem-grams and statistics about their common usage available, the system can make smarter decisions when it tries to infer what the user means.
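The contrast between n-grams and sem-grams can be made concrete with a short sketch. This is my own reading of the idea, with an invented stopword list:

```python
def ngrams(tokens, n):
    """Order-preserving, fixed-length word sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sem_gram(tokens, stopwords=frozenset({"the", "a", "to", "in"})):
    """An order-free set of content words that co-occur in a sentence."""
    return frozenset(t for t in tokens if t not in stopwords)

tokens = "the dog chases the car".split()
print(ngrams(tokens, 3))
# [('the', 'dog', 'chases'), ('dog', 'chases', 'the'), ('chases', 'the', 'car')]
print(sem_gram(tokens))  # an order-free set: {'dog', 'chases', 'car'}
```

Counting how often each sem-gram appears across a large corpus would then yield the kind of co-occurrence statistics the system can apply to sentences it has never seen.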
How You Can Help
Karl and his fellow researchers are currently conducting a survey that will help them better understand the phrases and utterances needed by AAC users and their families, caretakers, and colleagues. Please spend a few moments and fill out their AAC survey.
This post covers just a shadow of the work Karl and his colleagues do. In part 2, we’ll explore his exciting work with brain-computer interfaces!
Nothing is as educational as a well-conducted demonstration, so please attend Karl’s talk at CSUN if you’re in the area, or look him up if you’re ever in Boston. I find the combination of natural language processing and alternative input modalities to be an extremely important area because of the massive potential to improve the quality of people’s lives. This technology is not only fascinating, makes for great papers, and is solid work; it really can make the difference between a person being able to communicate with the outside world or not. What do you think? I’d love to hear from you on AAC, natural language processing, or anything else germane to this post. Reach out on social media and in the comments below. I’ll see you in part 2!