uCat’s early prototype
The open-sourced uCat Client App is a virtual place where Users can naturally express themselves again. Built using the Unity game engine, it is designed to guide the User through a set of progressively more complex motor tasks that continuously:
Hone the User’s skills with their Motor BCI,
(Re)calibrate the Motor BCI decoder itself.
Key Decisions
The uCat Client App (v0) is designed specifically for speech decoding studies, which convert the User’s attempted utterances into text. We chose to start with speech over other DMIs for the following reasons:
The rapid progress of Motor BCIs decoding attempted speech into text since Moses et al. (2021) may arguably bring them to market ahead of other high-dimensional Motor BCIs (described in Speech and Facial Movements),
The simplicity (therefore compatibility) of the speech DMI (text or synthesized audio) over the uncertainty and complexity of other movement variables (described in Simulating neuromuscular control),
Prior to integrating it with a speech Motor BCI, the system can easily demonstrate efficacy (and be rapidly tested) by substituting the speech DMIs obtained from a BCI with spoken voice and text obtained from a speech-to-text API provided by a third-party Natural Language Processing (“NLP”) solution,
Latency in VR can cause motion sickness. Unlike movement, attempted speech can tolerate latency on the order of seconds without causing adverse health effects.
During our development process, what began as an AI ‘supervisor’ soon expanded into a personable AI guide for the User’s training journey. Once the user experience was mapped to the typical speech Motor BCI participant training tasks, the focus shifted to exploring the optimal scene and setting for the initial activities.
A robot was the ideal choice for the AI guide, mainly from a development perspective. With fewer articulation points and rigid surfaces than an organic being, robots yield faster iterations in the character design process, being easier to rig, texture, and animate. We prioritized uncomplicated art deliverables like this to lighten the rendering load on the game engine. However, conventional robot designs typically lack warmth.
Thus, the uCat character was conceptualized as rounded, cat-like, and charming (Pic. 1). The name is born of a very early design: an acronym standing for “Universal Conversation and Translation.”
Picture 1; uCat, 2024: The uCat Avatar — eponymously named “uCat”, accompanies the User, serving as a loyal companion and mentor. In a study setting, uCat automates much of the otherwise specialized supervision during participant training tasks.
Our opinion poll unanimously favored the ‘cat-like robot’ design, but the landscape concept required collaboration with our target audience. Partnering with previous Motor BCI users, we settled on a familiar, natural, outdoor setting for the initial Speech Training modules (Pic. 2). In the ideation stage, the outdoor landscape was most popular with Motor BCI Users and VR enthusiasts who identified as disabled. Their generously offered advice and opinions meant that our application was not made in a vacuum, detached from the end User’s perspective.
Picture 2; uCat, 2024: uCat’s Virtual Environment — The palette of complementary pastels (a nod to cyber-futurist aesthetic movements) permeates the landscape: peaceful meadows within a valley that entice but do not clutter the senses. uCat’s voice, synthesized with the text-to-speech AI voiceover tool ElevenLabs, presents ambiguously, from accent to gender, to maintain neutrality. The ambiance, a slow synth melody intertwining with birdsong, serves as a backdrop, not a soundtrack, to the main task: the need for a distraction-free, relaxing learning environment was the driving force behind the subtle creative direction.
User Experience
After the User is fitted with the HMD running the uCat Client App (v0), uCat first explains who she is and where the User is, then proceeds to instruct them about each of the activities available to them.
(To play, see footnote link) Video 23; uCat, 2024: Introduction Demo
The activities offered by our uCat Client App (v0) are modeled after participant task protocols from Moses et al. (2021) and Willett et al. (2023), but also include additional activities designed for social reintegration. They are modular, customizable, and by default ordered by complexity into the uCat Client App (v0) onboarding experience.
Repeat single words
Shortly after explaining the protocol, uCat starts displaying words to the User. The User is instructed to repeat them after a delay and a ‘go’ cue. If uCat is allowed to share feedback (i.e., the decoder is in a closed-loop mode), she will tell the User whether they repeated the word correctly. If not, uCat instructs the User to try again, but the User may choose to skip a particular prompt by staying silent.
(To play, see footnote link) Video 24; uCat, 2024: Speak Single Words Demo
The word set consists of a vocabulary of common words used by people with disabilities and those used in Moses et al. (2021). Some keywords were included to help the User navigate the uCat Client App (v0) later on. The word set is fully customizable.
Diagram 2; uCat, 2024: State management of 3 uCat Client App (v0) activities — The Intro and Levels 1/2 employ a countdown, giving the User a 3-second warning that they are about to repeat a phrase. The listening state activates immediately after the countdown ends, in order to record their utterance. The menu can also be summoned at any time.
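For illustration, the loop in Diagram 2 can be expressed as a small state machine. The Python sketch below is our illustrative analogue of the Unity-side logic; the state and event names are assumptions, not the shipped implementation:

from enum import Enum, auto

class ActivityState(Enum):
    IDLE = auto()       # waiting for the next prompt
    COUNTDOWN = auto()  # 3-second warning before the 'go' cue
    LISTENING = auto()  # recording the User's attempted utterance
    MENU = auto()       # summoned at any time with "Hey uCat"

def next_state(state: ActivityState, event: str) -> ActivityState:
    """Hypothetical transition table for the Level 1/2 game loop."""
    if event == "summon_menu":          # "Hey uCat" overrides everything
        return ActivityState.MENU
    if state == ActivityState.IDLE and event == "prompt_shown":
        return ActivityState.COUNTDOWN
    if state == ActivityState.COUNTDOWN and event == "countdown_elapsed":
        return ActivityState.LISTENING  # listening starts immediately
    if state == ActivityState.LISTENING and event == "utterance_recorded":
        return ActivityState.IDLE
    return state

# Example: one full word-repetition cycle.
s = ActivityState.IDLE
for ev in ["prompt_shown", "countdown_elapsed", "utterance_recorded"]:
    s = next_state(s, ev)
print(s)  # ActivityState.IDLE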
Repeat phrases
As with word repetition, in this activity the User practices repeating full sentences. uCat only offers positive feedback if the full phrase is correct.
After these two initial activities, and depending on their score, the closed-loop User should start to feel confident in using their Motor BCI to express themselves once again.
(To play, see footnote link) Video 25; uCat, 2024: Speak Phrases Demo
Summon the menu and select options
At any point in the game, if the User utters “Hey uCat,” uCat swiftly turns her attention to the User and starts listening.
The summoning command also spawns a menu that allows the User to select different activities, teleport around the world or alert their carer (which currently only powers off the application).
(To play, see footnote link) Video 26; uCat, 2024: Menu Demo
The User’s thoughts are their own. Outside of the supervised activities, the uCat Client App (v0) only starts listening to their attempted speech after uCat is summoned. The states in which the system should be listening, and for what, are explicitly defined and should be shared with the Motor BCI.
Freely answer questions asked by uCat
During this closed-loop activity, uCat asks the User various questions that they can answer as they wish — a true test of their ability to express themselves.
The questions are designed to elicit responses about their well-being, daily inspirations, and any notes they would like to take.
When the User stops talking, uCat displays their answer and asks the User to confirm its accuracy. Should the User indicate that the answer is not what they intended to say, uCat scraps it and asks them the question again.
(To play, see footnote link) Video 27; uCat, 2024: Freestyle Demo
Diagram 3; uCat, 2024: State management of the ‘Freestyle’ uCat Client App (v0) activity — In Level 3, the User answers open-ended questions, such as ‘How is your day going?’. Instead of employing a countdown, the game loop revolves around the User speaking a longer, open-ended response and the system asking them to confirm whether it was perceived correctly. This continues until there are no more open questions.
This final activity simulates a typical conversation with another conscious being. uCat, in this case, responds with an AI-generated voice powered by a Large Language Model (“LLM”). For a more natural interaction, the text of the User’s decoded speech is synthesized into audio, spoken by a personalized computer-generated voice.
When desired, the User may switch to any other activity by summoning the menu with “Hey uCat.”
(To play, see footnote link) Video 28; uCat, 2024: Conversation Demo
Diagram 4; uCat, 2024: State management of the ‘Conversation’ uCat Client App (v0) activity — In Conversation Mode, the User can freely converse with an AI Agent driven by an LLM. The listening states alternate between the User’s turn to speak and the LLM’s (uCat’s) turn to speak.
Importantly, the activities are loaded automatically in a sequence defined by a protocol that can synchronize the uCat Client App (v0) with the Motor BCI.
Creating custom protocols and task variations for specific Motor BCI needs is easy. For instance, the first two activities, repeating words and phrases, can be used to calibrate and test the decoder; the third, freely answering questions, can also be used to test it.
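As an illustration, such a protocol could be declared as simple ordered data. The Python sketch below uses hypothetical activity identifiers and field names, not a published uCat schema:

# A hypothetical protocol definition: ordered activities with their role
# in decoder calibration. Identifiers and field names are illustrative only.
protocol = [
    {"activity": "repeat_words",   "mode": "open_loop",   "purpose": "calibrate"},
    {"activity": "repeat_phrases", "mode": "closed_loop", "purpose": "test"},
    {"activity": "freestyle_qa",   "mode": "closed_loop", "purpose": "test"},
]

for step in protocol:
    print(f"Loading '{step['activity']}' ({step['mode']}) for {step['purpose']}")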
However, we also implemented functionality to help Users exercise their agency. They may control which activities they enter or issue specific commands requiring the Motor BCI to switch to another decoding mode. This eventual bi-directional integration between the systems is further discussed below and illustrated in Diagram 5.
Diagram 5; uCat, 2024: High-level Sequence Diagram — Representing the actions and data flow between the uCat Client App and the partner Motor BCI system. The diagram and its events model a synchronized state between the two systems in the execution of two exemplar calibration protocols: (top left) word repetition speech-decoding task and (bottom left) center-out arm kinematics decoding task. (bottom center) The user feedback during each protocol is represented in the two sub-figures. Some core components of the solution, such as speech synthesis, multi-user session management, and internal simulations to realistically map DMIs onto the Avatar, are not depicted in the diagram for simplicity.
By keeping the user engaged with progressively more challenging motor and social tasks, we aim to create a sandbox for users to practice expressing themselves before they meet their loved ones or interact with the ever-expanding set of VR applications.
Deployment
While Motor BCIs may use various underlying components, they all aim to extract and decode neural dynamics in a safe, lasting, and performant manner (Diagram 6).
Diagram 6; uCat, 2024: Illustrative Speech BCI pipeline — Adapted from Simeral et al. (2021), a wireless iBCI system for dual-array recording, and extended with speech-decoding components from Willett et al. (2023).
It remains to be determined whether the infrastructure supporting the first-generation Motor BCI will be hosted on-premise (i.e., at the User’s home), remotely, or in a hybrid fashion.
For instance, the state-of-the-art speech decoder used the 6.7B-parameter Open Pre-trained Transformer language model from Meta, which required an NVIDIA A100 40GB GPU (Willett et al., 2023). This graphics card alone costs approximately $15,000 (Grafikkarten, 2023), but it can also be rented from a remote cluster for a more affordable $1.10/h (Lambda Labs, 2023).
Aside from price, another key decision factor is latency. In VR, users can experience symptoms of motion sickness, which has long hindered mass adoption. While it can have many causes (Chang et al., 2020), the Motion-to-Photon delay is considered among the most influential (Yeongil and Eun-Seok, 2021). When Motor BCIs replace traditional VR inputs, the delay only grows larger, especially if the decoder is hosted remotely.
Across popular HMDs, the on-premise initial motion-to-photon latency is between 21 and 42 ms (Warburton et al., 2023), while a typical VR system may experience end-to-end latencies of over 100 ms. A target motor-intention-to-photon latency for Users should not exceed 50 ms, beyond which the system starts to break immersion (Carmack, 2013). The Motor BCI system adds many layers on top of the typical VR experience (Fig 3), making the latency challenge all the greater.
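A back-of-the-envelope budget shows how quickly these layers add up. The component figures in the Python sketch below are illustrative assumptions, not measurements:

# Hypothetical motor-intention-to-photon budget, in milliseconds.
# Every figure here is an assumption for the sake of arithmetic.
budget_ms = {
    "neural_acquisition_and_decoding": 50,  # decoder window on the BCI side
    "network_round_trip": 40,               # only if the decoder is remote
    "app_logic_and_rendering": 11,
    "display_scanout": 10,
}
total = sum(budget_ms.values())
print(f"estimated total: {total} ms")  # 111 ms, far above the ~50 ms target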
A third immensely influential factor is high availability. The Motor BCI system should function at any time with minimal disruptions. In all cases, the User’s regained expression is at the mercy of network outages, security incidents, overwhelming traffic, hardware faults, or software bugs. The incidence of these risks can be reduced with an on-premise set-up, but at the cost of the response time and observability offered by a remote multi-tenant solution.
Regardless of the Motor BCI’s physical location, the uCat Client App (v0) is an OpenXR-compatible application that can also run on the User’s PC (e.g., if published as a SteamVR app) or directly on a standalone HMD (we primarily tested it on Meta Quest 2). While the app is still in alpha, we recommend downloading the APK from the releases section of our GitHub page and sideloading it to Quest 2.
The most anticipated research use of the uCat Client App (v0) and other components of the uCat System will conform to a PCVR architecture, where the uCat Expression Plugin and the uCat Client App are two separate programs distributed via Steam or similar services. The uCat Server is anticipated only in a commercial clinical solution or when integrating custom decoders, and is currently not under development.
Speech API
To rapidly develop and test the uCat Client App (v0), we simulated the speech DMIs, otherwise obtained from a Motor BCI, with an able-bodied voice. As we spoke, our audio was processed by the uCat Client App (v0) and forwarded to a third-party NLP solution, which returned our speech as text. We used that text to control the environment — just as if it originated from a Motor BCI.
Furthermore, modeling our Speech API after best practices in the NLP sector allows us to integrate more easily with Motor BCIs building production-grade solutions.
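For reference, the simulation round trip is conceptually simple. The Python sketch below shows its shape; the endpoint URL, headers, and response fields are placeholders, not a specific vendor’s API:

import requests

# Hypothetical third-party speech-to-text endpoint; the URL, headers,
# and response shape are placeholders, not a specific vendor's API.
NLP_URL = "https://api.NLP.com/speech"

def transcribe(audio_pcm16: bytes, token: str) -> str:
    """Forward captured audio to the NLP and return the decoded text."""
    resp = requests.post(
        NLP_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "audio/pcm16",
        },
        data=audio_pcm16,
    )
    resp.raise_for_status()
    return resp.json()["text"]  # drives the app as if it came from a Motor BCI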
⚠️In the future, all DMIs will be supplied to the uCat Expression Plugin: the client program running on the User’s PC that forwards them to the uCat Client App and any other compatible VR application.
Data Formats
DMIs as well as uCat Client App events could be described by the same protocol as the underlying neural data for simplicity and reliability.
For real-time data transfer across systems, the Lab Streaming Layer (“LSL”) offers an optimized architecture for time-synchronized streaming of data from diverse sources (Wang et al., 2023). Neural data, DMIs, and even audiovisual VR events could be defined as separate LSL data streams containing timestamps, aligned via the LSL time synchronization protocol. Relevant metadata would be provided in the info header of each stream. TCP would be utilized for high-throughput, low-latency networking across machines.
This approach provides temporally aligned and synchronized data streams that can be analyzed in real-time or recorded for offline analysis. On the Motor BCI side, the high sampling rate of raw neural data benefits from the performance optimization of LSL’s streaming architecture versus file-based formats. The concurrent behavior streams maintain a crucial experimental context alongside the neural data. LSL enables an extensible real-time framework for multimodal biomedical time series data.
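As a minimal sketch of this pattern using the pylsl bindings, the snippet below publishes decoded-text DMIs as an irregular-rate string stream; the stream name, type, and metadata fields are our illustrative assumptions:

from pylsl import IRREGULAR_RATE, StreamInfo, StreamOutlet, local_clock

# Illustrative DMI stream: decoded text pushed at an irregular rate.
info = StreamInfo(
    name="uCat_DMI_Text",
    type="DMI",
    channel_count=1,
    nominal_srate=IRREGULAR_RATE,
    channel_format="string",
    source_id="ucat-client-v0",
)
info.desc().append_child_value("manufacturer", "uCat")

outlet = StreamOutlet(info)
# Each decoded utterance carries an LSL timestamp so it can be aligned
# with the neural data stream on the Motor BCI side.
outlet.push_sample(["hello world"], local_clock())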
As neural data formats remain fragmented, many Motor BCI researchers have chosen another format, such as Neurodata Without Borders, Brain Imaging Data Structure, or a custom one. The uCat System will happily conform to the preferences of prospective Motor BCI partners.
Decoded Speech as Audio Waveform
The uCat Client App (v0) includes functionality to render audio from a third-party source, such as the Motor BCI.
Currently, the uCat Client App (v0) issues a POST request to the NLP containing the text to be synthesized (including an OAuth2 bearer token and other metadata), and expects chunks of raw Pulse Code Modulation encoded audio streamed over HTTP/2. Below is an example request, where q is the query containing the text to be synthesized:
$ curl -X POST 'https://api.NLP.com/synthesis' \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/pcm16" \
  -d '{"q": "testing text to speech"}' \
  -o speech.raw
When integrating with a Motor BCI continuously streaming decoded audio over the internet (not synthesizing text as is done now), a persistent connection over WebSockets would reduce latency and improve compression. Additionally, the inherent bi-directionality of WebSockets may be utilized for streaming the uCat Client App events to our partner’s Motor BCI for real-time analysis.
The uCat Client App would initiate a WebSocket connection to the Motor BCI via the WebSocket Open Handshake, and establish a persistent bi-directional channel between the client and API for exchanging audio streaming messages.
Configuration
Once the WebSocket is connected, the uCat Client App would send a configuration message as the first step:
{ "action": "configure", "codec": "opus", "bitrate": 48000, "sample_rate": 48000,}
This JSON message configures the audio codec and encoding parameters to be used for this streaming session.
Transfer bitrate and sample rate should typically be identical, unless otherwise specified.
The Opus audio codec library provides built-in mechanisms for chunking and packetizing the encoded audio streams. Other (perhaps lossless) audio coding formats should also be supported, per the Motor BCI’s need.
The Motor BCI validates the configuration and responds with a “ready” message once the encoder is initialized:
{ "status": "ready"}
Streaming Audio
With the API configured, the Motor BCI would begin streaming audio by sending encoded Opus data in chunks. The audio stream shouldn’t exceed 10s of audio time. Each chunk has a JSON header with example metadata:
{ "id": "123", "timestamp": "2024-01-01T12:00:00Z", "duration": 5000, "chunk_size": 20000}
Each header is followed by the binary Opus audio data, with a maximum size of the configured chunk_size in bytes.
The uCat Client App would process each audio chunk, extracting the metadata and decoding the Opus data. Errors are reported back to the Motor BCI via WebSocket messages.
Stopping
An empty message will be sent as the final chunk.
This would notify the uCat Client App to finalize decoding and terminate the WebSocket connection.
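Continuing the sketch above, a receive loop on the uCat Client App side could consume alternating header and binary frames until the empty final chunk arrives; the play_opus stand-in is hypothetical:

import json

def play_opus(header: dict, data: bytes) -> None:
    # Stand-in for Opus decoding and audio rendering inside the app.
    print(f"chunk {header['id']}: {len(data)} bytes, {header['duration']} ms")

async def receive_audio(ws):
    """Consume header/binary frame pairs until the empty final chunk."""
    while True:
        frame = await ws.recv()
        if not frame:                  # empty message marks the final chunk
            break
        header = json.loads(frame)     # JSON metadata header
        opus_data = await ws.recv()    # binary payload, <= chunk_size bytes
        play_opus(header, opus_data)
        # Any decoding errors would be reported back over the same WebSocket.
    await ws.close()                   # finalize decoding, terminate session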
Decoded Speech as Text
Currently, the uCat Client App (v0) receives HTTP/1.1 chunks of text from the NLP, based on the content of the submitted speech audio. Again, extending the WebSocket example, these DMIs could be transferred using the custom subprotocol introduced earlier.
The uCat Client App (v0) accepts partially or fully decoded transcriptions with optional speech metadata; a flag in the final chunk informs the uCat Client App (v0) that the User has stopped speaking. The decoded transcription can be text that includes Speech Synthesis Markup Language (“SSML”) properties.
SSML, endorsed by Google, Meta, Amazon, and others, is a markup language providing creators of synthesizable content a standard way to describe aspects of speech. These include: pronunciation, volume, pitch, rate, and more across different synthesis-capable platforms.
This way, the uCat Client App could render the vocal properties of the User’s attempted speech if the Motor BCI can decode them.
The Motor BCI may also include a transcription confidence value that estimates the accuracy of the decoding, and evolves with each partial transcription as a product of all prior probabilities in the full transcription. The uCat Client App could use these values to display more insights about the User’s progress.
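Under one simple reading of that model, the reported confidence is the product of the per-token probabilities, as the toy calculation below illustrates (the probabilities are made up):

import math

# Toy example: overall transcription confidence as the product of
# per-token probabilities. The values are made up for illustration.
token_probs = {"What": 0.92, "is": 0.95, "green": 0.82}
confidence = math.prod(token_probs.values())
print(f"{confidence:.4f}")  # 0.7167, reported with the final transcription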
Schema Example
In the example below, the utterance “What is green” is one that the uCat Client App (v0) can accept.
The term “green” is described with SSML to indicate a change of pitch and volume, indicating that the utterance is a question. SSML syntax is optional, but it can be used to enrich the transcription with decoded non-textual cues.
{ "speech": { "confidence": 0.8362, "tokens": [ { "end": 1320, "start": 320, "token": "What" } ] }, "text": "What"}
{ "speech": { "confidence": 0.8991, "tokens": [ { "end": 1320, "start": 320, "token": "What" }, { "end": 1560, "start": 1320, "token": "Is" } ] }, "text": "What is"}
{ "speech": { "confidence": 0.7149, "tokens": [ { "end": 1320, "start": 320, "token": "What" }, { "end": 1560, "start": 1320, "token": "Is" }, { "end": 1680, "start": 1560, "token": "<speak> <prosody pitch=\\"high\\" volume=\\"+2dB\\"> green </prosody> </speak>" } ] }, "text": "<speak>What is <prosody pitch=\\"high\\" volume=\\"+2dB\\">green</prosody></speak>"}
{ "is_final": true, "speech": { "confidence": 0.7149, "tokens": [ { "end": 1320, "start": 320, "token": "What" }, { "end": 1560, "start": 1320, "token": "Is" }, { "end": 1680, "start": 1560, "token": "<speak> <prosody pitch=\\"high\\" volume=\\"+2dB\\"> green </prosody> </speak>" } ] }, "text": "<speak>What is <prosody pitch=\\"high\\" volume=\\"+2dB\\">green</prosody></speak>"}
Regulation
How the uCat System fits into the regulatory landscape is an open question. The sections Meeting the User’s Needs with VR and Converging VR and Robotics pointed out several XR and robotic systems that underwent steps to receive FDA approval.
Some may argue the uCat System is most closely aligned with a (myo)electric prosthesis, as defined by the FDA (CFR 21 Sec. 890.3450):
An upper extremity prosthesis including a simultaneously powered elbow and/or shoulder with greater than two simultaneous powered degrees of freedom and controlled by non-implanted electrical components, is a prescription device intended for medical purposes, and is intended to replace a partially or fully amputated or congenitally absent upper extremity. It uses electronic inputs (other than simple, manually controlled electrical components such as switches) to provide greater than two independent and simultaneously powered degrees of freedom and includes a simultaneously powered elbow and/or shoulder. Prosthetic arm components that are intended to be used as a system with other arm components must include all degrees of freedom of the total upper extremity prosthesis system.
Others may say it is comparable with ‘measuring exercise equipment’ (CFR 21 Sec. 890.5360), which includes various immersive gamified products promoting rehabilitation, such as MindMotionPRO, YuGo System, and REAL Immersive System. They are defined as:
Measuring exercise equipment consist of manual devices intended for medical purposes, such as to redevelop muscles or restore motion to joints or for use as an adjunct treatment for obesity. These devices also include instrumentation, such as the pulse rate monitor, that provide information used for physical evaluation and physical planning purposes. Examples include a therapeutic exercise bicycle with measuring instrumentation, a manually propelled treadmill with measuring instrumentation, and a rowing machine with measuring instrumentation.
On the other hand, some may argue the uCat System is an example of software intended to be used alongside a Motor BCI that is either a) not regarded as a medical device (i.e., no regulatory oversight), or b) software for which the FDA intends to exercise enforcement discretion (i.e., no regulatory oversight) because it poses little risk to the User (FDA, 2022a). Under this view, the uCat System is effectively a graphical rendition of the DMIs acquired from the Motor BCI, akin to an external speaker synthesizing one’s voice, or an animation tool visualizing one’s movement.
Relevant examples of a) software not regarded as a medical device include:
Software functions that meet the definition of Non-Device-MDDS: these are software functions that are solely intended to transfer, store, convert formats, and display medical device data or results, without controlling or altering the functions or parameters of any connected medical devices,
Software functions that display patient-specific medical device data,
Software functions that are intended for individuals to log, record, track, evaluate, or make decisions or behavioral suggestions related to developing or maintaining general fitness, health, or wellness.
Relevant examples of b) software for which FDA exercises enforcement discretion include:
Software functions that use video and video games to motivate patients to do their physical therapy exercises at home,
Software functions that are intended to allow a user to initiate a pre-specified nurse call or emergency call using broadband or cellular phone technology,
Software functions that enable a patient or caregiver to create and send an alert or general emergency notification to first responders.
In the future, however, it seems most likely that a version of the uCat System will be treated as a set of “software functions that connect to an existing device type for purposes of controlling its operation, function, or energy source, and therefore are the focus of the FDA’s regulatory oversight,” (FDA, 2022b) with the same rigor as the Motor BCI itself.
This is because, in addition to receiving the DMIs, the uCat System may also share important events with the Motor BCI and other systems.
Auditory and visual stimuli are rendered by the uCat Client App (v0) on the User’s HMD. Unless hard-coded into a routine, the Motor BCI is not aware of what the User is interacting with and the choices they make inside the uCat Client App (v0).
For instance, the User can select different activities to engage with and enter different application states. Activities such as “Phrase Practice” and “Word Practice” prompt the User to repeat certain utterances. Only when the Motor BCI is aware of the selected activity, its ground-truth utterances, and the precise timings of User attempts can it (re)calibrate its decoder in an open-loop setting. Similarly, when the Motor BCI is tuned to decode speech, the uCat Client App must not render a kinematic task, like instructing the User to move their arm. Other context, like biomarkers obtained with eye tracking, residual movement sensors, and scene metadata, may be collected and shared with the Motor BCI to optimize its performance.
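One lightweight way to share that context is as timestamped events mirroring the DMI streams. The sketch below shows a hypothetical activity-selection event; the field names are illustrative, not a published schema:

# Hypothetical context event shared with the Motor BCI when the User
# selects an activity. Field names are illustrative, not a published schema.
activity_event = {
    "event": "activity_selected",
    "activity": "word_practice",
    "ground_truth": "hello",             # utterance the User will attempt
    "go_cue_timestamp": 1709208000.125,  # when the attempt window opens
    "decoder_mode": "speech",            # the BCI must not decode kinematics here
}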
Additionally, the maturing Other Electronic Applications aimed at rehabilitation or sensory restoration may also consume events emitted by the uCat Client App, following the User’s interactions with the virtual environment and responding with appropriate ortho-assistance or stimulation.
The uCat Client App is a de facto Motor BCI calibration program that will benefit from sophisticated, bidirectional integration with the Motor BCI, and hence will likely be highly regulated (and shipped alongside the device to onboard users). The remaining components of the uCat System, which do not interfere with the Motor BCI, might not require regulatory oversight.
Part 15 of a series of unedited excerpts from uCat: Transcend the Limits of Body, Time, and Space by Sam Hosovsky*, Oliver Shetler, Luke Turner, and Cai Kinnaird. First published on Feb 29th, 2024, and licensed under CC BY-NC-SA 4.0.
uCat is a community of entrepreneurs, transhumanists, techno-optimists, and many others who recognize the alignment of the technological frontiers described in this work. Join us!
*Sam was the primary author of this excerpt.