Project "Method and system for recognizing atypical disfluencies in a user's speech", looking for a research cooperation agreement.

Project objectives:

Short summary:
A Spanish telecommunications expert has devised a platform application, hosted in a cloud data center, at the edge, or both, that detects atypical disfluencies in a user's stuttered speech. The speech is captured through the microphone of a fixed or portable electronic device that has processing capability and access to a database holding the user's biometric data. The inventor seeks partners able to use the invention and develop the platform.

Full description:
The Spanish inventor specializes in systems and applications for the telecommunications and computer industries, is affiliated with universities and institutes, and is backed by a wide range of academic resources. The novelty of the solution lies in algorithms that detect, predict and estimate the probability of atypical disfluencies (blocks, repetitions, prolongations, fillers, pitch changes, etc.). The solution is based on speech recognition that picks up periodicities in speech and correlates the initial syllables of words with the facial movements and tone of voice of a specific person. It recognizes atypical disfluencies, such as a tendency to block on specific syllables. The platform also filters out noise and unwanted interruptions. Instead of literally transcribing unknown and repeated words, the speech translator reviews the sentence and removes words to produce a translation that makes sense. This is done in real time by natural language processing components, taking advantage of low-latency, high-bandwidth communication standards such as 5G.

Artificial intelligence and variable sensing (IoT):
Facial recognition interprets movements of the face (eyes, eyebrows, mouth, etc.) from video images of the scene to strengthen the 'vocabulary' of the speaker. The electronic device itself may sense other user variables, such as body temperature, skin humidity and heartbeat. The computing platform, implemented as dedicated or shared user virtual machines, continuously harvests information for data mining.

Speech prediction:
Digital signal processing is applied during the training phase to 'harvest' a person's behaviour, and it continues to run afterwards to improve the speech data profile. Based on the syllables spoken, the message prediction capability offers suggestions to the speaker. The person with the speech disfluency can choose among the suggestions presented on the device.
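
As a rough illustration only, the sketch below shows two of the ideas described above in their simplest possible form: cleaning a literal transcript by dropping fillers and repetitions, and suggesting words from the first sounds spoken. It is not the inventor's algorithm. The filler list, the function names `clean_transcript` and `suggest`, and the word-frequency dictionary are all hypothetical, and plain prefix matching stands in for the syllable-based, per-speaker models the platform would actually train.

```python
import re

# Hypothetical filler list (assumption); a real system would learn these per speaker.
FILLERS = {"um", "uh", "er", "erm", "hmm"}


def clean_transcript(text: str) -> str:
    """Drop fillers, part-word repetitions and immediately repeated words."""
    # Keep hyphen-joined fragments such as "b-b-book" together as one token.
    tokens = re.findall(r"[a-z']+(?:-[a-z']+)*", text.lower())
    cleaned = []
    for token in tokens:
        # "b-b-book" -> keep only the final, complete fragment "book".
        # Limitation: genuine hyphenated words are shortened the same way.
        word = token.split("-")[-1]
        if word in FILLERS:
            continue
        # Skip immediate repetitions such as "I I I".
        # Limitation: legitimate doubles ("had had") are also collapsed.
        if cleaned and cleaned[-1] == word:
            continue
        cleaned.append(word)
    return " ".join(cleaned)


def suggest(prefix: str, vocabulary: dict, limit: int = 3) -> list:
    """Return up to `limit` words starting with `prefix`, most frequent first.

    `vocabulary` maps each word to a usage count for this speaker (assumption).
    """
    matches = [w for w in vocabulary if w.startswith(prefix.lower())]
    matches.sort(key=lambda w: (-vocabulary[w], w))
    return matches[:limit]


if __name__ == "__main__":
    print(clean_transcript("I I I um w-w-want the the b-b-book"))
    print(suggest("wa", {"water": 5, "want": 9, "walk": 2, "book": 7}, limit=2))
```

A rule-based pass like this works on text only; the platform described here would additionally draw on audio, facial and other sensed signals, which this sketch does not attempt.
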
The application also contains a fluency manager module to improve fluency in stuttering, which is a consequence of the user's motor control and emotional state.

The potential customer base is quite large. This opens up opportunities for a wide range of models for generating income or otherwise compensating investments. The business cases, and therefore the benefits, differ depending on the target partner or customer. Health insurance organizations would look for cost savings, individual patients are driven by personal motivation, therapists may use the platform to expand their offering, and hosting providers may want to generate service revenue by providing the apps. Partners such as universities and companies are sought to develop the platform; the development can potentially be co-financed by third parties.

Advantages and innovations:
The platform would provide machine translation and recognition of stuttered speech, delivered in an understandable audible or readable format. Using common, easy-to-use devices, those who stutter will be able to take part in uninterrupted conversations and interact with automated voice assistants (e.g. avatars, chatbots).

Technical specification or expertise sought:
Core components. The solution to be developed will translate the speech and behaviour of the stutterer into understandable text by combining artificial intelligence techniques with natural language processing and natural facial processing. Deep learning, reinforcement learning and machine learning implemented with neural networks will be applied to learn from the patient and to continuously improve translation results. Natural language processing and natural facial processing will recognize and translate the speaker's speech and the movements of the face, mouth, eyes and hands, while also compensating for nervousness, anxiety, etc.
Contact / source: NEXT EEN Widgets (europa.eu)
