Posted: 01 May 2022

“Speech Translation” April 2022 — summary from Astrophysics Data System and Arxiv

Astrophysics Data System - summary generated by Brevi Assistant

Although Transformers have achieved success in several speech processing tasks, such as spoken language understanding and speech translation, achieving online processing while keeping competitive performance is still essential for real-world interaction. In addition, the CTC translation output is also used to refine the search space with the CTC prefix score, achieving joint CTC/attention simultaneous translation for the first time.

In simultaneous speech translation (SimulST), finding the best trade-off between high translation quality and low latency is a challenging task. Experiments on en->{de, es} indicate that, besides facilitating the adoption of well-established offline techniques and architectures without affecting latency, the offline solution achieves similar or better translation quality compared to the same model trained in simultaneous settings, as well as being competitive with the SimulST state of the art.

Code switching (CS) refers to the phenomenon of interchangeably using words and phrases from different languages. We show that our speech translation (ST) architectures, and especially our bidirectional end-to-end architecture, perform well on CS speech, even when no CS training data is used.

Direct speech-to-speech translation (S2ST) models suffer from data scarcity, as little parallel S2ST data exists compared to the data available for conventional cascaded systems that consist of automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis. Our experiments show that self-supervised pre-training consistently improves model performance compared to multitask learning, with a BLEU gain of 4.3-12.0 under various data setups, and it can be further combined with data augmentation techniques that apply MT to create weakly supervised training data.


Arxiv - summary generated by Brevi Assistant

We describe a method to jointly pre-train speech and text in an encoder-decoder modeling framework for speech translation and recognition. Two complementary supervised speech tasks are included to unify the speech and text modeling space.

Transformers have achieved state-of-the-art results across multiple NLP tasks. To show that some attention weights are avoidable, we propose replacing the standard self-attention with a local, efficient one, setting the amount of context used based on the results of the analysis.
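The idea of restricting self-attention to a local context can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `local_attention_weights`, the symmetric `window` parameter, and the use of plain Python lists are all assumptions made for illustration. Each query position only attends to keys within `window` positions of itself, and every weight outside that band is exactly zero.

```python
import math

def local_attention_weights(scores, window):
    """Row-wise softmax over a square score matrix, restricted to a local band.

    scores: n x n list of raw attention scores (hypothetical input).
    window: number of positions visible on each side of the query (assumed symmetric).
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Only keys in [i - window, i + window] are visible to query i.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        row = scores[i][lo:hi]
        # Subtract the row maximum for a numerically stable softmax.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out = [0.0] * n  # positions outside the band keep zero weight
        for j, e in zip(range(lo, hi), exps):
            out[j] = e / total
        weights.append(out)
    return weights

# Uniform scores make the effect of the band easy to see.
scores = [[0.0] * 5 for _ in range(5)]
weights = local_attention_weights(scores, window=1)
```

With uniform scores, each row spreads its weight evenly over the visible positions only, so the cost per query depends on the window size rather than on the full sequence length.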

Recently, we have seen growing interest in the area of speech-to-text translation. Using this developed corpus, we propose Text-to-Speech models, based on the example of the recently proposed FastSpeech 2 model, that integrate source-language information.

Neural transducers have been widely used in automatic speech recognition (ASR). Compared with cascaded ST, which performs ASR followed by text-based machine translation (MT), the proposed Transformer transducer-based ST model significantly reduces inference latency, exploits speech information, and avoids error propagation from ASR to MT.

Simultaneous speech translation (SimulST) is a challenging task that aims to translate streaming speech before the full input is observed.

A SimulST system usually consists of two components: the pre-decision, which aggregates the speech information, and the policy, which decides whether to read or write.
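The read/write decision can be sketched with a simple fixed policy. The summary does not say which policy the paper uses, so the example below substitutes the well-known wait-k rule as a stand-in: read k source segments first, then alternate one write per additional read. It assumes the pre-decision has already grouped the speech into segments; the function name `wait_k_actions` and its parameters are hypothetical.

```python
def wait_k_actions(num_source_segments, num_target_tokens, k):
    """Return the READ/WRITE action sequence of a wait-k policy.

    num_source_segments: segments produced by the pre-decision (assumed given).
    num_target_tokens: number of target tokens to emit (assumed known here
        for illustration; a real system stops on an end-of-sentence token).
    k: how many segments the reader stays ahead of the writer.
    """
    actions = []
    read, written = 0, 0
    while written < num_target_tokens:
        # Stay k segments ahead of the output until the source is exhausted.
        if read < min(written + k, num_source_segments):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions

actions = wait_k_actions(num_source_segments=4, num_target_tokens=3, k=2)
print(actions)
```

A larger k gives the model more source context before each write, at the price of higher latency, which is the quality/latency trade-off described above.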



The Brevi assistant is a novel way to summarize, assemble, and consolidate multiple text documents/contents.


