Posted: 12 Jan 2022 04:00

“Speech Translation” January 2022 — summary from Arxiv

Brevi Assistant

Business performance assistant


The content below is machine-generated by Brevi Technologies’ NLG model, and the source content was collected from open-source databases and integrated APIs.


Arxiv - summary generated by Brevi Assistant


We present CVSS, a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation corpus, by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems. Two versions of the translation speech are provided: (1) CVSS-C, in which all the translation speech is in a single high-quality canonical voice; (2) CVSS-T, in which the translation speech is in voices transferred from the corresponding source speech.

End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular due to its potential for less error propagation, lower latency, and fewer parameters. Given the triplet training corpus ⟨speech, transcription, translation⟩, the conventional high-quality E2E-ST system leverages the ⟨speech, transcription⟩ pair to pre-train the model and then uses the ⟨speech, translation⟩ pair to optimize it further. Experiments on the MuST-C benchmark show that our proposed approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs, while achieving better performance on the automatic speech recognition task.
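
The conventional two-stage setup described above can be illustrated by how one triplet corpus is split into the two kinds of training pairs. This is a minimal sketch, not the paper's code: the `Triplet` class, the `stage_pairs` helper, and the file names and sentences are all hypothetical and serve only to show the data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Triplet:
    """One training example; all field contents here are illustrative."""
    speech: str          # hypothetical path to a source-language audio file
    transcription: str   # source-language text
    translation: str     # target-language text


Pair = Tuple[str, str]


def stage_pairs(corpus: List[Triplet]) -> Tuple[List[Pair], List[Pair]]:
    """Split a triplet corpus into the pairs used by each training stage.

    Stage 1 (pre-training) sees (speech, transcription) pairs.
    Stage 2 (further optimization) sees (speech, translation) pairs.
    """
    asr_pairs = [(t.speech, t.transcription) for t in corpus]
    st_pairs = [(t.speech, t.translation) for t in corpus]
    return asr_pairs, st_pairs


# Made-up examples, assumed only for this sketch.
corpus = [
    Triplet("utt1.wav", "hello world", "hallo Welt"),
    Triplet("utt2.wav", "good morning", "guten Morgen"),
]

asr_pairs, st_pairs = stage_pairs(corpus)
print(asr_pairs[0])  # pair used for pre-training
print(st_pairs[0])   # pair used for the second stage
```

Both stages consume the same audio, so each stage on its own uses only two of the three fields in every triplet.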

We present a textless speech-to-speech translation (S2ST) system that can translate speech from one language into another and can be built without the need for any text data. The key to our approach is a self-supervised unit-based speech normalization technique, which finetunes a pre-trained speech encoder with paired audios from multiple speakers and a single reference speaker to reduce the variations due to accents, while preserving the lexical content. With only 10 minutes of paired data for speech normalization, we obtain an average gain of 3.2 BLEU when training the S2ST model on the VoxPopuli S2ST dataset, compared to a baseline trained on un-normalized speech targets.


This can serve as an example of how to use Brevi Assistant and integrated APIs to analyze text content.





The Brevi assistant is a novel way to summarize, assemble, and consolidate multiple text documents/contents.


© All rights reserved 2022 made by Brevi Technologies