Posted: 10 Jan 2022 05:00

“wav2vec” January 2022 — summary from Astrophysics Data System and Arxiv

Brevi Assistant

Business performance assistant


The content below is machine-generated by Brevi Technologies’ NLG model, and the source content was collected from open-source databases and integrated APIs.


Astrophysics Data System - summary generated by Brevi Assistant


Self-supervised speech representations such as wav2vec 2.0 and HuBERT are driving state-of-the-art progress in Automatic Speech Recognition (ASR). In this work, we explore partial fine-tuning and entire fine-tuning of wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding.

While wav2vec 2.0 was proposed for speech recognition, it can also be used for speech emotion recognition, and its performance can be improved significantly with different fine-tuning strategies. Experiments show that P-TAPT performs better than TAPT, especially under low-resource settings.

Wav2vec 2.0 is an end-to-end framework of self-supervised learning for speech representation that is successful in automatic speech recognition, but most of the work on the topic has been developed for a single language: English. In this paper, we present K-Wav2Vec 2.0, a modified version of wav2vec 2.0 designed for Korean automatic speech recognition by exploring and optimizing various factors of the original wav2vec 2.0.

Self-supervised pre-training has dramatically improved the performance of automatic speech recognition. Experiments on ASR show that, compared to wav2vec 2.0, wav2vec-S requires only a marginal increase in pre-training time but significantly improves ASR performance on in-domain, cross-lingual and cross-domain datasets.

The goal of self-supervised learning for automatic speech recognition is to learn good speech representations from a large amount of unlabeled speech for the downstream ASR task. In addition to the existing contrastive learning task, we switch the quantized representations of the noisy and original speech so that each serves as an additional prediction target for the other.
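The summary does not give the loss formula, so the following is only a minimal sketch of how switched prediction targets could be combined with a standard contrastive loss. It assumes a cosine-similarity, InfoNCE-style loss and shares one set of distractors across all terms for brevity. The names `contrastive_loss`, `switched_loss`, `swap_weight` and `temperature` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def contrastive_loss(context: Vector, target: Vector,
                     distractors: Sequence[Vector],
                     temperature: float = 0.1) -> float:
    """InfoNCE-style loss: negative log-softmax score of the true target."""
    logits = [cosine(context, target) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(v - top) for v in logits))
    return log_sum - logits[0]


def switched_loss(ctx_orig: Vector, ctx_noisy: Vector,
                  q_orig: Vector, q_noisy: Vector,
                  distractors: Sequence[Vector],
                  swap_weight: float = 1.0) -> float:
    """Standard contrastive terms plus the terms with swapped targets.

    Assumption: the swapped terms are added with a scalar weight.
    """
    standard = (contrastive_loss(ctx_orig, q_orig, distractors)
                + contrastive_loss(ctx_noisy, q_noisy, distractors))
    # Each context vector must also predict the other view's quantized target.
    switched = (contrastive_loss(ctx_orig, q_noisy, distractors)
                + contrastive_loss(ctx_noisy, q_orig, distractors))
    return standard + swap_weight * switched
```

In this sketch the swapped terms are small only when the noisy and original views produce similar representations, which is the intuition behind using each view as a target for the other. Setting `swap_weight=0.0` recovers the two standard contrastive terms.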





Arxiv - summary generated by Brevi Assistant


Self-supervised speech representations such as wav2vec 2.0 and HuBERT are driving state-of-the-art progress in Automatic Speech Recognition (ASR). In this work, we explore partial fine-tuning and entire fine-tuning of wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding. With simple downstream frameworks, the best scores reach 79.58% weighted accuracy for Speech Emotion Recognition on IEMOCAP, 2.36% equal error rate for Speaker Verification on VoxCeleb1, 87.51% accuracy for Intent Classification and 75.32% F1 for Slot Filling on SLURP, thus setting a new state of the art for these three benchmarks and showing that fine-tuned wav2vec 2.0 and HuBERT models can better learn prosodic, voice-print and semantic representations.

Deep learning techniques have been shown to be effective in various tasks, especially in the development of speech recognition systems, that is, systems that aim to transcribe an audio sentence into a sequence of written words. In this sense, this work presents the development of a public Automatic Speech Recognition system using only openly available audio data, obtained by fine-tuning the Wav2vec 2.0 XLSR-53 model, pre-trained on many languages, on Brazilian Portuguese (BP) data. The final model achieves an average word error rate of 12.4% over 7 different datasets.

While wav2vec 2.0 was proposed for speech recognition, it can also be used for speech emotion recognition, and its performance can be improved significantly with different fine-tuning strategies. We also present a novel fine-tuning method termed P-TAPT, which modifies the TAPT objective to learn contextualized emotion representations. Experiments show that P-TAPT performs better than TAPT, especially under low-resource settings.
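For readers unfamiliar with two of the metrics quoted above, here is a minimal sketch of how word error rate and equal error rate are commonly computed. It is an illustration only, not the evaluation code of the summarized papers. The function names and the simple threshold sweep used for the equal error rate are assumptions made for this example.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> float:
    """Approximate EER: error rate at the threshold where the false
    acceptance and false rejection rates are closest to each other."""
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap = None
    eer = 0.0
    # Assumption: only observed scores are tried as thresholds.
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

As a usage example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Production toolkits usually normalize text before scoring and interpolate between thresholds for the equal error rate, which this sketch does not do.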


This can serve as an example of how to use Brevi Assistant and integrated APIs to analyze text content.





The Brevi assistant is a novel way to summarize, assemble, and consolidate multiple text documents/contents.


© All rights reserved 2022 made by Brevi Technologies