
Speech recognition

Speech recognition technology converts the lexical content of human speech into computer-readable input, such as key presses, binary codes, or character sequences.


5,516 items related to "speech recognition"

  • Alibaba open-sources FunClip, a video auto-clipping tool with Chinese speech recognition

    Alibaba's Tongyi Lab recently open-sourced FunClip, a video auto-clipping tool designed for accurate, convenient video slicing. FunClip automatically recognizes the Chinese speech in a video and lets users trim the video according to the recognized content, greatly improving editing efficiency. Through open-source projects like this, Alibaba demonstrates both its leadership in AI and its commitment to open innovation.

  • AI speech recognition model Universal-1 transcribes 60 minutes of audio in 38 seconds, outpacing Faster Whisper

    The latest research from AssemblyAI shows how its Universal-1 model performs in multilingual settings, achieving industry-leading accuracy and robustness. Universal-1 is more accurate than Whisper large-v3 and faster than Faster Whisper, transcribing 60 minutes of audio in 38 seconds. Notably, it is not open source and is available only through an API.

  • WhisperKit official site: an online tool for compressing and optimizing automatic speech recognition models

    WhisperKit is a powerful tool designed specifically for compressing and optimizing automatic speech recognition models. It supports model compression and optimization and also provides detailed performance evaluation data. On the official WhisperKit website, you can learn more about the tool's features and applications and try out its model-optimization capabilities.

  • NVIDIA launches Parakeet, a new AI speech recognition model family claimed to outperform Whisper

    NVIDIA NeMo, a leading open-source conversational AI toolkit, announced the Parakeet ASR model family, a series of state-of-the-art automatic speech recognition models that transcribe spoken English with excellent accuracy. Developed in cooperation with Suno.ai, the Parakeet models mark a breakthrough in speech recognition, paving the way for more natural and efficient human-computer interaction. To run the models locally and explore the toolkit, visit the NVIDIA NeMo GitHub page.

  • Tencent Cloud's large ASR speech recognition model goes online

    Tencent Cloud ASR is a speech recognition system from Tencent Cloud. After its latest upgrade, it handles dialects and noise better, with improved recognition accuracy and comprehension. The product now serves 10 billion calls per day and thousands of internal and external enterprise customers.

  • ASRU 2023 | Biaobei Technology appears at the IEEE Workshop on Automatic Speech Recognition and Understanding

    The IEEE ASRU 2023 Workshop on Automatic Speech Recognition and Understanding recently concluded successfully in Taipei. Experts, research teams, and leading technology companies from academia and industry worldwide gathered to discuss development trends and the latest research in the speech industry. As a silver sponsor, Biaobei Technology was invited to present its rich multilingual datasets and comprehensive data solutions. The ASRU workshop is the flagship technical event of the IEEE Speech and Language Processing Technical Committee (SLTC).

  • AI system built from living human brain cells raises speech recognition accuracy to 78%

    A cutting-edge brain-inspired study was recently published in the journal Nature. Researchers built a new AI system from living human brain cells; it can perform unsupervised learning and behaves much like a neural network, and with it, speech recognition accuracy improved markedly. The breakthrough offers important insight for the future development of AI.

  • Hugging Face researchers release Distil-Whisper, a faster speech recognition model with fewer parameters

    Hugging Face researchers recently tackled the problem of deploying large pre-trained speech recognition models in resource-constrained environments. By assembling a large open-source dataset and using pseudo-labeling, they distilled a smaller version of the Whisper model called Distil-Whisper. Although its word error rate (WER) is slightly higher, the distil-medium.en model offers much faster inference and substantial model compression.
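
The WER metric cited above is simply word-level edit distance divided by the number of reference words; a minimal sketch (not the evaluation code the researchers used):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# "sat" -> "sit" (substitution) and one "the" dropped (deletion): 2 errors / 6 words
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

A "slightly higher WER" for distil-medium.en thus means a few more word-level edits per hundred reference words than the full model.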

  • Google's ambition: its universal speech recognition model now supports 100+ languages

    Last November, Google announced its "1,000 Languages Initiative," which aims to build a machine learning model supporting the 1,000 most widely spoken languages in the world, bringing greater inclusiveness to billions of people. Some of these languages are spoken by fewer than 20 million people, so the core challenge is supporting languages with relatively few speakers or limited available data. The base model architecture and training pipeline of USM lay the foundation for extending speech modeling to 1,000 languages in the future.

  • South Korea to deploy self-developed AI speech recognition against telecom fraud at the end of this month

    South Korea's Ministry of the Interior and Safety will begin using self-developed AI speech recognition software at the end of this month to combat telecom fraud. Built on the latest deep learning technology, the software draws on a database of one million voice samples in different languages, including more than 6,000 samples from criminal suspects. Voice data of telecom fraudsters analyzed by the new software will be published on the official website of the Financial Supervisory Service, and the software will be released overseas through international exchange activities in the second half of the year.

  • vivo partners with Kunlunxin and WeNet to improve speech recognition quality and performance and build an open-source ecosystem

    Speech recognition is a fundamental service in AI and a core capability of vivo's AI stack, underpinning the Jovi input method, the Jovi voice assistant, and other applications. Only a speech recognition engine with high accuracy and high performance can deliver a good experience to vivo's hundreds of millions of voice users. Kunlunxin will continue to leverage its leading position in the inference ecosystem to keep improving the voice user experience, and will work closely with the community to build the domestic WeNet ecosystem.

  • Apple joins project to improve speech recognition for users with disabilities

    The University of Illinois Urbana-Champaign (UIUC) is working with Apple and other technology giants on the Speech Accessibility Project, which aims to improve speech recognition systems that currently struggle to understand people with atypical speech patterns and disabilities. Together with Apple, Amazon, Google, Meta, Microsoft, and nonprofit organizations, the project will try to expand the range of speech patterns that recognition systems can understand. Speech recognition can improve quality of life for users with motor-impairing conditions, but anything that affects a user's voice limits its effectiveness. Under the project, researchers will collect samples from individuals representing diverse speech patterns to create a private, de-identified dataset.

  • OpenAI open-sources Whisper, a multilingual speech recognition system

    Technology giants including Google, Amazon, and Meta have placed powerful speech recognition systems at the core of their software and services, yet speech recognition remains a challenging problem in artificial intelligence and machine learning. The good news: today OpenAI announced the open-sourcing of Whisper, an automatic speech recognition system that delivers robust transcription in multiple languages and can translate them into English. OpenAI says what sets Whisper apart is its training on 680,000 hours of multilingual and "multitask" data collected from the web, which improves its ability to recognize distinctive accents, background noise, and technical terms, according to the overview in the official GitHub repository.

  • One-stop automatic optimization: AISpeech's "Thousand Words, Thousand Trainings" system iterates speech recognition models efficiently

    With massive training data, speech recognition systems can now recognize common speech quite accurately. AISpeech combines supervised, semi-supervised, and self-supervised methods into a mixed-supervision optimization scheme that fully exploits the value of data, continuously refining the speech recognition model for better results. Its new "Thousand Words, Thousand Trainings" automatic optimization system is an integrated automation solution built on active learning, combining functional modules such as data screening, automatic annotation, mixed-supervision acoustic and language model training, automated testing, and online release. Going forward, AISpeech will continue optimizing its recognition pipeline, further shortening the update cycle of its general speech recognition model to meet evolving business needs and support more scenarios.
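
Active-learning data screening of the kind described above usually means routing the audio the current model is least sure about to human annotators; a minimal sketch with hypothetical sample records (the field names are illustrative, not AISpeech's actual system):

```python
def select_for_annotation(samples, k):
    """Active learning: return the k samples with the lowest model
    confidence, i.e. the ones most worth sending to human annotators."""
    return sorted(samples, key=lambda s: s["confidence"])[:k]

# Hypothetical recognition results with model confidence scores.
samples = [
    {"utterance_id": "a1", "confidence": 0.92},
    {"utterance_id": "a2", "confidence": 0.31},  # model unsure -> annotate
    {"utterance_id": "a3", "confidence": 0.58},  # model unsure -> annotate
    {"utterance_id": "a4", "confidence": 0.87},
]

for s in select_for_annotation(samples, 2):
    print(s["utterance_id"])
```

Annotating only the least-confident utterances is what lets such a pipeline shorten the model-update cycle: labeling effort goes where the model gains the most.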

  • Lei hou! Siri speech recognition gains new Cantonese support

    HomePod 15.6 adds Siri speech recognition support for Mandarin, Cantonese, and Japanese.

  • Major upgrade! Biaobei speech recognition version 3.0 goes online with stronger recognition capability

    After more than a year of algorithmic breakthroughs, Biaobei Technology's R&D team has comprehensively upgraded front-end signal processing, the acoustic model, the decoding scheme, and more, significantly improving accuracy and recognition speed while adding rapid error correction and real-time hot-word updates, further meeting industry users' needs and improving the recognition experience. To serve customer groups in different languages, Biaobei's speech recognition has continued to expand its language coverage this year.

  • Four new features online! A hard-core upgrade for Biaobei Technology's speech recognition service

    Biaobei Technology's speech recognition supports one-sentence recognition, long-form speech recognition, and audio-file recognition in Chinese, Cantonese, and English. It delivers millisecond-level latency while achieving a recognition rate above 97% for general-domain Mandarin and above 95% for English in quiet environments. Adapting to market demand, the service has added timestamp, speaking-rate, volume, and confidence features. The confidence feature means that when the service converts an audio stream to text, it outputs a confidence score for the current phrase; among all candidate results, the recognition model selects the phrase with the highest confidence as the output.
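
Selecting the highest-confidence phrase among candidate results, as described above, amounts to taking the top hypothesis from an n-best list; a minimal sketch with made-up candidates (the field names are illustrative, not Biaobei's actual API response):

```python
def pick_best(hypotheses):
    """Return the candidate transcript with the highest confidence score."""
    return max(hypotheses, key=lambda h: h["confidence"])

# Hypothetical n-best list from a recognizer for one audio phrase.
candidates = [
    {"text": "speech recognition", "confidence": 0.97},
    {"text": "speech wreck ignition", "confidence": 0.41},
    {"text": "peach recognition", "confidence": 0.22},
]

print(pick_best(candidates)["text"])  # → speech recognition
```

Exposing the score alongside the text also lets downstream applications flag low-confidence phrases for review instead of silently accepting them.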

  • Four freshmen program a smart door lock for their dormitory, with face and voice recognition

    Due to cost constraints, most university dormitories still use old-fashioned padlocks. Four freshmen at Chongqing University of Posts and Telecommunications installed a smart door lock on their dormitory. The lock supports face recognition, voice recognition, and QR-code recognition, letting them enter without a key. The four split the work of programming, technology R&D, and device structure design and fabrication, completing the multi-functional smart lock in less than a month. The "smart lock" adds a camera and sound sensor outside the door and a control unit at the latch inside, so the door lock itself was not modified.

  • Tencent patent published: slide presentations support voice-controlled page turning

    According to the Tianyancha app, on January 11 Tencent Technology (Shenzhen) Co., Ltd. published a patent for a "document control method, device, computer equipment and storage medium," publication number CN113918114A, filed October 15, 2021. The method: while presenting a document, display a follow-the-voice page-turning control in the presentation interface; in response to that control being triggered, enter follow-the-voice page-turning mode; in that mode, turn to the target page following the presenter's speech, where the text on the target page matches the semantics of the spoken content.
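
The patented method matches the presenter's speech to the page whose text is semantically closest; as a toy approximation (plain word overlap rather than the semantic matching the patent actually claims):

```python
def follow_voice_page(pages, spoken_text):
    """Return the index of the page whose text shares the most words with
    what the presenter just said (a crude stand-in for semantic matching)."""
    spoken = set(spoken_text.lower().split())

    def overlap(text):
        return len(spoken & set(text.lower().split()))

    return max(range(len(pages)), key=lambda i: overlap(pages[i]))

slides = [
    "revenue grew strongly in the third quarter",
    "our product roadmap for next year",
    "thank you any questions",
]

print(follow_voice_page(slides, "now let's look at the roadmap for next year"))  # → 1
```

A production system would use embedding similarity rather than word overlap, but the control flow (recognize speech, score each page, jump to the best match) is the same.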

  • Huawei Xiaoyi Input Method Begins Internal Test: Efficient Speech Recognition

    Huawei's own input method is finally here. It is not the bare-bones version bundled with EMUI/HarmonyOS, but a brand-new design named Huawei Xiaoyi Input Method. Huawei has started recruiting for an internal test; the app can only be tried after an application is approved. According to early testers, the interface is very simple, even spartan: features are limited, themes are few, and usability and convenience need work. Overall, however, it runs very smoothly, and its speech recognition is highly efficient, living up to the "Xiaoyi" name. Some time ago, a number of input methods stumbled one after another.

  • Next-generation Kaldi to be applied to many Xiaomi products, changing how speech recognition is implemented

    Daniel Povey, the father of Kaldi, said the goal of next-generation Kaldi is not merely to catch up with or slightly surpass existing speech recognition libraries, but to fundamentally change how speech recognition is implemented. "The ASR in Xiaomi products currently uses first-generation Kaldi, and we are using k2 to speed up decoding of the existing production model; decoding is 300 times faster than real time." He expects next-generation Kaldi to ship in Xiaomi products late this year or early next year, though much integration and testing remains.

  • From voice recognition to AI photography, Snapdragon's AI compute covers every aspect of the phone experience

    In recent years, smartphone features have become nearly comprehensive. Intelligent, engaging AI functions have made phones ever more personable, bringing convenience to life and work and making them indispensable to more and more people. That AI capability comes from the computing power of the built-in chip. Snapdragon AI chips, based on advanced heterogeneous computing, offer powerful AI performance and are currently the most popular mobile AI chips, acting as an invisible shield behind the smartphone's AI experience. At present, Qualcomm's latest generation of high-end flagship AI silicon is the Snapdragon 888 series.

  • Here comes the MAXHUB V5 Technology Edition! A superior experience powered by AISpeech's recognition technology

    MAXHUB, a brand of CVTE, is the leader in the conference-panel industry; since its official launch in 2017 it has focused on intelligent conference panels and held the top market share for three consecutive years (data from AVC). Recently, AISpeech helped MAXHUB complete a feature iteration of the MAXHUB V5 Technology Edition, its first conference panel with voice transcription. The V5 Technology Edition has AISpeech's speech recognition technology built in, with a recognition accuracy of 98% (tested by the China Academy of Information and Communications Technology, report number V21Y000005), enabling real-time subtitles and transcribed meeting minutes.

  • IFLYTEK's invention patent of "speech recognition method and system" won the China Patent Gold Award

    The winners of the 22nd China Patent Gold Award were recently announced, and iFLYTEK's invention patent "speech recognition method and system" won a gold award. The China Patent Award, selected jointly by the State Intellectual Property Office and the World Intellectual Property Organization, is the highest honor in China's intellectual-property field; over the past three years, only 30 patents per year have received the gold award. The award recognizes not only iFLYTEK's intellectual-property work but also its core AI technology, which breaks through the "ceiling" of intelligent voice interaction. Spoken language is the most natural and convenient way of communicating and a treasure of human society.

  • Microsoft Announces Acquisition of Voice Recognition Technology Company Nuance for US $19.7 Billion

    Microsoft announced it would acquire speech recognition giant Nuance at $56 per share, a transaction valued at $19.7 billion. Microsoft CEO Satya Nadella said: "Nuance provides the AI layer for healthcare technology and is a pioneer in the real-world application of enterprise AI. AI is technology's most important priority, and healthcare is its most urgent application. Together with our partner ecosystem, we will deliver advanced AI solutions to professionals around the world to promote better decision-making and more meaningful connections, while accelerating Microsoft Cloud for Healthcare."

  • Microsoft Acquires Siri Speech Recognition Partner Nuance for US $19.7 Billion

    Microsoft acquired speech recognition company Nuance for $19.7 billion in an all-cash deal, taking control of the company that helped Apple handle Siri queries. Microsoft confirmed on Monday that it had reached an agreement with Nuance, after a preliminary report over the weekend said negotiations were under way. The deal price of $56 per share is 23% above Nuance's closing price last Friday. The transaction values Nuance's equity at roughly $16 billion, but by Microsoft's accounting, including Nuance's net debt, the all-cash deal is valued at $19.7 billion. Nuance's current CEO is Mark Benjamin.

  • Insider: Microsoft Negotiates to Acquire Voice Recognition Service Provider Nuance for US $16 Billion

    According to foreign media, after acquiring LinkedIn for $26 billion and ZeniMax for $7.5 billion, Microsoft is making another big purchase: multiple outlets report it is negotiating to acquire voice recognition provider Nuance, with the purchase price approaching $16 billion.

  • Microsoft plans to acquire voice recognition company Nuance for 16 billion dollars

    Microsoft is in deep negotiations to acquire Nuance Communications, an artificial intelligence and speech recognition company. The deal could be signed as soon as Sunday and announced Monday. The plan reflects Microsoft's recent push to expand its business through deals: last year it considered acquiring the U.S. business of short-video app TikTok, and last month it completed the $7.5 billion acquisition of game developer ZeniMax.

  • Siri no longer defaults to a female voice; speech recognition is more accurate after the iOS update

    On April 1, foreign media reported that Apple released the sixth beta of iOS 14.5, making some changes to Siri. Apple said that starting with iOS 14.5, Siri will no longer default to a female voice. In the current beta, only English has new voices; Chinese does not. Apple said in a statement: "We are excited to introduce two new Siri voices for English speakers and to let Siri users select the voice they want when they set up their device. This is a continuation of Apple's long-standing commitment to diversity and inclusion, with products and services designed to better reflect the diversity of the world we live in."

  • Joint team from Tencent Cloud Xiaowei and Tencent Cloud TI Platform wins international accented-English speech recognition challenge

    Interspeech 2020, a top conference in speech research, was recently held. In the conference's accented-English speech recognition challenge, the joint team from Tencent Cloud Xiaowei and Tencent Cloud TI Platform won the accented-English track by a wide margin. Interspeech is one of the top conferences in the field, organized by the International Speech Communication Association (ISCA). The conference noted that standard English ASR systems achieve high recognition accuracy worldwide, but accented English remains a challenging topic and the biggest obstacle to deploying the technology; to address this, the conference set up a dedicated track.
