Daily AI Digest: Microsoft's New Frontier Models & Google's Open-Source Gemma 4
Catch up on the latest AI advancements: Microsoft launches three foundational multimodal AI models, while Google makes its powerful Gemma 4 models fully open-source under the Apache 2.0 license, bringing local AI to a wider range of devices.
The AI landscape is heating up with major players Microsoft and Google making significant moves. Microsoft has unveiled three new foundational models, expanding its multimodal AI capabilities, while Google has championed open-source innovation by releasing its Gemma 4 models under the permissive Apache 2.0 license, democratizing powerful local AI for developers.
TL;DR
- Microsoft AI launched MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 to bolster its multimodal AI stack.
- Google released Gemma 4 open-weight AI models in four sizes, optimized for local usage, now under the Apache 2.0 license.
- Gemma 4 models from Google DeepMind are now fully open-source, enabling powerful local AI even on mobile devices and Raspberry Pi.
- Google's Gemma 4 family, built on Gemini 3 tech, offers four versions with an "unprecedented level of intelligence-per-parameter" and Apache 2.0 license.
- Microsoft AI CEO Mustafa Suleyman is focusing on pursuing "superintelligence" with a business and productivity-oriented strategy, following a major company restructuring.
Microsoft Takes on AI Rivals with Three New Foundational Models
Microsoft AI, the research division of the tech giant, announced on Thursday the release of three new foundational AI models capable of generating text, voice, and images. This move highlights Microsoft's strategic push to develop its own comprehensive suite of multimodal AI models, intensifying its competition with other leading AI laboratories, even as it maintains its partnership with OpenAI.
The new models include MAI-Transcribe-1, designed for speech transcription across 25 languages and reportedly 2.5 times faster than Microsoft's Azure Fast offering. MAI-Voice-1 is an audio-generating model that can produce 60 seconds of audio in just one second and allows for custom voice creation. Lastly, MAI-Image-2, an image-generating model, was initially introduced in March on MAI Playground, a new testing platform for large language models.
Microsoft is expanding its multimodal AI capabilities with new text, voice, and image generation models, signaling a strong independent push in the competitive AI landscape.
Google Announces Gemma 4 Open AI Models, Switches to Apache 2.0 License
Google has unveiled Gemma 4, the latest iteration of its open-weight AI models, marking the first significant update in over a year. Developers can now utilize Gemma 4, which comes in four sizes optimized for local deployment. A notable change is Google's decision to shift from its previous custom Gemma license to the more permissive Apache 2.0 license, addressing developer feedback and offering greater freedom.
The larger Gemma 4 variants, including the 26B Mixture of Experts and 31B Dense models, are designed to run unquantized in bfloat16 format on a single 80GB Nvidia H100 GPU, a high-end AI accelerator. These models can also fit on consumer GPUs when quantized to lower precision. Google emphasizes reduced latency for local processing, with the 26B Mixture of Experts model activating only 3.8 billion of its 26 billion parameters during inference, leading to higher tokens-per-second performance. The 31B Dense model prioritizes quality, with an expectation for developers to fine-tune it for specialized applications.
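As a rough sanity check on those hardware claims, weight memory can be estimated as parameters times bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the byte counts follow from the stated precisions, but it ignores activation and KV-cache overhead, which the article does not quantify.

```python
def model_memory_gb(num_params_billions: float, bits_per_param: int) -> float:
    """Estimate raw weight memory in decimal GB: parameters x bytes per parameter.

    Ignores activation and KV-cache overhead, so real usage is higher.
    """
    bytes_total = num_params_billions * 1e9 * (bits_per_param / 8)
    return bytes_total / 1e9

# Gemma 4 31B Dense, unquantized bfloat16 (16 bits per weight):
print(f"31B @ bf16:  {model_memory_gb(31, 16):.0f} GB")   # ~62 GB, fits an 80GB H100
# The same model quantized to 4-bit precision for a consumer GPU:
print(f"31B @ 4-bit: {model_memory_gb(31, 4):.1f} GB")    # ~15.5 GB
# The 26B Mixture of Experts stores all 26B weights in memory,
# but activates only 3.8B per token, which drives the latency gains:
print(f"26B @ bf16:  {model_memory_gb(26, 16):.0f} GB")   # ~52 GB
```

The arithmetic illustrates why quantization, rather than parameter count alone, determines which hardware a model fits on.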
Google's Gemma 4 models are now available under the Apache 2.0 license, offering enhanced local processing capabilities and greater developer flexibility across various hardware configurations.
Google's Gemma 4 Model Goes Fully Open-Source and Unlocks Powerful Local AI - Even on Phones
Google DeepMind's Gemma 4 models have been released under the Apache 2.0 license, signifying a full commitment to open-source principles. This move differentiates Gemma from Gemini, Google's subscription-based closed product, by allowing users to download and run the AI model locally for free. The shift is particularly beneficial for individuals and enterprises seeking to leverage AI without cloud dependencies, addressing concerns about privacy, offline functionality, and cost-effectiveness.
The ability to run AI models locally is crucial for businesses with stringent data sovereignty and confidentiality requirements, such as healthcare providers who face regulatory limitations on sharing patient data. Gemma 4's design facilitates deployment across a wide range of devices, from servers to smartphones and even Raspberry Pi, giving developers complete control over edge and on-premises AI implementations. This flexibility underscores Google's acknowledgment of the growing demand for accessible and adaptable AI solutions.
The transition of Google's Gemma 4 to a fully open-source Apache 2.0 license empowers developers with unprecedented local control over AI deployments, from enterprise servers to mobile devices, fostering privacy and reducing reliance on cloud services.
Google Releases Gemma 4, a Family of Open Models Built Off of Gemini 3
Google has released its new family of Gemma 4 open-weight models, incorporating some of the advanced technology and research that powered its proprietary Gemini 3 Pro large language models. The release includes four versions of Gemma 4, differentiated by parameter count. For edge devices like smartphones, Google provides 2-billion and 4-billion "Effective" models, while more powerful machines can utilize the 26-billion "Mixture of Experts" and 31-billion "Dense" systems.
Google claims that Gemma 4 achieves an "unprecedented level of intelligence-per-parameter." This assertion is supported by the performance of its 31-billion and 26-billion variants, which secured the third and sixth positions, respectively, on Arena AI's text leaderboard, outperforming models 20 times their size. All Gemma 4 models can process video and images, making them suitable for tasks such as optical character recognition, and the two smaller models can also handle audio inputs and speech understanding. Additionally, Google has trained these models in over 140 languages, and they can generate code offline, enabling development without an internet connection.
Google's Gemma 4 models, now under an Apache 2.0 license, deliver exceptional intelligence for their parameter count and offer multimodal capabilities across various devices, including offline code generation.
Microsoft’s New ‘Superintelligence’ Game Plan Is All About Business
Mustafa Suleyman, the inaugural CEO of Microsoft AI, is now primarily focused on the pursuit of "superintelligence," a shift in responsibilities that followed a large-scale restructuring at Microsoft in mid-March. This strategic pivot, which Suleyman says has been a long-held plan, became officially "unlocked" after Microsoft renegotiated its contract with OpenAI. For Suleyman, superintelligence and AGI are defined strictly through the lens of business and productivity, aiming to deliver tangible product value for millions of enterprises and consumers.
The reorganization at Microsoft saw the consolidation of its enterprise and consumer teams under the Copilot AI banner. While Suleyman retains his role in big-picture strategy, Jacob Andreou has taken on the executive vice president position, leading the engineering, growth, product, and design initiatives of the newly combined teams. This realignment has allowed Suleyman to dedicate his efforts to developing new frontier AI models and achieving his vision of superintelligence, at a time of escalating competition among leading AI companies.
Microsoft AI CEO Mustafa Suleyman is dedicating his focus to achieving "superintelligence" with a clear business-first strategy aimed at delivering significant product value for enterprises and consumers.