5 new protections on Google Messages to help keep you safe

October 22, 2024

Posted by Jan Jedrzejowicz, Director of Product, Android and Business Communications; Alberto Pastor Nieto, Sr. Product Manager Google Messages and RCS Spam and Abuse; Stephan Somogyi, Product Lead, User Protection; Branden Archer, Software Engineer

Every day, over a billion people use Google Messages to communicate. That’s why we’ve made security a top priority, building in powerful on-device, AI-powered filters and advanced security that protects users from 2 billion suspicious messages a month. With end-to-end encrypted¹ RCS conversations, you can communicate privately with other Google Messages RCS users. And we’re not stopping there. We're committed to constantly developing new controls and features to make your conversations on Google Messages even more secure and private.

As part of cybersecurity awareness month, we're sharing five new protections to help keep you safe while using Google Messages on Android:

Enhanced detection protects you from package delivery and job scams. Google Messages is adding new protections against scam texts that may seem harmless at first but can eventually lead to fraud. For Google Messages beta users², we’re rolling out enhanced scam detection, with improved analysis of scammy texts, starting with a focus on package delivery and job seeking messages. When Google Messages suspects a potential scam text, it will automatically move the message into your spam folder or warn you. Google Messages uses on-device machine learning models to classify these scams, so your conversations stay private and the content is never sent to Google unless you report spam. We’re rolling this enhancement out now to Google Messages beta users who have spam protection enabled.
Intelligent warnings alert you about potentially dangerous links. In the past year, we’ve been piloting more protections for Google Messages users when they receive text messages with potentially dangerous links. In India, Thailand, Malaysia and Singapore, Google Messages warns users when they get a link from unknown senders and blocks messages with links from suspicious senders. We’re in the process of expanding this feature globally later this year.
Controls to turn off messages from unknown international senders. In some cases, scam text messages come from international numbers. Soon, you will be able to automatically hide messages from international senders who are not existing contacts so you don’t have to interact with them. If enabled, messages from international non-contacts will automatically be moved to the “Spam & blocked” folder. This feature will roll out first as a pilot in Singapore later this year before we look at expanding to more countries.
Sensitive Content Warnings give you control over seeing and sending images that may contain nudity. At Google, we aim to provide users with a variety of ways to protect themselves against unwanted content, while keeping them in control of their data. This is why we’re introducing Sensitive Content Warnings for Google Messages.

Sensitive Content Warnings is an optional feature that blurs images that may contain nudity before viewing, and then prompts with a “speed bump” that contains help-finding resources and options, including to view the content. When the feature is enabled, and an image that may contain nudity is about to be sent or forwarded, it also provides a speed bump to remind users of the risks of sending nude imagery and preventing accidental shares.

All of this happens on-device to protect your privacy and keep end-to-end encrypted message content private to only sender and recipient. Sensitive Content Warnings doesn’t allow Google access to the contents of your images, nor does Google know that nudity may have been detected. This feature is opt-in for adults, managed via Android Settings, and is opt-out for users under 18 years of age. Sensitive Content Warnings will be rolling out to Android 9+ devices including Android Go devices³ with Google Messages in the coming months.
More confirmation about who you’re messaging. To help you avoid sophisticated messaging threats where an attacker tries to impersonate one of your contacts, we’re working to add a contact verifying feature to Android. This new feature will allow you to verify your contacts' public keys so you can confirm you’re communicating with the person you intend to message. We’re creating a unified system for public key verification across different apps, which you can verify through QR code scanning or number comparison. This feature will be launching next year for Android 9+ devices, with support for messaging apps including Google Messages.
These are just some of the new and upcoming features that you can use to better protect yourself when sending and receiving messages. Download Google Messages from the Google Play Store to enjoy these protections and controls and learn more about Google Messages here.

Notes
1. End-to-end encryption is currently available between Google Messages users. Availability of RCS varies by region and carrier. ↩
2. Availability of features may vary by market and device. Sign up for beta testing and a data plan may be required. ↩
3. Requires 2 GB of RAM. ↩

Safer with Google: Advancing Memory Safety

October 15, 2024

Posted by Alex Rebert, Security Foundations, and Chandler Carruth, Jen Engel, Andy Qin, Core Developers

Error-prone interactions between software and memory¹ are widely understood to create safety issues in software. It is estimated that about 70% of severe vulnerabilities² in memory-unsafe codebases are due to memory safety bugs. Malicious actors exploit these vulnerabilities and continue to create real-world harm. In 2023, Google’s threat intelligence teams conducted an industry-wide study and observed a close to all-time high number of vulnerabilities exploited in the wild. Our internal analysis estimates that 75% of CVEs used in zero-day exploits are memory safety vulnerabilities.

At Google, we have been mindful of these issues for over two decades, and are on a journey to continue advancing the state of memory safety in the software we consume and produce. Our Secure by Design commitment emphasizes integrating security considerations, including robust memory safety practices, throughout the entire software development lifecycle. This proactive approach fosters a safer and more trustworthy digital environment for everyone.

This post builds upon our previously reported Perspective on Memory Safety, and introduces our strategic approach to memory safety.

Our journey so far

Google's journey with memory safety is deeply intertwined with the evolution of the software industry itself. In our early days, we recognized the importance of balancing performance with safety. This led to the early adoption of memory-safe languages like Java and Python, and the creation of Go. Today these languages comprise a large portion of our code, providing memory safety among other benefits. Meanwhile, the rest of our code is predominantly written in C++, previously the optimal choice for high-performance demands.

We recognized the inherent risks associated with memory-unsafe languages and developed tools like sanitizers, which detect memory safety bugs dynamically, and fuzzers like AFL and libfuzzer, which proactively test the robustness and security of a software application by repeatedly feeding unexpected inputs. By open-sourcing these tools, we've empowered developers worldwide to reduce the likelihood of memory safety vulnerabilities in C and C++ codebases. Taking this commitment a step further, we provide continuous fuzzing to open-source projects through OSS-Fuzz, which helped get over 8800 vulnerabilities identified and subsequently fixed across 850 projects.

Today, with the emergence of high-performance memory-safe languages like Rust, coupled with a deeper understanding of the limitations of purely detection-based approaches, we are focused primarily on preventing the introduction of security vulnerabilities at scale.

Going forward: Google's two-pronged approach

Google's long-term strategy for tackling memory safety challenges is multifaceted, recognizing the need to address both existing codebases and future development, while maintaining the pace of business.

Our long-term objective is to progressively and consistently integrate memory-safe languages into Google's codebases while phasing out memory-unsafe code in new development. Given the amount of C++ code we use, we anticipate a residual amount of mature and stable memory-unsafe code will remain for the foreseeable future.

Graphic of memory-safe language growth as memory-unsafe code is hardened and gradually decreased over time.

Migration to Memory-Safe Languages (MSLs)

The first pillar of our strategy is centered on further increasing the adoption of memory-safe languages. These languages drastically drive down the risk of memory-related errors through features like garbage collection and borrow checking, embodying the same Safe Coding³ principles that successfully eliminated other vulnerability classes like cross-site scripting (XSS) at scale. Google has already embraced MSLs like Java, Kotlin, Go, and Python for a large portion of our code.

Our next target is to ramp up memory-safe languages with the necessary capabilities to address the needs of even more of our low-level environments where C++ has remained dominant. For example, we are investing to expand Rust usage at Google beyond Android and other mobile use cases and into our server, application, and embedded ecosystems. This will unlock the use of MSLs in low-level code environments where C and C++ have typically been the language of choice. In addition, we are exploring more seamless interoperability with C++ through Carbon, as a means to accelerate even more of our transition to MSLs.

In Android, which runs on billions of devices and is one of our most critical platforms, we've already made strides in adopting MSLs, including Rust, in sections of our network, firmware and graphics stacks. We specifically focused on adopting memory safety in new code instead of rewriting mature and stable memory-unsafe C or C++ codebases. As we've previously discussed, this strategy is driven by vulnerability trends as memory safety vulnerabilities were typically introduced shortly before being discovered.

As a result, the number of memory safety vulnerabilities reported in Android has decreased dramatically and quickly, dropping from more than 220 in 2019 to a projected 36 by the end of this year, demonstrating the effectiveness of this strategic shift. Given that memory-safety vulnerabilities are particularly severe, the reduction in memory safety vulnerabilities is leading to a corresponding drop in vulnerability severity, representing a reduction in security risk.

Risk Reduction for Memory-Unsafe Code

While transitioning to memory-safe languages is the long-term strategy, and one that requires investment now, we recognize the immediate responsibility we have to protect the safety of our billions of users during this process. This means we cannot ignore the reality of a large codebase written in memory-unsafe languages (MULs) like C and C++.

Therefore the second pillar of our strategy focuses on risk reduction & containment of this portion of our codebase. This incorporates:

C++ Hardening: We are retrofitting safety at scale in our memory-unsafe code, based on our experience eliminating web vulnerabilities. While we won't make C and C++ memory safe, we are eliminating sub-classes of vulnerabilities in the code we own, as well as reducing the risks of the remaining vulnerabilities through exploit mitigations.
We have allocated a portion of our computing resources specifically to bounds-checking the C++ standard library across our workloads. While bounds-checking overhead is small for individual applications, deploying it at Google's scale requires significant computing resources. This underscores our deep commitment to enhancing the safety and security of our products and services. Early results are promising, and we'll share more details in a future post.

In Chrome, we have also been rolling out MiraclePtr over the past few years, which effectively mitigated 57% of use-after-free vulnerabilities in privileged processes, and has been linked to a decrease of in-the-wild exploits.
Security Boundaries: We are continuing⁴ to strengthen critical components of our software infrastructure through expanded use of isolation techniques like sandboxing and privilege reduction, limiting the potential impact of vulnerabilities. For example, earlier this year, we shipped the beta release of our V8 heap sandbox and included it in Chrome's Vulnerability Reward Program.
Bug Detection: We are investing in bug detection tooling and innovative research such as Naptime and making ML-guided fuzzing as effortless and wide-spread as testing. While we are increasingly shifting towards memory safety by design, these tools and techniques remain a critical component of proactively identifying and reducing risks, especially against vulnerability classes currently lacking strong preventative controls.
In addition, we are actively working with the semiconductor and research communities on emerging hardware-based approaches to improve memory safety. This includes our work to support and validate the efficacy of Memory Tagging Extension (MTE). Device implementations are starting to roll out, including within Google’s corporate environment. We are also conducting ongoing research into Capability Hardware Enhanced RISC Instructions (CHERI) architecture which can provide finer grained memory protections and safety controls, particularly appealing in security-critical environments like embedded systems.

Looking ahead

We believe it’s important to embrace the opportunity to achieve memory safety at scale, and that it will have a positive impact on the safety of the broader digital ecosystem. This path forward requires continuous investment and innovation to drive safety and velocity, and we remain committed to the broader community to walk this path together.

We will provide future publications on memory safety that will go deeper into specific aspects of our strategy.

Notes
1. Anderson, J. Computer Security Technology Planning Study Vol II. ESD-TR-73-51, Vol. II, Electronic Systems Division, Air Force Systems Command, Hanscom Field, Bedford, MA 01730 (Oct. 1972).
  https://seclab.cs.ucdavis.edu/projects/history/papers/ande72.pdf ↩
2. https://www.memorysafety.org/docs/memory-safety/#how-common-are-memory-safety-vulnerabilities ↩
3. Kern, C. 2024. Developer Ecosystems for Software Safety. Commun. ACM 67, 6 (June 2024), 52–60. https://doi.org/10.1145/3651621 ↩
4. Barth, Adam, et al. “The security architecture of the chromium browser." Technical report. Stanford University, 2008.
  https://seclab.stanford.edu/websec/chromium/chromium-security-architecture.pdf ↩

Bringing new theft protection features to Android users around the world

October 15, 2024

Posted by Jianing Sandra Guo, Product Manager and Nataliya Stanetsky, Staff Program Manager, Android

Janine Roberta Ferreira was driving home from work in São Paulo when she stopped at a traffic light. A man suddenly appeared and broke the window of her unlocked car, grabbing her phone. She struggled with him for a moment before he wrestled the phone away and ran off. The incident left her deeply shaken. Not only was she saddened at the loss of precious data, like pictures of her nephew, but she also felt vulnerable knowing her banking information was on her phone that was just stolen by a thief.

Situations like Janine’s highlighted the need for a comprehensive solution to phone theft that exceeded existing tools on any platform. Phone theft is a widespread concern in many countries – 97 phones are robbed or stolen every hour in Brazil. The GSM Association reports millions of devices stolen every year, and the numbers continue to grow.

With our phones becoming increasingly central to storing sensitive data, like payment information and personal details, losing one can be an unsettling experience. That’s why we developed and thoroughly beta tested, a full suite of features designed to protect you and your data at every stage – before, during, and after device theft.

These advanced theft protection features are now available to users around the world through Android 15 and a Google Play Services update (Android 10+ devices).

AI-powered protection for your device the moment it is stolen

Theft Detection Lock uses powerful AI to proactively protect you at the moment of a theft attempt. By using on-device machine learning, Theft Detection Lock is able to analyze various device signals to detect potential theft attempts. If the algorithm detects a potential theft attempt on your unlocked device, it locks your screen to keep thieves out.

To protect your sensitive data if your phone is stolen, Theft Detection Lock uses device sensors to identify theft attempts. We’re working hard to bring this feature to as many devices as possible. This feature is rolling out gradually to ensure compatibility with various devices, starting today with Android devices that cover 90% of active users worldwide. Check your theft protection settings page periodically to see if your device is currently supported.

In addition to Theft Detection Lock, Offline Device Lock protects you if a thief tries to take your device offline to extract data or avoid a remote wipe via Android’s Find My Device. If an unlocked device goes offline for prolonged periods, this feature locks the screen to ensure your phone can’t be used in the hands of a thief.

If your Android device does become lost or stolen, Remote Lock can quickly help you secure it. Even if you can’t remember your Google account credentials in the moment of theft, you can use any device to visit Android.com/lock and lock your phone with just a verified phone number. Remote Lock secures your device while you regain access through Android’s Find My Device – which lets you secure, locate or remotely wipe your device. As a security best practice, we always recommend backing up your device on a continuous basis, so remotely wiping your device is not an issue.

These features are now available on most Android 10+ devices¹ via a Google Play Services update and must be enabled in settings.

Advanced security to deter theft before it happens

Android 15 introduces new security features to deter theft before it happens by making it harder for thieves to access sensitive settings, apps, or reset your device for resale:

Changes to sensitive settings like Find My Device now require your PIN, password, or biometric authentication.
Multiple failed login attempts, which could be a sign that a thief is trying to guess your password, will lock down your device, preventing unauthorized access.
And enhanced factory reset protection makes it even harder for thieves to reset your device without your Google account credentials, significantly reducing its resale value and protecting your data.

Later this year, we’ll launch Identity Check, an opt-in feature that will add an extra layer of protection by requiring biometric authentication when accessing critical Google account and device settings, like changing your PIN, disabling theft protection, or accessing Passkeys from an untrusted location. This helps prevent unauthorized access even if your device PIN is compromised.

Real-world protection for billions of Android users

By integrating advanced technology like AI and biometric authentication, we're making Android devices less appealing targets for thieves to give you greater peace of mind. These theft protection features are just one example of how Android is working to provide real-world protection for everyone. We’re dedicated to working with our partners around the world to continuously improve Android security and help you and your data stay safe.

You can turn on the new Android theft features by clicking here on a supported Android device. Learn more about our theft protection features by visiting our help center.

Notes

Android Go smartphones, tablets and wearables are not supported ↩

Using Chrome's accessibility APIs to find security bugs

October 10, 2024

Posted by Adrian Taylor, Security Engineer, Chrome

Chrome’s user interface (UI) code is complex, and sometimes has bugs.

Are those bugs security bugs? Specifically, if a user’s clicks and actions result in memory corruption, is that something that an attacker can exploit to harm that user?

Our security severity guidelines say “yes, sometimes.” For example, an attacker could very likely convince a user to click an autofill prompt, but it will be much harder to convince the user to step through a whole flow of different dialogs.

Even if these bugs aren’t the most easily exploitable, it takes a great deal of time for our security shepherds to make these determinations. User interface bugs are often flakey (that is, not reliably reproducible). Also, even if these bugs aren’t necessarily deemed to be exploitable, they may still be annoying crashes which bother the user.

It would be great if we could find these bugs automatically.

If only the whole tree of Chrome UI controls were exposed, somehow, such that we could enumerate and interact with each UI control automatically.

Aha! Chrome exposes all the UI controls to assistive technology. Chrome goes to great lengths to ensure its entire UI is exposed to screen readers, braille devices and other such assistive tech. This tree of controls includes all the toolbars, menus, and the structure of the page itself. This structural definition of the browser user interface is already sometimes used in other contexts, for example by some password managers, demonstrating that investing in accessibility has benefits for all users. We’re now taking that investment and leveraging it to find security bugs, too.

Specifically, we’re now “fuzzing” that accessibility tree - that is, interacting with the different UI controls semi-randomly to see if we can make things crash. This technique has a long pedigree.

Screen reader technology is a bit different on each platform, but on Linux the tree can be explored using Accerciser.

Screenshot of Accerciser showing the tree of UI controls in Chrome

All we have to do is explore the same tree of controls with a fuzzer. How hard can it be?

“We do this not because it is easy, but because we thought it would be easy” - Anon.

Actually we never thought this would be easy, and a few different bits of tech have had to fall into place to make this possible. Specifically,

There are lots of combinations of ways to interact with Chrome. Truly randomly clicking on UI controls probably won’t find bugs - we would like to leverage coverage-guided fuzzing to help the fuzzer select combinations of controls that seem to reach into new code within Chrome.
We need any such bugs to be genuine. We therefore need to fuzz the actual Chrome UI, or something very similar, rather than exercising parts of the code in an unrealistic unit-test-like context. That’s where our InProcessFuzzer framework comes into play - it runs fuzz cases within a Chrome browser_test; essentially a real version of Chrome.
But such browser_tests have a high startup cost. We need to amortize that cost over thousands of test cases by running a batch of them within each browser invocation. Centipede is designed to do that.
But each test case won’t be idempotent. Within a given invocation of the browser, the UI state may be successively modified by each test case. We intend to add concatenation to centipede to resolve this.
Chrome is a noisy environment with lots of timers, which may well confuse coverage-guided fuzzers. Gathering coverage for such a large binary is slow in itself. So, we don’t know if coverage-guided fuzzing will successfully explore the UI paths here.

All of these concerns are common to the other fuzzers which run in the browser_test context, most notably our new IPC fuzzer (blog posts to follow). But the UI fuzzer presented some specific challenges.

Finding UI bugs is only useful if they’re actionable. Ideally, that means:

Our fuzzing infrastructure gives a thorough set of diagnostics.
It can bisect to find when the bug was introduced and when it was fixed.
It can minimize complex test cases into the smallest possible reproducer.
The test case is descriptive and says which UI controls were used, so a human may be able to reproduce it.

These requirements together mean that the test cases should be stable across each Chrome version - if a given test case reproduces a bug with Chrome 125, hopefully it will do so in Chrome 124 and Chrome 126 (assuming the bug is present in both). Yet this is tricky, since Chrome UI controls are deeply nested and often anonymous.

Initially, the fuzzer picked controls simply based on their ordinal at each level of the tree (for instance “control 3 nested in control 5 nested in control 0”) but such test cases are unlikely to be stable as the Chrome UI evolves. Instead, we settled on an approach where the controls are named, when possible, and otherwise identified by a combination of role and ordinal. This yields test cases like this:

action { path_to_control { named { name: "Test - Chromium" } } path_to_control { anonymous { role: "panel" } } path_to_control { anonymous { role: "panel" } } path_to_control { anonymous { role: "panel" } } path_to_control { named { name: "Bookmarks" } } take_action { action_id: 12 } }

Fuzzers are unlikely to stumble across these control names by chance, even with the instrumentation applied to string comparisons. In fact, this by-name approach turned out to be only 20% as effective as picking controls by ordinal. To resolve this we added a custom mutator which is smart enough to put in place control names and roles which are known to exist. We randomly use this mutator or the standard libprotobuf-mutator in order to get the best of both worlds. This approach has proven to be about 80% as quick as the original ordinal-based mutator, while providing stable test cases.

Chart of code coverage achieved by minutes fuzzing with different strategies

So, does any of this work?

We don’t know yet! - and you can follow along as we find out. The fuzzer found a couple of potential bugs (currently access restricted) in the accessibility code itself but hasn’t yet explored far enough to discover bugs in Chrome’s fundamental UI. But, at the time of writing, this has only been running on our ClusterFuzz infrastructure for a few hours, and isn’t yet working on our coverage dashboard. If you’d like to follow along, keep an eye on our coverage dashboard as it expands to cover UI code.

Pixel's Proactive Approach to Security: Addressing Vulnerabilities in Cellular Modems

October 3, 2024

Posted by Sherk Chung, Stephan Chen, Pixel team, and Roger Piqueras Jover, Ivan Lozano, Android team

Pixel phones have earned a well-deserved reputation for being security-conscious. In this blog, we'll take a peek under the hood to see how Pixel mitigates common exploits on cellular basebands.

Smartphones have become an integral part of our lives, but few of us think about the complex software that powers them, especially the cellular baseband – the processor on the device responsible for handling all cellular communication (such as LTE, 4G, and 5G). Most smartphones use cellular baseband processors with tight performance constraints, making security hardening difficult. Security researchers have increasingly exploited this attack vector and routinely demonstrated the possibility of exploiting basebands used in popular smartphones.

The good news is that Pixel has been deploying security hardening mitigations in our basebands for years, and Pixel 9 represents the most hardened baseband we've shipped yet. Below, we’ll dive into why this is so important, how specifically we’ve improved security, and what this means for our users.

The Cellular Baseband

The cellular baseband within a smartphone is responsible for managing the device's connectivity to cellular networks. This function inherently involves processing external inputs, which may originate from untrusted sources. For instance, malicious actors can employ false base stations to inject fabricated or manipulated network packets. In certain protocols like IMS (IP Multimedia Subsystem), this can be executed remotely from any global location using an IMS client.

The firmware within the cellular baseband, similar to any software, is susceptible to bugs and errors. In the context of the baseband, these software vulnerabilities pose a significant concern due to the heightened exposure of this component within the device's attack surface. There is ample evidence demonstrating the exploitation of software bugs in modem basebands to achieve remote code execution, highlighting the critical risk associated with such vulnerabilities.

The State of Baseband Security

Baseband security has emerged as a prominent area of research, with demonstrations of software bug exploitation featuring in numerous security conferences. Many of these conferences now also incorporate training sessions dedicated to baseband firmware emulation, analysis, and exploitation techniques.

Recent reports by security researchers have noted that most basebands lack exploit mitigations commonly deployed elsewhere and considered best practices in software development. Mature software hardening techniques that are commonplace in the Android operating system, for example, are often absent from cellular firmwares of many popular smartphones.

There are clear indications that exploit vendors and cyber-espionage firms abuse these vulnerabilities to breach the privacy of individuals without their consent. For example, 0-day exploits in the cellular baseband are being used to deploy the Predator malware in smartphones. Additionally, exploit marketplaces explicitly list baseband exploits, often with relatively low payouts, suggesting a potential abundance of such vulnerabilities. These vulnerabilities allow attackers to gain unauthorized access to a device, execute arbitrary code, escalate privileges, or extract sensitive information.

Recognizing these industry trends, Android and Pixel have proactively updated their Vulnerability Rewards Program in recent years, placing a greater emphasis on identifying and addressing exploitable bugs in connectivity firmware.

Building a Fortress: Proactive Defenses in the Pixel Modem

In response to the rising threat of baseband security attacks, Pixel has incrementally incorporated many of the following proactive defenses over the years, with the Pixel 9 phones (Pixel 9, Pixel 9 Pro, Pixel 9 Pro XL and Pixel 9 Pro Fold) showcasing the latest features:

Bounds Sanitizer: Buffer overflows occur when a bug in code allows attackers to cram too much data into a space, causing it to spill over and potentially corrupt other data or execute malicious code. Bounds Sanitizer automatically adds checks around a specific subset of memory accesses to ensure that code does not access memory outside of designated areas, preventing memory corruption.
Integer Overflow Sanitizer: Numbers matter, and when they get too large an “overflow” can cause them to be incorrectly interpreted as smaller values. The reverse can happen as well, a number can overflow in the negative direction as well and be incorrectly interpreted as a larger value. These overflows can be exploited by attackers to cause unexpected behavior. Integer Overflow Sanitizer adds checks around these calculations to eliminate the risk of memory corruption from this class of vulnerabilities.
Stack Canaries: Stack canaries are like tripwires set up to ensure code executes in the expected order. If a hacker tries to exploit a vulnerability in the stack to change the flow of execution without being mindful of the canary, the canary "trips," alerting the system to a potential attack.
Control Flow Integrity (CFI): Similar to stack canaries, CFI makes sure code execution is constrained along a limited number of paths. If an attacker tries to deviate from the allowed set of execution paths, CFI causes the modem to restart rather than take the unallowed execution path.
Auto-Initialize Stack Variables: When memory is designated for use, it’s not normally initialized in C/C+ as it is expected the developer will correctly set up the allocated region. When a developer fails to handle this correctly, the uninitialized values can leak sensitive data or be manipulated by attackers to gain code execution. Pixel phones automatically initialize stack variables to zero, preventing this class of vulnerabilities for stack data.

We also leverage a number of bug detection tools, such as address sanitizer, during our testing process. This helps us identify software bugs and patch them prior to shipping devices to our users.

The Pixel Advantage: Combining Protections for Maximum Security

Security hardening is difficult and our work is never done, but when these security measures are combined, they significantly increase Pixel 9’s resilience to baseband attacks.

Pixel's proactive approach to security demonstrates a commitment to protecting its users across the entire software stack. Hardening the cellular baseband against remote attacks is just one example of how Pixel is constantly working to stay ahead of the curve when it comes to security.

Special thanks to our colleagues who supported our cellular baseband hardening efforts: Dominik Maier, Shawn Yang, Sami Tolvanen, Pirama Arumuga Nainar, Stephen Hines, Kevin Deus, Xuan Xing, Eugene Rodionov, Stephan Somogyi, Wes Johnson, Suraj Harjani, Morgan Shen, Valery Wu, Clint Chen, Cheng-Yi He, Estefany Torres, Hungyen Weng, Jerry Hung, Sherif Hanna

Evaluating Mitigations & Vulnerabilities in Chrome

October 3, 2024

Posted by Alex Gough, Chrome Security Team

The Chrome Security Team is constantly striving to make it safer to browse the web. We invest in mechanisms to make classes of security bugs impossible, mitigations that make it more difficult to exploit a security bug, and sandboxing to reduce the capability exposed by an isolated security issue. When choosing where to invest it is helpful to consider how bad actors find and exploit vulnerabilities. In this post we discuss several axes along which to evaluate the potential harm to users from exploits, and how they apply to the Chrome browser.

Historically the Chrome Security Team has made major investments and driven the web to be safer. We pioneered browser sandboxing, site isolation and the migration to an encrypted web. Today we’re investing in Rust for memory safety, hardening our existing C++ code-base, and improving detection with GWP-asan and lightweight use-after-free (UAF) detection. Considerations of user-harm and attack utility shape our vulnerability severity guidelines and payouts for bugs reported through our Vulnerability Rewards Program. In the longer-term the Chrome Security Team advocates for operating system improvements like less-capable lightweight processes, less-privileged GPU and NPU containers, improved application isolation, and support for hardware-based isolation, memory safety and flow control enforcement.

When contemplating a particular security change it is easy to fall into a trap of security nihilism. It is tempting to reject changes that do not make exploitation impossible but only make it more difficult. However, the scale we are operating at can still make incremental improvements worthwhile. Over time, and over the population that uses Chrome and browsers based on Chromium, these improvements add up and impose real costs on attackers.

Threat Model for Code Execution

Our primary security goal is to make it safe to click on links, so people can feel confident browsing to pages they haven’t visited before. This document focuses on vulnerabilities and exploits that can lead to code execution, but the approach can be applied when mitigating other risks.

Attackers usually have some ultimate goal that can be achieved by executing their code outside of Chrome’s sandboxed or restricted processes. Attackers seek information or capabilities that we do not intend to be available to websites or extensions in the sandboxed renderer process. This might include executing code as the user or with system privileges, reading the memory of other processes, accessing credentials or opening local files. In this post we focus on attackers that start with JavaScript or the ability to send packets to Chrome and end up with something useful. We restrict discussion to memory-safety issues as they are a focus of current hardening efforts.

User Harm ⇔ Attacker Utility

Chrome Security can scalably reduce risks to users by reducing attackers’ freedom of movement. Anything that makes some class of attackers’ ultimate goals more difficult, or (better) impossible, has value. People using Chrome have multiple, diverse adversaries. We should avoid thinking only about a single adversary, or a specific targeted user, the most advanced-persistent attackers or the most sophisticated people using the web. Chrome’s security protects a spectrum of people from a spectrum of attackers and risks. Focussing on a single bug, vector, attacker or user ignores the scale at which both Chrome and its attackers are operating. Reducing risks or increasing costs for even a fraction of threat scenarios helps someone, somewhere, be safer when using the web.

There are still better exploits for attackers and we should recognise and prioritize efforts that meaningfully prevent or fractionally reduce the availability or utility of the best bugs and escalation mechanisms.

Good Bugs and Bad Bugs

All bugs are bad bugs but some bugs are more amenable to exploitation. High value bugs and escalation mechanisms for attackers have some or all of the following attributes:

Reliable

An exploit that sometimes crashes, or that when launched only sometimes allows for exploitation, is less useful than one that can be mechanically triggered in all cases. Crashes might lead to detection by the target or by defenders that collect the crashes. Attackers might not always have more than one chance to launch their attacks. Bugs that only surface when different threads must do things in a certain order require more use of resources or time to trigger. If attackers are willing to risk detection by causing a crash they can retry their attacks as Chrome uses a multi-process architecture for cross-domain iframes. Conversely, bugs that only occur when the main browser process shuts down are more difficult to trigger as attackers get a single attempt per session.

Low-interaction

Chrome exists so that people can visit websites and click on links so we take that as our baseline for minimal interaction. Exploits that only work if a user performs an action, even if that action might be expected, are more risky for an attacker. This is because the code expressing the bug must be resident on a system for longer, the exploit likely has a lower yield as the action won’t always happen, and the bug is less silent as the user might become suspicious if they seem to be performing actions they are not used to performing.

Ubiquitous

A bug that exists on several platforms and can be exploited the same way everywhere will be more useful than one which is only exploitable on one platform or needs to be ported to several platforms. Bugs that manifest on limited hardware types, or in fewer configurations, are only useful if the attacker has targets using them. Every bug an attacker has to integrate into their exploitation flow requires some ongoing maintenance and testing, so the fewer bugs needed the better. For Chrome some bugs only manifest on Linux, while others are present on all of our platforms. Chrome is one of the most ubiquitous software products today, but some of its libraries are even more widely used, so attackers may invest extra effort in finding and exploiting bugs in third party code that Chrome uses. Bugs that require a user to install an extension or rely on particular hardware configurations are less useful than ones reachable from any web page.

Fast

Attacks that require more than a few seconds to set up or execute are less likely to succeed and more likely to be caught. It is more difficult to test and develop a reliable exploit using a slow bug as the compile-test-debug cycle will be stretched.

Scriptable

Bugs that require an exploit to perform grooming or state manipulation to succeed are more valuable if their environment can be scripted. The closer the scripting is to the bug, the easier it is to control the context in which the bug will be triggered. Bugs deep in a codec, or a race in a thread the attacker does not control, are more difficult to script. Scriptable bugs are more easily integrated into an exploitation flow, while bugs that are not scriptable might only be useful if they can be integrated with a related weird machine. Bugs that are adjacent to a scripting engine like JavaScript are easier to trigger - making some bugs in third party libraries more serious in Chrome than they might be in other contexts. Bugs in a tightly coupled API like WebGPU are easy to script. Chrome extensions can manipulate Chrome’s internal state and user-interface (for example, they can open, close and rearrange tabs), making some user-interaction scriptable.

Easy to Test

Attackers need long-term confidence in their exploits, and will want to test them against changing versions of Chrome and the operating system running Chrome. Bugs that can be automatically reproduced in a test environment can be tested easily. Bugs that can only be triggered with user interaction, or after complex network calls, or that require interaction with third-party services are harder to test. They need a complex test environment, or a patched version of Chrome that mimics the environment in a way that triggers the bug. Maintaining this sort of system takes time and resources, making such bugs less attractive. Note that being scriptable relates to the environment of the bug. Scriptable environments lend themselves to easier testing.

Silent

Bugs that cause side effects that can be detected are less useful than those which operate without alerting a user, modifying system state, emitting events, or causing repeatable and detectable network traffic. Side effects include metrics, crashes or slowdowns, pop ups & prompts, system logs and artifacts like downloaded files. Side effects might not alert a specific target of an attack as it happens but might lead to later identification of targeted systems. A bug that several groups know about could be detected without the attacker’s knowledge, even if it seems to succeed.

Long-lived

Attackers will prefer bugs that are not likely to be fixed or found by others. Analyzing and integrating a bug into an exploitation suite likely involves significant up-front work, and attackers will prefer bugs that are likely to last a long time. Many attackers sell exploits as a subscription service, and their economic model might be disrupted if they need to find bugs at a higher rate. Bugs recently introduced into a product, or that might be found with widely known fuzzing techniques, are likely to be found (and possibly fixed) faster.

Targeted

Attackers will try to protect their exploits from discovery and will prefer bugs that can be triggered only when they are confident they will only be exposed to chosen targets. It is relatively easy to fingerprint a web user using cookies, network knowledge and features of the web platform. Removing classes of delivery mechanisms (e.g. no unencrypted HTTP) can make it more difficult to target every exploit.

Easy to escalate

Modern browsers do have several mitigations that make it more difficult to exploit some bugs or bug classes. Attackers usually must take the primitives offered by a bug, then control them to achieve a sub-goal like executing arbitrary system calls. Some bugs won’t chain well to a follow-on stage, or might need significant integration effort or tooling to allow a follow-on stage to proceed. The utility of some bugs is related to how well they couple with later escalation or lateral movement mechanisms. Some bugs by themselves are not useful — but can be combined with other bugs to make them reliable or feasible. Many info leaks fit into this category. A stable read-what-where primitive or a way to probe which memory is allocated makes an arbitrary write easier to execute. If a particular escalation technique crops up often in exploit chains or examples it is worth seeing if it can be remediated.

Easy to find

This may be counter-intuitive but a bug that is easy to find can be useful until Chrome finds and fixes it and potential targets update. Chrome’s source code is publicly available and attackers can look for recent security or stability fixes and exploit them until the fixes are rolled out (N-days). Fuzzing finds the shallow bugs but does not hit those with even simple state requirements that are still amenable to manual discovery. An attacker may choose to specialize in finding bugs in a particular area that does not otherwise receive much security attention. Finally attackers might introduce the bug themselves in a library (a supply-chain attack).

Difficult to find

Some bugs might be easy to find for an attacker because they created the bug, or difficult to find because they are in an under-studied area of the code base, or behind state that is difficult to fuzz. This makes the bug, once found, more valuable as it is likely to be long-lived as other actors will be less likely to find it. Attackers willing to reverse engineer and target closed-source components of Chrome may have access to vulnerabilities that the wider security community are unlikely to discover.

Attacker Goals & Economics

Some attackers have a business model, others have a budget. Coarsely we worry about attackers that want to make money, and attackers that want to spy on people. Bugs and escalation mechanisms are useful to either group if they are well suited to their way of working. We can evaluate mitigations against different attacker's differing economic models. An unsophisticated actor targeting unsophisticated users might use a widely delivered unreliable attack with a low yield (e.g. encouraging people to run a malicious download). They only need to win a small fraction of the time. Other groups may do limited bug discovery but instead take short-lived, already-fixed bugs and integrate them into exploit kits. Some attackers could be modeled as having an infinite budget but they will still choose the cheapest most reliable mechanism to achieve their goals. The deprecation of Flash and the subsequent move to exploiting v8 perhaps best illustrates this.

When deploying mitigations or removing attack-surface we are ultimately trying to hinder adversaries from achieving their goals. Some attackers might make different decisions if the economics of their operations are changed by reducing the yield of the bugs that enable their activities. Some actors may be willing to devote substantial resources to maintaining a capability to target people using the web - and we can only speculate about their response to changes we introduce. For these sophisticated attackers, removing whole classes of vulnerabilities or escalation mechanisms will be more effective.

Avoid linear thinking

We perceive successful exploits as chains — linear steps that start with a bug, proceed through various escalation stages, and achieve an attacker’s immediate goal of code execution or data access outside the sandboxed renderer process. We even ask for such chains through our Vulnerability Rewards Programme. For example, a JS type confusion allows for an out of bounds read/write in the v8 sandbox, a v8 sandbox escape bug allows read/write in the renderer, overwriting a JIT write/execute region allows for arbitrary code execution, and calls to system or browser APIs lead to a browser sandbox escape. The attacker starts with the ability to serve JavaScript to a Chrome user, and ends up with unconstrained code execution on the user’s device, presumably to later use this to meet their higher-level goals. Even useful models of layered defense tend to focus on limited paths that trigger an incident (like the single arrow often drawn piercing slices of swiss-cheese).

In reality the terrain presented to the universe of attackers is a complex web of latent possibilities, some known to some, and many yet to be discovered. This is more than ‘attackers think in graphs’, as we must acknowledge that a defensive intervention can succeed even if it does not prevent every attacker from reaching every possible person they wish to exploit.

Conclusion

It is tempting to reject a mitigation or removal of attack surface on the basis that attackers can simply find another way to achieve their goals. However this mindset presumes the most sophisticated attackers and their most desired targets. Our frame of analysis should be wider. We must recognize that many attackers have limited capability and expertise. Some may graft N-days onto red team tools. Some may have an expert or an exploit pipeline that performs well on a small subset of the Chrome codebase, but need training or more resources to obtain useful bugs if their current domain is taken away. Some will sell exploit kits that need rewriting if an escalation mechanism is removed. Previously reliable exploits might become less reliable, or take longer. Making life more difficult for attackers helps protect people using Chrome.

Although we argue that we should not “give up” on mitigations for escalation paths, it is still clearly more important to implement mitigations that make it impossible or difficult to trigger wide classes of initial vulnerabilities, or bypass a significant fraction of mitigations. Reported attacks always start with an initial vulnerability so it is tempting to invest all of our effort there, but this neglects beneficial interventions later in the attack mesh. Reductions in attacker utility translate to increases in attacker costs and reduction in aggregate risk.

A mitigation or bug-reduction mechanism that affects any of the axes of utility outlined above has some value to some of the people using Chrome.

Resources

Project Zero: What is a "good" memory corruption vulnerability?
An Introduction to Exploit Reliability & What is a "good" Linux Kernel bug? (Isosceles)
Zero Day Markets with Mark Dowd (Security Cryptography Whatever podcast)
Escaping the Sandbox (Chrome and Adobe Pdf Reader) on Windows, Zer0Con 2024, Zhiniang Peng, R4nger, Q4n
Exploring Memory Safety in Critical Open Source Projects (CISA.gov)

Eliminating Memory Safety Vulnerabilities at the Source

September 25, 2024

Posted by Jeff Vander Stoep - Android team, and Alex Rebert - Security Foundations

Memory safety vulnerabilities remain a pervasive threat to software security. At Google, we believe the path to eliminating this class of vulnerabilities at scale and building high-assurance software lies in Safe Coding, a secure-by-design approach that prioritizes transitioning to memory-safe languages.

This post demonstrates why focusing on Safe Coding for new code quickly and counterintuitively reduces the overall security risk of a codebase, finally breaking through the stubbornly high plateau of memory safety vulnerabilities and starting an exponential decline, all while being scalable and cost-effective.

We’ll also share updated data on how the percentage of memory safety vulnerabilities in Android dropped from 76% to 24% over 6 years as development shifted to memory safe languages.

Counterintuitive results

Consider a growing codebase primarily written in memory-unsafe languages, experiencing a constant influx of memory safety vulnerabilities. What happens if we gradually transition to memory-safe languages for new features, while leaving existing code mostly untouched except for bug fixes?

We can simulate the results. After some years, the code base has the following makeup¹ as new memory unsafe development slows down, and new memory safe development starts to take over:

In the final year of our simulation, despite the growth in memory-unsafe code, the number of memory safety vulnerabilities drops significantly, a seemingly counterintuitive result not seen with other strategies:

This reduction might seem paradoxical: how is this possible when the quantity of new memory unsafe code actually grew?

The math

The answer lies in an important observation: vulnerabilities decay exponentially. They have a half-life. The distribution of vulnerability lifetime follows an exponential distribution given an average vulnerability lifetime λ:

A large-scale study of vulnerability lifetimes² published in 2022 in Usenix Security confirmed this phenomenon. Researchers found that the vast majority of vulnerabilities reside in new or recently modified code:

This confirms and generalizes our observation, published in 2021, that the density of Android’s memory safety bugs decreased with the age of the code, primarily residing in recent changes.

This leads to two important takeaways:

The problem is overwhelmingly with new code, necessitating a fundamental change in how we develop code.
Code matures and gets safer with time, exponentially, making the returns on investments like rewrites diminish over time as code gets older.

For example, based on the average vulnerability lifetimes, 5-year-old code has a 3.4x (using lifetimes from the study) to 7.4x (using lifetimes observed in Android and Chromium) lower vulnerability density than new code.

In real life, as with our simulation, when we start to prioritize prevention, the situation starts to rapidly improve.

In practice on Android

The Android team began prioritizing transitioning new development to memory safe languages around 2019. This decision was driven by the increasing cost and complexity of managing memory safety vulnerabilities. There’s much left to do, but the results have already been positive. Here’s the big picture in 2024, looking at total code:

Despite the majority of code still being unsafe (but, crucially, getting progressively older), we’re seeing a large and continued decline in memory safety vulnerabilities. The results align with what we simulated above, and are even better, potentially as a result of our parallel efforts to improve the safety of our memory unsafe code. We first reported this decline in 2022, and we continue to see the total number of memory safety vulnerabilities dropping³. Note that the data for 2024 is extrapolated to the full year (represented as 36, but currently at 27 after the September security bulletin).

The percent of vulnerabilities caused by memory safety issues continues to correlate closely with the development language that’s used for new code. Memory safety issues, which accounted for 76% of Android vulnerabilities in 2019, and are currently 24% in 2024, well below the 70% industry norm, and continuing to drop.

As we noted in a previous post, memory safety vulnerabilities tend to be significantly more severe, more likely to be remotely reachable, more versatile, and more likely to be maliciously exploited than other vulnerability types. As the number of memory safety vulnerabilities have dropped, the overall security risk has dropped along with it.

Evolution of memory safety strategies

Over the past decades, the industry has pioneered significant advancements to combat memory safety vulnerabilities, with each generation of advancements contributing valuable tools and techniques that have tangibly improved software security. However, with the benefit of hindsight, it’s evident that we have yet to achieve a truly scalable and sustainable solution that achieves an acceptable level of risk:

1st generation: reactive patching. The initial focus was mainly on fixing vulnerabilities reactively. For problems as rampant as memory safety, this incurs ongoing costs on the business and its users. Software manufacturers have to invest significant resources in responding to frequent incidents. This leads to constant security updates, leaving users vulnerable to unknown issues, and frequently albeit temporarily vulnerable to known issues, which are getting exploited ever faster.

2nd generation: proactive mitigating. The next approach consisted of reducing risk in vulnerable software, including a series of exploit mitigation strategies that raised the costs of crafting exploits. However, these mitigations, such as stack canaries and control-flow integrity, typically impose a recurring cost on products and development teams, often putting security and other product requirements in conflict:

They come with performance overhead, impacting execution speed, battery life, tail latencies, and memory usage, sometimes preventing their deployment.
Attackers are seemingly infinitely creative, resulting in a cat-and-mouse game with defenders. In addition, the bar to develop and weaponize an exploit is regularly being lowered through better tooling and other advancements.

3rd generation: proactive vulnerability discovery. The following generation focused on detecting vulnerabilities. This includes sanitizers, often paired with fuzzing like libfuzzer, many of which were built by Google. While helpful, these methods address the symptoms of memory unsafety, not the root cause. They typically require constant pressure to get teams to fuzz, triage, and fix their findings, resulting in low coverage. Even when applied thoroughly, fuzzing does not provide high assurance, as evidenced by vulnerabilities found in extensively fuzzed code.

Products across the industry have been significantly strengthened by these approaches, and we remain committed to responding to, mitigating, and proactively hunting for vulnerabilities. Having said that, it has become increasingly clear that those approaches are not only insufficient for reaching an acceptable level of risk in the memory-safety domain, but incur ongoing and increasing costs to developers, users, businesses, and products. As highlighted by numerous government agencies, including CISA, in their secure-by-design report, "only by incorporating secure by design practices will we break the vicious cycle of constantly creating and applying fixes."

The fourth generation: high-assurance prevention

The shift towards memory safe languages represents more than just a change in technology, it is a fundamental shift in how to approach security. This shift is not an unprecedented one, but rather a significant expansion of a proven approach. An approach that has already demonstrated remarkable success in eliminating other vulnerability classes like XSS.

The foundation of this shift is Safe Coding, which enforces security invariants directly into the development platform through language features, static analysis, and API design. The result is a secure by design ecosystem providing continuous assurance at scale, safe from the risk of accidentally introducing vulnerabilities.

The shift from previous generations to Safe Coding can be seen in the quantifiability of the assertions that are made when developing code. Instead of focusing on the interventions applied (mitigations, fuzzing), or attempting to use past performance to predict future security, Safe Coding allows us to make strong assertions about the code's properties and what can or cannot happen based on those properties.

Safe Coding's scalability lies in its ability to reduce costs by:

Breaking the arms race: Instead of an endless arms race of defenders attempting to raise attackers’ costs by also raising their own, Safe Coding leverages our control of developer ecosystems to break this cycle by focusing on proactively building secure software from the start.
Commoditizing high assurance memory safety: Rather than precisely tailoring interventions to each asset's assessed risk, all while managing the cost and overhead of reassessing evolving risks and applying disparate interventions, Safe Coding establishes a high baseline of commoditized security, like memory-safe languages, that affordably reduces vulnerability density across the board. Modern memory-safe languages (especially Rust) extend these principles beyond memory safety to other bug classes.
Increasing productivity: Safe Coding improves code correctness and developer productivity by shifting bug finding further left, before the code is even checked in. We see this shift showing up in important metrics such as rollback rates (emergency code revert due to an unanticipated bug). The Android team has observed that the rollback rate of Rust changes is less than half that of C++.

From lessons to action

Interoperability is the new rewrite

Based on what we’ve learned, it's become clear that we do not need to throw away or rewrite all our existing memory-unsafe code. Instead, Android is focusing on making interoperability safe and convenient as a primary capability in our memory safety journey. Interoperability offers a practical and incremental approach to adopting memory safe languages, allowing organizations to leverage existing investments in code and systems, while accelerating the development of new features.

We recommend focusing investments on improving interoperability, as we are doing with Rust ↔︎ C++ and Rust ↔︎ Kotlin. To that end, earlier this year, Google provided a $1,000,000 grant to the Rust Foundation, in addition to developing interoperability tooling like Crubit and autocxx.

Role of previous generations

As Safe Coding continues to drive down risk, what will be the role of mitigations and proactive detection? We don’t have definitive answers in Android, but expect something like the following:

More selective use of proactive mitigations: We expect less reliance on exploit mitigations as we transition to memory-safe code, leading to not only safer software, but also more efficient software. For instance, after removing the now unnecessary sandbox, Chromium's Rust QR code generator is 20 times faster.
Decreased use, but increased effectiveness of proactive detection: We anticipate a decreased reliance on proactive detection approaches like fuzzing, but increased effectiveness, as achieving comprehensive coverage over small well-encapsulated code snippets becomes more feasible.

Final thoughts

Fighting against the math of vulnerability lifetimes has been a losing battle. Adopting Safe Coding in new code offers a paradigm shift, allowing us to leverage the inherent decay of vulnerabilities to our advantage, even in large existing systems. The concept is simple: once we turn off the tap of new vulnerabilities, they decrease exponentially, making all of our code safer, increasing the effectiveness of security design, and alleviating the scalability challenges associated with existing memory safety strategies such that they can be applied more effectively in a targeted manner.

This approach has proven successful in eliminating entire vulnerability classes and its effectiveness in tackling memory safety is increasingly evident based on more than half a decade of consistent results in Android.

We'll be sharing more about our secure-by-design efforts in the coming months.

Acknowledgements

Thanks Alice Ryhl for coding up the simulation. Thanks to Emilia Kasper, Adrian Taylor, Manish Goregaokar, Christoph Kern, and Lars Bergstrom for your helpful feedback on this post.

Notes

Simulation was based on numbers similar to Android and other Google projects. The code base doubles every 6 years. The average lifetime for vulnerabilities is 2.5 years. It takes 10 years to transition to memory safe languages for new code, and we use a sigmoid function to represent the transition. Note that the use of the sigmoid function is why the second chart doesn’t initially appear to be exponential. ↩
Alexopoulos et al. "How Long Do Vulnerabilities Live in the Code? A Large-Scale Empirical Measurement Study on FOSS Vulnerability Lifetimes". USENIX Security 22. ↩
Unlike our simulation, these are vulnerabilities from a real code base, which comes with higher variance, as you can see in the slight increase in 2023. Vulnerability reports were unusually high that year, but in line with expectations given code growth, so while the percentage of memory safety vulnerabilities continued to drop, the absolute number increased slightly. ↩

Google & Arm - Raising The Bar on GPU Security

September 24, 2024

Posted by Xuan Xing, Eugene Rodionov, Jon Bottarini, Adam Bacchus - Android Red Team; Amit Chaudhary, Lyndon Fawcett, Joseph Artgole - Arm Product Security Team

Who cares about GPUs?

You, me, and the entire ecosystem! GPUs (graphics processing units) are critical in delivering rich visual experiences on mobile devices. However, the GPU software and firmware stack has become a way for attackers to gain permissions and entitlements (privilege escalation) to Android-based devices. There are plenty of issues in this category that can affect all major GPU brands, for example, CVE-2023-4295, CVE-2023-21106, CVE-2021-0884, and more. Most exploitable GPU vulnerabilities are in the implementation of the GPU kernel mode modules. These modules are pieces of code that load/unload during runtime, extending functionality without the need to reboot the device.

Proactive testing is good hygiene as it can lead to the detection and resolution of new vulnerabilities before they’re exploited. It’s also one of the most complex investigations to do as you don’t necessarily know where the vulnerability will appear (that’s the point!). By combining the expertise of Google’s engineers with IP owners and OEMs, we can ensure the Android ecosystem retains a strong measure of integrity.

Why investigate GPUs?

When researching vulnerabilities, GPUs are a popular target due to:

Functionality vs. Security Tradeoffs
Nobody wants a slow, unresponsive device; any hits to GPU performance could result in a noticeably degraded user experience. As such, the GPU software stack in Android relies on an in-process HAL model where the API & user space drivers communicating with the GPU kernel mode module are running directly within the context of apps, thus avoiding IPC (interprocess communication). This opens the door for potentially untrusted code from a third party app being able to directly access the interface exposed by the GPU kernel module. If there are any vulnerabilities in the module, the third party app has an avenue to exploit them. As a result, a potentially untrusted code running in the context of the third party application is able to directly access the interface exposed by the GPU kernel module and exploit potential vulnerabilities in the kernel module.
Variety & Memory Safety
Additionally, the implementation of GPU subsystems (and kernel modules specifically) from major OEMs are increasingly complex. Kernel modules for most GPUs are typically written in memory unsafe languages such as C, which are susceptible to memory corruption vulnerabilities like buffer overflow.

Can someone do something about this?

Great news, we already have! Who’s we? The Android Red Team and Arm! We’ve worked together to run an engagement on the Mali GPU (more on that below), but first, a brief introduction:

Android Red Team

The Android Red Team performs time-bound security assessment engagements on all aspects of the Android open source codebase and conducts regular security reviews and assessments of internal Android components. Throughout these engagements, the Android Red Team regularly collaborates with 3rd party software and hardware providers to analyze and understand proprietary and “closed source” code repositories and relevant source code that are utilized by Android products with the sole objective to identify security risks and potential vulnerabilities before they can be exploited by adversaries outside of Android. This year, the Android Red Team collaborated directly with our industry partner, Arm, to conduct the Mali GPU engagement and further secure millions of Android devices.

Arm Product Security and GPU Teams

Arm has a central product security team that sets the policy and practice across the company. They also have dedicated product security experts embedded in engineering teams. Arm operates a systematic approach which is designed to prevent, discover, and eliminate security vulnerabilities. This includes a Security Development Lifecycle (SDL), a Monitoring capability, and Incident Response. For this collaboration the Android Red Teams were supported by the embedded security experts based in Arm’s GPU engineering team.

Working together to secure Android devices

Google’s Android Security teams and Arm have been working together for a long time. Security requirements are never static, and challenges exist with all GPU vendors. By frequently sharing expertise, the Android Red Team and Arm were able to accelerate detection and resolution. Investigations of identified vulnerabilities, potential remediation strategies, and hardening measures drove detailed analyses and the implementation of fixes where relevant.

Recent research focused on the Mali GPU because it is the most popular GPU in today's Android devices. Collaborating on GPU security allowed us to:

Assess the impact on the broadest segment of the Android Ecosystem: The Arm Mali GPU is one of the most used GPUs by original equipment manufacturers (OEMs) and is found in many popular mobile devices. By focusing on the Arm Mali GPU, the Android Red Team could assess the security of a GPU implementation running on millions of Android devices worldwide.
Evaluate the reference implementation and vendor-specific changes: Phone manufacturers often modify the upstream implementation of GPUs. This tailors the GPU to the manufacturer's specific device(s). These modifications and enhancements are always challenging to make, and can sometimes introduce security vulnerabilities that are not present in the original version of the GPU upstream. In this specific instance, the Google Pixel team actively worked with the Android Red Team to better understand and secure the modifications they made for Pixel devices.

Improvements

Investigations have led to significant improvements, leveling up the security of the GPU software/firmware stack across a wide segment of the Android ecosystem.

Testing the kernel driver

One key component of the GPU subsystem is its kernel mode driver. During this engagement, both the Android Red Team and Arm invested significant effort looking at the Mali kbase kernel driver. Due to its complexity, fuzzing was chosen as the primary testing approach for this area. Fuzzing automates and scales vulnerability discovery in a way not possible via manual methods. With help from Arm, the Android Red Team added more syzkaller fuzzing descriptions to match the latest Mali kbase driver implementation.

The team built a few customizations to enable fuzzing the Mali kbase driver in the cloud, without physical hardware. This provided a huge improvement to fuzzing performance and scalability. With the Pixel team’s support, we also were able to set up fuzzing on actual Pixel devices. Through the combination of cloud-based fuzzing, Pixel-based fuzzing, and manual review, we were able to uncover two memory issues in Pixel’s customization of driver code (CVE-2023-48409 and CVE-2023-48421).

Both issues occurred inside of the gpu_pixel_handle_buffer_liveness_update_ioctl function, which is implemented by the Pixel team as part of device specific customization. These are both memory issues caused by integer overflow problems. If exploited carefully alongside other vulnerabilities, these issues could lead to kernel privilege escalation from user space. Both issues were fixed and the patch was released to affected devices in Pixel security bulletin 2023-12-01.

Testing the firmware

Firmware is another fundamental building block of the GPU subsystem. It’s the intermediary working with kernel drivers and GPU hardware. In many cases, firmware functionality is directly/indirectly accessible from the application. So “application ⇒ kernel ⇒ firmware ⇒ kernel” is a known attack flow in this area. Also, in general, firmware runs on embedded microcontrollers with limited resources. Commonly used security kernel mitigations (ASLR, stack protection, heap protection, certain sanitizers, etc.) might not be applicable to firmware due to resource constraints and performance impact. This can make compromising firmware easier, in some cases, than directly compromising kernel drivers from user space. To test the integrity of existing firmware, the Android Red Team and Arm worked together to perform both fuzzing and formal verification along with manual analysis. This multi-pronged approach led to the discovery of CVE-2024-0153, which had a patch released in the July 2024 Android Security Bulletin.

CVE-2024-0153 happens when GPU firmware handles certain instructions. When handling such instructions, the firmware copies register content into a buffer. There are size checks before the copy operation. However, under very specific conditions, an out-of-bounds write happens to the destination buffer, leading to a buffer overflow. When carefully manipulated, this overflow will overwrite some other important structures following the buffer, causing code execution inside of the GPU firmware.

The conditions necessary to reach and potentially exploit this issue are very complex as it requires a deep understanding of how instructions are executed. With collective expertise, the Android Red Team and Arm were able to verify the exploitation path and leverage the issue to gain limited control of GPU firmware. This eventually circled back to the kernel to obtain privilege escalation. Arm did an excellent job to respond quickly and remediate the issue. Altogether, this highlights the strength of collaboration between both teams to dive deeper.

Time to Patch

It’s known that attackers exploit GPU vulnerabilities in the wild, and time to patch is crucial to reduce risk of exploitation and protect users. As a result of this engagement, nine new Security Test suite (STS) tests were built to help partners automatically check their builds for missing Mali kbase patches. (Security Test Suite is software provided by Google to help partners automate the process of checking their builds for missing security patches.)

What’s Next?

The Arm Product Security Team is actively involved in security-focused industry communities and collaborates closely with its ecosystem partners. The engagement with the Android Red Team, for instance, provides valuable enablement that drives best practices and product excellence. Building on this collaborative approach, Arm is complementing its product security assurance capabilities with a bug bounty program. This investment will expand Arm’s efforts to identify potential vulnerabilities. For more information on Arm's product security initiatives, please visit this product security page.

The Android Red Team and Arm continue to work together to proactively raise the bar on GPU security. With thorough testing, rapid fixing, and updates to the security test suite, we’re improving the ecosystem for Android users. The Android Red Team looks forward to replicating this working relationship with other ecosystem partners to make devices more secure.

Security Blog

Notes

Notes

AI-powered protection for your device the moment it is stolen

Advanced security to deter theft before it happens

Real-world protection for billions of Android users

Notes

Threat Model for Code Execution

Good Bugs and Bad Bugs

Resources

Counterintuitive results

The math

In practice on Android

Evolution of memory safety strategies

The fourth generation: high-assurance prevention

From lessons to action

Interoperability is the new rewrite

Role of previous generations

Final thoughts

Acknowledgements

Notes

Who cares about GPUs?

Why investigate GPUs?

Can someone do something about this?

Working together to secure Android devices

Improvements

Testing the kernel driver

Testing the firmware

Time to Patch

What’s Next?

Labels

Archive

Feed