1100 results sorted by ID
Scribe: Low-memory SNARKs via Read-Write Streaming
Anubhav Baweja, Pratyush Mishra, Tushar Mopuri, Karan Newatia, Steve Wang
Cryptographic protocols
Succinct non-interactive arguments of knowledge (SNARKs) enable a prover to produce a short and efficiently verifiable proof of the validity of an arbitrary NP statement. Recent constructions of efficient SNARKs have led to interest in using them for a wide range of applications, but unfortunately, deployment of SNARKs in these applications faces a key bottleneck: SNARK provers require a prohibitive amount of time and memory to generate proofs for even moderately large statements. While...
Lova: Lattice-Based Folding Scheme from Unstructured Lattices
Giacomo Fenzi, Christian Knabenhans, Ngoc Khanh Nguyen, Duc Tu Pham
Cryptographic protocols
Folding schemes (Kothapalli et al., CRYPTO 2022) are a conceptually simple, yet powerful cryptographic primitive that can be used as a building block to realise incrementally verifiable computation (IVC) with low recursive overhead without general-purpose non-interactive succinct arguments of knowledge (SNARK).
Most folding schemes known rely on the hardness of the discrete logarithm problem, and thus are both not quantum-resistant and operate over large prime fields. Existing post-quantum...
Machine Learning-Based Detection of Glitch Attacks in Clock Signal Data
Asier Gambra, Durba Chatterjee, Unai Rioja, Igor Armendariz, Lejla Batina
Attacks and cryptanalysis
Voltage fault injection attacks are a particularly powerful threat to secure embedded devices because they exploit brief, hard-to-detect power fluctuations causing errors or bypassing security mechanisms. To counter these attacks, various detectors are employed, but as defenses strengthen, increasingly elusive glitches continue to emerge. Artificial intelligence, with its inherent ability to learn and adapt to complex patterns, presents a promising solution. This research presents an...
PASTA on Edge: Cryptoprocessor for Hybrid Homomorphic Encryption
Aikata Aikata, Daniel Sanz Sobrino, Sujoy Sinha Roy
Implementation
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation but imposes significant computational and communication overhead on the client for the public-key encryption. To alleviate this burden, previous works have introduced the Hybrid Homomorphic Encryption (HHE) paradigm, which combines symmetric encryption with homomorphic decryption to enhance performance for the FHE client. While early HHE schemes focused on binary data, modern versions now support integer prime fields,...
Orion's Ascent: Accelerating Hash-Based Zero Knowledge Proof on Hardware Platforms
Florian Hirner, Florian Krieger, Constantin Piber, Sujoy Sinha Roy
Implementation
Zero-knowledge proofs (ZKPs) are cryptographic protocols that enable one party to prove the validity of a statement without revealing the underlying data. Such proofs have applications in privacy-preserving technologies and verifiable computations. However, slow proof generation poses a significant challenge in the wide-scale adoption of ZKP. Orion is a recent ZKP scheme with linear prover time. It leverages coding theory, expander graphs, and Merkle hash trees to improve computational...
OPL4GPT: An Application Space Exploration of Optimal Programming Language for Hardware Design by LLM
Kimia Tasnia, Sazadur Rahman
Implementation
Despite the emergence of Large Language Models (LLMs) as potential tools for automating hardware design, the optimal programming language to describe hardware functions remains unknown. Prior works extensively explored optimizing Verilog-based HDL design, which often overlooked the potential capabilities of alternative programming languages for hardware designs. This paper investigates the efficacy of C++ and Verilog as input languages in extensive application space exploration, tasking an...
An Open Source Ecosystem for Implementation Security Testing
Aydin Aysu, Fatemeh Ganji, Trey Marcantonio, Patrick Schaumont
Attacks and cryptanalysis
Implementation-security vulnerabilities such as the
power-based side-channel leakage and fault-injection sensitivity
of a secure chip are hard to verify because of the sophistication
of the measurement setup, as well as the need to generalize the
adversary into a test procedure. While the literature has proposed
a wide range of vulnerability metrics to test the correctness of a
secure implementation, it is still up to the subject-matter expert to
map these concepts into a working and...
Efficient Modular Multiplication Hardware for Number Theoretic Transform on FPGA
Tolun Tosun, Selim Kırbıyık, Emre Koçer, Erkay Savaş, Ersin Alaybeyoğlu
Implementation
In this paper, we present a comprehensive analysis of various modular multiplication methods for Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature, and implement particular methods on FPGA. Specifically Word-Level Montgomery...
IO-Optimized Design-Time Configurable Negacyclic Seven-Step NTT Architecture for FHE Applications
Emre Koçer, Selim Kırbıyık, Tolun Tosun, Ersin Alaybeyoğlu, Erkay Savaş
FHE enables computations on encrypted data, making it essential for privacy-preserving applications. However, it involves computationally demanding tasks, such as polynomial multiplication, while NTT is the state-of-the-art solution to perform this task. Most FHE schemes operate over the negacyclic ring of polynomials. We introduce a novel formulation of the hierarchical Four-Step NTT approach for the negacyclic ring, eliminating the need for pre- and post-processing steps found in the...
Classic McEliece Hardware Implementation with Enhanced Side-Channel and Fault Resistance
Peizhou Gan, Prasanna Ravi, Kamal Raj, Anubhab Baksi, Anupam Chattopadhyay
Implementation
In this work, we propose the first hardware implementation of Classic McEliece protected with countermeasures against Side-Channel Attacks (SCA) and Fault Injection Attacks (FIA). Classic Mceliece is one of the leading candidates for Key Encapsulation Mechanisms (KEMs) in the ongoing round 4 of the NIST standardization process for post-quantum cryptography. In particular, we implement a range of generic countermeasures against SCA and FIA, particularly protected the vulnerable operations...
OPTIMSM: FPGA hardware accelerator for Zero-Knowledge MSM
Xander Pottier, Thomas de Ruijter, Jonas Bertels, Wouter Legiest, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
Improved ML-DSA Hardware Implementation With First Order Masking Countermeasure
Kamal Raj, Prasanna Ravi, Tee Kiah Chia, Anupam Chattopadhyay
Implementation
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
PriSrv: Privacy-Enhanced and Highly Usable Service Discovery in Wireless Communications
Yang Yang, Robert H. Deng, Guomin Yang, Yingjiu Li, HweeHwa Pang, Minming Huang, Rui Shi, Jian Weng
Cryptographic protocols
Service discovery is essential in wireless communications. However, existing service discovery protocols provide no or very limited privacy protection for service providers and clients, and they often leak sensitive information (e.g., service type, client’s identity and mobility pattern), which leads to various network-based attacks (e.g., spoofing, man-in-the-middle, identification and tracking). In this paper, we propose a private service discovery protocol, called PriSrv, which allows a...
HADES: Range-Filtered Private Aggregation on Public Data
Xiaoyuan Liu, Ni Trieu, Trinabh Gupta, Ishtiyaque Ahmad, Dawn Song
Cryptographic protocols
In aggregation queries, predicate parameters often reveal user intent. Protecting these parameters is critical for user privacy, regardless of whether the database is public or private. While most existing works focus on private data settings, we address a public data setting where the server has access to the database. Current solutions for this setting either require additional setups (e.g., noncolluding servers, hardware enclaves) or are inefficient for practical workloads. Furthermore,...
On the practicality of quantum sieving algorithms for the shortest vector problem
Joao F. Doriguello, George Giapitzakis, Alessandro Luongo, Aditya Morolia
Attacks and cryptanalysis
One of the main candidates of post-quantum cryptography is lattice-based cryptography. Its cryptographic security against quantum attackers is based on the worst-case hardness of lattice problems like the shortest vector problem (SVP), which asks to find the shortest non-zero vector in an integer lattice. Asymptotic quantum speedups for solving SVP are known and rely on Grover's search. However, to assess the security of lattice-based cryptography against these Grover-like quantum speedups,...
Multiplying Polynomials without Powerful Multiplication Instructions (Long Paper)
Vincent Hwang, YoungBeom Kim, Seog Chung Seo
Implementation
We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli...
On Constructing Pseudorandom Involutions: Feistel variants using a single round function
Chun Guo, Meiqin Wang, Weijia Wang
Secret-key cryptography
An involution is a permutation that is the inverse of itself. Involutions have attracted plenty attentions in cryptographic community due to their advantage regarding hardware implementations. In this paper, we reconsider constructing {\it pseudorandom involutions}. We demonstrate two constructions.
First, the 4-round Feistel network {\it using the same random function (Feistel-SF) in every round} is a pseudorandom involution. This shows the Feistel-SF construction still provides...
Efficient Boolean-to-Arithmetic Mask Conversion in Hardware
Aein Rezaei Shahmirzadi, Michael Hutter
Implementation
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Efficient Key-Switching for Word-Type FHE and GPU Acceleration
Shutong Jin, Zhen Gu, Guangyan Li, Donglong Chen, Çetin Kaya Koç, Ray C. C. Cheung, Wangchen Dai
Implementation
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
Shaking up authenticated encryption
Joan Daemen, Seth Hoffert, Silvia Mella, Gilles Van Assche, Ronny Van Keer
Secret-key cryptography
Authenticated encryption (AE) is a cryptographic mechanism that allows communicating parties to protect the confidentiality and integrity of messages exchanged over a public channel, provided they share a secret key. In this work, we present new AE schemes leveraging the SHA-3 standard functions SHAKE128 and SHAKE256, offering 128 and 256 bits of security strength, respectively, and their “Turbo” counterparts. They support session-based communication, where a ciphertext authenticates the...
HEonGPU: a GPU-based Fully Homomorphic Encryption Library 1.0
Ali Şah Özcan, Erkay Savaş
Implementation
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
Functional Adaptor Signatures: Beyond All-or-Nothing Blockchain-based Payments
Nikhil Vanjani, Pratik Soni, Sri AravindaKrishnan Thyagarajan
Cryptographic protocols
In scenarios where a seller holds sensitive data $x$, like employee / patient records or ecological data, and a buyer seeks to obtain an evaluation of specific function $f$ on this data, solutions in trustless digital environments like blockchain-based Web3 systems typically fall into two categories: (1) Smart contract-powered solutions and (2) cryptographic solutions leveraging tools such as adaptor signatures. The former approach offers atomic transactions where the buyer learns the...
The SMAesH dataset
Gaëtan Cassiers, Charles Momin
Implementation
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
A Combined Design of 4-PLL-TRNG and 64-bit CDC-7-XPUF on a Zynq-7020 SoC
Oğuz Yayla, Yunus Emre Yılmaz
Implementation
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
Marian: An Open Source RISC-V Processor with Zvk Vector Cryptography Extensions
Thomas Szymkowiak, Endrit Isufi, Markku-Juhani Saarinen
Implementation
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
Attestation Proof of Association – provability that attestation keys are bound to the same hardware and person
Eric Verheul
Implementation
We propose a wallet provider issued attestation called Wallet Trust Evidence (WTE) and three related specific instructions for the European Digital Identity (EUDI) Wallet cryptographic hardware, most notably the generation of a Proof of Association (PoA). These allow the EUDI Wallet providing verifiable assurance to third parties (issuers, relying parties) that attestation private keys are not only bound to conformant cryptographic hardware but also that they are bound to the same such...
Scabbard: An Exploratory Study on Hardware Aware Design Choices of Learning with Rounding-based Key Encapsulation Mechanisms
Suparna Kundu, Quinten Norga, Angshuman Karmakar, Shreya Gangopadhyay, Jose Maria Bermudo Mera, Ingrid Verbauwhede
Implementation
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
Eva: Efficient IVC-Based Authentication of Lossy-Encoded Videos
Chengru Zhang, Xiao Yang, David Oswald, Mark Ryan, Philipp Jovanovic
Applications
With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...
Reality Check on Side-Channels: Lessons learnt from breaking AES on an ARM Cortex A processor
Shivam Bhasin, Harishma Boyapally, Dirmanto Jap
Attacks and cryptanalysis
AES implementation has been vastly analysed against side-channel attacks in the last two decades particularly targeting resource-constrained microcontrollers. Still, less research has been conducted on AES implementations on advanced hardware platforms. In this study, we examine the resilience of AES on an ARM Cortex A72 processor within the Raspberry Pi 4B model. Unlike their microcontroller counterparts, these platforms operate within the complex ecosystem of an operating system (OS),...
EUCLEAK
Thomas Roche
Attacks and cryptanalysis
Secure elements are small microcontrollers whose main purpose is to generate/store secrets and then execute cryptographic operations. They undergo the highest level of security evaluations that exists (Common Criteria) and are often considered inviolable, even in the worst-case attack scenarios. Hence, complex secure systems build their security upon them.
FIDO hardware tokens are strong authentication factors to sign in to applications (any web service supporting FIDO); they often embed...
AGATE: Augmented Global Attested Trusted Execution in the Universal Composability framework
Lorenzo Martinico, Markulf Kohlweiss
Foundations
A Trusted Execution Environment (TEE) is a new type of security technology, implemented by CPU manufacturers, which guarantees integrity and confidentiality on a restricted execution environment to any remote verifier. TEEs are deployed on various consumer and commercial hardwareplatforms, and have been widely adopted as a component in the design of cryptographic protocols both theoretical and practical.
Within the provable security community, the use of TEEs as a setup assumption has...
Update to the Sca25519 Library: Mitigating Tearing-based Side-channel Attacks
Lukasz Chmielewski, Lubomír Hrbáček
Implementation
This short note describes an update to the sca25519 library, an ECC implementation computing the X25519 key-exchange protocol on the Arm Cortex-M4 microcontroller. The sca25519 software came with extensive mitigations against various side-channel and fault attacks and was, to our best knowledge, the first to claim affordable protection against multiple classes of attacks that are motivated by distinct real-world application scenarios.
This library is protected against various passive and...
SoK: Instruction Set Extensions for Cryptographers
Hao Cheng, Johann Großschädl, Ben Marshall, Daniel Page, Markku-Juhani O. Saarinen
Implementation
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
MAESTRO: Multi-party AES using Lookup Tables
Hiraku Morita, Erik Pohle, Kunihiko Sadakane, Peter Scholl, Kazunari Tozawa, Daniel Tschudi
Cryptographic protocols
Secure multi-party computation (MPC) enables multiple distrusting parties to jointly compute a function while keeping their inputs private. Computing the AES block cipher in MPC, where the key and/or the input are secret-shared among the parties is important for various applications, particularly threshold cryptography.
In this work, we propose a family of dedicated, high-performance MPC protocols to compute the non-linear S-box part of AES in the honest majority setting. Our protocols...
R-STELLAR: A Resilient Synthesizable Signature Attenuation SCA Protection on AES-256 with built-in Attack-on-Countermeasure Detection
Archisman Ghosh, Dong-Hyun Seo, Debayan Das, Santosh Ghosh, Shreyas Sen
Applications
Side-channel attacks (SCAs) remain a significant threat to the security of cryptographic systems in modern embedded devices. Even mathematically secure cryptographic algorithms, when implemented in hardware, inadvertently leak information through physical side-channel signatures such as power consumption, electromagnetic (EM) radiation, light emissions, and acoustic emanations. Exploiting these side channels significantly reduces the attacker’s search space.
In recent years, physical...
A Survey on SoC Security Verification Methods at the Pre-silicon Stage
Rasheed Kibria, Farimah Farahmandi, Mark Tehranipoor
Foundations
This paper presents a survey of the state-of-the-art pre-silicon security verification techniques for System-on-Chip (SoC) designs, focusing on ensuring that designs, implemented in hardware description languages (HDLs) and synthesized circuits, meet security requirements before fabrication in semiconductor foundries. Due to several factors, pre-silicon security verification has become an essential yet challenging aspect of the SoC hardware lifecycle. The modern SoC design process often...
Robust but Relaxed Probing Model
Nicolai Müller, Amir Moradi
Applications
Masking has become a widely applied and heavily researched method to protect cryptographic implementations against SCA attacks. The success of masking is primarily attributed to its strong theoretical foundation enabling it to formally prove security by modeling physical properties through so-called probing models. Specifically, the robust $d$-probing model enables us to prove the security for arbitrarily masked hardware circuits, manually or with the assistance of automated tools, even when...
FELIX (XGCD for FALCON): FPGA-based Scalable and Lightweight Accelerator for Large Integer Extended GCD
Sam Coulon, Tianyou Bao, Jiafeng Xie
Implementation
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
EMI Shielding for Use in Side-Channel Security: Analysis, Simulation and Measurements
Daniel Dobkin, Edut Katz, David Popovtzer, Itamar Levi
Attacks and cryptanalysis
Considering side-channel analysis (SCA) security for cryptographic devices, the mitigation of electromagnetic leakage and electromagnetic interference (EMI) between modules poses significant challenges. This paper presents a comprehensive review and deep analysis of the utilization of EMI shielding materials, devised for reliability purposes and standards such as EMI/EMC, as a countermeasure to enhance EM-SCA security. We survey the current landscape of EMI-shields materials, including...
Koala: A Low-Latency Pseudorandom Function
Parisa Amiri Eliasi, Yanis Belkheyar, Joan Daemen, Santosh Ghosh, Daniël Kuijsters, Alireza Mehrdad, Silvia Mella, Shahram Rasoolzadeh, Gilles Van Assche
Secret-key cryptography
This paper introduces the Koala PRF, which maps a variable-length sequence of $64$-bit input blocks to a single $257$-bit output block.
Its design focuses on achieving low latency in its implementation in ASIC.
To construct Koala, we instantiate the recently introduced Kirby construction with the Koala-P permutation and add an input encoding layer.
The Koala-P permutation is obtained as the $8$-fold iteration of a simple round function inspired by that of Subterranean.
Based on...
Designing a General-Purpose 8-bit (T)FHE Processor Abstraction
Daphné Trama, Pierre-Emmanuel Clet, Aymen Boudguiga, Renaud Sirdey
Applications
Making the most of TFHE programmable bootstrapping to evaluate
functions or operators otherwise challenging to perform with only the native addition
and multiplication of the scheme is a very active line of research. In this paper, we
systematize this approach and apply it to build an 8-bit FHE processor abstraction,
i.e., a software entity that works over FHE-encrypted 8-bits data and presents itself
to the programmer by means of a conventional-looking assembly instruction set.
In...
Hardware Implementation and Security Analysis of Local-Masked NTT for CRYSTALS-Kyber
Rafael Carrera Rodriguez, Emanuele Valea, Florent Bruguier, Pascal Benoit
Implementation
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic
Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
Towards ML-KEM & ML-DSA on OpenTitan
Amin Abdulrahman, Felix Oberhansl, Hoang Nguyen Hien Pham, Jade Philipoom, Peter Schwabe, Tobias Stelzer, Andreas Zankl
Implementation
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
Lightweight Dynamic Linear Components for Symmetric Cryptography
S. M. Dehnavi, M. R. Mirzaee Shamsabad
Foundations
In this paper, using the concept of equivalence of mappings we characterize all of the one-XOR matrices which are used in hardware applications and propose a family of lightweight linear mappings for software-oriented applications in symmetric cryptography. Then, we investigate interleaved linear mappings and based upon this study, we present generalized dynamic primitive LFSRs along with dynamic linear components for construction of diffusion layers.
From the mathematical...
STORM — Small Table Oriented Redundancy-based SCA Mitigation for AES
Yaacov Belenky, Hennadii Chernyshchyk, Oleg Karavaev, Oleh Maksymenko, Valery Teper, Daria Ryzhkova, Itamar Levi, Osnat Keren, Yury Kreimer
Attacks and cryptanalysis
Side-channel-analysis (SCA) resistance with cost optimization in AES hardware implementations remains a significant challenge. While traditional masking-based schemes offer provable security, they often incur substantial resource overheads (latency, area, randomness, performance, power consumption). Alternatively, the RAMBAM scheme introduced a redundancy-based approach to control the signal-to-noise ratio, and achieves exponential leakage reduction as redundancy increases. This method...
MATTER: A Wide-Block Tweakable Block Cipher
Roberto Avanzi, Orr Dunkelman, Kazuhiko Minematsu
Secret-key cryptography
In this note, we introduce the MATTER Tweakable Block Cipher, designed principally for low latency in low-area hardware implementations, but that can also be implemented in an efficient and compact way in software.
MATTER is a 512-bit wide balanced Feistel network with three to six rounds, using the ASCON permutation as the round function.
The Feistel network defines a keyed, non-tweakable core, which is made tweakable by using the encryption of the tweak as its key.
Key and tweak are...
Rudraksh: A compact and lightweight post-quantum key-encapsulation mechanism
Suparna Kundu, Archisman Ghosh, Angshuman Karmakar, Shreyas Sen, Ingrid Verbauwhede
Public-key cryptography
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
LR-OT: Leakage-Resilient Oblivious Transfer
Francesco Berti, Carmit Hazay, Itamar Levi
Cryptographic protocols
Oblivious Transfer (OT) is a fundamental cryptographic primitive, becoming a crucial component of a practical secure protocol.
OT is typically implemented in software, and one way to accelerate its running time is by using hardware implementations.
However, such implementations are vulnerable to side-channel attacks (SCAs).
On the other hand, protecting interactive protocols against SCA is highly challenging because of their longer secrets (which include inputs and randomness), more...
A Fast and Efficient SIKE Co-Design: Coarse-Grained Reconfigurable Accelerators with Custom RISC-V Microcontroller on FPGA
Jing Tian, Bo Wu, Lang Feng, Haochen Zhang, Zhongfeng Wang
Implementation
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
Oryx: Private detection of cycles in federated graphs
Ke Zhong, Sebastian Angel
Cryptographic protocols
This paper proposes Oryx, a system for efficiently detecting cycles in federated graphs where parts of the graph are held by different parties and are private. Cycle detection is an important building block in designing fraud detection algorithms that operate on confidential transaction data held by different financial institutions. Oryx allows detecting cycles of various length while keeping the topology of the graphs secret, and it does so efficiently; Oryx achieves quasilinear...
QuietOT: Lightweight Oblivious Transfer with a Public-Key Setup
Geoffroy Couteau, Lalita Devadas, Srinivas Devadas, Alexander Koch, Sacha Servan-Schreiber
Cryptographic protocols
Oblivious Transfer (OT) is at the heart of secure computation and is a foundation for many applications in cryptography. Over two decades of work have led to extremely efficient protocols for evaluating OT instances in the preprocessing model, through a paradigm called OT extension.
A few OT instances generated in an offline phase can be used to perform many OTs in an online phase efficiently, i.e., with very low communication and computational overheads.
Specifically, traditional OT...
A More Compact AES, and More
Dag Arne Osvik, David Canright
Implementation
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal...
Reading It like an Open Book: Single-trace Blind Side-channel Attacks on Garbled Circuit Frameworks
Sirui Shen, Chenglu Jin
Attacks and cryptanalysis
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers.
We investigate popular open-source GC frameworks and discover that the AES...
Grafting: Complementing RNS in CKKS
Jung Hee Cheon, Hyeongmin Choe, Minsik Kang, Jaehyung Kim
Implementation
The RNS variant of the CKKS scheme (SAC 2018) is widely implemented due to its computational efficiency. However, the current optimized implementations of the RNS-CKKS scheme have a limitation when choosing the ciphertext modulus. It requires the scale factors to be approximately equal to a factor (or a product of factors) of the ciphertext modulus. This restriction causes inefficiency when the scale factor is not close to the power of the machine's word size, wasting the machine's...
Secure Vickrey Auctions with Rational Parties
Chaya Ganesh, Shreyas Gupta, Bhavana Kanukurthi, Girisha Shankar
Cryptographic protocols
In this work, we construct a second price (Vickrey) auction protocol (SPA), which does not require any auctioneers and ensures total privacy in the presence of rational parties participating in auction. In particular, the confidentiality of the highest bid and the identity of the second highest bidder are protected. We model the bidders participating in the second price auction as rational, computationally bounded and privacy-sensitive parties. These are self-interested agents who care about...
Side-Channel and Fault Resistant ASCON Implementation: A Detailed Hardware Evaluation (Extended Version)
Aneesh Kandi, Anubhab Baksi, Peizhou Gan, Sylvain Guilley, Tomáš Gerlich, Jakub Breier, Anupam Chattopadhyay, Ritu Ranjan Shrivastwa, Zdeněk Martinásek, Shivam Bhasin
Implementation
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
ICICLE v2: Polynomial API for Coding ZK Provers to Run on Specialized Hardware
Karthik Inbasekar, Yuval Shekel, Michael Asa
Applications
Polynomials play a central role in cryptography. In the context of Zero Knowledge Proofs (ZKPs), protocols can be exclusively expressed using polynomials, making them a powerful abstraction tool, as demonstrated in most ZK research papers. Our first contribution is a high-level framework that enables practitioners to implement ZKPs in a more natural way, based solely on polynomial primitives.
ZK provers are considered computationally intensive algorithms with a high degree of...
Designs for practical SHE schemes based on Ring-LWR
Madalina Bolboceanu, Anamaria Costache, Erin Hales, Rachel Player, Miruna Rosca, Radu Titiu
Public-key cryptography
The Learning with Errors problem (LWE) and its variants are among the most popular assumptions underlying lattice-based cryptography. The Learning with Rounding problem (LWR) can be thought of as a deterministic variant of LWE. While lattice-based cryptography is known to enable many advanced constructions, constructing Fully Homomorphic Encryption schemes based on LWR remains an under-explored part of the literature. In this work, we present a thorough study of Somewhat Homomorphic...
CISELeaks: Information Leakage Assessment of Cryptographic Instruction Set Extension Prototypes
Aruna Jayasena, Richard Bachmann, Prabhat Mishra
Attacks and cryptanalysis
Software based cryptographic implementations provide flexibility but they face performance limitations. In contrast, hardware based cryptographic accelerators utilize application-specific customization to provide real-time security solutions.
Cryptographic instruction-set extensions (CISE) combine the advantages of both hardware and software based solutions to provide higher performance combined with the flexibility of atomic-level cryptographic operations. While CISE is widely used to...
Time Sharing - A Novel Approach to Low-Latency Masking
Dilip Kumar S. V., Siemen Dhooghe, Josep Balasch, Benedikt Gierlichs, Ingrid Verbauwhede
Implementation
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
Approximate CRT-Based Gadget Decomposition and Application to TFHE Blind Rotation
Olivier Bernard, Marc Joye
Implementation
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
Glitch-Stopping Circuits: Hardware Secure Masking without Registers
Zhenda Zhang, Svetla Nikova, Ventzislav Nikov
Implementation
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
HAWKEYE – Recovering Symmetric Cryptography From Hardware Circuits
Gregor Leander, Christof Paar, Julian Speith, Lukas Stennes
Implementation
We present the first comprehensive approach for detecting and analyzing symmetric cryptographic primitives in gate-level descriptions of hardware. To capture both ASICs and FPGAs, we model the hardware as a directed graph, where gates become nodes and wires become edges. For modern chips, those graphs can easily consist of hundreds of thousands of nodes. More abstractly, we find subgraphs corresponding to cryptographic primitives in a potentially huge graph, the sea-of-gates, describing an...
INDIANA - Verifying (Random) Probing Security through Indistinguishability Analysis
Christof Beierle, Jakob Feldtkeller, Anna Guinet, Tim Güneysu, Gregor Leander, Jan Richter-Brockmann, Pascal Sasdrich
Implementation
Despite masking being a prevalent protection against passive side-channel attacks, implementing it securely in hardware remains a manual, challenging, and error-prone process.
This paper introduces INDIANA, a comprehensive security verification tool for hardware masking. It provides a hardware verification framework, enabling a complete analysis of simulation-based security in the glitch-extended probing model, with cycle-accurate estimations for leakage probabilities in the random...
Elliptic Curve Cryptography for the masses: Simple and fast finite field arithmetic
Michael Scott
Implementation
Shaped prime moduli are often considered for use in elliptic curve and isogeny-based cryptography to allow for faster modular reduction. Here we focus on the most common choices for shaped primes that have been suggested, that is pseudo-Mersenne, generalized Mersenne and Montgomery-friendly primes. We consider how best to to exploit these shapes for maximum efficiency, and provide an open source tool to automatically generate, test and time working high-level language finite-field code....
Byzantine Reliable Broadcast with One Trusted Monotonic Counter
Yackolley Amoussou-Guenou, Lionel Beltrando, Maurice Herlihy, Maria Potop-Butucaru
Foundations
Byzantine Reliable Broadcast is one of the most popular communication primitives in distributed systems. Byzantine reliable broadcast ensures that processes agree to deliver a message from an initiator even if some processes (perhaps including the initiator) are Byzantine. In asynchronous settings it is known since the prominent work of Bracha [Bracha87] that Byzantine reliable broadcast can be implemented deterministically if $n \geq 3t+1$ where $t$ is an upper bound on the...
Constant-Cost Batched Partial Decryption in Threshold Encryption
Sora Suegami, Shinsaku Ashizawa, Kyohei Shibano
Cryptographic protocols
Threshold public key encryption schemes distribute secret keys among multiple parties, known as the committee, to reduce reliance on a single trusted entity.
However, existing schemes face inefficiencies as the committee should perform computation and communication for decryption of each individual ciphertext.
As the number of ciphertexts being decrypted per unit of time increases, this can limit the number of committee parties and their decentralization due to increased hardware...
Client-Efficient Online-Offline Private Information Retrieval
Hoang-Dung Nguyen, Jorge Guajardo, Thang Hoang
Cryptographic protocols
Private Information Retrieval (PIR) permits clients to query entries from a public database hosted on untrusted servers in a privacy-preserving manner. Traditional PIR model suffers from high computation and/or bandwidth cost due to entire database processing for privacy. Recently, Online-Offline PIR (OO-PIR) has been suggested to improve the practicality of PIR, where query-independent materials are precomputed beforehand to accelerate online access. While state-of-the-art OO-PIR schemes...
Automated Generation of Fault-Resistant Circuits
Nicolai Müller, Amir Moradi
Implementation
Fault Injection (FI) attacks, which involve intentionally introducing faults into a system to cause it to behave in an unintended manner, are widely recognized and pose a significant threat to the security of cryptographic primitives implemented in hardware, making fault tolerance an increasingly critical concern. However, protecting cryptographic hardware primitives securely and efficiently, even with well-established and documented methods such as redundant computation, can be a...
Private Computations on Streaming Data
Vladimir Braverman, Kevin Garbe, Eli Jaffe, Rafail Ostrovsky
Cryptographic protocols
We present a framework for privacy-preserving streaming algorithms which combine the memory-efficiency of streaming algorithms with strong privacy guarantees. These algorithms enable some number of servers to compute aggregate statistics efficiently on large quantities of user data without learning the user's inputs. While there exists limited prior work that fits within our model, our work is the first to formally define a general framework, interpret existing methods within this general...
Secure Implementation of SRAM PUF for Private Key Generation
Raja Adhithan Radhakrishnan
Implementation
This paper endeavors to securely implement a Physical Unclonable
Function (PUF) for private data generation within Field-Programmable
Gate Arrays (FPGAs). SRAM PUFs are commonly utilized due to their
use of memory devices for generating secret data, particularly in resource constrained devices. However, their reliance on memory access poses side-channel threats such as data remanence decay and memory-based attacks, and the time required to generate secret data is significant. To address...
A New Hash-based Enhanced Privacy ID Signature Scheme
Liqun Chen, Changyu Dong, Nada El Kassem, Christopher J.P. Newton, Yalan Wang
Cryptographic protocols
The elliptic curve-based Enhanced Privacy ID (EPID) signature scheme is broadly used for hardware enclave attestation by many platforms that implement Intel Software Guard Extensions (SGX) and other devices. This scheme has also been included in the Trusted Platform Module (TPM) specifications and ISO/IEC standards. However, it is insecure against quantum attackers. While research into quantum-resistant EPID has resulted in several lattice-based schemes, Boneh et al. have initiated the study...
Vision Mark-32: ZK-Friendly Hash Function Over Binary Tower Fields
Tomer Ashur, Mohammad Mahzoun, Jim Posen, Danilo Šijačić
Implementation
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
A Complete Beginner Guide to the Number Theoretic Transform (NTT)
Ardianto Satriawan, Rella Mareta, Hanho Lee
Foundations
The Number Theoretic Transform (NTT) is a powerful mathematical tool that has become increasingly important in developing Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE). Its ability to efficiently calculate polynomial multiplication using the convolution theorem with a quasi-linear complexity $O(n \log{n})$ instead of $O(n^2)$ when implemented with Fast Fourier Transform-style algorithms has made it a key component in modern cryptography. FFT-style NTT algorithm or fast-NTT...
Fast Parallelizable Misuse-Resistant Authenticated Encryption: Low Latency (Decryption-Fast) SIV
Mustafa Khairallah
Secret-key cryptography
MRAE security is an important goal for many AEAD applications where the nonce uniqueness cannot be maintained and security risks are significant. However, MRAE schemes can be quite expensive. Two of the SoTA MRAE-secure schemes; Deoxys-II and AES-GCM-SIV rely on internal parallelism and special instructions to achieve competitive performance. However, they both suffer from the same bottleneck, they have at least one call to the underlying primitive that cannot be parallelized to any other...
Number-Theoretic Transform Architecture for Fully Homomorphic Encryption from Hypercube Topology
Jingwei Hu, Yuhong Fang, Wangchen Dai
Implementation
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes.
The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
Tight ZK CPU: Batched ZK Branching with Cost Proportional to Evaluated Instruction
Yibin Yang, David Heath, Carmit Hazay, Vladimir Kolesnikov, Muthuramakrishnan Venkitasubramaniam
Cryptographic protocols
We explore Zero-Knowledge proofs (ZKP) of statements expressed as programs written in high-level languages, e.g., C or assembly. At the core of executing such programs in ZK is the repeated evaluation of a CPU step, achieved by branching over the CPU’s instruction set. This approach is general and covers traversal-execution of a program’s control flow graph (CFG): here CPU instructions are straight-line program fragments (of various sizes) associated with the CFG nodes. This highlights the...
Fastcrypto: Pioneering Cryptography Via Continuous Benchmarking
Kostas Kryptos Chalkias, Jonas Lindstrøm, Deepak Maram, Ben Riva, Arnab Roy, Alberto Sonnino, Joy Wang
Implementation
In the rapidly evolving fields of encryption and blockchain technologies, the efficiency and security of cryptographic schemes significantly impact performance. This paper introduces a comprehensive framework for continuous benchmarking in one of the most popular cryptography Rust libraries, fastcrypto. What makes our analysis unique is the realization that automated benchmarking is not just a performance monitor and optimization tool, but it can be used for cryptanalysis and innovation...
Threshold implementations of cryptographic functions between finite Abelian groups
Enrico Piccione
Implementation
Side-channel attacks pose a significant threat to the security of cryptographic hardware implementations and Threshold Implementation (TI) is a well-established countermeasure to mitigate those attacks. In 2023, Piccione et al. proposed a general construction of (first-order) TIs that is universal for S-boxes that are bijective vectorial Boolean function (functions from a binary vector space $\mathbb{F}_{2}^n$ into itself). This paper presents a novel approach to TI by addressing a broader...
Generalized Feistel Ciphers for Efficient Prime Field Masking - Full Version
Lorenzo Grassi, Loïc Masure, Pierrick Méaux, Thorben Moos, François-Xavier Standaert
Secret-key cryptography
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
Plan your defense: A comparative analysis of leakage detection methods on RISC-V cores
Konstantina Miteloudi, Asmita Adhikary, Niels van Drueten, Lejla Batina, Ileana Buhan
Applications
Hardening microprocessors against side-channel attacks is a critical aspect of ensuring their security. A key step in this process is identifying and mitigating “leaky" hardware modules, which leak information during the execution of cryptographic algorithms.
In this paper, we explore how different leakage detection methods, the Side-channel Vulnerability Factor (SVF) and the Test Vector Leakage Assessment (TVLA), contribute to hardening of microprocessors. We conduct experiments on two...
Accelerating SLH-DSA by Two Orders of Magnitude with a Single Hash Unit
Markku-Juhani O. Saarinen
Implementation
We report on efficient and secure hardware implementation techniques for the FIPS 205 SLH-DSA Hash-Based Signature Standard. We demonstrate that very significant overall performance gains can be obtained from hardware that optimizes the padding formats and iterative hashing processes specific to SLH-DSA. A prototype implementation, SLotH, contains Keccak/SHAKE, SHA2-256, and SHA2-512 cores and supports all 12 parameter sets of SLH-DSA. SLotH also supports side-channel secure PRF computation...
Combined Threshold Implementation
Jakob Feldtkeller, Jan Richter-Brockmann, Pascal Sasdrich, Tim Güneysu
Implementation
Physical security is an important aspect of devices for which an adversary can manipulate the physical execution environment. Recently, more and more attention has been directed towards a security model that combines the capabilities of passive and active physical attacks, i.e., an adversary that performs fault-injection and side-channel analysis at the same time. Implementing countermeasures against such a powerful adversary is not only costly but also requires the skillful combination of...
Leakage-Tolerant Circuits
Yuval Ishai, Yifan Song
Foundations
A leakage-resilient circuit for $f:\{0,1\}^n\to\{0,1\}^m$ is a randomized Boolean circuit $C$ mapping a randomized encoding of an input $x$ to an encoding of $y=f(x)$, such that applying any leakage function $L\in \cal L$ to the wires of $C$ reveals essentially nothing about $x$. A leakage-tolerant circuit achieves the stronger guarantee that even when $x$ and $y$ are not protected by any encoding, the output of $L$ can be simulated by applying some $L'\in \cal L$ to $x$ and $y$ alone....
Exploring the Advantages and Challenges of Fermat NTT in FHE Acceleration
Andrey Kim, Ahmet Can Mert, Anisha Mukherjee, Aikata Aikata, Maxim Deryabin, Sunmin Kwon, HyungChul Kang, Sujoy Sinha Roy
Implementation
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation.
We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Hardware Acceleration of the Prime-Factor and Rader NTT for BGV Fully Homomorphic Encryption
David Du Pont, Jonas Bertels, Furkan Turan, Michiel Van Beirendonck, Ingrid Verbauwhede
Implementation
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
HiSE: Hierarchical (Threshold) Symmetric-key Encryption
Pousali Dey, Pratyay Mukherjee, Swagata Sasmal, Rohit Sinha
Cryptographic protocols
Threshold symmetric encryption (TSE), introduced by Agrawal et al. [DiSE, CCS 2018], provides scalable and decentralized solution for symmetric encryption by ensuring that the secret-key stays distributed at all times. They avoid having a single point of attack or failure, while achieving the necessary security requirements. TSE was further improved by Christodorescu et al. [ATSE, CCS 2021] to support an amortization feature which enables a “more privileged” client to encrypt records in bulk...
Prime Masking vs. Faults - Exponential Security Amplification against Selected Classes of Attacks
Thorben Moos, Sayandeep Saha, François-Xavier Standaert
Implementation
Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), then...
HADES: Automated Hardware Design Exploration for Cryptographic Primitives
Fabian Buschkowski, Georg Land, Jan Richter-Brockmann, Pascal Sasdrich, Tim Güneysu
Implementation
While formal constructions for cryptographic schemes have steadily evolved and emerged over the past decades, the design and implementation of efficient and secure hardware instances is still a mostly manual, tedious, and intuition-driven process. With the increasing complexity of modern cryptography, e.g., Post-Quantum Cryptography (PQC) schemes, and consideration of physical implementation attacks, e.g., Side-Channel Analysis (SCA), the design space often grows exorbitantly without...
Perceived Information Revisited II: Information-Theoretical Analysis of Deep-Learning Based Side-Channel Attacks
Akira Ito, Rei Ueno, Naofumi Homma
Attacks and cryptanalysis
Previous studies on deep-learning-based side-channel attacks (DL-SCAs) have shown that traditional performance evaluation metrics commonly used in DL, like accuracy and F1 score, are not effective in evaluating DL-SCA performance. Therefore, some previous studies have proposed new alternative metrics for evaluating the performance of DL-SCAs. Notably, perceived information (PI) and effective perceived information (EPI) are major metrics based on information theory. While it has been...
X2X: Low-Randomness and High-Throughput A2B and B2A Conversions for $d+1$ shares in Hardware
Quinten Norga, Jan-Pieter D'Anvers, Suparna Kundu, Ingrid Verbauwhede
Implementation
The conversion between arithmetic and Boolean masking representations (A2B \& B2A) is a crucial component for side-channel resistant implementations of lattice-based (post-quantum) cryptography. In this paper, we first propose novel $d$-order algorithms for the secure addition (SecADDChain$_q$) and B2A (B2X2A). Our secure adder is well-suited for repeated ('chained') executions, achieved through an improved method for repeated masked modular reduction. The optimized B2X2A gadget removes a...
Improved All-but-One Vector Commitment with Applications to Post-Quantum Signatures
Dung Bui, Kelong Cong, Cyprien Delpech de Saint Guilhem
Public-key cryptography
Post-quantum digital signature schemes have recently received increased attention due to the NIST standardization project for additional signatures. MPC-in-the-Head and VOLE-in-the-Head are general techniques for constructing such signatures from zero-knowledge proof systems. A common theme between the two is an all-but-one vector commitment scheme which internally uses GGM trees. This primitive is responsible for a significant part of the computational time during signing and...
OBSCURE: Versatile Software Obfuscation from a Lightweight Secure Element
Darius Mercadier, Viet Sang Nguyen, Matthieu Rivain, Aleksei Udovenko
Applications
Software obfuscation is a powerful tool to protect the intellectual property or secret keys inside programs. Strong software obfuscation is crucial in the context of untrusted execution environments (e.g., subject to malware infection) or to face potentially malicious users trying to reverse-engineer a sensitive program. Unfortunately, the state-of-the-art of pure software-based obfuscation (including white-box cryptography) is either insecure or infeasible in practice.
This work...
SDitH in Hardware
Sanjay Deshpande, James Howe, Jakub Szefer, Dongze Yue
Implementation
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction.
This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
CrISA-X: Unleashing Performance Excellence in Lightweight Symmetric Cryptography for Extendable and Deeply Embedded Processors
Oren Ganon, Itamar Levi
Implementation
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
A Low-Latency High-Order Arithmetic to Boolean Masking Conversion
Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu
Implementation
Masking, an effective countermeasure against side-channel attacks, is commonly applied in modern cryptographic implementations. Considering cryptographic algorithms that utilize both Boolean and arithmetic masking, the conversion algorithm between arithmetic masking and Boolean masking is required. Conventional high-order arithmetic masking to Boolean masking conversion algorithms based on Boolean circuits suffer from performance overhead, especially in terms of hardware implementation. In...
Efficient Hardware Implementation for Maiorana-McFarland type Functions
Anupam Chattopadhyay, Subhamoy Maitra, Bimal Mandal, Manmatha Roy, Deng Tang
Secret-key cryptography
Maiorana--McFarland type constructions are basically concatenating the truth tables of linear functions on a smaller number of variables to obtain highly nonlinear ones on larger inputs. Such functions and their different variants have significant cryptology and coding theory applications. The straightforward hardware implementation of such functions using decoders (Khairallah et al., WAIFI 2018; Tang et al., SIAM Journal on Discrete Mathematics, 2019) requires exponential resources on the...
Evaluating the security of CRYSTALS-Dilithium in the quantum random oracle model
Kelsey A. Jackson, Carl A. Miller, Daochen Wang
Public-key cryptography
In the wake of recent progress on quantum computing hardware, the National Institute of Standards and Technology (NIST) is standardizing cryptographic protocols that are resistant to attacks by quantum adversaries. The primary digital signature scheme that NIST has chosen is CRYSTALS-Dilithium. The hardness of this scheme is based on the hardness of three computational problems: Module Learning with Errors (MLWE), Module Short Integer Solution (MSIS), and SelfTargetMSIS. MLWE and MSIS have...
Barrett Multiplication for Dilithium on Embedded Devices
Vincent Hwang, YoungBeom Kim, Seog Chung Seo
Implementation
We optimize the number-theoretic transforms (NTTs) in Dilithium — a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST) — on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett...
Holepunch: Fast, Secure File Deletion with Crash Consistency
Zachary Ratliff, Wittmann Goh, Abe Wieland, James Mickens, Ryan Williams
Cryptographic protocols
A file system provides secure deletion if, after a file is deleted, an attacker with physical possession of the storage device cannot recover any data from the deleted file. Unfortunately, secure deletion is not provided by commodity file systems. Even file systems which explicitly desire to provide secure deletion are challenged by the subtleties of hardware controllers on modern storage devices; those controllers obscure the mappings between logical blocks and physical blocks, silently...
Succinct non-interactive arguments of knowledge (SNARKs) enable a prover to produce a short and efficiently verifiable proof of the validity of an arbitrary NP statement. Recent constructions of efficient SNARKs have led to interest in using them for a wide range of applications, but unfortunately, deployment of SNARKs in these applications faces a key bottleneck: SNARK provers require a prohibitive amount of time and memory to generate proofs for even moderately large statements. While...
Folding schemes (Kothapalli et al., CRYPTO 2022) are a conceptually simple, yet powerful cryptographic primitive that can be used as a building block to realise incrementally verifiable computation (IVC) with low recursive overhead without general-purpose non-interactive succinct arguments of knowledge (SNARK). Most folding schemes known rely on the hardness of the discrete logarithm problem, and thus are both not quantum-resistant and operate over large prime fields. Existing post-quantum...
Voltage fault injection attacks are a particularly powerful threat to secure embedded devices because they exploit brief, hard-to-detect power fluctuations causing errors or bypassing security mechanisms. To counter these attacks, various detectors are employed, but as defenses strengthen, increasingly elusive glitches continue to emerge. Artificial intelligence, with its inherent ability to learn and adapt to complex patterns, presents a promising solution. This research presents an...
Fully Homomorphic Encryption (FHE) enables privacy-preserving computation but imposes significant computational and communication overhead on the client for the public-key encryption. To alleviate this burden, previous works have introduced the Hybrid Homomorphic Encryption (HHE) paradigm, which combines symmetric encryption with homomorphic decryption to enhance performance for the FHE client. While early HHE schemes focused on binary data, modern versions now support integer prime fields,...
Zero-knowledge proofs (ZKPs) are cryptographic protocols that enable one party to prove the validity of a statement without revealing the underlying data. Such proofs have applications in privacy-preserving technologies and verifiable computations. However, slow proof generation poses a significant challenge in the wide-scale adoption of ZKP. Orion is a recent ZKP scheme with linear prover time. It leverages coding theory, expander graphs, and Merkle hash trees to improve computational...
Despite the emergence of Large Language Models (LLMs) as potential tools for automating hardware design, the optimal programming language to describe hardware functions remains unknown. Prior works extensively explored optimizing Verilog-based HDL design, which often overlooked the potential capabilities of alternative programming languages for hardware designs. This paper investigates the efficacy of C++ and Verilog as input languages in extensive application space exploration, tasking an...
Implementation-security vulnerabilities such as the power-based side-channel leakage and fault-injection sensitivity of a secure chip are hard to verify because of the sophistication of the measurement setup, as well as the need to generalize the adversary into a test procedure. While the literature has proposed a wide range of vulnerability metrics to test the correctness of a secure implementation, it is still up to the subject-matter expert to map these concepts into a working and...
In this paper, we present a comprehensive analysis of various modular multiplication methods for Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature, and implement particular methods on FPGA. Specifically Word-Level Montgomery...
FHE enables computations on encrypted data, making it essential for privacy-preserving applications. However, it involves computationally demanding tasks, such as polynomial multiplication, while NTT is the state-of-the-art solution to perform this task. Most FHE schemes operate over the negacyclic ring of polynomials. We introduce a novel formulation of the hierarchical Four-Step NTT approach for the negacyclic ring, eliminating the need for pre- and post-processing steps found in the...
In this work, we propose the first hardware implementation of Classic McEliece protected with countermeasures against Side-Channel Attacks (SCA) and Fault Injection Attacks (FIA). Classic Mceliece is one of the leading candidates for Key Encapsulation Mechanisms (KEMs) in the ongoing round 4 of the NIST standardization process for post-quantum cryptography. In particular, we implement a range of generic countermeasures against SCA and FIA, particularly protected the vulnerable operations...
The Multi-Scalar Multiplication (MSM) is the main barrier to accelerating Zero-Knowledge applications. In recent years, hardware acceleration of this algorithm on both FPGA and GPU has become a popular research topic and the subject of a multi-million dollar prize competition (ZPrize). This work presents OPTIMSM: Optimized Processing Through Iterative Multi-Scalar Multiplication. This novel accelerator focuses on the acceleration of the MSM algorithm for any Elliptic Curve (EC) by improving...
We present the protected hardware implementation of the Module-Lattice-Based Digital Signature Standard (ML-DSA). ML-DSA is an extension of Dilithium 3.1, which is the winner of the Post Quantum Cryptography (PQC) competition in the digital signature category. The proposed design is based on the existing high-performance Dilithium 3.1 design. We implemented existing Dilithium masking gadgets in hardware, which were only implemented in software. The masking gadgets are integrated with the...
Service discovery is essential in wireless communications. However, existing service discovery protocols provide no or very limited privacy protection for service providers and clients, and they often leak sensitive information (e.g., service type, client’s identity and mobility pattern), which leads to various network-based attacks (e.g., spoofing, man-in-the-middle, identification and tracking). In this paper, we propose a private service discovery protocol, called PriSrv, which allows a...
In aggregation queries, predicate parameters often reveal user intent. Protecting these parameters is critical for user privacy, regardless of whether the database is public or private. While most existing works focus on private data settings, we address a public data setting where the server has access to the database. Current solutions for this setting either require additional setups (e.g., noncolluding servers, hardware enclaves) or are inefficient for practical workloads. Furthermore,...
One of the main candidates of post-quantum cryptography is lattice-based cryptography. Its cryptographic security against quantum attackers is based on the worst-case hardness of lattice problems like the shortest vector problem (SVP), which asks to find the shortest non-zero vector in an integer lattice. Asymptotic quantum speedups for solving SVP are known and rely on Grover's search. However, to assess the security of lattice-based cryptography against these Grover-like quantum speedups,...
We improve the performance of lattice-based cryptosystems Dilithium on Cortex-M3 with expensive multiplications. Our contribution is two-fold: (i) We generalize Barrett multiplication and show that the resulting shape-independent modular multiplication performs comparably to long multiplication on some platforms without special hardware when precomputation is free. We call a modular multiplication “shape-independent” if its correctness and efficiency depend only on the magnitude of moduli...
An involution is a permutation that is the inverse of itself. Involutions have attracted plenty attentions in cryptographic community due to their advantage regarding hardware implementations. In this paper, we reconsider constructing {\it pseudorandom involutions}. We demonstrate two constructions. First, the 4-round Feistel network {\it using the same random function (Feistel-SF) in every round} is a pseudorandom involution. This shows the Feistel-SF construction still provides...
Masking schemes are key in thwarting side-channel attacks due to their robust theoretical foundation. Transitioning from Boolean to arithmetic (B2A) masking is a necessary step in various cryptography schemes, including hash functions, ARX-based ciphers, and lattice-based cryptography. While there exists a significant body of research focusing on B2A software implementations, studies pertaining to hardware implementations are quite limited, with the majority dedicated solely to creating...
Speed efficiency, memory optimization, and quantum resistance are essential for safeguarding the performance and security of cloud computing environments. Fully Homomorphic Encryption (FHE) addresses this need by enabling computations on encrypted data without requiring decryption, thereby maintaining data privacy. Additionally, lattice-based FHE is quantum secure, providing defense against potential quantum computer attacks. However, the performance of current FHE schemes remains...
Authenticated encryption (AE) is a cryptographic mechanism that allows communicating parties to protect the confidentiality and integrity of messages exchanged over a public channel, provided they share a secret key. In this work, we present new AE schemes leveraging the SHA-3 standard functions SHAKE128 and SHAKE256, offering 128 and 256 bits of security strength, respectively, and their “Turbo” counterparts. They support session-based communication, where a ciphertext authenticates the...
HEonGPU is a high-performance library designed to optimize Fully Homomorphic Encryption (FHE) operations on Graphics Processing Unit (GPU). By leveraging the parallel processing capac- ity of GPUs, HEonGPU significantly reduces the computational overhead typically associated with FHE by executing complex operation concurrently. This allows for faster execution of homomorphic computations on encrypted data, enabling real-time applications in privacy-preserving machine learn- ing and secure...
In scenarios where a seller holds sensitive data $x$, like employee / patient records or ecological data, and a buyer seeks to obtain an evaluation of specific function $f$ on this data, solutions in trustless digital environments like blockchain-based Web3 systems typically fall into two categories: (1) Smart contract-powered solutions and (2) cryptographic solutions leveraging tools such as adaptor signatures. The former approach offers atomic transactions where the buyer learns the...
Datasets of side-channel leakage measurements are widely used in research to develop and benchmarking side-channel attack and evaluation methodologies. Compared to using custom and/or one-off datasets, widely-used and publicly available datasets improve research reproducibility and comparability. Further, performing high-quality measurements requires specific equipment and skills, while also taking a significant amount of time. Therefore, using publicly available datasets lowers the barriers...
True Random Number Generators (TRNGs) and Physically Unclonable Functions (PUFs) are critical hardware primitives for cryptographic systems, providing randomness and device-specific security. TRNGs require complete randomness, while PUFs rely on consistent, device-unique responses. In this work, both primitives are implemented on a System-on-Chip Field-Programmable Gate Array (SoC FPGA), leveraging the integrated Phase-Locked Loops (PLLs) for robust entropy generation in PLLbased TRNGs. A...
The RISC-V Vector Cryptography Extensions (Zvk) were ratified in 2023 and integrated into the main ISA manuals in 2024. These extensions support high-speed symmetric cryptography (AES, SHA2, SM3, SM4) operating on the vector register file and offer significant performance improvements over scalar cryptography extensions (Zk) due to data parallelism. As a ratified extension, Zvk is supported by compiler toolchains and is already being integrated into popular cryptographic middleware such as...
We propose a wallet provider issued attestation called Wallet Trust Evidence (WTE) and three related specific instructions for the European Digital Identity (EUDI) Wallet cryptographic hardware, most notably the generation of a Proof of Association (PoA). These allow the EUDI Wallet providing verifiable assurance to third parties (issuers, relying parties) that attestation private keys are not only bound to conformant cryptographic hardware but also that they are bound to the same such...
Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, etc., such variations in the underlying hard problem are very useful for designers...
With the increasing spread of fake videos for misinformation, proving the provenance of an edited video (without revealing the original one) becomes critical. To this end, we introduce Eva, the first cryptographic protocol for authenticating lossy-encoded videos. Compared to previous cryptographic methods for image authentication, Eva supports significantly larger amounts of data that undergo complex transformations during encoding. We achieve this by decomposing repetitive and manageable...
AES implementation has been vastly analysed against side-channel attacks in the last two decades particularly targeting resource-constrained microcontrollers. Still, less research has been conducted on AES implementations on advanced hardware platforms. In this study, we examine the resilience of AES on an ARM Cortex A72 processor within the Raspberry Pi 4B model. Unlike their microcontroller counterparts, these platforms operate within the complex ecosystem of an operating system (OS),...
Secure elements are small microcontrollers whose main purpose is to generate/store secrets and then execute cryptographic operations. They undergo the highest level of security evaluations that exists (Common Criteria) and are often considered inviolable, even in the worst-case attack scenarios. Hence, complex secure systems build their security upon them. FIDO hardware tokens are strong authentication factors to sign in to applications (any web service supporting FIDO); they often embed...
A Trusted Execution Environment (TEE) is a new type of security technology, implemented by CPU manufacturers, which guarantees integrity and confidentiality on a restricted execution environment to any remote verifier. TEEs are deployed on various consumer and commercial hardwareplatforms, and have been widely adopted as a component in the design of cryptographic protocols both theoretical and practical. Within the provable security community, the use of TEEs as a setup assumption has...
This short note describes an update to the sca25519 library, an ECC implementation computing the X25519 key-exchange protocol on the Arm Cortex-M4 microcontroller. The sca25519 software came with extensive mitigations against various side-channel and fault attacks and was, to our best knowledge, the first to claim affordable protection against multiple classes of attacks that are motivated by distinct real-world application scenarios. This library is protected against various passive and...
Framed within the general context of cyber-security, standard cryptographic constructions often represent an enabling technology for associated solutions. Alongside or in combination with their design, therefore, the implementation of such constructions is an important challenge: beyond delivering artefacts that are usable in practice, implementation can impact many quality metrics (such as efficiency and security) which determine fitness-for-purpose. A rich design space of implementation...
Secure multi-party computation (MPC) enables multiple distrusting parties to jointly compute a function while keeping their inputs private. Computing the AES block cipher in MPC, where the key and/or the input are secret-shared among the parties is important for various applications, particularly threshold cryptography. In this work, we propose a family of dedicated, high-performance MPC protocols to compute the non-linear S-box part of AES in the honest majority setting. Our protocols...
Side-channel attacks (SCAs) remain a significant threat to the security of cryptographic systems in modern embedded devices. Even mathematically secure cryptographic algorithms, when implemented in hardware, inadvertently leak information through physical side-channel signatures such as power consumption, electromagnetic (EM) radiation, light emissions, and acoustic emanations. Exploiting these side channels significantly reduces the attacker’s search space. In recent years, physical...
This paper presents a survey of the state-of-the-art pre-silicon security verification techniques for System-on-Chip (SoC) designs, focusing on ensuring that designs, implemented in hardware description languages (HDLs) and synthesized circuits, meet security requirements before fabrication in semiconductor foundries. Due to several factors, pre-silicon security verification has become an essential yet challenging aspect of the SoC hardware lifecycle. The modern SoC design process often...
Masking has become a widely applied and heavily researched method to protect cryptographic implementations against SCA attacks. The success of masking is primarily attributed to its strong theoretical foundation enabling it to formally prove security by modeling physical properties through so-called probing models. Specifically, the robust $d$-probing model enables us to prove the security for arbitrarily masked hardware circuits, manually or with the assistance of automated tools, even when...
The Extended Greatest Common Divisor (XGCD) computation is a critical component in various cryptographic applications and algorithms, including both pre- and post-quantum cryptosystems. In addition to computing the greatest common divisor (GCD) of two integers, the XGCD also produces Bezout coefficients $b_a$ and $b_b$ which satisfy $\mathrm{GCD}(a,b) = a\times b_a + b\times b_b$. In particular, computing the XGCD for large integers is of significant interest. Most recently, XGCD computation...
Considering side-channel analysis (SCA) security for cryptographic devices, the mitigation of electromagnetic leakage and electromagnetic interference (EMI) between modules poses significant challenges. This paper presents a comprehensive review and deep analysis of the utilization of EMI shielding materials, devised for reliability purposes and standards such as EMI/EMC, as a countermeasure to enhance EM-SCA security. We survey the current landscape of EMI-shields materials, including...
This paper introduces the Koala PRF, which maps a variable-length sequence of $64$-bit input blocks to a single $257$-bit output block. Its design focuses on achieving low latency in its implementation in ASIC. To construct Koala, we instantiate the recently introduced Kirby construction with the Koala-P permutation and add an input encoding layer. The Koala-P permutation is obtained as the $8$-fold iteration of a simple round function inspired by that of Subterranean. Based on...
Making the most of TFHE programmable bootstrapping to evaluate functions or operators otherwise challenging to perform with only the native addition and multiplication of the scheme is a very active line of research. In this paper, we systematize this approach and apply it to build an 8-bit FHE processor abstraction, i.e., a software entity that works over FHE-encrypted 8-bits data and presents itself to the programmer by means of a conventional-looking assembly instruction set. In...
The rapid evolution of post-quantum cryptography, spurred by standardization efforts such as those led by NIST, has highlighted the prominence of lattice-based cryptography, notably exemplified by CRYSTALS-Kyber. However, concerns persist regarding the security of cryptographic implementations, particularly in the face of Side-Channel Attacks (SCA). The usage of operations like the Number Theoretic Transform (NTT) in CRYSTALS-Kyber introduces vulnerabilities to SCA, especially single-trace...
This paper presents extensions to the OpenTitan hardware root of trust that aim at enabling high-performance lattice-based cryptography. We start by carefully optimizing ML-KEM and ML-DSA - the two primary algorithms selected by NIST for standardization - in software targeting the OTBN accelerator. Based on profiling results of these implementations, we propose tightly integrated extensions to OTBN, specifically an interface from OTBN to OpenTitan's Keccak accelerator (KMAC core) and...
In this paper, using the concept of equivalence of mappings we characterize all of the one-XOR matrices which are used in hardware applications and propose a family of lightweight linear mappings for software-oriented applications in symmetric cryptography. Then, we investigate interleaved linear mappings and based upon this study, we present generalized dynamic primitive LFSRs along with dynamic linear components for construction of diffusion layers. From the mathematical...
Side-channel-analysis (SCA) resistance with cost optimization in AES hardware implementations remains a significant challenge. While traditional masking-based schemes offer provable security, they often incur substantial resource overheads (latency, area, randomness, performance, power consumption). Alternatively, the RAMBAM scheme introduced a redundancy-based approach to control the signal-to-noise ratio, and achieves exponential leakage reduction as redundancy increases. This method...
In this note, we introduce the MATTER Tweakable Block Cipher, designed principally for low latency in low-area hardware implementations, but that can also be implemented in an efficient and compact way in software. MATTER is a 512-bit wide balanced Feistel network with three to six rounds, using the ASCON permutation as the round function. The Feistel network defines a keyed, non-tweakable core, which is made tweakable by using the encryption of the tweak as its key. Key and tweak are...
Resource-constrained devices such as wireless sensors and Internet of Things (IoT) devices have become ubiquitous in our digital ecosystem. These devices generate and handle a major part of our digital data. In the face of the impending threat of quantum computers on our public-key infrastructure, it is impossible to imagine the security and privacy of our digital world without integrating post-quantum cryptography (PQC) into these devices. Usually, due to the resource constraints of these...
Oblivious Transfer (OT) is a fundamental cryptographic primitive, becoming a crucial component of a practical secure protocol. OT is typically implemented in software, and one way to accelerate its running time is by using hardware implementations. However, such implementations are vulnerable to side-channel attacks (SCAs). On the other hand, protecting interactive protocols against SCA is highly challenging because of their longer secrets (which include inputs and randomness), more...
This paper proposes a fast and efficient FPGA-based hardware-software co-design for the supersingular isogeny key encapsulation (SIKE) protocol controlled by a custom RISC-V processor. Firstly, we highly optimize the core unit, the polynomial-based field arithmetic logic unit (FALU), with the proposed fast convolution-like multiplier (FCM) to significantly reduce the resource consumption while still maintaining low latency and constant time for all the four SIKE parameters. Secondly, we pack...
This paper proposes Oryx, a system for efficiently detecting cycles in federated graphs where parts of the graph are held by different parties and are private. Cycle detection is an important building block in designing fraud detection algorithms that operate on confidential transaction data held by different financial institutions. Oryx allows detecting cycles of various length while keeping the topology of the graphs secret, and it does so efficiently; Oryx achieves quasilinear...
Oblivious Transfer (OT) is at the heart of secure computation and is a foundation for many applications in cryptography. Over two decades of work have led to extremely efficient protocols for evaluating OT instances in the preprocessing model, through a paradigm called OT extension. A few OT instances generated in an offline phase can be used to perform many OTs in an online phase efficiently, i.e., with very low communication and computational overheads. Specifically, traditional OT...
We reduce the number of bit operations required to implement AES to a new minimum, and also compute improvements to elements of some other ciphers. Exploring the algebra of AES allows choices of basis and streamlining of the nonlinear parts. We also compute a more efficient implementation of the linear part of each round. Similar computational optimizations apply to other cryptographic matrices and S-boxes. This work may be incorporated into a hardware AES implementation using minimal...
Garbled circuits (GC) are a secure multiparty computation protocol that enables two parties to jointly compute a function using their private data without revealing it to each other. While garbled circuits are proven secure at the protocol level, implementations can still be vulnerable to side-channel attacks. Recently, side-channel analysis of GC implementations has garnered significant interest from researchers. We investigate popular open-source GC frameworks and discover that the AES...
The RNS variant of the CKKS scheme (SAC 2018) is widely implemented due to its computational efficiency. However, the current optimized implementations of the RNS-CKKS scheme have a limitation when choosing the ciphertext modulus. It requires the scale factors to be approximately equal to a factor (or a product of factors) of the ciphertext modulus. This restriction causes inefficiency when the scale factor is not close to the power of the machine's word size, wasting the machine's...
In this work, we construct a second price (Vickrey) auction protocol (SPA), which does not require any auctioneers and ensures total privacy in the presence of rational parties participating in auction. In particular, the confidentiality of the highest bid and the identity of the second highest bidder are protected. We model the bidders participating in the second price auction as rational, computationally bounded and privacy-sensitive parties. These are self-interested agents who care about...
In this work, we present various hardware implementations for the lightweight cipher ASCON, which was recently selected as the winner of the NIST organized Lightweight Cryptography (LWC) competition. We cover encryption + tag generation and decryption + tag verification for the ASCON AEAD and also the ASCON hash function. On top of the usual (unprotected) implementation, we present side-channel protection (threshold countermeasure) and triplication/majority-based fault protection. To the...
Polynomials play a central role in cryptography. In the context of Zero Knowledge Proofs (ZKPs), protocols can be exclusively expressed using polynomials, making them a powerful abstraction tool, as demonstrated in most ZK research papers. Our first contribution is a high-level framework that enables practitioners to implement ZKPs in a more natural way, based solely on polynomial primitives. ZK provers are considered computationally intensive algorithms with a high degree of...
The Learning with Errors problem (LWE) and its variants are among the most popular assumptions underlying lattice-based cryptography. The Learning with Rounding problem (LWR) can be thought of as a deterministic variant of LWE. While lattice-based cryptography is known to enable many advanced constructions, constructing Fully Homomorphic Encryption schemes based on LWR remains an under-explored part of the literature. In this work, we present a thorough study of Somewhat Homomorphic...
Software based cryptographic implementations provide flexibility but they face performance limitations. In contrast, hardware based cryptographic accelerators utilize application-specific customization to provide real-time security solutions. Cryptographic instruction-set extensions (CISE) combine the advantages of both hardware and software based solutions to provide higher performance combined with the flexibility of atomic-level cryptographic operations. While CISE is widely used to...
We present a novel approach to small area and low-latency first-order masking in hardware. The core idea is to separate the processing of shares in time in order to achieve non-completeness. Resulting circuits are proven first-order glitch-extended PINI secure. This means the method can be straightforwardly applied to mask arbitrary functions without constraints which the designer must take care of. Furthermore we show that an implementation can benefit from optimization through EDA tools...
One of the main issues to deal with for fully homomorphic encryption is the noise growth when operating on ciphertexts. To some extent, this can be controlled thanks to a so-called gadget decomposition. A gadget decomposition typically relies on radix- or CRT-based representations to split elements as vectors of smaller chunks whose inner products with the corresponding gadget vector rebuilds (an approximation of) the original elements. Radix-based gadget decompositions present the advantage...
Masking is one of the most popular countermeasures to protect implementations against power and electromagnetic side channel attacks, because it offers provable security. Masking has been shown secure against d-threshold probing adversaries by Ishai et al. at CRYPTO'03, but this adversary's model doesn't consider any physical hardware defaults and thus such masking schemes were shown to be still vulnerable when implemented as hardware circuits. To addressed these limitations glitch-extended...
We present the first comprehensive approach for detecting and analyzing symmetric cryptographic primitives in gate-level descriptions of hardware. To capture both ASICs and FPGAs, we model the hardware as a directed graph, where gates become nodes and wires become edges. For modern chips, those graphs can easily consist of hundreds of thousands of nodes. More abstractly, we find subgraphs corresponding to cryptographic primitives in a potentially huge graph, the sea-of-gates, describing an...
Despite masking being a prevalent protection against passive side-channel attacks, implementing it securely in hardware remains a manual, challenging, and error-prone process. This paper introduces INDIANA, a comprehensive security verification tool for hardware masking. It provides a hardware verification framework, enabling a complete analysis of simulation-based security in the glitch-extended probing model, with cycle-accurate estimations for leakage probabilities in the random...
Shaped prime moduli are often considered for use in elliptic curve and isogeny-based cryptography to allow for faster modular reduction. Here we focus on the most common choices for shaped primes that have been suggested, that is pseudo-Mersenne, generalized Mersenne and Montgomery-friendly primes. We consider how best to to exploit these shapes for maximum efficiency, and provide an open source tool to automatically generate, test and time working high-level language finite-field code....
Byzantine Reliable Broadcast is one of the most popular communication primitives in distributed systems. Byzantine reliable broadcast ensures that processes agree to deliver a message from an initiator even if some processes (perhaps including the initiator) are Byzantine. In asynchronous settings it is known since the prominent work of Bracha [Bracha87] that Byzantine reliable broadcast can be implemented deterministically if $n \geq 3t+1$ where $t$ is an upper bound on the...
Threshold public key encryption schemes distribute secret keys among multiple parties, known as the committee, to reduce reliance on a single trusted entity. However, existing schemes face inefficiencies as the committee should perform computation and communication for decryption of each individual ciphertext. As the number of ciphertexts being decrypted per unit of time increases, this can limit the number of committee parties and their decentralization due to increased hardware...
Private Information Retrieval (PIR) permits clients to query entries from a public database hosted on untrusted servers in a privacy-preserving manner. Traditional PIR model suffers from high computation and/or bandwidth cost due to entire database processing for privacy. Recently, Online-Offline PIR (OO-PIR) has been suggested to improve the practicality of PIR, where query-independent materials are precomputed beforehand to accelerate online access. While state-of-the-art OO-PIR schemes...
Fault Injection (FI) attacks, which involve intentionally introducing faults into a system to cause it to behave in an unintended manner, are widely recognized and pose a significant threat to the security of cryptographic primitives implemented in hardware, making fault tolerance an increasingly critical concern. However, protecting cryptographic hardware primitives securely and efficiently, even with well-established and documented methods such as redundant computation, can be a...
We present a framework for privacy-preserving streaming algorithms which combine the memory-efficiency of streaming algorithms with strong privacy guarantees. These algorithms enable some number of servers to compute aggregate statistics efficiently on large quantities of user data without learning the user's inputs. While there exists limited prior work that fits within our model, our work is the first to formally define a general framework, interpret existing methods within this general...
This paper endeavors to securely implement a Physical Unclonable Function (PUF) for private data generation within Field-Programmable Gate Arrays (FPGAs). SRAM PUFs are commonly utilized due to their use of memory devices for generating secret data, particularly in resource constrained devices. However, their reliance on memory access poses side-channel threats such as data remanence decay and memory-based attacks, and the time required to generate secret data is significant. To address...
The elliptic curve-based Enhanced Privacy ID (EPID) signature scheme is broadly used for hardware enclave attestation by many platforms that implement Intel Software Guard Extensions (SGX) and other devices. This scheme has also been included in the Trusted Platform Module (TPM) specifications and ISO/IEC standards. However, it is insecure against quantum attackers. While research into quantum-resistant EPID has resulted in several lattice-based schemes, Boneh et al. have initiated the study...
Zero-knowledge proof systems are widely used in different applications on the Internet. Among zero-knowledge proof systems, SNARKs are a popular choice because of their fast verification time and small proof size. The efficiency of zero-knowledge systems is crucial for usability, resulting in the development of so-called arithmetization-oriented ciphers. In this work, we introduce Vision Mark-32, a modified instance of Vision defined over binary tower fields, with an optimized number of...
The Number Theoretic Transform (NTT) is a powerful mathematical tool that has become increasingly important in developing Post Quantum Cryptography (PQC) and Homomorphic Encryption (HE). Its ability to efficiently calculate polynomial multiplication using the convolution theorem with a quasi-linear complexity $O(n \log{n})$ instead of $O(n^2)$ when implemented with Fast Fourier Transform-style algorithms has made it a key component in modern cryptography. FFT-style NTT algorithm or fast-NTT...
MRAE security is an important goal for many AEAD applications where the nonce uniqueness cannot be maintained and security risks are significant. However, MRAE schemes can be quite expensive. Two of the SoTA MRAE-secure schemes; Deoxys-II and AES-GCM-SIV rely on internal parallelism and special instructions to achieve competitive performance. However, they both suffer from the same bottleneck, they have at least one call to the underlying primitive that cannot be parallelized to any other...
This paper introduces a high-performance and scalable hardware architecture designed for the Number-Theoretic Transform (NTT), a fundamental component extensively utilized in lattice-based encryption and fully homomorphic encryption schemes. The underlying rationale behind this research is to harness the advantages of the hypercube topology. This topology serves to significantly diminish the volume of data exchanges required during each iteration of the NTT, reducing it to a complexity of...
We explore Zero-Knowledge proofs (ZKP) of statements expressed as programs written in high-level languages, e.g., C or assembly. At the core of executing such programs in ZK is the repeated evaluation of a CPU step, achieved by branching over the CPU’s instruction set. This approach is general and covers traversal-execution of a program’s control flow graph (CFG): here CPU instructions are straight-line program fragments (of various sizes) associated with the CFG nodes. This highlights the...
In the rapidly evolving fields of encryption and blockchain technologies, the efficiency and security of cryptographic schemes significantly impact performance. This paper introduces a comprehensive framework for continuous benchmarking in one of the most popular cryptography Rust libraries, fastcrypto. What makes our analysis unique is the realization that automated benchmarking is not just a performance monitor and optimization tool, but it can be used for cryptanalysis and innovation...
Side-channel attacks pose a significant threat to the security of cryptographic hardware implementations and Threshold Implementation (TI) is a well-established countermeasure to mitigate those attacks. In 2023, Piccione et al. proposed a general construction of (first-order) TIs that is universal for S-boxes that are bijective vectorial Boolean function (functions from a binary vector space $\mathbb{F}_{2}^n$ into itself). This paper presents a novel approach to TI by addressing a broader...
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many...
Hardening microprocessors against side-channel attacks is a critical aspect of ensuring their security. A key step in this process is identifying and mitigating “leaky" hardware modules, which leak information during the execution of cryptographic algorithms. In this paper, we explore how different leakage detection methods, the Side-channel Vulnerability Factor (SVF) and the Test Vector Leakage Assessment (TVLA), contribute to hardening of microprocessors. We conduct experiments on two...
We report on efficient and secure hardware implementation techniques for the FIPS 205 SLH-DSA Hash-Based Signature Standard. We demonstrate that very significant overall performance gains can be obtained from hardware that optimizes the padding formats and iterative hashing processes specific to SLH-DSA. A prototype implementation, SLotH, contains Keccak/SHAKE, SHA2-256, and SHA2-512 cores and supports all 12 parameter sets of SLH-DSA. SLotH also supports side-channel secure PRF computation...
Physical security is an important aspect of devices for which an adversary can manipulate the physical execution environment. Recently, more and more attention has been directed towards a security model that combines the capabilities of passive and active physical attacks, i.e., an adversary that performs fault-injection and side-channel analysis at the same time. Implementing countermeasures against such a powerful adversary is not only costly but also requires the skillful combination of...
A leakage-resilient circuit for $f:\{0,1\}^n\to\{0,1\}^m$ is a randomized Boolean circuit $C$ mapping a randomized encoding of an input $x$ to an encoding of $y=f(x)$, such that applying any leakage function $L\in \cal L$ to the wires of $C$ reveals essentially nothing about $x$. A leakage-tolerant circuit achieves the stronger guarantee that even when $x$ and $y$ are not protected by any encoding, the output of $L$ can be simulated by applying some $L'\in \cal L$ to $x$ and $y$ alone....
Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a multiplier-less number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation. We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient...
Fully Homomorphic Encryption (FHE) enables computation on encrypted data, holding immense potential for enhancing data privacy and security in various applications. Presently, FHE adoption is hindered by slow computation times, caused by data being encrypted into large polynomials. Optimized FHE libraries and hardware acceleration are emerging to tackle this performance bottleneck. Often, these libraries implement the Number Theoretic Transform (NTT) algorithm for efficient polynomial...
Threshold symmetric encryption (TSE), introduced by Agrawal et al. [DiSE, CCS 2018], provides scalable and decentralized solution for symmetric encryption by ensuring that the secret-key stays distributed at all times. They avoid having a single point of attack or failure, while achieving the necessary security requirements. TSE was further improved by Christodorescu et al. [ATSE, CCS 2021] to support an amortization feature which enables a “more privileged” client to encrypt records in bulk...
Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), then...
While formal constructions for cryptographic schemes have steadily evolved and emerged over the past decades, the design and implementation of efficient and secure hardware instances is still a mostly manual, tedious, and intuition-driven process. With the increasing complexity of modern cryptography, e.g., Post-Quantum Cryptography (PQC) schemes, and consideration of physical implementation attacks, e.g., Side-Channel Analysis (SCA), the design space often grows exorbitantly without...
Previous studies on deep-learning-based side-channel attacks (DL-SCAs) have shown that traditional performance evaluation metrics commonly used in DL, like accuracy and F1 score, are not effective in evaluating DL-SCA performance. Therefore, some previous studies have proposed new alternative metrics for evaluating the performance of DL-SCAs. Notably, perceived information (PI) and effective perceived information (EPI) are major metrics based on information theory. While it has been...
The conversion between arithmetic and Boolean masking representations (A2B \& B2A) is a crucial component for side-channel resistant implementations of lattice-based (post-quantum) cryptography. In this paper, we first propose novel $d$-order algorithms for the secure addition (SecADDChain$_q$) and B2A (B2X2A). Our secure adder is well-suited for repeated ('chained') executions, achieved through an improved method for repeated masked modular reduction. The optimized B2X2A gadget removes a...
Post-quantum digital signature schemes have recently received increased attention due to the NIST standardization project for additional signatures. MPC-in-the-Head and VOLE-in-the-Head are general techniques for constructing such signatures from zero-knowledge proof systems. A common theme between the two is an all-but-one vector commitment scheme which internally uses GGM trees. This primitive is responsible for a significant part of the computational time during signing and...
Software obfuscation is a powerful tool to protect the intellectual property or secret keys inside programs. Strong software obfuscation is crucial in the context of untrusted execution environments (e.g., subject to malware infection) or to face potentially malicious users trying to reverse-engineer a sensitive program. Unfortunately, the state-of-the-art of pure software-based obfuscation (including white-box cryptography) is either insecure or infeasible in practice. This work...
This work presents the first hardware realisation of the Syndrome-Decoding-in-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH's hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for...
The selection of a Lightweight Cryptography (LWC) algorithm is crucial for resource limited applications. The National Institute of Standards and Technology (NIST) leads this process, which involves a thorough evaluation of the algorithms’ cryptanalytic strength. Furthermore, careful consideration is given to factors such as algorithm latency, code size, and hardware implementation area. These factors are critical in determining the overall performance of cryptographic solutions at edge...
Masking, an effective countermeasure against side-channel attacks, is commonly applied in modern cryptographic implementations. Considering cryptographic algorithms that utilize both Boolean and arithmetic masking, the conversion algorithm between arithmetic masking and Boolean masking is required. Conventional high-order arithmetic masking to Boolean masking conversion algorithms based on Boolean circuits suffer from performance overhead, especially in terms of hardware implementation. In...
Maiorana--McFarland type constructions are basically concatenating the truth tables of linear functions on a smaller number of variables to obtain highly nonlinear ones on larger inputs. Such functions and their different variants have significant cryptology and coding theory applications. The straightforward hardware implementation of such functions using decoders (Khairallah et al., WAIFI 2018; Tang et al., SIAM Journal on Discrete Mathematics, 2019) requires exponential resources on the...
In the wake of recent progress on quantum computing hardware, the National Institute of Standards and Technology (NIST) is standardizing cryptographic protocols that are resistant to attacks by quantum adversaries. The primary digital signature scheme that NIST has chosen is CRYSTALS-Dilithium. The hardness of this scheme is based on the hardness of three computational problems: Module Learning with Errors (MLWE), Module Short Integer Solution (MSIS), and SelfTargetMSIS. MLWE and MSIS have...
We optimize the number-theoretic transforms (NTTs) in Dilithium — a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST) — on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett...
A file system provides secure deletion if, after a file is deleted, an attacker with physical possession of the storage device cannot recover any data from the deleted file. Unfortunately, secure deletion is not provided by commodity file systems. Even file systems which explicitly desire to provide secure deletion are challenged by the subtleties of hardware controllers on modern storage devices; those controllers obscure the mappings between logical blocks and physical blocks, silently...