
DLOVE: A New Security Evaluation Tool for Deep Learning Based Watermarking Techniques

Sudev Kumar Padhi
Indian Institute of Technology Bhilai, Durg, Chattisgarh 491002
[email protected]

Dr. Sk. Subidh Ali
Indian Institute of Technology Bhilai, Durg, Chattisgarh 491002
[email protected]

Abstract

Recent developments in Deep Neural Network (DNN) based watermarking techniques have shown remarkable performance. The state-of-the-art DNN-based techniques not only surpass the robustness of classical watermarking techniques but also show their robustness against many image manipulation techniques. In this paper, we perform a detailed security analysis of different DNN-based watermarking techniques. We propose a new class of attack called the Deep Learning-based OVErwriting (DLOVE) attack, which leverages adversarial machine learning and overwrites the original embedded watermark with a targeted watermark in a watermarked image. To the best of our knowledge, this attack is the first of its kind. To show adaptability and efficiency, we launch our DLOVE attack analysis on four different watermarking techniques: HiDDeN, ReDMark, PIMoG, and Hiding Images in an Image. All these techniques use different approaches to create imperceptible watermarked images. Our attack analysis on these watermarking techniques with various constraints highlights the vulnerabilities of DNN-based watermarking. Extensive experimental results validate the capabilities of DLOVE. We propose DLOVE as a benchmark security analysis tool to test the robustness of future deep learning-based watermarking techniques.

Keywords:
Deep Learning, Adversarial Machine Learning (AML), Digital Watermarking.

1 Introduction

Digital watermarking is a well-known technique where the watermark (message or image) is embedded covertly or overtly into a cover image without distorting the quality of the cover image [25,8,9,42,7]. It has various critical applications, such as copyright protection, content authentication, tamper detection, and data hiding. In watermarking, the sender embeds the watermark into the cover image and sends the watermarked image to the receiver or verifier. To validate the authenticity or copyright, the watermark is extracted from the received watermarked image and compared with the original watermark, which is provided to the receiver or verifier in advance. Generally, watermarking techniques consist of two processes: watermark embedding and watermark extraction. In watermark embedding, the watermark is embedded into the input cover image to produce a watermarked image. In watermark extraction, the watermark is extracted from the watermarked image and compared with the original watermark to validate the ownership or authenticity of the cover image. One of the popular watermarking techniques is invisible watermarking, where the watermark is covertly embedded in the cover image. The security of any invisible watermarking technique lies in the secrecy of the embedded watermark: the watermarked image should be perceptually similar to the cover image and should not contain any detectable artifact.

The classical watermarking techniques use a wide variety of embedding approaches from the spatial and frequency domains [50,40,47,48,45,5,28,29]. Recently, deep learning has emerged as the key enabler of AI applications. Thus, there has been an increase in deep learning techniques using Deep Neural Networks (DNN) for different tasks due to their adaptability in various applications. Deep learning is also being utilized in the domain of watermarking, which has resulted in significant improvements in performance and efficiency compared to traditional techniques [4]. In DNN-based watermarking techniques, the watermark embedding and extraction processes are implemented using deep generative networks, such as autoencoders and Generative Adversarial Networks (GAN). The pioneering DNN-based watermarking technique proposed in [4] can hide an RGB image within another RGB image using an autoencoder network. DNN-based watermarking was further enhanced by introducing distortion into the training data to make the watermarked images robust against certain noises [44,21]. These simple autoencoder-based techniques are vulnerable to Deep Learning based Removal (DLR) attacks [20,6,24]. There are different types of DLR attacks. In one approach, the attacker trains a denoising autoencoder to remove the watermark from the watermarked image as noise [6]. In another approach, the pixel distribution of the watermarked image is used to identify the distorted pixels for removing the watermark [20]. Pixel inpainting is also utilized to remove the watermark from the watermarked image [24]. In this line, the watermarking techniques proposed in [53,51,19,26] are considered to be robust against DLR attacks due to the presence of noise layers in their model architectures. Among these, the most popular technique is HiDDeN [53], which can withstand arbitrary types of image distortion and produces robust watermarked images. PIMoG [12] went one step further by introducing screen-shooting robustness, such that the watermark can be extracted even if the digital image is captured with a camera. This robustness is achieved by introducing a mask-guided loss in the training pipeline of the watermarking technique. Similarly, ReDMark [2] uses a residual structure to embed the watermark, striking a balance between robustness and imperceptibility.

Please note that DLR attacks are useful only in limited applications where the attacker’s objective is just to defeat the ownership claim of the actual owner of the cover image. The attacker cannot claim ownership of the cover image using DLR attacks. In order to claim ownership of a cover image, the attacker has to overwrite the original watermark of a given watermarked image with the attacker’s watermark, such that the watermark extraction process extracts the attacker’s watermark from the watermarked image instead of the original watermark. Classical watermark overwriting attacks will not work on DNN-based watermarking techniques [39,38]; overwriting requires a Deep Learning based OVErwriting (DLOVE) attack. However, there is hardly any work in the open literature related to the DLOVE attack. In regular deep learning applications, similar attacks are common and are known as Adversarial Machine Learning (AML) attacks [43,13,37,34]. In targeted AML, the attacker induces a well-crafted perturbation into the input image such that the model used for classification not only fails to classify it correctly but is also forced to misclassify it into a target class desired by the attacker. We can intuitively consider that the attacker is overwriting the features of the original class in the input image with the features of the target class. Inspired by targeted AML attacks, for the first time, we developed the DLOVE attack against DNN-based watermarking techniques.

In this paper, we perform a security analysis of DNN-based watermarking techniques using the DLOVE attack. Here, the robustness of these DNN-based watermarking techniques is verified against well-crafted perturbations where the final goal is to overwrite the embedded watermark with the desired watermark. The attack targets the real-world scenario where watermarking techniques are used for copyright protection. To show the adaptability and efficiency, we launch our DLOVE attack on four different watermarking techniques: HiDDeN [53], ReDMark [2], PIMoG [12], and Hiding Images in an Image [4]. All these techniques use different approaches to create imperceptible watermarked images. Devising a common approach to attack these techniques with various constraints highlights the vulnerabilities of DNN-based watermarking.

The paper makes the following key contributions:

  1. We are the first to propose DLOVE, a watermark overwriting attack based on the concept of targeted AML, which overwrites the embedded watermark with the target watermark by adding a well-crafted perturbation to the watermarked images.

  2. We introduce a new class of attack that solely uses the knowledge available when DNN-based watermarking techniques are used for copyright protection.

  3. Detailed experimental results are provided to validate the success of the DLOVE attack. The results demonstrate that the DLOVE attack generalizes well across different DNN-based watermarking techniques.

2 Related Works

2.1 Deep Learning Based Watermarking

Recently, many DNN-based watermarking techniques have been proposed, which surpass the performance of traditional watermarking by utilizing the efficient feature extraction ability of neural networks. The main architecture used in DNN-based watermarking involves an encoder network that embeds the watermark into the cover image and a decoder network that extracts the watermark from the watermarked image. DNN-based watermarking can embed an image or a bit string as a watermark, but most techniques choose to embed a bit string. Bit strings work as metadata and provide more robustness compared to embedding images as a watermark. This is because embedding an image requires the decoder to learn the spatial information of the watermark, which can hamper robustness. The encoder and decoder are trained end-to-end as a pipeline [4,21,11]. To further enhance the quality and robustness of the watermarked image, a discriminator is added to the pipeline during training and noise layers are added to the model architecture of the DNN-based watermarking technique [53,26,12]. The discriminator acts as an adversary network, which predicts whether a watermark is embedded in an image. Residual connections and layers of random combinations of a fixed set of distortions are also used in some model architectures to make the watermarking technique more robust with high data hiding capacity [2,30]. Almost all of these DNN-based methods achieve great performance in terms of image quality. Generally, robustness in watermarking refers to handling distortions that arise in image processing, such as JPEG compression, blurring, noise, and crop-out. There is hardly any analysis that aims to find the vulnerability of DNN-based watermarking techniques to DLOVE attacks.

2.2 Adversarial Machine Learning

AML attacks can cause highly accurate machine learning models to fail [43,13,37,34] by adding a well-crafted perturbation to the input image. These attacks are mainly developed to fool deep convolutional neural network-based classifiers. Transferable AML attacks have also been developed [27,52,35,36], such that a perturbation crafted to fool one model can also be used to fool other models that perform a similar task, even if the attacker has no access to the second model’s parameters or architecture. In AML, the attacker’s knowledge is assumed to be either white-box (complete knowledge of the target model architecture, its parameters and training data) [13,31,33] or black-box (limited or no knowledge of the target model) [15,17]. In a white-box attack, the attacker can craft adversarial examples by directly manipulating the input data to maximize the model’s loss or misclassification using the model parameters. In a black-box attack, despite lacking internal knowledge of the model, the attacker can still generate adversarial examples by exploiting the model’s responses to input queries. These queries can be carefully chosen such that information about the model can be inferred from its outputs, and adversarial examples can be crafted accordingly using the transferability of AML attacks.

AML is a great tool for designers of deep convolutional neural network-based classifiers to test the robustness of their classifiers. There is a lack of such tools in the domain of DNN-based watermarking techniques. In this paper, we try to fill this gap by introducing the DLOVE attack, which can be a useful tool for the designers of DNN-based watermarking techniques. Our approach is inspired by targeted AML attacks. The objective of the DLOVE attack is to craft a new watermark, which, once added to the watermarked image, will force the watermark decoder to decode the new watermark instead of the original watermark. We demonstrate the DLOVE attack in white-box as well as black-box settings.

3 Threat Model

3.1 Attacker’s Goals

Copyright protection is one of the most important use cases of watermarking through which the owner of digital content can claim its rights. An attacker can violate the copyright protection either by corrupting/cleaning the embedded watermark in the image so that the decoder cannot decode the watermark from the watermarked image (Objective 1) or by overwriting the embedded watermark present in the watermarked image with the target watermark so that the decoder will decode the target watermark instead of the original watermark from the watermarked image (Objective 2). In either case, the objective is to defeat the techniques of watermarking.

Figure 1: Overview of the proposed DLOVE attack, which leverages Adversarial Machine Learning to create a well-crafted perturbation that overwrites the original watermark with the target watermark.

3.2 Attacker’s Knowledge

Before going into the details of the attack, we make the following assumptions about the attacker’s knowledge:
Training Data: The attacker has no knowledge of the training data used to train the DNN-based watermarking model in either variant of the DLOVE attack (white-box and black-box settings). This includes both the watermarks and the cover images.
Network Architecture: The architecture of the encoder network is not known to the attacker in either the black-box or the white-box setting. In the white-box variant of the attack, it is assumed that the attacker has knowledge of the decoder network architecture and its parameters. The same does not hold for the black-box setting of the DLOVE attack.

The black-box setting of the DLOVE attack is more practical and useful in professional applications of watermarking [32,14,1,46], where the watermarking technique is available as a service (API) to verify the digital content. In such a scenario, the attacker can subscribe to the service and get oracle access to both the encoder and decoder through its API, although there is a limit on the number of queries to the API. However, in stringent security applications, even oracle access to the decoder is infeasible for the attacker, as the decoder remains in the possession of the verifier only. The DLOVE attack considers these stringent security assumptions in the black-box setting.

3.3 Scenario

Let Alice be a digital artist who creates digital paintings. She wants to protect her digital paintings (copyright) from unauthorized use and distribution. Alice uses DNN-based invisible watermarking, as it protects the copyright of the painting and also preserves its aesthetic appeal. The watermarking technique subscribed to by Alice uses the logos of the artists as watermarks. Thus, Alice embeds the logo of her website into her digital paintings (cover images). For verification, the verifier needs to find the presence of a watermark, extract it, and verify the owner of the digital painting. In copyright protection, the same kind of information, which forms the metadata of the digital content, is used as the watermark for different owners (in this case, a logo). This ensures that the verifier can verify with consistent information. Now, there is an attacker, Eve, who has also subscribed to the same watermarking technique used by Alice. Thus, she knows that the digital paintings of Alice are copyright-protected with the logo of Alice’s website. Eve can clean the watermark or overwrite it with a target watermark containing a different logo in the watermarked image and recirculate it. By achieving Objective 1, Eve can only remove the watermark from the digital painting. By achieving Objective 2, Eve not only removes the watermark but also makes herself the owner of the digital painting by embedding her logo into it. Alice cannot prove that the digital painting belongs to her, as the decoder decodes the logo that belongs to Eve. This scenario is depicted in Figure 1, where the decoder decodes the target watermark instead of the original watermark when the well-crafted perturbation is added to the watermarked image. Thus, the verifier will announce that the digital paintings belong to Eve.

4 Proposed Approach

4.1 Formal Description

DNN-based watermarking techniques consist of an encoder and a decoder. The encoder $E$ produces a watermarked image $W$ by embedding the watermark $\alpha$ into the cover image $I$, as shown in Eq. (1). The decoder $D$ takes $W$ as input and extracts the embedded watermark $\alpha$ as the output, as shown in Eq. (2). The attacker’s aim is to launch the DLOVE attack to fool $D$ by inducing an adversarial perturbation $\delta$ in $W$ such that $D$ decodes the target watermark $\beta$ instead of the original embedded watermark $\alpha$, as shown in Eq. (3).

$$E(I + \alpha) \rightarrow W \tag{1}$$
$$D(W) \rightarrow \alpha \tag{2}$$
$$D(W + \delta) \rightarrow \beta \tag{3}$$

4.1.1 White-Box Access:

Having white-box access to the decoder gives the attacker enough information to devise a targeted adversarial attack that uses the gradients of the decoder to create the desired perturbation $\delta$. Here, $\alpha$ is the original watermark, $\beta$ is the target watermark and $\epsilon$ is the perturbation limit. We minimize the loss ($l$) with respect to the target watermark $\beta$ while maximizing the loss with respect to the original watermark $\alpha$, i.e., we solve the optimization problem shown in Eq. (4). This is the easiest approach but does not align with the use cases of watermarking, where access to the decoder is not allowed.

$$\underset{\delta}{\text{minimize}} \; \{\, l(D(W+\delta), \beta) - l(D(W+\delta), \alpha) \,\}, \quad \delta \in [-\epsilon, \epsilon] \tag{4}$$
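To make Eq. (4) more concrete, the following worked step assumes (the section does not fix a loss) that $l$ is the mean binary cross-entropy over an $N$-bit watermark and that the decoder outputs per-bit probabilities $p_j = \sigma(z_j)$ at the input $W+\delta$, where $z_j$ is a hypothetical pre-sigmoid logit for bit $j$. Under these assumptions, the two loss terms collapse into one expression that is linear in the logits:

```latex
\begin{align*}
l(p, b) &= -\frac{1}{N}\sum_{j=1}^{N}\bigl[\, b_j \log p_j + (1-b_j)\log(1-p_j) \,\bigr],
  \qquad p_j = \sigma(z_j), \\
l(p,\beta) - l(p,\alpha)
  &= -\frac{1}{N}\sum_{j=1}^{N} (\beta_j-\alpha_j)\bigl[\, \log p_j - \log(1-p_j) \,\bigr]
   = \frac{1}{N}\sum_{j=1}^{N} (\alpha_j-\beta_j)\, z_j .
\end{align*}
```

Under this particular loss, only the bits where $\alpha$ and $\beta$ differ contribute. For those bits, minimizing the objective pushes $z_j$ up when $\beta_j = 1$ and down when $\beta_j = 0$, and the constraint $\delta \in [-\epsilon, \epsilon]$ keeps the objective bounded.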
Figure 2: Overview of the surrogate model attack: a) training the surrogate model using the surrogate dataset; b) fine-tuning the surrogate decoder with the watermarked images of the target DNN-based watermarking technique; c) attacking the decoder of the target DNN-based watermarking technique after generating the well-crafted perturbation from the surrogate decoder.

4.1.2 Black-Box Access:

If the attacker has the ability to use the decoder as an oracle, it can obtain a set of watermarked images and their watermarks by querying the decoder with watermarked images. Once this dataset is available, the attacker can train a surrogate decoder. Afterwards, a white-box attack is performed on the surrogate decoder to craft the desired perturbation $\delta$, which is used to launch the DLOVE attack on $D$. However, in stringent security applications, even the decoder is not available. Therefore, we consider having only limited instances of watermarked images whose watermarks are known. One of the easiest ways for the attacker to gain access to such data is to request the subscribed copyright protection service provider to apply copyright protection to, say, $n$ pairs of cover images and their watermarks.

Under this scenario, the attacker trains its own DNN-based surrogate watermarking encoder ($E^{\prime}$) and decoder ($D^{\prime}$) models with its own dataset (also known as the surrogate dataset), i.e., a set of cover images and their watermarks. Once the surrogate model is trained, $D^{\prime}$ is fine-tuned with the limited instances of watermarked images available from the target decoder $D$ to be attacked. While fine-tuning, the loss between the extracted watermark and the original watermark is used to train the surrogate decoder, as shown in Eq. (5). The surrogate decoder is trained and fine-tuned to act as the target decoder $D$, making the black-box attack transferable. Therefore, the attacker can launch a white-box DLOVE attack on $D^{\prime}$ using the gradient information to craft the desired perturbation $\delta$, as shown in Eq. (6). The same $\delta$ can be used to fool $D$ when added to $W$ (Eq. (7)). The value of $\epsilon$ is chosen judiciously such that the perturbation ($\delta$) induced in the watermarked image is imperceptible. Figure 2 shows the training procedure of the surrogate model and the fine-tuning of the surrogate decoder to perform an attack on the target decoder.

$$\text{minimize} \; \{\, l(D^{\prime}(W), \alpha) \,\} \tag{5}$$
$$\underset{\delta}{\text{minimize}} \; \{\, l(D^{\prime}(W+\delta), \beta) \,\}, \quad \delta \in [-\epsilon, \epsilon] \tag{6}$$
$$D(W + \delta) \rightarrow \beta \tag{7}$$
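The black-box pipeline of Eqs. (5) to (7) can be sketched on a toy problem. The sketch below is not the setup used in the experiments: it assumes linear-sigmoid decoders, a mean binary cross-entropy loss, and a hypothetical `service_embed` function standing in for the copyright service, and it skips the surrogate pre-training stage and fits $D^{\prime}$ directly on the $n$ known pairs. All constants and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N_BITS, DIM, N_PAIRS = 8, 64, 256      # toy sizes (assumptions)
GAIN, MARGIN, EPS = 20.0, 2.0, 0.15    # toy constants (assumptions)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y):
    """Mean binary cross-entropy between probabilities p and bits y."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def decode(M, x):
    """Linear-sigmoid decoder with weights M; x is one image or a batch of rows."""
    return sigmoid((x - 0.5) @ M.T)

# Hidden target decoder D: the attacker never reads A, only (W, alpha) pairs.
Q, _ = np.linalg.qr(rng.standard_normal((DIM, N_BITS)))
A = GAIN * Q.T

def service_embed(bits):
    """Hypothetical copyright service: returns watermarked image(s) for bits."""
    return 0.5 + (2.0 * bits - 1.0) @ A * (MARGIN / GAIN**2)

# The n known (watermarked image, watermark) pairs.
alphas = rng.integers(0, 2, size=(N_PAIRS, N_BITS)).astype(float)
Ws = service_embed(alphas)

# Eq. (5): fit the surrogate decoder D' (weights B) by gradient descent.
B = np.zeros((N_BITS, DIM))
loss_before = bce(decode(B, Ws), alphas)
for _ in range(300):
    P = decode(B, Ws)
    B -= 100.0 * (P - alphas).T @ (Ws - 0.5) / (N_PAIRS * N_BITS)
loss_after = bce(decode(B, Ws), alphas)

# Eq. (6): white-box attack on D' only, with delta kept inside [-EPS, EPS].
alpha = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
beta = 1.0 - alpha                      # target watermark
W = service_embed(alpha)
delta = np.zeros_like(W)
for _ in range(DIM):
    gamma = decode(B, W + delta)
    grad = B.T @ ((gamma - beta) / N_BITS)          # d l(D'(W+delta), beta) / d delta
    step = 0.01 * grad / (np.linalg.norm(grad) + 1e-12)
    delta = np.clip(delta - step, -EPS, EPS)

# Eq. (7): apply the same delta to the real decoder D and count matching bits.
transfer_bit_acc = float(np.mean((decode(A, W + delta) > 0.5) == (beta > 0.5)))
print("surrogate loss:", loss_before, "->", loss_after)
print("fraction of target bits equal to beta:", transfer_bit_acc)
```

The normalized gradient step is a design choice made here so that the step size does not depend on the scale of the learned surrogate weights. How well the perturbation transfers depends on how closely $D^{\prime}$ matches $D$, which this toy does not guarantee.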

4.2 Crafting Algorithm

The adversarial perturbation crafting algorithm is shown in Algorithm 1. The algorithm shows how to craft a perturbation using the DLOVE attack in a white-box scenario. The inputs to the algorithm are a watermarked image $W$, the target decoder $D$, the target watermark $\beta$, a perturbation $\delta$ (initialized to zero) with the same size ($k \times k$) as $W$, and a limiting range $\epsilon$ of $\delta$ ($-\epsilon \leq \delta \leq \epsilon$). $\delta$ is added to $W$ and passed into the decoder $D$, which decodes the secret as $\gamma$. The loss between $\gamma$ and $\beta$ is computed using the chosen loss function $l$. In each iteration of the loop, the optimizer tries to minimize the loss between $\gamma$ and $\beta$ and maximize the loss between $\beta$ and $\alpha$. Accordingly, $\Delta$ is recomputed and $\delta$ is updated. This process is repeated until the model converges and the desired $\delta$ is obtained, which is the realization of the DLOVE attack on $W$ to overwrite $\alpha$ with $\beta$.

Algorithm 1 Adversarial perturbation crafting algorithm
Require: $W$, $D$, $\beta$, $\delta$, $\epsilon$
1: $\delta \leftarrow [0]_{k \times k}$ $\triangleright$ Initial perturbation
2: max_iter $= k \times k$
3: $\gamma = D(W + \delta)$ $\triangleright$ Embedded watermark
4: while $i \leq$ max_iter and $\gamma \neq \beta$ do
5:     $\gamma \leftarrow D(W + \delta)$ $\triangleright$ Intermediate decoder’s output
6:     $\Delta \leftarrow l(\beta, \gamma) - l(\beta, \alpha)$
7:     Update $\delta$: $\delta_{i} \leftarrow \delta_{i-1} - \eta \nabla \Delta(\delta)$ $\triangleright$ Update delta with respect to the loss
8:     $\delta \leftarrow \mathrm{Clip}(\delta, [-\epsilon, \epsilon])$ $\triangleright$ $\delta$ is clipped
9: end while
10: return $\delta$
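As an illustration of the loop in Algorithm 1, here is a minimal NumPy sketch on a toy problem. It is not the implementation used in the experiments: the decoder is a hypothetical linear-sigmoid model with orthogonal rows (chosen so the run is easy to check by hand), the loss is assumed to be mean binary cross-entropy, and the second loss term follows Eq. (4), i.e., it is computed on the decoder output. The names `decode` and `craft_perturbation` and all constants are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(A, x):
    """Toy decoder D: per-bit probabilities for a flattened image x."""
    return sigmoid(A @ (x - 0.5))

def craft_perturbation(A, W, alpha, beta, eps, eta, max_iter):
    """Gradient-descent crafting loop on the toy decoder (white-box)."""
    n_bits = A.shape[0]
    delta = np.zeros_like(W)                        # initial perturbation
    for _ in range(max_iter):
        gamma = decode(A, W + delta)                # intermediate decoder output
        if np.array_equal(gamma > 0.5, beta > 0.5):
            break                                   # decoder already outputs beta
        # Gradient of l(gamma, beta) - l(gamma, alpha) w.r.t. the logits,
        # for mean binary cross-entropy on sigmoid outputs (assumed loss).
        grad_logits = ((gamma - beta) - (gamma - alpha)) / n_bits
        grad_delta = A.T @ grad_logits              # chain rule through the linear layer
        delta = np.clip(delta - eta * grad_delta, -eps, eps)
    return delta

# Toy setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
k, n_bits, gain, margin = 8, 8, 100.0, 2.0
Q, _ = np.linalg.qr(rng.standard_normal((k * k, n_bits)))
A = gain * Q.T                                      # decoder weights with orthogonal rows
alpha = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)   # original watermark
beta = 1.0 - alpha                                  # target watermark
# Toy "encoder": builds W so that the decoder logits equal +/- margin.
W = 0.5 + A.T @ (margin * (2.0 * alpha - 1.0)) / gain**2

delta = craft_perturbation(A, W, alpha, beta, eps=0.08, eta=2e-4, max_iter=k * k)
decoded = (decode(A, W + delta) > 0.5).astype(float)
print("bits before:", (decode(A, W) > 0.5).astype(int))
print("bits after: ", decoded.astype(int))
print("max |delta|:", float(np.abs(delta).max()))
```

In this toy, each step moves every logit by a fixed amount toward the target bit, so the decoded bits flip after a handful of iterations while $\delta$ stays inside the $\epsilon$-box. A real decoder is nonlinear, so the gradient would be obtained by automatic differentiation instead of the closed form used here.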

The hyperparameters (parameters that we explicitly define) for this attack include:

  1. $\epsilon$: The maximum amount of allowable perturbation that can be added to the images.

  2. Optimizer: It is used to find the well-crafted perturbation $\delta$.

  3. $l$: The loss function chosen to minimize the loss between $\gamma$ and $\beta$ and maximize the loss between $\beta$ and $\alpha$.
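To give a feel for how $\epsilon$ relates to imperceptibility, the sketch below uses PSNR as a proxy. PSNR is an assumption here, since this section does not name a quality metric. For pixel values in $[0, 1]$, a perturbation with $|\delta| \leq \epsilon$ everywhere has a mean squared error of at most $\epsilon^2$, so the PSNR is at least $20 \log_{10}(1/\epsilon)$ dB. The value $\epsilon = 8/255$ is only an example.

```python
import math
import numpy as np

def psnr_lower_bound(eps, peak=1.0):
    """Worst-case PSNR in dB when every pixel changes by at most eps."""
    return 20.0 * math.log10(peak / eps)

def psnr(original, modified, peak=1.0):
    """PSNR in dB between two images with values in [0, peak]."""
    mse = float(np.mean((original - modified) ** 2))
    return 10.0 * math.log10(peak**2 / mse)

eps = 8 / 255                                       # example budget (assumption)
rng = np.random.default_rng(0)
W = rng.uniform(0.2, 0.8, size=(128, 128, 3))       # stand-in watermarked image
delta = rng.uniform(-eps, eps, size=W.shape)        # any perturbation inside the eps-box

bound = psnr_lower_bound(eps)
actual = psnr(W, W + delta)
print(f"bound: {bound:.2f} dB, this perturbation: {actual:.2f} dB")
```

For $\epsilon = 8/255$ the bound is about 30.07 dB. Any perturbation inside the $\epsilon$-box scores at or above the bound, so a smaller $\epsilon$ gives a stronger imperceptibility guarantee at the cost of a harder optimization problem.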

This algorithm works similarly in the black-box scenario, where the decoder $D$ is replaced by a surrogate decoder $D^{\prime}$ and the corresponding loss in line 6 of the algorithm becomes $l(\beta, \gamma) + l(D^{\prime}(W), \alpha)$.

4.3 Reason for Successful Attack

In a classification task, whatever the input may be, the classifier always assigns it to one of the classes. These classes are also known to the attacker while performing a targeted adversarial attack. Thus, the attacker adds a perturbation such that the input crosses the boundary of the current class into the target class and the classifier’s confidence is highest for the target class. Suppose the watermark has $N$ bits, i.e., the output of the decoder is an $N$-bit watermark. We can then consider the decoder as a classifier that classifies the watermarked image into one of $2^{N}$ possible watermark classes. Therefore, our attack can be considered a targeted adversarial attack where the target class is the target watermark, one of the $2^{N}$ possible classes. DNN-based watermarking techniques are trained end-to-end based on the perceptual similarity of the image after embedding a watermark, which makes the embedding region-specific and susceptible to attack. Even if the models are trained for robustness against prominent image manipulation attacks, the same factor is responsible for the generation of adversarial perturbations.
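The decoder-as-classifier view can be illustrated with a few lines of Python. The sketch assumes that the decoder emits one probability per bit and that bits are obtained by thresholding at 0.5. Both the thresholding rule and the helper names `bits_to_class` and `decode_class` are assumptions made for illustration.

```python
import numpy as np

def bits_to_class(bits):
    """Map an N-bit watermark to its class index in [0, 2**N)."""
    index = 0
    for b in bits:
        index = (index << 1) | int(b)
    return index

def decode_class(probs):
    """Threshold per-bit probabilities and return (bits, class index)."""
    bits = (np.asarray(probs) > 0.5).astype(int)
    return bits, bits_to_class(bits)

# A 30-bit watermark corresponds to 2**30 possible classes.
num_classes_30 = 2 ** 30

# 4-bit toy example: decoder output before the attack and the target watermark.
probs_before = [0.9, 0.2, 0.7, 0.4]
alpha_bits, alpha_class = decode_class(probs_before)
beta_bits = np.array([0, 1, 1, 0])
beta_class = bits_to_class(beta_bits)
bits_to_flip = int(np.sum(alpha_bits != beta_bits))

print("number of classes for N=30:", num_classes_30)
print("original class:", alpha_class, "target class:", beta_class)
print("bits the perturbation must flip:", bits_to_flip)
```

Unlike an ordinary classifier with one output per class, the decoder here has one output per bit, so moving from one class to another only requires flipping the bits where the two watermarks differ.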

5 Experimental Results

In this section, we validate the effectiveness of our DLOVE attack on four well-known DNN-based watermarking techniques: HiDDeN [53], ReDMark [2], PIMoG [12] and Hiding Images in an Image [4]. Experiments are conducted on a machine with a 14-core Intel i9 10940X CPU, 128 GB RAM, and two Nvidia RTX-5000 GPUs with 16 GB VRAM each.

5.1 Setup of Target Models

The target DNN-based watermarking techniques differ in DNN model architecture, training pipeline, training dataset, and watermark size. The key features of these four techniques are described below and summarized in Table 1:

  1. HiDDeN is an end-to-end model for image watermarking that is robust to arbitrary types of image distortion. It comprises four main components: an encoder, a parameterless noise layer $N$, a decoder, and an adversarial discriminator. The encoder uses a 128 × 128 × 3 cover image to embed a 30-bit binary watermark.

  2. ReDMark uses residual connections, circular convolution, an attack layer (attacks simulated during training to model real-world manipulations, particularly JPEG compression), and 1D convolution layers for embedding and extracting the watermark. It takes a grayscale 32 × 32 × 1 cover image and embeds a 4 × 4-bit watermark using the residual connection between the layers.

  3. PIMoG consists of three main parts: the encoder, the screen-shooting noise layer, and the decoder. To achieve screen-shooting robustness by handling perspective distortion, illumination distortion, and moiré distortion while maintaining high visual quality, the technique uses an adversary network with an edge mask-guided image loss and a gradient mask-guided image loss. It uses a 128 × 128 × 3 cover image to embed a 30-bit watermark.

  4. Hiding Images in an Image can hide a 200 × 200 × 3 image as a watermark inside a 200 × 200 × 3 cover image. In this technique, the watermark image is passed through a preparation network that transforms it, and a hiding network then combines the result with the original cover image to produce the watermarked image. During decoding, the watermarked image is passed through a reveal network, which outputs the watermark image.

Table 1: Characteristics of the DNN-based watermarking techniques attacked by DLOVE.

| Technique | Discriminator in the Loop | Cover Image Size | Watermark Size | Dataset |
|---|---|---|---|---|
| HiDDeN [53] | Yes | 128 × 128 × 3 | 30 bit | COCO [23] |
| ReDMark [2] | No | 32 × 32 × 1 | 4 × 4 bit | CIFAR10 [22] |
| PIMoG [12] | Yes | 128 × 128 × 3 | 30 bit | COCO [23] |
| Hiding Images in an Image [4] | No | 200 × 200 × 3 | 200 × 200 × 3 bit | ImageNet [10] |
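The shapes listed in Table 1 can be collected in a small configuration structure, which is convenient when one surrogate instance has to be sized per target. The sketch below is only illustrative: the dictionary layout and the `payload_ratio` helper are assumptions, and the ratio simply counts watermark elements per cover-image element.

```python
import math

# Shapes copied from Table 1; the dictionary layout itself is an assumption.
TARGETS = {
    "HiDDeN": {"cover": (128, 128, 3), "watermark": (30,)},
    "ReDMark": {"cover": (32, 32, 1), "watermark": (4, 4)},
    "PIMoG": {"cover": (128, 128, 3), "watermark": (30,)},
    "Hiding Images in an Image": {"cover": (200, 200, 3), "watermark": (200, 200, 3)},
}


def size(shape):
    """Number of elements in a tensor of the given shape."""
    return math.prod(shape)


def payload_ratio(name):
    """Watermark elements embedded per cover-image element for one target."""
    spec = TARGETS[name]
    return size(spec["watermark"]) / size(spec["cover"])


for technique in TARGETS:
    print(technique, payload_ratio(technique))
```

The ratios make the difference in embedding load visible: the bit-string techniques embed a few bits into thousands of pixels, whereas Hiding Images in an Image embeds a watermark as large as the cover itself.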

5.2 Training Surrogate Model

Each of the techniques mentioned above uses a different cover image resolution and watermark size, as shown in Table 1. Therefore, we created four instances of the surrogate model (encoder and decoder), i.e., one instance for each target model. The sizes of the cover image, watermarked image, and watermark for each instance of the surrogate model are set according to the target model to which it corresponds. We used the UNet [41] architecture for the surrogate encoder. For the surrogate decoder, after trying various models, we used two different architectures: a spatial transformer [18] with seven convolutional layers followed by two fully connected layers, and a Self-supervised vision Transformer (SiT) [3] based autoencoder. The first is used to build the surrogate decoders for HiDDeN, ReDMark, and PIMoG, where a bit-string is used as the watermark. The second is used to build the surrogate decoder for Hiding Images in an Image, where an image is used as the watermark. We used the Mirflickr [16] dataset as our surrogate dataset; it consists of one million images with varied contexts, lighting, and themes from the social photography site Flickr. We used 50k images in our training set and 10k in our test set.
We trained all four surrogate models end-to-end for 200 epochs, which is the general approach followed in DNN-based watermarking techniques [4, 53, 12, 2]. We used MSE, LPIPS [49], and $L_{2}$ residual regularization loss functions between the cover image and the watermarked image to train the surrogate encoders. While training the surrogate decoders for HiDDeN, ReDMark, and PIMoG, where a bit-string is used as the watermark, we used BCE to calculate the loss between the extracted and the original watermarks. Along the same lines, MSE and LPIPS losses are used to train the surrogate decoder for Hiding Images in an Image.
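The loss terms named above can be sketched for the bit-string case as follows. This is a minimal illustration on flat lists of pixel values, not the authors' training code: the unit loss weights, the plain sum used to combine the terms, and the optional `lpips_fn` callable (standing in for a perceptual metric that requires a pretrained network) are all assumptions.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def l2_residual(cover, watermarked):
    """L2 norm of the residual (watermarked - cover), used as a regulariser."""
    return math.sqrt(sum((w - c) ** 2 for c, w in zip(cover, watermarked)))


def bce(probs, bits, tiny=1e-12):
    """Binary cross-entropy between decoder probabilities and watermark bits."""
    total = sum(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny)
                for p, y in zip(probs, bits))
    return -total / len(bits)


def surrogate_loss(cover, watermarked, probs, bits, lpips_fn=None,
                   w_mse=1.0, w_lpips=1.0, w_res=1.0, w_bce=1.0):
    """Encoder image losses plus decoder BCE; the weights are hypothetical."""
    loss = w_mse * mse(cover, watermarked) + w_res * l2_residual(cover, watermarked)
    if lpips_fn is not None:  # hypothetical hook for a perceptual loss such as LPIPS
        loss += w_lpips * lpips_fn(cover, watermarked)
    return loss + w_bce * bce(probs, bits)
```

For an image watermark, as in Hiding Images in an Image, the BCE term would be replaced by MSE and LPIPS between the extracted and original watermark images.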

5.3 Fine-tuning Surrogate Decoder

To demonstrate the efficacy of our attack (Algo 1), we next show through a set of experiments that it can successfully adapt to different watermarking techniques. Before going into the details of our attack results, we briefly discuss our fine-tuning setup. For fine-tuning, we collected 500 watermarked images and their watermarks from each of the four watermarking techniques. In the case of HiDDeN, ReDMark, and PIMoG, each watermarked image is embedded with a unique watermark generated from randomly sampled bits, whereas for Hiding Images in an Image, we use randomly sampled images from the ImageNet dataset as the watermarks. Subsequently, the four instances of the trained surrogate decoder are fine-tuned on the watermarked images of the respective target decoder. The results show that fine-tuning the surrogate decoder for 100 epochs is sufficient to attack the target decoder successfully.
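Generating one unique random watermark per fine-tuning image can be sketched as below. The function name, the fixed seed, and the rejection of duplicates by resampling are assumptions made for illustration; the source only states that each watermark is unique and built from randomly sampled bits.

```python
import random


def sample_unique_watermarks(count, n_bits, seed=0):
    """Draw `count` distinct random bit-strings of length `n_bits`."""
    if count > 2 ** n_bits:
        raise ValueError("not enough distinct watermarks of this length")
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    seen = set()
    watermarks = []
    while len(watermarks) < count:
        bits = tuple(rng.getrandbits(1) for _ in range(n_bits))
        if bits not in seen:  # resample on the (rare) duplicate
            seen.add(bits)
            watermarks.append(list(bits))
    return watermarks


# 500 fine-tuning watermarks for a 30-bit technique such as HiDDeN or PIMoG.
fine_tuning_watermarks = sample_unique_watermarks(500, 30)
```

Each watermark would then be embedded by the target encoder, and the resulting pairs of watermarked image and watermark form the fine-tuning set.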

5.4 Attack Validation

To validate our attack, we generate 10000 watermarked images using each of the four target watermarking techniques. Using each of these watermarked images, we attack the corresponding fine-tuned surrogate decoder using Algo 1 to generate the corresponding well-crafted perturbations. These perturbations are added to the watermarked images, which are then used to attack the target decoder and evaluate the success rate of our attack. We tried our attack on the surrogate decoder with both $L_{1}$ and MSE loss (Line 06, Algo 1). We finally chose MSE, as the perturbation generated was imperceptible and the attack converged quickly. The perturbation $\delta$ is initialized as a zero-filled vector, whereas $\epsilon$ is chosen as $-0.3 \leq \epsilon \leq 0.3$. We employ the Adam optimizer with an initial learning rate of 0.001. The attack converges in around 5000 iterations for all four watermarking techniques.
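The optimization described above can be illustrated on a toy problem. The sketch below is not Algo 1: it assumes a hypothetical linear surrogate decoder with sigmoid outputs in which each watermark bit reads two dedicated pixels, computes the MSE gradient by hand, and applies a hand-written Adam update. Only the zero-initialized $\delta$, the MSE loss, the bound of 0.3, the learning rate of 0.001, and the 5000 iterations are taken from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode(weights, bias, image):
    """Toy surrogate decoder: one sigmoid output per watermark bit."""
    return [sigmoid(sum(w * x for w, x in zip(row, image)) + b)
            for row, b in zip(weights, bias)]


def overwrite_attack(weights, bias, image, target_bits,
                     eps=0.3, lr=0.001, iterations=5000):
    """Optimise a bounded perturbation so the decoder outputs target_bits."""
    n_bits, n_pix = len(target_bits), len(image)
    delta = [0.0] * n_pix              # zero-initialised perturbation
    m = [0.0] * n_pix                  # Adam first-moment estimate
    v = [0.0] * n_pix                  # Adam second-moment estimate
    b1, b2, tiny = 0.9, 0.999, 1e-8
    for t in range(1, iterations + 1):
        attacked = [x + d for x, d in zip(image, delta)]
        probs = decode(weights, bias, attacked)
        # Gradient of the MSE loss with respect to each decoder logit.
        g_logit = [2.0 / n_bits * (p - y) * p * (1.0 - p)
                   for p, y in zip(probs, target_bits)]
        for j in range(n_pix):
            g = sum(g_logit[i] * weights[i][j] for i in range(n_bits))
            m[j] = b1 * m[j] + (1 - b1) * g
            v[j] = b2 * v[j] + (1 - b2) * g * g
            m_hat = m[j] / (1 - b1 ** t)
            v_hat = v[j] / (1 - b2 ** t)
            step = lr * m_hat / (math.sqrt(v_hat) + tiny)
            # Gradient step followed by clipping to the bound [-eps, eps].
            delta[j] = max(-eps, min(eps, delta[j] - step))
    return delta


# Hypothetical toy setup: bit i depends only on pixels 2i and 2i + 1.
N_BITS = 4
weights = [[10.0 if j // 2 == i else 0.0 for j in range(2 * N_BITS)]
           for i in range(N_BITS)]
bias = [-10.0] * N_BITS
original_bits = [1, 0, 1, 0]
image = [0.7 if original_bits[j // 2] else 0.3 for j in range(2 * N_BITS)]
target_bits = [0, 1, 1, 0]

delta = overwrite_attack(weights, bias, image, target_bits)
attacked_image = [x + d for x, d in zip(image, delta)]
decoded = [int(p > 0.5) for p in decode(weights, bias, attacked_image)]
```

In the actual attack the perturbation is computed against the fine-tuned surrogate decoder and then transferred to the target decoder; the toy decoder here only shows the mechanics of the bounded, targeted update.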

5.5 Evaluation

After the initial training, all four surrogate decoders achieve an accuracy of more than 90% in extracting the embedded watermark when validated on the test set. This shows that all four surrogate models have converged successfully. Subsequently, these surrogate decoders are fine-tuned with 500 watermarked images and their watermarks within 100 epochs, so that a white-box attack on the surrogate decoder yields a well-crafted perturbation that will fail the target decoder. An experimental analysis is performed for each instance of the surrogate decoder to check whether it can be fine-tuned in fewer than 100 epochs and with fewer than 500 watermarked images and their watermarks. The optimum number of epochs and the required number of watermarked images are shown in the 2nd and 3rd columns of Table 2.

Table 2: Optimal number of epochs and watermarked images required for fine-tuning the different surrogate models, along with the image quality of the watermarked images after adding the well-crafted perturbation generated by the DLOVE attack. For PSNR and SSIM, higher is better. For LPIPS and MSE, lower is better. ASR represents the success rate of attacking 10000 watermarked images generated from each DNN-based watermarking technique.

| Technique | Fine-Tuning Epochs | Fine-Tuning Images | Pert. Limit ($\epsilon$) | PSNR | SSIM | LPIPS | MSE | ASR (%) |
|---|---|---|---|---|---|---|---|---|
| ReDMark | 40 | 200 | 0.002 | 41 | 0.97 | 0.08 | 0.05 | 98 |
| HiDDeN | 60 | 300 | 0.008 | 38 | 0.99 | 0.07 | 0.15 | 96 |
| PIMoG | 70 | 400 | 0.02 | 37 | 0.99 | 0.1 | 0.27 | 93 |
| Hiding Images in an Image | 90 | 500 | 0.1 | 33 | 0.95 | 0.12 | 0.36 | 89 |
Figure 3: The well-crafted imperceptible perturbation is successfully added to the original watermarked image without deteriorating the image quality of the watermarked image. Panels, shown for two examples: Normal Image, Added Perturbation, Attacked Image.

The evaluations use the settings described in Section 5.4. One of the most important metrics in our evaluation is the Attack Success Rate (ASR), defined as the percentage of adversarial examples that lead to a successful attack on the target decoder. Furthermore, to evaluate the quality of the attacked (perturbed) watermarked images and their similarity to the respective original watermarked images, we use MSE, Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and LPIPS (computed with the AlexNet network). These image quality metrics are used in combination with visual analysis to show the quality of DLOVE, i.e., our watermark overwriting attack. In our terms, a good-quality attack not only fails the target decoder but also leaves no visual traces (artifacts) in the attacked watermarked image. To this end, the MSE and PSNR metrics provide pixel-wise error measurements, which quantify the difference between the attacked watermarked images and their respective original watermarked images with respect to the pixels and their orientations. At the same time, the SSIM and LPIPS metrics measure perceived image quality, checking that adding the perturbation does not degrade the image.
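The pixel-wise metrics and the ASR can be sketched in a few lines. This is an illustrative sketch on flat lists of pixel values in [0, 1]; counting an attack as successful only when the extracted watermark matches the target exactly is an assumption, since the success criterion is not spelled out here. SSIM and LPIPS are omitted because they need windowed statistics and a pretrained network, respectively.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized flat lists of pixels."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB for pixels in [0, max_value]."""
    err = mse(a, b)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


def attack_success_rate(extracted, targets):
    """Percentage of attacked images whose extracted watermark equals the target."""
    hits = sum(1 for e, t in zip(extracted, targets) if e == t)
    return 100.0 * hits / len(targets)
```

A lower MSE and a higher PSNR between the original and attacked watermarked images both indicate a smaller perturbation, which is why the two are reported together.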

Figure 4: Result of attacking the watermarked image created by the technique of Hiding Images in an Image using the DLOVE attack. Panels: Normal Image, Embedded Watermark, Target Watermark, Attacked Image, Extracted Watermark.

The performance of our DLOVE attack on the different techniques is shown in Table 2. Figure 3 shows the watermarked image quality after the DLOVE attack, where the normal image refers to the original watermarked image and the attacked image refers to the perturbed watermarked image after the DLOVE attack. Figure 4 shows the quality of the target watermark extracted after attacking the technique of Hiding Images in an Image using the DLOVE attack. The attacked images maintain low MSE and LPIPS values and high PSNR (in dB) and SSIM scores, indicating that the added perturbation is imperceptible. The ASR is about 90% or higher for all four techniques (89% to 98%, Table 2), highlighting the efficacy of the DLOVE attack. Our attack fails for instances where the cosine similarity between the original and target watermarks is less than 0.1. To reach a 100% success rate in these cases, we had to relax the perturbation limit, increasing it to 0.5. This took a toll on all other metrics: MSE and LPIPS increased, while PSNR and SSIM decreased significantly. Noticeable artifacts also appeared in the attacked images, as shown in Figure 5.
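The cosine-similarity check that separates easy from hard target watermarks can be sketched as follows. How the watermarks are encoded before the similarity is computed is not stated, so applying it directly to 0/1 bit vectors is an assumption, as are the helper names; the 0.1 threshold is the one reported above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an all-zero watermark
    return dot / (norm_a * norm_b)


def is_hard_target(original_bits, target_bits, threshold=0.1):
    """Flag pairs whose similarity falls below the reported failure threshold."""
    return cosine_similarity(original_bits, target_bits) < threshold
```

Such a check could be run before the attack to decide whether the default perturbation limit is likely to suffice or whether a larger limit, with the visible artifacts it brings, would be needed.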

Figure 5: Artifacts appear on attacking some watermarked images using DLOVE when the cosine similarity is less than 0.1. Panels, shown for two examples: Normal Image, Added Perturbation, Attacked Image.

5.6 Discussion

The initial training of the surrogate models for ReDMark and HiDDeN achieves good accuracy, and fine-tuning with the target watermarked images leads to a successful DLOVE attack. Hiding Images in an Image and PIMoG posed some problems: the initial surrogate models had good accuracy and similar fine-tuning was performed, but the DLOVE attack was at first unsuccessful. In the case of Hiding Images in an Image, after the DLOVE attack the target decoder recovered a distorted watermark that matched neither the original watermark nor the target watermark. To overcome this issue, we added an LPIPS loss (from the VGG-19 network) to the initial MSE loss while training the surrogate decoder on the surrogate dataset. This led to a successful DLOVE attack when we fine-tuned the surrogate model for 90 epochs with 500 watermarked images of the target decoder. In the case of PIMoG, the target decoder recovered the original watermark successfully even from the perturbed watermarked image. This was due to its screen-shooting robustness. To overcome the issue, we introduced perspective warp, motion blur, and colour manipulations during the initial training of our surrogate model. Subsequently, the surrogate decoder was fine-tuned for 70 epochs with 400 watermarked images of the target decoder, which led to a successful DLOVE attack. This shows that the DLOVE attack needs only minor adjustments to surrogate training and fine-tuning to adapt to different techniques.

6 Conclusion

In this work, we have proposed the DLOVE attack on DNN-based watermarking techniques by leveraging adversarial machine learning. The attack shows that modern DNN-based watermarking techniques are vulnerable to the DLOVE attack. The proposed DLOVE attack raises a clear question about the security of existing DNN-based watermarking techniques. It provides a new attack vector that the designer community can use to assess the security of their DNN-based watermarking techniques.

References

  • [1] Adobe on watermarking ai-generated photos. https://blog.adobe /en/publish/2023/10/10/new-content-credentials-icon-transparency, accessed: 2022-20-02
  • [2] Ahmadi, M., Norouzi, A., Karimi, N., Samavi, S., Emami, A.: Redmark: Framework for residual diffusion watermarking based on deep networks. Expert Systems with Applications 146, 113157 (2020)
  • [3] Atito, S., Awais, M., Kittler, J.: Sit: Self-supervised vision transformer. arXiv preprint arXiv:2104.03602 (2021)
  • [4] Baluja, S.: Hiding images within images. IEEE transactions on pattern analysis and machine intelligence 42(7), 1685–1697 (2019)
  • [5] Berghel, H., O’Gorman, L.: Protecting ownership rights through digital watermarking. Computer 29(7), 101–103 (1996). https://doi.org/10.1109/2.511977
  • [6] Corley, I., Lwowski, J., Hoffman, J.: Destruction of image steganography using generative adversarial networks. arXiv preprint arXiv:1912.10070 (2019)
  • [7] Cox, I., Kilian, J., Leighton, F., Shamoon, T.: Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6(12), 1673–1687 (1997). https://doi.org/10.1109/83.650120
  • [8] Cox, I., Miller, M., Bloom, J., Honsinger, C.: Digital watermarking. Journal of Electronic Imaging 11(3), 414–414 (2002)
  • [9] Cox, I.J., Miller, M.L.: The first 50 years of electronic watermarking. EURASIP Journal on Advances in Signal Processing 2002, 1–7 (2002)
  • [10] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 248–255. IEEE (2009)
  • [11] Fang, H., Chen, D., Huang, Q., Zhang, J., Ma, Z., Zhang, W., Yu, N.: Deep template-based watermarking. IEEE Transactions on Circuits and Systems for Video Technology 31(4), 1436–1451 (2020)
  • [12] Fang, H., Jia, Z., Ma, Z., Chang, E.C., Zhang, W.: Pimog: An effective screen-shooting noise-layer simulation for deep-learning-based watermarking network. In: Proceedings of the 30th ACM International Conference on Multimedia. pp. 2267–2275 (2022)
  • [13] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
  • [14] Google on watermarking ai-generated contents. https://deepmind.google/technologies/synthid/, accessed: 2022-20-02
  • [15] Guo, C., Gardner, J., You, Y., Wilson, A.G., Weinberger, K.: Simple black-box adversarial attacks. In: International Conference on Machine Learning. pp. 2484–2493. PMLR (2019)
  • [16] Huiskes, M.J., Thomee, B., Lew, M.S.: New trends and ideas in visual concept detection: The mir flickr retrieval evaluation initiative. In: Proceedings of the international conference on Multimedia information retrieval. pp. 527–536 (2010)
  • [17] Ilyas, A., Engstrom, L., Athalye, A., Lin, J.: Black-box adversarial attacks with limited queries and information. In: International conference on machine learning. pp. 2137–2146. PMLR (2018)
  • [18] Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. Advances in neural information processing systems 28 (2015)
  • [19] Jia, Z., Fang, H., Zhang, W.: Mbrs: Enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression. In: Proceedings of the 29th ACM international conference on multimedia. pp. 41–49 (2021)
  • [20] Jung, D., Bae, H., Choi, H.S., Yoon, S.: Pixelsteganalysis: Pixel-wise hidden information removal with low visual degradation. IEEE Transactions on Dependable and Secure Computing (2021)
  • [21] Kandi, H., Mishra, D., Gorthi, S.R.S.: Exploring the learning capabilities of convolutional neural networks for robust image watermarking. Computers & Security 65, 247–268 (2017)
  • [22] Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
  • [23] Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. pp. 740–755. Springer (2014)
  • [24] Liu, H., Xiang, T., Guo, S., Li, H., Zhang, T., Liao, X.: Erase and repair: An efficient box-free removal attack on high-capacity deep hiding. IEEE Transactions on Information Forensics and Security (2023)
  • [25] Liu, T., Qiu, Z.d.: The survey of digital watermarking-based image authentication techniques. In: 6th International Conference on Signal Processing, 2002. vol. 2, pp. 1556–1559. IEEE (2002)
  • [26] Liu, Y., Guo, M., Zhang, J., Zhu, Y., Xie, X.: A novel two-stage separable deep learning framework for practical blind watermarking. In: Proceedings of the 27th ACM International conference on multimedia. pp. 1509–1517 (2019)
  • [27] Liu, Y., Chen, X., Liu, C., Song, D.: Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770 (2016)
  • [28] Lu, C.S., Huang, S.K., Sze, C.J., Liao, H.Y.M.: Cocktail watermarking for digital image protection. IEEE Transactions on Multimedia 2(4), 209–224 (2000). https://doi.org/10.1109/6046.890056
  • [29] Lu, C.S., Liao, H.Y.: Multipurpose watermarking for image authentication and protection. IEEE Transactions on Image Processing 10(10), 1579–1592 (2001). https://doi.org/10.1109/83.951542
  • [30] Luo, X., Zhan, R., Chang, H., Yang, F., Milanfar, P.: Distortion agnostic deep watermarking. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 13548–13557 (2020)
  • [31] Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
  • [32] Meta on watermark ai-generated photos. https://about.fb /news/2024/02/labeling-ai-generated-images-on-facebook-instagram-and-threads/, accessed: 2022-20-02
  • [33] Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2574–2582 (2016)
  • [34] Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 427–436 (2015)
  • [35] Papernot, N., McDaniel, P., Goodfellow, I.: Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277 (2016)
  • [36] Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia conference on computer and communications security. pp. 506–519 (2017)
  • [37] Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European symposium on security and privacy (EuroS&P). pp. 372–387. IEEE (2016)
  • [38] Pibre, L., Jérôme, P., Ienco, D., Chaumont, M.: Deep learning is a good steganalysis tool when embedding key is reused for different images, even if there is a cover source-mismatch. arXiv preprint arXiv:1511.04855 (2015)
  • [39] Qian, Y., Dong, J., Wang, W., Tan, T.: Deep learning for steganalysis via convolutional neural networks. In: Media Watermarking, Security, and Forensics 2015. vol. 9409, pp. 171–180. SPIE (2015)
  • [40] Raj, N.N., Shreelekshmi, R.: A survey on fragile watermarking based image authentication schemes. Multimedia Tools and Applications 80, 19307–19333 (2021)
  • [41] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. pp. 234–241. Springer (2015)
  • [42] Shaik, A.S., Karsh, R.K., Islam, M., Laskar, R.H.: A review of hashing based image authentication techniques. Multimedia Tools and Applications pp. 1–28 (2022)
  • [43] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
  • [44] Vukotić, V., Chappelier, V., Furon, T.: Are deep neural networks good for blind image watermarking? In: 2018 IEEE International Workshop on Information Forensics and Security (WIFS). pp. 1–7. IEEE (2018)
  • [45] Wang, Y., Doherty, J.F., Van Dyck, R.E.: A wavelet-based watermarking algorithm for ownership verification of digital images. IEEE transactions on image processing 11(2), 77–88 (2002)
  • [46] Single-frame & image forensic watermarking. https://castlabs /image-watermarking/, accessed: 2022-20-02
  • [47] Wong, P.W.: A public key watermark for image verification and authentication. Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269) 1, 455–459 vol.1 (1998), https://api.semanticscholar.org/CorpusID:15447332
  • [48] Wong, P.W., Memon, N.: Secret and public key image watermarking schemes for image authentication and ownership verification. IEEE Transactions on Image Processing 10(10), 1593–1601 (2001). https://doi.org/10.1109/83.951543
  • [49] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 586–595 (2018)
  • [50] Zhang, X., Wang, S.: Statistical fragile watermarking capable of locating individual tampered pixels. IEEE Signal Processing Letters 14(10), 727–730 (2007). https://doi.org/10.1109/LSP.2007.896436
  • [51] Zhong, X., Huang, P.C., Mastorakis, S., Shih, F.Y.: An automated and robust image watermarking scheme based on deep neural networks. IEEE Transactions on Multimedia 23, 1951–1961 (2020)
  • [52] Zhou, W., Hou, X., Chen, Y., Tang, M., Huang, X., Gan, X., Yang, Y.: Transferable adversarial perturbations. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 452–467 (2018)
  • [53] Zhu, J., Kaplan, R., Johnson, J., Fei-Fei, L.: Hidden: Hiding data with deep networks. In: Proceedings of the European conference on computer vision (ECCV). pp. 657–672 (2018)