Revealing Traces in printouts and scans

September 2022 (list of samples updated 2023-12-02)

In our store in Leipzig (Germany), we offer the use of a multi-function printer (printer, copier and scanner), as described in our concept. To ensure that this can be done in the most privacy-friendly way possible, we have addressed privacy issues and are taking a number of measures. Unlike many other copyshops, we have not installed any cameras, either in front of or inside our store.

We have little or no influence over other factors. For example, every print, copy and scan leaves behind traces or information that can have an adverse effect on the user. This is the case if confidential or private documents are involved that could unintentionally fall into the hands of third parties, or if the traces and information are suitable for identifying the person.

Data storage and processing

External media

Unless you only want to make a copy of a document you have brought with you, you will most likely use a rewritable storage medium (USB stick, SD card, …) to save scanned documents on it or to print stored documents. In some self-service copy stores, storage media brought along must be connected to a computer in order to be able to print out saved documents.

Malware

These computers and devices may be infected with malware. It is possible, for example, for the malware to embed itself in the rewritable external storage medium and to spread during further use, causing a wide range of damage. It is also conceivable that the malware reads out and forwards stored files. Last but not least, traces of the files can remain on the device and fall into unauthorized hands later.

Golem.de and Kaspersky tested this in 2015 with 70 USB sticks at photo terminals and copyshops in several cities. In this sample, one USB stick was infected with the malware Sality.

To reduce the above risks, people can print and save scanned documents from our Tails terminal. This is a computer without a hard disk that boots the operating system Tails from a DVD. It allows you to access the Internet anonymously and to edit documents without leaving any traces on the computer. A cabin protects you from prying eyes. It is also possible to bring files to be printed on CD/DVD and to burn scanned documents to CD/DVD.

In this context, using write-once CDs and DVDs is safer than flash storage media such as USB sticks and SD cards, even if the latter offer some kind of write protection: "In the case of storage media with built-in hardware write protection, it is always uncertain to what extent a proprietary firmware guarantees the blocking of write commands and other commands (e.g. firmware update) and only allows the commands required to read the data."1 If you want to read more about write protection, or want to retrofit write protection yourself, we recommend the website vkldata.com (in German language only).

Hardware write blockers (e.g. for forensic purposes) currently seem too expensive and impractical for this use. An open source DIY solution, as first published at Black Hat Europe 2012, unfortunately has a very low read speed.

Identification

External storage media such as USB sticks or cell phones are detected by the device’s operating system. The resulting log files (logs) can reveal a variety of information about external storage media used on the device when analyzed, e.g., after being seized or infected. They can include hardware serial numbers, product names and product IDs, manufacturer IDs, times of mounting and ejecting, UUID/GUID of file systems, and more, from the last weeks, months or years. 2 3 4 If more than one device or person is involved, connections between several devices and persons could also be made. For sensitive work, it is therefore recommended to use an operating system such as Tails, which “forgets” all log files when it is shut down and additionally overwrites the main memory (RAM). For particularly sensitive work, the external storage media should also be deleted and physically destroyed afterwards.
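To illustrate how little effort such an analysis takes, here is a minimal Python sketch that collects identifying fields from log text. It assumes lines in the style of Linux kernel messages for newly attached USB devices; the sample lines, the values in them, and the function name usb_devices are invented for this example and do not come from our printer or any real device.

```python
import re
from collections import defaultdict

# Illustrative lines in the style of Linux kernel messages for a newly
# attached USB stick. All values are invented for this example.
SAMPLE_LOG = """\
[ 101.10] usb 1-2: New USB device found, idVendor=abcd, idProduct=1234, bcdDevice= 1.00
[ 101.11] usb 1-2: Product: Example Stick
[ 101.12] usb 1-2: Manufacturer: Example Corp
[ 101.13] usb 1-2: SerialNumber: 0123456789ABCDEF
"""

IDS = re.compile(r"usb (\S+): New USB device found, idVendor=(\w+), idProduct=(\w+)")
FIELD = re.compile(r"usb (\S+): (Product|Manufacturer|SerialNumber): (.+)")


def usb_devices(log_text):
    """Collect identifying fields per USB port from kernel-style log text."""
    devices = defaultdict(dict)
    for line in log_text.splitlines():
        if m := IDS.search(line):
            port, vendor, product = m.groups()
            devices[port].update(idVendor=vendor, idProduct=product)
        elif m := FIELD.search(line):
            port, key, value = m.groups()
            devices[port][key] = value.strip()
    return dict(devices)


for port, fields in usb_devices(SAMPLE_LOG).items():
    print(port, fields)
```

A few lines of pattern matching are enough to tie a serial number to a time window, which is why an operating system that discards its logs on shutdown is the safer choice for sensitive work.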

Unfortunately, we could not determine what information about external storage media is being logged by our multi-function printer.

Internal memory and storage

Multi-function printers use their internal main memory (RAM) and hard disk(s) in order to work as intended. The main memory is primarily used to process print jobs. The hard disk stores firmware and settings, as well as data for printing, copying and scanning.

Our multi-function printer is a Canon imageRUNNER ADVANCE c5235i. Canon writes about its data processing:

Your imageRUNNER ADVANCE machine separates data into management information and actual data before storing the data. Management information is automatically erased when the copy, send/receive, or print operation is completed. However, actual data is stored and remains in the hard disk. If HDD Data Erase function is set to ‘On’, actual data (image, management, and spool data for the copying, mail box, printing, and sending/receiving functions) is erased at the same time as management information.

Hard disk

Our multi-function printer has a 160 GB hard disk. Depending on the device, data stored on hard disks can be partially or completely recovered, with or without special software, if it was not destroyed properly. Especially when used devices and hard disks are resold, such failures repeatedly lead to reports of recovered data.

Canon writes about it:

The management information is automatically erased when the job is complete, but the actual data is stored in the hard disk.

We have configured our multi-function printer so that the data processed and stored during a job is overwritten according to the DoD standard (“The data is overwritten three times. The first time with a fixed value, the second time with a complement of the fixed value, and the third time with random data.”). This includes:

In addition, we reset the device to factory settings approximately every 30 days and overwrite all data again according to the DoD standard, using the device’s internal function. We wanted to test the effectiveness of this method with a forensic investigation but, like others before us, failed to read out the hard drive because of its ATA password. We hope to make up for this at a later date, and welcome advice.
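The three-pass scheme quoted above (fixed value, complement of that value, random data) can be sketched in a few lines of Python. This is only an illustration of the principle on a single regular file, not Canon's implementation; the function name, the default fixed value 0x00 and the chunk size are our assumptions.

```python
import os


def overwrite_three_passes(path, fixed=0x00, chunk_size=1024 * 1024):
    """Overwrite a file in place: fixed value, its complement, then random data."""
    size = os.path.getsize(path)
    complement = fixed ^ 0xFF
    passes = [
        lambda n: bytes([fixed]) * n,       # pass 1: fixed value
        lambda n: bytes([complement]) * n,  # pass 2: complement of that value
        os.urandom,                         # pass 3: random data
    ]
    with open(path, "r+b") as f:
        for make_chunk in passes:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(make_chunk(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push this pass to the device
```

Note that overwriting at file level gives no guarantee on flash media, SSDs, or journaling and copy-on-write file systems, where old blocks may survive elsewhere on the device. This is one reason why physical destruction remains the more dependable option.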

It’s a good thing to overwrite storage devices that are no longer needed, such as hard disks, USB sticks and SD cards, with random data. Physically destroying them afterwards is an even better one. In order to provide other people and also ourselves with an easy-access way to decommission storage devices safely, we want to acquire a storage device shredder and make it available for free in our store and are currently collecting funds for this.

Main memory (RAM)

The multi-function printer in our store has 2.5 GB of main memory. As written in the previous section, we have configured the device to delete and overwrite information that is no longer needed, already during processing. Those who do not trust the setting can turn off the device after use to let the memory “forget”.

Misprints and forgotten documents

Misprints can be shredded directly into millimeter-small shreds in our store using the adjacent document shredder, a Dahle 706air with security levels P-7/F-3/T-6. If that is not sufficient, the remains can be taken away.

If people have forgotten printed, copied or scanned documents, we will look at them and retain presumably important originals for a limited time. Obvious copies and printouts will be shredded at the end of the day.

Revealing traces on printouts and in scanned documents

There are many ways to extract information from printouts that can lead to identifying the manufacturer, the device, and the person responsible for that printout or copy. In the following, we will discuss most of the possibilities that we are aware of.

Machine Identification Codes

Although this has been known since 2004, few people have it in mind when they print something. A Machine Identification Code (MIC), also called yellow dots, tracking dots, secret dots, color printer marking, color tracking dots, or printer steganography, refers to a process that adds tiny yellow dots to printouts that are invisible to the naked eye. These yellow dots form a pattern (of which there are several forms) 5, in which information such as the serial number of the device and a timestamp (print date and time) is encoded. This pattern is repeated over the entire page. The firmware of the devices is responsible for the implementation. In 2004, Canon Deutschland GmbH received the BigBrotherAward in the category “Technology” for this.6

The encoded information is sometimes used to identify the people responsible for printouts. Among other cases, this has led to the identification of an NSA whistleblower and of employee(s) of the Berlin State Security who sent threatening letters containing internal police information to 42 people.

The Electronic Frontier Foundation (EFF), which has extensively studied printers and their tracking methods, made the following assessment in 2017:

Reminder: It appears likely that all recent commercial color laser printers print some kind of forensic tracking codes, not necessarily using yellow dots. This is true whether or not those codes are visible to the eye and whether or not the printer models are listed here. This also includes the printers that are listed here as not producing yellow dots.

The purpose of these procedures is obvious – to identify devices and people – but what is the justification for all this? Let’s take a look at the world of banknotes.

Counterfeit Deterrence Systems

The Central Bank Counterfeit Deterrence Group (CBCDG), a working group of now 32 central banks, wrote on its homepage as early as March 2004:

The Central Bank Counterfeit Deterrence Group (CBCDG) has now developed the Counterfeit Deterrence System, consisting of anti-counterfeiting technologies which prevent personal computers and digital imaging tools from capturing or reproducing the image of a protected banknote.

Several leading personal computer hardware and software manufacturers have voluntarily adopted the system in recognition of the harm that counterfeit currency can cause their customers and the general public. The technology does not have the capacity to track the use of a personal computer or digital imaging tool and consumers will not notice any difference in the performance or effectiveness of products equipped with this technology.

The Independent Center for Data Protection Schleswig-Holstein (ULD) asked the manufacturer Canon in 2019 (Caution: Yellow Dots! Hidden information in color copies) and summarizes the response [freely translated]:

[Canon] refers to the global cooperation to combat counterfeiting between law enforcement agencies and the printing industry, which was established at the instigation of Europol and Interpol (cf. [8]), as well as the voluntary commitment of the printer industry to implement the counterfeit prevention system. The manufacturer could not provide more detailed information due to a confidentiality agreement.

EURion constellation

In experiments, Markus Kuhn discovered the EURion constellation on euro banknotes and published it in 2002. The pattern is also found on other countries’ banknotes. It is designed to work in conjunction with supporting software and firmware from hardware and software vendors to prevent the scanning, editing, and printing of banknotes. Further research by Steven J. Murdoch and Ben Laurie shows that the EURion constellation is not the only feature by which banknotes are recognized.7 8 9

Both our own investigations and those of the Independent Center for Data Protection Schleswig-Holstein confirm this. If a part of the EURion constellation, which is located between the digits on new euro bills, is identified by the device, it either omits areas when printing or changes the overall image, for example by blackening or streaking. We tried this with the front of euro banknotes from 5 to 50 euros and found that when part of the EURion constellation is covered, the identification fails and no other areas are changed.

Digital watermarks from Digimarc

In addition to the EURion constellation, a 2005 article in Datenschleuder Nr. 86 mentions Digimarc’s digital watermark, which was brought up shortly before at 21C3 by Steven J. Murdoch and Ben Laurie in their presentation The Convergence of Anti-Counterfeiting and Computer Security. Patent WO1999053428A1 from Digimarc Corporation describes its principles.

If you want to know more about euro banknotes in general, you can find it at Wikipedia.

Patents

These already known processes are based on patents such as US Patent 5515451, which was applied for by Xerox on October 7, 1993 and granted in 1996, and US Patent 5845008, which was applied for by Omron Corporation on January 20, 1995 and granted in 1998. Since these applications were filed long before their implementation and the subsequent “discoveries in the wild,” it may be worthwhile to research other patent applications filed by manufacturers to learn about additional, possibly previously undiscovered, privacy-hostile methods.

MICs: Own investigations

Back to the Machine Identification Code. There are various methods for checking whether a color laser printer leaves Machine Identification Codes on printouts. We have based our investigations on the procedure of the Independent Center for Data Protection Schleswig-Holstein (ULD) in the second version of their paper Vorsicht: Yellow Dots!.

One of the factors that could lead to incorrect analysis results is a low fill level of the yellow toner cartridge. It should be noted here that, on some printers, even printing in black and white is no longer possible if one color cartridge is empty. If the printout is scanned, too low a scan resolution can also affect the result.

For all the following investigations, unless otherwise stated, we have used printouts of our Canon imageRUNNER ADVANCE c5235i in DIN A4 with the color profiles color and black. For a meaningful comparison, we analyzed a sheet of paper in the same way before and after printing. Beforehand, we marked one side of the sheet to ensure that we were looking at the same side before and after printing.

Analysis with a microscope

We first took unprinted sheets of paper, marked a spot and analyzed it with a pocket microscope (60x to 120x magnification) before and after printing. This allowed us to confirm the results of the ULD in our case. On printouts printed with the profile color, we detected several tiny yellow dots. These were not present on printouts made with the color profile black.

Yellow dots under the microscope

Analysis with black light

In contrast to the result of the ULD, we could not make any yellow dots visible to the naked eye under black light. For better illustration, the following is a photo of the magnified image of the ULD analysis illuminated with black light.

Image source: Independent State Center for Data Protection Schleswig-Holstein

Analysis on the computer

In order to analyze the printouts on the computer, it is recommended to work with scanned documents. It should be noted that this is best done in a high resolution (e.g. 1200x1200 dpi) and in a lossless format.

We scanned a previously unprinted, white DIN A4 sheet before and after printing with a Canon CanoScan LiDE 210 with 300, 600, 1200 and 2400 dpi and saved it as a PNG file. We used GNOME simple-scan for scanning.
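As a rough guide to what these resolutions mean in practice, the following Python sketch computes the pixel dimensions and the uncompressed size of a full DIN A4 scan. It assumes a page of exactly 210 x 297 mm and 8-bit RGB (3 bytes per pixel); the function name scan_size is ours, and PNG files will be smaller because of lossless compression.

```python
A4_MM = (210, 297)   # DIN A4 width and height in millimetres
MM_PER_INCH = 25.4


def scan_size(dpi, page_mm=A4_MM, bytes_per_pixel=3):
    """Return (width_px, height_px, raw_megabytes) for a full-page RGB scan."""
    width = round(page_mm[0] / MM_PER_INCH * dpi)
    height = round(page_mm[1] / MM_PER_INCH * dpi)
    raw_mb = width * height * bytes_per_pixel / 1_000_000
    return width, height, raw_mb


for dpi in (300, 600, 1200, 2400):
    w, h, mb = scan_size(dpi)
    print(f"{dpi:>4} dpi: {w} x {h} px, about {mb:.0f} MB uncompressed")
```

Doubling the resolution quadruples the number of pixels, so each dot is covered by more pixels, at the price of much larger files and longer processing times.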

Edit with GIMP

We performed these steps with GIMP (GNU Image Manipulation Program):

  1. Select the scanned file and open it in GIMP
  2. Windows > Dockable Dialogs > Channels: deselect the colors red and green
  3. Colors > Invert
  4. Colors > Saturation: increase (can be applied several times)
  5. If necessary, make further adjustments, e.g. with the color curves
  6. Colors > Components > Mono Mixer: highlight the blue channel, e.g. with the values 0, 0, 1

Workflow in GIMP

A simplified workflow, which was not used here, can look as follows:

  1. Colors > Saturation: execute two times with the value 10
  2. Colors > Components > Mono Mixer: Highlight blue channel, e.g. with the values 0, 0, 1
  3. Colors > Invert
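The core idea of both workflows can also be sketched in Python: yellow toner absorbs blue light, so yellow dots are dark in the blue channel, and inverting that channel makes them bright. This is a minimal sketch rather than a replacement for GIMP. It substitutes a simple gain factor for the saturation step, the function names are ours, and the optional loader assumes the third-party Pillow package is installed.

```python
def highlight_yellow(pixels, gain=4):
    """Map RGB pixels to grey values in which yellow dots appear bright.

    `pixels` is an iterable of (r, g, b) tuples with values 0..255.
    Only the blue channel is used, as in the Mono Mixer step (0, 0, 1).
    """
    out = []
    for _r, _g, b in pixels:
        inverted = 255 - b                     # corresponds to Colors > Invert
        out.append(min(255, inverted * gain))  # crude contrast boost (our substitute)
    return out


def load_rgb_pixels(path):
    """Load an image as a list of RGB tuples (requires the Pillow package)."""
    from PIL import Image  # third-party; imported lazily so the rest runs without it

    with Image.open(path) as img:
        raw = img.convert("RGB").tobytes()
    return [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]


# Near-white paper versus a faint yellow dot:
paper, dot = (250, 250, 248), (250, 245, 200)
print(highlight_yellow([paper, dot]))
```

Pure Python is slow on a full 600 dpi page, so this is better suited to small crops; for whole pages GIMP or an array library is the more practical tool.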

The higher the resolution (300, 600, 1200, 2400 dpi), the better the pattern can be recognized and analyzed:

Different scan resolutions

We scanned printouts made from a computer and a USB stick, as well as copies, at a resolution of 600x600 dpi, and processed them in GIMP to analyze them for Machine Identification Codes. In the Single color and Two color printing modes, we analyzed only some of the available colors.

| Category | Mode | Mode setting | MIC |
|---|---|---|---|
| Print from PC | Black and white | - | no MIC detected |
| Print from PC | Color | - | MIC detected |
| Print from USB | Auto (color/black) | - | MIC detected |
| Print from USB | Black | - | no MIC detected |
| Copy | Auto (color/black) | - | MIC detected |
| Copy | Single color | Yellow | MIC detected |
| Copy | Single color | Green | MIC detected |
| Copy | Single color | Red | MIC detected |
| Copy | Black | - | no MIC detected |
| Copy | Full color | - | MIC detected |
| Copy | Two color | Black & red | MIC detected |

Printouts and copies that are made without colors, but only with black, are thus not given a visible Machine Identification Code in our tests on our device.

All detected patterns are so-called skewed small patterns, described by Peter Buck in his paper “Reverse Engineering the Machine Identification Code” and also studied at Duke University.

We can confirm the results of that work: We find a pattern of 18 dots, which we estimate to be arranged in a 16x32 grid, probably representing the serial number of our device (JWF11162) and repeating across the page. The pattern does not change with time, date, or the content of the printed document. It has the shape of a parallelogram tilted by about 30 degrees. However, the orientation, start and end could be different from what is shown here. The following image shows four repetitions of the pattern, which we have colored differently.

Four repetitions of the pattern in different colors

MICs in the wild

To determine how common MICs are “in the wild,” we analyzed 200 documents from 200 different companies, associations, and government agencies that we had received independently of this research. The oldest document is from 2009, and most are from 2020-2023. We randomly selected the documents and analyzed mostly the first sheet of each using scanning and processing with GIMP, as described above, and with a microscope at 120x magnification.

Table of analyzed documents and detected MICs
Number  Print (color / bw)  MIC pattern
001 bw no MIC detected
002 color MIC detected
003 bw no MIC detected
004 bw no MIC detected
005 bw no MIC detected
006 bw no MIC detected
007 bw no MIC detected
008 color MIC detected
009 bw no MIC detected
010 bw no MIC detected
011 bw no MIC detected
012 bw no MIC detected
013 bw no MIC detected
014 bw no MIC detected
015 bw no MIC detected
016 bw no MIC detected
017 bw no MIC detected
018 bw no MIC detected
019 bw no MIC detected
020 color uncertain (CMYK dots)
021 color MIC detected
022 bw no MIC detected
023 bw no MIC detected
024 bw no MIC detected
025 bw no MIC detected
026 bw no MIC detected
027 bw uncertain (CMYK dots)
028 bw no MIC detected
029 bw no MIC detected
030 bw no MIC detected
031 bw no MIC detected
032 bw no MIC detected
033 color MIC detected
034 bw no MIC detected
035 bw uncertain (CMYK dots)
036 bw no MIC detected
037 color MIC detected
038 bw no MIC detected
039 color MIC detected
040 bw no MIC detected
041 bw no MIC detected
042 bw MIC detected
043 color MIC detected
044 color MIC detected
045 bw no MIC detected
046 bw no MIC detected
047 bw no MIC detected
048 bw no MIC detected
049 bw no MIC detected
050 bw no MIC detected
051 bw no MIC detected
052 bw no MIC detected
053 bw no MIC detected
054 bw no MIC detected
055 bw no MIC detected
056 bw no MIC detected
057 bw no MIC detected
058 bw no MIC detected
059 color MIC detected
060 bw no MIC detected
061 bw no MIC detected
062 bw no MIC detected
063 bw no MIC detected
064 bw no MIC detected
065 bw no MIC detected
066 bw no MIC detected
067 bw no MIC detected
068 bw no MIC detected
069 bw no MIC detected
070 bw no MIC detected
071 bw no MIC detected
072 bw no MIC detected
073 bw MIC detected
074 bw no MIC detected
075 bw no MIC detected
076 color MIC detected
077 bw no MIC detected
078 bw uncertain (CMYK dots)
079 bw no MIC detected
080 bw no MIC detected
081 bw no MIC detected
082 bw no MIC detected
083 bw no MIC detected
084 bw no MIC detected
085 bw/color no MIC detected
086 color MIC detected
087 bw no MIC detected
088 bw no MIC detected
089 bw no MIC detected
090 color MIC detected
091 bw no MIC detected
092 color MIC detected
093 bw no MIC detected
094 bw no MIC detected
095 bw no MIC detected
096 color MIC detected
097 bw no MIC detected
098 bw no MIC detected
099 bw no MIC detected
100 bw no MIC detected
101 color MIC detected
102 color no MIC detected
103 bw no MIC detected
104 bw no MIC detected
105 bw no MIC detected
106 bw no MIC detected
107 color no MIC detected
108 bw no MIC detected
109 bw no MIC detected
110 bw no MIC detected
111 color no MIC detected
112 bw no MIC detected
113 bw no MIC detected
114 bw no MIC detected
115 bw no MIC detected
116 bw no MIC detected
117 bw no MIC detected
118 bw no MIC detected
119 bw no MIC detected
120 bw no MIC detected
121 bw no MIC detected
122 bw no MIC detected
123 bw no MIC detected
124 bw no MIC detected
125 bw no MIC detected
126 bw no MIC detected
127 bw no MIC detected
128 color no MIC detected
129 color MIC detected
130 color MIC detected
131 color no MIC detected
132 bw no MIC detected
133 bw no MIC detected
134 bw no MIC detected
135 bw no MIC detected
136 bw no MIC detected
137 color no MIC detected
138 color MIC detected
139 color no MIC detected
140 bw no MIC detected
141 bw no MIC detected
142 bw no MIC detected
143 color no MIC detected
144 bw no MIC detected
145 color MIC detected
146 color MIC detected
147 bw no MIC detected
148 bw no MIC detected
149 bw no MIC detected
150 bw no MIC detected
151 bw no MIC detected
152 bw no MIC detected
153 bw no MIC detected
154 bw no MIC detected
155 bw no MIC detected
156 bw no MIC detected
157 bw no MIC detected
158 bw no MIC detected
159 bw MIC detected
160 bw no MIC detected
161 bw no MIC detected
162 bw no MIC detected
163 bw no MIC detected
164 color no MIC detected
165 bw no MIC detected
166 bw no MIC detected
167 bw no MIC detected
168 color no MIC detected
169 color no MIC detected
170 color no MIC detected
171 bw no MIC detected
172 bw no MIC detected
173 color MIC detected
174 color MIC detected
175 color MIC detected
176 bw no MIC detected
177 bw no MIC detected
178 bw no MIC detected
179 color no MIC detected
180 bw no MIC detected
181 color no MIC detected
182 bw no MIC detected
183 color no MIC detected
184 color MIC detected
185 color MIC detected
186 color MIC detected
187 color MIC detected
188 bw no MIC detected
189 color MIC detected
190 bw no MIC detected
191 bw no MIC detected
192 color MIC detected
193 bw no MIC detected
194 bw no MIC detected
195 bw no MIC detected
196 color no MIC detected
197 bw no MIC detected
198 bw no MIC detected
199 bw no MIC detected
200 color no MIC detected

The printouts and copies were from laser and inkjet printers unknown to us and were in black and white and in color. The sample included white paper, non-bleached recycled paper, and pre-printed notepaper. In the latter case, we did not investigate further whether the Machine Identification Code was applied when the form was created or when the content was printed, so the result here is also only MIC detected or no MIC detected.

With four documents, we were able to detect yellow dots on the paper with the microscope, but since there were colored dots in CMYK colors on the entire page in each document, we could not clearly identify a pattern and rated it as uncertain (CMYK dots).

In total, a Machine Identification Code was detected in 32 out of 200 documents. Of these 32 documents, 29 documents contained recognizable colored contents, while the remaining 3 seemed black and white.
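These totals can be re-derived from the table with a short tally. The following Python sketch parses rows in the format used in the table above; the function name tally is ours, and only a five-row excerpt is included here, so the full 200 rows have to be pasted in to check the figures.

```python
from collections import Counter


def tally(rows):
    """Count (print mode, result) pairs from lines like '002 color MIC detected'."""
    counts = Counter()
    for row in rows:
        parts = row.split(maxsplit=2)
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip the header and blank lines
        _number, mode, result = parts
        counts[(mode, result.strip())] += 1
    return counts


# A five-row excerpt of the table above; paste all 200 rows to check the totals.
excerpt = """\
001 bw no MIC detected
002 color MIC detected
020 color uncertain (CMYK dots)
042 bw MIC detected
085 bw/color no MIC detected
""".splitlines()

counts = tally(excerpt)
detected = sum(n for (_mode, result), n in counts.items() if result == "MIC detected")
print(detected, dict(counts))
```

Grouping by print mode and result in one pass also makes it easy to see how many of the detected codes come from documents that look black and white.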

Deda Toolkit

An interesting approach is taken by the Deda Toolkit, which was developed at the TU Dresden. It is intended to help detect tracking dots and offers the possibility of anonymization by removing detected patterns or adding new ones. Among others, Netzpolitik.org and Deutschlandfunk reported on the method.

We have tested the Deda Toolkit on a Debian-based system.

sudo apt update
sudo apt install python3-pip
pip3 install --user deda

deda_gui

python3 /home/user/.local/bin/deda_gui

The deda_gui did not recognize the pattern that became visible with editing in GIMP during our tests. We tested several scans saved as PNG files at 300, 600 and 1200 dpi and got only the message No tracking dot pattern detected. For best results try a 300 dpi scan and a lossless file format.

deda_extract_yd

python3 /home/user/.local/bin/deda_extract_yd filepath --debug

In our tests on the command line with the same input files, deda_extract_yd detected a tracking pattern (Detected tracking dot pattern (-1, -1, 0.283334, 0.006667)) for a file with 300 dpi resolution, but not for higher resolutions. For those, we instead received the error message AttributeError: 'YellowDotsXposer' object has no attribute 'dots'.

Therefore, to search for MICs, the manual method with GIMP currently seems more reliable to us. The anonymization method of deda also seems to work only for some patterns. In tests with our Canon device, the original file was not changed, despite the success message Document anonymized and saved in the deda_gui and the creation of a file anon.png. The MIC was still recognizable. As Stephan Escher told us, there has not yet been enough sample material from Canon devices for analysis. In addition, patterns from Canon devices are more difficult to detect than those from other manufacturers.

Metadata in scanned files

Digital files such as documents and graphics contain metadata. These can unintentionally reveal information and thus allow conclusions to be drawn about the source. The Extensible Metadata Platform (XMP) is a standard developed by Adobe for embedding metadata in files. However, metadata can also be stored as a separate file in the same directory.

If no custom file name is entered when saving a scanned document, our Canon imageRUNNER ADVANCE c5235i generates the file name from the system time in UTC+0 at the start of the scan, using the XMP standard. Thus, the file name 20041224084919.pdf means that the file, according to the time set on the device, was created on 2004-12-24 at 08:49:19 UTC+0. For scans in JPEG format, the file names are additionally given a page suffix such as _001, which increases by one with each page. Other devices may use a different naming scheme, which may allow conclusions to be drawn about the manufacturer or model.
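How directly such a default file name gives away the scan time can be shown with a short Python sketch. The regular expression and the function name parse_scan_name are ours, and the exact form of the JPEG page suffix (for example 20041224084919_001.jpg) is an assumption based on the description above.

```python
import re
from datetime import datetime, timedelta, timezone

# Pattern for the naming scheme described above. The exact form of the JPEG
# page suffix (e.g. "20041224084919_001.jpg") is an assumption.
SCAN_NAME = re.compile(r"^(\d{14})(?:_(\d{3}))?\.(pdf|jpe?g)$", re.IGNORECASE)


def parse_scan_name(filename):
    """Return (UTC datetime, page number or None) for a default scan file name."""
    m = SCAN_NAME.match(filename)
    if m is None:
        raise ValueError(f"not a default scan file name: {filename!r}")
    stamp = datetime.strptime(m.group(1), "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    page = int(m.group(2)) if m.group(2) else None
    return stamp, page


scan_start, page = parse_scan_name("20041224084919.pdf")
local = scan_start.astimezone(timezone(timedelta(hours=1)))  # device set to UTC+1
print(scan_start.isoformat(), local.isoformat(), page)
```

Entering a custom file name before saving, or renaming the file afterwards, removes this particular trace, but not the timestamps stored inside the file or in the file system.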

In the following metadata, you can see that in our test, the name was assigned in UTC+0 even though the UTC+1 time zone was set. We use ExifTool to read out the metadata:

exiftool -g /home/user/20041224084919.pdf

---- ExifTool ----
ExifTool Version Number         : 12.44
---- File ----
File Name                       : 20041224084919.pdf
Directory                       : /mnt/usb/...
File Size                       : 44 kB
File Modification Date/Time     : 2004:12:24 09:49:26+01:00
File Access Date/Time           : 2022:08:08 11:29:37+02:00
File Inode Change Date/Time     : 2004:12:24 09:49:24+01:00
File Permissions                : -rwxr-xr-x
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
---- PDF ----
PDF Version                     : 1.4
Linearized                      : No
Creator                         : Canon iR-ADV C5235  PDF
Create Date                     : 2004:12:24 08:49:24Z
Page Count                      : 1
---- XMP ----
XMP Toolkit                     : Adobe XMP Core
Creator Tool                    : Canon iR-ADV C5235  PDF
Producer                        : Adobe PSL 1.2e for Canon
Format                          : application/pdf
Document ID                     : uuid:14d8cb41-0000-8887-1780-13af00000000

With -g, ExifTool groups the read out information.

These metadata can be individually manipulated or partially removed using programs such as Metadata Anonymisation Toolkit v2 (mat2). For example, mat2 /home/user/20041224084919.pdf creates the file 20041224084919.cleaned.pdf, which contains only the following metadata:

---- ExifTool ----
ExifTool Version Number         : 12.44
Warning                         : Invalid xref table
---- File ----
File Name                       : 20041224084919.cleaned.pdf
Directory                       : /mnt/usb/...
File Size                       : 97 kB
File Modification Date/Time     : 2022:08:05 10:19:10+02:00
File Access Date/Time           : 2022:08:05 10:19:36+02:00
File Inode Change Date/Time     : 2022:08:05 10:19:10+02:00
File Permissions                : -rwxr-xr-x
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
---- PDF ----
PDF Version                     : 1.5
Linearized                      : No
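To see at a glance which tags were removed, the two ExifTool listings above can be parsed and compared. The following Python sketch works on the grouped text output of exiftool -g as shown here; the function names are ours, and the two strings are shortened excerpts of the listings above.

```python
def parse_grouped(text):
    """Parse `exiftool -g` text output into {group: {tag: value}}."""
    groups, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("----") and line.endswith("----"):
            current = line.strip("- ")          # "---- PDF ----" -> "PDF"
            groups[current] = {}
        elif current is not None and " : " in line:
            tag, value = line.split(" : ", 1)   # split at the first separator only
            groups[current][tag.strip()] = value.strip()
    return groups


def removed_tags(before, after):
    """List (group, tag) pairs present in `before` but missing from `after`."""
    return sorted(
        (group, tag)
        for group, tags in before.items()
        for tag in tags
        if tag not in after.get(group, {})
    )


BEFORE = """\
---- PDF ----
PDF Version                     : 1.4
Linearized                      : No
Creator                         : Canon iR-ADV C5235  PDF
Create Date                     : 2004:12:24 08:49:24Z
Page Count                      : 1
---- XMP ----
Creator Tool                    : Canon iR-ADV C5235  PDF
Producer                        : Adobe PSL 1.2e for Canon
"""
AFTER = """\
---- PDF ----
PDF Version                     : 1.5
Linearized                      : No
"""

for group, tag in removed_tags(parse_grouped(BEFORE), parse_grouped(AFTER)):
    print(f"removed: {group} / {tag}")
```

A comparison like this only shows what ExifTool can read. It does not prove that no other identifying data remains in the file.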

If you are using the Tails operating system, mat2 is already installed and you can right-click to select the Remove metadata option to strip a file in a supported file format of most of its metadata. If you are using a different Linux-based operating system and want to install mat2, you can find instructions here.

If a PDF file is converted to a Trusted PDF in Qubes OS, a lot of metadata is also removed, but e.g. not the Create Date, which is reset. If you want to manipulate the date and time information in the file system metadata and in the PDF metadata, you can change the system time for this and set it to the future or past. This may require being offline temporarily or preventing online time synchronization.

Pitfalls of the file systems

Some file systems store multiple timestamps, which are sometimes overlooked and cannot always be removed completely. For example, the ext4 file system stores a Creation Date for each file. Furthermore, the accuracy of the timestamp can also reveal information about the origin or transport method of a file. The Whonix project writes on the subject of File System Data Leakage in footnote 3:

USB flash drives are fairly unique in that they usually use FAT32, and FAT32 is unique in that its datetime fields have a resolution of 2 seconds. So it’s really easy to tell if files were on a USB flash drive (all the datetime values will be even numbers) unless the datetime metadata is scrubbed.
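The check described in this quote is easy to reproduce. The following Python sketch tests whether all modification times in a set of files fall exactly on even seconds; the function name is ours, and a positive result is only a hint, since other tools and file systems can produce such values too.

```python
import os

TWO_SECONDS_NS = 2_000_000_000


def all_even_seconds(paths):
    """True if every file's modification time falls exactly on an even second."""
    paths = list(paths)
    if not paths:
        return False  # nothing to judge
    return all(os.stat(p).st_mtime_ns % TWO_SECONDS_NS == 0 for p in paths)
```

Called with the files of a directory, for example all_even_seconds(entry.path for entry in os.scandir(".") if entry.is_file()), it gives a quick indication of whether timestamps should be scrubbed before files are passed on.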

Inexplicable blue dots on the paper

During our analysis with the microscope, we noticed that small blue, sometimes violet appearing dots could be seen on the majority of the analyzed papers (DIN A4 and A3) and also on new unprinted sheets (recycled paper as well as bleached white) just taken out of the original packaging. Unfortunately, we have not managed to make these dots visible by means of scans and image processing. The quantity and arrangement of the dots appear to be non-repetitive and sometimes very different to us in these random sightings, so we are unsure whether they are, for example, a code added to the paper by paper manufacturers, or random artifacts created during the manufacturing process.

Blue dots on many types of paper

For the sake of completeness, it should be mentioned at this point that paper sheets can also be analyzed forensically, but we will not go into this further here.

Colors, typography and other factors

Forensic analyses of printed products use other indicators to identify or narrow down the range of devices. Among other things, this involves looking at the print image, which often differs between different printer models. Individual printers and scanners can also be identified via individual signs of wear and resulting small unique inaccuracies.

The printing colors themselves can also be analyzed. This starts using a range of non-destructive methods and, if necessary, is followed up with a small piece of paper that is cut out and analyzed. The results can be checked against databases. For example, the FBI maintains the International Ink Library, which is said to contain more than 15,000 data records. At EU level, there has been another database for classifying inkjet printers since 2010, which as of 2011 is said to cover more than 70 percent of all printer data and receives this information directly from the printer manufacturers. The existence of further databases for toner and other colors is likely.

Since we cannot influence these factors, we do not go into more detail here and refer to the further literature mentioned below.

Summary

If the color laser printer in our store (Canon imageRUNNER ADVANCE c5235i) is used to produce printouts in color (profile color), at least the already named Machine Identification Code is applied to the entire surface of the printed pages, thus enabling a connection to our device.

Since this is not generally known and the manufacturers do not disclose it, it is up to us to inform users of our device about revealing traces and to name possible countermeasures. The Independent State Center for Data Protection Schleswig-Holstein also sums up [freely translated]:

The use of the yellow dots found on the color copies of the multi-function printer is neither mentioned on the manufacturer’s website, nor in the system specification, nor in the operating instructions of the device. […] Thus, a color copy can no longer be used for (supposedly) confidential communication, because additional data is stored on the color copy on an additional layer (not visible to the naked eye) that does not comply with the transparency requirements.

Open hardware and free firmware (or appropriate reverse engineering) could solve this problem, which probably affects all modern printers and copiers. We wish much success to all who are working on this.

By the way, revealing traces on your printouts are not the only danger that can arise from your copier. For example, in April 2022, a listening device hidden in a photocopier was discovered in an anarchist library in Paris.

Acknowledgement and participation

We would like to take this opportunity to thank all those who developed the principles and procedures that we were able to use for this text.

If you have any further ideas, hints or suggestions for improvement, please feel free to write to us or collaborate with us on this project on GitHub.

If you want to support our work, you can tip us. You can also support our plan to set up a publicly accessible shredder for storage media that can be used free of charge. To order professional print products in a data-saving way, you can use our print service (European Union only).

Further literature

Press

Guides

References

Footnotes

1 https://vkldata.com/Open-Source-Projekte

2 https://github.com/snovvcrash/usbrip

3 https://forensafe.com/blogs/usbforensics.html

4 https://linuxhint.com/usb_forensics/

5 https://www.researchgate.net/publication/325976319_Reverse_Engineering_the_Machine_Identification_Code

6 https://bigbrotherawards.de/2004/technik-canon

7 https://murdoch.is/projects/currency/

8 https://murdoch.is/talks/ccc04_counterfeiting.pdf

9 https://people.duke.edu/~ng46/collections/steg-currency-detection.htm