Revealing Traces in printouts and scans
September 2022 (list of samples updated 2023-12-02)
In our store in Leipzig (Germany), we offer the use of a multi-function printer (printer, copier and scanner), as described in our concept. To ensure that this can be done in the most privacy-friendly way possible, we have addressed privacy issues and are taking a number of measures. Unlike many other copyshops, we have not installed any cameras, either in front of or inside our store.
On other factors we have little or no influence. Every print, copy and scan leaves behind traces or information that can adversely affect the user, for example when confidential or private documents unintentionally fall into the hands of third parties, or when the traces and information are suitable for identifying the person.
Data storage and processing
Unless you only want to make a copy of a document you have brought with you, you will most likely use a rewritable storage medium (USB stick, SD card, …) to save scanned documents on it or to print stored documents. In some self-service copy stores, storage media brought along must be connected to a computer in order to be able to print out saved documents.
These computers and devices may be infected with malware. It is possible, for example, for the malware to embed itself in the rewritable external storage medium and to spread during further use, causing a wide range of damage. It is also conceivable that the malware reads out and forwards stored files. Last but not least, traces of the files can remain on the device and fall into unauthorized hands later.
Golem.de and Kaspersky tested this in 2015 with 70 USB sticks at photo terminals and copyshops in several cities. In this sample, one USB stick was infected with the malware Sality.
To reduce the above risks, people can print and save scanned documents from our Tails terminal. (This is a computer without a hard disk that boots the operating system Tails from a DVD. It allows you to access the Internet anonymously and to edit documents without leaving any traces on the computer. A cabin protects you from prying eyes.) It is also possible to bring files to be printed on CD/DVD and to burn scanned documents to CD/DVD.
In this context, using write-once CDs and DVDs is safer than flash storage media such as USB sticks and SD cards, even if the latter offer some kind of write protection: "In the case of storage media with built-in hardware write protection, it is always uncertain to what extent a proprietary firmware guarantees the blocking of write commands and other commands (e.g. firmware update) and only allows the commands required to read the data."1 If you want to read more about write protection, or want to retrofit write protection yourself, we recommend the website vkldata.com (in German language only).
Hardware write blockers (e.g. for forensic purposes DE, EN) currently seem too expensive and impractical for this use. An open source DIY solution, as first published at Black Hat Europe 2012, unfortunately has a very low read speed.
External storage media such as USB sticks or cell phones are detected by the device’s operating system. The resulting log files (logs) can reveal a variety of information about external storage media used on the device when analyzed, e.g., after being seized or infected. They can include hardware serial numbers, product names and product IDs, manufacturer IDs, times of mounting and ejecting, UUID/GUID of file systems, and more, from the last weeks, months or years. 2 3 4 If more than one device or person is involved, connections between several devices and persons could also be made. For sensitive work, it is therefore recommended to use an operating system such as Tails, which “forgets” all log files when it is shut down and additionally overwrites the main memory (RAM). For particularly sensitive work, the external storage media should also be deleted and physically destroyed afterwards.
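To illustrate how much such a log can reveal, the following minimal Python sketch extracts vendor ID, product ID, product name and serial number from messages of the kind a Linux kernel writes when a USB stick is plugged in. The sample lines and all values in them are invented for illustration, and parse_usb_log is a hypothetical helper name. Other operating systems, and our multi-function printer, may log this information in an entirely different form.

```python
import re

# Invented sample in the style of Linux kernel messages; all values are made up.
SAMPLE_LOG = """\
[ 101.000001] usb 1-2: New USB device found, idVendor=1234, idProduct=abcd, bcdDevice= 1.00
[ 101.000002] usb 1-2: Product: Example Flash Disk
[ 101.000003] usb 1-2: Manufacturer: ExampleCorp
[ 101.000004] usb 1-2: SerialNumber: 0123456789ABCDEF
"""

LINE = re.compile(r"usb (?P<port>[\d.-]+): (?P<msg>.*)")
IDS = re.compile(r"idVendor=(?P<vendor>[0-9a-f]{4}), idProduct=(?P<product>[0-9a-f]{4})")
FIELDS = ("Product", "Manufacturer", "SerialNumber")


def parse_usb_log(text):
    """Collect identifying fields per USB port from kernel-style log lines."""
    devices = {}
    for line in text.splitlines():
        match = LINE.search(line)
        if not match:
            continue
        info = devices.setdefault(match["port"], {})
        ids = IDS.search(match["msg"])
        if ids:
            info["idVendor"] = ids["vendor"]
            info["idProduct"] = ids["product"]
        for field in FIELDS:
            prefix = field + ": "
            if match["msg"].startswith(prefix):
                info[field] = match["msg"][len(prefix):].strip()
    return devices


if __name__ == "__main__":
    for port, info in parse_usb_log(SAMPLE_LOG).items():
        print(port, info)
```

A few lines of text are enough to link a specific stick to a specific machine, which is why an operating system that discards its logs on shutdown is recommended for sensitive work.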
Unfortunately, we could not determine what information about external storage media is being logged by our multi-function printer.
Internal memory and storage
Multi-function printers use their internal main memory (RAM) and hard disk(s) in order to work as intended. The main memory is primarily used to process print jobs. The hard disk stores firmware and settings, as well as data for printing, copying and scanning.
Our multi-function printer is a Canon imageRUNNER ADVANCE c5235i. Canon writes about its data processing:
Your imageRUNNER ADVANCE machine separates data into management information and actual data before storing the data. Management information is automatically erased when the copy, send/receive, or print operation is completed. However, actual data is stored and remains in the hard disk. If HDD Data Erase function is set to ‘On’, actual data (image, management, and spool data for the copying, mail box, printing, and sending/receiving functions) is erased at the same time as management information.
Our multi-function printer has a 160 GB hard disk. Depending on the device, data stored on hard disks can be partially or completely recovered with or without special software if it was not destroyed properly. Especially if used devices and hard disks are resold, such failures repeatedly lead to reports such as:
- cbsnews: Digital Photocopiers Loaded With Secrets (2010)
- DHZ: Gefährlicher Datenspeicher: Sicherheitslücke am Kopierer (2015)
- heise/c’t: Wirklich alles gelöscht? (2016)
- DiePresse: Das unschlagbare Langzeitgedächtnis von Druckern (2019)
- DasErste: Datenleck: Ungelöschte Festplatten auf ebay Kleinanzeigen (2021)
Canon writes about it:
The management information is automatically erased when the job is complete, but the actual data is stored in the hard disk.
We have configured our multi-function printer so that during a job the processed and stored data is overwritten according to the DoD standard. (“The data is overwritten three times. The first time with a fixed value, the second time with a complement of the fixed value, and the third time with random data.”). This includes:
- Temporary image data created when scanning
- Remaining data after the files in the Mail Box/Advanced Space are deleted
- Remaining data after the files in the Fax/I-Fax Inbox (Confidential Fax Inbox/Memory RX Inbox) are deleted
- Sent and received fax/I-fax data
- Spooled data
- Data temporarily stored as print data
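The three passes quoted above (fixed value, complement of the fixed value, random data) can be sketched in Python for an ordinary file. This is only an illustration of the pattern sequence, not Canon's firmware routine: the function name dod_style_overwrite and the fixed value 0x35 are assumptions chosen for the example. On flash media or journaling file systems, an in-place overwrite like this may not reach every old copy of the data.

```python
import os


def dod_style_overwrite(path, fixed=0x35, chunk_size=1 << 20):
    """Overwrite a file in place three times: fixed byte, its complement, random."""
    size = os.path.getsize(path)
    complement = fixed ^ 0xFF  # bitwise complement within one byte
    passes = [
        lambda n: bytes([fixed]) * n,       # pass 1: fixed value
        lambda n: bytes([complement]) * n,  # pass 2: complement of the fixed value
        os.urandom,                         # pass 3: random data
    ]
    with open(path, "r+b") as handle:
        for make_block in passes:
            handle.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                handle.write(make_block(n))
                remaining -= n
            # Ask the operating system to push this pass to the storage device.
            handle.flush()
            os.fsync(handle.fileno())
    return size
```

The file keeps its length and name; only its content is replaced, so deleting the file afterwards is a separate step.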
In addition, we reset the device to factory settings approximately every 30 days and overwrite all data again with the device’s internal function according to the DoD standard. We wanted to test the effectiveness of this method with a forensic investigation but, like others before us, failed to read out the hard drive because it is protected by an ATA password. We hope to make up for this at a later date, and welcome advice.
It’s a good thing to overwrite storage devices that are no longer needed, such as hard disks, USB sticks and SD cards, with random data. Physically destroying them afterwards is an even better one. In order to provide other people and also ourselves with an easy-access way to decommission storage devices safely, we want to acquire a storage device shredder and make it available for free in our store and are currently collecting funds for this.
Main memory (RAM)
The multi-function printer in our store has 2.5 GB of main memory. As written in the previous section, we have configured the device to delete and overwrite information that is no longer needed, already during processing. Those who do not trust the setting can turn off the device after use to let the memory “forget”.
Misprints and forgotten documents
Misprints can be shredded directly into millimeter-small shreds in our store using the adjacent document shredder, a Dahle 706air with security levels P-7/F-3/T-6. If that is not sufficient, the remains can be taken away.
If people have forgotten printed, copied or scanned documents, we will look at them and retain presumably important originals for a limited time. Obvious copies and printouts will be shredded at the end of the day.
Revealing traces on printouts and in scanned documents
There are many ways to extract information from printouts that can lead to identifying the manufacturer, the device, and the person responsible for that printout or copy. In the following, we will discuss most of the possibilities that we are aware of.
Machine Identification Codes
Although this has been known since 2004, few people have it in mind when they print something. A Machine Identification Code (MIC), also called yellow dots, tracking dots, secret dots, color printer marking, color tracking dots, or printer steganography, refers to a process that adds tiny yellow dots to printouts that are invisible to the naked eye. These yellow dots form a pattern (of which there are several forms) 5, in which information such as the serial number of the device and a timestamp (print date and time) is encoded. This pattern is repeated over the entire page. The firmware of the devices is responsible for the implementation. In 2004, Canon Deutschland GmbH received the BigBrotherAward in the category “Technology” for this.6
The encoded information is sometimes used to identify the people responsible for printouts. Among other cases, this has led to the identification of an NSA whistleblower (DE, EN) and of employee(s) of the Berlin State Security who sent threatening letters containing internal police information to 42 people.
The Electronic Frontier Foundation (EFF), which has extensively studied printers and their tracking methods, made the following assessment in 2017:
Reminder: It appears likely that all recent commercial color laser printers print some kind of forensic tracking codes, not necessarily using yellow dots. This is true whether or not those codes are visible to the eye and whether or not the printer models are listed here. This also includes the printers that are listed here as not producing yellow dots.
The purpose of these procedures is obvious – to identify devices and people – but what is the justification for all this? Let’s take a look at the world of banknotes.
Counterfeit Deterrence Systems
The Central Bank Counterfeit Deterrence Group (CBCDG), a working group of now 32 central banks, wrote on its homepage as early as March 2004:
The Central Bank Counterfeit Deterrence Group (CBCDG) has now developed the Counterfeit Deterrence System, consisting of anti-counterfeiting technologies which prevent personal computers and digital imaging tools from capturing or reproducing the image of a protected banknote.
Several leading personal computer hardware and software manufacturers have voluntarily adopted the system in recognition of the harm that counterfeit currency can cause their customers and the general public. The technology does not have the capacity to track the use of a personal computer or digital imaging tool and consumers will not notice any difference in the performance or effectiveness of products equipped with this technology.
The Independent Center for Data Protection Schleswig-Holstein (ULD) asked the manufacturer Canon in 2019 (Caution: Yellow Dots! Hidden information in color copies) and summarizes the response [freely translated]:
[Canon] refers to the global cooperation to combat counterfeiting between law enforcement agencies and the printing industry, which was established at the instigation of Europol and Interpol, as well as the voluntary commitment of the printer industry to implement the counterfeit prevention system. The manufacturer could not provide more detailed information due to a confidentiality agreement.
In experiments, Markus Kuhn discovered the EURion constellation on euro banknotes and published it in 2002. The pattern is also found on other countries’ banknotes. It is designed to work in conjunction with supporting software and firmware from hardware and software vendors to prevent the scanning, editing, and printing of banknotes. Further research by Steven J. Murdoch and Ben Laurie shows that the EURion constellation is not the only feature by which banknotes are recognized.7 8 9
Both our own investigations and those of the Independent Center for Data Protection Schleswig-Holstein confirm this. If a part of the EURion constellation, which is located between the digits on new euro bills, is identified by the device, it either omits areas when printing or changes the overall image, for example by blackening or streaking. We tried this with the front of euro banknotes from 5 to 50 euros and found that when part of the EURion constellation is covered, the identification fails and no other areas are changed.
Digital watermarks from Digimarc
In addition to the EURion constellation, a 2005 article in Datenschleuder Nr. 86 mentions Digimarc’s digital watermark, which was brought up shortly before at 21C3 by Steven J. Murdoch and Ben Laurie in their presentation The Convergence of Anti-Counterfeiting and Computer Security. Patent WO1999053428A1 from Digimarc Corporation describes its principles.
If you want to know more about euro banknotes in general, you can find it at Wikipedia.
These already known processes are based on patents such as US Patent 5515451, which was applied for by Xerox on October 7, 1993 and granted in 1996, and US Patent 5845008, which was applied for by Omron Corporation on January 20, 1995 and granted in 1998. Since these applications were filed long before their implementation and subsequent “discoveries in the wild,” it may be worthwhile to research other patent applications filed by manufacturers to shed light on additional, possibly previously undiscovered, privacy-hostile methods.
- Brother Industries, Ltd.
- Brother Industries, Limited
- Brother International Corporation
- Brother Kogyo Kabushiki Kaisha
- Canon Kabushiki Kaisha
- Canon Production Printing Holding B.V.
- Fuji Pigment Co., Ltd.
- Fuji Xerox Co ltd.
- Hewlett-Packard Development Company, L.P.
- HP INDIGO B.V.
- Hewlett-Packard Indigo B.V.
- Hewlett-Packard Industrial Printing Ltd.
- Konica Minolta, Inc.
- Kyocera Document Solutions Inc.
- Lexmark International, Inc.
- Oki Electric Industry Co., Ltd.
- Ricoh Company ltd.
- Seiko Epson Corp.
- Seiko Epson Corporation
- Xerox Corporation
MICs: Own investigations
Back to the Machine Identification Code. There are various methods for checking whether a color laser printer leaves Machine Identification Codes on printouts. We have based our investigations on the procedure of the Independent Center for Data Protection Schleswig-Holstein (ULD) in the second version of their paper Vorsicht: Yellow Dots!.
One of the factors that could lead to incorrect analysis results is a low fill level of the yellow toner cartridge. It should be noted here that, on some printers, even printing in black and white is no longer possible if one color cartridge is empty. If the printout is scanned, a scan resolution that is too low can also affect the result.
For all the following investigations, unless otherwise stated, we have used printouts of our Canon imageRUNNER ADVANCE c5235i in DIN A4 with the color profiles “color” and “black”. For a meaningful comparison, we analyzed a sheet of paper in the same way before and after printing. Beforehand, we marked one side of the sheet to ensure that we were looking at the same side before and after printing.
Analysis with a microscope
We first took unprinted sheets of paper, marked a spot and analyzed it with a pocket microscope (magnification 60 to 120) before and after printing. This allowed us to confirm the results of the ULD in our case. On printouts printed with the profile “color”, we detected several tiny yellow dots. These were not present on printouts made with the color profile “black”.
Analysis with black light
In contrast to the result of the ULD, we could not make any yellow dots visible to the naked eye under black light. For better illustration, the following is a photo of the magnified image of the ULD analysis illuminated with black light.
Analysis on the computer
In order to analyze the printouts on the computer, it is recommended to work with scanned documents. It should be noted that this is best done in a high resolution (e.g. 1200x1200 dpi) and in a lossless format.
We scanned a previously unprinted, white DIN A4 sheet before and after printing with a Canon CanoScan LiDE 210 with 300, 600, 1200 and 2400 dpi and saved it as a PNG file. We used GNOME simple-scan for scanning.
Edit with GIMP
We performed these steps with GIMP (GNU Image Manipulation Program):
- select scanned file and open it in GIMP
- Windows > Dockable Dialogs > Channels: deselect the colors red and green
- Colors > Invert
- Colors > Saturation: increase (can be applied several times)
- If necessary, further adjustments, e.g. with the color curves
- Colors > Components > Mono Mixer: Highlight blue channel, e.g. with the values 0, 0, 1
A simplified workflow, which was not used here, can look as follows:
- Colors > Saturation: execute two times with the value 10
- Colors > Components > Mono Mixer: Highlight blue channel, e.g. with the values 0, 0, 1
- Colors > Invert
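The core of both workflows can be sketched in a few lines of Python. Yellow toner absorbs blue light, so a yellow dot has a low value in the blue channel while white paper has a high one. Keeping only the blue channel (Mono Mixer with 0, 0, 1) and inverting it therefore turns yellow dots into bright points on a dark background. The sketch below is a simplified illustration that works on an already loaded grid of (red, green, blue) tuples; the function names and the threshold of 128 are assumptions, and the saturation step is left out.

```python
def highlight_yellow(pixels):
    """Keep only the blue channel and invert it: yellow becomes bright, white dark."""
    return [[255 - blue for (_red, _green, blue) in row] for row in pixels]


def dot_coordinates(gray, threshold=128):
    """Return (x, y) positions whose inverted blue value exceeds the threshold."""
    return [
        (x, y)
        for y, row in enumerate(gray)
        for x, value in enumerate(row)
        if value > threshold
    ]


if __name__ == "__main__":
    white, yellow = (255, 255, 255), (255, 255, 0)
    page = [
        [white, white, white],
        [white, yellow, white],
    ]
    print(dot_coordinates(highlight_yellow(page)))
```

On a real scan, the pixel grid would first have to be read from the PNG file with an image library, and faint dots may need a lower threshold.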
The higher the resolution (300, 600, 1200, 2400 dpi), the better the pattern can be recognized and analyzed:
We scanned printouts made from a computer and a USB stick, as well as copies, at a resolution of 600x600 dpi, and processed them in GIMP to analyze them for Machine Identification Codes. In the “Single color” and “Two color” printing modes, we have analyzed only some of the available colors.
|Function||Color mode||Color||Result|
|Print from PC||Black and white||-||no MIC detected|
|Print from PC||Color||-||MIC detected|
|Print from USB||Auto (color/black)||-||MIC detected|
|Print from USB||Black||-||no MIC detected|
|Copy||Auto (color/black)||-||MIC detected|
|Copy||Single color||Yellow||MIC detected|
|Copy||Single color||Green||MIC detected|
|Copy||Single color||Red||MIC detected|
|Copy||Black||-||no MIC detected|
|Copy||Full color||-||MIC detected|
|Copy||Two color||Black & red||MIC detected|
Printouts and copies that are made without colors, but only with black, are thus not given a visible Machine Identification Code in our tests on our device.
All detected patterns are so-called skewed small patterns, described by Peter Buck in his paper “Reverse Engineering the Machine Identification Code” and also studied at Duke University.
We can confirm the results of that work: We find a pattern of 18 dots, which we estimate to be arranged in a 16x32 grid, probably representing the serial number of our device (JWF11162) and repeating across the page. The pattern does not change with time, date, or the content of the printed document. It has the shape of a parallelogram tilted by about 30 degrees. However, the orientation, start and end could be different from what is shown here. The following image shows four repetitions of the pattern, which we have colored differently.
MICs in the wild
To determine how common MICs are “in the wild,” we analyzed 200 documents from 200 different companies, associations, and government agencies that we had received independently of this research. The oldest document is from 2009, and most are from 2020-2023. We randomly selected the documents and analyzed mostly the first sheet of each using scanning and processing with GIMP, as described above, and with a microscope at 120x magnification.
Table of analyzed documents and detected MICs
|number||Printing in color / black and white||MIC||Pattern|
|001||bw||no MIC detected|
|003||bw||no MIC detected|
|004||bw||no MIC detected|
|005||bw||no MIC detected|
|006||bw||no MIC detected|
|007||bw||no MIC detected|
|009||bw||no MIC detected|
|010||bw||no MIC detected|
|011||bw||no MIC detected|
|012||bw||no MIC detected|
|013||bw||no MIC detected|
|014||bw||no MIC detected|
|015||bw||no MIC detected|
|016||bw||no MIC detected|
|017||bw||no MIC detected|
|018||bw||no MIC detected|
|019||bw||no MIC detected|
|020||color||uncertain (CMYK dots)|
|022||bw||no MIC detected|
|023||bw||no MIC detected|
|024||bw||no MIC detected|
|025||bw||no MIC detected|
|026||bw||no MIC detected|
|027||bw||uncertain (CMYK dots)|
|028||bw||no MIC detected|
|029||bw||no MIC detected|
|030||bw||no MIC detected|
|031||bw||no MIC detected|
|032||bw||no MIC detected|
|034||bw||no MIC detected|
|035||bw||uncertain (CMYK dots)|
|036||bw||no MIC detected|
|038||bw||no MIC detected|
|040||bw||no MIC detected|
|041||bw||no MIC detected|
|045||bw||no MIC detected|
|046||bw||no MIC detected|
|047||bw||no MIC detected|
|048||bw||no MIC detected|
|049||bw||no MIC detected|
|050||bw||no MIC detected|
|051||bw||no MIC detected|
|052||bw||no MIC detected|
|053||bw||no MIC detected|
|054||bw||no MIC detected|
|055||bw||no MIC detected|
|056||bw||no MIC detected|
|057||bw||no MIC detected|
|058||bw||no MIC detected|
|060||bw||no MIC detected|
|061||bw||no MIC detected|
|062||bw||no MIC detected|
|063||bw||no MIC detected|
|064||bw||no MIC detected|
|065||bw||no MIC detected|
|066||bw||no MIC detected|
|067||bw||no MIC detected|
|068||bw||no MIC detected|
|069||bw||no MIC detected|
|070||bw||no MIC detected|
|071||bw||no MIC detected|
|072||bw||no MIC detected|
|074||bw||no MIC detected|
|075||bw||no MIC detected|
|077||bw||no MIC detected|
|078||bw||uncertain (CMYK dots)|
|079||bw||no MIC detected|
|080||bw||no MIC detected|
|081||bw||no MIC detected|
|082||bw||no MIC detected|
|083||bw||no MIC detected|
|084||bw||no MIC detected|
|085||bw/color||no MIC detected|
|087||bw||no MIC detected|
|088||bw||no MIC detected|
|089||bw||no MIC detected|
|091||bw||no MIC detected|
|093||bw||no MIC detected|
|094||bw||no MIC detected|
|095||bw||no MIC detected|
|097||bw||no MIC detected|
|098||bw||no MIC detected|
|099||bw||no MIC detected|
|100||bw||no MIC detected|
|102||color||no MIC detected|
|103||bw||no MIC detected|
|104||bw||no MIC detected|
|105||bw||no MIC detected|
|106||bw||no MIC detected|
|107||color||no MIC detected|
|108||bw||no MIC detected|
|109||bw||no MIC detected|
|110||bw||no MIC detected|
|111||color||no MIC detected|
|112||bw||no MIC detected|
|113||bw||no MIC detected|
|114||bw||no MIC detected|
|115||bw||no MIC detected|
|116||bw||no MIC detected|
|117||bw||no MIC detected|
|118||bw||no MIC detected|
|119||bw||no MIC detected|
|120||bw||no MIC detected|
|121||bw||no MIC detected|
|122||bw||no MIC detected|
|123||bw||no MIC detected|
|124||bw||no MIC detected|
|125||bw||no MIC detected|
|126||bw||no MIC detected|
|127||bw||no MIC detected|
|128||color||no MIC detected|
|131||color||no MIC detected|
|132||bw||no MIC detected|
|133||bw||no MIC detected|
|134||bw||no MIC detected|
|135||bw||no MIC detected|
|136||bw||no MIC detected|
|137||color||no MIC detected|
|139||color||no MIC detected|
|140||bw||no MIC detected|
|141||bw||no MIC detected|
|142||bw||no MIC detected|
|143||color||no MIC detected|
|144||bw||no MIC detected|
|147||bw||no MIC detected|
|148||bw||no MIC detected|
|149||bw||no MIC detected|
|150||bw||no MIC detected|
|151||bw||no MIC detected|
|152||bw||no MIC detected|
|153||bw||no MIC detected|
|154||bw||no MIC detected|
|155||bw||no MIC detected|
|156||bw||no MIC detected|
|157||bw||no MIC detected|
|158||bw||no MIC detected|
|160||bw||no MIC detected|
|161||bw||no MIC detected|
|162||bw||no MIC detected|
|163||bw||no MIC detected|
|164||color||no MIC detected|
|165||bw||no MIC detected|
|166||bw||no MIC detected|
|167||bw||no MIC detected|
|168||color||no MIC detected|
|169||color||no MIC detected|
|170||color||no MIC detected|
|171||bw||no MIC detected|
|172||bw||no MIC detected|
|176||bw||no MIC detected|
|177||bw||no MIC detected|
|178||bw||no MIC detected|
|179||color||no MIC detected|
|180||bw||no MIC detected|
|181||color||no MIC detected|
|182||bw||no MIC detected|
|183||color||no MIC detected|
|188||bw||no MIC detected|
|190||bw||no MIC detected|
|191||bw||no MIC detected|
|193||bw||no MIC detected|
|194||bw||no MIC detected|
|195||bw||no MIC detected|
|196||color||no MIC detected|
|197||bw||no MIC detected|
|198||bw||no MIC detected|
|199||bw||no MIC detected|
|200||color||no MIC detected|
The printouts and copies were from laser and inkjet printers unknown to us and were in black and white and in color. The sample included white paper, non-bleached recycled paper, and pre-printed notepaper. In the latter case, we did not investigate further whether the Machine Identification Code was applied when the form was created or when the content was printed, so the result here is also only “MIC detected” or “no MIC detected”.
With four documents, we were able to detect yellow dots on the paper with the microscope, but since there were colored dots in CMYK colors on the entire page in each document, we could not clearly identify a pattern and rated it as “uncertain (CMYK dots)”.
In total, a Machine Identification Code was detected in 32 out of 200 documents. Of these 32 documents, 29 documents contained recognizable colored contents, while the remaining 3 seemed black and white.
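Expressed as shares of the sample, these counts can be worked out as follows. The figure of 164 documents without a detected MIC is not stated in the text; it is derived here as the remainder after subtracting the 32 detected and the four uncertain cases from 200.

```python
def percentage(part, whole):
    """Share of part in whole, in percent."""
    return 100 * part / whole


TOTAL = 200
DETECTED = 32
UNCERTAIN = 4
NOT_DETECTED = TOTAL - DETECTED - UNCERTAIN  # derived remainder: 164

if __name__ == "__main__":
    print("MIC detected:   ", percentage(DETECTED, TOTAL), "%")
    print("uncertain:      ", percentage(UNCERTAIN, TOTAL), "%")
    print("no MIC detected:", percentage(NOT_DETECTED, TOTAL), "%")
    # Of the 32 documents with a MIC, 29 had recognizable colored content.
    print("colored among detected:", percentage(29, DETECTED), "%")
```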
An interesting approach is taken by the Deda Toolkit, which was developed at the TU Dresden. It is intended to help detect tracking dots and offers the possibility of anonymization by removing detected patterns or adding new ones. Among others, Netzpolitik.org and Deutschlandfunk reported on the method.
We have tested the Deda Toolkit on a Debian-based system.
sudo apt update
sudo apt install python3-pip
pip3 install --user deda
deda_gui did not recognize the pattern that became visible with editing in GIMP during our tests. We tested several scans saved as PNG files at 300, 600 and 1200 dpi and got only the message “No tracking dot pattern detected. For best results try a 300 dpi scan and a lossless file format.”
python3 /home/user/.local/bin/deda_extract_yd filepath --debug
In our tests on the command line with the same input files, deda_extract_yd detected a tracking pattern (“Detected tracking dot pattern (-1, -1, 0.283334, 0.006667)”) for a file with 300 dpi resolution, but not for higher resolutions. For those, we instead received the error message “AttributeError: 'YellowDotsXposer' object has no attribute 'dots'”.
Therefore, to search for MICs, the manual method with GIMP currently seems more reliable to us. The anonymization method of deda also seems to work only for some patterns. In tests with our Canon device, there was no change to the original file despite the success message “Document anonymized and saved” in deda_gui and the creation of a file anon.png. The MIC was still recognizable. As Stephan Escher told us, not enough sample material from Canon devices has been available for analysis so far. In addition, patterns from Canon devices are more difficult to detect than those from other manufacturers.
Metadata in scanned files
Digital files such as documents and graphics contain metadata. These can unintentionally reveal information and thus allow conclusions to be drawn about the source. The Extensible Metadata Platform (XMP) is a standard developed by Adobe for embedding metadata in files. However, metadata can also be stored as a separate file in the same directory.
If no custom filename is entered when saving a scanned document, our Canon imageRUNNER ADVANCE c5235i generates the file name from the system time (UTC+0) at the start of the scan, using the XMP standard. Thus, the file name 20041224084919.pdf means that the file, according to the time set on the device, was created on 2004-12-24 at 08:49:19 UTC+0. For scans in JPEG format, the file names are additionally numbered with a suffix such as _001, a number which increases by one with each page. Other devices may use a different naming scheme. This may allow conclusions to be drawn about the manufacturer or model.
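How much such a file name gives away can be shown with a short Python sketch that turns the name back into a timestamp. The helper parse_scan_name is hypothetical, and the exact JPEG form 20041224084919_001.jpg is an assumption based on the description above (timestamp, then the page suffix, then the extension).

```python
import re
from datetime import datetime, timedelta, timezone

# 14 digits (YYYYMMDDhhmmss), optional "_NNN" page suffix, then an extension.
NAME = re.compile(r"^(?P<stamp>\d{14})(?:_(?P<page>\d+))?\.\w+$")


def parse_scan_name(filename):
    """Return (scan start as UTC datetime, page number or None)."""
    match = NAME.match(filename)
    if match is None:
        raise ValueError(f"not a timestamp-based scan name: {filename!r}")
    started = datetime.strptime(match["stamp"], "%Y%m%d%H%M%S")
    started = started.replace(tzinfo=timezone.utc)
    page = int(match["page"]) if match["page"] else None
    return started, page


if __name__ == "__main__":
    started, page = parse_scan_name("20041224084919.pdf")
    print(started.isoformat(), page)
    # The same instant in a UTC+1 time zone, as set on the device in our test.
    print(started.astimezone(timezone(timedelta(hours=1))).isoformat())
```

Entering a custom file name before saving avoids this particular trace, although the same time may still appear in the file's metadata.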
In the following metadata, you can see that in our test, the name was assigned with UTC+0 despite the UTC+1 time zone being set. We use ExifTool to read out the Exif data:
exiftool -g /home/user/20041224084919.pdf
---- ExifTool ----
ExifTool Version Number : 12.44
---- File ----
File Name : 20041224084919.pdf
Directory : /mnt/usb/...
File Size : 44 kB
File Modification Date/Time : 2004:12:24 09:49:26+01:00
File Access Date/Time : 2022:08:08 11:29:37+02:00
File Inode Change Date/Time : 2004:12:24 09:49:24+01:00
File Permissions : -rwxr-xr-x
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
---- PDF ----
PDF Version : 1.4
Linearized : No
Creator : Canon iR-ADV C5235 PDF
Create Date : 2004:12:24 08:49:24Z
Page Count : 1
---- XMP ----
XMP Toolkit : Adobe XMP Core
Creator Tool : Canon iR-ADV C5235 PDF
Producer : Adobe PSL 1.2e for Canon
Format : application/pdf
Document ID : uuid:14d8cb41-0000-8887-1780-13af00000000
With the option -g, ExifTool groups the information it reads out.
ExifTool Version Number: used version of ExifTool
File Name: file name
Directory: file path
File Size: file size
File Modification Date/Time: Time of the last modification of the file (in our example, the saving after the completion of the scan).
File Access Date/Time: Time of the last access to the file
File Inode Change Date/Time: Time of the last change to the file’s inode, i.e. its file system metadata
File Permissions: file permissions
File Type: file type
File Type Extension: file name extension
MIME Type: Specification of the media type and its subtype
PDF Version: used PDF version 1.4
Creator: Name of the creator (can also be a device name or software used)
Create Date: System time of the device (UTC+0, “Zulu”)
Page Count: Number of pages
XMP Toolkit: XMP Toolkit used
Creator Tool: Name of the creator (can also be a device name or software used)
Producer: Adobe PDF Scan Library version
Format: Specification of the media type and its subtype
Document ID: Universally Unique Identifier
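For checking several files, the grouped text output can be read into a nested dictionary and searched for fields that point to the device. The following Python sketch parses output in the layout shown above. The sample is an excerpt of our scan's metadata, while the helper name parse_grouped_output and the choice of fields regarded as revealing are assumptions made for this illustration.

```python
SAMPLE_OUTPUT = """\
---- PDF ----
PDF Version : 1.4
Creator : Canon iR-ADV C5235 PDF
Create Date : 2004:12:24 08:49:24Z
---- XMP ----
Creator Tool : Canon iR-ADV C5235 PDF
Producer : Adobe PSL 1.2e for Canon
Document ID : uuid:14d8cb41-0000-8887-1780-13af00000000
"""

# Fields treated as revealing in this sketch (an illustrative selection).
REVEALING = ("Creator", "Creator Tool", "Producer", "Create Date", "Document ID")


def parse_grouped_output(text):
    """Parse '---- Group ----' headers and 'Key : Value' lines into nested dicts."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("----") and line.endswith("----"):
            current = line.strip("- ")
            groups[current] = {}
        elif ":" in line and current is not None:
            # Split at the first colon only; values such as dates contain colons.
            key, _, value = line.partition(":")
            groups[current][key.strip()] = value.strip()
    return groups


def revealing_fields(groups):
    """Return (group, key, value) for every field listed in REVEALING."""
    return [
        (group, key, value)
        for group, fields in groups.items()
        for key, value in fields.items()
        if key in REVEALING
    ]


if __name__ == "__main__":
    for entry in revealing_fields(parse_grouped_output(SAMPLE_OUTPUT)):
        print(entry)
```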
These metadata can be individually manipulated or partially removed using programs such as Metadata Anonymisation Toolkit v2 (mat2). For example, mat2 /home/user/20041224084919.pdf creates the file 20041224084919.cleaned.pdf, which contains only the following metadata:
---- ExifTool ----
ExifTool Version Number : 12.44
Warning : Invalid xref table
---- File ----
File Name : 20041224084919.cleaned.pdf
Directory : /mnt/usb/...
File Size : 97 kB
File Modification Date/Time : 2022:08:05 10:19:10+02:00
File Access Date/Time : 2022:08:05 10:19:36+02:00
File Inode Change Date/Time : 2022:08:05 10:19:10+02:00
File Permissions : -rwxr-xr-x
File Type : PDF
File Type Extension : pdf
MIME Type : application/pdf
---- PDF ----
PDF Version : 1.5
Linearized : No
If you are using the Tails operating system, mat2 is already installed and you can right-click to select the “Remove metadata” option to strip a file in a supported file format of most of its metadata. If you are using a different Linux-based operating system and want to install mat2, you can find instructions here.
If a PDF file is converted to a Trusted PDF in Qubes OS, a lot of metadata is also removed, but not, for example, the Create Date, which is reset. If you want to manipulate the date and time information in the file system metadata and in the PDF metadata, you can change the system time for this and set it to the future or past. This may require being offline temporarily or preventing online time synchronization.
Pitfalls of the file systems
Some file systems store multiple timestamps, which are sometimes overlooked and cannot always be removed completely. For example, the ext4 file system stores a “Creation Date” for each file. Furthermore, the accuracy of the timestamp can also reveal information about the origin or transport method of a file. The Whonix project writes on the subject of File System Data Leakage in footnote 3:
USB flash drives are fairly unique in that they usually use FAT32, and FAT32 is unique in that its datetime fields have a resolution of 2 seconds. So it’s really easy to tell if files were on a USB flash drive (all the datetime values will be even numbers) unless the datetime metadata is scrubbed.
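The heuristic described in this quote can be sketched in Python: if every modification time in a set of files is a whole, even number of seconds, the files may have passed through a FAT32 volume. The function names are hypothetical, and the check is only an indication, since with few files even seconds also occur by chance (each timestamp with whole-second precision has a one in two chance of being even).

```python
import os


def all_even_seconds(timestamps):
    """True if there is at least one timestamp and all are whole, even seconds."""
    stamps = list(timestamps)
    if not stamps:
        return False
    return all(float(t).is_integer() and int(t) % 2 == 0 for t in stamps)


def modification_times(directory):
    """Modification times (seconds since the epoch) of regular files in a directory."""
    with os.scandir(directory) as entries:
        return [entry.stat().st_mtime for entry in entries if entry.is_file()]


if __name__ == "__main__":
    times = modification_times(".")
    print(len(times), "files, all even seconds:", all_even_seconds(times))
```

In the ExifTool output of our scan, the file modification time ends in second 26 and the inode change time in second 24, which fits this picture for a file saved to a USB stick.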
Inexplicable blue dots on the paper
During our analysis with the microscope, we noticed that small blue, sometimes violet-looking dots could be seen on the majority of the analyzed papers (DIN A4 and A3) and also on new unprinted sheets (recycled paper as well as bleached white) just taken out of the original packaging. Unfortunately, we have not managed to make these dots visible by means of scans and image processing. In these random sightings, the quantity and arrangement of the dots appear to us to be non-repetitive and sometimes very different, so we are unsure whether they are, for example, a code added to the paper by paper manufacturers, or random artifacts created during the manufacturing process.
For the sake of completeness, it should be mentioned at this point that paper sheets can also be analyzed forensically, but we will not go into this further here.
Colors, typography and other factors
Forensic analyses of printed products use further indicators to identify a device or to narrow down the range of possible devices. Among other things, this involves looking at the print image, which often differs between printer models. Individual printers and scanners can also be identified by their individual signs of wear and the small, unique inaccuracies that result from them.
The printing inks themselves can also be analyzed. The analysis starts with a range of non-destructive methods; if necessary, a small piece of the paper is then cut out and analyzed. The results can be checked against databases. For example, the U.S. Secret Service maintains the International Ink Library, which is said to contain more than 15,000 data records. At EU level, there has been another database for classifying inkjet printers since 2010, which as of 2011 is said to cover more than 70 percent of all printer data and receives this information directly from the printer manufacturers. The existence of further databases for toner and other colorants is likely.
Since we cannot influence these factors, we do not go into more detail here and refer to the further literature mentioned below.
If the color laser printer in our store (Canon imageRUNNER ADVANCE C5235i) is used to produce printouts in color (profile color), at least the Machine Identification Code described above is applied to the entire surface of the printed pages, which allows the printouts to be linked to our device.
Since this is not generally known and the manufacturers do not disclose it, it is up to us to inform users of our device about revealing traces and to name possible countermeasures. The Independent State Center for Data Protection Schleswig-Holstein also sums up [freely translated]:
The use of the yellow dots found on the color copies of the multi-function printer is neither mentioned on the manufacturer’s website, nor in the system specification, nor in the operating instructions of the device. […] Thus, a color copy can no longer be used for (supposedly) confidential communication, because additional data is stored on the color copy on an additional layer (not visible to the naked eye) that does not comply with the transparency requirements.
Open hardware and free firmware (or appropriate reverse engineering) could solve this problem, which probably affects all modern printers and copiers. We wish much success to all who are working on this.
By the way, revealing traces on your printouts are not the only danger that can arise from a copier. For example, in April 2022, a listening device hidden in a photocopier was discovered in an anarchist library in Paris.
Acknowledgement and participation
We would like to take this opportunity to thank all those who developed the principles and procedures that we were able to use for this text.
If you have any further ideas, hints or suggestions for improvement, please feel free to write to us or collaborate with us on this project on GitHub.
If you want to support our work, you can tip us. You can also support our plan to set up a publicly accessible shredder for storage media that can be used free of charge. To order professional print products in a data-saving way, you can use our print service (European Union only).
Further literature
- Chiang et al.: Printer and Scanner Forensics (2008)
- Jiang-Chun Li, Fang Fang, Xing-Zhou Han, Biao Li, Wei Han, Qian Zhou: Stability and Specificity of Counterfeit Protection System Code (2019)
- Joost van Beusekom, Faisal Shafait, Thomas M. Breuel: Automatic Authentication of Color Laser Print-Outs Using Machine Identification Codes
- Marco Schreyer, Christian Schulze, Armin Stahl, Wolfgang Effelsberg: Intelligent Printing Technique Recognition and Photocopy Detection for Forensic Document Examination
- Mikkilineni et al.: Printer Forensics using SVM Techniques
- M. Uma Devi, C. Raghavendra Rao, Arun Agarwal: A Survey of Image Processing Techniques for Identification of Printing Technology in Document Forensic Perspective (2010)
- Ryan Gibson: Steganography: Hiding Data In Plain Sight
- Timo Richter, Stephan Escher, Dagmar Schönfeld, Thorsten Strufe: Forensic Analysis and Anonymisation of Printed Documents (2018)
- Trevor M. Bobka: Analysis of a Photocopier Hard Drive for Forensically Relevant Artifacts
- TU Dresden: Yellow dots identify printers: Computer scientists of TU Dresden develop a tool for printer anonymisation (2018)
- heise.de: Anonymes Drucken und Kopieren nahezu unmöglich (2017)
- heise.de: Bürgerrechtler wollen Spionage per Farblaser-Ausdruck dokumentieren (2005)
- SRF: Der Spion im Farbdrucker (2019)
- anarsec.guide - Remove Identifying Metadata From Files
- Cipherbrain - Neues zu den gelben Punkten auf Laser-Ausdrucken
- kryptografie.de - Yellow Dots Code (Machine Identification Code für Farblaserdrucker)