Building a 200ms Provenance Gate: Inside CPP v1.5's Pre-Publish Verification Extension

There is a moment — a fraction of a second, really — between the instant a user taps "Share" on their phone and the instant the social media app receives the content. It is a tiny window. On Android, it is the time between an ACTION_SEND Intent being dispatched and the target Activity receiving it. On iOS, it is the time between an NSExtensionItem arriving in a Share Extension and the processed attachment being handed back. In both cases, the user expects the transition to feel instantaneous. Any perceptible delay and the experience feels broken.

The Capture Provenance Profile specification version 1.5 proposes to use that fraction of a second to do something extraordinarily ambitious: verify cryptographic provenance, overlay a visual indicator, embed an invisible watermark, encode a verification QR code, and forward the composed result to the target application — all within a hard budget of 200 milliseconds, with the absolute guarantee that if anything goes wrong, the user will never know.

This article is a technical deep dive into the Pre-Publish Verification Extension defined in CPP v1.5. It covers the architecture, the threat model, the platform-specific implementation constraints, the deliberate language design, and the philosophical commitment that makes the whole thing work. If you have ever wondered what it takes to build trust infrastructure that operates in the spaces between existing systems — without asking those systems for permission — this is for you.


What is CPP, and why does it exist?

Before we get into v1.5, we need to understand the problem that the Capture Provenance Profile was designed to solve.

The Coalition for Content Provenance and Authenticity (C2PA) is the industry-standard framework for embedding cryptographic provenance into media files. It is backed by Adobe, Microsoft, Intel, the BBC, and dozens of other organizations. It works, and it is important. But C2PA was designed primarily to answer the question "how was this content edited?" — tracking the chain of tools and transformations that a piece of media has undergone. This is valuable for editorial workflows, but it leaves a different question largely unaddressed: "was this content actually captured from the real world at the time it claims?"

CPP was created to answer that second question. Its core focus is on the moment of capture — proving, with cryptographic rigor, that a specific piece of media came from a specific sensor at a specific time, and that the chain of events leading from capture to publication is complete and unmodified.

The specification has evolved through six versions in rapid succession, each addressing a specific gap identified during implementation and review.

Version 1.0 established the foundational architecture: a three-layer model with event integrity (SHA-256 plus Ed25519 signatures), collection integrity (Merkle trees plus a Completeness Invariant), and external verifiability (mandatory RFC 3161 timestamps from independent Time Stamp Authorities). Version 1.1 refined the TSA integration.

Version 1.2 introduced the AnchorDigest field and mandatory messageImprint verification — closing a subtle but critical gap where an attacker could potentially submit arbitrary data to a TSA and claim the resulting timestamp applied to different events.

Version 1.3 fully specified the Merkle tree construction rules, defining exact algorithms for leaf hashing (LeafHash = SHA256(EventHash_bytes)), pairing (SHA256(Left || Right)), padding (duplicate last leaf), and proof direction (bottom to top).

Version 1.4 added the Depth Analysis Extension for screen detection — using LiDAR, structured light, or stereoscopic sensors to determine whether a photograph was taken of a real three-dimensional scene or of a screen displaying another image — and expanded the supported device classes from smartphones to include DSLRs, drones, body cameras, dashcams, medical imaging devices, and embedded systems.
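
To make the v1.3 rules concrete, here is a minimal Kotlin sketch of the root construction exactly as described (leaf hashing, pairing, duplicate-last padding); the function names are illustrative, not from the specification.

```kotlin
import java.security.MessageDigest

fun sha256(bytes: ByteArray): ByteArray =
    MessageDigest.getInstance("SHA-256").digest(bytes)

// Merkle root per the v1.3 construction rules described above.
fun merkleRoot(eventHashes: List<ByteArray>): ByteArray {
    require(eventHashes.isNotEmpty()) { "collection must contain at least one event" }
    // LeafHash = SHA256(EventHash_bytes)
    var level = eventHashes.map { sha256(it) }
    while (level.size > 1) {
        // Padding: duplicate the last leaf when the level has odd size.
        if (level.size % 2 == 1) level = level + level.last()
        // Pairing: parent = SHA256(Left || Right)
        level = level.chunked(2).map { (left, right) -> sha256(left + right) }
    }
    return level.single()
}
```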

Each of these versions addressed a question of integrity: can we prove what happened, when it happened, and that nothing has been tampered with or deleted?

Version 1.5 addresses a completely different question: what happens at the moment of sharing?


The Sharing Problem

Here is the uncomfortable reality of media provenance in 2026. A photographer captures an image with a CPP-compliant device. The image carries a complete cryptographic provenance record: capture event signed with Ed25519, timestamped by an RFC 3161 TSA, anchored in a Merkle tree with a verified Completeness Invariant. Depth analysis confirms it was taken of a real three-dimensional scene. Biometric binding confirms the device holder was authenticated via Face ID at the moment of capture. The provenance chain is mathematically complete and independently verifiable.

Then the photographer opens their gallery app, taps "Share," selects Instagram, and posts it.

Instagram re-encodes the image. Every byte of embedded metadata — the CPP manifest, the C2PA JUMBF box, the EXIF data, all of it — is stripped. The image that appears on the platform is, from a provenance perspective, indistinguishable from a deepfake. All that cryptographic work, all those independent third-party timestamps, all that mathematical certainty — gone.

The VeritasSocial platform (which we have written about elsewhere) addresses the preservation side of this problem by storing original files byte-for-byte and issuing permanent verification links. But VeritasSocial requires an explicit upload step before sharing. CPP v1.5 addresses the sharing side — the moment of transition between the creator's device and the social media platform — by intercepting the share action itself.


The Core Design: Five Principles That Constrain Everything

The Pre-Publish Verification Extension is governed by five principles that function as hard constraints on every design decision. Understanding these principles is essential to understanding why the implementation looks the way it does.

Provenance ≠ Truth. The system proves that traceable origin data exists. It does not prove that the content is accurate, truthful, or authentic. This is not a hedge or a disclaimer — it is a fundamental architectural commitment. A photograph of a staged scene, captured by a CPP-compliant device, will have valid provenance. The system will mark it accordingly. The mark means "origin data exists," not "this is real."

Non-Blocking. The verification process must never block or delay the sharing flow. If the user taps "Share" and selects Instagram, the image must reach Instagram at the same speed it would have without CPP. Any perceptible delay violates this principle.

Silent Failure. If verification fails — for any reason, including timeout, parsing errors, invalid signatures, corrupted manifests, or unexpected file formats — the content passes through to the target application completely unmarked and completely unmodified. The user sees no error message, no warning, no notification, no indication that anything was attempted. Internal metrics may be logged for debugging, but they must never surface to the user.

Privacy by Design. All verification is performed entirely on-device. No content data, no metadata, and no verification telemetry is transmitted to any external server during the pre-publish verification process. The user's content never leaves their device until they explicitly share it with the target application.

Platform Agnostic. The system works without any cooperation from social media platforms. It does not require platforms to support C2PA, preserve metadata, or implement any provenance-related APIs. It operates entirely within the share-extension and intent-handling mechanisms that already exist in mobile operating systems.

These five principles interact in ways that produce severe engineering constraints. The non-blocking principle, combined with the silent failure principle, means that the entire verification pipeline — manifest detection, signature validation, indicator composition, and watermark embedding — must complete within 200 milliseconds. Not 200 milliseconds average. Not 200 milliseconds with a "please wait" spinner for edge cases. 200 milliseconds hard limit, after which the system silently passes the original content through as if CPP did not exist.
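
In Kotlin coroutine terms, the combined constraint reduces to a small wrapper. This is a minimal sketch, assuming a hypothetical verifyAndCompose helper that stands in for the whole pipeline; it is not code from the specification:

```kotlin
import android.net.Uri
import kotlinx.coroutines.CancellationException
import kotlinx.coroutines.withTimeoutOrNull

// Hypothetical stand-in for the full pipeline (detection, verification,
// indicator composition); the real work is elided in this sketch.
suspend fun verifyAndCompose(original: Uri): Uri = TODO("pipeline elided")

suspend fun preparePayload(original: Uri): Uri =
    withTimeoutOrNull(200L) {
        try {
            verifyAndCompose(original)
        } catch (e: CancellationException) {
            throw e // let the 200ms timeout cancel the block
        } catch (e: Exception) {
            null    // parse error, bad signature, odd format: silent failure
        }
    } ?: original   // timeout or failure: forward the untouched original
```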


The 200ms Budget: What Happens Inside

The verification pipeline within the 200ms budget is divided into mandatory and optional checks, each with its own time allocation.

CPP manifest existence detection is mandatory and must complete within 50ms. This involves scanning the file's metadata for CPP JSON structures. C2PA JUMBF scan is also mandatory within 50ms — the system looks for the JUMBF box signature (0x6A756D62, the ASCII bytes for "jumb") and the C2PA label within it. These two scans together must determine whether the file contains any provenance data at all. If neither is found, the result is PROVENANCE_UNAVAILABLE, and the content passes through silently.
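
As a sketch of the mandatory detection step, here is a naive linear scan for the JUMBF signature. A production scanner would walk the container's segment and box structure rather than scan every byte, and would also confirm the C2PA label inside the box:

```kotlin
// The four ASCII bytes of "jumb" (0x6A 0x75 0x6D 0x62).
private val JUMB = byteArrayOf(0x6A, 0x75, 0x6D, 0x62)

fun containsJumbfBox(fileBytes: ByteArray): Boolean {
    outer@ for (i in 0..fileBytes.size - JUMB.size) {
        for (j in JUMB.indices) {
            if (fileBytes[i + j] != JUMB[j]) continue@outer
        }
        return true // signature found; a real scanner would now check the c2pa label
    }
    return false
}
```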

If a manifest is detected, optional checks may proceed: signature validity verification (up to 200ms allocated, but subject to the total budget) and certificate chain validation against a bundled trust store (up to 100ms). These individual allocations are maximums, not guarantees — implementations must short-circuit once the total 200ms budget is exceeded, returning VERIFICATION_TIMEOUT and proceeding with silent passthrough.

The specification explicitly encourages parallelization. On modern mobile hardware with multiple CPU cores, the manifest scan, signature verification, and certificate chain validation can run concurrently. An implementation that parallelizes effectively can complete all checks within the budget for most files. An implementation that runs checks sequentially will likely time out on files with complex manifests or long certificate chains — and that is fine, because the timeout behavior is by design.
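
Here is one way the parallelized pipeline could look, again as a hedged sketch: the scan and validation helpers are hypothetical placeholders, and the per-check allocations from the budget table are noted in comments rather than enforced.

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.withTimeoutOrNull

class Manifest // placeholder for a parsed CPP or C2PA manifest

// Hypothetical helpers; each would enforce its own allocation internally
// (50ms for the two mandatory scans, per the budget described above).
suspend fun scanCppManifest(bytes: ByteArray): Manifest? = null
suspend fun scanJumbfManifest(bytes: ByteArray): Manifest? = null
suspend fun verifySignature(manifest: Manifest): Boolean = false
suspend fun validateCertChain(manifest: Manifest): Boolean = false

sealed interface Outcome
object ProvenanceUnavailable : Outcome
object VerificationTimeout : Outcome
data class Checked(val signatureValid: Boolean, val chainValid: Boolean) : Outcome

suspend fun runChecks(bytes: ByteArray): Outcome =
    withTimeoutOrNull(200L) {
        coroutineScope {
            // Mandatory scans run concurrently.
            val cpp = async { scanCppManifest(bytes) }
            val jumbf = async { scanJumbfManifest(bytes) }
            val manifest = cpp.await() ?: jumbf.await()
                ?: return@coroutineScope ProvenanceUnavailable
            // Optional checks also run concurrently, still inside the budget.
            val sig = async { verifySignature(manifest) }
            val chain = async { validateCertChain(manifest) }
            Checked(sig.await(), chain.await())
        }
    } ?: VerificationTimeout // budget exceeded: caller silently passes through
```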


The Confidence Level Model: Preventing Over-Claims

One of the most carefully designed aspects of v1.5 is the three-level confidence model for verification results. This model exists to prevent a specific and dangerous failure mode: displaying signer information from an unverified manifest.

Consider what happens if a malicious actor crafts a file with a fake CPP manifest containing the signer name "Associated Press" and a plausible-looking certificate chain. If the system detected the manifest, parsed the signer name, and displayed it in the provenance indicator — all without actually verifying the cryptographic signature — the user would see "Associated Press" as the signer of a completely fabricated image. This would be worse than showing nothing at all, because it would actively mislead the viewer.

The confidence level model prevents this by restricting which fields are available at each verification stage.

At the DETECTED level — meaning the system found a manifest structure but has not parsed it — only the source type (CPP or C2PA) is available. At the PARSED level — meaning the system has extracted fields from the manifest but has not verified signatures — the capture timestamp and some indicator flags become available, but signer information remains hidden. Only at the VERIFIED level — meaning the cryptographic signature has been validated against the trust store — do the signer name, organization, and certificate issuer become available.

This is not a progressive disclosure UX pattern. It is a security boundary. The specification explicitly states the rationale: "Displaying Signer information without cryptographic verification creates a spoofing vector." By making signer information structurally unavailable below the VERIFIED confidence level — not hidden behind a toggle, not grayed out, but absent from the data model — the specification eliminates the possibility of accidental or lazy implementations that display unverified signer claims.
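
One natural way to make signer information structurally unavailable is a sealed result hierarchy in which the fields simply do not exist below VERIFIED. A sketch with illustrative field names:

```kotlin
enum class SourceType { CPP, C2PA }

sealed interface ProvenanceResult {
    val source: SourceType

    // DETECTED: a manifest structure was found, nothing parsed yet.
    data class Detected(override val source: SourceType) : ProvenanceResult

    // PARSED: fields extracted, signature NOT verified. Note there is no
    // signer field here at all, so it cannot be displayed by accident.
    data class Parsed(
        override val source: SourceType,
        val captureTimestamp: Long?,   // epoch millis
        val indicatorFlags: Set<String>,
    ) : ProvenanceResult

    // VERIFIED: signature validated against the trust store; only now do
    // signer name, organization, and certificate issuer exist at all.
    data class Verified(
        override val source: SourceType,
        val captureTimestamp: Long?,
        val signerName: String,
        val organization: String?,
        val certIssuer: String,
    ) : ProvenanceResult
}
```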


The Indicator System: Marks That Survive Platform Destruction

When verification succeeds at the VERIFIED confidence level, the system composes provenance indicators onto the image before forwarding it to the target application. This is where the specification gets creative about surviving the platform's metadata-stripping pipeline.

Three indicator types are defined, each targeting a different survival mechanism.

The VisualMark is a small semi-transparent icon overlaid in the corner of the image. It is 48×48 density-independent pixels, positioned by default in the bottom-right corner with 8dp margins, rendered at 85% opacity over a 60%-opacity black background with 8dp corner radius. The icon style is constrained to neutral informational symbols: an info circle, a link/chain icon, a document icon, or a tag/label icon. Checkmarks, green checks, shields, and stars are explicitly prohibited because they imply verification, approval, security guarantees, or endorsement — all of which contradict the "Provenance ≠ Truth" principle.

The critical insight about the VisualMark is that once it is composited into the image, it becomes pixels. It is not an overlay, not a DOM element, not an annotation layer. It is part of the image data. When the social media platform re-encodes the image, the VisualMark survives as part of the pixel content. It cannot be tapped or interacted with on the social media platform — the specification is explicit about this: "VisualMark is a purely visual, non-interactive element once composited into the image." But it is visible. A viewer who sees the mark knows to look for more information.
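
On Android, the composition step is ordinary Canvas drawing onto a mutable copy of the bitmap. A sketch following the geometry described above (48dp icon, 8dp margin and corner radius, 85% and 60% opacities); the function name is illustrative:

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.RectF

fun compositeVisualMark(source: Bitmap, icon: Bitmap, density: Float): Bitmap {
    val out = source.copy(Bitmap.Config.ARGB_8888, /* mutable = */ true)
    val canvas = Canvas(out)

    val size = 48f * density    // 48dp icon
    val margin = 8f * density   // 8dp margin, bottom-right by default
    val radius = 8f * density   // 8dp corner radius
    val box = RectF(
        out.width - margin - size, out.height - margin - size,
        out.width - margin, out.height - margin,
    )

    // 60%-opacity black rounded background.
    val bg = Paint(Paint.ANTI_ALIAS_FLAG).apply {
        color = Color.BLACK
        alpha = (0.60f * 255).toInt()
    }
    canvas.drawRoundRect(box, radius, radius, bg)

    // Icon at 85% opacity. Once drawn, the mark is just pixels: it survives
    // platform re-encoding but is not interactive.
    val fg = Paint(Paint.ANTI_ALIAS_FLAG).apply { alpha = (0.85f * 255).toInt() }
    canvas.drawBitmap(icon, null, box, fg)
    return out
}
```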

The DynamicQR is a small QR code overlaid in a different corner (typically bottom-left). It encodes a verification URL: https://verify.veritaschain.org/v/{proof_id}. The QR code carries not just the proof identifier but also the asset hash and a Unix timestamp, creating a self-contained pointer to the full verification record. Like the VisualMark, the QR code becomes pixels in the image and survives re-encoding. Unlike the VisualMark, it is machine-readable — a viewer can scan it with any QR reader to reach the full verification page.
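
Generating the QR itself is straightforward with any barcode library. A sketch using ZXing's QRCodeWriter; note that packing the asset hash and timestamp as query parameters is an assumption of this sketch, since the exact encoding is not quoted above:

```kotlin
import com.google.zxing.BarcodeFormat
import com.google.zxing.common.BitMatrix
import com.google.zxing.qrcode.QRCodeWriter

fun buildVerificationQr(
    proofId: String,
    assetHashHex: String,
    unixTimestamp: Long,
    sizePx: Int = 256,
): BitMatrix {
    // The path format comes from the spec text above; carrying the asset
    // hash and timestamp as query parameters is an assumption of this sketch.
    val url = "https://verify.veritaschain.org/v/$proofId" +
        "?h=$assetHashHex&t=$unixTimestamp"
    return QRCodeWriter().encode(url, BarcodeFormat.QR_CODE, sizePx, sizePx)
}
```

Rendering the resulting BitMatrix to a bitmap and compositing it bottom-left then follows the same Canvas pattern as the VisualMark.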

The InvisibleWatermark is a DCT-domain watermark that embeds up to 128 bytes of payload — including the proof ID, timestamp, and a signature fragment — directly into the frequency domain of the image. The watermark is designed to survive JPEG compression down to quality 50, resizing down to 50% of original dimensions, and cropping of up to 10% of the image area. This is the forensic-grade survival mechanism: even if the VisualMark is cropped out and the QR code is cut off, the invisible watermark can still be detected and used to locate the verification record.
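
The specification's watermark algorithm is not reproduced here, so the following sketch only illustrates the general technique that DCT-domain watermarks of this kind commonly use: quantization-index modulation of a mid-frequency coefficient. The coefficient index and step size are arbitrary illustration values, not CPP's:

```kotlin
import kotlin.math.floor

// Illustrative quantization-index modulation (QIM) on one DCT coefficient.
// Assumes the image is already transformed into 8x8 DCT blocks (the
// transform itself is omitted).
const val STEP = 12.0       // larger step = more robust, more visible
const val COEFF_INDEX = 19  // a mid-frequency position in zig-zag order

fun embedBit(block: DoubleArray, bit: Int): DoubleArray {
    val out = block.copyOf()
    val q = floor(out[COEFF_INDEX] / STEP).toInt()
    // Force the parity of the quantization index to match the payload bit.
    val adjusted = if ((q and 1) == bit) q else q + 1
    out[COEFF_INDEX] = (adjusted + 0.5) * STEP // centre of the chosen bin
    return out
}

fun extractBit(block: DoubleArray): Int =
    floor(block[COEFF_INDEX] / STEP).toInt() and 1
```

A real embedder would spread the up-to-128-byte payload across many blocks with error correction, so that quality-50 JPEG compression, 50% resizing, and 10% cropping still leave enough intact blocks to recover it.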

The specification also defines compatibility with C2PA's Soft Binding mechanism, allowing implementations to use C2PA's standardized invisible watermarking approach as an alternative to or in addition to the CPP-specific DCT watermark.


Platform Implementation: The Android and iOS Divide

The implementation architecture differs significantly between Android and iOS, driven by fundamental differences in how the two operating systems handle inter-application content sharing.

On Android, the architecture is an Intent Proxy Model. When the user taps "Share" in any application that supports ACTION_SEND, the operating system presents a chooser that includes VeraSnap as a share target (registered via an <intent-filter> in the AndroidManifest). When the user selects VeraSnap, the ProvenanceShareActivity receives the Intent, extracts the media URI, runs the verification pipeline within a withTimeoutOrNull(200) coroutine, composes indicators if appropriate, saves the processed file to a FileProvider cache, and then creates a new explicit Intent targeting the user's chosen social media app.

This forward-to-target pattern is relatively clean on Android because Activities can launch other Activities with explicit package names. The Kotlin implementation uses setPackage() on the forward Intent to target the specific app the user originally intended. The specification requires a fallback to Intent.createChooser() if resolveActivity() returns null (meaning the target app is not installed) or if an ActivityNotFoundException is thrown — this is a REQUIRED behavior, not optional.
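
Condensed into one sketch, the proxy Activity could look like the following. Class and extra names follow the description above; how VeraSnap learns the user's intended target package is elided, and verifyAndCompose remains a hypothetical stand-in:

```kotlin
import android.content.ActivityNotFoundException
import android.content.Intent
import android.net.Uri
import android.os.Bundle
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.withTimeoutOrNull

class ProvenanceShareActivity : AppCompatActivity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        @Suppress("DEPRECATION")
        val uri: Uri? = intent.getParcelableExtra(Intent.EXTRA_STREAM)
        if (uri == null) { finish(); return }
        lifecycleScope.launch {
            // Hard 200ms budget; timeout or any failure forwards the original.
            val payload = withTimeoutOrNull(200L) {
                runCatching { verifyAndCompose(uri) }.getOrNull()
            } ?: uri
            forwardTo(payload, pickTarget())
            finish()
        }
    }

    private fun forwardTo(uri: Uri, targetPackage: String) {
        val forward = Intent(Intent.ACTION_SEND).apply {
            type = "image/jpeg"
            putExtra(Intent.EXTRA_STREAM, uri)
            addFlags(Intent.FLAG_GRANT_READ_URI_PERMISSION)
            setPackage(targetPackage)
        }
        // REQUIRED fallback: chooser when the target app is missing or launch fails.
        if (forward.resolveActivity(packageManager) == null) {
            startActivity(Intent.createChooser(forward.apply { setPackage(null) }, null))
            return
        }
        try {
            startActivity(forward)
        } catch (e: ActivityNotFoundException) {
            startActivity(Intent.createChooser(forward.apply { setPackage(null) }, null))
        }
    }

    // Hypothetical stand-ins: the verification pipeline and the mechanism by
    // which the user's intended target app is determined are both elided.
    private suspend fun verifyAndCompose(uri: Uri): Uri = TODO("pipeline elided")
    private fun pickTarget(): String = TODO("target selection elided")
}
```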

On iOS, the situation is more constrained. iOS Share Extensions cannot directly launch other applications. They cannot access UIApplication.shared. They operate in a sandboxed environment with a strict 120MB memory limit. The specification acknowledges these constraints and defines a fundamentally different architecture: the Share Extension with Re-Share pattern.

In this model, when the user selects VeraSnap from the share sheet, the Share Extension receives the NSExtensionItem, processes the image through the verification pipeline (using DispatchSemaphore with a 200ms timeout for hard budget enforcement), and then — crucially — presents a new UIActivityViewController with the processed image. This means the user goes through the share sheet twice: once to select VeraSnap, and once to select the actual target application. It is a two-step UX flow, and the specification acknowledges this is not ideal, but it is the only reliable method that works across all iOS applications without depending on URL schemes.

The specification defines direct URL schemes as an informative (non-normative) optimization that implementations MAY attempt for specific apps like Instagram or Twitter, but it requires that any URL scheme attempt must fall back to the re-share pattern on failure. The language is unambiguous: "implementations MUST fall back to the re-share pattern when URL schemes fail or are unavailable." The specification even includes a warning that most Share Extensions cannot reliably use URL schemes and that the re-share pattern is "the ONLY reliable method."

The iOS memory constraint is addressed with specific optimization requirements: CGImageSource-based loading with kCGImageSourceShouldCache: false to avoid double-buffering, kCGImageSourceThumbnailMaxPixelSize: 2048 to downsample large images, memory-mapped file access via Data(contentsOf:options:.mappedIfSafe), and tile-based processing for very large images. These are not suggestions — the 120MB limit is a hard constraint that will crash the extension if exceeded.


The Language Problem: Why Words Matter More Than Code

The most unusual section of the v1.5 specification is the legal compliance section, which defines a list of prohibited terms and five recommended alternatives. This is a technical specification that includes a vocabulary constraint, and understanding why requires understanding the threat model for provenance indicators.

The prohibited terms are: "Verified," "Authentic," "Official," "Guaranteed," "Safe," "Trusted," "Checked," "Reviewed," "Real," and "True." Each is prohibited for a specific legal reason. "Verified" implies platform endorsement and risks TOS violations. "Authentic" implies truth verification and creates false advertising liability. "Guaranteed" implies a warranty and violates consumer protection law. "Safe" or "Trusted" implies a security assessment and creates liability for harmful content. "Real" or "True" implies truth verification and creates defamation risk.

The recommended alternatives are: "Provenance Available" (neutral, factual), "Content Credentials" (the C2PA standard term), "Source Information" (descriptive), "Origin Data" (technical), and "Traceable" (factual capability).

This vocabulary constraint is not academic. It reflects a deep understanding of how provenance indicators will be perceived by users and interpreted by regulators. The EU AI Act requires AI-generated content marking. Japan's 景品表示法 (Act against Unjustifiable Premiums and Misleading Representations) prohibits false endorsement. The FTC Guidelines prohibit deceptive practices. California's AB 853 AI Transparency Act adds another layer of disclosure requirements. A provenance indicator labeled "Verified" that appears on a deepfake image — and deepfakes will have valid provenance if created by CPP-compliant tools — would create regulatory exposure in multiple jurisdictions simultaneously.

The specification defines three levels of disclaimer text, aligned with the three-level information disclosure model (L1/L2/L3). The L1 tooltip is minimal: "Source information provided." The L2 panel is standard: "This mark indicates that source information is available for this content. It does not guarantee the accuracy or truthfulness of the content." The L3 detail page is comprehensive and includes explicit statements that the mark does not represent platform endorsement, is not related to advertising disclosure requirements, is not related to AI-generated content disclosure requirements, and that all verification is performed on-device with no external data transmission.

All three levels are specified in both English and Japanese, with requirements for localization into at least ten languages at Priority 1 through 3: English and Japanese (Priority 1), Chinese Simplified, Chinese Traditional, Korean, Spanish, and French (Priority 2), and German, Portuguese, and Arabic (Priority 3).


Security: The STRIDE Model and Fake Mark Countermeasures

The specification includes a full STRIDE threat analysis covering spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege. The most interesting threats are the ones that target the provenance indicator itself rather than the underlying cryptography.

Fake Mark Display is rated as high likelihood and medium severity. An attacker could create an image with a VisualMark-like icon embedded in the pixel data, making it appear to have provenance when it does not. The specification addresses this through countermeasures that apply only within interactive contexts — the VeraSnap app preview and the verification web page. In these contexts, the real provenance indicator uses dynamic behaviors that static fake icons cannot replicate: slight overlap with content edges (positioning that a static image cannot match exactly), tap responses that show the L2 summary panel, long-press responses that show the L3 detail page, animated borders, and pulse-on-first-view animations.

Importantly, the specification explicitly acknowledges that these countermeasures do not apply to composited images shared on social media platforms, where the VisualMark is purely visual pixels and cannot be distinguished from a fake. The QR code serves as the bridge to a verification context where the countermeasures do apply — a viewer who scans the QR code reaches a web page where the full interactive verification experience is available.

User Misinterpretation is rated as medium likelihood but high severity. This is the threat that the prohibited terms list and disclaimer templates are designed to address. A user who sees a provenance indicator and interprets it as a truth claim will make worse decisions than a user who sees no indicator at all, because they will assign unwarranted confidence to the content. The specification treats this as the most dangerous failure mode in the entire system — worse than a failed verification, worse than a spoofed indicator, worse than a compromised signing key. A compromised key can be revoked. A misled user's trust is much harder to repair.


C2PA Interoperability: Complement, Not Compete

The v1.5 specification defines explicit interoperability with C2PA, reflecting CPP's positioning as a complementary standard rather than a competing one.

At the detection level, implementations must support both CPP manifests (stored in EXIF UserComment or XMP metadata) and C2PA manifests (stored in JUMBF boxes). The manifest detection function returns a discriminated union of four states: CPP only, C2PA only, both, or none. When both are present, the implementation uses whichever provides the higher confidence level.
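
A direct encoding of that four-state result and the higher-confidence preference rule, with illustrative types:

```kotlin
enum class Confidence { DETECTED, PARSED, VERIFIED }

// Placeholder: a parsed manifest tagged with the confidence level it reached.
data class ParsedManifest(val confidence: Confidence)

sealed interface DetectionResult {
    data class CppOnly(val cpp: ParsedManifest) : DetectionResult
    data class C2paOnly(val c2pa: ParsedManifest) : DetectionResult
    data class Both(val cpp: ParsedManifest, val c2pa: ParsedManifest) : DetectionResult
    object None : DetectionResult
}

// When both standards are present, use whichever reached the higher
// confidence level.
fun preferred(result: DetectionResult): ParsedManifest? = when (result) {
    is DetectionResult.CppOnly -> result.cpp
    is DetectionResult.C2paOnly -> result.c2pa
    is DetectionResult.Both ->
        maxOf(result.cpp, result.c2pa, compareBy<ParsedManifest> { it.confidence })
    DetectionResult.None -> null
}
```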

A field mapping defines the correspondence between CPP and C2PA data models. CPP's DeviceInfo.Manufacturer and DeviceInfo.Model map to C2PA's claim_generator. CPP's CaptureTimestamp maps to C2PA's dc:created. CPP's GPS sensor data maps to standard EXIF GPS fields. CPP's ProofBundle.TSA maps to C2PA's c2pa.time_stamp. Two CPP-specific extensions — Depth Analysis and Biometric Binding — have no C2PA equivalents and represent CPP's unique contribution to the provenance ecosystem.

The specification also defines a dual-standard output mode where implementations may generate both CPP and C2PA manifests simultaneously, with shared fields synchronized between the two and standard-specific fields preserved in their respective locations. This allows a single capture application to produce content that is verifiable by both CPP and C2PA tools.


The Evolution: How Six Versions Built a Complete System

Stepping back from the v1.5 details, the CPP specification's evolution over six versions in twelve days tells a story about how trust infrastructure gets built in practice.

Version 1.0 asked: "Can we prove when something was captured?" It introduced the event model, the Completeness Invariant (an XOR hash sum that detects any missing events in a collection), mandatory RFC 3161 timestamping, and the three-layer architecture. It defined the foundational principle that CPP proves capture timing and device identity but does NOT prove content truth.
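
A minimal sketch of that invariant as described (the spec's exact event encoding and field ordering are not shown here): XOR-accumulate the event hashes into one fixed-width value, which no longer matches if any event's hash is absent from the collection.

```kotlin
// XOR-accumulate SHA-256 event hashes into a 32-byte invariant.
fun completenessInvariant(eventHashes: List<ByteArray>): ByteArray {
    val acc = ByteArray(32)
    for (hash in eventHashes) {
        require(hash.size == 32) { "expected SHA-256 digests" }
        for (i in acc.indices) {
            acc[i] = (acc[i].toInt() xor hash[i].toInt()).toByte()
        }
    }
    return acc
}
```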

Version 1.1 asked: "Is our TSA integration actually airtight?" It refined the anchoring specification but left some ambiguities.

Version 1.2 asked: "What exactly gets sent to the TSA, and how do we verify it came back unchanged?" It introduced the AnchorDigest field and mandatory messageImprint verification, closing a gap where the binding between events and timestamps could theoretically be broken.

Version 1.3 asked: "Can two independent implementations build the same Merkle tree from the same events?" It fully specified leaf hashing, pairing rules, index interpretation, padding, and proof direction — transforming the Merkle tree from a concept to a deterministic, interoperable algorithm.

Version 1.4 asked: "Is this actually a photograph of the real world, or a photograph of a screen?" It added depth analysis using LiDAR, structured light, and stereoscopic sensors, and expanded the device model from smartphones to every kind of imaging device from DSLRs to medical endoscopes.

Version 1.5 asks: "How do we make all of this matter at the moment of sharing?" It bridges the gap between cryptographic proof and social media reality, operating in the 200ms window between the user's intent to share and the platform's destruction of provenance data.

Each version addresses a question that only becomes visible after the previous version's question is answered. You cannot worry about Merkle tree interoperability until you have defined what gets hashed. You cannot worry about screen detection until you have a device model that supports depth sensors. You cannot worry about pre-publish verification until you have a complete, verified provenance chain to verify against.


Looking Forward: SCITT and the Transparency Log Horizon

The specification reserves version 1.6 for SCITT (Supply Chain Integrity, Transparency, and Trust) integration — a mechanism for recording provenance hashes in public transparency logs. This would add non-repudiation to the system: once a provenance record is anchored in a SCITT transparency log, the creator cannot deny its existence and no one can backdate or delete it.

The design considerations for SCITT integration are already sketched in v1.5: only hash values would be logged (not content, preserving privacy), anchoring would be asynchronous (not blocking the capture or share flow), and the system would function without SCITT if the transparency log is unavailable. This last constraint — graceful degradation — is consistent with the silent failure principle that permeates the entire specification.


What This Means for Developers

If you are building applications that capture or share media, the CPP specification offers a reference architecture for several problems that are not specific to provenance.

The 200ms hard-timeout pattern with silent fallback is applicable to any middleware that processes content during inter-application sharing. The confidence level model — restricting data availability based on verification status rather than hiding it behind UI toggles — is a security pattern that transfers to any system where unverified data could be spoofed. The prohibited terms framework is a template for any application that displays trust indicators and needs to avoid regulatory exposure across multiple jurisdictions.

The platform-specific implementation details — Android's Intent Proxy Model versus iOS's Share Extension with Re-Share — are practical guides for anyone building share-extension-based features, regardless of whether those features relate to provenance.

And the philosophical commitment at the core of the specification — that provenance is not truth, that absence of indication is not an error, that the system must never claim more than it can prove — is a design discipline that applies far beyond media authenticity.

The CPP v1.5 specification is available under CC BY 4.0 from the VeritasChain Standards Organization. The reference implementations, when they arrive, will likely present their own set of challenges. But the specification itself is complete, and the 200ms window between "Share" and destruction is, for the first time, a space where cryptographic truth can leave its mark.


The Capture Provenance Profile (CPP) is maintained by the VeritasChain Standards Organization (VSO). The specification versions referenced in this article (v1.0 through v1.5) were developed between January 18 and January 30, 2026. VeraSnap is a reference CPP-compliant capture application. C2PA is a trademark of the Coalition for Content Provenance and Authenticity.

Disclosure: This article describes the specification as designed. Reference implementations are not yet publicly available. Code examples in this article are illustrative sketches based on the specification's described behavior and should be treated as reference, not production, code.
