The Surface Problem and the Real Problem
The surface problem: You have a Word document with tracked changes. You need to send a clean version. You click Accept All Changes. The red strikethroughs and blue underlines disappear. Document looks clean. You send it.
The real problem: That document still contains your name, your company name, how long you spent editing, your computer's username, the path to the original file, and sometimes fragments of text you thought you deleted. All of this is extractable by anyone who receives the file.
This isn't theoretical. Real legal cases have turned on metadata extracted from Word documents. Negotiation strategies exposed. Confidential comments discovered. Document histories revealing information about work product that should never have left the organization.
What "Accept All Changes" Actually Does
When you click Review > Accept > Accept All Changes:
What happens:
- Insertions become normal text
- Deletions are removed from visible content
- Change markup (colors, underlines, strikethroughs) disappears
What doesn't happen:
- Track Changes is not turned off; new edits keep being tracked unless you choose Accept All Changes and Stop Tracking instead
- Metadata is not removed
- Document properties stay intact
- Previous author list remains
- Comments may still be present (Accept Changes doesn't delete comments)
- Hidden text remains hidden but present
- Custom XML data stays in document
The document looks clean. The document is not clean.
Hidden Data in Word Documents: The Complete List
Word documents (.docx files) are ZIP archives containing XML files. These XML files store everything about the document - including data you may not want to share.
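To see this for yourself, you can list the parts inside a .docx with Python's standard zipfile module. This is a minimal sketch; the function name list_docx_parts is chosen here for illustration only.

```python
import zipfile

def list_docx_parts(docx_path):
    """Return the names of every part stored inside a .docx package."""
    # A .docx is an ordinary ZIP archive, so zipfile can open it directly.
    with zipfile.ZipFile(docx_path) as package:
        return sorted(package.namelist())

# Example (the file name is a placeholder):
# for name in list_docx_parts("contract.docx"):
#     print(name)
```

A typical document lists parts such as docProps/core.xml, docProps/app.xml, and word/document.xml, which the sections below walk through.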
Document Properties (docProps/core.xml)
| Property | What It Contains | Risk |
|---|---|---|
| dc:creator | Person who created original file | Reveals original author |
| cp:lastModifiedBy | Person who last saved | Reveals who touched it |
| dcterms:created | Creation timestamp | Dates your work |
| dcterms:modified | Last modification | Shows recent editing |
| cp:revision | Number of saves | Indicates editing extent |
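The properties in this table can be read without opening Word. The sketch below assumes the standard package namespaces for core properties; the names read_core_properties and FIELDS are illustrative, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
FIELDS = ["dc:creator", "cp:lastModifiedBy", "dcterms:created",
          "dcterms:modified", "cp:revision"]

def read_core_properties(docx_path):
    """Return the non-empty core properties listed in FIELDS as a dict."""
    with zipfile.ZipFile(docx_path) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    found = {}
    for field in FIELDS:
        element = root.find(field, NS)
        # Skip elements that are missing or were blanked by a cleaning pass.
        if element is not None and (element.text or "").strip():
            found[field] = element.text.strip()
    return found
```

An empty dict after cleaning means none of these five fields still carries a value.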
App Properties (docProps/app.xml)
| Property | What It Contains | Risk |
|---|---|---|
| Application, AppVersion | Application name ("Microsoft Office Word") and version number | Minor |
| Company | Your organization name | Reveals affiliation |
| TotalTime | Minutes spent editing | Shows effort level |
| Pages, Words, Characters | Document statistics | Minor |
| Template | Template filename and path | May reveal internal naming |
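A similar sketch works for docProps/app.xml. Rather than naming each field, it collects every simple text-valued property, so Company, TotalTime, and Template all appear if they are set. The function name read_app_properties is an assumption made for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

def read_app_properties(docx_path):
    """Return the simple (text-only) properties from docProps/app.xml."""
    with zipfile.ZipFile(docx_path) as package:
        if "docProps/app.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/app.xml"))
    properties = {}
    for child in root:
        # Tags look like "{namespace}Company"; keep only the local name.
        name = child.tag.rsplit("}", 1)[-1]
        # Skip structured entries (those with child elements) and empty values.
        if len(child) == 0 and (child.text or "").strip():
            properties[name] = child.text.strip()
    return properties
```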
Custom Properties
Organizations often add custom properties for document management systems. These might contain:
- Internal project codes
- Classification levels
- Approval statuses
- Matter numbers (law firms)
- Reference IDs
All visible to anyone who opens the document properties.
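Custom properties are stored in their own part, docProps/custom.xml, as name/value pairs. The following sketch lists them; it assumes each property holds a single typed value element, and read_custom_properties is a name invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

CUSTOM_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/custom-properties}"

def read_custom_properties(docx_path):
    """Return {name: value} for each custom property, or {} if none exist."""
    with zipfile.ZipFile(docx_path) as package:
        if "docProps/custom.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/custom.xml"))
    properties = {}
    for prop in root.findall(CUSTOM_NS + "property"):
        # Each property wraps one typed value element, such as vt:lpwstr.
        value = prop[0].text if len(prop) else None
        properties[prop.get("name")] = value
    return properties
```

Any matter number or classification label a document management system attached will show up here by name.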
Revision Information
Even after accepting changes, Word may retain:
- Names of all people who tracked changes
- Timestamps of editing sessions
- rsid (revision save IDs) attributes throughout the document
- Move tracking information
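One rough way to check for these leftovers is to scan word/document.xml for rsid attributes and author attributes. This sketch uses a plain regular-expression scan rather than a full XML parse, assumes Word's usual w: prefix, and does not unescape XML entities in names. The function name residual_revision_info is illustrative.

```python
import re
import zipfile

# rsid attributes look like w:rsidR="00AB12CD"; authors like w:author="Jane Doe".
RSID_ATTR = re.compile(rb'w:rsid\w*="([0-9A-Fa-f]+)"')
AUTHOR_ATTR = re.compile(rb'w:author="([^"]*)"')

def residual_revision_info(docx_path):
    """Return the distinct rsid values and author names found in the body."""
    with zipfile.ZipFile(docx_path) as package:
        body = package.read("word/document.xml")
    return {
        "rsids": sorted({m.decode("ascii") for m in RSID_ATTR.findall(body)}),
        "authors": sorted({m.decode("utf-8") for m in AUTHOR_ATTR.findall(body)}),
    }
```

A non-empty authors list means tracked changes or other attributed markup are still in the body. A long rsids list hints at how many editing sessions the document went through.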
Comments
Accept All Changes does not delete comments. Comments require separate deletion. Even after "deleting" comments, residual data may remain in custom XML parts.
Real Cases Where Hidden Data Caused Problems
SCO v. DaimlerChrysler (2004)
A Word copy of SCO's complaint against DaimlerChrysler retained its revision history. Reporters who examined it found that the suit had originally been drafted against Bank of America, exposing litigation planning that SCO had not made public.
Pentagon Memo (2005)
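A US military report on the Giuliana Sgrena incident was released as a PDF with classified sections blacked out. The black boxes only covered the text, which was still in the file and could be recovered by copying and pasting it into another program. The format was PDF rather than Word, but the failure is the same: hiding content from view does not remove it.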
Numerous M&A Situations
We've worked with organizations that discovered, during post-deal review, that shared documents contained:
- Valuation models with earlier proposed prices visible
- Comments like "they'll never accept this" on deal terms
- Author names revealing involvement of undisclosed parties
These aren't edge cases. Document metadata exposure is common and preventable.
Document Inspector: Your Primary Tool
Word's Document Inspector finds and removes hidden data. It's buried but comprehensive.
How to Use Document Inspector
- File > Info (backstage view)
- Click Check for Issues dropdown
- Select Inspect Document
- If prompted, save your document first
- Leave all inspection categories checked (default)
- Click Inspect
- Review findings
- Click Remove All for each category you want to clean
- Save the document
- Run Inspector again to verify
What Document Inspector Finds
| Category | What It Detects | Should You Remove? |
|---|---|---|
| Comments, Revisions, Versions | Track changes, comments, version history | Yes for external sharing |
| Document Properties and Personal Info | Author, company, editing time, template | Yes for external sharing |
| Custom XML Data | Data stores from add-ins, custom apps | Usually yes |
| Headers, Footers, Watermarks | Content that might contain sensitive data | Review before removing |
| Invisible Content | Objects formatted as invisible | Usually yes |
| Hidden Text | Text with "Hidden" formatting applied | Usually yes |
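Hidden text is easy to check for outside Word, because the "Hidden" font setting is stored as a w:vanish element in a run's properties. The sketch below only catches hidden formatting applied directly to a run, not formatting inherited from a style, and find_hidden_text is a name chosen for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def find_hidden_text(docx_path):
    """Return the text of every run whose direct formatting includes w:vanish."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    hidden = []
    for run in root.iter(W + "r"):
        vanish = run.find(W + "rPr/" + W + "vanish")
        # <w:vanish/> means hidden; w:val="0", "false" or "off" switches it off.
        if vanish is None or vanish.get(W + "val") in ("0", "false", "off"):
            continue
        text = "".join(t.text or "" for t in run.iter(W + "t"))
        if text:
            hidden.append(text)
    return hidden
```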
The Warning You Should Heed
Document Inspector warns: "Some changes cannot be undone." This is accurate. Save a copy before cleaning if you might need the original metadata later.
Why "Accept All" Is Greyed Out
If the Accept button is greyed out, check these causes:
1. View Mode Mismatch
You're viewing "No Markup" mode (called "Final" in Word 2010 and earlier), which shows the document as if changes are accepted - but they're not actually accepted.
Fix: Review > Tracking > Display for Review dropdown > change to "All Markup"
If track changes become visible, you can now accept them.
2. Document Protection
The document has edit restrictions.
Fix: Review > Restrict Editing > Stop Protection (may require password)
3. Read-Only Mode
Document opened from email, SharePoint, or OneDrive in protected view.
Fix: Click "Enable Editing" in yellow banner, or save to local drive and reopen
4. No Track Changes Exist
There are actually no tracked changes in the document.
Verification: Review > Tracking > Display for Review > "All Markup" - if nothing appears, nothing to accept
5. Document Corruption
Track changes data is corrupted.
Fix: Copy all content, paste into new document, reformat if needed
Enterprise Document Cleaning
Organizations handling sensitive documents (law firms, healthcare, financial services) need scalable cleaning processes.
Policy-Level Settings
Trust Center settings to enforce:
- File > Options > Trust Center > Trust Center Settings
- Privacy Options: "Remove personal information from file properties on save" (limited effect)
- Document-specific settings: "Make hidden markup visible when opening or saving"
These settings help but don't replace Document Inspector for thorough cleaning.
VBA Macro for Batch Cleaning
Sub CleanDocumentMetadata()
    Dim doc As Document
    Set doc = ActiveDocument

    ' Accept all track changes
    doc.AcceptAllRevisions

    ' Delete all comments
    Do While doc.Comments.Count > 0
        doc.Comments(1).Delete
    Loop

    ' Remove document properties
    doc.RemoveDocumentInformation wdRDIAll

    doc.Save
    MsgBox "Document cleaned"
End Sub
This macro accepts changes, deletes comments, and removes document information. For batch processing, wrap in a folder enumeration loop.
DocMods API for Programmatic Cleaning
from docxagent import DocxClient

client = DocxClient()

# Upload sensitive document
doc_id = client.upload("contract_internal_review.docx")

# Full metadata scrub
client.clean_document(
    doc_id,
    accept_track_changes=True,
    remove_comments=True,
    remove_metadata=True,
    remove_hidden_text=True,
    remove_custom_xml=True,
    verify_clean=True,  # Runs second inspection to confirm
)

# Download cleaned version
client.download(doc_id, "contract_for_client.docx")
The API returns verification results confirming the document is clean.
Verification: Is the Document Actually Clean?
Quick Verification
- Save cleaned document
- Close document
- Reopen document
- Run Document Inspector again
- All categories should show "No items were found"
Thorough Verification (For High-Stakes Documents)
- Make copy of cleaned document
- Rename copy from .docx to .zip
- Extract ZIP contents
- Open docProps/core.xml - check for author, creator
- Open docProps/app.xml - check for Company, editing time
- Open word/document.xml - search for w:comment, w:ins, w:del
- If any of the above contain sensitive data, cleaning was incomplete
Programmatic Verification
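import zipfile
import xml.etree.ElementTree as ET

DC_CREATOR = '{http://purl.org/dc/elements/1.1/}creator'

def verify_document_clean(docx_path):
    """Return a list of issues found; an empty list means none were detected."""
    issues = []
    with zipfile.ZipFile(docx_path, 'r') as z:
        names = z.namelist()
        # Check for track changes. The trailing space avoids matching
        # unrelated elements such as <w:insideH> (a table border).
        doc_xml = z.read('word/document.xml')
        if b'<w:ins ' in doc_xml or b'<w:del ' in doc_xml:
            issues.append("Track changes found")
        # Check for comments
        if 'word/comments.xml' in names:
            issues.append("Comments file exists")
        # Check core properties. An empty <dc:creator/> left behind by
        # cleaning is fine; only a non-empty value counts as an issue.
        if 'docProps/core.xml' in names:
            root = ET.fromstring(z.read('docProps/core.xml'))
            creator = root.find(DC_CREATOR)
            if creator is not None and (creator.text or '').strip():
                issues.append("Author metadata present")
    return issues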
Special Situations
Regulatory Submissions
Some agencies require metadata. FDA eCTD submissions have specific metadata requirements. Don't clean if the metadata is required.
Legal Discovery
Documents produced in litigation may need metadata preserved for chain of custody. Cleaning could constitute spoliation of evidence. Consult counsel.
Document Authentication
Clean documents can't be forensically verified as originals. If authenticity matters, keep an uncleaned archive copy.
Version Control Systems
SharePoint, OneDrive, and document management systems may preserve version history outside the document. Cleaning the document doesn't clean the repository.
The Complete Pre-Send Checklist
Before sending a Word document externally:
- Accept all track changes (Review > Accept > Accept All Changes)
- Delete all comments (Review > Delete > Delete All Comments in Document)
- Run Document Inspector (File > Info > Check for Issues > Inspect Document)
- Remove All for every category with findings
- Run Document Inspector again - verify nothing found
- Check document visually - does it look correct?
- Save as new file (don't overwrite your internal version)
- Consider: Should this be PDF instead of DOCX?
For truly sensitive documents, add:
- Manual XML verification
- IT/compliance review
- Archive original with metadata in secure location
Why This Matters
Hidden data in documents isn't just a privacy issue. It's a:
- Competitive issue: Negotiation strategies, pricing models, alternative positions
- Legal issue: Work product, privileged communications, discoverable content
- Reputational issue: Internal comments that would be embarrassing if seen
- Compliance issue: PHI, PII, confidential data in metadata
The fix is simple - Document Inspector takes 30 seconds. The consequences of not doing it can be severe.
Accept All Changes is not the same as "clean document." Document Inspector is the minimum. For high-stakes situations, verify the cleaning worked. Your future self will thank you.



