The GitHub Issue That Never Gets Fixed
python-docx issue #455: "Track changes not supported" - opened 2016, still open.
Issue #1044: "Reading tracked changes" - 2021, no resolution.
Issue #753: "Preserve track changes when editing" - 2019, corrupts documents.
The pattern is clear: python-docx was never designed for revision-aware document processing, and bolting it on isn't happening.
What Actually Happens When You Read a Document
When python-docx reads a DOCX with track changes, here's what you're missing:
<!-- What's actually in the document -->
<w:p>
<w:r>
<w:t>The contract amount is </w:t>
</w:r>
<w:del w:author="Legal" w:date="2024-01-15T10:30:00Z">
<w:r>
<w:delText>$50,000</w:delText>
</w:r>
</w:del>
<w:ins w:author="Legal" w:date="2024-01-15T10:30:00Z">
<w:r>
<w:t>$75,000</w:t>
</w:r>
</w:ins>
<w:r>
<w:t> per year.</w:t>
</w:r>
</w:p>
# What python-docx gives you
from docx import Document
doc = Document("contract.docx")
print(doc.paragraphs[0].text)
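# Output: "The contract amount is  per year."
# (paragraph.text is built from runs directly under <w:p>; runs wrapped in
# <w:ins> or <w:del> are skipped, so neither amount appears)
# What you're missing:
# - Proposed value: $75,000
# - Original value: $50,000
# - Who changed it: Legal
# - When: 2024-01-15T10:30:00Z
# - That this is a PENDING change, not yet accepted
python-docx builds the paragraph text from runs that are direct children of the paragraph. Runs wrapped in w:ins or w:del are skipped, so the result matches neither the original nor the revised wording. You lose:
- The original text (what was deleted)
- The proposed text (what was inserted)
- The change author
- The change timestamp
- The fact that it's a pending revision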
This is catastrophic for legal document workflows, audit trails, and contract review automation.
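To make the gap concrete, here is a minimal sketch that recovers both readings of the sample paragraph (before and after the pending change) using only the standard library. The function name `paragraph_views` and the `SAMPLE` string are illustrative, not part of python-docx or any other library.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_views(p):
    """Return (original, revised) text for one <w:p> element."""
    original, revised = [], []

    def walk(node, in_ins):
        for child in node:
            if child.tag == f"{{{W}}}ins":
                walk(child, True)
            elif child.tag == f"{{{W}}}t":
                # Live text: always in the revised view, and in the
                # original view only when it is not a pending insertion.
                revised.append(child.text or "")
                if not in_ins:
                    original.append(child.text or "")
            elif child.tag == f"{{{W}}}delText":
                # Deleted text existed originally unless it was itself
                # part of a pending insertion.
                if not in_ins:
                    original.append(child.text or "")
            else:
                walk(child, in_ins)

    walk(p, False)
    return "".join(original), "".join(revised)


SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t>The contract amount is </w:t></w:r>
  <w:del w:id="1" w:author="Legal" w:date="2024-01-15T10:30:00Z">
    <w:r><w:delText>$50,000</w:delText></w:r>
  </w:del>
  <w:ins w:id="2" w:author="Legal" w:date="2024-01-15T10:30:00Z">
    <w:r><w:t>$75,000</w:t></w:r>
  </w:ins>
  <w:r><w:t> per year.</w:t></w:r>
</w:p>"""

original, revised = paragraph_views(ET.fromstring(SAMPLE))
print("original:", original)
print("revised: ", revised)
```

The sketch only handles text runs; tabs, line breaks, and field codes would need their own cases.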
The OOXML Revision Schema (What You Need to Parse)
OOXML tracks revisions through several element types:
Text Insertions (w:ins)
<w:ins w:id="1" w:author="John Smith" w:date="2024-01-15T10:30:00Z">
<w:r>
<w:t>inserted text</w:t>
</w:r>
</w:ins>
Attributes:
- w:id: Unique revision ID
- w:author: Person who made the change
- w:date: ISO 8601 timestamp
Text Deletions (w:del)
<w:del w:id="2" w:author="Jane Doe" w:date="2024-01-16T14:00:00Z">
<w:r>
<w:delText>deleted text</w:delText>
</w:r>
</w:del>
Note: Deleted text uses w:delText, not w:t.
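As a rough illustration of how these attributes and the w:t / w:delText split might be consumed, the sketch below turns a single revision element into a typed record. The `Revision` dataclass and `parse_revision` function are hypothetical names introduced here; the date is treated as optional as a defensive assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


@dataclass
class Revision:  # hypothetical container, not part of any library
    kind: str                  # "ins" or "del"
    rev_id: Optional[str]
    author: Optional[str]
    date: Optional[datetime]
    text: str


def parse_revision(el):
    """Turn one <w:ins> or <w:del> element into a Revision."""
    kind = el.tag.split("}")[1]
    # Deleted runs store their text in w:delText, inserted runs in w:t
    text_tag = f"{{{W}}}delText" if kind == "del" else f"{{{W}}}t"
    raw_date = el.get(f"{{{W}}}date")
    return Revision(
        kind=kind,
        rev_id=el.get(f"{{{W}}}id"),
        author=el.get(f"{{{W}}}author"),
        # Replace the trailing "Z" so older Pythons can parse the timestamp
        date=datetime.fromisoformat(raw_date.replace("Z", "+00:00")) if raw_date else None,
        text="".join(t.text or "" for t in el.iter(text_tag)),
    )


SAMPLE = f"""<w:del xmlns:w="{W}" w:id="2" w:author="Jane Doe" w:date="2024-01-16T14:00:00Z">
  <w:r><w:delText>deleted text</w:delText></w:r>
</w:del>"""

rev = parse_revision(ET.fromstring(SAMPLE))
print(rev.kind, rev.rev_id, rev.author, rev.date.isoformat(), repr(rev.text))
```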
Formatting Changes (w:rPrChange)
<w:r>
<w:rPr>
<w:b/>
<w:rPrChange w:author="Editor" w:date="2024-01-17T09:00:00Z">
<w:rPr/>
</w:rPrChange>
</w:rPr>
<w:t>now bold</w:t>
</w:r>
This tracks when text was made bold (new state: <w:b/>, old state: no bold).
Paragraph Property Changes (w:pPrChange)
<w:pPr>
<w:jc w:val="center"/>
<w:pPrChange w:author="Designer" w:date="2024-01-18T11:00:00Z">
<w:pPr>
<w:jc w:val="left"/>
</w:pPr>
</w:pPrChange>
</w:pPr>
Paragraph changed from left-aligned to centered.
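Both formatting-change elements follow the same shape: the enclosing properties element holds the new state, and the nested copy inside the change element holds the old state. A minimal sketch of reading them, assuming that layout and using hypothetical helper names (`_props`, `property_changes`), might look like this:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _props(pr):
    """Map each child of a w:rPr / w:pPr element to its w:val (or True)."""
    if pr is None:
        return {}
    props = {}
    for child in pr:
        name = child.tag.split("}")[1]
        if name.endswith("Change"):  # skip the change record itself
            continue
        props[name] = child.get(f"{{{W}}}val", True)
    return props


def property_changes(root):
    """List run and paragraph property changes with old and new state."""
    changes = []
    for kind in ("rPr", "pPr"):
        for current in root.iter(f"{{{W}}}{kind}"):
            change = current.find(f"w:{kind}Change", NS)
            if change is None:
                continue
            changes.append({
                "kind": kind,
                "author": change.get(f"{{{W}}}author"),
                "date": change.get(f"{{{W}}}date"),
                "old": _props(change.find(f"w:{kind}", NS)),
                "new": _props(current),
            })
    return changes


SAMPLE = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr>
      <w:jc w:val="center"/>
      <w:pPrChange w:id="5" w:author="Designer" w:date="2024-01-18T11:00:00Z">
        <w:pPr><w:jc w:val="left"/></w:pPr>
      </w:pPrChange>
    </w:pPr>
    <w:r>
      <w:rPr>
        <w:b/>
        <w:rPrChange w:id="4" w:author="Editor" w:date="2024-01-17T09:00:00Z">
          <w:rPr/>
        </w:rPrChange>
      </w:rPr>
      <w:t>now bold</w:t>
    </w:r>
  </w:p>
</w:body>"""

changes = property_changes(ET.fromstring(SAMPLE))
for change in changes:
    print(change["kind"], change["author"], change["old"], "->", change["new"])
```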
Move Operations (w:moveFrom, w:moveTo)
<w:moveFrom w:id="3" w:author="Editor" w:date="...">
<w:r><w:t>moved text</w:t></w:r>
</w:moveFrom>
<!-- ... elsewhere in document ... -->
<w:moveTo w:id="3" w:author="Editor" w:date="...">
<w:r><w:t>moved text</w:t></w:r>
</w:moveTo>
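Tracks text movement (not just delete + insert). In files written by Word, the two halves are usually paired through the w:name attribute on the surrounding w:moveFromRangeStart and w:moveToRangeStart markers, which the simplified example above omits.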
Parsing OOXML Revisions Directly
If you need full control, parse the XML directly:
from zipfile import ZipFile

from lxml import etree

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NSMAP = {'w': W_NS}


def extract_revisions(docx_path):
    """Extract all tracked revisions from a DOCX file."""
    revisions = []
    with ZipFile(docx_path, 'r') as zf:
        xml_content = zf.read('word/document.xml')
    root = etree.fromstring(xml_content)
    # Find all insertions
    for ins in root.xpath('//w:ins', namespaces=NSMAP):
        revision = {
            'type': 'insertion',
            'id': ins.get(f'{{{W_NS}}}id'),
            'author': ins.get(f'{{{W_NS}}}author'),
            'date': ins.get(f'{{{W_NS}}}date'),
            'text': ''.join(ins.xpath('.//w:t/text()', namespaces=NSMAP)),
        }
        revisions.append(revision)
    # Find all deletions
    for deletion in root.xpath('//w:del', namespaces=NSMAP):
        revision = {
            'type': 'deletion',
            'id': deletion.get(f'{{{W_NS}}}id'),
            'author': deletion.get(f'{{{W_NS}}}author'),
            'date': deletion.get(f'{{{W_NS}}}date'),
            'text': ''.join(deletion.xpath('.//w:delText/text()', namespaces=NSMAP)),
        }
        revisions.append(revision)
    return revisions


# Usage
revisions = extract_revisions('contract_with_changes.docx')
for rev in revisions:
    print(f"{rev['type']}: '{rev['text']}' by {rev['author']} at {rev['date']}")
This works for reading. Writing is harder. You need to:
- Track revision IDs (must be unique per document)
- Maintain rsid (revision session IDs)
- Handle nested revisions correctly
- Update the revision tracking settings in settings.xml
- Preserve existing revisions while adding new ones
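The first requirement on that list, unique revision IDs, and the w:t to w:delText switch can be sketched for the simplest write case: marking an existing run as a tracked deletion. This is an illustration under stated assumptions, not a complete writer. It uses the standard library's ElementTree, the helper names `next_revision_id` and `delete_run_tracked` are hypothetical, and it ignores rsid attributes, nested revisions, and settings.xml.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)  # keep the familiar "w:" prefix when serializing


def next_revision_id(root):
    """Return one more than the highest numeric w:id in the tree."""
    ids = []
    for el in root.iter():
        value = el.get(f"{{{W}}}id")
        if value is not None and value.isdigit():
            ids.append(int(value))
    return max(ids, default=0) + 1


def delete_run_tracked(paragraph, run, root, author):
    """Wrap `run` (a direct child of `paragraph`) in <w:del>, keeping its text."""
    wrapper = ET.Element(f"{{{W}}}del", {
        f"{{{W}}}id": str(next_revision_id(root)),
        f"{{{W}}}author": author,
        f"{{{W}}}date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    # Deleted text must live in w:delText rather than w:t
    for t in list(run.iter(f"{{{W}}}t")):
        t.tag = f"{{{W}}}delText"
    index = list(paragraph).index(run)
    paragraph.remove(run)
    wrapper.append(run)
    paragraph.insert(index, wrapper)
    return wrapper


SAMPLE = f"""<w:body xmlns:w="{W}"><w:p>
  <w:r><w:t>Payment is due in </w:t></w:r>
  <w:ins w:id="7" w:author="Legal" w:date="2024-01-15T10:30:00Z"><w:r><w:t>45</w:t></w:r></w:ins>
  <w:r><w:t> days.</w:t></w:r>
</w:p></w:body>"""

root = ET.fromstring(SAMPLE)
para = root.find(f"{{{W}}}p")
last_run = para.findall(f"{{{W}}}r")[1]
wrapper = delete_run_tracked(para, last_run, root, author="Legal Bot")
print(ET.tostring(para, encoding="unicode"))
```

Scanning the whole tree for the next ID is slow on large documents; caching the maximum, as the editor class later in this article does, avoids repeated scans.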
The Problem with Modifying Documents
When you modify a document with python-docx that has existing track changes:
from docx import Document
doc = Document("contract_with_changes.docx")
doc.paragraphs[0].add_run(" Additional text.")
doc.save("modified.docx")
What can happen:
- Existing track changes may be corrupted
- New changes are NOT tracked (no w:ins wrapper)
- Revision IDs may conflict
- The document may become unreadable in Word
python-docx doesn't know about revisions, so it can't preserve them correctly.
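One defensive measure is to check for pending revisions before handing a file to python-docx at all. The sketch below counts revision elements with the standard library only; `count_tracked_changes` is a hypothetical helper, and it inspects only word/document.xml, so revisions in headers, footers, or footnotes would be missed.

```python
import xml.etree.ElementTree as ET
from zipfile import ZipFile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REVISION_TAGS = {
    f"{{{W}}}{name}"
    for name in ("ins", "del", "moveFrom", "moveTo", "rPrChange", "pPrChange")
}


def count_tracked_changes(docx):
    """Count revision elements in word/document.xml.

    `docx` may be a filesystem path or a binary file-like object.
    """
    with ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return sum(1 for el in root.iter() if el.tag in REVISION_TAGS)


def assert_safe_to_edit(docx):
    """Raise instead of silently editing a document with pending revisions."""
    pending = count_tracked_changes(docx)
    if pending:
        raise ValueError(f"{pending} tracked change(s) present; refusing to edit")
```

Calling `assert_safe_to_edit("contract_with_changes.docx")` before `Document(...)` turns a silent failure into an explicit one.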
Alternatives That Actually Work
Option 1: Word COM Automation (Windows Only)
import os

import win32com.client


def accept_all_changes(input_path, output_path):
    """Accept all track changes using Word COM."""
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        # Word resolves relative paths against its own working directory
        doc = word.Documents.Open(os.path.abspath(input_path))
        doc.AcceptAllRevisions()
        # Save to output_path so the input file is left untouched
        doc.SaveAs(os.path.abspath(output_path))
        doc.Close()
    finally:
        word.Quit()


def add_text_with_tracking(input_path, text, output_path):
    """Add text with revision tracking enabled."""
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        doc = word.Documents.Open(os.path.abspath(input_path))
        doc.TrackRevisions = True
        # Add text at end
        doc.Content.InsertAfter(text)
        doc.SaveAs(os.path.abspath(output_path))
        doc.Close()
    finally:
        word.Quit()
Pros:
- Full Word functionality
- Reliable revision handling
- Supports all Word features
Cons:
- Windows only
- Requires Word installation
- Slow (COM overhead)
- Licensing considerations
Option 2: LibreOffice Headless
import subprocess


def accept_all_changes_libreoffice(input_path, output_path):
    """Accept all changes using LibreOffice in headless mode."""
    # LibreOffice macro to accept all changes
    # (illustrative only: this function neither installs nor invokes it)
    macro = """
    Sub AcceptAll
        ThisComponent.AcceptAllChanges()
        ThisComponent.Store()
    End Sub
    """
    # This is simplified - actual implementation needs macro setup
    subprocess.run([
        'soffice',
        '--headless',
        # No inner quotes: list arguments bypass the shell, so quotes
        # would be passed to soffice literally
        '--accept=socket,host=localhost,port=2002;urp;',
        input_path,
    ])
    # Additional UNO API calls needed for full implementation
Pros:
- Cross-platform (Linux, macOS, Windows)
- Free/open source
- No Word license needed
Cons:
- Complex setup
- UNO API is poorly documented
- Some Word compatibility issues
- Slow for batch processing
Option 3: DocMods API
from docxagent import DocxClient
client = DocxClient()
# Upload document
doc_id = client.upload("contract_with_changes.docx")
# Read with revision awareness
content = client.read(doc_id, include_revisions=True)
# Returns revisions with author, date, original and new text
# Make changes WITH track changes
client.edit(
doc_id,
"Change the payment terms from Net 30 to Net 45"
)
# Changes are tracked with proper w:ins/w:del elements
# Accept specific revisions
client.accept_revision(doc_id, revision_id="1")
# Reject revisions
client.reject_revision(doc_id, revision_id="2")
# Accept all
client.accept_all_revisions(doc_id)
# Download
client.download(doc_id, "final_contract.docx")
Pros:
- Full revision tracking support
- Cross-platform (API-based)
- AI-powered editing with track changes
- No local software dependencies
Cons:
- Requires API calls (network dependency)
- Usage-based pricing
Option 4: Direct OOXML Manipulation
For complete control, manipulate the OOXML directly:
import os
import shutil
import tempfile
from datetime import datetime, timezone
from zipfile import ZIP_DEFLATED, ZipFile

from lxml import etree

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
XML_NS = 'http://www.w3.org/XML/1998/namespace'
NSMAP = {'w': W_NS}


class DocxRevisionEditor:
    def __init__(self, path):
        self.path = path
        self.temp_dir = tempfile.mkdtemp()
        # Extract DOCX
        with ZipFile(path, 'r') as zf:
            zf.extractall(self.temp_dir)
        # Parse document.xml
        doc_path = os.path.join(self.temp_dir, 'word', 'document.xml')
        self.tree = etree.parse(doc_path)
        self.root = self.tree.getroot()
        # Track max revision ID
        self.max_id = self._get_max_revision_id()

    def _get_max_revision_id(self):
        """Find highest existing revision ID."""
        max_id = 0
        for elem in self.root.xpath('//*[@w:id]', namespaces=NSMAP):
            try:
                max_id = max(max_id, int(elem.get(f'{{{W_NS}}}id')))
            except (ValueError, TypeError):
                pass
        return max_id

    def accept_revision(self, revision_id):
        """Accept a specific revision by ID."""
        rid = str(revision_id)
        # Find insertion (the ID is passed as an XPath variable, not interpolated)
        ins = self.root.xpath('//w:ins[@w:id=$rid]', namespaces=NSMAP, rid=rid)
        if ins:
            # Move children out of w:ins, remove w:ins
            parent = ins[0].getparent()
            index = parent.index(ins[0])
            for child in list(ins[0]):
                parent.insert(index, child)
                index += 1
            parent.remove(ins[0])
            return True
        # Find deletion
        deletion = self.root.xpath('//w:del[@w:id=$rid]', namespaces=NSMAP, rid=rid)
        if deletion:
            # Remove the entire w:del element (text is gone)
            deletion[0].getparent().remove(deletion[0])
            return True
        return False

    def insert_text_tracked(self, paragraph_index, text, author="Python Script"):
        """Insert text with track changes."""
        paragraphs = self.root.xpath('//w:p', namespaces=NSMAP)
        if paragraph_index >= len(paragraphs):
            raise IndexError("Paragraph index out of range")
        para = paragraphs[paragraph_index]
        self.max_id += 1
        # Create w:ins as the last child of the paragraph
        w = f'{{{W_NS}}}'
        ins = etree.SubElement(para, f'{w}ins')
        ins.set(f'{w}id', str(self.max_id))
        ins.set(f'{w}author', author)
        # Timezone-aware UTC timestamp without microseconds
        ins.set(f'{w}date', datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ'))
        # Create run with text
        run = etree.SubElement(ins, f'{w}r')
        t = etree.SubElement(run, f'{w}t')
        # Without xml:space="preserve", leading and trailing spaces can be dropped
        t.set(f'{{{XML_NS}}}space', 'preserve')
        t.text = text

    def save(self, output_path):
        """Save the modified document."""
        # Write document.xml
        doc_path = os.path.join(self.temp_dir, 'word', 'document.xml')
        self.tree.write(doc_path, xml_declaration=True, encoding='UTF-8', standalone=True)
        # Repack DOCX with compression
        with ZipFile(output_path, 'w', ZIP_DEFLATED) as zf:
            for root, dirs, files in os.walk(self.temp_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    arc_name = os.path.relpath(file_path, self.temp_dir)
                    zf.write(file_path, arc_name)
        # Cleanup
        shutil.rmtree(self.temp_dir)


# Usage
editor = DocxRevisionEditor("contract.docx")
editor.insert_text_tracked(0, " AMENDED: New terms apply.", author="Legal Bot")
editor.accept_revision("1")  # Accept revision with ID 1
editor.save("contract_modified.docx")
Pros:
- Full control over revisions
- No external dependencies beyond lxml
- Cross-platform
- Can handle complex revision scenarios
Cons:
- Complex to implement correctly
- Must handle all edge cases
- Easy to corrupt documents
- Need deep OOXML knowledge
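The class above only accepts revisions. Rejecting is the mirror image: an insertion is discarded, and a deletion is unwrapped with its w:delText turned back into w:t. The sketch below shows that logic on a parsed tree. It swaps lxml for the standard library's ElementTree (hence the explicit parent map), and `reject_revision` is a hypothetical function, not a method of the class above.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def reject_revision(root, revision_id):
    """Reject one w:ins or w:del by ID; return True if a revision was found."""
    # ElementTree has no getparent(), so build a child -> parent lookup
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in root.iter():
        if el.get(f"{{{W}}}id") != str(revision_id):
            continue
        if el.tag == f"{{{W}}}ins":
            # Rejecting an insertion discards the inserted content
            parents[el].remove(el)
            return True
        if el.tag == f"{{{W}}}del":
            # Rejecting a deletion restores the text in place
            parent = parents[el]
            index = list(parent).index(el)
            for offset, child in enumerate(list(el)):
                for deleted in list(child.iter(f"{{{W}}}delText")):
                    deleted.tag = f"{{{W}}}t"
                parent.insert(index + offset, child)
            parent.remove(el)
            return True
    return False


SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t>The contract amount is </w:t></w:r>
  <w:del w:id="1" w:author="Legal" w:date="2024-01-15T10:30:00Z">
    <w:r><w:delText>$50,000</w:delText></w:r>
  </w:del>
  <w:ins w:id="2" w:author="Legal" w:date="2024-01-15T10:30:00Z">
    <w:r><w:t>$75,000</w:t></w:r>
  </w:ins>
  <w:r><w:t> per year.</w:t></w:r>
</w:p>"""

paragraph = ET.fromstring(SAMPLE)
rejected_insertion = reject_revision(paragraph, 2)
rejected_deletion = reject_revision(paragraph, 1)
restored = "".join(t.text for t in paragraph.iter(f"{{{W}}}t"))
print(restored)
```

Like the accept logic, this ignores nested revisions and move pairs, which is exactly where hand-rolled editors tend to corrupt documents.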
Building a Revision-Aware Pipeline
For production document automation with track changes:
import json

from docxagent import DocxClient


def contract_review_pipeline(template_path, data, reviewer_name):
    """
    Automated contract review with tracked changes.
    1. Load template
    2. Fill in data
    3. AI reviews and suggests changes (tracked)
    4. Return document with all changes visible
    """
    client = DocxClient()
    # Upload and fill template
    doc_id = client.upload(template_path)
    # Fill template variables
    for key, value in data.items():
        client.edit(doc_id, f"Replace placeholder {{{{{key}}}}} with: {value}")
    # AI-powered contract review with track changes
    client.edit(
        doc_id,
        f"""Review this contract as {reviewer_name}. Make specific suggestions:
1. Flag any unusual indemnification language
2. Verify payment terms match industry standards
3. Check for missing limitation of liability
4. Suggest clearer language where ambiguous
All changes should be tracked with your name as author."""
    )
    # Get revision summary
    revisions = client.get_revisions(doc_id)
    summary = {
        "total_revisions": len(revisions),
        "insertions": len([r for r in revisions if r['type'] == 'insertion']),
        "deletions": len([r for r in revisions if r['type'] == 'deletion']),
        "by_author": {}
    }
    for rev in revisions:
        author = rev.get('author', 'Unknown')
        if author not in summary['by_author']:
            summary['by_author'][author] = 0
        summary['by_author'][author] += 1
    # Download document with all tracked changes visible
    client.download(doc_id, "reviewed_contract.docx")
    return summary


# Usage
result = contract_review_pipeline(
    "msa_template.docx",
    {
        "CLIENT_NAME": "Acme Corp",
        "EFFECTIVE_DATE": "January 1, 2025",
        "PAYMENT_TERMS": "Net 30"
    },
    reviewer_name="Contract AI"
)
print(json.dumps(result, indent=2))
Why This Matters
If you're building document automation in Python without revision awareness:
- Legal risk: You can't prove what changed or when
- Audit failures: No trail of modifications
- Collaboration breaks: Changes made by your system aren't visible to reviewers
- Data loss: Original text is silently discarded
python-docx is fine for simple document generation. For anything involving revisions, tracked changes, or collaborative editing, you need tools that understand OOXML revisions.
The Bottom Line
python-docx doesn't support track changes. This isn't a bug—it's a fundamental design limitation that won't be fixed.
Your options:
- Windows only: Word COM automation
- Cross-platform free: LibreOffice headless (complex setup)
- Cross-platform API: DocMods or similar services
- Full control: Direct OOXML manipulation (steep learning curve)
Choose based on your platform requirements, complexity tolerance, and whether you need to read revisions, write revisions, or both.
For production document workflows where track changes matter, don't fight python-docx's limitations. Use the right tool for the job.



