The Python DOCX Landscape
When you need to edit Word documents programmatically in Python, you have several options:
| Library | Free? | Track Changes | Comments | Maturity |
|---|---|---|---|---|
| python-docx | ✓ | ✗ | ✓ (since 1.2.0) | High |
| Aspose.Words | ✗ ($) | ✓ | ✓ | High |
| Spire.Doc | ✗ ($) | ✓ | ✓ | Medium |
| DocMods SDK | API-based | ✓ | ✓ | Growing |
The most common choice, python-docx, has a critical limitation that this guide addresses: it cannot write tracked changes.
python-docx Basics
python-docx is the go-to library for basic DOCX manipulation:
Installation
pip install python-docx
Reading a Document
from docx import Document
doc = Document('contract.docx')
# Read paragraphs
for para in doc.paragraphs:
    print(para.text)
# Read tables
for table in doc.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
Modifying Content
from docx import Document
doc = Document('contract.docx')
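# Replace text in paragraphs
# Note: assigning to para.text rebuilds the paragraph as a single run,
# so run-level formatting (bold, italics) in that paragraph is lost.
for para in doc.paragraphs:
    if 'PLACEHOLDER' in para.text:
        para.text = para.text.replace('PLACEHOLDER', 'Actual Value')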
# Add a new paragraph
doc.add_paragraph('This is new content.')
# Save
doc.save('contract_modified.docx')
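Assigning to `para.text` collapses the paragraph into a single run, which discards run-level formatting such as bold or italics. A minimal sketch of a gentler approach, assuming the placeholder sits entirely inside one run, is to replace the text run by run. The helper name `replace_in_runs` is hypothetical; it relies only on the `runs` and `text` attributes that python-docx paragraphs and runs expose.

```python
def replace_in_runs(paragraph, old, new):
    """Replace `old` with `new` inside each run of a paragraph.

    Works on any object with a `runs` list whose items have a `text`
    attribute, which matches python-docx Paragraph and Run objects.
    Limitation: a placeholder split across two runs is not found.
    Returns the number of runs that were changed.
    """
    changed = 0
    for run in paragraph.runs:
        if old in run.text:
            run.text = run.text.replace(old, new)
            changed += 1
    return changed
```

With a loaded document this would be called as `replace_in_runs(para, 'PLACEHOLDER', 'Actual Value')` for each `para` in `doc.paragraphs`. If Word has split the placeholder across runs, falling back to the `para.text` assignment is the simpler option.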
Adding Tables
from docx import Document
doc = Document()
# Add a table with 3 rows and 3 columns
table = doc.add_table(rows=3, cols=3)
# Fill in cells
for i, row in enumerate(table.rows):
    for j, cell in enumerate(row.cells):
        cell.text = f'Row {i+1}, Col {j+1}'
doc.save('table_example.docx')
Formatting
from docx import Document
from docx.shared import Pt, Inches
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
# Add formatted paragraph
para = doc.add_paragraph()
run = para.add_run('Bold and Large')
run.bold = True
run.font.size = Pt(14)
# Center alignment
para.alignment = WD_ALIGN_PARAGRAPH.CENTER
doc.save('formatted.docx')
The Track Changes Problem
Here's where python-docx falls short:
from docx import Document
doc = Document('contract.docx')
# This adds text, but NOT as a tracked change
doc.paragraphs[0].add_run('NEW TEXT')
# When you open in Word:
# - The text is there
# - But there's no track change record
# - No author attribution
# - No insertion markup
# - Can't be accepted/rejected
doc.save('modified.docx')
This is a fundamental limitation, not a bug. The python-docx maintainers have acknowledged that track changes support would require significant architectural changes.
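To make the gap concrete: in the WordprocessingML format, a tracked insertion is a `w:ins` element that wraps the inserted run and carries `w:id`, `w:author`, and `w:date` attributes, whereas `add_run` writes only the bare `w:r`. The sketch below builds that markup with the standard library purely for illustration. The function name `tracked_insertion_xml` is hypothetical, and a real document would need this element placed inside a paragraph in `word/document.xml`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag):
    """Return a tag or attribute name qualified with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def tracked_insertion_xml(text, author, rev_id=1, when=None):
    """Build the XML for one tracked insertion (<w:ins> wrapping a run)."""
    # Normalize to UTC so the trailing "Z" in the timestamp is accurate.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    ins = ET.Element(w("ins"), {
        w("id"): str(rev_id),
        w("author"): author,
        w("date"): when.strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    run = ET.SubElement(ins, w("r"))
    t = ET.SubElement(run, w("t"))
    t.text = text
    return ET.tostring(ins, encoding="unicode")


print(tracked_insertion_xml("NEW TEXT", "Legal Bot"))
```

The author and date attributes are what Word displays in the review pane, and the wrapping element is what makes the change something a reviewer can accept or reject.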
GitHub Issues
Track changes support is a long-standing request in the python-docx issue tracker:
- Issue #340: revisions/track changes (2015, still open)
- Issue #566: Accept commented changes to parse text
- Issue #1025: Adding Text/Paragraph is not tracked
The response is consistently: "This is a large feature that would require significant refactoring."
Why Track Changes Matter
Track changes aren't just for lawyers. They're essential when:
Compliance and auditing:
- Regulated industries require proof of who changed what
- Financial documents need audit trails
- Healthcare records require attribution
Collaborative workflows:
- Multiple reviewers need to see each other's edits
- Editors can accept/reject suggestions individually
- Document owners maintain control
Quality control:
- Changes can be reviewed before becoming permanent
- Mistakes can be identified and reverted
- Process documentation is automatic
Without track changes, you're making "silent" edits—the document changes but there's no record of what changed or who did it.
Solution 1: DocMods Python SDK
DocMods provides a Python SDK that supports track changes:
Installation
pip install docxagent
Basic Usage
from docxagent import DocxClient
client = DocxClient()
# Upload document
doc_id = client.upload('contract.docx')
# Read content
content = client.read_document(doc_id)
print(content['paragraphs'][0]['text'])
# Insert text WITH track changes
client.insert_text(
    doc_id,
    paragraph_index=0,
    text='[REVIEWED] ',
    author='Legal Bot'  # Attribution!
)
# Download
client.download(doc_id, 'contract_reviewed.docx')
When you open contract_reviewed.docx in Word:
- The inserted text appears with track change markup
- Author is "Legal Bot"
- Timestamp is recorded
- Can be accepted or rejected
Adding Comments
# Add a comment (appears in Word's review pane)
client.add_comment(
    doc_id,
    paragraph_index=5,
    comment_text='Please verify this amount with Finance.',
    author='Review Bot'
)
Proposing Deletions
# Mark text for deletion (strikethrough with attribution)
client.propose_deletion(
    doc_id,
    paragraph_index=3,
    start_char=0,
    end_char=50,  # First 50 characters
    author='Compliance Bot'
)
Full Example
from docxagent import DocxClient
def review_contract(input_path, output_path):
    """Add automated review comments to a contract."""
    client = DocxClient()
    doc_id = client.upload(input_path)
    content = client.read_document(doc_id)
    # Define flags
    flags = {
        'indemnify': 'Legal review required: indemnification clause',
        'unlimited': 'Legal review required: unlimited liability',
        'perpetual': 'Legal review required: perpetual term',
        'TBD': 'Placeholder needs completion',
    }
    # Flag each matching paragraph
    for i, para in enumerate(content['paragraphs']):
        text_lower = para['text'].lower()
        for trigger, message in flags.items():
            if trigger.lower() in text_lower:
                client.add_comment(
                    doc_id,
                    paragraph_index=i,
                    comment_text=message,
                    author='Contract Review Bot',
                    highlight=True
                )
    # Add processing marker
    client.insert_text(
        doc_id,
        paragraph_index=0,
        text='[AUTO-REVIEWED] ',
        author='Contract Review Bot'
    )
    client.download(doc_id, output_path)
    return output_path
# Usage
review_contract('contract.docx', 'contract_reviewed.docx')
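The matching step in `review_contract` is a plain substring search, so a short trigger such as `'TBD'` also fires inside a longer token. One possible refinement, sketched here as a pure function that can be tested without any API call, matches triggers on word boundaries. The name `find_flags` is hypothetical and not part of the DocMods SDK.

```python
import re


def find_flags(text, rules):
    """Return the messages whose trigger occurs in `text` as a whole word.

    Matching is case-insensitive; `rules` maps trigger -> message.
    """
    hits = []
    for trigger, message in rules.items():
        pattern = r"\b" + re.escape(trigger) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(message)
    return hits
```

The trade-off is that whole-word matching no longer catches inflected forms such as "indemnifying", so triggers for those would need to be listed separately.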
Solution 2: Aspose.Words for Python
Commercial option with comprehensive track changes support:
Installation
pip install aspose-words
Basic Usage
import aspose.words as aw
doc = aw.Document('contract.docx')
# Start tracking changes
doc.start_track_revisions('My Name')
# Make changes (they're now tracked)
builder = aw.DocumentBuilder(doc)
builder.move_to_document_start()
builder.write('[REVIEWED] ')
# Stop tracking
doc.stop_track_revisions()
doc.save('contract_tracked.docx')
Accept/Reject Changes
import aspose.words as aw
doc = aw.Document('document_with_changes.docx')
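# Option 1: accept all changes at once
doc.accept_all_revisions()
# Option 2 (instead of option 1): decide per revision.
# Iterating over a copy avoids skipping items if accepting or
# rejecting removes a revision from the collection.
for revision in list(doc.revisions):
    if revision.author == 'Trusted Reviewer':
        revision.accept()
    else:
        revision.reject()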
doc.save('document_processed.docx')
Pros and Cons
Advantages:
- Full track changes support
- Comprehensive Word feature coverage
- No external API dependency
- Offline processing
Disadvantages:
- Commercial license required ($999+)
- Large library size
- Learning curve for API
Solution 3: Hybrid Approach
Use python-docx for basic operations, DocMods for track changes:
from docx import Document
from docxagent import DocxClient
def process_document(input_path, output_path):
    """
    Use python-docx for reading/analysis,
    DocMods for changes that need tracking.
    """
    # Step 1: Analyze with python-docx (free)
    doc = Document(input_path)
    paragraphs_to_flag = []
    for i, para in enumerate(doc.paragraphs):
        if 'IMPORTANT' in para.text:
            paragraphs_to_flag.append(i)
    # Step 2: Add tracked changes with DocMods
    client = DocxClient()
    doc_id = client.upload(input_path)
    for para_idx in paragraphs_to_flag:
        client.add_comment(
            doc_id,
            paragraph_index=para_idx,
            comment_text='Flagged: Contains IMPORTANT marker',
            author='Analysis Bot'
        )
    client.download(doc_id, output_path)
Batch Processing
With python-docx (No Track Changes)
import os
from docx import Document
input_folder = 'documents/'
output_folder = 'processed/'
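os.makedirs(output_folder, exist_ok=True)  # make sure the output folder exists
for filename in os.listdir(input_folder):
    if filename.endswith('.docx'):
        doc = Document(os.path.join(input_folder, filename))
        # Replace placeholder (only touch paragraphs that contain it,
        # because assigning para.text drops run-level formatting)
        for para in doc.paragraphs:
            if '{{DATE}}' in para.text:
                para.text = para.text.replace('{{DATE}}', '2026-01-29')
        doc.save(os.path.join(output_folder, filename))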
With DocMods (With Track Changes)
import os
from docxagent import DocxClient
client = DocxClient()
input_folder = 'documents/'
output_folder = 'processed/'
for filename in os.listdir(input_folder):
    if filename.endswith('.docx'):
        doc_id = client.upload(os.path.join(input_folder, filename))
        # Add tracked change
        client.insert_text(
            doc_id,
            paragraph_index=0,
            text='[Batch Processed] ',
            author='Batch Bot'
        )
        client.download(doc_id, os.path.join(output_folder, filename))
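Both loops pick up every name ending in `.docx`, which includes the `~$...` lock files Word leaves next to an open document; these are not valid DOCX archives. A small sketch of a safer file listing, using only the standard library (the helper name `iter_docx_files` is hypothetical):

```python
from pathlib import Path


def iter_docx_files(folder):
    """Yield .docx paths in `folder`, sorted, skipping Word lock files (~$name.docx)."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() != ".docx":
            continue
        if path.name.startswith("~$"):
            continue  # temporary lock file created while Word has the document open
        yield path
```

Either batch loop could then iterate over `iter_docx_files(input_folder)` and use `path.name` when building the output path.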
Handling Common Scenarios
Find and Replace with Tracking
from docxagent import DocxClient
client = DocxClient()
doc_id = client.upload('contract.docx')
content = client.read_document(doc_id)
old_value = 'ACME Corp'
new_value = 'Initech LLC'
for i, para in enumerate(content['paragraphs']):
    if old_value in para['text']:
        # Find position (first occurrence in this paragraph only)
        start = para['text'].find(old_value)
        end = start + len(old_value)
        # Delete old (tracked)
        client.propose_deletion(
            doc_id,
            paragraph_index=i,
            start_char=start,
            end_char=end,
            author='Find/Replace Bot'
        )
        # Insert new (tracked)
        client.insert_text(
            doc_id,
            paragraph_index=i,
            text=new_value,
            position=start,
            author='Find/Replace Bot'
        )
client.download(doc_id, 'contract_updated.docx')
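The loop above handles only the first occurrence in each paragraph, since `str.find` returns a single index. One way to cover repeated matches, sketched under the assumption that character offsets refer to the original paragraph text, is to collect every span first and apply edits from the last match to the first, so earlier offsets are not shifted by later edits. `find_spans` is a hypothetical helper, not an SDK function.

```python
def find_spans(text, needle):
    """Return (start, end) offsets of every non-overlapping match, last match first."""
    if not needle:
        return []
    spans = []
    start = text.find(needle)
    while start != -1:
        end = start + len(needle)
        spans.append((start, end))
        start = text.find(needle, end)  # continue searching after this match
    spans.reverse()
    return spans


print(find_spans('ACME Corp and ACME Corp', 'ACME Corp'))
# → [(14, 23), (0, 9)]
```

Each `(start, end)` pair could then feed one deletion and one insertion call. How the service treats offsets after a tracked deletion is not stated in this guide, so that behavior should be verified on a test document first.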
Extract Track Changes
from docxagent import DocxClient
client = DocxClient()
doc_id = client.upload('reviewed_document.docx')
content = client.read_document(doc_id, include_track_changes=True)
for change in content.get('track_changes', []):
    print(f"Type: {change['type']}")
    print(f"Author: {change['author']}")
    print(f"Date: {change['date']}")
    print(f"Text: {change['text']}")
    print("---")
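For a quick local check without an API call, tracked changes can also be read straight from the file: a .docx is a ZIP archive, and insertions and deletions in the main body are stored in `word/document.xml` as `w:ins` and `w:del` elements. The sketch below uses only the standard library. `list_revisions` is a hypothetical helper, and it ignores headers, footers, footnotes, and formatting-only revisions.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def list_revisions(docx_path):
    """Return tracked insertions and deletions from the main document body."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    revisions = []
    for elem in root.iter():
        if elem.tag == W + "ins":
            kind, text_tag = "insertion", W + "t"
        elif elem.tag == W + "del":
            kind, text_tag = "deletion", W + "delText"  # deleted text uses w:delText
        else:
            continue
        revisions.append({
            "type": kind,
            "author": elem.get(W + "author"),
            "date": elem.get(W + "date"),
            "text": "".join(t.text or "" for t in elem.iter(text_tag)),
        })
    return revisions
```

The returned dictionaries deliberately mirror the `type`, `author`, `date`, and `text` keys printed above, so the same reporting loop can consume either source.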
Generate Report from Multiple Documents
import os
from docxagent import DocxClient
import json
client = DocxClient()
def analyze_documents(folder):
    """Extract all comments and changes from documents in a folder."""
    report = []
    for filename in os.listdir(folder):
        if filename.endswith('.docx'):
            doc_id = client.upload(os.path.join(folder, filename))
            content = client.read_document(doc_id, include_track_changes=True)
            doc_report = {
                'filename': filename,
                'paragraph_count': len(content['paragraphs']),
                'changes': content.get('track_changes', []),
                'comments': content.get('comments', [])
            }
            report.append(doc_report)
    return report
report = analyze_documents('legal_documents/')
print(json.dumps(report, indent=2))
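A raw JSON dump gets long quickly, so a summary per author is often easier to scan. The sketch below assumes each entry in `changes` is a dictionary with an `author` key, as in the extraction example earlier; `changes_per_author` is a hypothetical helper.

```python
from collections import Counter


def changes_per_author(report):
    """Count tracked changes per author across all documents in a report."""
    counts = Counter()
    for doc_report in report:
        for change in doc_report.get('changes', []):
            # Fall back to 'unknown' when a change record has no author field.
            counts[change.get('author', 'unknown')] += 1
    return counts
```

Calling `changes_per_author(report).most_common()` lists authors from most to fewest changes.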
Performance Considerations
python-docx
- Fast, runs locally
- Memory usage scales with document size
- No network latency
DocMods API
- Network round-trip for each operation
- Better for complex operations
- Batching reduces overhead
Optimization Tips
# BAD: Multiple API calls
for para_idx in range(10):
    client.add_comment(doc_id, para_idx, 'Comment', 'Bot')
# BETTER: Batch operations when possible
comments = [(i, 'Comment', 'Bot') for i in range(10)]
client.add_comments_batch(doc_id, comments)
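If a batch call limits how many items it accepts (an assumption; this guide does not state a limit), the list can be split into fixed-size chunks before sending. A minimal sketch with a hypothetical `chunked` helper:

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from a sequence."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


for batch in chunked([(i, 'Comment', 'Bot') for i in range(5)], 2):
    print(len(batch))
```

Each `batch` would then be passed to `client.add_comments_batch(doc_id, batch)`, giving one round trip per chunk instead of one per comment.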
When to Use What
| Scenario | Recommended Tool |
|---|---|
| Simple text replacement | python-docx |
| Template filling | python-docx |
| Adding track changes | DocMods or Aspose |
| Adding comments | DocMods or Aspose |
| Bulk format changes | python-docx |
| Legal/compliance docs | DocMods or Aspose |
| One-time scripts | python-docx |
| Production workflows | DocMods (API reliability) |
The Bottom Line
python-docx is excellent for basic DOCX manipulation, but its API offers no way to add track changes. With the feature request open since 2015 (Issue #340), that is unlikely to change soon.
For workflows requiring track changes:
- DocMods SDK: API-based, straightforward, pay-per-use
- Aspose.Words: Commercial library, comprehensive, offline
- Spire.Doc: Similar to Aspose, different licensing
Choose based on your needs:
- Occasional use → DocMods API
- High volume, offline → Aspose.Words
- Basic editing only → python-docx (free, no track changes)
The investment in proper track changes support pays off when you need audit trails, collaborative review, or document compliance.



