A side-by-side HTML diff program for Windows
shtmldiff.py is a Python implementation of a side-by-side HTML diff program designed to run on Windows. It is based on the excellent IETF rfcdiff tool.
2021-01-20: Updated to Python 3. See Changes in v1.3.
An Example
shtmldiff.py lao.txt tzu.txt
| | lao.txt | tzu.txt | |
|---|---|---|---|
| 1 | The Way that can be told of is not the eternal Way; | | |
| 2 | The name that can be named is not the eternal name. | | |
| 3 | The Nameless is the origin of Heaven and Earth; | The Nameless is the origin of Heaven and Earth; | 1 |
| 4 | The Named is the mother of all things. | The named is the mother of all things. | 2 |
| | | ¶ | 3 |
| 5 | Therefore let there always be non-being, | Therefore let there always be non-being, | 4 |
| 6 | so we may see their subtlety, | so we may see their subtlety, | 5 |
| 7 | And let there always be being, | And let there always be being, | 6 |
| 8 | so we may see their outcome. | so we may see their outcome. | 7 |
| 9 | The two are the same, | The two are the same, | 8 |
| 10 | But after they are produced, | But after they are produced, | 9 |
| 11 | they have different names. | they have different names. | 10 |
| | | They both may be called deep and profound. | 11 |
| | | Deeper and more profound, | 12 |
| | | The door of all subtleties! | 13 |
| | End of changes. 1 change block. | | |
| | 3 lines changed or deleted | 5 lines changed or added | |
shtmldiff.py is written in Python 3. We assume here that you have Python 3 installed on your system. If not, you can download it from Python Releases for Windows.
Default behaviour
The default behaviour of shtmldiff.py is to create an .html file in the current working directory with a name like
file2-from-1.diff.html
and to open it automatically in the default browser.
Line numbering is turned on by default (because that's the way we like it) and tabs are expanded to 8 characters.
All these behaviours can be changed using command-line options. For more information use the help option:
shtmldiff.py --help
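If you want to run the script from another Python program, the options can be assembled as an argument list. The following is only a sketch: the helper name build_command and the output name changes.html are hypothetical, it assumes shtmldiff.py sits in the current directory, and it uses only flags listed in the script's help text.

```python
import sys


def build_command(file1, file2, output=None, browse=True, width=None, tabstop=None):
    """Return an argument list for running shtmldiff.py (hypothetical helper)."""
    # Assumption: shtmldiff.py is in the current working directory.
    cmd = [sys.executable, "shtmldiff.py"]
    if not browse:
        cmd.append("-b")              # --no-browse: do not open the browser
    if output is not None:
        cmd += ["-f", output]         # --filename: name of the html output
    if width is not None:
        cmd += ["-w", str(width)]     # --width: max characters per half
    if tabstop is not None:
        cmd += ["-t", str(tabstop)]   # --tabstop: tab expansion width
    cmd += [file1, file2]
    return cmd


if __name__ == "__main__":
    print(build_command("lao.txt", "tzu.txt", output="changes.html",
                        browse=False, tabstop=4))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to perform the comparison.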
Syntax
C:\Python38\python.exe C:/!Data/Python/shtmldiff/shtmldiff.py --help
usage: shtmldiff.py [-h] [-v] [-L] [-w WIDTH] [+l] [-l] [+b] [-b] [-t TABSTOP] [-i] [-s] [-a] [-f FILENAME] [--stdout] [-d] file1 file2

A side-by-side html diff program

positional arguments:
  file1
  file2

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -L, --license         show licence and exit
  -w WIDTH, --width WIDTH
                        set maximum width in characters for each half of display (default=no max)
  +l, --linenum         show linenumbers for each line (default)
  -l, --no-linenum      do not show linenumbers for each line
  +b, --browse          show html output in browser (default)
  -b, --no-browse       do not show html output in browser
  -t TABSTOP, --tabstop TABSTOP
                        set tabstop for expanding tabs to spaces (default=8)
  -i, --ignore-case     consider upper- and lower-case to be the same
  -s, --ignore-space-change
                        ignore changes in the amount of white space
  -a, --ignore-all-space
                        ignore all white space
  -f FILENAME, --filename FILENAME
                        specify filename for html output (default=`file2-from-1.diff.html`)
  --stdout              send output to stdout instead of to a file
  -d                    print debugging output to stderr, `-dd` show more

If file1 or file2 is `-`, read standard input.

Process finished with exit code 0
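The paired +l/-l and +b/-b switches are unusual. One way to get options like these in Python is argparse's prefix_chars parameter, which lets both + and - introduce an option. The sketch below covers only three of the options and is an illustration of that technique, not the script's actual source; the function name make_parser is hypothetical.

```python
import argparse


def make_parser():
    """Build a small parser that accepts both '+' and '-' prefixed options."""
    parser = argparse.ArgumentParser(
        prog="shtmldiff.py",
        prefix_chars="-+",  # allow options that start with '+' as well as '-'
        description="A side-by-side html diff program",
    )
    parser.add_argument("file1")
    parser.add_argument("file2")
    # +l and -l write to the same destination, so the last one given wins.
    parser.add_argument("+l", "--linenum", dest="linenum", action="store_true",
                        default=True, help="show line numbers (default)")
    parser.add_argument("-l", "--no-linenum", dest="linenum", action="store_false",
                        help="do not show line numbers")
    parser.add_argument("-t", "--tabstop", type=int, default=8,
                        help="tabstop for expanding tabs to spaces (default=8)")
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args(["-l", "-t", "4", "lao.txt", "tzu.txt"])
    print(args.linenum, args.tabstop, args.file1, args.file2)
```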
A Larger Example
Here's an example of running shtmldiff to show changes in a larger file. This is a comparison of the original wdiff.c source file with our new one adapted for Windows.
Rfcdiff Tool
We really, really like the rfcdiff tool used at the IETF Rfcdiff Web Service to compare documents in a side-by-side html diff format.
Unfortunately for Windows users, the original rfcdiff tool (available here) only works on Linux platforms because it is written in bash/awk and relies on the GNU diffutils 'diff' and 'wdiff'. You could upload your files to the IETF website and rather messily copy the resulting html. We wanted a version to use as a stand-alone program on Windows.
Yes, there are other similar programs that do this but we prefer the rfcdiff style and the way it shows differences by word rather than individual characters.
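To illustrate what comparing by word means, here is a small sketch that uses Python's standard difflib module on lists of words. This is our illustration only: shtmldiff.py relies on the external wdiff program for this, and the function name word_changes is hypothetical.

```python
import difflib


def word_changes(old_line, new_line):
    """Return the word-level differences between two lines of text."""
    old_words = old_line.split()
    new_words = new_line.split()
    # Compare sequences of whole words rather than individual characters.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


if __name__ == "__main__":
    print(word_changes("The Named is the mother of all things.",
                       "The named is the mother of all things."))
    # prints [('replace', ['Named'], ['named'])]
```

The whole word is reported as replaced, even though only one character differs, which is the behaviour we prefer for reading prose.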
So we wrote our own version for Windows in Python, using the style and logic from Henrik Levkowetz's original code, and released it under the same GNU licence. It's a simplified version that merely compares two text files without any RFC-specific preprocessing.
Diffutils dependencies
The rfcdiff program also requires the Linux-specific GNU utilities wdiff and diff, which are not standard on Windows. There is a Windows version of diff (a copy is provided below) but no version of wdiff that we could find. So we created a wdiff.exe program that works on Windows, also available below. For more information, see our wdiff for Windows page.
Downloads
- shtmldiff complete setup script including diffutils executables: shtmldiff-1.3.zip (113 kB).
- shtmldiff.py source only: shtmldiff-1.3.src.zip (7.4 kB).
- wdiff.exe and diff.exe executables only: wdiff-0.5.1W.bin.zip (106 kB).
- test files: shtmldiff-tests.zip (0.9 kB): lao.txt and tzu.txt.
Old Python 2.7 version
- shtmldiff complete setup script including diffutils executables: shtmldiff-1.2.zip (110 kB).
- shtmldiff.py source only: shtmldiff-1.2.src.zip (7.1 kB).
Installation
The complete setup script (the first download above) will install everything, including the two diffutils executables. To install:
- Download the shtmldiff-X.X.zip file (where "X.X" represents the version, e.g. "1.3").
- Open a command-line window in the same directory.
- Type pip install shtmldiff-X.X.zip

This will copy the Python script and the required diff exe files into \PythonXX\Scripts.
If you don't want this, or the setup program doesn't work for you, download the executable files separately and copy the files wdiff.exe and diff.exe to a directory somewhere on your Windows System PATH where the script shtmldiff.py will find them.
If the exe files cannot be found, you should get an error like
OSError: 'wdiff' failed: [Error 2] The system cannot find the file specified
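Before running the script you can check whether the two executables are visible on your PATH. A minimal sketch using the standard shutil.which function; the helper name missing_tools is hypothetical, and this check is not part of shtmldiff.py itself.

```python
import shutil


def missing_tools(names=("wdiff", "diff")):
    """Return the names in `names` that cannot be found on the PATH."""
    # On Windows, shutil.which also tries the PATHEXT extensions such as .exe.
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("wdiff and diff were both found.")
```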
To uninstall, type pip uninstall shtmldiff
Changes in v1.3
- Code updated for Python 3.
- Detects UTF-8 or Latin-1 (ISO-8859-1) encoding in input files automatically.
- Output is now always UTF-8 encoded. The --latin1 option has been removed.
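One common way to tell UTF-8 from Latin-1 is to try decoding as UTF-8 first and fall back to Latin-1 if that fails. The sketch below shows that approach; we are assuming it for illustration, the script's actual detection logic may differ, and the function name decode_text is hypothetical.

```python
def decode_text(raw):
    """Decode bytes as UTF-8 if valid, otherwise as Latin-1.

    Returns a (text, encoding_name) tuple.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte value to a character, so this cannot fail.
        return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    print(decode_text("café".encode("utf-8")))
    print(decode_text("café".encode("latin-1")))
```

Plain ASCII input is valid UTF-8, so it is reported as UTF-8 by this approach.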
Can I use the Python code on a Linux system?
You may need to make a couple of changes. Some Windows-specific techniques are used.
- Double quotes (") are used in command-line calls instead of Linux single quotes (')
- CR-LF line endings are assumed in the output.
Of course, you could just download and use rfcdiff directly.
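If you do adapt the code, both differences listed above can be sidestepped in Python. Passing a command as a list to subprocess.run avoids shell quoting altogether, and the newline argument of open controls the line endings that are written. This is a sketch of those two techniques under our own assumptions, not a patch for shtmldiff.py; the names run_tool and write_html are hypothetical.

```python
import subprocess
import sys


def run_tool(args):
    """Run a command given as a list, so no shell quoting is involved."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout


def write_html(path, text, newline="\n"):
    """Write text as UTF-8 with explicit line endings."""
    # newline="\r\n" would produce Windows-style CR-LF endings instead.
    with open(path, "w", encoding="utf-8", newline=newline) as handle:
        handle.write(text)


if __name__ == "__main__":
    # The argument contains both kinds of quote and still arrives intact.
    code, out = run_tool([sys.executable, "-c",
                          "import sys; print(sys.argv[1])", "a 'b' \"c\""])
    print(code, out.strip())
```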
Contact us
To comment on this page or to contact us, please send us a message.
This page first published 12 June 2016. Last updated 21 January 2021.