Working with Binary Files within CVS

by Jeff Hunter, Sr. Database Administrator


While the most common use for CVS is to store text files, there are many times when developers will need to include several binary files within a CVS module. When working with text files, CVS can perform merges on different revisions of a file, display the difference between revisions of a file in a human-visible fashion, and several other such operations. With binary files, however, CVS is not able to perform these types of operations.

Most developers are familiar with CVS's ability to show the difference between two revisions of a file. For example, if a developer checked in a new version of a file, they may wish to look at what they changed and determine whether their changes are satisfactory. When working with text files, CVS can easily provide this functionality via the cvs diff command. With binary files, it may be possible to extract the two revisions and then compare them with a tool external to CVS (for example, a word processor or graphics program).
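
As a rough sketch of that external comparison, the helpers below pull a single revision out of the repository with cvs update -p (which writes the revision to standard output rather than changing the working copy) and then check whether two extracted copies differ byte for byte. The function names, file names, and revision numbers are hypothetical, and the file is assumed to be stored with binary handling.

```shell
# Hypothetical helper: write revision $2 of file $1 to the path $3.
# 'cvs update -p' sends the requested revision to standard output
# instead of touching the working copy.
extract_revision() {
    cvs -q update -p -r "$2" "$1" > "$3"
}

# Hypothetical helper: report whether two extracted copies are
# byte-for-byte identical. 'cmp -s' is silent and only sets its exit status.
copies_differ() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}
```

A session might run extract_revision logo.gif 1.1 logo-1.1.gif, repeat that for revision 1.2, and then call copies_differ logo-1.1.gif logo-1.2.gif before opening both copies in a graphics program.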

The ability to merge two revisions of a file is another CVS capability. Merging happens when a developer makes changes in a separate working directory, and also when one merges explicitly with the "update -j" command. For text files, CVS can merge the changes and, if a change creates a conflict, signal the developer to resolve it. With binary files, the best CVS can do is present the two different copies of the file and leave it to the developer to resolve the conflict. The developer may choose one copy or the other, or may run an external merge tool that understands the file's binary format. Keep in mind that a manual merge relies on the user not accidentally omitting some changes, and is therefore error prone.

As you can see, this process can be undesirable, and the best choice may be to avoid merging altogether. To avoid the merges that result from separate working directories, consider reserved checkouts (file locking).

There are two issues with using CVS to store binary files. The first is that CVS, by default, converts line endings between the canonical form in which they are stored in the repository (linefeed only) and the form appropriate to the operating system in use on the client (for example, carriage return followed by linefeed for Windows NT).

The second is that a binary file might happen to contain data which looks like a keyword, so keyword expansion must be turned off.
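
To make the keyword risk concrete, here is a minimal sketch of a check that could be run before adding a file: it looks for byte sequences that match the standard CVS keyword names, either bare (such as $Id$) or already expanded. The function name is hypothetical, and how grep treats files containing NUL bytes can vary between implementations, so treat this as an illustration rather than a reliable detector.

```shell
# Hypothetical pre-add check: succeed (exit status 0) if the file contains
# something CVS would treat as a keyword, such as $Id$ or $Revision: 1.2 $.
# LC_ALL=C makes grep compare raw bytes instead of locale-specific characters.
has_cvs_keyword() {
    LC_ALL=C grep -q -E \
        '\$(Author|Date|Header|Id|Locker|Log|Name|RCSfile|Revision|Source|State)(:[^$]*)?\$' \
        "$1"
}
```

For example, has_cvs_keyword image.bin && echo "add this file with -kb" would flag a file (image.bin is a placeholder name) whose contents CVS could otherwise rewrite.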

The '-kb' option available with some CVS commands ensures that neither line ending conversion nor keyword expansion will be done.

Here is an example of how you can create a new file using the '-kb' flag:

$ echo '$Id$' > kotest
$ cvs add -kb -m"A test file" kotest
$ cvs ci -m"First checkin; contains a keyword" kotest

If a file accidentally gets added without '-kb', one can use the cvs admin command to recover. For example:

$ echo '$Id$' > kotest
$ cvs add -m"A test file" kotest
$ cvs ci -m"First checkin; contains a keyword" kotest
$ cvs admin -kb kotest
$ cvs update -A kotest
# For non-unix systems:
# Copy in a good copy of the file from outside CVS
$ cvs commit -m "make it binary" kotest

When you check in the file 'kotest', it is not preserved as a binary file, because you did not check it in as a binary file. The cvs admin -kb command sets the default keyword substitution method for this file, but it does not alter the working copy of the file that you have. If you need to cope with line endings (that is, you are using CVS on a non-Unix system), then you need to check in a new copy of the file, as shown by the cvs commit command above. On Unix, the cvs update -A command suffices.
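
After a recovery like this, it can be useful to confirm that the working copy now carries the binary flag. The sketch below assumes the usual cvs status output, which includes a "Sticky Options:" line, and simply searches that output for '-kb'. The function name is hypothetical.

```shell
# Hypothetical check: read 'cvs status' output on standard input and
# succeed (exit status 0) if the file carries the -kb sticky option.
status_shows_binary() {
    grep -q 'Sticky Options:[[:space:]]*-kb'
}
```

It would be used as cvs status kotest | status_shows_binary && echo "kotest is binary".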

However, when using cvs admin -k to change the keyword expansion, be aware that the keyword expansion mode is not version controlled. This means, for example, that if you have a text file in old releases and a binary file with the same name in new releases, CVS provides no way to check out the file in text or binary mode depending on what version you are checking out. There is no good workaround for this problem.

You can also set a default for whether cvs add and cvs import treat a file as binary based on its name; for example, you could say that files whose names end in '.exe' are binary. See the cvswrappers file below. There is currently no way to have CVS detect whether a file is binary based on its contents. The main difficulty with designing such a feature is that it is not clear how to distinguish between binary and non-binary files, and the rules to apply would vary considerably with the operating system.

It is possible to set the default behavior that CVS will use for recognizing binary files with cvs add and cvs import operations by configuring the cvswrappers file within the CVSROOT module.
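
One possible way to maintain that file is sketched below: check out the CVSROOT module, append a rule only if it is not already present, and commit the result. Both function names and the commit message are hypothetical, and the second helper assumes it is run from a scratch directory with CVSROOT already set.

```shell
# Hypothetical helper: append a wrapper rule ($2) to a cvswrappers file ($1)
# unless an identical line is already present.
add_wrapper_rule() {
    file=$1
    rule=$2
    grep -q -x -F -e "$rule" "$file" 2>/dev/null || printf '%s\n' "$rule" >> "$file"
}

# Hypothetical helper: check out CVSROOT into the current directory,
# add the rule given as $1, and commit the updated cvswrappers file.
publish_wrapper_rule() {
    cvs checkout CVSROOT &&
    add_wrapper_rule CVSROOT/cvswrappers "$1" &&
    ( cd CVSROOT && cvs commit -m "Add wrapper rule" cvswrappers )
}
```

For instance, publish_wrapper_rule "*.gif -k 'b'" would register GIF files as binary for later cvs add and cvs import operations.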

The cvswrappers File

Wrappers allow you to set a hook which transforms files on their way in and out of CVS.

The file 'cvswrappers' defines the script that will be run on a file when its name matches a regular expression. There are two scripts that can be run on a file or directory. One script is executed on the file/directory before it is checked into the repository (this is denoted with the -t flag) and the other when the file is checked out of the repository (this is denoted with the -f flag). The '-t'/'-f' feature does not work with client/server CVS.

The 'cvswrappers' file also has a '-m' option to specify the merge methodology that should be used when a non-binary file is updated. MERGE means the usual CVS behavior: try to merge the files. COPY means that cvs update will refuse to merge files, as it also does for files specified as binary with '-kb' (so if the file is specified as binary, there is no need to specify '-m 'COPY''). CVS will provide the user with the two versions of the file and require the user, using mechanisms outside CVS, to insert any necessary changes.

WARNING: do not use COPY with CVS 1.9 or earlier - such versions of CVS will copy one version of your file over the other, wiping out the previous contents. The '-m' wrapper option only affects behavior when merging is done on update; it does not affect how files are stored.

The basic format of the file 'cvswrappers' is:

wildcard     [option value][option value]...

where option is one of

-f           from cvs filter         value: path to filter
-t           to cvs filter           value: path to filter
-m           update methodology      value: MERGE or COPY
-k           keyword expansion       value: expansion mode

and value is a single-quote delimited value. For example:

*.nib    -f 'unwrap %s' -t 'wrap %s %s' -m 'COPY'
*.c      -t 'indent %s %s'

The above example of a 'cvswrappers' file states that all files/directories that end with .nib should be filtered with the 'wrap' program before being checked into the repository. The file should be filtered through the 'unwrap' program when it is checked out of the repository. The 'cvswrappers' file also states that a COPY methodology should be used when updating the files in the repository (that is, no merging should be performed).

The last example line says that all files that end with .c should be filtered with 'indent' before being checked into the repository. Unlike the previous example, no filtering of the .c file is done when it is checked out of the repository. The -t filter is called with two arguments: the first is the name of the file/directory to filter, and the second is the pathname where the resulting filtered file should be placed.
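
To illustrate that two-argument calling convention, here is a minimal sketch of what a '-t' filter along the lines of 'wrap' could look like. It is not the actual 'wrap' program: the function name and the choice of tar as the packing format are assumptions made only for this example.

```shell
# Hypothetical stand-in for a '-t' filter such as 'wrap':
#   $1 is the file or directory to filter,
#   $2 is the path where the filtered result must be written.
# Packing with tar is an assumption made for illustration.
wrap_sketch() {
    src=$1
    dest=$2
    # Archive the source by its base name so the archive does not
    # record the absolute path of the working directory.
    tar -cf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}
```

A real filter would be installed as a standalone script and named in the '-t' value, with the two %s placeholders standing for the two arguments.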

The -f filter is called with one argument, which is the name of the file to filter from. The end result of this filter will be a file in the user's directory that they can work on as they normally would.

Note that the '-t'/'-f' features do not conveniently handle one portion of CVS's operation: determining when files are modified. CVS will still want a file (or directory) to exist, and it will use its modification time to determine whether a file is modified. If CVS erroneously thinks a file is unmodified (for example, a directory is unchanged but one of the files within it is changed), you can force it to check in the file anyway by specifying the '-f' option to cvs commit.

For another example, the following command imports a directory, treating files whose name ends in '.exe' as binary:

$ cvs import -I ! -W "*.exe -k 'b'" first-dir vendortag reltag

Here is an example cvswrappers file that can be checked into the CVSROOT module:

# This file affects handling of files based on their names.
# The -m option specifies whether CVS attempts to merge files.
# The -k option specifies keyword expansion (e.g. -kb for binary).
# Format of wrapper file ($CVSROOT/CVSROOT/cvswrappers or .cvswrappers)
#  wildcard	[option value][option value]...
#  where option is one of
#  -m		update methodology	value: MERGE or COPY
#  -k		expansion mode		value: b, o, kkv, &c
#  and value is a single-quote delimited value.
# For example:
#*.gif -k 'b'
*.CLASS   -k 'b' -m 'COPY'
*.DOC   -k 'b' -m 'COPY'
*.EAR   -k 'b' -m 'COPY'
*.GIF   -k 'b' -m 'COPY'
*.JPG   -k 'b' -m 'COPY'
*.PDF   -k 'b' -m 'COPY'
*.TAR   -k 'b' -m 'COPY'
*.WAR   -k 'b' -m 'COPY'
*.ZIP   -k 'b' -m 'COPY'
*.avi   -k 'b' -m 'COPY'
*.bin   -k 'b' -m 'COPY'
*.bz    -k 'b' -m 'COPY'
*.bz2   -k 'b' -m 'COPY'
*.class   -k 'b' -m 'COPY'
*.doc   -k 'b' -m 'COPY'
*.ear   -k 'b' -m 'COPY'
*.exe   -k 'b' -m 'COPY'
*.gif   -k 'b' -m 'COPY'
*.gz    -k 'b' -m 'COPY'
*.hqx   -k 'b' -m 'COPY'
*.jar   -k 'b' -m 'COPY'
*.jpeg  -k 'b' -m 'COPY'
*.jpg   -k 'b' -m 'COPY'
*.mov   -k 'b' -m 'COPY'
*.mp3   -k 'b' -m 'COPY'
*.mpg   -k 'b' -m 'COPY'
*.pdf   -k 'b' -m 'COPY'
*.png   -k 'b' -m 'COPY'
*.ppt   -k 'b' -m 'COPY'
*.rpm   -k 'b' -m 'COPY'
*.sit   -k 'b' -m 'COPY'
*.srpm  -k 'b' -m 'COPY'
*.swf   -k 'b' -m 'COPY'
*.tar   -k 'b' -m 'COPY'
*.tbz   -k 'b' -m 'COPY'
*.tgz   -k 'b' -m 'COPY'
*.tif   -k 'b' -m 'COPY'
*.tiff  -k 'b' -m 'COPY'
*.war   -k 'b' -m 'COPY'
*.xbm   -k 'b' -m 'COPY'
*.xls   -k 'b' -m 'COPY'
*.zip   -k 'b' -m 'COPY'
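
Because wildcard matching is case-sensitive on many systems, the list above repeats some extensions in upper case. Rather than typing both forms by hand, entries like these could be generated. The following is a small sketch, with a hypothetical function name, that prints a rule for each extension and for its upper-case variant:

```shell
# Hypothetical generator: for each extension given as an argument, print a
# binary/COPY wrapper rule for it and for its upper-case form.
wrapper_rules() {
    for ext in "$@"; do
        upper=$(printf '%s' "$ext" | tr '[:lower:]' '[:upper:]')
        printf "*.%s   -k 'b' -m 'COPY'\n" "$ext"
        # Skip the second line when upper-casing changes nothing.
        if [ "$upper" != "$ext" ]; then
            printf "*.%s   -k 'b' -m 'COPY'\n" "$upper"
        fi
    done
}
```

Running wrapper_rules gif jpg pdf and appending the output to the checked-out cvswrappers file would cover both spellings of each extension.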

Last modified on: Saturday, 18-Sep-2010 18:16:17 EDT