How can I tell TortoiseHg to display a UTF-16 file as non-binary?

Developer https://www.devze.com 2023-03-17 17:22 Source: Internet

In a Microsoft Access 2007 project, the Access form objects are exported to files by a dedicated tool that uses the built-in function "SaveAsText". This is necessary because Access doesn't store any of its code modules in separate files on its own.

The file starts with the bytes "FF FE" (the UTF-16 little-endian byte order mark, according to http://de.wikipedia.org/wiki/Byte_Order_Mark). I presume that Hg treats this file as binary because it contains many NUL characters. Hence the diff pane in the TortoiseHg Workbench always reports:

File or diffs not displayed: File is binary.

which is quite understandable under this assumption. Nevertheless, this file is just ordinary source code; I can view it in Windows Notepad, for example, without any problems.

Is there any way to tell Mercurial that this particular file should be treated as text, not binary?

Edit: In addition to the accepted answer below, I decided not to change how the files are saved and to use the "Visual Diff" command instead (select the file, then press Ctrl+D).


I'm guessing that you frequently or occasionally export the form objects in order to track source code changes.

The only way to convince Mercurial that a file is not binary is to avoid NUL bytes.

You may want to convert the source code files to ASCII (or maybe ANSI) encoding as an additional step in your export in order to avoid the NUL bytes. If the source code files contain Unicode characters, you might try UTF-8: it uses multi-byte sequences only where necessary and single bytes otherwise, so it avoids NUL bytes as well. I tried it out briefly and Mercurial handles UTF-8: it doesn't show "File is binary" but the actual diff. I committed on the command line but viewed the diff in TortoiseHg. I have a link about command-line encoding challenges below.
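The conversion step suggested above could look roughly like the following. This is only a minimal sketch in Python, not part of the original answer: the helper names `utf16_to_utf8` and `convert_file` are hypothetical, and it assumes the exported files start with a UTF-16 byte order mark, as described in the question.

```python
import codecs
from pathlib import Path


def utf16_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 bytes (with a BOM) as UTF-8 without a BOM."""
    # The "utf-16" codec reads the BOM to pick the byte order and strips it.
    text = raw.decode("utf-16")
    return text.encode("utf-8")


def convert_file(path: Path) -> None:
    """Rewrite one exported file in place as UTF-8 (hypothetical helper)."""
    raw = path.read_bytes()
    # Only touch files that really start with a UTF-16 BOM.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        path.write_bytes(utf16_to_utf8(raw))


# Small in-memory demonstration; no files are touched here.
sample = codecs.BOM_UTF16_LE + "Option Compare Database\r\n".encode("utf-16-le")
converted = utf16_to_utf8(sample)
print(b"\x00" in sample)     # True: every ASCII character is followed by a NUL byte
print(b"\x00" in converted)  # False: the UTF-8 form of this text has no NUL bytes
```

Running `convert_file` over the export directory before committing would be one way to wire this in. Note that files converted this way would have to be converted back to UTF-16 before Access can import them again.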

The hgrc encode/decode sections might be particularly useful in helping to filter the UTF-16 files into something that works better.

A couple of other pages on Mercurial and encoding:

  • Character Encoding On Windows
  • Encoding Strategy

TortoiseHg 2.1 + Mercurial 1.9


From https://www.mercurial-scm.org/wiki/BinaryFiles:

The question naturally arises, what is a binary file anyway? It turns out there's really no good answer to this question, so Mercurial uses the same heuristic that programs like diff(1) use. The test is simply if there are any NUL bytes in a file.

For diff, export, and annotate, this will get things right almost all of the time and it will not attempt to process files it thinks are binary. If necessary, you can force these commands to treat files as text with -a.
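To see why a UTF-16 export trips this test while a UTF-8 one does not, here is a small Python sketch of the heuristic as the wiki describes it. It is an illustration only, not Mercurial's actual code, and `looks_binary` is a made-up name.

```python
import codecs


def looks_binary(data: bytes) -> bool:
    """Return True if the data contains a NUL byte (the heuristic quoted above)."""
    return b"\0" in data


text = "Private Sub Form_Load()\r\n"
as_utf16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
as_utf8 = text.encode("utf-8")

print(looks_binary(as_utf16))  # True: each ASCII character carries a 0x00 high byte
print(looks_binary(as_utf8))   # False: plain ASCII text has no NUL bytes in UTF-8
```

The `-a` option mentioned in the quote applies on the command line, e.g. `hg diff -a`. The file is then diffed as raw bytes, so a UTF-16 file will still show its NUL bytes in the output.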


This didn't exist at the time the question was asked, but now there's the msaccess-vcs-integration project, which exports/imports MS Access objects so that they can be version controlled.

Quote from the project's readme:

Encoding

For Access objects which are normally exported in UCS-2-little-endian encoding, the included module automatically converts the source code to and from UTF-8 encoding during export/import; this is to ensure that you don't have trouble branching, merging, and comparing in tools such as Mercurial which treat any file containing 0x00 bytes as a non-diffable binary file.

If you export your forms and modules with this instead of directly using Access's SaveAsText function, Mercurial will not treat the files as binary.
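The import direction of such a round trip can be sketched in Python as follows. This is not the project's code (the project itself runs inside Access), and the function name `utf8_to_utf16le` is hypothetical. It assumes Access expects the same layout that SaveAsText produced in the question: an "FF FE" byte order mark followed by little-endian UTF-16.

```python
import codecs


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Re-encode UTF-8 bytes as little-endian UTF-16 with a leading BOM."""
    # "utf-8-sig" accepts input with or without a UTF-8 BOM and strips it.
    text = raw.decode("utf-8-sig")
    # "utf-16-le" writes no BOM itself, so the FF FE marker is added explicitly.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


stored = "Caption = \"Übersicht\"\r\n".encode("utf-8")  # what the repository holds
restored = utf8_to_utf16le(stored)

print(restored[:2] == codecs.BOM_UTF16_LE)                 # True
print(restored.decode("utf-16") == stored.decode("utf-8"))  # True
```

The second check shows that the text survives the round trip unchanged, including non-ASCII characters such as "Ü".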
