<http://www.netikka.net/tsneti/info/tscmd028.htm>
Copyright © 2003-2009 by Prof. Timo Salmi  
Last modified Wed 18-Feb-2009 15:30:49

 
Assorted NT/2000/XP/.. CMD.EXE Script Tricks
From the html version of the tscmd.zip 1cmdfaq.txt file

This page is edited from the 1cmdfaq.txt faq-file contained in my tscmd.zip command line interface (CLI) collection. That zipped file has much additional material, including a number of detached .cmd script files. It is recommended that you also get the zipped version as a companion.

Please see "The Description and the Index page" for the conditions of usage and other such information.



28} How to convert a file written in IBM PC characters into LATIN1?

This is another task that is best suited to SED, the Stream EDitor.
@echo off & setlocal enableextensions
::
:: Convert IBM PC characters into LATIN1 characters
:: Requires SED.EXE
::
:: Make a test file with PC characters
echo This is a test file in Finnish (which uses Scandinavian characters)>testin.txt
echo Tämä on testitiedosto ääkösten testaamiseksi.>>testin.txt
echo Lisää: åäö ÅÄÖ>>testin.txt
echo åäöüéçæñÅÄÖÜÉÇÆÑ>>testin.txt
::
:: Optionally, create a SED command file
echo.y/åäöüéçæñÅÄÖÜÉÇÆÑ/\xE5\xE4\xF6\xFC\xE9\xE7\xE6\xF1\xC5\xC4\xD6\xDC\xC9\xC7\xC6\xD1/>"%TEMP%\IBM2LAT1.SED"
::
:: Do the conversion
sed -f"%TEMP%\IBM2LAT1.SED" testin.txt > testout.txt
::
:: See that the result is what one expected
notepad testout.txt
::
:: Clean up
for %%f in ("%TEMP%\IBM2LAT1.SED" testin.txt testout.txt) do (
  if exist %%f del %%f)
endlocal & goto :EOF

The contents of testout.txt:
This is a test file in Finnish (which uses Scandinavian characters)
Tämä on testitiedosto ääkösten testaamiseksi.
Lisää: åäö ÅÄÖ
åäöüéçæñÅÄÖÜÉÇÆÑ

There is a catch if you are using the sed.exe from sed15x.zip, as is mostly done in this FAQ: the file names must be in the SFN 8+3 format. If we use the GnuWin32 sed.exe instead, that will not pose a problem. Below, the GnuWin32 sed.exe has been renamed to unxsed.exe

@echo off & setlocal enableextensions
::
:: Convert IBM PC characters into LATIN1 characters
:: Requires SED.EXE (renamed here to UNXSED.EXE)
::
:: Make a test file with PC characters
echo This is a test file in Finnish (which uses Scandinavian characters)>"testin.txt"
echo Tämä on testitiedosto ääkösten testaamiseksi.>>"testin.txt"
echo Lisää: åäö ÅÄÖ>>"testin.txt"
echo åäöüéçæñÅÄÖÜÉÇÆÑ>>"testin.txt"
::
:: Another way to build the SED command file
set temp_=%temp%
if defined mytemp if exist "%mytemp%\" set temp_=%mytemp%
set sedcmd=%temp_%\sedcmd.tmp
echo s/å/\xE5/g >  "%sedcmd%"
echo s/ä/\xE4/g >> "%sedcmd%"
echo s/ö/\xF6/g >> "%sedcmd%"
echo s/ü/\xFC/g >> "%sedcmd%"
echo s/é/\xE9/g >> "%sedcmd%"
echo s/ç/\xE7/g >> "%sedcmd%"
echo s/æ/\xE6/g >> "%sedcmd%"
echo s/ñ/\xF1/g >> "%sedcmd%"
echo s/Å/\xC5/g >> "%sedcmd%"
echo s/Ä/\xC4/g >> "%sedcmd%"
echo s/Ö/\xD6/g >> "%sedcmd%"
echo s/Ü/\xDC/g >> "%sedcmd%"
echo s/É/\xC9/g >> "%sedcmd%"
echo s/Ç/\xC7/g >> "%sedcmd%"
echo s/Æ/\xC6/g >> "%sedcmd%"
echo s/Ñ/\xD1/g >> "%sedcmd%"
::
:: Do the conversion
unxsed --text -f"%sedcmd%" "testin.txt" > "testout.txt"
::
:: See that the result is what one expected
notepad "testout.txt"
::
:: Clean up
for %%f in ("%sedcmd%" "testin.txt" "testout.txt") do (
  if exist %%f del %%f)
endlocal & goto :EOF

The problem can also be solved with a Visual Basic Script. VBScript has the advantage of being a part of the original XP command environment. On the other hand, the solution is clearly more complicated than the very simple SED solution. Anyway, first cut and paste the following script. Name it e.g. IBM2LAT1.VBS and then call it using

CSCRIPT //NOLOGO "IBM2LAT1.VBS" < "MYIBM.TXT" > "MYLATIN1.TXT"

' IBM2LAT1.VBS by Prof. Timo Salmi
'
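' Define the relevant characters
' Note: as rendered on this page the two strings look identical. In the
' script file itself IbmChar presumably has to hold the IBM PC codes of
' these characters and Lat1Char their LATIN1 codes.
Const IbmChar = "åäöüéçæñÅÄÖÜÉÇÆÑ"
Const Lat1Char = "åäöüéçæñÅÄÖÜÉÇÆÑ"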
'
' Define StandardIn and StandardOut
Dim StdIn, StdOut
Set StdIn = WScript.StdIn
Set StdOut = WScript.StdOut
'
' Convert one IBM character to Latin1
Function CharIbm2Lat1(char)
  Dim p
  ' Binary comparison (0) keeps upper and lower case letters distinct
  p = InStr(1, IbmChar, char, 0)
  If p > 0 Then
    CharIbm2Lat1 = Mid(Lat1Char, p, 1)
  Else
    CharIbm2Lat1 = char
  End If
End Function
'
' Convert a string
Function Ibm2Lat1(str1)
  Dim str2
  For i = 1 To Len(str1)
    str2 = str2 & CharIbm2Lat1(Mid(str1,i,1))
  Next
  Ibm2Lat1 = str2
End Function
'
' Convert the input
Dim str
Do While Not StdIn.AtEndOfStream
  str = StdIn.ReadLine
  StdOut.WriteLine Ibm2Lat1(str)
Loop
It is easy to see that other, similar conversion tasks can be done with the same methods after just some slight customization.
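
As a cross-check of the hexadecimal codes used in the SED command files of this item, here is a minimal sketch in Python (which is not otherwise used in this FAQ). It assumes that "IBM PC characters" means code page 850, or 437, which encodes these sixteen characters identically. The names CHARS, TABLE and ibm2lat1 are chosen here for illustration only.

```python
import unicodedata

# The sixteen characters handled by the SED command files in this item.
CHARS = "åäöüéçæñÅÄÖÜÉÇÆÑ"

# Assumption: the IBM PC character set is code page 850 (437 gives the
# same bytes for these sixteen characters).
ibm_bytes = CHARS.encode("cp850")
lat1_bytes = CHARS.encode("latin-1")

# Byte-to-byte translation table, analogous to the SED y/.../.../ command.
TABLE = bytes.maketrans(ibm_bytes, lat1_bytes)


def ibm2lat1(data: bytes) -> bytes:
    """Translate the sixteen IBM PC bytes to LATIN1, leave all others as is."""
    return data.translate(TABLE)


if __name__ == "__main__":
    # List the mapping for comparison with the \xNN codes in the scripts.
    for ch, src, dst in zip(CHARS, ibm_bytes, lat1_bytes):
        print(f"{unicodedata.name(ch):40} IBM {src:02X} -> LATIN1 {dst:02X}")
```

Like the SED scripts, the sketch touches only the sixteen listed characters. Any other byte above 7F is passed through unchanged, so a text containing further accented letters would need a longer table.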


ANSI vs. UNICODE

There is another factor to consider: how a new instance of the Windows XP command interpreter is invoked. See CMD /? for the options. Essentially
   /A Causes the output of internal commands to a pipe or file to be ANSI
   /U Causes the output of internal commands to a pipe or file to be Unicode
It appears that /A is the default.

Using the example of this item we could write
@echo off & setlocal enableextensions
::
:: Convert ANSI characters into UNICODE characters
::
:: Make a test file with PC characters (assuming that CMD has been thus invoked)
echo This is a test file in Finnish (which uses Scandinavian characters)>"testin.txt"
echo Tämä on testitiedosto ääkösten testaamiseksi.>>"testin.txt"
echo Lisää: åäö ÅÄÖ>>"testin.txt"
echo åäöüéçæñÅÄÖÜÉÇÆÑ>>"testin.txt"
::
:: Do a conversion to Unicode (/a would be to ANSI)
cmd /u /c type "testin.txt" > "testout.txt"
::
:: See that the result is what one expected
notepad "testout.txt"
::
:: Clean up
for %%f in ("testin.txt" "testout.txt") do if exist %%f del %%f
endlocal & goto :EOF

with Notepad we get


but in (a default opened) CLI we get


The above raises an additional question: What actually is in the above file? In HEX


It is obvious that there is much padding with NUL (00) characters. If you wish to filter them out, the easiest solution is to use a UNIX tr port. Let's rename it unxtr.exe for identification. Then
@echo off & setlocal enableextensions
type "testout.txt"|unxtr -d \000
endlocal & goto :EOF
will give


However, if a TR.EXE port is not available, a Visual Basic Script (VBScript) aided command line script can be applied:
@echo off & setlocal enableextensions
::
:: Build a Visual Basic Script and run it
set vbs_="%temp%\tmp$$$.vbs"
set skip=
findstr "'%skip%VBS" "%~f0" > %vbs_%
cscript //nologo %vbs_%
::
:: Clean up
for %%f in (%vbs_%) do if exist %%f del %%f
endlocal & goto :EOF
'
'The Visual Basic Script
Dim StdIn, StdOut, char, chr0 'VBS
Set StdIn = WScript.StdIn 'VBS
Set StdOut = WScript.StdOut 'VBS
'
chr0 = Chr(0) 'VBS
Do While Not StdIn.AtEndOfStream 'VBS
  char = StdIn.Read(1) 'VBS
  If char <> chr0 Then 'VBS
    StdOut.Write char 'VBS
  End If 'VBS
Loop 'VBS

Usage: cmdfaq < "testout.txt"
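
To illustrate where the NUL padding comes from and what filtering it out leaves behind, here is a minimal sketch in Python (not otherwise used in this FAQ). It assumes that CMD /U writes UTF-16 little-endian without a byte order mark. The function name strip_nuls is chosen here for illustration only.

```python
def strip_nuls(data: bytes) -> bytes:
    """Delete every NUL byte, in the manner of the tr -d \\000 filter."""
    return data.replace(b"\x00", b"")


def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated two-digit hex codes."""
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    text = "Lisää: åäö ÅÄÖ"
    # Assumption: this is the byte layout that CMD /U produces.
    unicode_bytes = text.encode("utf-16-le")
    print(hexdump(unicode_bytes))              # every second byte is 00
    print(hexdump(strip_nuls(unicode_bytes)))  # the LATIN1 bytes remain
```

The filtering gives LATIN1 text only because every character in the test file has a code below 256, so the high byte of each 16-bit unit is 00. A character such as the euro sign has a non-zero high byte and would come out as two unrelated bytes.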

You might also find the information given by the Windows Character Map, C:\WINDOWS\system32\charmap.exe, of interest.

References/Comments:
  ISO/IEC 8859-1 Wikipedia
  Windows Code Pages
  hh ntcmds.chm::/cmd.htm
  Google Groups May 20 2008, 5:44 pm
  Google Groups May 25 2008, 10:43 am