<http://www.netikka.net/tsneti/info/tscmd028.htm>
Copyright © 2003-2009 by Prof. Timo Salmi  
Last modified Mon 28-Dec-2009 21:10:40

 
Assorted NT/2000/XP/.. CMD.EXE Script Tricks
From the html version of the tscmd.zip 1cmdfaq.txt file
 

This page is edited from the 1cmdfaq.txt faq-file contained in my tscmd.zip command line interface (CLI) collection. That zipped file has much additional material, including a number of detached .cmd script files. It is recommended that you also get the zipped version as a companion.

Please see "The Description and the Index page" for the conditions of usage and other such information.



28} How to convert a file written in IBM PC characters into LATIN1?

This is another task that is best suited to SED, the Stream EDitor.
  @echo off & setlocal enableextensions
  ::
  :: Convert IBM PC characters into LATIN1 characters
  :: Requires SED.EXE
  ::
  :: Make a test file with PC characters
  echo This is a test file in Finnish (which uses Scandinavian characters)>testin.txt
  echo Tämä on testitiedosto ääkösten testaamiseksi.>>testin.txt
  echo Lisää: åäö ÅÄÖ>>testin.txt
  echo åäöüéçæñÅÄÖÜÉÇÆÑ>>testin.txt
  ::
  :: Optionally, create a SED command file
  echo.y/åäöüéçæñÅÄÖÜÉÇÆÑ/\xE5\xE4\xF6\xFC\xE9\xE7\xE6\xF1\xC5\xC4\xD6\xDC\xC9\xC7\xC6\xD1/>"%TEMP%\IBM2LAT1.SED"
  ::
  :: Do the conversion
  sed -f"%TEMP%\IBM2LAT1.SED" testin.txt > testout.txt
  ::
  :: See that the result is what one expected
  notepad testout.txt
  ::
  :: Clean up
  for %%f in ("%TEMP%\IBM2LAT1.SED" testin.txt testout.txt) do (
    if exist %%f del %%f)
  endlocal & goto :EOF

The contents of testout.txt:
  This is a test file in Finnish (which uses Scandinavian characters)
  Tämä on testitiedosto ääkösten testaamiseksi.
  Lisää: åäö ÅÄÖ
  åäöüéçæñÅÄÖÜÉÇÆÑ
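
The y/// command in the SED command file above is a plain byte-for-byte translation table. As a minimal sketch of that mapping outside the CMD toolset of this FAQ, the following Python fragment builds the same table. It assumes that "IBM PC characters" means code page 850 (code page 437 uses the same byte values for these 16 characters). The names CHARS, IBM_TO_LATIN1 and ibm_to_latin1 are illustrative only and do not come from the FAQ.

```python
# The 16 characters handled by the SED command file
CHARS = "åäöüéçæñÅÄÖÜÉÇÆÑ"

# Byte-for-byte translation table, analogous to sed's y/source/dest/ command.
# Assumption: the IBM PC side is code page 850.
IBM_TO_LATIN1 = bytes.maketrans(CHARS.encode("cp850"), CHARS.encode("latin-1"))


def ibm_to_latin1(data: bytes) -> bytes:
    """Translate the 16 listed bytes and leave every other byte unchanged."""
    return data.translate(IBM_TO_LATIN1)


sample = "Lisää: åäö ÅÄÖ".encode("cp850")
converted = ibm_to_latin1(sample)
print(converted.hex(" "))
```

The hex values printed for the accented letters should agree with the \xE5, \xE4, \xF6, ... escapes used in the SED command file.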

A catch: if you are using the sed.exe from sed15x.zip, as is mostly the case in this FAQ, the file names must be in the SFN 8+3 format. The GnuWin32 sed.exe does not have this restriction. In the script below, the GnuWin32 sed.exe has been renamed unxsed.exe.

  @echo off & setlocal enableextensions
  ::
  :: Convert IBM PC characters into LATIN1 characters
  :: Requires SED.EXE (renamed here to UNXSED.EXE)
  ::
  :: Make a test file with PC characters
  echo This is a test file in Finnish (which uses Scandinavian characters)>"testin.txt"
  echo Tämä on testitiedosto ääkösten testaamiseksi.>>"testin.txt"
  echo Lisää: åäö ÅÄÖ>>"testin.txt"
  echo åäöüéçæñÅÄÖÜÉÇÆÑ>>"testin.txt"
  ::
  :: Another way to build the SED command file
  set temp_=%temp%
  if defined mytemp if exist "%mytemp%\" set temp_=%mytemp%
  set sedcmd=%temp_%\sedcmd.tmp
  echo s/å/\xE5/g >  "%sedcmd%"
  echo s/ä/\xE4/g >> "%sedcmd%"
  echo s/ö/\xF6/g >> "%sedcmd%"
  echo s/ü/\xFC/g >> "%sedcmd%"
  echo s/é/\xE9/g >> "%sedcmd%"
  echo s/ç/\xE7/g >> "%sedcmd%"
  echo s/æ/\xE6/g >> "%sedcmd%"
  echo s/ñ/\xF1/g >> "%sedcmd%"
  echo s/Å/\xC5/g >> "%sedcmd%"
  echo s/Ä/\xC4/g >> "%sedcmd%"
  echo s/Ö/\xD6/g >> "%sedcmd%"
  echo s/Ü/\xDC/g >> "%sedcmd%"
  echo s/É/\xC9/g >> "%sedcmd%"
  echo s/Ç/\xC7/g >> "%sedcmd%"
  echo s/Æ/\xC6/g >> "%sedcmd%"
  echo s/Ñ/\xD1/g >> "%sedcmd%"
  ::
  :: Do the conversion
  unxsed --text -f"%sedcmd%" "testin.txt" > "testout.txt"
  ::
  :: See that the result is what one expected
  notepad "testout.txt"
  ::
  :: Clean up
  for %%f in ("%sedcmd%" "testin.txt" "testout.txt") do (
    if exist %%f del %%f)
  endlocal & goto :EOF

The problem can also be solved with a Visual Basic Script. VBScript has the advantage of being a part of the original XP command environment. On the other hand, the solution is clearly more complicated than the very simple SED solution. Anyway, first cut and paste the following script. Name it e.g. IBM2LAT1.VBS and then call it using

  CSCRIPT //NOLOGO "IBM2LAT1.VBS" < "MYIBM.TXT" > "MYLATIN1.TXT"

  ' IBM2LAT1.VBS by Prof. Timo Salmi
  '
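  ' Define the relevant characters
  ' (in the saved script IbmChar must hold the IBM PC byte values
  '  and Lat1Char the LATIN1 byte values of these characters)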
  Const IbmChar  = "åäöüéçæñÅÄÖÜÉÇÆÑ"
  Const Lat1Char = "åäöüéçæñÅÄÖÜÉÇÆÑ"
  '
  ' Define StandardIn and StandardOut
  Dim StdIn, StdOut
  Set StdIn = WScript.StdIn
  Set StdOut = WScript.StdOut
  '
  ' Convert one IBM character to Latin1
  Function CharIbm2Lat1(char)
    Dim p
    p = Instr (1, IbmChar, char, vbBinaryCompare)
    If p > 0 Then
      CharIbm2Lat1 = Mid(Lat1Char, p, 1)
    Else
      CharIbm2Lat1 = char
    End If
  End Function
  '
  ' Convert a string
  Function Ibm2Lat1(str1)
    Dim str2
    For i = 1 To Len(str1)
      str2 = str2 & CharIbm2Lat1(Mid(str1,i,1))
    Next
    Ibm2Lat1 = str2
  End Function
  '
  ' Convert the input
  Dim str
  Do While Not StdIn.AtEndOfStream
    str = StdIn.ReadLine
    StdOut.WriteLine Ibm2Lat1(str)
  Loop

It is easy to see that other, similar conversion tasks can be done with the same methods after only slight customization. To take the most obvious example, consider the conversion in the other direction:
  ' C:\_F\CMD\LAT12IBM.VBS by Prof. Timo Salmi
  ' Usage: CSCRIPT //NOLOGO "LAT12IBM.VBS" < "MYLATIN1.TXT" > "MYIBM.TXT"
  '
  ' Define the relevant characters
  Const Lat1Char = "åäöüéçæñÅÄÖÜÉÇÆÑ"
  Const IbmChar  = "åäöüéçæñÅÄÖÜÉÇÆÑ"
  '
  ' Define StandardIn and StandardOut
  Dim StdIn, StdOut
  Set StdIn = WScript.StdIn
  Set StdOut = WScript.StdOut
  '
  ' Convert one Latin1 character to IBM
  Function CharLat12Ibm(char)
    Dim p
    p = Instr (1, Lat1Char, char, vbBinaryCompare)
    If p > 0 Then
      CharLat12Ibm = Mid(IbmChar, p, 1)
    Else
      CharLat12Ibm = char
    End If
  End Function
  '
  ' Convert a string
  Function Lat12Ibm(str1)
    Dim str2
    For i = 1 To Len(str1)
      str2 = str2 & CharLat12Ibm(Mid(str1,i,1))
    Next
    Lat12Ibm = str2
  End Function
  '
  ' Convert the input
  Dim str
  Do While Not StdIn.AtEndOfStream
    str = StdIn.ReadLine
    StdOut.WriteLine Lat12Ibm(str)
  Loop
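
Swapping the two character lists is all that the reverse direction needs, because none of the 16 IBM byte values coincides with any of the 16 LATIN1 byte values. A minimal sketch of that property in Python, outside the CMD toolset of this FAQ, again assuming code page 850 for the IBM PC side (all names are illustrative):

```python
CHARS = "åäöüéçæñÅÄÖÜÉÇÆÑ"
IBM = CHARS.encode("cp850")       # assumption: IBM PC side is code page 850
LATIN1 = CHARS.encode("latin-1")

# The reverse table is built by simply swapping the two arguments
ibm_to_latin1 = bytes.maketrans(IBM, LATIN1)
latin1_to_ibm = bytes.maketrans(LATIN1, IBM)

# The two byte sets do not overlap, so the tables are exact inverses
no_overlap = set(IBM).isdisjoint(LATIN1)

original = "Tämä on testitiedosto ääkösten testaamiseksi.".encode("cp850")
round_trip = original.translate(ibm_to_latin1).translate(latin1_to_ibm)
print(no_overlap, round_trip == original)
```

The same absence of overlap is what keeps the sequential s/../../g commands of the second SED script from re-translating a byte that an earlier command has already converted.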


ANSI vs. UNICODE

Another factor is how a new instance of the Windows XP command interpreter is invoked. See CMD /? for the options. Essentially
   /A Causes the output of internal commands to a pipe or file to be ANSI
   /U Causes the output of internal commands to a pipe or file to be Unicode
It appears that /A is the default.

Using the example of this item we could write
  @echo off & setlocal enableextensions
  ::
  :: Convert ANSI characters into UNICODE characters
  ::
  :: Make a test file with PC characters (assuming that CMD has been thus invoked)
  echo This is a test file in Finnish (which uses Scandinavian characters)>"testin.txt"
  echo Tämä on testitiedosto ääkösten testaamiseksi.>>"testin.txt"
  echo Lisää: åäö ÅÄÖ>>"testin.txt"
  echo åäöüéçæñÅÄÖÜÉÇÆÑ>>"testin.txt"
  ::
  :: Do a conversion to Unicode (/a would be to ANSI)
  cmd /u /c type "testin.txt" > "testout.txt"
  ::
  :: See that the result is what one expected
  notepad "testout.txt"
  ::
  :: Clean up
  for %%f in ("testin.txt" "testout.txt") do if exist %%f del %%f
  endlocal & goto :EOF

with Notepad we get


but in (a default opened) CLI we get


The above raises an additional question: what is actually in the above file? In hex
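
As a rough illustration of what such a hex view contains, the byte layout can be sketched in Python, outside the CMD toolset of this FAQ. The sketch assumes that CMD /U writes UTF-16LE (two bytes per character, low byte first) without a byte order mark. The sample line is only a shortened stand-in for the test file.

```python
# Hypothetical sample; the real testout.txt holds all four test lines
line = "Lisää: åäö"

# Assumption: CMD /U output is UTF-16LE without a byte order mark
utf16 = line.encode("utf-16-le")

# Show the bytes as space-separated hex pairs
print(utf16.hex(" "))

# Each of these characters is below U+0100, so its high byte is 00
nul_count = utf16.count(0)
print(nul_count, "NUL bytes for", len(line), "characters")
```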


The file is evidently padded with many NUL (00) bytes. If you wish to filter them out, the easiest solution is to use a UNIX tr port. Let's rename it unxtr.exe for identification. Then
  @echo off & setlocal enableextensions
  type "testout.txt"|unxtr -d \000
  endlocal & goto :EOF
will give


However, if a TR.EXE port is not available, a Visual Basic Script (VBScript) aided command line script can be applied:
  @echo off & setlocal enableextensions
  ::
  :: Build a Visual Basic Script and run it
  set vbs_="%temp%\tmp$$$.vbs"
  set skip=
  findstr "'%skip%VBS" "%~f0" > %vbs_%
  cscript //nologo %vbs_%
  ::
  :: Clean up
  for %%f in (%vbs_%) do if exist %%f del %%f
  endlocal & goto :EOF
  '
  'The Visual Basic Script
  Dim StdIn, StdOut, char, chr0 'VBS
  Set StdIn = WScript.StdIn 'VBS
  Set StdOut = WScript.StdOut 'VBS
  '
  chr0 = Chr(0) 'VBS
  Do While Not StdIn.AtEndOfStream 'VBS
    char = StdIn.Read(1) 'VBS
    If char <> chr0 Then 'VBS
      StdOut.Write char 'VBS
    End If 'VBS
  Loop 'VBS

  Usage: cmdfaq < "testout.txt"
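
Both filters above simply drop every NUL byte. A minimal Python sketch, outside the CMD toolset of this FAQ, of why that works for this test file and where it would not: it assumes the input is UTF-16LE, and the function name strip_nuls is illustrative only.

```python
def strip_nuls(data: bytes) -> bytes:
    # Drop every NUL byte, which is what the tr and VBScript filters do
    return data.replace(b"\x00", b"")


text = "Tämä on testitiedosto ääkösten testaamiseksi."
utf16 = text.encode("utf-16-le")   # assumption: the file is UTF-16LE

# For characters below U+0100 the remaining bytes are exactly LATIN1
filtered = strip_nuls(utf16)
print(filtered == text.encode("latin-1"))

# Caveat: U+0100 is stored as the bytes 00 01, so filtering corrupts it
damaged = strip_nuls("\u0100".encode("utf-16-le"))
print(damaged)
```

The filter is therefore only safe when the text is known to contain nothing beyond the LATIN1 range, as is the case for the test file of this item.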

You might also find the information given by the Windows Character Map, C:\WINDOWS\system32\charmap.exe, of interest.

References/Comments: (If a Google message link fails try the links within the brackets.)
  ISO/IEC 8859-1 Wikipedia
  Windows Code Pages
  hh ntcmds.chm::/cmd.htm
  VBScript InStr Function
  Google Groups May 20 2008, 5:44 pm [M]
  Google Groups May 25 2008, 10:43 am [M]