<http://www.netikka.net/tsneti/info/tscmd028.htm>
Copyright © 2003- by Prof. Timo Salmi  
Last modified Sat 28-Jan-2017 11:08:31

 
Assorted NT/2000/XP/.. CMD.EXE Script Tricks
From the html version of the tscmd.zip 1cmdfaq.txt file

This page is edited from the 1cmdfaq.txt faq-file contained in my tscmd.zip command line interface (CLI) collection. That zipped file has much additional material, including a number of detached .cmd script files. It is recommended that you also get the zipped version as a companion.

Please see "The Description and the Index page" for the conditions of usage and other such information.



28} How to convert a file written in IBM PC characters into LATIN1? (And vice versa)

I'll insert here at the top the essence of the latest solution, which I use in my personal scripts. It uses the 32-bit UNIX port of tr from UnxUpdates, which I have renamed unxtr.exe. Here are my own two subroutines. (My note: C:\_F\XTOOLS\CSVNAMES.CMD)
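  :Latin1ToIbm
  set filein_=%~1
  set fileout_=%~2
  :: tr needs two sets: the codes to find, then their replacements.
  :: latset_ holds the LATIN1 codes, octset_ the IBM PC codes (octal).
  set latset_=\345\344\366\374\351\347\346\361\305\304\326\334\311\307\306\321\341\363
  set octset_=\206\204\224\201\202\207\221\244\217\216\231\232\220\200\222\245\240\242
  "C:\_F\FTOOLS\unxtr.exe" %latset_% %octset_% < "%filein_%" > "%fileout_%"
  goto :EOF

  :IbmToLatin1
  set filein_=%~1
  set fileout_=%~2
  :: Same two sets as above, in the opposite order.
  set octset_=\206\204\224\201\202\207\221\244\217\216\231\232\220\200\222\245\240\242
  set latset_=\345\344\366\374\351\347\346\361\305\304\326\334\311\307\306\321\341\363
  "C:\_F\FTOOLS\unxtr.exe" %octset_% %latset_% < "%filein_%" > "%fileout_%"
  goto :EOF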
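
The octal codes in octset_ are the IBM PC code page positions of åäöüéçæñÅÄÖÜÉÇÆÑáó. As a cross-check outside CMD, the following Python sketch decodes them, derives the matching LATIN1 codes, and applies the same byte-for-byte translation that tr performs. It assumes the OEM code page is CP850 (these particular characters sit at the same positions in CP437). The function names are mine and are not part of tr or UnxUtils.

```python
# Octal codes copied from the octset_ variable (IBM PC / OEM code page bytes).
IBM_OCTAL = r"\206\204\224\201\202\207\221\244\217\216\231\232\220\200\222\245\240\242"


def octal_set_to_bytes(octset):
    """Turn a tr-style string of \\NNN octal escapes into raw bytes."""
    return bytes(int(code, 8) for code in octset.split("\\") if code)


def as_tr_set(data):
    """Format raw bytes as a tr-style string of \\NNN octal escapes."""
    return "".join("\\%03o" % byte for byte in data)


IBM_BYTES = octal_set_to_bytes(IBM_OCTAL)
# Assumption: the OEM code page is CP850.
LATIN1_BYTES = IBM_BYTES.decode("cp850").encode("latin-1")

IBM_TO_LATIN1 = bytes.maketrans(IBM_BYTES, LATIN1_BYTES)
LATIN1_TO_IBM = bytes.maketrans(LATIN1_BYTES, IBM_BYTES)


def ibm_to_latin1(data):
    """Translate the listed IBM PC bytes to LATIN1; other bytes pass through."""
    return data.translate(IBM_TO_LATIN1)


def latin1_to_ibm(data):
    """Translate the listed LATIN1 bytes to IBM PC; other bytes pass through."""
    return data.translate(LATIN1_TO_IBM)


# The LATIN1 counterpart of octset_, in the same octal notation.
print(as_tr_set(LATIN1_BYTES))
```

Only the eighteen listed characters are translated, exactly as with the tr sets. Any other byte above 127 is left as it is.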



Earlier information: This is another task that is well suited to SED, the Stream EDitor.
  @echo off & setlocal enableextensions
  ::
  :: Convert IBM PC characters into LATIN1 characters
  :: Requires SED.EXE
  ::
  :: Make a test file with PC characters
  echo This is a test file in Finnish (which uses Scandinavian characters)>testin.txt
  echo Tämä on testitiedosto ääkösten testaamiseksi.>>testin.txt
  echo Lisää: åäö ÅÄÖ>>testin.txt
  echo åäöüéçæñÅÄÖÜÉÇÆÑ>>testin.txt
  ::
  :: Optionally, create a SED command file
  echo.y/åäöüéçæñÅÄÖÜÉÇÆÑ/\xE5\xE4\xF6\xFC\xE9\xE7\xE6\xF1\xC5\xC4\xD6\xDC\xC9\xC7\xC6\xD1/>"%TEMP%\IBM2LAT1.SED"
  ::
  :: Do the conversion
  sed -f"%TEMP%\IBM2LAT1.SED" testin.txt > testout.txt
  ::
  :: See that the result is what one expected
  notepad testout.txt
  ::
  :: Clean up
  for %%f in ("%TEMP%\IBM2LAT1.SED" testin.txt testout.txt) do (
    if exist %%f del %%f)
  endlocal & goto :EOF

The contents of testout.txt:
  This is a test file in Finnish (which uses Scandinavian characters)
  Tämä on testitiedosto ääkösten testaamiseksi.
  Lisää: åäö ÅÄÖ
  åäöüéçæñÅÄÖÜÉÇÆÑ
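
The \xNN escapes in the SED command are simply the LATIN1 byte values of the sixteen characters. A small Python sketch can regenerate them, which is handy when the character list is extended. The helper name hex_escapes is my own, and CP850 is assumed for the IBM PC side.

```python
# The sixteen characters handled by the SED command.
CHARS = "åäöüéçæñÅÄÖÜÉÇÆÑ"


def hex_escapes(text, encoding):
    """Return the bytes of text in the given encoding as \\xNN escapes."""
    return "".join("\\x%02X" % byte for byte in text.encode(encoding))


# Right-hand side of the y/.../.../ command (LATIN1 byte values).
print(hex_escapes(CHARS, "latin-1"))

# The same characters as they are stored in an IBM PC (CP850) file.
print(hex_escapes(CHARS, "cp850"))
```

The first printed line should match the escape list used in IBM2LAT1.SED character for character.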

There is a catch if you are using the sed.exe from sed15x.zip, as is mostly done in this FAQ: the file names must be in the SFN 8+3 format. The GnuWin32 sed.exe does not have this restriction. Below, that sed.exe has been renamed unxsed.exe.

  @echo off & setlocal enableextensions
  ::
  :: Convert IBM PC characters into LATIN1 characters
  :: Requires SED.EXE (renamed here to UNXSED.EXE)
  ::
  :: Make a test file with PC characters
  echo This is a test file in Finnish (which uses Scandinavian characters)>"testin.txt"
  echo Tämä on testitiedosto ääkösten testaamiseksi.>>"testin.txt"
  echo Lisää: åäö ÅÄÖ>>"testin.txt"
  echo åäöüéçæñÅÄÖÜÉÇÆÑ>>"testin.txt"
  ::
  :: Another way to build the SED command file
  set temp_=%temp%
  if defined mytemp if exist "%mytemp%\" set temp_=%mytemp%
  set sedcmd=%temp_%\sedcmd.tmp
  echo s/å/\xE5/g >  "%sedcmd%"
  echo s/ä/\xE4/g >> "%sedcmd%"
  echo s/ö/\xF6/g >> "%sedcmd%"
  echo s/ü/\xFC/g >> "%sedcmd%"
  echo s/é/\xE9/g >> "%sedcmd%"
  echo s/ç/\xE7/g >> "%sedcmd%"
  echo s/æ/\xE6/g >> "%sedcmd%"
  echo s/ñ/\xF1/g >> "%sedcmd%"
  echo s/Å/\xC5/g >> "%sedcmd%"
  echo s/Ä/\xC4/g >> "%sedcmd%"
  echo s/Ö/\xD6/g >> "%sedcmd%"
  echo s/Ü/\xDC/g >> "%sedcmd%"
  echo s/É/\xC9/g >> "%sedcmd%"
  echo s/Ç/\xC7/g >> "%sedcmd%"
  echo s/Æ/\xC6/g >> "%sedcmd%"
  echo s/Ñ/\xD1/g >> "%sedcmd%"
  ::
  :: Do the conversion
  unxsed --text -f"%sedcmd%" "testin.txt" > "testout.txt"
  ::
  :: See that the result is what one expected
  notepad "testout.txt"
  ::
  :: Clean up
  for %%f in ("%sedcmd%" "testin.txt" "testout.txt") do (
    if exist %%f del %%f)
  endlocal & goto :EOF

Let's write the solution again with a couple of twists:
  @echo off & setlocal enableextensions
  ::
  :: Convert IBM PC characters into LATIN1 characters
  :: Requires a TR.EXE Unix port (renamed here to UNXTR.EXE)
  ::
  :: Assign a value for the temporary folder variable temp_

  call :AssignTemp temp_
  ::
  :: Make a test file with PC characters

  > "%temp_%\testin.txt" echo This is a test file in Finnish (which uses Scandinavian characters)
  >>"%temp_%\testin.txt" echo Tämä on testitiedosto ääkösten testaamiseksi.
  >>"%temp_%\testin.txt" echo Lisää: åäö ÅÄÖ
  >>"%temp_%\testin.txt" echo åäöüéçæñÅÄÖÜÉÇÆÑ
  ::
  :: Which text filtering program to use (UnxUtils tr)

  set filter_=unxtr.exe
  call :IsAtPath "%filter_%" file_
  if not defined file_ (
    echo.
    echo File "%filter_%" not found at path or in the current folder
    call :CleanUp
    goto :EOF)
  ::
  :: Do the conversion (octal coding)
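  rem tr needs two sets: octset_ holds the IBM PC codes to find,
  rem latset_ the matching LATIN1 codes to put in their place.

  set octset_=\206\204\224\201\202\207\221\244\217\216\231\232\220\200\222\245\240\242
  set latset_=\345\344\366\374\351\347\346\361\305\304\326\334\311\307\306\321\341\363
  "%filter_%" %octset_% %latset_% < "%temp_%\testin.txt" > "%temp_%\testout.txt"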
  ::
  :: See that the result is what we expected

  notepad "%temp_%\testout.txt"
  ::
  call :CleanUp
  endlocal & goto :EOF
  ::
  :: ==========================================================

  :AssignTemp
  setlocal
  set return_=%temp%
  if defined mytemp if exist "%mytemp%\" set return_=%mytemp%
  endlocal & set "%1=%return_%" & goto :EOF
  ::
  :IsAtPath SearchFor found_
  setlocal enableextensions disabledelayedexpansion
  set found_=
  for %%f in ("%~1") do set found_="%%~$PATH:f"
  if exist "%~1" set found_="%~1"
  if [%found_%]==[""] set found_=
  endlocal & set "%~2=%found_%" & goto :EOF
  ::
  :CleanUp
  for %%f in ("%temp_%\testin.txt" "%temp_%\testout.txt") do (
    if exist %%f del %%f)
  goto :EOF

The contents of "%temp_%\testout.txt":
  This is a test file in Finnish (which uses Scandinavian characters)
  Tämä on testitiedosto ääkösten testaamiseksi.
  Lisää: åäö ÅÄÖ
  åäöüéçæñÅÄÖÜÉÇÆÑ

The problem can also be solved with a Visual Basic Script. VBScript has the advantage of being part of the original XP command environment. On the other hand, the solution is clearly more complicated than the very simple SED solution. First copy and paste the following script, name it e.g. IBM2LAT1.VBS, and then call it using

  CSCRIPT //NOLOGO "IBM2LAT1.VBS" < "MYIBM.TXT" > "MYLATIN1.TXT"

  ' IBM2LAT1.VBS by Prof. Timo Salmi
  '
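  ' Define the relevant characters
  ' (The two strings look alike on this page. For the conversion to have
  ' any effect, IbmChar must hold the IBM PC code page bytes in the
  ' script file and Lat1Char the LATIN1 bytes.)
  Const IbmChar  = "åäöüéçæñÅÄÖÜÉÇÆÑ"
  Const Lat1Char = "åäöüéçæñÅÄÖÜÉÇÆÑ"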
  '
  ' Define StandardIn and StandardOut
  Dim StdIn, StdOut
  Set StdIn = WScript.StdIn
  Set StdOut = WScript.StdOut
  '
  ' Convert one IBM character to Latin1
  Function CharIbm2Lat1(char)
    Dim p
    p = InStr(1, IbmChar, char, vbBinaryCompare)
    If p > 0 Then
      CharIbm2Lat1 = Mid(Lat1Char, p, 1)
    Else
      CharIbm2Lat1 = char
    End If
  End Function
  '
  ' Convert a string
  Function Ibm2Lat1(str1)
    Dim str2
    For i = 1 To Len(str1)
      str2 = str2 & CharIbm2Lat1(Mid(str1,i,1))
    Next
    Ibm2Lat1 = str2
  End Function
  '
  ' Convert the input
  Dim str
  Do While Not StdIn.AtEndOfStream
    str = StdIn.ReadLine
    StdOut.WriteLine Ibm2Lat1(str)
  Loop

It is easy to see that other, similar conversion tasks can be done with the same methods after some slight customization. To take the most obvious example, consider the conversion in the opposite direction:
  ' C:\_F\CMD\LAT12IBM.VBS by Prof. Timo Salmi
  ' Usage: CSCRIPT //NOLOGO "LAT12IBM.VBS" < "MYLATIN1.TXT" > "MYIBM.TXT"
  '
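  ' Define the relevant characters
  ' (The two strings look alike on this page. For the conversion to have
  ' any effect, Lat1Char must hold the LATIN1 bytes in the script file
  ' and IbmChar the IBM PC code page bytes.)
  Const Lat1Char = "åäöüéçæñÅÄÖÜÉÇÆÑ"
  Const IbmChar  = "åäöüéçæñÅÄÖÜÉÇÆÑ"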
  '
  ' Define StandardIn and StandardOut
  Dim StdIn, StdOut
  Set StdIn = WScript.StdIn
  Set StdOut = WScript.StdOut
  '
  ' Convert one Latin1 character to IBM
  Function CharLat12Ibm(char)
    Dim p
    p = Instr (1, Lat1Char, char, vbBinaryCompare)
    if p > 0 Then
      CharLat12Ibm = Mid(IbmChar, p, 1)
    Else
      CharLat12Ibm = char
    End If
  End Function
  '
  ' Convert a string
  Function Lat12Ibm(str1)
    Dim str2
    For i = 1 To Len(str1)
      str2 = str2 & CharLat12Ibm(Mid(str1,i,1))
    Next
    Lat12Ibm = str2
  End Function
  '
  ' Convert the input
  Dim str
  Do While Not StdIn.AtEndOfStream
    str = StdIn.ReadLine
    StdOut.WriteLine Lat12Ibm(str)
  Loop
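
The scripts above translate a fixed list of sixteen characters. Where a full code page conversion is wanted instead, the idea can be sketched in Python with its built-in codecs. This is an illustration rather than part of the FAQ's CMD toolset: the function name recode is hypothetical, CP850 is assumed as the IBM PC code page, and characters with no counterpart in the target code page are replaced with a question mark.

```python
def recode(data, source, target):
    """Decode bytes from one code page and encode them in another.

    Characters that do not exist in the target code page become b"?".
    """
    return data.decode(source).encode(target, errors="replace")


# IBM PC (assumed CP850) to LATIN1: 0x84 is a-umlaut in CP850.
print(recode(b"T\x84m\x84", "cp850", "latin-1"))

# LATIN1 to IBM PC: 0xE4 is a-umlaut in LATIN1.
print(recode(b"T\xe4m\xe4", "latin-1", "cp850"))

# 0xC4 is a box-drawing line in CP850, which LATIN1 does not have.
print(recode(b"\xc4", "cp850", "latin-1"))
```

For whole files, read and write in binary mode so that no further translation is applied to the bytes.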


ANSI vs. UNICODE

Another factor is how a new instance of the Windows XP command interpreter is invoked. See CMD /? for the options. Essentially
   /A Causes the output of internal commands to a pipe or file to be ANSI
   /U Causes the output of internal commands to a pipe or file to be Unicode
It appears that /A is the default.

Using the example of this item we could write
  @echo off & setlocal enableextensions
  ::
  :: Convert ANSI characters into UNICODE characters
  ::
  :: Make a test file with PC characters (assuming that CMD has been thus invoked)
  echo This is a test file in Finnish (which uses Scandinavian characters)>"testin.txt"
  echo Tämä on testitiedosto ääkösten testaamiseksi.>>"testin.txt"
  echo Lisää: åäö ÅÄÖ>>"testin.txt"
  echo åäöüéçæñÅÄÖÜÉÇÆÑ>>"testin.txt"
  ::
  :: Do a conversion to Unicode (/a would be to ANSI)
  cmd /u /c type "testin.txt" > "testout.txt"
  ::
  :: See that the result is what one expected
  notepad "testout.txt"
  ::
  :: Clean up
  for %%f in ("testin.txt" "testout.txt") do if exist %%f del %%f
  endlocal & goto :EOF

With Notepad we get


but in (a default opened) CLI we get


The above raises an additional question: what actually is in the above file? In HEX


Clearly there is much padding with NUL (00) characters. If you wish to filter them out, the easiest solution is to use a UNIX tr port. Let's rename it unxtr.exe for identification. Then
  @echo off & setlocal enableextensions
  type "testout.txt"|unxtr -d \000
  endlocal & goto :EOF
will give


However, if a TR.EXE port is not available, a Visual Basic Script (VBScript) aided command line script can be applied:
  @echo off & setlocal enableextensions
  ::
  :: Build a Visual Basic Script and run it
  set vbs_="%temp%\tmp$$$.vbs"
  set skip=
  findstr "'%skip%VBS" "%~f0" > %vbs_%
  cscript //nologo %vbs_%
  ::
  :: Clean up
  for %%f in (%vbs_%) do if exist %%f del %%f
  endlocal & goto :EOF
  '
  'The Visual Basic Script
  Dim StdIn, StdOut, char, chr0 'VBS
  Set StdIn = WScript.StdIn 'VBS
  Set StdOut = WScript.StdOut 'VBS
  '
  chr0 = Chr(0) 'VBS
  Do While Not StdIn.AtEndOfStream 'VBS
    char = StdIn.Read(1) 'VBS
    If char <> chr0 Then 'VBS
      StdOut.Write char 'VBS
    End If 'VBS
  Loop 'VBS

  Usage: cmdfaq < "testout.txt"
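
To see why stripping the NUL characters works, and where it stops working, here is a short Python sketch. It assumes that the CMD /U output is UTF-16 little-endian without a byte order mark. The function names are my own. For characters below U+0100 each character becomes its LATIN1 byte followed by 00, so removing the 00 bytes leaves LATIN1 text. For characters from U+0100 upwards the second byte is not 00, and stripping would garble the result.

```python
def cmd_unicode_bytes(text):
    """Encode text the way CMD /U output is assumed to look: UTF-16LE, no BOM."""
    return text.encode("utf-16-le")


def strip_nuls(data):
    """Remove every 00 byte, as tr -d \\000 does."""
    return data.replace(b"\x00", b"")


def hex_dump(data):
    """Show bytes as space-separated two-digit hex values."""
    return " ".join("%02X" % byte for byte in data)


sample = "\u00e5\u00e4\u00f6"  # the letters a-ring, a-umlaut, o-umlaut

encoded = cmd_unicode_bytes(sample)
print(hex_dump(encoded))              # E5 00 E4 00 F6 00
print(hex_dump(strip_nuls(encoded)))  # E5 E4 F6

# A character outside LATIN1 (U+0100) shows the limit of the method.
print(hex_dump(strip_nuls(cmd_unicode_bytes("\u0100"))))  # 01
```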

You might also find the Windows Character Map, C:\WINDOWS\system32\charmap.exe, of interest.

References/Comments: (If a Google message link fails try the links within the brackets.)
  ISO/IEC 8859-1 Wikipedia
  Windows Code Pages
  ISO Latin 1 Character Entries 
  hh ntcmds.chm::/cmd.htm
  VBScript InStr Function
  Google Groups May 20 2008, 5:44 pm [M]
  Google Groups May 25 2008, 10:43 am [M]