<http://www.netikka.net/tsneti/info/tscmd150.htm>
Copyright © 2003-2010 by Prof. Timo Salmi  
Last modified Mon 10-May-2010 14:57:56

Assorted NT/2000/XP/.. CMD.EXE Script Tricks
From the html version of the tscmd.zip 1cmdfaq.txt file

This page is edited from the 1cmdfaq.txt FAQ file contained in my tscmd.zip command line interface (CLI) collection. That zipped file has much additional material, including a number of stand-alone .cmd script files. It is recommended that you also get the zipped version as a companion.

Please see "The Description and the Index page" for the conditions of usage and other such information.

150} How can I extract the http, ftp and file URLs from an HTML file?

  @echo off & setlocal enableextensions disabledelayedexpansion
  ::
  :: Get the name of the HTML file to be processed
  set Source=%~1
  if not defined Source (
    echo +----------------------------------------------------+
    echo ^| A script to get the HTTP, FTP and FILE links       ^|
    echo ^| from an HTML page source file                      ^|
    echo ^| By Prof. Timo Salmi, Last modified Thu 15-Feb-2007 ^|
    echo +----------------------------------------------------+
    echo.
    echo Usage %~0 [HTMLSourceFilename]
    goto :EOF)
  if not exist "%Source%" (
    echo File "%Source%" not found
    goto :EOF)
  ::
  :: Auxiliary temporary files
  set temp_=%temp%
  if defined mytemp set temp_=%mytemp%
  for /f "tokens=*" %%f in ("%temp_%") do set temp_=%%~sf
  set tempfile=%temp_%\tempfile.tmp
  set tempfil2=%temp_%\tempfil2.tmp
  for %%f in (%tempfile% %tempfil2%) do if exist %%f del %%f
  ::
  :: Substitute each double quote with the delim character, since a
  :: double quote cannot easily be given in the delims= option of FOR /F.
  :: Customize the delim character if necessary.
  :: Delayed expansion is enabled only after each line has been stored,
  :: so that exclamation marks and carets in the HTML are preserved.
  set delim=,
  for /f "tokens=*" %%c in ('type "%Source%"') do (
    set lineContents=%%c
    setlocal enabledelayedexpansion
    echo !lineContents:"=%delim%!
    endlocal
    )>>%tempfile%
  ::
  :: Find the HTTP links
  for /f "tokens=2 delims=%delim%" %%a in ('
    find /i "HREF=%delim%http://" %tempfile%') do (
      echo %%a
      )>>%tempfil2%
  type %tempfil2%|find /i "http://"
  ::
  :: Find the FILE links
  for /f "tokens=2 delims=%delim%" %%a in ('
    find /i "HREF=%delim%file://" %tempfile%') do (
      echo %%a
      )>>%tempfil2%
  type %tempfil2%|find /i "file://"
  ::
  :: Find the FTP links
  for /f "tokens=2 delims=%delim%" %%a in ('
    find /i "HREF=%delim%ftp://" %tempfile%') do (
      echo %%a
      )>>%tempfil2%
  type %tempfil2%|find /i "ftp://"
  ::
  :: Clean up
  for %%f in (%tempfile% %tempfil2%) do if exist %%f del %%f
  endlocal & goto :EOF

For example, when run on the source of the tscmd.php index page, the output would start with:
http://www.netikka.net/tsneti/
http://www.netikka.net/tsneti/pc/link/tscmd.zip
http://groups.google.com/group/alt.msdos.batch.nt
http://garbo.uwasa.fi/pub/pc/unix/sed15x.zip
http://gnuwin32.sourceforge.net/packages/sed.htm
http://sourceforge.net/project/showfiles.php?group_id=9328
http://sourceforge.net/
http://garbo.uwasa.fi/pub/pc/unix/gawk2156.zip