[Oberon] n-o: find files containing string-set ?

danforth at greenwoodfarm.com danforth at greenwoodfarm.com
Sun Feb 15 19:48:22 CET 2004


Chris,

I am in the process of implementing a parallel string search algorithm 
developed at Stanford in the 1970s.  The grad student who did this was 
working for Donald Knuth.  The algorithm was developed at the same time 
by Aho and Corasick.  By parallel search I mean looking simultaneously 
for a set of words, e.g. {FAT, FATHER, THE, HER, HERE}, where some words 
may be prefixes, infixes, or suffixes of the other words.  The algorithm 
builds a finite state machine that examines each input character once 
(no backup) and runs in time linear in the size of the file searched, 
independent of the number of strings sought.  I'm working in Component 
Pascal but will make the algorithm available in standard Oberon-2.  It 
should be a couple of weeks yet until it is polished.
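
To illustrate the kind of state machine described above, here is a minimal 
sketch in Python of the textbook Aho-Corasick construction. It is not the 
Component Pascal or Oberon-2 code mentioned in this message; the function 
names build_automaton and search are hypothetical, chosen for this example.

```python
from collections import deque

def build_automaton(words):
    """Build goto/fail/output tables for a set of words (Aho-Corasick style)."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # fail[state] -> state to fall back to on a mismatch
    out = [set()]   # out[state] -> words recognised on entering the state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)
    # Breadth-first pass: compute failure links and merge outputs so that
    # words ending inside a longer word (infixes, suffixes) are reported.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def search(text, automaton):
    """Scan text once, left to right; return (start, word) for every hit."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Follow failure links instead of re-reading earlier input.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((pos - len(word) + 1, word))
    return hits
```

With the word set from the example, 
sorted(search("FATHERE", build_automaton(["FAT", "FATHER", "THE", "HER", "HERE"]))) 
gives [(0, 'FAT'), (0, 'FATHER'), (2, 'THE'), (3, 'HER'), (3, 'HERE')].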

-Doug Danforth

easlab at absamail.co.za wrote:
> Hi, 
>    perhaps someone can supply the source code for this useful
> algorithm, which I am really coming to need:
> 
> For EachOfFileSet
>    For EachOfStringSet
>       IF NextString NOT in File then EXIT this file
>    ListFile {which contains each of the strings in any order}
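
The pseudocode above could be sketched in Python roughly as follows. This 
is an illustration only, not ETH Oberon code; the function name 
files_containing_all is hypothetical, and each file is assumed to be small 
enough to read into memory as text.

```python
from pathlib import Path

def files_containing_all(paths, strings):
    """Return the paths whose contents include every string, in any order."""
    matches = []
    for path in paths:
        # Undecodable bytes are replaced rather than raising an error.
        text = Path(path).read_text(errors="replace")
        # all() stops at the first missing string, like "EXIT this file".
        if all(s in text for s in strings):
            matches.append(path)
    return matches
```

Each string is tested independently, so the order in which the strings 
occur in a file does not matter.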
> 
> Perhaps one can adapt:
> RX.Grep ( filename | "*" ) [option] RegExpr
>  if "*"  could handle <SpecialPartitionForThis>:*
> Except that my tests show I don't understand how to represent:
> " string1" AND "string2".
> And then one would need 2 runs to handle the order for 2 strings,
> and 6 runs for 3 strings ?
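
On the question of ordering: in a regular-expression dialect that supports 
lookahead, one pattern can express "string1 AND string2" in any order, so 
the number of runs need not grow with the permutations. The sketch below 
uses Python's re module; whether RX.Grep supports anything comparable is 
not assumed here, and the helper name all_of is hypothetical.

```python
import re

def all_of(strings):
    """Compile one pattern that matches text containing every string."""
    # One lookahead per string; each is checked from the start of the text,
    # so the strings may occur in any order.
    lookaheads = "".join(f"(?=.*{re.escape(s)})" for s in strings)
    return re.compile(lookaheads, re.DOTALL)

pattern = all_of(["string1", "string2"])
found = pattern.match("... string2 ... string1 ...") is not None
missing = pattern.match("only string1 here") is not None
```

Here found is True and missing is False. Using match rather than search 
keeps the test anchored at the start of the text.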
> 
> BTW string searching must be one of THE greatest demands in heavy 
> general IT usage.   The multiple partitions, which allow different
> 'concept' spaces, are a great help.   Additionally I find that for big
> partitions, a speed-up from skipping *.Bak files is easily achieved by
> having <partnID>:SearchList on hand [instead of using Find.Panel].
> Where <partnID>:SearchList is made by:
> 1.  System.Directory <partnID>:*
> 2. System.Store   the directory
> 3. RX.Grep Directory \i   ".Bak"
> 4. re-name as <partnID>:SearchList and store
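
The effect of these four steps can be sketched in Python as follows. This 
assumes the intent of step 3 is to drop every name ending in ".Bak" 
regardless of case, and that the list is wrapped between "Find.Domain" and 
"~" as in the SearchList sample quoted in this message. The function name 
make_search_list is hypothetical.

```python
def make_search_list(names, skip_suffix=".Bak"):
    """Drop backup files from a directory listing and wrap it for Find.Domain."""
    suffix = skip_suffix.lower()
    # Case-insensitive comparison, so ".bak" and ".BAK" are skipped too.
    kept = [n for n in names if not n.lower().endswith(suffix)]
    return "\n".join(["Find.Domain", *kept, "~"])
```

For example, a listing of Legal:Absurd, Legal:Absurd.Bak and 
Legal:ActionDateX yields a list holding only the first and third names.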
> 
> I have a set of:
> Configuration.DoText Mount<partitionName>
> in my main Tool.
> Each one typically has the following 6 lines:-----
> 
> !OFSTools.Mount  Legal  AosFS  IDE1#28  ~ <-- rotate Bakup  8 June 2003 
> 
> OFSTools.Mount  Legal  AosFS IDE0#28  ~
> 
> ! System.Directory  Legal:* 
> 
> ! OFSTools.Unmount  Legal
> 
> !  SmartDir.Directory Legal:*\r 3  <-- very useful; edit "3" to step back in time
> 
> ! Legal:SearchList 
> ----------------
> 
> And the <partnID>:SearchList is typically a few hundred files 
> wrapped thus:---
> 
> Updated 2004 Jan 18  Find.All ^
> 
> Find.Domain
> Legal:Absurd
> Legal:ActionDateX
> Legal:AdminJustc
> ....
> Legal:derebus.org.za
> ~
> ----------------
> 
> The n-o documentation, which advises having 2 or 3 partitions, shows
> a lack of appreciation of the increased power of having many separate
> "concept spaces".  But now one really needs to be able to find files 
> containing multiple strings.      Related to this need, I think that if
> Google died tomorrow, the world would shake !
> 
> Thanks for any help.
> 
> Chris Glur.
> 
> --
> Oberon at inf.ethz.ch mailing list for ETH Oberon and related systems
> https://www.mail.inf.ethz.ch/lists/listinfo/oberon