[Oberon] n-o: find files containing string-set ?
danforth at greenwoodfarm.com
Sun Feb 15 19:48:22 CET 2004
Chris,
I am in the process of implementing a parallel string search algorithm
developed at Stanford in the 1970s. The grad student who did this was
working for Donald Knuth. The algorithm was developed at the same time
by Aho and Corasick. By parallel search I mean looking simultaneously
for a set of words, e.g. {FAT, FATHER, THE, HER, HERE}, where some words
may be prefixes, infixes, or suffixes of the other words. The algorithm
builds a finite state machine that examines each input character once
(no backup), so the search time is linear in the length of the file
searched, independent of the number of strings sought. I'm working in
Component Pascal but will make the algorithm available in standard
Oberon-2. It will be a couple of weeks yet before it is polished.
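To make the idea concrete, here is a minimal sketch of an automaton of the Aho-Corasick kind. It is written in Python purely for illustration and is not the Component Pascal code mentioned above; the names build_automaton and search, and the three-table layout (goto, fail, out), are assumptions of this sketch.

```python
from collections import deque


def build_automaton(words):
    """Build goto, failure and output tables for a set of words."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # fail[state] -> state for the longest proper suffix
    out = [set()]  # out[state] -> words recognised on reaching this state

    # Phase 1: insert every word into a trie.
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(word)

    # Phase 2: breadth-first pass that fills in the failure links.
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit words that end here as suffixes (HER inside FATHER).
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, word) for every occurrence found in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        # Follow failure links instead of re-reading earlier input.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((pos - len(word) + 1, word))
    return hits
```

With the word set from the example, sorted(search("FATHERE", build_automaton(["FAT", "FATHER", "THE", "HER", "HERE"]))) gives [(0, 'FAT'), (0, 'FATHER'), (2, 'THE'), (3, 'HER'), (3, 'HERE')]. The input is read left to right with no backup; reporting the matches themselves costs extra time proportional to how many there are.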
-Doug Danforth
easlab at absamail.co.za wrote:
> Hi,
> perhaps someone can supply the source code for this useful
> algorithm, which I am really starting to need:
>
> For EachOfFileSet
>   For EachOfStringSet
>     IF NextString NOT in File then EXIT this file
>   ListFile {which contains each of the strings in any order}
>
> Perhaps one can adapt:
> RX.Grep ( filename | "*" ) [option] RegExpr
> if "*" could handle <SpecialPartitionForThis>:*
> Except that my tests show I don't understand how to represent:
> " string1" AND "string2".
> And then one would need 2 runs to handle the order for 2 strings,
> and 6 runs for 3 strings ?
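On the ordering worry: a test of the form "string1 AND string2" does not depend on where the strings occur, so one check per string is enough and no extra runs per ordering are needed. The following Python sketch illustrates the loop from the pseudocode above. It is not RX.Grep; the function names and the choice of latin-1 decoding are assumptions made for illustration.

```python
from pathlib import Path


def contains_all(text, strings):
    """True if every string occurs somewhere in text, in any order."""
    # all() stops at the first missing string, like "EXIT this file".
    return all(s in text for s in strings)


def files_containing_all(paths, strings, encoding="latin-1"):
    """Return the paths whose contents contain every one of the strings."""
    matches = []
    for path in paths:
        # Assumption: files are decoded as latin-1, which accepts any byte.
        text = Path(path).read_text(encoding=encoding, errors="replace")
        if contains_all(text, strings):
            matches.append(str(path))
    return matches
```

This scans a file once per string in the worst case. A multi-string automaton would reduce that to a single pass per file, at the cost of building the automaton first.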
>
> BTW string searching must be one of THE greatest demands in heavy
> general IT usage. Multiple partitions, which allow different
> 'concept' spaces, are a great help. Additionally I find that for big
> partitions, a speed-up from skipping *.Bak files is easily achieved:
> instead of using Find.Panel, keep <partnID>:SearchList on hand.
> <partnID>:SearchList is made by:
> 1. System.Directory <partnID>:*
> 2. System.Store the directory
> 3. RX.Grep Directory \i ".Bak"
> 4. re-name as <partnID>:SearchList and store
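The four steps above amount to filtering a directory listing and wrapping the result. A rough Python sketch of that filtering follows; it is not the Oberon commands themselves. The function name build_search_list is hypothetical, the suffix match is assumed to be case-sensitive, and the Find.Domain ... ~ wrapper is copied from the SearchList example quoted later in this message.

```python
def build_search_list(names, skip_suffix=".Bak"):
    """Wrap the names that do not end in skip_suffix as a Find.Domain list."""
    # Assumption: backup files are recognised purely by their name suffix.
    kept = [name for name in names if not name.endswith(skip_suffix)]
    return "\n".join(["Find.Domain", *kept, "~"])
```

For example, build_search_list(["Legal:Absurd", "Legal:Absurd.Bak", "Legal:ActionDateX"]) yields a list containing only Legal:Absurd and Legal:ActionDateX between the Find.Domain and ~ lines.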
>
> I have a set of:
> Configuration.DoText Mount<partitionName>
> in my main Tool.
> Each one typically has the following 6 lines:-----
>
> !OFSTools.Mount Legal AosFS IDE1#28 ~ <-- rotate Bakup 8 June 2003
>
> OFSTools.Mount Legal AosFS IDE0#28 ~
>
> ! System.Directory Legal:*
>
> ! OFSTools.Unmount Legal
>
> ! SmartDir.Directory Legal:*\r 3 <-- very useful; edit "3" to step back in time
>
> ! Legal:SearchList
> ----------------
>
> And the <partnID>:SearchList is typically a few hundred files
> wrapped thus:---
>
> Updated 2004 Jan 18 Find.All ^
>
> Find.Domain
> Legal:Absurd
> Legal:ActionDateX
> Legal:AdminJustc
> ....
> Legal:derebus.org.za
> ~
> ----------------
>
> The n-o documentation, which advises having 2 or 3 partitions, shows
> a lack of appreciation of the increased power of having many separate
> "concept spaces". But now one really needs to be able to find files
> containing multiple strings. Related to this need, I think that if
> google died tomorrow, the world would shake !
>
> Thanks for any help.
>
> Chris Glur.
>
> --
> Oberon at inf.ethz.ch mailing list for ETH Oberon and related systems
> https://www.mail.inf.ethz.ch/lists/listinfo/oberon
>
>