[Oberon] Text search is THE main task.
Douglas G. Danforth
danforth at greenwoodfarm.com
Sat Jun 12 18:35:45 CEST 2004
Chris et al,
I have released to the BlackBox community the module GftSearch
(Gft = Greenwood Farm Technologies, LLC) which searches in parallel for
any set of unambiguous strings. It examines each input character only
once (linear search time) and seems to be 4 times faster than the
BlackBox search engine.
The user must implement two procedures: GetChar and Report.
In Component Pascal these are extensions of ABSTRACT procedures.
Since I have lost touch with the NO/Bluebottle development I can simply
change those abstract procedures to procedure pointers for you guys.
The reason for making these procedures user-settable is so you can not
only search files but also memory, or the internet directly (you
implement how to get the characters). You also specify what to do
(Report) with a string match when it happens.
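
For illustration only, here is a minimal sketch of the callback design described above: the caller supplies one routine that delivers the next character and one that is told about each match. It is not GftSearch. It uses the well-known Aho-Corasick automaton as a stand-in, and the names build_automaton, search, get_char and report are hypothetical, chosen to mirror GetChar and Report.

```python
from collections import deque

def build_automaton(patterns):
    """Build Aho-Corasick tables: goto transitions, failure links, outputs."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass; depth-1 states keep the root as their failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def search(patterns, get_char, report):
    """Fetch each character once via get_char(); call report(pattern, start)."""
    goto, fail, out = build_automaton(patterns)
    state, pos = 0, 0
    while True:
        ch = get_char()          # hypothetical stand-in for GetChar
        if ch is None:           # assumption: None signals end of input
            break
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        pos += 1
        for pat in out[state]:
            report(pat, pos - len(pat))   # hypothetical stand-in for Report

# Usage: the character source is an in-memory string here, but get_char
# could equally read from a file or a socket.
text = iter("ushers she")
hits = []
search(["he", "she", "hers"],
       lambda: next(text, None),
       lambda pat, start: hits.append((pat, start)))
```

Because the search loop only ever asks get_char for the next character, swapping the source (file, memory, network) does not touch the matching code.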
I have also implemented GftSearchFile and GftSearchFiles but those are
much more deeply embedded in the BlackBox framework. I'll give a shot
at converting GftSearchFile (but probably not GftSearchFiles). From
the template of GftSearchFile you should be quickly able to implement
GftSearchFiles.
This algorithm comes from Stanford and work by a graduate student under
Donald Knuth.
Chris, I can send you my drafts and you can tell me if they compile
under your system.
-Doug Danforth
easlab at absamail.co.za wrote:
> Who needs pre-emptive multitasking except for real-time control
> systems ?
>
> Newsgroups: comp.lang.oberon
> Subject: Re: Questions from Unix world...
>
>
>>WLad wrote:
>>
>>>Dear colleague!
>>>
>>>I show Aos to one Plan9 programmer.
>>>He asked some questions:
>>>
>>> > in Oberon/bluebottle, do we have
>>> > 1. something like 'grep' to search regular expressions (or the like)
>>> within textfiles;
>>
> (jmdrake) wrote:
>
>>Yes. There is a package called "RX" that comes with Oberon System 3, NO
>>and BlueBottle. See:
>>
>>http://www.oberon.ethz.ch/software/RX.html
>>
>>Also there is another regular expressions package called "Regul".
>>
>>http://www.oberon.ethz.ch/software/Regul.html
>>
>
> I was persuaded to fetch this, because I didn't realise that it
> couldn't do [as described below] what is needed.
> It's a monster, because it is part of the meta compiler: Bable;
> and needs several other packages. You must buy the complete
> automobile if you want to test the cigarette lighter !
>
>
>>> > 2. something like 'awk' to work with flatfile databases, or some
>>> other database system;
>>
>>Nothing like "awk" exists, but you can make calls to the RX module
>>from inside another program, do pattern matching and accomplish much
>>of what someone might do with awk inside a simple Oberon program.
>
>
> Efficient searching for text is undoubtedly becoming a key to
> productivity, besides minimising possible great frustration.
> As a 'heavy' user of NO, this is what helps me [PLUS what I still need]:-
> * file spaces are organised by topic into disk-partitions: say 5 to 20.
> * new incoming 'articles' get appended to their appropriate file,
> under their 'chapter number & title' and the 'chapter number & title'
> appears at the file header as an index thus:
> file = AImisc:ExprtSys4Legal
> 1. Overview of AI Research Groups in The Netherlands
> 2. Legal advisory system - EDI within EC
> 3. A Pragmatic Legal Expert System
> 4. Some URLs
> 5. About Legal informatics
> 6. expert systems
> 7. lexit.at/resource
> 8. A Pragmatic Legal Expert System
> 9. Book: Representation and Reasoning in Law
>
> Guided by colours-info [qed with NO], it's a 'wipe & a click'
> [750 ms.] to get to the beginning of the "6. expert systems" chapter.
>
> * For searching a <text string> in a partition:-
> in order to avoid half the redundant *.Bak files, have a pre-built
> file list in eg. ACTnRule:SearchTemplate which looks like this:---
> ACTnRule:SearchTemplate ==
> Updated 2004 Jan 25 Find.All ^
>
> Find.Domain
> ACTnRule:AI.ES.4law
> ACTnRule:AbortionAct
> ACTnRule:Aces2InfoAct
> ACTnRule:AmdCompaniesAct
> ...
> ACTnRule:UsuryAct
> ACTnRule:WillsAct.html
> ~ ---- end of file: ACTnRule:SearchTemplate
>
> With nice colouring for highlighting the 2 commands:
> 'Find.Domain' selects the set of files [which was created by
> System.Directory <partitionID.>:* and Store the Directory and
> RX.Grep Directory \i ".Bak"; to remove the redundant *.Bak].
> Then select <keyString> and click 'Find.All ^' to rapidly list all files of the
> partition containing <keyString>.
>
> What is needed !!
> List all files [in the list of files {which BTW can be further partitioned
> by date - using SmartDir.Tool ; it's very common to want to limit access
> to files that you've been working on recently} ] containing <string1>
> and <string2> ...<stringN> in any order in files of the file set.
>
> I mistakenly thought that I could hack this function by running the
> output of 'Find.All ^'
> [a text of lines of: <FileID> <Tab> <subString containing Key> ]
> through RX to remove the <Tab> <subString containing Key>,
> and then use the reduced set of 'satisfying files' to test for the
> next key.
>
> Of course this is no good since multiple finds of a single key
> produce multiple FileIDs for the next stage.
> What I need is another quick hack to remove duplicate lines,
> in a text with one FileID per line. Any solutions welcomed !
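
One possible shape for such a hack, sketched in Python rather than Oberon: keep only the FileID from each line and drop repeats while preserving first-seen order. The function name unique_file_ids is hypothetical, and the line layout <FileID> <Tab> <subString> is assumed to be exactly as described above.

```python
def unique_file_ids(lines):
    """Return each FileID once, in the order it first appears."""
    seen = set()
    result = []
    for line in lines:
        # Keep only the part before the first tab (the FileID).
        file_id = line.rstrip("\n").split("\t", 1)[0].strip()
        if file_id and file_id not in seen:
            seen.add(file_id)
            result.append(file_id)
    return result
```

Lines without a tab are treated as a bare FileID, so the same routine also removes duplicates from a text that already has one FileID per line.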
>
> Alternatively, it would be valuable if someone [ELSE ;-] could add
> a command to Find.Mod [perhaps by extending existing code] to
> search for multiple strings, in a short-circuited-AND way.
> And just output the fileIDs which pass: contain ALL of the strings
> - in any order.
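
A rough sketch of the requested behaviour, in Python and against an ordinary host file system rather than Find.Mod or the Oberon file system (both are assumptions made for illustration; contains_all and files_with_all are hypothetical names):

```python
def contains_all(path, keys):
    """True if the file at path contains every key, in any order."""
    with open(path, "r", errors="replace") as f:
        text = f.read()
    # all() stops at the first missing key: the short-circuited AND.
    return all(key in text for key in keys)

def files_with_all(paths, keys):
    """Return only the paths whose contents include ALL of the keys."""
    return [p for p in paths if contains_all(p, keys)]
```

Each file is listed at most once, so no duplicate-removal pass is needed afterwards. Putting the rarest key first makes most files fail on the first test.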
>
> Related to this: it looks as if "western civilisation" is getting overly
> dependent on google ? The disappearance of the facilities would
> be disastrous for me.
>
> == Chris Glur.
>
> --
> Oberon at inf.ethz.ch mailing list for ETH Oberon and related systems
> https://www.mail.inf.ethz.ch/lists/listinfo/oberon
>
>