[Oberon] Re: report-back on linux-oberon

easlab at absamail.co.za
Mon Jun 5 07:47:38 CEST 2006


> >I also looked at Wirth's compiler-book in pdf, expecting that the text
> >could be extracted.  I see it's just a graphical image.

Koen wrote:
> - I opened the book from http://www.oberon.ethz.ch/books.html in 
> Firefox on WinXP;
> - clicked on the 'Select Tool' in the embedded pdf viewer's toolbar;
> - right clicked somewhere in the text;
> - chose 'Select All' from the context menu
> - pressed <Ctrl-C>
> - opened Plugin Oberon
> - opened a new Oberon Text
> - pressed <Ctrl-V>
> 
> That's it. Should be possible on Linux also?

Well, if you consider the context of the rest of my post, you'll
see what I meant:
] Is pdf economical for images?
] Is OCR of pdf-images done?

Often a *.pdf is just ASCII text with fancy formatting [like *.html].
In that case the original text is easily re-extracted, as with the
copy <Ctrl-C> and paste <Ctrl-V> that you describe, once the data has
been through [the equivalent of the Linux tool] pdf2txt.

But Wirth's compiler-book is an 'analog' image, like a photograph.
So it needs human intelligence or OCR [which will introduce some errors]
to convert it to digital form, e.g. ASCII text.
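To tell which kind of pdf you have before reaching for OCR, a crude test is to look for font objects in the raw file: pages drawn with fonts carry extractable text, while a scanned book usually contains only image objects. This is my own rough heuristic, not a real PDF parser. The function name classify_pdf is hypothetical, and it assumes the object dictionaries are not hidden inside compressed object streams (PDF 1.5 and later); in that case it simply answers 'unknown'.

```python
import re
import sys


def classify_pdf(data: bytes) -> str:
    """Crude guess at whether a PDF carries real text or only page images.

    Heuristic only (hypothetical helper): it searches the raw bytes for
    font and image object markers. Dictionaries stored inside compressed
    object streams are invisible to it, giving 'unknown'.
    """
    # "/Type /Font" but not "/Type /FontDescriptor" (\b stops the match there).
    has_font = re.search(rb"/Type\s*/Font\b", data) is not None
    # Image XObjects, e.g. scanned page bitmaps.
    has_image = re.search(rb"/Subtype\s*/Image\b", data) is not None
    if has_font:
        return "text"        # text drawn with fonts: copy/paste should work
    if has_image:
        return "image-only"  # only pictures of pages: needs OCR
    return "unknown"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(classify_pdf(f.read()))
```

Run it as e.g. `python classify_pdf.py book.pdf` (script name hypothetical). A pdf that reports "text" should give its text to copy/paste or pdf2txt; "image-only" means OCR or a human typist.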

The next question is: since ps/pdf are designed for text, and our
book is only an 'analog image', was pdf a better choice than other
image formats?

PS. Dijkstra [another of the high priests, who died recently] also had
his personal notes published on the net.  They date from the
pre-computing typewriter days [what were those 'wax-papers'/rolling
machines called?].  They were calling for volunteers to transcribe
them [at the University of Texas, as far as I remember].  I think they
initially used OCR and needed editors.

> You get the complete book in Oberon Text format, pretty 
> unreadable though imo because all formatting
> (including indentation) is gone.

I don't know whether the compiler book I'm talking about is on
the net as text, although I'd expect the original 'computer file'
still to be archived somewhere.

Which is the other point: if you/I get text which is not perfect, then
the process of editing/cleaning it, while also colouring sections, really
helps me to absorb it.   After all, the material has only reached its
peak value once it's added to YOUR knowledge.  Which is the top
of the hierarchy?

Brantley Coile wrote:
> If you want to play with the source code from an Oberon text file
> on other systems, like Linux, you can easily convert the files.
> Remove all the bytes at the beginning of the file up to the word MODULE.
> Next, convert all the CRs to NLs.
> Last, either set indentation in your editor to two spaces or just
> expand the tabs to 2 spaces.  (I do the first).
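Brantley's three steps are easy to script outside Oberon. Below is a minimal Python sketch, assuming the first occurrence of the bytes MODULE marks the start of the source and that the remaining bytes can be read as Latin-1 (Oberon's own codes for accented characters will come out wrong). The function name oberon_text_to_ascii is hypothetical; it takes the 'expand the tabs' option, with tab stops every 2 columns like the Unix expand utility.

```python
import sys


def oberon_text_to_ascii(raw: bytes, indent: int = 2) -> str:
    """Convert an Oberon Text source file to plain text (hypothetical helper).

    Step 1: drop the header bytes before the first b"MODULE".
    Step 2: turn Oberon's CR line ends into Unix newlines.
    Step 3: expand tabs, with tab stops every `indent` columns.
    """
    start = raw.find(b"MODULE")
    if start < 0:
        raise ValueError("no 'MODULE' keyword found")
    # Latin-1 maps every byte to a character, so decoding never fails.
    text = raw[start:].decode("latin-1")
    # Handle CR-LF first so it does not become two newlines.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.expandtabs(indent)


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as src:
        converted = oberon_text_to_ascii(src.read())
    with open(sys.argv[2], "w", encoding="latin-1", newline="\n") as dst:
        dst.write(converted)
```

Usage would be e.g. `python oberon2ascii.py Foo.Mod foo.txt` (script name hypothetical). Leading tabs each become exactly two spaces; tabs in the middle of a line jump to the next even column, as `expand -t 2` would.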

Yes, but you don't have to do that 'manually': others have already
written utilities for it, e.g. ET.OpenAscii ^, ET.StoreAscii * ... etc.


== Chris Glur.

