[Oberon] all in one git tree

Mon Dec 28 00:52:42 CET 2020

Hi,

> Thirdly, Git is being used for a huge number of projects for which it
> is overkill. It is like Perl: a sort of Swiss Army Chainsaw with a
> tonne of functionality that most of its users don't need, and for whom
> it merely makes life overcomplicated.

Interestingly, at its core git is rather simple and robust. Whenever a
file is added or changed, the whole file is stored (zlib-compressed) in
the repository (i.e. what you find in the .git directory) in a file
named after the SHA1 checksum of its content plus a short header.
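
To make the naming concrete, here is a minimal Python sketch of how such
a file (git calls it a blob object) gets its name. It assumes the
classic SHA1 object format and a "loose" (not yet packed) object; the
function names are my own and not part of git.

```python
import hashlib
import zlib


def blob_object(content: bytes):
    """Return (object id, bytes as stored on disk) for a file's content."""
    # git hashes a small header ("blob <size>" + NUL) plus the content.
    store = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(store).hexdigest()
    # What lands on disk is the zlib-compressed header + content.
    return object_id, zlib.compress(store)


def object_path(object_id: str) -> str:
    """Loose objects live under .git/objects/<first 2 hex chars>/<other 38>."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid, stored = blob_object(b"hello world\n")
print(oid)
print(object_path(oid))
```

The id printed here should match what git hash-object reports for a
file with the same content.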

For every commit typically two more files are created:

  * one which maps the location of all files, i.e. their original path
    and file name, to the SHA1 checksum of the file stored in the
    repository. This is called a tree file and its name is again the
    SHA1 hash of its content.
  * one which contains the hash of the above-mentioned tree file, the
    hash of the preceding commit file(s), and useful information like
    the date, the name of the person who made the commit, a commit
    message etc. This is called a commit file and its name is again the
    SHA1 hash of its content.
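
The two files described above can be sketched in Python as follows.
This is a simplified illustration, not git's own code: it assumes a
single directory containing only regular (non-executable) files, and
the author string and timestamp are arbitrary example values.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA1 over '<kind> <size>' + NUL + body, which is how objects are named."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict) -> bytes:
    """entries maps file name -> blob id (hex). One flat directory only."""
    body = b""
    for name in sorted(entries):
        # Each entry: mode, space, name, NUL, then the 20 raw bytes of the id.
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(entries[name])
    return body


def commit_body(tree_id, parent_ids, who, when, message) -> bytes:
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines += [f"author {who} {when}", f"committer {who} {when}", "", message]
    return ("\n".join(lines) + "\n").encode()


blob_id = git_object_id("blob", b"hello world\n")
tree_id = git_object_id("tree", tree_body({"hello.txt": blob_id}))
commit = commit_body(tree_id, [], "A U Thor <author@example.com>",
                     "1609113162 +0100", "first commit")
print(tree_id)
print(git_object_id("commit", commit))
```

Note that the commit only stores the hash of the tree, and the tree only
stores the hashes of the files, so the commit hash indirectly covers the
complete content.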

A branch is represented by a file having the branch name as file name 
and holding just the SHA1 hash of the most recent commit file.
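
As a small sketch of that, assuming the branch is stored as a plain file
under refs/heads (git can also collect branches in a single packed-refs
file, which this ignores), reading and moving a branch is just file I/O.
The helper names are hypothetical.

```python
import os
import tempfile


def write_branch(git_dir, branch, commit_id):
    """Point the branch at a commit by writing its hash into the branch file."""
    path = os.path.join(git_dir, "refs", "heads", branch)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(commit_id + "\n")


def read_branch(git_dir, branch):
    """Return the commit hash the branch currently points to."""
    with open(os.path.join(git_dir, "refs", "heads", branch)) as f:
        return f.read().strip()


with tempfile.TemporaryDirectory() as git_dir:
    # Made-up commit hash, for illustration only.
    write_branch(git_dir, "master", "0123456789abcdef0123456789abcdef01234567")
    tip = read_branch(git_dir, "master")
    print(tip)
```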

Thus if e.g. you modify two files in the top-level directory and then
commit, four new files are created in the repository:

  * Two files having the complete new content of the two files you have
    changed.
  * A tree file which is basically a copy of the last tree file except
    for the two entries pertaining to your two changed files, which now
    carry the new checksums.
  * A commit file which refers to the new tree file and has a reference
    to the last commit file.
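
The count of four can be checked with a toy in-memory object store.
This sketch is a simplification under stated assumptions: a flat
directory, a Python dict instead of the .git directory, and commit
files without the author and date lines a real commit has.

```python
import hashlib


def put(store, kind, body):
    """Store an object under its SHA1 name; identical content is stored once."""
    data = f"{kind} {len(body)}".encode() + b"\0" + body
    object_id = hashlib.sha1(data).hexdigest()
    store[object_id] = data
    return object_id


def commit(store, files, parent, message):
    """Snapshot all files; return the id of the new commit object."""
    entries = {name: put(store, "blob", content) for name, content in files.items()}
    tree = b"".join(
        b"100644 " + name.encode() + b"\0" + bytes.fromhex(entries[name])
        for name in sorted(entries)
    )
    text = f"tree {put(store, 'tree', tree)}\n"
    if parent:
        text += f"parent {parent}\n"
    text += f"\n{message}\n"
    return put(store, "commit", text.encode())


store = {}
files = {"a.txt": b"one\n", "b.txt": b"two\n", "c.txt": b"three\n"}
first = commit(store, files, None, "first")
before = set(store)

files["a.txt"] = b"one, changed\n"
files["b.txt"] = b"two, changed\n"
second = commit(store, files, first, "second")

new_objects = set(store) - before
new_kinds = sorted(store[o].split(b" ", 1)[0].decode() for o in new_objects)
print(len(new_objects), new_kinds)
```

The unchanged c.txt is not stored again, because its content hashes to
the name of an object that already exists.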

As a side note, I found it very insightful to do an exercise where I
created an empty git repository with git init /tmp/X and looked at all
the files created (viewing the content either with a hex editor or with
git cat-file -p), then ran git add on a file and looked at what had
changed, then did a commit, and so on, to understand what the commands
do. Actually seeing what the git commands I use do behind the scenes
helped me a lot to understand how to use them properly.
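
For this kind of exploration it helps to know that the object files are
zlib-compressed, so a hex editor shows compressed bytes. Below is a
rough Python sketch of reading one loose object, similar in spirit to
git cat-file. So that it runs without git, the demo first writes a blob
into a temporary directory using a hypothetical helper; on a real
repository you would pass the path of its .git directory instead.

```python
import hashlib
import os
import tempfile
import zlib


def read_loose_object(git_dir, object_id):
    """Return (kind, body) of a loose (unpacked) object."""
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, body = raw.partition(b"\0")
    kind, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind.decode(), body


def write_loose_blob(git_dir, content):
    """Hypothetical demo helper: write a blob object file in git's layout."""
    raw = b"blob " + str(len(content)).encode() + b"\0" + content
    object_id = hashlib.sha1(raw).hexdigest()
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(raw))
    return object_id


with tempfile.TemporaryDirectory() as git_dir:
    oid = write_loose_blob(git_dir, b"hello world\n")
    kind, body = read_loose_object(git_dir, oid)
    print(oid, kind, body)
```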

With regard to something like Oberon Text files, or non-plain-text files
in general: keeping the content in the repository just to have snapshots
of individual states of these files over time works perfectly with git.

It starts to get tricky once you want to see what changed between two
commits. git knows which files have changed, and for each file it knows
the old and the new content, but it then needs a way to show the user
what has changed. That is an easy task for a plain text file (using the
unix diff command) but more difficult for anything else. First, the diff
command needs to understand a more complex file format; furthermore, it
needs to be able to present the change to the user in an understandable
way. It also depends a lot on what the user is interested in seeing. Is
it ok to just show that a statement was inserted, or does the user also
want to see that an otherwise unchanged statement is now set in a
different font?
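
To illustrate the easy case, here is a small Python sketch using the
standard difflib module as a stand-in for the unix diff command. The
file name and the Oberon snippet are made up for the example. For an
opaque binary format, the only thing a generic tool can say without
understanding the format is whether the bytes differ.

```python
import difflib

old = ["MODULE Hello;\n", "BEGIN\n", "END Hello.\n"]
new = ["MODULE Hello;\n", "BEGIN\n", '  Out.String("hi");\n', "END Hello.\n"]

# Line-based diff: works because plain text has an obvious unit (the line).
diff = list(difflib.unified_diff(old, new,
                                 fromfile="a/Hello.Mod", tofile="b/Hello.Mod"))
print("".join(diff))

# Two made-up binary blobs: without format knowledge we only learn "different".
old_bytes = bytes([0xF0, 0x01, 0x00, 0x10])
new_bytes = bytes([0xF0, 0x01, 0x00, 0x11])
binary_changed = old_bytes != new_bytes
print("binary files differ" if binary_changed else "binary files identical")
```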

It gets even more difficult once you consider merging changes made on
one branch onto another one. Not only do you need a tool that can create
the difference between two commits, you additionally need a tool that
can detect whether the change can easily/automatically be applied to a
target file, and in that case knows how to apply the change. When it
detects that a change cannot be applied automatically, it needs a way to
present the issue to the user, so the user can see and understand what
needs to be done. Finally, it needs a way for the user to give back
their decision on what the outcome of the merge shall be.
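
To show what "can be applied automatically" means for plain text, here
is a deliberately naive three-way merge in Python. It is not git's merge
algorithm: it assumes all three versions have the same number of lines
(lines are edited in place, never inserted or deleted), and the function
name is hypothetical. It merges when only one side changed a line and
reports a conflict when both sides changed it differently.

```python
def merge_lines(base, ours, theirs):
    """Toy three-way merge; returns (merged lines, indexes of conflicts)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:            # both sides agree (changed or not)
            merged.append(o)
        elif o == b:          # only theirs changed this line
            merged.append(t)
        elif t == b:          # only ours changed this line
            merged.append(o)
        else:                 # both changed it differently: ask the user
            conflicts.append(i)
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged, conflicts


base = ["a", "b", "c"]
clean, no_conflicts = merge_lines(base, ["a", "B", "c"], ["a", "b", "C"])
print(clean, no_conflicts)

marked, conflicts = merge_lines(base, ["A", "b", "c"], ["X", "b", "c"])
print(marked, conflicts)
```

For a format like Oberon Text there is no such obvious unit as the line,
and no obvious way to show the conflict markers to the user.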

By choosing to (properly) support plain text files only, you avoid these 
difficulties and that is what git does, given that its first use case 
was plain text files anyway.

claudio