[dev] [RFC] Design of a vim like text editor

From: Marc André Tanner <mat_AT_brain-dump.org>





-- Marc André Tanner >< http://www.brain-dump.org/ >< GPG key: CF7D56C0

Received on Sat Sep 13 2014 - 16:01:15 CEST

TLDR: I'm writing an experimental but (hopefully) highly efficient vim-like
text editor based on a piece chain data structure. You will find a URL to
a git repository at the end of this rather long mail.

Help welcome!

Why another text editor?
========================

It all started when I was recently reading the excellent Project Oberon[0],
where in chapter 5 a data structure for managing text is introduced.
I found this rather appealing and wanted to see how it works in practice.

After some time I decided that besides just having fun hacking around I
might as well build something which could (at least in the long run)
replace my current editor of choice: vim.

This should be accomplished by a reasonable amount of clean (your mileage
may vary), modern and legacy free C code. Certainly not an old, 500'000
lines[1] long, #ifdef cluttered mess which tries to run on all broken
systems ever envisioned by mankind.

Admittedly vim has a lot of functionality, most of which I don't use. I
therefore set out with the following main goals:

 - Unicode aware
 - binary clean
 - handle arbitrary files (this includes large ones, think >100M SQL-dumps)
 - unlimited undo/redo support
 - syntax highlighting
 - regex search (and replace)
 - multiple file/window support
 - extensible and configurable through familiar config.def.h mechanism

The goal could thus be summarized as "80% of vim's features (in other
words the useful ones) implemented in roughly 1% of the code".

Finally and most importantly it is fun! Writing a text editor presents
some interesting challenges and design decisions, some of which are
explained below.

Text management using a piece table/chain
=========================================

The core of this editor is a persistent data structure called a piece
table which supports all modifications in O(m), where m is the number
of non-consecutive editing operations.
This bound could be further improved to O(log m) by use of a balanced
search tree, however the additional complexity doesn't seem to be worth
it, for now.

The actual data is stored in buffers which are strictly append only.
There are two types of buffers, a fixed-size one for the original file
content and append-only ones for all modifications.

A text, i.e. a sequence of bytes, is represented as a double linked
list of pieces each with a pointer into a buffer and an associated
length. Pieces are never deleted but instead always kept around for
redo/undo support. A span is a range of pieces, consisting of a start
and end piece. Changes to the text are always performed by swapping
out an existing, possibly empty, span with a new one.

An empty document is represented by two special sentinel pieces which
always exist:

 /-+ --> +-\
 | |     | |
 \-+ <-- +-/

[...]

 /-+ --> +-----------------+ --> +-\
 | |     | I am an editor! |     | |
 \-+ <-- +-----------------+ <-- +-/

[...]

 /-+ --> +---------------+ --> +----------------+ --> +--+ --> +-\
 | |     | I am an editor|     |which sucks less|     |! |     | |
 \-+ <-- +---------------+ <-- +----------------+ <-- +--+ <-- +-/

[...]

 /-+ --> +-----+ --> +--+ --> +-\
 | |     | I am|     |! |     | |
 \-+ <-- +-----+ <-- +--+ <-- +-/

[...]

 > (shift-right), < (shift-left)

those depend on handling of indention tabs <-> spaces

Movements
---------

 h        (char left)
 l        (char right)
 j        (line down)
 k        (line up)
 0        (start of line)
 ^        (first non-blank of line)
 g_       (last non-blank of line)
 $        (end of line)
 %        (match bracket)
 b        (previous start of a word)
 w        (next start of a word)
 e        (next end of a word)
 ge       (previous end of a word)
 {        (previous paragraph)
 }        (next paragraph)
 (        (previous sentence)
 )        (next sentence)
 gg       (begin of file)
 G        (goto line or end of file)
 |        (goto column)
 n        (repeat last search forward)
 N        (repeat last search backwards)
 f{char}  (to next occurrence of char to the right)
 t{char}  (till before next occurrence of char to the right)
 F{char}  (to next occurrence of char to the left)
 T{char}  (till before next occurrence of char to the left)
 /{text}  (to next match of text in forward direction)
 ?{text}  (to next match of text in backward direction)

There is currently no distinction between what vim calls a WORD and
a word, only the former is implemented. Though infrastructure for
the latter also exists.

The semantics of a paragraph and a sentence is also not always 100%
the same as in vim.

Some of these commands do not work as in vim when prefixed with a
digit i.e. a multiplier. As an example 3$ should move to the end
of the 3rd line down. The way it currently behaves is that the first
movement places the cursor at the end of the current line and the last
two have thus no effect.

In general there are still a lot of improvements to be made in the
case movements are forced to be line or character wise.
Also some of them should be inclusive in some context and exclusive in
others. At the moment they always behave the same.

Text objects
------------

All of the following text objects are implemented in an inner variant
(prefixed with 'i') and a normal variant (prefixed with 'a'):

 w  word
 s  sentence
 p  paragraph
 [,], (,), {,}, <,>, ", ', `  block enclosed by these symbols

For word, sentence and paragraph there is no difference between the
inner and normal variants.

Modes
-----

At the moment there exists a more or less functional insert, replace
and character wise visual mode.

A line wise visual mode is planned.

Marks
-----

Only the 26 lower case marks [a-z] are supported. No marks across files
are supported. Marks are not preserved over editing sessions.

Registers
---------

Only the 26 lower case registers [a-z] and 1 additional default register
are supported.

Undo/Redo and Repeat
--------------------

The text is currently snapshotted whenever an operator is completed as
well as when insert or replace mode is left. Additionally a snapshot
is also taken if in insert or replace mode a certain idle time elapses.

Another idea is to snapshot based on the distance between two consecutive
editing operations (as they are likely unrelated and thus should be
individually reversible).

The repeat command '.' currently only works for operators. This for
example means that inserting text can not be repeated (i.e. inserted
again).
The same restriction also applies to commands which are not
implemented in terms of operators, such as 'o', 'O', 'J' etc.

Command line prompt
-------------------

At the ':'-command prompt only the following commands are recognized:

 :nnn     go to line nnn
 :edit    replace current file with a new one or reload it from disk
 :open    open a new window
 :qall    close all windows, exit editor
 :quit    close currently focused window
 :read    insert content of another file at current cursor position
 :split   split window horizontally
 :vsplit  split window vertically
 :wq      write changes then close window
 :write   write current buffer content to file

The substitute command is recognized but not yet implemented. The '!'
command to filter text through an external program is also planned.
At some point the range syntax should probably also be supported.

History support, tab completion and wildcard expansion are other
worthwhile features.

Tab <-> Space
-------------

Currently there is no expand tab functionality i.e. tabs are always
inserted as is. For me personally this is no problem at all. Tabs
should be used for indention! That way everybody can configure their
preferred tab width whereas spaces should only be used for alignment.

Jump list and change list
-------------------------

Neither the jump list nor the change list is currently supported.

Mouse support
-------------

The mouse is currently not used at all.

Other features
--------------

Other things I would like to add in the long term are:

 + code completion: this should be done as an external process. I will
   have to take a look at the tools from the llvm / clang project.
   Maybe dvtm's terminal emulation support could be reused to display an
   slmenu inside the editor at the cursor position?

 + something similar to vim's quick fix functionality

Stuff which vim does which I don't use and have no plans to add:

 - GUIs (neither x11, motif, gtk, win32 ...)
 - text folding
 - visual block mode
 - plugins (certainly not vimscript, if anything it should be lua based)
 - runtime key bindings
 - right-to-left text
 - tabs (as in multiple workspaces)
 - ex mode
 - macro recording

How to help?
------------

At this point it might be best to fetch the code, edit some scratch file,
notice an odd behavior or missing functionality, write and submit a patch
for it, then iterate.

WARNING: There are probably still some bugs left which could corrupt your
         unsaved changes. Use at your own risk. At this point I suggest to
         only edit non-critical files which are under version control and
         thus easily recoverable!

 git clone git://repo.or.cz/vis.git

A quick overview over the code structure to get you started:

 config.def.h       definition of key bindings, commands, syntax highlighting etc.
 vis.c              vi(m) specific editor frontend, program entry point
 editor.[ch]        screen / window / statusbar / command prompt management
 window.[ch]        window drawing / syntax highlighting / cursor placement
 text-motions.[ch]  movement functions take a file position and return a new one
 text-objects.[ch]  functions take a file position and return a file range
 text.[ch]          low level text / marks / {un,re}do / piece table implementation

Hope this gets the interested people started. Feel free to ask questions
if something is unclear! There are still a lot of bugs left to fix, but
by now I'm fairly sure that the general concept should work.

As always, comments and patches welcome!

Cheers,
Marc

[0] http://www.inf.ethz.ch/personal/wirth/ProjectOberon/
[1] https://www.openhub.net/p/vim
[2] http://swtch.com/~rsc/regexp/
[3] http://lists.suckless.org/dev/1408/23219.html
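
Appendix: to make the span swapping from the piece table section a bit
more concrete, here is a minimal sketch in C. It is not taken from text.c.
All names (Piece, Span, Change, Text, text_insert, text_undo, ...) are
made up for illustration, the "buffers" are whatever memory the caller
passes in, and nothing is ever freed. It only covers insertion and
assumes changes are undone in reverse order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical; this is not the layout used in text.c. */
typedef struct Piece Piece;
struct Piece {
	Piece *prev, *next;  /* neighbours in the double linked list */
	const char *data;    /* points into a buffer, never copied */
	size_t len;
};

/* A range of pieces; start == NULL denotes the empty span. */
typedef struct { Piece *start, *end; size_t len; } Span;

/* What one edit did: which span was swapped out for which. */
typedef struct { Span old_span, new_span; } Change;

/* The sentinels live inside Text, so a Text must not be copied around. */
typedef struct { Piece begin, end; size_t size; } Text;

static Piece *piece_new(const char *data, size_t len) {
	Piece *p = calloc(1, sizeof *p);  /* never freed: kept for undo/redo */
	assert(p);
	p->data = data;
	p->len = len;
	return p;
}

/* Swap out `out` for `in`. Pieces of both spans keep their own outward
 * pointers, which is what makes the reverse swap (undo) possible. */
static void span_swap(Text *t, const Span *out, const Span *in) {
	if (in->start) {            /* relink the neighbours to the new span */
		in->start->prev->next = in->start;
		in->end->next->prev = in->end;
	} else if (out->start) {    /* nothing comes in: bridge the gap */
		out->start->prev->next = out->end->next;
		out->end->next->prev = out->start->prev;
	}
	t->size = t->size - out->len + in->len;
}

void text_init(Text *t, const char *content, size_t len) {
	memset(t, 0, sizeof *t);
	t->begin.next = &t->end;
	t->end.prev = &t->begin;
	if (len > 0) {
		Piece *p = piece_new(content, len);
		p->prev = &t->begin;
		p->next = &t->end;
		Span none = { NULL, NULL, 0 }, all = { p, p, len };
		span_swap(t, &none, &all);
	}
}

/* Insert len bytes at byte offset pos; data must outlive the text. */
Change text_insert(Text *t, size_t pos, const char *data, size_t len) {
	assert(pos <= t->size && len > 0);
	Piece *p = t->begin.next;
	size_t off = pos;
	while (p != &t->end && off >= p->len) {  /* O(m) walk over the pieces */
		off -= p->len;
		p = p->next;
	}
	Piece *mid = piece_new(data, len);
	Change c = { { NULL, NULL, 0 }, { mid, mid, len } };
	if (off == 0) {  /* piece boundary: an empty span is swapped out */
		mid->prev = p->prev;
		mid->next = p;
	} else {         /* inside p: p is swapped for before + mid + after */
		Piece *before = piece_new(p->data, off);
		Piece *after = piece_new(p->data + off, p->len - off);
		before->prev = p->prev; before->next = mid;
		mid->prev = before;     mid->next = after;
		after->prev = mid;      after->next = p->next;
		c.old_span = (Span){ p, p, p->len };
		c.new_span = (Span){ before, after, p->len + len };
	}
	span_swap(t, &c.old_span, &c.new_span);
	return c;
}

/* Undo and redo are the same swap in opposite directions. Changes must
 * be undone in reverse order of how they were made. */
void text_undo(Text *t, const Change *c) { span_swap(t, &c->new_span, &c->old_span); }
void text_redo(Text *t, const Change *c) { span_swap(t, &c->old_span, &c->new_span); }

/* Copy the text NUL terminated into out (capacity cap); returns the size. */
size_t text_copy(const Text *t, char *out, size_t cap) {
	size_t n = 0;
	for (const Piece *p = t->begin.next; p != &t->end; p = p->next)
		for (size_t i = 0; i < p->len && n + 1 < cap; i++)
			out[n++] = p->data[i];
	if (cap > 0)
		out[n] = '\0';
	return t->size;
}
```

With a Text initialised from "I am an editor!", calling
text_insert(&t, 14, " which sucks less", 17) swaps the single original
piece for three new ones ("I am an editor", " which sucks less", "!").
Passing the returned Change to text_undo() links the original piece back
in, and text_redo() swaps the three pieces in again. Since the old piece
keeps its own prev/next pointers, neither direction needs to search the
list.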