Mizar project wishlist

(originally a note submitted to the 30 Years of Mizar MKM 2004 Affiliated Workshop - http://merak.pb.bialystok.pl/mkm2004

My main wishes for Mizar as a project are in this note.

Go Big

MML now contains 855 articles by about 150 authors, it grows at cca 50-100 articles a year. Wikipedia http://en.wikipedia.org has now over 300 000 articles accumulated over 3 years by about 6000 active volunteers. In 2002 Debian GNU/Linux had about 9000 packages working together on 11 platforms and about 1000 active developers. In 1995 the numbers were 250 packages, 60 developers and 2 platforms http://telemetrybox.org/tokyo

In the internet age, very large projects can be done very quickly, and not in 300 years as Piotr Rudnicki once estimated. The essential part of the trick is to get volunteers interested. To succeed in that, you have to give something back to the volunteers. In the internet-driven collaborative projects, it is the satisfaction (pride, egoboo) from doing a useful, technically or scientifically challenging work owned by all the authors, and discussing, justifying and taking the decisions about how the work should be done.

The construction of MML lacks these properties. My opinion is, that if it continues to lack these properties for longer time, it will be overrun by another formalization project, despite its initial advantage.

Licensing

My attempts to find out what the Mizar and MML license is have been unsuccessful. I doubt that anyone else knows. MML should be released under GPL http://www.gnu.org/copyleft/gpl.html or under the GFDL http://www.gnu.org/copyleft/fdl.html , maybe under both. The software should be GPL. I doubt that there is some strong intention leading to the current obscure licensing situation, it is rather the laziness to deal with it. If there is any fear of ``theft'' of MML or Mizar (I have heard such words quite often) the lack of clear license is the best way to make such a thing possible. On the other hand, very large business corporations today rely on GPL and employ expensive lawyers to fight for it.

Open Source

Even if Mizar somehow succeeded in formalizing large part of mathematics without going open-source, it would only stimulate creation of open-source substitutes. The recent history of useful software systems is full of such examples. Mizar as a piece of software is complicated (sometimes quite unnecessarily), but not as complicated to survive a larger reverse-engineering effort. My own work with Mizar started by reverse-engineering it, recently e.g. Paul Cairns and Jeremy Gow tried that too.

Hiding the source of a strictly academic and very hardly commercial project that tries to be collaborative, is in my opinion nonsense, or at least very uninformed attitude. Many people explicitely refuse to contribute any work to a project dependent on some secret part, and I (and anyone else who reads the Mizar forum) already know of several people who have so far refused Mizar for exactly that reason. All the ``make the world better'' motivation of the volunteers is safely stopped by that.

There are people who want to do interesting research on MML, which could eventually lead to easier searching and formalization. Just in the last two years I have heard about the team of Andrea Asperti, Michael Kohlhase, Paul Cairns and Jeremy Gow, Hazel Duncan. The Mizar sources are the only real documentation of the system, and without them serious work with the MML is almost impossible.

Leadership

Even if Mizar and MML are GPL-ed in some near future, it does not mean that automatically there will be hundreds of new contributors. Each successful collaborative project started from small, and the strength of the project idea and the communication and managing skills of the project founders decided about its growth. While I have no doubts about the strength and attractiveness of the idea of formalization of mathematics, I think there could (and should) be a lot of improvement when it comes to discussing and managing the project. As said above, the volunteers grouped around the successful projects are usually self-motivated and working out of their interest and for their fun, and definitely not to suffer any excessive dictatorship of the leaders, lack of enthusiasm for their own ideas, or simply lack of communication. Mizar is still very far from being the ultimate system for formalization of mathematics, and it needs a lot of new good ideas and experimenting, much faster and parallel than is the current progress. People should be allowed to experiment with the software, to create new tools for transforming MML, to suggest and implement changes and even fork the system or the MML if necessary. All these things should be discussed in a supportive way and possibly in real time, not with one week delay. The current practice is very far from that.

Modularization

The Mizar verifier and especially its checker part is now quite a monolithic piece of software. One way how to make more experimenting and improvements possible is to make it more modular. It should be as easy as possible to add some semantic strengthening or some syntactic sugar or support for new constructs. There should be very clear semantic interface for such new modules or plugins. E.g. if some new computer algebra module is added, it should be possible to print the steps it does and their standard Mizar justification, and to check those detailed steps without the module.

Gray zone

I do not agree with the idea that every theorem in MML has to be proved. It delays formalization of many mathematical fields and excludes many mathematicians that would be otherwise interested. I would prefer a more ``gap-filling'' approach. After all, anyone who has ever written some formalization very likely used that method, and Mizar is quite suitable for it.

Yes, it is a bit dangerous, but not much. First of all, Mizar does not protect from writing nonsense anyway. It is not hard to find in MML a lot of nonsense. Second, if there is an error, the computer processing very often allows an easy fix. In CARD\_FIL, I have first defined GCH for the type Cardinal instead of Aleph (infinite Cardinal), and thus it was simply false. When I noticed it about a year later and changed Cardinal to Aleph, everything worked without the necessity of any further changes. From time to time a bug is found in Mizar and sometimes such a bug is really abused in some articles. Again, it is usually quite easy to fix such misuse. Even if some proof relies on a fact which is not true, the proof is usually still valuable - it is simply a proof of something else, and it is still a piece of formalized mathematics that can be reused.

Refactoring of MML

It should be much more common and easier to edit existing MML articles. The complete MML (i.e. with proofs) should be published on the web in a version system, with very easy editing access and with the possibility of immediate online verification of the changes. The online verification should also immediately create the hyperlinks to symbol definitions, theorems etc., and announce whether something got possibly broken in later articles.

The current habit of ``one article or nothing'' is unnecessarily demanding and creating quite artificial barrier for people who just spotted a possible improvement in MML or want to add only a smaller bit. There is now a very small group doing any of these things, without the discipline imposed by the version system and common peer review, and this is simply unjust and again deterring other possible self-motivated editors.

This facility would be also very useful for the ``Gray zone'' formalization. While there will be people who are good at introducing formal definitions in an advantageous way and sketching whole theories in the defined notions, the task of filling the missing proofs can be taken up by others, sometimes e.g. as a student exercise or a task for automated theorem provers.

Better dependence resolving

The MML processing is now in some sense very fragile. One added definition in an article in the middle of the MML can cause that all the rest of MML has to be regenerated. This is bad enough for the revisions that are being done today, and would be very bad for the online refactoring suggested above.

Mizar notation is overloaded, and that is often useful, because short mathematical symbols are a scarce resource. But there is no good way how to refer in an absolute way to the Mizar functors, predicates and other constructors. This is bad not just for all kinds of semantic presentation and searching systems and semantic communication with other systems, but also for authoring articles in more advanced domains. In such articles clashes of notation become more frequent, and it is increasingly hard for the users to decipher the meaning of their text.

The best answer that can be now given to the question ``Which + is this?'' is something like ``It is the fourth functor defined in article FOO''. This kind of answer is actually used often by the Mizar system, and it causes a lot of the processing fragility (the numbering changes when something is added, moved to other article or removed completely). Many overloaded programming languages create the absolute names in smarter ways, e.g. by appending the function name with the names of parameter types, parent object (if it is a method) and e.g. the name of the module. This allows also smarter rebuilding and linking when some module is changed.

It probably would not be too difficult to create the absolute names in a similar automated way in Mizar, and it would be definitely better than the current naming scheme. My opinion is that formalization of mathematics should go one step further and force the authors to create unique pretty-printing names for the objects that are defined. I believe that good unique mathematical terminology is simply one of the subgoals of the field of formalization of mathematics, and the fact that Mizar does not care for this is in fact rather surprising.

Many other proof assistants already have some simple pragmas for includingtext. Many programming languages have that capability too. Mizar has now a complicated translator towhich is not directly usable by the authors and it omits all the comments. This can and should be simplified, so that the users could watch theoutput immediately after processing, and improve by hand its ugly parts.

First, simple pragma denoting thepart in the Mizar text should be added. If we really believe in formalization of mathematics, we should make e.g. complete textbooks based on formalized mathematics possible. Yes, part of the additional textbook syntax could and should be captured by new Mizar keywords, e.g. ``example'', ``exercise'', ``corollary'', etc. But it is fairly hard to imagine formalization of e.g. historical notes in the textbooks.

Second, Mizar should really try to serve as a semantic anchor even for articles which do not attempt full-scale formalization. A goal which is not unrealistic, provided the ``Gray zone'' idea gets some support, is to have in some short time the possibility to express the main results of normal mathematical articles in the formalized language. This alone would be of great importance to normal mathematics, it would be a big step towards semantic searching and classification of normal mathematics. Our goal should be that such a description becomes wide-spread, e.g. in the same way as is now the ``keywords'' annotation of articles. Since we should make such annotation as easy as possible, we definitely should forget about forcing such mathematicians into the ``environment creation hell'' which is one of the sore points of formalization in current Mizar. This is another reason, why we should provide the unique names for Mizar objects and guarantee their stability as much as possible.

Conclusion

I have described what could (in my opinion) be done with Mizar, so that it became the wide-scale collaborative effort leading to the formalization of mathematics. The ideas are a bit high-level and some of them can probably be applied also to other proof assistants, my lower-level wishes both for the Mizar software and MML are not here. There are not many new ideas here, most of them are just copied from the successful internet collaborative projects that have arisen during the last ten years and surprised the world with their vitality and power. There will obviously be the objection that it is too much work for the small Mizar team. The readers who think so should read this article again and try to understand that the main idea (practically working in many other projects) is to involve more enthusiasts on democratic principles, so that they could identify with that work and help to do it.

Obviously, this does not come for free. Fair cooperation and communication with others is the key to success in large collaborative efforts. To put it in a very straightforward way, Mizar has two options. The first is to remain to be the project solely and rigidly steered by the Bialystok Mizar team, advancing at a pace that makes prof. Rudnicki's estimate more than justified. All the glory, grants, awards and also responsibility for the project will belong to that team. The second option is to give Mizar and MML to the rest of the world, so that they could play with it, experiment with it, contribute to it and participate in making decisions about it. Yes, it is possible (and probable) that someone else would eventually do major contributions to the Mizar system or MML, and also get some of the Mizar based grants and the fame for doing useful and important work. In my opinion, letting others to build on one's (scientific) results is not just a extremely successful feature of the last-decade's internet collaborative projects, it should be the basic creed of all true scientists.

-- JosefUrban - 21 Aug 2004