2011-12-21

The real daily quota of a programmer's working hours

In 2009-2010 I had some experience working remotely as a programmer with hourly billing. Since then I have periodically wondered how many real hours of work on code per day a programmer should produce on average. In fact, this question, or rather its consequences, is relevant for any work format: office, remote, and everything in between. In the office setting a fixed amount of time is paid for, and the task is to optimize the real working time within that interval (flexible schedule, fewer meetings, and so on). The problem with the freelance setting is that most clients don't understand that they have to pay not only for the time actually spent on code, but also for the accompanying activities, such as picking one's nose, and/or they don't know what the ratio of those activities to actual programming should be.

And then, the other day on Hacker News, I came across an excellent article by a professional writer about how she increased her productivity fivefold (from 2 to 10 thousand words a day). Essentially, she achieved this by developing a method of systematically entering the flow state. And as all of us creative workers know, flow is the key to effective work*. But what was really important for me in that article is that the person who achieved such productivity stated a plain fact: on average she gets 4 hours of actual uninterrupted work a day (up to 7 in the best cases), plus 2 hours in the morning spent on preparatory work. And if she tries to produce more, productivity starts to drop. Not to mention the risk of burning out.

This observation matches my own experience: I get the same number (4-6) of hours of effective coding a day. There is nothing surprising here, because a programmer's work is no different from other creative professions: writer, designer, or composer. So what do we do with the 8-hour workday? Actually, there is nothing wrong with only 50-75% of working time being spent on the work itself: the rest of the time is not wasted either, since it goes to communication (without which any company or project is doomed), as well as to various non-work activities that form that mythical "corporate culture". And even if those activities are absent (freelance), it doesn't mean that: a) the programmer can spend this time working on code; b) the programmer doesn't need them (they are needed, perhaps just in a different form: everyone needs socialization).

So we end up with 2 numbers characterizing a normal programmer: at most 6 hours (on average: 4 hours) of programming a day, and a factor of 1.5 relating total working hours to hours of programming. These numbers should be taken as the input data from which managers can build their assumptions, estimates and methodologies.

Moreover, this view resolves the estimation dilemma for me: should a programmer make estimates at all, and can he?

The time function of program profilers usually reports 2 numbers: real (total) time and cpu time. The first is the actual wall-clock time that passed from the program's start to its completion. It can be affected by factors such as caching, waiting for I/O, and so on. Cpu time is the time the program actually spent executing on the processor. Ideally, it is this second number that a programmer can learn to estimate: the time he will need to write and debug the code while in the "warmed-up" working mode. The total time, which is what actually interests the business, can only be learned to estimate by a manager, who takes into account the programmers' estimates (and their historical accuracy), as well as countless other factors that can affect the programmer's working mode: caching, timely filling of the pipeline, context switching, etc.

* Those who don't know can learn about it very quickly from the person who formulated this concept (or from his classic book Flow).

2011-11-30

Clojure & Complexity

I gave a rather messy lightning talk at the recent ECLM on this topic (see below). I think the messiness can be attributed mostly to my reluctance to criticize anything built with good intentions, including Clojure. Yet in software development there's clearly a need for thorough evaluation of different approaches, languages and technologies, because, we must admit, lots and lots of decisions on such things as architecture or language/platform choice are made on purely subjective and even emotional grounds (see "hype"). So below is a more detailed account of the (excessive) complexities I've encountered working with Clojure in a real-world environment: a half-year project to develop a part of a rather big Java-based server-side system. I should also note that I have been following Clojure almost since its initial public release, participated in the early flames on c.l.l. and even edited a Clojure introduction article in the Russian functional programming journal fprog.ru. But this was only the first chance to make a reality check...

But, first of all, I'd like to refer you to Rich Hickey's talk at the Strange Loop conference, "Simple Made Easy", the principles of which really resonate with me. Yet it's so often the case that it's hard to follow your abstract principles when you're faced with reality (I'm guilty of that too). Another point is that it's not really beneficial to push principles to the extreme, because there's always the other side, and engineering is the art of making trade-offs: if you don't find room for them, the other side will bite you. So the points below basically boil down to these 2 things: examples of "complecting", and "things should be made as simple as possible, but not simpler" (attributed to Einstein).

Interactive development


Lisp is considered a so-called platform language, in the sense that it implies the existence of certain facilities and runtime-environment constraints that form a distinct and complete environment. This is unlike some other languages, usually called scripting languages, which rely on a pre-existing environment, like the OS (e.g. POSIX), a web server, a web browser, etc. Other platform languages are, for example, Java, C# or Erlang, while scripting languages include JavaScript, Perl or PHP. Clojure is a language on a pre-existing platform, the JVM, and so doesn't define its own platform. This is the source of probably the biggest complecting in the language, as it tries to be a stand-alone dynamic, functional language that explicitly discourages the imperative object-oriented style, while the JVM platform is oriented towards static imperative object-oriented languages.

From the dynamic Lisp point of view, a lot of the JVM's facilities are inferior:
  • mostly static runtime image and object system with only partial opportunities for redefining things on-the-fly instead of a fully dynamic one
  • static namespaces (tied to a file system) instead of dynamic packages
  • static exception system instead of a dynamic (restartable) condition system
  • limited and flawed number representation system instead of a full numeric tower
  • a more limited calling convention (only positional parameters and the absence of multiple return values)
  • more limited naming scheme
  • XML-based build facilities instead of eDSL-based ones - although here Clojure could provide its own option, the currently existing one, Leiningen, is a weird beast of its own (for example, you can get the CLASSPATH out of it, but can't put anything in except through project.clj, which has a lot of limitations); see the sketch right after this list
  • absence of tail call optimization
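
To make the Leiningen point concrete, here is a minimal made-up project.clj sketch (assuming Leiningen 2-style keys): everything that should end up on the CLASSPATH has to be declared here, while lein classpath only lets you read the result back out.

  ;; hypothetical project mixing Clojure and Java sources
  (defproject example-server "0.1.0-SNAPSHOT"
    :dependencies [[org.clojure/clojure "1.3.0"]]
    :source-paths ["src/clj"]
    :java-source-paths ["src/java"]
    :resource-paths ["resources" "config"])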

Surely, there are also some current advantages of the Java platform:
  • availability of a concurrent GC (although it's not the default one)
  • good JIT-optimizing compiler
  • and what's most important, larger amount of available tools and libraries

Yet, if we return to the top of the list of shortfalls, we can see why interactive development in Clojure is much less productive than in Lisp. What adds to it is that Clojure uses a one-pass compiler (not very modern).

Going into more detail on this would take a whole separate post, so I'll just sum up: interactive development in Clojure is hampered both by the JVM sticking out in different places (especially if you work on projects combining Clojure and Java code) and by Clojure's own misfeatures.

Syntax


From its early days Clojure was positioned as a "modern" Lisp. And what this "modernity" actually implied is:
  • more accessible syntax and broader support for vectors and maps, as opposed to Common Lisp, in which, allegedly, only lists are first-class citizens
  • built-in concurrency primitives
  • being Lisp-1 instead of Lisp-2, which makes heavy functional style programming more concise
  • cleaning up some minor "annoyances": 4 equality operators, interning reader, etc.

Starting from the bottom of the list: the minor issues do make it harder to get started, but they are actually conceptually simple and useful once you get accustomed to them. Lisp-1 vs Lisp-2 is a matter of taste: for example, I prefer #', because it's an annotation, while others perceive it as noise. And there's no objective advantage of one over the other: yes, Lisp-1 makes something like (<map> <key>) instead of (gethash <map> <key>) possible, yet it makes macros more complicated. Concurrency I'll discuss separately.
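
For illustration, a tiny made-up sketch of the map-as-function calling style in Clojure:

  (def color->hex {:red "#ff0000" :green "#00ff00"})

  (color->hex :red)    ;=> "#ff0000"
  (:green color->hex)  ;=> "#00ff00" (keywords are callable too)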

What's left is broader support for vectors and maps, including destructuring. I agree that declarative syntax for common datastructures is crucial for productive use of any language, up to the point of defining the same literal syntax ({}) for hash-tables in CL. Thankfully, that is supported by the language, so this syntax is as first-class in CL as it is in Clojure, and, as in many aspects, nothing prevents "modernizing" Lisp in this respect without creating a whole separate language... The reverse doesn't hold for Clojure, as it doesn't have facilities to control the reader the way CL does: actually, in this regard Clojure is very different from Lisp, as it hardly provides facilities for controlling any aspect of the language, and this control is a critical part of Lisp's DNA.

And pushing syntax to the extreme has its shortcomings in Clojure as well. Rich argues that defining both lists and vectors with parens (in Lisp a list is '() and a vector is #()) is complecting. But I'd say that a much harder case of complecting is this:
)))))))])))))])]) — the joy of debugging ASTs

And it's not even a Clojure-specific problem, although here it's even worse, because for some reason that escapes me, let (and defn, and many others) use vectors for argument lists. Aren't they called argument lists for a reason? So this once again actualizes the problem of counting closing parens, effectively solved for Lisp long ago with Emacs.
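
A trivial made-up definition shows the mix in practice: parens for the calls, vectors for the "argument list" and for the let bindings:

  (defn rectangle-area [width height]   ; a vector, not a list, for the parameters
    (let [w (double width)              ; a vector again for the let bindings
          h (double height)]
      (* w h)))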

At the same time, "modern" Clojure poorly supports such things as keyword arguments in functions or multiple return values, and many other not-so-"modern" but very effective facilities that I personally would expect to see in a modern language...

There's only one true way: functional


In my talk I referred to one of the ugliest pieces of code I've ever written, which was a very complicated Clojure loop, i.e. a loop/recur thing (and I honestly tried to factor it out).

Basically, there are two opposite approaches to iteration: imperative looping and functional recursion. Many languages have a strong bias towards one or the other, like Python discouraging recursion and Clojure discouraging imperative loops by imposing immutability. But the thing is that there are problems for which one of the approaches yields by far more concise and understandable code. If you want to traverse a tree, recursion is the way to go. While if you are accumulating several sequences at once, which may reference results obtained in previous computations, and at each iteration there is not one but several outcomes, recursion often becomes too messy. Yet Clojure doesn't even have good support for recursion (which has the advantage of factoring different pieces of code into functions), but a special construct, loop/recur, which shares the downsides of both approaches and hardly provides any of the benefits. That's a pity, as iteration is the basic programming construct and no code file can do without it. And here we see a case of detrimental over-simplification.
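
For a flavor of the construct, here is a much simplified, made-up loop/recur that accumulates two result sequences at once and threads a counter through the iterations (the real one was far messier):

  (defn split-and-number [xs]
    (loop [xs xs, evens [], odds [], i 0]
      (if (empty? xs)
        {:evens evens :odds odds}
        (let [x (first xs)]
          (if (even? x)
            (recur (rest xs) (conj evens x) odds i)
            (recur (rest xs) evens (conj odds [i x]) (inc i)))))))

  ;; (split-and-number [1 2 3 4 5]) ;=> {:evens [2 4], :odds [[0 1] [1 3] [2 5]]}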

And there are also lazy sequences, which complect sequences and streams. In theory, those are the same things, but, as the saying goes, in theory, theory and practice are the same, but in practice... Surely this makes writing a compiler easier at the cost of complicating reasoning about sequences in day-to-day programming.
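
A typical made-up example of where this complecting bites: map is lazy, so a side-effecting "loop" over a sequence silently does nothing unless the result is forced.

  (defn log-all [xs]
    (map println xs))   ; returns an unrealized lazy seq and prints nothing by itself

  ;; (dorun (log-all [1 2 3])) forces the sequence and actually prints;
  ;; at the REPL the bare call only "works" because printing the result realizes it.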

EDIT: as was correctly pointed out by the commentators, you can have a sort of mutable local variables with transients, and that allows expressing some of the imperative loops in a more concise manner.
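
A made-up sketch of what they meant, with a transient vector as the local accumulator:

  (defn squares [n]
    (loop [i 0, acc (transient [])]
      (if (< i n)
        (recur (inc i) (conj! acc (* i i)))
        (persistent! acc))))

  ;; (squares 5) ;=> [0 1 4 9 16]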

Concurrency


In the early days of Clojure, in one of the discussions on c.l.l., S. Madhu (if I remember correctly) called Clojure's concurrency primitives "snake oil". I thought there might be some personal issues behind such an attitude, but having tried it for myself and learned about the alternatives in the concurrency space, I don't think it was too far from reality. First of all, Clojure addresses only shared-state concurrency on a single computer, while most hard concurrency problems appear in the context of distributed systems and are currently solved with approaches like the Actor model or MapReduce. And on a single machine I've seen very few problems that can't be solved with a combination of thread pools (with thread-local storage), Lisp-style special variables and databases. In fact Clojure provides its own (often limited) variants of all the above-mentioned approaches and nothing more:
  • agents instead of actors
  • vars (an analogue of CL's special variables)
  • STM instead of databases
  • map/reduce, pmap
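
Tiny made-up sketches of these primitives (the ^:dynamic annotation assumes Clojure 1.3, where vars are no longer dynamic by default):

  (def hits (agent 0))                      ; agent: asynchronous, uncoordinated state
  (send hits inc)

  (def ^:dynamic *timeout-ms* 1000)         ; var: thread-local rebinding, as with CL special variables
  (binding [*timeout-ms* 50] *timeout-ms*)  ;=> 50

  (def balance (ref 100))                   ; ref + STM: coordinated, synchronous state
  (dosync (alter balance - 30))             ;=> 70

  (pmap #(* % %) (range 10))                ; pmap: naive data parallelism you don't control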

To be able to provide them, Clojure imposes some restrictions, most notably immutability. Also, in their implementation it doesn't follow the Lisp principle of giving control to the programmer: you can't swap one STM variant/strategy for another. Heck, you can't even control the number of threads that pmap uses!

Among all these approaches, STM is the most notable one (as the others are just copies of established technologies). And it has tough competition from databases and other transactional datastores. The advantages are no impedance mismatch and, possibly, better efficiency. Yet a database is language-agnostic, which is often useful, and it's more accessible and understandable. And, what's most important: there's durability, and there's a choice that you can utilize depending on your needs. The best use case for STM I've found so far was holding statistical counters accessed simultaneously from many threads, yet this problem is easily solvable with Redis, for example. And the same applies to the other uses I can think of. So, the paradox of Clojure is that, although it was ushered in with the idea of solving concurrency problems, it still has a lot to prove in this space: and not with toy examples of ant colonies, but with real-world applications made possible by its approach (like RabbitMQ or Ejabberd showcasing Erlang's aptitude for building massively parallel systems).
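
The counters use case looks roughly like this with refs (a made-up sketch; in Redis the same thing would be a couple of INCR commands):

  (def stats (ref {:requests 0 :errors 0}))

  (defn record-request! [ok?]
    (dosync
      (alter stats update-in [:requests] inc)
      (when-not ok?
        (alter stats update-in [:errors] inc))))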

Final thoughts


I've written a long article, but there are many more details left out (just adding code snippets would make it twice as big). There are also a lot of good things about Clojure which I didn't mention: its seamless integration with Java, for example, which makes it a good scripting language for Java applications. Macros, the basic datastructures, vars: they all work great. Ring was also an enlightenment.

Yet, overall, the language doesn't live up to its promise of being a modern Lisp: actually, it's not a Lisp at all. Well, how do you define Lisp? S-expressions, macros, closures, etc.? Surely all (or at least most) of those features may be present, although Dylan was a Lisp without s-expressions, for example. But, in my opinion as a Lisp programmer, what makes a true Lisp is dynamicity, simplicity and putting control in the developer's hands (flexibility and extensibility). The mentioned features are mostly derivatives of these principles, and they all should have a backend for extension: the reader, the compiler, the condition system, the object system, special variables, and so on; in Lisp they all do. And Clojure gives up on these principles, if not altogether, then substantially: many of the features are there and new ones have arrived, but the backend is missing.

Common Lisp is well optimized for the common use-cases in the design space of dynamic flexible languages. And it's probably very close to a local maximum in it. At least it's good enough. Clojure is more of a hybrid, and hybrids are built by complecting: in Clojure's case, complecting Lisp and Java, Lisp and Haskell. The result is interesting, but, in my opinion, it's not optimal. Surely it has its use-cases.

Still, Clojure sees decent adoption (which is good, because it proves that parens aren't a problem after all :) I think it's a combination of several things. The language still has a lot of good stuff from Lisp (as well as some good things from Haskell, although they don't match very well), and those who start using it are mostly Java and Ruby programmers, for whom it breaks the psychological barrier to using Lisp (we all know the FUD). Another thing is the lack of a decent native (not like Jython or JRuby) dynamic language on the JVM. And finally, there's marketing, i.e. Clojure's concurrency primitives. The challenge for the Clojure community is to produce a "killer" application that will utilize them, or the hype will wane, and pretty soon...

2011-11-27

Videos from ECLM 2011

Today I've finally finished uploading all the videos to the ECLM channel on blip.tv/eclm. There are a lot of very interesting talks, and the topics range from Lisp-based companies' experiences to exciting Lisp projects (like Quicklisp or the zen X server) to community issues (like the announcement of the Stichting Common Lisp Foundation or good Lisp style). Of the 7 lengthy talks announced on the event's website, only 5 are available now, because:
  • Luke Gorrie's account of his exciting startup Teclo Networks will be published somewhat later not to compromise the business (and it's a great talk, so don't miss it :),
  • and Paul Miller's showcase of LinkExplorer, a Windows LispWorks CAPI-based GUI application, was not recorded at the request of his company.
In total, there are 12 videos, including the lightning talks (and the lucky 13th will join later). My personal favourite is Jack Harper's story of building Secure Outcomes, with a lot of invaluable engineering insights.

The sound quality is not stellar because, unfortunately, the 50 Hz mains hum got superimposed on the mic's signal at the venue. But judging from my experience of organizing TEDxKyiv, sound is the usual point of failure at conferences, and it's pretty hard to get it right. Yet I hope that the next meeting will have a more professional recording setup (maybe through the CLF), because, in my opinion, these videos are extremely valuable to the community. I really miss some talks from ECLM 2009, like Kuroda Hisao's, Dave McClure's or Dan Weinreb's, and most of all, the Piano.aero one...

For me the overall meeting experience was really fantastic, and the talks actually form only a minor part of it. Immersing myself in the Lisp community and participating in so many profound conversations with brilliant Lisp programmers was the really exciting part. A good overview of the conference experience is given by Luís Oliveira. So I'm really looking forward to ECLM 2012. Maybe it could be organized in a different European city this time?..

2011-04-05

Book review: Algorithms of the Intelligent Web


TL;DR The book should have been named "Building Machine Learning Apps in Java for Dummies". Worth reading if such a title excites you.



This book aims to be a comprehensive guide to commonly used Machine Learning techniques. And generally it succeeds in that, with one reservation: it resembles a sophomore's synopsis more than a professor's lecture.

I should say that I didn't manage to finish the book and read only Chapters 1, 2 and 5. But the structure of each chapter is roughly the same: a brief high-level description of the common approaches in the area in question; then a list of Java libraries available for the task; the selection of one of them and a description of how solid, robust and mature it is; then several pages of Java code showing how to prepare the input parameters and actually make use of the library; and finally a note that there are some limitations or peculiarities, but that discussing them in more detail is out of the scope of the book. Thus, the amount of new material for anyone at least slightly familiar with the subject and with programming in general is minimal (I would say that for me it was around 10%). No insights, nothing novel that can't be found on the first page of Google results on the topic, no account of personal experience or "war stories". Such a book doesn't require 20 years of industrial experience to write: it could be done by any working programmer. The recipe seems simple: for each topic take the first two pages of the Wikipedia entry on it, list the available libraries for solving the problem, add a couple of examples from their manuals, and bingo!

There's also another problem: the code is in Java. This is good for the authors, as it increases the book's size and makes it seem much more solid than it really is. But all the unnecessary cruft, like half-line type declarations and bookkeeping code for loops (which amounts to more than half of the code, I'd say), makes it really hard to follow along. You need to really focus to translate the code in your mind into some high-level conceptual form before grasping it. (Although there's a shortcut: just assume that all the code is mostly irrelevant and skip it.) It's like the authors used the trick of all students: hide your poor understanding of the subject behind all the recondite words you've heard. So here is my advice for any algorithmic book: if you want to make the code easy to follow, write it in Python or Lisp. At least that would make it really apparent if there's not a lot of essence to it... ;)

So the book is worth reading only if you're new to the field, don't want to spend time learning the subject, but need to get some working program fast (and don't object to using Java for that).

2011-01-08

various Lisp stuff

In the spirit of xach's "policy" for half-baked projects, I've also published some of my experiments. If somebody finds them useful, let me know.