People who think they know everything really annoy those of us who know we don't.
Bjarne Stroustrup
...to The Lost Continent of. My name is Leon Matthews. Programmer, father, New Zealander, and business owner. This is my personal website, and has been for some time.
If you're in New Zealand, that is...
Last year's conference was great, and I'm looking forward to this one. Hearing about all the fantastic things that are being done with Python locally is great fun, and very inspiring. What's great is that it's not a dry for-experts-by-experts sort of event — the talks range from strategic to technical to hey-look-what-I-can-do.
I'll be giving an introductory/intermediate talk on Unicode strings from a Pythonic perspective. I've always felt that Unicode was one of those technologies which seem hard, but are built on simple concepts. Once those are understood, the details fall into place, and you'll never want to go back to plain strings ever again. In Python 3 all strings are Unicode, so it's rather timely to talk about it now.
I'm thrilled to announce that my wonderful wife Alyson and I finally tied the knot in beautiful Fiji on the 11th of September 2010, with our rascally little two-year-old son, Blake.
We've been together ten years now, so have done things in rather the 'wrong' order: House, Baby-carriage, then our marriage. A huge thanks to my new wife Alyson for doing all the organising, and not calling it off at the last minute. To all our family and friends — both to those who could make it, and to those who couldn't, and to everyone else who made it such a wonderful day for us. Thank you.
Computers process only ones and zeros — or more generally, numbers. Processing some other type of data requires that you find a way to represent, or encode, that type as a number, or a series of numbers. Colours, music, pictures, even Hollywood movies are all represented as various, often extremely creative, sequences of numbers.
I've always been fascinated by the various encoding schemes that we humans have used to shoe-horn our analog world into the digital one of our computers. Some schemes are obvious (ASCII), others surprisingly deep (IEEE 754, UTF-8). Others are horribly complicated because they have to be (video files), while others are that way to maintain a commercial advantage (some office, and graphics file formats are distressingly guilty of this). Those are all interesting, but best of all is an elegant encoding scheme.
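To make the idea concrete, here is a minimal Python sketch of three of the schemes mentioned above. The helper names as_numbers and float_bits are made up for illustration; the encodings themselves come from the standard library.

```python
import struct


def as_numbers(text, encoding):
    """Return the sequence of byte values used to store text."""
    return list(text.encode(encoding))


def float_bits(value):
    """Return the IEEE 754 double-precision bytes of value, as hex."""
    # ">d" asks for a big-endian 64-bit float.
    return struct.pack(">d", value).hex()


# ASCII: one small number per character.
print(as_numbers("A", "ascii"))

# UTF-8: characters outside ASCII take two or more bytes.
print(as_numbers("é", "utf-8"))

# IEEE 754: sign, exponent and fraction packed into 64 bits.
print(float_bits(1.0))
```

Every value printed is just a number, or a short run of numbers, which is all the machine ever sees.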
In my mind, the most elegant scheme of all is POSIX Epoch — the representation of a date and time by a single large integer. It uses the count of seconds that have elapsed since a given point in time. For example, as I write this the POSIX epoch is 1,273,107,528. What makes this scheme elegant is that it is actually easier to work with than the original representation.
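As a rough illustration of why the integer form is so pleasant to work with, here is a small Python sketch using only the standard library. The function names are hypothetical, and it assumes the timestamp quoted above is interpreted as UTC.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60


def to_utc(stamp):
    """Convert a POSIX timestamp into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


def add_days(stamp, days):
    """Moving forward in time is plain integer addition."""
    return stamp + days * SECONDS_PER_DAY


def whole_days_between(earlier, later):
    """The gap between two moments is plain integer subtraction."""
    return (later - earlier) // SECONDS_PER_DAY


written = 1_273_107_528  # the timestamp quoted in the text
next_week = add_days(written, 7)

print(to_utc(written).isoformat())
print(to_utc(next_week).isoformat())
print(whole_days_between(written, next_week))
```

Comparing, sorting and differencing all reduce to ordinary integer operations; the calendar only comes back into it when a human needs to read the result.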
Last month I gave a presentation about what makes it so easy to work with at a meeting of my local Python Users Group, and now I've finally gotten around to updating my site with the contents of the talk.
I like justified text. I still use LaTeX (via LyX usually) whenever I can, despite the cruftiness, because the output always looks so great. For the same reasons, I never use 'text-align: justify' on the web. It sounds like a good idea, but always ends up looking seven kinds of ugly. Why? Because browsers, even modern ones, don't split words in order to maintain sane interword spacing.
I did an experiment this week to try and force the behaviour that I desired. You can see the results below. The left column is standard 'web justified' text, the right is the same text but using the TeX hyphenation algorithm to split words properly.
I ran a little throw-away Python script to insert HTML 'soft hyphens', using the &shy; entity, at the appropriate points in every word. Browsers are then able to use that information to break words and then justify the text passage properly.
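The original script isn't reproduced here, but the general shape of such a thing can be sketched in a few lines of Python. Note the big assumption: instead of the TeX hyphenation algorithm, this sketch looks words up in a tiny hand-written table of break points (BREAKS, invented for illustration), so it only shows the insertion step, not how the break points are found.

```python
import re

SOFT_HYPHEN = "&shy;"

# Hypothetical break points; a real script would compute these
# with a proper hyphenation algorithm rather than a lookup table.
BREAKS = {
    "shyness": "shy-ness",
    "likely": "like-ly",
    "occur": "oc-cur",
}


def hyphenate_word(word):
    """Insert soft hyphens into one word, preserving its original case."""
    pattern = BREAKS.get(word.lower())
    if pattern is None:
        return word  # unknown word: leave untouched
    pieces = []
    position = 0
    for syllable in pattern.split("-"):
        pieces.append(word[position:position + len(syllable)])
        position += len(syllable)
    return SOFT_HYPHEN.join(pieces)


def add_soft_hyphens(text):
    """Apply hyphenate_word to every run of ASCII letters in text."""
    return re.sub(r"[A-Za-z]+", lambda m: hyphenate_word(m.group(0)), text)


print(add_soft_hyphens("Shyness is most likely to occur"))
```

Because the regular expression also matches letters inside tags and attributes, a sketch like this is only safe on plain paragraph text.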
But...
The problem is that all those &shy; entities all over the place absolutely kill the readability of the source code, and that's not a price I'm willing to pay. Compare:
<p> Shyness is most likely to occur during unfamiliar situations, though in severe cases it may hinder an individual in his or her most familiar situations and relationships as well. Admitting feelings may become difficult for the individual. Shy persons avoid the objects of their apprehension in order to keep from feeling uncomfortable and inept; thus, the situations remain unfamiliar and the shyness perpetuates itself. </p>
<p> Shy&shy;ness is most like&shy;ly to oc&shy;cur dur&shy;ing un&shy;fa&shy;mil&shy;iar sit&shy;u&shy;a&shy;tion&shy;s, though in se&shy;vere cas&shy;es it may hin&shy;der an in&shy;di&shy;vid&shy;ual in his or her most fa&shy;mil&shy;iar sit&shy;u&shy;a&shy;tions and re&shy;la&shy;tion&shy;ships as well. Ad&shy;mit&shy;ting feel&shy;ings may be&shy;come dif&shy;fi&shy;cult for the in&shy;di&shy;vid&shy;ual. Shy per&shy;sons avoid the ob&shy;jects of their ap&shy;pre&shy;hen&shy;sion in or&shy;der to keep from feel&shy;ing un&shy;com&shy;fort&shy;able and in&shy;ep&shy;t; thus, the sit&shy;u&shy;a&shy;tions re&shy;main un&shy;fa&shy;mil&shy;iar and the shy&shy;ness per&shy;pet&shy;u&shy;ates it&shy;self. </p>
So, server-side text manipulation is out of the question. What about client-side? Once I actually looked I found a couple of JavaScript implementations of the same idea, but a 20-30 kiB download to implement word breaking seems... a tad overkill.
I've come to the conclusion, having come this far, that the proper place to do decent word breaking, and hence good justified text, is in the web browser itself. Anything else is just a work-around (at best). How about it, browser makers? A 30 kiB language-specific hyphenation dictionary won't bloat your installs too much...
Our little boy is almost walking, but has decided that dancing is easier, and far more fun! I've posted lots more videos of Blake on YouTube for maximum cuteness overload!
I've finally gotten my teeth into Diomidis D. Spinellis' book Code Quality. It's refreshingly complete and precise. The chapter on Maintainability opens with four attributes of a maintainable system (from ISO/IEC 9126-1:2001) that really struck a chord with me.
I know maintainable code when I see it — it has a certain feel... Up until now I've often struggled to express that feeling to non-programmers.
Overall, the book's been a very worthwhile read. The author doesn't shy away from explaining difficult or intricate concepts, where necessary, and each point is illustrated with example code from real systems. I'm very much looking forward to reading the first book in this series, 'Code Reading'.