Wednesday, January 2, 2013

LISA / Usenix Conference 2012

I’ve spoken at a number of conferences in several countries over the last year, the latest being https://www.usenix.org/conference/lisa12 in San Diego.  This was the first conference at which I have presented where the audience was composed not of programmers but entirely of system administrators.

First and foremost, this is the best organized conference that I have ever attended or presented at.  The contract was delivered months in advance and was clear about the expectations.  There were email reminders about deadlines and links for supplying the information, and I received prompt responses to all my questions.

The check-in at the hotel took all of 60 seconds.  I dropped my luggage in my room and went to the registration desk.  They recognized me, without having seen me, printed my badge and showed me where my tutorial room would be the next day.  And did I mention that they were incredibly nice?

I arrived at the tutorial room an hour before the start time, since I use Ubuntu on a Thinkpad and I have never had it work correctly with a projector without a lot of futzing about.  I plugged in the video connection, booted the laptop, and everything worked perfectly.

The students were responsive and asked detailed and probing questions.  Unlike those of many programmers, their questions did not deal with language details (Why does Python use self instead of this?) but were instead concrete inquiries about how to use Python to solve a real-world problem.

If you are a system administrator of a large installation, get your company to send you next year.  You won’t be disappointed.


#usenix #python #lisa

Monday, December 17, 2012

Pycon Code of Conduct

I have stated that opinions about the Code of Conduct should be discussed in a public forum.  I’ve had, at their request, ex parte emails with Jesse Noller and Steve Holden over the subsequent reaction to my post on Hacker News.  The substance of the objections to my post and my responses have been addressed in these emails.  In his last email, Steve suggested that if I wanted to discuss it in a public forum, which I have always advocated, I should try his blog http://holdenweb.blogspot.com/.  So this is being posted there and at my blog bobhancock.org.

This all started with my post https://news.ycombinator.com/item?id=4894937
I’ve not had any desire to be anonymous and signed the Hacker News post as bob_hancock.  I would suggest reading the original post before proceeding so that there is no misunderstanding of what was said.

Steve says, referring to the event, that “first the quote is inaccurate”.  This is where we can agree to disagree.  I remember it clearly and I am sure that I’ve accurately recounted the conversation.  I’ve been told that “you didn't express your opinion, except in the most oblique way”.  I think I’ve been clear.  I don’t see how I have been oblique.

My post was to take issue with the code.  “The Code has a very broad definition of harassment that makes no distinction between a one time comment and a pattern of repetitive behaviour intended to intimidate or cause harm.”  I recounted the one-eyed snake event not to imply that the organizers should have taken some action, since the woman in question took no more umbrage than, "No, it is just creepy, but I'm an adult.", but to illustrate that one of the people who approved the wording was engaged in behavior that under the current wording could have been grounds for a complaint.  

Specifically, the Code of Conduct states “Be careful in the words that you choose. Remember that sexist, racist, and other exclusionary jokes can be offensive to those around you. Excessive swearing and offensive jokes are not appropriate for PyCon.”  I believe this is the type of conduct to which they are referring, which is why I referenced it.  I thought the joke was in bad taste, and mentioned that to Steve later, but it did not rise to the level of something that required action by the organizers--which was the point.

I’ve been told that Hacker News was not an appropriate forum to discuss this.  My post was in response to a thread in which Jesse had already responded, so I added my opinion.

I received emails stating that the purpose of my post was of “malicious intent”, that I am “sadly misinformed about gender diversity issues”, my behavior is “inept”, that as a result “staff is being bullied and harassed due to this”, and that I have “pulled the rug out from under” Jesse.

Some people have taken the fallout from this situation as an opportunity to dump on Jesse and engage in cyber-bullying.  The amount of work that Jesse puts into Pycon is unbelievable and has immeasurably improved Pycon.  The conference in its current form would have been impossible without him.  To personally criticize Jesse or to question his commitment to the Python community is to willfully ignore the substance of the issue.

Jesse clarified the difference between the Board’s ratification and Pycon.  That difference was not clear to me, which is why I asked, “From what I understand, the code was approved by the Board of Directors of the PSF, and not the PSF as a whole. Please, correct me if I am wrong.”  The board resolution says, “RESOLVED, that the PSF will only sponsor conferences that have or agree to create and publish a Code of Conduct/Anti Harassment guide for their conference. A basic template to work from has been generated by the Ada Initiative at http://geekfeminism.wikia.com/wiki/Conference_anti-harassment/Policy”.  So, we should make a clear distinction between the Board’s resolution and Pycon’s choice of a Code.

They could have chosen the Pycon UK version http://pyconuk.net/CodeOfConduct or the O’Reilly Conference version http://oreilly.com/conferences/code-of-conduct.html, and been in full compliance with the PSF resolution.

I’ve not read, except for a couple of vituperative Tweets by Zed Shaw to Diana Clarke, the broadsides that have been aimed at Jesse and Steve.  I would be interested to know if the content is primarily personal attacks or people taking issue with the wording of the Code of Conduct.  Hate mail is inevitable when you take a stand for something you believe in.  The only way to avoid this is to take no stand at all, and I applaud the Pycon organizers for making their intentions clear.

To link the hate mail and cyber-bullying with my post is specious.  It may have acted as a catalyst for some disgruntled people predisposed to invective, but the demeaning outbursts of the peevish cannot be a deterrent to rational and civil discourse.

pythonchelle has made the most pertinent comment so far: “Let's not derail the conversation that needs to be had about CoCs. It's a little ridiculous to discount the attempts that are being made to make the community better because one of the directors that helped write it told an off-color joke at a conference that one time.”

Sunday, October 7, 2012

goxmeans Version 0.1a

We have released version 0.1a of goxmeans at https://github.com/bobhancock/goxmeans.git.  This is the result of an idea that has been brewing in my mind for a long time on how to accelerate the clustering process.  k-means is one of the classic algorithms for dealing with unstructured data http://en.wikipedia.org/wiki/K-means.  It is straightforward, but slow.
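To make "straightforward, but slow" concrete, here is a minimal sketch of plain k-means in Python.  It is not the goxmeans implementation; the function name, the random initialisation, and the fixed iteration cap are assumptions made for illustration.  The cost is visible in the assignment step, where every point is compared with every centroid on every pass.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Illustrative sketch only: naive random initialisation and a
    fixed iteration cap are assumptions, not goxmeans behaviour.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points to start
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # empty cluster: leave it
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids
```

Calling kmeans(points, 2) on a list of (x, y) tuples returns two centroid tuples.  With n points and k centroids each pass performs n * k distance calculations, which is the work that lends itself to being spread across CPUs.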

I wanted to have an open source application for clustering that would run on commodity hardware and scale horizontally.  Finding patterns in unstructured data is the next big frontier (well, one of them) and the easier we make it for everyone from students to Data Scientists to get used to the unstructured world, the more rapidly we can start to understand what to do with big data.

I considered the idea of parallelizing some of the k-means computations with threads, but once I had mapped out an initial design, the thought of dealing with mutexes, synchronized queues, etc., and the amount of time I would spend trying to avoid context switches, dissuaded me from going forward.

Then I started looking at Go and its goroutines and channels and they seemed to have solved most of my problems.  I started building a prototype about 2 ½ months ago with the help of Dan Frank, who worked on the centroid selection process and was the main code reviewer, Ralph Yozzo, who modified and extended the gomatrix library, and Anthony Foglia, who made the Bayesian Information Criterion clear and wrote the notes on it that you find in our project.

Getting something working was easier than I had anticipated.  Once I got my mind out of the thread world and started to think in terms of communicating sequential processes the design became clearer and clearer.  I could wax lyrical about all the things I like about Go, but you should try it for yourself.  It makes programming fun again, is free of the strictures of OOP, provides rapid iteration, and is fast.

We went through a number of design modifications, and once the final model was stable the CPU and memory profiling tools proved invaluable.  The initial run calculated three two-dimensional models of 250,000 points each in 190 seconds on a desktop machine.  The latest release, 0.1a, takes 65 seconds.

The next step is to create a kd-tree library so we can avoid redundant calculations and cache statistics.  This should speed up the process even more for large data sets.  After that comes adapting it to run across multiple machines.

I’ve dealt with similar problems in Python and every solution has been much more complex and obviously slower.  You can see my Pycon talk on this at http://goo.gl/lPZix.  

I’m already thinking of where else this can be applied, and I’d be interested to know if you have used Go for anything similar.

#golang #goxmeans

Thursday, September 20, 2012

Burning Down My Laptop with Go


[This is an extended version of a previous post I made directly to Google+.]

I wrote a program in Go that clusters unstructured data and involves a lot of multi-dimensional distance calculations.  You can see the ongoing process at https://github.com/bobhancock/goxmeans.git.  It contains a pipeline that prepares jobs, fans out to worker goroutines (one for each CPU) and stores the results.  All communication is done with channels.
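For readers who have not seen this shape before, here is a rough Python analogue of the pipeline, offered as a sketch and not as a translation of the goxmeans code.  The names worker and run_pipeline are hypothetical, queue.Queue stands in for a Go channel, and threads stand in for goroutines.  CPython threads share the interpreter lock, so this shows the structure only and will not load every CPU the way the Go version does.

```python
import math
import os
import queue
import threading


def worker(jobs, results):
    """Take (point, centroid) jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        point, centroid = job
        results.put((point, centroid, math.dist(point, centroid)))


def run_pipeline(points, centroids, n_workers=None):
    """Prepare jobs, fan out to workers, and collect the results."""
    n_workers = n_workers or os.cpu_count() or 1  # one worker per CPU
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Stage 1: prepare one job per point/centroid pair.
    n_jobs = 0
    for p in points:
        for c in centroids:
            jobs.put((p, c))
            n_jobs += 1
    for _ in threads:
        jobs.put(None)  # one sentinel per worker so each one exits
    # Stage 2: store exactly as many results as jobs were issued.
    stored = [results.get() for _ in range(n_jobs)]
    for t in threads:
        t.join()
    return stored
```

The workers share no state other than the two queues, so there are no mutexes to manage in the calling code.  Results arrive in whatever order the workers finish, so sort them if order matters.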

I’ll write more about goxmeans in the future.

I started some performance testing with relatively small datasets of 250,000 points and 2-6 centroids and it worked as expected.  When I increased the number of centroids to 9, the program would run for a while, but then my laptop would shut down.  My first thought was that there was some sort of bug with memory or the underlying mechanism of pointers.  

I viewed syslog and it showed that the Intel ACPI controller registered that the temperature had hit 100 degrees Celsius.  psensor showed that it had hit a max of 94 degrees--still high compared to the average of 55 but not deadly.  To protect the hardware, the system issued no warnings; it simply shut down the computer immediately.

I've never had this type of problem with C++ or Python.  Go was so efficiently feeding jobs to the CPUs that the temperature rose almost 100% and stayed there.  I reset the fan to manual and ran it at max and everything worked fine.  However, to be safe, I reset the default to auto.  I'll perform the rest of the performance tests on a larger machine.

If I had tried to write this in C++ with threads it would have taken much, much longer and experience makes me doubt that the performance would have been as good.  Tests show that the CPUs are being used at close to 100% with occasional small dips for what, from profiling, appears to be garbage collection.  The shape of CPU usage on the Ubuntu system monitor is an almost vertical rise to 100% and then a flat line for all CPUs until there is a vertical drop back to normal usage at program completion.

So, Go was so efficient that a Thinkpad X201 almost burned down.  I wish I had problems like this with other languages.


#golang #goxmeans

Sunday, June 10, 2012

Engineering and Integrity

Feynman on integrity.  http://www.lhup.edu/~DSIMANEK/cargocul.htm

This made me think of conversations I have had with programmers over the last year.  I have the feeling, and it may be confirmation bias on my part since I do not have raw data, that I am hearing more Cargo Cult arguments.


I recently gave a talk in Washington, D.C. on Go.  During the talk, I cited a statistic from “Inside the Erlang VM” (http://www.erlang.se/euc/08/euc_smp.pdf): “If a program scale[s] well with the SMP VM over many cores depends very much on the characteristics of the program, some programs scale linearly up to 8 and even 16 cores while other programs barely scale at all even on 2 cores.”



I said, “Even with a six processor commodity PC Erlang performance can degrade.”  This was in the context of the discussion of the penalty languages pay for dealing with locks and context switches.  The purpose was to point out that even a language as highly optimized for concurrency as Erlang is subject to the bookkeeping effect of locks and context switches produced by our current kernel designs.

I received an email from one of the participants who had posted on his local Erlang Meetup board the message, “did anybody hear that erlang speed drops after a 6th core is involved...”


A member of the group, who I conclude is an Erlang adherent, replied, “Six sounds pretty low, but I didn't have any good numbers, so I did a bit of digging and found a recent (2011) academic paper on Characterizing the Scalability of Erlang VM on Many-core Processors.
It's got way more detail than I was willing to wade through, but the conclusion seems to be that Erlang works well up to at least 60 cores.”


I waded through the details of the thesis, and the author concludes in Section 5.2, “The scalability of the Erlang VM can be improved by reducing lock contention and the overhead associated with it. The most critical locks are those for memory allocators. They are based upon Pthread mutex locks.” ... “Reducing lock contention is not sustainable if the number of cores keeps increasing.”


The paper had “way more detail” than the respondent “was willing to wade through”, yet he was able to draw a conclusion that Erlang “works well” in a general case.  


The fan-boy mentality has always been present in the consumer population and is an integral part of how marketing campaigns are designed.  It is the iPhone versus Android argument and is in the same category as whether the Red Sox or the Yankees are the better team.  It is bar talk based solely on personal preference and opinion.


When applied to an engineering problem it produces a result that allows us to fool ourselves and feel good about our current assumptions, but tells us nothing about a real solution.  In fact, the diffusion of this type of opinion as fact, and the ready acceptance by programmers, is harmful to both our profession and the people who ultimately use our software.


So, in a totally unscientific manner, let me know whether it is my own confirmation bias at work, or whether you have experienced the same phenomenon.


#golang, #erlang, #programming

Sunday, May 13, 2012

Ubuntu 12.04 with a Projector

I frequently speak to groups and I always use my Ubuntu laptop.  Sometimes I have to adjust the settings to get the projector to recognize my PC feed.  I recently upgraded to Ubuntu 12.04 and was unprepared for the problems I encountered.

I arrived at the New York City Google Developer's Group to talk about Go version 1.0, plugged in my laptop and the signal was automatically recognized, but the workspace on both the laptop and the projector was suddenly split in half vertically.  I tried modifying all the settings under display, but nothing helped.  Since I only had 15 minutes until I had to speak, I ended up moving the browser with my presentation slides between two workspaces.  This meant that it displayed correctly on the projector screen but was half off the screen of my laptop.

It worked well enough, and thankfully, I only had slides for this presentation and not code examples.

I had another talk the next week, so I worked through the workspace and display options and found that you need to reconfigure the default 4x2 workspace configuration.  I used compiz, went to General Settings, chose the Desktop Size tab, and set:

Horizontal Virtual Size = 1
Vertical Virtual Size = 1
Number of Desktops = 1

and then everything lined up correctly on both the laptop and the projector.

Since I had numerous code examples and command line statements to execute in several terminals, I opened a tmux session.

It all worked out, but I felt like I had gone backward several versions of Ubuntu.  This did not occur in the previous two releases, so I wonder if this has affected anyone else.   I would like to know if it is specific to my Lenovo or if something radically changed in 12.04.

Let me know if you have had similar problems.

Tuesday, May 1, 2012

Power of 2

As part of an event, I had to come up with a number of problems to be solved in Python.  One was to write a procedure to determine if a number is a power of two.  I supplied answers for the judges and for this problem I provided:

def is_power_of_2(x):
   while x > 1 and (x % 2 == 0):
      x = x//2
   return x == 1

and

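import math
def is_power_of_2(x):
   # log2 avoids the rounding error of math.log(x, 2) (e.g. for 2**29);
   # still floating point, so only reliable for moderately sized integers
   return x > 0 and math.log2(x) % 1 == 0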


But one of the participants came up with a solution that I hadn't thought of and for some reason appeals to me the most out of the three.  This is my version:

def is_power_of_2(x):
   s = bin(x)[2:]
   return s[0] == "1" and s[1:] == "0"*len(s[1:])

bin is built into the language, but I've never used it and had actually forgotten that it existed.

Let me know if you have other ways of solving this that do not involve pre-computing large sets of data in advance.
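One further way, which was not among the answers supplied at the event and is offered here only as a sketch, relies on the fact that a power of two has exactly one bit set.  Subtracting 1 clears that bit and sets every bit below it, so the bitwise AND of the two values is zero.  The function name is made up to avoid clashing with the versions above.

```python
def is_power_of_2_bits(x):
    """Return True when x is a positive integer with exactly one bit set."""
    # x & (x - 1) clears the lowest set bit; for a power of two nothing is left.
    # The x > 0 guard rejects zero and negative numbers.
    return x > 0 and (x & (x - 1)) == 0
```

This stays in integer arithmetic, so it behaves the same for arbitrarily large Python integers and needs no precomputed data.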