Wednesday, 17 March 2010

Using IronPython to generate NHibernate mapping files from a fluent configuration

I admit, this is a bit of a Rube Goldberg machine, but here's the context:

In my current project we need to store a bunch of events related to some 20 entities. Those events are the same for all entities, and they all need to be stored in the database. We use NHibernate to handle the "talk to the database" part, and Fluent NHibernate to configure the mappings, mostly via conventions. Some entities are either mapped manually or have specific overrides for particular cases (mixing manually mapped and automapped entities is a mistake I think I'll avoid in the future).

For the most part this setup does wonders. We can add and change our entities and the mappings are generated without us having to think much about it. Since development speed has a higher priority than runtime speed in this project, we don't have many problems with the default mappings that are generated.

Except in the case of those events. Since the events are the same for all the entities, the event classes are open generics. Now, NHibernate can't map open generics, but it can map closed generics, which is what we do: we iterate through all the types in the assembly that correspond to our entities and map those with Fluent NHibernate. The problem then becomes one of performance. As of now there are about 500 generated classes, and those take a while to discover and map. Since we're using Fluent NHibernate to configure everything at runtime, each time the app is restarted there's a small downtime (on the order of a minute). That's not much for an app in production, but while developing one tends to restart the app quite often, which makes it a problem in day-to-day work.

I used our integration test suite to time the mapping, and on my PC it took about a minute to run. On the other hand, if we already have an .hbm file to feed to NHibernate, we can shave off about 30 to 40 seconds. That cuts the time to roughly half or less, for the simple trade-off of having to know when to regenerate the mappings.

To generate the mapping we could have created a new console project just to call the correct methods on our session provider factory, which would be a bit overkill. All we needed was to set up the environment and ask Fluent NHibernate for the mapping file.

And this is where IronPython shines.

Just create an .ipy file, add references to the correct assemblies, and since there's a clear separation of concerns (or so we hope), it's easy to configure Fluent NHibernate and ask it to generate the mappings for us. Most of the time you'll use .NET code just like you do in C# or VB.

There are only a couple of issues here:

- We need to remember to regenerate the hbm files when the classes or mappings change

- We need to remember to recompile both before and after regenerating the hbm files. Since we're loading the assemblies with IronPython outside of Visual Studio, the script doesn't know whether the assemblies are stale or not.

Most of this can be avoided by having a build server rebuild the hbm files on its own and fail the build if they don't match the latest version in source control. Of course, we could just have it commit the files, but that's one step I'm not fully comfortable with.
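As a rough sketch of what that build-server check could look like (this is not something we have running; the directory layout and the function names are made up for illustration), the script below compares a folder of freshly generated hbm files against the folder checked out from source control and reports every file that is missing, extra or different. It only uses the Python standard library.

```python
import filecmp
import os


def hbm_files(root):
    """Return the set of *.hbm.xml paths found under root, relative to root."""
    found = set()
    for folder, _dirs, names in os.walk(root):
        for name in names:
            if name.endswith(".hbm.xml"):
                full = os.path.join(folder, name)
                found.add(os.path.relpath(full, root))
    return found


def find_mapping_drift(generated_dir, committed_dir):
    """List (relative path, reason) pairs for mappings that are out of sync."""
    generated = hbm_files(generated_dir)
    committed = hbm_files(committed_dir)
    drift = []
    for rel in sorted(generated | committed):
        if rel not in committed:
            drift.append((rel, "not committed"))
        elif rel not in generated:
            drift.append((rel, "no longer generated"))
        else:
            same = filecmp.cmp(
                os.path.join(generated_dir, rel),
                os.path.join(committed_dir, rel),
                shallow=False,  # compare file contents, not just size and date
            )
            if not same:
                drift.append((rel, "content differs"))
    return drift


def main(argv):
    """Return 0 when the mappings match, 1 on drift, 2 on bad arguments."""
    if len(argv) != 3:
        print("usage: check_hbm.py GENERATED_DIR COMMITTED_DIR")
        return 2
    drift = find_mapping_drift(argv[1], argv[2])
    for rel, reason in drift:
        print("%s: %s" % (rel, reason))
    return 1 if drift else 0
```

A build step would end with something like sys.exit(main(sys.argv)), so that any drift shows up as a non-zero exit code and fails the build without committing anything.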

Wednesday, 10 March 2010

Notes and reflections about the second annual scrum meeting in Portugal

A month ago I attended the second annual Scrum Meeting in Portugal, and ended up taking some notes about two of the presentations.

Here they are, both for the team at weListen, and for the world at large.

Notes on "A Practical Roadmap to Great Scrum: A Systematic Guide to Hyperproductivity" presented by Jeff Sutherland

This was the presentation that changed the way I perceive scrum. Where before I saw scrum as an agile software development methodology like XP, now I see it as a management and process template. This means a team should use scrum and complement it with technical practices. Apart from that, the main focus for me was on 3 velocity multipliers for teams and 2 required practices for hyper-productive teams.

The 3 velocity multipliers are ready, done and self-organization. Ready and done shield an iteration from churn. A story only enters development when it is ready, and by the end of the sprint it should be done. Ready means that the story is understood and sized by the team with input from the client, and done means the same story is tested and approved to be deployed. Self-organization ties nicely into the two required practices mentioned: talk about problems and fix the same problems.

Two classes of problems were mentioned: bugs that take more than 8 hours to fix, and stories whose calendar time is more than twice the estimated ideal time. For each bug or story in this situation, do a root cause analysis to figure out the underlying problem. The cause can be multitasking (a dev assigned to several projects), final inspection or testing done too late, or interruptions. These causes form the basis of an impediment list for the team to solve together with the product owner and scrum-master.

There were several interesting data points mentioned. In a given company, the peak of productivity was measured at 60-hour work weeks in waterfall projects, and at 30-hour work weeks with scrum, with double the story points delivered. This means a productivity increase of about 3 to 4 times. Part of the presentation used Systematic, a Danish software company, as the source for several data points on scrum teams. They observed developer productivity scaling linearly with project size, which goes against Brooks's Law, although I think that law is about adding people to an already late project and says nothing about scale. They apparently implemented scrum using Mary Poppendieck's lean tools as described in her book.

Some "required truths" about hyper-productive teams were discussed, and I found them to be interesting insights. The ones that stuck in my mind the most were:

  • Everyone must be trained in scrum. This is so that all the players follow the same playbook and the concepts and practices are well understood.
  • Backlog must be ready to implement before the sprint, and done by the end (tying into the concepts of ready and done mentioned earlier).
  • Pair immediately on a task if there's only one person capable of handling it, to avoid bottlenecks in the throughput.
  • Short sprints (often just one week).
  • Servant leadership, where the product owner and scrum-master are a resource of the team, instead of using the team as a resource.
  • No multitasking. This one I'm a bit curious as to how realistic it is, considering maintenance work, and teams with many past projects.

Overall it was a very interesting presentation. There are several videos on youtube with Jeff Sutherland about scrum. This one is very similar to the one I saw.

Notes on "Scrum and TFS 2010" by Mitch Lacey

This was mostly storytelling and a demo of new features in TFS. I did end up asking Mitch about the importance of tracking time originally estimated, time in task and time remaining for teams that are small and new to scrum, since to me it seemed like a bit too much bureaucracy for a small team. The answer was enlightening: those numbers are particularly important for such teams as a way to surface problems (with estimation, delays, interruptions or other causes). This means I might start looking into tracking those numbers. "You can't manage what you don't measure". Once a team gels and is proficient, the numbers may matter less, as the team gets into the groove and noticing impediments becomes intuitive.