Who asked for that?

My last few gigs have each seen things added to systems that weren’t actually needed. In Lean terms this is waste, and what’s interesting is that the waste came from an unexpected direction: enterprise architecture.

I’ve had run-ins with architects before, but it’s only really now that I’ve been around Lean practitioners that I think I understand why. It’s often pretty simple: they’re having the wrong conversations.

As a note to my future self, here are some thoughts on what the wrong conversations look like.


On OpsDev

Don’t build software. Except when you can’t not.


I’ve heard it said that working in insurance is more boring than banking and without the pressure. But from an IT perspective there’s so much that’s done with handshakes and informal nods that creating and maintaining systems to handle all the exception cases is … tricky. So underwriting systems tend to be very custom and very particular to a company’s own way of doing business.

To complicate things, a few years ago the strategic decision was taken at the site I was working at to Not Build Software. Rather, the firm would buy in whatever they could and run their business on packaged software. Unfortunately the result was that for the more specialized bits of the business — like, for instance, underwriting — the software they needed was pretty specialized in turn and so quite tricky to get right. So in practice they were really outsourcing their development to a third party, who sold them a “package that can be customized,” which really meant a fixed-price, fixed-term contract.

The particular package they bought is quite clever and can probably be made to do pretty much anything an insurance firm needs. Technically it’s a big ball o’mud, written in Java with a single back-end database. Customisation can be done through database scripts or by bits of Java code slapped around the central core.

In theory, then, the package should have been really easy to deploy. Take a JBoss application server, shove in the EAR, point it at a database, job’s done. But like most enterprise software the package wasn’t designed with maintenance and deployment in mind: too much Dev, not enough Ops.

On paper this suited the insurance company. “We’ll take the package as it is. A little bit of customisation and everything will be fine. The operations guys can deploy the software, heck, we can have the support guys do it.”

But in practice, two years down the line, there were four parallel development streams and over 30 test environments. Deployment frequency to production decreased from one every two weeks to ‘big bang’ releases every three months with a succession of small patches in the meantime. The time to deploy went up from fifteen minutes to three hours. Quality went down and so too did the rate of delivery of new features.

What went wrong?

No plan survives contact with the enemy. Or a contract lawyer

So there’s a reason why the term is ‘DevOps’.

Sometimes I’ll see job ads for ‘DevOps Engineers’ that ask for 4+ years’ experience of shell scripting in Bash and a bit of Puppet knowledge, at a rate rather less than half that of a competent developer. Which makes me wonder what’s happened to the ‘Dev’ part of ‘DevOps’ in those jobs: this is really a job description for an ‘operations team member who knows a bit of scripting’. A reasonable place to start, but those aren’t DevOps engineers.

The insurance company had plenty of network engineers, server engineers and DBAs. There was a complete department who could have Active Directory do their bidding. Marvellous knowledgeable people all. But scripting? ‘No, that’s development. Don’t know how to do that.’

Some of the build engineers put together some neat scripts in Puppet to automatically create IaaS virtual machines in Azure and deploy clustered SQL Servers there. Clever stuff. Unfortunately nobody thought to check with the teams who’d use those servers, so nobody found out that the three or four hours that it would take to create a test environment would be – suboptimal. “Well, then, the database should be made smaller.”

Er, no – this is packaged software; we have to play the hand we’re dealt and put up with the package we have. It might be better to split customer data from standing data, or split the application into microservices. But without development effort that won’t happen.

Similarly, requests to move off Windows to Linux, or to update the JVM, or re-code the presentation layer so it wasn’t stateful, or rework the persistence layer to handle retries in case of failure… there were no developers to do that work. You’d have to ask the business analysts to prioritize those demands against underwriting requirements – and that’s a tough sell.
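For a sense of what “handle retries in case of failure” can mean in practice, here is a minimal Python sketch of the kind of wrapper a persistence layer might put around a database call: retry on a transient error, with exponential backoff between attempts. It is not code from the package; `TransientDbError`, `with_retries` and the delay values are hypothetical names and numbers chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDbError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a dropped connection)."""


def with_retries(operation: Callable[[], T], attempts: int = 3,
                 base_delay: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on TransientDbError with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    Any other exception propagates immediately; after the final failed
    attempt the last TransientDbError is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except TransientDbError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_save() -> str:
        # Fails twice, then succeeds: simulates a database that briefly drops out.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientDbError("connection reset")
        return "saved"

    print(with_retries(flaky_save, attempts=3, base_delay=0.01))  # prints saved
```

Small as it is, this is exactly the sort of change that needs someone with development skills and access to the code, which is the point: without a ‘Dev’ in the team, it never gets done.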

The anti-pattern: Ops with no Dev

So, in a nutshell: non-functional requirements aren’t just response times, resilience and database sizes; they also cover operability.

That can cover “can we monitor the system”, “can we control it easily” and “can we deploy updates easily”. It can also cover more subtle requirements like “can we have two streams of development working in parallel”, “can we automate our acceptance testing” and “can we quickly deploy small changes”.

It’s those last few that got lost at the insurance company. There was no ‘Dev’ in the ‘Ops’, so totally reasonable operational NFRs like 100% infrastructure automation were prioritized above development lifecycle requirements like rapid deployment.

This approach was endemic. Some other examples:

  • Everything was deployed to fresh application servers and database servers on each release. The application servers were considered ‘cattle’, so they had random names. If an application server failed there was no quick way to find out what its role or function was; it took a Puppet database query or an API call to Azure. Operability was sacrificed for ideological purity.
  • The logical unit of deployment was a set of Azure resource groups, comprising network infrastructure, load balancers, virtual machines, SQL databases, storage, a full database backup, and the application. Everything apart from the application was effectively static, changing perhaps once or twice a month – yet the whole infrastructure had to be re-deployed every time a single code change was made.
  • The tools chosen for deployment – principally Puppet – were really hard to debug and control. Even getting hold of log files (and figuring out which server they might be on) was a challenge. Debugging with anything more than print statements wasn’t an option.
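One way to avoid redeploying the whole stack for a single code change is to treat each layer as its own unit of deployment and redeploy only the layers whose definitions have changed, plus anything downstream of them. Below is a minimal Python sketch of that decision, assuming a hypothetical layer list (mirroring the resource groups described above) and a simple fingerprint per layer; in practice the fingerprint might be a hash of a template or a build artefact. None of these names come from Azure or Puppet.

```python
import hashlib

# Hypothetical layers, ordered so each depends only on layers listed before it.
LAYERS = ["network", "load_balancers", "virtual_machines", "sql_databases",
          "storage", "database_backup", "application"]

# Hypothetical dependencies: a layer is redeployed if any layer it depends on is.
DEPENDS_ON = {
    "network": [],
    "load_balancers": ["network"],
    "virtual_machines": ["network"],
    "sql_databases": ["network"],
    "storage": [],
    "database_backup": ["sql_databases"],
    "application": ["virtual_machines", "sql_databases"],
}


def fingerprint(definition: str) -> str:
    """Stable hash of a layer's definition (template text, build id, etc.)."""
    return hashlib.sha256(definition.encode("utf-8")).hexdigest()


def layers_to_redeploy(previous: dict, current: dict) -> list:
    """Return, in deployment order, the layers whose fingerprint changed
    or that depend on a layer already selected for redeployment."""
    selected = set()
    for layer in LAYERS:
        changed = previous.get(layer) != current.get(layer)
        upstream = any(dep in selected for dep in DEPENDS_ON[layer])
        if changed or upstream:
            selected.add(layer)
    return [layer for layer in LAYERS if layer in selected]


if __name__ == "__main__":
    before = {layer: fingerprint(f"{layer}-v1") for layer in LAYERS}
    # A code change touches only the application layer.
    after = dict(before, application=fingerprint("application-v2"))
    print(layers_to_redeploy(before, after))  # prints ['application']
```

With this split, the mostly static layers are rebuilt once or twice a month when they actually change, while a code change redeploys only the application.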

Sure, buy don’t build, but remember the NFRs

Nobody would buy a program that doesn’t run on the computer they use, right? Well, the firm I moved to is doing just that – they have zero Oracle knowledge and no Solaris expertise but a key part of their authentication system is going to use a package based on that platform…

That’s an extreme example, but the point is clear: involve your operations and systems administration teams in the selection and vetting of your packaged applications.

Interview questions

Some random thoughts about interviews. I’ve had a few, and they continue, especially with me being a contractor and all that.

It’s worth remembering that an interview is a two-way street. As an English chap, and with my personal background, I’m used to being subservient: ‘what is it you need me to do’ tends to be about the limit in my regular repertoire.

But once I’ve landed a role there are sometimes regrets: that I didn’t ask more searching questions and, possibly, that I took the job at all.

This goes double for the well-paid jobs where the interviewer doesn’t ask so many questions themselves but offers the job anyway. Why have they offered immediately? Is it because I’m that White Middle-class Middle-aged Male who won’t rock the boat but will prop up their hiring statistics? If that’s the case there’s a good chance I won’t actually fit so well. Better to be somewhere that I can contribute, feel valued and otherwise be content. So these questions should, hopefully, help draw that out.

Some searching questions might be about metrics. What do they measure? Why? What has that told them?

Or something more qualitative. What would it take to show success at the job? What impact would that have?

More broadly, what are the department’s main themes? Am I hired for a project or for a capability? How long is that likely to last? Who else is involved, across the company? Most interviewers will cover at least this, as they generally like to talk about themselves, but it’s by no means a given.

And what about the people? Who are the others in the team? Who would I work closely with? Am I replacing someone or is this a new position? How diverse is the group? Do team members move elsewhere in the company? Even as a contractor who wouldn’t normally be moved laterally in a company,

Who else has been considered for the position? Do my qualifications and experience match theirs?

Then something on the financial side. Why are they considering a contractor? Is this likely to be a long-term commitment?

Finally, engineering questions. Will I cut code? How long does it take for a developer to get their code live? Perhaps not so relevant for architecture or strategy jobs, but aspects of the Joel Test can illustrate corporate sanity. Or not.
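Joel Spolsky’s Joel Test is a twelve-question yes/no checklist, scored one point per “yes”. If you jot down the answers during an interview, a few lines of Python will tally them and list the gaps worth probing. This is a minimal sketch: the question wording is paraphrased from the published list, and the `joel_score` helper and its dictionary input are assumptions made for illustration.

```python
# The twelve Joel Test questions (paraphrased from Joel Spolsky's list).
JOEL_TEST = [
    "Do you use source control?",
    "Can you make a build in one step?",
    "Do you make daily builds?",
    "Do you have a bug database?",
    "Do you fix bugs before writing new code?",
    "Do you have an up-to-date schedule?",
    "Do you have a spec?",
    "Do programmers have quiet working conditions?",
    "Do you use the best tools money can buy?",
    "Do you have testers?",
    "Do new candidates write code during their interview?",
    "Do you do hallway usability testing?",
]


def joel_score(answers: dict) -> tuple:
    """Return (score, questions answered 'no' or not asked).

    `answers` maps question text to True/False; a missing question counts as 'no'.
    """
    gaps = [q for q in JOEL_TEST if not answers.get(q, False)]
    return len(JOEL_TEST) - len(gaps), gaps


if __name__ == "__main__":
    answers = {q: True for q in JOEL_TEST}
    answers["Do you make daily builds?"] = False
    score, gaps = joel_score(answers)
    print(f"{score}/12", gaps)  # prints 11/12 ['Do you make daily builds?']
```

The score itself matters less than the gaps list: each “no” is a natural follow-up question.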

Well, that was interesting: on changing jobs.

Both of my long-term readers know that I work as a contractor. That basically means I stick around in a job for a little while until either I get bored or burnt out, or my client gets bored or burnt out.

It’s a precarious existence, and not for everyone, but I do get to work with some interesting people and see some interesting things.

I’ve just changed jobs so it’s about time I put together a few posts on things I’ve learned in my last couple of roles. And, perhaps, some things to remember when I’m next interviewing.

Network security group configuration for virtual machines

I like Azure, and since I last worked with it in anger a few years ago Microsoft have revised their portal (twice) and added lots and lots of new features. One that I really like is the “resource group”, which is a way to logically group together Azure resources.

Inside a resource group it’s possible to create a Network Security Group. This is a neat container for grouping together firewall rules, which can then be applied to subnets or to individual network interfaces.

And this is where it all goes a bit wrong.
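As background on how those rules are applied: Azure evaluates a Network Security Group’s rules in priority order (custom rules use priorities 100 to 4096, lower numbers first), and once traffic matches a rule, processing stops. Default rules such as DenyAllInBound sit at 65000 and above. The Python sketch below models just that first-match-wins behaviour for inbound traffic. It is a simplification, not Azure’s implementation: the `Rule` type and sample rules are hypothetical, and it ignores port ranges, service tags, destination addresses and the other default rules.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    name: str
    priority: int             # lower number = evaluated first
    source_prefix: str        # CIDR block, or "*" for any source
    dest_port: Optional[int]  # None means any port
    protocol: str             # "Tcp", "Udp" or "*"
    allow: bool


def matches(rule: Rule, source_ip: str, port: int, protocol: str) -> bool:
    if rule.source_prefix != "*":
        network = ipaddress.ip_network(rule.source_prefix)
        if ipaddress.ip_address(source_ip) not in network:
            return False
    if rule.dest_port is not None and rule.dest_port != port:
        return False
    return rule.protocol in ("*", protocol)


def evaluate(rules: List[Rule], source_ip: str, port: int, protocol: str) -> str:
    """First matching rule in priority order wins; later rules are never consulted."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if matches(rule, source_ip, port, protocol):
            return f"{'Allow' if rule.allow else 'Deny'} ({rule.name})"
    return "Deny (no rule matched)"


# Hypothetical inbound rules, using documentation IP ranges.
INBOUND = [
    Rule("AllowHttpsFromOffice", 100, "203.0.113.0/24", 443, "Tcp", True),
    Rule("DenyRdpFromInternet", 200, "*", 3389, "Tcp", False),
    Rule("AllowRdpFromOffice", 300, "203.0.113.0/24", 3389, "Tcp", True),
    # Simplified stand-in for Azure's built-in DenyAllInBound default rule.
    Rule("DenyAllInBound", 65500, "*", None, "*", False),
]

if __name__ == "__main__":
    # RDP from the office is blocked: priority 200 matches before 300 is reached.
    print(evaluate(INBOUND, "203.0.113.10", 3389, "Tcp"))  # prints Deny (DenyRdpFromInternet)
```

The example shows the classic trap with priority-ordered rules: a broad deny with a lower number silently shadows a narrower allow further down the list.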


WordPress on Azure

As I just mentioned in my last post, this ‘ere blog is running in Azure, because I’m a cheapskate and get free hosting from Microsoft.

But it wasn’t as easy as I’d thought.

In the day job I’m doing lots of Azure stuff again. Our esteemed leader wants us to use Platform-as-a-Service where we can, which makes sense as the firm I’m working with don’t want to write anything themselves if they can help it. Buy, not build is their mantra.

So I merrily went to Azure and chose Add New, “WordPress + MySQL”. Entered all the passwords (so many!) and left the wheels to spin. Whoops!


StyleCop, NuGet Package Restore and Jenkins: beware, caustic mixture

Individually, they’re lovely: now that it’s open source, StyleCop seems to be (finally!) getting the love and attention it needs; NuGet has rapidly come of age as the one-stop shop for package management in .NET, without the angle-bracket heartache that is Maven; and Jenkins, well, Jenkins just rocks.

But together they don’t play nice at all.
