Putting #noestimates in action

Sharing a real-life example, from the idea to the roll-out across three teams. Part 3 in the estimation series.

Albina Popova
10 min read · Nov 1, 2016


Let’s start off by clarifying what #noestimates is, besides being just another hashtag. There is lots of information on the web, and yet it is not easy to distill the exact meaning. The first and most obvious thought: #noestimates is all about removing estimation out of the way. At least that is what the name suggests, right?

Well, it is not that simple.

People promoting the ideas behind #noestimates are not fully against estimation. For example, here is what Woody Zuill writes on his blog: “By the way — I don’t believe that all estimates and estimating is useless and wasteful, just a great deal of how they are typically used in software development”. Which means that #noestimates is not about completely and utterly removing estimation.

On the other hand, #noestimates is not just about estimation itself, or the lack of it. Quoting Vasco Duarte’s interview: “Focus on selecting high-impact experiments (Features or Stories), and not mindlessly deliver of a long list of detailed requirements”. Neil Killick raises a question in a similar direction: “Is it possible, in certain situations, to deliver value to the customer at a rate which negates the need for doing any estimating at all, both up front and ongoing?”

Rephrasing in my own words, here is what #noestimates stands for:

  1. focus on the end user impact instead of activity metrics of the development team like velocity and throughput
  2. rapid delivery and continuous re-planning for maximum user value
  3. minimal, if any, use of estimation process

XING had been focusing on points 1 and 2 even before I joined. We have been re-planning the roadmap at least quarterly, running a number of A/B tests, trying out ideas from design thinking and the Google Design Sprint, showing mockups to end users before any line of code has been written, and conducting user interviews before creating any mockups. We have been investing in making sure that continuous integration and delivery are in place. We have been creating monitoring dashboards so that teams can easily access not only data about the health status of the app, but also see to what extent the app is being used and whether user behaviour is changing. All while still estimating everything.

Looking at the whole set of ideas promoted under the #noestimates hashtag, I saw the biggest value for us in limiting the waste from the estimation process. That is reflected in the scope of this article, which explores only point 3: minimal use of the estimation process.

Step 1 — Building up motivation

The idea to run an experiment around removing some bits of our estimation process was fuelled by our wish to reduce the waste associated with the estimation process and by an inspirational talk by Vasco Duarte.

Our internal struggles with the estimation process and the specific routine we had been using at the time — story points with planning poker — have already been covered in great detail.

With the thought that we needed to get better at estimating constantly at the back of my mind, I participated in one of the Agile Coaching Circle events dedicated to #noestimates, hosted by Vasco Duarte.

Vasco was proposing to completely remove the estimation process. His alternative to estimating was forecasting. With forecasting we would not need to estimate any backlog item; we would simply count the number of backlog items that have been delivered on average per sprint and use that number as a guideline to plan the next sprint.
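This counting approach can be sketched in a few lines of Python; the delivery history below is invented for illustration, not taken from any real team:

```python
# Forecasting without estimates: instead of summing story points,
# count how many backlog items were delivered in past sprints and
# use the average as a guideline for the next sprint.

def forecast_next_sprint(items_delivered_per_sprint):
    """Average number of backlog items delivered per sprint."""
    return sum(items_delivered_per_sprint) / len(items_delivered_per_sprint)

# Hypothetical delivery history over the last 6 sprints:
history = [9, 11, 10, 8, 12, 10]
print(forecast_next_sprint(history))  # -> 10.0
```

So this hypothetical team would plan roughly 10 backlog items for the upcoming sprint.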

Vasco showed a case study where a team, which was estimating at the time, tried to project its release date.

The first time, they projected the release date by summing up the story points for all of the estimated user stories and then dividing the total by the average number of story points delivered per sprint.

The second time around, they disregarded estimates. They simply counted the number of user stories and divided it by the average number of user stories delivered per sprint. The release dates computed by the two methods were within 2–3 weeks of each other for a 10 month long initiative. Which is not bad, taking into account the usual track record of projected vs real release dates in the software development field.
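To make the two projection methods concrete, here is a rough sketch; every number below is invented for illustration and is not from the actual case study:

```python
# Projecting the remaining number of sprints in two ways, with made-up numbers.

remaining_items = 105         # user stories left in the backlog
total_points = 520            # sum of their story point estimates

avg_points_per_sprint = 26.0  # historical velocity in story points
avg_items_per_sprint = 5.0    # historical throughput in backlog items

# Method 1: sum of story points / average points per sprint
sprints_by_points = total_points / avg_points_per_sprint   # -> 20.0

# Method 2: count of stories / average stories per sprint
sprints_by_count = remaining_items / avg_items_per_sprint  # -> 21.0

# With 2-week sprints, these two projections are 2 weeks apart.
print(sprints_by_points, sprints_by_count)
```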

You can find more information on Vasco’s ideas about #noestimates and the forecasting approach in his book or on his blog.

Step 2 — Gathering the stats

For all three teams, I dug up stats for the previous 6 sprints showing:

  • how many story points each team took into the sprints (blue line)
  • how many story points were delivered by the end of each sprint (lime green line)

To give some perspective as to why the graphs are the way they are for different teams:

  1. Team A was entering a new domain around Sprint 3
  2. Team B changed their estimation scale in Sprint 3. A complex story they would previously have estimated at 20 story points they would now estimate at 8 story points. The team had basically shifted the story point scale they were using.
  3. Team C had the largest number of team members with small children, which often led to sick leaves whenever a new virus popped up in the kindergarten.

The next step was to calculate the prediction rate. If we promised to deliver a sprint backlog worth 100 story points and delivered a backlog worth 100 story points by the end of the sprint, then predictability is 100%. If we had promised 100 story points but delivered user stories worth only 50, then our predictability was only 50%.
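In other words, the prediction rate is simply delivered over promised. A minimal sketch of the calculation:

```python
def predictability(promised_points, delivered_points):
    """Share of the promised sprint backlog that was actually delivered."""
    return delivered_points / promised_points

print(f"{predictability(100, 100):.0%}")  # -> 100%
print(f"{predictability(100, 50):.0%}")   # -> 50%
```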

Average predictability when estimating. Measurements taken for 3 teams over 6 sprints

On average, the prediction rate across all of the teams was around 70%.

Besides planned and delivered story points, I also dug out data on the number of completed backlog items per sprint. Backlog items included:

  • stories
  • bugs
  • technical tasks

Average # of backlog items per sprint in 3 teams over 6 sprints.

Purple line above represents the number of backlog items completed per sprint. The number in the circle is the average number of backlog items per sprint.

Just by looking at the graphs side by side, at the line with completed story points and the one with the number of backlog items, it seems that the latter produces less noise. At the same time, the average number of backlog items per sprint still gives a good picture of what a team can do within a certain timeframe.
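One way to make “less noise” measurable is to compare the relative spread (coefficient of variation) of the two series. The per-sprint numbers below are invented for illustration:

```python
import statistics

def coefficient_of_variation(series):
    """Standard deviation relative to the mean; lower means less noisy."""
    return statistics.stdev(series) / statistics.mean(series)

# Hypothetical per-sprint data for one team over 6 sprints:
story_points_delivered = [45, 80, 30, 70, 55, 95]
backlog_items_completed = [9, 11, 8, 10, 9, 12]

print(coefficient_of_variation(story_points_delivered))   # noisier series
print(coefficient_of_variation(backlog_items_completed))  # steadier series
```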

At this point I had all of the stats needed to build up confidence in the following:

Hypothesis: If we were to stop using story point estimation and simply count the number of backlog items, team’s predictability would not decrease

Due diligence is done, next step is the pitch!

Step 3 — Designing the experiment

We try to approach any change as an experiment.

In the end every user story is a hypothesis that the end user really wants that new feature and will take advantage of it.

Similarly, every process change we plan to introduce is also a hypothesis. We are guessing that the change we are planning to introduce will fix the challenge at hand, but in the end it is still just a guess.

I had a hypothesis and the stats research that supported it. The next step was to pitch the idea to the teams: removing the estimation of anything smaller than an epic.

During an offsite I pitched a topic, “Estimation vs #noestimation”, as a session for an open space. We started the discussion by trying to figure out why we estimate, whether we could stop estimating, and what we would lose if we did.

Once we had gone through every possible reason why we estimate and found a way to satisfy our needs with minimal or no estimation, I presented my stats findings to the group.

At the end of the session one of the teams agreed to go along with an experiment.

After the open space we started designing an experiment together. Here is what we agreed on:

Timeframe

  • minimum 3 sprints to make a decision.

Success criteria

  • Team’s predictability does not decrease. To calculate predictability we would count the number of backlog items taken into the sprint vs the number of backlog items delivered
  • We would still have valuable conversations during backlog refinement sessions. How would we measure it? Everyone would give their opinion on whether the quality of conversations improved or not.

Change of routine during backlog refinement meetings

  • Previously we were using story point estimation to show that the user story was discussed and is ready for development. For the time of the experiment we have agreed to introduce a new label “dev ready” to show that the story matches our DoR (definition of ready) and is ready to be taken into the sprint.
  • We were also using story point estimation as a trigger to kick off a discussion. Now, instead of the story point estimation routine, we moved to asking questions, like “What is the business value of this story?”, “What is the technical implication of it?”, “What’s missing to put a dev_ready label on it?”
  • We would not introduce additional size limitations for user stories. They don’t have to be of the same size. Only the old size limitation still applies: a user story must fit in the sprint.

Change of planning routine

  • Instead of looking at previous sprints’ velocity in story points, we would look at previous sprints’ velocity in backlog items as a guide to define how many stories to take into the upcoming sprint.

Roll-back scenario

  • It was absolutely ok to go back if the success criteria were not met. The team would decide whether to continue estimating or not.

Step 4 — Pilot run

We ended up running the experiment in the first team for 7 sprints, each 2 weeks long. There were already good results after the 3rd sprint, but we decided to stay in the experiment mode for a bit longer to be on the safe side.

Here’s what we got

Rolling out the #noestimates experiment in the first team. Measurements taken over 7 sprints

First chart shows:

  • the number of backlog items (user stories, bugs, tech tasks) taken into the sprint — blue line
  • the number of backlog items completed per sprint — lime green line

The second chart shows the ratio of taken vs completed backlog items and the average predictability during those 7 sprints.

The third chart compares predictability before and after moving away from story point estimation.

All in all, it was a successful pilot run:

  • Predictability increased within the margin of error
  • Most of the team members said that the quality of the conversations improved during the backlog refinement meetings
  • Team decided not to go back to story point estimation

Step 5 — Further roll-out

It did not take long before the other two teams jumped on the train. I presented them with the results of the pilot run and our learnings and findings. We designed similar experiments for them, and off we went.

I will not bore you with graphs any more. The results of the experiment in the other two teams were similar. Predictability stayed at the same level. The quality of the conversations improved.

We started the series of #noestimates experiments in the summer of 2015. Since then none of the teams has reverted to estimating backlog items. They still estimate epics, using the number of sprints as a metric or t-shirt sizes. Epic estimation is used for roadmap planning. The roadmap is re-planned at least once a quarter.

The focus of this article has been on sharing a real-life example along with a tangible comparison of before and after. The part that is still left uncovered is what we have learned throughout the year. So once again, stay tuned for more.
