When story points misfire
Trying to figure out how to estimate better. Part 1
I started working at XING about 2.5 years ago. When I arrived, I got three teams to work with. All of them had been using Scrum for quite some time. It seemed that agile values and principles were already part of their DNA. There was a well-established routine of two-week sprints with weekly backlog refinement sessions. All of the teams had continuous delivery in place, meaning every user story could easily be deployed to production by triggering one Jenkins job. It was supposed to be an easy start.
And yet, shortly after my arrival, one of the teams was not able to deliver a new feature within the promised time frame. This new feature was originally estimated to take 3 sprints (1.5 months), and there we were 4 months later, still not able to deliver.
In the end, it took a whole 6 months from the start of development until the feature was live for 100% of XING users.
After launch, there was a strong message from upper management that we needed to learn to estimate better. Had they known that this feature would end up being so expensive, they might not have invested in it in the first place.

Back then the planning routine consisted of the following steps:
1. Define a rough set of epics or initiatives for half a year or more.
2. Discover epics together with the Product Owner.
3. Estimate epics in T-shirt sizes.
4. Gradually break the epics down into stories.
5. Estimate user stories in story points.
To be fair, for the most part this kind of planning approach was working well. Until it wasn’t.
Keeping the message about getting better at estimation at the back of my mind, I started paying more attention to how we were doing it.
On a number of occasions during backlog refinement sessions, the conversations triggered by story point estimation led in an unexpected direction.
Anti-pattern #1: Same estimate, different understanding

The Product Owner presents the story. A number of people give the same estimate in story points. No discussion follows. We had an agreement that when all team members give the same story point estimate, we don’t discuss further. Only afterwards would we find out that people had understood what the story is really about, and what it takes to implement it, in completely different ways.
Anti-pattern #2: Plain bargaining

Same refinement session. After a brief presentation of a user story by the Product Owner, the guys arrive at different story point estimates. There is a short ping-pong:
- It’s an 8.
- Nah, it’s a 5! It’s not THAT difficult
- Well, fine. I can also go with a 5. Let’s move on to the next one.
Some exchange of opinions happened, but whether it was of any value remains to be seen.
Anti-pattern #3: Superficial risk assessment

- It’s a 5
- Nah, it’s an 8. The code we are about to touch is quite ‘smelly’ and I would like to be on the safe side.
- Well, fine, let’s go for an 8!
There is at least some risk assessment, so it looks like we are moving in the right direction. But on closer inspection, is this superficial exchange any good? Especially if it is not followed by “why do we have the smelly code” and “how do we make sure we don’t have it again”?
Anti-pattern #4: What’s complexity again?

- It’s a 5!
- It’s an 8! It appears to be simple but requires lots of effort and manual work.
- Wait. We are measuring complexity. It is not as complex as our pattern story for an ‘8’.
- Right, but what’s complexity again?
Time after time I would see puzzled faces when we were to estimate long-running tasks that were not complex. The statement that we are estimating complexity and not time, and that such estimates would even themselves out on average, was somehow not getting through. Or it was, but not for long-lasting, fairly simple tasks.
Anti-pattern #5: Re-estimating partly done stories

- Alright, so “Feature A” is almost done, of its 13 story points only about 2 are left. “Feature B” is also halfway done, let’s include only 3 story points from it in the total for the sprint planning.
Situations like this one would occasionally happen during the sprint planning meeting, even after we had agreed to count a story as a whole when it spans multiple sprints. This kind of reassessment would normally happen while thinking out loud about how many more stories to pull into the next sprint.
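To make the difference concrete, here is a minimal sketch of the two ways of counting carried-over stories. The `Story` class and both function names are made up for illustration. The 13, 2 and 3 come from the dialogue above; the original estimate of 5 for “Feature B” is an assumption, since the dialogue only says it is halfway done.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    points: int           # original story point estimate
    remaining_guess: int  # ad hoc "points left" voiced during sprint planning


def planned_load_whole(stories):
    """Count every carried-over story with its full original estimate."""
    return sum(s.points for s in stories)


def planned_load_partial(stories):
    """Count only the re-estimated remainder (the anti-pattern)."""
    return sum(s.remaining_guess for s in stories)


carry_over = [
    Story("Feature A", points=13, remaining_guess=2),
    Story("Feature B", points=5, remaining_guess=3),  # 5 is an assumed value
]

print(planned_load_whole(carry_over))    # 18
print(planned_load_partial(carry_over))  # 5
```

With these numbers the same two stories take up either 18 or 5 points of the sprint, so mixing the two conventions from one sprint to the next makes the totals hard to compare.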
Anti-pattern #6: Mapping story points to time

Whenever a new developer started and was onboarded by someone else from the team, I would hear a dialogue along the following lines:
- How do you estimate? What’s a 5, for example?
- Well, 5 story points are about 1 week of work, 8 is about 2 weeks of work. 1 is like a one-line change… You’ll get the drill soon.
Regardless of the whole complexity talk and having ‘pattern’ stories for every number of story points, people would map story point estimates to time, because time is easier to understand, remember and communicate further.
Anti-pattern #7: “I need an estimate”

Especially in situations when the bargaining talk would last for some time, the Product Owner would often encourage us to wrap it up and give the number of story points. “Guys, I need an estimate” would indeed reflect her needs. The process was designed so that a user story that had not been estimated could not go into the sprint. During backlog refinement discussions we were supposed to arrive at a shared understanding of the value of the user story and its technical implementation. However, the implicit message that the team members were getting was that the estimate itself is what really matters. And that was not the fault of the Product Owner or anyone else: we had simply designed the process in a way that made story point estimation a prerequisite for any idea to be implemented. If the Product Owner wanted anything to be put into the sprint backlog, it had to be estimated.
Takeaway
Any practice or any tool can be misused. I was wondering whether in all of the cases described above, we were simply misusing story point estimation as a tool.
Should I have been stricter in the way we as a team were approaching story point estimation and planning poker? Would we have been able to avoid some anti-patterns if we had discussed every user story in detail? Should I have reminded everyone at every meeting that we are estimating complexity and not time?
But more importantly, if I look at story point estimation as a tool for a job, what kind of job exactly was I hoping to get done with it?
The “jobs”, or my expectations of story point estimation, were:
- make the output of the team predictable, i.e. help define the scope for the next sprint, help in communicating milestones and clarifying expectations with the stakeholders.
- trigger a valuable conversation around the user story that would lead to a shared understanding of what we are trying to build, both between the Product Owner and the team and within the team itself.
After paying close attention to the estimation process for a couple of months, I had my doubts about story point estimation being the best trigger for a valuable conversation about what we want to build. I had a feeling that simply asking specific questions would provide a better starting point.
For example, instead of proposing: “Let’s estimate” after presenting the user story, simply ask 2 questions:
Question 1: Can someone else on the team explain in their own words what the value of the user story is?
Question 2: What’s the technical implication?
It was easy to find a different/better tool for the job of “triggering valuable conversation”.
The only other job that I was expecting story point estimation to address was to make the output of the team predictable. Another wild gut feeling, partially inspired by #noestimates talks, was that if we were to remove story point estimation, we would end up keeping the same level of predictability, which would make story point estimation unnecessary.
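One way such a gut feeling could be checked is to forecast the same backlog twice: once from average velocity in story points and once from the plain number of stories finished per sprint. The sketch below is only an illustration under stated assumptions. The function names are made up, and the sprint history and backlog are invented numbers, not data from our teams.

```python
from math import ceil
from statistics import mean


def sprints_by_points(backlog, history):
    """Forecast sprints needed using average velocity in story points."""
    velocity = mean(sum(sprint) for sprint in history)
    return ceil(sum(backlog) / velocity)


def sprints_by_count(backlog, history):
    """Forecast sprints needed using average number of stories per sprint."""
    throughput = mean(len(sprint) for sprint in history)
    return ceil(len(backlog) / throughput)


# Invented history: estimates of the stories finished in each past sprint.
history = [
    [5, 8, 3, 5],
    [8, 8, 5],
    [3, 5, 5, 8, 2],
    [13, 5, 3],
]

# Invented backlog: estimates of the stories still to be done.
backlog = [5, 8, 3, 13, 5, 8, 2, 5, 3, 8]

print(sprints_by_points(backlog, history))  # 3
print(sprints_by_count(backlog, history))   # 3
```

For this made-up input both forecasts land on 3 sprints, which proves nothing by itself. The point is that the comparison is cheap to run against a team's real history before deciding whether the estimates add any predictive value.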
Once I was clear about what my expectations of the estimation process were, I wanted to see how far they were from the expectations of the teams I had been working with. To figure that out, I pitched a session during our offsite: “Why do we estimate and can we do without?” The results of that open space are shared in a separate blog post.
The rest of the story, how we designed the experiment around removing story point estimation and what the results were, is covered in part 3 of the series.