Click here to view the complete list of archived articles

This article was originally published in the Summer 2011 issue of Methods & Tools

Everything You Always Wanted to Know About Software Measurement

Sue Rule & P. Grant Rule, s.rule @ smsexemplar.com

Software Measurement Services Ltd., www.smsexemplar.com

What Does Functional Size Measurement Mean?

When Grant Rule was working with early pioneers of agile methods back in the late 1980s, estimating was a big issue. No one, it seemed, knew how to determine accurately in advance how big a project was, how long it would take, or how much it would cost. This was particularly a problem for a software house delivering fixed price contracts. Focusing on rapid, iterative delivery to ensure the user gets the right software product is good; but at what price?

While advocates of Lean-Agile rightly emphasise the importance of delivering the right thing, it is essential that professional software managers also keep reliable accounts of software process performance for estimating costs, auditing performance, and improving productivity. Effectiveness is delivering optimum value for the minimum cost.

The solution to Grant�s estimating issue, and many associated software project control issues, was the IFPUG function point measurement method developed by Alan Albrecht in 1979. Yet for various reasons, functional size measurement has failed to become widely adopted as a software project management tool. Perhaps it seems arcane - difficult - part of that top-heavy bureaucracy development teams want to be rid of. But as a result, estimating remains a widespread problem for Agile and non-Agile developers alike. Process performance remains hugely variable and unpredictable, with measurement often still the most neglected area of process improvement. By stripping away much of the dysfunctional and redundant process control that can accumulate around the software process, the current interest in Agile is bringing many of these issues to the surface. But many IT professionals seem stuck in a rut, ploughing new land with a newly Agile team but using the same blunt plough and running into the same obstacles.

CIOs, bid teams, development teams, vendor managers and legal advisors seeking better ways of contracting - particularly for Lean-Agile software development - are still faced with the dilemmas Albrecht was seeking to tackle. How do you know your development teams - in-house or outsourced - are delivering best value for money? How does a supplier manage the risk of a fixed price contract? How does he demonstrate competitive productivity levels and on-going improvement? How will Agile benefit my organisation? How does the business manage the risk if we can�t detail the requirements up-front? How do you measure the benefits of outsourcing IT? Which technology should we choose? Will a change of technology affect the value delivered? What are the relative merits of buying it, building it or outsourcing it?

Alan Albrecht�s answer to addressing these issues was to measure software size in terms of the output delivered to the user - its functionality - rather than simply in terms of the time and effort put into the development. Deriving a software size from the user�s requirements which can then be mapped to the time, resources and budget needed to realise those requirements gives you the necessary objectivity to tackle the scope, project and performance control issues. It also takes you at least part way to measuring the business value and benefit delivered by software. At least you know how much it will cost and how long it will take to deliver the specified functionality; the software should do what it "says on the tin" and it should be delivered on schedule, within budget. If the customer has bought beans instead of tomatoes, or needed a three course meal and not a tin of beans at all, there is something amiss elsewhere in the process.

Functional size measurement means the ability to make objective performance comparisons across teams, across market sectors, and between different suppliers. It means reliable productivity and pricing benchmarks. It means accurate baselines and measures of improvement. It means early and accurate estimations of software project cost and duration. That was why Grant Rule first adopted the technique back in the 1980s - and remains a knowledgeable advocate of it today.

Estimating

The predictability of software projects remains a major issue for many business managers. Agile can exacerbate this if delivered software products or software product components have to be routinely re-factored. Customers considering outsourced Agile development will rightly require some assurance of value for money, and suppliers need a reliable oversight of costs to ensure prices remain competitive. The �Story Point� measures used by many Agile teams to estimate and manage projects are little more than a new take on our old friend, the measure of input. Estimates using Story Points rely on team knowledge and capability, based on individual experience. The process of arriving at them is even known as Planning Poker, an indication of the level of personal skill and informed guesswork required. Story Points cannot be compared between one team and another (let alone one market sector or one supplier and another). Their use either as an effective and reliable estimating tool or as objective and comparable measures of productivity is therefore very limited.

In 2010, Grant Rule published a short paper on using the most modern of the functional size methods - COSMIC - to derive accurate estimates from User Stories (used by Agile teams to capture requirements.) As measurement specialists, SMS recommend the use of COSMIC as the quickest, most reliable, and most easily learned method of introducing robust software estimating practice for developers using Agile or more traditional development practices.

Size Matters - Accurate Early Estimating

Project failure rates have been shown to relate strongly to the size of project undertaken, with the largest projects accounting for the highest failure rate. Rule�s Relative Size Scale was a table developed by Grant Rule to provide a quick and easy early range estimate of project size and cost in terms of clothing sizes - Small, Medium, Large, Extra Large. The scale uses benchmark data to provide a valuable and reliable early-stage feasability check.

COCOMO II Profiling - Accounting for Non-Functional Requirements

The Constructive Cost Model (COCOMO) is an algorithmic software cost estimation model developed by Barry Boehm. The model uses a basic regression formula, with parameters that are derived from historical project data and current project characteristics. COCOMO enables more refined estimates to be derived from simple unadjusted function points by taking into account differences in non-functional project attributes (Cost Drivers). Detailed COCOMO can also account for the influence of individual project phases.

COCOMO II was brought out in 2000, to adapt the original model for use estimating software projects in modern development conditions such as incremental development and rational unified process. COCOMO II is continually evolving to adapt to changing software methods.

What are Function Points?

Function Points measure the end users requirements in terms of data movements. This means that an early functional size estimate can be derived before any code has been written. They are then also used to measure the actual cost of developing the requirements, which can be compared directly to the estimate and used to calibrate a scale for a particular development environment. If development productivity figures are known, the cost of developing new software can therefore be estimated early enough to make direct comparisons to the cost of buying a software package, or simply using a non-technical solution. Function Point measures are independent of technology so can compare different suppliers, technologies and different teams to give an objective basis for a cost benefit decision.

Because they encompass both cost (development effort consumed) and benefit (requirements fulfilled), Function Points unlock the door to measuring productivity; velocity; quality; and value in software development.

The capacity to measure the amount of output produced for input provided gives an objective productivity figure.

gives an objective figure. Measured against time - E.g. FP delivered per elapsed month - function points provide objective and comparable measures of velocity .

- function points provide objective and comparable measures of . The number of defects detected over total size of delivery - i.e. no of defects delivered over no of FP delivered - gives a measure of quality .

- gives a measure of . By comparing costs, effort and time expended per function point delivered, we can also demonstrate the huge waste generated by ineffective software processes compared to the standards of measurable efficiency shown by effective, Right-shifted organisations. Although a long way from being a comprehensive measure of value delivered to the customer, this offers a measure of the component of value for which the software developer is solely responsible. Ensuring the functionality delivered is aligned to business need is the joint responsibility of the business user and the development team and outside the scope of functional size measurement.

A Comparison of Functional Size Methods

The most widely used function point methods are IFPUG, NESMA, Mk II and COSMIC. Of these, IFPUG is based on the original method designed by Alan Albrecht. COSMIC is the newest method, developed by an international team of metrics experts to support modern software.

Where organisations have already made some investment in IFPUG capability, it may make sense to use this more widely known method. It is actively managed and maintained by the International Function Point User Group, with the latest version 4.3 released in January 2010. The Certified Function Point Specialist qualification is a recognised standard for IFPUG counting competency. There is considerable IFPUG benchmarking data available, although it is questionable how valuable some of the older data is in terms of benchmarking modern software performance.

While the older functional size measures - typically, IFPUG - can be used for Agile projects, there are significant problems. IFPUG counting is not the easiest process to learn or apply, and even with trained practitioners, time can be wasted debating fuzzy areas. Interpretation of the IFPUG counting rules for complex modern software environments can be tricky, sometimes resulting in differences of opinion between supplier and customer.

COSMIC offers all the advantages of functional size measurement, without some of the shortcomings of earlier methodologies which were not designed with 21st century software in mind. It will prove easier to introduce, easier for both business users and software developers to understand and apply, and a more effective communication mechanism for negotiating the delivery of value. It is an established standard with recognised training qualifications. There is a growing repository of benchmark data available.

See the table below for a more detailed comparison.

Types of measurement scale and permissible operations using them

The type of scale depends on the nature of the relationship between values on the scale. Four types of scale are commonly defined:

Nominal � arbitrary labels, classification data, no ordering � the measurement values are categorical but it makes no sense to state that one category is �greater than� another. For example: Yes/No; Black/White/Yellow/Red; male/female, animal/vegetable/mineral; the classification of defects by their type.

Ordinal � ordered but differences between values are not important � the measurement values are rankings. For example: restaurant �star� ratings; political parties on left to right of the spectrum are given labels Red, Orange, Blue; Likert scales that rank �user satisfaction� on a scale of 1..5; the assignment of a severity level to defects.

Interval � ordered, constant scale, but no natural zero � the measurement values have equal distances corresponding to equal quantities of the attribute. For example: dates, temperature on Celsius or Fahrenheit scales � differences make sense, but ratios do not (e.g., 30�-20� = 20�-10�, but 20� is not twice as hot as 10�! Other examples: cyclomatic complexity has the minimum value of one, but each additional path increments the count by one.

Ratio � ordered, constant scale, natural zero � the measurement values have equal distances corresponding to equal quantities of the attribute where the value of zero corresponds to none of the attribute. For example: height; weight; age; length; temperature on Kelvin scale (e.g. absolute zero = 0�K, and 200�K is twice as hot as 100�K); the size of a software source listing in terms of Non-Commentary Source Statements (or Source Lines Of Code).

The method of measurement usually affects the type of scale that can be used reliably with a given attribute. For example, subjective methods of measurement usually only support ordinal or nominal scales.

Only certain operations can be performed on certain scales of measurement. The following list summarizes which operations are legitimate for each scale. Note that you can always apply operations from a 'lesser scale' to any particular data, e.g. you may apply nominal, ordinal, or interval operations to an interval scaled datum.

Nominal Scale. You are only allowed to examine if a nominal scale datum is equal to some particular value or to count the number of occurrences of each value. For example, gender is a nominal scale variable. You can examine if the gender of a person is F (female) or to count the number of Ms (males) in a sample. Valid statistics: mode, chi square.

Ordinal Scale. You are also allowed to examine if an ordinal scale datum is less than or greater than another value. Hence, you can 'rank' ordinal data, but you cannot 'quantify' differences between two ordinal values. For example, political party is an ordinal datum with the Liberal Democratic Party to the left of the Conservative Party, but you can't quantify the difference. Another example are preference scores, e.g. ratings of eating establishments where 10=good, 1=poor, but the difference between an establishment with a 10 ranking and an 8 ranking can't be quantified. Valid statistics: mode, chi square, median, percentile.

Interval Scale. You are also allowed to quantify the difference between two interval scale values but there is no natural zero. For example, temperature scales are interval data with 25C warmer than 20C and a 5C difference has some physical meaning. Note that 0C is arbitrary, so that it does not make sense to say that 20C is twice as hot as 10C. Valid statistics: mode, chi square, median, percentile, mean, standard deviation, correlation, regression, analysis of variance.

Ratio Scale. You are also allowed to take ratios among ratio scaled variables. Physical measurements of height, weight, and length are typically ratio variables. It is now meaningful to say that 10 metres is twice as long as 5 metres. This ratio holds true regardless of which scale the object is being measured in (e.g. metres or yards). This is because there is a natural zero. Valid statistics: mode, chi square, median, percentile, mean, standard deviation, correlation, regression, analysis of variance, geometric mean, harmonic mean, coefficient of variation, logarithms.

A comparison of the most common Function Size Measurement (FSM) Methods

General Information IFPUG FPA r4.3 NESMA FPA v2.0 Mark II FPA r1.3.1 COSMIC FSM r3.0.1 Origin Created by Allan Albrecht at IBM in 1978



Latest release (January 2010) of the original method Believed to have been created by NESMA (aka NEFPUG) in mid-1980s

Derived from IFPUG Created by Charles Symons at Nolan Norton in 1984 (put into public domain 1991) Updated method for use with DBMS, structured methods, CASE tools, etc Created by international consortium of industry subject matter experts and academics from 19 countries in 1997 Updated method for use with OOA/D, layered architectures, Web2.0, lean/agile, etc Counting Practices Manual Available to IFPUG members Available for sale Available - public domain Available - public domain Counting Practices Manual - languages available English & some other language versions available to members Dutch-language version

English-language version English-language version 9 language versions:

Arabic, Chinese, Dutch, English, French, German, Italian, Japanese, Spanish Used by Public & private sector organisations, large & small, both customers & vendors, around the world

Mostly MIS users Stable user base � international Public & private sector organisations, large & small, both customers & vendors, primarily in The Netherlands Mostly MIS users Declining user base � mostly The Netherlands Originally HM Government�s preferred method for sizing & estimating software. Now used by a few public sector customers & their vendors



Declining user base � mostly United Kingdom Public & private sector organisations, large & small, both customers & vendors, around the world MIS and Engineering users Growing user base � international Terminology used Founded in the 1970s Founded in the 1970s Uses structured methods terminology Compatible with OOA/D, & software eng. principles Availability Available only to members of IFPUG (but easy to join organisation) Public domain -� download from NESMA Public domain � download from UKSMA Public domain � download from COSMIC Design Authority (independent of vendors) International Function Point Users Group (IFPUG)

www.ifpug.org Netherlands Software Metrics Association (NESMA)

www.nesma.nl United Kingdom Software Metrics Association (UKSMA)

www.uksma.co.uk COmmon Software Measurement International Consortium (COSMIC) www.cosmicon.com

Common Features

1. Compliance

All four methods comply with the international standard for Functional Size Measurement Methods � ISO14143:

IFPUG FPA r4.3 NESMA FPA v2.0 Mark II FPA r1.3.1 COSMIC FSM r3.0.1 ISO/IEC 20926:2003 ISO Standard applies only to

unadjusted FP ISO/IEC 24570:2005 ISO Standard applies only to

unadjusted FP ISO/IEC 20968:2002 Recommended method for HM Government (UK) ISO/IEC 19761:2003/2010 BCS Technology Award Winner in 2006 Recognised as a National Standard in Spain & Japan

2. Certification

All four methods operate certification schemes for training measurement staff:

IFPUG FPA r4.3 NESMA FPA v2.0 Mark II FPA r1.3.1 COSMIC FSM r3.0.1 Yes

Certified Function Point Specialist (CFPS)

Uses IFPUG CFPS Yes

Certified Function Point Analyst

(CFPA) Yes

COSMIC Practitioner Certification

Benchmarking Data

All four methods are supported by the International Software Benchmarking Standards Group (ISBSG). There are differences, however, in the size of the comparative data pool:

IFPUG FPA r4.3 NESMA FPA v2.0 Mark II FPA r1.3.1 COSMIC FSM r3.0.1 Large data set compiled over many years � the utility of antique data is questionable Large

Comparisons use

IFPUG data Small

Some native data; can be compared to IFPUG data if care is taken Moderate and growing

Data since 1997; ISBSG benchmark released 2009; can be compared to older data if care is taken

All four methods share the following characteristics:

Oriented toward user-required functionality

Helps verify consistency & completeness of user-required functionality

Analyses can be used as basis for construction of tests independent of code & test activities

Measures functional size of dynamic (behavioural) aspects of system (expressed as e.g. use cases, conversational dialogues, user stories, epics & themes, etc)

Measures development of new requirements

Measures adaptive maintenance (enhancements)

Designed for MIS systems - flat & indexed files, batch systems, OLTP systems

Can be used to measure Functional User Requirements before design, code & test

Can be used to measure Functional User Requirements after design, code & test

Can be used to (re)estimate during product life-cycle

Size can be used as input into top-down software cost models such as COCOMO.II.2000, SLIM, SEER, Price-S, etc

Can be used to construct product burndown charts, calculate takt time, #sprints, etc

Independent of product non-functional requirements

Independent of project constraints

Independent of developer experience

Independent of process, project management & development methods

Early estimates of functional size can be made based on incomplete knowledge of Functional User Requirements � enabling consistent use of one size scale for estimating & measurement throughout project:

IFPUG FPA r4.3 NESMA FPA v2.0 Mark II FPA r1.3.1 COSMIC FSM r3.0.1 Can produce early estimates using various methods:

e.g. Fast Eddy, File-Based Approach, Transaction-Based approach Can produce early estimates using various methods:

e.g. Fast Eddy, File-Based Approach, Transaction-Based approach Can produce early estimates using various methods:

e.g. Data Model Approach (CRUDL), Transaction-Based approach Can produce early estimates using various methods:

Event-Based Approach, Object-Based Approach, Story-Based Approach

NONE of the four methods deliver the following characteristics:

Measures corrective maintenance (fixes)

Measures perfective maintenance (refactoring for improved performance)

Measures algorithmic complexity

Measures reuse of code

Differences between the four main methods:

Characteristic IFPUG FPA r4.3 NESMA FPA v2.0 Mark II FPA r1.3.1 COSMIC FSM r3.0.1 Measures functional size of static (data storage) aspects of a system (expressed as files, tables, entity types, classes, etc) Yes Yes Regarded as �double accounting�

only information processing measured Regarded as �double accounting�

only information processing measured Compatible with modern methods of requirements analysis Partially

(1975/85s concepts)

requires data model Partially

(1980/85s concepts)

requires data model Yes

(1980/95s concepts)

requires data model Yes

(1995/2010s concepts)

incl. incremental Designed for MIS systems - Relational DBMS No

But mapping rules have been developed No

But mapping rules have been developed Yes Yes Designed to be applicable to real-time and/or embedded systems No

MIS concepts only No

MIS concepts only No

terminology can be re-interpreted for real-time Yes

one common model applicable across MIS, real-time & embedded systems Can be used to measure complex, layered architectures No

Rules assume monolithic system � infrastructure & middleware is �invisible� No

Rules assume monolithic system � infrastructure & middleware is �invisible� Yes

Limited � can recognise

3-tier architecture Yes

Designed to recognise �layered architectures� �

measures all functional requirements allocated to software systems Scale type:



Nominal � distinguishes members of sets, unordered

Ordinal � relationship between sets, unequal intervals

Interval � comparisons, equal intervals, arbitrary zero

Ratio � comparisons, equal intervals, a natural zero

ref: ISO/IEC CD 15939. �Nominal/Ordinal� Scale Unequal intervals between Low & Average, and between Average & High �Nominal/Ordinal� Scale Unequal intervals between Low & Average, and between Average & High �Ordinal/Interval� Scale Weights derived so that

1 MkII fp = 1 IFPUG fp

approximately comparing functional processes Ratio Scale Empirical data suggests

1 cfp = 1 IFPUG fp

approximately comparing functional processes Permissible arithmetic & statistical operations Categories assigned relative weights: Data can be 'ranked', but 'quantifying' differences between values is difficult due to �cut off� (Low is c. half of High) �

ratios are problematic Categories assigned relative weights: Data can be 'ranked', but 'quantifying' differences between values is difficult due to �cut off� (Low is c. half of High) �

ratios are problematic Ordered, synthetic scale with a natural zero: Data can be ranked; differences & ratios between values can be quantified within limits but are problematic due to the use of weights Ordered, constant scale with a natural zero: Data can be ranked; differences between values can be quantified; ratios make sense (i.e. 20 is twice the size of 10, and 2000cfp is twice 1000cfp). Accounts for information processing by: Sizing static data and dynamic behaviour Sizing static data and dynamic behaviour Sizing dynamic behaviour, the use of data Sizing dynamic behaviour, the use of data Models the functional user requirements as: File Types

and

Elementary Process

(= Input-Process-Output) File Types

and

Elementary Process

(= Input-Process-Output) Logical Transactions

(= Input-Process-Output) Functional Processes

(= Input-Process-Output) Equivalent of

stimulus/response message pair (i.e. a �thread of control with some input, related processing, and some output) Elementary Process either:

External Input (EI),

External Output (EO) or

External Query (EQ)

depending on �primary intent� Elementary Process either:

External Input (EI),

External Output (EO) or

External Query (EQ)

depending on �primary intent� Logical Transaction (LT) All stimulus/response message pairs regarded at LT irrespective of �primary purpose� Functional Process (FP) All stimulus/response message pairs regarded at FP irrespective of �primary purpose� Rules for measuring size Different rules apply depending on elementary process type Different rules apply depending on elementary process type Same rules apply to all logical transactions Same rules apply to all functional processes Base Functional Component(s)





Internal Logical File

External Interface File

External Input

External Output

External Query Internal Logical File

External Interface File

External Input

External Output

External Query Input Data Element

Entity Reference

Output Data Element Data Movement



(either: Entry, eXit, Read, or Write depending on direction of movement) Contributors to functional size Per File Type:

#static Data Element Types & #Record Element Types Per Transaction Type:

#dynamic Data Element Types

& #File Type References Per File Type:

#static Data Element Types & #Record Element Types Per Transaction Type:

#dynamic Data Element Types

& #File Type References Per Logical Transaction:

#Input Data Elements

#Entity References

#Output Data Elements Per Functional Process:

#Data Movements i.e. the movement (Entry, eXit, Read or Write) of one Data Group Unit of measure Different weights assigned to 5 function types depending on their relative �complexity�

Unit = 1 fp (IFPUG) Different weights assigned to 5 function types depending on their relative �complexity�

Unit = 1 fp (NESMA) Weights assigned to the �minimum size logical transaction� add to 2.5 to establish comparability between MkII and IFPUG Unit = 1 fp (MkII) 1 Data Movement

=

1 COSMIC Function Point

Unit = 1 cfp Sensitivity to small changes to requirements Low (only detects changes at boundaries between Low, Average, High categories) Low (only detects changes at boundaries between Low, Average, High categories) High (detects changes of single data element types and single entity references) Moderate (detects changes to single data-groups) Integrity of measures (how well do the measures reflect the thing measured?) Artificial limits (weights, thresholds, uneven intervals) limit size of function types measured.

Integrity is limited. Artificial limits (weights, thresholds, uneven intervals) limit size of function types measured.

Integrity is limited. No artificial limits imposed on size of functional process.



Integrity is good. No artificial limits imposed on size of functional process.



Integrity is excellent. Sensitivity to variation in functional size of dynamic model of system i.e. functional processes Stepped:

minimum step 3fp

maximum step 7fp Stepped:

minimum step 3fp

maximum step 7fp Stepped:

minimum step either

0.26, 0.58 or 1.66

maximum step infinity Accommodates size variation from zero to infinity in steps of 1 cfp Sensitivity to variation in functional size of static model of system i.e. data stores Stepped:

minimum step 5 fp

maximum step 15 fp Stepped:

minimum step 5 fp

maximum step 15 fp Data stores are considered to deliver functionality only when the data is referenced in transactions Data stores are considered to deliver functionality only when the data is used in functional processes Smallest feasible functional process 3 fp 3 fp 2.5 fp 2 cfp Smallest feasible enhancement 3 fp 3 fp 0.26 fp 1 cfp

Related Articles

Estimating With Use Case Points

Back to the archive list