TLDR: We looked at a lot of different systems to compare welfare and ended up combining a few common ones into a weighted animal welfare index (or welfare points for short). We think this system captures a broad range of ethical considerations and should be applicable across a wide range of both farm and wild animals in a way that allows us to compare interventions.

The goal of Charity Entrepreneurship is to compare different charitable interventions and actions so that new, strong charities can be founded. One necessary step in that process is having a way to compare different animals in different conditions. For example, how does moving a chicken from a battery cage to a cage-free system compare, welfare-wise, for the chicken? How does giving up red meat, and thus bringing one fewer cow into existence, compare to an insect dying more humanely because of a change in which insecticide is used? These are complex questions surrounded by both ethical and epistemic uncertainty. In the health community, DALYs have become a fairly common and established metric. Sadly, there is not the same level of consensus within the animal rights community. We expected there would be multiple competing systems, so we first outlined what we would look for in a system to assess how helpful it would be to us. This could be described as the “goal” or purpose of the metric. The fundamental goal is, of course, to help us evaluate different possible actions; more specifically, we broke down what we were looking for into the criteria below.

Underlying goals of metrics

Accuracy as a proxy for ethical value:

Strength of correlation between the metric and ethical value

Encapsulation - captures a broad range of what is important

Directness

Gameability

Cross-applicability:

Cross-intervention applicability

Cross-animal applicability

Ethical robustness

Externally understandable

External precedent of use

Operationalizability:

Amenable to numerical quantification

Ease/speed of use

Objectiveness

Generates few false positives or false negatives

Intuitive to work with

Easy to collect

Easy to explain

After establishing what we were looking for, the next step was to look at all the current systems and see whether any of them was suitable, in whole or in part, for an organization like ours. We ended up finding quite a wide range.

EA community

We first looked within the EA community, where there had been some solid attempts at quantification. The ones below are just a few of many examples.

Within the EA community

These metrics were generally hard, quantified, and often explicitly focused on cost-effectiveness. Sadly, they were also extremely specific and not built to generalize across different interventions and charities. Thus, for our purposes, they were more helpful as inspiration for the factors to consider, or as standards that we would want to be able to measure, than for practical cross-intervention use.

Biology-based markers

The next set of metrics we looked at was biology-based markers. We had some background knowledge about cortisol readings as a measure of stress and hoped to find other objective markers that could make up part of a more inclusive system and add some objectivity to softer systems. Some of the ones we considered are listed below, although there are many other possible biological indicators.

Biology-based markers

Cortisol

Dopamine

Endocrine changes

Circulating catecholamines and corticosteroids

Death rate

Behavior changes

Visible injury rate

Reduced life expectancy

Impaired growth

Impaired reproduction

Body damage

Disease

Immunosuppression

Adrenal activity

Behavior anomalies

Self-narcotization

Biological markers were useful in that they were much less subjective than other metrics, but sadly it was also very hard to find consistent data on many of them across animals (death rate being a notable exception). We concluded that they would make up part of a larger system, but that even an index of them would not be inclusive enough to cover all the animal welfare situations that could occur.

Academic measures of quality

The third type of system we considered was “academic measures of quality of life”. WAS research had a great summary of many of the different systems used, but we also looked outside of their research for other possible systems.

Academic measures

Five freedoms

The Five Domains model

Five Provisions model

Botreau’s twelve criteria

McMillan’s five elements, which play a fundamental role in quality of life

Fraser’s four core values of animal welfare

Webster’s three questions of animal welfare

Taylor and Mills’s domains for assessing companion animals’ quality of life

Swaisgood’s ten motivational theories which have currency among animal-welfare researchers

Many of these systems were beautifully comprehensive and described metrics and criteria in such a way that they would be cross-applicable to a wide range of animals across a wide range of conditions. Some even specified different grade levels (although these were generally not numeric) to provide more consistency across reports. It seemed possible that some researchers would have already used these systems, though sadly, we did not find much research showcasing their modern practical use. The main drawback of these systems was their subjectivity. Even in the ones with specific grade levels, a lot would be left up to the evaluator when making calls between one situation and another: for example, how does not being fed for several days, while being otherwise perfectly fed, compare to semi-chronic but low-level hunger? Overall, we took a large number of the elements of our system from the Five Domains model, which felt like the most extensively quantified and broadest of these models.

Systems used in global poverty

Next, we considered the current systems used in global poverty alleviation and other cause assessment areas. We thought it might be possible to modify one of these metrics to be usefully applicable to animals.

Modified poverty based metrics

Animal QALYs

Animal DALYs

Animal Income

Animal subjective well-being estimates

Equivalent lives saved

Preference from behind the veil of ignorance

Generally, these metrics were either inapplicable (e.g. income) or would have required considerably more time to modify and put into the animal welfare context (e.g. DALYs have no way to represent a net negative existence, which is a key consideration in the case of factory farmed animals).

Creating our own system

Finally, we considered creating a cross-applicable system from scratch.

Our own ideas for possible systems

SAD - suffering-adjusted life-day

Sentience-adjusted suffering years

Net negative lives averted

Total world net expected value

Numerical criteria for animals’ quality of life, e.g. a -100 to 100 rating

We did end up using some of the ideas that came out of considering this option but, overall, found that taking elements from other systems would both increase quality and save the time we would otherwise spend creating a new system from scratch.

Results: an inclusive index

We ended up putting many of these systems onto a spreadsheet and comparing them on the original metric criteria we had derived. Some criteria ended up getting narrowed down. For example, we combined various biological markers into a single “biological markers” category. Some criteria were made more numerical and cross-comparable, for example, by translating the Five Domains model into number-based scores instead of grades. Other elements (for example, death rate) were given their own category and weighting based on how well they met the top-line criteria. Most criteria were ruled out as redundant or not helpful for our purposes.

We ended up with 8 criteria, each with an importance weighting. Combined, they give a range from +100 (an ideal life) to -100 (the least ideal life possible), with 0 representing uncertainty about whether the life is net positive or negative. Each area can have a positive or negative welfare score and is rated independently, which gives a more robust cluster approach to the overall endline score. The weighting of each factor depends on how well it scored on our original metric criteria. For example, death rate gets a relatively higher weighting (20 welfare points) than our index of other biological markers (4 welfare points) because it is easier to work with and has a clearer relation to direct animal suffering (e.g. we are more confident that very high and painful death rates correlate with a life not worth living than that the more abstract biological markers do).

Factors we ended up using:

Death rate/reason - 20

Human preference from behind the veil of ignorance - 20

Disease/injury/functional impairment - 17

Thirst/hunger/malnutrition - 15

Anxiety/fear/pain/distress - 15

Environmental challenge - 5

Index of Biological markers - 4

Behavioral/interactive restriction - 4
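
As a rough illustration of how the weights above could combine into a single score, here is a minimal Python sketch. It assumes each factor is rated anywhere from minus its weight to plus its weight and that the ratings are simply summed, which is one reading of the description above and not necessarily how the full spreadsheet works. The function name welfare_points and the example ratings are hypothetical and are not real data for any animal.

```python
# Weights are the welfare points listed above; together they sum to 100.
WEIGHTS = {
    "Death rate/reason": 20,
    "Human preference from behind the veil of ignorance": 20,
    "Disease/injury/functional impairment": 17,
    "Thirst/hunger/malnutrition": 15,
    "Anxiety/fear/pain/distress": 15,
    "Environmental challenge": 5,
    "Index of biological markers": 4,
    "Behavioral/interactive restriction": 4,
}


def welfare_points(ratings):
    """Sum independently rated factors into one score from -100 to +100.

    Assumption (not confirmed by the source): each factor's rating may
    range from -weight to +weight, and the total is a plain sum.
    """
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")

    total = 0
    for factor, weight in WEIGHTS.items():
        rating = ratings[factor]
        if not -weight <= rating <= weight:
            raise ValueError(
                f"{factor!r} must be between {-weight} and {weight}, got {rating}"
            )
        total += rating
    return total


# Hypothetical ratings, for illustration only.
example_ratings = {
    "Death rate/reason": -10,
    "Human preference from behind the veil of ignorance": -12,
    "Disease/injury/functional impairment": -5,
    "Thirst/hunger/malnutrition": 3,
    "Anxiety/fear/pain/distress": -6,
    "Environmental challenge": 1,
    "Index of biological markers": 0,
    "Behavioral/interactive restriction": -2,
}

print(welfare_points(example_ratings))  # -31
```

Capping each rating at its weight is what makes the heavily weighted factors, such as death rate, able to move the total much further than the lightly weighted ones.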

Our full spreadsheet with factors, scores, and metric criteria scores gives a deeper sense of why different areas were given the weighting they were, as well as a narrative explanation of what a negative, middling, and positive score would look like in each category.

Overall, we felt that this system gave us a good balance between the more subjective metrics that could capture more data and the harder metrics that were more objective. We feel that it could be used across a wide range of both animals and interventions and lead to cross-comparable results.