On the SDGs and more, we need to move from data-driven to data-supported cities
The proliferation of global data sets, enabled by satellite imagery and mobile technology, has created new ways to capture urban information of all kinds, including air quality, land use, transit use, flood risk and much more. This tsunami of information has prompted a second wave of "data-driven" digital products that seek to solve complex social and environmental problems.
But as sophisticated as these products are, they often struggle to turn what the data tells us about global challenges, such as lack of energy access or persistent poverty, into usable insight. Nor do they offer much insight into the lived experiences of people at the city and neighbourhood level. So, can global-scale data sets actually help cities address the problems they face?
To answer this question, we analyzed the ability of global data sets to inform local implementation of the Sustainable Development Goals (SDGs), the 15-year global framework that went into effect last year. Amid a complex network of city and other stakeholders working to achieve these goals, we focused on the role of NGOs — including non-profits, think tanks and research institutions, but also the private sector — and the products they provide to decision-makers at the city level.
The results of this work suggest that the capacity of data products to solve social and environmental ills is overstated, especially given the high cost of data generation and analysis.
Big data’s big limits
For the international community, one objective of using global data products is to ensure the use of consistent metrics to measure progress toward meeting the SDGs' 169 individual targets. The U.N. Statistics Division is leading the effort to synchronize and aggregate existing data sets, but the process of international data collection is not as effective as intended. (Although the SDGs were adopted in 2015, the process to refine the specific "indicators" on which national governments will report their implementation progress is ongoing.)
There are several reasons for this. First, the data available for analysis is typically collected at one of four scales (urban, national, regional/continental and global) and is rarely integrated comprehensively enough to produce a holistic picture of the urban fabric. Further, the data tends not to be disaggregated to scales fine enough to support the requisite analysis.
For example, the United Nations currently relies on regional data to measure progress toward achieving the municipal solid waste production and disposal targets under SDG 11, on sustainable cities. Although the objective for this indicator (11.6.1) is municipal self-reporting of the data, that process remains far from certain, especially in the locales that stand to benefit most from implementation of the SDGs.
The efficacy of global data sets is also limited by competing data sets (and indicators) that can be used to measure SDG progress. In the health sector, a review of data available on airborne particulate matter found that the United Nations is using a range of data sets, some country-wide and others restricted to urban areas (Indicator 11.6.2). This data represents only an estimate of mean annual exposure, which limits the ability to draw meaningful conclusions.
In contrast, a group of researchers independently verified progress toward meeting the health-related SDGs using a different data set (the Global Burden of Disease) and different metrics. This suggests that combining robust global data sets with U.N.-commissioned data sets can help facilitate independent tracking and measurement of goals.
These two divergent approaches to analyzing health outcomes in the SDGs highlight the ongoing struggle to define indicators and targets consistently. Although the U.N. Statistics Division convened experts to identify the indicators and targets during the development of the SDGs, differing professional opinions on indicators and metrics remain prevalent.
Finally, many cities already collect data that could be relevant to the SDGs, but the collection methods often require adjustments in order to synchronize with similar urban data sets elsewhere. Cities probably will need incentives or instruction to make the tweaks necessary to ensure data collection is relevant to the SDGs. When data is aggregated at the urban level, new challenges arise: Cities may offer competing definitions of a problem, and data availability is highly variable across multiple scales.
Insight vs. solution
As the challenge of tracking SDG progress shows, data products designed by NGOs to serve the global good often rest on the misguided assumption that a lack of data is at the crux of social and environmental challenges.
For example, many NGOs interested in reducing the impacts of air pollution say access to exact readings of particulate matter is the missing link in policymakers' ability to reduce the related health harms. The same could be said of predicting flood risks or modelling how new urban development will affect a city's transportation system.
But conversations with urban decision-makers suggest that statistical models by themselves are insufficient to determine future policy. Models and their underlying data sets alone cannot illustrate the complexities of a problem. Further, they are incapable of incorporating the full range of social and political forces that both create and respond to environmental challenges.
The issue here is not that more-detailed data on urban environmental challenges is unimportant or that data cannot illuminate what was previously unseen. Rather, we must acknowledge the limitations of the statistical models underpinning “big data” efforts.
Exposing previously unseen trends or relationships can provide benefits. However, many data-based products erroneously conflate providing data about a problem with being a solution for that problem. This assumption leaves undone the work of connecting data to political processes, a step that may be far more important (and complex) than the data production itself.
What’s the lesson here? For one, organizations that aim to foster social and environmental change in cities would do well to envision data as a supporting piece of larger projects rather than the solution in and of itself. Translating scientific data into effective policy programmes takes patience and persistence, as well as skilled communications and capacity-building.
Even when policymakers can agree that a problem needs to be solved — for instance, making waterfront communities more resilient to rising sea levels and more frequent storms — the process by which data is used to shape policy approaches is often shrouded in mystery. Indeed, the “black box” nature of data-driven decision-making tends to perpetuate the divide between experts and the communities they ostensibly serve.
Because only trained data scientists are able to build and use the analytical tools that interpret much of the data that is informing policymaking today, the power of big data is concentrated in relatively few hands, often in well-resourced NGOs. In order to ensure that big-data processes are an improvement over the status quo, the principles of openness and accountability need to be applied throughout the policymaking process, not simply during the creation and collection of data.
While a data producer may understand the limitations of a data set, for instance, final data products tend not to convey these nuances. As a result, data consumers such as city officials often read such products uncritically, without recognizing that the products carry their producers' own perspectives and assumptions.
To correct for this, new products and corresponding analytical models need to be developed with the goal of making big data accessible to, and informed by, the communities in question, who are in the best position to both define the problems they face and know what types of solutions are appropriate.
Of course, this is easier said than done. Part of the answer lies in technological advances in what's known as natural language processing, which could enable non-experts to interact with big data without needing to code. That will have to take place alongside strengthened basic computer and data literacy education, starting in elementary school and continuing through to adult education and workforce training initiatives.
A member of Transport for Cairo hands out information at Open Data Day Cairo 2016. (Transport for Cairo)
Certainly groups across the globe are already tackling these issues. Organizations such as the U.K.-based Raspberry Pi Foundation are working to democratize technology through the widespread distribution and use of an ultra-low-cost computer and associated educational materials designed to teach people to code. And coalitions such as the Data-Pop Alliance aim to make big data "smaller" in order to involve many more people in the processes of data collection and analysis.
What does this mean in practice? Groups such as Transport for Cairo are providing on-the-ground opportunities to visualize and understand that city's largely unmapped public transit system. Transport for Cairo aims to supplement the bare-bones maps provided by the municipal government and to provide information, including real-time transport updates, that eases the pain of commuting for Cairo's nearly 20 million residents, two-thirds of whom rely on public transportation.
International NGOs, which are typically major drivers and sources of funding for cities implementing the SDGs, often face a dilemma when they try to implement community-based solutions: How can they appropriately prioritize their resources to support the needs of communities across the world?
It is in this context that global data sets are particularly attractive. Their broad scope and standardized measures enable development organizations to see similarities between communities around the world and to use advanced technology to solve problems.
To truly meet local needs, however, NGOs need to ensure that data solutions are the outgrowth of community processes rather than the driver of the process from the start; in other words, solutions should be data-supported rather than data-driven. Indeed, international NGOs are starting to realize that many solutions are being spearheaded by cities themselves instead of being conceived and implemented at the global level.
New York City offers an example of how big data can underpin a data-supported solution rather than a data-driven one. In 2007, the city developed PlaNYC, a blueprint to address fundamental long-term challenges facing the city, including a network for monitoring air quality. Aware of the power of their data-collection system, scientists took care to site monitoring devices equitably throughout the city, thus supporting local communities' underlying goal of improving public health outcomes.
On the other side of the world, Tokyo offers a similar data-supported solution. In 2004, as part of a major initiative to reduce water leakage from pipes, the city committed resources to big-data tools such as remote and localized leak monitoring, while also increasing its capacity to repair leaks quickly. Today, the local utility has cut leakage to just 3 percent and has become a case study in monitoring at the local level.
Tokyo engineers work on below-ground pipes and electric lines in 2016. More than a decade ago, the local government invested in big data as a strategy for cutting down on water leakage. The programme has since become a global model. (Jirat Teparaksa/Shutterstock)
At times there can be outright dangers in imposing broad data sets on a community. For example, a Yale analysis found that when informal settlements and the energy they surreptitiously use are publicly mapped, these already vulnerable populations can be exposed to state backlash. Understanding where such missteps might occur requires collaboration with those on the ground and forethought about how open data might harm sensitive communities.
To protect against this, community-based engagement can keep data local and tailor recommendations to a community's unique challenges. Such an arrangement gives settlements the authority to control the flow of information and to design solutions that address the needs and political realities of their particular locale.
Still, it is difficult to extrapolate the community-based approach to a global scale. Perhaps more importantly, organizations that try to make this leap risk obscuring the very real benefits of the community engagement process, benefits that come precisely from keeping such methods local.
Most NGOs — and the cities with which they work — approach problem-solving strategically. Why, then, have many groups adopted an entirely different approach in their development and deployment of data tools?
Rather than assuming that producing a data set will result in desired outcomes, more work needs to be done to determine whether and how data can be linked to specific communities and policy processes to solve problems. We recommend that NGOs revisit their core assumptions regarding the use of big data to achieve the SDGs, give thought to the following issues and ask themselves a few key questions:
Leverage local data in the SDGs: Many cities are already collecting data, but that data often has little relevance to the international context of the SDGs. Through targeted educational campaigns, NGOs can encourage shifts in data collection and measurement methodologies that allow local data to truly inform the global context. Question: How can NGOs better incorporate local data into their data products?
Incorporate politics into the solution: While data products are commonly thought of as apolitical, NGOs and the cities they work with must recognize that the inability to address environmental and social problems is not the result of a lack of data but rather a lack of political will. Question: How can NGOs invest in political solutions and campaigns rather than over-relying on “neutral” data products?
Unwrap the black box of data: Local leaders are in the best position to make decisions about how to improve their communities, but policymakers often are limited in their knowledge of how data and data products work. As a result, local leaders often rely on NGOs to create these products on their behalf. Question: How can NGOs ensure that decision-makers understand big data and data products, and use that information to craft policies?
Create bottom-up rather than top-down solutions: Rather than investing heavily in data products meant to solve problems for cities across the world, NGOs should invest in consulting services that work with individual cities on the specific problems they face and develop data products to meet those challenges, with successful solutions adopted elsewhere as appropriate. Question: How can NGOs help governments reach people at the ground level to better understand and address their needs?
Karen C. Seto was involved in this project as faculty adviser.