Data sources
Short overview
General Description
Big Data as a term describes not only the volume of data, but also other characteristics, such as the velocity at which it is generated and its variety. Big Data can come from cameras, cellphones or any other device that gathers information in any format. Beyond these three characteristics, datasets can be described by their veracity (how representative they are), variability (how data collection varies) and value (the worth attached to them) (Google Cloud n. d.). In contrast to crowdsourcing, Big Data is usually generated as a by-product of users' movements or interactions, often without their explicit knowledge, while in crowdsourcing users intentionally provide their observations, ideas and opinions.
Open Data is described as “data that can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike” (Open Knowledge n. d.). This type of data is therefore defined more by its accessibility and freedom of use than by its own characteristics. An important distinction has to be drawn between Public and Open Data. Public Data refers to data generated by the public sector, which is mostly mandated to be open, although this is not necessarily always the case. At the same time, Open Data can also come from private organizations (opendatasoft n. d.), research and civil society. Thus, Open Data is a term that includes all forms of data, regardless of origin, as long as the data is open and free to use.
Potential for Climate Change Adaptation
Big Data on human-environment interactions is crucial for climate change adaptation (CCA), and its volume is growing with cellphone and social media use. Key sources include remote sensing, sensors, LiDAR, simulated data, crowdsourcing, social media, GPS, and call records (Sarker et al. 2020). Furthermore, formatting this data adequately and publishing it as Open Data can increase trust in government and spur CCA innovation (Boeck 2021).
Big Data has several applications in CCA, particularly helping the processes of initial assessment and monitoring (Ford et al. 2016):
Vulnerability Assessment
- Identification of who and what is vulnerable, risks and timescales.
- Potential to fill data gaps and provide real-time insights. It can outperform traditional, time-consuming surveys, especially in data-poor environments.
- Creation of georeferenced datasets on population, infrastructure and environmental trends.
- Assessment of disaster risk using call records and geospatial data for dynamic mapping.
- Analysis of text-based climate change discourses to support adaptive approaches.
Monitoring & Evaluation
- Insights into adaptation impacts on perceptions and behavior through social media data sources.
- Collection of real-time adaptation policy data through crowdsourcing, improving environmental monitoring.
Open Data also contributes to CCA, particularly by fostering participation and accountability (Boeck 2021):
Collaborative Adaptation Efforts
- Accessible data allows governments, civil society, academia, and the private sector to develop CCA tools.
- Engaging multiple stakeholders enhances adaptation strategies with diverse insights.
Comprehensive Climate Risk Assessments
- Merging datasets deepens understanding of climate risks and helps prioritize adaptation measures.
- Insights can guide investments into resilient infrastructure.
Transparency and Public Engagement
- Public access to climate data builds trust and fosters citizen engagement.
- Policy legitimacy can be enhanced through transparent data-sharing practices.
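The merging of datasets mentioned under Comprehensive Climate Risk Assessments can be illustrated with a minimal sketch. It assumes two hypothetical open datasets keyed by district id (flood exposure and population density) and combines them into a simple, assumed risk index. The district names, values and the index formula are illustrative only and are not taken from the cited sources.

```python
# Hypothetical open datasets keyed by district id; all values are illustrative.
flood_exposure = {"D1": 0.8, "D2": 0.3, "D3": 0.5}          # share of area in a flood zone
population_density = {"D1": 12000, "D2": 4000, "D4": 9000}  # inhabitants per km^2


def merge_by_district(exposure, density):
    """Inner-join two dicts on district id, keeping only districts present in both."""
    shared = sorted(exposure.keys() & density.keys())
    return {d: (exposure[d], density[d]) for d in shared}


def risk_scores(merged):
    """Scale density to 0..1 by its maximum and multiply by exposure (an assumed index)."""
    max_density = max(density for _, density in merged.values())
    return {d: round(exp * dens / max_density, 3) for d, (exp, dens) in merged.items()}


merged = merge_by_district(flood_exposure, population_density)
scores = risk_scores(merged)
# Districts ordered from highest to lowest assumed risk.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Districts that appear in only one dataset (D3 and D4 here) drop out of the join, which is one reason why harmonized identifiers across open datasets matter for prioritizing adaptation measures.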
Potential for Disaster Risk Management
Both Big Data and Open Data contribute across all stages of disaster risk management (DRM). Preparedness, mitigation, response, and recovery efforts in disaster management are significantly enhanced through the use of Big Data technologies (Sarker et al. 2020). Remote sensing and Big Data analytics bolster early warning systems, while mobile metadata tracks human movement to identify vulnerable areas, and social media monitoring highlights potential risk hotspots. In mitigation, continuous remote sensing monitors environmental changes, crowdsourced data provides real-time population insights for targeted interventions, and data-driven decision-making supports proactive measures in high-risk areas. During the response phase, real-time mobile metadata aids damage assessments and response coordination, complemented by crowdsourced data offering immediate feedback from affected communities. Recovery efforts are strengthened through population cluster analysis to ensure inclusivity, continuous data streams for adaptive planning, and infrastructure development, especially in regions lacking official datasets. Together, these technologies create a robust framework for reducing disaster impact and fostering resilience.
The potential of Open Data in DRM lies in its ability to enable rapid and coordinated responses, strengthen vulnerability identification and mitigation, and build trust in crisis management (Boeck 2021). Open Data infrastructure ensures real-time access to critical information, facilitating the coordination of efforts and supporting quick, informed decision-making during disasters. By promoting accessibility, Open Data also enhances early risk detection, encouraging proactive mitigation measures. Furthermore, it fosters transparency and effective communication, which are essential for building public trust and legitimacy in crisis management. Transparent data-sharing enables public cooperation, reinforcing the overall effectiveness of disaster response and recovery efforts.
Application in different Climate Hazards
Flooding
Big Data enables large-scale processing and the assessment of trends and patterns in floods (Monrat et al. 2018). It can be used for:
- Real-time monitoring and early warning systems: provide timely alerts and monitor conditions (see also Girotto et al. 2024).
- AI and Machine Learning models: predictive analytics for potential events.
- Mapping: use Open and Big Data to map impervious surfaces and risk zones.
- Mobile Big Data: reveal population mobility patterns during floods.
- Crowdsourcing: inform authorities about ground-level situations during emergencies in combination with Big Data generated by mobile tools, for example.
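The real-time monitoring and early warning item in the list above can be sketched as a rolling-mean check on a stream of water-level readings. This is a minimal illustration under stated assumptions: the function name `flood_alerts`, the threshold, the window size and the gauge readings are all hypothetical, and operational systems described in the cited sources are considerably more elaborate.

```python
from collections import deque


def flood_alerts(readings, threshold_m, window=3):
    """Yield (timestamp, mean_level) whenever the rolling mean exceeds the threshold.

    readings: iterable of (timestamp, water_level_in_metres) tuples in time order.
    """
    recent = deque(maxlen=window)  # keeps only the latest `window` levels
    for timestamp, level in readings:
        recent.append(level)
        if len(recent) == window:
            mean_level = sum(recent) / window
            if mean_level > threshold_m:
                yield timestamp, round(mean_level, 2)


# Illustrative gauge readings (hour, metres); not real measurements.
gauge = [(0, 1.0), (1, 1.2), (2, 1.9), (3, 2.4), (4, 2.6), (5, 2.1)]
alerts = list(flood_alerts(gauge, threshold_m=2.0))
print(alerts)
```

Averaging over a short window rather than alerting on single readings is a design choice that reduces false alarms from sensor noise, at the cost of a slightly later alert.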
Sea Level Rise
Big Data can inform predictions, while Open Data can disseminate information. It can be understood as follows:
- Modelling and risk identification: Big Data enables the modelling of sea level rise and the monitoring and identification of risk-prone areas. Models may even use Open Data themselves (Depsky et al. 2023).
Landslide
To monitor and approach the prediction of landslides, detailed data is crucial. In the context of a citizen-science approach for landslides, there is a set of applications of Open and Big Data (Ramesh et al. 2023):
- Social media monitoring: platforms like X (formerly Twitter) can be used as a source of Big Data for monitoring. Analytics can provide information about landslide events.
Water Scarcity / Drought
In contexts of water scarcity or drought, the availability of data sources and Open Data can become crucial for public policy and public engagement. Open Data can especially enable constant evaluation of the situation and inform planning (Landry et al. 2016). The city of Cape Town offers insights based on the drought events of 2017-18 (van Belle and Hlabano 2019):
- Open Data modelling: use up-to-date information of water reservoirs.
- Opening Micro Data: highlight more deprived and vulnerable areas by opening data on consumption and characteristics.
- GIS mapping and visualization of Micro Data: set up maps and Open Data portals for citizen engagement.
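To illustrate what Open Data modelling of reservoir levels can look like, the following minimal sketch estimates how many days remain until storage falls to a critical threshold, using a simple water balance. It is not the model used in Cape Town; the function name, the parameters and all figures are assumptions chosen for illustration.

```python
def days_until_threshold(storage_ml, threshold_ml, daily_use_ml,
                         daily_inflow_ml=0.0, daily_loss_ml=0.0):
    """Estimate days until storage reaches the threshold under a constant water balance.

    All volumes are in megalitres (ML). Returns 0.0 if storage is already at or below
    the threshold, and None if storage is not declining.
    """
    if storage_ml <= threshold_ml:
        return 0.0
    # Net daily draw-down: consumption plus losses (e.g. leakage, evaporation) minus inflow.
    net_draw = daily_use_ml + daily_loss_ml - daily_inflow_ml
    if net_draw <= 0:
        return None
    return (storage_ml - threshold_ml) / net_draw


# Illustrative figures only, not actual reservoir data.
baseline = days_until_threshold(200_000, 120_000, daily_use_ml=600,
                                daily_inflow_ml=200, daily_loss_ml=100)
with_savings = days_until_threshold(200_000, 120_000, daily_use_ml=300,
                                    daily_inflow_ml=200, daily_loss_ml=100)
print(baseline, with_savings)
```

Publishing both the inputs and a calculation of this kind lets citizens reproduce the projection themselves and see how reduced consumption shifts the result.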
Strong Winds / Storms
No specific sources covering the usage of Big or Open Data directly on storms, cyclones or hurricanes were found. However, potential applications include:
- Meteorological models: Open Data to help citizens prepare.
- Infrastructure data: provide data on safe areas, risk zones and escape routes.
- AI models: use Big Data for accurate storm tracking (see AI factsheet).
Forest / Bush Fires
Applications of Big Data range from predictive models to mapping past population movements based on mobile Big Data. These applications can improve evacuation routes, especially in informal settlements. Furthermore, based on the three Vs of Big Data (Volume, Variety and Velocity), it has the potential to monitor wildfires (Sayad et al. 2019):
- Fire detection systems and wildfire monitoring: use data from remote sensing, weather stations, and sensors.
- Comprehensive data foundation: integrate data of diverse temporal scales, resolutions, and formats.
- Real-time data processing: one of the significant potentials of big data for wildfire management is the benefit of processing collected data in real-time.
Extreme Temperatures
Big Data can comprehensively and accurately describe heat wave vulnerability in urban areas (He et al. 2019). Key applications include:
- Identifying urban centers and activity areas: use, for example, urban night-time light data and Google’s points of interest (POI).
- Urban heat island mapping: utilizing Big Data can contribute to implementation of targeted solutions to mitigate heat effects.
For Open Data, the Data Dashboard of UrbanShift and Cities4Forests (Mackres et al. 2023) offers clear examples.
Saltwater Penetration
No specific sources or examples of big data were found.
For Open Data, examples include the Aquifer Vulnerability to Sea Water Intrusion dataset provided by the Government of Canada, as well as data for river estuaries such as the Clarence River in New South Wales, Australia (State Government of NSW and NSW Department of Planning, Housing and Infrastructure 2024).
Application in DRM / CCA Measures
Nature-based Solutions
Open Data functions as a public good by providing information on the status, development and benefits of nature-based solutions (CityLAB Berlin n. d.). It can further support urban planning decisions on where and how to implement them, especially when it functions as a public resource for planning support systems (McEvoy et al. 2020).
Integrated Coastal Zone Protection
Big Data can support coastal zone protection when various data sources are harmonized into one general dataset. The handling and validation of satellite imagery can be supported by a structured governance system, as with the EU Copernicus Program (Pollard et al. 2018). Big Data also aids in risk assessments and in emergency response supported by AI models (Pollard et al. 2018). Big Data built on Open Data can likewise feed risk assessments and support AI-based prediction.
Stormwater Management
Data sharing platforms with Open Data can help monitor stormwater and infrastructure status, even when the latter is under construction. They can increase awareness, trust, and resilience by keeping citizens informed about potential risks (see World Bank example in Vu 2024).
Waste Management
Open Data can provide information on waste management progress and disposal locations, hence functioning as a public good. It can therefore support local efforts, aid in developing publicly led AI models for resource management, and engage citizens through mobile apps (Mancino 2023; datos.gob.es 2021).
Relevance within the Project Cycle
Big and Open Data can be helpful throughout all phases of project implementation.
Project Preparation:
Big and Open Data enhance project preparation by improving risk assessment and fostering inclusive planning. Big Data helps to identify risks and guide infrastructure placement using environmental and population data.
Project Implementation:
Data sources can enable adaptive management and efficient resource allocation. Aggregated IoT sensor data can provide real-time monitoring, while data streams can guide dynamic resource distribution. Open data can promote transparency and stakeholder input.
Verification and Project Progress:
Tracking consumption data and publishing milestones on Open Data platforms can enhance accountability and help measure project effectiveness by aligning progress with objectives.
Final Project Review:
Big and Open Data can enable thorough project reviews by assessing outcomes and promoting external evaluation. Aggregated satellite imagery and sensors can offer quantitative evidence of project impact, while open data can allow access for independent evaluations.
Ex-Post Evaluation:
Open Data from aggregated sensors and sensing tools can track resilience over time, providing insights into project effectiveness. Moreover, Open Data facilitates replication and scaling in other urban areas.
Technology Requirements
The application of Big Data for urban resilience depends on establishing an adequate infrastructure but also guaranteeing the knowledge and capacities to benefit from the amount of data generated.
Physical Infrastructure
- Data collection mechanisms: networks of sensors and devices (e.g., traffic cameras, air quality monitors)
- Scalable storage systems: robust, cloud-based storage solutions for large and growing data volumes (IEC 2024).
- High-performance computing resources: powerful computing infrastructure for efficient data processing and analysis (IEC 2024).
- Open Data platforms: user-friendly platforms for sharing data with diverse stakeholders (UNSTAT 2020).
- Data standardization tools: tools to ensure data quality and oversee data sharing practices (Moro and Page 2024).
Expertise
- Data scientists and analysts: skilled in data mining, machine learning and statistical analysis.
- IT and data management specialists: maintain and optimize data systems (IEC 2024).
- Security experts: protect sensitive data and ensure privacy compliance (Santos 2024).
- Data governance professionals: manage data policies, quality and sharing practices (Radecki and Dieguez 2022).
- Community engagement specialists: promote open data use and gather feedback.
- Developers and innovators: create applications using open data to address urban challenges (Moro and Page 2024).
For the use of Open Data, both the physical infrastructure and the expertise are crucial for achieving better results.
Legal Aspects
Data protection: The data sources must be managed in line with the principles of data minimization and proportionality. Any data (structured or unstructured) may contain or reveal personal information of individuals and hence harm their privacy rights if not managed adequately. No individual data should be collected without prior consent, and no data should be published to outsiders without a level of aggregation that allows for anonymization of the provided information. An agreement about usage and publication rights should always be obtained with the data providers. Thus, only personal data strictly relevant for the project should be collected and processed. If initial data minimization is not possible, data must be anonymized (e.g., by redaction or pixelation). The collected data must be securely stored and protected. Flawed and inadequate data security puts the rights of individuals to enjoy robust data protection at risk (see RMMV Guidebook, Section 2.3.3).
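The aggregation requirement described above can be sketched as follows. The example counts records per area and suppresses any area with fewer entries than a minimum count, so that small groups cannot be singled out. The function name, the `min_count` threshold of 5 and the sample records are assumptions for illustration; the appropriate aggregation level depends on the project context and applicable law.

```python
from collections import Counter


def aggregate_with_suppression(records, min_count=5):
    """Count records per area and hide areas with fewer than min_count entries.

    records: iterable of dicts with an 'area' key. Personal fields are ignored,
    so only aggregated counts leave this function.
    """
    counts = Counter(record["area"] for record in records)
    return {
        area: (count if count >= min_count else None)  # None marks a suppressed cell
        for area, count in sorted(counts.items())
    }


# Illustrative records; 'user_id' stands in for any personal identifier.
records = (
    [{"user_id": i, "area": "north"} for i in range(7)]
    + [{"user_id": 100 + i, "area": "south"} for i in range(2)]
)
published = aggregate_with_suppression(records, min_count=5)
print(published)
```

Here the "south" area is withheld because only two records fall into it, while the "north" count can be published without exposing any individual identifier.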
Data security requirements can also arise from data protection regulations like the GDPR, which stipulate basic security requirements. Controllers of personal data must also have appropriate technical and organizational measures in place to satisfy data protection law. Business processes that handle personal data must be designed and implemented to meet security principles and provide adequate safeguards to protect personal data. Entities may be required under those rules to ensure the ongoing confidentiality, integrity, availability, and resilience of processing systems and services (see RMMV Guidebook, Section 2.3).
If KfW (or persons acting on behalf of it) are (also) processing personal data, the privacy check in RMMV Guidebook Section 2.3.1 must be followed.
Before (re-)publishing information based on open data, you need to check its respective licence type: https://opendatacommons.or
Summary Assessment
Overall Effectiveness
Big Data plays a central role in technical tasks, while Open Data can support innovation and legitimacy. The latter is especially important from a societal perspective and for the promotion of participation and transparency. Both thus function as inputs for social good and urban resilience. A city will only be able to reach its full resilience potential if information is communicated quickly and effectively, and Big Data can play an important role in doing that.
Overall Efficiency
Gathering Big Data can be costly in terms of infrastructure and expertise. However, public-private partnerships (PPP) have the potential to support this effort. Furthermore, geospatial and remote sensing data, partially enabled by satellite imagery as Open Data, is already available. The effectiveness of Big Data will depend on the sources a city has and its potential to sustain such efforts over time, which vary by context. Hence, mapping potential sources is an essential step before establishing infrastructure.
Open Data faces similar challenges but engages citizens directly, offering social and political benefits. It requires data-sharing platforms and expertise to develop and maintain them. These platforms need to be user-friendly, and governments must have the capacity to integrate and harmonize various datasets.
The time and resources required to establish secure Big Data sources and enable Open Data depend on the existing infrastructure and data landscape. Despite these challenges, Big Data is foundational for many digital solutions and should be a priority in urban resilience strategies.
Key Challenges and Limitations
Big Data presents several challenges in its application, particularly in interpretation, processing, and access (Ford et al. 2016). While statistical correlations in Big Data can reveal patterns, they do not necessarily imply causation, which complicates data-driven decision-making. Processing challenges arise when simplifications, such as focusing on social media activity, overlook deeper societal causes, like migration drivers, which are crucial for understanding long-term behaviors. Additionally, access to Big Data is often restricted by private ownership and ethical or human rights considerations, though open-source platforms and data anonymization can help bridge this gap (for more comprehensive information, see Principles for Digital Development n.d., UN Global Digital Compact n.d. and Mejias and Couldry 2024).
Financing Big Data initiatives also faces significant hurdles due to the high costs of building and maintaining infrastructure, as well as securing informed consent (Mulwa et al. 2022). Continued investments in skill development are equally vital, as the growing demand for data analytics expertise underscores a persistent skills gap. Further barriers include navigating data privacy concerns and ensuring compliance with complex security regulations, which add to the financial and operational burden.
Open Data, while promising, comes with limitations in standardization, detail, and accessibility (Boeck 2021). To integrate effectively, data must conform to national and international standards, which is often a challenging requirement. The granularity of data can both enhance and restrict analyses, requiring careful balancing to address privacy concerns. Moreover, static file formats hinder accessibility and reusability, while APIs offer a more dynamic solution by enabling direct queries and automated updates, enhancing the versatility and user-friendliness of Open Data for diverse applications.
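The difference between downloading a static file and querying an API can be made concrete with a short sketch. It builds a filtered query URL and parses a JSON response. The endpoint `https://data.example.org/api/records`, the parameter names and the response layout are hypothetical; a real portal's API documentation defines the actual ones. A canned response replaces the network call so that the sketch runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the documented API of the portal in use.
BASE_URL = "https://data.example.org/api/records"


def build_query_url(dataset, **filters):
    """Return a URL asking the (assumed) API for a filtered subset of a dataset."""
    params = {"dataset": dataset, **filters}
    return f"{BASE_URL}?{urlencode(sorted(params.items()))}"


def parse_records(json_text):
    """Extract the record list from a response of the assumed form {"records": [...]}."""
    return json.loads(json_text).get("records", [])


url = build_query_url("reservoir-levels", limit=2, since="2018-01-01")
# Canned response standing in for the result of an HTTP GET on `url`.
sample_response = (
    '{"records": [{"dam": "A", "level_pct": 24.1}, {"dam": "B", "level_pct": 31.5}]}'
)
levels = {r["dam"]: r["level_pct"] for r in parse_records(sample_response)}
print(url)
print(levels)
```

Because the filters travel in the query itself, an application can request only the newest records on each run instead of re-downloading and re-parsing a complete file.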
Recommendations to optimize the Use of the Digital Tool
Effective disaster risk management requires robust data linkage and strong data privacy measures. Integrating diverse Big Data sources, such as remote sensing and crowdsourced information, enhances adaptation activities by enabling location-specific predictive analytics (Ford et al. 2016). The interplay between Big and Small Data offers further potential; Big Data can reveal patterns for deeper investigation with smaller volumes of data, or traditional datasets can form hypotheses to test with Big Data. However, these advancements must be underpinned by comprehensive data privacy frameworks and policies at both national and municipal levels. Developing and enacting data protection bills ensures ethical data use, fostering trust and compliance in managing sensitive information (Mulwa et al. 2022).
To identify and mitigate such human rights risks within KfW-financed projects, we recommend using the KfW Human Rights Check for Financial Development Cooperation during project preparation and implementation.
Project Examples / Use Cases
- The Gieß den Kiez interactive online platform (CityLAB Berlin n. d.) serves as an example of using Open Data to engage citizens. It provides a map of Berlin’s trees, detailing their type, age, location and water needs. Users can check recent rainfall and water needs and report their own watering activities via a mobile app. The platform makes use of the following open and public datasets: (1) the urban tree register, providing the number, types and location of trees, (2) precipitation data from the German weather service and (3) OpenStreetMap for the location of hand pumps. The platform promotes urban resilience by involving citizens in tree preservation, offering protection against heat and air pollution, and raising awareness.
- Faced with a “Day Zero” water event in May 2018, Cape Town’s local government used Open Data strategies to save and augment water supplies, reducing domestic water consumption by over 50% in a few months (van Belle and Hlabano 2019). These strategies aimed to change water users’ behavior and attitudes. The four strategies were: (1) Open Data modelling as the core of the drought intervention, including parameters like rainfall, leakages, evaporation, water sources and consumption patterns, with publicly available models helping to build stakeholder trust; (2) opening Micro Data: in a socio-economically diverse environment, releasing water usage statistics countered accusations of water wasting and suppressed fake news, fostering community and trust; (3) GIS mapping and visualization of Micro Data: household-level consumption visualizations encouraged citizens to adhere to water restrictions through social pressure, enhancing social cohesion; and (4) crowdsourcing data in the form of public reporting of water infrastructure issues. The crowdsourced data reduced water losses from 25% to 15%; however, this success relied on follow-up processes and rapid-response teams, which reduced cognitive dissonance among citizens.
Links to further Sources
- Boeck (2021): How open data can create more resilient cities
- Ford et al. (2016): Big data has big potential for applications to climate change adaptation
- Sarker et al. (2020): Disaster resilience through big data: Way to environmental sustainability
Linkages to other Tool Types
- Artificial Intelligence (AI): Both Big Data and Open Data function as inputs for ML and DL models. Big Data might include text data from social media, where large language models (LLM) or natural language processing (NLP) models might be useful. Open Data could also include datasets like OpenStreetMap, which can be used and analysed by both types of models.
- Digital Twins: Big Data technology harmonizes multiple data sources, handling great volumes of diverse formats. Digital twins can use Big Data as input for more comprehensive and accurate representations of physical systems. Given the high velocity of Big Data updates, they enable up-to-date simulations and monitoring within the digital twin. See also Building Information Modelling.
- Communication and Collaboration: Data stemming from communication channels can function as input for Big Data systems. As an example, data generated on usage patterns in e-learning platforms could be integrated with other datasets to personalize and enhance learning. See also Collaboration and E-learning tools.
- Earth Observation/Geospatial tools: When only being used for data production, earth observation tools can be integrated into Big Data systems through harmonization with pre-existing information. Because earth observation data is space- and time-specific, the frequency and coverage of data is key to obtaining more representative and relevant datasets. Moreover, the data can be shared as Open Data, increasing its utility and accessibility. See also Geospatial tools and GIS.
- Mobile tools: Due to mobile tools’ data collection nature, big data technology is frequently used in storing and harmonizing the collected data. Generated data can also be made available as open data. See also Crowdsourcing Tools.
- (Remote) Management Information Systems (R/MIS): The integration of Big Data and MIS improves the capacity to provide accurate insights, since the data is more varied and covers diverse areas and formats. Additionally, open datasets are a relevant alternative for acquiring and feeding data into the system, while insights or results from MIS can be shared publicly as Open Data. See also Management Maintenance Systems (MMS) and (Remote) Management Information Systems.
- Internet of Things (IoT): Given IoT’s data collection nature, it can consistently cover specific areas and domains in a controlled fashion. When integrated with Big Data for storage, processing and accessibility, it can be made available as input for other technologies. Additionally, like other data collection tools, the data generated by IoT can be made public as Open Data. See also Sensors / SmartMeters (Internet of Things).
Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).