Introduction
The plethora of sources of urban data, be it open city data, IoT data, or data from 3rd parties, presents both opportunities and challenges for researchers and policymakers: opportunities in what the scale, breadth and depth of the data enable practitioners and researchers to achieve, and challenges in achieving useable results due to the quality, sparseness, validity, interoperability and relevance of the data.
The mission of the Urban Data Centre (UDC) is to enhance the design, planning and operations of cities by expanding the opportunities presented by data and addressing the challenges inherent in data. To achieve this mission, the Centre will initially pursue five objectives, namely the development of:
- a Canadian Urban Data Repository,
- standards for the representation of urban data,
- tools for the management of urban data,
- tools to support the analysis and interpretation of urban data, and
- a national Urban Data Network.
WS 1: Canadian Urban Data Repository
The development and operation of “smart cities and villages” is predicated on the availability of relevant, accurate data. Yet the plethora of sources of data, be it open city data, IoT data, or data from 3rd parties, presents both opportunities and challenges: opportunities in what the scale, breadth and depth of the data enable practitioners and researchers to achieve, and challenges in achieving useable results due to the quality, sparseness, validity, interoperability, accessibility and relevance of the data.
The Urban Data Repository (UDR) will be an open repository of Canadian urban data. It will provide researchers and practitioners with a vastly broader set of data and data sources, enabling a richer set of analyses. UDR will support a wide array of ground-breaking Canadian research that has hitherto been impossible due to limitations on access to relevant data and on its linkage and compatibility. It will enable cities and villages to answer questions that currently go unanswered for lack of integrated, relevant data.
Critical to the success of the UDR is providing access to both open and closed data sources. A review of city open data portals (Fox & Pettit, 2015, Hugh & Fox, 2018) has shown that while much data has been made available, there is a large gap between what is available and what is needed to support both research and city operations. We call this the relevance gap. This project will address the relevance gap by: 1) creating a platform that provides awareness of and/or access to both open and closed infrastructure datasets in both original and integrated forms, 2) proactively identifying the types of data needed by researchers and practitioners, and 3) entering into partnerships with government, non-government and other organizations for access to urban data.
UDR supports awareness of and access to urban data sources beyond those that are openly available for direct download. UDR provides access to three categories of datasets: 1) datasets that are stored in UDR in their original format and are openly available for download; 2) datasets that are not stored in UDR but are accessible under separate agreement; and 3) datasets that are accessible through web services (APIs), including real-time feeds such as sensor data. Critical to the provisioning of these three categories of datasets is the creation of a dataset/source catalog composed of rich descriptions (meta-data) of datasets, including provider, creation date, usage license, data model, quality, etc. The meta-data will make it possible to discover datasets and sources that were previously difficult to find.
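As a rough illustration of such a catalog, the sketch below models one meta-data record and a simple discovery query in Python. It is a minimal sketch under stated assumptions, not the UDR design: the class, field and function names (`AccessCategory`, `CatalogEntry`, `find_entries`) and the two sample records are hypothetical and chosen only to mirror the fields and access categories listed above.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class AccessCategory(Enum):
    """The three access categories described above (names are hypothetical)."""
    STORED_OPEN = "stored in UDR, openly downloadable"
    EXTERNAL_AGREEMENT = "not stored in UDR, accessible under separate agreement"
    WEB_SERVICE = "accessible through a web service (API)"


@dataclass(frozen=True)
class CatalogEntry:
    """One meta-data record; the fields follow the list in the text."""
    title: str
    provider: str
    creation_date: date
    usage_license: str
    data_model: str
    quality_note: str
    access: AccessCategory


def find_entries(catalog, keyword, access=None):
    """Return entries whose title contains the keyword (case-insensitive),
    optionally restricted to one access category."""
    keyword = keyword.lower()
    return [
        entry for entry in catalog
        if keyword in entry.title.lower()
        and (access is None or entry.access is access)
    ]


# Invented sample records, for illustration only.
catalog = [
    CatalogEntry("Street tree inventory", "Example City", date(2019, 5, 1),
                 "Open licence", "CSV table", "unverified",
                 AccessCategory.STORED_OPEN),
    CatalogEntry("Air quality sensor feed", "Example Sensor Co.",
                 date(2020, 1, 15), "By agreement", "JSON stream",
                 "calibrated monthly", AccessCategory.WEB_SERVICE),
]

tree_datasets = find_entries(catalog, "tree")
sensor_feeds = find_entries(catalog, "sensor", AccessCategory.WEB_SERVICE)
```

Keeping the access category as an explicit field means a dataset that UDR does not hold can still be discovered through the same query as one that it does.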
UDR is a polymorphic repository: data will be represented both in its original format and in an integrated graph database, where the original data is mapped onto shared semantic data models (aka ontologies).
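To make the idea of mapping original data onto a shared semantic model concrete, the following Python sketch converts one tabular record into subject-predicate-object triples of the kind a graph database stores. It is an assumption-laden illustration only: the `ex:` prefix, the property names, the column names and the `row_to_triples` helper are all hypothetical and do not come from any UDR ontology.

```python
# Hypothetical mapping from source column names to ontology property names.
COLUMN_TO_PROPERTY = {
    "bldg_ht_m": "ex:heightInMetres",
    "yr_built": "ex:yearBuilt",
}


def row_to_triples(row, id_column, class_name):
    """Convert one tabular record into (subject, predicate, object) triples.

    Columns without a mapping, and empty values, are skipped; the original
    row is left untouched so that both representations can coexist.
    """
    subject = f"ex:{class_name}/{row[id_column]}"
    triples = [(subject, "rdf:type", f"ex:{class_name}")]
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row.get(column)
        if value is not None and value != "":
            triples.append((subject, prop, value))
    return triples


# Invented sample record in its "original format" (a dict standing in for a CSV row).
original_row = {"id": "B17", "bldg_ht_m": 12.5, "yr_built": 1984, "notes": ""}
triples = row_to_triples(original_row, id_column="id", class_name="Building")
```

Because the mapping is held as data rather than code, a different source with different column names would need only a different mapping table to land on the same shared properties.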
WS 2: Standards for the representation of urban data
A review of city data portals (Fox & Pettit, 2015; Hugh & Fox, 2018) has shown that while much data has been made openly available, there is a large gap between what is available and what is needed to support both research and practice. We call this the relevance gap. The gap has many causes: 1) cities publishing only what is easily available, 2) privacy concerns, 3) lack of understanding of the data researchers and practitioners need, 4) datasets being “locked up” in commercial organizations, 5) datasets lying dormant in academe after the completion of research projects, and, most importantly, 6) lack of a repository in which to deposit datasets.
Urban policy is one area in which better urban data is needed. Researchers struggle to gain access to fine-grained data on individual parcels, businesses, and residents. Yet this is exactly the data needed to examine complex relationships between the built environment and individual outcomes in order to develop more effective policy, for example, understanding the relationship between housing quality and resident health in order to advance preventative medicine, or between mixed-use development and business innovation in order to promote zoning reform.
These data are, for the most part, a by-product of business activity or public services and range from sensor or mobile phone data to data volunteered by or collected from individuals using social media or apps. They tend to be person-level or address-based, and are thus sensitive. Although users may access some of these data via web scraping, private companies often make them available through their Data for Good divisions. In the public and not-for-profit sectors, agencies often negotiate agreements with individual researchers to facilitate access to the administrative data from programs focused on housing, employment, education, the social safety net, health, criminal justice, and more.
This work stream will take a targeted approach to extract needed data from the variety of sources. We will:
- Develop a roadmap of the types of data that might be made available and the current barriers to access, either as microdata or as data aggregated at a fine geographic scale such as the block level. Expert panels, composed of researchers and practitioners, will be constituted for each sector to identify key hypotheses, questions, tasks, etc. for understanding and operating cities. To support the answering of these questions, we will identify the types of data needed.
- Determine the feasibility of establishing access, working with key urban organizations such as the Canadian Urban Institute. Working closely with stakeholders, we will establish a priority ranking for each data resource and map out a path to gaining access.
- Develop formal partnerships with government, industry, and not-for-profit entities for each type of data source, i.e., open, closed, or web service. These partnerships will enable us to make safeguarded data available to researchers and practitioners.
- Establish a streamlined governance process to make data available and ensure they are used to improve urban decision-making. We will work with each partner to establish standardized data use agreements that streamline access for researchers who agree to comply with the specific dataset terms of use.
WS 3: Tools for the management of urban data
The focus of this work stream is the research and development of tools that support the integration of multi-sourced urban data. Urban data, specifically data sourced from governments, the web and social media, is a morass in which the validity of information varies widely, mirroring both the beliefs and agendas that groups and individuals in society possess and the messiness of data provided by human and machine sensor nets. The problem we face is how to distinguish valid information from information that is fake or simply incorrect. What is a truly reliable source of valid information? Is it a government? Certainly some governments are more reliable than others. Is it a "trusted source" such as a newspaper? The same holds true for newspapers. The source of the problem lies at the core of our information society: much of the web, and particularly social media, is crowd-sourced, hence by its nature it is impossible to enforce any scheme for ensuring information quality and validity. Consequently, we have to rely on evidence "buried" in the web itself to determine the degree of validity of any piece of information.
Predicated on the existence of a shared ontology, as described in work stream 2, this work stream focuses on tools to support integration.
WS 4: Tools to support the analysis of urban data
This work stream explores software tools to support urban analysis, design, planning and operations. Tools fall into four categories:
- Data transformation. Data transformation tools focus on the syntactic and semantic transformation of data so that it can be integrated and consumed by an analysis process.
- Data analysis. Data analysis tools include software libraries for statistical analysis, machine learning, and visualization.
- Analysis process definition. Analysis process definition tools enable the definition of an analysis process/workflow specifying the operations to be performed and the flow of data amongst them.
- Experiment management. Experiment management supports the definition and archiving of urban data analysis experiments. The searchable archive includes the datasets used, the analysis process definition, and analysis results.
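The last two categories can be pictured with a small Python sketch: an analysis process defined as an ordered list of named operations, and an experiment record that archives the dataset used, the process definition, and the result. This is a minimal sketch under assumptions; the function names, the `STEPS` list and the sample readings are hypothetical and do not describe any actual UDC tool.

```python
import json
from statistics import mean


def drop_missing(values):
    """Transformation step: remove missing readings."""
    return [v for v in values if v is not None]


def average(values):
    """Analysis step: arithmetic mean of the remaining readings."""
    return mean(values)


# A hypothetical analysis process: named operations applied in order,
# with the output of each step flowing into the next.
STEPS = [("drop_missing", drop_missing), ("average", average)]


def run_experiment(dataset_name, data, steps):
    """Run the process and return an archivable experiment record."""
    result = data
    for _name, operation in steps:
        result = operation(result)
    return {
        "dataset": dataset_name,
        "process": [name for name, _operation in steps],
        "result": result,
    }


# Invented sample data, for illustration only.
record = run_experiment("example_readings", [2.0, None, 4.0], STEPS)
archived = json.dumps(record)  # a serialized form suitable for a searchable archive
```

Storing the step names alongside the dataset name and result is what would allow an archived experiment to be searched for and repeated later.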
WS 5: National Urban Data Network
The School of Cities, in partnership with universities and libraries across the country, will facilitate the creation of a National Urban Data Network that will provide researchers and policymakers across Canada with unprecedented access to multi-sourced urban data, leading to potentially revolutionary new insights into how cities function. The network will be composed of curators housed in libraries across Canada. Their role will be both proactive and reactive: proactively searching for new datasets, and reactively responding to requests for data from Canadian researchers. Curators will:
- identify sources of urban data,
- secure rights to use the data,
- annotate the data with meta-data covering ownership, usage license, quality, etc., and
- deposit the data into the Canadian Urban Data Repository.
The network of curators will be supported by the UDC.