Geo and Sensor Datasets


This article lists some geo and sensor datasets.

1. GeoLife GPS Trajectories

This GPS trajectory dataset was collected in the Geolife project (Microsoft Research Asia) by 182 users over a period of more than five years (from April 2007 to August 2012). A GPS trajectory in this dataset is represented by a sequence of time-stamped points, each of which contains latitude, longitude and altitude. The dataset contains 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours. These trajectories were recorded by different GPS loggers and GPS phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. one point every 1~5 seconds or every 5~10 meters. The dataset recorded a broad range of users' outdoor movements, including not only life routines like going home and going to work but also entertainment and sports activities, such as shopping, sightseeing, dining, hiking, and cycling. It can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
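To make the "sequence of time-stamped points" idea concrete, here is a minimal sketch of how a trajectory's length and duration could be computed. The tuple layout, the function names and the spherical-Earth radius are my own assumptions for illustration; this does not parse the dataset's actual file format.

```python
import math
from typing import List, Tuple

# Hypothetical in-memory layout (not the dataset's file format):
# (unix_time_seconds, latitude_deg, longitude_deg, altitude_m)
Point = Tuple[float, float, float, float]

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def trajectory_stats(points: List[Point]) -> Tuple[float, float]:
    """Return (distance_km, duration_seconds) for a time-ordered trajectory."""
    distance = sum(
        haversine_km(p[1], p[2], q[1], q[2]) for p, q in zip(points, points[1:])
    )
    duration = points[-1][0] - points[0][0] if len(points) > 1 else 0.0
    return distance, duration
```

Altitude is ignored here, so the result is a ground distance. Summing these per-trajectory values over all trajectories would give the kind of totals quoted above.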

2. T-Drive Trajectory Data Sample

This is a sample of the T-Drive trajectory dataset that contains one week of trajectories from 10,357 taxis. The total number of points in this dataset is about 15 million and the total distance of the trajectories reaches 9 million kilometers.

3. SmartSantander

SmartSantander proposes a city-scale experimental research facility, unique in the world, in support of typical applications and services for a smart city. The facility is intended to be sufficiently large, open and flexible to enable horizontal and vertical federation with other experimental facilities, and to stimulate the development of new applications by users of various types, including advanced experimental research on IoT technologies and realistic assessment of user acceptability. The project envisions the deployment of 20,000 sensors in Belgrade, Guildford, Lübeck and Santander (12,000 of them in Santander), exploiting a large variety of technologies. SmartSantander also has a map view, which is worth trying.

4. Berkeley’s Mobile Century Data

The Mobile Century data was collected on February 8, 2008, as part of a joint UC Berkeley - Nokia project, funded by the California Department of Transportation, to explore the use of GPS-enabled phones for monitoring traffic. Besides the cell phone GPS data, two additional data sources are available for the experiment site and are included with this release: inductive loop detector data obtained through the Freeway Performance Measurement System (PeMS), and travel time data obtained through vehicle re-identification using high-resolution video data. All identifiers assigned to the cell phones used during the Mobile Century experiment have been randomized to protect the participants. The video data has also been processed, and a random number has been assigned to represent each vehicle.

5. IEEE ICDM Contest 2010 (Synthetic)

This contest provides a fairly large volume of synthetic traffic data. It might be useful for testing something at some point, though I have no specific use in mind yet.

6. City of Chicago

This dataset contains the current estimated congestion for the 29 traffic regions. The Chicago Traffic Tracker estimates traffic congestion on Chicago's arterial streets (non-freeway streets) in real time by continuously monitoring and analyzing GPS traces received from Chicago Transit Authority (CTA) buses. Two types of congestion estimates are produced every 10 minutes: 1) by Traffic Segments and 2) by Traffic Regions or Zones. Congestion estimates by traffic segment give the observed speed, typically for one-half mile of a street in one direction of traffic. Segment-level congestion is available for about 300 miles of principal arterials. Congestion by Traffic Region gives the average traffic condition for all arterial street segments within a region. A traffic region comprises two or three community areas with comparable traffic patterns. 29 regions were created to cover the entire city (except the O'Hare airport area). Traffic segment speeds are highly volatile, whereas the congestion estimates for the traffic regions remain consistent for relatively longer periods. Most of the volatility in arterial speed comes from the very nature of the arterials themselves: due to a myriad of factors, including but not limited to frequent intersections, traffic signals, transit movements, availability of alternative routes, crashes, and the short length of the segments, the speed on an individual arterial segment can fluctuate from heavily congested to no congestion and back within a few minutes. The segment speeds and the traffic region congestion estimates together may give a better understanding of the actual traffic conditions.

A related blog article reports on this project's 9 million new rows of traffic data.
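The region-level estimate described in this section is an average over the arterial segments in a region. A minimal sketch of that aggregation might look like the following. The `(region, speed)` tuples, the use of `None` for a missing estimate and the function name are hypothetical, and the city's actual method may weight or filter segments differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple


def region_average_speed(
    segments: Iterable[Tuple[str, Optional[float]]]
) -> Dict[str, float]:
    """Average segment speeds per region, skipping segments with no estimate.

    `segments` is a hypothetical list of (region_name, speed_mph) pairs;
    a speed of None stands for "no estimate available".
    """
    by_region = defaultdict(list)
    for region, speed in segments:
        if speed is not None:
            by_region[region].append(speed)
    return {region: mean(speeds) for region, speeds in by_region.items()}


# Made-up sample values, for illustration only.
sample = [("A", 10.0), ("A", 30.0), ("B", 20.0), ("B", None)]
averages = region_average_speed(sample)
```

Averaging over many segments is also why the region-level numbers are steadier than the individual segment speeds.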

7. Beijing City Lab*** (I like this most.)

The Beijing City Lab (BCL) is a virtual research community, dedicated to studying, but not limited to, China’s capital Beijing. The Lab focuses on employing interdisciplinary methods to quantify urban dynamics, generating new insights for urban planning and governance, and ultimately producing the science of cities required for sustainable urban development. The lab’s current mix of planners, architects, geographers, economists, and policy analysts lends unique research strength.

8. KONECT Networks Dataset Collection***

KONECT (the Koblenz Network Collection) is a project to collect large network datasets of all types in order to perform research in network science and related fields, collected by the Institute of Web Science and Technologies at the University of Koblenz–Landau.

9. 9th DIMACS Implementation Challenge - Shortest Paths

The goal of the 9th DIMACS Implementation Challenge - Shortest Paths is to create a reproducible picture of the state of the art in the area of shortest path algorithms. To this end, the organizers identify a standard set of benchmark instances and generators, as well as benchmark implementations of well-known shortest path algorithms. The challenge provides road network datasets of the USA for participants.
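As a rough illustration of what can be done with such road networks, here is a small sketch that reads arc lines in the challenge's text graph format (as I understand it: `c` lines are comments, `p sp <nodes> <arcs>` is the problem line, and `a <from> <to> <weight>` is one directed arc) and runs a textbook Dijkstra on them. The sample graph is made up, and this is not one of the official benchmark implementations.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, int]]]

# Tiny made-up instance, not taken from the challenge data.
SAMPLE = """c tiny example
p sp 4 5
a 1 2 7
a 1 3 2
a 3 2 3
a 2 4 1
a 3 4 8
"""


def parse_gr(text: str) -> Graph:
    """Build an adjacency list from 'a u v w' lines; other lines are ignored."""
    graph: Graph = defaultdict(list)
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "a":
            u, v, w = int(parts[1]), int(parts[2]), int(parts[3])
            graph[u].append((v, w))
    return graph


def dijkstra(graph: Graph, source: int) -> Dict[int, int]:
    """Shortest distances from source to every reachable node (non-negative weights)."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


distances = dijkstra(parse_gr(SAMPLE), 1)
```

For the full USA road graphs, a plain Python heap like this will be slow; it is only meant to show the shape of the data and the baseline algorithm.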

10. OpenStreetMap Data Extracts

This server has data extracted from the OpenStreetMap project which is normally updated every day. Select your continent and then your country of interest from the list below. (If you have been directed to this page from elsewhere and are not familiar with OpenStreetMap, we highly recommend that you read up on OSM before you use the data.) The download service is offered for free by Geofabrik GmbH.

In addition, there is a tool for analysing OSM data.

