Data Sets

This page provides data sets I used in my research.

Anomaly detection using Gaussian graphical models

Problem setting

Proximity-Based Anomaly Detection using Sparse Structure Learning, Tsuyoshi Ide, Aurelie C. Lozano, Naoki Abe, and Yan Liu, Proceedings of 2009 SIAM International Conference on Data Mining (SDM 09), pp.97-108 [ppt].

Data sets

  • Actual Spot Rates
    • This data set was originally downloaded from a Duke University web site ( http://www.stat.duke.edu/data-sets/mw/ts_data/all_exrates.html ), which was no longer available as of 11/06/2016.
    • Here is the original description (pdf):
      • Two files contain the spot prices (foreign currency in dollars) and the returns for daily exchange rates of the following currencies relative to the US dollar
        • AUD Australian Dollar
        • BEF Belgian Franc
        • CAD Canadian Dollar
        • FRF French Franc
        • DEM German Mark
        • JPY Japanese Yen
        • NLG Dutch Guilder
        • NZD New Zealand Dollar
        • ESP Spanish Peseta
        • SEK Swedish Krona
        • CHF Swiss Franc
        • GBP UK Pound
      • There are 2567 (work-)daily spot prices, and so 2566 daily returns for each of these 12 currencies, over the period of about 10 years — 10/9/86 to 8/9/96 (spot)
    • The description and data files still exist at http://www2.stat.duke.edu/~mw/data-sets/ts_data/ as of 11/06/2016.
  • sensor_error
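
For the Actual Spot Rates data, the relation between the 2567 spot prices and the 2566 returns can be illustrated with a minimal sketch. The original description does not say how the returns were defined, so both the simple-return and the log-return formulas below are assumptions, and the function name and the sample values are hypothetical.

```python
import math


def daily_returns(prices, log=False):
    """Return the n-1 day-over-day returns for a sequence of n spot prices.

    Assumption: simple returns (p_t - p_{t-1}) / p_{t-1} by default,
    or log returns log(p_t / p_{t-1}) when log=True. The data set's own
    definition of "returns" is not stated in the description.
    """
    if len(prices) < 2:
        return []
    pairs = zip(prices[:-1], prices[1:])
    if log:
        return [math.log(curr / prev) for prev, curr in pairs]
    return [(curr - prev) / prev for prev, curr in pairs]


# Made-up spot prices for illustration only (not taken from the data files).
spot = [0.50, 0.55, 0.44]
returns = daily_returns(spot)
```

Each return needs two consecutive prices, which is why a series of n prices yields n-1 returns.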

Trajectory Regression: primal approach

Problem setting

Trajectory Regression on Road Networks
Tsuyoshi Ide and Masashi Sugiyama,
Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-11), pp.203-208, 2011 [ppt].

Data sets

  • 25×25 Square Grid
    • Simulated traffic data on a synthetic 25×25 square grid. The total number of edges is 2400 (counting both directions of each edge), and the number of paths is 1200. For the similarity, omega=0.5 and d0=2 are used (see the paper for the definitions).
    • 25x25_edgeDef.csv
      • Defines the length and the legal speed limit of each edge. Each row contains (edge ID, length, speed). The lengths are in meters. The speeds are in km/h.
    • 25x25_affinity.csv
      • Defines the similarity between edges.
    • 25x25_yN.csv
      • List of the deviations of the trajectory cost (travel time) from the baseline, which is twice the travel time computed with the legal speed limit. Note that the values can be negative, since they are deviations from the baseline.
    • 25x25_pathList.csv
      • List of trajectories generated with the simulator. The origin and the destination were chosen randomly.
  • Kyoto
    • Simulated traffic data on a real map of downtown Kyoto, which has 3478 links. The number of generated paths is N=1739. For the similarity, omega=0.5 and d0=2 are used (see the paper for the definitions). The format of each file is the same as above.
    • Kyoto_edgeDef.csv
    • Kyoto_affinity.csv
    • Kyoto_yN.csv
    • Kyoto_pathList.csv
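
The baseline used in the yN files (twice the travel time at the legal speed limit) can be sketched from the edgeDef format as follows. This is only an illustration under several assumptions: the edgeDef file is taken to have no header row, a path is represented as a list of edge IDs (the actual layout of the pathList files is not described here), and travel times are expressed in seconds. The function names and sample values are hypothetical.

```python
import csv
import io


def load_edges(fileobj):
    """Map edge ID -> (length in meters, legal speed in km/h).

    Assumes rows of the form (edge ID, length, speed) with no header row.
    """
    edges = {}
    for row in csv.reader(fileobj):
        edge_id = row[0].strip()
        edges[edge_id] = (float(row[1]), float(row[2]))
    return edges


def baseline_seconds(path, edges):
    """Twice the travel time of `path` when driven at the legal speed limit."""
    total = 0.0
    for edge_id in path:
        length_m, speed_kmh = edges[edge_id]
        speed_ms = speed_kmh * 1000.0 / 3600.0  # km/h -> m/s
        total += length_m / speed_ms
    return 2.0 * total


# Made-up edge definitions standing in for a file such as 25x25_edgeDef.csv.
sample = io.StringIO("1,100,36\n2,200,72\n")
edges = load_edges(sample)

path = ["1", "2"]              # hypothetical path given as edge IDs
baseline = baseline_seconds(path, edges)
observed = 35.0                # hypothetical observed travel time in seconds
deviation = observed - baseline
```

In this made-up example the observed time is shorter than the baseline, so the deviation is negative, which matches the note that values in the yN files can be negative.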

Trajectory Regression: dual (kernel) approach

Problem setting

Travel-Time Prediction using Gaussian Process Regression: A Trajectory-Based Approach
Tsuyoshi Ide and Sei Kato,
Proceedings of 2009 SIAM International Conference on Data Mining (SDM 09), pp.1185-1196 [ppt].

Data sets

See the home page of Social Simulation Project at IBM Research – Tokyo.