Investigating Hadoop for Large Spatiotemporal Processing Tasks

MapReduce-style approaches built on Hadoop have been explored for solving computational problems that were previously too large for traditional GIS systems to handle. Three cases are introduced in this study to demonstrate the approach. Case one (funded by NEH): Hadoop is used to crawl the web for map services, farming the work out to many processors in parallel and making a search across 5 billion web pages relatively manageable. Case two: Hadoop is used to let researchers query, analyze, and subset large spatiotemporal datasets containing billions of records, demonstrated on global geo-tweets collected over one year. Case three: Hadoop is used to divide up and process calculations of network and straight-line distances between thousands of points in New York City over a seven-year period, requiring a total of 3.5 billion distance calculations. What these projects have in common is that each can be broken down into many separate, discrete processing tasks in which the output of one task does not influence the output of another. This kind of problem, amenable to a "divide and conquer" approach, is exactly the kind of challenge at which Hadoop excels, as sketched in the example below.
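To make the pattern concrete, the following is a minimal sketch of a map-only Hadoop job in the spirit of case three: each input record describes one point pair, each pair is processed independently, and no reducer is needed because no task depends on another's output. The class names, the CSV input layout (id, lat1, lon1, lat2, lon2), and the use of a haversine straight-line distance are illustrative assumptions, not the project's actual implementation.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PairDistance {

    public static class DistanceMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        private static final double EARTH_RADIUS_M = 6_371_000.0;

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Each record is independent: id,lat1,lon1,lat2,lon2 (assumed layout)
            String[] f = line.toString().split(",");
            if (f.length < 5) {
                return; // skip malformed records
            }
            double lat1 = Math.toRadians(Double.parseDouble(f[1]));
            double lon1 = Math.toRadians(Double.parseDouble(f[2]));
            double lat2 = Math.toRadians(Double.parseDouble(f[3]));
            double lon2 = Math.toRadians(Double.parseDouble(f[4]));

            // Haversine formula: great-circle (straight-line) distance in meters.
            double dLat = lat2 - lat1;
            double dLon = lon2 - lon1;
            double a = Math.pow(Math.sin(dLat / 2), 2)
                     + Math.cos(lat1) * Math.cos(lat2)
                       * Math.pow(Math.sin(dLon / 2), 2);
            double meters = 2 * EARTH_RADIUS_M
                          * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));

            context.write(new Text(f[0]), new Text(Double.toString(meters)));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "pair-distance");
        job.setJarByClass(PairDistance.class);
        job.setMapperClass(DistanceMapper.class);
        job.setNumReduceTasks(0); // map-only: tasks never read each other's output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because every record is self-contained, Hadoop is free to split the input across as many mappers as the cluster allows; the network-distance variant would differ only in what the mapper computes, not in how the work is divided.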

