A growing number of scientific fields require the ability to analyze data in near real-time, so that results from one experiment can guide the selection of the next, or even influence the course of a single experiment. Experiments are often tightly scheduled, with timing driven by factors ranging from the physical processes involved in an experiment to the travel schedules of on-site researchers. With improvements in sensor and detector technologies at experimental facilities (e.g., synchrotron light sources and neutron sources), the data produced at these facilities significantly exceed their local processing capabilities. The data therefore need to be moved to remote compute facilities, both within and outside a country (or continent), as the users of these facilities often span diverse geographic locations.
The computing and network resources must be available at a specific time, for a specific period. On-demand network bandwidth, though provided by backbone research and education networks such as ESnet and Internet2, is difficult to provision end-to-end in an automated fashion. And even where compute resources can be obtained on demand (at least at some institutions), those resources are typically not connected directly to the wide-area network (WAN). In the typical model, data arriving from the WAN are written to the parallel file system via dedicated data transfer nodes (DTNs), and compute nodes then read the data from that file system. This model does not work well for near real-time analysis of data streams coming from an experiment or simulation. We need international (and intercontinental) testbeds to evaluate solutions that enable these emerging science workflows.
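The staged DTN-to-file-system path described above inserts a full store-then-read cycle between the instrument and the analysis. As a rough illustration (not part of the talk), the plain-Python sketch below shows the streaming alternative: a consumer that begins processing each chunk as it arrives over a socket, overlapping analysis with the transfer. All names, sizes, and the loopback endpoint are hypothetical.

    import socket
    import threading
    import time

    CHUNK = 1 << 20      # hypothetical detector frame size: 1 MiB
    N_CHUNKS = 16        # hypothetical number of frames in the run
    PORT = 9999          # hypothetical endpoint; a real DTN would face the WAN

    def producer():
        # Plays the role of the experiment: emits frames as they are acquired.
        srv = socket.socket()
        srv.bind(("127.0.0.1", PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        for _ in range(N_CHUNKS):
            time.sleep(0.05)            # acquisition time per frame
            conn.sendall(b"\0" * CHUNK)
        conn.close()
        srv.close()

    def streaming_consumer():
        # Near real-time model: analyze each chunk as soon as it arrives,
        # instead of waiting for the whole dataset to land on a file system.
        sock = socket.create_connection(("127.0.0.1", PORT))
        received = 0
        while received < N_CHUNKS * CHUNK:
            data = sock.recv(CHUNK)
            if not data:
                break
            received += len(data)
            # per-chunk analysis would start here, overlapping the transfer
        sock.close()
        return received

    if __name__ == "__main__":
        t = threading.Thread(target=producer)
        t.start()
        time.sleep(0.1)                 # let the producer start listening
        start = time.time()
        n = streaming_consumer()
        t.join()
        print(f"streamed {n} bytes in {time.time() - start:.2f}s; "
              "analysis overlapped the transfer")

In the staged model, by contrast, the same bytes would first be written to the parallel file system by a DTN and only then read back by the compute nodes, so analysis could not begin until the transfer and the file-system write had completed.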