Scientific and Grid Workflow Management Cesare Pautasso University of Lugano http://www.pautasso.info 24.6.2010
Scientific and Grid Workflow (Cesare Pautasso)
Abstract
Grid workflow management systems coordinate multiple job submissions over heterogeneous Grid resources. They feature visual programming environments that give scientists a high-level view of distributed computations composed of Grid services. This brief introduction to the field of scientific and Grid workflows includes a survey of selected workflow management tools and outlines current research trends.
Swiss Grid School SGS10, Lugano, Switzerland 24.6.2010
Cesare Pautasso
Ph.D. at ETH Zürich (2004)
Post-Doc at ETH Zürich in the Systems (IKS) Group
• Software: JOpera: Process Support for more than Web services http://www.jopera.org/
Researcher at IBM Zurich Research Lab (2007)
Assistant Professor at the new Faculty of Informatics, University of Lugano (USI), Switzerland (since September 2007)
• USI Representative in the SwiNG Assembly (since 2007)
• Grid Workflow Working Group Lead (since 2007)
More Information: http://www.pautasso.info/
Follow me on: http://twitter.com/pautasso/
Acknowledgements
Some material contained in this tutorial was adapted from slides originally published by: Gustavo Alonso, Win Bausch, Ewa Deelman, Ian Foster, Yolanda Gil, Carole Goble, Roy Grønmo, Thomas Heinis, Rajesh Kalyanam, Francesco Lelli, Omer F. Rana, Heiko Schuldt, Frank Terpstra
Why Workflow Management on the Grid?
Kinds of Grid Computation
• One Job Submission
• Parameter Sweep
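A parameter sweep can be read as many independent submissions of the same job, one per combination of parameter values. The following is a minimal sketch in Python; the `simulate` executable and the parameter names are hypothetical, and a real Grid submission would hand each command line to a scheduler instead of only building it.

```python
import itertools

def parameter_sweep(executable, parameters):
    """Return one command line per combination of parameter values."""
    names = sorted(parameters)
    jobs = []
    for values in itertools.product(*(parameters[name] for name in names)):
        arguments = [f"--{name}={value}" for name, value in zip(names, values)]
        jobs.append([executable] + arguments)
    return jobs

# Hypothetical sweep: 2 temperatures x 3 seeds = 6 independent jobs.
jobs = parameter_sweep("simulate", {"temperature": [280, 300], "seed": [1, 2, 3]})
```

Because the jobs share no data, they can be scheduled on any available resources in any order.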
[Figure: microarray gene expression experiment, shown twice; the second version divides the steps into an in vitro part and an in silico part. Labels: Cell population, Condition A, Condition B, wet lab processing, MicroArray Scanner, Image processing, Extract raw spot intensities, Data Preprocessing, Annotate and normalize data, Variability and error assessment (Determine parameters for error model), Significance assessment (likelihood for differential expression for each gene), Clustering, Determine Expression Pattern.]
1 spot = 1 gene. Expression level: Green: A > B; Red: A < B; Black: A = B
Vision for Scientific and Grid Workflows
Make it easy to build Grid applications composed of multiple jobs
“Provide the scientist with a platform that takes care of all data handling and record keeping chores so that the user can concentrate on the science and not computer science”
Workflow
GRID
Some (Scientific) Workflow Management Systems
Askalon Bigbross Bossa BioPipe BPMN Breeze Carnot Con:cern DAGMan DiscoveryNet Dralasoft GEL GridAnt Grid Job Handler
GWFE GWES ICENI Inforsense JIGSA JOpera Kepler Karajan Oakgrove's reactor OSIRIS OSWorkflow OpenWFE
Pegasus Pipeline Pilot P-GRADE PowerFolder Ptolemy II Savvion Seebeyond SCIRun ScyFLOW SDSC Matrix SHOP2 Taverna Teuta (UML)
Triana Trident Twister Ultimus Versata Viztrails wftk XFlow YAWL Wildfire WFEE WS-BPEL ZBuilder
Outline
Why Workflow Management on the Grid?
Discussion: Scientific vs. Grid vs. Business Workflows
• Some Application Examples
Workflow Modeling Languages and Tools Overview
• Grid Workflow Language Patterns
Running Workflows on the Grid
• JOpera: Scientific Workflow for Eclipse
• Workflows and Provenance
Scientific vs. Grid vs. Business Workflows
The Origins: Business Process Management
who has to do what, when
The Origins: Business Process Management
• A business process describes key procedures within an organization. They involve:
  • multiple steps
  • numerous people
  • large amounts of resources
• In large business organizations there are many factors that increase the complexity of the business processes:
  • processes are not well documented
  • conformance to rules not guaranteed
  • people lack information about context
  • company lacks monitoring tools
  • steps, people and resources are not properly coordinated
• Workflow Management Systems try to address these problems by automating the coordination aspects of a business process: who has to do what, when, and with which software tools.
Business Workflows
“The automation of a business process where documents, information to be processed or tasks to be carried out are passed from one participant to another following a set of procedural rules”
Workflow Management Coalition (WfMC, 1993)
Scientific Workflows
“are networks of analytical steps that may involve, e.g., database access and querying, data analysis and mining, and many other steps including computationally intensive jobs submitted to high performance clusters and Grids”
Bertram Ludäscher
Modeling Workflows
[Figure: workflow model. Labels: Activity, User, Software Tool, Data, Control Flow, Data Flow.]
Business Workflows
[Figure: business workflow model. Labels: Activity, User, Software Tool, Control Flow.]
Scientific Workflows
[Figure: scientific workflow model. Labels: Activity, Software Tool, Data, Control Flow, Data Flow.]
Similarities: Are scientists doing e-Business?
Capturing knowledge/best practices
• Capture business processes within a company
• Capture scientific experiments
Executable Models for Repeated Execution
• Run a well defined procedure many times
• Ensure that an experiment can be reproduced
Incorporate human decision in the process
• Can we always do straight-through processing?
• Hard to achieve full automation
Differences: Do scientists need business transactions?
Rate of change
• Changing business procedures requires management approval
• Exploratory scientific processes require high flexibility
Which kind of data?
• Travel reservations, Loan applications
• Large protein sequence databases, Astronomy image catalogs
What is the ultimate goal?
• Making profit
• Making science
Scientific vs. Grid Workflows
Scientific workflows emphasize the design of virtual experiments:
• Data flow models
• Reusable “scientific computing” component library
• Interactive debugging, monitoring and steering
• Data provenance and lineage tracking for reproducibility
• Model versioning for exploratory customization
Grid workflows focus on the large-scale execution of scientific workflows:
• Mapping and adaptation to a dynamic run-time environment
• Provide access to shared workflows as a Grid service
• Parameterized Execution
• Centralized vs. Distributed Execution Architectures
• Fault Tolerance
• Optimization
Scientific Workflows on the Grid
• How can Scientific Workflows benefit from the Grid?
1. Leverage underlying Grid middleware:
  • Resource Management
  • Job Scheduling for parallel Activities
  • Large Data Transfers (GridFTP) between Activities
2. Improved QoS based on the workflow model
  • Grid resource reservation
  • Data replication
  • Data placement
  • Fault Tolerance
http://www.jopera.org/
Example
A Web Service-Enabled Workflow System for Climate Modeling Data Processing in TeraGrid Rajesh Kalyanam Lan Zhao Taezoon Park Sebastien Goasguen
Architecture (From Rajesh Kalyanam)
[Figure: layered architecture. Labels:
Application: Desktop Application, Web Portal
Workflow Engine: JOpera
Workflow Components: Data Components, Computation Components, Metadata Query, Data Discovery, Data Transfer, Data Transformation, Globus Job, Condor Job
Purdue Data Management System: OPeNDAP, THREDDS, SRB/MCAT, Data Proxy, Remote Data Proxies, Local Datasets (LARS, PTO, NWS), Remote Datasets
Models / Tools: Climate Modeling
Computation Middleware: Condor-G, Globus
Computation Resources: Local Clusters, TeraGrid]
[Screenshots, from Rajesh Kalyanam: Portal; Workflow Model; Workflow Execution; Workflow Results.]
Workflow Lifecycle
[Figure, from Gustavo Alonso: lifecycle diagram. Labels: Modeling, Workflow Model, Simulation, Workflow Instances, Workflow Execution, Scientific Computation, Log Analysis.]
Workflow Modeling Methodologies
Bottom up Composition
4. Share and Publish it as Web Service
3. Run, Test, and Debug the execution within the same modeling environment
2. Build a workflow using a drag, drop and connect modeling environment
1. Select components from a library
  a. Lookup services in a public registry
  b. Import from external Web service (WSDL)
  c. Search the standard library
Top down Decomposition
1. Define a goal and Draw a skeleton of the workflow that satisfies it
2. Refine it and Bind services into it:
  • Search for existing matching services
  • Build missing services (if necessary)
  • Add required data transformations
3. Run, Test, and Debug the execution within the same modeling environment
4. Share and Publish it as Web Service
Iterative Composition
[Figure: iterative cycle. Labels: Model Service Composition, Check, Compile, Deploy, Run, Test, Manage, Refactor, Change, Rediscover, Build New services.]
Workflow Modeling Languages and Tools Overview
HeNCE - The Ancestor of Grid Workflows?
A. Beguelin, J. J. Dongarra, G. A. Geist, R. Manchek, V. S. Sunderam, Graphical Development Tools for Network-Based Concurrent Supercomputing, in: Proc. of the 1991 ACM/IEEE conference on Supercomputing, Albuquerque, New Mexico, 1991, pp. 435–444.
Extended UML Activity Diagrams (From Roy Grønmo)
JOpera Data Flow Graph
[Figure: data flow graph of the ExampleStockQuoteConvert process. Labels: Input (country, symbol), getQuote (symbol, Result), getRate (country1, country2, Result), DataIntegration (a, b, c), Output (quote).]
[Screenshots of workflow tools: Triana, Taverna, VizTrails, Trident, OSIRIS, JOpera.]
Grid Workflow Language Patterns
Workflow Pattern: Variants
1. Simple Parallelism: Implicit, Explicit
2. Data Parallelism: Static, Dynamic
3. Pipelining: Best Effort, Blocking, Buffered, Hybrid, Superscalar, Streaming, Synchronized, Out of Order
Modeling Simple Parallelism
• Data Flow, Graph Based: SCIRun, Kepler, Triana
• Control Flow, Graph Based: JOpera, GEL, UML
• Control Flow, Block Based: BPMN, WS-BPEL
Modeling Data Parallelism
• Data Flow, Graph Rewriting (Static or Dynamic): Triana, Taverna, JOpera
• Control Flow, Block Based, Dynamic: WS-BPEL, AGWL, Karajan, GEL
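Independently of the modeling language, data parallelism follows a split, apply, merge shape: the input is partitioned, the same task runs on every partition, and the partial results are merged. A minimal sketch in plain Python follows; it does not use the syntax of any of the tools listed above, and `normalize` is a hypothetical stand-in for a real task.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Partition the input into at most `parts` contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def data_parallel(task, data, parts=4):
    """Apply the same task to every chunk in parallel, then merge in order."""
    chunks = split(data, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_results = list(pool.map(task, chunks))
    return [item for partial in partial_results for item in partial]

def normalize(chunk):
    """Hypothetical task applied independently to each chunk."""
    return [value / 10 for value in chunk]

result = data_parallel(normalize, list(range(10)), parts=3)
```

Here the number of partitions is fixed when the call is written (static). A dynamic variant would choose `parts` at run time, for example from the size of the input.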
Modeling Pipelined Execution
[Figure: a stream of data elements 1, 2, 3, … passing through a pipeline of tasks 1, 2, 3.]
Pipelining Semantics
Best Effort Pipelined Execution
Drop data elements on pipeline collisions
Advantages:
• Simplified implementation
• Some applications may tolerate data loss
Problem:
• Downsampling is non deterministic
Blocking Pipelined Execution
Tasks are blocked if successors are busy
Advantages:
• Avoid data loss in the pipeline
Problem:
• Pipeline speed limited by slowest task
• Data may be lost before it enters the pipeline
Buffered Pipelined Execution
Tasks are decoupled by buffers
Advantages:
• Collisions are prevented
• Best applied to tasks having variable speed
Problem:
• Buffer capacity is limited (blocking is still needed: hybrid semantics)
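The difference between these semantics comes down to what happens when a task tries to pass a data element to a busy successor. The sketch below is an illustration in Python, not the implementation of any particular engine: `offer_best_effort` drops the element when the buffer is full, while `run_buffered_pipeline` connects the stages with bounded buffers and blocks the producer when a buffer is full (the hybrid case). All names are made up for the example, and at least one stage is assumed.

```python
import queue
import threading

_DONE = object()  # marks the end of the input stream

def offer_best_effort(buffer, item):
    """Best effort: drop the item if the successor's buffer is full."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False

def run_buffered_pipeline(items, stages, capacity=2):
    """Buffered pipeline: put() blocks while the next buffer is full."""
    buffers = [queue.Queue(maxsize=capacity) for _ in stages]
    results = queue.Queue()  # unbounded, so the last stage never blocks
    outputs = buffers[1:] + [results]

    def worker(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # forward the end marker and stop
                return
            outbox.put(stage(item))

    threads = [threading.Thread(target=worker, args=(stage, inbox, outbox))
               for stage, inbox, outbox in zip(stages, buffers, outputs)]
    for thread in threads:
        thread.start()
    for item in items:
        buffers[0].put(item)  # blocks while the first buffer is full
    buffers[0].put(_DONE)
    for thread in threads:
        thread.join()

    collected = []
    while True:
        item = results.get()
        if item is _DONE:
            return collected
        collected.append(item)

doubled = run_buffered_pipeline([1, 2, 3, 4, 5],
                                [lambda x: x + 1, lambda x: x * 2],
                                capacity=1)
```

With `capacity=1` the pipeline behaves almost like the blocking variant; larger buffers absorb more of the speed difference between tasks.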
Streaming Pipelined Execution
Tasks exchange data while running
Advantages:
• Suitable for a distributed (P2P) engine
Problems:
• Shifts complexity from the workflow engine to the tasks
• Tasks exchange data while running
• Workflow/Task interface more complex
Running Workflows on the Grid
Basic Architecture
[Figure: layered architecture. Labels, top to bottom: Workflow Model (Act 1 to Act 7), Workflow Management System, Adapters, Grid Schedulers, Grid Resources.]
Workflow Users
[Figure: the basic architecture (Workflow Model, Workflow Management System, Adapters, Grid Schedulers, Grid Resources) annotated with the kinds of users involved: Workflow Participants, Workflow Modelers, Scientific Software Developers, Wrapper Developers.]
Standard APIs
[Figure, from WFMC, Workflow Reference Model, 1998: a Workflow Enactment Service containing the Workflow Engine(s) offers the Workflow API and Interchange formats through five interfaces.]
Interface 1: Process Definition Tools
Interface 2: Workflow Client Applications
Interface 3: Invoked Applications
Interface 4: Other Workflow Enactment Service(s)
Interface 5: Administration & Monitoring Tools
Wrappers and Grid Applications
[Figure: an activity (Act X) in the Workflow Management System reaches Application X through a Worklist Handler, an adapter, a wrapper, and a Grid Scheduler that manages several instances of Application X.]
Wrappers and Legacy Applications
• The workflow engine is also in charge of connecting the different scientific applications. These applications do not have to talk directly to each other; they do it through the workflow engine.
• Most engines target service-oriented applications, for which they provide very good connectivity through standardized protocols.
• Otherwise, the interface adapters must be developed on a case by case basis (as a last resort, manual integration may be required!)
• For legacy applications, a wrapper must be built so that the workflow engine can communicate with the application. The wrapper can be a simple relay of commands and data, or a complete translation program implementing functionality not present in the legacy application.
• For most Grid applications, the interaction takes place through a Grid scheduler, which is responsible for managing the distributed execution of the applications.
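The simplest kind of wrapper, a relay of commands and data, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the adapter interface of any specific engine: the legacy application is assumed to be a command line program that reads its input from standard input and writes its result to standard output, and the result dictionary is a made-up format.

```python
import subprocess
import sys

def run_legacy(command, input_text, timeout=30):
    """Relay input to a command line program and collect its output."""
    completed = subprocess.run(
        command,
        input=input_text,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return {
        "success": completed.returncode == 0,
        "output": completed.stdout,
        "error": completed.stderr,
    }

# Stand-in for a legacy application: a one-line script that upper-cases stdin.
legacy_command = [sys.executable, "-c",
                  "import sys; print(sys.stdin.read().upper())"]
outcome = run_legacy(legacy_command, "acgt")
```

A translating wrapper would add format conversion before and after the call, which is where most of the case by case effort goes.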
Run-time Abstraction Levels
• A design-time workflow model needs to be mapped across different abstraction levels in order to be executed at run time.
• A user requests the execution of a new workflow instance.
• The abstract workflow is mapped to an executable instance by:
  • Finding suitable service implementations and binding them to the tasks
  • Rewriting the workflow graph based on a set of refinement rules
  • Planning required data staging, registration, placement, replication and transfer operations
• Each task of the resulting executable workflow is then submitted to a Grid resource manager so that it can be scheduled on suitable resources
• The mapping can be done:
  • when the workflow is started at instantiation time (statically)
  • incrementally as the workflow runs (adaptive execution with dynamic late binding)
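The binding step of this mapping can be sketched as a lookup from abstract task types to registered service implementations. The Python below is a simplified illustration: the `Service` record, the registry contents and the example.org endpoints are all hypothetical, and graph rewriting and data staging are left out.

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    implements: str  # the abstract task type this service can run
    endpoint: str

def bind(abstract_workflow, registry):
    """Map each abstract task to the first registered matching service."""
    executable = []
    for task in abstract_workflow:
        candidates = [s for s in registry if s.implements == task]
        if not candidates:
            raise LookupError(f"no service implements task {task!r}")
        executable.append((task, candidates[0]))
    return executable

# Hypothetical registry; the endpoints are placeholders.
registry = [
    Service("blast-a", "align", "https://cluster-a.example.org/blast"),
    Service("blast-b", "align", "https://cluster-b.example.org/blast"),
    Service("plotter", "plot", "https://cluster-a.example.org/plot"),
]
plan = bind(["align", "plot"], registry)
```

Calling `bind` once for the whole workflow corresponds to static mapping at instantiation time. Calling it for a single task just before that task is submitted corresponds to dynamic late binding.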
Example: Binding with WS-BPEL
[Figure: an Activity is bound at three levels: set of services (BPEL partner link type), one service (WSDL port type), service end point (port).]
Workflow Binding Lifecycle
• Library Registration time (classification)
• Modeling time (static early binding)
• Compilation time (blacklisting)
• Deployment time (customization)
• Startup time (testing)
• Task Execution time (dynamic late binding)
• Failed invocation time (rebind on retry)
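The last stage of the lifecycle, rebinding on retry, can be sketched as follows. This is a minimal Python illustration with made-up names: `call` stands for whatever invokes a service at an endpoint, and a failed invocation is assumed to surface as a `ConnectionError`.

```python
def invoke_with_rebinding(task_input, endpoints, call, max_attempts=3):
    """Try the candidate endpoints in order, rebinding after each failure."""
    failed = []
    for endpoint in endpoints[:max_attempts]:
        try:
            return endpoint, call(endpoint, task_input)
        except ConnectionError:
            failed.append(endpoint)  # remember it so it is not tried again
    raise RuntimeError(f"all attempts failed: {failed}")

def flaky_call(endpoint, value):
    """Hypothetical invocation in which the first node is unreachable."""
    if endpoint == "node-a":
        raise ConnectionError("node-a is unreachable")
    return value * 2

bound_to, result = invoke_with_rebinding(21, ["node-a", "node-b"], flaky_call)
```

The list of failed endpoints could also be fed back to earlier stages, for example to blacklist a service at compilation time.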
http://www.jopera.org/
JOpera Scientific Workflow for Eclipse
High Level Workflow Language
• Data and Control Aspects (Visual Representation)
• Recursion, Iteration, Parallelism and Pipelining
Open and Extensible Component Model
• Run existing code without changes
• Synchronous, Asynchronous, and Streaming interaction
• Web services support (Axis, WSIF, REST)
• Secure access to remote file systems and hosts (SSH)
• Easy to integrate with existing schedulers (e.g. Condor)
Strong Eclipse Foundation
• Platform Independent (Eclipse/Java)
• Flexible, Extensible, Modular and Embeddable
JOpera Visual Composition Language
Workflows are modeled using multiple viewpoints:
1. Data Flow Graph
2. Control Flow Graph
3. Service Bindings
[Figure: the same example in the three viewpoints. Labels: QueryBookPrice (isbn, price), CurrencyConvert (amount), Exception Handler; service bindings to Amazon.com and XE.com.]
JOpera Example: Doodle Map Mashup
Set up a Doodle with Yahoo! Local search and visualize the results of the poll on Google Maps
Doodle Map Mashup Architecture
[Figure: a Web Browser talks to the Workflow Engine through a RESTful API; the Workflow Engine calls RESTful Web Services APIs with GET and POST requests.]
Extensible JOpera Component Model
Combine in the same workflow jobs implemented using an open and extensible set of technologies
JOpera Workflow components: WSDL, Java (Snippets, Methods), Human, XML (XSLT, XPath), SQL, SSH, Condor
Sharing Workflows as a Service
JOpera processes are automatically published to clients using a variety of access protocols
• Web Clients: REST
• WS Clients: WSDL
• Eclipse RCP Clients: Java
JOpera Workflow components: WSDL, Java, Human, XML, SQL, SSH, Condor
JOpera ARC Integration Demo
Workflows and Provenance
Lineage in Scientific Workflows
Scientists consider the “capture and generation of provenance information as a critical part of the workflow-generated data”
“Sharing workflows is an essential element of education, and acceleration of knowledge dissemination.”
Ewa Deelman et al.
Where does this picture come from?
METADATA
This photo was taken July 21, 1981, when the Voyager 2 spacecraft was 33.9 million km from Saturn
Is this the right metadata?
METADATA
Title: White Arabian Horse
Date: 16.4.2005
Dimension: 640x480
Colors: 32bits
Size: 1.2MB
Format: JPEG
Camera Model: D100
Equipment Make: Nikon
Flash Used: No
Focal Length: 8.2 mm
F-Number: F/2.6
Exposure: 1/100 s
Metering: pattern
GPS: 50.2 Lat. 60.4 Lon.
Would you buy a horse without this?
Lineage in Spreadsheets
Lineage in Databases
What is the relationship between these tuples?
SQL
Problem: Query Inversion
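For a single, simple query the inverse can be written by hand, which makes the problem concrete. The sketch below uses Python's built-in `sqlite3` module with a made-up `measurement` table: the forward query derives one tuple per gene, and `lineage` returns the source tuples that contributed to one derived tuple. Inverting arbitrary queries automatically is the hard part and is not attempted here.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE measurement (id INTEGER, gene TEXT, level REAL)")
connection.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, "g1", 2.0), (2, "g1", 4.0), (3, "g2", 5.0)],
)

# Forward query: one derived tuple (average level) per gene.
derived = connection.execute(
    "SELECT gene, AVG(level) FROM measurement GROUP BY gene ORDER BY gene"
).fetchall()

def lineage(gene):
    """Inverse query: ids of the source tuples behind one derived tuple."""
    rows = connection.execute(
        "SELECT id FROM measurement WHERE gene = ? ORDER BY id", (gene,))
    return [row[0] for row in rows]
```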
Lineage in Software Development
What’s in a Makefile?

CC = gcc
CFLAGS = -Wall -g

program: main.o input.o output.o logic.o
	$(CC) $(CFLAGS) main.o input.o output.o logic.o -o program

main.o: main.c input.h output.h logic.h
	$(CC) $(CFLAGS) -c main.c

input.o: input.c input.h
	$(CC) $(CFLAGS) -c input.c

output.o: output.c output.h
	$(CC) $(CFLAGS) -c output.c

logic.o: logic.c logic.h
	$(CC) $(CFLAGS) -c logic.c
Lineage in Software Development
Where does my program come from?
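The Makefile answers this question because every rule records what a target was derived from. A minimal Python sketch, with the dependencies copied by hand from the Makefile rules, walks them backwards to find the lineage of a target; the function names are made up for the example.

```python
# Dependencies copied from the Makefile: target -> prerequisites.
DEPENDENCIES = {
    "program": ["main.o", "input.o", "output.o", "logic.o"],
    "main.o": ["main.c", "input.h", "output.h", "logic.h"],
    "input.o": ["input.c", "input.h"],
    "output.o": ["output.c", "output.h"],
    "logic.o": ["logic.c", "logic.h"],
}

def lineage(target, dependencies=DEPENDENCIES):
    """Return every file the target was directly or indirectly derived from."""
    ancestors = set()
    pending = list(dependencies.get(target, []))
    while pending:
        item = pending.pop()
        if item not in ancestors:
            ancestors.add(item)
            pending.extend(dependencies.get(item, []))
    return ancestors

def sources(target, dependencies=DEPENDENCIES):
    """Return only the original inputs, i.e. files that no rule produces."""
    return sorted(f for f in lineage(target, dependencies)
                  if f not in dependencies)
```

For example, `sources("input.o")` returns `["input.c", "input.h"]`, and `sources("program")` returns all seven `.c` and `.h` files.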
Lineage in Scientific Workflows
An ideal scientific workflow should document all of the steps linking the original observations with the final published results so that the process can be reproduced
[Figure labels: Observation, Theory, Input Data, Scientific Workflow, Output Data, Published Paper.]
Data Provenance: Where does this output document come from?
Change Propagation: What to recompute if this input changes?
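Change propagation is the forward counterpart of provenance: starting from a changed input, follow the dependencies in the direction of the outputs. A minimal Python sketch follows, reusing the dependencies of the Makefile example from the software development slides, copied by hand; `affected_by` is a made-up name.

```python
# Dependencies copied from the Makefile example: target -> prerequisites.
DEPENDENCIES = {
    "program": ["main.o", "input.o", "output.o", "logic.o"],
    "main.o": ["main.c", "input.h", "output.h", "logic.h"],
    "input.o": ["input.c", "input.h"],
    "output.o": ["output.c", "output.h"],
    "logic.o": ["logic.c", "logic.h"],
}

def affected_by(changed, dependencies=DEPENDENCIES):
    """Return the targets that must be recomputed when `changed` is modified."""
    stale = set()
    frontier = {changed}
    while frontier:
        # Targets not yet marked that depend on something in the frontier.
        frontier = {target for target, inputs in dependencies.items()
                    if target not in stale and frontier & set(inputs)}
        stale |= frontier
    return sorted(stale)
```

For example, `affected_by("input.h")` returns `["input.o", "main.o", "program"]`, while `output.o` and `logic.o` can be reused as they are.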
Conclusion
[Figure, from Ewa Deelman: workflow lifecycle with the phases Modeling, Mapping and Execution. Labels: Workflow and Component Libraries; Workflow Template; Populate with data; Workflow Instance; Map to available resources; Executable Workflow; Execute; Data Products; Adapt, Modify; Reuse; Data, Metadata, Provenance Information; Data, Metadata Catalogs; Resource, Application Component Descriptions; Distributed Compute, Storage and Network Resources.]
e-Science as Workflow?
[Figure, from Ian Foster. Labels: What I Did (Executed); What I Am Doing (Executing); What I Want to Do (Executable, Not yet executable); Model; Schedule; Execution environment; Provenance; Query.]
Some References
Gil, Y. et al.: Examining the Challenges of Scientific Workflows. IEEE Computer, Dec 2007
Taylor, I.J.; Deelman, E.; Gannon, D.B.; Shields, M. (Eds.): Workflows for e-Science: Scientific Workflows for Grids, Springer 2007
Yu, J.; Buyya, R.: A taxonomy of workflow management systems for grid computing, Journal of Grid Computing, 3(3–4):171–200 (2005)
Pautasso, C.; Alonso, G.: Parallel Computing Patterns for Grid Workflows, Proc. of [email protected], Paris, France, 2006
OGF Workflow Research Group http://www.isi.edu/~deelman/wf–rg/
Download This Tutorial Material http://www.pautasso.info/lectures/sgs10workflow.pdf
Free Download
http://www.jopera.org/ http://www.pautasso.info/lectures/sgs10workflow.pdf