|
| 1 | +[[faq]] |
| 2 | += Frequently Asked Questions |
| 3 | + |
| 4 | +== Is it possible to execute jobs in multiple threads or multiple processes? |
| 5 | + |
| 6 | +There are three ways to approach this - but we recommend exercising caution in the analysis of such requirements (is it really necessary?). |
| 7 | + |
| 8 | +* Add a `TaskExecutor` to the step. The `StepBuilder`s provided for configuring Steps have a "taskExecutor" property you can set. This works as long as the step is intrinsically restartable (effectively idempotent). The parallel job sample shows how it might work in practice - this uses a "process indicator" pattern to mark input records as complete, inside the business transaction. |
| 9 | +* Use the `PartitionStep` to split your step execution explicitly amongst several Step instances. Spring Batch has a local multi-threaded implementation of the main strategy for this (`PartitionHandler`), which makes it a great choice for I/O-intensive jobs. Remember to use `scope="step"` for the stateful components in a step executing in this fashion, so that separate instances are created per step execution and there is no cross-talk between threads. |
| 10 | +* Use the Remote Chunking approach as implemented in the `spring-batch-integration` module. This requires some durable middleware (e.g. JMS) for reliable communication between the driving step and the remote workers. The basic idea is to use a special `ItemWriter` on the driving process, and a listener pattern on the worker processes (via a `ChunkProcessor`). |
| 11 | + |
| 12 | +== How can I make an item reader thread safe? |
| 13 | + |
| 14 | +You can synchronize the `read()` method (e.g. by wrapping it in a delegator that does the synchronization). |
| 15 | +Remember that you will lose restartability, so the best practice is to mark the step as not restartable. To be safe (and efficient), you can also set `saveState=false` on the reader. |
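A minimal sketch of the synchronizing delegator described in this answer. `SimpleItemReader` is a hypothetical stand-in for Spring Batch's `ItemReader` (the real `read()` also declares `throws Exception`, omitted here for brevity), and `ListReader` is an illustrative in-memory reader; neither name comes from the framework.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the ItemReader contract:
// read() returns the next item, or null once the input is exhausted.
interface SimpleItemReader<T> {
    T read();
}

// Delegator that serializes access to a reader that is not thread safe.
final class SynchronizingReader<T> implements SimpleItemReader<T> {
    private final SimpleItemReader<T> delegate;

    SynchronizingReader(SimpleItemReader<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public synchronized T read() {
        // Only one thread at a time may advance the underlying reader.
        return delegate.read();
    }
}

// Deliberately non-thread-safe reader over an in-memory list (illustration only).
final class ListReader<T> implements SimpleItemReader<T> {
    private final Iterator<T> items;

    ListReader(List<T> items) {
        this.items = items.iterator();
    }

    @Override
    public T read() {
        return items.hasNext() ? items.next() : null;
    }
}
```

Wrapping is a one-liner, for example `new SynchronizingReader<>(new ListReader<>(List.of("a", "b")))`. The wrapper only makes each `read()` call atomic. It does not record which thread consumed which item, which is why restartability is lost.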
| 16 | + |
| 17 | +== What is the Spring Batch philosophy on the use of flexible strategies and default implementations? Can you add a public getter for this or that property? |
| 18 | + |
| 19 | +There are many extension points in Spring Batch for the framework developer (as opposed to the implementor of business logic). |
| 20 | +We expect clients to create their own more specific strategies that can be plugged in to control things like commit intervals (`CompletionPolicy`), |
| 21 | +rules about how to deal with exceptions (`ExceptionHandler`), and many others. |
| 22 | + |
| 23 | +In general, we try to dissuade users from extending framework classes. The Java language doesn't give us as much flexibility as we would like to mark classes and interfaces as internal. |
| 24 | +Generally you can expect anything at the top level of the source tree in packages `org.springframework.batch.*` to be public, but not necessarily sub-classable. |
| 25 | +Extending our concrete implementations of most strategies is discouraged in favour of a composition or forking approach. |
| 26 | +If your code can use only the interfaces from Spring Batch, that gives you the greatest possible portability. |
| 27 | + |
| 28 | +== How does Spring Batch differ from Quartz? Is there a place for them both in a solution? |
| 29 | + |
| 30 | +Spring Batch and Quartz have different goals. Spring Batch provides functionality for processing large volumes of data and Quartz provides functionality for scheduling tasks. |
| 31 | +So Quartz can complement Spring Batch; the two are not mutually exclusive technologies. A common combination would be to use Quartz as a trigger for a Spring Batch job using a Cron expression |
| 32 | +and the Spring Core convenience `SchedulerFactoryBean`. |
| 33 | + |
| 34 | +== How do I schedule a job with Spring Batch? |
| 35 | + |
| 36 | +Use a scheduling tool. There are plenty of them out there. Examples: Quartz, Control-M, Autosys. |
| 37 | +Quartz doesn't have all the features of Control-M or Autosys - it is supposed to be lightweight. |
| 38 | +If you want something even more lightweight you can just use the OS (`cron`, `at`, etc.). |
| 39 | + |
| 40 | +Simple sequential dependencies can be implemented using the job-steps model of Spring Batch, and the non-sequential features in Spring Batch. |
| 41 | +We think this is quite common. In fact, it makes it easier to correct a common misuse of schedulers: having hundreds of jobs configured, |
| 42 | +many of which are not independent, but depend only on one other job. |
| 43 | + |
| 44 | +== How does Spring Batch allow a project to optimize for performance and scalability (through parallel processing or other)? |
| 45 | + |
| 46 | +We see this as one of the roles of the `Job` or `Step`. A specific implementation of the Step deals with the concern of breaking apart the business logic |
| 47 | +and sharing it efficiently between parallel processes or processors (see `PartitionStep`). There are a number of technologies that could play a role here. |
| 48 | +The essence is just a set of concurrent remote calls to distributed agents that can handle some business processing. |
| 49 | +Since the business processing is already typically modularised - e.g. input an item, process it - Spring Batch can strategise the distribution in a number of ways. |
| 50 | +One implementation that we have had some experience with is a set of remote web services handling the business processing. |
| 51 | +We send a specific range of primary keys for the inputs to each of a number of remote calls. |
| 52 | +The same basic strategy would work with any of the Spring Remoting protocols (plain RMI, HttpInvoker, JMS, Hessian etc.) with little more than a couple of lines changed |
| 53 | +in the execution layer configuration. |
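The idea of sending "a specific range of primary keys" to each remote call can be sketched as follows. This is an illustration under assumptions, not the framework's partitioning API: `KeyRange` and `KeyRangePartitioner` are hypothetical names, and the keys are assumed to be numeric and roughly evenly distributed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: an inclusive range of primary keys for one worker.
final class KeyRange {
    final long minId;
    final long maxId;

    KeyRange(long minId, long maxId) {
        this.minId = minId;
        this.maxId = maxId;
    }
}

final class KeyRangePartitioner {
    // Splits [minId, maxId] into at most gridSize contiguous, non-overlapping ranges.
    static List<KeyRange> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("need gridSize >= 1 and minId <= maxId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        List<KeyRange> ranges = new ArrayList<>();
        for (long start = minId; start <= maxId; start += size) {
            // The last range is clipped so it never runs past maxId.
            ranges.add(new KeyRange(start, Math.min(start + size - 1, maxId)));
        }
        return ranges;
    }
}
```

Each resulting range would then be handed to one worker or remote call, which reads only the rows whose key falls inside it. The sketch does not guard against arithmetic overflow for key values close to `Long.MAX_VALUE`.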
| 54 | + |
| 55 | +== How can messaging be used to scale batch architectures? |
| 56 | + |
| 57 | +There is a good deal of practical evidence from existing projects that a pipeline approach to batch processing is highly beneficial, leading to resilience and high throughput. |
| 58 | +We are often faced with mission-critical applications where audit trails are essential, and guaranteed processing is demanded, but where there are extremely tight limits |
| 59 | +on performance under load, or where high throughput gives a competitive advantage. |
| 60 | + |
| 61 | +Matt Welsh's work shows that a Staged Event Driven Architecture (SEDA) has enormous benefits over more rigid processing architectures, |
| 62 | +and message-oriented middleware (JMS, AQ, MQ, Tibco etc.) gives us a lot of resilience out of the box. There are particular benefits in |
| 63 | +a system where there is feedback between downstream and upstream stages, so the number of consumers can be adjusted to account for the amount of demand. |
| 64 | +So how does this fit into Spring Batch? The `spring-batch-integration` project has this pattern implemented in Spring Integration, |
| 65 | +and can be used to scale up the remote processing of any step with many items to process. |
| 66 | +See in particular the "chunk" package, and the `ItemWriter` and `ChunkHandler` implementations in there. |
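The shape of the pattern (a writer on the driving side that hands chunks to middleware, and a listener on the worker side that processes them) can be sketched with an in-memory queue. This is only an illustration: `QueueingChunkWriter` and `ChunkWorker` are hypothetical names, a `BlockingQueue` stands in for durable middleware such as JMS, the empty-chunk stop signal is an assumption of the sketch, and the replies that a real remote chunking setup would send back to the driving step are omitted.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

// Driving side: instead of writing items itself, it hands each chunk to the "middleware".
final class QueueingChunkWriter<T> {
    private final BlockingQueue<List<T>> requests;

    QueueingChunkWriter(BlockingQueue<List<T>> requests) {
        this.requests = requests;
    }

    void write(List<T> chunk) {
        try {
            requests.put(List.copyOf(chunk)); // defensive copy of the chunk
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending chunk", e);
        }
    }
}

// Worker side: listens for chunks and applies the business processing to each item.
final class ChunkWorker<T> implements Runnable {
    private final BlockingQueue<List<T>> requests;
    private final Consumer<T> handler;

    ChunkWorker(BlockingQueue<List<T>> requests, Consumer<T> handler) {
        this.requests = requests;
        this.handler = handler;
    }

    @Override
    public void run() {
        try {
            while (true) {
                List<T> chunk = requests.take();
                if (chunk.isEmpty()) {
                    return; // an empty chunk is this sketch's stop signal
                }
                chunk.forEach(handler);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Several `ChunkWorker` instances can consume from the same queue on separate threads, in which case each would need its own stop signal. Unlike real middleware, an in-memory queue gives no durability, so this shows only the flow of chunks, not the guaranteed delivery.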