Until recently I knew of the existence of Spring Batch but never took the chance to get to know the framework. However, the release of Java EE 7 earlier this year renewed my interest in it. The main reason is that Java EE 7 contains the long-awaited JSR 352: Batch Applications for the Java Platform. This JSR is largely based on the principles and implementation of Spring Batch.
So I decided to give Spring Batch a try. But before I show some examples, first a short introduction to Spring Batch. Spring Batch is an open-source framework for building batch-oriented applications. It provides functionality such as restarting batch jobs, persisting batch status, and transaction and logging facilities. Spring Batch has been around for a couple of years now. The framework revolves around the following concepts:
- Job - The abstract unit of work representing an entire batch operation.
- Step - An abstract part of a Job.
- Tasklet - An abstract unit of work within a step.
Each of these abstract entities also has a concrete counterpart. These counterparts are created as soon as a concrete execution of a job takes place.
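In code, those concrete counterparts show up as the execution classes. A minimal sketch, assuming a job and a job launcher have already been wired up (both are shown later in this post):

// launching the abstract Job yields a concrete JobExecution...
JobExecution jobExecution = jobLauncher.run(importJob, new JobParameters());

// ...which in turn holds a concrete StepExecution for every step that ran
for (StepExecution stepExecution : jobExecution.getStepExecutions()) {
    System.out.println(stepExecution.getStepName() + ": " + stepExecution.getStatus());
}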
In this blog post I will give an example of importing a CSV file containing person data and show how this data can be imported into the database. For this purpose I will also use annotation-based dependency injection and Spring JDBC for the persistence part.
First of all we need the Person POJO. It can be very simple:
public class Person {

    private String id;
    private String name;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
In order to insert this Person into the database, the following DAO (Data Access Object) will be used. The DAO uses Spring JDBC to insert a list of persons into the database.
@Named
public class PersonDao {

    private NamedParameterJdbcTemplate jdbcTemplate;

    @Inject
    public PersonDao(DataSource dataSource) {
        this.jdbcTemplate = new NamedParameterJdbcTemplate(dataSource);
    }

    public void insertPersons(List<? extends Person> persons) {
        @SuppressWarnings("unchecked")
        Map<String, String>[] parameters = new HashMap[persons.size()];
        int parameterIndex = 0;
        for (Person person : persons) {
            Map<String, String> parameter = new HashMap<String, String>();
            parameter.put("id", person.getId());
            parameter.put("name", person.getName());
            parameters[parameterIndex] = parameter;
            parameterIndex++;
        }
        jdbcTemplate.batchUpdate("insert into PERSON (ID, NAME) VALUES (:id, :name)", parameters);
    }
}
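To show the DAO in isolation, here is a minimal usage sketch. It assumes an in-memory HSQLDB on the classpath and a hypothetical schema.sql script that creates the PERSON table; neither is part of the batch example itself:

DataSource dataSource = new EmbeddedDatabaseBuilder()
        .setType(EmbeddedDatabaseType.HSQL)
        .addScript("schema.sql") // hypothetical script, e.g. create table PERSON (ID varchar(10), NAME varchar(50))
        .build();

PersonDao personDao = new PersonDao(dataSource);

Person pete = new Person();
pete.setId("1");
pete.setName("Pete");

personDao.insertPersons(Collections.singletonList(pete));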
Since we will be importing a CSV file, we first have to define a format for it. Fortunately, these CSV files have a very simple format, which looks like this:
id, name
1, Pete
2, John
OK, now that we have the basic classes, we can start setting up a batch job that reads in this CSV file. Spring Batch comes with annotations that can be used inside Spring Batch Java classes; the alternative is configuration in a Spring context file. Since the XML configuration is the shortest and most straightforward approach, that is what we will use here. Configuring a batch job looks like this:
<bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager" lazy-init="true"/>
<bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean"/>
<bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">
<property name="jobRepository" ref="jobRepository" />
</bean>
<batch:job id="importJob">
<batch:step id="personImportStep">
<batch:tasklet>
<batch:chunk commit-interval="10" reader="personReader" writer="personItemWriter"/>
</batch:tasklet>
</batch:step>
</batch:job>
Here, we define a batch job that contains one step. This step takes care of doing some work, which is called a tasklet in batch jargon. The above tasklet reads data through a personReader object and then writes it through a personItemWriter object. Spring Batch reads this data in chunks of 10 person lines, writes 10 person objects to the database, then reads the next 10 lines, and so on.
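In pseudo-Java, the chunk-oriented processing of this step boils down to something like the following. This is only a simplified sketch of what Spring Batch does internally, not actual framework code:

List<Person> chunk = new ArrayList<Person>();
Person person;
while ((person = personReader.read()) != null) {
    chunk.add(person);
    if (chunk.size() == 10) {              // commit-interval reached
        personItemWriter.write(chunk);     // one write (and one transaction) per chunk
        chunk.clear();
    }
}
if (!chunk.isEmpty()) {
    personItemWriter.write(chunk);         // write the remaining persons
}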
The lines before the batch job definition define the following objects:
- transaction manager: manages transactions around job and step execution data. In this example I use a resourceless transaction manager, which can be used when the job and step execution data does not need to be stored in persistent storage.
- job repository: stores job and step execution data
- job launcher: can be used to start jobs
Spring Batch contains many out-of-the-box readers and writers. However, it is also very easy to write your own. In this example we will use an out-of-the-box reader and write our own writer. The reader can be configured like this:
<bean id="personReader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="linesToSkip" value="1"/>
<property name="resource" value="#{jobParameters['fileName']}" />
<property name="lineMapper" ref="personLineMapper"/>
</bean>
A bit of explanation:
- items are read in through the built-in class 'FlatFileItemReader' (a programmatic equivalent of this configuration is sketched after this list)
- the first line is skipped (hence the linesToSkip property)
- the resource property sets the file to import
- the line mapper is a Java class that transforms CSV lines into domain objects. In this example I will write my own mapper class just for the sake of illustration
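For those who prefer Java over XML, the same FlatFileItemReader can also be wired up programmatically. The following is only a minimal sketch, not part of the configuration used in this post; the fileName variable stands in for the value that the XML version takes from the job parameters:

FlatFileItemReader<Person> personReader = new FlatFileItemReader<Person>();
personReader.setLinesToSkip(1);                              // skip the header line
personReader.setResource(new FileSystemResource(fileName));  // the CSV file to import
personReader.setLineMapper(new PersonLineMapper());          // the line mapper shown below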
The line mapper class:
@Named
public class PersonLineMapper implements LineMapper<Person> {

    private static final String FIELD_PERSON_ID = "id";
    private static final String FIELD_PERSON_NAME = "name";

    @Override
    public Person mapLine(String personLine, int lineNumber) throws Exception {
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(",");
        tokenizer.setNames(new String[]{FIELD_PERSON_ID, FIELD_PERSON_NAME});

        FieldSet fieldSet = tokenizer.tokenize(personLine);
        String id = fieldSet.readString(FIELD_PERSON_ID);
        String name = fieldSet.readString(FIELD_PERSON_NAME);

        Person person = new Person();
        person.setId(id);
        person.setName(name);
        return person;
    }
}
As can be seen here, the line mapper uses a DelimitedLineTokenizer to split the CSV lines. The DelimitedLineTokenizer is a Spring Batch class. By setting the names on the tokenizer (the setNames method), the tokenizer is also forced to check the number of fields in each line. Furthermore, it is clearer to look up the fields in each line by name than by index.
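To make that behaviour concrete, here is a minimal sketch (using the same two field names) of what the tokenizer does with a single CSV line:

DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(",");
tokenizer.setNames(new String[]{"id", "name"});

// a well-formed line is split into named, trimmed fields
FieldSet fieldSet = tokenizer.tokenize("1, Pete");
String id = fieldSet.readString("id");     // "1"
String name = fieldSet.readString("name"); // "Pete"

// a line with too few fields no longer matches the configured names,
// so tokenize would throw an exception instead of silently producing a broken record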
Now that the file reading is ready, we can proceed to the item writer that writes the person data to the database:
@Named
public class PersonItemWriter implements ItemWriter<Person> {

    private PersonDao personDao;

    @Inject
    public PersonItemWriter(PersonDao personDao) {
        this.personDao = personDao;
    }

    @Override
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void write(List<? extends Person> persons) throws Exception {
        personDao.insertPersons(persons);
    }
}
This item writer gets the person DAO injected. The write method is an implementation of the write method in the ItemWriter interface. By implementing this interface we can use this PersonItemWriter in the tasklet (see the XML above).
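Although the step normally drives the writer, it can be exercised directly as well, which makes clear that it simply receives one chunk of persons per call (at most commit-interval, so 10, at a time). A minimal sketch, assuming a PersonDao instance is available:

PersonItemWriter writer = new PersonItemWriter(personDao);

Person pete = new Person();
pete.setId("1");
pete.setName("Pete");

// the step calls this once per chunk of up to 10 persons
writer.write(Arrays.asList(pete));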
Now that we have everything in place, we can start a job and get things working:
public void runImportJob(String fileName) throws JobExecutionException {
    ApplicationContext applicationContext = new ClassPathXmlApplicationContext("applicationContext.xml");
    JobLauncher jobLauncher = applicationContext.getBean("jobLauncher", JobLauncher.class);
    Job importJob = applicationContext.getBean("importJob", Job.class);

    JobParametersBuilder jobParametersBuilder = new JobParametersBuilder();
    jobParametersBuilder.addString("fileName", fileName);

    JobExecution execution = jobLauncher.run(importJob, jobParametersBuilder.toJobParameters());
    logger.info(String.format("Job (%s) execution status: %s", execution.getJobId(), execution.getStatus()));
}
This code looks up a job launcher and the import job. Then a job parameter is created that contains the file to import (remember, this file name is used in the item reader configuration in the Spring context).
Finally, the job launcher is told to run the job. The result of this is a job execution object which holds the job execution status.
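To actually kick this off, the method above can be called from any piece of code, for example a plain main method. The class name and the file location below are made up for the example; the file: prefix makes Spring resolve the resource from the file system instead of the classpath:

public class ImportJobRunner {

    public static void main(String[] args) throws JobExecutionException {
        // runImportJob is the method shown above
        new ImportJobRunner().runImportJob("file:persons.csv");
    }
}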