Split by delimiter in IBM’s WebSphere Adapter for Flat Files, or: Who owns the fence?

IBM’s adapter for Flat Files can split files as they are read in, either by size or by a delimiter. For completeness, I should mention that splitting is also available in the outbound adapter (when writing files), but today I’d like to focus on the more interesting inbound case. In general, the splitting feature handles the situation where an incoming file is a concatenation of units that you’d like to process one at a time.

When splitting inbound files at a delimiter, the Flat File adapter provides an option to include the delimiter in the resulting objects. This is akin to looking at land lots separated by fences and asking who owns the land under the fence. Whether to include separators depends on their nature. If the combined input file contains artificial separators between individual units (e.g. “***SPLITMEHERE***”), you should not include them. On the other hand, if you use a distinct start- or end-of-file signature as the separator (such as “<?xml version="1.0"?>”), the separator is an integral part of each unit and must be included.
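To make the difference concrete, here is a minimal Python sketch (not the adapter’s code) of the “exclude” case: splitting on an artificial separator and throwing it away. The function name `split_excluding` is hypothetical. Running the same function on XML input shows why a prolog-style separator must be kept: every unit loses its “<?xml ” prefix.

```python
def split_excluding(data: str, delimiter: str) -> list[str]:
    """Split concatenated units on a separator and discard the separator.

    Hypothetical helper for illustration only; it simulates the
    "do not include delimiter" option, it does not call the adapter.
    """
    # str.split removes every occurrence of the delimiter (and raises
    # ValueError for an empty delimiter). Empty pieces, such as the one
    # produced by a leading or trailing separator, are dropped; note that
    # this would also drop a genuinely empty unit.
    return [unit for unit in data.split(delimiter) if unit]


if __name__ == "__main__":
    # Artificial separator: dropping it is exactly what we want.
    print(split_excluding("rec1***SPLITMEHERE***rec2", "***SPLITMEHERE***"))

    # Signature-style separator: dropping it corrupts every unit.
    doc = '<?xml version="1.0"?>\n<Widget/>\n<?xml version="1.0"?>\n<Widget/>\n'
    print(split_excluding(doc, "<?xml "))
```

The first call yields `['rec1', 'rec2']`; the second yields two units that both start with `version="1.0"?>`, which are no longer well-formed XML documents.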

So suppose you have decided to include the separator.

[Figure: Split by delimiter settings in the Flat File adapter Export]

But how do you specify whether the separator should be attached to the end of the previous unit or to the beginning of the next one? Suppose your separator is “<?xml ” and look at this section of the file:

</Widget>
<?xml version="1.0"?>
<Widget xmlns…..

Which unit do you think this “<?xml ” separator belongs to? In other words, which is right, this:

</Widget>
<?xml
[split]
version="1.0"?>
<Widget xmlns…..

or this:

</Widget>
[split]
<?xml version="1.0"?>
<Widget xmlns…..

From context, the answer is obvious to you, but how will the Flat File adapter answer the same question? From the adapter’s perspective, who owns the fence between the plots?

Answer: it depends on whether there is a fence BEFORE the first plot. If the file starts with the delimiter, the Flat File adapter decides that the delimiter belongs at the start of each segment. If the file does NOT start with the delimiter, the adapter reasons that the delimiter belongs at the end of each segment.
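The rule is easy to model. The Python sketch below is a simulation of the behavior as I observed it, not IBM’s implementation; the name `split_like_adapter` is hypothetical. It also assumes that any text after the last delimiter becomes the final unit, which is my guess rather than something I verified.

```python
def split_like_adapter(data: str, delimiter: str) -> list[str]:
    """Simulate the observed "who owns the fence" rule (hypothetical helper).

    If the data starts with the delimiter, the delimiter is kept at the
    START of each unit; otherwise it is kept at the END of each unit.
    """
    if not delimiter:
        raise ValueError("delimiter must be non-empty")
    pieces = data.split(delimiter)
    if data.startswith(delimiter):
        # A fence before the first plot: pieces[0] is the empty string in
        # front of the leading delimiter, so re-attach the delimiter as a
        # prefix to every remaining piece.
        return [delimiter + piece for piece in pieces[1:]]
    # No leading fence: the delimiter closes each unit, so re-attach it as a
    # suffix to every piece except the last one.
    units = [piece + delimiter for piece in pieces[:-1]]
    # ASSUMPTION: trailing text after the last delimiter is a final unit.
    if pieces[-1]:
        units.append(pieces[-1])
    return units


if __name__ == "__main__":
    doc = '<?xml version="1.0"?>\n<Widget/>\n<?xml version="1.0"?>\n<Widget/>\n'
    for unit in split_like_adapter(doc, "<?xml "):
        print(repr(unit))
    print(split_like_adapter("A;B;", ";"))
```

With “<?xml ” as the delimiter both units come back starting with the prolog, because the file starts with it. With “;” on `"A;B;"` the file does not start with the delimiter, so the result is `['A;', 'B;']`.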

This is a reasonable assumption, although I have not seen it documented anywhere. In our example, the Flat File adapter will correctly split the file before the “<?xml ” prolog, because the file (hopefully) begins with this string. But please keep this rule in mind when configuring file splitting. For example, setting the delimiter to “\n<?xml” (a line break followed by <?xml) will NOT work, because your file almost certainly does not begin with a line break.
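A cheap safeguard, under the same assumption about the rule, is to check a sample input file before deploying: does it literally begin with the configured delimiter? The helpers below (`delimiter_owner` and `check_sample_file`, both hypothetical names) do only that; they do not use any adapter API, and the UTF-8 default encoding is an assumption about your files.

```python
def delimiter_owner(head: str, delimiter: str) -> str:
    """Return which side of the fence gets the delimiter under the observed rule.

    "start": data begins with the delimiter, so it is prepended to each unit.
    "end":   data does not, so it is appended to each unit.
    """
    return "start" if head.startswith(delimiter) else "end"


def check_sample_file(path: str, delimiter: str, encoding: str = "utf-8") -> str:
    """Read just the first few bytes of a sample file and apply delimiter_owner."""
    with open(path, "rb") as f:
        raw = f.read(len(delimiter.encode(encoding)))
    # errors="replace" keeps a truncated multi-byte character from raising;
    # such a head cannot match the delimiter anyway.
    head = raw.decode(encoding, errors="replace")
    owner = delimiter_owner(head, delimiter)
    if owner == "end":
        print(f"warning: {path} does not start with {delimiter!r}; "
              "the delimiter would be attached to the END of each unit")
    return owner


if __name__ == "__main__":
    sample = '<?xml version="1.0"?>\n<Widget/>\n'
    print(delimiter_owner(sample, "<?xml "))   # file starts with the prolog
    print(delimiter_owner(sample, "\n<?xml"))  # no leading line break
```

For a file that starts with the XML prolog, “<?xml ” gives `start`, while “\n<?xml” gives `end`: exactly the misconfiguration described above.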

Happy splitting!
